ScottGraffius.com | Blog | Intersection of Project Leadership with Business and Technology

Are AI Hallucinations Getting Better or Worse? We Analyzed the Data

07 January 2026

BY SCOTT M. GRAFFIUS | ScottGraffius.com

Graffius, S. M. (2026, January 7). Are AI Hallucinations Getting Better or Worse? We Analyzed the Data. ScottGraffius.com. https://doi.org/10.13140/RG.2.2.33179.53285

Introduction

AI systems such as ChatGPT and its competitors sometimes produce answers that sound confident but are wrong. In an earlier article, Scott M. Graffius encapsulated it as: "Generative AI can dazzle. However, it’s prone to deliver fiction as fact, a phenomenon known as AI hallucinations" (Graffius, 2025).

Hallucinations can arise from multiple stages in the AI lifecycle, including data collection, model architecture, training processes, and inference. Key contributing factors include:

Biases, incompleteness, or noise in the training data, which can lead to overgeneralization or pattern misinterpretation.
Model overfitting or architectural limitations in autoregressive decoding, where probabilistic predictions prioritize fluency over factual fidelity.
Lack of grounding in real-world knowledge during generation, exacerbated by the absence of explicit reasoning mechanisms.

The danger to users is that AI hallucinations present fiction as fact—confidently, fluently, and persuasively. Left unchecked, these errors can cause harm. "When AI gets things wrong, using its output can spread false information, damage reputations, and create other issues" (Graffius, 2025).

There is no single universal metric for AI hallucinations, but many researchers focus on the percentage of responses containing at least one hallucinated claim. This review uses that measurement.
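
As a rough illustration of that measurement, the minimal Python sketch below counts a response as hallucinated if any of its claims is flagged. The `Response` class, the `claim_is_hallucinated` labels, and the sample data are hypothetical, and the sketch assumes a human or automated reviewer has already judged each claim. Published benchmarks may define and score claims differently.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One model response plus reviewer labels for each claim it makes (hypothetical)."""
    text: str
    # True means the reviewer judged that claim to be hallucinated.
    claim_is_hallucinated: list[bool]


def hallucination_rate(responses: list[Response]) -> float:
    """Percentage of responses containing at least one hallucinated claim."""
    if not responses:
        raise ValueError("need at least one response")
    flagged = sum(1 for r in responses if any(r.claim_is_hallucinated))
    return 100.0 * flagged / len(responses)


# Made-up sample: two of the four responses contain a flagged claim.
sample = [
    Response("answer A", [False, False]),
    Response("answer B", [False, True]),
    Response("answer C", []),
    Response("answer D", [True, True, False]),
]
print(f"{hallucination_rate(sample):.1f}%")  # prints 50.0%
```

Note that a response with three false claims counts the same as a response with one. The metric tracks how often an answer is tainted, not how badly.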

Different benchmarks test distinct situations. Vectara's Hughes Hallucination Evaluation Model (HHEM) leaderboard focuses on document summarization—how faithfully a model sticks to a provided source (Vectara, 2025). Others, such as SimpleQA and PersonQA, probe general factual accuracy on short, open-ended questions (OpenAI, 2025). This review draws from a range of such tests, reported by multiple sources.

As AI systems improve and become more ubiquitous, a pressing question is: Are AI hallucinations getting better or worse? As detailed next, data from 2024 through 2025 is mixed. On tightly controlled tasks, hallucinations are declining. However, they’ve spiked on more complex tasks.

AI Hallucinations in 2024

In 2024, leading models exhibited hallucination rates in the range of 1-3% on standardized, grounded benchmarks (Stanford HAI, 2024; Vectara, 2025). But the picture was less rosy outside those settings. Domain-specific evaluations (such as scientific, medical, and technical analysis) often reported hallucination rates of 10-20% or higher (Chelli et al., 2024).

AI Hallucinations in 2025

In 2025, hallucination rates diverged sharply depending on what the models were asked to do.

On apples-to-apples benchmarks, such as Vectara's summarization leaderboard, performance improved across the board. Several top models dropped below 1%, including Google’s Gemini-2.0-Flash at roughly 0.7%, with OpenAI and Gemini variants clustering around 0.8–1.5% (Vectara, 2025; AllAboutAI, 2025). For grounded tasks, where the model can anchor its output to a source document, hallucinations have become less frequent over time.

However, newer reasoning-focused models tell a different story. Systems optimized for complex chain-of-thought reasoning hallucinate more on open-ended factual benchmarks. OpenAI’s o3 series, for example, showed hallucination rates of 33-51% on PersonQA and SimpleQA. That’s more than double the rate of earlier o1 models, which hovered around 16% (OpenAI, 2025; Techopedia, 2025). Broader evaluations in 2025 reflect this shift. Across task sets containing both simple and complex cases, hallucination rates are commonly 3-20% or higher (Stanford HAI, 2025).
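
To put "more than double" in concrete terms, the short calculation below divides the o3 figures quoted above by the approximate o1 figure. The variable names are illustrative, and the inputs are the rounded percentages cited in this section rather than per-benchmark results, so treat the output as a back-of-the-envelope check.

```python
# Rounded figures quoted in this section (percent of responses with a hallucination).
o1_rate = 16.0
o3_low, o3_high = 33.0, 51.0


def times_higher(new_rate: float, old_rate: float) -> float:
    """How many times larger new_rate is than old_rate."""
    return new_rate / old_rate


ratio_low = times_higher(o3_low, o1_rate)
ratio_high = times_higher(o3_high, o1_rate)
print(f"{ratio_low:.2f}x to {ratio_high:.2f}x")  # prints 2.06x to 3.19x
```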

Conclusion

On comparable benchmarks, hallucinations are declining year-over-year for non-complex cases. Top models dropped from roughly 1–3% in 2024 to 0.7–1.5% in 2025 on grounded summarization tasks. However, hallucinations remain high in complex reasoning and open-domain factual recall, where rates can exceed 33%.

The silver lining is mitigation. Retrieval-Augmented Generation (RAG), which forces models to ground answers in external documents, can reduce hallucinations by 40-71% in many scenarios (AIMultiple, 2025; AboutChromebooks, 2025). Industry guides also recommend complementary best practices, such as domain-specific fine-tuning, careful prompt design to constrain speculation, and instructing models to cite sources or admit uncertainty.
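
Here is a minimal sketch of the grounding idea, assuming a toy keyword-overlap retriever in place of the embedding search a production RAG system would typically use. All function names, document ids, and sample texts are hypothetical, and the prompt wording is just one way to ask a model to cite sources or admit uncertainty. The sketch stops at building the prompt and does not call a model.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for a real embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: dict[str, str], k: int = 2) -> list[str]:
    """Return ids of the k documents sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc_id: len(q & tokenize(documents[doc_id])),
        reverse=True,
    )
    return ranked[:k]


def build_grounded_prompt(question: str, documents: dict[str, str], k: int = 2) -> str:
    """Assemble a prompt that limits the model to the retrieved sources."""
    ids = retrieve(question, documents, k)
    sources = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in ids)
    return (
        "Answer using only the sources below. Cite the source id for each claim. "
        "If the sources do not contain the answer, reply 'I don't know.'\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


# Hypothetical mini-corpus for illustration only.
docs = {
    "doc1": "The HHEM leaderboard measures hallucination in document summarization.",
    "doc2": "SimpleQA probes factual accuracy on short open-ended questions.",
    "doc3": "Agile teams hold a retrospective at the end of each sprint.",
}
prompt = build_grounded_prompt("What does the HHEM leaderboard measure?", docs, k=1)
print(prompt)
```

Swapping in a real retriever would change only `retrieve`. The instruction block that constrains the model stays the same.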

Researchers are also deploying layered defenses, such as multi-stage verification systems (Garcia-Fernandez et al., 2025), continuous detection pipelines (Anaokar et al., 2025), and domain-specific validators (Vangala et al., 2025; Bang et al., 2025).

The picture is mixed, but the takeaway is clear. AI hallucinations are evolving from a blanket failure mode into a situational risk. Where grounding is strong and tasks are constrained, the frequency of hallucinations drops. Where reasoning is expansive and factual recall is open-ended, they surge. Hallucinations will likely persist for the foreseeable future. Managing them requires situational awareness, vigilance, smarter evaluation, layered safeguards, mitigation strategies, and informed human oversight.

Take Action

This article provides a foundational overview. For in-depth guidance on AI, including human-AI teamwork (where the AI is advanced—agentic, autonomous, or autopoietic) and the "exotic team dynamics" which emerge, contact Scott M. Graffius. To request a consultation, speaking engagement, or other work, complete a request form or email him today.

References

AboutChromebooks. (2025). AI hallucination rates across different models in 2025. https://www.aboutchromebooks.com/ai-hallucination-rates-across-different-models/
AIMultiple. (2025). AI hallucination: Compare top LLMs. https://research.aimultiple.com/ai-hallucination/
AllAboutAI. (2025). AI hallucination report 2025: Which AI hallucinates the most? https://www.allaboutai.com/resources/ai-statistics/ai-hallucinations/
Anaokar, S., Ganatra, S., Kashid, H., Bhattacharyya, S., Nair, S., Sekhar, R., Manohar, S., Hemrajani, R., & Bhattacharyya, P. (2025). HalluDetect: Detecting, mitigating, and benchmarking hallucinations in conversational systems. arXiv. https://arxiv.org/abs/2509.11619
Bang, Y., Ji, Z., Schelten, A., Hartshorn, A., Fowler, T., Zhang, C., Cancedda, N., & Fung, P. (2025). HalluLens: LLM hallucination benchmark. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics. https://aclanthology.org/2025.acl-long.1176/
Chelli, M., Descamps, J., Lavoué, V., Trojani, C., Azar, M., Deckert, M., Raynier, J. L., Clowez, G., Boileau, P., & Ruetsch‑Chelli, C. (2024). Hallucination rates and reference accuracy of ChatGPT and Bard for systematic reviews: Comparative analysis. Journal of Medical Internet Research, 26, e53164. https://www.jmir.org/2024/1/e53164/
Garcia-Fernandez, C., Felipe, L., Shotande, M., Zitu, M., Tripathi, A., Rasool, G., El Naqa, I., Rudrapatna, V., & Valdes, G. (2025). Trustworthy AI for medicine: Continuous hallucination detection and elimination with CHECK. arXiv. https://arxiv.org/abs/2506.11129
Graffius, S. M. (2025, June 25). The "Pants-on-Fire Index for AI". ScottGraffius.com. https://scottgraffius.com/blog/files/pants-on-fire-index-for-ai.html
OpenAI. (2025). OpenAI o3 and o4-mini system card. https://cdn.openai.com/pdf/2221c875-02dc-4789-800b-e7758f3722c1/o3-and-o4-mini-system-card.pdf
Stanford Human-Centered Artificial Intelligence Initiative. (2024). Artificial Intelligence Index Report 2024. https://hai.stanford.edu/ai-index/2024-ai-index-report
Stanford Human-Centered Artificial Intelligence Initiative. (2025). Artificial Intelligence Index Report 2025. https://hai.stanford.edu/ai-index/2025-ai-index-report
Techopedia. (2025). 48% error rate: AI hallucinations rise in 2025 reasoning systems. https://www.techopedia.com/ai-hallucinations-rise
Vangala, B. P., Mahmud, S., Neupane, P., Selvaraj, J., & Cheng, J. (2025). HalluMat: Detecting hallucinations in LLM-generated materials science content. arXiv. https://arxiv.org/abs/2512.22396
Vectara. (2025). Hallucination leaderboard – HHEM evaluation model. https://github.com/vectara/hallucination-leaderboard

About Scott M. Graffius

Scott M. Graffius is a strategic transformation leader who drives AI, Agile, and broader business and technology initiatives to deliver measurable value across projects, programs, portfolios, and PMOs. He is an expert in the teamwork tradecraft of both human and human-AI teams, including the “exotic team dynamics” that emerge. He is also an authority on the temporal patterns of social media, including the half-life of audience engagement.

He’s a practitioner, researcher, thought leader, award-winning author, and keynote speaker who’s taken the stage at 96 conferences and other events across 25 countries.

He’s delivered over $2.3 billion in value for Fortune 500 companies and other leaders in technology, entertainment, financial services, healthcare, and beyond.

Businesses, professional associations, government agencies, and universities use Graffius and feature his work. Examples include Adobe, Bayer, Boston University, Ford, Gartner, Harvard Medical School, IEEE, Johns Hopkins University, Microsoft, MSN, National Academy of Sciences, Oracle, Pinterest Inc., Project Management Institute, UC San Diego, Verizon, Yale University, and others.

The following sections provide additional information on his experience, contributions, and influence.

Experience

Graffius heads the professional services firm Exceptional PPM and PMO Solutions, along with its subsidiary Exceptional Agility. These consultancies offer strategic and tactical advisory, training, embedded expertise, and consulting services to the public, private, and government sectors. They help organizations enhance their capabilities and results in agile, project management, program management, portfolio management, and PMO leadership, supporting innovation and driving competitive advantage. The consultancies confidently back services with a Delighted Client Guarantee™.

Graffius is a former VP of project management with a publicly traded provider of diverse consumer products and services over the Internet. Before that, he ran and supervised the delivery of projects and programs in public and private organizations with businesses ranging from e-commerce to advanced technology products and services, retail, manufacturing, entertainment, and more.

He has experience with consumer, business, reseller, government, and international markets.

Award-Winning Author

Graffius has authored three books.

Agile Scrum: Your Quick Start Guide with Step-by-Step Instructions, his first book, earned 17 awards.
Agile Transformation: A Brief Story of How an Entertainment Company Developed New Capabilities and Unlocked Business Agility to Thrive in an Era of Rapid Change, his second book, was named one of the best Scrum books of all time by BookAuthority.
Agile Protocol: The Transformation Ultimatum, his third book and his first work of fiction, was released in April 2025. The book trailer is on YouTube.

International Public Speaker

Organizations worldwide engage Graffius to present on tech (including AI), Agile, project management, program management, portfolio management, and PMO leadership. He crafts and delivers unique and compelling talks and workshops. Graffius has conducted 96 sessions across 25 countries. Select examples of events include Agile Trends Gov, BSides (Newcastle Upon Tyne), Conf42 Quantum Computing, DevDays Europe, DevOps Institute, DevOpsDays (Geneva), Frug’Agile, IEEE, Microsoft, Scottish Summit, Scrum Alliance RSG (Nepal), Techstars, W Love Games International Video Game Development Conference (Helsinki), and more.

With an average rating of 4.81 (on a scale of 1-5), sessions are highly valued.

The speaker engagement request form is here.

Thought Leadership and Influence

Prominent businesses, professional associations, government agencies, and universities have showcased Graffius and his contributions—spanning his books, talks, workshops, and beyond. Select examples include:

Adobe,
American Management Association,
Amsterdam Public Health Research Institute,
Bayer,
BMC Software,
Boston University,
Broadcom,
Cisco,
Coburg University of Applied Sciences and Arts - Germany,
Computer Weekly,
Constructor University - Germany,
Data Governance Success,
Deimos Aerospace,
DevOps Institute,
Dropbox,
EU's European Commission,
Ford Motor Company,
Gartner,
GoDaddy,
Harvard Medical School,
Hasso Plattner Institute - Germany,
IEEE,
Innovation Project Management,
Johns Hopkins University,
Journal of Neurosurgery,
Lam Research (Semiconductors),
Leadership Worthy,
Life Sciences Trainers and Educators Network,
London South Bank University,
Microsoft,
MSN,
NASSCOM,
National Academy of Sciences,
New Zealand Government,
Oracle,
Pinterest Inc.,
Project Management Institute,
Mary Raum (Professor of National Security Affairs, United States Naval War College),
SANS Institute,
SBG Neumark - Germany,
Singapore Institute of Technology,
Torrens University - Australia,
TBS Switzerland,
Tufts University,
UC San Diego,
UK Sports Institute,
University of Galway - Ireland,
US Department of Energy,
US National Park Service,
US Soccer,
US Tennis Association,
Verizon,
Wrike,
Yale University,
and many others.

Graffius has played a key role in the Project Management Institute (PMI) in developing professional standards. He was a member of multiple teams that authored, reviewed, and produced:

Practice Standard for Work Breakdown Structures—Second Edition.
A Guide to the Project Management Body of Knowledge—Sixth Edition.
The Standard for Program Management—Fourth Edition.
The Practice Standard for Project Estimating—Second Edition.

Additional details are here.

He was also a subject matter expert reviewer of content for the PMI’s Congress. Beyond the PMI, Graffius also served as a member of the review team for two of the Scrum Alliance’s Global Scrum Gatherings.

Acclaimed Authority on Teamwork Tradecraft

Graffius is a renowned authority on teamwork tradecraft. Informed by the research of Bruce W. Tuckman and Mary Ann C. Jensen, over 150 subsequent studies, and Graffius' first-hand professional experience with, and analysis of, team leadership and performance, Graffius created his "Phases of Team Development" intellectual property as a unique perspective and visual conveying the five phases of team development. First introduced in 2008 and periodically updated, his work provides a diagnostic and strategic guide for navigating team dynamics. It provides actionable insights for leaders across industries to develop high-performance teams. Its adoption by esteemed organizations such as Yale University, IEEE, Cisco, Microsoft, Ford, Oracle, Broadcom, the U.S. National Park Service, and the Journal of Neurosurgery, among others, highlights its utility and value, solidifying its status as an indispensable resource for elevating team performance and driving organizational excellence. In 2026, Graffius added human-AI teamwork—including the "exotic team dynamics" which emerge when advanced AI collaborates as a teammate—to his "Phases of Team Development."

The 2026 edition of Graffius' "Phases of Team Development" intellectual property is here.

Expert on Temporal Dynamics on Social Media Platforms

Graffius is also an authority on temporal dynamics on social media platforms. His "Lifespan (Half-Life) of Social Media Posts" research, first published in 2018 and updated annually, delivers a precise quantitative analysis of post longevity across digital platforms, using advanced statistical techniques to determine mean half-life. It establishes a solid empirical base, effectively highlighting the ephemeral nature of content within social media ecosystems. Referenced and applied by leading entities such as the Center for Direct Marketing, Fast Company, GoDaddy, Pinterest Inc., and PNAS, among others, his research exemplifies methodological rigor and sustained significance in the field of digital informatics.

The 2025 edition of Graffius' "Lifespan (Half-Life) of Social Media Posts" research is here.

Education and Professional Certifications

Graffius has a bachelor’s degree in psychology with a focus in Human Factors. He holds eight professional certifications:

Certified SAFe 6 Agilist (SA),
Certified Scrum Professional - ScrumMaster (CSP-SM),
Certified Scrum Professional - Product Owner (CSP-PO),
Certified ScrumMaster (CSM),
Certified Scrum Product Owner (CSPO),
Project Management Professional (PMP),
Lean Six Sigma Green Belt (LSSGB), and
IT Service Management Foundation (ITIL).

He is an active member of the Scrum Alliance, the Project Management Institute (PMI), and the Institute of Electrical and Electronics Engineers (IEEE).

Advancing AI, Agile, and Project/PMO Management

Scott M. Graffius continues to advance the fields of AI, Agile, and Project/PMO Management through his leadership, research, writing, and real-world impact. Businesses and other organizations leverage Graffius’ insights to drive their success.

Discover Scott’s Books

Connect with and follow Scott on LinkedIn, X, YouTube, Facebook, Threads, Bluesky, Mastodon, and ResearchGate.

Digital Object Identifier (DOI)

https://doi.org/10.13140/RG.2.2.33179.53285

Content Acknowledgements

Names, marks, and content are the property of their respective owners.

Tags and Hashtags

This is the extended list of tags and hashtags for this article:

AI accuracy
AI benchmarks
AI Hallucinations
AI Trends
Artificial Intelligence
Generative AI
Large language models (LLMs)
Responsible AI

#AIAccuracy
#AIBenchmarks
#AIHallucinations
#AITrends
#FactualAI
#GenerativeAI
#LargeLanguageModels
#ResponsibleAI

Post-Publication Notes

If there are any supplements or updates to this article after the date of publication, they will appear here.

Copyright

Copyright © Scott M. Graffius. All rights reserved.

Content on this site—including text, images, videos, and data—may not be used for training or input into any artificial intelligence, machine learning, or automatized learning systems, or published, broadcast, rewritten, or redistributed without the express written permission of Scott M. Graffius.