Study: AI search engines give more false or misleading answers than correct ones.

Hamza Musa

23 Mar 2025 — 4 min read

⬤ A recent study found that AI-powered search engines answer incorrectly about 60% of the time, raising concerns about their reliability.

⬤ The study showed that these engines struggle to cite sources accurately and sometimes fabricate information and links.

⬤ Despite their high cost, paid versions are no more accurate than free ones and, in some cases, perform even worse.

A recent study has revealed serious issues with the accuracy of AI-powered search engines, raising questions about their reliability as sources of information. The comprehensive study, conducted by the Columbia Journalism Review's Tow Center for Digital Journalism, showed that these tools have an average error rate of about 60%, undermining their promise of accurate and reliable search results.

The study evaluated eight leading AI search engines: ChatGPT Search from OpenAI, Gemini from Google, Copilot from Microsoft, Perplexity and its paid version, Perplexity Pro, along with DeepSeek Search and the Grok-2 and Grok-3 engines from xAI. The researchers tested the tools' responses against 200 news articles published by trusted media outlets.

Although Perplexity and Perplexity Pro performed relatively better than their competitors, the overall results were far from reassuring. All of these engines had significant difficulty correctly identifying the original article, its news source, or even its URL. On the study's accuracy scale, which ranged from "completely true" to "completely false," the overall error rate was 60%.

Bad Numbers!

Speaking of numbers, ChatGPT Search answered all 200 queries, yet only 28% of its answers were correct, while 57% were completely wrong. Grok-3 Search performed the worst, with an inaccuracy rate of 94%. Copilot also struggled, failing to answer 104 of the 200 queries. Of the remaining 96, only 16 were completely correct, while the vast majority, nearly 70%, were inaccurate.
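To make the Copilot figures easier to follow, here is a minimal arithmetic sketch. It uses only the three counts quoted above (200 queries, 104 unanswered, 16 completely correct); the variable names are illustrative and do not come from the study.

```python
# Counts quoted in the article for Copilot (names are illustrative).
TOTAL_QUERIES = 200
UNANSWERED = 104
COMPLETELY_CORRECT = 16

# Queries Copilot actually attempted to answer.
answered = TOTAL_QUERIES - UNANSWERED

# Share of completely correct answers, measured two ways.
correct_share_of_answered = COMPLETELY_CORRECT / answered
correct_share_of_all = COMPLETELY_CORRECT / TOTAL_QUERIES

print(f"answered: {answered}")
print(f"completely correct (of answered): {correct_share_of_answered:.1%}")
print(f"completely correct (of all 200):  {correct_share_of_all:.1%}")
```

Under these counts, Copilot attempted 96 queries, and the 16 completely correct answers amount to roughly one in six of those attempts.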

This data confirms what many have feared: the large language models (LLMs) that power these engines can be, as one commentator put it, "the most brilliant deceivers of all time," confidently presenting fabricated information even when it is completely wrong. Researchers have observed this pattern repeatedly, with these systems tending to insist on errors and even fabricating additional information when their answers are questioned.

In addition to these issues, the study highlighted potential violations of data collection ethics, as some AI search engines ignored the Robots Exclusion Protocol (robots.txt), which website owners use to tell crawlers which pages they should not fetch. It also found that these tools often provided fabricated links or cited secondary and copied sources instead of the original source.
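For readers unfamiliar with the protocol, the following is a minimal sketch of how a well-behaved crawler could check a site's rules before fetching a page, using Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleAIBot` and `OtherBot` user agents, the example.com URLs, and the `is_allowed` helper are all hypothetical and chosen for illustration; they are not taken from the study.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one AI crawler everywhere and
# blocks every other crawler only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://example.com/news/story.html"
    print(is_allowed(ROBOTS_TXT, "ExampleAIBot", story))  # blocked crawler
    print(is_allowed(ROBOTS_TXT, "OtherBot", story))      # falls under "*"
```

The protocol is advisory: nothing technically stops a crawler from skipping this check, which is why the behavior the study describes is possible.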

Expensive Services!

The biggest irony lies in the high cost of some of these services. For example, the monthly subscription fee for Perplexity Pro is around $20, while Grok-3 Search costs $40 per month. Surprisingly, despite achieving relatively higher query completion rates, the paid versions recorded higher error rates than the free versions, raising fundamental questions about their usefulness to users.

However, not all reviews were negative. A number of journalists and users had generally positive impressions. Even so, the study's findings highlight a fundamental challenge that the AI search engine industry must overcome before its tools can serve as reliable sources of information.

