When AI Search Goes Wrong: Why We Can’t Trust Chatbots with Our Questions Yet

Hamza Musa

21 Mar 2025 — 7 min read

As a medical doctor, developer, and head of the AI-Club, I’ve always been fascinated by how technology shapes the way we access information. Whether it's diagnosing a patient or debugging code, reliable data is everything. And while traditional search engines like Google and Bing have their quirks—hello, crowded pages full of ads—they still get the job done most of the time. But what about these shiny new AI-powered search tools? Are they ready to replace good old-fashioned keyword searches? Spoiler alert: Not even close.

In this post, I want to dive into some troubling findings from a recent study conducted by the Tow Center for Digital Journalism (yes, the same one we referenced in earlier posts). It highlights glaring issues with generative AI search tools, which are increasingly marketed as smarter alternatives to traditional search engines. Buckle up, because things get messy.

The Shocking Truth: 60% Error Rates Across the Board

Let’s start with the elephant in the room: According to the study, the AI search tools it tested failed to provide accurate answers roughly 60% of the time. Yes, you read that right. On the study’s test queries, there was about a six-in-ten chance that a chatbot would give the wrong answer or, worse, make something up entirely.

This wasn’t limited to obscure topics either. The researchers tested eight popular generative search tools, including heavyweights like OpenAI’s ChatGPT, Google’s Gemini, Microsoft’s Copilot, and Perplexity Pro. They fed these systems excerpts taken directly from real news articles published by reputable sources. Their goal? To see if the bots could correctly identify each article’s headline, publisher, publication date, and URL.

Here’s where it gets ugly:

ChatGPT, arguably the best-known model, answered only 28% of queries fully correctly.
On the other end of the spectrum, Grok 3 had a staggering error rate of 94%.
Even premium versions such as Perplexity Pro, which costs $20/month, had higher error rates than their free counterparts.

What does this mean for users? Well, imagine relying on one of these tools to find critical information—say, symptoms of a rare disease—and getting confidently wrong advice. That’s not just frustrating; it’s dangerous.

“Confidently Incorrect”: The Scariest Phrase in AI

One of the biggest problems uncovered in the study is how confident these models sound—even when they’re dead wrong. For example, ChatGPT provided incorrect answers 134 times out of 200 queries but signaled uncertainty in just 15 cases. That’s a recipe for disaster.
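
To put those ChatGPT numbers in perspective, here is a quick back-of-the-envelope calculation. It uses only the three figures quoted above; the variable names are mine, and the "worst case" step is my own assumption (that every hedged response happened to be a wrong one), not something the study reports.

```python
# Figures quoted above for ChatGPT; everything else is plain arithmetic.
TOTAL_QUERIES = 200
INCORRECT = 134
HEDGED = 15  # responses that signaled uncertainty

error_rate = INCORRECT / TOTAL_QUERIES
hedge_rate = HEDGED / TOTAL_QUERIES

# Assumption for a lower bound: pretend every hedged response was also a
# wrong one. Even then, this many wrong answers came with no warning at all.
min_unhedged_wrong = INCORRECT - HEDGED
min_unhedged_share = min_unhedged_wrong / INCORRECT

print(f"Error rate: {error_rate:.1%}")                      # Error rate: 67.0%
print(f"Responses with a hedge: {hedge_rate:.1%}")          # Responses with a hedge: 7.5%
print(f"Wrong answers with no hedge: at least {min_unhedged_wrong}")
print(f"Share of wrong answers: at least {min_unhedged_share:.1%}")  # at least 88.8%
```

In other words, even under the most generous reading, roughly nine out of ten wrong answers were delivered without any hint of doubt.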

Imagine asking your AI assistant, “What should I do if my child has a high fever?” and receiving a detailed response that starts with, “Based on trusted sources…” Only later do you realize the treatment plan suggested doesn’t match any credible medical guidelines. By then, harm may already be done.

Premium models were especially guilty of this behavior. While they answered slightly more questions accurately than their free counterparts, they also made bolder claims. This unearned confidence creates a false sense of reliability, making it hard for users to distinguish fact from fiction.

Ignoring Boundaries: Crawlers Gone Rogue

Another alarming finding was how many AI tools ignored publishers’ preferences regarding web crawling. Publishers use something called the Robots Exclusion Protocol (via robots.txt files) to tell automated systems which parts of their websites they may access. Yet multiple chatbots appeared to bypass these settings altogether.

For instance:

Perplexity Pro correctly identified nearly a third of excerpts from articles it shouldn’t have had access to.
Although National Geographic blocks Perplexity’s crawlers, its content showed up in Perplexity’s responses anyway.
Similarly, USA Today blocked ChatGPT’s crawler, but the bot cited syndicated versions of its articles republished elsewhere.

This blatant disregard for boundaries raises serious ethical questions. Should AI companies be allowed to scrape content without permission? What happens when publishers lose control over how their work is used?
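
For readers who haven’t worked with robots.txt, here is a minimal sketch of what a well-behaved crawler is supposed to do before fetching a page. It uses Python’s standard urllib.robotparser module. The robots.txt content, the bot name ExampleAIBot, and the news.example.com URLs are all made up for illustration; they do not come from the study or from any real publisher.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary publisher (not a real site's file).
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /drafts/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
article = "https://news.example.com/2025/03/some-article"
draft = "https://news.example.com/drafts/unfinished"

# The publisher has shut ExampleAIBot out of the whole site.
print(parser.can_fetch("ExampleAIBot", article))  # False
# Other bots fall under the wildcard rule: articles are fine, drafts are not.
print(parser.can_fetch("SomeOtherBot", article))  # True
print(parser.can_fetch("SomeOtherBot", draft))    # False
```

The catch is that nothing enforces this check. The protocol is a request, not a lock, so it only works if the crawler’s operator chooses to honor the answer.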

Fabricated Links and Misattributed Sources

If inaccurate answers weren’t bad enough, another major issue is how often these tools fabricate links or misattribute sources. Out of 200 prompts given to Grok 3, a whopping 154 citations led to error pages. Meanwhile, DeepSeek misattributed sources 115 times.

Even when the bots got the source “right,” they frequently linked to syndicated versions of articles hosted on platforms like Yahoo News instead of the original publisher’s site. This deprives legitimate publishers of traffic and revenue—a double whammy considering many rely heavily on ad income generated through referrals.
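
If you want to catch dead citations yourself, a rough first pass is to check whether each cited URL resolves at all. Below is a minimal sketch of that idea using only Python’s standard library. The function names, the User-Agent string, and the example URLs are my own placeholders, and the demo at the bottom uses canned status codes so it runs offline.

```python
from typing import Callable, Dict, Iterable, Optional
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response arrived."""
    request = Request(url, method="HEAD",
                      headers={"User-Agent": "citation-check-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # The server answered, but with an error code such as 404.
        return err.code
    except (OSError, ValueError):
        # DNS failure, timeout, refused connection, or a malformed URL.
        return None


def check_citations(
    urls: Iterable[str],
    get_status: Callable[[str], Optional[int]] = fetch_status,
) -> Dict[str, str]:
    """Label each cited URL as ok, broken, or unreachable."""
    results = {}
    for url in urls:
        status = get_status(url)
        if status is None:
            results[url] = "unreachable"
        elif 200 <= status < 400:
            results[url] = "ok"
        else:
            results[url] = f"broken ({status})"
    return results


# Offline demo: canned status codes stand in for real requests.
canned = {
    "https://news.example.com/real-story": 200,
    "https://news.example.com/made-up-story": 404,
}
report = check_citations(canned, get_status=canned.get)
for url, verdict in report.items():
    print(f"{verdict:14} {url}")
```

Treat the output as a prompt for a manual look rather than a verdict. Some sites reject HEAD requests or block scripts outright, and a page that loads fine may still say nothing like what the chatbot claimed.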

The Myth of Licensing Deals

Some AI companies, like OpenAI and Perplexity, have struck deals with publishers to gain direct access to their content. These partnerships supposedly ensure better accuracy and proper attribution. But guess what? The study found no significant improvement in citation quality for partner publishers.

Take Time Magazine, for example. Despite having agreements with both OpenAI and Perplexity, none of the associated models identified Time’s content perfectly every time. Meanwhile, The San Francisco Chronicle, part of Hearst’s partnership program with OpenAI, saw dismal results. ChatGPT correctly identified only one excerpt out of ten, and even then it failed to include a working URL.

Clearly, signing deals isn’t enough to guarantee fair representation. As Mark Howard, COO of Time, noted during interviews, “Today is the worst that the product will ever be.” He remains hopeful that future iterations will improve. But until then, publishers are left holding the bag.

Why Traditional Search Engines Still Matter

Now let me tell you why I’m sticking with Google and Bing—for now. Sure, Google’s search results page might feel like navigating a minefield of sponsored content, but at least it works. You type in a query, and you’ll likely land on a relevant webpage within seconds. There’s transparency—you know exactly where the information comes from. Plus, you can easily verify claims by clicking through to the source.

Compare that to AI search tools, which often present fabricated summaries wrapped in authoritative tones. Without clear links back to original sources, verifying facts becomes nearly impossible. And don’t get me started on fabricated URLs—it’s like trying to chase ghosts.

A Call for Accountability

So where do we go from here? First, AI developers need to prioritize accuracy over speed. Confidence without correctness is useless—and potentially harmful. Second, publishers deserve more control over how their content is used. Ignoring robots.txt directives undermines trust and sets a dangerous precedent.

Finally, users must remain vigilant. Don’t blindly trust AI-generated answers, especially when stakes are high. Double-check facts using multiple sources, and remember: Just because an answer sounds convincing doesn’t mean it’s true.

Final Thoughts

We’ve talked about the promises and pitfalls of AI search before on this blog, but this latest study underscores just how far these tools have to go before they become truly reliable. Until then, stick with what works. Use Google and Bing, warts and all. At least you know what you’re getting.

Stay curious, stay skeptical, and keep questioning. After all, knowledge is power—but only if it’s accurate.