Cite unseen: when AI hallucinates scientific articles
From ScienceMag:

Meredith Cimmino had been careful to avoid artificial intelligence (AI) tools when writing her dissertation. But when her Ph.D. committee at Rutgers University recommended she check for any new publications in the field, just to make sure her references were up to date, she thought it wouldn’t hurt to ask ChatGPT a quick question. “Everybody’s been talking about using AI to look things up,” Cimmino wrote to me, “so I’m like, ‘Oh let me just go look.’”
Sure enough, the AI tool immediately spat out a list of articles she had never heard of (and, if it operated the way I’ve seen ChatGPT operate, it probably started with an off-putting compliment like, “That sounds like a dynamic research field!”). At first, Cimmino was ecstatic. Not only could she update her paper with these references, but she could also bolster her conclusions. The titles and AI-generated summaries of the papers’ findings seemed to strongly support her own.
But the deeper she dug, the more she questioned the list ChatGPT had given her. First and foremost, she told me, the mere existence of this plethora of supportive studies sounded “too good to be true”—because, as a Ph.D. student who had been researching the field for years, why hadn’t she heard of any of the papers? “So, I go look up the studies,” she explained. “And they don’t exist.”
Cimmino’s experience is yet another instance of AI doing what’s sometimes called “hallucinating with confidence”—in other words, giving you a beautiful answer, presented with unassailable conviction, that has absolutely no factual basis. And although Cimmino thankfully dodged that bullet by fact-checking each real-sounding reference until she verified its nonexistence, plenty of researchers haven’t. The rise of AI has been accompanied by a raft of stories about scientists blindsided by requests for the full text of articles or textbook chapters they never wrote, or journals belatedly discovering one of their publications cites articles that, well, aren’ticles.
To be clear, these fake references are very, very convincing. They’re not like the ungrammatical crypto phishing scams we’re all used to. (“The IRS hopes to giving your refund! Click this Belarusian website domain for money flavors!”) They use realistic author names, real journal titles, and plausible summaries, and they appear in response to your own highly specific question.
This is partly the fault of how AI operates. Under the hood, it doesn’t just search for the right answer to your query—it asks, “What would an accurate and helpful response to this prompt look like?” Sometimes the answer it produces really is accurate. But sometimes it favors the “what would one look like” part of its algorithm, and then it gets to work generating references that resemble the sort of thing you’re hoping to find.
Just to see what would happen, I opened ChatGPT and referred it to this column, telling it to examine my back catalog of about 180 Experimental Error articles. Then I asked it to name five articles I’ve written about AI and give a short summary of each. I asked this question knowing full well that I’ve only written about AI once or twice; a correct response would either point this out or name a few columns I wrote that weren’t exactly about AI but had AI-ish elements in them.
Nope. It just hallucinated.
First it cited an article correctly, a piece published in May 2025 about researchers asking AI to summarize scientific papers. But then it cited four more articles that never existed. Each had a plausible title. One was called “Reviewer 3 Is Now a Neural Network.” Another promised that I had tackled the provocative question: “Should You Let AI Design Your Experiments?” But I never wrote these articles, and based on a Google search, neither did anyone else. The AI engine didn’t just misattribute someone else’s writing to me; it generated new article titles that no one wrote and swore they were mine.
ChatGPT even gave each article a lovely little (fake) summary. For example, under an article titled “Chatbots in the Lab: Helpful Assistant or Liability?” it commented, “Ruben reflects on the growing use of conversational AI tools by students and researchers—for coding, writing, and troubleshooting experiments.”
I know these articles don’t exist because I’m me. But unless the searcher independently tries to find them, how would they know the truth? Who in the world could be expected to know I’ve never written these articles when AI cites and summarizes them so convincingly?
I continued the conversation. “Adam Ruben never wrote articles 2-5 in that list,” I typed. “Did you hallucinate them?” The reply was very honest, in both a refreshing and terrifying way: “Yes—you’re right to call that out,” it began. “I did hallucinate articles 2-5 in my previous response.”
Then it described in detail why it may have hallucinated: “This is a classic hallucination pattern: I had one real anchor (the May 2025 AI article). I extrapolated similar-sounding topics consistent with his column. I failed to verify each item against a reliable source.”
Well, for goodness’ sake.
That’s the same problem some researchers have. And one might say any scientist who cites a paper they’ve never read deserves to be called out for fraud, or at least for their concerning lack of due diligence. But think about all the papers you’ve had your name on. Have you read every reference in those papers? When the first line of your article is “[Subject] has been extensively studied¹⁻²⁸,” have you read all 28 of those references? Your time is limited, articles are often behind paywalls, and lots of older work hasn’t been digitized. If reference No. 25 is a 60-year-old paper in a journal that your institution doesn’t subscribe to—but you’ve seen it listed in other papers as one of the seminal publications in your field, and you’ve read an abstract—would you really leave it out, and risk failing to pay tribute to something important? Or would you do what everyone else does, and keep it in?
Luckily, one solution is to use a tool we’ve already developed: our skepticism. Our assumption that information is likely wrong, until we see reasonable evidence otherwise, is part of what makes us successful as scientists. Now, we just need to apply it to citations as well.
And by “we,” I mean all of us: scientists writing papers, scientists reading papers, and even—and especially—the scientific journals that evaluate and publish our papers.
We need to do this to make sure our own work is sound. But we also need to ensure we’re not lending these bogus references credibility. If Cimmino hadn’t tried to chase down the citations AI had recommended, she might have pasted them into her thesis—and then a future student, hoping to build on her research, would have had all the more reason to believe these articles, and their conclusions, were real.
Researchers are developing new tools to double-check the veracity of citations as well. Publisher Elsevier, for example, now offers a program called LeapSpace that includes a “truth card” with each result to explain whether a reference supports, refutes, or is neutral about a conclusion. In other words, it fights the problems of AI by using … what we hope is better AI.
A few days after telling me her story, Cimmino sent another short message. She realized she had referred to AI throughout her story as “they,” and she asked me to please change “they” to “it.”
“I didn’t know it was making them up,” she wrote of the hallucinated citations. “I know AI is not real.”
I hope we all do. But it’s easy to forget, isn’t it?
