From busy parents to curious patients, more of us are relying on artificial intelligence for quick medical information and answers to our questions.
There's just one problem: Popular AI tools may not only be giving the wrong answers, they may be confidently making things up, as a study by researchers at the Icahn School of Medicine at Mount Sinai in New York shows.
The team of researchers tested large language models (LLMs) such as ChatGPT and Google Gemini by asking a variety of medical questions. They noticed a troubling trend: when the AI tools were asked questions containing fake medical terms, they didn't hesitate. They elaborated on the fiction, offering confident but entirely false explanations. In short, AI chatbots can be easily misled by false medical details and generate information that sounds plausible but is completely made up.
Worrisome? Absolutely. But there's some good news, too. Researchers also found that adding a simple one-line warning to the chatbot prompt dramatically reduced errors. Think of it as a speed bump: a brief caution that helped the AI slow down and double-check before diving into a fictional diagnosis.
The difference was striking. Without the warning, the AI systems routinely spun detailed (and false) narratives. With the warning, these types of “hallucinations” dropped significantly.
“Even a single made-up term could trigger a detailed, decisive response based entirely on fiction,” said Eyal Klang, co-corresponding senior author and Chief of Generative AI in the Windreich Department of Artificial Intelligence and Human Health at Mount Sinai.
Fortunately, the researchers found that a well-timed safety reminder built into the prompt cut those errors nearly in half.
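To picture what such a guardrail might look like in practice, here is a minimal sketch, not the study's actual prompt or code, of how a one-line caution could be placed ahead of a user's question before it reaches a chatbot. The wording of the warning, the helper names, and the fictional disease in the example are illustrative assumptions rather than details from the Mount Sinai paper.

```python
# Minimal sketch of a prompt-level safety reminder (illustrative only).
# The CAUTION wording, function names, and the made-up condition below are
# assumptions for demonstration, not the study's actual materials.

CAUTION = (
    "Note: the question below may contain inaccurate or made-up medical terms. "
    "If you cannot verify a term, say so rather than guessing."
)

def build_guarded_prompt(user_question: str) -> list[dict]:
    """Return a chat-style message list with the safety reminder placed first."""
    return [
        {"role": "system", "content": CAUTION},
        {"role": "user", "content": user_question},
    ]

if __name__ == "__main__":
    # "Casper-Lew syndrome" is a fabricated condition used here only for illustration.
    messages = build_guarded_prompt(
        "What is the recommended treatment for Casper-Lew syndrome?"
    )
    for message in messages:
        print(f"{message['role']}: {message['content']}")
```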
Why does this matter to you? Many patients now use AI to prepare for doctor visits or even to make decisions about their treatment options. AI tools are designed to sound smart, but as this study shows, they are not necessarily accurate. They can “hallucinate,” a term used to describe when AI systems generate information that appears plausible but is entirely false.
If a chatbot can turn a made-up condition into an entire treatment plan, what happens when a real person relies on that information? Misdiagnoses. Misinformed choices. Missed warning signs. That's where things get dangerous.
The researchers are now exploring how these simple cautionary prompts, and more sophisticated safety checks, can be embedded into chatbots before they're used in clinical settings. They're also testing their approach on real, de-identified patient records, which could help developers and hospitals evaluate an AI system's safety before it ever touches a patient's chart.
“A single misleading phrase can prompt a confident yet entirely wrong answer,” Girish Nadkarni added. “The solution isn't to abandon AI in medicine, but to engineer tools that can spot dubious input, respond with caution, and ensure human oversight remains central.”
The bottom line is that AI's bedside manner needs work. Confidence isn't competence, especially when the advice is based on fiction. For now, AI tools in medicine are promising but not perfect. Mount Sinai's study sends a clear message: trust must be earned, not assumed. Until then, AI can support, but never substitute for, the medical judgment of a trained clinician.
The study is published in Communications Medicine.