AI chatbots pose 'dangerous' risk when giving medical advice, study suggests

Contradictions at the Heart of Automated Advice

In a study led by the University of Oxford, 1,300 participants were presented with everyday medical dilemmas, from persistent headaches to the sleeplessness of new parenthood. When each scenario was put to an AI chatbot, the answers it returned varied widely in quality. According to Dr Rebecca Payne, the principal clinician behind the research, that variability leaves users in a risky no-man’s land: “dangerous” was the term she chose.

More troubling is how AI advice changes depending on how questions are asked. The study’s authors observed that laypeople, uncertain of what details to share, often provide incomplete symptom accounts. This fragmented input is then interpreted by the chatbot, which may respond with a list of potential diagnoses lacking any meaningful prioritisation—or a clear link to action. Dr Adam Mahdi, another senior researcher, noted that this ambiguity forces users to play guessing games with their health: “People share information gradually. They leave things out, they don’t mention everything… this is exactly when things would fall apart.”

This variability poses a more insidious challenge: inconsistent advice does not merely risk misleading users. It actively undermines their confidence, blurring the line between credible guidance and dangerous speculation. The “black box” quality of AI outputs means patients may believe whatever sounds authoritative, even if it is, on closer inspection, pure conjecture.

The Allure—and Limits—of Digital Wellbeing

That AI is being used for mental health and wellness support at scale speaks to both need and opportunity. Digital diagnostics promise anonymity and immediacy—rare commodities in a system sometimes stretched to breaking point. Yet the Oxford research underscores a sharp limitation. The existing generation of generalist chatbots is simply not up to the complexity of clinical triage, particularly when users struggle to articulate symptoms, or when nuance is lost in translation.

Clinical experts are quick to acknowledge another risk, unaddressed in most mainstream hype: bias. Dr Amber W. Childs, a psychiatry professor at Yale, reminds us that AI trained on the sum total of existing medical literature risks amplifying embedded prejudices and gaps. “A chatbot is only as good a diagnostician as seasoned clinicians are, which is not perfect either,” Childs points out. No small reminder that even the promise of high-tech neutrality does not guarantee better, or even adequate, care.

Still, the rise of AI support for wellbeing appears inexorable. While swathes of NHS patients still wait weeks for a GP appointment, it is hardly surprising people turn to tools offering a semblance of guidance. The technology may be imperfect, but the alternatives are often unavailable—or unattainable—in practice.

Specialised Models and the Path Forward

What’s interesting here is the degree to which chatbot reliability might not be a static target, but a moving one. The Oxford team’s critique focused on general-purpose AI assistants. Yet, in recent months, major players like OpenAI and Anthropic have rolled out dedicated health versions of their chatbots, promising both improved medical competency and safety features attuned to the real-world stakes of triage and advice.

Dr Bertalan Meskó, editor of The Medical Futurist and a close watcher of health technology, told researchers these specialised models could “definitely yield different results in a similar study.” If true, it would mark a substantial shift in both the safety and usefulness of digital health support. However, such systems are neither widely tested nor universally available in the UK—and remain firmly within the realm of private sector experimentation for now.

It is also important to note that no AI chatbot, regardless of sophistication or branding, is licensed to diagnose or prescribe treatment in the UK. These tools may be helpful as a prompt for further discussion or as a way to translate medical jargon, but they are not a substitute for a consultation with a qualified healthcare professional. Some NHS trusts are quietly piloting AI triage systems, but every deployment so far has included strong disclaimers and clear paths to escalation if risk is detected.

For the average user, something crucial gets lost amid rapid innovation: the need for robust guidance on when and how to trust digital health advice. Left unchecked, misleading AI advice could delay necessary medical attention, compounding rather than solving existing healthcare disparities.

Guardrails and Regulation: What Will Make AI Safer?

The next evolution in digital diagnostics will be regulatory as much as technical. The key question is not how insightful chatbots can become, but how safeguards can keep pace. Experts from all sides now call for clearer guardrails—national regulations, updated clinical standards, and oversight mechanisms as rigorous as those faced by any medical device or pharmaceutical.

There are signs that regulatory frameworks are beginning to emerge. The Medicines and Healthcare products Regulatory Agency (MHRA) has started mapping out standards for artificial intelligence in direct patient care, though concrete requirements remain nascent. The real test will be whether these can move quickly enough to match the accelerating pace of AI development.

For users navigating a healthcare landscape peppered with digital tools—some promising, others a recipe for confusion—caution is the order of the day. Until health-specific AI demonstrates consistent improvements, and official guidance catches up, the safest course remains a familiar one: healthy scepticism, with any worrisome symptom sent first to a GP, not a chatbot.

How the next generation of AI chatbots will reshape the delivery of medical advice is far from settled. What is clear, as this new research underscores, is that hype about digital diagnostics must be matched by a seriousness about safety, lest we turn to machines for comfort at the cost of our wellbeing. The coming months will test whether specialist models—and the rules built to contain them—can finally bridge the trust gap between patient, provider, and machine.