
AI beats doctors in accuracy of diagnoses and treatment, study finds

2025-04-06 09:39:26

By Eitan Gefen

Study led by Israeli expert finds AI-generated medical recommendations surpass human accuracy in urgent care in most cases, prompting cautious optimism for better care in the future

Could artificial intelligence soon greet patients at the clinic, diagnose illnesses and recommend treatment plans? A new study led by Prof. Dan Zeltzer, a digital health expert from Tel Aviv University's School of Economics, suggests that day may be closer than expected.

The study, published Friday in Annals of Internal Medicine and presented at the annual conference of the American College of Physicians (ACP), found that AI-generated diagnostic and treatment recommendations were more accurate than those of human doctors in an urgent care setting.

Artificial intelligence in medicine (Photo: Shutterstock)

The research was conducted at Cedars-Sinai Medical Center’s virtual urgent care center in Los Angeles, which operates in partnership with Israeli startup K Health.

The virtual care center at Cedars-Sinai offers video consultations with physicians specializing in family and emergency medicine. Recently, it integrated an AI system that uses machine learning to conduct an initial patient interview via structured chat, incorporating the patient's medical history and suggesting detailed diagnoses and treatments — including prescriptions, tests and referrals — for the attending physician to review.

Here's how it works: After chatting with the AI, the patient proceeds to a video visit with a doctor who makes the final call. The AI, trained on millions of anonymized medical records, only provides recommendations when it has high confidence — roughly in four out of five cases. In about 20% of cases, it withholds a recommendation due to uncertainty.

Zeltzer explained that, in a previous study published last year, the team compared the AI’s diagnostic suggestions to those of physicians and found significant alignment on common symptoms, especially those related to respiratory and urinary issues. The new study took it a step further, comparing the quality of recommendations using a panel of experienced physicians.

Researchers analyzed 461 online patient visits in July 2024 involving adults with relatively common complaints — respiratory, urinary and eye symptoms, as well as gynecological and dental issues. In each case, the AI provided diagnostic and treatment suggestions before the patient was seen by a physician.

Prof. Dan Zeltzer (Photo: Richard Haldis)

A panel of doctors with at least 10 years of clinical experience then evaluated all recommendations — AI and human — using a four-tier scale: optimal, acceptable, inadequate, or potentially harmful. Evaluations considered each patient’s full medical record, the consultation transcript and clinical data.

• AI recommendations were rated as optimal in 77% of cases, compared to 67% for physicians.

• Potentially harmful recommendations were less frequent with AI (2.8%) than with doctors (4.6%).

• In 68% of visits, the AI and physician received the same rating.

• In 21% of cases, the AI outperformed the physician; in 11%, the reverse was true.
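The panel comparison above amounts to pairing each visit's AI rating with its physician rating on the same four-tier scale and tallying ties, AI wins, and physician wins. A minimal sketch, assuming a numeric encoding of the scale (higher is better) that the study itself does not specify:

```python
# Sketch of the paired comparison: each visit gets one AI rating and one
# physician rating on the four-tier scale; we tally ties and wins per side.
# The numeric encoding below is an illustrative assumption.

SCALE = {"potentially harmful": 0, "inadequate": 1, "acceptable": 2, "optimal": 3}

def compare_ratings(visits: list[tuple[str, str]]) -> dict[str, int]:
    """visits: (ai_rating, doctor_rating) pairs; returns per-side tallies."""
    tally = {"tie": 0, "ai_better": 0, "doctor_better": 0}
    for ai_rating, doctor_rating in visits:
        a, d = SCALE[ai_rating], SCALE[doctor_rating]
        if a == d:
            tally["tie"] += 1
        elif a > d:
            tally["ai_better"] += 1
        else:
            tally["doctor_better"] += 1
    return tally

sample = [("optimal", "optimal"), ("optimal", "acceptable"),
          ("acceptable", "optimal"), ("optimal", "inadequate")]
print(compare_ratings(sample))  # {'tie': 1, 'ai_better': 2, 'doctor_better': 1}
```

Applied to the study's 461 visits, such a tally produced the reported split of 68% ties, 21% AI wins, and 11% physician wins.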

“These findings surprised us,” said Zeltzer. “Across a wide range of symptoms, the expert panel rated the AI’s recommendations as optimal more often — and dangerous less often — than those made by physicians.”

One notable example was antibiotic prescribing. “Doctors sometimes prescribe antibiotics unnecessarily, such as in viral infections where they’re ineffective,” said Zeltzer. “Patients may pressure physicians into giving antibiotics, but AI doesn’t budge. It won’t recommend treatment that goes against clinical guidelines.”


AI also proved better at quickly cross-referencing medical history. “Doctors working under pressure don’t always look at the full patient record,” he noted. “AI can, instantly.”

Take urinary tract infections: treatment depends on whether it’s a first occurrence, a recurrence or a case where past antibiotics failed. “Some doctors didn’t factor that in and offered less precise treatments,” Zeltzer said. “The AI picked it up and adjusted accordingly.”

Still, the AI missed some clinical nuances. “Doctors have the advantage of observing the patient,” Zeltzer noted. “Someone with COVID-19 might report shortness of breath. AI would refer them to the ER, but a doctor on a video call might see they’re not actually struggling to breathe — it’s just nasal congestion. In such cases, human judgment was more accurate.”

When asked about the risk of AI generating false or misleading recommendations — so-called “hallucinations” — Zeltzer explained that this study used a different class of AI than popular language models like ChatGPT.

A doctor in Israel (Photo: Shutterstock)

“Those models were trained on internet text and built to generate plausible-sounding responses, not to assess probabilities or medical accuracy,” he said. “By contrast, this AI system was trained on real medical data and designed specifically to calculate the likelihood of diagnoses. If confidence is low, it doesn’t make a recommendation.”

The system used in the study issued a recommendation in 80% of cases and withheld it in 20%. It also aligns its suggestions with established medical guidelines, which enhances reliability in high-stakes clinical settings.

Why test this now? “This virtual clinic gave us a rare chance to evaluate AI in real-world conditions,” said Zeltzer. “A lot of AI research is based on medical exam questions or textbook cases — but actual patients are messier. They don’t always describe symptoms clearly. That’s the real challenge.”

As for what this means going forward, Zeltzer is cautiously optimistic. “We can’t generalize these findings to all medical conditions, but in many cases the algorithm gave more accurate advice than the average doctor — even in a very good hospital,” he said. “This suggests real potential for improving care and saving time.”


Due to technical limitations, the researchers couldn’t determine whether doctors saw or used the AI’s suggestions, so the study didn’t measure how AI influenced physician decision making. A follow-up study is underway.

The results, Zeltzer added, show that AI can reach a high level of precision and have practical applications in medicine.

Still, many questions remain: How should doctors and AI collaborate? When should recommendations be shown? Should algorithms ever make decisions autonomously? What safeguards need to be in place?

“The pace of innovation is fast, but implementing it responsibly takes time,” Zeltzer said. “We’ll likely face new challenges as we go. But it’s not hard to imagine a future where algorithms help flag key information, support decisions and reduce human error in medicine.”


Summary

A study led by Prof. Dan Zeltzer from Tel Aviv University shows that AI-generated medical recommendations are more accurate than those of human doctors in urgent care settings for common complaints like respiratory and urinary issues. Conducted at Cedars-Sinai Medical Center’s virtual urgent care center, the research found AI recommendations rated as optimal 77% of the time compared to 67% for physicians, with fewer potentially harmful suggestions. The study suggests potential improvements in medical care but raises questions about collaboration between doctors and AI, decision-making autonomy, and necessary safeguards.