AI can’t beat doctors at this clinical reasoning test
With all the hype around AI passing medical exams, researchers have put AI models like ChatGPT, Google Gemini and DeepSeek through their paces to see if they can reason like doctors when given new information.
It turns out that, on the whole, they cannot.
A US-led study has concluded that these large language models often perform worse than senior medical residents on script concordance tests (SCTs), largely because of their overconfidence.
SCTs present a participant with a clinical vignette and ask them to form a hypothesis about the patient's condition and the appropriate treatment, then to judge how new information affects that hypothesis.