AI can’t beat doctors at this clinical reasoning test
With all the hype around AI passing medical exams, researchers have put AI models like ChatGPT, Google Gemini and DeepSeek through their paces to see if they can reason like doctors when given new information.
It turns out that, on the whole, they cannot.
A US-led study has concluded that these large language models often perform worse than senior medical residents on script concordance tests (SCTs), largely because of their overconfidence.
SCTs present a participant with a clinical vignette and ask them to form a hypothesis about the patient's condition and the appropriate treatment, then to judge how new information affects that hypothesis.