MGB online leaderboard tracks LLM patient care performance

Published 06/19/2026

At Boston-based Mass General Brigham, clinical researchers have created a new framework for assessing how well large language models understand clinical text in electronic health records, case reports and patient-doctor consultations.

The new online tool, Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text, or BRIDGE, could help clinicians evaluate and compare LLMs for use in specific contexts, they said.

MGB's multilingual benchmark evaluated how LLMs comprehend clinical patient-care text, including language used in EHRs, across nine languages.

The researchers used BRIDGE to evaluate 95 LLMs from 59 clinical sources on 87 real-world clinical tasks spanning 14 clinical specialties and covering triage, information extraction, diagnosis, prognosis and billing coding.

According to an announcement from MGB, the highest-performing LLM scored as high as 92% on standardized medical exams but earned only 44.8% on BRIDGE.
