At Boston-based Mass General Brigham, clinical researchers have created a new framework for assessing how well large language models understand clinical text in electronic health records, case reports and patient-doctor consultations.
The new online tool, Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text, or BRIDGE, could help clinicians evaluate and compare LLMs for use in specific clinical contexts, the researchers said.
MGB's multilingual benchmark evaluated how LLMs comprehend clinical patient-care text, including language used in EHRs, across nine languages.
The researchers used BRIDGE to evaluate 95 LLMs from 59 clinical sources on 87 real-world clinical tasks spanning 14 clinical specialties and task types such as triage, information extraction, diagnosis, prognosis and billing coding.
While the highest-performing LLM scored as high as 92% on standardized medical exams, according to an announcement from MGB, it earned only 44.8% on BRIDGE.