Study: Stanford's VeriFact uses AI to verify LLM-generated clinical records

Updated 01/02/2026

Researchers at Stanford University have developed a platform called VeriFact that pulls clinical data from a patient's EHR and uses a large language model to determine whether AI-generated documentation about that patient is accurate.

According to a study published in NEJM AI, the researchers set out to test how accurate LLM-generated clinical text is when checked against a patient's actual medical record.

Researchers created VeriFact, a system that pulls relevant data from the EHR and analyzes it, using an "LLM-as-a-judge" approach to evaluate whether the generated statements are factually supported by the EHR data.
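The retrieve-then-judge flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not VeriFact's actual implementation: the function names, the word-overlap retrieval, the label strings, and the toy judge that stands in for a real LLM call are all hypothetical.

```python
from typing import Callable, Dict, List


def retrieve(claim: str, ehr_facts: List[str], k: int = 2) -> List[str]:
    """Return the k EHR snippets sharing the most words with the claim.

    Hypothetical stand-in for whatever retrieval a real system would use.
    """
    claim_words = set(claim.lower().split())
    ranked = sorted(
        ehr_facts,
        key=lambda fact: len(claim_words & set(fact.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def toy_judge(claim: str, evidence: List[str]) -> str:
    """Placeholder for an LLM judge (assumption, not a real model call).

    Marks a claim as supported only if it appears verbatim in the evidence.
    """
    found = any(claim.lower() in snippet.lower() for snippet in evidence)
    return "Supported" if found else "Not Supported"


def verify(
    claims: List[str],
    ehr_facts: List[str],
    judge: Callable[[str, List[str]], str] = toy_judge,
) -> Dict[str, str]:
    """Retrieve evidence for each claim, then ask the judge for a verdict."""
    return {claim: judge(claim, retrieve(claim, ehr_facts)) for claim in claims}


# Made-up example data, not taken from the study.
ehr = [
    "patient received metoprolol 25 mg daily",
    "chest x-ray showed no acute findings",
]
claims = [
    "patient received metoprolol 25 mg daily",
    "patient underwent appendectomy",
]
results = verify(claims, ehr)
```

In a real system, the `judge` argument would wrap a call to a language model that reads the claim alongside the retrieved evidence. Passing it in as a parameter keeps the retrieval and verdict steps separable.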

The researchers also introduced a clinician‑annotated benchmark dataset, VeriFact‑Brief Hospital Course (VeriFact‑BHC), that breaks hospital discharge narratives down into individual claims and labels whether each claim is supported by the actual EHR.

VeriFact achieved 93.2% agreement with clinicians.

The highest interrater agreement among clinicians was 88.5%, indicating that VeriFact can produce more consistent fact verification than humans.
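For readers unfamiliar with agreement figures, a minimal sketch follows. It assumes simple percent agreement, meaning the share of claims on which two raters assign the same label; the study may use a different statistic. The function name and the labels are hypothetical, and the data is made up.

```python
from typing import Sequence


def percent_agreement(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Percentage of items on which two raters assigned the same label."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Made-up labels for four claims, not data from the study.
clinician = ["Supported", "Supported", "Not Supported", "Supported"]
system = ["Supported", "Not Supported", "Not Supported", "Supported"]
score = percent_agreement(clinician, system)  # 3 of 4 labels match: 75.0
```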
