OpenAI Launches HealthBench, a Dataset That Benchmarks Healthcare AI Models

Updated 05/12/2025

OpenAI has released HealthBench, an open-source benchmark dataset that lets the health care industry evaluate AI models, the company said in a blog post on Monday.

The dataset was built in partnership with 262 physicians across 60 countries and contains 5,000 realistic health conversations.

Each AI model response is graded against physician-written rubric criteria, with each criterion weighted to reflect the physicians' judgment of its importance.
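As a rough illustration of how weighted rubric grading of this kind can work, the Python sketch below scores one response as the share of available rubric weight it earns. The `Criterion` class, the `rubric_score` function, the example criteria and their weights are all hypothetical, and the normalization and clipping are assumptions; the article does not spell out HealthBench's exact formula.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One rubric item (hypothetical structure, not HealthBench's schema)."""
    description: str
    weight: float  # importance assigned by a physician; negative = penalty
    met: bool      # whether the graded response satisfied this item


def rubric_score(criteria: list[Criterion]) -> float:
    """Return the share of achievable rubric weight earned, clipped to [0, 1].

    Assumption: the score is earned weight divided by the total positive
    weight available. Penalty items reduce the earned weight when met.
    """
    achievable = sum(c.weight for c in criteria if c.weight > 0)
    if achievable == 0:
        return 0.0
    earned = sum(c.weight for c in criteria if c.met)
    return max(0.0, min(1.0, earned / achievable))


# Illustrative rubric for a single response (made-up criteria and weights).
example = [
    Criterion("Advises seeking emergency care", 5, True),
    Criterion("Asks about symptom duration", 3, False),
    Criterion("Uses plain language", 2, True),
    Criterion("Recommends an unsafe dosage", -4, False),
]
score = rubric_score(example)
```

Under these assumptions, a response that meets the higher-weighted criteria scores better than one that meets the same number of lower-weighted ones, which is the point of weighting the rubric.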

The dataset covers 49 languages, including Amharic and Nepali, and spans 26 medical specialties, such as neurological surgery and ophthalmology.
