This study describes the development and evaluation of the LLM-Anonymizer, a tool that de-identifies medical documents using local, privacy-preserving large language models (LLMs). The tool automates the anonymization process, a task that is challenging because privacy protection must be balanced against the usefulness of the data for research. A rough sketch of this kind of pipeline is shown below.
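The following is a minimal sketch of the general approach, not the LLM-Anonymizer's actual implementation: a locally hosted model is asked to list the PII strings found in a clinical letter, and those strings are then masked in the text. It assumes an OpenAI-compatible chat endpoint (as exposed by local servers such as llama.cpp or Ollama) at a local URL; the endpoint, model name, and prompt are illustrative assumptions.

```python
import json
import re
import requests

# Assumption: a local, OpenAI-compatible chat endpoint (e.g. llama.cpp server
# or Ollama) is running at this URL; adjust to your own setup.
LLM_ENDPOINT = "http://localhost:8080/v1/chat/completions"

PROMPT = (
    "Extract every piece of personally identifiable information (names, "
    "dates of birth, addresses, phone numbers, record numbers) from the "
    "clinical letter below. Return only a JSON array of the exact strings.\n\n"
)

def extract_pii(letter: str) -> list[str]:
    """Ask the local model for a JSON list of PII strings found in the letter."""
    response = requests.post(
        LLM_ENDPOINT,
        json={
            "model": "llama-3-70b-instruct",  # illustrative model name
            "messages": [{"role": "user", "content": PROMPT + letter}],
            "temperature": 0,
        },
        timeout=300,
    )
    response.raise_for_status()
    content = response.json()["choices"][0]["message"]["content"]
    return json.loads(content)  # assumes the model returns a plain JSON array

def redact(letter: str, pii_items: list[str], mask: str = "[REDACTED]") -> str:
    """Replace every extracted PII string in the letter with a mask token."""
    # Replace longer strings first so substrings of other PII are not masked early.
    for item in sorted(set(pii_items), key=len, reverse=True):
        letter = re.sub(re.escape(item), mask, letter)
    return letter

if __name__ == "__main__":
    letter = "Dear colleague, we saw John Doe (DOB 01/02/1960) on 2023-05-04 ..."
    print(redact(letter, extract_pii(letter)))
```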
The study tested eight LLMs on 250 clinical letters, demonstrating high accuracy and sensitivity in removing personally identifiable information (PII). The LLM-Anonymizer achieved a success rate of 99.24% with the Llama-3 70B model, missing only 0.76% of PII while mistakenly redacting 2.43% of characters.
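One plausible way to compute character-level figures of this kind, assuming the ground-truth PII and the redacted regions are available as character-offset spans (an assumption about the data representation, not the study's actual evaluation code), is sketched below.

```python
def character_level_metrics(
    text_length: int,
    true_pii_spans: list[tuple[int, int]],
    redacted_spans: list[tuple[int, int]],
) -> dict[str, float]:
    """Compute missed-PII and over-redaction rates over character indices.

    Spans are half-open (start, end) character offsets into the original text.
    """
    pii_chars = {i for start, end in true_pii_spans for i in range(start, end)}
    redacted_chars = {i for start, end in redacted_spans for i in range(start, end)}

    missed = pii_chars - redacted_chars          # PII characters left in the text
    over_redacted = redacted_chars - pii_chars   # non-PII characters removed

    non_pii_total = text_length - len(pii_chars)
    return {
        "missed_pii_rate": len(missed) / len(pii_chars) if pii_chars else 0.0,
        "over_redaction_rate": len(over_redacted) / non_pii_total if non_pii_total else 0.0,
    }

# Example: a 100-character letter with one PII span that was only partly redacted.
print(character_level_metrics(100, [(10, 20)], [(10, 18), (30, 35)]))
# -> {'missed_pii_rate': 0.2, 'over_redaction_rate': 0.0555...}
```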
The tool is open source, runs on local hardware, and requires no programming skills. It supports various document formats and provides a browser-based interface for ease of use.
The study highlights the potential of LLMs in facilitating secure and efficient deidentification of medical text data, addressing key challenges in medical data sharing.
The tool outperformed the existing anonymization tools CliniDeID and Presidio in sensitivity and accuracy. Planned improvements include extending the evaluation criteria to comply with HIPAA and testing on larger, more diverse datasets. The LLM-Anonymizer represents a significant advance in privacy-preserving technologies for healthcare data management.