In India, the most populous country in the world, efficient management of healthcare data (its access, storage and retrieval) is increasingly critical. Researchers have explored how artificial intelligence (AI) can be harnessed to de-identify patient records, so that sensitive information stays confidential while the data remains useful for research and policy-making.
To mitigate the privacy risks of using such records, healthcare data must be de-identified: stripped of any personal information that could reveal the patient’s identity. Natural Language Processing (NLP) can scan clinical text, identify personal health information (PHI), and mask it.
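To make the "scan, identify, mask" pipeline concrete, here is a minimal Python sketch. It is not the method used in the study: it swaps in a few hypothetical regular-expression rules (the `PHI_PATTERNS` table, with made-up NAME, MRN, DATE and PHONE categories and assumed formats) for the trained NLP models that real de-identification systems typically rely on. Each detected span is replaced with its category tag, and the sample note is invented.

```python
import re

# Hypothetical rule set standing in for a trained NLP model. Each entry maps
# a PHI category label to a regular expression for one kind of identifier.
PHI_PATTERNS = {
    # A courtesy title followed by one or more capitalised words.
    "NAME": re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.?\s+[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*"),
    # A record number written as "MRN: 12345" (assumed format).
    "MRN": re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE),
    # Dates such as 12/03/2023 or 1-3-23.
    "DATE": re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"),
    # Ten-digit Indian mobile numbers, which start with 6, 7, 8 or 9.
    "PHONE": re.compile(r"\b[6-9]\d{9}\b"),
}


def find_phi(text):
    """Identify step: return (start, end, label) for every match, sorted by position."""
    spans = [
        (m.start(), m.end(), label)
        for label, pattern in PHI_PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(spans)


def mask_phi(text):
    """Mask step: replace each detected PHI span with a [LABEL] placeholder."""
    out, cursor = [], 0
    for start, end, label in find_phi(text):
        if start < cursor:
            # Skip a span that overlaps one already masked.
            continue
        out.append(text[cursor:start])  # keep the non-PHI text before the span
        out.append(f"[{label}]")
        cursor = end
    out.append(text[cursor:])  # keep whatever follows the last span
    return "".join(out)


if __name__ == "__main__":
    # An invented note; no real patient data.
    note = "Mr. Ramesh Kumar (MRN: 483921) was admitted on 12/03/2023. Contact: 9876543210."
    print(mask_phi(note))
    # → [NAME] ([MRN]) was admitted on [DATE]. Contact: [PHONE].
```

Tagging spans by category, rather than deleting them, keeps the note readable for later analysis. Rules like these break down quickly on free text (a name written without a title slips through, for example), which is why learned models are generally preferred for this task.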
The study from IIT Kanpur and Miimansa tackled this challenge head-on. Using a dataset of fully de-identified discharge summaries from an Indian hospital (the Sanjay Gandhi Post Graduate Institute of Medical Sciences, Lucknow), the researchers ran existing de-identification models, including commercial solutions, on this data.