Automating the Assignment of Diagnosis Codes to Patient Encounters Using Example-based and Machine Learning Techniques
- Affiliation of the authors: Division of Biomedical Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Correspondence and reprints: Serguei V.S. Pakhomov, PhD, 200 First Street SW, Rochester, MN 55905; e-mail: <pakhomov.serguei@mayo.edu>
- Received 6 February 2006
- Accepted 30 May 2006
Abstract
Objective Human classification of diagnoses is a labor-intensive process that consumes significant resources. Most medical practices use specially trained medical coders to categorize diagnoses for billing and research purposes.
Methods We have developed an automated coding system designed to assign codes to clinical diagnoses. The system uses the notion of certainty to recommend subsequent processing. Codes with the highest certainty are generated by matching the diagnostic text to frequent examples in a database of 22 million manually coded entries. These code assignments are not subject to subsequent manual review. Codes at a lower certainty level are assigned by matching to examples that have been coded only infrequently in the past. The least certain codes are generated by a naïve Bayes classifier. Codes of the latter two types are subsequently reviewed manually.
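The three certainty tiers described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `TieredCoder`, the `frequent_threshold` cutoff, the whitespace/lowercase normalization, and the add-one smoothed bag-of-words naïve Bayes model are all assumptions made for illustration, and the abstract does not specify any of them.

```python
import math
from collections import Counter, defaultdict


def normalize(text):
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


class TieredCoder:
    """Hypothetical three-tier coder: frequent example match,
    infrequent example match, then a naive Bayes fallback."""

    def __init__(self, coded_examples, frequent_threshold=2):
        # frequent_threshold is an assumed cutoff separating "frequent"
        # from "infrequent" examples; the source gives no value.
        self.frequent_threshold = frequent_threshold
        self.example_counts = defaultdict(Counter)  # text -> code counts
        self.code_counts = Counter()                # code -> entry count
        self.word_counts = defaultdict(Counter)     # code -> word counts
        self.vocab = set()
        for text, code in coded_examples:
            key = normalize(text)
            self.example_counts[key][code] += 1
            self.code_counts[code] += 1
            for word in key.split():
                self.word_counts[code][word] += 1
                self.vocab.add(word)

    def _naive_bayes(self, key):
        """Return the code with the highest smoothed log-probability."""
        total_entries = sum(self.code_counts.values())
        vocab_size = len(self.vocab)
        best_code, best_score = None, -math.inf
        for code, n_entries in self.code_counts.items():
            score = math.log(n_entries / total_entries)
            denom = sum(self.word_counts[code].values()) + vocab_size
            for word in key.split():
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((self.word_counts[code][word] + 1) / denom)
            if score > best_score:
                best_code, best_score = code, score
        return best_code

    def assign(self, text):
        """Return (code, certainty, needs_manual_review)."""
        key = normalize(text)
        seen = self.example_counts.get(key)
        if seen:
            code, count = seen.most_common(1)[0]
            if count >= self.frequent_threshold:
                return code, "high", False   # frequent example: no review
            return code, "medium", True      # infrequent example: review
        return self._naive_bayes(key), "low", True  # classifier: review
```

Under these assumptions, `assign` returns the code together with a certainty label and a flag indicating whether the entry would be routed to a human verifier, mirroring the rule that only the highest-certainty tier bypasses manual review.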
Measurements Standard information retrieval accuracy measurements of precision, recall and f-measure were used. Micro- and macro-averaged results were computed.
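As a rough illustration of the difference between the two averaging schemes, the sketch below computes micro- and macro-averaged precision, recall and f-measure over sets of codes. It assumes that macro-averaging is taken over codes and that each entry may carry a set of codes; the function names `prf` and `micro_macro` are hypothetical, and the abstract does not state the exact averaging unit used in the study.

```python
from collections import Counter


def prf(tp, fp, fn):
    """Precision, recall and balanced f-measure from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def micro_macro(gold, predicted):
    """gold and predicted are parallel lists of code sets, one per entry.

    Returns (micro, macro), each a (precision, recall, f_measure) tuple.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold_codes, pred_codes in zip(gold, predicted):
        for code in pred_codes & gold_codes:
            tp[code] += 1
        for code in pred_codes - gold_codes:
            fp[code] += 1
        for code in gold_codes - pred_codes:
            fn[code] += 1

    codes = set(tp) | set(fp) | set(fn)
    if not codes:
        return (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)

    # Macro: score each code separately, then average, so that rare
    # codes weigh as much as common ones.
    per_code = [prf(tp[c], fp[c], fn[c]) for c in codes]
    macro = tuple(sum(s[i] for s in per_code) / len(per_code) for i in range(3))

    # Micro: pool the counts over all codes, then score once, so that
    # frequent codes dominate.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro
```

Because the macro-averaged f-measure here is the mean of per-code f-measures, it need not equal the harmonic mean of the macro-averaged precision and recall.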
Results At least 48% of all EMR problem list entries at the Mayo Clinic can be automatically classified with macro-averaged 98.0% precision, 98.3% recall and an f-score of 98.2%. An additional 34% of the entries are classified with macro-averaged 90.1% precision, 95.6% recall and 93.1% f-score. The remaining 18% of the entries are classified with macro-averaged 58.5%.
Conclusion Over two thirds of all diagnoses are coded automatically with high accuracy. The system has been successfully implemented at the Mayo Clinic, which resulted in a reduction of staff engaged in manual coding from thirty-four coders to seven verifiers.
Footnotes
- The authors thank Dr. Robyn McClelland for her assistance with designing the test sets and for providing a statistician's expertise and perspective. The authors also thank Barbara Abbot and Deborah Albrecht for helping to develop the reference standard test set and for sharing their expertise and experience in medical coding.
- 1 ICD-8 is the 8th edition of the International Classification of Diseases. ICD-10 is the most current edition and is used for mortality coding worldwide; ICD-9-CM (Clinical Modification) is usually used for billing in the United States. The Mayo research coding system is based upon HICDA-2, a morbidity-oriented adaptation of ICD-8, which has been augmented with concepts whose granularity and relevance are more appropriate for health science research.









