JAMIA 2004;11:392-402 doi:10.1197/jamia.M1552
  • Original Investigation
  • Research Paper

Automated Encoding of Clinical Documents Based on Natural Language Processing

  1. Carol Friedman,
  2. Lyudmila Shagina,
  3. Yves Lussier,
  4. George Hripcsak
  • Affiliation of the authors: Department of Biomedical Informatics, College of Physicians and Surgeons, Columbia University, New York, NY
  • Correspondence and reprints: Carol Friedman, PhD, Department of Biomedical Informatics, Columbia University, 622 West 168 Street, VC-5, New York, NY 10032; e-mail: <friedman{at}dbmi.columbia.edu>
  • Received 4 February 2004
  • Accepted 13 April 2004

Abstract

Objective The aim of this study was to develop a method based on natural language processing (NLP) that automatically maps an entire clinical document to codes with modifiers and to quantitatively evaluate the method.

Methods An existing NLP system, MedLEE, was adapted to automatically generate codes. The method matches the structured output generated by MedLEE, which consists of findings and modifiers, to obtain the most specific code. Recall and precision applied to Unified Medical Language System (UMLS) coding were evaluated in two separate studies. Recall was measured using a test set of 150 randomly selected sentences, which were processed using MedLEE. Results were compared with a reference standard determined manually by seven experts. Precision was measured using a second test set of 150 randomly selected sentences from which UMLS codes were automatically generated by the method and then validated by experts.
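
The idea of matching findings and modifiers to the most specific code can be illustrated with a minimal sketch. It is not MedLEE's implementation: the table `CODE_TABLE`, the function `most_specific_code`, and the placeholder code identifiers are all hypothetical, and the sketch assumes that "most specific" means the candidate whose required modifiers are all present in the extracted output and are the most numerous.

```python
from typing import Optional

# Hypothetical coding table: (finding, required modifiers) -> placeholder code.
# The identifiers are illustrative only and are not real UMLS concept identifiers.
CODE_TABLE = {
    ("pain", frozenset()): "CODE-PAIN",
    ("pain", frozenset({"chest"})): "CODE-CHEST-PAIN",
    ("pain", frozenset({"chest", "severe"})): "CODE-SEVERE-CHEST-PAIN",
}


def most_specific_code(finding: str, modifiers: set[str]) -> Optional[str]:
    """Return the code whose required modifiers are all present in the
    extracted modifiers and are the most numerous; None if nothing matches."""
    best_code = None
    best_size = -1
    for (table_finding, required), code in CODE_TABLE.items():
        if table_finding != finding:
            continue
        # A candidate applies only if every modifier it requires was extracted.
        if required <= modifiers and len(required) > best_size:
            best_code, best_size = code, len(required)
    return best_code


# Extra modifiers that no code requires ("acute") are simply ignored for coding.
print(most_specific_code("pain", {"chest", "severe", "acute"}))
```

In this sketch a finding with modifiers that match no specialized entry falls back to the bare finding code, and a finding absent from the table yields no code at all.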

Results Recall of the system for UMLS coding of all terms was .77 (95% CI .72–.81), and for coding terms that had corresponding UMLS codes recall was .83 (.79–.87). Recall of the system for extracting all terms was .84 (.81–.88). Recall of the experts ranged from .69 to .91 for extracting terms. The precision of the system was .89 (.87–.91), and precision of the experts ranged from .61 to .91.
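
For readers less familiar with these measures, the following sketch shows how recall and precision with a 95% confidence interval can be computed from counts. The abstract does not state how the intervals were derived, so the normal-approximation (Wald) interval used here is an assumption, and the counts in the usage line are made-up toy values rather than data from the study.

```python
import math


def proportion_with_ci(successes: int, total: int, z: float = 1.96):
    """Return (proportion, lower, upper) using a normal-approximation interval.
    The choice of interval is an assumption, not taken from the study."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def recall(true_positives: int, false_negatives: int):
    """Fraction of reference-standard terms that the system found."""
    return proportion_with_ci(true_positives, true_positives + false_negatives)


def precision(true_positives: int, false_positives: int):
    """Fraction of system-generated codes that the experts judged correct."""
    return proportion_with_ci(true_positives, true_positives + false_positives)


# Toy counts for illustration only.
r, r_low, r_high = recall(true_positives=80, false_negatives=20)
print(f"recall = {r:.2f} (95% CI {r_low:.2f} to {r_high:.2f})")
```

Recall is computed against the reference standard, whereas precision is computed against the set of codes the system produced, which is why the study measured them on separate test sets.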

Conclusion Extraction of relevant clinical information and UMLS coding were accomplished using a method based on NLP. The method appeared to be comparable to or better than six experts. The advantage of the method is that it maps text to codes along with other related information, rendering the coded output suitable for effective retrieval.

Footnotes

  • Supported by grants LM06274 and LM7659 from the National Library of Medicine.

The Journal of the American Medical Informatics Association is published for the American Medical Informatics Association by BMJ Publishing Group Ltd.