“Understanding” Medical School Curriculum Content Using KnowledgeMap
- Affiliations of the authors: School of Medicine and Department of Biomedical Informatics, Vanderbilt School of Medicine, Nashville, Tennessee (JCD, JDS); Department of Biomedical Informatics, Vanderbilt School of Medicine, Nashville, Tennessee (RAM); Departments of Medicine and Biomedical Informatics, Vanderbilt School of Medicine, Nashville, Tennessee (AS)
- Correspondence and reprints: Anderson Spickard, III, MD, MS, 7040 Medical Center East, Vanderbilt School of Medicine, Nashville, TN 37232; e-mail: < >
- Received 14 June 2002
- Accepted 5 March 2003
Objective To describe the development and evaluation of computational tools to identify concepts within medical curricular documents, using information derived from the National Library of Medicine's Unified Medical Language System (UMLS). The long-term goal of the KnowledgeMap (KM) project is to provide faculty and students with an improved ability to develop, review, and integrate components of the medical school curriculum.
Design The KM concept identifier uses lexical resources partially derived from the UMLS (SPECIALIST lexicon and Metathesaurus), heuristic language processing techniques, and an empirical scoring algorithm. KM differentiates among potentially matching Metathesaurus concepts within a source document. The authors manually identified important “gold standard” biomedical concepts within selected medical school full-content lecture documents and used these documents to compare KM concept recognition with that of a known state-of-the-art “standard”—the National Library of Medicine's MetaMap program.
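The abstract does not specify KM's empirical scoring formula, but the general shape of differentiating among potentially matching Metathesaurus concepts can be sketched as ranking candidates by lexical overlap with a document phrase. The scoring function, candidate list, and function names below are illustrative assumptions, not KM's actual algorithm or real Metathesaurus data:

```python
# Illustrative sketch only: KM's real scoring algorithm is not described
# in the abstract. This ranks candidate concept names for a document
# phrase by word overlap (Jaccard similarity), a simple stand-in for an
# empirical disambiguation score.

def overlap_score(phrase_words: set, concept_words: set) -> float:
    """Jaccard similarity: shared words over the union of both word sets."""
    union = phrase_words | concept_words
    if not union:
        return 0.0
    return len(phrase_words & concept_words) / len(union)

def best_concept(phrase: str, candidates: list) -> str:
    """Return the candidate concept name scoring highest for the phrase."""
    phrase_words = set(phrase.lower().split())
    return max(candidates,
               key=lambda c: overlap_score(phrase_words,
                                           set(c.lower().split())))

# Hypothetical candidates (not actual Metathesaurus entries).
candidates = ["Myocardial Infarction", "Cerebral Infarction", "Myocardium"]
print(best_concept("acute myocardial infarction", candidates))
# → Myocardial Infarction
```

A production matcher would of course also normalize word variants via the SPECIALIST lexicon and apply the heuristics the paper describes for acronyms and underspecified concepts; plain word overlap is only the skeleton.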
Measurements The number of “gold standard” concepts in each lecture document identified by either KM or MetaMap, and the cause of each failure or relative success in a random subset of documents.
Results For 4,281 “gold standard” concepts, MetaMap matched 78% and KM 82%. Precision for “gold standard” concepts was 85% for MetaMap and 89% for KM. The heuristics of KM accurately matched acronyms, concepts underspecified in the document, and ambiguous matches. The most frequent cause of matching failures was absence of target concepts from the UMLS Metathesaurus.
Conclusion The prototypic KM system provided an encouraging rate of concept extraction for representative medical curricular texts. Future versions of KM should be evaluated for their ability to allow administrators, lecturers, and students to navigate through the medical curriculum to locate redundancies, find interrelated information, and identify omissions. In addition, the ability of KM to meet specific, personal information needs should be assessed.
The authors thank Mr. Michel Décary of Cogilex R&D, Inc., for providing the part-of-speech tagging software. The authors thank Alice Coogan, MD, David Wasserman, MD, Owen McGuiness, PhD, Terrence Dermody, MD, Luc Van-Kaer, PhD, Joseph Awad, MD, and Richard Shelton, MD, for their work toward establishing the gold standard terms in documents. Finally, the authors thank the National Library of Medicine for developing and making available the UMLS.
- * Work was completed prior to graduation from medical school.
- * In these cases, there were two components of KM's algorithm that equally caused a success or a failure, so each was given a score of 0.5.
- * This appendix was written by H. Wayne Lambert, PhD, and is used here with his permission.