Mapping Abbreviations to Full Forms in Biomedical Articles
- Correspondence and reprints: Hong Yu, MS, MPhil, Department of Medical Informatics, 622 W 168th Street, Vanderbilt Clinic, 5th Floor, New York, NY 10032; e-mail: <hy52@columbia.edu>
- Received 11 January 2001
- Accepted 24 October 2001
Abstract
Objective To develop methods that automatically map abbreviations to their full forms in biomedical articles.
Methods The authors developed two methods of mapping defined and undefined abbreviations (defined abbreviations are paired with their full forms in the articles, whereas undefined ones are not). For defined abbreviations, they developed a set of pattern-matching rules to map an abbreviation to its full form and implemented the rules into a software program, AbbRE (for “abbreviation recognition and extraction”). Using the opinions of domain experts as a reference standard, they evaluated the recall and precision of AbbRE for defined abbreviations in ten biomedical articles randomly selected from the ten most frequently cited medical and biological journals. They also measured the percentage of undefined abbreviations in the same set of articles, and they investigated whether they could map undefined abbreviations to any of four public abbreviation databases (GenBank LocusLink, SWISSPROT, LRABR of the UMLS Specialist Lexicon, and BioABACUS).
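The idea of pattern-matching rules for defined abbreviations can be illustrated with a minimal sketch. The abstract does not list AbbRE's actual rules, so the single first-letter heuristic below, the function name `find_defined_abbreviations`, and the regular expression are all assumptions made for illustration, not the authors' implementation.

```python
import re


def find_defined_abbreviations(text):
    """Return {abbreviation: full_form} for 'full form (ABBR)' patterns.

    Hypothetical single rule: the abbreviation's letters must match the
    initial letters of the words immediately preceding the parenthesis.
    """
    pairs = {}
    # A parenthesized token of 2-10 letters/digits that starts with a letter.
    for match in re.finditer(r"\(([A-Za-z][A-Za-z0-9]{1,9})\)", text):
        abbr = match.group(1)
        # Words that appear before the opening parenthesis.
        words = re.findall(r"[A-Za-z0-9]+", text[:match.start()])
        # Candidate full form: as many preceding words as the abbreviation has characters.
        candidate = words[-len(abbr):]
        if len(candidate) == len(abbr) and all(
            word[0].lower() == char.lower() for word, char in zip(candidate, abbr)
        ):
            pairs[abbr] = " ".join(candidate)
    return pairs


example = "The polymerase chain reaction (PCR) was used."
print(find_defined_abbreviations(example))
```

A single rule like this misses full forms that contain stop words, hyphens, or letters taken from inside a word, which is presumably why a set of rules is needed in practice.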
Results AbbRE achieved an average recall of 0.70 and an average precision of 0.95 for the defined abbreviations. The authors found that an average of 25 percent of abbreviations were defined in biomedical articles and that, of a randomly selected subset of undefined abbreviations, 68 percent could be mapped to at least one of the four abbreviation databases. They also found that many abbreviations are ambiguous (i.e., they map to more than one full form in abbreviation databases).
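For readers unfamiliar with the two metrics, the following sketch shows how recall and precision could be computed against an expert reference standard. The function name `precision_recall` and the example pairs are hypothetical; the numbers they produce are not the study's results.

```python
def precision_recall(extracted, reference):
    """Compute (precision, recall) for extracted (abbreviation, full form) pairs.

    precision = correct extracted pairs / all extracted pairs
    recall    = correct extracted pairs / all pairs in the reference standard
    """
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Illustrative data only (not taken from the study).
reference = {
    ("PCR", "polymerase chain reaction"),
    ("MRI", "magnetic resonance imaging"),
    ("CAT", "chloramphenicol acetyltransferase"),
}
extracted = {
    ("PCR", "polymerase chain reaction"),
    ("MRI", "magnetic resonance imaging"),
}
precision, recall = precision_recall(extracted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```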
Conclusion AbbRE is efficient for mapping defined abbreviations. Coupling AbbRE with abbreviation databases to map undefined abbreviations requires both exhaustive abbreviation databases and a method for resolving the ambiguity of abbreviations in those databases.
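The database-coupling step and the ambiguity problem can be sketched as a simple lookup. The in-memory dictionary `TOY_DATABASE`, its entries, and the function `lookup` are hypothetical stand-ins for illustration; they do not reflect the contents or interfaces of LocusLink, SWISSPROT, LRABR, or BioABACUS.

```python
# Toy stand-in for an abbreviation database; entries are illustrative only.
TOY_DATABASE = {
    "PCR": ["polymerase chain reaction"],
    "CAT": ["chloramphenicol acetyltransferase", "computed axial tomography"],
}


def lookup(abbr, database=TOY_DATABASE):
    """Classify an undefined abbreviation as unmapped, mapped, or ambiguous."""
    full_forms = database.get(abbr, [])
    if not full_forms:
        return "unmapped", full_forms   # the database is not exhaustive
    if len(full_forms) > 1:
        return "ambiguous", full_forms  # needs a disambiguation method
    return "mapped", full_forms


for abbr in ("PCR", "CAT", "XYZ"):
    print(abbr, lookup(abbr))
```

The "unmapped" and "ambiguous" outcomes correspond to the two needs named in the conclusion: more exhaustive databases and a way to choose among competing full forms.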
Footnotes
- This work was supported by research training grant LM07079 (HY), grant R01 LM06910 (GH), and grant R01 LM06274 (CF), all from the National Library of Medicine, and by DLI2 grant NSF 11S-9817434 from the National Science Foundation (CF).








