JAMIA 2007;14:467-477 doi:10.1197/jamia.M2314
  • Original Investigation
  • Research Paper

Semantic Classification of Biomedical Concepts Using Distributional Similarity

  1. Jung-Wei Fan,
  2. Carol Friedman
  • Affiliations of the authors: Department of Biomedical Informatics, Columbia University, New York, NY
  • Correspondence and reprints: Carol Friedman, PhD, Department of Biomedical Informatics, Vanderbilt Clinic, 5th Floor, 622 West 168th Street, New York, NY 10032; e-mail: <carol.friedman{at}dbmi.columbia.edu>
  • Received 25 October 2006
  • Accepted 9 April 2007

Abstract

Objective To develop an automated, high-throughput, and reproducible method for reclassifying and validating ontological concepts for natural language processing applications.

Design We developed a distributional similarity approach to classify Unified Medical Language System (UMLS) concepts. Classification models were built for seven broad, biomedically relevant semantic classes created by grouping subsets of the UMLS semantic types. Contextual features were based on syntactic properties obtained from two large corpora, and α-skew divergence served as the similarity measure.
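
As a rough illustration of the similarity measure named in the Design paragraph, the following Python sketch computes an α-skew divergence between two feature distributions and ranks candidate classes by it. It uses the commonly cited definition s_α(q, r) = KL(r ‖ αq + (1 − α)r). The value α = 0.99, the argument order (class profile as q, concept as r), the feature strings, the class names, and all function names are assumptions made for illustration, not details taken from the paper.

```python
import math


def normalize(counts):
    """Turn raw feature counts into a probability distribution."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("feature counts must sum to a positive number")
    return {feature: count / total for feature, count in counts.items()}


def skew_divergence(q, r, alpha=0.99):
    """Return KL(r || alpha*q + (1 - alpha)*r).

    Mixing a little of r into q keeps the divergence finite even when q
    assigns zero probability to a feature that r has observed.
    alpha=0.99 is an assumed default, not a value from the paper.
    """
    total = 0.0
    for feature, r_prob in r.items():
        if r_prob == 0.0:
            continue  # a zero-probability term contributes nothing
        mixed = alpha * q.get(feature, 0.0) + (1.0 - alpha) * r_prob
        total += r_prob * math.log(r_prob / mixed)
    return total


def rank_classes(concept, class_profiles, alpha=0.99):
    """Order class names from most to least similar (lowest divergence first)."""
    return sorted(
        class_profiles,
        key=lambda name: skew_divergence(class_profiles[name], concept, alpha),
    )


# Hypothetical syntactic-context features and class names, for illustration only.
class_profiles = {
    "disorder": normalize({"subj_of:cause": 5, "obj_of:treat": 10}),
    "drug": normalize({"subj_of:treat": 8, "obj_of:administer": 6}),
}
concept = normalize({"obj_of:treat": 3, "subj_of:cause": 1})

print(rank_classes(concept, class_profiles))
```

In this toy setup the concept shares no features with the hypothetical "drug" profile, so its divergence from that profile is large but still finite, which is the practical reason for preferring a skewed measure over plain KL divergence on sparse context counts.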

Measurements The testing sets were generated automatically from the changes that the National Library of Medicine made to the semantic classification of concepts between the UMLS 2005AA and 2006AA releases. Error rates were calculated, and a misclassification analysis was performed.

Results The lowest estimated error rates were 0.198 when the correct classification had to match our top prediction and 0.116 when it had to be among our top two predictions.
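
The two figures in the Results paragraph correspond to what is often called a top-1 and a top-2 error rate. A minimal sketch of that calculation follows, assuming each test concept comes with a ranked list of predicted classes and a single reference class. The function name and the toy labels are hypothetical, and the data below are made up, not the paper's test set.

```python
def top_k_error_rate(ranked_predictions, gold_labels, k=1):
    """Fraction of items whose gold label is absent from the top k predictions."""
    if len(ranked_predictions) != len(gold_labels):
        raise ValueError("predictions and gold labels must have the same length")
    if not gold_labels:
        raise ValueError("at least one test item is required")
    misses = sum(
        1
        for ranked, gold in zip(ranked_predictions, gold_labels)
        if gold not in ranked[:k]
    )
    return misses / len(gold_labels)


# Toy ranked predictions (best first) for four hypothetical test concepts.
toy_ranked = [
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["c", "b", "a"],
    ["a", "c", "b"],
]
toy_gold = ["a", "a", "a", "a"]

print(top_k_error_rate(toy_ranked, toy_gold, k=1))
print(top_k_error_rate(toy_ranked, toy_gold, k=2))
```

Because the top-2 criterion accepts every prediction the top-1 criterion accepts, the top-2 error rate can never exceed the top-1 error rate on the same test set.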

Conclusion The results demonstrated that the distributional similarity approach can recommend high-level semantic classifications suitable for use in natural language processing.

Footnotes

  • The authors thank NLM’s Dr. Alan Aronson, Guy Divita, and James Mork for help with the MetaMap program and the MBR database. We also would like to thank Dr. George Hripcsak for performing the expert evaluation to determine the semantic classification for a set of UMLS concepts. We thank Jessica Ancker and Chintan Patel for discussing some questionable UMLS classifications and thank Dr. Peter Hung for validating an example.

  • This work was supported by Grants R01 LM7659 and R01 LM8635 from the National Library of Medicine.

The Journal of the American Medical Informatics Association is published for the American Medical Informatics Association by BMJ Publishing Group Ltd.