J Am Med Inform Assoc 2009;16:89-102 doi:10.1197/jamia.M2541
  • Original Investigation
  • Viewpoint Paper

Auditing the Semantic Completeness of SNOMED CT Using Formal Concept Analysis

  1. Guoqian Jiang,
  2. Christopher G Chute
  Affiliation: Division of Biomedical Statistics and Informatics, Mayo Clinic College of Medicine, Rochester, MN
  Correspondence: Dr. Guoqian Jiang, Division of Biomedical Statistics and Informatics, Mayo Clinic College of Medicine, Rochester, MN 55905; e-mail: jiang.guoqian@mayo.edu
  • Received 23 June 2007
  • Accepted 23 September 2008

Abstract

Objective This study sought to develop and evaluate an approach for auditing the semantic completeness of the SNOMED CT contents using a formal concept analysis (FCA)–based model.

Design We developed a model for formalizing the normal forms of SNOMED CT expressions using FCA. Anonymous nodes, identified through the analyses, were retrieved from the model for evaluation. Two quasi-Poisson regression models were developed: Model 1 tested whether anonymous nodes can be used to evaluate the semantic completeness of SNOMED CT contents, and Model 2 tested whether such completeness differs between 2 clinical domains. The data were randomly sampled from all the contexts that could be formed in the 2 largest domains: Procedure and Clinical Finding. Case studies (n = 4) were performed on randomly selected anonymous node samples for validation.
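To make the FCA step concrete, the following is a minimal Python sketch of how a concept lattice can be derived from a small formal context and how candidate anonymous nodes might be picked out. It is an illustration under stated assumptions, not the authors' implementation: the context, the attribute strings, and the function names are hypothetical, and an anonymous node is assumed here to be a lattice node with a non-empty extent and a non-empty intent whose intent matches no existing concept's definition exactly.

```python
# Hypothetical toy context: each object is a made-up concept name and each
# attribute is a made-up role=value pair (not real SNOMED CT content).
context = {
    "ProcA": frozenset({"method=excision", "site=lung"}),
    "ProcB": frozenset({"method=excision", "site=liver"}),
    "ProcC": frozenset({"method=biopsy", "site=lung"}),
}


def formal_concepts(ctx):
    """Return every formal concept of ctx as an (extent, intent) pair.

    The intents of a context are the full attribute set plus all
    intersections of the objects' attribute sets, so they can be built
    incrementally, one object at a time.
    """
    all_attrs = frozenset().union(*ctx.values())
    intents = {all_attrs}
    for attrs in ctx.values():
        # The comprehension is fully evaluated before the set is updated.
        intents |= {attrs & intent for intent in intents}
    concepts = []
    for intent in intents:
        # The extent is every object that carries all attributes of the intent.
        extent = frozenset(g for g, attrs in ctx.items() if intent <= attrs)
        concepts.append((extent, intent))
    return concepts


def anonymous_nodes(ctx):
    """Return lattice nodes that no existing object defines exactly.

    Assumption for this sketch: the trivial top (empty intent) and
    bottom (empty extent) nodes are not counted.
    """
    named = set(ctx.values())
    return [
        (extent, intent)
        for extent, intent in formal_concepts(ctx)
        if extent and intent and intent not in named
    ]


lattice = formal_concepts(context)
anonymous = anonymous_nodes(context)
```

On this toy context the lattice has seven nodes, two of which are anonymous under the assumed definition: one with intent `method=excision` (shared by ProcA and ProcB) and one with intent `site=lung` (shared by ProcA and ProcC). Each marks a grouping that the attribute definitions imply but that no named concept represents.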

Measurements In Model 1, the outcome variable is the number of fully defined concepts within a context, while the explanatory variables are the number of lattice nodes and the number of anonymous nodes. In Model 2, the outcome variable is the number of anonymous nodes and the explanatory variables are the number of lattice nodes and a binary category for domain (Procedure/Clinical Finding).
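The two models can be written out as follows. This is a sketch under assumptions: the abstract does not state the link function or any covariate transformations, so the usual log link of a quasi-Poisson model and untransformed covariates are assumed, and all symbols are introduced here for illustration. For context i, F_i is the number of fully defined concepts, N_i the number of lattice nodes, A_i the number of anonymous nodes, D_i a domain indicator, and phi_1, phi_2 the dispersion parameters.

```latex
\begin{align}
\text{Model 1:}\quad \log \mathbb{E}[F_i] &= \beta_0 + \beta_1 N_i + \beta_2 A_i,
  & \operatorname{Var}(F_i) &= \phi_1\, \mathbb{E}[F_i] \\
\text{Model 2:}\quad \log \mathbb{E}[A_i] &= \gamma_0 + \gamma_1 N_i + \gamma_2 D_i,
  & \operatorname{Var}(A_i) &= \phi_2\, \mathbb{E}[A_i]
\end{align}
```

In this notation, a negative association between anonymous nodes and fully defined concepts corresponds to a negative beta_2. If D_i is coded 1 for Clinical Finding and 0 for Procedure (an arbitrary coding chosen for this sketch), fewer anonymous nodes in Clinical Finding corresponds to a negative gamma_2.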

Results A total of 5,450 contexts from the 2 domains were collected for analyses. Our findings revealed that the number of anonymous nodes had a significant negative correlation with the number of fully defined concepts within a context (p < 0.001). Further, the Clinical Finding domain had fewer anonymous nodes than the Procedure domain (p < 0.001). Case studies demonstrated that the anonymous nodes are an effective index for auditing SNOMED CT.

Conclusion The anonymous nodes retrieved from FCA-based analyses are a candidate proxy for the semantic completeness of the SNOMED CT contents. Our novel FCA-based approach can be useful for auditing the semantic completeness of SNOMED CT contents, or any large ontology, within or across domains.

Footnotes

  • Supported in part by R01 LM07319.
