Topological Analysis of Large-scale Biomedical Terminology Structures
- Affiliations of the authors: Department of Biomedical Informatics (MEB, SBJ), Columbia University, New York, NY; Department of Medicine (YAL), University of Chicago, Chicago, IL
- Correspondence: Stephen Johnson, Department of Biomedical Informatics, Columbia University, Vanderbilt Clinic, 5th Floor, 622 West 168th Street, New York, NY 10032; email: <stephen.johnson{at}dbmi.columbia.edu>
- Received 10 February 2006
- Accepted 26 July 2007
Abstract
Objective To characterize global structural features of large-scale biomedical terminologies using currently emerging statistical approaches.
Design Given the rapid growth of terminologies, this research was designed to address scalability. We selected 16 terminologies covering a variety of domains from the UMLS Metathesaurus, a collection of terminological systems. Each was modeled as a network in which nodes were atomic concepts and links were relationships asserted by the source vocabulary. For comparison, we created three random networks of equivalent size and density for each terminology.
Measurements Average node degree, node degree distribution, clustering coefficient, average path length.
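As an illustration of the measurements listed above, the following minimal sketch computes average node degree, clustering coefficient, and average path length for a small network, and builds a random network with the same number of nodes and links for comparison. It is not the authors' implementation: the undirected treatment of links, the uniform random rewiring used for the comparison network, the function names, and the toy concept names are all assumptions made for the example.

```python
import random
from collections import deque


def build_graph(edges):
    """Build undirected adjacency sets from (u, v) pairs, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_degree(adj):
    """Mean number of links per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def clustering_coefficient(adj):
    """Mean local clustering; nodes with fewer than two neighbours count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Each link between two neighbours is seen from both ends, so halve it.
        links = sum(len(adj[n] & nbrs) for n in nbrs) / 2
        total += links / (k * (k - 1) / 2)
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for n in adj[cur]:
                if n not in dist:
                    dist[n] = dist[cur] + 1
                    queue.append(n)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def random_graph_like(adj, seed=0):
    """Random network with the same node count and link count (assumed method)."""
    rng = random.Random(seed)
    nodes = list(adj)
    n_links = sum(len(nbrs) for nbrs in adj.values()) // 2
    chosen = set()
    while len(chosen) < n_links:
        i, j = rng.sample(range(len(nodes)), 2)
        chosen.add((min(i, j), max(i, j)))
    rand = {node: set() for node in nodes}
    for i, j in chosen:
        rand[nodes[i]].add(nodes[j])
        rand[nodes[j]].add(nodes[i])
    return rand


# Hypothetical toy terminology: concept names are illustrative only.
toy_edges = [
    ("disease", "infection"),
    ("infection", "pneumonia"),
    ("pneumonia", "disease"),
    ("pneumonia", "bacterial pneumonia"),
]
toy = build_graph(toy_edges)
baseline = random_graph_like(toy, seed=1)

for name, g in (("terminology", toy), ("random", baseline)):
    print(name, average_degree(g), clustering_coefficient(g), average_path_length(g))
```

A network would be called small-world in this sense when its average path length is close to that of the random comparison network while its clustering coefficient is markedly higher. On a toy example this small the comparison is not meaningful; it only shows the mechanics.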
Results Eight of 16 terminologies exhibited the small-world characteristics of a short average path length and strong local clustering. An overlapping subset of nine exhibited a power law distribution in node degrees, indicative of a scale-free architecture. We attribute these features to specific design constraints. Constraints on node connectivity, common in more synthetic classification systems, localize the effects of changes and deletions. In contrast, small-world and scale-free features, common in comprehensive medical terminologies, promote flexible navigation and less restrictive organic-like growth.
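To make the notion of a power law in node degrees concrete, the sketch below tabulates a degree distribution P(k) and estimates the slope of log P(k) against log k by ordinary least squares. A roughly straight line with negative slope on such a plot is the usual informal sign of a scale-free architecture. This is a deliberately crude estimator chosen for brevity, not the fitting procedure used in the study, and the function names and sample degrees are assumptions made for the example.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Return {k: P(k)} for all positive degrees k that occur."""
    counts = Counter(d for d in degrees if d > 0)
    n = sum(counts.values())
    return {k: c / n for k, c in sorted(counts.items())}


def loglog_slope(dist):
    """Least-squares slope of log P(k) versus log k."""
    if len(dist) < 2:
        raise ValueError("need at least two distinct degrees to fit a slope")
    xs = [math.log(k) for k in dist]
    ys = [math.log(p) for p in dist.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical degree sequence in which the count of nodes with degree k
# is proportional to k ** -2 (64, 16, 4, and 1 nodes for k = 1, 2, 4, 8).
sample_degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
slope = loglog_slope(degree_distribution(sample_degrees))
print(f"estimated log-log slope: {slope:.3f}")
```

For real terminologies the tail of the distribution is noisy, so a slope fitted this way should be read as a rough indication only.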
Conclusion While thought of as synthetic, grid-like structures, some controlled terminologies are structurally indistinguishable from natural language networks. This paradoxical result suggests that terminology structure is shaped not only by formal logic-based semantics, but by rules analogous to those that govern social networks and biological systems. Graph theoretic modeling shows early promise as a framework for describing terminology structure. Deeper understanding of these techniques may inform the development of scalable terminologies and ontologies.
Footnotes
Support for this research was provided by NLM training grant 5T15LM07079. This work was partially supported by grants LM008308-01 and 1U54CA121852. The authors would like to thank Drs. Olivier Bodenreider, James Cimino, William Hole, and Adam Rothschild, who provided invaluable guidance during the preparation of this manuscript. We also thank Drs. Patrick Mary and David Auber for support with Tulip software. In addition, this manuscript has benefited greatly from the insightful comments of two anonymous reviewers. We thank them for their diligence.
Authors Yves A. Lussier and Stephen B. Johnson contributed equally to the work.









