JAMIA 2008;15:559-568 doi:10.1197/jamia.M2732
  • Original Investigation
  • Model Formulation

semCDI: A Query Formulation for Semantic Data Integration in caBIG

  1. E Patrick Shironoshita (a),
  2. Yves R Jean-Mary (a),
  3. Ray M Bradley (a),
  4. Mansur R Kabuka (a, b)
  1. (a) INFOTECH Soft, Inc., Miami, FL
  2. (b) University of Miami, Coral Gables, FL
  1. Correspondence and reprints: Dr. Mansur R. Kabuka, INFOTECH Soft, Inc., 9200 Dadeland Blvd., Ste 620, Miami, FL 33156; email: kabuka{at}infotechsoft.com
  • Received 28 January 2008
  • Accepted 16 April 2008

Abstract

Objectives To develop mechanisms to formulate queries over the semantic representation of cancer-related data services available through the cancer Biomedical Informatics Grid (caBIG).

Design The semCDI query formulation treats caBIG semantic concepts, metadata, and data as an ontology, and defines a methodology for specifying queries in the SPARQL query language, extended with Horn rules. semCDI joins data that represent different concepts through associations modeled as object properties. It merges data that represent the same concept in different sources through Common Data Elements (CDEs) modeled as datatype properties, with Horn rules specifying the additional semantics that indicate the conditions for merging.
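The two integration operations named in the Design can be illustrated with a minimal sketch. It uses plain Python over a toy set of triples rather than SPARQL or the semCDI prototype, and every identifier in it (the `srcA:`/`srcB:` records, `cde:geneSymbol`, `assoc:encodes`, and the function names) is hypothetical and does not come from caBIG. It shows a join that follows an object property, and a merge driven by a Horn-style rule stating that two records of the same concept are the same entity when they share a CDE value.

```python
# Toy triples (subject, predicate, object). All identifiers are hypothetical.
TRIPLES = {
    ("srcA:gene1", "rdf:type", "Gene"),
    ("srcA:gene1", "cde:geneSymbol", "TP53"),
    ("srcB:gene9", "rdf:type", "Gene"),
    ("srcB:gene9", "cde:geneSymbol", "TP53"),
    ("srcB:gene9", "assoc:encodes", "srcB:protein4"),
    ("srcB:protein4", "rdf:type", "Protein"),
    ("srcB:protein4", "cde:proteinName", "Cellular tumor antigen p53"),
}


def objects(triples, subject, predicate):
    """Return every object o such that (subject, predicate, o) is a triple."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def same_entity_pairs(triples, concept, cde):
    """Horn-style rule (assumed form):
    type(x, C) and type(y, C) and cde(x, v) and cde(y, v) -> same(x, y)."""
    instances = sorted(s for s, p, o in triples if p == "rdf:type" and o == concept)
    pairs = set()
    for x in instances:
        for y in instances:
            # x < y avoids self-pairs and mirrored duplicates.
            if x < y and objects(triples, x, cde) & objects(triples, y, cde):
                pairs.add((x, y))
    return pairs


def join(triples, subject, object_property, datatype_property):
    """Follow an object property from subject, then read a datatype
    property on each target it reaches."""
    return {
        value
        for target in objects(triples, subject, object_property)
        for value in objects(triples, target, datatype_property)
    }


def merged_join(triples, subject, concept, cde, object_property, datatype_property):
    """Run the join over subject and every record the rule merges with it."""
    equivalents = {subject}
    for x, y in same_entity_pairs(triples, concept, cde):
        if subject in (x, y):
            equivalents |= {x, y}
    results = set()
    for entity in equivalents:
        results |= join(triples, entity, object_property, datatype_property)
    return results
```

In this toy data the gene record from the first source has no `assoc:encodes` association, so a plain join from it returns nothing. Once the rule merges it with the second source's record through the shared gene symbol, the same join reaches the protein name.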

Validation To validate this formulation, a prototype was constructed and two queries were executed against currently available caBIG data services.

Discussion The semCDI query formulation uses the rich semantic metadata available in caBIG to build queries and integrate data from multiple sources. Its promise will be further enhanced as more data services are registered in caBIG, and as more linkages can be achieved between the knowledge contained within caBIG's NCI Thesaurus and the data contained in the Data Services.

Conclusion semCDI provides a formulation for the creation of queries on the semantic representation of caBIG. This constitutes the foundation to build a semantic data integration system for more efficient and effective querying and exploratory searching of cancer-related data.

Footnotes

  • This work is supported by NIH grant 1R43CA132293. The authors also wish to acknowledge the contribution of Mr. Thomas Taylor and Mr. Michael Ryan of INFOTECH Soft, Inc., and the insights given by Dr. Thomas Deisboeck at Massachusetts General Hospital and Drs. Robert Clark and Stephen Byers at Georgetown University. The intellectual property rights for the semCDI query formulation presented in this paper are held by INFOTECH Soft, Inc.

The Journal of the American Medical Informatics Association is published for the American Medical Informatics Association by BMJ Publishing Group Ltd.