JAMIA 1999;6:374-392 doi:10.1136/jamia.1999.0060374
  • Original Investigation
  • Model Formulation

Automated Diagnosis of Data-Model Conflicts Using Metadata

  1. Richard O Chen,
  2. Russ B Altman
  1. Affiliation of the authors: Stanford University, Stanford, California
  1. Correspondence and reprints: Russ B. Altman, MD, PhD, Stanford Medical Informatics, Stanford University School of Medicine, Medical School Office Building, Room X-215, 251 Campus Drive, Stanford, CA 94305-5479. e-mail: 〈altman{at}smistanford.edu〉
  • Received 3 December 1998
  • Accepted 26 April 1999

Abstract

The authors describe a methodology for helping computational biologists diagnose discrepancies they encounter between experimental data and the predictions of scientific models. The authors call these discrepancies data-model conflicts. They have built a prototype system to help scientists resolve these conflicts in a more systematic, evidence-based manner.

In computational biology, data-model conflicts are the result of complex computations in which data and models are transformed and evaluated. Increasingly, the data, models, and tools employed in these computations come from diverse and distributed resources, contributing to a widening gap between the scientist and the original context in which these resources were produced. This contextual rift can contribute to the misuse of scientific data or tools and amplifies the problem of diagnosing data-model conflicts. The authors' hypothesis is that systematic collection of metadata about a computational process can help bridge the contextual rift and provide information for supporting automated diagnosis of these conflicts.

The methodology involves three major steps. First, the authors decompose the data-model evaluation process into abstract functional components. Next, they use this process decomposition to enumerate the possible causes of the data-model conflict and direct the acquisition of diagnostically relevant metadata. Finally, they use evidence statically and dynamically generated from the metadata collected to identify the most likely causes of the given conflict. They describe how these methods are implemented in a knowledge-based system called Grendel and show how Grendel can be used to help diagnose conflicts between experimental data and computationally built structural models of the 30S ribosomal subunit.
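
The three steps above can be illustrated with a minimal Python sketch. This is not Grendel's implementation, which the abstract does not detail; the `Component` class, the two evidence rules, the metadata keys, and the scoring scheme are all hypothetical names and assumptions introduced only for illustration. The sketch covers only evidence computed directly from stored metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One abstract functional step of a data-model evaluation (hypothetical)."""
    name: str
    metadata: dict = field(default_factory=dict)


# Hypothetical evidence rules. Each inspects a component's metadata and
# returns a suspicion score: 1.0 if the rule fires, 0.0 otherwise.
def missing_provenance(component):
    """Fire when no 'source' entry was recorded for the component."""
    return 1.0 if "source" not in component.metadata else 0.0


def unit_mismatch(component):
    """Fire when recorded units differ from the units the next step expects."""
    expected = component.metadata.get("expected_units")
    actual = component.metadata.get("units")
    if expected is None or actual is None:
        return 0.0  # not enough metadata to judge
    return 1.0 if expected != actual else 0.0


RULES = [missing_provenance, unit_mismatch]


def rank_causes(components):
    """Return names of suspicious components, most suspicious first.

    Each component in the decomposition is a candidate cause; its score is
    the sum of all rule scores. Components scoring zero are dropped, and
    ties keep their original pipeline order (sorted() is stable).
    """
    scored = [(sum(rule(c) for rule in RULES), c.name) for c in components]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [name for score, name in ranked if score > 0]


# Step 1: decompose the evaluation into components (invented example data).
pipeline = [
    Component("experimental_data",
              {"source": "lab A", "units": "angstrom",
               "expected_units": "angstrom"}),
    Component("data_transformation",
              {"source": "tool X", "units": "nm",
               "expected_units": "angstrom"}),
    Component("model_evaluation", {}),
]

# Steps 2 and 3: treat each component as a candidate cause and rank by evidence.
print(rank_causes(pipeline))
```

In this toy run, the transformation step is flagged for a unit mismatch and the evaluation step for missing provenance, while the experimental data step is cleared. A real system would presumably weight evidence rather than sum binary flags.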

Footnotes

  • This work was supported in part by grants LM05652 and LM06422 from the National Institutes of Health, grant DBI-9600637 from the National Science Foundation, and grants from IBM and SUN Microsystems.

The Journal of the American Medical Informatics Association is published for the American Medical Informatics Association by BMJ Publishing Group Ltd.