Translational bioinformatics: linking knowledge across biological and clinical realms
- Indra Neil Sarkar1,2,3
- Atul J Butte4
- Yves A Lussier5,6,7
- Peter Tarczy-Hornoch8,9,10,11
- Lucila Ohno-Machado12
- 1Center for Clinical and Translational Science, University of Vermont, Burlington, Vermont, USA
- 2Department of Microbiology and Molecular Genetics, College of Medicine, University of Vermont, Burlington, Vermont, USA
- 3Department of Computer Science, College of Engineering and Mathematical Sciences, University of Vermont, Burlington, Vermont, USA
- 4Division of Systems Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, USA
- 5Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, USA
- 6UC Comprehensive Cancer Center, Ludwig Centre for Metastasis Research, University of Chicago, Chicago, Illinois, USA
- 7Institute of Genomics and Systems Biology, Institute for Translational Medicine, Computational Institute, University of Chicago, Chicago, Illinois, USA
- 8Division of Biomedical and Health Informatics, University of Washington, Seattle, Washington, USA
- 9Institute of Translational Health Sciences, University of Washington, Seattle, Washington, USA
- 10Institute for Genomic Medicine, University of Washington, Seattle, Washington, USA
- 11Department of Computer Science, University of Washington, Seattle, Washington, USA
- 12Division of Biomedical Informatics, University of California San Diego, La Jolla, California, USA
- Correspondence to Dr Indra Neil Sarkar, Center for Clinical and Translational Science, University of Vermont, 89 Beaumont Avenue, Given Courtyard N309, Burlington, VT 05405, USA
- Received 14 March 2011
- Accepted 19 April 2011
- Published Online First 10 May 2011
Nearly a decade since the completion of the first draft of the human genome, the biomedical community is positioned to usher in a new era of scientific inquiry that links fundamental biological insights with clinical knowledge. Accordingly, holistic approaches are needed to develop and assess hypotheses that incorporate genotypic, phenotypic, and environmental knowledge. This perspective presents translational bioinformatics as a discipline that builds on the successes of bioinformatics and health informatics for the study of complex diseases. The early successes of translational bioinformatics are indicative of the potential to achieve the promise of the Human Genome Project for gaining deeper insights to the genetic underpinnings of disease and progress toward the development of a new generation of therapies.
- Translational bioinformatics
- systems medicine
- systems biology
- biomedical informatics
- knowledge representation
- information retrieval
- modeling physiologic and disease processes
- linking the genotype and phenotype
- identifying genome and protein structure and function
- visualization of data and knowledge
- simulation of complex systems (at all levels: molecules to work groups to organizations)
- knowledge representations
- uncertain reasoning and decision theory
- computational methods
- statistical analysis of large datasets
- advanced algorithms
- text and data mining methods
- natural-language processing
- automated learning
The study of complex diseases requires the effective integration and analysis of disparate features that originate from genotypic, phenotypic, and environmental sources. In contrast to microscopic approaches that focus on detailed analyses of a single data type, a macroscopic approach offers a holistic view for exploring systems of relationships.1 Meaningful insights from a systems theory approach require the coalescence of many, often intractable, heterogeneous data types.2 Traditionally, biomedical informatics efforts have focused (‘microscopically’) on innovations constrained to particular domains3 (eg, clinical innovations in health informatics; biological innovations in bioinformatics). This has led to a perceived gulf between bioinformatics and health informatics, thus decreasing the potential impact of a ‘macroscopic’ approach. Recent years have seen growing recognition of the need to bridge these domains through the development of trans-disciplinary training programs and curricula4 as well as venues specifically designed to share innovations that span the laboratory and clinical spaces (eg, the AMIA Summit on Translational Bioinformatics). Translational bioinformatics (TBI) has thus emerged as a systems theory approach to bridge the biological and clinical divide through a combination of innovations and resources across the entire spectrum of biomedical informatics.5 Along with complementary areas of emphasis, such as those focused on developing systems and approaches within clinical research contexts,6 insights from TBI may enable a new paradigm for the study and treatment of disease.
The rapid escalation of activity in TBI can be attributed to parallel advancements in the biological and clinical realms. In biology, we have seen unprecedented advances in technology, such as those associated with generation of molecular sequences.7 In healthcare, we are observing a new era of clinical data acquisition and decision support that is driven by Federal legislation fostering adoption of electronic health records and enablement of seamless exchange of health information.8 9 These advances bring parallel challenges to the biological and clinical realms, notably heterogeneous data integration, missing data, and semantic mapping. Nonetheless, opportunities to develop linkages between genetic and clinical information are also increasing as a result of participatory initiatives, such as those promoted by some direct-to-consumer genetic test vendors.10 Furthermore, there is great opportunity to leverage complementary approaches to address these common challenges (eg, some of the tools developed by clinical research informatics researchers6).
The promise of the $2.7 billion Human Genome Project was to enable scientists to understand the genetic basis of human disease.11 However, nearly a decade since the completion of the first draft of the human genome,12 there is still much to be elucidated. Through technological and computational advances, the $1000 genome is becoming a very real possibility.13 The availability of a large number of complete human genomes with clinical, phenotype, and environmental information may enable a new paradigm for the development of new sets of hypotheses pertaining to complex diseases, such as those that involve multiple genes and environmental parameters.14 A major goal of TBI is thus to develop informatics approaches for linking across traditionally disparate data and knowledge sources, enabling both the generation and testing of new hypotheses.15 As large volumes of linked biological and clinical data become available, the complexity of disease may be dissected using novel TBI approaches designed in silico, but validated through traditional in vitro or even in vivo interventions.
Building on previous successes
TBI builds on the successes of research that has evolved over the 30 years since the first use16 of the term ‘bioinformatics.’ Four notable areas germane to the present discourse are clinical genomics, genomic medicine, pharmacogenomics, and genetic epidemiology (figure 1). The acceptance of clinical genomics (which has the purpose of identifying clinically relevant molecular biomarkers) by the clinical community can be measured by the growing number of clinically relevant genetic tests.17 Genomic medicine, or ‘personalized medicine,’ (which aims to identify genotype–phenotype correlations relevant to individuals, or haplotype variation) is positioned to uncover large-scale genotype–phenotype associations as a result of genome-wide testing and increased resolution of representation of clinical data. Pharmacogenomics may also benefit from ascertaining correlations with data captured for clinical purposes (eg, as captured in electronic health records). For example, it may enable correlation of genomic measurements with clinical phenotypes observed relative to pharmacological substances (eg, as listed in the Pharmacogenomics Knowledge Base (PharmGKB)18). It may also potentially provide patient-specific prescribing advice through decision support systems. Finally, genetic epidemiology is rising to new levels with the aggregation of genome-based data alongside public health and environmental registries (eg, as cataloged in HuGENet19). Collectively, these sub-disciplines of bioinformatics have been suggested as core to the integration of biological and health data.20 However, the mere availability of observations or statistically significant associations is of little practical value without explanations of potential clinical utility. This challenge of finding true biomedical explanations has been reflected before in medicine, for example, when improved methods for acquiring physiological data were developed.21
The ability to sequence a patient's genome as routinely as other clinical laboratory tests is no longer a far-fetched possibility.13 Accordingly, the sheer volume of potentially available data poses significant challenges for their integration in a form that can be used to either test current hypotheses or develop new ones. The heterogeneity of data suggests the need for new multi-dimensional paradigms for knowledge integration, requiring a deeper understanding of biology than previously required by informatics practitioners. Should one only consider single nucleotide polymorphic markers, or also include intronic (non-coding DNA) regions that have been shown to participate in gene regulation? Can gene expression measurements capture the effects of the environment? How do we then integrate relevant biological data, such as from proteomic studies, and correlate them with fidelity to phenotype data to track subtle, but essential, environmental phenomena? Parallel to the difficulty in addressing these queries, there will be significant ethical, legal, and social implications to consider.22
At the core of TBI is the development of new hypotheses originating from the integration of genomic and clinical data. TBI reflects a new era of trans-disciplinary science and the needed unification of multi-scale biological and clinical information for enabling the formal postulation of a deeper understanding of disease, such as originally proposed by Blois23 and more recently by Kalet.24 Understanding the genomic influences on the complex evolution of disease, the impact of therapeutic approaches as can be measured by molecular biomarkers, and the overall consistency of genotype–phenotype–environmental correlations across populations form the focus of the TBI community.
Challenges in studying complex diseases
Understanding complex diseases toward the development and assessment of putative therapies requires traversing between the bench and bedside, often referred to as the ‘T1 translational barrier.’25 26 As a goal, the objective is uncomplicated: to ascertain how basic science observations can be applied to clinical contexts in the form of prognostic, diagnostic, or therapeutic approaches to disease. As an endeavor, it represents a grand challenge in modern medicine and also a potential paradigm shift for how to integrate a broad set of data points.
The high dimensionality of potential data types when considering the full array of biological and clinical data that can be generated dwarfs any previous attempt at heterogeneous data integration. There is therefore a need to develop the next generation of clinical decision support systems that can incorporate data from massive biological datasets that will need to be combined with relevant disease phenotype information and computable knowledge bases to offer clinically useful suggestions. Perhaps more mundane, but of equal significance, is the need to develop approaches that can accommodate a dizzying set of file formats and representation standards. These are not, by themselves, completely new challenges to the biomedical informatics community. Nonetheless, they reflect a core area of emphasis where energy is needed to integrate knowledge across clinical genomics, genomic medicine, pharmacogenomics, and genetic epidemiology in light of the avalanche of additional genomic and clinical data and the corresponding knowledge of inter-relationships.
Amidst the challenges of knowledge integration and handling unprecedented volumes of data, TBI is greatly challenged with developing approaches that can bridge biological knowledge and place it into a meaningful clinical context. The volume of data can lead to spurious correlations that may be an artifact of the data and neither biologically nor clinically insightful. For example, if a physician had access to a patient's entire genome, how could it be leveraged to provide clinically insightful knowledge that would not have been possible using solely data already in a medical chart (eg, family history of a disease)? As shown for the genomic era's ‘Patient 0,’ it is plausible to integrate genomic data with relevant clinical data to develop prognostic approaches.27 The potential to provide appropriate care with respect to predicted disease outcome or efficacy of therapeutics offers great incentive for developing TBI approaches that integrate the full complement of biological, clinical, and environmental data. For this reason, phenotypic annotation of samples whose gene expression or single nucleotide polymorphic information is available in genomic data repositories such as GEO28 and dbGaP29 is underway in different laboratories,30 31 involving methodologies that are widely used in health informatics (eg, natural language processing, ontology mapping). Finally, approaches such as those implemented by the Crimson system32 hold promise for capitalizing on the clinical data that are captured as an artifact of standard clinical care. The extent to which this type of relatively noisy data can be used for research is still under active investigation by the TBI community.
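The risk of spurious correlations described above can be illustrated with a small simulation. The following is a minimal sketch, not a method from the cited studies: it tests 2000 simulated markers that, by construction, have no relationship to the outcome, and compares an uncorrected threshold with Bonferroni and Benjamini-Hochberg corrections. The marker count, cohort size, function names, and significance level are all illustrative assumptions.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_null_pvalues(n_markers, n_patients, seed=0):
    """P-values for simulated markers that are pure noise (no true association)."""
    rng = random.Random(seed)
    half = n_patients // 2
    pvalues = []
    for _ in range(n_markers):
        # Noise-only measurement in two equal-sized groups (hypothetical cases/controls).
        cases = [rng.gauss(0, 1) for _ in range(half)]
        controls = [rng.gauss(0, 1) for _ in range(half)]
        diff = sum(cases) / half - sum(controls) / half
        z = diff / math.sqrt(2 / half)  # unit variance is known by construction
        pvalues.append(two_sided_p(z))
    return pvalues


def bonferroni_hits(pvalues, alpha=0.05):
    """Indices that remain significant after dividing alpha by the number of tests."""
    threshold = alpha / len(pvalues)
    return [i for i, p in enumerate(pvalues) if p < threshold]


def benjamini_hochberg_hits(pvalues, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    return sorted(order[:cutoff_rank])


pvalues = simulate_null_pvalues(n_markers=2000, n_patients=40)
naive_hits = [i for i, p in enumerate(pvalues) if p < 0.05]
print("uncorrected:", len(naive_hits))
print("Bonferroni:", len(bonferroni_hits(pvalues)))
print("Benjamini-Hochberg:", len(benjamini_hochberg_hits(pvalues)))
```

Because every marker here is noise, each uncorrected hit is a false positive; roughly 5% of the markers are expected to pass an uncorrected 0.05 threshold by chance alone, which is why genome-scale analyses typically apply some form of multiple-testing control before any biological interpretation.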
Projects that involve TBI approaches to integrate biological and clinical data are already underway. The NIH-funded eMERGE (Electronic Medical Records and Genomics) project is a multi-site endeavor exploring issues involved with linking genomic information (from genome-wide association studies) with clinical data for individuals with specific conditions.33 Other efforts such as the Personal Genome Project,34 the Exome Project,35 the Million Veteran Program,36 and the 1000 Genomes Project37 reflect the increasing interest of the biomedical research and clinical communities in studying the complexity of genotype–phenotype relationships as well as postulating hypotheses for disease that incorporate genomic data. In addition to human-based genome projects, there are also initiatives such as the Human Microbiome Project (HMP38) and Metagenomics of the Human Intestinal Tract (MetaHIT39) that strive to provide a census of commensal microbial flora potentially related to disease.40
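The kind of genotype–phenotype linkage these projects explore can be sketched in a few lines. The example below is only an illustration under stated assumptions and does not reflect the data model of eMERGE or any other named project: the record types, field names, patient identifiers, and toy values are all hypothetical. It joins genotype calls to record-derived phenotype flags on a shared patient identifier and summarizes the association as an odds ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Genotype:
    patient_id: str
    carries_variant: bool  # hypothetical carrier flag for one variant


@dataclass(frozen=True)
class Phenotype:
    patient_id: str
    has_condition: bool  # hypothetical case/control flag derived from clinical records


def link_records(genotypes, phenotypes):
    """Inner join on patient_id; patients missing from either source are dropped."""
    pheno_by_id = {p.patient_id: p for p in phenotypes}
    return [
        (g, pheno_by_id[g.patient_id])
        for g in genotypes
        if g.patient_id in pheno_by_id
    ]


def contingency_table(linked):
    """Counts: (carrier cases, carrier controls, non-carrier cases, non-carrier controls)."""
    a = b = c = d = 0
    for g, p in linked:
        if g.carries_variant and p.has_condition:
            a += 1
        elif g.carries_variant:
            b += 1
        elif p.has_condition:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Odds ratio with 0.5 added to every cell (Haldane-Anscombe correction)."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


# Toy data, invented purely for illustration.
genotypes = [
    Genotype("P1", True), Genotype("P2", True), Genotype("P3", False),
    Genotype("P4", False), Genotype("P5", True),  # P5 has no clinical record
]
phenotypes = [
    Phenotype("P1", True), Phenotype("P2", False), Phenotype("P3", False),
    Phenotype("P4", False), Phenotype("P6", True),  # P6 has no genotype
]

linked = link_records(genotypes, phenotypes)
table = contingency_table(linked)
print(len(linked), table, odds_ratio(*table))
```

The inner join silently discards patients present in only one source, which is one concrete form of the missing-data problem; a real analysis would also need to report how many records were lost at this step and apply appropriate statistical testing.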
The emerging TBI toolbox
The relationship between bioinformatics and health informatics, while conceptually related under the umbrella of biomedical informatics,26 has not always been very clear. The TBI community is specifically motivated by the development of approaches to identify linkages between fundamental biological and clinical information. As technological advances continue to produce data that enhance our ability to further understand the biological underpinnings of complex diseases,41 the clinical community will depend on the development of approaches to interpret these data such that they can be clinically actionable.
TBI approaches are emerging as a melding of a complementary suite of techniques that strive to meet this need. Network approaches42 have led to the development of new techniques to study drug–target43 and gene–disease relationships44 as well as to provide a deeper understanding of human metabolism.45 Techniques have also been developed to combine genomic and public datasets for studying allelic variation at the population level.46 Systems biology approaches have been used to identify genomic signatures that correlate with the potential efficacy of vaccines.47 Finally, high-throughput sequence-based approaches are showing promise for the identification of prognostic genetic markers for increasing numbers of rare diseases.48–50 As the results of these early successes suggest, the TBI community is beginning to work closely with biomedical scientists to develop a new set of approaches to study the complex relationships between genotypic, phenotypic, and environmental data. Building on these endeavors will bring us closer than ever before to an entirely new generation of prognostic tests and highly effective and personalized clinical interventions.
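As one way to picture the network approaches mentioned above, a set of gene–disease associations can be treated as a bipartite graph and projected onto diseases, so that two diseases are connected when they share at least one associated gene. The sketch below assumes this simple projection and uses placeholder gene and disease names; it is not the specific method of any cited study, and the associations are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations


def disease_network(associations):
    """Project (gene, disease) pairs onto diseases.

    Returns a dict mapping each disease pair to the sorted list of genes they share.
    """
    genes_by_disease = defaultdict(set)
    for gene, disease in associations:
        genes_by_disease[disease].add(gene)

    edges = {}
    for d1, d2 in combinations(sorted(genes_by_disease), 2):
        shared = genes_by_disease[d1] & genes_by_disease[d2]
        if shared:
            edges[(d1, d2)] = sorted(shared)
    return edges


# Placeholder associations; names are hypothetical, not real findings.
associations = [
    ("GENE_A", "disease_1"),
    ("GENE_A", "disease_2"),
    ("GENE_B", "disease_2"),
    ("GENE_B", "disease_3"),
    ("GENE_C", "disease_3"),
]

edges = disease_network(associations)
for (d1, d2), shared in edges.items():
    print(d1, d2, shared)
```

In this toy input, disease_1 and disease_3 share no gene directly but are both linked to disease_2, the sort of indirect relationship that a network view makes visible and a gene-by-gene table does not.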
The decade following the completion of the first draft of the human genome has witnessed unprecedented technological advancements that have led to the increasing prominence and importance of bioinformatics and health informatics for biology and healthcare, respectively. The exponential growth of genomic data, along with parallel achievements in acquiring and analyzing clinical data, positions the biomedical research enterprise to deliver on the promise of the Human Genome Project. TBI is accordingly positioned to enable a systems view of complex disease.
The authors wish to acknowledge Casey Overby, PhD and Elizabeth Chen, PhD for valuable discussion.
Funding INS is funded in part by a grant from the National Institutes of Health (R01LM009725). AJB is funded in part by grants from the Lucile Packard Foundation for Children's Health, Hewlett Packard Foundation, and the National Institutes of Health (R01LM009719). YAL is funded in part by a grant from the National Institutes of Health (UL1RR024999). LOM is funded in part by grants from the National Institutes of Health (R01LM009520, U54HL108460, and UL1RR031980), the Komen Foundation, and the Agency for Healthcare Research and Quality (R01HS019913). PTH is funded in part by grants from the Washington Life Sciences Discovery Fund (‘Institute for Genomic Medicine’) and National Institutes of Health (T15LM07442, UL1RR025014, P41LM007242, R01HG02288).
Competing interests INS, YAL, PTH, and LOM declare they have no competing interests. AJB COI has been submitted in accordance with the ICMJE COI form.
Provenance and peer review Not commissioned; externally peer reviewed.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.