Complex-disease networks of trait-associated single-nucleotide polymorphisms (SNPs) unveiled by information theory
- Haiquan Li (1,2,3)
- Younghee Lee (1,2)
- James L Chen (1,4)
- Ellen Rebman (1,3)
- Jianrong Li (1,3)
- Yves A Lussier (1,2,3,5)
- (1) Center for Biomedical Informatics, Department of Medicine, University of Chicago, Illinois, USA
- (2) Section of Genetic Medicine, Department of Medicine, University of Chicago, Illinois, USA
- (3) Department of Medicine, University of Illinois at Chicago, Illinois, USA
- (4) Section of Hematology/Oncology of the Department of Medicine, University of Illinois at Chicago, Illinois, USA
- (5) Comprehensive Cancer Center, Ludwig Center for Metastasis Research, Computation Institute, Institute for Translational Medicine, and Institute for Genomics and Systems Biology, University of Chicago, Chicago, Illinois, USA
- Correspondence to Dr Yves A Lussier, AMB N660B, 909 South Wolcott Avenue, Chicago, IL 60612, USA;
Contributors: HL designed the metrics and conducted the computational analysis of the study, YL carried out the early development of the metrics, JLC and ER performed the case studies, JL performed the visualization analysis, and YAL conceived and directed the project.
- Received 13 July 2011
- Accepted 20 December 2011
- Published Online First 25 January 2012
Objective: Thousands of complex-disease single-nucleotide polymorphisms (SNPs) have been discovered in genome-wide association studies (GWAS). However, these intragenic SNPs have not been collectively mined to unveil the genetic architecture shared among complex clinical traits. The authors hypothesize that biological annotations of the host genes of trait-associated SNPs may reveal biomolecular modularity across complex-disease traits and offer insights for drug repositioning.
Methods: Trait-to-polymorphism (SNP) associations confirmed in GWAS were used. A novel method for quantifying trait–trait similarity, anchored in the Gene Ontology annotations of human proteins and in information theory, was developed. The results were then validated against the shortest paths of physical protein interactions between biologically similar traits.
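The abstract does not spell out the similarity formula or how shortest paths were computed. A minimal sketch, assuming a Resnik-style measure (information content IC(t) = -log p(t), where p(t) is the fraction of annotated proteins annotated to term t or any of its descendants), best-match averaging to lift term similarity to the trait level, and breadth-first search for unweighted shortest paths, might look like the following. Resnik's measure is swapped in here as a common information-theoretic choice, not necessarily the authors' exact metric; the toy ontology, the proteins P1 to P4, the interaction network, and all function names are hypothetical.

```python
import math
from collections import deque

# Hypothetical toy GO DAG: term -> set of parent terms.
PARENTS = {
    "GO:root": set(),
    "GO:A": {"GO:root"},
    "GO:B": {"GO:root"},
    "GO:A1": {"GO:A"},
    "GO:A2": {"GO:A"},
    "GO:B1": {"GO:B"},
}

# Hypothetical direct GO annotations of host-gene proteins.
ANNOTATIONS = {
    "P1": {"GO:A1"},
    "P2": {"GO:A2"},
    "P3": {"GO:B1"},
    "P4": {"GO:A1", "GO:B1"},
}

# Hypothetical undirected protein-protein interaction network (adjacency sets).
PPI = {"P1": {"P2"}, "P2": {"P1", "P4"}, "P3": {"P4"}, "P4": {"P2", "P3"}}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t); p(t) counts proteins annotated to t or a descendant."""
    counts = {t: 0 for t in PARENTS}
    for terms in ANNOTATIONS.values():
        # Propagate each protein's annotations up to every ancestor term, once.
        covered = set().union(*(ancestors(t) for t in terms))
        for t in covered:
            counts[t] += 1
    n = len(ANNOTATIONS)
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


IC = information_content()


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(IC.get(t, 0.0) for t in common)


def trait_similarity(proteins1, proteins2):
    """Best-match average of term similarities between two traits' proteins."""
    terms1 = set().union(*(ANNOTATIONS[p] for p in proteins1))
    terms2 = set().union(*(ANNOTATIONS[p] for p in proteins2))
    best1 = [max(resnik(a, b) for b in terms2) for a in terms1]
    best2 = [max(resnik(a, b) for a in terms1) for b in terms2]
    return (sum(best1) + sum(best2)) / (len(best1) + len(best2))


def shortest_distance(src, dst):
    """Unweighted shortest-path length via breadth-first search (None if unreachable)."""
    dist, queue = {src: 0}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in PPI[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None
```

With these toy inputs, P1 and P2 share the informative ancestor GO:A and sit one interaction apart, whereas P1 and P3 share only the root (similarity 0) and are three interactions apart. This mirrors, in miniature, the inverse relation between functional similarity and network distance that the validation step tests.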
Results: A network was constructed consisting of 280 significant intertrait similarities among 177 disease traits, covering 1438 well-validated disease-associated SNPs. Thirty-nine percent of the intertrait connections were confirmed by curators, and additional case studies demonstrated the validity of a proportion of the remainder. At the phenotypic trait level, higher Gene Ontology similarity between proteins correlated with a smaller 'shortest distance' in the protein-interaction networks of complexly inherited diseases (Spearman p<2.2×10⁻¹⁶). Further, 'cancer traits' were similar to one another, as were 'metabolic syndrome traits' (Fisher's exact test p=0.001 and p=3.5×10⁻⁷, respectively).
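The two statistics named above can be sketched with the Python standard library alone. The sketch below assumes Spearman's rho is computed as the Pearson correlation of average ranks (ties share the mean rank), and that the Fisher's exact test is one-sided for enrichment on a 2×2 table; the abstract states neither the sidedness nor how the 2×2 tables were built (for example, which trait pairs count as "within cancer"), so those choices, the function names, and the toy inputs are hypothetical.

```python
from math import comb


def ranks(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def fisher_exact_greater(a, b, c, d):
    """One-sided (enrichment) Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums hypergeometric probabilities of all tables whose top-left cell is
    at least a, holding the row and column margins fixed (assumed sidedness).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    upper = min(row1, col1)
    hits = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return hits / total
```

Applied to real data, `spearman_rho` would take paired Gene Ontology similarities and shortest distances per trait pair; the reported p-values additionally need a null distribution (for example, a permutation test or a t-approximation for Spearman), which this sketch omits.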
Conclusion: A disease network imputed from GWAS trait-associated SNPs by information-anchored functional similarity is reported. It is also demonstrated that shorter protein-interaction paths correlate with greater functional similarity between complex diseases. Taken together, these findings provide a framework for investigating drug targets with unbiased functional biomolecular networks rather than worn-out single-gene and subjective canonical-pathway approaches.
- Complex disease
- Gene Ontology
- protein-interaction networks
- information theory
- translational bioinformatics
- complex disease
- prostate cancer
- protein networks
- pathway analysis
- network modeling
- knowledge representations
- uncertain reasoning and decision theory
- languages and computational methods
HL and YL contributed equally to this work.
HL, ER, JL and YAL conducted part of this work at the University of Chicago.
Funding: This work was supported in part by NIH grants (UL1RR029879, 1S10RR029030-01 BEAGLE, and K22LM008308).
Competing interests: None.
Provenance and peer review: Not commissioned; externally peer reviewed.
Data sharing statement: The data are provided as supplementary tables.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non-commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.