Network models of genome-wide association studies uncover the topological centrality of protein interactions in complex diseases
- Younghee Lee1,2,
- Haiquan Li1,2,3,
- Jianrong Li1,2,3,
- Ellen Rebman1,3,
- Ikbel Achour3,
- Kelly E Regan3,4,
- Eric R Gamazon2,
- James L Chen1,4,
- Xinan Holly Yang1,2,
- Nancy J Cox2,5,
- Yves A Lussier1,2,3,5,6
- 1Center for Biomedical Informatics, The University of Chicago, Chicago, Illinois, USA
- 2Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, Illinois, USA
- 3Department of Medicine, The University of Illinois at Chicago, Chicago, Illinois, USA
- 4Departments of Biomedical Informatics and Internal Medicine (Division of Medical Oncology), Ohio State University College of Medicine, Columbus, Ohio, USA
- 5Institute for Genomics and Systems Biology, The University of Chicago, Chicago, Illinois, USA
- 6Computation Institute, The University of Chicago, Chicago, Illinois, USA
- Correspondence to Dr Yves A Lussier, 909 South Wolcott Avenue, Chicago, IL 60612, USA;
- Received 5 December 2012
- Revised 5 December 2012
- Accepted 5 January 2013
- Published Online First 25 January 2013
Background While genome-wide association studies (GWAS) of complex traits have revealed thousands of reproducible genetic associations to date, these loci collectively confer very little of the heritability of their respective diseases and, in general, have contributed little to our understanding the underlying disease biology. Physical protein interactions have been utilized to increase our understanding of human Mendelian disease loci but have yet to be fully exploited for complex traits.
Methods We hypothesized that protein interaction modeling of GWAS findings could highlight important disease-associated loci and unveil the role of their network topology in the genetic architecture of diseases with complex inheritance.
Results Network modeling of proteins associated with the intragenic single nucleotide polymorphisms of the National Human Genome Research Institute catalog of complex trait GWAS revealed that complex trait associated loci are more likely to be hub and bottleneck genes in available, albeit incomplete, networks (OR=1.59, Fisher's exact test p<2.24×10−12). Network modeling also prioritized novel type 2 diabetes (T2D) genetic variations from the Finland–USA Investigation of Non-Insulin-Dependent Diabetes Mellitus Genetics and the Wellcome Trust GWAS data, and demonstrated the enrichment of hubs and bottlenecks in prioritized T2D GWAS genes. The potential biological relevance of the T2D hub and bottleneck genes was revealed by their increased number of first degree protein interactions with known T2D genes according to several independent sources (p<0.01, probability of being first interactors of known T2D genes).
Conclusion Virtually all common diseases are complex human traits, and thus the topological centrality in protein networks of complex trait genes has implications in genetics, personal genomics, and therapy.
In spite of the vast number of reproducibly associated polymorphisms that genome-wide association studies (GWAS) have found for complex traits, their underlying biological mechanisms remain elusive, and they have in many cases not been able to achieve their intended goal: to discover and understand the functional underpinnings of complex traits. Moreover, while those studies have contributed to a comprehensive catalog of disease-associated loci (http://www.genome.gov/gwastudies), the polymorphisms each contribute only marginally to the heritability of the disease1 and have afforded us little new knowledge of the trait's key functional mechanisms.
To address this issue, two approaches have been used to increase statistical power and unveil additional single nucleotide polymorphisms (SNP) buried in the large lists of polymorphisms published in GWAS. The predominantly data-driven approach consists of the integration or meta-analysis of multiple GWAS datasets comprising tens of thousands of samples.2 The second approach, knowledge-driven, consists of incorporating external mechanistic facts into enhanced statistical genetic models before the GWAS case–control analysis. SNP have thus been prioritized using biological functions,3 canonical pathways4–8 or expression quantitative trait loci (eQTL) associations.9 ,10 Surprisingly, protein interactions have not yet been fully exploited for characterizing SNP mechanistically for complex diseases. So far, they have been implicated in the inter-trait modularity of Mendelian disorders,11 cancer,12 and have successfully predicted novel disease genes from the interaction partners of known causal disease genes cataloged in the online Mendelian inheritance in man (OMIM).13 Conclusively, protein interactions and network modeling are thought to be key techniques for shedding new light on the poorly understood complex genetic basis of many common disorders.14
The protein interaction networks help reveal the importance of disease-associated variations, their host genes, protein products and interacting partners. Topologically central genes or proteins characterized as ‘hubs’, or ‘bottlenecks’ are those within a network defined as the top most connected proteins or those with the topmost ‘betweenness scores’,15 respectively (cut-off of 20%). Furthermore, proteins that lie close to each other in a network are more likely to have similar functions16 and further support the protein interaction networks (PIN) analysis of variations associated with complex traits with poorly understood genetic etiology. First interactors of known disease or phenotype genes have been shown to be more likely to be involved in the same disease or biological process,13 ,17 and thus mutations in genes interacting with disease genes tend to lead to similar disease phenotypes.18 Therefore, direct protein interactions or first interactors can be used to estimate functional linkage between novel candidate disease genes and those with established pathophysiological biology.
Studies have shown that shared genes associated with multiple diseases (pleiotropic effects) are more topologically central than specific genes associated with only one disease.19 Cancer disease genes tend to encode in central hubs of highly interconnected modules within protein–protein interaction networks.20 Those findings suggest that at least some types of disease genes may possess some topology characteristics in protein–protein interactions. Here, we hypothesized that protein interaction modeling of GWAS findings, especially the topological features of proteins in the network, could help unveil the genetic architecture of complex diseases. We first confirm the topological centrality of trait-associated SNP found in the National Human Genome Research Institute (NHGRI) online catalog, which contains significant variations published in GWAS.21 Second, we propose a novel predictive network modeling of protein interactions that reprioritizes intragenic SNP associated with a complex disorder according to GWAS, and using conservative empirical controls, we computationally identify unexpected pairs of interactors in type 2 diabetes (T2D). We evaluate the predicted SNP against an independent gold standard, and confirm a significantly increased OR of correct findings among network-reprioritized SNP compared to the original GWAS prioritization. Furthermore, we evaluate the topology of the predicted network constructed from available, albeit incomplete sources, and show increased presence of hub and bottleneck proteins of the host genes consistent with properties described in the NHGRI catalog of SNP–trait associations. Therefore, the molecular mechanisms supporting the contribution of protein interactions to complex traits are systematically identified. The results, reported here, strongly imply that the statistical signals inferred from GWAS have not yet been exhausted and can be revealed without requiring massive increased sample sizes.
Supplementary table S10 (available online only) recapitulates the abbreviations and key concepts (eg, host genes, etc) also defined in this paper at their first occurence.
Datasets used in this study
GWAS and protein interaction datasets and their preparations are summarized in tables 1 and 2, respectively. The preparation of gold standard GWAS SNP utilized to select the optimal network model of protein interactions is described below in the ‘Gold standard’ section of the Methods. Finally, the independent non-GWAS sources of biologically validated T2D genes are described in table 3.
First, we demonstrate that topological protein network properties from available, albeit incomplete, networks are significantly enriched in complex disease traits. This finding serves to establish the merit of developing and evaluating a predictive model of complex trait polymorphisms anchored on protein interactions. Next, we describe in detail how we extend the SPAN model12 to reprioritize statistically unexpected host gene interactions among modestly ranked intragenic SNP of GWAS. This method is evaluated with three independent GWAS involving two different complex traits in order to show its robustness and reproducibility. Third, we report the GWAS SNP reprioritized by a genetically constrained protein interaction model of T2D derived from independent SPAN analyses of two GWAS (outlined in figure 1). We further evaluate these results according to different network centrality metrics of these reprioritized SNP host genes: hubness, bottleneckness, their overlap with validated T2D genes discovered by biologists, and their number of direct interactors to the host genes of validated T2D SNP that were not used to select the optimal model.
Empirical control for protein network model
To conduct an empirical control for T2D network analysis, intragenic SNP were resampled (1000 bootstraps) to create empirical SNP lists and derived their corresponding list of host genes (figure 1 and see supplementary methods, available online only).
Protein interaction network modeling: using SPAN to reprioritize GWAS-ranked host genes
Protein network models are constructed by analyzing the protein interactions among GWAS-ranked host genes at different cut-offs (figure 1 and see supplementary table S1, available online only). We used SPAN,12 ,25 to prioritize host genes by estimating the expected frequency (p value) of single (individual) protein interactions occurring between these top GWAS-ranked host genes using empirical controls. In particular, GWAS-ranked host genes from each GWAS were individually analyzed using SPAN and controlled by bootstrapping SNP. Briefly, SPAN is a network model in which the observed number of interactions between the host genes of GWAS-prioritized intragenic SNP is compared with an expected distribution through permutation resampling (see supplementary methods, available online only). First, we calculated the likelihood of obtaining the same number of interactions in the empirical distribution that were found for each host gene in the observed network (empirical SPAN frequency). Then, GWAS-ranked host genes contained in each set created at each cut-off were reprioritized using this calculated empirical SPAN frequency. By design, this model is controlled for topological properties such as bottleneckness and hubness. The resulting statistically significant genes at the best cut-off were then aggregated to form an ‘optimal network model of the GWAS’.
Reference trait-associated SNP and gold standards
SNP–trait associations data, identified for T2D and Crohn's disease (independently) from GWAS, were collected from the NHGRI online catalog (http://www.genome.gov/gwastudies/; excel format on 5 November 2010),21 and then intragenic SNP are determined (figure 2). The set of selected intragenic T2D SNP were used to assess the accuracy of network models constructed for the Wellcome Trust Case Control Consortium (WTCCC) and the Finland–USA Investigation of Non-Insulin-Dependent Diabetes Mellitus Genetics (FUSION), respectively (figure 2, see supplementary tables S2 and S3, available online only). The set of selected intragenic Crohn's disease in the Inflammatory Bowel Disease Genetics Corsortium (IBDGC) GWAS platform as well as in the PIN database was used to assess the accuracy of our Crohn's network models (see supplementary table S4, available online only).
Optimal SPAN model of T2D and its evaluation
For each GWAS, the best OR was used to determine the optimal network model and its associated parameters (host gene cut-off, empirical SPAN frequency; figure 5B,C). In order to improve statistical power the optimal network models of FUSION and WTCCC were combined and we quantitatively evaluated the relevant reprioritized host genes within the optimal SPAN model of T2D in three ways: their probability of being first interactors of known T2D genes (empirical edgetic p value; see supplementary methods and table S10, available online only), the enrichment of T2D reported genes from authoritative databases, and their enrichment in hubs and bottleneck genes.
Hub and bottleneck enrichment analysis of SNP sets
The hubness or bottleneckness of an intragenic SNP was defined based on the connectivity of its host gene where hubs are defined as those genes within the top 20% of degree distribution (node degree) and bottlenecks as genes in the top 20% of betweenness scored proteins using an established algorithm (http://www.gersteinlab.org/proj/bottleneck/).15 To investigate whether hub and bottleneckness properties are more likely to be enriched in complex disease traits, we utilized the entire set of SNP in the NHGRI catalog of SNP–trait associations. The enrichment is tested using Fisher's exact test and for each intragenic SNP in dbSNP two criteria were used: whether the SNP is cataloged in NHGRI (or GWAS for T2D and Crohn's); and whether the host gene of the SNP is a hub and/or bottleneck gene. The background SNP utilized in this study are from dbSNP.
Host genes of complex trait-associated SNP are enriched in hub and bottleneck centrality properties in currently available networks
We conducted a thorough analysis of hubness and bottleneckness for complex trait-associated intragenic SNP and report these two topological properties for the 1390 trait SNP from the NHGRI GWAS catalog21 (see Methods section and figure 3). Conclusively, complex trait SNP are significantly more likely to lie in hub genes (OR 1.59, p=2.24×10−12; Fisher's exact test) and bottleneck genes (OR 1.56, p=1.90×10−12; Fisher's exact test) when using the human intragenic SNP of dbSNP whose host genes occur in the PIN as a background, albeit these networks comprise some ascertainment-biased interactions that may favor this hypothesis (although we systematically removed publication-based interactions). Considering that hub and bottleneck proteins have been associated with several key biological functions and gene expression dynamics,15 it is not surprising that complex trait-associated SNP are enriched in these network topology properties.
GWAS SNP reprioritized by protein interaction models are more likely to be validated in ulterior studies
Trait-associated intragenic SNP prioritized by GWAS are stratified according to their original GWAS rank and are translated into their host genes that serve to constrain the protein interaction modeling genetically (figures 4 and 5 and see supplementary methods, available online only). The statistical likelihood, or empirical SPAN frequency, of obtaining the number of observed interactors for each host gene in the genetically constrained network was introduced as a second parameter. The empirical SPAN frequency is equal to the number of times the exact (or greater) node degree (count of direct protein interactors among the prioritized host genes of SNP) for any given protein is found in an empirical distribution made of a 1000 random samplings of GWAS intragenic SNP (converted to host genes) that is the same node degree found in the actual given gene set.
We analyzed two independent T2D GWAS, FUSION22 and WTCCC.23 In FUSION, the OR of the control was highest at 150 GWAS-ranked host genes, and from 250 on showed a linear decrease with respect to the size of the GWAS-ranked host gene set (figure 4A). From 500 on, every subsequent SPAN model improved the GWAS-ranked SNP OR. The SPAN model at 600 GWAS-ranked host genes with a frequency of 0.1% or less yielded the best OR of 2512 (10-fold increase; p=0.00059; Fisher's exact test). In WTCCC, the OR of the baseline control showed linear decreases with respect to the size of the GWAS-ranked host gene set in between each peak at 50, 500 and 800 (figure 4B). From 600 host genes on, the SPAN model with a frequency of 0.1% or less yielded the best OR of 1585 (p=0.00073; Fisher's exact test), with 46% improvement compared to the baseline. We also examined the Crohn's disease GWAS data to determine if the T2D GWAS result is trait specific or could also be extend to this disorder. A SPAN model of Crohn's disease at 700 showed the best accuracy for re-capturing gold standards with the OR of 3981 for IBDGC (figure 4C). We selected the model at 600 GWAS-ranked host genes with a frequency of 0.1% or less as the optimal network as it consistently yielded the best OR. This model contains 12, 10, and one host gene(s) from FUSION (figure 5A), WTCCC (figure 5B), and KCNJ11 (rs5215), which is common between both studies (see supplementary tables S1 and S7, available online only). Taken together, these results demonstrate the scalability of the network modeling and we also conducted additional studies to demonstrate its robustness (see supplementary figures S1 and S2B, available online only).
SPAN reprioritized genes interact directly with known T2D genes
Interestingly, first interactors of known disease genes have been shown to be more likely to be true disease genes,13 ,17 ,18 and can serve as candidate genes for further analysis (figure 5C,D). Furthermore, closer distances between two proteins in a network have been shown to signify increased functional similarity.16 Therefore, we can infer that first interactors of known T2D genes are more likely to be related to the pathophysiology of the disease. We analyzed then our combined optimal network model for biological and functional relatedness. We also evaluated how many of our prioritized host genes were first interactors of NHGRI known T2D genes. Importantly, these genes were not used to prioritize our network; rather, they were considered as established T2D genes by the NHGRI, and served as markers for GWAS result reprioritization. In our combined optimal networks of WTCCC and FUSION (figure 5C and see supplementary table S8, available online only), five genes (KCNJ11, KCNJ10, LFNG, MAP3K1, and ZBTB17) were shown to interact directly with three gold standard genes (NOTCH2, KCNQ1, and PPARG, figure 5D). Using an empirical distribution of random interactions (edge) between 21 reprioritized host genes and 12 gold standards (13 known T2D genes reported in the NHGRI catalog excluding the overlapping gene KCNJ11), we calculated their probability of being first interactors of known T2D genes. In the optimal network model, five direct interactions with T2D gold standard genes were identified to be significant (0.0036<edgetic p<0.0538; see supplementary methods and table S9, available online only). This proxy estimation further supports our network model, since the candidate T2D associated genes connect directly with gold standards more than expected by chance. In addition, biologically or clinically validated T2D genes are enriched in the optimal SPAN model of T2D (figure 5E and see supplementary table S6, available online only).
Hub and bottleneck genes are enriched in the optimal T2D GWAS network
By definition, the overall protein interaction network includes 20% hubs and 20% bottleneck genes. In the optimal SPAN protein interaction network model of T2D, we observe 29% hubs and 33% bottleneck genes (figure 6). Therefore, the topological centrality of reprioritized host genes of the optimal T2D GWAS network concurs with the broader observation that hubs and bottlenecks are enriched in GWAS-associated complex diseases (figure 3).
As GWAS reveal minimal biological mechanisms underlying discovered SNP, known molecular functions or pathways have been implemented as straightforward, high-throughput post-GWAS analyzes.31 Wang et al7 demonstrated that pathway approaches, which take into account multiple SNP, can provide additional value to GWAS data. Furthermore, Baranzini et al5 demonstrated that modestly prioritized SNP from two multiple sclerosis GWAS could reveal statistical associations at the canonical pathway level. In addition, pathway approaches have elucidated the potential utility of protein interaction analyses of GWAS data,7 which have been confirmed in yeast using eQTL data,10 ,32 and used to identify subnetworks that may contribute to complex human traits using text mining knowledge.33 Convincingly, the discovery of new biological modules is possible by utilizing protein interaction network approaches along with GWAS knowledge; however, no previous studies have provided insight into the network topology associated with complex traits.
Edgetic perturbation: a model for complex trait analysis
Zhong and colleagues11 established that an ample proportion of disease gene alterations, reported in OMIM, in fact cause changes in protein interactions, deemed edgetic perturbations, rather than complete protein (or node) removal. In our analysis, we extend this principle to examine the host genes of intragenic SNP that confer various effects on their final protein products. For example, synonymous or intronic SNP may cause aberrant alternative splicing,34 ,35 which has been shown to be an important mechanism in disease,36 ,37 and that may affect both expression levels and protein sequence. Furthermore, SNP may alter expression by being located in the promoter region or the 3′ untranslated region targeted by microRNA, which can decrease the stability or reduce the level of interaction enough to produce an edgetic change.11 Goldstein's group also demonstrated that prioritized SNP in GWAS may be markers of a common variant in linkage disequilibrium (LD) with the SNP, as well as a marker of rare variants creating synthetic associations that are attributed to common variants.38 We thus propose that SPAN prioritized intragenic SNP are possible markers of either mechanism: a common variant or rare genetic variants. Therefore, genetic alterations that underlie edgetic perturbations are not limited to non-synonymous mutations, but may also include those that cause alternative types of changes in protein functionality and interrelationships. In this vein, our method not only focuses on edgetic changes by examining protein interactions, but also incorporates this idea by analyzing intragenic SNP that cause various or unknown alterations including synonymous and non-synonymous mutations.
SPAN analysis repeatedly identifies secondary ranked GWAS host genes at a reproducible cut-off
Apart from the topmost prioritized SNP (generally top 10), it is now known that GWAS lack reproducibility of prioritized SNP that are ranked lower than 10 according to their GWAS p value. This indicates that the SNP ranking, taken in isolation, is not sufficiently precise (positive prediction value) to identify novel or important disease genes. Interestingly, the SPAN modeling method reproducibly identified two biomolecular modules comprising sets of interacting genes in two independent T2D GWAS (figure 5A,B). Based on edgetic and topological evaluations, the application of protein interaction information appears to benefit our understanding of T2D GWAS host genes that impact or are potentially true disease-related genes. We have shown that meaningful polymorphisms are likely to be present in secondary ranked signals (200–800 according to their GWAS p values among half a million SNP) identified by GWAS, and were consistently found in the same range for both complex traits. Using this approach and protein knowledge can bring us closer to developing a more comprehensive understanding of GWAS results by highlighting potentially essential disease-related signals.
SPAN results provide the rationale for investigating edgetic models of complex diseases
Our network model of WTCCC and FUSION T2D GWAS results has revealed 21 potential biological candidate T2D genes. As an example, three SPAN-prioritized genes were both supported by ulterior sources and were found to be first interactors of T2D gold standard genes (figure 5D), and thus may serve as quintessential T2D candidate genes: KCNJ10, MAP3K1 and ZBTB17. A comprehensive literature review for the 21 gene set revealed possible biological explanations for 13 genes (see supplementary table S6, available online only). In combination, our results suggest that examination of GWAS results using protein interactions may provide a theoretical stepping stone for gaining biological insight into complex diseases and may be useful for GWAS follow-up studies in general.
Enrichment of hubs and bottlenecks in complex traits and multiple controls
Focusing on edgetic functionality is supported by the theory that the importance of a hub is in fact related to the loss of a hub's essential protein interactions (edges), rather than the loss of a hub gene itself.39 Genes that have increased connectivity within the network confer greater biological effects, and are in fact three times more likely to be essential than those with a limited number of connections.40 However, depending on the biological scale of the measurement, the positive or negative association between diseases and hubness remains complex and controversial due to the heterogeneity of the studies. For example, the enrichment of hub genes is shown Barabasi et al18 and others for diseases of Mendelian inheritance in OMIM,41 ,42 which is attributable to the subset of disease genes that are also essential.43 Similar results are reproduced in the current study for traits and diseases of complex inheritance (figure 3).
To ensure that the enrichment of disease-associated SNP in hub or bottleneck genes was not due to any of the following straightforward genetic properties, we confirm that hub or bottleneck genes were not more likely to be prioritized in SPAN because of gene length, higher numbers of SNP per gene, and automatically containing SNP with lower GWAS-ranked p values (data not shown). Furthermore, we conducted several studies to address other biological explanations or biases of the association of PIN topological centrality to complex disease that may transpire at a higher level, including: LD/haplotype analysis: hub and bottleneck genes are not found to enrich in long LD/haplotype blocks; the correlation between node degree and the recombination rate of a host gene was inconsequential (spearman correlation r=0.0257, p=0.00345; see supplementary methods, available online only); gene expression: difference in gene expression hub/non-hub genes are minimal and not significant in two T2D gene expression datasets; and selection bias (experimental or sociological).
Further analysis of GWAS findings with their function and protein interactions has shown an excess modularity on a genome scale. Indeed, by applying this SPAN network analysis to the entire NHGRI catalog of SNP–trait associations, and showing a significant enrichment in network centrality (figure 4), this model is designed to highlight the possible importance of network centrality genes and their interactions in complex diseases in general. Subsequently, we showed that our T2D model shows the same enrichment of topological properties seen for the NHGRI data, further suggesting that this approach may identify regulatory networks perturbed in complex disease. Taken together, these results show that protein interaction modeling may prioritize genes that are more biologically essential, and imply that our method can be extended to examine other complex diseases that have been investigated by GWAS.
The protein interaction-centric approach of our study targets intragenic and near-intragenic SNP and, by design, does not take into account signals that exist in intergenic regions. Furthermore, the interactions contained within STRING inherently contain their own biases that vary for each type of interaction evidence. For example, interactions based on experimental-derived data may have a sociological selection bias because the proteins studied are generally those for which there is great interest (ie, disease associated). In addition, many different types of experimental methods utilized for text mining are based on interaction results, and although they were only included in STRING V.6.3 in our study, these studies are not restricted specifically to large-scale studies that were unbiased in their selection of baits and preys. We also recognize that the association of centrality genes to disease could possibly be due to their higher than physiologically normal level of gene expression; however, we verified that there is no statistical association between gene expression and T2D hub genes (data not shown). In addition, a lower range of assayed proteins through high-throughput methods could confound our analysis through selection bias.
Network modeling has the potential to tease out the elusive aspects of complex disease biology and highlight critical signals that would otherwise have remained buried in the deluge of GWAS results. As new methods are required to identify true disease-critical genes accurately, additional investment in SPAN analysis is warranted to generalize further the usage of network modeling and protein interactions. By exploring the importance of topological centrality and edgetic perturbation in complex traits, we have shown the value of network modeling for increasing our understanding of how proteins and genes interact and differ under conditions of health and disease. Network modeling can provide a platform for evaluating the molecular interactions and the multitude of variations that in combination lead to complex traits and disease development, and for facilitating the integration of multiple scales of biological complexity. With a predictive model, we demonstrate the utility of protein interactions to enhance genetic inquiry of complex diseases. For example, these approaches could contribute to GWAS in prioritizing SNP ab initio in a way analogous to that of eQTL.9 In personal genomics, predictive models of the implications of rare variants in health are required and cannot by design be validated in cohorts at the single rare-variant level. Topological properties of protein networks may, however, provide useful insights for inferring their functions at the individual patient level. For instance, first interactors with known T2D genes were demonstrated to be enriched in conservatively controlled models and could be utilized to implicate ‘families of nucleotide polymorphisms’ across multiple patients to obtain statistical power. This approach establishes their contribution to disease and thus also infers the function of rare variants. Finally, conventional therapy for complex diseases focuses on single targets or well-established molecular pathways. The knowledge that interactions are not only important in cancer and single-gene inheritance disease, but also in other complex diseases, provides an opportunity for future paradigm-shifting, unbiased network-targeted therapy, in which proteins and their interactions are modeled more extensively for improved drug development and the earlier identification of toxicities that can be overlooked under the current single-gene paradigm.
In conclusion, the topological centrality of protein interactions has been well established in two types of genetic inheritance: in Mendelian inheritance by edgetic perturbation models11 as well as in somatic mutations of cancer.44 Here, we observe similar results for 310 complex traits documented in the NHGRI catalog of SNP–trait associations and show that SNP are significantly enriched in two key protein interaction properties: hubs and bottlenecks within currently available networks. We also observe these enrichments in topological findings specifically in the prioritized T2D protein interaction networks constrained by GWAS-identified genetic signals. Taken together, these results support the hypothesis that hub and bottleneck proteins are also central in complex trait inheritance that merits further investigation as larger, more statistically powered and less sociologically biased sets of genome-wide interactions become available. Because common diseases comprise a large subclass of complex traits, the topological architecture of the protein networks associated with their inheritable polymorphisms has implications in genetics, personal genomics, and in therapy.
The authors would like to thank Dr M Boehnke for graciously providing the extensive prioritized list of SNP from FUSION. WTCCC was downloaded during the period when it was publicly available.
YL, HL, JL and ER contributed equally to the work.
Contributors The work of HL, JL, ER, JLC and YAL was completed partly at The University of Chicago.
Funding This research was supported in part by NIH grants R01 MH090937, U01 GM61393, KL2TR000431, K22LM008308, UL1TR000050, and UL1RR029879. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests None.
Ethics approval Ethics approval was granted by the institutional review board of The University of Chicago.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement The SPAN algorithm written in R language and bioconductor is made available at https://bitbucket.org/lussierlab/faime-opensource.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/