Large-scale prediction of adverse drug reactions using chemical, biological, and phenotypic properties of drugs
- Mei Liu1,
- Yonghui Wu1,
- Yukun Chen1,
- Jingchun Sun1,
- Zhongming Zhao1,
- Xue-wen Chen2,3,
- Michael Edwin Matheny1,4,5,6,
- Hua Xu1
- 1Department of Biomedical Informatics, Vanderbilt University, School of Medicine, Nashville, Tennessee, USA
- 2Bioinformatics and Computational Life Sciences Laboratory, Information and Telecommunication Technology Center, University of Kansas, Lawrence, Kansas, USA
- 3Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, Kansas, USA
- 4Department of Biostatistics, Vanderbilt University, School of Medicine, Nashville, Tennessee, USA
- 5Division of General Internal Medicine, Vanderbilt University, School of Medicine, Nashville, Tennessee, USA
- 6Geriatric Research Education and Clinical Care, Veterans Health Administration, Nashville, Tennessee, USA
- Correspondence to Dr Hua Xu, Department of Biomedical Informatics, Vanderbilt University, School of Medicine, 2209 Garland Ave, EBL 412, Nashville, TN 37232, USA
Contributors ML, YW, and HX were responsible for the overall design, development, and evaluation of this study. YW, YC, ML, and XC worked on the machine learning experiments. JS and ZZ extracted the biological features for this study. MM designed and reviewed the clinical validation experiments of Baycol and Vioxx. ML and HX did the bulk of the writing, and ZZ, XC, and MM also contributed to writing and editing of this manuscript. All authors reviewed the manuscript critically for scientific content, and all authors gave final approval of the manuscript for publication.
- Received 14 November 2011
- Accepted 30 March 2012
Objective Adverse drug reaction (ADR) is one of the major causes of failure in drug development. Severe ADRs that go undetected until the post-marketing phase of a drug often lead to patient morbidity. Accurate prediction of potential ADRs is required in the entire life cycle of a drug, including early stages of drug design, different phases of clinical trials, and post-marketing surveillance.
Methods Many studies have utilized either chemical structures or molecular pathways of the drugs to predict ADRs. Here, the authors propose a machine-learning-based approach for ADR prediction by integrating the phenotypic characteristics of a drug, including indications and other known ADRs, with the drug's chemical structures and biological properties, including protein targets and pathway information. A large-scale study was conducted to predict 1385 known ADRs of 832 approved drugs, and five machine-learning algorithms for this task were compared.
Results This evaluation, based on a fivefold cross-validation, showed that the support vector machine algorithm outperformed the others. Of the three types of information, phenotypic data were the most informative for ADR prediction. When biological and phenotypic features were added to the baseline chemical information, the ADR prediction model achieved significant improvements in area under the curve (from 0.9054 to 0.9524), precision (from 43.37% to 66.17%), and recall (from 49.25% to 63.06%). Most importantly, the proposed model successfully predicted the ADRs associated with withdrawal of rofecoxib and cerivastatin.
Conclusion The results suggest that phenotypic information on drugs is valuable for ADR prediction. Moreover, they demonstrate that different models that combine chemical, biological, or phenotypic information can be built from approved drugs, and they have the potential to detect clinically important ADRs in both preclinical and post-marketing phases.
- Adverse drug reaction prediction
- drug side-effects prediction
- post-marketing drug surveillance
- in silico drug side-effect profiling
The US public spends billions of dollars on prescription drugs every year, and adverse drug reactions (ADRs) impose a significant healthcare burden. ADRs are defined as those unintended and undesired responses to drugs beyond their anticipated therapeutic effects during clinical use at normal doses.1 It is estimated that 6–7% of hospitalized patients experience severe ADRs each year, potentially causing 100 000 deaths, which makes ADRs the fourth largest cause of death in the USA.2 Within the past 10 years, both reported ADRs and related deaths have increased ∼2.6 times and led to a number of drug withdrawals, with rofecoxib (Vioxx) and cerivastatin (Baycol) among the most prominent examples.3,4 Therefore, it is extremely important to predict and monitor a drug's ADRs throughout its life cycle, from the preclinical screening phase to post-market surveillance.
The fundamental method for predicting or assessing potential ADRs early in the drug development pipeline is the application of preclinical in vitro safety profiling by testing compounds with biochemical and cellular assays.5 However, experimental detection of ADRs using extensive in vitro safety pharmacology profiling remains challenging in terms of cost and efficiency.5 Post-market surveillance often relies on public databases containing ADR reports voluntarily submitted by physicians,6–15 which take time to accumulate before a signal can be detected. Recently, a large amount of effort has been devoted to developing in silico approaches to predict ADRs using available large public datasets of drugs, at both preclinical16 and post-market17 stages. Most of these methods have used either chemical structure or protein target information on drugs to build the prediction models, and some have shown promising results.18–27
In this study, we proposed a new drug surveillance framework by investigating three types of information for ADR prediction: (1) chemical properties such as compound fingerprints or substructures; (2) biological properties including protein targets and pathways; and (3) phenotypic properties including indications and other known ADRs if available. Our evaluation showed that the phenotypic information (when available) largely improved the performance of ADR prediction models. The framework suggests an efficient way to optimize ADR prediction by combining different types of information at the different stages of drug surveillance (eg, ‘chemical + biological’ for preclinical drug screening and ‘chemical + biological + phenotypic’ for post-market surveillance).
A number of computational methods have been developed to predict potential ADRs from preclinical characteristics of the compounds or screening data and post-marketing evidence. Existing efforts to predict ADRs from preclinical data can be categorized into protein-target-based and chemical-structure-based approaches. The underlying principle of the protein-target-based approach is that drugs with similar in vitro protein-binding profiles tend to exhibit similar side effects.18 Scheiber et al20 demonstrated the concept by comparing pathways affected by toxic compounds versus those affected by non-toxic compounds. Fukuzaki et al21 proposed a method to predict ADRs using sub-pathways that share correlated modifications of gene-expression profiles in the presence of the drug of interest. However, their work depends on the availability of gene-expression data observed under chemical perturbations by the drug. Xie et al22 developed a chemical systems biology approach to identify off-targets of a drug by docking the drug into binding pockets of proteins that are similar to its primary target. Then the drug–protein interaction pair with the best docking score was mapped to known biological pathways to identify potential off-target binding networks of the drug. However, scalability of the method is hindered by its requirement for protein three-dimensional structures and known biological pathways.
Alternatively, the chemical-structure-based approach attempts to link ADRs to their chemical structures. As a proof-of-concept, Bender et al23 explored the chemical space of drugs and established its correlation for ADR prediction. Scheiber et al24 presented a global analysis that identified chemical substructures associated with ADRs, but the method was not designed to predict ADRs for any specific drug molecule. Yamanishi et al25 proposed a method that predicted pharmacological effects from chemical structures and then used the effect similarity to infer drug–target interactions. Hammann et al26 employed decision tree modeling to determine the chemical, physical, and structural properties of compounds that predispose them to causing ADRs. Notably, ADR-predictive models developed on preclinical characteristics could provide additional evidence to support potential signals from post-marketing surveillance. For example, a recent study by Pouliot et al17 utilized screening data from the PubChem BioAssay28 database to determine the correlation of post-marketing ADRs with drug bioactivity across vast BioAssay screens. However, most of these methods were not designed to predict high-dimensional side-effect profiles for drugs. In order to accomplish this goal, Pauwels et al27 developed a sparse canonical correlation analysis method to predict high-dimensional side-effect profiles of drug molecules based on their chemical structures.
Despite the success of using chemical and biological information of drugs for ADR prediction, few studies have investigated the use of phenotypic information (eg, indication and other known ADRs). Existing resources, such as the SIDER29 (Side Effect Resource) database, contain comprehensive drug phenotypic information such as indications and known ADRs. Such phenotypic information has been demonstrated to be useful for other drug-related studies. For example, Campillos et al19 identified new drug targets by comparing the similarity of side effects of drugs. Here, we propose to investigate the use of phenotypic information on drugs, together with chemical and biological properties, to predict ADRs. Similarly to the work by Pauwels et al,27 we conducted a large-scale study to develop and validate the ADR prediction model using 1385 known ADRs for 832 FDA (US Food and Drug Administration)-approved drugs in SIDER29 using various machine learning (ML) algorithms. In addition, we comprehensively evaluated different combinations of features to see how each feature set contributes to prediction accuracy. Our experimental results show that integration of chemical, biological, and phenotypic properties outperformed the chemical-structure-based method and has the potential to detect clinically important ADRs at both preclinical and post-market phases for drug surveillance.
To build and evaluate the proposed ADR-prediction model, we used data from SIDER.29 SIDER presents an aggregate of dispersed public information on drug side effects and indications. SIDER extracted information on marketed medicines and their recorded ADRs from public documents and package inserts, which resulted in a collection of 888 drugs and 1385 side-effect keywords. There are a total of 61 102 associations between drugs and side-effect terms in SIDER, and each drug has an average of 68.8 side effects.
The chemical structures of drugs were collected from PubChem,30,31 biological properties were obtained from DrugBank32–34 and KEGG,35–37 and phenotypic data were from SIDER.29 To link these databases, we mapped drugs in SIDER to DrugBank.32–34 Fifty-six drug names from SIDER could not be mapped to their respective DrugBank IDs, resulting in a final dataset of 832 drugs, each of which has a ‘Yes’ or ‘No’ label for each of the 1385 side effects, indicating whether a drug has a specific side effect or not.
The PubChem, DrugBank, and KEGG databases comprise data that are available during chemical and animal trials, and are available before or during phase I clinical trials. However, the phenotypic data from SIDER are collected from phase I all the way through phase IV post-marketing surveillance. As such, this work describes a surveillance framework that allows pre-human association detection all the way through pre-marketing clinical trial phases to post-marketing surveillance. Figure 1 provides a visualization of the proposed ADR-prediction framework at different phases of drug surveillance.
Each drug is associated with a 1385 dimensional binary side-effect profile, y, whose elements correspond to the presence or absence of each of the side-effect concepts with 1 or 0, respectively. Each drug is also associated with three types of feature: chemical, biological, and phenotypic properties. Table 1 shows the subgroups of each feature type, its source, and dimension. To encode the drug's chemical structure, we used fingerprints corresponding to 881 chemical substructures defined in PubChem.30,31 The biological properties consisted of drug protein targets, transporters (for drug transportation), enzymes (for drug metabolism), and derived pathway information from the protein targets. Information on the protein targets, transporters, and enzymes of a given drug was directly obtained from DrugBank.32–34 Each drug target was then mapped to the corresponding KEGG pathway35–37 through its protein-coding gene symbol. The phenotypic information included indications and other known ADRs of drugs. Both sets of data were obtained directly from SIDER. Therefore, for a particular ADR yi, each drug is represented by its chemical, biological, and phenotypic properties as a 4276 (881+1142+2253) dimensional vector in which each element is either 1 or 0, respectively, for the presence or absence of each PubChem substructure, drug target, transporter, enzyme, KEGG pathway, indication, and remaining known ADRs.
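The encoding described above can be illustrated with a minimal Python sketch. The block sizes (881, 1142, 2253) are taken from the text; the function name `encode_drug` and the use of index sets as input are assumptions made for illustration only and are not part of the original study's code.

```python
# Illustrative sketch only: block sizes come from the text
# (881 + 1142 + 2253 = 4276); names and input format are hypothetical.
CHEM_DIM, BIO_DIM, PHENO_DIM = 881, 1142, 2253


def encode_drug(chem_idx, bio_idx, pheno_idx):
    """Concatenate three binary blocks into one 0/1 feature vector."""
    blocks = [(CHEM_DIM, chem_idx), (BIO_DIM, bio_idx), (PHENO_DIM, pheno_idx)]
    vector = []
    for dim, active in blocks:
        block = [0] * dim
        for i in active:
            block[i] = 1  # feature i of this block is present for the drug
        vector.extend(block)
    return vector


# A made-up drug with two substructures, one biological feature,
# and one phenotypic feature switched on.
x = encode_drug(chem_idx={0, 5}, bio_idx={10}, pheno_idx={2252})
print(len(x), sum(x))
```

Because every block is binary, concatenation is enough to combine feature types; a reduced set such as ‘chem+bio’ would simply omit the phenotypic block.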
In this study, we treat the ADR-prediction task as a classic binary classification problem where each drug either causes or does not cause a particular ADR. For each ADR, we built a classifier and evaluated its performance using 832 drugs as samples. We then repeated the process for each of the 1385 ADRs and summarized performance across all ADRs.
Evaluation was designed from different angles. First, we assessed the contributions of each feature type and their combinations to ADR prediction, using a fixed algorithm, support vector machine (SVM). Next, we compared the performance of five ML algorithms in predicting ADRs using an optimized feature set. Owing to the abundant variance of ADRs, we suspected that common ADRs (with more positive samples) might behave differently. Therefore we defined a subset of common ADRs, which were ADRs associated with more than 50 of the 832 drugs (denoted as ‘ADR_50+’). We evaluated the performance of these ADRs separately and compared it with the performance of all ADRs.
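As a rough illustration of how the ‘ADR_50+’ subset could be selected, the sketch below counts the drugs associated with each ADR and keeps those above a threshold. The function name, the row-per-drug input layout, and the toy data are assumptions; only the "more than 50 drugs" criterion comes from the text.

```python
def common_adrs(profiles, min_drugs=50):
    """Return indices of ADRs associated with more than `min_drugs` drugs.

    `profiles` holds one binary side-effect vector per drug (hypothetical layout).
    """
    n_adrs = len(profiles[0])
    counts = [sum(row[j] for row in profiles) for j in range(n_adrs)]
    return [j for j, c in enumerate(counts) if c > min_drugs]


# Toy data: 4 drugs x 3 ADRs, with a lowered threshold for illustration.
toy = [[1, 0, 1], [1, 0, 0], [1, 1, 0], [0, 0, 0]]
print(common_adrs(toy, min_drugs=1))
```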
Five ML algorithms—logistic regression (LR), naïve Bayes (NB), K-nearest neighbor (KNN), random forest (RF), and SVM—were investigated for the prediction task. To build the LR model, we used the L2-regularized logistic regression solver in LIBLINEAR.38 An object-oriented Matlab(R) ML package called CLOP39 was used to implement the NB classifier. The popular ML software, WEKA,40 was used for the KNN and RF modeling. Lastly, LIBSVM41 was applied as the SVM learner for prediction.
For each ADR, a classifier was built and evaluated using a fivefold cross-validation on 832 drugs. As a consequence, n classifiers were constructed for n side effects, where n is 1385. Performance of the proposed method was assessed by a receiver operating characteristic (ROC) curve, which is a graphical plot of sensitivity or true positive rate against false positive rate (1 − specificity). Sensitivity is defined as the proportion of actual positives that are correctly identified as such (ie, SN = TP/(TP+FN)), and specificity measures the proportion of actual negatives that are correctly predicted as such (ie, SP = TN/(TN+FP)), where FN is false negative, FP is false positive, SN is sensitivity, SP is specificity, TP is true positive, and TN is true negative. The ROC curve can be plotted by varying threshold values for prediction scores above which the output is predicted as positive and negative otherwise.
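The threshold sweep that produces a ROC curve can be sketched as follows. This is a generic textbook construction written in plain Python, not the evaluation code used in the study; the function names and the toy scores are assumptions, and the sketch assumes at least one positive and one negative sample.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Trapezoidal area under a list of (FPR, TPR) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up prediction scores and true labels for six drugs.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(auc(roc_points(scores, labels)))
```

In this toy case, eight of the nine positive/negative pairs are ranked correctly, so the area is 8/9.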
Area under the ROC curve (AUC), accuracy, precision, and recall were calculated as well. AUC provides a single measurement of the performance of a ROC curve. Accuracy (ACC) is the proportion of true results obtained (ie, ACC = (TP + TN)/(TP + FP + FN + TN)). Precision (P) is defined as the proportion of true positives against all predicted positive results (ie, P = TP/(TP+FP)). Recall is also known as the true positive rate or sensitivity, which is defined above.
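The four formulas above translate directly into code. The sketch below is illustrative only: the function name and the confusion counts are made up (chosen to sum to 832), and zero denominators are not handled.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute the measures defined in the text from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),          # SN = TP/(TP+FN), also recall
        "specificity": tn / (tn + fp),          # SP = TN/(TN+FP)
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),            # P = TP/(TP+FP)
    }


# Hypothetical counts for one ADR over 832 drugs.
m = confusion_metrics(tp=30, fp=20, fn=10, tn=772)
print(m)
```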
To summarize the global performance across 1385 ADRs, there are two possible approaches. One can compute an evaluation measure for each ADR and then average the measures over all ADRs to obtain an overall score, which is called macro-averaging. Another approach is to merge the prediction scores for all drugs over all ADRs, and then compute the overall measure, which is referred to as micro-averaging. The study by Pauwels et al27 reported a global AUC across all ADRs by merging the prediction scores for all ADRs into one big matrix and drawing a global ROC curve from the matrix, which is a similar approach to micro-averaging. Here, we followed their approach to generate the global AUC and accuracy. In addition, we reported micro-averaging precision and recall. The reported accuracy, precision, and recall were obtained from the best cut-off points or operating points of the global ROC curve, so that it gives the best tradeoff between false positives and false negatives.
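The difference between the two averaging schemes can be seen in a small sketch. The counts are invented for illustration (one common ADR and one rare ADR), and the helper names are hypothetical.

```python
def precision(tp, fp):
    """Precision with a guard for ADRs that receive no positive predictions."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def macro_precision(per_adr):
    """Average the per-ADR precision values."""
    return sum(precision(tp, fp) for tp, fp in per_adr) / len(per_adr)


def micro_precision(per_adr):
    """Pool the counts over all ADRs, then compute one precision."""
    tp = sum(t for t, _ in per_adr)
    fp = sum(f for _, f in per_adr)
    return precision(tp, fp)


# (TP, FP) pairs for a common ADR and a rare ADR; made-up numbers.
counts = [(90, 10), (1, 9)]
print(macro_precision(counts), micro_precision(counts))
```

Macro-averaging weights every ADR equally (0.5 here), whereas micro-averaging is dominated by ADRs with many predictions (about 0.83 here), which is worth keeping in mind when reading pooled figures.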
Statistical significance test
In order to assess whether the improvement in performance by adding feature spaces to the baseline chemical space is significant, the two-sample Kolmogorov–Smirnov test (KS test)42,43 was computed. The two-sample KS test is a general non-parametric method for comparing two samples to test whether the two underlying probability distributions differ. We calculated the KS test over the AUC scores generated by different feature sets for each ADR. For example, in the case of comparing the baseline chemical space ‘chem’ with the combined set ‘chem+bio’, a set of AUC scores is generated for predicting each of the 1385 ADRs using each feature set, and then the KS test assesses if the AUC scores generated by ‘chem+bio’ are stochastically larger than the scores generated by ‘chem’. Finally, since we were making multiple comparisons for different feature pairs, the p values from the KS test were corrected by Bonferroni correction.44
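A one-sided two-sample KS comparison followed by a Bonferroni correction might be sketched as below. The text does not state which implementation or p value formula was used, so the asymptotic approximation here, the function names, and the toy AUC values are all assumptions made for illustration.

```python
import math


def ecdf(sample, x):
    """Empirical cumulative distribution function of `sample` at x."""
    return sum(1 for v in sample if v <= x) / len(sample)


def ks_one_sided(baseline, candidate):
    """D+ = max_x [F_baseline(x) - F_candidate(x)].

    D+ is large when the candidate scores tend to exceed the baseline scores.
    The p value uses the standard large-sample approximation, which is an
    assumption here and may be inaccurate for small samples.
    """
    grid = sorted(set(baseline) | set(candidate))
    d_plus = max(0.0, max(ecdf(baseline, x) - ecdf(candidate, x) for x in grid))
    n, m = len(baseline), len(candidate)
    p = math.exp(-2.0 * d_plus ** 2 * n * m / (n + m))
    return d_plus, min(p, 1.0)


def bonferroni(p_values):
    """Multiply each p value by the number of comparisons, capped at 1."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]


# Made-up per-ADR AUC scores for two feature sets.
auc_chem = [0.80, 0.82, 0.85, 0.90]
auc_chem_bio = [0.88, 0.91, 0.93, 0.95]
d, p = ks_one_sided(auc_chem, auc_chem_bio)
print(d, p, bonferroni([p, 0.5]))
```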
To demonstrate the clinical significance of the proposed model, we evaluated the model's ability to predict post-market ADRs that caused the withdrawals of cerivastatin (Baycol) and rofecoxib (Vioxx). Cerivastatin is a statin used to lower cholesterol and prevent cardiovascular disease and was voluntarily withdrawn from the market in 2001 because of reports of fatal rhabdomyolysis. Rofecoxib is a non-steroidal anti-inflammatory drug used to treat osteoarthritis, acute pain conditions, and dysmenorrhea, and was withdrawn in 2004 over safety concerns about increased risk of heart attack. A physician manually reviewed both drugs' ADRs in SIDER and identified seven ADRs related to rhabdomyolysis for Baycol and four ADRs related to heart attack for Vioxx (see table 4). For each of the 11 ADRs, we built a prediction model based on the remaining drugs and applied it to either Baycol or Vioxx. To compare the effect of different feature sets, we reported the prediction results for ‘chem’, ‘chem+bio’, and ‘chem+bio+pheno’. As these seven ADRs related to rhabdomyolysis correlated highly and the use of the other six ADRs as features to predict the remaining ADR may make the task easier, we created a higher-level ADR for rhabdomyolysis by grouping all seven ADRs into one (the same applies for heart attack). We then built the prediction models and reported the performance of the grouped ADRs for rhabdomyolysis as well as for heart attack.
First, we assessed the abilities of different feature combinations to predict known side effects using SVM through a fivefold cross-validation with chemical structures as the baseline feature. To conduct a fair and accurate comparison across different feature sets, the same experimental conditions were maintained by using the same training drugs and test drugs for each fold. SVM parameters were empirically optimized using the AUC as an objective function. The best results for SVM were obtained by a Radial Basis Function (RBF) kernel with kernel parameter g = 0.008 and penalty parameter C = 2. When chemical structure alone was adopted, the best resulting AUC was 0.9054, which is similar to the finding (AUC = 0.8930) of Pauwels et al.27 Figure 2 shows the ROC curves for different feature sets based on cross-validation experiments, and table 2 summarizes the evaluation results.
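One simple way to keep the training and test drugs identical across feature sets is to fix the fold assignment once and reuse it. The sketch below assumes a seeded random shuffle; the text does not say how folds were drawn or whether they were stratified, so the function name and the seed are hypothetical.

```python
import random


def fivefold_assignment(n_samples, n_folds=5, seed=0):
    """Split sample indices into folds, reproducibly for a given seed."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into n_folds groups.
    return [indices[k::n_folds] for k in range(n_folds)]


folds = fivefold_assignment(832)
for k, test_idx in enumerate(folds):
    train_idx = [i for f in folds if f is not test_idx for i in f]
    # The same train_idx/test_idx would be reused for 'chem', 'chem+bio', etc.
    print(k, len(train_idx), len(test_idx))
```

Fixing the partition means that differences in AUC between feature sets cannot be attributed to a different draw of test drugs.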
When the feature spaces were compared independently (table 2), the phenotypic features appeared to be the most informative (highest AUC of 0.9542), and ‘chem’ and ‘bio’ achieved similar AUC. Adding biological features on top of chemical structures improved AUC slightly (from 0.9054 to 0.9098), whereas the increase obtained by adding phenotypic features was dramatic (from 0.9054 to 0.9526). When all three levels of features were combined (‘chem+bio+pheno’), the performance was almost the same as that of ‘chem+pheno’ or ‘pheno’ alone. For example, the ROC curves of ‘chem+pheno’ and ‘chem+bio+pheno’ in figure 2 almost overlap. On the other hand, if we focus on precision and recall, the improvement by adding biological features was more obvious (∼3% in precision and ∼1% in recall). Adding the phenotypic features yielded much larger increases, with ∼21% in precision and ∼15% in recall. Statistical analysis using the KS test42,43 showed that the improvement in AUC was significant for the addition of biological features to the chemical features (p = 1.45E-07), as well as for the addition of biological and phenotypic features to the chemical features (p = 1.10E-15). Compared with ‘pheno’ alone, the addition of ‘chem’ and ‘bio’ produced a reduction in the global AUC; however, the reduction was not statistically significant according to the KS test (p = 0.177).
The resulting ROC curves of the common ADRs (ie, ADR_50+) are shown in figure 3, and corresponding results are summarized in table 2. When compared with the results of all ADRs, a decrease in AUC and accuracy was observed as expected because rare ADRs that may distort the measures were excluded from the calculation. Thus in figure 3, there are larger separations between the ROC curves. For instance, when all ADRs were used in the calculation, the biological properties only increased the AUC by 0.004, but when we only considered the common ADRs, the increment was 0.02.
We compared the abilities of five ML algorithms—LR, NB, KNN, SVM, and RF—to predict known side effects of drugs by a fivefold cross-validation using all chemical, biological, and phenotypic properties as the feature set. Parameters for all classifiers presented here were empirically optimized using the AUC score. The best result for LR was obtained with parameters C = 10 and epsilon = 1, and for KNN the optimized number of neighbors is k = 55. For RF, we grew 100 decision trees in each ensemble. ROC curves of the five methods are shown in figure 4. AUC and accuracy over all ADRs versus the common ADRs are summarized in table 3.
As shown in figure 4, SVM performed the best followed by RF, KNN, NB, and LR. For LR, NB, and KNN, the AUC score is almost the same when calculated across all ADRs, but diverges greatly when calculated across the common ADRs. Nevertheless, all measures of RF and SVM outperform others by a large margin. Although over all ADRs, AUC scores of SVM and RF are almost the same, SVM produced a higher precision of 66.17% and recall of 63.06% compared with RF (63.10% for precision and 62.50% for recall).
Clinical validation examples
Table 4 shows the prediction results on ADRs related to rhabdomyolysis for Baycol and heart attack for Vioxx. Prediction performance was in the order ‘chem’ < ‘chem+bio’ < ‘chem+bio+pheno’. The classifiers based on ‘chem’ detected only one ADR related to rhabdomyolysis and none for heart attack. The classifiers based on ‘chem+bio’ detected five of seven rhabdomyolysis-related ADRs, but none for heart attack. For the classifiers using all features, five of seven rhabdomyolysis-related ADRs and two of four heart attack-related ADRs were predicted successfully. For the two grouped ADRs for rhabdomyolysis and heart attack, all classifiers predicted them successfully, which was probably due to increased sample sizes after grouping.
In this study, we conducted a large-scale ADR prediction of FDA-approved drugs and investigated three types of feature: (1) chemical structures; (2) biological properties (protein targets, transporters, enzymes, and pathways); (3) phenotypic characteristics (indications and other known ADRs). Our evaluation showed that drug phenotypic information (when available) is informative for ADR prediction, indicating its potential use for early detection of post-market ADR signals. In addition, our study demonstrated that the combination of chemical and biological features improved the AUC as well as precision (∼3% increase) and recall (∼1%), suggesting that such a data fusion approach is promising for preclinical screening of potential ADRs. The combination of all three types of information (‘chem+bio+pheno’) had lower global AUC than the ‘pheno’-only classifier (but this was not statistically significant), indicating that the simple feature combination method may not work well in this case. We then compared the true positive predictions by classifiers that used individual feature sets (‘chem’, ‘bio’, or ‘pheno’) and measured the overlap between each pair of classifiers. As shown in figure 5, 5072 ADRs were detected by ‘chem’ or ‘bio’ but not by ‘pheno’, and 10 581 ADRs were detected by ‘pheno’ but not by ‘chem’ or ‘bio’, indicating that ADRs predicted by each feature type are complementary, and higher performance could be achieved through development of more advanced methods for feature integration. We further analyzed the significance of associations between each of the 4276 features and each of the 1385 ADRs using χ2 statistics, in which a feature is regarded as informative if p<0.05. Distribution of the informative features is shown in online supplementary table S1.
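For a binary feature and a binary ADR label, a χ2 test of association reduces to a 2×2 contingency table. The sketch below uses the standard Pearson statistic with one degree of freedom and no continuity correction; whether the study applied a correction is not stated, and the function name and the counts are assumptions.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test for a 2x2 table (1 degree of freedom).

    a: feature present, ADR present    b: feature present, ADR absent
    c: feature absent,  ADR present    d: feature absent,  ADR absent
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence of association
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 df.
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p


# Made-up counts over 832 drugs.
stat, p = chi2_2x2(a=30, b=70, c=20, d=712)
print(stat, p, p < 0.05)
```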
During revision of this paper, Cami et al45 published a similar study, where they proposed an integrative approach for predicting new ADRs by utilizing structure attributes of the network formed by known drug–ADR relationships from drug safety data, as well as specific drug information including Anatomical Therapeutic Chemical taxonomy, molecular descriptors, and Medical Dictionary for Regulatory Activities (MedDRA) taxonomy of adverse events. Thus we believe that the models built on large-scale approved drugs have the potential to detect clinically important ADRs at both preclinical and post-market phases for new drugs.
In a further analysis, we found that the contribution of phenotypic features was mostly due to other known ADRs rather than indications. A major reason that existing ADRs contributed significantly to performance could be the existence of high correlations between ADRs. For instance, nausea and headache co-occurred with 596 of the total 832 drugs, and 49 pairs of ADRs co-occurred with more than 400 drugs. As SIDER represents ADRs as unified medical language system (UMLS)46 concept unique identifiers (CUIs), one side effect may be represented by a group of CUIs (see table 4 for seven concepts related to rhabdomyolysis). To predict one ADR CUI by using other ADR CUIs in the same group may introduce biases and overestimate the performance of the model. Therefore, an appropriate grouping schema for ADRs will be investigated in the future. The drug indication information only improved the AUC slightly from 0.9054 (ie, chemical structures only) to 0.9110 (ie, chemical structures + indications). One possible way to improve this is to build a better representation of the indication data. Currently, similar diseases with different CUIs were observed for drug indications in SIDER, for example, C0019693 for ‘HIV infection’ and C0019699 for ‘HIV positive’. Thus, for future work, it may be useful to group the indications.
The improvement produced by biological features was not as much as we initially expected, which may be the result of a few issues. First, the body's response to a drug is a complex process. When a drug enters the body and interacts with its intended targets, favorable effects are expected. However, at the same time, a drug often binds to other protein pockets with varying affinities (off-target interactions), leading to observed side effects. Furthermore, the biological features (ie, protein targets, transporters, enzymes, and pathway) used in this study are relatively simple and probably do not provide the details of molecular processes associated with the drugs.
One problem with the proposed ADR prediction model is imbalanced samples. Of the 1385 ADRs in our dataset, 554 were observed to be associated with fewer than five drugs. Therefore, for these ADR predictions, the dataset has an approximate 1:166 positive to negative ratio, which causes a serious problem for classification algorithms. In an imbalanced classification problem such as this, the majority class dominates the decision process, which biases classification toward that class (the negative class in this case). As a result, the precision for these ADR predictions would be close to 0%, but accuracy would be near 100%. To compare with results reported in Pauwels et al,27 we followed their approach to report global AUC values. However, owing to the imbalance problem, the global AUC could be very high (over 0.9 in this task), but the actual ability to detect and predict positive samples (the ADRs) could be low. Therefore we reported precision and recall in addition to the AUC. As expected, although ‘chem’ features achieved over 0.9 AUC, precision and recall were <0.5 (table 2). Furthermore, when the global AUC and accuracy are used, any improvements in the prediction accuracy of the common ADRs might be diluted by the 554 rare ADRs; thus the contribution of the feature addition could be severely underestimated. For example, after the inclusion of biological properties, the AUC remained relatively similar, but the precision actually improved from 43.37% to 46.23%, with relatively similar recall of 50%. We also analyzed different feature sets by only focusing on ADRs associated with at least 50 drugs so that we have sufficient positive samples. As expected, the results showed more significant contributions by each feature addition in terms of AUC, accuracy, precision and recall because rare ADRs that may distort the measures were excluded. For example, the improvement in AUC from biological properties was 0.02 for common ADRs as opposed to 0.004 for all ADRs.
Different methods have been proposed to address the imbalanced classification problem.47–49 As a further analysis, we tested a simple method for addressing the sample imbalance problem by adjusting the class weights of the RF and SVM classifiers (ie, weight = 1 − (class samples/total samples)) and observed improvement in AUC only for RF (increased from 0.9491 to 0.9524). SVM did not improve with class weight adjustment because it is very sensitive to parameters; thus parameters must be reoptimized when weights are adjusted. In the future, we plan to explore other techniques such as feature selection and resampling algorithms as suggested previously.47–49
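The class-weight rule quoted above (weight = 1 − (class samples/total samples)) can be written out as a short sketch. The function name and the example counts are assumptions; how the weights were passed to the RF and SVM implementations is not described in the text and is not shown here.

```python
def class_weights(n_positive, n_total):
    """Weight each class by 1 - (class samples / total samples)."""
    n_negative = n_total - n_positive
    return {
        1: 1 - n_positive / n_total,  # rare positive class gets a large weight
        0: 1 - n_negative / n_total,  # abundant negative class gets a small weight
    }


# Hypothetical rare ADR: 5 positive drugs out of 832.
w = class_weights(n_positive=5, n_total=832)
print(w)
```

With two classes the weights sum to 1, so a rare ADR with 5 positives out of 832 drugs receives a positive-class weight of about 0.994 and a negative-class weight of about 0.006.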
Furthermore, the clinical validation examples of Baycol and Vioxx support the utility of the model in detecting post-market adverse drug events using information from other medications in the database. For Baycol, the model based on ‘chem’ detected only one ADR related to rhabdomyolysis, while the use of ‘chem+bio’ was able to detect five of seven related ADRs, and the addition of ‘pheno’ did not result in more predictions. For Vioxx, ‘chem+bio+pheno’ was required to detect two of four ADRs related to heart attack. This highlights the utility of chemical and biological data for detecting and predicting likely adverse events, as well as the need for incorporating human adverse event data (phenotypic) as in SIDER to allow detection of other signals. These results suggest that our model has the potential to make clinically important ADR predictions early rather than waiting for sufficient post-market population response data to accumulate.
The study has several limitations, and there is scope for much future work to be carried out. For one, we would like to investigate algorithms that have better interpretability, which can return important features associated with ADRs. Moreover, in this study, representation for phenotypic features was relatively simple. More sophisticated methods (eg, categorizing drug indications via ontologies) could be further examined. Furthermore, a drug acts by inducing perturbations to biological systems, which involve various molecular interactions such as protein–protein interactions, signaling pathways, and pathways of drug action and metabolism.50 Therefore, in future work, we also plan to incorporate more detailed features such as interaction networks and drug bioactivities into the integrative framework for identification of ADRs.
This study proposed a new drug surveillance framework for ADR prediction by integrating chemical (ie, compound signatures), biological (ie, protein targets, transporters, enzymes, and pathways), and phenotypic (ie, indications and other known side effects) properties. Using a set of 1385 side effects for 832 drugs from the SIDER database, we developed ML models to integrate the different sources of information for prediction. Five ML algorithms—LR, NB, KNN, RF, and SVM—were systematically compared through fivefold cross-validations, and SVM was found to outperform the others. The AUC score for SVM was increased from 0.9054 when only chemical structures were used to 0.9524 when all three types of information were integrated. The precision increased from 43.37% to 66.17%, and recall increased from 49.25% to 63.06%. Most importantly, with rofecoxib and cerivastatin used as case studies, the proposed model was able to predict clinically important ADRs. These results suggest that such data fusion approaches are promising for large-scale ADR prediction.
ML and YW contributed equally to this study.
Funding This study was supported in part by grants from the NHLBI 5U19HL065962 and the NCI R01CA141307. ML is supported by the NLM training grant 3T15LM007450-08S1. JS is partially supported by the 2010 NARSAD Young Investigator Award. ZZ is partially supported by the 2009 NARSAD Maltz Investigator Award. MM is supported by a Veterans Administration HSR&D Career Development Award (CDA-08-020).
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement The mapping file between drugs in the SIDER database and DrugBank will be available upon request after publication. In addition, the entire training dataset used in our study will be available upon request as well.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.