Computational models for the prediction of adverse cardiovascular drug reactions

Background Predicting adverse drug reactions (ADRs) has become very important owing to the huge global health burden and failure of drugs. This indicates a need for prior prediction of probable ADRs in preclinical stages which can improve drug failures and reduce the time and cost of development thus providing efficient and safer therapeutic options for patients. Though several approaches have been put forward for in silico ADR prediction, there is still room for improvement. Methods In the present work, we have used machine learning based approach for cardiovascular (CV) ADRs prediction by integrating different features of drugs, biological (drug transporters, targets and enzymes), chemical (substructure fingerprints) and phenotypic (therapeutic indications and other identified ADRs), and their two and three level combinations. To recognize quality and important features, we used minimum redundancy maximum relevance approach while synthetic minority over-sampling technique balancing method was used to introduce a balance in the training sets. Results This is a rigorous and comprehensive study which involved the generation of a total of 504 computational models for 36 CV ADRs using two state-of-the-art machine-learning algorithms: random forest and sequential minimization optimization. All the models had an accuracy of around 90% and the biological and chemical features models were more informative as compared to the models generated using chemical features. Conclusions The results obtained demonstrated that the predictive models generated in the present study were highly accurate, and the phenotypic information of the drugs played the most important role in drug ADRs prediction. Furthermore, the results also showed that using the proposed method, different drugs properties can be combined to build computational predictive models which can effectively predict potential ADRs during early stages of drug development. Electronic supplementary material The online version of this article (10.1186/s12967-019-1918-z) contains supplementary material, which is available to authorized users.


Background
Adverse drug reactions (ADRs) are unpleasant, harmful, or unwanted effects caused by a drug and are one of the main reasons for the failure of drugs and their withdrawal from the market [1]. Although a wide range of studies are being carried out on ADRs, these remain a major health concern worldwide and pose severe challenges to public health. ADRs are the fourth primary cause of deaths in the United States resulting in 100,000 every year [2]. The traditional methods of ADR prediction are highly expensive and time consuming as the lead compounds undergo extensive testing for their safety profiles through various biochemical and cellular assays in pre-marketing stages [3]. Furthermore, during post-marketing surveillance, the data on ADRs is collected from various public databases which include reports submitted by physicians and patients' health records which again consumes a great deal of time [4]. Thus, timely and efficient ADR prediction during initial drug discovery and development stages remains a huge problem to be addressed.
In recent years, prediction of potential ADRs has become of extreme importance and several machine learning based methods have been proposed for the prediction of potential ADRs in pre-clinical stages using the chemical features of compounds, drug targets, enzymes, transporters and pathways and information on drug side effects and therapeutic indications. Azuaje et al. [5,6] generated drug-target interaction networks for the prediction of cardiovascular ADRs of non-cardiovascular drugs. One approach proposed by Pauwels et al. [7] involved the use of chemical structures of drugs for the generation of machine-learning models using four algorithms (nearest neighbors, support vector machines (SVMs), canonical correlation analysis, ordinary, and sparse) for ADR prediction tasks and also extracted associated sets of chemical fragments and side effects. Kuang et al. [8] compared and analyzed the existing methods for drug ADRs prediction and proposed a new algorithm, the general weighted profile method, from already existing algorithms by combining their formulas and converting them into a linear model for prediction of ADRs. Cami et al. [9] constructed a bipartite network that represented drugs, side effects and their associations and then predicted adverse drug events using pharmacological network models relying on a logistic regression algorithm. In another study, Liu et al. [10] carried out large scale prediction of ADRs integrating several features of drugs that included drug targets and pathways, chemical properties of drugs, therapeutic indications, and data from other known ADRs. Huang et al. [11] generated SVMs and logistic regression prediction models trained on the combination of drug targets, gene ontology annotations, and protein-protein interaction networks. Zhang et al. [12] considered ADR prediction as a multi-label learning task and proposed a novel approach, 'feature selectionbased multi-label k-nearest neighbor method' that led to simultaneous predictions of relevant features, as well as the generation of highly accurate prediction models.
Although numerous computational methods have been projected so far for ADR prediction problems, a lot can still be improved. The data used for generating the models is severely imbalanced, and imbalanced data classification would lead to predictions biased towards the negative class which is typically the dominating class. Another problem is with the large number of features associated with the drugs: some may be redundant, and not all would be related to ADRs. Recently, feature selection techniques have been increasingly used to identify significantly contributing features from high dimensional feature data sets for improving upon the prediction performances [13][14][15]. The benefits of feature selection for learning can include a reduction in the amount of data needed to achieve learning, improved predictive accuracy, learned knowledge that is more compact and easily understood, and reduced execution time. Also, we have, in our previous studies, already shown how feature selection based models have either outperformed the models generated using all the features or have given very similar results [16][17][18][19][20]_ENREF_15. Earlier, we used the machine-learning based methods to predict neurological ADRs by integrating the chemical, biological, and phenotypic features of drugs [21]. In the present study, without using feature selection the dimension of the files was too large to handle as is evident from initial number of features listed in Table 1, and consumed large computational power and time to generate the models. Table 2 provides a comparison of the number of drugs and types of features used in the present study and for other ADR prediction studies. It has been reported in various studies that the imbalance in the input datasets could result in biased predictions as most of the classifier are biased towards the major classes and hence show very poor classification rates on minor classes. It is also possible that classifier predicts everything as major class and ignores the minor class [22,23]. Thus we used SMOTE technique to overcome the imbalanced data problem and ensure unbiased classification.
In the present study, we have tried to solve the problem of data imbalance using the synthetic minority oversampling technique (SMOTE) balancing method [24], and under-sampling using SpreadSubsample method. We also used a minimum redundancy maximum relevance (mRMR) [25] approach for identifying important and non-redundant features. The outline of the computational workflow followed in this paper has been shown in Fig. 1.

Data sets
The dataset of approved drugs was retrieved from the DrugBank [26] database. DrugBank is a publicly available resource containing complete information on drugs and their actions, targets and structures. The data structural information for drugs for a total of 2141 drugs was obtained in structural data format.

Side-effects
SIDER [27] (SE resource) is a freely accessible database containing information on marketed drugs and their documented adverse effects. As reported on October 2015, the SIDER database, version 4.1, contained data on 1430 drugs and 5880 side effects, with a total of 139,756 drug-SE associations, where each drug had an average of 39% side effects. In the present work, the complete SIDER database was obtained from http://sidee ffect s.embl.de/.
The 2141 approved DrugBank drugs were mapped to SIDER using PubChem compound IDs (CID) as SIDER uses STITCH compounds IDs which were converted to PubChem CIDs according to the rule mentioned in (ftp://xi.embl.de/SIDER /2015-10-21/, Accessed April 2, 2018). The SIDER database was also used to collect data on therapeutic indications of drugs. We obtained information for a total of 970 drugs, which comprised data for 5497 side effects and 1840 therapeutic indications.

Biological features from DrugBank
The biological information of drugs was comprised of drug targets, transporters (for drug transportation) and enzymes (for drug metabolism). We obtained data for 1264 drug targets, 86 transporters, and 182 enzymes, for a total of 970 drugs directly from DrugBank database.

Chemical structure fingerprints
PaDEL [28] software was used to generate 881 PubChem [29] structure fingerprints for each of the 970 drugs in order to obtain chemical information. A substructure is part of a chemical structure, and a fingerprint is a well-ordered list of binary (0/1) bits; these bits are the Boolean representations for the presence of element counts, ring systems, atom pairs, etc. in the chemical structures.

Chemical structural similarity between drugs
Tanimoto coefficient (TC) was calculated to measure the similarity between two drug molecules using the ChemmineR [30] platform of R interface. The chemical structures of drugs in structural data format (SDF) were used to compute atom pair descriptors for all of the drug compounds. Further similarity search was performed using the cmp.search function, with a cut-off value of 0.3, which searched the entire atom pair database generated in the previous step for structurally similar compounds. The compounds having similar chemical structures were removed from the dataset.

Features construction
In classification models generation, one of the most important steps is to generate features which are a  quantifiable property of an instance being classified. In case where the instance is a molecule, the chemical information of the molecule is converted to molecular descriptors, which are the mathematical depictions of chemical compounds. In the present study, we have used six types of features which include the molecular 2D fingerprints denoting chemical features, biological targets, transporters, enzymes associated with drugs representing biological information, therapeutic indications, and other known side effects comprising phenotypic properties of drugs. Each drug, associated with three types of properties, chemical, biological and phenotypic, was represented as a binary matrix, the elements of which were either 1 or 0, respectively indicating the presence or absence of each feature: drug targets, transporters, enzymes, 881 PubChem substructures, remaining known ADRs, and therapeutic indication. To this end, we had a 5497 × 1840 dimensional binary matrix representing phenotypic properties, a 1264 × 86 × 182 dimensional binary matrix demonstrating biological features, and an 881 dimensional binary matrix denoting chemical features for each of the total 965 drugs.

Feature selection
A high dimensional feature space is generated for a small size of samples by the feature construction method. This may result in over fitting of the classification models while contributing to increase in dimensionality of the dataset consequently leading to increased computational time involved. Thus in the present study, a RemoveUseless filter available from Weka [31], a machine learning platform, was used to remove the features (chemical, biological, and phenotypic) having uniform values for all the drug molecules throughout the data set.

Minimum redundancy maximum relevance approach
As mentioned above, a large feature space was generated where each drug was represented by large numbers of features; however not all the features contribute significantly towards classification. Choosing independent, informative, and discriminative features is very important for classification purposes; thus in the present study we used an mRMR approach as the feature selection technique to extract useful features. The mRMR approach selects features having a high correlation with the output class, while with low inter-correlation. F-statistic is used to calculate the correlation with the output class (relevance), and the Pearson correlation coefficient is used to compute correlation amongst the features (redundancy). Two lists of features are generated by this approach: the MaxRel list and mRMR features list, where MaxRel gives the significantly contributing features and the mRMR list contains the features having higher ranks for contribution with least redundancy. An mRMR approach was used with default (fifty features) and eighty features of each kind were selected for model building in the present study.

Experimental setup
The ADRs prediction problem was considered as a binary classification problem where each drug was either associated with a particular SE (Yes) or not (No). Thus a column titled Outcome was appended in all the comma separated value (csv) files for the chemical, biological, and phenotypic features. Before the model generation task, the already existing 124 CV drugs were removed from the dataset and kept as a control to evaluate the performance of the generated classifier models. Machine learning is based on adapting from previous experiences and known data properties and then making predictions on the new unseen data. The feature data set was split into 80% training data (used for generation of classifier models) and 20% testing set (used for model performance evaluation) using in-house Perl script. The mRMR based feature selection was performed on training data and the testing dataset was used as a complete held out data to eliminate any biasness in classification. For each of the 36 cardiovascular ADRs, a classifier model was generated using training data that included features for the resulting 842 drugs. The predictive computational models were generated for each of the individual set of features (chemical, biological, and phenotypic) and their combinations that involved biological + chemical, biological + phenotypic, chemical + phenotypic and biological + chemical + phenotypic feature combinations.

Handling the imbalance amongst output class
The dataset was highly imbalanced in the present study where the instances were dominated by the negative class which could lead to overestimated performance predictions biased towards the negative (dominating) class. In the present study, we incorporated SMOTE (Synthetic Minority Over-sampling Technique), under sampling of the majority class and cross validation balancing methods to alleviate the problem of data imbalance.

SMOTE
To improve upon the performance of the classifier models, we employed a balancing strategy, SMOTE method from Weka, to bring a balance between the majority and minority classes. The SMOTE method oversamples the minority class by generating synthetic examples using the information available from the data set and resampling the data. To oversample, a sample from the minority class in the dataset is taken along with its k-nearest neighbors using Euclidean distance. Next, the difference is computed between the input vector and its nearest neighbor, which is multiplied to a random number that lies between 0 and 1 and further added to the current data point under consideration.

Under-sampling using SpreadSubsample
In the present work, we used SpreadSubsample filter of Weka to under-sample the dominating class to introduce balance between the majority and minority class. The SpreadSubsample filter method produces a random subsample of the dataset and specifies the maximum distribution between the minority and majority class.

Tenfold cross validation
We used tenfold cross validation in the present work to evaluate the performance of the generated predictive models. In tenfold cross validation, the original dataset was randomly partitioned into 10 equivalent sized folds or subsamples. One of the subsamples was retained as the validation set and the left overs were used as part of the training set. The process was repeated ten times (tenfold) until each subsample had been once used as validation data. At the end, the results obtained from all tenfold were averaged to get a single result.

Machine learning models construction
Two different machine-learning algorithms were used for predictive modelling in the present study: random forest (RF) and sequential minimization optimization (SMO) algorithm. Predictive models were implemented using Weka, which is a machine learning platform that supports training models using several machine learning algorithms and their evaluation.

Random forest
RF is an ensemble classifier developed by Leo Breiman that involves integration of multiple decision trees. A multitude of trees is generated during training of the learning model and the output class is the mode of the classes predicted by the individual trees. The larger the number of trees, the higher the accuracy. For prediction of a test case, each tree estimates an outcome which is stored for voting, and the highest-voted prediction is the final output class for the test case.

Model performance assessment
For each CV SE, classifier models were generated using RF and SMO algorithms and were evaluated using tenfold cross validation on 842 drugs. The performance of the predictive models was assessed using accuracy (ACC), precision, recall, F-measure, and Area under Curve (AUC) value. AUC value is a single measurement obtained from the receiver operating characteristic (ROC) curve, which is a plot of sensitivity or true positive rate (TPR) against false positive rate (FPR) or (1-specificity).
where TP is true positive, FP is false positive, TN is true negative and FN is false negative.

Results
In the present study, a total of 504 machine-learning based classifier models were generated for 36 CV ADRs using chemical, biological, and phenotypic features and their two and three level combinations for 842 drugs. Table 3 lists the 36 CV ADRs and their SIDER IDs for which the RF and SMO models were generated.

Analysis of the features used to study CV ADRs
As described in the methods section, chemical, biological and phenotypic features of dimensions 1,532, 881, and 7337, respectively, were used to represent 842 drugs which resulted in a very high dimensional feature space. This increased the probability of redundant and irrelevant feature vectors among the feature sets. Thus we used the RemoveUseless filter and mRMR approach, with default parameters, to obtain non-redundant and significant features. In the case of the mRMR approach, the features with scores larger than zero were selected for generating the trained classifier models. The top 50 features, which is the default number of features for mRMR, from each of the six types-enzymes, targets, transporters, substructure fingerprints, indications and side effectsinvestigated in the present study were used to generate the machine learning models. Table 1 provides the types of features used for the generation of the machine learning models and the number of features obtained after RemoveUseless and mRMR selection approaches. The list of the top 50 features ranked by mRMR has been provided as Additional file 1.

Performance comparison of different machine learning algorithms
For the ADRs prediction task, the following feature combinations were used as input: (1) (7) biological + chemical + phenotypic features (300 dimensional). Two different machine learning algorithms, RF and SMO, were employed for the generation of models using training data sets which were evaluated using tenfold cross validation. Table 4 provides the overall tenfold cross-validation performance of the models generated using training dataset with biological, chemical, and phenotypic features and the combination of the two and three levels of features. It is clearly evident from Table 4 that most of the AUC values are not around 0.50 which indicates that the models generated in the present study are not random predictors. Additional file 2 provide the tenfold crossvalidation performance measures for the RF and SVM models for each cardiovascular ADR using biological, chemical, phenotypic features and their two and three level combinations. Tables 5 and 6 provide the overall performance measures for the classifier models generated using chemical, biological, and phenotypic features and the combination of the two and three levels of features using SMOTE and SpreadSubsample method. In few cases where the AUC score is around 0.5, the probable reason may be due to the fact that only a few drugs in the dataset have these side-effects which makes the prediction difficult. The training set was balanced for minority class using SMOTE and SpreadSubsample method which resulted in significant AUC values in case of cross validated models. And thus most of the AUC values in case of testing data are around 0.50 owing to the low frequency of side Table 4 Provides the overall tenfold cross-validation performance of the models generated using training dataset with biological, chemical, and phenotypic features and the combination of the two and three levels of features  As mentioned above, different features were combined to evaluate their predictive capacities for CV ADRs prediction task. The chemical features models were least informative in the case of both RF and SMO classification models generated using SMOTE method. The biological and phenotypic models were much more accurate than chemical features models with accuracy values of 93.56%, 93.83% and 91.41% in case of RF and 93.28%, 88.75% and 93.83% for SMO models respectively. The accuracy of the models showed significant improvement upon addition of phenotypic features from 88.75 to 90.54% in case of SMO models. The biological and phenotypic combination models yielded better accuracy values (93.66% RF and 93.30% SMO). However the phenotypic features were most accurate and informative, leading to highly predictive models indicating that phenotypic features played an important role in the ADR prediction task. On the other hand the RF and SMO models generated using biological, chemical, and phenotypic and combination models using SpreadSubsample under-sampling method had almost similar performance. Additional files 3 and 4 provide the performance measures for the RF and SVM models for each cardiovascular ADR using fifty biological, chemical, phenotypic features and their two and three level combinations on non-redundant testing dataset using SMOTE and SpreadSubsample data balancing method respectively.
In addition we also generated biological, chemical, phenotypic and combination models using eighty features. The models generated using eighty features performed largely better in terms of accuracy in comparison to the models generated using the default fifty features ( Table 7). All the models were around 93-94% accurate and had precision, recall, F-measure and PRC values in the range of 93-100%. However the RF biological and phenotypic combination models performed best in terms of AUC value (0.72). The individual performance metrics for RF and SMO generated on non-redundant testing dataset using eighty mRMR biological, chemical, phenotypic features and their combinations for 36 CV ADRs has been provided as Additional file 5. Additionally we have also compared the statistical performances of models generated in the present study with other approaches for drug ADR prediction (Table 8).

Case study on cardiovascular drugs
To exhibit the realistic application and clinical importance of the generated predictive models, the models were evaluated for their ability to predict the already reported ADRs of the known CV drugs. Already known CV drugs were removed from the dataset used for generating the models and reserved as controls. A total of 124 CV drugs were obtained from DrugBank, among which we could extract the features for 34 drugs, for which predictions were made. We would like to mention here that out of total 124 CV drugs, the data for targets was available only for 121. Amongst which, 94 drugs could be mapped to enzymes and 63 drugs could be mapped to transporters. While there were only 34 CV drugs which Table 6 Provides the overall performance measures for the models generated using biological, chemical, and phenotypic features and the combination of the two and three levels of features on non-redundant testing dataset using under sampling of majority class all the information i.e. targets, transporters, enzymes as well as therapeutic indications and substructure fingerprints were available. Thus the predictions were made for these 36CV drugs which had all the properties. The most common side effects predicted were bradycardia (Disopyramide, Verapamil, Procainamide, Nifedipine, and Lidocaine); cardiac disorder (Clonidine, Telmisartan, Procainamide, and Nifedipine); congestive cardiac failure (Amlodipine and Metoprolol); cardiac murmur (Nisoldipine, Ibuprofen, and Propafenone) and tachycardia (Amlodipine, Prazosin, Bosentan, Doxazosin, Candesartan cilexetil and Acetylsalicylic acid, Telmisartan, Nifedipine and Carvedilol). These side effects had already been reported to be associated with the drugs in question on SIDER. Additionally, various side effects not reported on SIDER were also predicted by our models. The side effects include atrioventricular block (Acetylsalicylic acid, Telmisartan, Nifedipine and Midodrine); cardiomyopathy (Nifedipine [32]); cardiac failure (Verapamil [33] and Procainamide [34]); cardiopulmonary failure (Amlodipine, Prazosin, Bosentan [33], Doxazosin [33] and Bumetanide); cor pulmonale (Amlodipine, Prazosin, Bosentan [35], Doxazosin, Carvedilol, Furosemide [36] and Bepridil); and heart malformation (Disopyramide, Nisoldipine, Bosentan, Clonidine, Verapamil, Acetylsalicylic acid, Telmisartan, Procainamide, Ibuprofen, Nifedipine, Propafenone, Gemfibrozil, Metoprolol, Lidocaine, Indomethacin, Propanolol, Losartan, Felodipine, Oxepronolol, Lomitapide, Riociguat); irregular heart rate (Disopyramide [37] and Nifedipine [38]) and left ventricular Table 7 Provides the overall performance measures for the models generated using eighty biological, chemical, and phenotypic features and the combination of the two and three levels of features on non-redundant testing dataset  These drugs were predicted to be associated with heart malformation, irregular heart rate (Bezafibrate [40]), acute cardiac failure (Simvastatin [41]) and cardiac murmur (Simvastatin [41] and Bezafibrate [42]) by our classifier models.

Predictions of uncharacterized drugs in SIDER
The predictive computational CV side effect models generated in the present study were used to make predictions on the drugs having no information of side effects on SIDER. The ADR predictions were done for twelve drugs and the most common side effects predicted included cardiac decompensation, congestive cardiac failure, cardiac disorder, cardiac murmur, irregular heart rate, shock, tachycardia and cardiopulmonary failure.
In order to make these findings significant, we performed a thorough literature search to find associations among the ADRs predicted by our models and the drugs with which the side effects were associated. Fluvoxamine was found to be related with decompensation cardiac, and fluvoxamine in combination with risperidone has been known to cause serious adverse cardiovascular drug events [43]. Various cardiac side effects were predicted for Diethylstilbestrol that include shock, congestive cardiac failure, cor pulmonale, cardiopulmonary failure, left ventricular failure, and tachycardia [44]. Mefloquine, which is an antimalarial drug was found to be associated with congestive cardiac failure and tachycardia. Antimalarial drugs have been known to cause serious adverse cardiovascular events that include sinus bradycardia alternating with tachycardia and cardiac failure [45,46]. Cardiac murmur, irregular heart rate, tachycardia and acute cardiac failure were the side effects observed for Famotidine which has already been reported to cause complete atrioventricular block and cardiac arrest [47]. High levels of Urea in blood have been connected to increased risk of cardiovascular events that include cardiac murmur [48], irregular heart rate [49] and acute cardiac failure [50]. Serious cardiovascular events that include block heart, cardiac murmur, irregular heart rate, cardiac failure, arrhythmia and cardiac failure acute were predicted for the drug, Eltrombopag [51][52][53]. According to a FDA report, Tretinoin was found to be associated with various cardiac side effects that include arrhythmias, hypotension, hypertension, cardiac failure, and cardiac murmur. Cardiac murmur, acute cardiac failure and irregular heart rate were the side effects predicted accurately by the models generated in the present study [54]. Ketoconazole in combination with Dofetilide, Pimozide, and Quinidine could also result in serious adverse cardiac effects that include tachycardia, shock, congestive cardiac failure, cardiotoxicity, cor pulmonale, heart malformation and cardiopulmonary failure [55]. Although various researchers have put forward colchicine as a probable cardiovascular agent, incidences have been reported of adverse cardiac events that include shock, tachycardia, cardiac disorder and cardiac failure [56][57][58]. Clomifene, Gallium nitrate and Gabapentin enacarbil were predicted to be associated with cardiac decompensation by our models; however, we could not find any literature evidence to support our observation for these drugs. Table 9 lists the CV ADRs predicted by the machine learning RF and SMO models on uncharacterized drugs in SIDER.

Validation on external dataset
Considering the real-world application of the machine learning models generated in the present study, these models were evaluated on an external library of 16,383 MyriaScreen compound available from Sigma-Aldrich. The most common side effects predicted by RF and SVM models associated with at least 10% of compounds included cardiac murmur and decompensation cardiac. In case of RF models, cardiac murmur was predicted to be associated with 7266 compounds, tachycardia with 3346 compounds, decompensation cardiac with 3303 compounds and cardiac failure congestive with 1670 compounds. In case of SMO models, the most predicted side effect was heart rate irregular for 7272 compounds followed by cardiac murmur for 7266 compounds, left ventricular failure for 3598 compounds, decompensation cardiac with 3112 compounds, cardiopulmonary failure with 2374 compounds and cor pulmonale was associated with 2278 compounds. Few ADRs which were not predicted to be associated with any compound by any machine learning model included atrioventricular block complete, cardiac tamponade, cardiomegaly and conduction disorder. We found that the results obtained were very similar to the predictions on uncharacterized drugs.

Discussion
The present study proposed an exhaustive computational protocol for drug ADR prediction using machine-learning based methods by integrating different levels of information for drugs. A total of 504 computational models were generated for 36 CV ADRs by integrating biological (drug transporters, targets, and enzymes), chemical (substructure fingerprints), and phenotypic (therapeutic indications and other known ADRs) features using RF and SMO machine learning algorithms. To find the informative and discriminative features, we used an mRMR approach which provided us with a list of non-redundant features of maximum relevance for classification, thereby