A weighted Bayesian integration method for predicting drug combination using heterogeneous data

Abstract

Background

In the management of complex diseases, the strategic adoption of combination therapy has gained considerable prominence. Combination therapy not only holds the potential to enhance treatment efficacy but also to alleviate the side effects caused by excessive use of a single drug. Presently, the exploration of combination therapy encounters significant challenges due to the vast spectrum of potential drug combinations, necessitating the development of efficient screening strategies.

Methods

In this study, we propose a prediction scoring method that integrates heterogeneous data using a weighted Bayesian method for drug combination prediction. Heterogeneous data refers to different types of data related to drugs, such as chemical, pharmacological, and target profiles. By constructing a multiplex drug similarity network, we formulate new features for drug pairs and propose a novel Bayesian-based integration scheme with the introduction of weights to integrate information from various sources. This method yields support strength scores for drug combinations to assess their potential effectiveness.

Results

Upon comprehensive comparison with other methods, our method shows superior performance across multiple metrics, including the Area Under the Receiver Operating Characteristic Curve, accuracy, precision, and recall. Furthermore, literature validation shows that many top-ranked drug combinations based on the support strength score, such as goserelin and letrozole, have been experimentally or clinically validated for their effectiveness.

Conclusions

Our findings have significant clinical and practical implications. This new method enhances the performance of drug combination predictions, enabling effective pre-screening for trials and, thereby, benefiting clinical treatments. Future research should focus on developing new methods for application in various scenarios and for integrating diverse data sources.

Introduction

Combination therapy involves using drugs with distinct pharmacological mechanisms to minimize adverse effects and maximize therapeutic efficacy [1]. Nowadays, combination therapies built on drug repurposing are commonly utilized in treating a spectrum of complex diseases, including cancer [2, 3], cardiovascular diseases [4], and type 2 diabetes [5], among others [6,7,8]. These conditions are highly prevalent and impactful [9, 10]. The need for combination therapies arises from the complex nature of these diseases, which involve multiple pathways and genes, making a one-gene, one-drug approach insufficient. For example, Janumet is a combination therapy for type 2 diabetes that merges sitagliptin and metformin [11]. Sitagliptin boosts insulin secretion, while metformin lowers glucose production and enhances cellular uptake [12]. This combination effectively manages blood glucose levels and minimizes side effects compared to higher doses of single drugs.

While current combination therapeutic approaches have achieved increasing success and are becoming more crucial in the treatment of various diseases [2, 13, 14], the size of the pre-selected drug combination space is rapidly expanding with the growing number of drugs. Therefore, identifying drug combinations within this vast space that are effective in treatment remains a challenging endeavor. A significant proportion of drug combinations in widespread use today are constructed from clinical observations [15,16,17]. However, relying solely on this mode of discovery is time-consuming, labor-intensive, and expensive. This implies that developers require a multitude of multifaceted resources to support the discovery process. Furthermore, clinical in vivo experiments, which can involve hundreds of patients and high costs per trial, may sometimes lead to unnecessary or even harmful treatments being administered to patients [18, 19].

One strategy for conducting drug combination research in vitro is high-throughput screening, which enables the execution of a large number of tests within a reasonable timeframe and at a lower cost compared to clinical trials [20, 21]. For instance, high-throughput screening can process up to 100,000 compounds per day, significantly accelerating the initial discovery phase [22]. However, the high-throughput screening approach is not yet suitable for effectively exploring the entire combination space, as it involves substantial infrastructure costs, and conducting large-scale experiments to test potential drug combinations is both time- and cost-intensive. Therefore, there is an urgent need to develop powerful and efficient computational techniques that narrow down the pool of candidate drug combinations to be experimentally validated in wet-lab trials [23, 24], thereby facilitating the identification of promising drug combinations.

Various computational methods have been proposed to predict drug combinations for drug repurposing [25,26,27,28,29,30]. Early studies adopt systems biology approaches, employing mathematical models to represent drug perturbations through biochemical reactions and kinetic parameters [31, 32]. However, these approaches are often limited to a few well-studied signaling pathways, failing to extract information from the broader pharmacological space. Additionally, chemical biology network analyses have been used to predict effective drug combinations [33,34,35,36,37], but they may struggle to uncover underlying molecular mechanisms or extract information from a larger pharmacological context.

Some computational methods utilize heterogeneous information about drugs to construct drug knowledge graphs, such as multi-dimensional similarity networks, and train computational models to predict drug combinations [26, 27, 38,39,40]. For example, the PEA model [27] proposes the utilization of a Naive Bayes network for drug combination prediction. Nevertheless, this method assumes that attributes are independent within a given class, a simple yet strong assumption that fails to capture the intricacies of real-world data. Another approach uses a Gradient Boost Tree trained on feature vectors extracted by the restart random walk method [26]. However, its integration approach for drug-drug similarities is relatively crude, potentially resulting in incomplete utilization of information. Additionally, a recent approach named NEWMIN [40] assigns different weights to various similarity features, employs the word2vec method to extract feature vectors for drug pairs, and then uses a random forest for prediction. Overall, while each method has its strengths and limitations, integrating advanced computational approaches with comprehensive drug similarity knowledge graphs from heterogeneous information remains crucial for enhancing the accuracy and reliability of drug combination predictions.

In this paper, we propose a novel competitive approach called WBCP for predicting effective drug combinations. WBCP uses a novel weighted Bayesian strategy that integrates multiple pieces of information from the drug knowledge graph to enhance predictive performance. In contrast to previous studies, our method stands out in several key aspects. First, we convert drug-drug similarities into similarities between query drug pairs and known drug combinations, extracting effective and interpretable features for downstream prediction tasks. Second, the WBCP method enhances Naive Bayes by designing a Bayesian model with attribute weighting and applying it to the likelihood ratios of features, relaxing the attribute independence assumption to better align with real-world data complexity. Third, it generates a support strength score (0–1), where higher scores indicate greater support for the drug pair belonging to the drug combination class, making the score both intuitive and meaningful. In terms of performance evaluation, WBCP has been comprehensively compared with other state-of-the-art methods [26, 27, 40] and several machine learning methods. The results, including the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPR), among others, indicate that our proposed WBCP method is a competitive approach for predicting drug combinations and can play a significant role in the pre-screening of drug combinations.

Methods

The WBCP method presents a novel algorithmic framework for predicting drug combinations through the integration of heterogeneous data. Initially, seven drug similarity networks are constructed from diverse data sources, encompassing information such as chemical structure, targets, and side effects (Fig. 1A). For each similarity network, the similarities between a query drug pair and all known drug combinations are computed, and the maximum of these similarities serves as a feature for that query drug pair (Fig. 1B). Subsequently, a weighted Bayesian method is used to amalgamate the features into an integrated likelihood ratio (ILR), a new statistical measure that evaluates the likelihood of a drug combination. The ILR is then transformed into a support strength score (ranging from 0 to 1) using a sigmoid-like function (Fig. 1C). The support strength score reflects the strength of support for classifying the query drug pair into the positive group rather than the negative group, with a higher score indicating greater support.

Fig. 1

The workflow of WBCP. A. Constructing drug similarity networks: calculation of the drug similarity network corresponding to each of seven types of heterogeneous data, encompassing Anatomical Therapeutic and Chemical (ATC) similarity, Simplified Molecular Input Line Entry System (SMILES) structure similarity, target protein sequence similarity, Gene Ontology (GO) semantic similarity, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways similarity, SIDER drug side effects similarity, and OFFSIDES drug effects similarity. B. Constructing seven features for query drug pairs: for each of the seven types of heterogeneous data, a feature of the query drug pair is defined as the maximum similarity between the query drug pair and all known drug combinations. The black dashed lines in the figure represent the similarities between drug pairs and known drug combinations, while the red solid line indicates the maximum similarity between a drug pair and all known drug combinations. C. Predicting the drug combination: the seven features are integrated to obtain the integrated likelihood ratio (ILR) using a novel weighted Bayesian method. The ILR is then transformed into the support strength score for drug combination prediction. The support strength score ranges from 0 to 1, with higher values indicating greater support for the query drug pair being categorized into the positive group rather than the negative group.

Drug similarity networks

In this study, seven types of drug similarity networks are calculated from diverse types of information, encompassing Anatomical Therapeutic and Chemical (ATC) similarity, Simplified Molecular Input Line Entry System (SMILES) structure similarity, target protein sequence similarity, Gene Ontology (GO) semantic similarity, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways similarity, SIDER drug side effects similarity, and OFFSIDES drug effects similarity. The data preprocessing includes standardizing data names and formats to ensure consistency across the same type of data. Entries with missing data are removed.

ATC similarity: The ATC code, obtained from the DrugBank database, belongs to a WHO-established pharmaceutical coding system that classifies drugs by the organ they act on, their therapeutic effects, pharmacological features, and chemical characteristics. The inverse document frequency (IDF) is calculated to down-weight frequently occurring ATC codes, using the following formula [41]:

$$IDF\left( {t,D} \right) = \log \frac{\left| D \right|}{{n_{t} }},$$

where \(D\) represents the set of drugs and \(n_{t}\) represents the number of drugs annotated with ATC code \(t\). The ATC similarity of a pair of drugs \(({\text{dA,dB}})\) is calculated as the cosine similarity of their corresponding IDF-weighted vectors \({\mathbf{A}}\) and \({\mathbf{B}}\). The formula is as follows:

$$S_{{\text{dA,dB}}} = \frac{{{\mathbf{A}} \cdot {\mathbf{B}}}}{{\left| {\mathbf{A}} \right| \cdot \left| {\mathbf{B}} \right|}}.$$
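
As a minimal sketch of the two formulas above, the following Python snippet computes IDF weights over a toy drug set and the cosine similarity of the resulting vectors. The drug names, the ATC codes, and the choice to treat every listed code as one vector dimension are illustrative assumptions, not details taken from the original pipeline.

```python
import math

def idf_weights(drug_codes):
    """IDF(t, D) = log(|D| / n_t), where n_t counts drugs carrying code t."""
    n_drugs = len(drug_codes)
    counts = {}
    for codes in drug_codes.values():
        for t in set(codes):
            counts[t] = counts.get(t, 0) + 1
    return {t: math.log(n_drugs / n) for t, n in counts.items()}

def atc_similarity(codes_a, codes_b, idf):
    """Cosine similarity of two IDF-weighted ATC code vectors."""
    a, b = set(codes_a), set(codes_b)
    # Each drug's vector holds idf[t] for its own codes and 0 elsewhere,
    # so only shared codes contribute to the dot product.
    dot = sum(idf[t] ** 2 for t in a & b)
    norm_a = math.sqrt(sum(idf[t] ** 2 for t in a))
    norm_b = math.sqrt(sum(idf[t] ** 2 for t in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy data: three drugs with made-up code lists.
toy = {
    "drugA": ["A10", "A10B", "A10BA"],
    "drugB": ["A10", "A10B", "A10BH"],
    "drugC": ["C09", "C09A"],
}
idf = idf_weights(toy)
similarity_ab = atc_similarity(toy["drugA"], toy["drugB"], idf)
```

A code carried by every drug in the set receives an IDF of zero and therefore contributes nothing to the similarity, which is the discounting effect the IDF is meant to provide.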

SMILES similarity: SMILES (Simplified Molecular Input Line Entry System) describes the structure of chemical molecules in a string format. The SMILES structures are extracted from the DrugBank database and then transformed to SDFset (an intermediate format for the further calculation of SMILES similarity) using the “smiles2sdf” function in the ChemmineR R package [42]. The atom pair sequences are then obtained for the drugs [43], and the Tanimoto coefficient between drugs is used as the SMILES similarity measure, computed with the “cmp.similarity” function from the ChemmineR R package. The formula is as follows:

$$S_{{\text{dA,dB}}} = \frac{c}{a + b + c},$$

where \(a\) represents the number of atom pairs unique to drug dA, \(b\) represents the number of atom pairs unique to drug dB, and \(c\) represents the number of atom pairs shared by both drugs.
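
The Tanimoto coefficient above can be sketched with plain Python sets. This is only an illustration of the formula, not the ChemmineR implementation; the atom pair strings are hypothetical placeholders. Written on sets, the same ratio is also the Jaccard coefficient used for several of the other similarity types in this section.

```python
def tanimoto(pairs_a, pairs_b):
    """Tanimoto coefficient c / (a + b + c) on two collections of atom pairs."""
    set_a, set_b = set(pairs_a), set(pairs_b)
    c = len(set_a & set_b)   # atom pairs shared by both drugs
    a = len(set_a - set_b)   # atom pairs unique to the first drug
    b = len(set_b - set_a)   # atom pairs unique to the second drug
    total = a + b + c
    return c / total if total else 0.0

# Hypothetical atom pair descriptors, encoded as strings for illustration.
drug_a_pairs = {"C-2-O", "C-1-C", "C-3-N"}
drug_b_pairs = {"C-2-O", "C-1-C", "N-1-O"}
smiles_similarity = tanimoto(drug_a_pairs, drug_b_pairs)  # 2 shared / 4 total = 0.5
```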

Target protein sequence similarity: The protein sequence data of drug targets are downloaded from the UniProt database (https://www.uniprot.org/) [44]. Structural and physicochemical features constructed from sequences have proved valuable in inferring drug-drug similarity [27]. Commonly used structural and physicochemical descriptors are obtained from the protein sequences. Based on these sequence descriptors, the protein sequence similarity is calculated using the “parSeqSim” function in the protr R package [45].

GO semantic similarity: The GO terms of drug targets indicate the biological processes related to the corresponding drugs, and are obtained from the UniProt database [44]. Jaccard’s coefficient is used to estimate the GO semantic similarity of drugs dA and dB, and the formula is as follows:

$$S_{{\text{dA,dB}}} = \frac{{\left| {\Gamma ({\text{dA}}) \cap \Gamma ({\text{dB}})} \right|}}{{\left| {\Gamma ({\text{dA}}) \cup \Gamma ({\text{dB}})} \right|}},$$

where \(\Gamma ( \cdot )\) represents the GO semantic representation of drugs.

KEGG pathways similarity: If two drugs act on the same pathway, then they may have a higher probability of synergistic and complementary effects [46]. KEGG DRUG contains graphical representations of chemical composition patterns, therapeutic classes, and drug development history. The KEGG pathways in which a drug's targets are enriched are used to infer the KEGG pathways similarity of drugs based on Jaccard’s coefficient.

SIDER similarity: Campillos et al. demonstrate that two drugs eliciting similar side effects tend to share common drug targets [47]. Therefore, such drugs are more likely to interact with each other and achieve better therapeutic effects. The side effects of drugs are obtained from the SIDER database (http://sideeffects.embl.de) [48], and the SIDER drug side effects similarity between two drugs is calculated with Jaccard’s coefficient.

OFFSIDES similarity: The OFFSIDES database (http://PharmGKB.org), as a complement to the SIDER database, provides drug effects not already listed on the drug’s package insert [49]. These adverse events may help to uncover potential drug-drug interactions, which in turn can help identify novel drug combinations. In this study, Jaccard’s coefficient is used to infer the OFFSIDES drug effects similarity between two drugs.

As the seven drug similarity networks contain distinct sets of drug nodes, we establish a criterion: a drug is retained, along with its connections, only if it is present in all seven similarity networks. Consequently, each similarity network ultimately consists of 558 nodes.

Features for query drug pair

In our study, the features utilized are derived from similarities between pairs of drugs, which are calculated based on seven types of drug-drug similarity. The similarity between a query drug pair (dA, dB) and a known drug combination (dA', dB') is determined from the drug-drug similarities S(dA, dA') and S(dB, dB') [50]. In the evaluation of drug pair similarities, considering the interchangeable nature of drug pairs (dA, dB) and (dB, dA), it is necessary to consider the similarity between the query drug pair (dB, dA) and the known drug combination (dA', dB'). Consequently, the similarity between the query drug pair (dA, dB) and the known drug combination (dA', dB') can be expressed as follows:

$${\text{SP}}\left( {\left( {{\text{dA,}}\,{\text{dB}}} \right){,}\,\left( {{\text{dA}}^{\prime } {,}\,{\text{dB}}^{\prime } } \right)} \right)\,{ = }\,\sqrt {{\text{max}}\left( {{\text{S}}\left( {{\text{dA,}}\,{\text{dA}}^{\prime } } \right) \cdot {\text{S}}\left( {{\text{dB,}}\,{\text{dB}}^{\prime } } \right){,}\,{\text{S}}\left( {{\text{dB,}}\,{\text{dA}}^{\prime } } \right) \cdot {\text{S}}\left( {{\text{dA,}}\,{\text{dB}}^{\prime } } \right)} \right)}.$$

For the i-th drug similarity network, the corresponding \({\text{SP}}_{i} \, (i = 1,2, \cdots ,7)\) can be calculated, and we define a feature \(f_{i}\) for the query drug pair (dA, dB) as follows:

$$f_{i} \left( {\left( {{\text{dA}},{\text{dB}}} \right)} \right) = \mathop {\max }\limits_{{{\text{(dA}}^{\prime } ,{\text{dB}}^{\prime } {) } \in {\text{ known drug combinations}}}} {\text{SP}}_{i} \left( {\left( {{\text{dA}},{\text{dB}}} \right),\left( {{\text{dA}}^{\prime } ,{\text{dB}}^{\prime } } \right)} \right),i = 1,...,7.$$

Subsequently, for each query drug pair, we generate seven features \((f_{1} ,f_{2} , \cdots ,f_{7} )\) as illustrated in Fig. 1B.
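
The pair similarity SP and the feature \(f_{i}\) can be sketched as follows for a single similarity network. The nested-dictionary layout of the similarity matrix, the drug identifiers, and the similarity values are assumptions made for illustration only.

```python
import math

def pair_similarity(S, query, known):
    """SP((dA, dB), (dA', dB')): square root of the better of the two alignments."""
    da, db = query
    ka, kb = known
    direct = S[da][ka] * S[db][kb]    # dA matched to dA', dB matched to dB'
    swapped = S[db][ka] * S[da][kb]   # dB matched to dA', dA matched to dB'
    return math.sqrt(max(direct, swapped))

def feature(S, query, known_combinations):
    """f_i: maximum SP between the query pair and all known combinations."""
    return max(pair_similarity(S, query, k) for k in known_combinations)

# Hypothetical symmetric drug-drug similarity matrix for one network.
S = {
    "d1": {"d1": 1.0, "d2": 0.2, "d3": 0.9, "d4": 0.1, "d5": 0.3},
    "d2": {"d1": 0.2, "d2": 1.0, "d3": 0.3, "d4": 0.4, "d5": 0.2},
    "d3": {"d1": 0.9, "d2": 0.3, "d3": 1.0, "d4": 0.5, "d5": 0.1},
    "d4": {"d1": 0.1, "d2": 0.4, "d3": 0.5, "d4": 1.0, "d5": 0.6},
    "d5": {"d1": 0.3, "d2": 0.2, "d3": 0.1, "d4": 0.6, "d5": 1.0},
}
known = [("d3", "d4"), ("d4", "d5")]

# For (d1, d2) against (d3, d4): sqrt(max(0.9 * 0.4, 0.3 * 0.1)), about 0.6.
f_query = feature(S, ("d1", "d2"), known)
```

Repeating the same computation on each of the seven networks yields the seven-dimensional feature vector of a query pair. Taking the maximum over both alignments makes the result independent of the order in which the two query drugs are listed.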

Weighted Bayesian method for integrating features

To integrate the seven features \((f_{1} ,f_{2} , \cdots ,f_{7} )\) of a drug pair, we employ a novel weighted Bayesian approach. A Bayesian network serves as a representation of the joint probability distribution among multiple variables, simplifying the complexity by decomposing the joint probability into a series of manageable modules. We start from Bayes' theorem:

$$\begin{gathered} P\left( { + |f_{1} ,f_{2} , \cdots ,f_{7} } \right) = \frac{{P\left( {f_{1} ,f_{2} , \cdots ,f_{7} | + } \right)}}{{P\left( {f_{1} ,f_{2} , \cdots ,f_{7} } \right)}} \cdot P\left( + \right), \hfill \\ P\left( { - |f_{1} ,f_{2} , \cdots ,f_{7} } \right) = \frac{{P\left( {f_{1} ,f_{2} , \cdots ,f_{7} | - } \right)}}{{P\left( {f_{1} ,f_{2} , \cdots ,f_{7} } \right)}} \cdot P\left( - \right), \hfill \\ \end{gathered}$$

by dividing these probabilities, we obtain

$$\frac{{P\left( { + |f_{1} ,f_{2} , \cdots ,f_{7} } \right)}}{{P\left( { - |f_{1} ,f_{2} , \cdots ,f_{7} } \right)}} = \frac{{P\left( {f_{1} ,f_{2} , \cdots ,f_{7} | + } \right)}}{{P\left( {f_{1} ,f_{2} , \cdots ,f_{7} | - } \right)}} \cdot \frac{P\left( + \right)}{{P\left( - \right)}},$$

where \(P( + )\) and \(P( - )\) are the prior probabilities that a drug pair occurs in the positive set and the negative set, respectively, and the likelihood ratio \(\frac{{P\left( {f_{1} ,f_{2} , \cdots ,f_{7} | + } \right)}}{{P\left( {f_{1} ,f_{2} , \cdots ,f_{7} | - } \right)}}\), which encompasses all seven features, is our focal point of interest.

Nevertheless, computing this likelihood ratio is not straightforward, because it involves multiple features: the relationships between the features must be considered when the seven features are integrated into a single result. To address this challenge of feature integration, the Naive Bayes algorithm emerges as a viable approach, recognized for its simplicity, effectiveness, and notable performance across various problem domains [51]. Under the Naive Bayes assumption,

$$\frac{{P\left( {f_{1} ,f_{2} , \cdots ,f_{7} | + } \right)}}{{P\left( {f_{1} ,f_{2} , \cdots ,f_{7} | - } \right)}} = \prod\limits_{i = 1}^{7} {\left[ {\frac{{P\left( {f_{i} | + } \right)}}{{P\left( {f_{i} | - } \right)}}} \right]}.$$

However, the Naive Bayes model relies on the assumption of conditional independence between attributes, which is seldom satisfied in practical applications.

Consequently, in this paper, inspired by the boosting naive Bayes model [52], we propose a novel feature-weighted Bayesian method aimed at mitigating this strong assumption. First, we calculate the ILR as follows:

$${\text{ILR}} = \prod\limits_{i = 1}^{7} {\left[ {\frac{{P\left( {f_{i} | + } \right)}}{{P\left( {f_{i} | - } \right)}}} \right]^{{w_{i} }} } ,$$

where \(w_{i}\) refers to the weight corresponding to feature \(i \, \left( {i = 1, \, 2, \, \cdots , \, 7} \right)\), and \(P\left( {f_{i} | + } \right)\) and \(P\left( {f_{i} | - } \right)\) represent the probability density of a drug pair exhibiting the feature value \(f_{i}\) in the positive and negative sets, respectively.

Next, to better quantify the support for classifying the query drug pair into the positive group rather than the negative group, we transform the ILR into a support strength score,

$${\text{support strength score = }}\frac{2}{{1 + {\text{exp}}\left( { - {\text{ILR}}} \right)}} - 1.$$

Because the ILR is a product of positive ratios and is therefore itself positive, this score ranges from 0 to 1, with higher values indicating greater support for the query drug pair being categorized into the positive group rather than the negative group.
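
A minimal sketch of the ILR and the support strength score is given below. The function names, the toy likelihood ratios and weights, and the decision to accumulate the weighted product in log space (a common numerical precaution) are assumptions of this example rather than details stated in the text.

```python
import math

def integrated_likelihood_ratio(likelihood_ratios, weights):
    """ILR = prod_i (LR_i ** w_i), accumulated in log space.

    Assumes every likelihood ratio is strictly positive.
    """
    log_ilr = sum(w * math.log(lr) for lr, w in zip(likelihood_ratios, weights))
    return math.exp(log_ilr)

def support_strength_score(ilr):
    """Support strength score = 2 / (1 + exp(-ILR)) - 1."""
    return 2.0 / (1.0 + math.exp(-ilr)) - 1.0

# Hypothetical per-feature likelihood ratios P(f_i | +) / P(f_i | -) and weights.
ratios = [3.0, 1.5, 0.8, 2.2, 1.1, 0.9, 4.0]
weights = [0.30, 0.10, 0.05, 0.20, 0.05, 0.05, 0.25]

ilr = integrated_likelihood_ratio(ratios, weights)
score = support_strength_score(ilr)
```

The transform is algebraically equal to \(\tanh ({\text{ILR}}/2)\), so it increases monotonically with the ILR. An ILR of 1, where the features are equally likely under both classes, maps to roughly 0.46 rather than 0.5, so 0.5 should not be read as a neutral cutoff under this transform.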

Calculation of weights and likelihoods in ILR

We first describe the calculation of the weights. For ease of computation, the continuous features are discretized. The discretization can be carried out using hierarchical clustering [53, 54], where the number of discrete classes is chosen according to the silhouette coefficient [55]. Each discrete class is assigned the mean of the values it contains as its discrete value. For the feature \(f_{i} \, (i = 1, \cdots ,7)\), the corresponding discretized variable is denoted as \(f_{i}^{dis}\), with distinct values in the discretized data being \(\{ d_{1} , \cdots ,d_{{M_{i} }} \}\), where \(M_{i}\) is the number of distinct values. The weights are calculated with the following formula, which is adapted from the boosting naive Bayes model [52, 56]:

$$w_{i} = \frac{1}{{M_{i} }}\left( {\sum\limits_{{class \in \{ + , - \} }} {\sum\limits_{k = 1}^{{M_{i} }} {\left[ {P(class|f_{i}^{dis} = d_{k} ) - P(class)} \right]^{2} } } } \right),$$

where the class includes both a positive set and a negative set. By considering both positive and negative sets in the calculation, a composite result is obtained, reflecting the extent to which each feature influences the prediction classification. Additionally, since different features may have varying numbers of discrete values after discretization, normalization is applied to ensure the comparability of feature weights.
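
The weight formula can be sketched as below, under the assumption that \(P(class|f_{i}^{dis} = d_{k} )\) and \(P(class)\) are estimated by empirical frequencies in the training data; the text does not spell out the estimator, and the toy values and labels are hypothetical.

```python
from collections import Counter

def feature_weight(discrete_values, labels):
    """w_i = (1 / M_i) * sum over classes and values of (P(class | d_k) - P(class))^2."""
    n = len(labels)
    prior = {c: count / n for c, count in Counter(labels).items()}

    # Count class labels separately for each distinct discretized value d_k.
    by_value = {}
    for value, label in zip(discrete_values, labels):
        by_value.setdefault(value, Counter())[label] += 1

    total = 0.0
    for counts in by_value.values():
        n_value = sum(counts.values())
        for c in prior:
            posterior = counts[c] / n_value   # empirical P(class | f_dis = d_k)
            total += (posterior - prior[c]) ** 2
    return total / len(by_value)              # divide by M_i distinct values

# Hypothetical discretized feature that separates the classes perfectly ...
w_informative = feature_weight([0.2, 0.2, 0.8, 0.8], ["-", "-", "+", "+"])
# ... and one that carries no information about the class.
w_uninformative = feature_weight([0.2, 0.2, 0.8, 0.8], ["+", "-", "+", "-"])
```

In this toy case the perfectly separating feature receives a weight of 0.5 and the uninformative one a weight of 0, which matches the intent that features whose values shift the class probabilities away from the prior count for more in the ILR.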

The ILR of the seven features is thus constructed as a weighted product of the likelihood ratios generated by each individual feature. Here, the likelihoods \(P(f_{i} | + )\) and \(P(f_{i} | - )\) can be estimated using kernel density estimation [27] with Gaussian kernels, chosen for the simple mathematical properties of Gaussian functions:

$$P(f_{i} | + ) = \frac{1}{{N_{ + } b}}\sum\limits_{j = 1}^{{N_{ + } }} {\frac{1}{{\sqrt {2\pi } }}e^{{ - \tfrac{{(f_{i} - f_{j|i| + } )^{2} }}{{2b^{2} }}}} },$$

where \(N_{ + }\) is the number of drug pairs belonging to the positive group, \(f_{j|i| + }\) represents the j-th value of the feature \(f_{i}\) among the positive drug pairs, and \(b\) represents the bandwidth, which is determined by Silverman's rule of thumb [57,58,59]. The probability density \(P(f_{i} | - )\) is calculated in the same way from the negative drug pairs.
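
The density estimate above can be sketched with the standard library alone. Silverman's rule has more than one common form; this example assumes the variant \(b = 0.9 \cdot \min (\sigma ,{\text{IQR}}/1.34) \cdot N^{ - 1/5}\), which may differ from the one used in the study, and the sample values are hypothetical.

```python
import math
import statistics

def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    n = len(samples)
    sd = statistics.stdev(samples)
    q1, _, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    # Fall back to the standard deviation when the IQR term is zero.
    spread = min(sd, (q3 - q1) / 1.34) or sd
    return 0.9 * spread * n ** (-0.2)

def gaussian_kde(x, samples, b):
    """P(x) = 1 / (N * b) * sum_j (1 / sqrt(2 * pi)) * exp(-(x - x_j)^2 / (2 * b^2))."""
    coeff = 1.0 / (len(samples) * b * math.sqrt(2.0 * math.pi))
    return coeff * sum(math.exp(-((x - s) ** 2) / (2.0 * b * b)) for s in samples)

def likelihood_ratio(x, positives, negatives):
    """P(x | +) / P(x | -); assumes the negative density at x is nonzero."""
    p_pos = gaussian_kde(x, positives, silverman_bandwidth(positives))
    p_neg = gaussian_kde(x, negatives, silverman_bandwidth(negatives))
    return p_pos / p_neg

# Hypothetical values of one feature in the positive and negative training sets.
positives = [0.60, 0.70, 0.75, 0.80, 0.90]
negatives = [0.10, 0.20, 0.25, 0.30, 0.50]
lr_high = likelihood_ratio(0.8, positives, negatives)
lr_low = likelihood_ratio(0.2, positives, negatives)
```

With real data the denominator can underflow for feature values far from every negative sample, so a practical implementation would likely add a small floor to both densities before taking the ratio.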

Drug combination datasets

We obtain the drug combination datasets as benchmark data from the following three data resources: (1) DCDB 2.0 database [60]: This dataset comprises information from over 140,000 clinical studies and the U.S. Food and Drug Administration (FDA) Orange Book. It encompasses 1363 pairs of drug combinations involving 904 ingredient compositions. For our analysis, only drug combinations labeled as ‘Efficacious’ are utilized. (2) ASDCD [61]: ASDCD is a database specializing in antifungal synergistic drug combinations. From this resource, we gather 548 pairwise validated combinations. (3) A curated dataset of drug combinations [34]: This dataset integrates drug combination information from the DCDB 2.0 database, Therapeutic Target Database (TTD), and FDA Orange Book. All drug pairs in this dataset have been either FDA-approved or experimentally validated. This dataset serves as a valuable supplement to our previous two datasets.

These datasets are then merged, with redundant entries removed, to create a comprehensive drug combination dataset. We then intersect this comprehensive dataset with the drugs in our drug similarity networks, resulting in a final set of 831 known drug combinations used in our study.

Results

In this section, we display the experimental results of the novel WBCP method using 831 known drug combinations as positive samples, alongside an equal number of negative samples. To ensure diversity among negative samples, we employ the K-means algorithm to cluster all drug pairs excluding the known drug combinations. The optimal number of clusters is determined utilizing the “cascadeKM” function from the R package “vegan” [62], yielding 4 clusters as the optimal choice. Subsequently, an equivalent number of samples are randomly selected from each cluster to form the negative set. The positive and negative samples constitute our benchmark dataset.
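
The cluster-balanced selection of negative samples can be sketched as follows, starting from cluster labels that are assumed to have been computed already (for example by K-means). Since 831 is not divisible by 4, some rule for the remainder is needed; handing the extra samples to the first clusters is an assumption of this sketch, as is every name and value in it.

```python
import random

def sample_negatives(cluster_of, n_total, seed=0):
    """Draw about n_total / k drug pairs at random from each of the k clusters."""
    rng = random.Random(seed)
    clusters = {}
    for pair, cluster_id in cluster_of.items():
        clusters.setdefault(cluster_id, []).append(pair)

    cluster_ids = sorted(clusters)
    base, extra = divmod(n_total, len(cluster_ids))
    chosen = []
    for position, cluster_id in enumerate(cluster_ids):
        # The first `extra` clusters receive one additional sample each.
        quota = base + (1 if position < extra else 0)
        # rng.sample raises ValueError if a cluster holds fewer pairs than its quota.
        chosen.extend(rng.sample(sorted(clusters[cluster_id]), quota))
    return chosen

# Hypothetical candidate pairs (known combinations already excluded), 4 clusters.
cluster_of = {("drug%d" % i, "drug%d" % (i + 100)): i % 4 for i in range(40)}
negatives = sample_negatives(cluster_of, 10, seed=1)
```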

The performance of WBCP is evaluated through tenfold cross-validation on the benchmark dataset. We first conduct an overall performance comparison of WBCP with other methods, and then analyze the feature extraction and prediction components of the WBCP scheme separately. For feature extraction, we employ visual analysis; for prediction, we evaluate multiple prediction methods based on the same feature vectors. The results indicate that the WBCP method is strongly competitive with other methods in the context of drug combination prediction.

Evaluating the performance of WBCP and other methods

In this section, we compare the WBCP method with other methods on the benchmark dataset. Two variants of WBCP are included: WBCP_NW is the same scheme without weights, and WBCP_MI uses a different weight configuration, where the weights are determined by the mutual information between each constructed feature and the classification outcome in the training set. Additionally, we compare several recently proposed methods for predicting drug combinations based on knowledge graph similarity networks, including PEA [27], Liu et al. [26], and NEWMIN [40], and provide the results in Table 1.

Table 1 The performance results of WBCP and other approaches

We conduct an evaluation based on six metrics, namely AUC, AUPR, accuracy, precision, recall, F1-score, for various drug combination prediction methods. These metrics serve as standard and effective evaluation measures. The average performances of each method on the selected metrics across the ten-fold cross-validation along with their corresponding standard deviations are calculated. The mean of the metrics indicates the method’s overall performance, while the standard deviation measures the model's variability and reliability.
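
The threshold-based metrics and their fold-wise summary can be sketched as below (AUC and AUPR are omitted for brevity). The label encoding (1 for positive, 0 for negative), the use of the sample standard deviation, and the toy fold values are assumptions of this example.

```python
import statistics

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1-score from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def summarize_folds(fold_metrics):
    """Mean and sample standard deviation of each metric across folds."""
    summary = {}
    for name in fold_metrics[0]:
        values = [m[name] for m in fold_metrics]
        summary[name] = (statistics.mean(values), statistics.stdev(values))
    return summary

# Hypothetical predictions for two small folds.
fold_1 = classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
fold_2 = classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0])
summary = summarize_folds([fold_1, fold_2])
```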

As shown in Table 1, WBCP outperforms the other methods across various metrics, including AUC, AUPR, accuracy, precision, and F1-score. Notably, WBCP achieves the highest AUC (AUC = 0.9188), followed by WBCP_MI (AUC = 0.9170). While WBCP’s AUPR value (AUPR = 0.9174) is slightly lower than that of WBCP_MI (AUPR = 0.9177), all other performance metrics for WBCP surpass those of WBCP_MI. The results indicate that this method has overall advantages and is a competitive approach for predicting drug combinations.

Visualization of features extraction by different methods

Our WBCP method has two parts: the first extracts features for drug pairs based on seven similarity networks, and the second predicts drug combinations based on the extracted features. Comparable methods with similar structures include those proposed by Liu et al. [26] and NEWMIN [40]. For feature extraction, the WBCP method uses the maximum similarity between the query drug pair and known drug combinations, Liu et al. utilize the restart random walk method, and NEWMIN employs the word2vec approach. By employing these three methods of feature extraction, we derive three distinct types of drug pair features from the benchmark dataset, designated as WBCP features, restart random walk features, and word2vec features.

Regarding the superiority or inferiority of features, we hypothesize that it may be attributed to the boundaries learned during the feature extraction process that differentiate between positive and negative drug pairs. To visualize the distribution of drug pairs’ feature vectors, we apply the t-distributed stochastic neighbor embedding (t-SNE) algorithm [63] to three distinct types of drug pair features. In the t-SNE plot, tight clustering of positive (or negative) set data indicates that the extracted features effectively capture the similarity within the same categories, while the distance or boundary between positive and negative set clusters suggests effective discrimination between the positive and negative categories.

Figure 2A illustrates the spatial distribution of positive and negative drug pair features extracted by WBCP. These two independent classes appear somewhat distinguishable, with certain combination pairs even forming distinct clusters. Figure 2B depicts the spatial distribution of restart random walk features for the two classes of drug pairs. While negative samples are relatively concentrated, multiple clusters are formed among the positive samples. However, it can be observed that each cluster of positive samples contains negative samples, which may have some impact on classification. Figure 2C shows the distribution of word2vec features for the two classes of drug pairs. Both positive and negative pairs are uniformly distributed in space without clear boundaries or clusters, suggesting that these features may be somewhat generic. Relatively speaking, WBCP features may be a promising option for feature extraction in drug combination prediction. Overall, the features extracted by WBCP perform the best in the t-SNE plot. This performance is likely attributable to the fact that WBCP extracts drug combination features based on similarities with known drug combinations.

Fig. 2

The t-SNE plots of three types of drug pair features in the benchmark dataset: A WBCP features, B restart random walk features, C word2vec features. The positive pairs are represented by yellow dots, while the negative pairs are represented by blue triangles. Feature effectiveness is evaluated through the clustering of the two categories in the t-SNE plot.

Evaluating the performance of WBCP and traditional machine learning methods

Currently, there are various machine learning methods available for predicting drug combinations based on the constructed feature vectors of drug pairs. In this section, we evaluate the predictive and classification performance of various commonly used machine learning methods and the prediction parts of WBCP and WBCP_MI, all based on the simple feature vectors constructed in WBCP. The commonly used machine learning methods for prediction and classification include support vector machine (SVM), logistic regression, random forest, k-nearest neighbors (KNN), Naive Bayes, AdaBoost, and Gradient Boost Tree (more details can be found in Supplementary note S4).

As shown in Table 2, various machine learning methods are employed for drug combination prediction based on the feature vectors constructed in the WBCP method. Evaluation metrics encompass AUC, AUPR, and F1-score. We choose these three metrics because AUC and AUPR assess the overall performance of the methods, while F1-score balances precision and recall, providing a comprehensive evaluation. We have also calculated other metrics, with the results presented in Supplementary Table S1. WBCP and WBCP_MI generally lead in these evaluations, which could be attributed to the consideration of attribute-weighted likelihood ratios. This suggests that, for predicting drug combinations from the same feature vectors or embedding vectors, our WBCP method may be competitive with various machine learning methods.
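The comparison protocol described above can be illustrated with a short scikit-learn sketch. It is a hedged example only: the data are synthetic, only three of the listed classifiers are included, the 0.5 decision threshold is an assumption, and AUPR is approximated by average precision.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary data standing in for drug-pair feature vectors (assumption).
X, y = make_classification(
    n_samples=400, n_features=7, n_informative=5, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "naive Bayes": GaussianNB(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    prob = model.predict_proba(X_test)[:, 1]   # score for the positive class
    pred = (prob >= 0.5).astype(int)           # assumed decision threshold
    results[name] = {
        "AUC": roc_auc_score(y_test, prob),
        "AUPR": average_precision_score(y_test, prob),
        "F1": f1_score(y_test, pred),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

AUC and AUPR are computed from continuous scores, whereas F1 depends on the chosen threshold, which is why the two kinds of metric can rank methods differently.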

Table 2 The performance results of WBCP and machine learning methods based on WBCP’s features

Based on the aforementioned analyses, we seek to further validate the prediction performance of WBCP across different configurations of drug pair feature vectors. We conduct a comparative assessment of the prediction and classification performance of different methods using the two other types of feature vectors mentioned in the feature visualization section, namely restarted random walk features and word2vec features, extracted from the benchmark dataset. Evaluation metrics encompass AUC, AUPR, and F1-score. We have also calculated other metrics, with the results presented in Supplementary Table S2.

The results displayed in Table 3 affirm the superior performance of WBCP, which excels in AUC and AUPR compared with traditional machine learning methods. However, when employing the restarted random walk features, the SVM achieves the best F1-score. This disparity may be attributed to our approach of partitioning support strength scores using thresholds, which are determined by the optimal cutoff points derived from the ROC curves [64, 65].
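One common way to derive an optimal cutoff from a ROC curve is to maximize Youden's J statistic (sensitivity + specificity - 1). The sketch below assumes that criterion purely for illustration; the text does not state which cutoff rule WBCP uses, and the labels and scores are toy values.

```python
def youden_cutoff(labels, scores):
    """Return (threshold, J) maximising TPR - FPR over observed score values.

    A sample is predicted positive when its score is >= the threshold.
    Ties in J are resolved in favour of the lowest threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Toy support strength scores with known labels (hypothetical values).
toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]
toy_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.1]

cutoff, j_stat = youden_cutoff(toy_labels, toy_scores)
print(cutoff, j_stat)
```

Because the cutoff is tuned on the ROC curve rather than on F1 directly, a method can lead in AUC and AUPR while another method attains a higher F1-score.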

Table 3 The performance results of WBCP and machine learning methods based on other features

Drug combinations prediction and external literature validation

In this section, we employ the WBCP method to integrate multi-dimensional similarity networks built from heterogeneous information to predict drug combinations. We do not directly use the benchmark dataset containing 1662 samples as the training dataset. Instead, we randomly select 750 pairs of known drug combinations from it as the positive set, along with an equal number of negative samples as the negative set. The prediction samples consist of all drug pairs except the training samples, a total of 153903 query drug pairs.
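The sample counts above can be cross-checked with a short calculation. The sketch below is illustrative only: the balanced benchmark (equal numbers of positives and negatives) and the number of drugs are inferred from the reported figures rather than stated in this section, and all variable names are hypothetical.

```python
import math

BENCHMARK_SIZE = 1662      # reported benchmark samples
TRAIN_POSITIVES = 750      # reported training positives
TRAIN_NEGATIVES = 750      # reported training negatives
HELD_OUT_KNOWN = 81        # known combinations left in the prediction set
QUERY_PAIRS = 153903       # reported number of query drug pairs

# Known combinations = training positives + held-out positives.
known_combinations = TRAIN_POSITIVES + HELD_OUT_KNOWN

# Inference: the benchmark is balanced if positives are half of it.
benchmark_is_balanced = 2 * known_combinations == BENCHMARK_SIZE

# All unordered drug pairs = query pairs + training pairs.
total_pairs = QUERY_PAIRS + TRAIN_POSITIVES + TRAIN_NEGATIVES

# Solve n * (n - 1) / 2 = total_pairs for the implied number of drugs.
n_drugs = (1 + math.isqrt(1 + 8 * total_pairs)) // 2

print(known_combinations, benchmark_is_balanced, total_pairs, n_drugs)
```

Under these assumptions, the reported 153903 query pairs are consistent with scoring every unordered pair among the implied set of drugs, minus the 1500 training pairs.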

Next, we analyze the prediction results in two respects. First, we examine the prediction and classification results for the remaining 81 pairs of known drug combinations, which are excluded from the training samples. Second, we rank the support strength scores of all query drug pairs predicted by WBCP and conduct literature validation for the top-ranking predicted combinations to assess the reliability of the WBCP prediction process. The literature validation involves searching relevant academic databases such as PubMed and Web of Science using specific keywords to find supporting literature for each drug combination. Keywords include the drug names along with terms like ‘combination’, ‘synergy’, and ‘therapy’. Top-ranking drug pairs that have not yet been validated in the literature are potentially valuable combinations that warrant further experimental investigation.
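The keyword search can be made reproducible by generating the query strings programmatically. The helper below is a hypothetical sketch: the function name and the exact Boolean format are assumptions, and the text does not specify how the queries were actually phrased.

```python
KEYWORDS = ("combination", "synergy", "therapy")  # terms named in the text


def build_queries(drug_a, drug_b, keywords=KEYWORDS):
    """Return one Boolean search string per keyword for a drug pair."""
    return [f'"{drug_a}" AND "{drug_b}" AND {kw}' for kw in keywords]


queries = build_queries("goserelin", "letrozole")
for query in queries:
    print(query)
```

Each string can then be pasted into the search box of a literature database, and the hits reviewed manually for evidence of combined use.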

For the 81 pairs of known drug combinations in the prediction set, we analyze their support strength scores (more details in Supplementary Table S3). Among them, 29 pairs (approximately 35.80%) have support strength scores exceeding 0.9, while 42 pairs (approximately 51.85%) have support strength scores exceeding 0.7. Next, we conduct literature validation for the top-ranking drug combinations predicted by the WBCP method (more details in Supplementary Table S4). Among the top 20 predicted drug combinations listed in Table 4, 13 are supported by existing literature, while seven lack literature support and thus represent potential novel drug combinations.
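The summary statistics and the top-k ranking above reduce to a few lines of code. In this sketch the percentages are recomputed from the reported counts, while the scored pairs are toy values with placeholder drug names.

```python
def fraction_above(scores, threshold):
    """Share of scores strictly greater than the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


def top_k(scored_pairs, k):
    """Return the k (pair, score) items with the highest scores."""
    return sorted(scored_pairs, key=lambda item: item[1], reverse=True)[:k]


# Percentages recomputed from the reported counts (29 and 42 out of 81).
pct_above_09 = round(100 * 29 / 81, 2)
pct_above_07 = round(100 * 42 / 81, 2)

# Toy support strength scores (hypothetical pairs and values).
toy_pairs = [
    (("drug_a", "drug_b"), 0.42),
    (("drug_c", "drug_d"), 0.97),
    (("drug_e", "drug_f"), 0.81),
]
ranked = top_k(toy_pairs, 2)
toy_fraction = fraction_above([score for _, score in toy_pairs], 0.7)

print(pct_above_09, pct_above_07, ranked, toy_fraction)
```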

Table 4 Top 20 drug combinations predicted by WBCP method and their literature validations

The drug pair with the highest predicted support strength score is palonosetron and prednisone. In a study of antiemetic prophylaxis in patients undergoing radiotherapy with concurrent cisplatin treatment, the authors found that 57% of patients had no vomiting after 5 weeks of treatment, including those treated with palonosetron and prednisone for antiemetic therapy [66]. Palonosetron is a 5-HT3 receptor antagonist used to prevent and treat chemotherapy-induced nausea and vomiting, while prednisone is a corticosteroid used to treat inflammation or immune-mediated reactions, as well as endocrine or neoplastic diseases [67, 68]. On the one hand, palonosetron and prednisone can alleviate nausea and vomiting through different mechanisms, leading to enhanced antiemetic effects when combined [69]. On the other hand, as palonosetron targets chemotherapy-induced vomiting and prednisone addresses vomiting from other causes, their combination can cover a broader spectrum of vomiting types. Additionally, prednisone may have anti-inflammatory and immunomodulatory effects, which can alleviate other chemotherapy-related discomforts such as pain and inflammation [70].

In addition to the combination of cancer treatment drugs with antiemetic drugs, another top-ranking prediction validated by literature involves two breast cancer drugs, goserelin and letrozole. As a promising treatment option for premenopausal women with metastatic breast cancer, a combination of gonadotropin-releasing hormone (GnRH) analogs and aromatase inhibitors may be preferable [71]. It has been indicated that the combination of goserelin and letrozole provides clinical benefits to most patients. Goserelin is a GnRH analog commonly used to treat breast cancer, prostate cancer, and endometriosis [72]. With sustained administration, it suppresses gonadotropin release from the pituitary gland, thereby suppressing the production of estrogen and progesterone by the ovaries and inhibiting cancer cell growth [73]. Letrozole is an aromatase inhibitor that works by decreasing the amount of estrogen produced in the body [74]. In premenopausal women with hormone receptor-positive breast cancer, the combination of a GnRH agonist such as goserelin with an aromatase inhibitor such as letrozole may be used to suppress ovarian function and lower estrogen levels, thereby slowing the growth of hormone-sensitive tumors [75]. Additionally, in women undergoing fertility treatment, the combination of goserelin and letrozole may be used to induce ovulation by suppressing the natural hormonal fluctuations that interfere with the ovulation process [76].

Another top-ranking validated combination involves two drugs used in the treatment of type 2 diabetes mellitus: chlorpropamide and pioglitazone. Combinations of sulfonylurea drugs with thiazolidinedione drugs have been widely reported to enhance glucose-lowering effects [77]. Chlorpropamide is an oral antidiabetic medication belonging to the sulfonylurea class. Its mechanism of action primarily involves stimulating insulin release and reducing hepatic glycogen synthesis, thereby lowering blood sugar levels [78]. Pioglitazone, on the other hand, is an oral antidiabetic medication classified as an insulin sensitizer [79]. It acts on the liver, adipose tissue, and muscles to enhance tissue sensitivity to insulin, increase glucose utilization by tissues, and lower blood sugar levels. The two drugs have different mechanisms of action: chlorpropamide primarily stimulates insulin release, while pioglitazone primarily enhances tissue sensitivity to insulin. They complement each other, and their combination can achieve better glucose-lowering effects [77]. Additionally, at appropriate doses, the combination of the two drugs can alleviate potential side effects that may occur with monotherapy.

Finally, apart from the drug pairs validated by literature, we conduct partial analysis on the predicted potential drug pairs. Among the newly predicted potential drug combinations, the top-ranking combination is ondansetron and palonosetron. Both drugs are used to address gastrointestinal issues and treat vomiting. Ondansetron is a serotonin 5-HT3 receptor antagonist primarily used to prevent nausea and vomiting in cancer chemotherapy and postoperative settings [80]. Palonosetron, also a serotonin antagonist, is used for prophylaxis or control of chemotherapy-induced nausea and vomiting, as well as postoperative nausea and vomiting [81]. It may be possible to increase the blocking effect on 5-HT3 receptors by combining these two drugs, thereby enhancing the antiemetic effect. The next combination is palonosetron and paroxetine. As mentioned earlier, palonosetron is a serotonin 5-HT3 receptor antagonist used for chemotherapy-induced nausea and vomiting, while paroxetine is a selective serotonin reuptake inhibitor mainly used to treat depression, anxiety, and other psychiatric disorders [82]. This potential combination may be based on several considerations. First, paroxetine is sometimes used to treat nausea and vomiting, particularly those associated with anxiety and depression. Second, in certain situations, there may be a need to enhance the antiemetic effect, especially for patients who do not respond well to standard treatments or require stronger antiemetic effects.

Discussion

The goal of drug combination prediction is to identify effective drug combinations that act synergistically in the treatment of diseases while minimizing adverse effects. The primary challenge arises from the vast search space of potential combinations. Moreover, incorporating heterogeneous information into drug combination prediction further amplifies complexity. One strategy for addressing this challenge is to use databases enriched with substantial information on drug combinations, coupled with a computational method designed for predicting therapeutic effects. In previous research, the integration of drug similarity networks may have been too coarse, resulting in incomplete utilization of information [26]. Additionally, some researchers have proposed computational methodologies based on relatively strong assumptions, which may be impractical in real-world scenarios [27]. Consequently, there is considerable room for refinement in the computational approaches employed for drug combination prediction. We propose a novel approach, WBCP, tailored for drug combination prediction. The features constructed by the WBCP method are associated with a set of known drug combinations. This method integrates heterogeneous information related to multiple drugs and provides support strength scores that prioritize predicted drug combinations. Additionally, our WBCP method is applicable to the large-scale prediction of drug combinations, offering both convenience and high practical utility.

WBCP is specifically designed to accommodate diverse types of data. By integrating weak predictive features, such as chemical structure, target protein sequences, and side effects, within a unified framework, comprehensive feature augmentation is achieved (more details in Supplementary note S3). The primary rationale is that Bayesian methods transform the constructed features into a probabilistic framework to extract latent patterns. Additionally, the novel weighting mechanism included in the algorithm relaxes the initial assumption of conditional independence between features, thereby enhancing the performance of integration-based computational approaches. Concretely, WBCP transforms the constructed features into the ILR as described in Sect. “Weighted Bayesian method for integrating features”. The ILR is then converted into the support strength score, which ranges from 0 to 1. A higher support strength score indicates a greater level of support that the drug pair forms an effective drug combination. Traditional methods typically rely on a pattern of extracted feature vectors and on strong assumptions made by machine learning predictions. In contrast, our method constructs the maximum similarity between query drug pairs and a set of known drug combinations, and mitigates the strong assumptions that did not align with real-world scenarios.
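The general idea of weighting per-feature likelihood ratios can be sketched as follows. This is not the WBCP formula, which is defined in the Methods section and not reproduced here. The sketch assumes a weighted product of likelihood ratios and an odds-to-probability mapping, score = R / (1 + R), chosen only because it yields a value between 0 and 1; the function names and numbers are hypothetical.

```python
import math


def weighted_log_ratio(likelihood_ratios, weights):
    """Weighted sum of log likelihood ratios, one term per feature.

    With all weights equal to 1 this reduces to the naive Bayes
    (conditional independence) combination.
    """
    return sum(w * math.log(lr) for lr, w in zip(likelihood_ratios, weights))


def support_score(likelihood_ratios, weights):
    """Map the combined ratio R to R / (1 + R), a value in (0, 1) (assumed form)."""
    log_ratio = weighted_log_ratio(likelihood_ratios, weights)
    return 1.0 / (1.0 + math.exp(-log_ratio))


# Two hypothetical features: one favours a combination, one argues against it.
ratios = [4.0, 0.5]

naive = support_score(ratios, [1.0, 1.0])      # combined ratio 4 * 0.5 = 2
weighted = support_score(ratios, [0.5, 1.0])   # combined ratio 4**0.5 * 0.5 = 1

print(round(naive, 4), round(weighted, 4))
```

Down-weighting a feature shrinks its contribution toward a neutral ratio of 1, which is one way correlated or less reliable features can be kept from dominating the combined evidence.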

We conduct experiments on large-scale datasets, and the results demonstrate that WBCP outperforms other state-of-the-art methods. One key observation is that integrating features by assigning an individual weight to each feature, rather than assuming conditional independence, not only relaxes that assumption but also enhances the predictive capability of the model (more details in Supplementary Table S5). Additionally, visual comparison with other feature types reveals that our newly constructed features, namely the maximum similarity features between query drug pairs and known drug combinations, exhibit boundaries, albeit blurred ones, and cluster formation in their spatial distribution, which is advantageous for distinguishing positive and negative samples. In the literature validation, many of the top-ranked drug combinations are supported by previous studies demonstrating their efficacy through experiments or clinical use. For example, the combination of the two breast cancer drugs, goserelin and letrozole, has been validated [71]. In conclusion, compared with other computational methods, WBCP demonstrates greater capability in identifying drug combinations.

There are several key points to mention about this study. First, characterizing drugs within the framework of molecular networks, together with target and side effect information, may offer a transparent and interpretable view of the underlying biological mechanisms in drug combination prediction, facilitating further research. Additionally, we incorporate two sources of drug side effect information, one obtained from the SIDER database and the other from the Offsides database. The inclusion of the latter is particularly noteworthy, as it serves as a valuable supplement to the former, providing information on drug side effects not documented in drug manuals. We also explore the application of this method to the performance of drug combinations in various cell lines. These data are high-dimensional, containing not only drug information but also cell line information. The method's performance is presented in Supplementary note S5.

Secondly, our method exhibits exceptional scalability. On the one hand, when predicting the effect of a drug pair with unknown outcome for the first time, it requires only the computation of the constructed features for that drug pair against a set of known drug combinations. Subsequently, the ILR is calculated and converted into a support strength score. On the other hand, beyond the seven features previously mentioned, many additional types of heterogeneous information can characterize a drug. If any such information proves to be an effective complement, our method allows for the efficient calculation of its weight, facilitating its seamless integration into the overall process. This simplicity and effectiveness enhance the versatility of our method.
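The feature computation for a new query pair can be sketched as below. The exact pairwise formula used by WBCP is given in the Methods section and is not reproduced here; this example assumes that two drug pairs are compared by taking the better of the two possible drug alignments and, within an alignment, the weaker of the two drug-level similarities. The function names and similarity values are hypothetical.

```python
def drug_similarity(network, x, y):
    """Look up a symmetric drug-drug similarity; identical drugs score 1."""
    if x == y:
        return 1.0
    return network.get(frozenset((x, y)), 0.0)


def pair_similarity(network, query, known):
    """Similarity between two drug pairs under the assumed min/max rule."""
    a, b = query
    c, d = known
    direct = min(drug_similarity(network, a, c), drug_similarity(network, b, d))
    swapped = min(drug_similarity(network, a, d), drug_similarity(network, b, c))
    return max(direct, swapped)


def max_similarity_feature(network, query, known_combinations):
    """Maximum similarity of the query pair to any known combination."""
    return max(pair_similarity(network, query, k) for k in known_combinations)


# One toy similarity network; a full model would hold one per data source.
toy_network = {
    frozenset(("A", "C")): 0.9,
    frozenset(("B", "D")): 0.8,
    frozenset(("A", "D")): 0.2,
    frozenset(("B", "C")): 0.1,
    frozenset(("A", "E")): 0.3,
    frozenset(("B", "F")): 0.95,
}
known = [("C", "D"), ("E", "F")]

feature = max_similarity_feature(toy_network, ("A", "B"), known)
print(feature)
```

Adding a new data source then amounts to supplying one more similarity network and computing one more feature per query pair, which is what makes the extension inexpensive.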

We posit that our method holds considerable potential across various domains, notably in drug development, large-scale clinical trial design, and the strategic guidance of in vitro experiments. The potential impact of our method is underscored by its capacity to inform and optimize the identification of promising drug combinations, providing researchers with a robust tool for enhancing efficacy and resource efficiency in the design of expensive clinical trials. Moreover, the structured application of our method facilitates the strategic planning and execution of in vitro experiments, offering a systematic approach to enhance precision and mitigate costs associated with extensive trial initiatives.

Several directions within the drug combination prediction framework remain open for future research. Firstly, we posit that incorporating additional clinical information, such as drug dosage, could further improve predictions. Given that one of the primary objectives of drug combinations is to mitigate drug side effects, some of which may result from overdosing, integrating more comprehensive clinical information may enhance the chances of success [83]. Secondly, specifying the disease itself is another noteworthy direction for future research in predicting drug combinations. Since drug combinations are designed for specific indications, incorporating disease-specific information would offer a more nuanced perspective. This approach could be especially beneficial for complex conditions such as cancer or cardiovascular diseases, where tailored drug combinations are critical [2, 10]. Lastly, the mode of drug delivery is a pertinent element to consider in the future, as different drug dosage forms and delivery modes have been shown to have a significant impact on drug efficacy [84].

Conclusion

This study proposes a computational method for predicting drug combinations, named WBCP. The WBCP algorithm integrates various data types, such as chemical structures, target protein sequences, and side effects, into a unified framework. The method combines a Bayesian approach with a novel weighting mechanism to relax the assumption of conditional independence between features, demonstrating competitive performance in comprehensive evaluations. The WBCP method presents significant potential in drug development, clinical trial design, and in vitro experiment planning, optimizing the identification of promising drug combinations and thereby improving efficacy and reducing costs in these processes.

Availability of data and materials

The datasets and code analyzed during the current study are available at https://github.com/YQHuFD/WBCP.

References

  1. He B, Lu C, Zheng G, et al. Combination therapeutics in complex diseases. J Cell Mol Med. 2016;20(12):2231–40.

  2. Mokhtari RB, Homayouni TS, Baluch N, et al. Combination therapy in combating cancer. Oncotarget. 2017;8(23):38022–43.

  3. Ahmed F, Samantasinghar A, Soomro AM, et al. A systematic review of computational approaches to understand cancer biology for informed drug repurposing. J Biomed Inform. 2023;142:104373.

  4. Mangiafico S, Costello-Boerrigter LC, Andersen IA, et al. Neutral endopeptidase inhibition and the natriuretic peptide system: an evolving strategy in cardiovascular therapeutics. Eur Heart J. 2013;34(12):886–93.

  5. The ACCORD Study Group. Effects of combination lipid therapy in type 2 diabetes mellitus. N Engl J Med. 2010;362(17):1563–74.

  6. Butterfield LH, Najjar YG. Immunotherapy combination approaches: mechanisms, biomarkers and clinical observations. Nat Rev Immunol. 2024;24(6):399–416.

  7. Samantasinghar A, Ahmed F, Rahim CSA, et al. Artificial intelligence-assisted repurposing of lubiprostone alleviates tubulointerstitial fibrosis. Transl Res. 2023;262:75–88.

  8. Ahmed F, Ho SG, Samantasinghar A, et al. Drug repurposing in psoriasis, performed by reversal of disease-associated gene expression profiles. Comput Struct Biotec. 2022;20:6097–107.

  9. Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. Ca-Cancer J Clin. 2024;74(3):229–63.

  10. Samuel PO, Edo GI, Emakpor OL, et al. Lifestyle modifications for preventing and managing cardiovascular diseases. Sport Sci Hlth. 2024;20(1):23–36.

  11. Reynolds JK, Neumiller JJ, Campbell RK. Janumet™: a combination product suitable for use in patients with type 2 diabetes. Expert Opin Inv Drug. 2008;17(10):1559–65.

  12. Daneshjou D, Mehranjani MS, Zadehmodarres S, et al. Sitagliptin/metformin improves the fertilization rate and embryo quality in polycystic ovary syndrome patients through increasing the expression of GDF9 and BMP15: a new alternative to metformin (a randomized trial). J Reprod Immunol. 2022;150:103499.

  13. Tambuyzer E, Vandendriessche B, Austin CP, et al. Therapies for rare diseases: therapeutic modalities, progress and challenges ahead. Nat Rev Drug Discov. 2020;19(2):93–111.

  14. Samantasinghar A, Sunildutt NP, Ahmed F, et al. A comprehensive review of key factors affecting the efficacy of antibody drug conjugate. Biomed Pharmacother. 2023;161:114408.

  15. Jia J, Zhu F, Ma X, et al. Mechanisms of drug combinations: interaction and network perspectives. Nat Rev Drug Discov. 2009;8(2):111–28.

  16. Al-Lazikani B, Banerji U, Workman P. Combinatorial drug therapy for cancer in the post-genomic era. Nat Biotechnol. 2012;30(7):679–92.

  17. Asif A, Park SH, Manzoor Soomro A, et al. Microphysiological system with continuous analysis of albumin for hepatotoxicity modeling and drug screening. J Ind Eng Chem. 2021;98:318–26.

  18. Idée J, Port M, Raynal I, et al. Clinical and biological consequences of transmetallation induced by contrast agents for magnetic resonance imaging: a review. Fund Clin Pharmacol. 2006;20(6):563–76.

  19. Sunildutt N, Parihar P, Salih ARC, et al. Revolutionizing drug development: harnessing the potential of organ-on-chip technology for disease modeling and drug discovery. Front Pharmacol. 2023;14:1139229.

  20. Sun X, Vilar S, Tatonetti NP. High-throughput methods for combinatorial drug discovery. Sci Transl Med. 2013;5(205):205rv1.

  21. Cokol M, Chua HN, Tasan M, et al. Systematic exploration of synergistic drug pairs. Mol Syst Biol. 2011;7:544.

  22. Armstrong JW. A review of high-throughput screening approaches for drug discovery. Am Biotechnol Lab. 1999;17(1):26–8.

  23. Liu H, Fan Z, Lin J, et al. The recent progress of deep-learning-based in silico prediction of drug combination. Drug Discov Today. 2023;28(7):103625.

  24. Costello JC, Heiser LM, Georgii E, et al. A community effort to assess and improve drug sensitivity prediction algorithms. Nat Biotechnol. 2014;32(12):1202–12.

  25. Gu J, Bang D, Yi J, et al. A model-agnostic framework to enhance knowledge graph-based drug combination prediction with drug–drug interaction data and supervised contrastive learning. Brief Bioinform. 2023;24(5):bbab285.

  26. Liu H, Zhang W, Nie L, et al. Predicting effective drug combinations using gradient tree boosting based on features extracted from drug-protein heterogeneous network. BMC Bioinformatics. 2019;20(1):645.

  27. Li P, Huang C, Fu Y, et al. Large-scale exploration and analysis of drug combinations. Bioinformatics. 2015;31(2):2007–16.

  28. Ahmed F, Kang IS, Kim KH, et al. Drug repurposing for viral cancers: a paradigm of machine learning, deep learning, and virtual screening-based approaches. J Med Virol. 2023;95(4):e28693.

  29. Ahmed F, Lee JW, Samantasinghar A, et al. SperoPredictor: an integrated machine learning and molecular docking-based drug repurposing framework with use case of COVID-19. Front Public Health. 2022;10:902123.

  30. Sunildutt N, Ahmed F, Salih ARC, et al. Integrating transcriptomic and structural insights: revealing drug repurposing opportunities for sporadic ALS. ACS Omega. 2024;9(3):3793–806.

  31. Ryall KA, Tan AC. Systems biology approaches for advancing the discovery of effective drug combinations. J Cheminformatics. 2015;7:1–15.

  32. Chou T-C. Drug combination studies and their synergy quantification using the Chou-Talalay method. Cancer Res. 2010;70(2):440–6.

  33. Tang J, Karhinen L, Xu T, et al. Target inhibition networks: predicting selective combinations of druggable targets to block cancer survival pathways. PLoS Comput Biol. 2013;9(9):e1003226.

  34. Cheng F, Kovács IA, Barabási AL. Network-based prediction of drug combinations. Nat Commun. 2019;10(1):1197.

  35. Ahmed F, Soomro AM, Chethikkattuveli Salih AR, et al. A comprehensive review of artificial intelligence and network based approaches to drug repurposing in Covid-19. Biomed Pharmacother. 2022;153:113350.

  36. Ahmed F, Samantasinghar A, Ali W, et al. Network-based drug repurposing identifies small molecule drugs as immune checkpoint inhibitors for endometrial cancer. Mol Divers. 2024;1–17.

  37. Ahmed F, Yang YJ, Samantasinghar A, et al. Network-based drug repurposing for HPV-associated cervical cancer. Comput Struct Biotec. 2023;21:5186–200.

  38. Iadevaia S, Lu Y, Morales FC, et al. Identification of optimal drug combinations targeting cellular networks: integrating phospho-proteomics and computational network analysis. Cancer Res. 2010;70(17):6704–14.

  39. Huang L, Li F, Sheng J, et al. DrugComboRanker: drug combination discovery based on target network analysis. Bioinformatics. 2014;30(12):i228-236.

  40. Yu L, Xia M, An Q. A network embedding framework based on integrating multiplex network for drug combination prediction. Brief Bioinform. 2022;23(1):bbab364.

  41. Kastrin A, Ferk P, Leskošek B. Predicting potential drug-drug interactions on topological and semantic similarity features using statistical learning. PLoS ONE. 2018;13(5):e0196865.

  42. Cao Y, Charisi A, Cheng LC, et al. ChemmineR: a compound mining framework for R. Bioinformatics. 2008;24(5):1733–4.

  43. Chen X, Reynolds CH. Performance of similarity measures in 2D fragment-based similarity searching: comparison of structural descriptors and similarity coefficients. J Chem Inf Comput Sci. 2002;42(6):1407–14.

  44. UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021;49(D1):D480–9.

  45. Xiao N, Cao DS, Zhu MF, et al. protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences. Bioinformatics. 2015;31(11):1857–9.

  46. Dücker C, Brockmöller J. How precise is quantitative prediction of pharmacokinetic effects due to drug-drug interactions and genotype from in vitro data? A comprehensive analysis on the example CYP2D6 and CYP2C19 substrates. Pharmacol Ther. 2021;217:107629.

  47. Campillos M, Kuhn M, Gavin AC, et al. Drug target identification using side-effect similarity. Science. 2008;321(5886):263–6.

  48. Kuhn M, Letunic I, Jensen LJ, et al. The SIDER database of drugs and side effects. Nucleic Acids Res. 2016;44(D1):D1075–9.

  49. Tatonetti NP, Ye PP, Daneshjou R, et al. Data-driven prediction of drug effects and interactions. Sci Transl Med. 2012;4(125):125ra31.

  50. Song D, Chen Y, Min Q, et al. Similarity-based machine learning support vector machine predictor of drug-drug interactions with improved accuracies. J Clin Pharm Ther. 2019;44(2):268–75.

  51. Rish I. An empirical study of the naive Bayes classifier. IJCAI 2001 workshop on empirical methods in artificial intelligence. 2001;3(22):41–6.

  52. Vidaurre D, Bielza C, Larrañaga P. Forward stagewise naïve Bayes. Prog Artif Intell. 2012;1:57–69.

  53. Dash R, Paramguru RL, Dash R. Comparative analysis of supervised and unsupervised discretization techniques. Int J Adv Sci Technol. 2011;2(3):29–37.

  54. Chmielewski MR, Grzymala-Busse JW. Global discretization of continuous attributes as preprocessing for machine learning. Int J Approx Reason. 1996;15(4):319–31.

  55. Murtagh F, Contreras P. Algorithms for hierarchical clustering: an overview. Wires Data Min Knowl. 2012;2(1):86–97.

  56. Ferreira JTAS, Denison DGT, Hand DJ. Data mining with products of trees. International symposium on intelligent data analysis. Berlin, Heidelberg: Springer; 2001. pp. 167–76.

  57. R Core Team. R: A language and environment for statistical computing. 2021. https://www.R-project.org/.

  58. Andersson B, von Davier AA. Improving the bandwidth selection in kernel equating. J Educ Meas. 2014;51(3):223–38.

  59. Chen S. Optimal bandwidth selection for kernel density functionals estimation. J Probab Stat. 2015;2015(1):242683.

  60. Liu Y, Wei Q, Yu G, et al. DCDB 2.0: a major update of the drug combination database. Database (Oxford). 2014;2014:bau124.

  61. Chen X, Ren B, Chen M, et al. ASDCD: antifungal synergistic drug combination database. PLoS ONE. 2014;9(1):e86499.

  62. Oksanen J, Simpson GL, Blanchet FG, et al. vegan: Community ecology package. 2022. https://CRAN.R-project.org/package=vegan.

  63. Van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008;9(11):2579–605.

  64. Nahm FS. Receiver operating characteristic curve: overview and practical use for clinicians. Korean J Anesthesiol. 2022;75(1):25–36.

  65. Habibzadeh F, Habibzadeh P, Yadollahie M. On determining the most appropriate test cut-off value: the case of tests with continuous results. Biochem Medica. 2016;26(3):297–307.

  66. Ruhlmann CH, Belli C, Dahl T, et al. Palonosetron and prednisolone for the prevention of nausea and emesis during fractionated radiotherapy and 5 cycles of concomitant weekly cisplatin—a phase II study. Support Care Cancer. 2013;21(12):3425–31.

    Article  PubMed  Google Scholar 

  67. Piechotta V, Adams A, Haque M, et al. Antiemetics for adults for prevention of nausea and vomiting caused by moderately or highly emetogenic chemotherapy: a network meta-analysis. Cochrane Db Syst Rev. 2021;11(11):CD012775.

    Google Scholar 

  68. Shaikh S, Verma H, Yadav N, et al. Applications of steroid in clinical practice: a review. Int Sch Res Notices. 2012;2012(1):985495.

    Google Scholar 

  69. Fabi A, Malaguti P. An update on palonosetron hydrochloride for the treatment of radio/chemotherapy-induced nausea and vomiting. Expert Opin Pharmaco. 2013;14(5):629–41.

    Article  CAS  Google Scholar 

  70. Glare P, Aubrey K, Gulati A, et al. Pharmacologic management of persistent pain in cancer survivors. Drugs. 2022;82(3):275–91.

    Article  PubMed  PubMed Central  Google Scholar 

  71. Yao S, Xu B, Li Q, et al. Goserelin plus letrozole as first- or second-line hormonal treatment in premenopausal patients with advanced breast cancer. Endocr J. 2011;58(6):509–16.

    Article  CAS  PubMed  Google Scholar 

  72. Moore HC, Unger JM, Phillips K-A, et al. Goserelin for ovarian protection during breast-cancer adjuvant chemotherapy. New Engl J Med. 2015;372(10):923–32.

    Article  CAS  PubMed  Google Scholar 

  73. Limonta P, Moretti RM, Marelli MM, et al. The biology of gonadotropin hormone-releasing hormone: role in the control of tumor growth and progression in humans. Front Neuroendocrinol. 2003;24(4):279–95.

    Article  CAS  PubMed  Google Scholar 

  74. Haynes BP, Dowsett M, Miller WR, et al. The pharmacology of letrozole. J Steroid Biochem. 2003;87(1):35–45.

    Article  CAS  Google Scholar 

  75. Lu Y-S, Wong A, Kim HJ. Ovarian function suppression with luteinizing hormone-releasing hormone agonists for the treatment of hormone receptor-positive early breast cancer in premenopausal women. Front Oncol. 2021;11:700722.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  76. Zhou H, Cao D, Yang J, et al. Gonadotropin-releasing hormone agonist combined with a levonorgestrel-releasing intrauterine system or letrozole for fertility-preserving treatment of endometrial carcinoma and complex atypical hyperplasia in young women. Int J Gynecol Cancer. 2017;27(6):1178–82.

    Article  PubMed  Google Scholar 

  77. Scheen AJ. Drug interactions of clinical importance with antihyperglycaemic agents: an update. Drug Saf. 2005;28(7):601–31.

    Article  CAS  PubMed  Google Scholar 

  78. Osadebe PO, Odoh EU, Uzor PF. Oral anti-diabetic agents-review and updates. Br J Med Med Res. 2015;5(2):134–59.

    Article  Google Scholar 

  79. Mudaliar S, Henry RR. New oral therapies for type 2 diabetes mellitus: the glitazones or insulin sensitizers. Annu Rev Med. 2001;52(1):239–57.

    Article  CAS  PubMed  Google Scholar 

  80. Ye J-H, Ponnudurai R, Schaefer R. Ondansetron: a selective 5-HT3 receptor antagonist and its applications in CNS-related disorders. CNS Drug Rev. 2001;7(2):199–213.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  81. Muchatuta NA, Paech MJ. Management of postoperative nausea and vomiting: focus on palonosetron. Ther Clin Risk Manag. 2009;5(1):21–34.

    CAS  PubMed  PubMed Central  Google Scholar 

  82. Wagstaff AJ, Cheer SM, Matheson AJ, et al. Paroxetine: an update of its use in psychiatric disorders in adults. Drugs. 2002;62(4):655–703.

    Article  CAS  PubMed  Google Scholar 

  83. Powell JR, Cook J, Wang Y, et al. Drug dosing recommendations for all patients: a roadmap for change. Clin Pharmacol Ther. 2021;109(1):65–72.

    Article  PubMed  Google Scholar 

  84. Wen H, Jung H, Li X. Drug delivery approaches in addressing clinical pharmacology-related issues: opportunities and challenges. AAPS J. 2015;17(6):1327–40.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

Download references

Acknowledgements

Thanks to all those who maintain excellent databases and to all experimentalists who enabled this work by making their data publicly available.

Funding

This research was supported by the National Key R&D Program of China (2023YFF1205101).

Author information

Contributions

T.L. wrote the manuscript; Y.H. and T.L. designed the research; T.L. performed the research and analyzed the data; L.X., H.G. and A.C. commented on the first draft of the manuscript.

Corresponding author

Correspondence to Yue-Qing Hu.

Ethics declarations

Competing interests

The authors report no potential competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Li, T., Xiao, L., Geng, H. et al. A weighted Bayesian integration method for predicting drug combination using heterogeneous data. J Transl Med 22, 873 (2024). https://doi.org/10.1186/s12967-024-05660-3

  • DOI: https://doi.org/10.1186/s12967-024-05660-3

Keywords