MKRMDA: multiple kernel learning-based Kronecker regularized least squares for MiRNA–disease association prediction
Journal of Translational Medicine volume 15, Article number: 251 (2017)
Abstract
Background
As research on microRNAs (miRNAs) continues, a growing body of experimental evidence indicates that miRNAs are associated with the development and progression of various complex human diseases. It is therefore necessary and urgent to pay more attention to predicting disease-associated miRNAs, which may help with the effective prevention, diagnosis and treatment of human diseases. In particular, computational methods for predicting potential miRNA–disease associations deserve further study because of their feasibility and effectiveness.
Methods
In this work, we developed a novel computational model of multiple kernel learning-based Kronecker regularized least squares for MiRNA–disease association prediction (MKRMDA), which could reveal potential miRNA–disease associations by automatically optimizing the combination of multiple kernels for diseases and miRNAs.
Results
MKRMDA obtained AUCs of 0.9040 and 0.8446 in global and local leave-one-out cross validation, respectively. Meanwhile, MKRMDA achieved an average AUC of 0.8894 ± 0.0015 in five-fold cross validation. Furthermore, we conducted three different kinds of case studies on some important human cancers for further performance evaluation. In the case studies of colonic cancer, esophageal cancer and lymphoma based on known miRNA–disease associations in the HMDDv2.0 database, 76, 94 and 88% of the corresponding top 50 predicted miRNAs were confirmed by experimental reports, respectively. In the other two kinds of case studies, for new diseases without any known associated miRNAs and for diseases with known associations only in the HMDDv1.0 database, the verified ratios for two different cancers were 88 and 94%, respectively.
Conclusions
All the results mentioned above adequately showed the reliable prediction ability of MKRMDA. We anticipate that MKRMDA could facilitate further developments in the field and follow-up investigations by biomedical researchers.
Background
MicroRNAs (miRNAs) are a class of small endogenous noncoding RNAs that function in RNA silencing and post-transcriptional regulation of gene expression via base-pairing with complementary sequences within mRNA molecules [1,2,3,4,5,6]. However, some studies have shown that in some cases miRNAs could also function as positive regulators [7, 8]. Since the discovery of the first miRNA (C. elegans lin-4) in the early 1990s, thousands of miRNAs have been identified and annotated in a wide variety of species, ranging from nematodes to humans (for example, more than 1800 Homo sapiens miRNAs according to miRBase 21) [9,10,11,12,13]. In addition, ample evidence has shown that miRNAs play important roles in many fundamental and critical biological processes, such as cell growth, proliferation, differentiation, development, apoptosis, metabolism, aging, signal transduction and viral infection [14,15,16,17,18,19]. Thus, it is not surprising that more and more miRNAs have been reported to be associated with various complex human diseases [20,21,22]. For example, compared with normal tissue controls as measured by microarray, miR-129, miR-142, and miR-25 were differentially expressed in every pediatric brain tumor type [23]. Furthermore, according to a hepatitis C virus (HCV) case report, the miR-122 expression level could be downregulated by HCV core protein in a time- and dose-dependent manner [24]. Moreover, compared with healthy gingiva, six miRNAs (let-7a, let-7c, miR-130a, miR-301a, miR-520d and miR-548a) were upregulated more than eightfold in periodontitis cases [25]. Additionally, miR-372 and miR-373 were highly upregulated in cerebellar tumors compared with normal cerebellum or whole brain [26]. Therefore, identifying potential disease-related miRNAs could not only contribute significantly to understanding disease mechanisms, but also benefit the prognosis, diagnosis, treatment and prevention of complex human diseases [27,28,29,30].
However, traditional experimental methods are usually expensive and time-consuming. Fortunately, as results from a vast number of biological experiments have accumulated, some reliable miRNA-related datasets have been constructed and updated. It is therefore both necessary and viable to develop more efficient computational approaches to predict disease-associated miRNAs based on available biological datasets. In addition, the promising predictions obtained by computational methods could be used as guidance for further experimental validation [31, 32].
In fact, based on the hypothesis that functionally similar miRNAs are often associated with phenotypically similar diseases and vice versa [12, 33,34,35,36,37], many computational models have been proposed for predicting disease-associated miRNAs in recent years. For example, Jiang et al. [27] presented a network-based approach, which scored each miRNA in the miRNA network through the cumulative hypergeometric distribution to predict potential miRNA–disease associations. Considering the functional connections between miRNA targets and disease genes in protein–protein interaction (PPI) networks, Shi et al. [38] developed a computational method to identify miRNA–disease associations by performing random walk. Their model took advantage of human PPIs, miRNA–target interactions and disease–gene associations to predict potential associations between miRNAs and diseases, based on the assumption that miRNAs tend to be associated with diseases that have more correlated associations with the miRNA targets. By integrating protein–disease associations and miRNA–protein interactions, Mork et al. [39] presented the miRPD (miRNA–protein–disease) approach to predict novel miRNA–disease associations. In their model, they inferred disease–miRNA associations and ranked them according to a scoring scheme that combined the miRNA–protein association scores and protein–disease association scores. However, all three of the above models depended strongly on miRNA–target interactions, which have high false-positive and false-negative rates. Chen et al. [40] presented the Random Walk with Restart for MiRNA–disease association (RWRMDA) model. In their method, they mapped all the miRNAs (both seed miRNAs and candidate miRNAs) to the miRNA functional similarity network. Then, they implemented random walk with restart until a stable probability was reached.
Finally, they ranked all the candidate miRNAs based on the stable probability to select potential disease-related miRNAs for experimental validation. That approach was the first global network-based method, and it did not rely on predicted miRNA–target interactions. Xuan et al. [41] developed the HDMP method based on weighted k nearest neighbors. They calculated the miRNA functional similarity matrix by incorporating the semantic similarity and the phenotype similarity between diseases. Then they adopted a unique weight assignment for miRNAs based on miRNA family or cluster. Finally, the relevance score of an unlabeled miRNA for the investigated disease was calculated by considering the functional similarities of its weighted k most similar neighbors and the distribution of the labeled miRNAs among these neighbors. Considering that the simple similarity-based ranking of k-nearest-neighbors was not reliable for further prediction, Chen et al. [42] proposed a computational method of ranking-based KNN for miRNA–disease association prediction (RKNNMDA), which predicts potentially related miRNAs by re-ranking the neighbors previously sorted by similarity. Li et al. [43] developed matrix completion for MiRNA–disease association prediction (MCMDA), which applies a matrix completion algorithm to the known miRNA–disease associations to predict potential ones. Although the prediction performance of these approaches was good, they could not be applied to diseases without known related miRNAs. Furthermore, HDMP relied strongly on the number of nearest neighbors considered in the model, and it could not set different values of this parameter for different diseases. Recently, Chen et al. [44] proposed the model of within and between score for MiRNA–disease association prediction (WBSMDA).
WBSMDA integrated miRNA functional similarity, disease semantic similarity, known miRNA–disease associations, and Gaussian interaction profile kernel similarity for diseases and miRNAs into an integrated similarity for diseases and for miRNAs, respectively. The model then combined the within-score and between-score from the view of miRNAs and diseases to calculate the association probability for miRNA–disease pairs. WBSMDA could be applied to diseases without known associated miRNAs. Then, Chen et al. [45] developed the computational model of Heterogeneous Graph Inference for MiRNA–disease association prediction (HGIMDA) by integrating known verified miRNA–disease associations, miRNA functional similarity, disease semantic similarity and Gaussian interaction profile kernel similarity into a heterogeneous graph. They could then infer potential associations between diseases and miRNAs by summarizing all paths of length three in the graph. Compared with previous computational models, HGIMDA achieved better prediction performance and could be effectively applied to new diseases and new miRNAs without any known associations, which overcame important limitations of many previous computational models.
Additionally, some studies developed machine learning-based computational models to predict potential disease–miRNA associations. For example, based on the assumption that miRNAs associated with a specific tumor phenotype would show aberrant regulation of their target genes, Xu et al. [5] proposed an approach based on the miRNA target-dysregulated network (MTDN) to prioritize potential disease-associated miRNAs. Based on the network topology information, some feature measures were extracted for miRNAs in the MTDN. The authors then used a support vector machine (SVM) to construct a classifier for distinguishing positive miRNA–disease associations from negative associations. By utilizing the network information flow model, Yu et al. [46] developed a combinatorial prioritization algorithm of maximizing network information flow (MaxFlow) to predict microRNA–disease associations based on the microRNAome-phenome network. To overcome the negative influence on prediction performance resulting from the selection bias of negative samples, Chen et al. [47] developed a computational model of regularized least squares for MiRNA–disease association (RLSMDA). RLSMDA was implemented in a semi-supervised learning framework, which meant that it needed no negative samples. Recently, considering that no previous computational method could predict the types of disease–miRNA associations, Chen et al. [48] developed the model of Restricted Boltzmann machine for multiple types of miRNA–disease association prediction (RBMMMDA). By employing a Restricted Boltzmann machine (RBM), RBMMMDA could obtain not only new miRNA–disease associations but also the corresponding association types. Predicting the different types of disease–miRNA associations could improve our understanding of the molecular basis of diseases at the miRNA level. RBMMMDA is the first model that could infer association types of miRNA–disease pairs on a large scale.
Before presenting our model, we briefly introduce kernel-based methods. Given a known disease–miRNA association network, kernel-based methods can be used to predict unknown miRNA–disease interactions, where a kernel can be seen as a similarity matrix of miRNAs or diseases. Kernel-based approaches use base kernels, such as disease semantic similarity or miRNA functional similarity, to measure the similarity between diseases or between miRNAs. A pairwise kernel function, which measures the similarity between disease–miRNA pairs, can then be calculated by combining a miRNA base kernel and a disease base kernel via the kernel product. Multiple kernel learning (MKL) is a machine learning method that searches for an optimal combination of base kernels [49]. However, since traditional MKL methods are based on SVM [49, 50], they are subject to memory limitations imposed by the pairwise kernel function and to the difficulty of obtaining negative samples in supervised learning. The Kronecker regularized least squares approach (KronRLS) [51] abandoned SVM and took advantage of the algebraic properties of the Kronecker product to make predictions without explicitly calculating the pairwise kernel function. However, KronRLS could not handle multiple kernels because it was originally developed for the single-kernel situation.
In this work, we proposed a computational approach named Multiple kernel learning-based Kronecker Regularized least squares for MiRNA–disease association prediction (MKRMDA). To this end, we extended the KronRLS method to an MKL scenario. Our method used L2 regularization to produce an optimized non-sparse combination of multiple base kernels, which was then used for prediction. Additionally, the proposed method could cope with large disease and miRNA association matrices. Furthermore, we implemented leave-one-out cross validation (LOOCV) for MKRMDA. As a result, MKRMDA obtained a global AUC of 0.9040 and a local AUC of 0.8446, performing better than several previous models mentioned above, such as WBSMDA [44], HDMP [41], RLSMDA [47], HGIMDA [45], MCMDA [43], RKNNMDA [42] and MaxFlow [46]. Moreover, we carried out three different kinds of case studies in this work (more details in the Case studies section). As mentioned in the abstract, high proportions of the predicted miRNAs were confirmed by the corresponding databases in all three kinds of case studies. These results showed the effectiveness of MKRMDA in predicting potential miRNA–disease associations for various categories of diseases.
Methods
Human miRNA–disease associations
The human miRNA–disease association dataset used in this work was obtained from the HMDDv2.0 database [52] and consists of 5430 experimentally confirmed human miRNA–disease associations between 495 miRNAs and 383 human diseases. We used the adjacency matrix A to describe the known miRNA–disease associations. Specifically, if miRNA m(i) was confirmed to be related to disease d(j), the entry A(i,j) was set to 1, otherwise 0.
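As a minimal sketch of this encoding, the adjacency matrix can be built from a list of confirmed pairs. The helper name `build_adjacency` and the placeholder identifiers `m1`, `d1`, and so on are illustrative assumptions; they are not taken from HMDD.

```python
def build_adjacency(mirnas, diseases, known_pairs):
    """Return A as a list of rows with A[i][j] = 1 if miRNA i is linked to disease j."""
    m_index = {m: i for i, m in enumerate(mirnas)}
    d_index = {d: j for j, d in enumerate(diseases)}
    A = [[0] * len(diseases) for _ in mirnas]
    for m, d in known_pairs:
        A[m_index[m]][d_index[d]] = 1
    return A


# Toy input made up for illustration only (not real associations).
mirnas = ["m1", "m2", "m3"]
diseases = ["d1", "d2"]
known_pairs = [("m1", "d2"), ("m2", "d1")]
A = build_adjacency(mirnas, diseases, known_pairs)
```

With the real data, A would have 495 rows and 383 columns, and 5430 of its entries would equal 1.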
MiRNA functional similarity
MiRNA functional similarity was worked out previously by Wang et al. [35]. Building on their work, we downloaded the miRNA functional similarity data from http://www.cuilab.cn/files/images/cuilab/misim.zip and constructed the corresponding miRNA functional similarity matrix FS, where FS(i,j) denotes the functional similarity score between miRNAs m(i) and m(j). Known functional similarity was available for 271 miRNAs in this way. For the remaining 224 miRNAs without known functional similarity, we calculated the Gaussian interaction profile kernel similarity, which is introduced in the section on Gaussian interaction profile kernel similarity. By integrating the 271 known miRNA similarity entries and the 224 newly calculated Gaussian similarity entries, the miRNA similarity matrix covered exactly 495 miRNAs for the prediction work.
Disease semantic similarity model 1
Based on the disease MeSH descriptor downloaded from the National Library of Medicine (http://www.nlm.nih.gov/), the relationship between different diseases could be represented by a structure of directed acyclic graph (DAG). For an arbitrary disease D, DAG(D) = (D, T(D), E(D)) can be defined to represent the disease D, where T(D) is a node set, consisting of D itself and all its ancestor nodes, E(D) is the corresponding edge set, consisting of the directed edges pointing from parent nodes to child nodes [35]. The semantic value of disease D could be defined as follows:
where \(\Delta\) is the semantic contribution factor. It is obvious that for a given disease D, as the distance between D and another disease, d, increases, the contribution score of d for disease D decreases. In this method, diseases located in the same layer would contribute the same score to the semantic value of disease D. Finally, the semantic similarity between disease d(i) and d(j) can be calculated based on the observation that two diseases with larger common part of their DAGs will have larger similarity score:
where SS1 represents the disease semantic similarity matrix in this model.
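The two definitions in this subsection can be illustrated on a toy hierarchy. The sketch below assumes the form commonly used with this DAG-based similarity: the contribution of D to itself is 1, the contribution of any other term d in DAG(D) is the maximum of \(\Delta\) times the contribution of its children, the semantic value of D is the sum of all contributions in DAG(D), and SS1 is the summed contribution of the shared terms divided by the sum of the two semantic values. The value \(\Delta = 0.5\), the function names and the four-node hierarchy are assumptions made for illustration.

```python
DELTA = 0.5  # assumed semantic contribution factor; not stated in the text above


def contributions(disease, parents, delta=DELTA):
    """Contribution of every term in DAG(disease) to the semantic value of disease."""
    contrib = {disease: 1.0}
    stack = [disease]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            value = delta * contrib[node]
            # Keep the largest contribution reachable through any child.
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                stack.append(parent)
    return contrib


def semantic_similarity(d_i, d_j, parents, delta=DELTA):
    """Shared contributions divided by the sum of the two semantic values."""
    c_i = contributions(d_i, parents, delta)
    c_j = contributions(d_j, parents, delta)
    shared = set(c_i) & set(c_j)
    numerator = sum(c_i[t] + c_j[t] for t in shared)
    return numerator / (sum(c_i.values()) + sum(c_j.values()))


# Toy hierarchy (child -> parents): A is the root, B and C are its children,
# and D is a child of both B and C.
parents = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
ss_bc = semantic_similarity("B", "C", parents)
```

In this toy hierarchy B and C share only the root A, so their similarity is (0.5 + 0.5) / (1.5 + 1.5) = 1/3, while any disease compared with itself scores 1.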
Disease semantic similarity model 2
Unlike the above method, this calculation assigns different contribution values to diseases in the same layer of DAG(D), on the grounds that a disease appearing in fewer DAGs is more specific and should contribute more to the semantic value of disease D. The contribution of disease d in DAG(D) to the semantic value of disease D is therefore defined as follows, where nd represents the number of all diseases and \({\text{DAG}}_{\text{t}}\) represents the number of DAGs including t:
Then, the semantic similarity of disease d(i) and d(j) can be calculated as follows:
where SS2 represents the disease semantic similarity matrix in this model.
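A small sketch of this second model follows. It assumes that the contribution of a term t is \(-\log({\text{DAG}}_{\text{t}}/nd)\) and that SS2 is assembled from these contributions in the same way as SS1; both are assumptions consistent with the description above rather than a verbatim copy of the paper's equations, and the function names and toy hierarchy are hypothetical.

```python
import math


def ancestors_and_self(disease, parents):
    """Node set T(disease): the disease itself plus all of its ancestors."""
    seen = {disease}
    stack = [disease]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def model2_similarity(d_i, d_j, parents, diseases):
    """Semantic similarity where rarer terms contribute more (assumed -log form)."""
    dags = {d: ancestors_and_self(d, parents) for d in diseases}

    def contribution(t):
        n_dags = sum(1 for members in dags.values() if t in members)
        return -math.log(n_dags / len(diseases))

    shared = dags[d_i] & dags[d_j]
    # The contribution of t does not depend on the disease, hence the factor 2.
    numerator = sum(2.0 * contribution(t) for t in shared)
    denominator = (sum(contribution(t) for t in dags[d_i])
                   + sum(contribution(t) for t in dags[d_j]))
    return numerator / denominator


# Toy hierarchy (child -> parents), made up for illustration.
parents = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
diseases = ["A", "B", "C", "D"]
ss_bd = model2_similarity("B", "D", parents, diseases)
```

Here the root A appears in every DAG and therefore contributes nothing, so B and C, which share only A, have similarity 0. A disease whose DAG contains only such terms would give a zero denominator, a case this sketch does not handle.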
Gaussian interaction profile kernel similarity
The Gaussian kernel function is a widely used radial basis function (RBF), on the basis of which the Gaussian interaction profile kernel similarity can be calculated by taking advantage of the known miRNA–disease association information. Specifically, by observing whether a disease d(i) is associated with each miRNA or not, the binary vector IP(d(i)), the ith column of the adjacency matrix A, is obtained and denoted as the interaction profile of disease d(i). Then, the Gaussian kernel similarity between diseases d(i) and d(j) can be calculated as follows:
where \(r_{d}\) controls the kernel bandwidth and GD represents the Gaussian interaction profile kernel similarity matrix of diseases. In addition, \(r_{d}\) can be obtained by normalizing a new bandwidth parameter \(r^{\prime}_{d}\) by the average number of known associations with miRNAs per disease as follows:
where nd denotes the number of all the diseases investigated. In principle, the new bandwidth parameter \(r^{\prime}_{d}\) could be set by cross-validation, but in this article \(r^{\prime}_{d}\) was set to 1 based on previous studies [53, 54].
Additionally, the construction method of miRNA Gaussian interaction profile kernel similarity matrix, GM, is similar to the calculation of disease Gaussian interaction profile kernel similarity:
where nm is denoted as the number of all the miRNAs investigated.
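The kernel described in this subsection can be sketched in a few lines of NumPy. The code assumes the usual Gaussian interaction profile form, \(\exp(-r\,\|IP(a) - IP(b)\|^{2})\) with \(r = r^{\prime} / (\tfrac{1}{n}\sum_{k}\|IP(k)\|^{2})\), which matches the verbal description above; the function name `gip_kernel` and the toy matrix are illustrative assumptions.

```python
import numpy as np


def gip_kernel(profiles, r_prime=1.0):
    """Gaussian interaction profile kernel; one interaction profile per row."""
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    # Normalise the bandwidth by the average number of known associations.
    r = r_prime / sq_norms.mean()
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-r * np.maximum(sq_dists, 0.0))


# Toy adjacency matrix (rows: miRNAs, columns: diseases), made up for illustration.
A = np.array([[1, 0],
              [1, 1],
              [0, 1]])
GD = gip_kernel(A.T)  # disease profiles are the columns of A
GM = gip_kernel(A)    # miRNA profiles are the rows of A
```

The sketch divides by the mean squared norm of the profiles, so it would fail on a matrix with no known associations at all; real data always contain some.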
MKRMDA
With the advance of sequencing technology and biology, more and more reliable biological data about diseases and miRNAs have been released, including various kinds of similarity information. If we could efficiently take advantage of as much of this multi-source similarity data as possible, we could obtain more precise information about disease–miRNA associations. Hence, in this work, we proposed MKRMDA to predict potential disease-associated miRNAs in the situation where multiple kernels are involved, meaning that much more similarity information can be integrated. To this end, we first briefly introduce the relevant classification algorithm for the single-kernel problem. Given a set of diseases \({\text{D}} = \left\{ {d(1),d(2) \ldots ,d(nd)} \right\}\) and a set of miRNAs \({\text{M}} = \left\{ {m(1),m(2) \ldots ,m(nm)} \right\}\), we could obtain a set of training samples \(S = \left\{ {\left( {x_{1} ,y_{1} } \right),\left( {x_{2} ,y_{2} } \right) \ldots \left( {x_{n} ,y_{n} } \right)} \right\}\), where \(x_{i}\) represented a disease–miRNA pair and \(y_{i}\) the corresponding binary label, with \(1 \le i \le n\) and \(n = nd \times nm\) the number of all disease–miRNA pairs. In our model, if a miRNA–disease pair \(x_{i}\) was a known miRNA–disease association recorded in the HMDDv2.0 database, the corresponding \(y_{i}\) was set to 1, otherwise 0. Given the training set S, our goal was to learn a function f that generalizes well to new samples, namely new disease–miRNA pairs. This problem could be solved based on the closely related (via Lagrange multipliers) Tikhonov minimization problem as follows [55]:
where V was a smooth loss function, \(\left\| f \right\|_{K}\) was the norm of the prediction function f associated with the kernel K, and λ > 0 was a regularization parameter balancing the prediction error and the complexity of the model. Since we aimed to obtain a function f that assigns to every disease–miRNA pair a value close to its initial label in S, we could use the following simple square-loss function:
Based on the Representer Theorem [56], the solution of Eq. 9 could be written in the following form:
Furthermore, using the fact that \(\left\| f \right\|_{K}^{2} =\varvec{\alpha}^{T} K\varvec{\alpha}\) [57], we could obtain the classification function for the single-kernel problem:
Hence if \(\varvec{\alpha}\) could be calculated, the prediction score for all the disease–miRNA pairs in S could be obtained.
In fact, according to a previous study [55], \(\varvec{\alpha}\) could be obtained by solving a single system of linear equations:
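One self-consistent way to write the chain from the Tikhonov problem to this linear system is sketched below. It is a reconstruction that assumes the square loss is summed over the samples, so that constant factors are absorbed into λ; the normalisation used in the cited formulation [55] may differ.

\[
\min_{f}\ \sum_{i=1}^{n} \left( y_{i} - f(x_{i}) \right)^{2} + \lambda \left\| f \right\|_{K}^{2},
\qquad
f(x) = \sum_{i=1}^{n} \alpha_{i} K(x_{i}, x)
\]

\[
\min_{\boldsymbol{\alpha}}\ \left\| \boldsymbol{y} - K\boldsymbol{\alpha} \right\|^{2} + \lambda\, \boldsymbol{\alpha}^{T} K \boldsymbol{\alpha}
\quad\Longrightarrow\quad
\left( K + \lambda I \right) \boldsymbol{\alpha} = \boldsymbol{y}
\]

Setting the gradient with respect to \(\boldsymbol{\alpha}\) to zero gives \(K\left[ \left( K + \lambda I \right)\boldsymbol{\alpha} - \boldsymbol{y} \right] = 0\), which the system on the right satisfies.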
In the single-kernel situation, we could construct the pairwise kernel K as the Kronecker product of the two base kernels [58]: \(K = K_{D} \otimes K_{M}\). Unfortunately, using the Kronecker product kernel directly would involve inverting an (nd × nm) × (nd × nm) matrix, which would take O((nd × nm)^{3}) operations. Thus, the size of the pairwise kernel matrix made model training computationally infeasible even for a moderate number of diseases and miRNAs. Hence, to make the training process more efficient, we could take advantage of two specific algebraic properties of the Kronecker product [59] and use the eigendecomposition of the Kronecker product [60] to calculate \(\varvec{\alpha}\).
Let \(K_{D} = Q_{D} \varLambda_{D} Q_{D}^{T}\) and \(K_{M} = Q_{M} \varLambda_{M} Q_{M}^{T}\) be the eigendecomposition of the kernel matrices \(K_{D}\) and \(K_{M}\). Since the eigenvalues (vectors) of a Kronecker product are the Kronecker product of eigenvalues (vectors), for Eq. 13, the solution \(\varvec{\alpha}\) can be calculated by KroneckerRLS method as follows [60]:
where vec(·) stacked the columns of a matrix into a vector, and C was a matrix defined as: \(vec\left( C \right) = \left( {\varLambda_{D} \otimes \varLambda_{M} } \right)\left( {\varLambda_{D} \otimes \varLambda_{M} + \lambda I} \right)^{ - 1} vec\left( {Q_{M}^{T} Y^{T} Q_{D} } \right)\).
So far, the single kernel problem had been introduced, and the solution, \(\varvec{\alpha }\), could be calculated successfully and efficiently.
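The eigendecomposition shortcut can be checked numerically on a small example. The sketch below assumes the linear system has the form \(\left( K_{D} \otimes K_{M} + \lambda I \right) vec(\boldsymbol{\alpha}) = vec(Y^{T})\), with \(Y^{T}\) stored as an nm × nd matrix; the function name `kron_rls` and the random toy kernels are hypothetical. Under this assumed form, dividing by the shifted eigenvalues yields \(\boldsymbol{\alpha}\), while additionally multiplying by the eigenvalues, as in the expression for vec(C), yields the fitted scores \(K\boldsymbol{\alpha}\); the code returns both.

```python
import numpy as np


def kron_rls(K_D, K_M, A, lam):
    """Solve (K_D kron K_M + lam*I) vec(alpha) = vec(A) without forming the product.

    K_D: (nd, nd) disease kernel, K_M: (nm, nm) miRNA kernel,
    A: (nm, nd) label matrix (miRNAs in rows, diseases in columns).
    Returns alpha and the fitted scores, both shaped like A.
    """
    ev_D, Q_D = np.linalg.eigh(K_D)
    ev_M, Q_M = np.linalg.eigh(K_M)
    # Eigenvalues of the Kronecker product, arranged on the same grid as A.
    eig = np.outer(ev_M, ev_D)
    T = Q_M.T @ A @ Q_D
    alpha = Q_M @ (T / (eig + lam)) @ Q_D.T
    scores = Q_M @ (T * eig / (eig + lam)) @ Q_D.T
    return alpha, scores


# Random positive semidefinite toy kernels and labels, for illustration only.
rng = np.random.default_rng(0)
X_D, X_M = rng.normal(size=(3, 3)), rng.normal(size=(4, 4))
K_D, K_M = X_D @ X_D.T, X_M @ X_M.T
A = (rng.random((4, 3)) < 0.3).astype(float)
alpha, scores = kron_rls(K_D, K_M, A, lam=0.5)

# Direct solution with the explicit 12 x 12 pairwise kernel, for comparison.
K = np.kron(K_D, K_M)
alpha_direct = np.linalg.solve(K + 0.5 * np.eye(12), A.flatten(order="F"))
```

The shortcut costs two eigendecompositions of sizes nd and nm instead of one factorisation of size nd × nm, which is what makes the approach practical for the 383 × 495 problem considered here.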
Next, we introduce how MKRMDA was designed for the multiple-kernel problem, which allows MKRMDA to integrate more similarity information about diseases and miRNAs. If different kernels could be combined in an optimized and reasonable way, the relevant data could be used to best effect. We considered various base kernels for diseases and miRNAs, \(\varvec{K}_{D} = \left( {K_{D}^{1} ,K_{D}^{2} , \ldots ,K_{D}^{{P_{D} }} } \right)\) and \(\varvec{K}_{M} = \left( {K_{M}^{1} ,K_{M}^{2} , \ldots ,K_{M}^{{P_{M} }} } \right)\), where \(P_{D}\) and \(P_{M}\) were the numbers of base kernels investigated for diseases and miRNAs, respectively. In MKRMDA, the different base kernels were finally combined by a linear function into \(K_{D}^{*} \;{\text{and}}\;K_{M}^{*}\):
where \(\varvec{\beta}_{\varvec{D}} = \left\{ {\beta_{D}^{1} ,\beta_{D}^{2} , \ldots \beta_{D}^{{P_{D} }} } \right\}\) and \(\varvec{\beta}_{\varvec{M}} = \left\{ {\beta_{M}^{1} ,\beta_{M}^{2} , \ldots \beta_{M}^{{P_{M} }} } \right\}\) corresponded to the weights of the disease and miRNA kernels, respectively. Then \(K_{D}^{*} \;{\text{and}}\;K_{M}^{*}\) could be used as the single base kernels for diseases and miRNAs, which reduced the task to the single-kernel problem. To obtain the optimal \(\varvec{\beta}_{\varvec{D}} \;{\text{and}}\;\varvec{\beta}_{\varvec{M}}\), we used a two-step optimization process [49], in which the optimization of the vector \(\varvec{\alpha}\) was interleaved with the optimization of the kernel weights. In step 1, given two initial weight vectors, \(\varvec{\beta}_{\varvec{D}}^{0} \;{\text{and}}\;\varvec{\beta}_{\varvec{M}}^{0}\), an optimal value for the vector \(\varvec{\alpha}\) was calculated by Eq. 14. In step 2, using the optimized \(\varvec{\alpha}\), we proceeded to find the optimal \(\varvec{\beta}_{\varvec{D}} \;{\text{and}}\;\varvec{\beta}_{\varvec{M}}\). These two steps were repeated until convergence, resulting in the final optimal \(K_{D}^{*} \;{\text{and}}\;K_{M}^{*}\) for diseases and miRNAs, respectively (for further information, see Additional file 1).
After this two-step optimization process converged, we obtained the optimized single kernels for diseases and miRNAs, \(K_{D}^{*} \;{\text{and}}\;K_{M}^{*}\). We then used these two kernels in the single-kernel situation introduced above, and MKRMDA finally generated the prediction scores for all disease–miRNA pairs (see Fig. 1).
Additionally, in our model, we set the mean of all the base kernels of miRNAs and of diseases as the initial value for the iterative two-step optimization process, which was then used to calculate the optimal kernel weights as described above. The mean disease kernel was computed as \(K_{D}^{*} = \frac{1}{P_{D}} \sum \nolimits_{i = 1}^{{P_{D} }} K_{D}^{i}\), and the mean miRNA kernel was computed analogously. In addition, the λ parameter was evaluated over the set \(\left\{ {2^{ - 15} ,2^{ - 10} , \ldots ,2^{30} } \right\}\). The σ regularization coefficient was also optimized over the set \(\left\{ {0,0.25,0.5,0.75,1} \right\}\).
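The linear kernel combination, its mean-kernel initialisation and the two parameter grids can be sketched as follows. The function names are hypothetical, and the step of 5 in the exponent of the λ grid is an assumption inferred from the first two listed values; the weight-update step itself is described in Additional file 1 and is not reproduced here.

```python
import numpy as np


def combine_kernels(kernels, weights):
    """Weighted linear combination of base kernels of identical shape."""
    return sum(w * K for w, K in zip(weights, kernels))


def mean_kernel(kernels):
    """Uniform weights 1/P, used as the starting point of the optimisation."""
    P = len(kernels)
    return combine_kernels(kernels, [1.0 / P] * P)


# Grids described in the text; the exponent step of 5 is assumed.
lambda_grid = [2.0 ** e for e in range(-15, 31, 5)]
sigma_grid = [0.0, 0.25, 0.5, 0.75, 1.0]

# Two toy 2 x 2 base kernels, made up for illustration.
K_init = mean_kernel([np.eye(2), np.ones((2, 2))])
```

In the model described above, the disease kernels (for example SS1, SS2 and GD) and the miRNA kernels (for example FS and GM) would each be combined in this way.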
Results
Cross validation
LOOCV is often used to evaluate the performance of prediction models. In this work, we conducted LOOCV in two different ways: global and local LOOCV. Local LOOCV was implemented as follows. First, we chose a disease. Second, each known miRNA associated with this disease was left out in turn as the test sample, and the other associated miRNAs were used as seed samples. Third, each time we ranked the predicted association probability of the current test sample against the candidate samples, which were the miRNAs without known associations with the chosen disease. If the rank of the test miRNA exceeded the given threshold, the model was considered to have successfully predicted this miRNA–disease association. Global LOOCV was implemented differently. First, we considered all the diseases simultaneously, which meant that each known disease–miRNA association in HMDD v2.0 was left out in turn as the test sample. Second, all the other known associations were used as seed samples and all the unknown associations were considered candidate samples. Third, as in the local method, if the rank of the test association exceeded the given threshold, the model was considered to have successfully predicted this association.
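The local procedure can be sketched as a loop over the known associations of one disease. The function name `local_loocv_ranks` and the `predict` callable are hypothetical: `predict` stands in for any model that maps a training adjacency matrix to a score matrix of the same shape, and the row-sum scorer used below is only a placeholder, not MKRMDA.

```python
def local_loocv_ranks(A, disease, predict):
    """Rank of each left-out miRNA among the candidates of one disease.

    A: list of rows (miRNAs x diseases, entries 0/1).
    predict: callable mapping a training matrix to a score matrix of equal shape.
    """
    known = [i for i, row in enumerate(A) if row[disease] == 1]
    candidates = [i for i, row in enumerate(A) if row[disease] == 0]
    ranks = []
    for test in known:
        A_train = [row[:] for row in A]
        A_train[test][disease] = 0  # hide the test association
        scores = predict(A_train)
        better = sum(1 for c in candidates
                     if scores[c][disease] > scores[test][disease])
        ranks.append(better + 1)  # rank 1 means the test miRNA came first
    return ranks


# Placeholder scorer: every entry of a row gets that row's number of associations.
def row_sum_scorer(A_train):
    return [[sum(row)] * len(row) for row in A_train]


A = [[1, 1], [1, 0], [0, 1], [0, 0]]
ranks = local_loocv_ranks(A, disease=0, predict=row_sum_scorer)
```

The global variant would differ only in the candidate set, which would then contain every unknown miRNA–disease pair rather than the unknown pairs of a single disease.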
Furthermore, the receiver operating characteristic (ROC) curve was drawn by plotting the true positive rate (TPR, sensitivity) against the false positive rate (FPR, 1 − specificity) at different thresholds. Specifically, sensitivity denotes the percentage of correctly identified positive samples among all the positives, and specificity denotes the percentage of negative miRNA–disease pairs ranked below the threshold among all negatives. The predictive performance of MKRMDA could then be evaluated by calculating the area under the ROC curve (AUC): AUC = 1 means perfect predictive performance, and AUC = 0.5 indicates random performance.
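The AUC can also be computed without drawing the curve, because it equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as one half. The following sketch uses this pairwise formulation; the function name and the toy scores are illustrative.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy scores, made up for illustration.
auc = roc_auc([0.9, 0.3], [0.5, 0.1])
```

The double loop is quadratic in the number of samples; for large evaluations a rank-based computation gives the same value more efficiently.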
Figure 2 shows the performance comparison of the global and local LOOCV results between several computational models. As shown in the figure, MKRMDA, HGIMDA, RLSMDA, HDMP, WBSMDA, MCMDA and RKNNMDA obtained AUCs of 0.9040, 0.8781, 0.8426, 0.8366, 0.8030, 0.8749 and 0.7159 in the global LOOCV, respectively. For the local LOOCV, MKRMDA, HGIMDA, RLSMDA, HDMP, WBSMDA, RWRMDA, MCMDA and RKNNMDA obtained AUCs of 0.8446, 0.8077, 0.6953, 0.7702, 0.8031, 0.7891, 0.7718 and 0.8221, respectively. The MaxFlow model obtained an AUC of 0.8693 according to its original paper, which was also slightly lower than that of MKRMDA. RWRMDA could not be evaluated by global LOOCV because this model could not be applied to all the diseases simultaneously. Additionally, RBMMMDA [48] was not included in the comparison because its results were the association types between miRNAs and diseases, so its input and output differed from those of our algorithm. Overall, MKRMDA showed excellent and reliable prediction performance, and we believe that it may provide a useful reference for miRNA–disease association prediction experiments.
In addition, we adopted five-fold cross validation for evaluation, conducted as follows: all the known miRNA–disease associations were randomly divided into 5 groups of equal size, and each of the 5 groups in turn was used as test samples while the other groups served as training samples. For each test group, MKRMDA was run and the prediction score of every test sample in the group was compared with the scores of the candidate miRNAs. To reduce the possible impact of the random division, five-fold cross validation was repeated 100 times. Finally, MKRMDA achieved reliable performance with an AUC of 0.8894 ± 0.0015, higher than those of other models: RLSMDA, 0.8569 ± 0.0020; HDMP, 0.8342 ± 0.0010; WBSMDA, 0.8185 ± 0.0009; MCMDA, 0.8767 ± 0.0011; RKNNMDA, 0.6723 ± 0.0027.
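The random division into five groups can be sketched with the standard library alone. The function name `five_fold_splits`, the seed handling and the toy pairs are assumptions for illustration; repeating the call with 100 different seeds would mirror the repetition described above.

```python
import random


def five_fold_splits(known_pairs, n_folds=5, seed=0):
    """Yield (train, test) lists of known associations for each fold."""
    pairs = list(known_pairs)
    random.Random(seed).shuffle(pairs)
    # Dealing the shuffled pairs round-robin keeps the fold sizes within one of each other.
    folds = [pairs[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [p for j, fold in enumerate(folds) if j != k for p in fold]
        yield train, test


# Ten toy (miRNA index, disease index) pairs, made up for illustration.
pairs = [(i, i % 3) for i in range(10)]
splits = list(five_fold_splits(pairs))
```

Each known association appears in exactly one test fold, so every association is scored once per repetition.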
Case studies
MKRMDA was applied to predict potential miRNA–disease associations for all the diseases investigated in this paper. To further demonstrate the prediction ability of MKRMDA, three kinds of case studies were carried out, as mentioned before. In the first kind, case studies on colonic cancer, esophageal cancer and lymphoma were implemented, in which the disease–miRNA associations recorded in HMDDv2.0 [52] were used as training samples and miRNAs without known associations with the disease under consideration were regarded as test samples. After MKRMDA was run, we verified the top 50 miRNAs predicted to be associated with each disease against the experimentally supported associations recorded in the miR2Disease [61] and dbDEMC [62] databases.
Colonic cancer is a complex disease in which cancer cells form in the tissues of the colon. It is reported to be the second leading cause of cancer death in the United States, with a 5-year survival rate of 65% [63]. As many colonic cancers arise from adenomatous polyps without obvious symptoms, screening for this cancer is effective not only for early detection but also for prevention. Additionally, with the rapid development of high-throughput sequencing technologies, researchers have identified many miRNAs associated with colonic cancer. For example, miR-141 and miR-200b were confirmed to be highly overexpressed in colonic cancer [64]. In the case study for colonic cancer, candidate miRNAs were prioritized according to the scores obtained from MKRMDA. As a result, 38 of the top 50 were confirmed by recent experimental results in miR2Disease and dbDEMC (see Table 1). For example, miR-183, which was highly ranked and confirmed by both the miR2Disease and dbDEMC databases, was significantly deregulated in colorectal cancer cells [65].
Esophageal cancer is the eighth most common cancer and one of the deadliest cancers worldwide because of its extremely aggressive nature and poor survival rate [66]. The overall 5-year survival of esophageal cancer ranges from 15 to 25% [67, 68]. Research suggests that the survival rate could increase to 90% if tumors were diagnosed at an early stage [69]. Therefore, early detection is vital for the timely treatment of esophageal cancer [70]. Many miRNAs have been reported to be related to esophageal cancer. For example, by post-transcriptionally regulating enhancer of zeste homolog 2, miR-214 and miR-98 could suppress migration and invasion in human esophageal squamous cell carcinoma [71]. In the first kind of case study for esophageal cancer, 47 out of the top 50 predicted miRNAs were confirmed by at least one of the miR2Disease and dbDEMC databases (see Table 2).
Lymphoma is a group of blood cell tumors that develop from lymphocytes, and it most often spreads to the lungs, liver, and brain. The two main types of lymphoma are Hodgkin lymphoma (HL) and non-Hodgkin lymphoma (NHL) [72]. Lymphomas, including HL and NHL, are reported to be the seventh most lethal cancer worldwide and the third most common cancer in children [72]. However, with modern treatment, lymphomas may be curable if detected at an early stage. Recent experimental research found that miR-17-5p showed an increased expression level in lymphoma compared with normal canine peripheral blood mononuclear cells and normal lymph nodes (LN). In the case study on lymphoma, 44 of the top 50 predicted lymphoma-associated miRNAs ranked by MKRMDA were confirmed by experimental literature evidence (see Table 3).
These 3 cancers were chosen mainly because they are of great clinical importance and are often used as case studies for computational models such as HGIMDA (colonic cancer, esophageal cancer), RKNNMDA (colonic cancer, esophageal cancer) and MCMDA (colonic cancer, lymphoma). Moreover, we compared the numbers of confirmed miRNAs among the top 50 predictions of HGIMDA and RKNNMDA for the three cancers mentioned above (see Additional file 1). We chose these two models because, among the models compared with MKRMDA, they ranked first in global LOOCV and local LOOCV, respectively.
In addition, we conducted a case study of hepatocellular carcinoma (HCC) in the second way, in which we removed all the known HCC-related miRNA information to simulate the situation where a new disease without any known miRNA associations is investigated. We then verified the prediction results for HCC against the HMDD v2.0, miR2Disease and dbDEMC databases. Hepatocellular carcinoma is the most common type of liver cancer. HCC is also the sixth most prevalent cancer and the third most frequent cause of cancer-related death [73]. More than 30 miRNAs have been validated to be related to the development of HCC in the gold standard dataset. For example, the expression of miR-125a and miR-99b was considerably lower in HCC than in normal liver [74]. MiR-122a is a liver-specific miRNA that is frequently downregulated in HCC [75]. Among the top 50 predicted potential HCC-related miRNAs, 44 were confirmed by the aforementioned databases, i.e. HMDD v2.0, miR2Disease and dbDEMC (see Table 4). For example, miR-21, which ranked first among the top 50 predicted miRNAs, has been reported to be upregulated in patients with HCC and has strong potential to serve as a novel biomarker for liver injury [76].
Furthermore, to test the robustness of MKRMDA, we presented a case study for breast cancer in the third way, in which we used only the known disease–miRNA associations in the HMDD v1.0 database as training samples and used the associations in the HMDD v2.0, miR2Disease and dbDEMC databases as test datasets. Breast cancer is currently reported as the deadliest cancer in women, accounting for 25% of all cancer-caused deaths [72]. Specifically, breast cancer is more common in developed countries and is about 100 times more common in women than in men. The majority of breast cancer deaths occur in developing countries, where most women are diagnosed at late stages [77]. About 176 miRNAs are known to be related to breast cancer in the gold standard dataset. For example, miR-122 was downregulated in breast cancer cells, while the expression levels of miR-10b and miR-21 were reported to be significantly increased in the cerebrospinal fluid (CSF) of patients with breast cancer compared with patients with non-neoplastic conditions [78, 79]. We implemented MKRMDA to prioritize candidate miRNAs without known associations with breast cancer in HMDD v1.0. As a result, among the top 50 potential breast cancer-related miRNAs, 47 were verified by known miRNA–disease associations in at least one of the HMDD v2.0, miR2Disease and dbDEMC databases (see Table 5).
In conclusion, the promising results obtained from LOOCV, fivefold cross validation and the three kinds of case studies demonstrated the reliable prediction performance of MKRMDA. Therefore, we further prioritized all the candidate miRNAs for all the diseases recorded in the HMDD v2.0 database. The predicted ranks of miRNAs for each disease were publicly released for further experimental validation (see Additional file 2). A higher prediction score means a higher probability of association between the corresponding disease and miRNA. However, we must point out that a negative score does not mean that the relevant miRNA and disease are negatively correlated. Our case studies focused on the top prediction scores, which were generally all positive. The potential disease–miRNA associations with relatively high ranks are expected to be confirmed by biological experiments and clinical observation in the future.
Discussion
The excellent and reliable prediction performance of MKRMDA can largely be attributed to the following factors. Firstly, the known experimentally confirmed disease–miRNA associations in HMDD v2.0, which we used as training samples in the prediction process, were abundant and reliable. Secondly, MKRMDA took full advantage of heterogeneous datasets (known disease–miRNA associations, miRNA functional similarity, disease semantic similarity, Gaussian interaction profile kernel similarity for miRNAs and diseases) to predict potential associations. Thirdly, MKRMDA used a two-step optimization process to automatically optimize the combination of the multiple kernels involved in the prediction process, which significantly improved the prediction performance. Additionally, MKRMDA overcame the memory limitation by exploiting algebraic properties of the Kronecker product. All in all, MKRMDA can handle data from different sources, using the two-step optimization to combine them automatically and make full use of them for biological research or multi-source data fusion research.
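To illustrate how algebraic properties of the Kronecker product can avoid the memory cost, the sketch below uses the standard eigendecomposition identity for Kronecker regularized least squares. It is not taken from the released MKRMDA code. It assumes symmetric positive semidefinite kernels `K_d` (diseases) and `K_m` (miRNAs) that have already been combined from the multiple base kernels, and a hypothetical regularization parameter `reg`. The full kernel of size (n_d·n_m) × (n_d·n_m) is never formed.

```python
import numpy as np

def kron_rls_predict(K_d, K_m, Y, reg=1.0):
    """Kronecker RLS scores computed without building the Kronecker product.

    K_d: (n_d, n_d) symmetric PSD disease kernel
    K_m: (n_m, n_m) symmetric PSD miRNA kernel
    Y:   (n_d, n_m) known association matrix
    """
    lam_d, Q_d = np.linalg.eigh(K_d)   # K_d = Q_d diag(lam_d) Q_d^T
    lam_m, Q_m = np.linalg.eigh(K_m)   # K_m = Q_m diag(lam_m) Q_m^T
    # Eigenvalues of the Kronecker product kernel are all pairwise products.
    L = np.outer(lam_d, lam_m)
    C = L / (L + reg)                  # spectral filter applied by regularized least squares
    return Q_d @ (C * (Q_d.T @ Y @ Q_m)) @ Q_m.T
```

Only the two small eigendecompositions and a few matrix products of size n_d × n_m are needed, so memory grows with the sizes of the two kernels rather than with their product.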
Of course, MKRMDA also needs to be improved in the future for the following reasons. First, MKRMDA was developed mainly on the assumption that functionally similar miRNAs are more likely to be associated with phenotypically similar diseases, which might bias predictions toward miRNAs with more known associated diseases. Furthermore, how to choose appropriate values for the parameters of MKRMDA from the alternative values remains to be solved. In addition, in the iterative optimization procedure, the method used to set the initial values might also be improved to obtain more reliable prediction results.
Conclusion
Identifying novel miRNA–disease associations is a vitally important goal of biological research, and it plays a critical role in understanding disease pathogenesis at the miRNA level. In this paper, we proposed a computational method, MKRMDA, to predict potential disease-related miRNAs. The performance of MKRMDA was evaluated by LOOCV and fivefold cross validation based on the known experimentally verified miRNA–disease associations. The AUCs of 0.9040 in global LOOCV and 0.8446 in local LOOCV demonstrated the reliable and effective performance of MKRMDA. Moreover, we implemented three different kinds of case studies for further evaluation. In the first kind, 38, 47 and 44 out of the top 50 predicted miRNAs for colonic cancer, esophageal cancer and lymphoma, respectively, were verified by recent experimental reports. In the second and third kinds, for hepatocellular carcinoma and breast cancer, 44 and 47 out of the top 50 predicted miRNAs, respectively, were verified by recent experimental research. All of these results showed the reliability of MKRMDA in predicting potential disease–miRNA associations. We anticipate that MKRMDA could be an important and valuable computational tool for miRNA–disease association prediction and miRNA biomarker identification for human disease diagnosis, treatment, prognosis and prevention. In addition, MKRMDA is well suited for research situations in which abundant kernel-related data from different sources are available, especially when researchers seek an appropriate and optimal way to combine the different types of relevant data and make the best use of them. We hope that MKRMDA will be helpful for miRNA–disease association prediction and relevant miRNA research from the perspective of computational biology.
Abbreviations
MiRNA: microRNA
LOOCV: leave-one-out cross validation
fivefold CV: fivefold cross validation
ROC: receiver-operating characteristics curve
AUC: the area under ROC curve
References
Ambros V. The functions of animal microRNAs. Nature. 2004;431:350–5.
Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell. 2004;116:281–97.
Meister G, Tuschl T. Mechanisms of gene silencing by double-stranded RNA. Nature. 2004;431:343–9.
Ambros V. microRNAs: tiny regulators with great potential. Cell. 2001;107:823–6.
Xu J, Li CX, Lv JY, Li YS, Xiao Y, Shao TT, Huo X, Li X, Zou Y, Han QL, et al. Prioritizing candidate disease miRNAs by topological features in the miRNA target-dysregulated network: case study of prostate cancer. Mol Cancer Ther. 2011;10:1857–66.
Kong W, He L, Richards EJ, Challa S, Xu CX, Permuth-Wey J, Lancaster JM, Coppola D, Sellers TA, Djeu JY, Cheng JQ. Upregulation of miRNA-155 promotes tumour angiogenesis by targeting VHL and is associated with poor prognosis and triple-negative breast cancer. Oncogene. 2014;33:679–89.
Jopling CL, Yi M, Lancaster AM, Lemon SM, Sarnow P. Modulation of hepatitis C virus RNA abundance by a liver-specific MicroRNA. Science. 2005;309:1577–81.
Vasudevan S, Tong Y, Steitz JA. Switching from repression to activation: microRNAs can upregulate translation. Science. 2007;318:1931–4.
Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell. 1993;75:843–54.
Reinhart BJ, Slack FJ, Basson M, Pasquinelli AE, Bettinger JC, Rougvie AE, Horvitz HR, Ruvkun G. The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature. 2000;403:901–6.
Pasquinelli AE, Ruvkun G. Control of developmental timing by micrornas and their targets. Annu Rev Cell Dev Biol. 2002;18:495–513.
Bandyopadhyay S, Mitra R, Maulik U, Zhang MQ. Development of the human cancer microRNA network. Silence. 2010;1:6.
Kozomara A, Griffiths-Jones S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 2014;42:D68–73.
Cheng AM, Byrom MW, Shelton J, Ford LP. Antisense inhibition of human miRNAs and indications for an involvement of miRNA in cell growth and apoptosis. Nucleic Acids Res. 2005;33:1290–7.
Karp X, Ambros V. Developmental biology. encountering microRNAs in cell fate signaling. Science. 2005;310:1288–9.
Miska EA. How microRNAs control cell division, differentiation and death. Curr Opin Genet Dev. 2005;15:563–8.
Xu P, Guo M, Hay BA. MicroRNAs and the regulation of cell death. Trends Genet. 2004;20:617–24.
Alshalalfa M, Alhajj R. Using context-specific effect of miRNAs to identify functional associations between miRNAs and gene signatures. BMC Bioinform. 2013;14(Suppl 12):S1.
Bartel DP. MicroRNAs: target recognition and regulatory functions. Cell. 2009;136:215–33.
Esquela-Kerscher A, Slack FJ. Oncomirs - microRNAs with a role in cancer. Nat Rev Cancer. 2006;6:259–69.
Latronico MV, Catalucci D, Condorelli G. Emerging role of microRNAs in cardiovascular biology. Circ Res. 2007;101:1225–36.
Lu M, Zhang Q, Deng M, Miao J, Guo Y, Gao W, Cui Q. An analysis of human microRNA and disease associations. PLoS ONE. 2008;3:e3420.
Birks DK, Barton VN, Donson AM, Handler MH, Vibhakar R, Foreman NK. Survey of MicroRNA expression in pediatric brain tumors. Pediatr Blood Cancer. 2011;56:211–6.
Li S, Xing X, Yang Q, Xu H, He J, Chen Z, Zhu H. The effects of hepatitis C virus core protein on the expression of miR-122 in vitro. Virol J. 2013;10:98.
Lee YH, Na HS, Jeong SY, Jeong SH, Park HR, Chung J. Comparison of inflammatory microRNA expression in healthy and periodontitis tissues. Biocell. 2011;35:43–9.
Pfister S, Remke M, Castoldi M, Bai AH, Muckenthaler MU, Kulozik A, von Deimling A, Pscherer A, Lichter P, Korshunov A. Novel genomic amplification targeting the microRNA cluster at 19q13.42 in a pediatric embryonal tumor with abundant neuropil and true rosettes. Acta Neuropathol. 2009;117:457–64.
Jiang Q, Hao Y, Wang G, Juan L, Zhang T, Teng M, Liu Y, Wang Y. Prioritization of disease microRNAs through a human phenome-microRNAome network. BMC Syst Biol. 2010;4(Suppl 1):S2.
Calin GA, Croce CM. MicroRNA signatures in human cancers. Nat Rev Cancer. 2006;6:857–66.
Cho WC. MicroRNAs: potential biomarkers for cancer diagnosis, prognosis and targets for therapy. Int J Biochem Cell Biol. 2010;42:1273–81.
Tricoli JV, Jacobson JW. MicroRNA: potential for cancer detection, diagnosis, and prognosis. Cancer Res. 2007;67:4553–5.
Chen X, Yan CC, Zhang X, You ZH. Long non-coding RNAs and complex diseases: from experimental results to computational models. Brief Bioinform. 2017;18(4):558–76.
Chen X, Yan CC, Zhang X, Zhang X, Dai F, Yin J, Zhang Y. Drug-target interaction prediction: databases, web servers and computational models. Brief Bioinform. 2016;17:696–712.
Perez-Iratxeta C, Wjst M, Bork P, Andrade MA. G2D: a tool for mining genes associated with disease. BMC Genet. 2005;6:45.
Perez-Iratxeta C, Bork P, Andrade MA. Association of genes to genetically inherited diseases using data mining. Nat Genet. 2002;31:316–9.
Wang D, Wang J, Lu M, Song F, Cui Q. Inferring the human microRNA functional similarity and functional network based on microRNA–associated diseases. Bioinformatics. 2010;26:1644–50.
Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabasi AL. The human disease network. Proc Natl Acad Sci USA. 2007;104:8685–90.
Pasquier C, Gardes J. Prediction of miRNA–disease associations with a vector space model. Sci Rep. 2016;6:27036.
Shi H, Xu J, Zhang G, Xu L, Li C, Wang L, Zhao Z, Jiang W, Guo Z, Li X. Walking the interactome to identify human miRNA–disease associations through the functional link between miRNA targets and disease genes. BMC Syst Biol. 2013;7:101.
Mork S, Pletscher-Frankild S, Palleja Caro A, Gorodkin J, Jensen LJ. Protein-driven inference of miRNA–disease associations. Bioinformatics. 2014;30:392–7.
Chen X, Liu MX, Yan GY. RWRMDA: predicting novel human microRNA–disease associations. Mol BioSyst. 2012;8:2792–8.
Xuan P, Han K, Guo M, Guo Y, Li J, Ding J, Liu Y, Dai Q, Li J, Teng Z, Huang Y. Prediction of microRNAs associated with human diseases based on weighted k most similar neighbors. PLoS ONE. 2013;8:e70204.
Chen X, Wu QF, Yan GY. RKNNMDA: ranking-based KNN for MiRNA–disease association prediction. RNA Biol. 2017;14:952–62.
Li JQ, Rong ZH, Chen X, Yan GY, You ZH. MCMDA: matrix completion for MiRNA–disease association prediction. Oncotarget. 2017;8(13):21187–99.
Chen X, Yan CC, Zhang X, You ZH, Deng L, Liu Y, Zhang Y, Dai Q. WBSMDA: within and between score for MiRNA–disease association prediction. Sci Rep. 2016;6:21106.
Chen X, Clarence Yan C, Zhang X, You ZH, Huang YA, Yan GY. HGIMDA: heterogeneous graph inference for miRNA–disease association prediction. Oncotarget. 2016;7:65257–69.
Yu H, Chen X, Lu L. Large-scale prediction of microRNA–disease associations by combinatorial prioritization algorithm. Sci Rep. 2017;7:43792.
Chen X, Yan GY. Semi-supervised learning for potential human microRNA–disease associations inference. Sci Rep. 2014;4:5501.
Chen X, Yan CC, Zhang X, Li Z, Deng L, Zhang Y, Dai Q. RBMMMDA: predicting multiple types of disease–microRNA associations. Sci Rep. 2015;5:13877.
Gönen M, Alpaydın E. Multiple kernel learning algorithms. J Mach Learn Res. 2011;12:2211–68.
Ammad-ud-din M, Georgii E, Gönen M, Laitinen T, Kallioniemi O, Wennerberg K, Poso A, Kaski S. Integrative and personalized QSAR analysis in cancer by kernelized Bayesian matrix factorization. J Chem Inf Model. 2014;54:2347–59.
van Laarhoven T, Nabuurs SB, Marchiori E. Gaussian interaction profile kernels for predicting drug–target interaction. Bioinformatics. 2011;27:3036–43.
Li Y, Qiu C, Tu J, Geng B, Yang J, Jiang T, Cui Q. HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res. 2014;42:D1070–4.
Chen X, Ren B, Chen M, Wang Q, Zhang L, Yan G. NLLSS: predicting synergistic drug combinations based on semi-supervised learning. PLoS Comput Biol. 2016;12:e1004975.
Chen X, Yan GY. Novel human lncRNA–disease association inference based on lncRNA expression profiles. Bioinformatics. 2013;29:2617–24.
Rifkin R, Yeo G, Poggio T. Regularized least-squares classification. Acta Electronica Sinica. 2003;190:93–104.
Kimeldorf G, Wahba G. Some results on Tchebycheffian spline functions. J Math Anal Appl. 1971;33:82–95.
Hue M, Riffle M, Vert JP, Noble WS. Large-scale prediction of protein–protein interactions from structures. BMC Bioinform. 2010;11:144.
Yamanishi Y. Chemogenomic approaches to infer drug–target interaction networks. Methods Mol Biol. 2013;939:97–113.
Nascimento AC, Prudencio RB, Costa IG. A multiple kernel learning algorithm for drug–target interaction prediction. BMC Bioinform. 2016;17:46.
Pahikkala T, Airola A, Pietila S, Shakyawar S, Szwajda A, Tang J, Aittokallio T. Toward more realistic drug–target interaction predictions. Brief Bioinform. 2015;16:325–37.
Jiang Q, Wang Y, Hao Y, Juan L, Teng M, Zhang X, Li M, Wang G, Liu Y. miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res. 2009;37:D98–104.
Yang Z, Ren F, Liu C, He S, Sun G, Gao Q, Yao L, Zhang Y, Miao R, Cao Y, et al. dbDEMC: a database of differentially expressed miRNAs in human cancers. BMC Genom. 2010;11(Suppl 4):S5.
Bibbins-Domingo K, Grossman DC, Curry SJ, Davidson KW, Epling JW Jr, Garcia FA, Gillman MW, Harper DM, Kemper AR, Krist AH, et al. Screening for colorectal cancer: US preventive services task force recommendation statement. JAMA. 2016;315:2564–75.
Cahill S, Smyth P, Denning K, Flavin R, Li J, Potratz A, Guenther SM, Henfrey R, O’Leary JJ, Sheils O. Effect of BRAFV600E mutation on transcription and post-transcriptional regulation in a papillary thyroid carcinoma model. Mol Cancer. 2007;6:21.
Bandres E, Cubedo E, Agirre X, Malumbres R, Zarate R, Ramirez N, Abajo A, Navarro A, Moreno I, Monzo M, Garcia-Foncillas J. Identification by real-time PCR of 13 mature microRNAs differentially expressed in colorectal cancer and non-tumoral tissues. Mol Cancer. 2006;5:29.
Mao WM, Zheng WH, Ling ZQ. Epidemiologic risk factors for esophageal cancer development. Asian Pac J Cancer Prev. 2011;12:2461–6.
Ogata-Kawata H, Izumiya M, Kurioka D, Honma Y, Yamada Y, Furuta K, Gunji T, Ohta H, Okamoto H, Sonoda H, et al. Circulating exosomal microRNAs as biomarkers of colon cancer. PLoS ONE. 2014;9:e92921.
Enzinger PC, Mayer RJ. Esophageal cancer. N Engl J Med. 2003;349:2241–52.
Jemal A, Bray F, Center MM, Ferlay J, Ward E, Forman D. Global cancer statistics. CA Cancer J Clin. 2011;61:69–90.
Drusco A, Nuovo GJ, Zanesi N, Di Leva G, Pichiorri F, Volinia S, Fernandez C, Antenucci A, Costinean S, Bottoni A, et al. MicroRNA profiles discriminate among colon cancer metastasis. PLoS ONE. 2014;9:e96670.
Guo C, Sah JF, Beard L, Willson JK, Markowitz SD, Guda K. The noncoding RNA, miR-126, suppresses the growth of neoplastic cells by targeting phosphatidylinositol 3-kinase signaling and is frequently lost in colon cancers. Genes Chromosomes Cancer. 2008;47:939–46.
McGuire S. World Cancer Report 2014. Geneva, Switzerland: World Health Organization, International Agency for Research on Cancer, WHO Press, 2015. Adv Nutr. 2016;7:418–9.
Ferlay J, Shin HR, Bray F, Forman D, Mathers C, Parkin DM. Estimates of worldwide burden of cancer in 2008: GLOBOCAN 2008. Int J Cancer. 2010;127:2893–917.
Feitelson MA, Lee J. Hepatitis B virus integration, fragile sites, and hepatocarcinogenesis. Cancer Lett. 2007;252:157–70.
Diao S, Zhang JF, Wang H, He ML, Lin MC, Chen Y, Kung HF. Proteomic identification of microRNA-122a target proteins in hepatocellular carcinoma. Proteomics. 2010;10:3723–31.
Xu J, Wu C, Che X, Wang L, Yu D, Zhang T, Huang L, Li H, Tan W, Wang C, Lin D. Circulating microRNAs, miR-21, miR-122, and miR-223, in patients with hepatocellular carcinoma or chronic hepatitis. Mol Carcinog. 2011;50:136–42.
Kelsey JL, Horn-Ross PL. Breast cancer: magnitude of the problem and descriptive epidemiology. Epidemiol Rev. 1993;15:7–16.
Liu Y, Zhao J, Zhang PY, Zhang Y, Sun SY, Yu SY, Xi QS. MicroRNA-10b targets E-cadherin and modulates breast cancer metastasis. Med Sci Monit. 2012;18:Br299–308.
Wang B, Wang H, Yang Z. MiR-122 inhibits cell proliferation and tumorigenesis of breast cancer by targeting IGF1R. PLoS ONE. 2012;7:e47053.
Authors’ contributions
XC conceived the project, developed the prediction method, designed and implemented the experiments, analyzed the result, and wrote the paper. YWN implemented the experiments, analyzed the result, and wrote the paper. GYY and GHW analyzed the result. All authors read and approved the final manuscript.
Acknowledgements
We thank anonymous reviewers for very valuable suggestions.
Competing interests
The authors declare that they have no competing interests.
Availability of data and materials
The MKRMDA codes and datasets used in the work are freely available at http://www.escience.cn/system/file?fileId=91140. We also provide the MKRMDA codes and datasets as Additional file 3.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Funding
XC was supported by National Natural Science Foundation of China under Grant Nos. 61772531 and 11631014. GHW was supported by National Natural Science Foundation of China under Grant Nos. 11471193 and 11631014, the Foundation for Distinguished Young Scholars of Shandong Province No. JQ201501, the Fundamental Research Funds of Shandong University and Independent Innovation Foundation of Shandong University. GYY was supported by National Natural Science Foundation of China under Grant Nos. 11371355 and 11631014.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional files
12967_2017_1340_MOESM1_ESM.docx
Additional file 1. Additional information about the multiple kernel learning method, two-step optimization process and the case studies comparison with HGIMDA and RKNNMDA.
12967_2017_1340_MOESM2_ESM.xlsx
Additional file 2. We further applied MKRMDA to predict candidate miRNAs for all the diseases involved in HMDDv2.0. Prediction results were publicly released for further research and experimental validation.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Chen, X., Niu, YW., Wang, GH. et al. MKRMDA: multiple kernel learning-based Kronecker regularized least squares for MiRNA–disease association prediction. J Transl Med 15, 251 (2017). https://doi.org/10.1186/s12967-017-1340-3
DOI: https://doi.org/10.1186/s12967-017-1340-3