Fig. 1 | Journal of Translational Medicine

Fig. 1

From: Ensemble learning model for identifying the hallmark genes of NFκB/TNF signaling pathway in cancers

Performance of the ensemble learning model. a Overview of the model workflow. RNA-Seq data from patients diagnosed with 16 cancer types are used as features, while the NFκB/TNF hallmark gene sets are used as positive samples. The bar chart displays the number of patients per cancer type in descending order, and the pie chart represents the proportion of each gene subset. We trained 1000 linear SVM member classifiers with an NP ratio of 20 to construct the final ensemble learning model for each cancer type. Finally, we applied a majority voting method that sums the predictions from each member classifier to determine the confidence of each tested gene. b Median precision for testing and initial data across all cancer types. The upper panel displays the median precision of the testing data (grey dashed line) and the initial data (red dashed line) at different NP ratios. At the NP ratio of 20, the median precision of the initial data surpasses 0.5, meeting the minimum requirement of a weak classifier. The lower histogram depicts the distribution of precision values from the initial (red) and testing (grey) data at the NP ratio of 20. c The area under the receiver operating characteristic curve (AUC) of the proposed ensemble model and the conventional correlation approach for each cancer type. The AUC values were calculated from the false positive rate (FPR) and the true positive rate (TPR) obtained from the prediction of the 198 NFκB/TNF hallmark genes. d The receiver operating characteristic (ROC) curve for the pan-cancer ensemble model (upper panel) and the conventional correlation approach (lower panel). e Distribution of the average votes of genes at each level of cancer prevalence. The average votes (Avg. vote) were calculated only from the cancer types that voted on the tested gene rather than from all 16 cancers. The cancer prevalence is the number of cancer types in which the tested gene received votes. The median of the average votes correlates positively with cancer prevalence. Data points with and without blue borders represent NFκB/TNF hallmark and non-NFκB/TNF hallmark genes, respectively. Genes with zero average votes were excluded.
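
The two quantities plotted in panel e can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `summarize_votes`, the input layout (a mapping from cancer type to per-gene vote counts), and the toy gene and cancer labels are all hypothetical, and it assumes that a vote count is the number of member classifiers (out of 1000) that predicted the gene as positive in that cancer type.

```python
from statistics import mean


def summarize_votes(votes_by_cancer):
    """Return {gene: (cancer_prevalence, avg_vote)}.

    votes_by_cancer is a hypothetical layout: {cancer_type: {gene: votes}}.
    Only cancer types in which the gene received at least one vote count
    toward the prevalence and the average, as the caption describes.
    """
    votes_per_gene = {}
    for gene_votes in votes_by_cancer.values():
        for gene, votes in gene_votes.items():
            if votes > 0:  # cancer types that cast no vote are ignored
                votes_per_gene.setdefault(gene, []).append(votes)
    # Genes that never received a vote have no entry, which mirrors
    # the exclusion of genes with zero average votes.
    return {gene: (len(v), mean(v)) for gene, v in votes_per_gene.items()}


# Toy data with made-up labels (not taken from the paper).
toy_votes = {
    "type_1": {"GENE_A": 900, "GENE_B": 0, "GENE_C": 100},
    "type_2": {"GENE_A": 700, "GENE_B": 0},
    "type_3": {"GENE_A": 800, "GENE_C": 300},
}
summary = summarize_votes(toy_votes)
```

In this toy input, `GENE_A` receives votes in three cancer types and `GENE_C` in two, so their averages are taken over three and two values respectively, while `GENE_B` is dropped because it never receives a vote.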
