Fig. 3 | Journal of Translational Medicine

Fig. 3

From: Ensemble learning model for identifying the hallmark genes of NFκB/TNF signaling pathway in cancers

Oncogenicity and prognostic impact of the identified genes. a Enrichment analysis of oncogenic genes (C6) among the genes identified (voted) by each cancer-specific model and by the pan-cancer model. Z-scores were calculated from 10,000 random permutations generated in the GSEA process, in which genes were ranked by votes and tested for hits against the oncogenic genes in the C6 gene set. b Positive correlation between the proportion of oncogenic genes and average votes. The x-axis denotes the average votes of predicted genes across 16 cancer types. Each point represents the proportion of oncogenic genes among the genes with an average vote equal to or greater than the corresponding threshold on the x-axis. The red and gray circles indicate significant and non-significant enrichment, respectively, as determined by Fisher's exact test. c Enrichment analysis of pan-cancer poor-prognostic genes among the genes identified by the pan-cancer model. Here, the identified genes were ranked by their average votes in the pan-cancer model and tested for hits against the pan-cancer poor-prognostic genes. The Z-score and p-value were calculated from 10,000 random permutations of the genes' average votes. The number of hits in the poor-prognostic gene set is denoted 'hit', while 'miss' is the number of identified genes without hits. Note that, of the 4678 genes identified in the pan-cancer model, only 4022 have a pre-calculated hazard ratio (the exponentiated regression coefficient) from the Cox regression model in our previous study. d Kaplan–Meier (KM) plots of 5-year survival based on risk scores calculated from the identified genes. The risk scores were calculated by combining the identified genes' expression levels with their coefficients in a pre-trained pan-cancer Cox regression model. We classified patients into low- and high-risk groups based on the median risk score.
Kaplan–Meier plots were generated for both the internal (cancer types included in our ensemble learning model) and independent (cancer types not included in our ensemble learning model) datasets to evaluate the impact of the identified genes on patient survival. The number of patients in each group is indicated by "n" in the plots. e Risk scores of patients at different stages. This panel illustrates the progression of patients' risk scores across pathologic stages; the scores increase sequentially from stage I to stage IV: stage I < stage II < stage III < stage IV
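
For readers who want to see the mechanics behind panels a, c and d, the following is a minimal Python sketch, not the authors' implementation. It assumes that the risk score is the Cox linear predictor (the sum of coefficient times expression over the identified genes) and that patients whose score equals the median are placed in the low-risk group; the caption specifies neither detail. The permutation z-score uses the mean vote of the gene-set members as a simple stand-in statistic in place of the GSEA enrichment score. All function names, gene names and values are hypothetical.

```python
import random
import statistics


def risk_score(expression, coefficients):
    """Assumed form: sum of Cox coefficient * expression over shared genes."""
    return sum(coefficients[g] * expression[g]
               for g in coefficients if g in expression)


def split_by_median(scores):
    """Label a patient 'high' if the score exceeds the cohort median, else 'low'.

    Scores equal to the median go to 'low' (an assumption, see lead-in).
    """
    median = statistics.median(scores.values())
    return {pid: ("high" if s > median else "low")
            for pid, s in scores.items()}


def permutation_z(votes, gene_set, n_perm=10_000, seed=0):
    """Z-score of the observed mean vote of gene-set members against a
    null distribution built from random gene sets of the same size.

    Stand-in statistic only; GSEA itself uses a running-sum enrichment score.
    """
    rng = random.Random(seed)
    values = list(votes.values())
    hits = [votes[g] for g in votes if g in gene_set]
    if not hits:
        raise ValueError("no gene in gene_set has a vote")
    observed = statistics.fmean(hits)
    # Draw random sets of the same size and record their mean vote.
    null = [statistics.fmean(rng.sample(values, len(hits)))
            for _ in range(n_perm)]
    sd = statistics.stdev(null)
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    return (observed - statistics.fmean(null)) / sd


# Hypothetical toy data, for illustration only.
coefs = {"GENE_A": 0.5, "GENE_B": -1.0}
patients = {
    "p1": {"GENE_A": 0.2, "GENE_B": 1.0},
    "p2": {"GENE_A": 1.0, "GENE_B": 0.5},
    "p3": {"GENE_A": 2.0, "GENE_B": 0.1},
    "p4": {"GENE_A": 3.0, "GENE_B": 0.0},
}
scores = {pid: risk_score(expr, coefs) for pid, expr in patients.items()}
groups = split_by_median(scores)
```

The two resulting groups would then be passed to a Kaplan–Meier estimator from a survival-analysis package to draw curves like those in panel d.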
