Identification of prognostic molecular subtypes through comprehensive analysis of DNA methylation, CNVs and transcriptome
To identify the prognostic molecular subtypes of cervical cancer, genes showing PCGs, CNV and methylation with prognostic significance were screened based on univariate Cox proportional risk model, with a threshold of p < 0.05. Finally, 3897 genes, 4897 CNV regions and 20,144 CpG sites with significant prognostic association were obtained. Next, iClusterPlus was used for multi-omics cluster analysis, and identified three molecular subtypes, which were Cluster1 (N = 77), Cluster2 (N = 142) and Cluster3 (N = 73). The results of multi-omics clustering were compared with those obtained by separate hierarchical clustering (Fig. S1A), and the results showed that the three molecular subtypes determined by multi-omics clustering had some consistency with those obtained by single-omics clustering. For example, the methylation clustering results of Cluster1 and Cluster2 subgroups overlapped with those of multi-omics patients with Cluster1, suggesting that the molecular subtypes identified by multiple histologies combining the molecular characteristics of multiple histological data are richer in terms of information dimensions than those of individual histology, and that no one histology could reproduce the molecular subtypes identified by multiple histologies alone. The three subtypes had significant prognostic differences (p = 0.0047) (Fig. 1a), among them, Cluster1 showed the most favorable prognosis, while the Cluster3 had the worst prognosis. There was no significant difference in prognosis between the Cluster1 and Cluster2 groups (Fig. 1c), but the prognosis of Cluster3 was greatly worse than that of Cluster1 (p = 0.00247) and Cluster2 (p = 0.01297) (Fig. 1d, e). A total of 16 were obtained by the detection of the top10 high-frequency mutated genes of each subtype (Fig. 1b). The distribution of these genes was similar in all three subtypes, suggesting a high degree of consistency in the genes associated with the most common mutations in the three subtypes. In addition, TTN, PIK3CA, KMT2C, and MUC4 with higher mutation rates may suggest that these genes play a key role in canceration. The findings here indicated that subtype classification based on PCGs, methylation, and CNVs could predict the prognosis of patients with cervical cancer and have certain regulatory relationships at the levels of genome, epigenome, and transcription.
Distribution of differentially expressed lncRNAs and PCGs in the three subtypes
The differentially expressed lncRNAs and PCGs in three subtypes were determined. We found the small number of differentially expressed lncRNAs in both Cluster2 samples and Cluster3 samples. A total of 617 lncRNAs and 1395 PCGs were obtained from the three subtypes (Additional file 1: Table S1). The volcano map data of the differential expressions of the lncRNAs showed that in the three subtypes the number of up-regulated lncRNAs was generally smaller than those down-regulated (Fig. 2a-c). Moreover, the numbers of differentially expressed PCGs was much larger than that of lncRNAs (Fig. 2d). Finally, 584 lncRNAs closely related to cervical cancer were downloaded from LncRNADisease and Lnc2Cancer databases. We found that there was an obvious intersection between differentially expressed lncRNAs in the three subtypes and cervical cancer-related lncRNAs (p < 0.0001) (Fig. 2g). Those data suggested that lncRNAs may play an important role in the heterogeneity and progression of cervical cancer.
Identification of differentially co-expressed lncRNAs-PCGs and functional modules for lncRNAs enrichment in the three subtypes
To identify differentially co-expressed lncRNA-PCGs, based on the differential expression profiles of lncRNAs and PCGs, outlier samples were removed by hierarchical clustering analysis. The linkage method was chosen as the complete method, and the clustering distance was calculated by Euclidean clustering, and those clustering distances exceeding five times the standard deviation (80,572.36) were considered as outliers and were rejected. We finally obtained a total of 307 samples (Fig. 3a). WGCNA was conducted for building expression network, with β = 3 (scale-free R 2 = 0.92) of power as a Soft Thresholding to ensure a scale-free network (Fig. 3b, c). Here, a total of 26 modules were screened (Fig. 3d). The numbers of lncRNAs and PCGs in each module are shown in Additional file 2: Table S2. By analyzing the enrichment level of lncRNAs in each module, the number of lncRNAs and PCGs in each module was first counted, then the significance of lncRNA enrichment in each module was calculated using Fisher exact test with all DE-lncRNAs and DE-PCGs as background. The following four modules with significant enrichment of lncRNAs were identified: green, pink, magenta, and darkgreen (Fig. 3e). The KEGG pathway enrichment analysis of the PCGs in the four modules showed that the PCGs in these four modules were enriched to a total of 41 KEGG pathways (Fig. 4a). The pathways enriched by the four modules each showed limited intersection, and they tended to enrich to different pathways, suggesting that different modules may have different functions. The green module was found enriched to 19 KEGG pathways, and was mainly enriched to p53 signaling pathway, cAMP signaling pathway, Glucagon signaling pathway and some other pathways significantly related to tumorigenesis, development, tumor metabolism of cervical cancer (Fig. 4b). The pink module was enriched to the collecting duct acid secretion and SNARE interactions in vesicular transport pathway (Fig. 4c). As shown in Fig. 5D, the magenta module was enriched to 6 KEGG pathways, which are mainly the pathways related to cardiomyopathy such as Hypertrophic cardiomyopathy (HCM) and Dilated cardiomyopathy (DCM) (Fig. 4d). It is known that advanced cancers can easily induce great changes in metabolism, promote cardiac atrophy, and heart failure [37]. As shown in Fig. 5e, the darkgreen module was enriched to 7 KEGG pathways, mainly to PI3K-Akt signaling pathway, Nicotine addiction and other pathways related to tumorigenesis and development (Fig. 4e). These results indicated that lncRNAs may be directly or indirectly involved in important pathways of the tumorigenesis and development of cervical cancer.
Abnormal expressions of lncRNAs were relevant to CNVs
To analyze the relationship between lncRNA expressions and CNVs, lncRNAs copy number data were extracted from 292 cases of cervical cancer cases obtained from the TCGA, with copy number greater than 1 as the threshold of copy number amplification and less than -1 as the copy number deletion threshold. The ratio of copy number amplification and copy number deletion of each lncRNAs and its distribution in the genome were analyzed (Fig. 5a). Hwere, copy number deletion and copy number amplification showed different distributions on different chromosomes. For example, most copies of chromosome 3, 4, 8, 11 and 17 were absent, while some copies of chromosome 1 and 3 were increased. The association distribution between the expression profile of lncRNAs and copy number demonstrated an overall positive related trend, and the distribution in the actual situation was significantly larger than that in the random case (p < 2.2e-16) (Fig. 5b). The frequently changing regions in the genome of cervical cancer patients were detected using GISTIC algorithm, and the data revealed many regions with significant copy number amplification or deletion of lncRNAs (Fig. 5c), suggesting that the abnormal copy number of lncRNAs may be related to the occurrence and development of cervical cancer. A total of 3 lncRNAs with a copy number ratio of more than 25% in each sample were identified, and their expression differences in copy number amplification/deletion and normal copied samples were analyzed. The data showed that the expressions of lncRNAs in the samples with copy amplification were significantly higher than that in the samples with normal copies (Fig. 5d), indicating that the abnormal expressions of lncRNAs were related to the abnormal copy number.
Identification of lncRNA prognostic markers with abnormal copy number in cervical cancer patients and establishment of a lncRNA signature
A total of 575 lncRNAs with CNV greater than 15% were selected. Univariate Cox analysis was performed to examine the relationship between these lncRNAs and OS, and we found that 41 lncRNAs significantly related to the cancer prognosis p < 0.01 (Additional file 3: Table S3). LASSO Cox regression analysis was further performed to analyze the expression profiles of these lncRNAs. The change trajectory of each independent variable (Fig. 6a) demonstrated that the number of independent variable coefficients close to 0 gradually increased with the gradual increase of lambda (Fig. 6b). The model was built using tenfold cross-validation, and the confidence interval under each lambda was analyzed. When lambda = 0.0285, we found that the model reached the highest performance, and there were 12 lncRNAs, which could serve as the potential prognostic markers. Furthermore, multivariate Cox survival analysis was performed, and 8 lncRNAs (Additional file 4: Table S4) with the lowest AIC value (AIC = 546.05) were obtained as the final prognostic markers to establish a risk regression model:
RiskScore = 0.486*expENSG00000225855 + 2.707*expENSG00000273125 + 0.573*expENSG00000249306−2.601*expENSG00000253490 + 0.096*expENSG00000130600 + 0.768*expENSG00000229373−3.111*expENSG00000260898 + 0.684*expENSG00000265096.
The relationship between risk score and lncRNA expressions was shown in Fig. 7c. As the risk score increased, the mortality rate of the samples also increased. 6 high-expressed lncRNAs associated with high risk score were a risk factors, while 2 low-expressed lncRNAs associated with high risk score were protective factors (Fig. 6c). One-year, three-year and five-year of ROC analysis and prediction showed that the model had a high AUC area, and they all had an AUC above 0.75 (Fig. 6d). Finally, Zscore of the risk score was calculated to divide the samples with a risk score higher than zero into high-risk group (N = 114) and with a score lower than zero into low-risk groups (N = 121). Interestingly, the samples in the high-risk group showed significantly worse prognosis than those in the low-risk group (p < 0.0001, HR = 3.406, 95% CI: 1.912–6.064) (Fig. 6e).
In addition, 50% of the 292 samples were randomly selected and repeated one thousand times. The model was applied to the prognostic prediction of these one thousand samples, and the prognostic significance p-values were calculated for each calculation (Additional file 5: Fig. S1B). The prognostic predictive power in these samples was observed to show significant differences in the prognostic significance p-values for each of the one thousand random samples. In addition, we also analyzed the correlation between 8 lncRNAs (Additional file 5: Fig. S1C), and only a few were significantly correlated, as expected there was no covariance among them.
Prognostic model validation and functional analysis of the 8-lncRNA model
To verify the performance of the 8 CNV-related lncRNAs in predicting the prognosis of cervical cancer, the risk scores of each sample in all TCGA data sets were calculated according to the expressions of the samples, and the predictive classification efficiency of 1-year, 3-year, and 5-year AUC was determined. The results showed that the model had a high AUC line area above 0.75 (Additional file 6: Fig. S2a). The samples were divided into high- and low-risk groups according to the threshold, and we found that the prognosis of the high-risk group was significantly worse than that of the low-risk group (p < 0.0001, HR = 3.133, 95%CI:1.854—5.291) (Additional file 6: Fig. S2a). To further verify the robustness of the model, the GSE44001 data of GPL14951 platform was downloaded. The average AUC of 1-year, 3-year, and 5-year ROC was all higher than 0.6 (Additional file 6: Fig. S2c). Moreover, we also assessed the prognosis of the samples in high-risk group and low-risk group, and observed that the prognosis of the high-risk group was significantly worse than that of the low-risk group (p = 0.036, HR = 1.957, 95%CI: 1.032–3.707) (Additional file 6: Fig. S2d). These data indicated that the performance model of 8-lncRNA signature model had a great robustness. To investigate the relationship between Riskscore and biological functions of different samples, risk score was correlated with the enrichment score of KEGG pathways in each sample. KEGG pathways with a correlation greater than 0.4 and FDR < 0.01 contained 20 pathways (Additional file 6: Fig. S2e). Interestingly, these 20 pathways were negatively correlated with Riskscore and mainly included T CELL RECEPTOR SIGNALING PATHWAY, B_CELL_RECEPTOR_SIGNALING_PATHWAY, CYTOSOLIC_DNA_SENSING_PATHWAY, CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION and some other immune-related pathways. The results suggested that the samples from the high-risk group and the low-risk group may have different immune microenvironments, and that these lncRNAs may be involved in tumor progression by affecting immune-related pathways.
Comparison of 8-lncRNA prognosis model with clinical features and the existing models
To identify the independence of the 8-lncRNA signature, the relationship between T, N, M stage, age, and risk score and prognosis was analyzed by Univariate and multivariate Cox regression analysis. Univariate Cox regression analysis showed that N stage, TNM stage and risk score were significantly related to survival (Additional file 7: Fig. S3a), However, multivariate Cox regression analysis found that only risk score (p < 0.0001, HR = 3.425, 95% CI: 1.922–6.104) was significantly correlated with prognosis (Fig. S3B). Thus, the data revealed that the 8-lncRNAs signature can serve as a prognostic predictor independent of clinical characteristics. Furthermore, we compared the 8-lncRNA signature with four recently reported prognostic-related risk models, namely, the 4-lncRNA signature by sun et al. [33], the 10-lncRNA signature by Shen et al. [34], the 9-lncRNA signature by Yu et al. [35], and the 6-lncRNA signature by Luo et al. [36]. In order to ensure the comparability of the models, according to the corresponding genes in these 4 models, the same method was used to calculate the risk score of each cervical cancer sample in the TCGA dataset and ROC of each model, and KM survival curve was plotted. Although the prognosis of the Risk-H and Risk-L group samples of the four models were significantly different, the AUC prediction accuracy of the four models was lower than that of our 8-lncRNA model (Additional file 7: Fig. S3C-F). The restricted mean survival of these four models was also compared with our 8-lncRNA model. We observed that the 8-lncRNA model was more accurate in predicting a longer follow-up time and the C-index was higher than the other four models (Additional file 7: Fig. S3g). Similarly, the DCA results showed that the risk score of the 8-lncRNA model developed in this study was far more indicative than the other four subtypes (Additional file 7: Fig. S3H). These results suggested that the 8-lncRNA signature is a new reliable prognostic marker independent of clinical stages.