mRNAsi and corrected mRNAsi according to clinical characteristics of BRCA
mRNAsi is a novel stemness index for evaluating the dedifferentiation potential of tumour cells and is thus considered a marker of CSCs. In our study, we found that the mRNAsi in BRCA tissues was significantly higher than that in normal tissues (Fig. 1a). The mRNAsi among different stages of BRCA showed obvious differences, and mRNAsi scores represented a gradually increasing trend with more aggressive clinical traits and tumour stages (Fig. 1b, c). The mRNAsi index was reported to be derived from normal cells and cells with different degrees of stemness via the OCLR algorithm and calculated in TCGA transcriptomic datasets [9]. Tumour tissues were composed of thousands of different cells, including tumour cells and other types of cells, such as stromal and immune cells, which remind us that tumour purity was likely to be an interference factor affecting the evaluation of mRNAsi in clinical characteristics. Given the presence of tumour-associated cells and normal cells in transcriptomic studies, a previous study reported the ESTIMATE method to evaluate tumour purity; thus, the ESTIMATE score of BRCA in TCGA can be obtained [13]. The corrected mRNAsi (mRNAsi/tumour purity), calculated in the same way as a previous report, was used to remedy the influence of tumour purity in our study [14]. We re-analysed the corrected mRNAsi in normal and BRCA tissues and gained a similar result as that with mRNAsi: BRCA tissues had an evidently higher stemness index even after purity correction (Fig. 1d). In addition, the correlation between clinical characteristics and the corrected mRNAsi index was also similar to the correlation with the mRNAsi index (Fig. 1e, f). In the survival analysis, we observed a tendency, without statistical significance, that patients with a higher mRNAsi index had relatively poorer overall survival than those with a lower mRNAsi index (Fig. 1g). Nevertheless, patients with higher corrected mRNAsi values had an apparently poorer survival probability compared with patients who had lower corrected mRNAsi values, and the difference was statistically significant (Fig. 1h). These above results reflected that mRNAsi or corrected mRNAsi in BRCA were closely related to CSCs in terms of clinical characteristics; these results were particularly apparent with the corrected mRNAsi, which may indicate a more accurate relation between CSCs and BRCA.
DEGs between BRCA tissues and normal tissues
The above results revealed that mRNAsi scores in BRCA tissues were higher than those in normal tissues; thus, we thought that there were some differentially expressed key genes that regulated the stemness of tumour cells. To identify these genes, we first downloaded the RNA-Seq data from the public TCGA database and screened the DEGs between BRCA tissues and normal tissues. We identified 4575 DEGs, of which 2698 were upregulated and 1877 were downregulated (Fig. 2a and Additional file 1: Table S1).
WGCNA: identification of mRNAsi-related modules and genes
WGCNA was applied to construct a DEG co-expression network to classify all DEGs into biological gene modules relying on average linkage hierarchical clustering and to further identify genes strongly associated with BRCA stemness. In this study, we selected β = 7 (scale-free R2 = 0.95) as a soft threshold to establish the scale-free network (Additional file 2: Figure S1) and finally obtained 10 gene modules for the next analysis (Fig. 2b, c). To explore the relationship between the gene module and mRNAsi scores, we defined MS as the overall gene expression level of a certain module for the subsequent analysis. As shown in Fig. 2b, the first row of each module was the R2 value, which indicated the degree of correlation between gene expression and BRCA stemness; the closer the R2 was to 1, the stronger the correlation was. The second row of each module was the p value, and p < 0.01 was considered statistically significant. Here, we simultaneously analysed the correlation between gene expression and mRNAsi or EREG-mRNAsi. We noticed that there were three modules, including the turquoise module (R2 = 0.79, p = 8.0e−226) (Fig. 2d), green module (R2 = − 0.67, p = 6.0e−141) (Fig. 2e) and brown module (R2 = − 0.69, p = 3.0e−153) (Fig. 2f), which showed very high R2 values and quite low p values; among these modules, the turquoise module reflected a positive correlation between gene expression and stemness characteristics, while the other two modules showed a negative correlation (Fig. 2b). Then, we selected the turquoise module to screen key genes in the mRNAsi group, and the selection criteria were defined as cor.MM > 0.8 and cor.GS > 0.5. Finally, we screened 32 key genes containing TPX2, HJURP, CDCA8, PLK1, KIFC1, CENPA, CCNB2, KIF2C, EXO1, TTK, KIF4A, CDC25A, MELK, NDC80, NCAPG, CEP55, NCAPH, RAD54L, KIF20A, KIF18B, ORC1, CDC45, KIF23, CDC20, BUB1, AURKB, SKA1, FOXM1, SGO1, DLGAP5, CDCA3, and BUB1B. We extracted the concrete expression value of each key gene and drew the heatmap and box plot, and the results reflected that those key genes were indeed evidently overexpressed in BRCA tissues (Fig. 3a, b).
Gene function annotation and pathway analysis
GO and KEGG analyses were employed to perform the functional enrichment analysis of significant modules and key genes. The results revealed that the major biological processes of the turquoise module were organelle fission, nuclear division and chromosome segregation, while the biological functions of the green and brown modules were enriched in extracellular structural organization and ameboid-type cell migration, respectively. In terms of signalling pathway enrichment analysis, we found those modules mainly focused on the cell cycle and PI3K-AKT signalling pathway (Additional file 3: Figure S2). These analyses also showed that the main functions of key genes were chromosome segregation, mitotic nuclear division and microtubule cytoskeleton organization, which are mostly related to the cell cycle pathway (Fig. 4a, b).
Correlation between key genes at transcription and protein levels
We identified the mutual correlation between key genes and their protein products using Pearson correlation and the STRING online tool. As shown in Fig. 5, we found strong and statistically significant correlations between key genes (p < 0.01). The relationship between KIF2C and CDCA8 had the highest correlation score of 0.86, and the lowest correlation score of 0.58 was for the relationship between BUB1 and AURKB or SGO1 and CDC20 (Fig. 5). The protein interaction relationships between key genes were analysed using STRING, and a wide-ranging and strong relationship between key genes was shown (Fig. 6a). We analysed the edge number of each node gene in the PPI network, and the results demonstrated an almost equal edge number for each gene, which indicated that those key genes composed a quite dense interaction network (Fig. 6b).
Validation and analysis of key genes expression
To systematically understand the expression levels of these key genes, we used two databases, Oncomine and GEPIA, to analyse their expression in multiple cancer types. Through the results of analysis using Oncomine, we found that all key genes were obviously upregulated in more than one cancer type in addition to BRCA, and all of them were ranked in the top 10% of DEGs that were ranked with a relatively high number of datasets (Fig. 7a). Meanwhile, we utilized the other online database GEPIA to verify their expression and acquired a similar result, as shown in Fig. 7b. For the expression verification of key genes in BRCA, we chose to examine the GEO data. We first confirmed the expression of key genes between tumour tissues (54 samples) and normal tissues (10 samples) in GSE29431, and all the genes were upregulated (Fig. 8a). Breast cancer is a heterogeneous neoplasm, and distinct subtypes of breast cancer contribute to different clinical outcomes; thus, we evaluated the expression of key genes in different subtypes, including luminal A, luminal B and triple negative breast cancer (TNBC), in BRCA. As shown in Fig. 8b, all 31 key genes (no expression data of SGO1 in GEO) were apparently highly expressed in TNBC compared with normal tissues. Moreover, we found 29 key genes, not including CENPA, ORC1 and RAD54L, that had higher expression levels in luminal B BRCA than that in luminal A BRCA (Fig. 8c). This result demonstrated that these key genes indeed participated in BRCA stemness maintenance. We further analysed the expression of key genes in two cell types containing stromal cells and epithelial cells in BRCA tissues, and the results revealed that only 8 genes (AURKB, BUB1, CENPA, KIF4A, KIFC1, NCAPG, PLK1, and RAD54L) showed distinct expression levels between these two cell types in BRCA tissues (Fig. 8d). To investigate the role of those key genes in BRCA, we performed survival analysis of these genes using the Kaplan–Meier plotter online tool, and the results indicated that 12 genes of the 32 key genes had an effect on the prognosis of patients with BRCA (Fig. 9).