Dataset characteristics
A total of 31 studies satisfying the inclusion criteria and containing 1277 GBM, 427 A, 189 OD, and 150 NG samples were analyzed. The detailed characteristics of the four comparison groups (GBM vs. NG training set, GBM vs. NG validation set, GBM vs. A set, and GBM vs. OD set) are summarized in Table 1. Since GSE68928, GSE4271, and GSE4412 datasets have two different platforms, we devided each dataset into two datasets respectively.
Integrated analysis of nine training datasets identifies 322 DEGs in GBM vs. NG
We applied two meta-analysis methods to identify DEGs in GBM as described in “Materials and methods” section. By combining p-values, 437 DEGs were detected using Fisher’s sum of logs method (Fig. 2a, b). To further refine the list of DEGs in GBM, we conducted a random effects model to estimate the differences in gene expression across all datasets by combining the individual effect sizes into a meta-effect size [38, 39].
This method identified 393 DEGs at an FDR threshold of 1 × 10−19. Finally, the DEGs obtained from both methods were plotted in a Venn diagram, revealing 322 overlapping DEGs when combining both p-values and effect sizes. This overlapped group contained genes that not only had an overall large effect size across all datasets but also were significantly differentially expressed. To reveal biological functions differentially regulated in GBM, all 322 DEGs were analyzed by using Enrichr. Results for enriched biological pathways and gene ontology are shown in Additional file 1: Table S3. Pathways in cancer, MAPK signaling pathway, Wnt signaling pathway, apoptosis, and cell cycle were enriched pathway terms in GBM vs. NG.
Integrated analysis of the training and validation datasets identifies 33 DEGs in GBM vs. NG
A total of 471 DEGs were detected by combining p-values using Fisher’s sum of logs method (Additional file 2: Figure S1A). By using a random effects model to combine effect sizes, 285 DEGs were identified. Finally, the DEGs obtained from both methods were plotted in a Venn diagram, revealing 188 overlapping DEG when combining p-values and effect sizes.
In our study, we adopted the training-validation approach [10], using the larger dataset (nine datasets) as the training set and the smaller dataset (seven datasets) as the validation set. Venn diagram analysis showed that 33 DEGs, including 28 upregulated DEGs and 5 downregulated DEGs, were significantly expressed in both the training and validation datasets.
However, because only GBM samples were included in this meta-analysis, these 33 genes possibly also abnormally expressed in other subtypes of glioma relative to their expression in normal tissue. Therefore, we sought to determine whether this 33-DEG signature was specific for GBM or whether this set of genes was also significantly differentially expressed in other glioma subtypes.
Integrated analysis of GBM vs. NG, GBM vs. A, and GBM vs. OD sets identifies 8 DEGs significantly and specifically expressed in GBM
We analyzed additional validation datasets, including 21 datasets containing a total of 852 GBM and 427 A samples (Additional file 2: Figure S1B). Finally, we found that a total of 920 genes were significantly differentially expressed as assessed by Fisher’s sum of logs method (FDR < 0.05).
We analyzed additional validation datasets, including 12 datasets containing a total of 590 GBM and 189 OD samples (Additional file 2: Figure S1C). Finally, we found that a total of 482 genes were significantly differentially expressed as assessed by Fisher’s sum of logs method (FDR < 0.05).
In summary, the Venn diagrams of the three integrated analyses revealed eight genes that are significantly and consistently differentially expressed in GBM (six upregulated and two downregulated) and could distinguish GBM samples from NG tissues or tissues of other glioma subtypes (Fig. 2c, Additional file 1: Table S4). Thus, these genes may be potential diagnostic and therapeutic targets in GBM.
Validation in the TCGA-GBMLGG and CGGA cohorts
We further validated the eight DEGs in the TCGA-GBMLGG (158 GBM, 193 A, 188 OD, and 5 NG) and CGGA cohorts (89 GBM, 66 A, 28 OD, and 5 NG). In these two validation cohorts, all eight DEGs, including six upregulated and two downregulated genes, were significantly differentially expressed in GBM vs. NG (Fig. 3, Additional file 1: Tables S5 and S6). BRCA1 associated RING domain 1 (BARD1), CBX3, cathepsin S (CTSS), interferon-related developmental regulator 1 (IFRD1), signal transducer and activator of transcription 1 (STAT1), and myelin-associated oligodendrocytic basic protein (MOBP) were significantly and specifically differentially expressed in GBM.
Further ROC curve analysis based on the upregulated DEGs in the TCGA-GBMLGG cohort revealed that BARD1, CBX3, CTSS, IFRD1, and STAT1 had high accuracy in differentiating GBM samples from NG, A, and OD samples. In the CGGA cohort, BARD1, CBX3, CTSS, IFRD1, and STAT1 were similarly shown to have high accuracy in differentiating GBM samples from NG, A, and OD samples (Additional file 3: Figure S2).
Survival analysis
Further, to investigate the clinical relevance of CBX3 expression in glioma, Kaplan–Meier analysis was conducted to explore whether these eight genes play roles in the survival of patients with glioma. In the TCGA-GBMLGG cohort, the results showed that patients with high CBX3, BARD1, CTSS, epidermal growth factor receptor (EGFR), IFRD1, or STAT1 expression had significantly shorter survival but patients with high guanylate cyclase 1 soluble subunit alpha 3 (GUCY1A3) or MOBP expression had significantly longer survival than patients with lower expression of these genes. In the CGGA cohort, the results showed that patients with high CBX3, BARD1, EGFR, or IFRD1 expression had significantly shorter survival but patients with high GUCY1A3 or MOBP expression had significantly longer survival than patients with lower expression of these genes (Fig. 4). After excluding G-CIMP positive patients in TCGA-GBMLGG, survival analysis showed that patients with high CBX3, BARD1, CTSS, STAT1, or IFRD1 expression had significantly shorter survival but patients with high GUCY1A3 or MOBP expression had significantly longer survival than patients with lower expression of these genes we conducted Kaplan–Meier analysis (Additional file 4: Figure S3).
CBX3 overexpression in human GBM samples was shown by immunohistochemical staining and qPCR
Among these eight DEGs, CBX3 has been revealed to be linked with cancers; however, the precise role of CBX3 in glioma remains unclear, so we focused further investigation on CBX3. CBX3 expression was assessed by immunohistochemical staining. Two tissue microarrays consisting of 60 GBM, 36 A, and 10 NG samples were used to determine CBX3 expression. Immunohistochemical analysis revealed that CBX3 expression was significantly upregulated in the nuclei in GBM and A tissues compared with that in normal brain tissue. However, no significant difference was observed between GBM and A samples, possibly due to the limited sample size. The samples were divided into three groups according to the CBX3 expression score, representing low (0–7), medium (8–10), and high (11, 12) expression levels of CBX3 (Fig. 5a, b).
Furthermore, the expression of CBX3 mRNA in surgical specimens was detected by qPCR. The qPCR results in the surgical specimens showed that CBX3 mRNA expression was highest in GBM tissue and that CBX3 mRNA expression in GBM tissue was different from that in A or NG brain tissue. However, no statistically significant difference was observed between GBM and OD tissues (Fig. 5c).
CBX3 mRNA expression was evaluated by qPCR in glioblastoma cell lines, including SF268, U373, U251, U87MG, U118, and A172 cells (Fig. 6a). Among these cell lines, U373 cells displayed the highest expression of CBX3 and were thus selected for the following studies.
Knockdown of CBX3 inhibits U373 cell growth
To explore the role of CBX3 in glioblastoma, either siRNA targeting CBX3 or nonsilencing RNA sequences were transfected into U373 cells. Approximately 72 h after virus transfection, 90% of the U373 cells exhibited red fluorescence under fluorescence microscopy, indicating CBX3 expression. qPCR analysis showed that the expression of CBX3 mRNA was reduced by approximately 77% in the shCBX3-1 group compared with that in the shCtrl group (Fig. 6b). Moreover, Western blot analysis suggested that the expression of the CBX3 protein was downregulated in the shCBX3-1 group compared to that in cells transfected with the control lentivirus (Fig. 6c).
To determine the effects of CBX3 on glioblastoma cell growth, we monitored proliferation using the CCK8 assay. The proliferation rate of U373 cells transfected with shCBX3-1 was markedly lower than that of cells transfected with shCtrl (Fig. 6d).
Knockdown of CBX3 induced G2/M cell cycle arrest in U373 cells
The cell cycle distribution in cells infected with either shCBX3 or shCtrl lentivirus was explored in an attempt to explain the CBX3-mediated suppression of proliferation. The number of CBX3 knockdown U373 cells in the G0/G1 phase was significantly lower than the number of control cells in the G0/G1 phase, while the number of CBX3 knockdown U373 cells in the G2/M phase was markedly higher than the corresponding number of control cells. Thus, U373 cells exhibited G2/M cell cycle arrest after transfection with shCBX3-1 (Fig. 6e).