Co-expression of cancer driver genes: IDH-wildtype glioblastoma-derived tumorspheres

Background Driver genes of GBM may be crucial for the onset of isocitrate dehydrogenase (IDH)-wildtype (WT) glioblastoma (GBM). However, it is still unknown whether the genes are expressed in the identical cluster of cells. Here, we have examined the gene expression patterns of GBM tissues and patient-derived tumorspheres (TSs) and aimed to find a progression-related gene. Methods We retrospectively collected primary IDH-WT GBM tissue samples (n = 58) and tumor-free cortical tissue samples (control, n = 20). TSs are isolated from the IDH-WT GBM tissue with B27 neurobasal medium. Associations among the driver genes were explored in the bulk tissue, bulk cell, and a single cell RNAsequencing techniques (scRNAseq) considering the alteration status of TP53, PTEN, EGFR, and TERT promoter as well as MGMT promoter methylation. Transcriptomic perturbation by temozolomide (TMZ) was examined in the two TSs. Results We comprehensively compared the gene expression of the known driver genes as well as MGMT, PTPRZ1, or IDH1. Bulk RNAseq databases of the primary GBM tissue revealed a significant association between TERT and TP53 (p < 0.001, R = 0.28) and its association increased in the recurrent tumor (p  < 0.001, R = 0.86). TSs reflected the tissue-level patterns of association between the two genes (p < 0.01, R = 0.59, n = 20). A scRNAseq data of a TS revealed the TERT and TP53 expressing cells are in a same single cell cluster. The driver-enriched cluster dominantly expressed the glioma-associated long noncoding RNAs. Most of the driver-associated genes were downregulated after TMZ except IGFBP5. Conclusions GBM tissue level expression patterns of EGFR, TERT, PTEN, IDH1, PTPRZ1, and MGMT are observed in the GBM TSs. The driver gene-associated cluster of the GBM single cells were enriched with the glioma-associated long noncoding RNAs.

Molecular subtypes have been proposed to account for the heterogeneity of GBM [1,5]. However, these subtypes are not used in clinical diagnosis because of their stochastic nature [6]. To increase the accuracy of diagnosis, clinical glioma classification has recently gravitated toward the analysis of mutations on the core driver genes, such as those of isocitrate dehydrogenase (IDH), epidermal growth factor receptor (EGFR), phosphatase and tensin homolog (PTEN), tumor protein p53 (TP53), telomerase reverse transcriptase (TERT) promoter, protein tyrosine phosphatase receptor type Z1 (PTPRZ1), as well as O-6-methylguanine-DNA methyltransferase (MGMT) promoter methylation [2,7].
IDH is the primary gene used to distinguish between primary and secondary GBM [8]. Their clinical incidence and molecular evidence suggest that these tumor types differ in their mutation and expression profiles [5,9,10]. However, despite its importance, approximately 80% of patients with GBM patients have IDH-wildtype (WT) tumors [11,12]. Furthermore, most of the established PDX models are from IDH-WT GBM which may suggest its importance in the survival of cells [13][14][15].
TERT activity is detected in up to 90% of human primary cancer [27]. The rate of TERT promoter mutations is reported as about 58-90% of IDH-WT GBM patients [27,28]. And 94% of GBM cells are reported to harbor TERT mutation [29]. However, TERT promoter mutation does not significantly affect the prognosis of GBM patients [28].
PTPRZ1 shows a relatively low rate of mutation in the GBM. Recently, this gene is being associated with the origin of glioma cells with the elevated expression in the GBM tissue as well as the subventricular zone [30,31]. As a marker of neuroglial origin, PTPRZ1 may add a bridge between the neurotransmitters, neurodevelopment, and tumor microtubes [10,30,32,33].
MGMT promoter methylation status is observed in the 50% of glioblastoma patients. Its promoter methylation status is correlated with the gene expression [34]. In GBM, unmethylated MGMT promoter status is associated with poor response to alkylating agents [35,36]. Temozolomide (TMZ) is the most important alkylating agent available in the GBM patients [37]. However, contrasting reports shows other mechanisms than MGMT promoter methylation may be involved in the MGMTdeficient GBM cells [38].
Here, a retrospective comparative analysis of RNAseq and single cell RNAseq data from IDH-WT GBM and GBM TS was conducted to find whether the TSs are representing the signatures of tumor tissue. Furthermore, we aimed to find the transcriptomic change after TMZ treatment in the in vitro level.

Clinical samples
IDH-WT GBM tissue samples were obtained from Brain cancer center, Severance hospital (n = 58, from 2016 to 2020, The patient samples were ethically approved by the institutional review board of Severance hospital). Tumorfree cortex samples for control were obtained when available during the resection of subcortical tumors, n = 24). All samples with associated DNA mutation profiles and tumor RNAseq data were included in this retrospective analysis. Samples without tumor mutation profiles were included for comparison. Clinical information, Mutation profiles, and MGMT promoter methylation status were obtained from the electronic medical record of the hospital. Detailed methods are described in each section. Mutation profiles were not evaluated for the healthy cortex controls, but were extrapolated from the results of the matched tumor tissues. Frozen tissue samples of RNAseq were processed in the (Theragen, Seongnam-si, Republic of Korea).

DNA pyrosequencing
All GBM tissue specimens were examined by modified pyrosequencing to evaluate MGMT promoter methylation status in the hospital setting [51]. DNA was extracted from diagnostic formalin-fixed, paraffin-embedded (FFPE) GBM samples using a Maxwell CSC DNA FFPE Kit (Promega, USA). The annealing temperature was 53 °C, and samples were analyzed on a Pyromark Q24 MDx System (Qiagen, Germany). To categorize tumors based on MGMT promoter methylation status, we used a threshold of < 8% for the average percentage of four CpG sites in exon 1 [51].

Mutation calling
Formalin-fixed, paraffin-embedded (FFPE) tissue blocks were sequenced with Trusight Tumor 170 panel (Illumina, United States) [52]. Maxwell CSC DNA FFPE Kit (Promega, United States) was used to prep for DNA/ RNA hybrid capture in the Nextseq 550 Dx (Illumina). Trusight Tumor 170 App Pipeline was used to analyze DNA small variants with Homo sapiens hg19 genome as the reference (Homo sapiens, UCSC). Exonic mutations that passed Illumina QC filter were included. Mutations less than 100 depth or less than 3% of variant allele frequency were excluded from the analysis.

Transcriptome data analysis
The samples of TSs for RNAseq were hybridized with All Human V6 + UTR baits (individual TS, n = 20; for TMZ treatment, triplicated TS13-64 and GSC11). All of the transcripts in this analysis were merged and labeled after same alignment and counting process. GSC11 and TS13-64 TS samples (with TMZ) were analyzed in the same manner. Gene expression level data were calculated by summing up the transcripts in the gene location (GRCh38.p5). Controversial transcripts were reconfirmed in the sequence level that is extracted from gffread (-w option) [53]. An unsupervised selection of the expressed genes (Coefficient of variation > 10) were included for the t-SNE analysis [54].

Single cell RNAsequencing
GBM-derived TS 13-64 maintained under spheroid cell culture condition with B27. Within 30 min before the single-cell RNAsequencing (scRNAseq), the cells were dissociated with accutase. The 10× Genomics Chromium platform was used to capture and barcode the cells to generate single-cell Gel Beads-in-Emulsion (GEMs) by following the manufacturer's protocol. scRNAseq expression data were analyzed with Seurat v2.3.4 (PCA, Cluster, t-SNE and cluster). In brief, the Seurat object was generated from digital gene expression matrices. To maintain the TERT positive cells, the filtering of the number of genes detected in each cell was not restricted.
The percent of mitochondrial genes were not restricted in our analysis. Normalized scaled data was found to have two distinct clusters. Shared nearest neighbor (SNN) modularity optimization-based clustering algorithm revealed two to seven clusters depending on the resolution variable (from 0.01 to 0.5). Receiver operating characteristic (ROC) was used for the identification of differentially expressed genes for each class with log fold change 0.25. We examined the area under the ROC curve (myAUC) with two and three cluster models.

Gene set enrichment analysis (GSEA)
Genes used in enrichment analysis were selected based on their coefficients of variation [the variance divided by the mean across the comparison group (n > 1) and mean expression (> 5 fragments per kilobase of transcript per million mapped reads (FPKM)] at the transcript level. Statistically significant genes were included in GSEA using the Reactome and KEGG database, with a significance threshold of p < 0.01 [55,56]. Pathway significance was calculated as the − log10 of the analysis p-value.

Validation sets
Gene expression level data of TCGA GBM was collected from Xena browser (University of California, United States) [57]. Survival data was gathered from the TCGA GBM, which was processed by GEPIA homepage [58]. Long non-coding RNA list of cancer was obtained from the Gold lab homepage [59].

Statistical analysis
For the group comparison in the Table 1, we used Pearson's Chi-squared test with Yates' continuity correction. Wilcoxon, Kruskal-Wallis, and Student's t-tests were used for intergroup comparisons of gene expression levels. For the scatter plot, Pearson correlations were calculated for individual groups using the ggpubr package in R (v. 0.4.0). For cell data, p < 0.05 was regarded as significant (by t-test). In two-group comparisons, genes with p < 0.0001 by Student's t-test were regarded as significant and included in the heatmaps.

Data availability
The tumor tissue and TS datasets (Severance cohort) analyzed during the current study will be published in the Arrayexpress and GEO databases. The cancer genome atlas (TCGA) data from the gene expression profiling interactive analysis (GEPIA, v. 1) and cBioportal databases were included after data analysis to validate gene correlation [58,61].

TS isolation from the IDH-WT GBM tissues
In this bioinformatics analysis, each tumor tissue was non-selectively cultured to establish GBM TSs for RNAseq (Table 1) [44]. We found that TP53-mutant IDH-WT GBM comprises 48% of samples (28/58) which is relatively consistent with the reports [16].
Severance cohort of TSs revealed TP53 mutation status may be associated with the isolated TS ( Table 1). Most of the isolated TSs are TP53 mutant (80%, 16/20, 3 sample excluded for the absence of next-generation sequencing data). The frequency of the TP53 mutants is different from that of GBM tissue (p = 0.027, Table 1). Other variables show no difference between the tissue and TSs (Table 1). TERT mutant TSs were found in the 90% of samples, however its composition ratio was consistent with a literature [29].

Gene level analysis of GBM TSs
From this finding, we examined the GBM tissue and TSs by these molecular markers: TERT-TP53 correlation was found in the Severance GBM database (Fig. 1). RNAseq revealed TERT and TP53 may be associated regardless of TP53 mutation status (Fig. 1a). Even though, TERT is overexpressed in the GBM tumor and TERT promoter mutated samples (Fig. 1b), these two gene expression levels were more associated in the recurrent GBM (Fig. 1c). GBM TSs also showed stronger association, especially in the TP53 mutant TSs (Fig. 1d). Single cell RNAseq revealed these two genes, as well as other known driver genes, are overexpressed in a single cluster (Fig. 1e).
We examined the IDH-WT GBM tissues and GBM TSs, whether these samples are associated by other factors (Figs. 2, 3). We found TP53, EGFR, IDH1, PTPRZ1, and TERT are significantly overexpressed in the tumor tissue than the control (Fig. 2). However, these gene expression levels were not different by the TP53 mutation status in the tissue (Fig. 2a). GBM TSs showed EGFR, PTEN, IDH1, PTPRZ1 were overexpressed in the TSs than the normal human astrocytes (NHAs). However, TP53 gene was not showing elevated trend than the NHAs (Fig. 2b).
Downregulated trend of PTEN expression in the GBM tissue than the cortex (Fig. 2a) is reflected in the GSC11 and GBM TSs (Fig. 2b). As the BAX, CDKN1A, and MIR34AHG was associated to be elevated in the TP53 mutation status [21,62,63]: BAX, CDKN1A, and MIR34AHG are overexpressed in the IDH-WT GBM tissue than the cortex. However, there was no trend in the TSs by the TP53 mutation status (Additional file 1).
There were seven matching samples of GBM tissues and TSs (Fig. 3). We evaluated the gene expressions of TP53, TERT, MGMT and PTPRZ1 by the molecular markers: TP53 mutation status was not associated with the conservation of the gene expression levels (Fig. 3a). TERT promoter mutation was associated with the higher expression of TERT and TP53 gene in the tissue and TSs (Fig. 3b). MGMT promoter methylation status was associated with the MGMT gene expression (Fig. 3c).

Transcriptomic level analysis of GBM TSs
We examined whether the TP53 mutant TSs are different from the TP53 WT TSs (Fig. 4). Unsupervised gene variability-based t-SNE showed no significant difference by the TP53 mutation status (Fig. 4a). There was no definite difference by other molecular markers (Additional file 2). In our TSs, TP53 mutant TSs were relatively more heterogeneous than the TP53 WT TSs (Fig. 4c). Combining the results of DEG and GSVA, we found the ECM-related signatures came from the mesenchymal subtype of GBM TSs ( Fig. 4b-d, Additional file 3). Additionally, we found no definite association of the gene sets of invasion (or EMT, detail in the method) and glioma neurosphere with the TP53 mutation status of GBM TSs [64,65]. EMT genes were not definitely associated with TP53 mutation status (Additional file 4). Majority of the TS followed the glioma cell expression patterns: Glioma sphere downregulated genes (77%, 17/22, Additional file 5a) and upregulated genes (54%, 12/22, Additional file 5b) [65].

Single cell level RNAseq analysis of GBM TS
One GBM TS  was selected for the single cell RNAsequencing (Figs. 1e, 5). The TS was derived from a 56-year-old female patient with no medical history except a carrier status of hepatitis B virus. Her chief complaint was weakness on the left arm and leg for 2 weeks. Magnetic resonance imaging (MRI) found an invasive phenotype on the MRI with high gadoliniumenhancing mass (Right parietal lesion, 4.93 cm in diameter) surrounded by extensive T2 FLAIR high density [1]. Pathology confirmed the diagnosis of GBM. The initial molecular phenotype of this tumor was IDHwildtype, MGMT promoter unmethylated status, and 1p intact/19q intact. UMAP clustering showed two distinct clusters. DEGs from three cluster showed a cluster was enriched with driver genes and long noncoding RNAs (Fig. 5). Interestingly, in a non-mixed immortal TS line (TS13-64), a driver genes-enriched cluster occupied small number of cells and larger portion was not expressing the driver-associated genes (Fig. 1e,  Additional file 6).
Most of the DEGs are not exclusively expressed in a single cluster (Fig. 5c-e). Single cell level correlation between MALAT1 and NEAT1 was found in the three clusters (Fig. 5f ).

Transcriptomic change after TMZ treatment regardless of TP53 mutation status
We examined whether TMZ can change the gene expression pattern of GBM TS (13-64) and GSC11 (Fig. 6a). We included a driver gene expression matched GSC11 as control. A difference of GSC11 and TS13-64 was TP53 mutation status (Additional file 7). Same amount of TMZ on the same number of cells showed an elevated stress-associated response with CDKN1A and downregulation of KIF20A gene expression (Fig. 6a, Additional file 8) [66]. Gene level downregulation was found in the multiple driver-associated genes and the DEGs from the single cell RNAseq of TS13-64 (Fig. 6b). TS13-64 was not distinct from other GBM TSs in the base expression level (Fig. 6c). In addition to the downregulated genes (Fig. 6d), there were relatively stable genes (Fig. 6e) and upregulated genes such as IGFBP5 (Fig. 6). In our study, TMZ treatment not definitely changed the level of NEAT1. Among the genes of Fig. 6b, NEAT1 was found to be associated with progression-free survival (PFS) in the GBM database (Fig. 6f ). Furthermore, the median level of NEAT1 was associated with poor overall survival and poor PFS in the lower grade gliomas (Additional file 9). NEAT1 expression was elevated in the TP53 wildtype GBM tissues of Severance cohort, but other molecular markers were not associated with the gene level (Additional file 10).

Discussion
TP53 mutation is one of the most common alterations across tumor types [22], and is observed in the early stages of GBM, along with changes in related pathways [17,67,68]. However, its association with GBM TS isolation rate was not reported yet [1,41,44,47,[69][70][71]. In this retrospective analysis, we found that TP53 mutants were more amenable to isolation from tissue (Table 1). RNAseq data shows that TP53 mutants are overexpressing ECM related genes with more mesenchymal subtypes (Fig. 4). CCLE database shows, no tendency by the TP53 mutation status suggesting this finding may be examined in the prospective study (Additional file 11). The details of the DEGs for Fig. 4 are included in Additional file 12.
Using IDH-WT GBM-derived TSs, we found a positive association between the levels of TERT and TP53.
Furthermore, GBM tissues also displayed this association ( Fig. 1) [58,61], and other publicly available data suggest that TERT and TP53 are associated in other tumor types and normal brain tissue, such as lower grade glioma, head and neck cancer, and acute myeloid leukemia [58]. However, not all cancer types exhibit this association (for example, urothelial bladder carcinoma) [58]. In retrospective analysis, our TSs are biased to the TP53 mutants, and the correlation of TERT and TP53 may be examined in the balanced dataset (Fig. 1d). In an attempt to examine the response of TSs to TMZ, we planned a comparison of two cells which are only different by the TP53 mutation status (TS13-64, TP53 mutant; GSC11, TP53 wildtype; Additional file 13). The expression pattern and response to TMZ of IGFBP5, which is commonly upregulated in GBM than cortical tissues, may need attention as it has been studied as one of important factors of GBM and gliomas (Fig. 6b). About the association between TP53 and NEAT1 (Additional file 10), we need more evidences whether they are correlated in biological manner. TP53 mutations are known to have gain-of-function effects in GBM cells [18], and the abundance of TP53 mutant TSs in this TS RNAseq data may be associated to a survival benefit to the TSs. However, we emphasize the overrepresentation of TP53 mutant TSs than the WT TSs does not provide a direct evidence of the gain-of-function effect of the mutation.
Even with these limitations, our study indicates a clue to approach the in vitro models of glioma with the expression pattern of driver genes (including the MGMT promoter methylation status, Additional file 14): not all cells in a glioma TS are directly associated with the driver-associated genes (Fig. 5a). The culture of glioblastoma sphere cell (GSC) became a well-established laboratory technique [72][73][74]. For example, GSC11 was established from the fresh surgically operated GBM tissue, and are used for drug screening or transcriptome analysis [49,75,76]. These cells become necrotic in the orthotopic models and organoid models [77,78]. Established TSs was believed to have stemness, the potential to form orthotopic tumors, and their characteristics do not change with repeated subculture [39-43, 45, 47, 69]. These spheroid cultures of tumor cells however, was not a single homogeneous group of cells (Fig. 5a) [42,45,79]. Furthermore, the result of GSEA should not be regarded as the best representation of a transcriptomic status (Additional file 15 of KEGG database shows different pattern with the same list of genes from Fig. 4b).
Finally, we rediscovered NEAT1 and other long noncoding RNAs (LncRNAs) that are important in the cancer biology as well as in GBM cell growth and invasion [59,80]. Furthermore, NEAT1 distinguishes the survival both in the GBM and lower grade gliomas   (Additional file 9). Our data also shows NEAT1 is overexpressed in the driver-enriched cluster of a GBM TS (Fig. 4). We examined the value of myAUC in the three cluster model of Fig. 5a

Conclusion
We found that GBM TSs represent the tissue level gene expression patterns of EGFR, TERT, PTEN, IDH1, PTPRZ1, and MGMT. Single cell sequencing revealed these driver-associated genes are co-expressed with the cancer driver noncoding genes. Our data shows the association of the protein coding driver genes and the noncoding driver genes.

Supplementary Information
Supplementary information accompanies this paper at https ://doi. org/10.1186/s1296 7-020-02647 -8.  Additional file 3. Gene expression heatmap of the extracellular matrix related gene set (Related to the Fig. 4). This gene set was obtained from a TP53 mutant TS-related Reactome analysis of Fig. 4b.
Additional file 4. mSig DB genes of epithelial mesenchymal transition (Related to the Fig. 4). The criteria of selecting these genes are described in the additional method section.
Additional file 6. Violin plots of gene expression (Related to the Fig. 5