Identification of key DNA methylation-driven genes in prostate adenocarcinoma: an integrative analysis of TCGA methylation data

Background Prostate cancer (PCa) remains the second leading cause of deaths due to cancer in the United States in men. The aim of this study was to perform an integrative epigenetic analysis of prostate adenocarcinoma to explore the epigenetic abnormalities involved in the development and progression of prostate adenocarcinoma. The key DNA methylation-driven genes were also identified. Methods Methylation and RNA-seq data were downloaded for The Cancer Genome Atlas (TCGA). Methylation and gene expression data from TCGA were incorporated and analyzed using MethylMix package. Methylation data from the Gene Expression Omnibus (GEO) were assessed by R package limma to obtain differentially methylated genes. Pathway analysis was performed on genes identified by MethylMix criteria using ConsensusPathDB. Gene Ontology (GO) term enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were also applied for the identification of pathways in which DNA methylation-driven genes significantly enriched. The protein–protein interaction (PPI) network and module analysis in Cytoscape software were used to find the hub genes. Two methylation profile (GSE112047 and GSE76938) datasets were utilized to validate screened hub genes. Immunohistochemistry of these hub genes were evaluated by the Human Protein Atlas. Results A total of 553 samples in TCGA database, 32 samples in GSE112047 and 136 samples in GSE76938 were included in this study. There were a total of 266 differentially methylated genes were identified by MethylMix. Plus, a total of 369 differentially methylated genes and 594 differentially methylated genes were identified by the R package limma in GSE112047 and GSE76938, respectively. GO term enrichment analysis suggested that DNA methylation-driven genes significantly enriched in oxidation–reduction process, extracellular exosome, electron carrier activity, response to reactive oxygen species, and aldehyde dehydrogenase [NAD(P)+] activity. KEGG pathway analysis found DNA methylation-driven genes significantly enriched in five pathways including drug metabolism—cytochrome P450, phenylalanine metabolism, histidine metabolism, glutathione metabolism, and tyrosine metabolism. The validated hub genes were MAOB and RTP4. Conclusions Methylated hub genes, including MAOB and RTP4, can be regarded as novel biomarkers for accurate PCa diagnosis and treatment. Further studies are needed to draw more attention to the roles of these hub genes in the occurrence and development of PCa.


Background
Prostate cancer (PCa) remains the second leading cause of deaths due to cancer in the United States in men [1]. Siegel et al. [2] demonstrated that there will have 164,690 newly diagnosed PCa patients and 29,430 deaths in 2018 in the United States. Thus, the diagnosis of PCa in early stage is vitally important [3]. Currently, the serum prostate-specific antigen (PSA) screening remains the primary way for early diagnosis of PCa. However, the sensitivity and specificity of PSA test remains low [4]. Therefore, it is important to find notable biomarkers for the diagnosis of PCa.
Previous studies [5][6][7] have shown that DNA methylation plays a crucial role in the development and progression of prostate cancer. DNA methylation is treated as a promising investigative tool for the study of progressive prostate cancer because that DNA methylation is a reversible progress [8]. High-throughput screening has been widely used to identify the DNA methylation involved in the initiation and progression of the prostate cancer [9][10][11]. Epigenetic modifications are crucial for diagnosis and prognosis of prostate cancer and proving additional options for prostate cancer diagnosis and treatment strategies [12,13].
In this study, prostate cancer associated DNA methylation-driven genes between cancerous and normal samples were identified and GO term enrichment analysis, KEGG pathway analysis, PPI network analysis were also performed respectively.

Identification of DNA methylation-driven genes
Clinical data of prostate cancer patients extracted from TCGA were demonstrated in Table 1. A total of 266 genes were differentially methylated when comparing tumor to normal by MethylMix criteria for all 553 samples. Representative differential methylation of tumor samples compared with normal samples was demonstrated in Fig. 1 and Table 2. Of these genes, 209 genes (78.57%) were hypermethylated and the remainder of the 57 genes (21.43%) were hypomethylated.
The entire matrix of methylation values was evaluated. In this study, the correlation between gene expression and DNA methylation data was calculated and 266 gene expression were found to be negatively correlated with DNA methylation (Fig. 2). Correlation between genes expression and DNA methylation of top 10 hypermethylated genes (Fig. 3) and top hypomethylated genes (Fig. 4) was also demonstrated. A Heatmap of the methylation values of all patients was demonstrated in Fig. 5.
A total of 369 differentially methylated genes and 594 differentially methylated genes were identified by the R package limma in GSE112047 and GSE76938, respectively.
Gene ontology terms analysis of DNA methylation-driven genes obtained from TCGA database GO term enrichment analysis results varied from GO classification. As to biological process, the DNA methylation-driven genes enriched in oxidation-reduction process, response to reactive oxygen species, xenobiotic catabolic process, bone morphogenesis, hydrogen peroxide biosynthetic process, response to interferon-beta, negative regulation of ATPase activity, and limb morphogenesis. For cellular component, the DNA methylation-driven genes enriched in extracellular exosome. For molecular function, the DNA methylation-driven genes enriched in electron carrier activity, aldehyde dehydrogenase [NAD(P)+] activity, glutathione peroxidase activity, structural molecule activity, and glutathione binding ( Fig. 6 and Table 3).

KEGG pathway analysis of DNA methylation-driven genes obtained from TCGA
KEGG pathway analysis found five significantly enriched pathways. Seven DNA methylation-driven genes enriched in drug metabolism-cytochrome P450. Four DNA methylation-driven genes enriched in Phenylalanine metabolism. Four DNA methylation-driven genes enriched in Histidine metabolism. Five DNA methylation-driven genes enriched in Glutathione metabolism. Four DNA methylation-driven genes enriched in Tyrosine metabolism ( Fig. 7 and Table 4).  There hub genes were IFITM1,  RTP4, ACSF2, GSTM2, GSTM1, ACOX2, COL4A6,  ITGA2,   which might play important roles in DNA methylation in prostate cancer patients. CytoHubba was used to carry out the top 20 hub DNA methylation-driven genes in Cytoscape software (Fig. 9). Correlation between genes

Validation of candidate hub DNA methylation-driven genes by the Gene Expression Omnibus (GEO) database
To further investigate the candidate hub DNA methylation-driven genes, GEO database was used to validate these selected genes. Two methylation profile datasets (GSE76938 and GSE112047) were extracted from the GEO for the validation. We then overlapped the differentially methylated genes among 266 genes obtained from TCGA database (Additional file 1: Table S1), 369 genes obtained from GSE112047 (Additional file 2: Table S2), 594 genes obtained from GSE 76938 (Additional file 2: Table S2), and top 20 hub genes obtained from PPI network analysis and identified a common list of 6 methylated genes including AKR1B1, RTP4, MAOB, GSTM2, GPX3, and COL4A6. The outcome was demonstrated by a Venn diagram (Fig. 11).

Gene ontology terms analysis of DNA methylation-driven genes obtained from GEO database
GO term enrichment analysis was perform based on DNA methylation-driven genes obtained from GEO database. The results of GO analyses demonstrated that DNA methylation-driven genes in GSE112047 were  mainly enriched in regulation of cell differentiation, extracellular matrix, cell adhesion, muscle structure development, and neuron differentiation ( Fig. 12a and Table 5). DNA methylation-driven genes in GSE76938 were mainly enriched in extracellular structure organization, extracellular matrix, central nervous system neuron differentiation, regulation of neurotransmitter levels, and cell fate commitment ( Fig. 12b and Table 5).

KEGG pathway analysis of DNA methylation-driven genes obtained from GEO
KEGG enrichment analysis was perform based on DNA methylation-driven genes obtained from GEO database.
The results of KEGG analyses demonstrated that DNA methylation-driven genes in GSE112047 were mainly enriched in Dopaminergic synapse, Nicotine addiction, and Amphetamine addiction pathways ( Fig. 12c and Table 6). DNA methylation-driven genes in GSE76938 were mainly enriched in Nicotine addiction, Morphine  addiction, and Retrograde endocannabinoid signaling pathways ( Fig. 12d and Table 6).

Validation of MAOB and RTP4 expression in the Human Protein Atlas and TCGA
The results of stains on normal prostate tissues demonstrated that the MAOB (Fig. 13a-c, patient id 1938, 2053, and 2098) and RTP4 were highly expressed in normal prostate tissues. With respect to MAOB, the staining was not detected and the intensity was negative in low grade prostate (Patient id 3910) adenocarcinoma (Fig. 13d). Also, the staining was not detected and the intensity was negative in high grade prostate (Patient id 3561) adenocarcinoma (Fig. 13e). Plus, the results of stains on normal prostate tissues demonstrated that the RTP4 (Fig. 13f-h, patient id 1798, 1984, and 3497) were highly expressed in normal prostate tissues. In terms of RTP4, the staining was low and the intensity was weak in low grade prostate (Patient id 4525) adenocarcinoma (Fig. 13i). The staining was low and the intensity was weak in high grade prostate (Patient id 4347) adenocarcinoma (Fig. 13j). Methylation status and correlation between genes expression and DNA methylation of MAOB and RTP4 were demonstrated in Fig. 14a, b.

Discussion
Epigenetic changes and modifications are a crucial component of initiation and progression of tumorigenesis [14]. DNA methylation has been most studied and hypermethylaiton is associated with a silencing effect on tumor suppressor genes. Previous studies [15][16][17] reported that the modification of the methylation status of specific genes has been associated with worse prognosis, which demonstrated that the modification of epigenetics may be involved in the progression of tumorigenesis.  Previous studies reveals that DNA methylation plays an important role in the development of prostate cancer and associates with adverse clinical outcomes. The methylation state of promoters of specific genes were also involved in the development of prostate cancer [11,18,19]. Larsen et al. [20] reported that DNA-methylation analysis of urine cells captured by microfiltration provides a novel tool for noninvasive detection of high-grade prostate cancer.
The novel genes involved in the epigenetic regulation of prostate cancer could be identified by using highthroughput arrays [7,9,18]. We aimed to elucidate the role and importance of DNA methylation in prostate cancer by analyzing The Cancer Genome Atlas Project. In this study, a model-based tool (MethyMix) was used to identify key genes with aberrant methylation and gene expression was linked to aberrant methylation. In this study, the top 3 hypermethylated genes were demonstrated including HIST1H2BH, FAM200A, and ZFP36L2 and the top 3 hypomethylated genes were also demonstrated including TAF1D, TMEM87A, and KLK2.
In terms of GSTM2, Angulo et al. [21] revealed that GSTM2 hypermethylation could be used to predict biochemical recurrence after radical prostatectomy, which suggested that epigenetic silencing of GSTM2 played an important role in involving biochemical recurrence. Plus, Ashour et al. [22] demonstrated that  GSTM2 was hypermethylated in PCa and was simultaneously methylated in 40.9% if the PCa, which revealed that epigenetic silencing of GSTM2 is a common event in PCa. Maldonado et al. [23] revealed that GPX3 was aberrantly methylated and silenced in PCa tissues and had tumor suppressor activity in PCa cell lines. Plus, Lin et al. [24] revealed that GPX3, as a DNA methylationsilenced tumor suppressor gene, was hypermethylated  -c), the results demonstrated that the staining was not detected and the intensity was negative in low grade (d) and high grade (e) prostate adenocarcinoma, respectively. Also, in terms of RTP4, when compared with normal prostate tissues (f-h), the results demonstrated that the staining was low and the intensity was weak in low grade (i) and high grade (j) prostate adenocarcinoma Xu et al. J Transl Med (2019) 17:311 in PCa. Strand et al. [25] stated that hypermethylation of COL4A6 could be treated as a frequent target in PCa. Ikeda et al. [26] demonstrated that hypermethylation of COL4A5 was one of the events that was responsible for the development of colorectal cancer. With respect to AKR1B1, Theresa et al. [27] stated that AKR1B1 promoter methylation proved to be commonly methylated in breast cancer. Also, Wei et al. [28] demonstrated that AKR1B1 could be potential screening markers of colorectal carcinoma. However, the relationship between AKR1B1 methylation and PCa has not been elucidated yet. To the best of our knowledge, RTP4 and MAOB have not been reported to be associated with DNA methylation in the occurrence and development of cancer, which means RTP4 and MAOB could be regarded as potential targets for the new therapeutic managements in prostate cancer. The Human Protein Atlas validated the staining and intensity of RTP4 and MAOB in PCa tissue, the results were consistent with bioinformatic analysis that MAOB and RTP4 may be novel biomarkers in PCa.
In this study, GO term enrichment analysis showed that extracellular exosome was associated with DNA methylation in prostate cancer. Huang et al. [29] demonstrated that extracellular exosomes, especially plasma exosomes, containing miR-1290 and miR-375 may serves as promising prognostic biomarkers for castration-resistant prostate cancer (CRPC). Bryzgunova et al. [30] also demonstrated that extracellular vesicles founded from urine showed a high specificity and sensitivity in distinguishing prostate cancer patients from healthy donors. GO term enrichment analysis also revealed that oxidation-reduction process was also involved in DNA methylation of prostate cancer. Schlaepfer et al. [31] revealed that inhibition of lipid oxidation plays a crucial role in elevating glucose metabolism of PCa cells. Liu et al. [32] also demonstrated that oxidation-reduction reactions was associated with mitochondria and mitochondrial damage in DU145 and PC3 prostate cancer cell lines.
In this study, KEGG pathway enrichment analysis showed that cytochrome P450 pathway was involved in the methylation process in PCa. Chen et al. [33]  demonstrated that cytochrome P450 (CYP) enzymes including CYP2R1, CYP27B1 and CYP24A1 were involved in the development and progression of PCa and were treated as promising targets in the treatment of PCa. Mandić et al. [34] also revealed that CYP1A1 can bind to DNA and induced the carcinogenesis of prostate cancer via involving in the various endogenous and environmental reactive compounds. The phenylalanine metabolism was also found to be participant in the methylation process in PCa. Gueron et al. [35] reported that the concomitant resistance (CR) phenomenon occurs in murine prostate cancer accompanied with ROS-damaged phenylalanine and the CR phenomenon was reversed when the phenylalanine were injected into mice. Lapek et al. demonstrated that the histidine-phosphorylated proteins with diverse functions including metabolism, protein folding, and motility play important roles in the development and progression of PCa.

Conclusion
Methylated hub genes, including MAOB and RTP4, can be regarded as novel biomarkers for accurate PCa diagnosis and treatment. Further studies are needed to draw more attention to the roles of these hub genes in the occurrence and development of PCa.

Data source
Expressing profiles of gene-specific DNA methylation data and clinical information of patients with PCa were downloaded from The Cancer Genome Atlas (TCGA) database (https ://tcga-data.nci.nih.gov/tcga/) [36]. The clinicopathological information of the patients with prostate cancer were also extracted. Illumina Human Methylation 450 Beadchip (450 K array) was used to measure the DNA methylation data. A total of 482,421 CpG sites throughout the genome would be assessed [37]. We downloaded level 3 methylation data and level 3 RNA-seq data from the TCGA data portal by using the TCGA-Assembler [38]. Two gene methylation datasets (GSE112047 and GSE76938) were downloaded from the Gene Expression Omnibus (GEO, https ://www.ncbi. nlm.nih.gov/geo/). The gene methylation microarray datasets were composed of 16 and 63 normal prostate tissues, respectively, and 16 and 73 PCa tumor tissues, respectively (GPL13534 Illumina HumanMethylation450 BeadChip).

Identification of DNA methylation-driven genes
The expression and methylation were analyzed with R 3.4.4 software (https ://www.r-proje ct.org/). For the data from TCGA, the R package MethylMix was applied to perform an analysis integrating gene expression data and methylation data. MethylMix was designed to identify gene expression that were correlated with methylation events. A total of three parts of MethylMix analysis were described as previous studies [39,40]. For the data from GEO, the R package limma was applied to perform analysis to identify the DMEs. Also, VennDiagram package in R software was used to identify overlapping DNA methylation-driven genes.

Gene ontology analysis
Gene ontology analysis (GO) (http://david .abcc.ncifc rf.gov/) is used for annotating differentially methylated DNA methylation-driven genes [41]. The Database for Annotation, Visualization, and Integrated Discovery (DAVID) (http://david .abcc.ncifc rf.gov/) tool was used for obtaining the enriched GO terms of differentially methylated DNA methylation-driven genes based on the hypergeometric distribution to compute values, which was described as previous study [42]. FDR < 0.05 was set as the threshold value.

Pathway analysis
The gene lists found to be statistically significant by MethylMix were used to perform pathway analysis. The Kyoto Encyclopedia of Genes and Genomes (KEGG; http://www.kegg.jp/) was used to perform the pathway enrichment analysis. Furthermore, the Consensus-PathDB was also used to validate the enriched pathway results. ConsensusPathDB was used to performed pathway analysis as previous studies described [40,43]. The default settings were as follow: minimum overlap and p value cutoff of 0.01.

Integration of protein-protein interaction (PPI) network
Search Tool for the Retrieval of Interacting Genes (STRING) database (version 10.5) was used to evaluated the PPI information. Differentially methylated DNA methylation-driven genes were mapped to STRING to evaluate the interactive relationships of these genes. Cytoscape software (version 3.6.1) was used to constructed the PPI networks.

Validation of the candidate hub genes screened from TCGA in GEO database and the Human Protein Atlas
To confirm the candidate hub genes we obtained, two datasets (GSE76938 and GSE112047) was downloaded from GEO database to validate the methylation levels of these candidate hub genes. To confirm the expression of these hub genes in PCa tissues, immunohistochemistry