Methylation and transcriptome analysis reveal lung adenocarcinoma-specific diagnostic biomarkers

Background DNA methylation can regulate the role of long noncoding RNAs (lncRNAs) in the development of lung adenocarcinoma (LUAD). The present study aimed to identify methylation-driven lncRNAs and mRNAs as biomarkers in the prognosis of LUAD using bioinformatics analysis. Methods Differentially expressed RNAs were obtained using the edge R package from 535 LUAD tissues and 59 adjacent non-LUAD tissues. Differentially methylated genes were obtained using the limma R package from 475 LUAD tissues and 32 adjacent non-LUAD tissues. Methylation-driven mRNA and lncRNA were obtained using the MethylMix R package from 465 LUAD tissues with matched DNA methylation and RNA expression and 32 non-LUAD tissues with DNA methylation. Gene ontology and ConsensusPathDB pathway analysis were performed to identify functional enrichment of methylation-driven mRNAs. Univariate and multivariate Cox regression analyses were performed to identify the independent effect of each variable for predicting the prognosis of LUAD. Kaplan–Meier curve analysis of DNA methylation and gene expression might provide potential prognostic biomarkers for LUAD patients. Results A total of 99 methylation-driven mRNAs and 17 methylation-driven lncRNAs were obtained. Univariate and multivariate Cox regression analysis showed that 6 lncRNAs (FOXE1, HOXB13-AS1_2, VMO1, HIST1H3F, AJ003147.8, ASXL3) were retrieved to construct a predictive model associated with overall survival in LUAD patients. Combined DNA methylation and gene expression survival analysis revealed that 4 lncRNAs (AC023824.1, AF186192.1, LINC01354 and WASIR2) and 8 mRNAs (S1PR1, CCDC181, F2RL1, EFS, KLHDC9, MPV17L, GKN2, ITPRIPL1) might act as independent biomarkers for the prognosis of LUAD. Conclusions Methylation-driven lncRNA and mRNA contribute to the survival of LUAD, and 4 lncRNAs and 8 mRNAs might be potential biomarkers for the prognosis of LUAD.

can provide a novel horizon to explore new biomarkers in predicting the prognosis of cancer [10][11][12][13][14]. A large number of studies have demonstrated that DNA methylation can regulate the expression of lncRNA, and this phenomenon may be associated with the prognosis of lung cancer [15]. For instance, the lncRNA AFAP1-AS1 acts as an oncogene in NSCLC, while its expression is tightly regulated by DNA methylation, which might provide prognostic and diagnostic values for NSCLC patients [16].
MethylMix, an algorithm implemented in the R programming environment, identifies disease-specific hyperand hypomethylated genes [17]. Currently, few studies on methylation-driven genes have been reported [18]. Recently, a study based on using MethylMix to explore methylation-driven genes for predicting the prognosis of LUAD was reported; however, they only obtained information about methylation-driven mRNA [19]. In this study, DNA methylation and RNA-Seq data were extracted from The Cancer Genome Atlas (TCGA) database, and we used the MethylMix R package to obtain LUAD-specific methylation-driven lncRNA sequences. Furthermore, a Cox survival predictive model with 6 lncRNAs was constructed to predict the diagnosis and prognosis of LUAD. Finally, the combined effect of DNA methylation and gene expression survival analysis was examined, which might provide a novel insight to explore methylation-driven lncRNA and mRNA for predicting the prognosis of LUAD.

Data retrieving and analyzing
Methylation and RNA-Seq expression data were retrieved from LUADs from the TCGA database. The methylation data were downloaded from 475 cancer tissues and 32 noncancer tissues from the Illumina Human methylation 450k platform. The RNA-Seq data (level 3), including mRNA and lncRNA expression, were retrieved from 535 cancer tissues and 59 noncancer tissues from the IlluminaHiSeq_RNASeq platform. First, on the basis of "limma R" packages in R with absolute fold change (log 2) > 0 and adjusting the false discovery rate (FDR) to a P value < 0.05, we obtained aberrant methylated genes. The methylation difference of the mean of the 3000 bp (base pair) sites upstream of the gene was analyzed to identify the differential level of methylation in the gene promoter [19][20][21][22]. We analyzed the differential level of methylation in the promoter of genes by using the limma R package [23]. Then, based on the "edge R" packages in R with absolute fold change (log 2) > 3 and adjusting the false discovery rate (FDR) to a P value < 0.01 to correct the statistical significance of multiple experiments, we retrieved differentially expressed mRNA and lncRNA. MethylMix is a kind of R statistical package for integrating DNA methylation data and RNA expression data to identify methylation-driven genes in kinds of cancers [17,[24][25][26]. Filtering or eliminating missing value genes is the preprocessing common step when running MethylMix R software [17]. In this present manuscript, we filtered or eliminated missing value genes and intersected DNA methylation data with RNA expression data for matching. Finally, there were a total of 465 LUAD samples with matched DNA methylation and RNA expression and 32 non-LUAD samples with DNA methylation data for entering the MethylMix R package. Then, calculated the correlation between DNA methylation level and RNA expression to find significantly negatively related genes, a beta mixture model was constructed for the degree of methylation of samples, Wilcoxon rank test was used to calculate differential methylation in LUAD and adjacent non-LUAD samples. Finally, the methylation-driven mRNA and lncRNA was obtained. Since the data were directly obtained from the TCGA database, no approval was required from the local ethics committee.

Functional enrichment analysis of methylation-driven mRNA in LUAD
To determine the function represented in the methylation-driven mRNA, we used the Database for Annotation, Visualization and Integrated Discovery (DAVID) (http://david .abcc.ncifc rf.gov/) to perform a functional and enrichment analysis on the methylation-driven mRNA by using GO and ConsensusPathDB analysis. In the GO analysis, a P value of less than 0.05 was considered statistically significant. Furthermore, the GOCircle and GOChord plotting functions of the GOplot R package were used to allow data from expression analysis and data from functional annotation enrichment analysis. ConsensusPathDB (http://cpdb.molge n.mpg.de/) is an online software program that includes binary and complex signaling, gene regulatory and drug-target interactions, and biochemical pathways. P < 0.05 was considered statistically significant.

Construction of a differentially methylated, lncRNA-related predictive model in LUAD
We identified the differentially methylated lncRNA associated with overall survival with P < 0.05 to act as prognostic methylation lncRNA candidates for multivariate Cox regression analysis. On the basis of the median risk score, LUAD patients were divided into two cohorts, high-risk cohorts and low-risk cohorts. Receiver operating curves were used to test the effect of the lncRNA signature (high risk vs low risk) on overall survival. We analyzed the receiver operating curve by calculating the area under the curve (AUC) under the binomial exact confidence interval to reveal prognostic biomarkers for predicting survival in LUAD.

Combined methylation and gene expression survival analysis
To further explore the effect of methylation level and gene expression level of the same methylation-driven key gene on LUAD patient prognosis, we performed a combined methylation and gene expression survival analysis to identify potential methylation-driven mRNAs and methylation-driven lncRNAs for predicting the prognosis of LUAD patients. Therefore, a Kaplan-Meier curve was performed. P < 0.05 was regarded as statistically significant.

Identification of methylation-driven mRNA and lncRNA in LUAD
A total of 99 mRNAs and 17 lncRNAs were identified to be associated with DNA methylation using Meth-ylMix criteria. The methylation-driven mRNAs and IncRNAs are shown in Tables 1 and 2, respectively. We constructed a mixed model and performed a Wilcoxon rank test for determining differential methylation (logFC > 0, P < 0.05, Cor < − 0.3). Figure 1 shows that two methylation-driven mRNAs (ZNF454 and ZNF471) (Fig. 1f, g) and two methylation-driven lncR-NAs (TUSC8 and LINC00676) (Fig. 1h, i) have significant negative correlations between methylation and gene expression levels. In Fig. 1, the distribution of the methylation degree shows that ZNF454 (Fig. 1b), ZNF471 (Fig. 1c), and LINC00676 ( Fig. 1e) are hypermethylated in LUAD patients and hypomethylated in the normal group, while TUSC8 (Fig. 1d) is hypomethylated in LUAD patients and hypermethylated in the normal group. A heat map of methylation-driven mRNAs and lncRNAs is shown in Fig. 2a, b. A flow diagram of the exploration of methylation-driven mRNA and lncRNA in LUAD is shown in Fig. 1a.

Enrichment analysis of methylation-driven mRNAs in LUAD
Gene ontology analysis showed that there were 5 GO terms (regulation of transcription, DNA-templated; transcription factor activity, sequence-specific DNA binding; nucleic acid binding; transcription, DNA-templated; metal ion binding) with significant differences (P < 0.05), and the highest GO biological process was "GO:0006355 regulation of transcription, DNA-templated" (Fig. 3a, c).
The GOChord plot shows the top 30 methylation-driven mRNAs with their related GO terms (Fig. 3b). Figure 4 shows that 11 pathways (Generic Transcription Pathway, Benzene metabolism, RNA Polymerase II Transcription, Gene expression (Transcription), Platinum Pathway, Pharmacokinetics/Pharmacodynamics, Phase II-Conjugation of compounds, Sulfation Biotransformation Reaction, Estrogen metabolism, Glutathionemediated detoxification, Cytosolic sulfonation of small molecules, Drug metabolism-other enzymes-Homo sapiens (human)) were considered statistically significant (P < 0.05). Furthermore, pathway analysis showed that the methylation-driven mRNAs were most enriched in the Generic Transcription Pathway, RNA Polymerase II Transcription and Gene expression (Transcription) pathways (P < 0.01) (Fig. 4). The pathway analysis is shown in Table 3.

Construction of a predictive model of six differentially methylated lncRNAs in LUAD
Univariate Cox regression analysis was performed first to identify the prognosis associated with differentially methylated genes in LUAD, incorporating 10 methylation genes that were conspicuously associated with overall survival (P < 0.05). Next, multivariate Cox regression was used and showed that six lncRNAs were eventually selected to construct a predictive model. We used the linear combination of the expression of the 6 lncRNAs to construct the predictive model. The relative coefficients weighted in the multivariate Cox regression were as follows: survival risk score = (3.0040 × expression value of FOXE1 + 1.0226 × expression value of HOXB13-AS1_2 + 1.0540 × expression value of VMO1 + 1.0050 × expression value of HIST1H3F + (− 3.0925) × expression value of AJ003147.8 + 1.4791 × expression value of ASXL3). The multivariate Cox analysis is shown in Table 4.

Risk groupings and ROC curve analysis
As shown in the heat map, the expression of six prognostic methylation genes was profiled (Fig. 5a). Based on the median risk scores, a total of 449 samples of complete survival information were divided into a highrisk group (n = 224) and a low-risk group (n = 225). We used the Kaplan-Meier curve with a log-rank statistical examination to perform survival analysis. As shown in Fig. 5b, patients in the low-risk group had conspicuously better overall survival than those in the high-risk group (Fig. 5b). The receiver operating characteristic (ROC) curve was analyzed to test the influence on the 6-lncRNA signature associated with overall survival in LUAD (Fig. 5c).

Combined methylation and gene expression survival analysis in LUAD
The combined Kaplan-Meier curve analysis revealed that the combination of methylation and expression of lncR-NAs AC023824.1, AF186192.1, LINC01354 and WASIR2 had a conspicuous correlation with the prognosis of LUAD patients (Fig. 6a-d). The hypermethylation and low-expression survival rate of AC023824.1 was high, while the hypermethylation and low-expression survival rate of AF186192.1, LINC01354 and WASIR2 were low. The combined Kaplan-Meier curve analysis showed that the combination of methylation and mRNA expression of mRNAs CCDC181, EFS, F2RL1, GKN2, ITPRIPL1, KLHDC9, MPV17L, and S1PR1 were associated with overall survival of LUAD (P < 0.05) (Fig. 6e-l). The hypermethylation and low-expression survival rate of F2RL1 was high. However, the hypermethylation and low-expression survival rates of EFS, CCDC181, GKN2, ITPRIPL1, KLHDC9, MPV17L, and S1PR1 were low (Fig. 6).

Discussion
In recent years, with the increasing numbers of advanced diagnoses and poor prognoses in lung adenocarcinoma, it is pivotal to find more effective prognostic biomarkers to predict survival in LUAD. LncRNA-related studies have attracted the attention of various cancer fields. Accumulating studies show that cancer-related lncRNAs may serve as diagnostic or predictive biomarkers of cancer and have a significant effect on the therapeutic treatment of cancer [27]. Emerging evidence shows that studies on the molecular mechanisms and prognostic biomarkers of LUAD associated with methylation-driven lncRNA and mRNA are still lacking. In recent years, epigenetic alterations in DNA methylation, noncoding RNA expression, chromatin modeling and post-transcriptional regulators have been found to play significant roles in the regulation and development of lung cancer pathogenesis [28][29][30][31][32]. Some studies have shown that epigenetic changes in DNA methylation cause changes in the expression of lncRNA, which might provide a novel insight to explore new biomarkers for predicting the prognosis of human cancer [33][34][35][36]. For instance, analysis of microarray data on gene expression and methylation showed that the expressions of lncRNAs LOC146880 and ENST00000439577 were regulated by DNA methylation, which might provide a new horizon to predict the diagnosis and prognosis of NSCLC [37]. Lu et al. indicated that MEG3 is significantly downregulated in NSCLC  tissues that could be affected by DNA methylation [38].
Previous studies have shown that survival-associated, methylation-driven lncRNAs might serve as novel prognostic biomarkers for predicting the prognosis of LUAD.
Recent studies have shown that the roles of lncRNA in tumorigenesis and metastasis can indicate that lncRNA may function as a novel biomarker for the diagnosis and prognosis of cancer [39][40][41][42][43][44]. LncRNA TUBA4B has been reported to serve as a new predictor for prognosis and modulate cell viability in non-small-cell lung cancer [45]. lncRNA AFAP1-AS1 may act as an oncogenic to facilitate the migration of non-small-cell lung cancer (NSCLC) [16]. Long noncoding RNA ANRIL acts as an oncogene by silencing KLF2 and P21 expression to promote the development of NSCLC [46]. LncRNA PANDAR acts as a cancer suppressor gene by regulating Bcl-2 to affect cell apoptosis in NSCLC [47].
In the present study, we retrieved methylation and lncRNA and mRNA expression from the TCGA database by using bioinformatics analysis and obtained methylation-driven lncRNAs and mRNAs to predict the prognosis of LUAD. First, we obtained differentially expressed methylation and lncRNA and mRNA using the MethylMix R package to obtain methylation-driven lncRNA and mRNA. Functional enrichment analysis was performed to analyze the methylation-driven mRNA to identify its biological functions in the regulation and development of LUAD. Furthermore, univariate and multivariate Cox regressions were performed to construct a predictive model for predicting the prognosis of LUAD. Finally, a combined methylation and lncRNA expression survival analysis was carried out, which might provide novel insight to predict the diagnosis and prognosis of LUAD. In the present study, we combined methylation and lncRNA and mRNA expression data with survival analysis to identify 4 lncRNAs and 8 mRNAs that act as independent prognostic factors for predicting the diagnosis and prognosis of LUAD. LINC01354 acts as a ceRNA to predict the early diagnosis and prognosis of colorectal  , and S1PR1 act as methylation-driven genes to reveal prognostic biomarkers in LUAD [19]. GKN2 may contribute to the homeostasis of gastric epithelial cells by inhibiting GKN1 activity [49]. F2RL1 may act as novel acute myeloid leukemia subsets that are meaning  for treatment guidance [50]. MPV17L acts as a unique interacting protein and regulator of HtrA2 protease, mediating antioxidant and antiapoptotic functions in mitochondria [51]. Compared with previous studies, our study first obtained differentially expressed mRNA and lncRNA using the edge R package and aberrant methylated genes using the limma R package, and then we filtered low expression genes and intersected mRNA and lncRNA expression data with DNA methylation data to obtain methylation-driven genes by using the MethylMix R package. The MethylMix (https ://bioco nduct or.riken .jp/packa ges/3.1/bioc/html/Methy lMix.html) is an algorithm implemented to integrate DNA methylation with RNA expression to identify methylation-driven genes in cancers [52]. In summary, MethylMix provides a tool that contributes to the analysis of methylation-driven lncR-NAs and mRNAs in cancer studies from TCGA [17,18]. However, the MethylMix focuses on identifying cis-regulatory effects of DNA methylation on gene expression and does not currently model trans-regulatory effects [18]. Further studies are needed to solve the multiple testing challenge on identifying trans-regulatory effects of DNA methylation on gene expression. Our study may provide a novel method for determining disease-specific prognostic biomarkers in LUAD and may play a significant role in predicting the diagnosis and prognosis of LUAD.
Our study subjects were retrieved from the TCGA database, which is a significant tool for analyzing prognostic biomarkers. It is not known whether our results are applicable to other groups. The predictive prognostic lncRNA and mRNA signature needs to be verified by molecular biologic experiments on clinical samples in future studies. Eventually, large-scale samples and experimental studies could validate the biological function of prognostic biomarkers in LUAD.

Conclusion
In conclusion, our study identified methylation-driven mRNA and lncRNA by using bioinformatics analysis from the TCGA database. A Cox predictive model was performed to identify independent prognostic factors. Methylation and gene expression data combined with survival analysis was used to identify LUAD-specific, methylation-driven lncRNAs and mRNAs for predicting the diagnosis and prognosis of LUAD.

Abbreviations
LncRNA: long noncoding RNA; LUAD: lung adenocarcinoma; GO: gene ontology; NSCLC: non-small-cell lung cancer; FC: fold change; FDR: false discovery rate; TCGA : The Cancer Genome Atlas Project; DAVID: The Database for Annotation, Visualization and Integrated Discovery; ROC: receiver operating curve; AUC : area under the ROC curve.