A new immune signature for survival prediction and immune checkpoint molecules in lung adenocarcinoma

Background Lung adenocarcinoma (LUAD) is the most frequent subtype of lung cancer. The prognostic signature could be reliable to stratify LUAD patients according to risk, which helps the management of the systematic treatments. In this study, a systematic and reliable immune signature was performed to estimate the prognostic stratification in LUAD. Methods The profiles of immune-related genes for patients with LUAD were used as one TCGA training set: n = 494, other validation set 1: n = 226 and validation set 2: n = 398. Univariate Cox survival analysis was used to identify the candidate immune-related genes from each cohort. Then, the immune signature was developed and validated in the training and validation sets. Results In this study, functional analysis showed that immune-related genes involved in immune regulation and MAPK signaling pathway. A prognostic signature based on 10 immune-related genes was established in the training set and patients were divided into high-risk and low-risk groups. Our 10 immune-related gene signature was significantly related to worse survival, especially during early-stage tumors. Further stratification analyses revealed that this 10 immune-related gene signature was still an effective tool for predicting prognosis in smoking or nonsmoking patients, patients with KRAS mutation or KRAS wild-type, and patients with EGFR mutation or EGFR wild-type. Our signature was negatively correlated with B cell, CD4+ T cell, CD8+ T cell, neutrophil, dendritic cell (DC), and macrophage immune infiltration, and immune checkpoint molecules PD-1 and CTLA-4 (P < 0.05). Conclusions These findings suggested that our signature was a promising biomarker for prognosis prediction and can facilitate the management of immunotherapy in LUAD.

of personalized medicine harness the anti-tumor effects of the immune system to obtain a durable cure with minimal toxicity [9,10]. In recent years, the immune checkpoint proteins such as cytotoxic T-lymphocyte antigen 4 (CTLA-4) or the programmed cell death ligand 1/protein 1 pathway (PD-L1/PD-1) have been used as crucial targets for immunotherapy in many cancers, including LUAD [11,12]. For example, PD-L1 expression is a predictive biomarker for worse prognosis of NSCLC and the probability of clinical benefit from immune-modulating drugs is greater in NSCLC patients expressing PD-L1 [13,14]. Most studies on immune-related genes such as PD-L1, CD8A, and CD4 expression have been conducted in NSCLC [15][16][17]. Because the biological differences among the various histological subtypes might have a different therapeutic influence. Moreover, the molecular characteristics describing tumor-immune effects still remain largely unclear in LUAD. Therefore, we focused exclusively on the study of LUAD to further explore the prognostic and predictive significance of the immunerelated genes.
In our present work, multiple cohorts were utilized to construct and validate a novel immune prognostic signature for the stratification of LUAD. This study provided more in-depth insight into the prognostic stratification of patients with LUAD as well as provided a tumor-immune interaction with great promise for the therapeutic interventions of LUAD.

Study samples
The raw expression data (Workflow Type: HTSeq-Counts) for LUAD were enrolled from Genomic Data Commons Data Portal, which fulfilled the approval of the project by the consortium. The Cancer Genome Atlas (TCGA) data were normalized by the Trimmed Mean of M-values (TMM) method and the mean expression levels with ≤ 1 were excluded. Then, the expression data were transformed with log2. For the prognostic information, patients with incomplete follow-up time were excluded. Finally, 494 cases with LUAD had sufficient survival data recorded. The Gene Expression Omnibus (GEO) datasets by microarray were used in the patients with LUAD. IRON normalization was performed on the GSE72094, with log2 expression. The GSE31210 data were normalized by the MAS5 algorithm and the expression data were transformed with log2. The cases with insufficient follow-up time were excluded. Finally, 226 patients from GSE31210 (validation set 1) and 398 cases from GSE72094 (validation set 2) were included.
In this study, TCGA data was applied as a training set. The remaining two cohorts were applied as two validation sets. Overall survival (OS) was estimated from the date of the study enrollment to the recorded date of death of any cause or the last follow-up time. The clinicopathological characteristics of LUAD patients from the training and validation sets are listed in Table 1.

The functional analysis of immune genes
We applied the Immunology Database and Analysis Portal (ImmPort) database to select immune-related genes in immunology research [18,19]. Using univariate Cox regression analysis, 299 survival-related genes with P < 0.05 were found in the TCGA training set (Additional file 1: Table S1), where further analyzed for functional enrichment analysis. The functional analysis was conducted using clusterProfiler package [20] for gene ontology (GO) terms and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways.

Development of the immune-related gene prognostic signature
Immune-related genes were obtained from the ImmPort database (https ://immpo rt.niaid .nih.gov) (Additional file 2: Table S2). We performed the following process to establish the immune signature. First, univariate Cox regression analysis with P < 0.05 was conducted to find the potential immune-related prognostic genes. Three data sets were used to get overlapping genes, which can increase the credibility of the potential immune-related prognostic genes, the final 52 overlapping survivalrelated immune genes were found (Additional file 3: Table S3 and Additional file 4: Figure S1). Second, to achieve the final immune-related prognostic genes in our model, multivariate Cox regression analysis based on the two-step method was finally carried out for these 52 candidate immune-related genes in the TCGA training set. Finally, 10 immune-related genes were identified from the TCGA cohort in the present model (Additional file 5: Table S4).

Tumor-infiltrating immune cells
TIMER was applied to estimate the abundance of tumorinfiltrating immune cells in the tumor microenvironment (TME) (http://cistr ome.dfci.harva rd.edu/TIMER /), including six immune cell types: B cell, CD4 T cell, CD8 T cell, neutrophil, macrophage, and dendritic cell (DC). We investigated whether our signature played a role in immune infiltration.

Statistical analysis
Spearman's correlation coefficient was used to assess the association between risk score and tumor-infiltrating immune cells and immune checkpoint molecules. The cut-off value was determined based on the median value in the training set. Patients were divided into low-risk group and high-risk group. To examine significant differences of between the low-risk group and the corresponding high-risk group, Kaplan-Meier survival methods with the log-rank test were applied. To estimate the predictive accuracy of the immune-related gene signature, time-dependent receiver-operating characteristic (ROC) method was conducted in all datasets. Univariate and multivariate survival analyses were conducted using the Cox proportional hazard model to determine whether there was a significant relationship between the immune signature and the survival of patients with LUAD, along with the clinicopathological variables in the training and validation sets. Subgroup analyses were also carried out according to the stratification of age, gender, smoking status, tumor stage, KRAS mutation, and EGFR mutation status. All data were analyzed by using R software (version 3.5.1; R Foundation for Statistical Computing, Vienna, Austria).

Results
The molecular mechanism of immune-related genes 299 survival-related immune genes were performed for gene enrichment analysis. As shown in Fig. 1, KEGG analysis demonstrated that MAPK, Ras, Rap1, ErbB pathways, EGFR tyrosine kinase inhibitor resistance, proteasome, and endocrine resistance were associated with these immune genes (Fig. 1a). GO analysis showed that the immune-related mechanisms such as immune response-activating/regulating cell surface receptor signaling pathway, T cell activation, T cell receptor signaling pathway, antigen receptor-mediated signaling pathway, antigen processing and presentation, cytokine activity, and MHC protein complex etc. were enriched (Fig. 1b).

Construction of a 10 immune-related gene signature
We finally applied the multivariate Cox regression model to select the final genes in the training cohort. Eventually, 10 immune-related genes were used to construct a prognostic model in LUAD

A 10 immune-related gene signature predicts survival in LUAD
The distribution and survival status of patients for the 10 immune-related gene signature were shown in the training and validation sets (Fig. 2a). Time-dependent ROC results were used to evaluate the predictive capacity of a 10 immune-related gene signature at 1, 3, and 5 years (Fig. 2b). The AUCs (Area under the ROC curve) for the long-time survival at 5 years were > 0.70 in the training set and the other two validation cohorts, indicating a good accuracy of the 10 immune-related gene signature for survival prediction. Kaplan-Meier curves showed that patients with LUAD in the high-risk group had closely shorter prognosis than patients with LUAD in the low-risk group (all P values < 0.001) (Fig. 2c). To examine whether our 10 immune-related gene signature was an independent molecular factor for survival prediction in the training and validation sets, univariate and multivariate Cox models were further carried out in this study (

Predictive role of the 10 immune-related gene signature with the survival in various clinical and mutational characteristics
Stratification analyses were conducted based on age (≥ 65 vs. < 65 years), gender (male and female), smoking behavior (smoking and nonsmoking), tumor stage (stage 3-4: advanced-stage and stage 1-2: early-stage), KRAS mutation status (mutation and wild-type) and EGFR mutation status (mutation and wild-type) in the entire set (Figs. 3,  4). The cut-off value was 0.9325 and cases were divided into high-and low-risk groups. Because patients with various T, M, N, and TP53 mutation status were conducted in only a cohort, these clinical variables such as T, M, N, and TP53 mutation status were removed from subgroup analyses. For early-stage patients, the high-risk group indicated closely poor prognosis than the low-risk group (P < 0.05), but no significant prognostic relationship was observed between the high-and low-risk groups for advanced-stage patients (Fig. 4), which might cause owing to the small study population of advanced-stage patients. The results demonstrated that high-risk LUAD patients in each stratum of age, gender, smoking behavior, KRAS mutation, and EGFR mutation status presented worse survival than low-risk LUAD patients (all P values < 0.05) (Figs. 3, 4), suggesting that our 10 immunerelated gene signature-based risk group stratification was still an effective tool for survival prediction in older or younger, male or female, and smoking or nonsmoking patients with LUAD, patients with KRAS mutation or KRAS wild-type, and patients with EGFR mutation or EGFR wild-type. A stratification analysis of stage 1 and stage 2 tumors was also performed. The result showed that the high-risk group was closely correlated with poor prognosis in stage 1 and stage 2 LUAD tumors (P < 0.05) (Additional file 6: Figure S2), suggesting that our immune signaturebased risk group stratification was still an effective tool for prognosis prediction in stage 1 and stage 2 tumors. Moreover, multivariate Cox analysis within early stages of LUAD tumors demonstrated that our immune signature was still an independent molecular factor for predicting survival (HR = 3.93, 95% CI 2.38-6.5, P < 0.0001) (Additional file 7: Table S5).

Tumor-infiltrating immune cells
We analyzed whether our immune-related gene signature was related to immune infiltration in LUAD, such as B cell, CD4 T cell, CD8 T cell, neutrophil, macrophage, and dendritic cell (DC). As shown in Fig. 5,

Discussion
The most frequent histological subtypes of lung cancer are LUAD and frequently occurs in females and nonsmoking people [23]. LUAD is related to distinct oncogene alterations. The common somatic gene aberrations are EGFR and KRAS mutations, and ALK rearrangements, which have been extensively reported and studied in LUAD [24,25]. The frequency of oncogenic driver aberrations is different based on ethnicity, gender, and smoking behavior, which may result in differences in treatment efficacy [26]. In addition, in recent years reliable prognostic biomarkers have already appeared that are modifying the prognosis, which are applied to select groups of patients who are at high-risk and who may benefit from the personalized therapy [27][28][29], but their accuracy of survival estimation is not shown and remains limited. Therefore, it is imperative to identify precise biomarkers of LUAD and to select the appropriate immunotherapy for improving survival of this disease. In our current work, we constructed and validated a new prognostic signature based on 10 immune-related genes for survival prediction in patients with LUAD.
The immune system has a key role in cancer development progression [7,8]. Such as, MAPK signaling involves in cell proliferation, apoptosis and immune escape, which contributes to the progression of the tumor [30]. Cytokines regulate tumor growth, survival, and invasion, and metastatic colonization [31]. LUAD shows altered T cell and NK cell compartments [32]. Choi et al. reported that high ImmuneScore was associated with favorable prognosis in LUAD [33]. However, immune-related molecular mechanisms involved in LUAD remain largely unclear. In this study, immunerelated genes were found to be associated with immune regulation (i.e. immune response-activating/regulating cell surface receptor signaling pathway, T cell activation, and cytokine activity etc.) and biological pathways such as MAPK signaling in LUAD. Further 10 immune-related gene signature for survival prediction was identified in LUAD. Immune signatures can predict prognosis across solid tumors [34]. For example, a proposed clinicalimmune signature is significantly correlated with worse prognosis and can be a promising biomarker for predicting overall survival in ovarian cancer [35]. The seven immune-related gene signature is significantly related to poor survival and can serve as a potential marker for reflecting the prognosis in clear renal clear cell carcinoma [36]. A prognostic immune signature can predict cervical cancer patients' survival [37]. The nine immune-related gene signature is correlated with worse prognosis and is a potential predictive biomarker in hepatocellular carcinoma [38]. The previous findings were consistent with our current result that the high-risk patients were related to short survival time and worse survival of LUAD. Then, multivariate Cox analysis indicated that our immune signature remained an independent molecular indicator for survival prediction. These analyses confirmed that our immune signature was an effective and good performance for predicting prognosis in LUAD, which was reasonable and reliable. This model for LUAD contained 10 immune-related genes. Among them, FURIN correlates with many cancer-related processes such as cell proliferation, migration, invasion, and angiogenesis, which promotes tumor progression [39]. FURIN has also a crucial function in the adaptive immunity [40]. PSMD14 induces cell cycle and senescence and may participate in the role of the proteasome [41,42]. ARRB1 mediates signaling pathways and correlates with the regulation of DNA damage response [43]. ARRB1 participates in cell invasion and proliferation, thereby contributing to NSCLC progression [44]. TUBB3 is a key mechanism of drug resistance and TUBB3 expression shows predictive value for the prognosis in many cancers [45]. SHC3 regulates signal transduction, thereby stimulating hepatocellular carcinoma proliferation, migration, invasion, and epithelialto-mesenchymal transition (EMT) [46]. BMP5 activates multiple signaling pathways such as p38 MAPK signaling [47]. CD40LG polymorphism is related to various immunological disorders such as tumors [48]. Adrenomedullin (ADM) correlates with cellular growth, anti-apoptotic property, angiogenesis, tumor cell motility and metastasis, inflammatory and immune responses. And ADM is also a significant factor for worse survival in some cancers [49,50]. ZAP70 is related to the T cell antigen receptor (TCR) complex and is required for T cell activation [51,52]. RFXAP is a key transcription factor for major histocompatibility complex (MHC) II [53]. On the basis of the above findings, our work integrated 10 immunerelated genes into a single panel, found patients in the high immune-risk group associated with worse survival, and confirmed the predictive value of our signature in LUAD. The high abundance of T cell and B-cell is correlated with improved survival in many cancer types, including lung cancer [54]. A high level of tumor-infiltrating lymphocytes is associated with better prognosis of LUAD [55]. High B-cell and CD8+ T cell infiltration is reported to be associated with favorable prognosis in LUAD [54,56]. The immune checkpoint molecules PD-1 and CTLA-4 treatment is a promising cancer immunotherapy approach for clinical benefits [21]. Studies have reported the coexpression PD-1, PD-L1, PD-L2, and CTLA-4 immune checkpoint molecules [57]. In our study, we also found PD-1, PD-L1, PD-L2, and CTLA-4 immune checkpoint molecules were coexpressed in LUAD. Although our signature was negatively correlated with tumor-infiltrating immune cells and PD-1 and CTLA-4, the correlation was not very significant. Biomarkers of response to ICB do not frequently only focus on the target molecules, but rather inflammatory/interferon signaling cascades [58][59][60]. Thus, our signature may not have a significant response to immune checkpoint molecules. Additional data on the association of our model with other signatures (either a cytolytic signature, IFNG signature, chemokine signature, or the tumor inflammation signature) could be needed in the future.
The common somatic mutations are EGFR and KRAS in LUAD [24]. Hsiao et al. reported that EGFR mutations were closely correlated with therapeutic efficacy and progression-free survival in LUAD [61]. We further evaluated the predictive effect of our 10 immune-related gene signature in different clinical and molecular features. We found that our 10 immune-related gene signature remained a strongly powerful tool for predicting prognosis in older or younger, male or female, and smoking or nonsmoking patients with LUAD, patients with KRAS mutation or KRAS wild-type, and patients with EGFR mutation or EGFR wild-type. Our immune signature in the high-risk group was observed to be associated with worse prognosis for early-stage tumors. Further multivariate Cox analysis showed that our immune signature was still an independent molecular indicator for survival prediction in early-stage LUAD tumors.
These are several limitations in this work that should be noted. First, our study was a retrospective design, more prospective clinical data sets as possible are needed to validate our result in the future. Second, our immune signature was developed by numerous genes, further biological functions are warranted to be further explored in LUAD. Third, our immune signature was calculated based on the gene expression values. Thus, intra-tumor heterogeneity supported by genetic and phenomenological data, which could cause sampling bias.

Conclusion
In conclusion, a 10 immune-related gene signature was built for prognosis prediction in LUAD. This study demonstrated that this 10 immune-related gene signature may become a promising prognostic marker for LUAD, especially in early-stage patients, which