Identification of a four long non-coding RNA (lncRNA) Signature for Predicting Prognosis of Patients with Non-Small Cell Lung Cancer: a Multicenter Study in China

Background: This study aims to identify a long non-coding RNA (lncRNA) signature that predicts survival in patients with non-small cell lung carcinoma (NSCLC) and provides prognostic information in addition to the tumor node metastasis (TNM) staging system. Methods: NSCLC cases from one hospital were divided into a discovery cohort (n=194) and a validation cohort (n=172) and analyzed using a custom lncRNA microarray. Another 73 cases from a second hospital were assayed using quantitative reverse transcriptase polymerase chain reaction (qRT-PCR). Differentially expressed lncRNAs were detected with the significance analysis of microarrays (SAM) program and screened for association with survival in the discovery cohort; the survival-associated lncRNAs were then used to construct a prognostic lncRNA signature with a risk-score method. The signature was subsequently tested in the validation and independent cohorts. Results: In the discovery cohort, 305 lncRNAs were differentially expressed between NSCLC and the corresponding normal lung tissues, and a 4-lncRNA signature was identified that correlated significantly with the survival of NSCLC patients. This signature was further confirmed in the validation and independent cohorts. Moreover, multivariate Cox analysis demonstrated that the 4-lncRNA signature is a prognostic factor independent of the TNM staging system. Receiver operating characteristic (ROC) curve analysis indicated that the prognostic value of a risk-score model combining the signature with TNM staging is significantly higher than that of TNM staging alone in all the cohorts. Conclusions: This study identified a 4-lncRNA signature that is a powerful prognostic biomarker of patient survival and adds to the traditional TNM staging system.
A risk-score model combining the 4-lncRNA signature and the TNM staging was developed and was shown, using Kaplan-Meier survival analysis and ROC analysis, to have superior power in predicting OS and DFS in all three cohorts compared with the TNM staging system alone. These findings demonstrate that the 4-lncRNA signature can significantly improve the prognostic accuracy of TNM staging and could be considered as a marker for risk assessment in NSCLC patients. Combining the 4-lncRNA signature with the traditional TNM staging parameters might yield a powerful predictor of prognosis in NSCLC patients, with the potential to facilitate selection of patients with more aggressive disease who would benefit from adjuvant therapy.


Background
Lung cancer is the most common and lethal malignancy in the world, and about 85% of cases are non-small cell lung cancer (NSCLC). [1] In clinical practice, delayed diagnosis and a lack of effective prognostic biomarkers are the main reasons for the poor survival of NSCLC patients. [2,3] Whereas only 15% of patients with late-stage lung cancer survive for five years, 83% of patients with stage I disease do. [4] Currently, the treatment strategy and prognosis of lung cancer are mainly determined by the TNM staging system. However, NSCLC patients with the same TNM stage may have different prognoses. [2,5,6] Therefore, new biomarkers that improve the accuracy of prognosis, and thereby patients' quality of life and survival rate, are warranted. [7] Advances in high-throughput technology have enabled numerous studies to propose a single gene or a gene set (signature) as a biomarker for tumor diagnosis, prognosis, classification, and personalized treatment. Genomic abnormalities such as DNA mutations, copy number variation, DNA methylation, and altered gene expression have been investigated to identify prognostic biomarkers in NSCLC patients. Microarray and RNA-seq technologies make it possible to analyze hundreds to thousands of genes simultaneously, together with their relationship to clinical features including survival, and have yielded a large number of novel biomarkers (single genes or signatures) for diagnosis, prognosis and targeted therapy of NSCLC patients. [8,9] However, only a few molecular biomarkers (mainly therapeutic targets) have been applied in clinical practice [10] because most biomarkers show low accuracy (including low sensitivity and specificity) [11] or need further confirmation with large sample sizes in independent validation studies. [12] Therefore, more reliable biomarkers are still needed for the diagnosis, prognosis, and personalized therapy of cancer.
Long non-coding RNAs (lncRNAs), which are abundant in the body, have shown greater potential than protein-coding genes as novel diagnostic or prognostic biomarkers and raise the possibility of finding more reliable biomarkers for lung cancer. [13,14] LncRNAs are non-coding RNAs longer than 200 nucleotides with no protein-coding capacity. [15,16] A large number of studies have shown that lncRNAs participate in numerous biological processes, such as epigenetic regulation, cell cycle regulation, and regulation of cell differentiation. Growing evidence shows that many lncRNAs are significantly dysregulated in various types of cancer and thereby play important roles in tumorigenesis. [17][18][19] An increasing number of lncRNAs have been shown to be dysregulated and involved in the tumorigenesis of lung cancer and can therefore be used as biomarkers for diagnosis and prognosis or as targets for therapy. For example, the lncRNAs MALAT1 and NEAT1 play important roles in lung cancer cell proliferation, cell cycle, and apoptosis as well as in tumor progression and prognosis. [20][21][22][23][24] An inhibitor targeting MALAT1 has been shown to significantly reduce lung cancer metastasis in a mouse model. [20] The prognostic role of lncRNA signatures in NSCLC has been investigated in many reports using data downloaded from the Gene Expression Omnibus (GEO) database or The Cancer Genome Atlas (TCGA) database. However, an lncRNA expression profile generated specifically to identify prognostic signatures in a large, multicenter cohort of NSCLC patients has not yet been reported. Therefore, detailed elucidation of the prognostic value and the clinical application potential of lncRNA signatures in NSCLC patients is warranted.
This study, to the best of our knowledge, is the first multicenter retrospective study assessing the prognosis of 439 NSCLC patients using a custom lncRNA microarray and qRT-PCR. NSCLC patients from South China were randomly divided into a discovery cohort (194 cases) and a validation cohort (172 cases), and those from Southwest China were used as an independent validation cohort (73 cases).
LncRNA expression levels in NSCLC tissues were determined using a custom lncRNA microarray in the discovery and validation cohorts and a 4-lncRNA signature was established to predict overall survival (OS) and disease-free survival (DFS) for NSCLC patients in the discovery cohort. The prognostic value of the novel 4-lncRNA signature was then validated in the validation cohort and further confirmed in the independent validation cohort by qRT-PCR.

Patients and Clinical information
A total of 439 samples were collected for this study from patients who underwent radical resection of lung cancer at the Sun Yat-Sen University Cancer Center (366 cases) and Yunnan Cancer Hospital (73 cases) between 2003 and 2008. Samples including cancer tissues and corresponding adjacent normal tissues were obtained from each case. The inclusion criteria for the study were: i) cases confirmed as NSCLC by pathological diagnosis and reviewed by two experienced pathologists; ii) cases that had not received any form of anti-tumor therapy before surgery; iii) cases that survived more than a month after surgery; iv) samples preserved at -80 °C immediately after collection. First, the 366 samples collected from the Sun Yat-Sen University Cancer Center were divided randomly into the discovery cohort (194 cases) and the validation cohort (172 cases). Seventy-three NSCLC cases from Yunnan Cancer Hospital that met the same criteria were assigned to an independent cohort. Overall survival (OS) was defined as the time from the date of surgery to the date of death or last follow-up, and disease-free survival (DFS) was defined as the time from the date of surgery to the date of first recurrence, distant metastasis, death or the last follow-up. The clinicopathological characteristics of the patients in the three cohorts are listed in Table 1. This study was reviewed and approved by the Ethical Committees of Sun Yat-Sen University Cancer Center and Yunnan Cancer Hospital. Written informed consent was obtained from each patient.

RNA extraction
RNA from tumor and normal lung tissues was extracted using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) and homogenized in a bullet blender (Vortex-Genie 2) according to the manufacturer's instructions. Briefly, 100 mg of tissue was added to 1 mL of TRIzol reagent, homogenized in the bullet blender at low temperature for 15 min and then incubated at 25 °C for 5 min. Chloroform was added and the mixture was shaken vigorously for 15 s, left undisturbed at room temperature for 10 min, and centrifuged at 12000 g and 4 °C for 15 min. The supernatant was then transferred into a new tube, and an equal volume of isopropyl alcohol was added and mixed. After standing for 10 min at room temperature, the mixture was centrifuged and the supernatant was discarded. The precipitate was washed with 75% ethanol, which was then removed by centrifugation. After the residual ethanol had evaporated, ddH2O was added to dissolve the RNA. Finally, the concentration and quality of the extracted RNA were measured in a ND-  Table S1 in Additional File 1. The PCR data were normalized to GAPDH expression and then to the median expression value of a given lncRNA in the corresponding samples. The relative quantification of lncRNA expression was presented as 2^−ΔΔCt.
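The 2^−ΔΔCt normalization described above can be illustrated with a minimal Python sketch. It assumes that raw Ct values are available per sample and reads "median expression value" as the median ΔCt of the lncRNA across samples; the function name and the Ct values are hypothetical and are not taken from the study.

```python
from statistics import median

def relative_expression(ct_target, ct_gapdh):
    """Return 2^-ddCt per sample, calibrated to the median dCt of the lncRNA."""
    # dCt: target Ct minus reference (GAPDH) Ct for each sample
    d_ct = [t - g for t, g in zip(ct_target, ct_gapdh)]
    # Assumption: the calibrator is the median dCt across the samples
    calibrator = median(d_ct)
    # ddCt = dCt - calibrator; a lower Ct means higher expression
    return [2 ** -(d - calibrator) for d in d_ct]

# Hypothetical Ct values for one lncRNA and GAPDH in three samples
ct_lnc = [26.0, 27.0, 28.0]
ct_gapdh = [18.0, 18.0, 18.0]
print(relative_expression(ct_lnc, ct_gapdh))
```

With this calibration, a sample at the median has a relative expression of 1, and each Ct cycle below the median doubles the value.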

LncRNA microarray fabrication and hybridization
Human lncRNA transcript sequences selected from public lncRNA databases, including LNCipedia, LncRNAdb, LncRNADisease, and the EST database, were used to design probes for the in-house lncRNA microarray, and 2,412 probes were successfully designed. The lncRNA microarray was fabricated in house and hybridized as previously described [25]. RNA extracted from the lung cancer and normal lung tissues of the 366 cases in the discovery and validation cohorts was subjected to lncRNA microarray examination. Briefly, each probe was mixed with printing buffer to a final concentration of 40 µmol/L and printed in duplicate on cleaned glass slides (75 × 25 mm). Total RNA (2·0 µg) was labeled with 100 nmol/L of pCp-Cy5 (Jena Bioscience, Germany) in reverse transcription. The mixture of the labeled RNA sample and 1 × hybridization solution was then hybridized onto the microarray for 12-18 h at 45 °C. After hybridization, the slides were washed in 1 × saline sodium citrate/1% sodium dodecyl sulphate (1 × SSC/1% SDS) for 10 min at 45 °C, followed by sequential washing in 2 cycles of 0·5 × SSC/0·1% SDS, 2 cycles of 0·2 × SSC and 1 cycle of purified water for 1 min at room temperature. The slides were then dried in a special small centrifuge and scanned using the InnoScan 700A Scanner (Innopsys Inc, France).

Microarray data processing
The raw microarray data were first background-subtracted and then normalized using the quantile method and log transformation. The log-transformed data were deposited in the GEO database on the National Center for Biotechnology Information website (GEO accession number: GSE143018) (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE143018).
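As an illustration of the normalization step, the sketch below applies a textbook quantile normalization followed by a log2 transformation to background-subtracted intensities. It is a simplified version under stated assumptions: each inner list is one sample (array) with the same probes in the same order, all intensities are positive, tied values are ranked by position, and the log base is assumed to be 2. The function names and data are hypothetical, and the software actually used for this step is not specified in the text.

```python
import math

def quantile_normalize(columns):
    """Quantile-normalize equal-length sample columns (ties broken by position)."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # probe indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace each value by the reference quantile
        normalized.append(out)
    return normalized

def log2_transform(columns):
    """Log2-transform positive intensities, sample by sample."""
    return [[math.log2(v) for v in col] for col in columns]

# Hypothetical background-subtracted intensities: two samples, three probes
samples = [[5.0, 2.0, 3.0], [4.0, 1.0, 8.0]]
print(log2_transform(quantile_normalize(samples)))
```

After normalization every sample shares the same intensity distribution, so differences between arrays reflect probe ranks rather than overall signal strength.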
The lncRNAs differentially expressed between lung cancer and paired normal tissues were identified using the significance analysis of microarrays (SAM) program with thresholds of fold change > 1·25, P-value < 0·01 and false discovery rate < 0·01 (t test). Hierarchical clustering analysis was applied to the samples of the discovery cohort using the average linkage method and uncentered Pearson's correlation coefficients in MeV version 4·2.

Statistical analysis
The correlation of the 4-lncRNA prognostic signature with clinical characteristics was assessed by Fisher's exact test and χ2 test. All these statistical analyses were done using the SPSS Version 23·0 software. The prognostic accuracy of the 4-lncRNA signature, TNM staging system, and combined risk

Results
LncRNA expression profile of NSCLC tissues detected by a custom microarray in a discovery cohort
The 366 NSCLC patients from Sun Yat-Sen University Cancer Center in South China were randomly divided into a discovery cohort and a validation cohort. The clinical characteristics of these patients are listed in Table 1.

Identification of a 4-lncRNA prognostic signature for NSCLC patients in the discovery cohort
To elucidate the prognostic significance of lncRNAs in NSCLC, univariate Cox regression analysis was performed on all 305 differentially expressed lncRNAs in the discovery cohort. At a threshold of P-value < 0·05, 15 lncRNAs were significantly associated with OS of the lung cancer patients (Table 2), of which 6 were risk-associated and 9 were protective.
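To make the screening step concrete, the following Python sketch fits a univariate Cox model for one lncRNA by Newton-Raphson on the partial likelihood and reports a Wald P-value. It is only an illustration: the study used SPSS, whereas this code assumes a single covariate, the Breslow treatment of tied event times, and a covariate that varies between patients. All function names and data are hypothetical.

```python
import math

def cox_score_and_info(beta, times, events, x):
    """Score and observed information of the Cox partial log-likelihood.

    Assumes a single covariate and the Breslow handling of tied event times.
    """
    score, info = 0.0, 0.0
    for i, (t_i, died) in enumerate(zip(times, events)):
        if not died:
            continue  # censored patients enter only through the risk sets
        # Risk set: patients still under observation at this event time
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(weights)
        s1 = sum(w * x[j] for w, j in zip(weights, risk))
        s2 = sum(w * x[j] ** 2 for w, j in zip(weights, risk))
        score += x[i] - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info

def fit_univariate_cox(times, events, x, max_iter=50, tol=1e-10):
    """Newton-Raphson estimate of the coefficient and its two-sided Wald P-value."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_and_info(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = cox_score_and_info(beta, times, events, x)
    z = beta * math.sqrt(info)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return beta, p_value

# Hypothetical follow-up times (months), death indicators, and expression levels
times = [2, 4, 5, 7, 9, 11, 13, 16]
events = [1, 1, 1, 0, 1, 1, 0, 1]
expression = [2.1, 1.2, 1.9, 0.8, 1.5, 0.4, 1.0, 0.6]
beta, p_value = fit_univariate_cox(times, events, expression)
label = "risk-associated" if beta > 0 else "protective"
print(f"coefficient={beta:.3f}, P={p_value:.3f}, {label}")
```

A positive coefficient (hazard ratio above 1) marks a risk-associated lncRNA and a negative coefficient a protective one; repeating the fit for each candidate and keeping those with P < 0.05 mirrors the screening logic described above.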
The reliability and repeatability of the microarray results were confirmed by evaluating 5 of the 15 selected prognostic lncRNAs by qRT-PCR in 30 pairs of samples randomly selected from the discovery cohort. Of the 5 lncRNAs, two (NEAT1 and XLOC_009261) were up-regulated and three (XLOC_005302, XLOC_001306, and lnc-GAN1) were down-regulated in the lung cancer tissues compared with the normal lung tissues. The ratios of expression in cancer tissues to adjacent normal tissues detected by qRT-PCR were consistent with the microarray results (Fig. 1a), and the qRT-PCR and microarray data of the five lncRNAs were significantly correlated (Fig. 1b-1f). These results indicate that the lncRNA expression levels detected by the lncRNA microarray are reliable and reproducible and can be used for further analysis.
To identify an optimal lncRNA combination (signature) for predicting survival outcome in NSCLC patients, the 15 survival-associated lncRNAs were used to establish a prognostic signature with a risk-score method as previously reported. [26,27] Using this method, the 4-lncRNA signature with the highest prognostic power was established, consisting of NEAT1, lnc-GAN1, ASLNC11245, and GSO_1539832_023. The risk-score formula was based on the expression levels of the 4 lncRNAs measured by microarray, each weighted by its regression coefficient from the univariate Cox regression analysis. A risk score was calculated for each patient using this formula, and patients were divided into high- and low-risk groups according to the median risk score. Kaplan-Meier survival analysis showed that high-risk patients had markedly poorer OS and DFS than low-risk patients (Fig. 2a), implying that this lncRNA signature could be an effective prognostic signature for NSCLC patients.
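The Kaplan-Meier comparison of the two risk groups can be sketched with the product-limit estimator below. This is a minimal illustration, not the analysis pipeline of the study: it returns only the survival estimates (no significance test between groups), and the function name and follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time (product-limit estimator)."""
    curve, surv = [], 1.0
    for t in sorted({t for t, d in zip(times, events) if d}):
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        deaths = sum(1 for u, d in zip(times, events) if u == t and d)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical follow-up (months) and death indicators for two risk groups
high_risk = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
low_risk = kaplan_meier([5, 8, 12, 15], [0, 1, 0, 0])
print(high_risk)
print(low_risk)
```

Censored patients (event indicator 0) never lower the curve directly; they only leave the risk set, which is why the estimator is suited to follow-up data with incomplete observation.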
Validation of the 4-lncRNA prognostic signature in NSCLC patients selected from a multicenter registry
The prognostic value of the 4-lncRNA signature identified in the discovery cohort was verified in NSCLC patients from two different geographical areas, one used as an internal validation cohort and the other as an independent validation cohort. The 4-lncRNA signature was first tested in the validation cohort (172 NSCLC samples) acquired from the same center as the discovery cohort in South China. These NSCLC samples were assayed with the same lncRNA microarray as the discovery cohort, and the risk score of each patient in the validation cohort was computed using the same risk-score formula. Based on the risk scores, patients were classified into high-risk and low-risk groups. Survival analysis showed that high-risk patients had much worse OS and DFS than low-risk patients (Fig. 2b), which is consistent with the results obtained in the discovery cohort.
The 4-lncRNA prognostic signature was then tested in 73 further NSCLC samples (an independent cohort) obtained from another medical center in Southwest China, in which the expression of the 4 lncRNAs was detected using qRT-PCR. Univariate Cox regression analysis was then performed on the 4 lncRNAs to derive a risk-score formula using the same method as in the discovery cohort:
Risk score = (0·297 × NEAT1 level) + (-0·259 × lnc-GAN1 level) + (-0·706 × ASLNC11245 level) + (-0·153 × GSO_1539832_023 level)
The risk score for each of the patients in the independent cohort was calculated using this formula.
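The risk-score formula for the independent cohort translates directly into code. The sketch below uses the four coefficients given above and splits patients at the median score; the rule that a score equal to the median counts as low-risk is an assumption, since the text does not say how ties are handled, and the function names and patient values are hypothetical.

```python
from statistics import median

# Coefficients from the risk-score formula of the independent cohort
COEFFICIENTS = {
    "NEAT1": 0.297,
    "lnc-GAN1": -0.259,
    "ASLNC11245": -0.706,
    "GSO_1539832_023": -0.153,
}

def risk_score(expression):
    """Weighted sum of the four lncRNA expression levels for one patient."""
    return sum(coef * expression[name] for name, coef in COEFFICIENTS.items())

def split_by_median(scores):
    """Label each score high or low relative to the cohort median.

    Assumption: a score equal to the median is labeled low-risk.
    """
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]

# Hypothetical relative expression levels for two patients
patients = [
    {"NEAT1": 2.0, "lnc-GAN1": 0.5, "ASLNC11245": 0.4, "GSO_1539832_023": 0.6},
    {"NEAT1": 0.8, "lnc-GAN1": 1.5, "ASLNC11245": 1.2, "GSO_1539832_023": 1.1},
]
scores = [risk_score(p) for p in patients]
print(split_by_median(scores))
```

Because only NEAT1 carries a positive coefficient, higher NEAT1 raises the score while higher levels of the other three lncRNAs lower it.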
The median risk score was applied as the cutoff point and patients were categorized into high- and low-risk groups. As shown in Fig. 2c, OS and DFS of NSCLC patients in the high-risk group were significantly worse than those in the low-risk group, which is in concordance with the results  (Table 4) and DFS (Table 5) Table 6). The independence of the signature as a predictive factor for survival was further confirmed by a stratified analysis of the three clinical stages with the 4-lncRNA prognostic signature. Based on the risk score of the 4-lncRNA prognostic signature, patients in the same TNM stage (stage I, II, or III) were divided into high- or low-risk subgroups. Patients with high-risk scores generally had significantly worse OS and DFS than those with low-risk scores in stages I, II and III (Fig. 3), indicating that the prognostic signature is independent of the TNM staging system. These results therefore indicate that the 4-lncRNA molecular signature is a powerful and independent prognostic factor for NSCLC patients.
The 4-lncRNA signature provides additional prognostic information to the TNM staging system in NSCLC patients
In clinical practice, the traditional TNM staging system is the main approach for predicting the survival of patients with NSCLC and determining the treatment strategy. However, the TNM staging system is mainly based on anatomic information and does not include tumor biology factors. Therefore, this system is insufficient to predict survival outcome in NSCLC patients. [28] For example, Kaplan-Meier survival analysis of the three cohorts in this study showed that the TNM staging system cannot effectively separate the prognosis of NSCLC patients in different stages, especially stages I and II (Fig. 4). To improve the survival prediction of the TNM staging system, a new risk-score model was established by combining the risk group of the signature with the TNM stage. Low- and high-risk cases were scored as 0 and 1, respectively, while stages I, II, and III were scored as 1, 2, and 3, respectively. Patients with a combined score of 1, 2-3, and 4 were classified as low-, medium- and high-risk, respectively. Kaplan-Meier survival analysis was then performed on the patients with different combined risk scores in the three cohorts. There was a significant difference in OS and DFS between patients with low-, medium-, and high-risk scores in the discovery cohort (Fig. 5a), and these results were confirmed in the validation and independent cohorts (Fig. 5b-5c).
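The combined scoring rule described above is simple enough to write out. The sketch below encodes the stated points (signature low/high = 0/1, stage I/II/III = 1/2/3) and the stated cutoffs (1 = low, 2-3 = medium, 4 = high); the function name and the string labels are illustrative choices, not taken from the study.

```python
def combined_risk_group(signature_group, tnm_stage):
    """Combine the signature risk group with the TNM stage into one category."""
    signature_points = {"low": 0, "high": 1}[signature_group]
    stage_points = {"I": 1, "II": 2, "III": 3}[tnm_stage]
    total = signature_points + stage_points  # ranges from 1 to 4
    if total == 1:
        return "low"
    if total == 4:
        return "high"
    return "medium"  # combined score of 2 or 3

for group in ("low", "high"):
    for stage in ("I", "II", "III"):
        print(group, stage, combined_risk_group(group, stage))
```

Under this rule only stage I patients with a low-risk signature fall into the low-risk category, and only stage III patients with a high-risk signature into the high-risk category; the four remaining combinations are medium-risk.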
ROC analysis was then performed to compare the accuracy of the TNM staging system and the combined risk model. The combined risk model achieved a significantly higher predictive accuracy for OS (AUC = 0·726 vs 0·644) and DFS (AUC = 0·723 vs 0·641) than the TNM staging system in the discovery cohort (Fig. 6a), and the same pattern was observed in the validation and the independent cohorts (Fig. 6b-6c). Taken together, these results indicate that the 4-lncRNA signature provides additional prognostic information and enhances the prognostic power of the TNM staging system.
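For readers unfamiliar with the AUC, the following sketch computes it as the probability that a patient with the event receives a higher score than a patient without it, counting ties as one half. It assumes a binary outcome (for example, death within a fixed follow-up window), which is an assumption here because the text does not state how survival was dichotomized for the ROC analysis; the function name and data are hypothetical.

```python
def auc(scores, outcomes):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    # Fraction of (event, non-event) pairs ranked correctly by the score
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical combined scores (1-4) and binary outcomes (1 = event)
combined_scores = [1, 2, 2, 3, 4, 4]
event = [0, 0, 1, 0, 1, 1]
print(auc(combined_scores, event))
```

An AUC of 0.5 corresponds to a score with no discriminative value and 1.0 to perfect separation, which gives a scale for reading the values reported above.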

Discussion
LncRNAs, a novel class of non-coding RNA, have been widely observed to be dysregulated and
Among the four lncRNAs constituting the signature, only NEAT1 has been reported to be linked with cancer. NEAT1 is aberrantly expressed in many human malignancies including lung cancer and functions as an oncogene. Higher NEAT1 expression is correlated with advanced TNM stages and lymphatic metastasis in NSCLC patients. [31] Previous studies have revealed that NEAT1 promotes epithelial-mesenchymal transition (EMT) and metastasis in NSCLC via the Wnt/β-catenin pathway. [24,32] However, the association of NEAT1 with the survival of lung cancer patients has not been reported until now. Consistent with published reports, NEAT1 expression in this study was significantly higher in NSCLC tissues than in adjacent normal tissues (Fold change = 1·7). Moreover, this study shows for the first time that NEAT1 is an independent prognostic predictor for NSCLC patients (unpublished data). To the best of our knowledge, no functional annotation is available for the remaining three lncRNAs (lnc-GAN1, ASLNC11245, and GSO_1539832_023) included in the prognostic signature. In the present study, these three lncRNAs were significantly down-regulated in lung cancer tissues compared with adjacent normal tissues (Fold change = 0·189, 0·749, and 0·785, respectively), and higher levels of these lncRNAs could serve as indicators of good prognosis in patients with NSCLC.
Current treatment strategies for lung cancer are comprehensive and include surgery, radiotherapy, chemotherapy, targeted therapy, gene therapy and immunotherapy. [33,34] With advances in molecular knowledge in the past 10 years, molecules such as the epidermal growth factor receptor (EGFR) family, which is targeted by EGFR tyrosine kinase inhibitors (EGFR-TKIs), and programmed cell death protein 1 (PD-1) have been regarded as therapeutic targets in NSCLC [8,35].
Although these therapies have improved survival and quality of life in NSCLC patients, the effect is far from satisfactory in many patients. Most patients experience drug resistance or disease progression after receiving treatment for a certain period. [36,37]

Conclusions
The findings of this study revealed a tumor-specific lncRNA expression profile in NSCLC tissues and identified a novel prognostic signature based on 4 lncRNAs, which proved to be a powerful and independent predictor of OS and DFS in NSCLC patients. Moreover, a prognostic model combining the 4-lncRNA signature and the TNM stage was developed to refine the current staging system and improve its predictive power. This study suggests that the 4-lncRNA classifier might be a potential predictive biomarker with high precision for selecting high-risk patients who might benefit from adjuvant therapy and could thus guide personalized management of NSCLC patients.

Ethics approval and consent to participate
The study was approved by the Research Ethics Committee of Sun Yat-Sen University Cancer Center.
Research was conducted according to all ethical standards, and written informed consent was obtained from all patients.

Consent for publication
Consent to publish has been obtained from all authors.