Prognosis of clear cell renal cell carcinoma (ccRCC) based on a six-lncRNA-based risk score: an investigation based on RNA-sequencing data

Background The scientific understanding of long non-coding RNAs (lncRNAs) has improved in recent decades. Nevertheless, there has been little research into the role that lncRNAs play in clear cell renal cell carcinoma (ccRCC). More lncRNAs are assumed to influence the progression of ccRCC via their own molecular mechanisms. Methods This study investigated the prognostic significance of differentially expressed lncRNAs by mining high-throughput lncRNA-sequencing data from The Cancer Genome Atlas (TCGA) containing 13,198 lncRNAs from 539 patients. Differentially expressed lncRNAs were assessed using the R packages edgeR and DESeq. The prognostic significance of lncRNAs was measured using univariate Cox proportional hazards regression. ccRCC patients were then categorized into high- and low-score cohorts based on the cumulative distribution curve inflection point the of risk score, which was generated by the multivariate Cox regression model. Samples from the TCGA dataset were divided into training and validation subsets to verify the prognostic risk model. Bioinformatics methods, gene set enrichment analysis, and protein–protein interaction networks, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes analyses were subsequently used. Results It was found that the risk score based on 6 novel lncRNAs (CTA-384D8.35, CTD-2263F21.1, LINC01510, RP11-352G9.1, RP11-395B7.2, RP11-426C22.4) exhibited superior prognostic value for ccRCC. Moreover, we categorized the cases into two groups (high-risk and low-risk), and also examined related pathways and genetic differences between them. Kaplan–Meier curves indicated that the median survival time of patients in the high-risk group was 73.5 months, much shorter than that of the low-risk group (112.6 months; P < 0.05). Furthermore, the risk score predicted the 5-year survival of all 539 ccRCC patients (AUC at 5 years, 0.683; concordance index [C-index], 0.853; 95% CI 0.817–0.889). The training set and validation set also showed similar performance (AUC at 5 years, 0.649 and 0.681, respectively; C-index, 0.822 and 0.891; 95% CI 0.774–0.870 and 0.844–0.938). Conclusions The results of this study can be applied to analyzing various prognostic factors, leading to new possibilities for clinical diagnosis and prognosis of ccRCC.


Background
The role of genomes in biological processes has become better understood in recent decades, as researchers have gradually come to recognize the roles of individual transcripts in particular. New high-throughput sequencing technologies have enabled the detection of novel transcripts through increased sensitivity. These recent advances have facilitated more comprehensive and more thorough research into the effects of transcription and translation [1][2][3]. At present, much is understood about messenger RNAs and other RNAs, including transfer RNAs, small nuclear RNAs, small nucleolar RNAs, and micro RNAs, but the roles, types, and biological significance of long non-coding RNAs (lncRNAs) have yet to be elucidated [4][5][6].
Kidney cancer is one of the most prevalent urinary tract cancers in adults. In the United States, a total of 63,900 new cases of kidney and renal pelvis cancers were projected (40,610 and 23,380 for male and female patients, respectively), with an estimated 14,400 deaths (9470 and 4930 for males and females, respectively) in 2017 [7]. With approximately 3% mortality for all cases, the rate continues to soar [7]. In China, 66,800 cases of kidney cancer were newly diagnosed, with a 2.34% mortality rate in 2015 [8]. Histologically, clear cell renal cell carcinoma (ccRCC) is the most widespread kidney cancer subtype, constituting 70% of kidney cancers, followed by kidney renal papillary cell carcinoma (10%) and chromophobe renal cell carcinoma (5%) [9][10][11].
Recently, lncRNAs have been revealed to play a role in tumorigenesis, disease development, and metastasis in ccRCC, in both oncogenic and tumor-suppressing roles that modulate a number of biological and pathological processes [12][13][14][15][16][17]. Nevertheless, scant prognosis-related research has been conducted on lncRNAs in ccRCC, and more lncRNAs are assumed to influence ccRCC progression via their own molecular mechanisms. Thus, the present study aimed to investigate the prognostic significance of differentially-expressed lncRNAs by mining high-throughput RNA-sequencing data from The Cancer Genome Atlas (TCGA). A risk score based on 6 novel lncRNAs exhibited superior prognostic value for ccRCC outcomes.

Patient cohort from TCGA dataset
RNA sequencing (RNA-Seq) raw counts data (level 3) from ccRCC patients, which were generated using the Illumina HiSeq RNASeq platform, were obtained from the TCGA data portal (https ://tcga-data.nci.nih. gov/tcga/). These data corresponded to 539 ccRCC tissues and 72 adjacent non-tumorous renal tissue samples deposited on or before May 31, 2017. The ultimate status of the ccRCC patients in our study was captured as overall survival (OS) data. The average follow-up period was 44.9 months. The data were retrieved from TCGA, which is a community resource project offering data for research; approval from the local ethics committee was not necessary for the current study, as it complied with TCGA publication principles and data use policies.

Assessment of differentially expressed lncRNAs
The ccRCC RNA-Seq data contained 60,483 messenger RNAs, including 13,198 lncRNAs that have been labelled in NCBI (https ://www.ncbi.nlm.nih.gov/) or GENCODE databases (http://www.genco degen es.org/). Differentially expressed lncRNAs were assessed using edgeR and DESeq packages for the R statistical computing environment (using adjusted P < 0.05 and |log 2 FC| > 2 thresholds, respectively) [18,19]. The expression level of each lncRNA was assessed using DESeq. The lncRNA expression data were displayed as log 2 -transformation. The final candidate lncRNAs were determined using the two R packages. Student's t-tests (SPSS 22.0, IBM Corp., Armonk, NY) were employed to assess differential expression of the 6 candidate lncRNAs for discriminating between ccRCC and non-cancerous kidney tissues.

ccRCC prognosis capabilities based on differentially expressed lncRNAs
The differentially expressed lncRNAs for which relative expression levels were below 1 in more than 10% of all subjects were eliminated from subsequent analyses. Similarly, lncRNAs were excluded if they lacked adequate clinical information. The final prognostic analysis included a total of 530 samples with expression data for 370 lncRNAs. Samples from the TCGA dataset were divided into training and validation sets, which were randomly selected from 530 tumor samples to verify the prognostic risk model.
The prognostic significance of lncRNAs was primarily measured by univariate Cox proportional hazard regression (P < 0.01). Statistically significant indicators, including lncRNAs, were further confirmed via multivariate Cox stepwise regression. Furthermore, the relationships between the expression of these 6 lncRNAs and various clinicopathological features were assessed by Student's t-tests and Spearman correlation analysis.

Clinical role of the risk score generated by the key lncRNAs
An lncRNA-based prognosis risk score was generated from a linear combination of the expression level multiplied by the regression coefficient acquired from the multivariate Cox regression model (β) with the following formula as previously reported [20,21]: The β value is the estimated regression coefficient of the lncRNA derived from the multivariate Cox stepwise regression analysis and e indicates the expression profiles of the lncRNA.
Based on the cumulative distribution curve inflection point of the risk score, ccRCC patients were categorized into high-and low-score cohorts. Univariate and multivariate Cox proportional hazards regression analyses were conducted to further assess the efficacy of this prognostic risk score, and adjustments were made based on risk score, race, sex, age, tumor stage, distant metastasis, lymph node metastasis, neoplasmic cancer status, clinical stage, and tumor grade. Hazard ratios (HRs) with 95% confidence intervals (CIs) were examined. A time-dependent receiver operating characteristic (ROC) curve analysis within 5 years was also performed with the R package survival ROC in order to calculate the prognostic accuracy of the model for time-dependent disease outcomes. Kaplan-Meier (K-M) survival curves were assessed to determine correlations between all parameters (clinical aspects and six-lncRNA-based risk scores) and ccRCC patient OS. A concordance index (C-index) was used to measure the predictive accuracy and discriminative ability of the nomograms.
A ROC curve was used to assess the prognostic effectiveness of the six-lncRNA-based risk scores for clinical progress of ccRCC patients. A two-sided P-value < 0.05 threshold was used to assess corresponding results as statistically significant. SPSS 22.0 (IBM Corp.) was utilized for these statistical analyses.

Different signaling pathways between high-and low-risk groups
Gene set enrichment analysis (GSEA) was carried out using GSEA software (http://www.broad insti tute.org/ gsea) with the MSigDB C2 CP canonical pathways gene set collection [22][23][24][25][26][27]. A total of 60,483 genes were imported for GSEA. Gene sets with a nominal P-value less than 0.05 and a false discovery rate (FDR) value less than 0.25 were considered to be significantly enriched. For the most important pathways, protein-protein interaction (PPI) network analysis was also performed using the Search Tool for the Retrieval of Interacting Genes (STRING) database (http://www.strin g-db.org/) [28,29]. Differentially expressed genes (DEGs) were identified using the edgeR package with P adj < 0.01 and |log 2 FC| > 3 [30][31][32][33] between the high-and low-risk score groups for ccRCC and normal kidney samples. The DEG results were rendered as volcano plots and heatmaps. Identified DEGs were used to perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses with the DAVID online tool (http://david .abcc. ncifc rf.gov/) [28,29].

Validation by Gene Expression Omnibus DataSets and International Cancer Genomics Consortium database
We collected the relevant microarrays from Gene Expression Omnibus (GEO) DataSets to validate the clinical roles of the six lncRNAs, the following search terms were used: (kidney OR nephridium OR renal) AND ("clear cell") AND (cancer OR carcinoma OR tumor OR neoplas* OR malignan* OR adenocarcinoma OR ccRCC) [28,34]. Differences in lncRNA expression levels between different groups were assessed using Student's t-tests. Furthermore, we searched ccRCC dataset through the International Cancer Genomics Consortium (ICGC) database (https ://icgc.org/) to verify to verify the effectiveness of prognostic model.

Differentially expressed ccRCC lncRNAs
The analysis of 60,483 TCGA messenger RNAs revealed the differential expression of 13,198 lncRNAs based on the results of the R packages edgeR and DESeq. Significantly differentially expressed lncRNAs (n = 869) were obtained for subsequent prognostic analysis (Fig. 1). Among these 869 lncRNAs, 555 were upregulated and 314 were downregulated.
The association between the expression of the 6 identified lncRNAs and clinicopathological features were further analyzed by t-test. CTA-384D8. 35 The independent prognostic features of these 6 lncRNAs. Survival analysis of these 6 lncRNAs was shown with Kaplan-Meier survival curves importantly, as shown in Table 2 and Figs. 4 and 5, the levels of these 6 lncRNAs predicted the clinical progression of ccRCC.
Clinical role of the six-lncRNA-based risk score Next, the six-lncRNA-based risk score for predicting OS was calculated using a formula consisting of the expression level multiplied by the regression coefficient derived from the multivariate Cox regression model (β) values: The ccRCC patients were classified into two cohorts, high-and low-risk groups, according to the cumulative distribution curve inflection point of the six-lncRNAbased risk score (Fig. 6). We gauged the differences in expression levels for these 6 lncRNAs between the high-and low-risk cohorts. Compared with the low-risk group, expression of LINC01510 was lower in the highrisk group, yet the expression of the other 5 lncRNAs was higher in the high-risk group (Fig. 6). K-M curves indicated that the median survival time of patients in the high-risk group was 73.5 months, which was much shorter than that of the low-risk group (112.6 months, P < 0.05; Fig. 7a). Furthermore, the risk score predicted 5-year survival of ccRCC patients across the entire set (AUC at 5 years, 0.683; C-index, 0.853; 95% CI 0.817-0.889). Moreover, the training and validation sets showed similar performance (AUC at 5 years, 0.649 and 0.680,

Table 2 Association between six lncRNAs and clinical features of clear cell renal cell carcinoma (ccRCC) patients
Italic represented the difference was statistically significant Tumor stage (T3-4/T1-2) respectively; C-index, 0.822 and 0.891; 95% CI 0.774-0.870 and 0.844-0.938) (Fig. 7). Additionally, the risk score HR generated by univariate Cox regression was 2.372 (95% CI 1.712-3.288, P < 0.001), and multivariate Cox proportional hazards regression analysis demonstrated an accordant HR of 1.693 (95% CI 1.181-2.425, P = 0.004), which confirmed that the six-lncRNA-based risk score was an independent indicator of ccRCC patient survival (Table 3). Meanwhile, the prognostic value of a diversity of clinicopathological parameters was also explored. The K-M methodology revealed that the age, tumor stage, distant metastasis, cancer status, clinical stage, and grade could predict the outcome (Fig. 8). Some parameters were discovered to exhibit prognostic value through univariate analysis; nevertheless, it was demonstrated by multivariate analysis that age, metastasis, cancer status, and grade appeared statistically significant (Table 3).

Functional evaluation of the differentially expressed genes in high-and low-risk groups
Volcano plots and heatmaps of DEGs in high/low-risk score group of ccRCC and normal kidney samples were created (Figs. 11 and 12). GO terms and KEGG pathways are shown in Additional file 2: Table S2, Additional file 3:  Table S3, Additional file 4: Table S4, Additional file 5: Table S5 which suggests that different pathways were enriched between the high-and low-risk groups.
GSEA was also performed to investigate related biological processes and signaling pathways [12]. We compared the gene profiles of ccRCC patients in the high-and lowrisk groups categorized by the six-lncRNA-based risk score. The gene sets with significantly different expression (FDR < 0.25 and nominal P < 0.005) were used for GSEA. In total, 6 pathways were found to be significantly enriched in the high-risk group, including primary immunodeficiency, olfactory transduction, allograft rejection, autoimmune thyroid disease, and immune network for IgA production. By contrast, GSEA revealed that the gene sets in the low-risk group were enriched in 152 pathways including several cancer related pathways, such as the ERBB signaling pathway, WNT signaling, and the WNT pathway in cancer (Fig. 13). The associated biological pathways are shown in Tables 5 and 6 as assessed by GSEA, as well as in Additional file 2: Table S2, Additional file 3: Table S3, Additional file 4: Table S4, Additional file 5: Table S5. PPI networks were also analyzed for the genes involved in the 'Renal cell carcinoma pathway, ' and several hub genes, such as PIK3CA, VEGFA, and PIK3CB were noted (Additional file 6: Fig. S1).

Validation of these lncRNAs using Gene Expression Omnibus DataSets and International Cancer Genomics Consortium (ICGC) database
In total, 4030 items (GSE = 248, GPL = 96) were identified from the GEO DataSets through our searching strategies. The standard process for retrieval and inclusion is shown in Additional file 7: Fig. S2. Some annotation for these 6 lncRNAs was found in the following platforms  (Table 7, Fig. 14).
Renal Cell Cancer (RECA-EU) data was selected from the International Cancer Genomics Consortium (ICGC) database, containing 91 ccRCC tissues and 45 adjacent non-tumorous renal tissue samples. Three of the six lncRNAs were matched, including CTD-2263F21.1, LINC01510 and RP11-426C22.4. Differential expression and prognostic value analysis of these three lncR-NAs were performed. The differential expression of these three lncRNAs was meaningful (P < 0.05) and consistent with the results of TCGA. Kaplan-Meier survival curves of CTD-2263F21.1 and RP11-426C22.4 also showed the value of their predicted survival (P < 0.05) (Fig. 15).

Discussion
This study analyzed TCGA sequencing data to discover effective prognostic biomarkers for ccRCC, which have the potential to guide future clinical and basic medical studies. First, we analyzed the statistical significance of differentially-expressed lncRNAs in ccRCC patients using the R packages edgeR and DESeq, and systematically assessed their prognostic value. Notably, the best prognostic value was achieved using a pool that consisted of 6 lncRNAs (CTA-384D8.35, CTD-2263F21.1, LINC01510, RP11-352G9.1, RP11-395B7.2, and RP11-426C22.4), which were obtained via multivariate Cox regression. The resulting six-lncRNA-based risk score accurately predicted the progression and prognosis of ccRCC. With ccRCC patients classified into high-and low-risk groups, we discovered that differentially-expressed genes in these two groups were dissimilar, and the essential signaling pathways were unique as well (Additional file 8: Fig. S3).
Some ccRCC studies have already utilized lncRNA expression profiling. Similarly, studies on lncRNA interactions with other molecules have been on the rise in recent years. The most frequently used research techniques for assessing lncRNA expression profiles of renal cell carcinoma (RCC) include microarray assays and ChIP-Seq experiments [16,[35][36][37][38]. However, these studies were limited by their small sample sizes and insufficient focal lncRNAs. In 2018, Liu et al. published a paper on a novel lncRNA profile reveals potential prognostic biomarkers in clear cell renal cell carcinoma. The expression profile of 1801 lncRNAs of ccRCC patients was obtained using TCGA RNASeqv2 system [39]. To enable a comprehensive understanding of lncRNAs in ccRCC, the present study mined high-throughput TCGA data from 530 patients and analyzed 13,198 lncRNAs. 869 differentially expressed lncRNAs were assessed using edgeR and DESeq packages, and used for subsequent analysis. In the study of Qu et al. and Liu et al., there were only 51 and 247 differentially expressed lncRNAs, respectively [39,40].
Several studies have revealed that abnormal expression levels of lncRNAs are correlated with OS, 5-year survival, disease-free survival, disease grade and stage, recurrence, and metastasis. However, each previous study mainly focused on a single lncRNA. For example, an undesirable prognosis for RCC patients was connected with decreased expression of the lncRNAs NONHSAT123350, CADM1-AS1, TCL6, and lnc-ZNF180-2 [41]. Furthermore, increased expression of SPRY4-IT1, RCCRT1, MALAT1, LINC00152, and PVT1 also indicated unsatisfactory results [41]. Owing to the popularity of high-throughput TCGA data, the use of sequencing data was considered an ideal approach to discover novel lncRNAs. Therefore, using multiple statistical methods for prognostic  importantly, the pool composed of these 6 lncRNAs was the basis for a risk score that provided a superior means of predicting disease progression and prognosis.   11 Volcano plots of differentially expressed genes (DEGs) in high-and low-risk groups. Volcano plots of DEGs were generated using the edgeR package in R with P adj < 0.01 and |log 2 FC| > 3. a High-risk score group. b Low-risk score group Fig. 12 Heatmaps of differentially expressed genes (DEGs) in high-and low-risk groups. Heatmaps of DEGs were generated using the edgeR package in R with P adj < 0.01 and |log 2 FC| > 3. a High-risk score group. b Low-risk score group Very recently, Shi et al. [42] used TCGA reads per kilobase of exon model per million mapped reads (RPKM) data to categorize 9669 lncRNAs from 440 kidney cancer patients into a training set (n = 220) and a testing set (n = 220). They discovered that expression of a five-lncRNA signature (consisting of AC069513.4, AC003092.1, CTC-205M6.2, RP11-507K2.3, and U91328.21) was closely associated with kidney cancer patient OS. Using the training set, lncRNAs were identified with a univariate Cox regression model, and these five lncRNAs were closely linked to patient OS. The five-lncRNA-based risk score was confirmed in both the testing set and the entire set. However, the results of Shi et al. [42] were inconsistent with ours as the five lncRNAs in their study did not overlap with the six lncRNAs in ours. However, the analysis that we conducted had the following advantages. First, more samples were included in our study (n = 539). Second, more lncRNAs were annotated (n = 13,198). Third, we simply analyzed those differentially-expressed lncRNAs for their prognostic value. If lncRNAs exerted inconsiderable influences on tumorigenesis, their prognostic value 2) showed no remarkable differences in expression between ccRCC and non-cancerous renal tissues (Additional file 9: Fig. S4). The reason for this result may be that the value of RPKM data was not suitable for using edgeR to analyze differentially expressed genes [43]. We investigated the prognostic significance of the six lncRNAs identified in the present study based on the premise that their expression patterns exhibited noticeable differences between cancerous and non-cancerous tissues. Consequently, the six lncRNAs identified in the present study (i.e., CTA-384D8.35, CTD-2263F21.1, LINC01510, RP11-352G9.1, RP11-395B7.2, and RP11-426C22.4) functioned not only at the outset of tumorigenesis but also in tumor progression. Fourth, taking other factors into consideration, we applied multivariate Cox proportional hazards regression analysis to discover novel biomarkers with prognostic value, which guaranteed a more valid and comprehensive result. Fifth, the ccRCC dataset was divided into training and validation sets to verify the prognostic efficacy of the six-lncRNA-based signature. Sixth, using GEO and ICGC datasets for validation, we found a ccRCC-related series consisting of 248 samples from the GEO Datasets. CTA-384D8.35, CTD-2263F21.1 and RP11-426C22.4 Table 6   had prognostic value for patients with ccRCC, and the clinical value of three lncRNAs (CTA-384D8.35, RP11-395B7.2, and LINC01510) was also partly verified by six microarrays. Lastly, a total of 530 cases of ccRCC were divided into high-and low-risk groups, and differences in pathways between the two groups were also investigated. Moreover, the potential signaling pathways and molecular mechanism in ccRCC were explored for their influences on prognosis. Through GSEA, it was determined that the six novel lncRNAs may play unique roles in ccRCC via specific signaling pathways. 'Pathway in cancer' (321 genes) includes multiple pathways, such as the 'Renal cell carcinoma pathway' (49 genes). Hub genes in the 'Renal cell carcinoma pathway' based on PPI analysis, such as PIK3CA, VEGFA, and PIK3CB, were noted and have also been observed to play vital roles in ccRCC [44][45][46][47][48][49]. Interestingly, PIK3CA has been identified as a direct target of miR-490-5p and miR-19a in renal carcinoma [44,45]. VEGFA was the most important trigger for angiogenesis [46], and it was the target of miR-185, which acted as a tumor suppressor in ccRCC [47]. VEGFA was also reported to act as a stimulus of ccRCC cell migration, invasion, and angiogenesis [48]. Thus, these six novel lncRNAs may begin their function by activating genes in the 'Renal cell carcinoma pathway. ' In addition to the 'Renal cell carcinoma pathway, ' by modulating the 'Wnt signaling pathway, ' the lncRNAs CCAT2 and Kindlin-2 appear to promote clear cell renal cell carcinoma progression [50,51]. We also found that the top three KEGG pathways for DEGs of patients in the high-risk group included KEGG_PRIMARY_IMMU-NODEFICIENCY, KEGG_OLFACTORY_TRANSDUC-TION, and KEGG_ALLOGRAFT_REJECTION, while in the low-risk group the three most dominant pathways were KEGG_UBIQUITIN_MEDIATED_PROTEOLYSIS, KEGG_OXIDATIVE_PHOSPHORYLATION, and KEGG_PEROXISOME. There were some identified pathways that differed between the high-and the low-risk groups. As the six lncRNAs that we detected were novel and no relevant research has been conducted on their functions, the above analysis of the signaling pathways offers prospects into future research on their molecular mechanisms.
In many cancers, gene expression signatures and prognostic models have proven to be useful tools for predicting clinical outcomes and prognostic value based on molecular characteristics that drive pathogenesis. For example, Brooks et al. [52] developed a 34-gene subtype predictor to classify ccRCC tumors according to good risk (ccA) and poor risk (ccB) subtypes and built a subtype-inclusive model to predict patient survival outcomes. Their model provides prognostic stratification and improves the established algorithms to assess risk of recurrence and death in patients with non-metastatic ccRCC. However, the detection of 34 indicators presents a significant clinical burden. Additionally, a 16-gene recurrence score (RS) assay was developed and validated previously to predict the risk of disease recurrence in patients with stage I-III RCC after nephrectomy [53]. This study used data from the phase-III adjuvant sunitinib (S-TRAC) trial in high-risk phase-III RCC to provide additional validation of the 16-gene RS assay. The strong prognostic performance of the 16-gene RS assay was confirmed in the S-TRAC study, and the RS assay is now supported by IB level data. However, primary analysis focused on patients with T3 RCC and additional studies are needed to determine if RS predicts adjuvant treatment benefits. The (cell cycle progression) CCP score, based on levels of 31 cell cycle genes and 15 control genes from the tumor, had prognostic value in predicting metastatic progression after resection of organ-confined ccRCC by univariate analysis and multivariate logistic regression modeling [54]. The CCP score also had prognostic utility in a second TCGA renal cancer cohort with M1 metastasis at time of surgery. However, because the study cohort was relatively small, other genes in addition to CCP genes may still provide meaningful prognostic information. Because the assay used here was originally derived from prostate cancer, the ideal ccRCC gene set may differ from the genes evaluated in this study.

Conclusion
In conclusion, by using TCGA data to evaluate lncRNAs from 530 ccRCC patients, we developed an effective six-lncRNA-based risk score, which has potential as a novel prognostic biomarker for ccRCC. However, this clinical finding needs further confirmation. Additionally, the function and molecular mechanisms of these novel lncRNAs also require in vitro and in vivo exploration.