A potential signature of eight long non-coding RNAs predicts survival in patients with non-small cell lung cancer

Accumulated evidence suggests that dysregulated expression of long non-coding RNAs (lncRNAs) may play a critical role in tumorigenesis and prognosis of cancer, indicating the potential utility of lncRNAs as cancer prognostic or diagnostic markers. However, the power of lncRNA signatures in predicting the survival of patients with non-small cell lung cancer (NSCLC) has not yet been investigated. We performed an array-based transcriptional analysis of lncRNAs in large patient cohorts with NSCLC by repurposing microarray probes from the Gene Expression Omnibus (GEO) database. A risk score model was constructed based on the expression data of eight prognostic lncRNAs in the training dataset of NSCLC patients and was subsequently validated in two other independent NSCLC datasets. The biological implications of the prognostic lncRNAs were also analyzed using functional enrichment analysis. An expression pattern of eight lncRNAs was found to be significantly associated with overall survival (OS) of NSCLC patients in the training dataset. With the eight-lncRNA signature, patients of the training dataset could be classified into high- and low-risk groups with significantly different OS (median survival 1.67 vs. 6.06 years, log-rank test p = 4.33E−09). The prognostic power of the eight-lncRNA signature was further validated in two other non-overlapping independent NSCLC cohorts, demonstrating good reproducibility and robustness of this eight-lncRNA signature in predicting OS of NSCLC patients. Multivariate regression and stratified analysis suggested that the prognostic power of the eight-lncRNA signature was independent of clinical and pathological factors. Functional enrichment analyses revealed potential functional roles of the eight prognostic lncRNAs in tumorigenesis. These findings indicate that the eight-lncRNA signature may be an effective independent prognostic molecular biomarker in the prediction of NSCLC patient survival.

Long non-coding RNAs (lncRNAs) are commonly defined as RNA transcripts longer than 200 nucleotides with little coding capacity [4,5]. Though the functions of only a limited number of lncRNAs have been well characterized, accumulating evidence has suggested that lncRNAs participate in a wide variety of biological processes, including cell differentiation, organogenesis, chromatin modification, genomic imprinting, dosage compensation and response to diverse stimuli, by exerting their functions as four archetypes: signals, decoys, guides and scaffolds [6,7]. lncRNAs can regulate gene expression at the post-transcriptional level via competing endogenous RNA (ceRNA) crosstalk, at the transcriptional level in cis or in trans, and at the epigenetic level [8][9][10]. Recently, a number of cancer-related studies have detected many dysregulated lncRNAs associated with tumorigenesis and tumor progression in a variety of cancers [11][12][13]. Like protein-coding genes and miRNAs, some dysregulated lncRNAs play oncogene-like roles. For instance, HOTAIR is an lncRNA that is overexpressed in breast tumors and significantly associated with breast cancer metastasis [14]. Overexpression of lncRNA PCAT-1 is associated with poor prognosis in patients with colorectal cancer (CRC) [15]. Other well-studied lncRNAs, such as MEG3, GAS5, LINC00312 and lincRNA-p21, have instead demonstrated tumor suppressive roles [16,17]. For example, lncRNA LINC00312, which is significantly down-regulated in nasopharyngeal carcinoma (NPC), was found to be an independent contributor to NPC [18]. These findings suggest that, like protein-coding genes and miRNAs, lncRNAs could serve as diagnostic and prognostic biomarkers. Li et al. [19] measured lncRNA expression in paired tumors and adjacent normal tissues of 119 patients and identified a three-lncRNA signature that could predict the survival of patients with oesophageal squamous cell carcinoma (OSCC). Recent studies have also demonstrated emerging roles of lncRNAs in NSCLC [20].
For example, lncRNA MALAT1 (metastasis-associated lung adenocarcinoma transcript 1) is up-regulated in NSCLC based on evidence from subtractive hybridization of cDNA libraries, and can be used as an independent prognostic marker of patient survival [21]. White and colleagues [22] found 111 differentially expressed lncRNAs between lung tumors and adjacent normal tissues, some of which have been functionally validated to be involved in cellular proliferation in vitro. Nie et al. [23] identified an lncRNA, MVIH, which is over-expressed in NSCLC tissues compared with adjacent normal tissues. Subsequent studies, integrating custom-designed gene microarrays and clinical information, also discovered lncRNA signatures that were significantly associated with the survival of patients with glioblastoma multiforme [24], colorectal cancer [25] and breast cancer [26]. Other recent studies have characterized tens of lncRNAs associated with the presence of certain lung cancer histological subtypes [27,28]. While the prognostic power of mRNA and miRNA signatures in various cancers is well established, the power of lncRNA signatures in predicting the survival of patients with NSCLC has not yet been investigated.
In the present study, we conducted a comprehensive analysis of lncRNA expression profiles across 603 NSCLC patients with clinical information by repurposing previously published NSCLC gene expression profiles, and identified an eight-lncRNA signature associated with survival. A risk score formula was constructed based on the expression data of these eight lncRNAs in the training dataset of NSCLC patients and was further confirmed in two other independent Gene Expression Omnibus (GEO) NSCLC patient cohorts.

Microarray processing and lncRNA profile mining
All the microarray raw data (.CEL files) of the three NSCLC cohorts were obtained from the GEO database and processed using the robust multichip average (RMA) algorithm for background adjustment [32,33]. The Affymetrix GeneChip probe-level data were log2-transformed and standardized to a mean of 0 and a standard deviation (SD) of 1. The NetAffx probe set sequences for Affymetrix HG-U133 Plus 2.0 were downloaded from the Affymetrix website (http://www.affymetrix.com). LncRNA expression data from the Affymetrix-based expression profiling of NSCLC cohorts were obtained by repurposing microarray probes based on the sequences of the probe sets and the annotations of lncRNAs in GENCODE (http://www.gencodegenes.org/) (GRCh38, release 21) [34], as previously described [35]. By keeping probes that were uniquely mapped to the genomic coordinates of lncRNAs derived from GENCODE, 3,521 probes (or probe sets) and 2,313 corresponding lncRNA genes were obtained. Multiple probes (or probe sets) mapping to the same gene were integrated by taking the arithmetic mean of their values to generate a single gene expression value (on the log2 scale).
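The last two processing steps (averaging probes that map to the same lncRNA gene, and standardizing to mean 0 and SD 1) can be illustrated with a minimal Python sketch. The probe IDs, gene names and values are hypothetical, and two details are assumptions rather than statements from the text: the standardization is applied per gene across samples, and the population SD is used.

```python
from collections import defaultdict
from statistics import mean, pstdev

def collapse_probes(probe_expr, probe_to_gene):
    """Average the log2 values of all probes mapped to the same gene, per sample."""
    grouped = defaultdict(list)
    for probe, values in probe_expr.items():
        grouped[probe_to_gene[probe]].append(values)
    # zip(*rows) walks the samples column by column
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}

def zscore(values):
    """Standardize values to mean 0 and SD 1 (population SD, an assumption)."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]

# Hypothetical probe IDs and log2 expression values for three samples
probe_expr = {
    "probe_a": [2.0, 4.0, 6.0],
    "probe_b": [4.0, 6.0, 8.0],
    "probe_c": [1.0, 1.5, 2.0],
}
probe_to_gene = {"probe_a": "lncRNA_X", "probe_b": "lncRNA_X", "probe_c": "lncRNA_Y"}

gene_expr = collapse_probes(probe_expr, probe_to_gene)
standardized = {gene: zscore(vals) for gene, vals in gene_expr.items()}
```

Here `gene_expr["lncRNA_X"]` is the per-sample mean of `probe_a` and `probe_b`, while `lncRNA_Y` keeps the values of its single probe.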

Statistical analysis
A univariable Cox regression analysis was performed to evaluate the relationship between the continuous expression level of each lncRNA and patients' overall survival (OS) in the training dataset. Only those lncRNAs with a p value of <0.005 were considered statistically significant.
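As an illustration of the screening step, the following Python sketch fits a one-covariate Cox model by Newton-Raphson on the partial likelihood and reports a Wald p value. It is a simplified stand-in, not the routine used in the study: the function name, the Breslow handling of tied event times, the step-halving safeguard and the toy data are all assumptions made for this example.

```python
import math

def cox_univariable(times, events, x, max_iter=50, tol=1e-9):
    """Fit a one-covariate Cox model; return (beta, standard error, Wald p value)."""
    event_idx = [i for i, e in enumerate(events) if e]

    def partial_loglik(beta):
        """Partial log-likelihood, its first derivative and the information."""
        ll = score = info = 0.0
        for i in event_idx:
            # Risk set: everyone still under observation at this event time
            risk = [x[j] for j in range(len(times)) if times[j] >= times[i]]
            w = [math.exp(beta * v) for v in risk]
            s0 = sum(w)
            s1 = sum(wi * v for wi, v in zip(w, risk))
            s2 = sum(wi * v * v for wi, v in zip(w, risk))
            ll += beta * x[i] - math.log(s0)
            score += x[i] - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        return ll, score, info

    beta = 0.0
    ll, score, info = partial_loglik(beta)
    for _ in range(max_iter):
        step = score / info
        # Halve the step while it would decrease the partial log-likelihood
        while partial_loglik(beta + step)[0] < ll and abs(step) > tol:
            step /= 2
        beta += step
        ll, score, info = partial_loglik(beta)
        if abs(step) < tol:
            break
    se = 1 / math.sqrt(info)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided Wald test
    return beta, se, p_value

# Toy data (hypothetical): higher expression tends to go with earlier death
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
expression = [2.0, 1.0, 1.5, 0.5, 1.2, 0.1]
beta, se, p_value = cox_univariable(times, events, expression)
hazard_ratio = math.exp(beta)
```

In a screen like the one described above, this fit would be repeated for each lncRNA and only those with a p value below 0.005 would be retained.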
To construct a predictive model, each of the selected lncRNA genes was analyzed using a multivariable Cox regression model in the training dataset, with OS as the dependent variable and other clinical information as the covariables. A risk score was then computed as follows:

$$\text{Risk Score} = \sum_{i=1}^{N} \left(\mathrm{Exp}_i \times \mathrm{Coe}_i\right)$$

where $N$ is the number of prognostic lncRNA genes, $\mathrm{Exp}_i$ is the expression value of lncRNA $i$, and $\mathrm{Coe}_i$ is the estimated regression coefficient of lncRNA $i$ in the multivariable Cox regression analysis. This risk score model was established by taking into account the power of each of the prognostic lncRNA genes.
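The weighted sum described above can be sketched in a few lines of Python. The gene names and coefficient values below are hypothetical placeholders, not the coefficients estimated in this study.

```python
def risk_score(expression, coefficients):
    """Sum of expression values weighted by Cox regression coefficients."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)

# Hypothetical coefficients: positive = risk-associated, negative = protective
coefficients = {"lncRNA_A": 0.5, "lncRNA_B": -0.25}
patient_expression = {"lncRNA_A": 2.0, "lncRNA_B": 1.0}

score = risk_score(patient_expression, coefficients)
```

A lncRNA with a positive coefficient raises the score as its expression increases, whereas one with a negative coefficient lowers it.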
Using the median risk score in the training dataset as a cutoff value, NSCLC patients in each dataset were divided into high- and low-risk groups. Kaplan-Meier survival analyses were performed to test the equality of survival distributions between groups for each NSCLC cohort, and statistical significance was assessed using the two-sided log-rank test. Additionally, a multivariable Cox regression analysis and a data stratification analysis were performed to test whether the risk score was independent of other clinical features within the available data. The time-dependent receiver operating characteristic (ROC) curve was also used to assess the sensitivity and specificity of the survival prediction of the lncRNA expression-based risk score in the training dataset. The area under the curve (AUC) value was calculated from the ROC curve. All analyses were performed using R software and Bioconductor. Significance was defined as p < 0.05.
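The grouping and comparison steps can be sketched in Python as follows. This is an illustrative re-implementation under stated assumptions, not the R code used in the study: the function names and toy data are hypothetical, patients with a score exactly equal to the cutoff are assigned to the low-risk group, and the log-rank p value uses the chi-square distribution with one degree of freedom.

```python
import math
from statistics import median

def split_by_cutoff(scores, cutoff):
    """Label a patient 'high' if the risk score exceeds the cutoff, else 'low'."""
    return ["high" if s > cutoff else "low" for s in scores]

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve

def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p value)."""
    ref = sorted(set(groups))[0]  # reference group for observed minus expected
    obs_minus_exp = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == ref)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e and g == ref)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Toy cohort (hypothetical): risk scores, follow-up in years, 1 = death observed
scores = [0.9, 0.8, 0.7, 0.6, 0.1, 0.2, 0.3, 0.4]
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]

cutoff = median(scores)  # in the study, the cutoff comes from the training set
groups = split_by_cutoff(scores, cutoff)
chi2, p_value = logrank_test(times, events, groups)
```

For validation cohorts, the same `cutoff` value would be reused rather than recomputed, which mirrors the design described in the text.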

Bioinformatics analysis of lncRNA gene function prediction
The co-expression relationships between the prognostic lncRNAs and protein-coding genes were computed using Pearson correlation coefficients. Gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) enrichment analyses of the protein-coding genes co-expressed with prognostic lncRNAs were performed to predict the biological function of the prognostic lncRNAs using the DAVID Bioinformatics Tool (version 6.7), a commonly used functional annotation tool that can assess over-representation of functional categories among a gene set of interest [37]. Enrichment analysis was carried out using the functional annotation chart and functional annotation clustering options, and was limited to KEGG pathways and GO terms in the "Biological Process" category. Functional annotations with a p value of <0.05 and an enrichment score of >2 were considered significant.
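The idea of over-representation can be made concrete with a plain hypergeometric tail probability, sketched below in Python. This is a generic stand-in and not DAVID's own scoring procedure; the function name and the example counts are hypothetical.

```python
import math

def enrichment_p(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the background, K: background genes annotated with the term,
    n: genes in the list of interest, k: listed genes annotated with the term.
    """
    upper = min(n, K)
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / math.comb(N, n)

# Hypothetical counts: 40 of 500 listed genes carry a term that annotates
# 600 of 20000 background genes (15 would be expected by chance)
p_term = enrichment_p(k=40, n=500, K=600, N=20000)
```

A small `p_term` indicates that the term appears in the gene list more often than expected from its frequency in the background.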

Derivation of an eight-lncRNA prognostic signature from the training dataset
The NSCLC patient cohort from GSE37745 (n = 196), which has a relatively large sample size and relatively complete clinical information, was selected as the training dataset to explore the association between lncRNA expression and OS of NSCLC patients. We first conducted a univariate Cox proportional hazards regression analysis of the lncRNA expression data with OS as the dependent variable, and identified a set of eight prognostic lncRNAs that were significantly correlated with patients' OS (p value of <0.005). Table 2 shows a list of these eight prognostic lncRNAs along with important variable information. Of the eight lncRNAs, higher expression levels of RP11-21L23.2, GPR158-AS1, RP11-701P16.5 and RP11-379F4.4 were associated with shorter OS (coefficient > 0), and higher expression levels of the remaining four lncRNAs (CTD-2358C21.4, RP11-94L15.2, KCNK15-AS1 and AC104134.2) were associated with longer OS (coefficient < 0). We then examined whether these eight prognostic lncRNAs are differentially expressed between cancer and normal lung tissue. The lncRNA differential expression analysis was performed on the GSE18842 dataset (including 46 tumor and 45 normal lung tissue specimens) (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE18842) [38] obtained from the GEO database. We found that five of the eight prognostic lncRNAs showed significant expression differences between tumor and normal lung tissue (Mann-Whitney U test p < 0.05) (Additional file 1: Figure S1), suggesting that these selected prognostic lncRNAs are associated with lung cancer.

An eight-lncRNA signature predicts survival of NSCLC patients in the training dataset
To investigate whether the eight-lncRNA signature could provide an accurate prediction of OS in NSCLC patients, the expression data of these eight lncRNAs and other clinical features were fitted as covariates into a multivariable Cox regression model in the training dataset. A risk score was generated for each patient in the training dataset according to the risk-score model (see "Methods"). To evaluate how well the risk score predicts 5-year survival, various cutoff values were evaluated using a time-dependent ROC curve (Figure 1a), which is commonly used to assess the predictive accuracy of a model [39,40]. In the training dataset, the AUC for the eight-lncRNA signature prognostic model was 0.78 at an OS of 5 years, indicating good performance of the lncRNA expression-based risk score for survival prediction. All patients in the training dataset were then ranked according to their risk score and divided into either the high-risk or the low-risk group using the median risk score as the cutoff; the two groups showed significantly different OS (median survival 1.67 vs. 6.06 years, log-rank test p = 4.33E−09; Figure 1b). A significant association between the eight-lncRNA signature risk score and OS was observed in the univariable Cox regression model (Table 3). The hazard ratio of the eight-lncRNA signature risk score of the high-risk group versus that of the low-risk group for OS was 2.641 [p < 0.001; 95% confidence interval (CI) 1.887-3.697; Table 3].
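To give a feel for what an AUC at a fixed 5-year horizon measures, the Python sketch below uses a deliberately naive approximation: patients who died by the horizon are cases, patients followed beyond it are controls, and patients censored before the horizon are dropped. Time-dependent ROC estimators handle censoring more carefully, so this is not the method used in the study; the function name and toy data are hypothetical.

```python
def auc_at_horizon(scores, times, events, horizon):
    """Probability that a case has a higher risk score than a control."""
    cases = [s for s, t, e in zip(scores, times, events) if e and t <= horizon]
    controls = [s for s, t in zip(scores, times) if t > horizon]
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(controls))

# Toy data (hypothetical): follow-up in years, 1 = death observed
scores = [0.9, 0.8, 0.2, 0.1, 0.5]
times = [1, 2, 6, 7, 3]
events = [1, 1, 0, 1, 0]

auc_5y = auc_at_horizon(scores, times, events, horizon=5)
```

In this toy cohort the fifth patient is censored at 3 years and therefore contributes to neither group.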
The distribution of risk scores, survival status and prognostic lncRNA expression in the 196 patients of the training dataset is shown in Figure 1c. Of these eight prognostic lncRNAs, high expression levels of RP11-21L23.2, GPR158-AS1, RP11-701P16.5 and RP11-379F4.4 were associated with high risk, while the remaining four lncRNAs (CTD-2358C21.4, RP11-94L15.2, KCNK15-AS1 and AC104134.2) were shown to be protective. NSCLC patients with high prognostic scores tended to express high-risk lncRNAs, whereas those with low prognostic scores tended to express protective lncRNAs. Moreover, more deaths were noted among NSCLC patients with high-risk scores than among those with low-risk scores.

Figure 1
The eight-lncRNA signature-focused risk score in prognosis of overall survival in the GSE37745 patient set. a Receiver operating characteristic (ROC) analysis of the risk scores for overall survival prediction in the training dataset. The area under the curve (AUC) was calculated for ROC curves, and sensitivity and specificity were calculated to assess score performance. b The Kaplan-Meier curve for overall survival of two patient groups with high- and low-risk scores in the GSE37745 training set (n = 196). The differences between the two curves were evaluated by the two-sided log-rank test. c The eight lncRNA-based risk score distribution, patients' survival status and heatmap of the eight lncRNA expression profiles. The black dotted line represents the cutoff value of the risk score derived from the training set which separated patients into high- and low-risk groups.

Validation of the eight-lncRNA signature for survival prediction in the testing GSE31210 dataset
To validate the prognostic power of the eight-lncRNA signature for survival prediction, the constructed expression-defined lncRNA prognostic model was also evaluated in the testing GSE31210 dataset. The same prognostic risk score model obtained from the training dataset was used to calculate the eight-lncRNA signature-based risk scores for 226 patients in the entire GSE31210 dataset. The cutoff value of the risk score derived from the training dataset was applied to the testing dataset, without re-estimating parameters, to classify the patients into either a high-risk group (n = 111) or a low-risk group (n = 115) (Table 3). The distribution of patient lncRNA risk scores, survival status and prognostic lncRNA expression in the 226 patients of the GSE31210 dataset is shown in Figure 2b, revealing results similar to those observed in the GSE37745 training dataset.

Further validation of the eight-lncRNA signature in another independent dataset
To investigate the reproducibility of the eight-lncRNA signature in predicting OS of NSCLC patients, the prognostic power of the eight-lncRNA signature for prediction of survival was further validated in another independent NSCLC cohort of 181 patients whose expression and survival data were obtained from GEO GSE50081. The clinical features of this independent NSCLC cohort are shown in Table 1. Patients in this independent NSCLC cohort were classified into either a high-risk group (n = 90) or a low-risk group (n = 91) according to the cutoff value of risk scores obtained from the training dataset. The median OS of the high-risk group in the GSE50081 dataset was 4.29 years, whereas that of the low-risk group was 4.99 years (log-rank test p = 1.26E−02). Kaplan-Meier curves for the high- and low-risk groups within the independent GSE50081 cohort are shown in Figure 2c. Further univariable Cox regression analysis revealed that a high eight-lncRNA signature risk score was significantly associated with lower OS of patients in the GSE50081 dataset (p = 1.40E−02; HR = 1.795, 95% CI 1.127-2.859; Table 3). Figure 2d shows the distribution of patient risk scores, survival status and prognostic lncRNA expression in the independent GSE50081 NSCLC cohort, ranked according to the prognostic risk score values for the eight-lncRNA signature; these were similar to those observed in the training and GSE31210 datasets.

Survival prediction by the eight-lncRNA signature is independent of clinical features
To assess whether the prognostic power of the eight-lncRNA signature for prediction of survival was independent of other clinical features, multivariable Cox regression analysis was performed with the lncRNA signature-based risk score and other clinical features (age, gender, smoking status, tumor stage and subtype) as covariates. The results of the multivariable Cox regression analysis in the three NSCLC patient datasets showed that the prognostic power of the eight-lncRNA signature risk score (high-risk group vs. low-risk group, HR = 2.761, 95% CI 1.934-3.942, p < 0.001 for GSE37745; HR = 2.643, 95% CI 1.263-5.528, p = 0.01 for GSE31210; HR = 1.752, 95% CI 1.014-3.026, p = 0.044 for GSE50081) for prediction of survival was indeed independent of these clinical features (Table 3). Two clinical factors, age and stage, also affected the overall survival of patients, so a data stratification analysis was performed according to age and stage. The three GEO datasets (GSE37745, GSE31210 and GSE50081), which included a total of 603 patients, were stratified by age into either a younger stratum (age ≤65) or an elder stratum (age >65). The stratified analysis showed effective prognostic power in both the younger and elder patient groups. The eight-lncRNA signature could classify patients within each age stratum into either high- or low-risk groups with significantly different OS (log-rank test p = 4.46E−05 for the younger patient group and p = 6.61E−06 for the elder patient group) (Figure 3a, b), which suggested that the prognostic power of the eight-lncRNA signature was also age-independent. The patients of the GSE37745 dataset were then grouped into early stage (stage I and II) and late stage (stage III and IV) patients. The stratified analysis was further performed in early and late stage patients to evaluate whether the eight-lncRNA signature could predict survival of patients at different clinical stages.
The log-rank test of early stage patients showed that, within stage I and II, the eight-lncRNA signature could further subdivide patients into either a high-risk group with shorter survival or a low-risk group with longer survival (median OS 2.03 vs. 8.05 years, log-rank test p = 7.81E−09) (Figure 3c). A difference in OS between the high-risk group (n = 18) and the low-risk group (n = 13) was also observed for late stage patients (median OS 0.975 vs. 3.367 years) (Figure 3d), although the log-rank p value of 0.253 was above the 0.05 significance level.

Functional characterization of the eight prognostic lncRNAs
To further investigate the potential biological roles of the eight prognostic lncRNAs, the co-expression relationships between the eight lncRNAs and the protein-coding genes were computed using Pearson correlation coefficients in the GSE37745 dataset of 196 patients. The expression of 679 protein-coding genes was highly correlated with that of at least one of the eight signature lncRNAs (Pearson correlation coefficient >0.40). GO and KEGG pathway function enrichment analysis for these co-expressed protein-coding genes was then performed, using the whole human genome as the background. The results showed that four genes (GATA6, CRISPLD2, CFTR2 and CLPTM1L) have previously been shown to be involved in lung cancer. GO functional annotation suggested that the 679 protein-coding genes were significantly enriched in 28 GO terms (Figure 4a). These significant GO terms were organized into an interaction network of terms with similar functions using the Enrichment Map [41] plugin in Cytoscape [42]. Several clusters of functionally related GO terms were observed, including organ development, cell proliferation, immune response, response to stimulus, and catabolic and metabolic processes (Figure 4b). Taken together, these results implied that the eight lncRNAs might be involved in tumorigenesis through interacting with protein-coding genes that affect tissue/organ development and other important biological processes.
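The co-expression filter described above can be sketched in Python as follows. The 0.40 threshold is taken from the text; the function names, gene labels and expression values are hypothetical and serve only to show the mechanics.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def coexpressed_genes(lncrna_expr, gene_expr, threshold=0.40):
    """Names of genes whose correlation with the lncRNA exceeds the threshold."""
    return [gene for gene, values in gene_expr.items()
            if pearson(lncrna_expr, values) > threshold]

# Hypothetical expression values across four samples
lncrna_expr = [1.0, 2.0, 3.0, 4.0]
gene_expr = {
    "gene_up": [2.0, 4.0, 6.0, 8.0],    # rises with the lncRNA
    "gene_down": [4.0, 3.0, 2.0, 1.0],  # falls as the lncRNA rises
}
partners = coexpressed_genes(lncrna_expr, gene_expr)
```

Repeating the filter for each of the eight lncRNAs and taking the union of the results would give a gene list of the kind passed on to the enrichment analysis.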

Discussion
During the past few decades, considerable efforts have been made toward the development of gene-expression-based diagnostic and prognostic biomarkers for lung cancer at the level of protein-coding genes and miRNAs [43,44]. However, accumulating evidence that lncRNAs are involved in oncogenic and tumor suppressive pathways has opened the door for this new class of biomarkers. Transcriptional profiling analyses have discovered a number of tissue-specific lncRNAs in normal tissues and dysregulated lncRNAs in a variety of human cancers [11,45], and highly aberrant expression of dysregulated lncRNAs is associated with tumorigenesis [17]. Furthermore, these dysregulated lncRNAs have already shown great potential as novel molecular biomarkers for diagnosis, prognosis and treatment of cancer. More recently, several studies conducted array-based transcriptional analyses of lncRNAs and functionally characterized cancer subtype-associated lncRNAs in breast cancer and lung cancer, proposing a novel clinical implication for lncRNAs as valuable biomarkers for prediction of response to treatment as well as patient outcome [27,46]. Compared to protein-coding genes, the advantage of lncRNAs as molecular biomarkers is that lncRNA expression is more closely associated with biological function and tumor status [16,47]. However, to date, an expression profile-based prognostic lncRNA signature for prediction of survival of NSCLC patients has not been investigated.
Recently, several studies have reported that lncRNA expression profiles can be obtained from publicly available, custom-designed DNA microarrays by re-annotating the array probes [19,25,26,35,47]. In this study, microarray probe re-annotation was used to repurpose the publicly available human Affymetrix microarray data (HG-U133 Plus 2.0) and subsequently obtain lncRNA expression profiles of 603 NSCLC patients from GEO. To identify lncRNAs with prognostic value in NSCLC, survival analysis was performed by integrating lncRNA expression profiles and clinical information in a large cohort of NSCLC patients. An expression pattern of eight lncRNAs was found to be significantly associated with OS of NSCLC patients in the GSE37745 training dataset. Further ROC analysis demonstrated good performance for predicting 5-year OS. A prognostic risk score model was developed based on the expression data of these eight lncRNAs, weighted by the estimated regression coefficients from multivariable Cox regression analysis. With this eight-lncRNA signature, patients in the training dataset with high-risk scores tended to have lower OS than those with low-risk scores, and a separation between the survival curves of high- and low-risk patients was observed in the training dataset used for model development. However, a previous simulation study revealed that a prognostic model significantly associated with survival time in the training dataset can be developed even from completely random gene expression profiles [48]. To evaluate the robustness and reproducibility of the prognostic power of the eight-lncRNA signature, it was therefore also tested in two other non-overlapping independent NSCLC patient cohorts (GSE31210 and GSE50081) using the same model and criteria as those from the training dataset. In these tests, the prognostic power was also strong, indicating that the eight-lncRNA signature demonstrated good reproducibility and robustness for NSCLC patients.
Several studies have observed different clinical characteristics and survival times among different age groups of NSCLC patients [49][50][51]. Multivariable Cox regression analysis was thus used to assess the independence of the eight-lncRNA signature in predicting OS. With age, gender, smoking status, stage and subtype as covariables in the regression analysis, the risk score of the eight-lncRNA signature maintained an independent correlation with OS. In the stratified analysis, the eight-lncRNA signature showed prognostic power in different age groups: patients belonging to the same age group could be classified into high- and low-risk groups with significantly different survival prospects, indicating that the prognostic value of the eight-lncRNA signature was independent of the age of the NSCLC patients. In lung cancer, clinicopathological parameters such as tumor histology, staging and localization of metastases determine patients' outcome [52]. Since tumor stage and subtype data were only available for the GSE37745 patient dataset, multivariate Cox regression analysis and stratified analysis were performed in this dataset to assess the stage- and subtype-independence of the prognostic power of the eight-lncRNA signature. The eight-lncRNA signature was indeed found to be stage-dependent in NSCLC patients: its prognostic power was significant in stage I and II patients, who could be separated into high- and low-risk groups with significantly different survival. However, the eight-lncRNA signature achieved a p value of 0.253 for OS prediction of late stage patients, which was above the 0.05 significance level, suggesting that patients with early stage cancer may benefit most from this eight-lncRNA prognostic signature. Further multivariate Cox regression analysis testing tumor subtype-independence suggested that the prognostic power of the eight-lncRNA signature is independent of tumor subtype.
Taken together, these results suggest that the prognostic power of the eight-lncRNA signature for predicting OS of NSCLC patients is independent of other clinical features except for stage.
Tens of thousands of lncRNAs have been identified and predicted by large-scale transcriptome analysis in humans [53]. However, the functions of only a few lncRNAs have been well characterized, so no thorough functional annotation data are available for the eight prognostic lncRNAs in the current literature. Recent bioinformatics studies have suggested that the function of lncRNAs could effectively be predicted with the inclusion of different kinds of biological data. To increase our understanding of the biological roles of the eight prognostic lncRNAs in NSCLC, functional enrichment analysis was performed at the GO and KEGG pathway level for the 679 protein-coding genes co-expressed with the eight prognostic lncRNAs. The biological processes most highly associated with these genes were organ development, cell proliferation, immune response, response to stimulus, and catabolic and metabolic processes. In particular, several protein-coding genes co-expressed with the eight prognostic lncRNAs have been shown to participate in the NSCLC pathway. These results implied important functional roles of the eight prognostic lncRNAs in tumorigenesis.

Figure 3
Survival analyses of all patients with available age or tumor stage information using the eight-lncRNA signature. a Kaplan-Meier survival curves for younger patients with NSCLC (age ≤65, n = 337). b Kaplan-Meier survival curves for elder patients with NSCLC (age >65, n = 226). c Survival prediction in early stage patients: Kaplan-Meier survival curves for all patients with stage I and II (n = 165). d Survival prediction in late stage patients: Kaplan-Meier survival curves for all patients with stage III and IV (n = 31).
Due to the restriction of available data, gene expression profiles of only 2,313 of the tens of thousands of known and predicted lncRNAs were obtained. However, the prognostic power of the eight-lncRNA signature uncovered in this study for predicting OS was consistently observed in multiple independent datasets. Moreover, the incompleteness and low coverage of available lncRNA-related datasets are common when studying lncRNA-disease associations. Although the functions of these eight lncRNAs have been inferred by bioinformatics analysis, their biological roles in tumorigenesis are still not clear and should be investigated in further experimental studies. With the rapid increase of lncRNA-related studies, more comprehensive lncRNA data will become available, and lncRNA biomarker development for clinical prognostic evaluation of NSCLC should increase.

Conclusions
In summary, by examining lncRNA expression profiles of patients with NSCLC, our study identified eight lncRNAs associated with overall survival of NSCLC patients. A prognostic lncRNA signature was developed based on the expression patterns of these eight lncRNAs in the training dataset to predict overall survival, and was subsequently validated in two other independent datasets. Further analysis demonstrated that the prognostic power of the eight-lncRNA signature for prediction of survival was independent of other clinical features. Our results suggest that the eight-lncRNA signature may be an effective independent prognostic molecular biomarker in the prediction of NSCLC patient survival.

Figure 4
Functional enrichment results of the co-expressed protein-coding genes with prognostic lncRNAs. a Significantly enriched GO terms of the co-expressed protein-coding genes with prognostic lncRNAs. b The functional enrichment map of GO terms. Each node represents a GO term; nodes are grouped and annotated by GO similarity. A link represents the overlap of shared genes between connected GO terms. Node size represents the number of genes in the GO term. Color intensity is proportional to enrichment significance.