Systematic analysis of lncRNA–miRNA–mRNA competing endogenous RNA network identifies four-lncRNA signature as a prognostic biomarker for breast cancer

Background Increasing evidence has underscored the role of long non-coding RNAs (lncRNAs) acting as competing endogenous RNAs (ceRNAs) in the development and progression of tumors. Nevertheless, lncRNA biomarkers in lncRNA-related ceRNA network that can predict the prognosis of breast cancer (BC) are still lacking. The aim of our study was to identify potential lncRNA signatures capable of predicting overall survival (OS) of BC patients. Methods The RNA sequencing data and clinical characteristics of BC patients were obtained from the Cancer Genome Atlas database, and differentially expressed lncRNA (DElncRNAs), DEmRNAs, and DEmiRNAs were then identified between BC and normal breast tissue samples. Subsequently, the lncRNA–miRNA–mRNA ceRNA network of BC was established, and the gene oncology enrichment analyses for the DEmRNAs interacting with lncRNAs in the ceRNA network was implemented. Using univariate and multivariate Cox regression analyses, a four-lncRNA signature was developed and used for predicting the survival in BC patients. We applied receiver operating characteristic analysis to assess the performance of our model. Results A total of 1061 DElncRNAs, 2150 DEmRNAs, and 82 DEmiRNAs were identified between BC and normal breast tissue samples. A lncRNA–miRNA–mRNA ceRNA network of BC was established, which comprised of 8 DEmiRNAs, 48 DElncRNAs, and 10 DEmRNAs. Further gene oncology enrichment analyses revealed that the DEmRNAs interacting with lncRNAs in the ceRNA network participated in cell leading edge, protease binding, alpha-catenin binding, gamma-catenin binding, and adenylate cyclase binding. A univariate regression analysis of the DElncRNAs revealed 7 lncRNAs (ADAMTS9-AS1, AC061992.1, LINC00536, HOTAIR, AL391421.1, TLR8-AS1 and LINC00491) that were associated with OS of BC patients. A multivariate Cox regression analysis demonstrated that 4 of those lncRNAs (ADAMTS9-AS1, LINC00536, AL391421.1 and LINC00491) had significant prognostic value, and their cumulative risk score indicated that this 4-lncRNA signature independently predicted OS in BC patients. Furthermore, the area under the curve of the 4-lncRNA signature associated with 3-year survival was 0.696. Conclusions The current study provides novel insights into the lncRNA-related ceRNA network in BC and the 4 lncRNA biomarkers may be independent prognostic signatures in predicting the survival of BC patients. Electronic supplementary material The online version of this article (10.1186/s12967-018-1640-2) contains supplementary material, which is available to authorized users.


Background
Breast cancer (BC) is a heterogeneous and malignant neoplasm derived from breast tissue, and accounts for about 16% of all cancers and 22.9% of invasive cancers in women [1]. The most common cause of the BC-related mortality is metastasis [2,3]. Currently, BC diagnosis and prognosis is evaluated on the basis of disease stage, histological grade, and the expression level of hormone receptors [4]. However, clinical and pathological symptoms have limited predictive value in detecting early BC, and the clinical outcomes are highly variable on account of its heterogeneity. In addition, the underlying molecular mechanisms of BC still remain unclear. Therefore, it is vital to identify potential molecular diagnostic markers and/or therapeutic targets to combat BC, especially the invasive form.
Long non-coding RNAs (lncRNAs) is a class of ncRNA over 200 nucleotides long [5], and are reportedly involved in a number of cellular processes, for example, transcriptional and post-transcriptional regulation [5,6]. Due to their strong tissue specificity, lncRNAs are potentially effective early diagnostic biomarkers of various cancers [7]. Identification of a BC specific lncRNA biomarker may therefore be of clinical significance for the diagnosis and prognosis of BC. Several lncRNAs have been reported to be associated with BC initiation and progression [8,9], and although some have been found to predict clinical outcomes for BC, the results are inconsistent due to limited tissue samples. Furthermore, studies without large sample size are also not able to determine with statistical power whether these lncRNAs are associated with survival or other clinical factors. The Cancer Genome Atlas (TCGA) is an open-access and large-scale database which can provide multidimensional molecular profiles for a large number tumor samples. To increase the statistical reliability of our studies, we identified BC specific lncRNAs using data obtained from TCGA database.
The competing endogenous RNA (ceRNA) hypothesis presented by Salmena et al. [10] was proposed as a novel regulatory mechanism between ncRNA and coding messenger RNA. LncRNAs contain miRNA-response elements (MREs) which function as ceRNAs, and play a key role in various pathological processes like tumorigenesis [11]. Zhang et al. [12] have proved the biological role of lncRNA related-ceRNAs in glioblastomas. A recent study has demonstrated that lncRNA NUTF2P3-001 acts as a ceRNA to communicate with KRAS by competitively binding to hsa-mir-3923, and the up-regulation of NUTF2P3-001 reverses the suppressive effect of hsa-mir-3923 on KRAS, leading to the proliferation and invasion of pancreatic cancer [13]. In addition, the aberrant expression of 7-lncRNA signature (called LncRisk-7) led to differential gene expression via a dysregulated lncRNA-associated ceRNA network, contributing to pancreatic ductal adenocarcinoma progression [14]. Collectively, these findings show that dysregulation of important lncRNAs in the ceRNA network also disrupt the miRNA-mediated lncRNA/mRNA ceRNA interactions and therefore contribute to cancer initiation and progression [15,16]. Nevertheless, very little information is available on BC ceRNAs.
In our work, RNA sequencing data of 1109 BC samples and 113 adjacent non-tumor breast tissues samples were retrieved from the TCGA database. To the best of our knowledge, this is the first study to use large scale sequencing database (TCGA) and ceRNA network to identify BC-specific lncRNAs. This new approach of predicting cancer specific lncRNA and ceRNA networks can elucidate the lncRNA-mediated ceRNA regulatory mechanisms in the development and prognosis of BC, and identify novel lncRNAs as potential diagnostic biomarkers or therapeutic targets.

Patients and samples from the TCGA database
RNA sequencing (RNA-Seq) data associated with BC were retrieved from the TCGA database (https ://porta l.gdc.cance r.gov/, version 10.1, release time: February 15, 2018). A total of 1222 individuals with BC were included in the current study. The exclusion criteria were (1) histological diagnosis negating BC, (2) presence of a malignancy other than BC, and (3) lack of complete clinical data. The gene expression profiles of 1109 BC and 113 adjacent normal breast tissues, and miRNA data of 1103 BC and 104 adjacent normal breast tissues were downloaded. In addition to the RNA expression data, clinical information of BC patients were also downloaded from the TCGA database. No approval from the ethics committee was needed because all the information was required from the TCGA database. The clinical characteristics for BC patients are listed in Table 1.

RNA sequence data processing and differential expression analysis
The raw RNA sequencing (lncRNA, miRNA, and mRNA) reads were post-processed and normalized using the trimmed mean of M-values (TMM) method.
EdgeR package in R (version 3.4.1) was used to identify the differentially expressed mRNAs (DEmRNAs), lncR-NAs (DElncRNAs) and miRNAs (DEmiRNAs) between the BC and adjacent-normal breast tissues [17], and the cut-off criteria were set as P < 0.01 and |logFC| > 2. Volcano plots were visualized using the ggplot2packages in R [18]. The heat map was plotted using the pheatmap function of pheatmap package version 1.0.8 [19].

Functional enrichment analysis
Gene oncology (GO) is widely used as functional enrichment analysis for a large number of genes [21]. The putative biological roles of DElncRNAs corresponds to that of their associated mRNAs. GO function analyses were therefore conducted for the DEmRNAs in the ceRNA network using R clusterProfiler package [22]. Fisher's test was used to identify the significant GO terms, and GO categories with P < 0.05 were considered statistically significant.

Construction of the BC-specific prognostic signatures
Kaplan-Meier and log-rank test was used to determine the association between the DEmRNAs, DElncRNAs and DEmiRNAs in the ceRNA network and the overall survival (OS) of BC patients, and statistical significance was set at P < 0.05. Univariate Cox proportional hazards regression method was implemented to analyze the relationship between the DElncRNAs and OS when a significant level was set at 0.05, in order to determine those with a prognostic value in BC. This was followed by multivariate Cox hazards regression model to determine the independent prognostic factors for BC, and the prognostic risk score for predicting OS was as follows: (where "exp" denotes the expression level of DElncR-NAs, and "β" is the regression coefficient obtained from the multivariate Cox regression model) [23]. Using the median risk score as the threshold, the BC patients Risk score = exp lncRNA1 * β lncRNA1 + exp lncRNA2 * β lncRNA2 + · · · exp lncRNAn * β lncRNAn were stratified into the high-and low-risk groups. The "survival ROC" package in R was used to construct the time-dependent receiver operating characteristic (ROC) curves within 3 years as the defining point, and to measure the risk prediction rate of specific lncRNAs between the two groups. In addition, the univariate and multivariate analyses were used to evaluate the effects of other clinical variables of BC patients on OS risk scores. The R software (version 3.4.1) was used for all statistical analyses.

Identification of DEmRNAs, DElncRNAs, and DEmiRNAs
We identified the DEmRNAs, DElncRNAs, and DEmiR-NAs in BC and adjacent-normal breast tissues using the TCGA database, with P < 0.01 and |logFC| > 2 as the thresholds. A total of 2150 DEmRNAs (1368 up-and 782 down-regulated), 1061 DElncRNAs (839 up-, and 222 down-regulated), and 82 DEmiRNAs (62 up-and 20 down-regulated) were identified between BC and normal samples. Volcano plots displaying the distribution of the DElncRNAs, DEmiRNAs, and DEmRNAs were generated, as shown in Fig. 1a. The heat map showed clear separation and consistency in the expression profiles of the BC and normal samples (Fig. 1b).

MiRNA predicted target analysis and ceRNA network establishment
The differentially expressed RNAs identified above were selected, and the lncRNAs and mRNAs targeted by miR-NAs were extracted to establish the lncRNA-miRNA-mRNA ceRNA network. The relationships among 1061 DElncRNAs and 82 DEmiRNAs were first evaluated.
Since lncRNAs might interact with the miRNAs through MREs, the miRcode tool was then used to detect the potential MREs; 18 BC-specific miRNAs that putatively target 70 BC-specific lncRNAs were then identified (Additional file 1: Table S1). The MiRDB, miRTarBase and Targetscan programs were then used to determine the relationship between the 82 DEmiRNAs and 2150 DEmRNAs, and predict the mRNA targets of miRNAs.

Delineation of GO analysis
In order to better understand the role of the DElncRNAs in BC, we analyzed the mRNAs of the ceRNA network and identified the lncRNA regulated GO terms. The results of GO analysis are shown in Additional file 2: Figure S1. Our data showed that the mRNAs associated to cellular component (CC), was cell leading edge. Ge et al. [24] have demonstrated that trypsin secreted from MDA MB-231 BC cells activates the protease-activated receptor-2 and the activated protease-activated receptor-2 can promote cell migration based on ERK1/2-dependent pathway, involving the formation of a scaffolding complex at the cell leading edge. Meanwhile, the mRNAs related to molecular function (MF) were most relevant to protease binding, alpha-catenin binding, gamma-catenin binding, and adenylate cyclase binding. Proteases provide the cancer a characteristic of being able to invade into other tissues, and protease-activated receptor-1 is involved in the migration and invasion of breast cancer cells [25]. The loss of alpha-catenin has been implicated to be related to the metastasis and poor survival in BC [26]. The expression of gamma-catenin was also reported to be associated with the metastasis in human BC [27]. The activity of the adenylate cyclase is positively linked to the inhibition of cell proliferation, as well as induction of apoptosis in human BC MCF-7 cells [28]. Demonstrated herein, these GO terms are associated with BC pathogenesis and prognosis.

Establishment of the 4-lncRNAs prognostic model
Univariate regression analysis was used to identify the lncRNAs associated with the OS of BC patients. With the significance level cutoff threshold set at P < 0.05, a group of lncRNA signatures including ADAMTS9-AS1, AC061992.1, LINC00536, HOTAIR, AL391421.1, TLR8-AS1, and LINC00491 lncRNAs was detected to have significant prognostic value (Additional file 1: Table S3). Significantly, we found that ADAMTS9-AS1 was simultaneously identified to be connected with OS in Kaplan-Meier (log-rank test) and univariate Cox regression analysis. Fig. 1 a Volcano plots showing the differential expression of RNAs (lncRNAs, miRNAs, and mRNAs) in breast cancer (BC), which were drawn using the R packages ggplot2; X axis indicates the mean expression differences of lncRNAs, miRNAs, and mRNAs between BC and normal samples, and Y axis represents log transformed false discovery rate (FDR) values. b Heatmaps demonstrate differential expression of lncRNAs, miRNAs, and mRNAs between BC and adjacent normal samples, which were plotted using the pheatmap package; X axis denotes differentially expressed lncRNAs (DElncRNAs), DEmiRNAs, and DEmRNAs and Y axis represents the samples. Blue represents normal samples, while red stands for BC samples. All the above lncRNAs were then fitted into the multivariate Cox regression model, which indicated that only four lncRNAs-ADAMTS9-AS1, LINC00536, AL391421.1 and LINC00491-had a significant prognostic value in BC (Additional file 3: Figure S2C), and these four lncRNAs were used to develop an lncRNA prognostic model. A risk score analysis of the four lncRNAs was performed for each patient, and based on the risk scores, the patients were divided into the "low risk" and "high risk" groups (Additional file 3: Figure S2A). The mortality rate of the high risk patients was significantly higher compared to the low risk patients (12.99% vs 5.94%; P < 0.05; Additional file 3: Figure S2B).
In addition, the high risk group was correlated with worse prognosis compared to the low risk group (Fig. 4a). The 3 year survival correlation of the 4-lncRNA signature was analyzed by ROC and AUC was computed to assess the discriminatory capacity of lncRNA signature (Fig. 4b). The AUC of the 4-lncRNA signature was 0.696 indicating its utility as a prognostic model for predicting the survival status of BC.
The 4-lncRNA expression was then analyzed in the tumor and normal tissues, and in the high-and lowrisk patient groups (Fig. 5). ADAMTS9-AS1, and AL391421.1 were expressed at high levels in patients with low-risk scores, whereas LINC00536 and LINC00491 were up-regulated in the high-risk patients. Furthermore, LINC00491, AL391421.1, and LINC00536 were expressed at high levels, and ADAMTS9-AS1 was expressed at low levels in the BC patients.

Prognostic value of the four-lncRNA signature in BC
Univariate and multivariate regression models were used to assess the prognostic power of the 4-lncRNA signature. Univariate analysis indicated that age, pathological stage, N stage, M stage, ER, PR, Her2, and risk scores were significantly correlated with OS of BRCA patients (P < 0.05). Similarly, Kaplan-Meier analysis demonstrated   Fig. 6. Multivariate analysis indicated that only age, pathological stage, Her2, and risk scores were independent prognostic factors of OS (P= 0.003, 0.024, 0.034, and 0.017 respectively; Table 1).

Discussion
BC is a common malignant gynecological cancer, and is one of the main causes for the cancer-related deaths in women [29]. The lack of specific diagnostic and prognostic biomarkers may contribute to the current low survival rate among BC patients. To improve clinical outcomes therefore, it is essential to explore the exact regulatory mechanisms of BC initiation and progression, and to identify the potential BC-related prognostic signatures that predict those outcomes. Growing experimental evidence indicates that lncRNAs play important roles in many biological processes, and ceRNA activity is closely related to the development of cancers [30,31]. In recent years, some studies have investigated the ceRNAs in BC. For instance, Chen et al. [32] analyzed the BC ceRNA network on the basis of common miR-NAs as well as co-expression, but did not consider miRNA expression. Another study also established a BC specific ceRNA network to investigate its underlying molecular mechanisms based on the PCC of miRNA-mRNA pairs [33]. However, both studies focused on the roles of mRNAs rather than that of lncRNAs in the BC ceRNA networks. In 2018, Zhou et al. [34] constructed four BC-related ceRNA networks by combining the miRNA targets and the expression data of lncRNAs, miRNAs and mRNA, but they did not take into account the relationship between survival and lncRNAs, nor construct the prognostic signature. In our study, in addition to constructing the ceRNA networks by combining lncRNA, miRNA, and mRNA expression data, we also investigated the association of lncRNA and OS in BC patients. Furthermore, based on the theory of ceRNA network, we established the 4-lncRNAs   Log-rank method was used to assess the survival differences between the two groups. Horizontal axis is OS time (years) and vertical axis stands for survival function Fig. 4 The 4-lncRNA signature of BC (ADAMTS9-AS1, AL391421.1, LINC00491, and LINC00536) for the outcome. a The survival differences between the high-risk and low-risk groups were determined by the log-rank test (P = 2.18E−04). b ROC curves demonstrated that the area under receiver operating characteristic (AUC) of 4-lncRNA model was 0.696, which exhibited the risk score OS. Multivariate analysis showed significant prognostic value of 4 of those lncRNAs (ADAMTS9-AS1, LINC00536, AL391421.1 and LINC00491) in the OS of BC patients. A cumulative risk score of the 4 lncRNAs was calculated, which indicated that this 4-lncRNA signature independently predicted OS in BC patients. To the best of our knowledge, this is the first report integrating a ceRNA network with TCGA data to build an lncRNA-related risk score, and evaluate the OS of BC patients. Our study will help improve the understanding of lncRNA-mediated ceRNA regulatory mechanisms in BC and identify novel lncRNAs as therapeutic targets.
In the current study, among this 4-lncRNA signature, ADAMTS9-AS1 was demonstrated to play important roles in the progression and prognosis of cancer. ADAMTS9-AS1 is an antisense lncRNA, and growing evidence has implicated that a large amount of antisense lncRNAs play crucial roles in the cancer [35,36]. Li et al. [37] reported that ADAMTS9-AS1 could predict the survival status of patients with esophageal squamous cell carcinoma. Another study has also indicated a prognostic role of ADAMTS9-AS1 in patients with colon adenocarcinoma [38]. In addition, ADAMTS9-AS1 has been demonstrated to be a risk lncRNA in ovarian cancer, which is involved in the progression of ovarian cancer [39]. In our study, we noticed that ADAMTS9-AS1 with lowexpression could compete with up-regulated miRNAs (hsa-mir-182, and hsa-mir-21), to regulate the expression of the target genes such as CHL1, SPRY2, and TCEAL7 involved in the ceRNA network. Previous studies have shown high expression of hsa-mir-182 in MCF-7 breast cancer cells [40,41]. In addition, the high-expression of hsa-mir-21 was reported to be correlated to the metastasis and poor prognosis of BC patients [42]. The remaining three lncRNAs of the ceRNA network (LINC00536/ AL391421.1/LINC00491) were up-regulated and competed with the decreased hsa-mir-204 and hsa-mir-144 levels. Down-regulation of has-mir-204 has been suggested to enhance cell proliferation and invasion in gastric cancer [43], and low-expression of has-mir-204 is related to the poor prognosis of acute myeloid leukemia patients [44]. In addition, decreased expression of hasmir-144 is strongly correlated with the progression of  16:264 colorectal cancer [45]. No study so far has reported any association of LINC00536, AL391421.1 or LINC00491 with cancer. This is the first study to show aberrant expression of ADAMTS9-AS1, LINC00536, AL391421.1 and LINC00491 in BC, and indicates a potential prognostic role of this 4-lncRNA signature in BC. In addition, the bioinformatics based investigation of lncRNAs will be helpful in future experimental studies.
Although the findings of our study have important clinical implications, the limitations must also be noted. First, a longer follow-up duration is required to verify our results, and second, the findings based on the TCGA database will need to be verified using other experimental methods. In addition, the biological roles of ADAMTS9-AS1, LINC00536, AL391421.1, and LINC00491 in BC also need to be further investigated.