Genome-wide profiling reveals alternative polyadenylation of mRNA in human non-small cell lung cancer

Background Lung cancer is the second most common cancer with an extremely poor overall survival rate. Post-transcriptional regulation of gene expression play many important roles in human cancer, and one of the potential mechanisms underlying this is alternative mRNA maturation at its 3′ untranslated regions (3′-UTRs). Methods Cancer tissues and paired adjacent normal lung tissues from 26 patients diagnosed with non-small cell lung cancer (NSCLC) were analyzed by in vitro transcription-sequencing alternative polyadenylation sites (IVT-SAPAS). 41,773,101 reads in average were obtained from each paired sample. A potential regulation of Cleavage Stimulation Factor Subunit 2 (CSTF2) on 3′UTR length of genes was tested in H460 cells. Results 1439 (10.26%) genes showed up-regulated expression and 1364 (9.72%) genes showed down-regulated expression in lung cancer tissue versus normal lung tissue, and shorten 3′UTR in cancer tissue was detected in cancer tissues collected from 96.2% (25/26) patients, indicating lung cancer tend to have shortened 3′UTRs of these identified genes. KEGG analysis showed 1855 genes with shorten 3′UTR were enriched in mTOR signaling, ubiquitin mediated proteolysis and RNA degradation. Knocking down CSTF2 expression in H460 cells results in 3′UTR elongation of genes that was identified to be with shortened length in cancer tissues. Conclusion Alternative polyadenylation (APA) site-switching of 3′UTRs is prevalent in NSCLC, and CSTF2 may serve as an oncogene regulates the 3′UTR length of cancer related genes in NSCLC. Electronic supplementary material The online version of this article (10.1186/s12967-019-1986-0) contains supplementary material, which is available to authorized users.


Introduction
Lung cancer is the second most common cancer and the leading cause of cancer-related death for both man and woman worldwide, with an extremely poor overall survival rate, 5-year survival rate is only approximate 18% [1]. Although, lung cancer can be divided into many subtypes, historically lung cancer can be classified into two distinct types according to clinical tumor behaviors, small cell lung carcinoma (SCLC) and non-small cell lung carcinoma (NSCLC) [2]. NSCLC is the most common type of lung cancer, accounting for 80-85% of all lung cancer cases, and, as a major subtype, about 45% to 50% of NSCLC is diagnosed as lung adenocarcinoma, 30-35% is squamous cell carcinomas [3,4].
Cancer is a disease with multiple genetic alternations. For example, proto-oncogenes are often activated in cancer cells and contribute to the tumorigenesis. The activation of oncogene can be caused by a mutation that leads to constitutive activation of the encoded proteins or by a mutation that induce gene expression, the latter occur through mechanisms like amplification of the gene copy number or chromosomal translocations resulting in replacement with a constitutive active promoter [5]. However, studies have revealed that, in addition to these genetic alterations, post-transcriptional regulation of gene expression also play many important roles in cancer development and progression, and one of the potential mechanisms underlying this is alternative mRNA maturation at its 3′ untranslated regions (3′-UTRs) which leads to distinct mRNA isoforms from the same gene [6,7].
The architecture of pre-mRNA consists of coding regions and non-coding regions located separately at the 5′ and 3′ end of mRNA. The 3′ ends of almost all eukaryotic mRNAs are formed in a two-step process, an endonucleolytic cleavage followed by polyadenylation (the addition of a poly-adenosine or poly(A) tail) [8]. The formation of 3′-UTRs through polyadenylation process of sequential addition of a poly(A) tail on 3′ end of precursor mRNA is the last key step in mRNA maturation process [9].
3′-UTR of mRNA often contains many binding sites for regulatory RNA-binding proteins and microRNAs, and thus the length of 3′-UTR is important for regulation of mRNA stability, localization and protein translation efficiency [10]. Alternative polyadenylation (APA), a phenomenon that RNA molecules with different 3′ ends originate from distinct polyadenylation sites of a single gene, can often results in mRNA isoforms with the same coding sequence but different lengths of 3′-UTRs, which is the last step of the post-transcriptional processing. APA has been discovered to be an emerging mechanism of regulating gene expression [11]. Of interest, cancer cells, or similarly proliferating cells like earlystage embryonic cells, often express substantial amounts of mRNA isoforms with shorter 3′-UTRs [6,12]. Shortening of 3′-UTRs has also been found to correlate with poor prognosis of cancer patients, including lung cancer [13,14]. However, the detailed mechanism of how APA is regulated in normal and in cancer cells remains to be elucidated.
In this study, we used paired clinical tissue specimen to determine the differential expressed genes and the tandem 3′-UTR lengths of these genes in NSCLC. We also examined the potential protein factors that may influence the observed change of the 3′-UTR length. Our results showed that shorten 3′-UTRs is a common feature for the genes that are differentially expressed in cancer cells, and expressions of 3′ end processing protein factors cleavage stimulation factor (CSTF) may play vital role for the shortening of 3′-UTR.

Patient information and tissue specimens
Twenty-six patients diagnosed with NSCLC (Table 1) were enrolled in this study. The protocol for using paired cancerous tissues and adjacent normal lung tissues was approved by Ethics Committee of The First People's Hospital of Hangzhou. Fresh tissues were collected after surgery from each patient. All samples were snap frozen in liquid nitrogen within 30 min after surgical removal and then stored at − 80 °C. H&E staining was performed for validation of pathologic status of each specimen.

Cells culture
Human fetal lung fibroblast cell line HFL1 and lung cancer cell lines was purchased from ATCC (https :// www.atcc.org/) and cultured in DMEM medium (Gibco, Waltham, MA, USA) supplemented with 10% fetal bovine serum (Gibco, Waltham, Massachusetts, USA), 100 U/mL penicillin and 100 μg/mL streptomycin at 37 °C humidified atmosphere with 5% CO 2 . Log-phase growing cells were collected for subsequent gene expression detection and RNA interference experiment.

RNA interference
oligo siRNA (GGT GGA TCG TTC TCT ACG T) designed to target CSTF2 (Cleavage Stimulation Factor Subunit 2) was synthesized by RiboBio Co., Ltd. (Guangzhou, China). CSTF2 oligo siRNA was transfected into H460 cells at a concentration of 50 nM using Lipofectamine RNAiMAX Reagent (Invitrogen, Carlsbad, CA, USA). Cells were collected 48 h later after transfection for subsequent experiments, and quantitative real-time PCR was used to validate the efficiency of the RNA interference. Scramble siRNA Oligo was used as a control.

Total RNA extraction and in vitro transcription-sequencing alternative polyadenylation sites (IVT-SAPAS) analysis
Total RNA were extracted from lung cancerous and paired adjacent normal tissues, or from cultured H460 cells using TRIzol reagent (Life Technologies, USA) according to the manufacturer's protocol. The RNA samples were used to generate libraries for IVT-SAPAS analysis. The IVT-SAPAS was performed with the illumina HiSeq 2500 platform according to the manufacturer's protocol [15]. Briefly, twelve libraries with distinct barcodes were pooled together and were sequenced with Hiseq 2500 with rapid run mode, and 20 dark cycles were taken and then 55 bp were further sequenced.

Analysis of APA sites
The raw reads from IVT-SAPAS analysis were mapped to the human genome (hg19) using Bowtie software [16], and internal priming was filtered by our in-house Perl scripts. Poly(A) sites of each sample were defined by clustering the unique mapped reads, and were then merged together across samples. The merged poly(A) sites were mapped to the 3′UTRs dataset constructed from UCSC known genes for further analysis. The expression levels of poly(A) sites were calculated by the clustered reads and scaled to the sample with the lowest number of raw reads.
To compare the 3′-UTR length across samples, we standardized the length by defining the longest 3′UTR as 1.0 and calculated the weighted mean of 3′-UTR length according to the reads and related position of multiple poly(A) sites for each gene. Genes with alternative poly(A) were identified by a test of linear trend with significant P-values paired to a false discovery rate with cutoff of 1% between the lung cancer tissue and paired para-carcinoma tissue. We denoted the 3′-UTR length of each gene identified in paired non-carcinoma tissue as 1 and that in cancerous tissue as 2, and used Pearson correlation to test the relationship between two variables. A positive R-value indicates that the genes in the cancerous tissue showing longer tandem 3′UTRs versus the paired non-cancerous tissue, and a negative R-value indicates that the genes in the cancerous tissue showing shorter tandem 3′UTRs versus the paired non-cancerous tissue.
Functional annotation of the DEGs and APA switching genes was performed using the DAVID (The Database for Annotation, Visualization and Integrated Discovery) Bioinformatics Resources (http://david .abcc.ncifc rf.gov/), and genes with detected 3′UTRs length change were categorized for their significant enrichment according to Biological Process GO terms and pathways, with normalization of the background model for all transcripts found in both lung cancer and para-carcinoma tissues.
Results of real-time PCR were analyzed using the 2 −ΔCT method to compare the transcriptional levels of CSTF2 gene in lung cancer cells to that in normal control.

Western blotting analysis
Cells were collected and lysed in RIPA buffer containing protease and phosphatase inhibitors. 20 μg of cell lysates were used for western blot analyses using antibodies against CSTF2 and GAPDH (Cell Signaling Technology, USA). Simply, proteins were separated by SDS-PAGE and transferred to the polyvinylidene difluoride membrane. The membrane was then incubated with primary antibody at 4 °C overnight, followed by incubation of secondary antibody for one hour. Finally, the target proteins detected by chemiluminescence kit. The western blot signal was detected by ChemiDoc XRS + (BioRad, USA).

Summary statistics of the IVT-SAPAS data in clinical samples
From a total of 41,773,101 average reads per each paired sample that were obtained from IVT-SAPAS analysis, we found an average read of 17,260,816 that meet the criteria for data analysis after data filtering, genome alignment and orientation filtering (Additional file 1: Table S1). Of them, 5,303,272 reads correspond to APA sites, with near 69.7% reads being mapped to the transcript of the UCSC ends database and Tian's database, and 1.5% and 1.3% of the reads being mapped to the noncoding gene or 1 kb downstream of the UCSC canonical genes defined using the GENCODE gene loci, respectively (Fig. 1a). In addition, there were 5.5%, 15.6% and 1.7% of reads were mapped in the introns, 3′UTRs or coding sequences (CDSs), respectively.

Differentially expressed genes (DEGs) in NSCLC
We conducted a gene expression survey for lung cancer tissue versus paired non-carcinoma tissue by calculating the number of transcript reads. Based on this analysis, we identified a total of 14,030 genes with distinct difference of expression. Of them, 1439 (10.26%) genes showed up-regulated expression and 1364 (9.72%) genes showed down-regulated expression in lung cancerous tissue (Fig. 1b). KEGG (Kyoto Encyclopedia of Genes and Genomes) enrichment analysis further revealed that the up-regulated genes in cancer tissues were enriched in signaling pathways primarily associated with cell cycle and biosynthesis of amino acids (Fig. 1c), and the downregulated genes were enriched in pathways of neuroactive ligand-receptor interaction and cytokine-cytokine receptor interaction (Fig. 1d).

Identification of APA switching genes in NSCLC
Previous studies have demonstrated that highly proliferative cells or cancer cells often expressed substantial amounts of mRNA isoforms with shortened 3′UTRs [6,17]. Studies also revealed that shortened 3′UTRs correlated with poor prognosis of cancer, including lung cancer [13,18]. We thus performed a comparison analysis for the tandem 3′UTRs lengths of these identified genes. For this, we set up a standard of the 3′UTR length by designating the longest 3′UTR as 1.0 and calculated the 3′UTR length of each gene using weighted mean of 3′UTR length with multiple APA sites to reduce the variance of 3′UTR length. We detected shortened 3′UTR in collected cancer tissues from 96.2% (25/26) patients, as shown in Fig. 2a. We also performed a test of linear trend to determine the tandem 3′UTR length switching. Based on the statistical significance (FDR = 0.01, |r| ≥ 0.1), we identified total of 7639 genes with potential APA switching in 26 paired clinical samples. Of interest, we noticed that lung cancer cells tend to have shortened 3′UTRs of these identified genes (1855 genes showing shorten 3′UTR and 58 genes showing elongated 3′UTR). Additional file 1: Table S2 shows the gene numbers with identified APA switching for each patient.
To determine the significance of these APA site-switching events, we performed functional annotation using the web-accessible DAVID program. The analysis indicated that genes with shorten 3′UTR are enriched in signaling pathways such as mTOR signaling, ubiquitin mediated proteolysis and RNA degradation (Fig. 2b). With gene ontology (GO) enrichment analysis, we also identified the top 30 enriched genes with shorten 3′UTR determined in these clinical samples (Fig. 2c).

CSTF2 regulates 3′UTR shortening in NSCLC cell
To assess the potential mechanisms of 3′UTR shortening in human lung cancer, we examined the RNA expression of genes (Additional file 1: Table S3) that their encoding proteins may correlate to 3′UTR length editing. Of total 93 candidate genes, we found that 28 genes displayed significant changes of up-regulation and 5 genes of downregulation. Our results also revealed that CSTF2 was one of the top genes with most significant up-regulation in cancerous tissues versus paired non-cancerous tissues ( Fig. 3a and Additional file 1: Tables S3, S4). Of interest,  (Fig. 3b). Our qRT-PCR analysis further confirmed the higher mRNA expression of CSTF2 in H460 cells when comparing to HFL1 cells (Fig. 3c). We further assess the expression pattern of CSTF2 in lung cancer samples in The Cancer Genome Atlas database (TCGA), and the results from 100 paired TCGA clinical samples showed near twofolds change of up-regulated CSTF2 expression in cancer cells versus normal lung cells, which is in consistence with the CSTF2 expression pattern of our clinical samples (Fig. 3d).
We thus hypothesize that the CSTF2 gene may play a key role in regulation of 3′UTR length in lung cancer cells.
To test this, we used RNA interference to knock down the CSTF2 expression in H460 cells, and determined the APA site-switching events with IVT-SAPAS analysis. We observed near 80% knocking-down effect of siRNA on CSTF2 expression (Fig. 4a), and a total of 10,235 genes with significant change after siRNA transduction was detected in H460 cells. Of them, 5694 genes (55.63%) are with APA site-switching events, 269 genes (2.62%) show expression difference, and 78 genes (0.76%) show distinct change of 3′UTR length. Of note, the global comparison shows consistence of IVT-SAPAS reads for genes with APA site switching between data sets for the clinical samples and for H460 cells, however, significant differences are observed for genes with differential expression or with different 3′UTR in length, whereas more reads were detected with clinical sample set (Additional file 1: Table 5S). In addition, we noticed that the IVT-SAPAS reads from the clinical samples and H460 cells showed similar distribution pattern of poly(A) signals (6-nucleotides), which is consistent with the previously reported [15] and demonstrates the validity of the IVT-SAPAS analysis used in our study capturing poly(A) site (Fig. 4b).
In IVT-SAPAS assay for H460 set, we detected elongation of 3′UTR length for 60 genes in CSTF2-siRNA transducted cells (Fig. 4c). Of interest, we found that, of the 1855 genes with shorter 3′UTR length identified in human lung cancer tissues (Fig. 4d), 38 genes displayed elongation of 3′UTR length when CSTF2 expression was knocked down in H460 cells (Fig. 4e and Table 2). The functional annotation of these genes further revealed that eleven of these 38 overlapping genes can be annotated into signaling pathways of insulin signaling, lipid metabolism and MAPK signaling. For example, FGF2 (fibroblast growth factor 2, uc003iev.1), which shows 0.3 fold-change of down-regulation in cancer tissue (P-value = 1.85e−4), has been demonstrated to play vital roles in tumor angiogenesis and lymphangiogenesis of lung cancer [19,20]; MAP4K4 (mitogen-activated protein kinase kinase kinase kinase 4, uc002tbc.2), which shows 1.2 foldchange of upregulation in cancer tissue (P-value = 0.037), contributes to cell proliferation, anchorage-independent growth and cancer cell migration of lung cancer [21]. Of interest, the analysis for the distributions of poly(A) reads showed switching of distal stop codon in clinical samples may indicate cancer cells prefer to use the shorten length of transcripts for both FGF2 and MAP4K4 when comparing to normal cells, and the knocking down of CSTF2 expression in H460 cells could lead to changes of transcript lengths of these genes according to the detected numbers and sites of poly(A) reads (Additional file 2: Fig. S1). These findings thus not only demonstrate the vital role of CSTF2 in regulation of 3′UTR length for cancer cell-associated genes, but also indicate a potential of CSTF2 or its protein family may also serve as oncogene(s) driving carcinogenesis of NSCLC.

Discussion
In eukaryotic cells, polyadenylation, a process of adding poly(A) tails to of nearly every fully processed mRNA, is the last key step in mRNA maturation and has been suggested to play critical roles in virtually all aspects of a transcript's life cycle such as mRNA synthesis, conferring mRNA stability, mRNA transportation from nucleus to the cytoplasm and promoting mRNA's translational efficiency [22][23][24][25][26]. The use of tandem APA sites located on the terminal exons often leads to tandem 3′UTRs with variable lengths, and this 3′UTR-APA event plays important roles in regulating the gene expression network and is globally regulated in response to changes in cell proliferation and differentiation [11]. Global 3′UTR regulation corresponding to the changes of cellular proliferation and differentiation status has  and shorter mRNAs were detectable and more prominent in these cancer cell lines compared to normal tissues. In this study, we found that 3′UTR shortening is a common event for cancer-related genes and shorter mRNAs are more frequently detected in tumor tissues compared to corresponding normal tissues. Of note, of the identified 1855 genes (FDR = 0.01) with a significant difference of the 3′UTR length that may link tandem 3′UTR length switching with the pathologic process of lung cancer, we detected alternative isoforms of Cyclin D1, DICER1, RAB10, Cyclin D2, FGF2 in all clinical cancer samples, and alternative isoform of IMP-1 in a remarkable portion of the samples (Additional file 3: Fig. S2). Consistently with Mayr's study, these genes expressed a higher fraction of shorter APA isoforms in a majority of the cancer tissues comparing to the corresponding normal tissues (Additional file 1: Table S6). In addition to these findings, our results further revealed that these identified genes with shorten 3′UTR were enriched in signaling pathways of mTOR signaling, ubiquitin mediated proteolysis and RNA degradation. These results provide novel insights for the contribution of 3′UTR-APA site-switching events of cancer-related genes to cancer development and for development of novel therapeutic strategies of lung cancer. We also performed the histological comparison of IVT-SAPAS reads for adenocarcinoma and squamous lung cancer with our data set of the clinical samples. Our results showed that 719 down-regulated and 1045 upregulated genes were detected in both histological types of cancers. Of interest, 16 genes with APA shortness and only one gene with APA lengthen were detected in both adenocarcinoma and squamous lung cancer tissues (Additional file 4: Fig. S3 and Additional file 1: Table S7). These overlapped gene events may indicate the privilege of certain genes or profile of genes to be involved in the cancer development and disease progression regardless of the histological difference for NSCLC. It is needed to be indicated that, however, the initial results from the IVT-SAPAS data set with clinical samples showing the histological similarity of genes with APA switching for adenocarcinoma and squamous lung cancer needs to be further identified due to the limited number of clinical samples enrolled.
Our data also revealed the potential roles of polyadenylation processing protein, CSTF2, on the shortness of 3′UTR of these cancer-related genes. Processing at