The TGFβ-signaling pathway and colorectal cancer: associations between dysregulated genes and miRNAs

Background The TGFβ-signaling pathway plays an important role in the pathogenesis of colorectal cancer (CRC). Loss of function of several genes within this pathway, such as bone morphogenetic proteins (BMPs) have been seen as key events in CRC progression. Methods In this study we comprehensively evaluate differential gene expression (RNASeq) of 81 genes in the TGFβ-signaling pathway and evaluate how dysregulated genes are associated with miRNA expression (Agilent Human miRNA Microarray V19.0). We utilize paired carcinoma and normal tissue from 217 CRC cases. We evaluate the associations between differentially expressed genes and miRNAs and sex, age, disease stage, and survival months. Results Thirteen genes were significantly downregulated and 14 were significantly upregulated after considering fold change (FC) of > 1.50 or < 0.67 and multiple comparison adjustment. Bone morphogenetic protein genes BMP5, BMP6, and BMP2 and growth differentiation factor GDF7 were downregulated. BMP4, BMP7, INHBA (Inhibin beta A), TGFBR1, TGFB2, TGIF1, TGIF2, and TFDP1 were upregulated. In general, genes with the greatest dysregulation, such as BMP5 (FC 0.17, BMP6 (FC 0.25), BMP2 (FC 0.32), CDKN2B (FC 0.32), MYC (FC 3.70), BMP7 (FC 4.17), and INHBA (FC 9.34) showed dysregulation in the majority of the population (84.3, 77.4, 81.1, 80.2, 82.0, 51.2, and 75.1% respectively). Four genes, TGFBR2, ID4, ID1, and PITX2, were un-associated or slightly upregulated in microsatellite-stable (MSS) tumors while downregulated in microsatellite-unstable (MSI) tumors. Eight dysregulated genes were associated with miRNA differential expression. E2F5 and THBS1 were associated with one or two miRNAs; RBL1, TGFBR1, TGIF2, and INHBA were associated with seven or more miRNAs with multiple seed-region matches. Evaluation of the joint effects of mRNA:miRNA identified interactions that were stronger in more advanced disease stages and varied by survival months. Conclusion These data support an interaction between miRNAs and genes in the TGFβ-signaling pathway in association with CRC risk. These interactions are associated with unique clinical characteristics that may provide targets for further investigations. Electronic supplementary material The online version of this article (10.1186/s12967-018-1566-8) contains supplementary material, which is available to authorized users.

Background The TGFβ-signaling pathway is important in the tumorigenesis of colorectal cancer (CRC) [1]. This pathway is a regulator of cellular proliferation, differentiation, apoptosis, and extracellular matrix remodeling, and also is involved in angiogenesis and inflammation [2,3]. It has been previously reported that components of the TGFβsignaling pathway are mutated in 27% of non-hypermutated tumors and 87% of hypermutated tumors [4], and that inactivation of the pathway is a common event in CRC tumorigenesis [4]. The TGFβ family of cytokine genes includes three isoforms of TGF-β, TGF-β1, TGF-β2, and TGF-β3, the type I receptors (TβR1 and ALK1) and a type II receptor (TβRII) [5]. Other components of the TGFβ-signaling pathway include SMAD genes, key intracellular mediators of the transcriptional responses to TGF-β [6] and bone morphogenetic proteins (BMP), which trigger a SMAD-signaling cascade that is linked to cell proliferation and cellular growth [7,8]. Growth differentiation factors (GDF) [9] and their receptors are components of the TGFβ superfamily as are Activin/ inhibin and their receptors [10]. BMP ligands bind to type 1 [BMPR1A, BMPR1B, Activin A receptor type 1 (ACVR1), and Activin receptor-like kinase 1 (ACVRL1)] and type 2 receptors [BMPR2, Activin A receptor type IIA (ACVR2A) and type IIB (ACVR2B)] [9]. While both type 1 and 2 receptors are needed for BMP signaling, type I receptors bind with a higher affinity than type II receptors. BMPR1A has been reported as being inactivated in some studies focusing on familial syndromes such as mixed polyposis syndrome and familial juvenile polyposis [11][12][13]. Other factors that regulate TGFβ and its receptors are BAMBI (BMP and activing membrane bound inhibitor) [14], THBS1 (thrombospondin 1 also known as TSP1) [15], LEFTY (left-right determination factor, Factor 2 also known as TGFβ4) [16], and FST (Follistatin) [17].
It has been hypothesized that miRNAs, small, nonprotein-coding RNA molecules involved in the regulation of gene expression either by post-transcriptionally suppressing mRNA translation or by causing mRNA degradation [18][19][20][21][22][23], may work with the TGFβ-signaling pathway to mediate cell growth and promote tumorigenesis [24]. MiRNAs have been linked to TGFβ-signaling pathway in a variety of diseases, usually in studies which only examined few miRNAs and genes within the signaling pathway. For instance, miR-181a has been shown to have its expression altered by TGFβ [24,25], miR-494 and miR-126-5p have been linked to BMP4 in the regulation of angiogenesis [26]; miR-98 with TBSP1 expression in asthma [27]; miR-590-5p with downregulation of TGFβ signaling in cardiosphere-derived stem cells [28]; the miR-17-92 cluster with cell proliferation in mouse mesenchymal cells [29]; and miR-140-5p with tumor growth [30]. MiRNAs have also been linked with various disease processes in CRC. For example, miR-200a-3p regulates epithelial mesenchymal transition in CRC [31]; and miR-21-5p is upregulated in CRC, while its inhibition can hinder CRC tumor growth [31]. Although the TGFβsignaling pathway appears to be important for CRC, few studies have looked at this pathway with miRNAs. We have previously examined miRNA expression with SNPs in 21 genes associated with the TGFβ-signaling pathway and found that expression of several miRNAs varied by SNPs in TGFβ1 in normal mucosa [32]. Additionally, other groups have found that specific miRNAs play a significant role in CRC via the TGFβ pathway.  has been shown to play an important role in CRC by promoting cellular proliferation via SMAD3 [33]; miRs-203, 181d and 182 regulate genes, including TGF-β2 and RUNX2, involved in CRC proliferation, differentiation, and invasion [34]; and miR-34a mediates oxaliplatin resistance in CRC via the TGFβ/SMAD4 pathway [35].
In this study, we comprehensively examined expression differences between carcinoma and normal mucosa with all genes identified in the TGFβ-signaling pathway. For those genes that had significant differences in expression when considering level of expression (> 1.50 fold change (FC) or < 0.67 FC), we assessed their association with expression levels of over 800 miRNAs commonly expressed in CRC. We evaluated associations for overall CRC as well as for specific tumor molecular phenotype, i.e. tumors with or without mismatch repair deficiency, and how these associations relate to clinical features such as disease stage and survival months. Our goal is to obtain a better understanding of how miRNAs relate to the TGFβ-signaling pathway in CRC carcinogenesis. We believe that our epidemiological approach, using population-based cases, is an initial step in identifying associations that can be further examined in targeted laboratory studies with the goal of identification of targeted therapeutics.

Study participants
Participants come from two population-based case-control studies that included all incident colon and rectal cancer diagnosed between 30 and 79 years of age in Utah or at Kaiser Permanente in Northern California (KPNC). Diagnosis was verified by tumor registry data as a first primary adenocarcinoma of the colon or rectum with a diagnosis date between October 1991 and September 1994 (colon study) and between May 1997 and May 2001 (rectal study) [36]. Participants were non-Hispanic white, Hispanic, or black for the colon cancer study; the rectal cancer study also included people who self-reported RNA from 245 colorectal carcinoma and normal mucosa pairs was chosen for sequencing based on availability of RNA and high quality miRNA data; 217 pairs passed quality control (QC) and were used in these analyses. RNA library construction was done with the Illumina TruSeq Stranded Total RNA Sample Preparation Kit with Ribo-Zero. The samples were then fragmented and primed for cDNA synthesis, adapters were then ligated onto the cDNA, and the resulting samples were then amplified using PCR; the amplified library was then purified using Agencount AMPure XP beads. A more detailed description of the methods can be found in our previous work [40]. Illumina TruSeq v3 single read flow cell and a 50 cycle single-read sequence run was performed on an Illumina HiSeq instrument. Reads were aligned to a sequence database containing the human genome (build GRCh37/hg19, February 2009 from genome.ucsc.edu) and alignment was performed using novoalign v2.08.01. Total gene counts were calculated for each exon and UTR of the genes using a list of gene coordinates obtained from http://genom e.ucsc.edu. We disregarded genes that were not expressed in our RNA-Seq data or for which the expression was missing for the majority of samples [40].

miRNA expression
The Agilent Human miRNA Microarray V19.0 was used. Data were required to pass stringent QC parameters established by Agilent that included tests for excessive background fluorescence, excessive variation among probe sequence replicates on the array, and measures of the total gene signal on the array to assess low signal. Samples failing to meet quality standards were re-labeled, hybridized to arrays, and re-scanned. If a sample failed QC assessment a second time, the sample was excluded from analysis. The repeatability associated with this microarray was extremely high (r = 0.98) [36]; comparison of miRNA expression levels obtained from the Agilent microarray to those obtained from qPCR had an agreement of 100% in terms of directionality of findings and the FCs were almost identical [41]. To normalize differences in miRNA expression that could be attributed to the array, amount of RNA, location on array, or factors that could erroneously influence miRNA expression levels, total gene signal was normalized by multiplying each sample by a scaling factor which was the median of the 75th percentile of all the samples divided by the individual 75th percentile of each sample [42].

TGFβ-signaling pathway genes
The Kyoto encyclopedia of genes and genomes (KEGG) (www.genom e.jp/kegg-bin/show_pathw ay?hsa04 350) pathway map for TGFβ-signaling was used to identify genes within this pathway. Using this pathway map, we identified 84 genes, 81 of which had sufficient expression in CRC tissue for inclusion in the study (Additional file 1: Table S1).

Statistical methods
We utilized a negative binomial mixed effects model in SAS (accounting for paired carcinoma/normal status) to determine genes in the TGFβ-signaling pathway that had a significant difference in expression between individually paired colorectal carcinoma and normal mucosa (i.e. differentially expressed). In this test we offset the overall exposure as the log expression of all identified proteincoding genes (n = 17,461). The Benjamini and Hochberg [43] procedure was used to control the false discovery rate (FDR) using a value of 0.05 or less. We calculated FC using the population means to give an estimate of the magnitude of difference in expression to identify mRNA and miRNAs. Population means were used because in some instances there was no expression in either the tumor or normal, making individual FCs impossible to calculate. Utilization of the population means we believe gave the best estimate of FC in the population. We calculated level of expression of each gene by dividing the total expression for that gene in an individual by the total expression of all protein-coding genes per million transcripts (RPMPCG or reads per million protein-coding genes). We focused on those genes with FC of > 1.50 or < 0.67 for analysis with differential miRNA expression since these levels of FC may have a greater biological significance than FCs closer to one. A FC of greater than one indicates a positive differential expression (i.e. up-regulated in carcinoma), while a FC between zero and one indicates a negative differential expression (i.e. down-regulated in carcinoma). We also report at the population level the percentage of the population with FC > 1.50 or < 0.67 to provide an estimate of the prevalence of these changes in the population.
We evaluated clinical indicators of age, sex, AJCC disease stage, and survival months with those genes that were dysregulated using Spearman correlation coefficients. Additionally we evaluated miRNA and mRNA associations for specific categories of sex, age (< 55, 55-65, and > 65), AJCC disease stage, and survival months (24,, and > 60 months).
There were 814 miRNAs expressed in greater than 20% of normal colorectal mucosa that were analyzed; differential expression was calculated using subject-level paired data as the expression in the carcinoma tissue minus the expression in the normal mucosa. In these analyses, we fit a least squares linear regression model to the RPMPCG differential expression levels and miRNA differential expression levels. P-values were generated using the bootstrap method by creating a distribution of 10,000 F statistics derived by resampling the residuals from the null hypothesis model of no association between gene expression and miRNA expression using the boot package in R. Linear models were adjusted for age and sex. Multiplicity adjustments for gene/miRNA associations were made at the gene level using the FDR by Benjamini and Hochberg [43].

Bioinformatics analysis
We analyzed miRNAs and targeted mRNAs for seed region matches. The mRNA 3′ UTR FASTA as well as the seed region sequence of the associated miRNA were analyzed to determine seed region pairings between miRNA and mRNA. MiRNA seed regions were calculated as described in our previous work [44]; we calculated and included seeds of six, seven, and eight nucleotides in length. Our hypothesis is that a seed-region match would increase the likelihood that identified genes associated with a specific miRNA were more likely to have a direct association (as indicated by a negative beta coefficient) given a higher propensity for binding and thus mRNA degradation. We used mRNA FASTA sequences generated from both GRCh37 and GRCh38 Homo sapiens alignments, using UCSC Table Browser (https ://genom e.ucsc.edu/cgi-bin/hgTab les) [45]. We downloaded FASTA sequences that matched our Ensembl IDs and had a consensus coding sequences available. Analyses were done using scripts in R 3.2.3 and in perl 5.018002.

Data accessibility
Utah miRNA data are available through GEO (https :// www.ncbi.nlm.nih.gov/geo/query /acc.cgi?acc=GSE11 5513). Other data will be shared in accord with the signed consent and approved IRB studies. Contact study authors for data access.

Results
Colon cancer cases comprised 77.9% of the study population ( Table 1). The majority of cases were male (54.4%), non-Hispanic white (74.2%) and were diagnosed with a microsatellite stable (MSS) tumor. The mean age at diagnosis was 64.8 years.

Gene expression results
Of the 81 genes analyzed, 27 showed statistically significant differences in expression between carcinoma and normal mucosa with a more meaningful FC when tumor sub-site was not considered ( Table 2). Of these 27 genes, 13 were downregulated and 14 were up-regulated. Of the downregulated genes, three were part of the BMP family (BMP5, BMP6, and BMP2) and two were part of the growth differentiation factor family (GDF7 and GDF6). INHBA (Inhibin beta A), which encodes for a subunit of activin and inhibin, was strongly upregulated, while the gene encoding activin type 1 receptor, ACVR1C, was downregulated. Two BMPs (BMP4 and BMP7) were upregulated.   16:191 Figure 1 shows the KEGG TGFβ-Signaling Pathway and those genes that are up and downregulated within the pathway.
A small number of genes had either slightly stronger associations (as indicated by FC) for the MSS (Additional file 1: Table S2) or MSI (Additional file 1: Table S3) phenotype. Associations were most similar for MSS tumors and overall tumors; one gene (THBS1) was associated overall (FC 0.66) and for MSS-specific tumors (FC 0.68) that would not have merited inclusion by our cutpoints for FC. AMH (anti-Mullerian Hormone) was upregulated in overall tumors (FC 1.74) to a greater effect than for MSS tumors (FC 1.48). There were considerably more differences observed between MSI tumors and overall tumors. Seven genes were downregulated in MSI tumors with a much greater effect than for MSS tumors. Two of these genes were much more strongly downregulated in MSI tumors (ID3 FC MSS 0.86 and FC MSI I 0.52; THBS1 For those genes that had significant differential expression with a FC of < 0.67 or > 1.50 we further evaluated their associations with sex, age, disease stage and survival months (Table 3). There were few correlations with any of these factors that we consider strong, although correlations > 0.134 were statistically significant with p values < 0.05. The strongest correlations for these factors were: INHBB with sex (r = − 0.175) indicating slightly stronger CDKN2B expression levels among women; BMP2 with age (r = 0.224) indicating less expression with increasing age; CDKN2B with disease stage (r = 0.197) indicating greater correlation with more advanced disease stage; and BMP4 with survival (r = − 0.174) indicating worse survival with greater BMP4 expression.

mRNA and miRNA associations
Eleven of the dysregulated genes were associated with miRNA differential expression ( Table 4). All of the associated genes were upregulated except for THBS1. While two of the genes were only associated with a few miRNAs, E2F5 with miR-17-5p and miR-20a-5p, and THBS1 with miR-145-5p, the others were associated with seven or more miRNAs. The miRNAs associated with E2F5 had seed-region matches with this gene. RBL1 was associated with 11 miRNAs (five with a seed-region match), TGFBR1 with 11 miRNAs (nine with a seed-region match), TGIF2 with 26 miRNAs (eight with a seed-region match), INHBA with seven miRNAs (two with a seed-region match), MYC with 12 miRNAs, and TFDP1 with seven miRNAs. Of the miRNAs and mRNAs with a seed-region match four had inverse associations as indicated by a negative beta coefficient (TGFBR1 with miR-2117 and miR-6071; TGIF2 with miR-375 and miR-4749-3p). The FC of the differentially expressed miRNAs varied less in the population than the differential expression of the mRNAs.  16:191 Although variability in the population still existed, when a population FC of > 1.50 or < 0.67 was observed for the miRNA, a much larger percentage of the population had a significant up or downregulated miRNA. The stronger the population FC, the larger percentage of individuals had a similar FC. Figure 1 displays the miRNAs with their associated mRNAs in the KEGG TGFβ-signaling pathway. Figure 2 highlights key components of the pathway, including those genes acting as extracellular factors (Fig. 2a), those acting as membrane receptors (Fig. 2b) and those acting as nuclear factors (Fig. 2c) that were associated miRNAs. Seed region matches are shown with a -| between the miRNA and mRNA, while interactions without seed matches are shown with a line.

Discussion
The TGFβ pathway is one of the most important pathways in the development of CRC [1]. In normal cells the TGFβ pathway plays an important role in suppressing growth and tumorigenesis; however, as cancer progresses the TGFβ pathway promotes epithelial-mesenchymal transition, invasion and metastasis [24]. In this study, of the 27 dysregulated genes in this pathway, 13 genes were significantly downregulated. The genes that were downregulated included BMP5, BMP6 and BMP2, in addition to GDF7 and GDF6. We have previously shown an association between BMP2 SNPs and CRC risk [46]. BMP5 and BMP6 play an important role in various cancers; both have been shown to induce apoptosis [47]. BMP6 has previously been shown to be downregulated in metastatic breast cancer [48]. Our data suggests these genes play an important inhibitory role in CRC tumorigenesis as well, and therefore, are downregulated similar to other cancers.
Growth differentiation factors are also known to have inhibitory effects on growth in human cancers, and both GDF9a and GDF9b have been associated with breast cancer [47]. We found that both GDF6 and GDF7 were downregulated in CRC tissues. While we could find no earlier documentation of these genes having an association with CRC risk, GDF7 is a ligand in the BMP pathway that has been shown to regulate the Hedgehog and Wnt pathways and has been associated with the pro-inflammatory phenotypes in many diseases including Barrett's Esophagus [49] and esophageal adenocarcinoma [50]. Additionally, the role of Wnt in CRC has been established [51].
The genes that were upregulated in our study included BMP4, BMP7, TGFBR1, TGFB2, TGIF1, TGIF2, and TFDP1. BMP4 is known to regulate the SMAD 1/5 signal        16:191 transduction pathway and plays an important role in angiogenesis [10]. Our data suggest that in CRC, multiple BMPs have altered expression levels. However, it is interesting to note that downregulated BMP5, BMP6, and BMP2 were some of the most consistently downregulated genes in the population, with approximately 80% of cases having expression of these genes downregulated in tumors; BMP4 and BMP7 were upregulated in over 50% of CRC. BMPs are members of a pleiotropic family of growth factors whose signaling has been shown to protect against colonic polyposis with known mutations that can contribute to the development of the tumor microenvironment [52]. Loss of BMP signals has been cited as one of the two main genetic alterations leading to CRC; disrupted BMP signaling allows tumor growth and expansion [53]. BMPs have been shown to upregulate various cytokines important in cancer development and metastasis, and BMP expression levels have been associated with survival and prognosis in various cancers [54,55]. Our data support these earlier findings of the importance of BMPs in the development and progression of CRC, in that BMP7, BMP5, BMP4, BMP2, and BMP6 were all significantly differentially expressed in CRC tumors and BMP4 differential expression was inversely associated with survival months. Moreover, they suggest that BMPs play very diverse roles in regulating the TGFβ pathway. Meaning that through their diverse group of downstream ligands and targets, some members of the BMP family can play an important inhibitory effect in tumorigenesis, and therefore are downregulated in CRC, while others play a stimulatory effect in tumorigenesis, and therefore must be upregulated in CRC. In addition to evaluating the role of the TGFβ pathway in overall CRC risk, we evaluated gene expression levels for associations with mismatch repair proficiency and deficiency. Our data suggest that several genes in this pathway have significant associations with these phenotypes. Notably, THBS1 was associated with MSI-specific tumors. Members of the thrombospondin family previously have been associated with differential expression in different CRC phenotypes; THBS4 has been associated with CIMP high tumors, of which a subset are mismatch repair deficient [56]. THBS1 has previously been associated with the development of the tumor micro-environment and angiogenesis [57] and metastasis [15]. AMH was upregulated in all tumors, but had lower expression levels in MSS tumors and the highest expression levels in MSI tumors. AMH is selectively expressed in epithelial cells versus mesenchymal cells and a loss of AMH is known to induce EMT in various cancers [58]. AMH signaling induces downstream phosphorylation of SMAD 1/5/8 [58]. Notably SMADs 1/5/8 also are known to be downstream elements in the JNK and ERK pathways [59]. TGFβ is a known activator of the ERK, JNK and p38 MAPK pathways [60], and our data suggests that AMH is a possible mediator of this cross-talk between pathways. We have previously shown that both ERK and JNK dependent pathways are significantly associated with CIMP, MSI and TP53 phenotypes in colon and rectal cancer [61]. Therefore, it is possible that cross-talk between the TGFβ-signaling pathway and other pro-inflammatory pathways plays an important role in the determination of tumor molecular phenotype.
Our data also showed that eight genes in the TGFβsignaling pathway are associated with differential miRNA expression. MiRNAs have previously been recognized as important in many pathways that regulate normal versus pathological function, and are dysregulated in most, if not all, cancers [62]. In particular, miRNAs have been suggested as potential modulators of the TGFβ-signaling pathways' seemingly paradoxical effects in regular colonic tissue and CRC development [24,63]. Additionally, the canonical BMP pathways have been shown to alter gene expression levels through miRNAs [26]. We have previously shown that miRNAs are extensively dysregulated in CRC [64] and that miRNAs can be used to differentiate between normal tissue and colonic carcinoma and rectal carcinoma at a molecular level [41].
In this study, we show that miRNAs play a role in the TGFβ's function in CRC development, supporting earlier findings. Most earlier studies have focused on one, or several miRNAs such as miR-21 and miR-155 [65], miR-590 [28], miR-17-92 [29] and miR-494 and miR-126-5p [26] or else focus on select genes in the TGFβ pathway, such as THBS1 [15]. Here we assessed the associations between every gene in the KEGG TGFβ-signaling pathway and the expression levels of over 800 miRNAs. This allowed us to confirm earlier findings of associations with the miR-17-92 cluster (miR-17, 19, 20 and 92), in addition to detecting previously un-reported associations between a diverse array of miRNAs and genes in the TGFβsignaling pathway. Moreover, we had previously reported that miR-17-5p, miR-20a-5p, miR-145-5p, miR-3651 and miR-92-3p were associated with both colon and rectal cancer, while miR-4749-3p was associated with rectal cancer only and miR-663a was associated with enriched biologic processes in normal colonic mucosa [41]. Here we expand upon our previous findings and suggest that one of the mechanisms by which these miRNAs are associated with colorectal cancer may involve the dysregulation of the TGFβ pathway.
Our results suggest that the component of the TGFβsignaling pathway most influenced by miRNAs is that part of the pathway that leads to apoptosis and cell cycle control (see Figs. 1, 2c). Dysregulated genes associated with miRNAs were TGFβR1, TGIF2, E2F4/E2F5,