- Open Access
Integrative analysis of the cancer genome atlas and cancer cell lines encyclopedia large-scale genomic databases: MUC4/MUC16/MUC20 signature is associated with poor survival in human carcinomas
Journal of Translational Medicine volume 16, Article number: 259 (2018)
MUC4 is a membrane-bound mucin that promotes carcinogenic progression and is often proposed as a promising biomarker for various carcinomas. In this manuscript, we analyzed large-scale genomic datasets in order to evaluate MUC4 expression, identify genes that are correlated with MUC4 and propose new signatures as prognostic markers of epithelial cancers.
Using cBioportal or SurvExpress tools, we studied MUC4 expression in large-scale genomic public datasets of human cancer (the cancer genome atlas, TCGA) and cancer cell line encyclopedia (CCLE).
We identified 187 genes whose expression is correlated with MUC4 expression. Gene ontology analysis showed that they are notably involved in cell adhesion, cell–cell junctions, glycosylation and cell signaling. In addition, we showed that MUC4 expression is correlated with that of MUC16 and MUC20, two other membrane-bound mucins. We showed that MUC4 expression is associated with a poorer overall survival in TCGA cancers of different localizations including pancreatic cancer, bladder cancer, colon cancer, lung adenocarcinoma, lung squamous carcinoma, skin cancer and stomach cancer. We showed that the combined MUC4/MUC16/MUC20 signature is associated with a statistically significant reduction in overall survival and an increased hazard ratio in pancreatic, colon and stomach cancer.
Altogether, this study provides a link between (i) MUC4 expression and clinical outcome in cancer and (ii) MUC4 expression and correlated genes involved in cell adhesion, cell–cell junctions, glycosylation and cell signaling. We propose the MUC4/MUC16/MUC20high signature as a marker of poor prognosis for pancreatic, colon and stomach cancers.
The cancer genome atlas (TCGA) was developed by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) in order to provide comprehensive mapping of the key genomic changes that occur during carcinogenesis. Datasets of more than 11,000 patients covering 33 different types of tumors are publicly available. In parallel, the cancer cell line encyclopedia (CCLE), a large-scale genomic dataset of human cancer cell lines, was generated by the Broad Institute and Novartis in order to reflect the genomic diversity of human cancers and provide complete preclinical datasets for mutation, copy number variation and mRNA expression studies. In order to analyze this kind of large-scale dataset, several useful online tools have been created. cBioportal is an open-access database analysis tool developed at the Memorial Sloan-Kettering Cancer Centre (MSKCC) to analyze large-scale cancer genomics data sets [2, 3]. SurvExpress is another online tool for biomarker validation that uses 225 available datasets and therefore provides key information linking gene expression to cancer outcome.
Mucins are large, high molecular weight glycoproteins that are classified in two subgroups: (i) the secreted mucins, which are responsible for the rheologic properties of mucus, and (ii) the membrane-bound mucins, which include MUC4, MUC16 and MUC20 [5, 6]. MUC4 was first discovered in our laboratory 25 years ago from a tracheobronchial cDNA library. MUC4 is characterized by a long hyper-glycosylated extracellular domain, Epidermal Growth Factor (EGF)-like domains, a hydrophobic transmembrane domain, and a short cytoplasmic tail. MUC4 also contains NIDO, AMOP and vWF-D domains. A direct interaction between MUC4 and its membrane partner, the oncogenic receptor ErbB2, alters downstream signaling pathways. MUC4 is expressed at the surface of epithelial cells from the gastrointestinal and respiratory tracts. It has been studied in various cancers, where it is generally overexpressed and described as an oncomucin, and it has been proposed as an attractive prognostic tumor biomarker. Its biological role has been mainly evaluated in pancreatic, ovarian, esophageal and lung cancers [9, 11,12,13,14]. The other membrane-bound mucins MUC16 and MUC20 share some functional features but evolved from distinct ancestors. The MUC20 gene is located in the chromosomal region 3q29, close to MUC4. MUC16, also known as the CA125 antigen, is a routinely used serum marker for the diagnosis of ovarian cancer. Both mucins favor tumor aggressiveness, are associated with poor overall survival and could be proposed as prognostic factors [16,17,18].
In this manuscript, we have used the online tools cBioportal, DAVID6.8 and SurvExpress in order to (i) evaluate MUC4 expression in various carcinomas, (ii) identify genes that are correlated with MUC4 and evaluate their roles and (iii) propose MUC4/MUC16/MUC20 combination as a prognostic marker of pancreatic, colon and stomach cancers.
Expression analysis from public datasets
MUC4 expression z-scores were extracted from databases available at cBioPortal for Cancer Genomics [2, 3]. This portal stores expression data and clinical attributes. The z-score for MUC4 mRNA expression is determined for each sample by comparing its mRNA expression to the distribution in a reference population harboring typical expression for the gene. The query “MUC4” was run against CCLE (881 samples, Broad Institute, Novartis Institutes for Biomedical Research) and against all available TCGA datasets (13,489 human samples, TCGA Research Network (http://cancergenome.nih.gov/)). The mRNA expression from selected data was plotted in relation to the clinical attributes (tumor type and histology) of each sample. MUC4 expression was analyzed in normal tissues by using the Genotype-Tissue Expression (GTEx) tool [19, 20]. Data were extracted from the GTEx portal on 06/29/17 (dbGaP accession phs000424.v6.p1) using Entrez gene ID 4585.
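The z-score described above can be illustrated with a minimal sketch. It assumes the reference population is supplied as a plain list of expression values and that the sample standard deviation is used; the exact reference population and estimator used by the portal are not stated here, and the function name and values below are hypothetical.

```python
from statistics import mean, stdev

def expression_z_scores(sample_values, reference_values):
    """Return z = (x - mean_ref) / sd_ref for each sample value."""
    mu = mean(reference_values)
    sigma = stdev(reference_values)  # sample standard deviation (assumption)
    if sigma == 0:
        raise ValueError("reference population has zero variance")
    return [(x - mu) / sigma for x in sample_values]

# Hypothetical log2 expression values, for illustration only.
reference = [4.0, 5.0, 6.0, 5.0, 5.0]
samples = [5.0, 7.0, 3.0]
z = expression_z_scores(samples, reference)
```

A sample equal to the reference mean gets a z-score of 0; positive values indicate expression above the reference distribution.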
DAVID6.8 identification and gene ontology of genes correlated with MUC4
We established a list of 187 genes that are correlated with MUC4 expression in the CCLE dataset, out of 16,208 genes analyzed with the co-expression tab of the cBioportal tool. For these genes, both Pearson’s and Spearman’s correlation coefficients are higher than 0.3 or lower than − 0.3. Functional annotation and ontology clustering of the complete list of genes were performed using the DAVID Functional Annotation Tool (https://david.ncifcrf.gov/) with a Homo sapiens background [21, 22]. Enrichment scores of ontology clusters are provided by the online tool.
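The selection rule can be sketched as follows. This is an illustration under stated assumptions rather than the cBioportal implementation: it reads the criterion as requiring both coefficients to lie beyond the threshold on the same side, and the gene names and expression values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def coexpressed_genes(target, expression, threshold=0.3):
    """Keep genes whose Pearson AND Spearman coefficients both exceed
    +threshold or both fall below -threshold."""
    selected = {}
    for gene, values in expression.items():
        r, rho = pearson(target, values), spearman(target, values)
        if (r > threshold and rho > threshold) or (r < -threshold and rho < -threshold):
            selected[gene] = (r, rho)
    return selected

# Hypothetical expression profiles across six cell lines.
muc4 = [0.1, 0.5, 1.2, 2.0, 2.8, 3.5]
expression = {
    "GENE_UP": [0.3, 0.4, 1.0, 2.2, 2.5, 3.9],
    "GENE_DOWN": [3.0, 2.9, 2.0, 1.1, 0.9, 0.2],
    "GENE_FLAT": [1.0, 2.0, 2.0, 1.0, 1.0, 2.0],
}
selected = coexpressed_genes(muc4, expression)
```

Requiring agreement between the two coefficients guards against genes whose apparent correlation is driven by a few outlying samples.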
Interactions between proteins correlated with MUC4 were determined using the String 10 tool (https://string-db.org/). Edges represent protein–protein associations such as known interactions (from curated databases or experimentally determined), predicted interactions (from gene neighborhood, gene fusion or co-occurrence), text-mining, co-expression or protein homology. The network was divided into 3 clusters based on k-means clustering.
Methylation and copy number analysis
Using the CCLE portal (https://portals.broadinstitute.org/ccle), we extracted the mRNA expression of MUC4, the methylation score (Reduced Representation Bisulfite Sequencing: RRBS) and the copy number variations of the genes of interest. The mRNA expression of MUC4 was plotted in relation to log2 copy number or RRBS score.
SurvExpress survival analysis
Survival analysis was performed using the SurvExpress online tool available at bioinformatica.mty.itesm.mx/SurvExpress (Aguirre-Gamboa et al., PLoS One 2013). We used the optimized algorithm that generates risk groups by sorting samples on the prognostic index (a higher value of MUC4 meaning a higher risk) and splits the cohort into two groups at the point where the p-value is minimal. The hazard ratio [95% confidence interval (CI)] was also evaluated. The tool also provided a box plot of gene expression and the corresponding p value testing the difference between groups.
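The optimized split can be sketched as below. This is a minimal illustration and not SurvExpress code: it assumes that the minimized p-value is that of a two-group log-rank test, and the function names, the `min_group` parameter and the toy data are hypothetical.

```python
from math import erfc, sqrt

def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns the p-value (chi-square, 1 df)."""
    all_times = times_a + times_b
    all_events = events_a + events_b
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / variance
    return erfc(sqrt(chi2 / 2))  # survival function of chi-square with 1 df

def best_split(prognostic_index, times, events, min_group=2):
    """Sort samples by prognostic index and return (p, cut) for the cut
    point that minimizes the log-rank p-value between low and high risk."""
    order = sorted(range(len(times)), key=lambda i: prognostic_index[i])
    best = None
    for cut in range(min_group, len(order) - min_group + 1):
        low, high = order[:cut], order[cut:]
        p = logrank_p([times[i] for i in low], [events[i] for i in low],
                      [times[i] for i in high], [events[i] for i in high])
        if best is None or p < best[0]:
            best = (p, cut)
    return best

# Hypothetical cohort: survival in days, event 1 = death, 0 = censored.
index = [1, 2, 3, 4, 5, 6, 7, 8]
times = [900, 850, 800, 750, 300, 250, 200, 150]
events = [0, 0, 1, 0, 1, 1, 1, 1]
p_value, cut = best_split(index, times, events)
```

Because the cut point is chosen to minimize the p-value, the resulting groups can be very unequal in size, and the reported p-value is optimistic compared with a pre-specified split such as the median.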
Gene Expression Omnibus microarray
The GSE28735 and GSE16515 pancreatic cancer microarrays were analyzed from the NCBI Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/). GSE28735 is a dataset containing 45 normal pancreas (adjacent non tumoral, ANT) and 45 tumor (T) tissues from pancreatic ductal adenocarcinoma (PDAC) cases. GSE16515 contains 52 samples (16 cases had both tumor and normal expression data, and 20 only had tumor data). Data were analyzed using GEO2R software. The dataset GSE28735 used the Affymetrix GeneChip Human Gene 1.0 ST array. The dataset GSE16515 used the Affymetrix Human Genome U133 Plus 2.0 Array. GSE13507 contains 165 bladder cancer and 58 ANT samples. GSE30219 contains 14 normal lung, 85 adenocarcinoma and 61 squamous cancer samples. GSE40967 contains 566 colorectal cancers and 19 normal mucosae. GSE27342 contains 80 tumors and 80 paired ANT tissues. GSE4587 contains 2 normal, 2 melanoma and 2 metastatic melanoma samples. GSE14407 contains 12 ovarian adenocarcinomas and 12 normal ovary samples.
For MUC4 expression analysis, paired and unpaired t-test statistical analyses were performed using GraphPad Prism 6.0 software (GraphPad Software Inc., La Jolla, CA, USA). p < 0.05 was considered statistically significant. Receiver operating characteristic (ROC) curves and areas under the ROC curve (AUROC) were evaluated by comparing tumor and ANT values. Pearson and Spearman tests provided by cBioportal were used to analyze the correlation of other genes, RRBS score and log2 copy number with MUC4 expression. The DAVID tool provided the p value of each ontology enrichment score. The SurvExpress tool provided the statistical analysis of hazard ratio and overall survival. A log-rank test evaluated the equality of survival curves between the high- and low-risk groups.
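As an illustration of what the AUROC measures when tumor and ANT values are compared, the sketch below computes it as the probability that a randomly chosen tumor value exceeds a randomly chosen ANT value, with ties counted as one half. This is a generic formulation, not the GraphPad Prism procedure, and the expression values are hypothetical.

```python
def auroc(tumor_values, normal_values):
    """Area under the ROC curve via pairwise comparison of the two groups."""
    wins = 0.0
    for t in tumor_values:
        for n in normal_values:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(tumor_values) * len(normal_values))

# Hypothetical expression values, for illustration only.
tumor = [6.1, 7.4, 5.2, 8.0]
normal = [4.9, 5.5, 5.0, 6.1]
area = auroc(tumor, normal)
```

An area of 0.5 corresponds to no discrimination between tumor and ANT samples, and an area of 1.0 to perfect separation.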
MUC4 expression analysis in databases
MUC4 expression was analyzed from databases available at cBioPortal for Cancer Genomics [2, 3]. We queried for MUC4 mRNA expression in the 881 samples from CCLE (Fig. 1). The oncoprint showed that MUC4 was altered in 195 samples out of 881 (22%). 188 were amplification (n = 120) or mRNA upregulation (n = 88) (Additional file 1: Figure S1). Results were sorted depending on the tumor type. We mainly observed a high MUC4 expression z-score in carcinoma samples (n = 538 samples, p = 0.001) (Fig. 2a). MUC4 expression scores were subsequently sorted depending on the organ (Fig. 2b). As expected, pancreatic cancer cell lines harbor the highest MUC4 expression (n = 35, z-score = 2.166, p = 0.0006 against theoretical control median = 0). Other cell lines from different tissues (lung NSC, esophagus, bile duct, stomach, upper digestive, colorectal, ovary, and urinary tract) showed statistically significant alteration. We also performed a similar analysis on 13 489 human samples retrieved from TCGA by using the cBioportal platform. A high MUC4 expression z-score was observed in bladder urothelial carcinoma, cervical squamous cell carcinoma/endocervical adenocarcinoma, colorectal carcinoma, esophageal carcinoma, head and neck squamous cell carcinoma, lung adenocarcinoma, lung squamous cell carcinoma, ovarian serous cystadenocarcinoma, pancreatic adenocarcinoma, prostate adenocarcinoma, stomach adenocarcinoma and uterine corpus endometrial carcinoma (Fig. 3). Expression of MUC4 in normal tissues was analyzed using the GTEx project tool. MUC4 was expressed in lung, testis, small intestine terminal ileum, prostate, vagina, minor salivary gland, esophagus mucosa and transverse colon (Additional file 2: Figure S2). Altogether, this shows that high MUC4 expression is observed in carcinoma and notably in pancreatic cancer.
MUC4 co-regulated genes
Using the co-expression tool on expression data extracted from the 881 samples of CCLE, we obtained a list of genes that are co-expressed with MUC4. Genes for which both Pearson’s and Spearman’s correlation coefficients were higher than 0.3 or lower than − 0.3 were selected. 187 genes are positively (n = 178) or negatively (n = 9) correlated with MUC4 expression. The best-correlated genes were Adhesion G Protein-Coupled Receptor F1 (ADGRF1, Pearson’s correlation = 0.56) and Lipocalin2 (LCN2, Pearson’s correlation = 0.54) (Table 1). We also observed that the expression of the other membrane-bound mucins MUC16 and MUC20 is positively correlated with that of MUC4. A correlation between MUC16 and MUC20 was also observed (not shown). Only a few genes were negatively correlated, such as the ZEB1 transcription factor or ST3 Beta-Galactoside Alpha-2,3-Sialyltransferase 2 (ST3GAL2) (Table 2).
Functional annotation of the complete list of genes and ontology clustering were performed using the DAVID Functional Annotation Tool. The gene clustering analysis is presented in Table 3. The complete list of statistically significant gene ontologies is provided in Additional file 3: Table S1. We observed the highest enrichment scores in gene clusters involved in cell adhesion (7.08) and tight junction (5.44) (Table 3). Notably, we observed that the expression of MUC4 is correlated with that of genes encoding integrins (ITGB4 and ITGB6) and cadherin-type proteins such as CDH1, CDH3 and Desmocollin 2 (DSC2). A strong enrichment of 91 transmembrane proteins was observed including EPH Receptor A1 (EPHA1), Epithelial cell adhesion molecule (EPCAM), Carcinoembryonic Antigen Related Cell Adhesion Molecule-5 and -6 (CEACAM5 and CEACAM6), C-X-C motif chemokine ligand 16 (CXCL16) and ATPase Secretory Pathway Ca2+ Transporting 2 (ATP2C2). As MUC4 is a glycoprotein, it is interesting to also note the correlated expression of enzymes involved in different steps of glycosylation such as sialyltransferases (ST3GAL2, ST6GALNAC1), beta-1,3-N-acetylglucosaminyltransferases (B3GNT5, B3GNT3), fucosyltransferases (FUT3, FUT2), and UDP-GalNAc transferase (GALNT3). MUC4 expression was also correlated with cell signaling genes encoding proteins that contain an SH2 domain (Cbl proto-oncogene C (CBLC), signal transducing adaptor family member 2 (STAP2), dual adaptor of phosphotyrosine and 3-phosphoinositides 1 (DAPP1), SH2 domain containing 3A (SH2D3A), protein tyrosine kinase 6 (PTK6), growth factor receptor bound protein 7 (GRB7), fyn related Src family tyrosine kinase (FRK), tensin 4 (TNS4)) or SH3 domains (MET transcriptional regulator (MACC1), Rho GTPase activating protein 27 (ARHGAP27), tight junction protein 2 (TJP2), Rho guanine nucleotide exchange factor-5 and -16 (ARHGEF5, ARHGEF16), protein tyrosine kinase 6 (PTK6), EPS8 like 1 (EPS8L1), tight junction protein 3 (TJP3) and FRK).
Finally, several genes encoding proteins with a SEA domain (ADGRF1, ST14, MUC16) were correlated with MUC4 expression. Additionally, we analyzed protein–protein interactions between MUC4 and the proteins encoded by the correlated genes with the String 10 tool. We showed that MUC4 is directly related to CEACAM5, CEACAM6, MUC16, MUC20 and glycosylation enzymes (ST3GAL2, B3GNT3, B3GNT5 and GALNT3) (Additional file 4: Figure S3). Altogether, we have identified genes whose expression is correlated with MUC4 and that are notably involved in cell adhesion, cell–cell junctions, glycosylation and cell signaling. In order to understand the association between the observed aberrant expression of MUC4 and other molecular events, we explored the correlation between MUC4 expression in CCLE and DNA methylation (RRBS) of the top genes correlated with MUC4. We observed that MUC4 expression is negatively correlated with the methylation score of 16 out of 20 of the top genes (LCN2, MUC20, STEAP4, WFDC2, GJB3, SH2D3A, RNF39, PRSS22, HS3ST1, GPR87, TACSTD2, FAM83A, LAMC2, B3GNT3, CLDN7) (Fig. 4), suggesting that the association of MUC4 and the correlated genes could be mediated by methylation regulation. Only the ADGRF1 RRBS score is not correlated with MUC4 mRNA level. MUC16, SCEL and C1ORF116 scores were not available. Additionally, we evaluated the association between the copy number variation of the top genes and MUC4 expression. We only observed a weak positive correlation with MUC20 copy number (Pearson’s correlation = 0.13) and a weak negative correlation with MUC16 copy number (Pearson’s correlation = − 0.14), suggesting that a relationship between MUC4 expression and copy number variation of the top genes is unlikely (Additional file 5: Figure S4).
MUC4 and patient survival
To establish a correlation between MUC4 expression and patient survival, we compared survival and hazard ratio between populations designated as MUC4 high risk and low risk in every organ from TCGA datasets (Table 4). We used the SurvExpress optimized algorithm that generates risk groups by sorting samples on the prognostic index (a higher value of MUC4 meaning a higher risk). The algorithm splits the populations where the p-value testing the difference of MUC4 expression is minimal. Pancreatic cancer presented the highest hazard ratio for MUC4 (HR = 3.94 [CI 1.81–8.61] p = 0.0005756) (Fig. 5a). MUC4 high risk was also significantly associated with survival in bladder cancer (HR = 1.48), colon cancer (HR = 2.1), lung adenocarcinoma (HR = 1.7), lung squamous carcinoma (HR = 1.69), ovarian cancer (HR = 1.33), skin cancer (HR = 1.87) and stomach cancer (HR = 1.58) (Fig. 5a). Acute myeloid leukemia (HR = 1.59) and liver cancer (HR = 1.4) almost reached statistical significance. Other datasets did not show any statistically significant differences.
A significant reduction in patient survival was observed in bladder cancer (p = 0.01135), colon cancer (p = 0.00891), lung adenocarcinoma (p = 0.008187), lung squamous carcinoma (p = 0.03586), ovarian cancer (p = 0.0186), pancreatic cancer (p = 0.000219), skin cancer (p = 0.02384) and stomach cancer (p = 0.04751), as illustrated by Kaplan–Meier curves (Fig. 5b). Strikingly, pancreatic median survival was 593 days in the MUC4high cohort (n = 149) whereas 50% survival was not reached in the MUC4low cohort (n = 27). In lung squamous carcinoma, the median survival of the MUC4high cohort (n = 116) was 1067 days whereas the MUC4low cohort (n = 59) presented a median survival of 2170 days. It is interesting to note that the algorithm splits the population into the two parts that differ the most regarding MUC4 expression. Therefore, there is a modest number of MUC4low PDAC or lung adenocarcinoma patients and a low number of MUC4high colon or stomach cancer patients. A similar survival analysis was performed on pancreatic cancer by dividing the patient population in two equal parts (88 vs 88): the MUC4high group harbored a decreased survival that was close to statistical significance (p = 0.06784) (not shown). Therefore, MUC4 expression is associated with a poorer overall survival in different cancers including pancreatic cancer.
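The notions of median survival and of a median that is "not reached" follow from the Kaplan–Meier estimator, which can be sketched as below. This is a generic illustration with hypothetical function names and toy data, not the SurvExpress implementation.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.
    events: 1 = death observed, 0 = censored."""
    curve = []
    survival = 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve

def median_survival(times, events):
    """First time at which estimated survival drops to 50% or below,
    or None when the curve never reaches 50% (median not reached)."""
    for t, s in kaplan_meier(times, events):
        if s <= 0.5:
            return t
    return None

# Hypothetical cohorts (days), for illustration only.
high_times, high_events = [100, 200, 300, 400, 500], [1, 1, 1, 1, 1]
low_times, low_events = [400, 500, 600, 700, 800], [1, 0, 0, 0, 0]
median_high = median_survival(high_times, high_events)
median_low = median_survival(low_times, low_events)
```

In the toy low-risk cohort most patients are censored while still alive, so the estimated survival stays above 50% and no median can be reported.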
We also compared survival and hazard ratio, in the same cancers whose survival is associated with MUC4 (bladder cancer, colon cancer, lung adenocarcinoma, lung squamous carcinoma, ovarian cancer, pancreatic cancer, skin cancer and stomach cancer), according to gene signatures corresponding to the first five gene ontology terms from Additional file 3: Table S1 (GO 0031424: keratinization, GO 0007155: cell adhesion, GO 0019897: extrinsic component of plasma membrane, GO 0016323: basolateral plasma membrane and GO 0016324: apical plasma membrane) (Fig. 6a, Additional file 6: Table S2). These gene signatures were all significantly associated with survival in the TCGA datasets tested. The “keratinization” (GO 0031424) and “cell adhesion” (GO 0007155) signatures are associated with HR ranging from 1.65 to 3.76 and from 2.15 to 3.23, respectively. The GO 0019897 signature is associated with weaker HR (1.55–2.30). The “basolateral plasma membrane” (GO 0016323) and “apical plasma membrane” (GO 0016324) signatures harbor higher HR (2.21–4.5 and 1.77–4.42, respectively) in these datasets.
We performed a similar analysis according to the top genes (ADGRF1, LCN2, MUC20, C1ORF116, SCEL, STEAP4) that harbored a Pearson’s correlation with MUC4 greater than 0.5 (Fig. 6b, Additional file 7: Table S3). This signature is associated with survival in all TCGA datasets tested (HR ranging from 1.91 to 8.77). Notably, pancreatic cancer harbored the strongest association with survival according to this signature (HR = 8.77 [CI 2.15–35.83]). Overall, these larger signatures harbored higher hazard ratios than MUC4 alone.
MUC4, MUC16 and MUC20 signature in cancer
Mucins have been proposed as potential biomarkers for carcinoma. Notably, previous work suggested that a combination of mucin expression may be useful for early detection and evaluation of malignancy of pancreatobiliary neoplasms. Moreover, the MUC16/CA125 antigen is already a routinely used serum marker for the diagnosis of ovarian cancer. Therefore, we decided to intentionally focus on the two other membrane-bound mucins MUC16 and MUC20 whose expression was correlated with that of MUC4. We analyzed the survival curves of the high risk group (MUC4/MUC16/MUC20high, n = 159) and low risk group (MUC4/MUC16/MUC20low, n = 17) from the pancreas TCGA dataset. The MUC4/MUC16/MUC20high risk group was associated with an increased hazard ratio (HR = 6.5 [2.04–20.78], p = 0.001582) and a shorter overall survival (p = 0.0003088) (Fig. 7a). Median survival was the same as in the MUC4high cohort (593 days). The MUC4/MUC16/MUC20high group harbored a statistically significant increase of MUC4, MUC16 and MUC20 expression (Fig. 7b). We also analyzed overall survival in every other PDAC database available in SurvExpress. We show that the MUC4high group was associated with a statistically significant reduced overall survival and increased hazard ratio in both the ICGC and Stratford (GSE21501) cohorts (Fig. 7c). In the Zhang cohort (GSE28735), the MUC4high group was associated with a reduced overall survival that was close to statistical significance (p = 0.08971). In other organs, the MUC4/MUC16/MUC20high group was associated with an increased hazard ratio and reduced overall survival in bladder cancer, colon cancer, lung adenocarcinoma, lung squamous carcinoma, skin cancer and stomach cancer (Additional file 8: Figure S5A). Notably, the MUC4/MUC16/MUC20high group in colon cancer (HR = 2.26 [1.51–3.4]) showed a median survival of 1741 days whereas the low risk group did not reach 50% survival.
Similarly, the MUC4/MUC16/MUC20high group in stomach cancer showed a median survival of 762 days whereas the low risk group had a median survival of 1811 days. No significant difference was observed for ovarian cancer (p = 0.2081). Moreover, a reduced overall survival was observed in liver cancer (p = 0.04789) and acute myeloid leukemia (AML) (p = 0.02577) (Additional file 8: Figure S5B), for which we did not observe any statistical difference when sorting the patients on MUC4 alone. Overall, we observed that the MUC4/MUC16/MUC20 signature harbored an increased hazard ratio compared with MUC4 alone for pancreatic cancer and, to a lesser extent, for bladder cancer, colon cancer, lung squamous cancer and stomach cancer.
We analyzed MUC4, MUC16 and MUC20 expression in pancreatic tumor (T) and paired adjacent non tumoral tissues (ANT) from the GSE28735 (Fig. 6) and GSE16515 (not shown) datasets [25, 26]. We confirmed MUC4 overexpression in tumor tissues (p < 0.0001). MUC16 and MUC20 mRNA levels were also increased (p < 0.0001 and p = 0.0062) in tumor samples (Fig. 8a). As previously observed in the CCLE dataset, MUC4 expression was correlated with MUC16 (p = 0.0006) and MUC20 (p = 0.0621) in GSE28735 (Additional file 9: Figure S6). We also analyzed MUC4, MUC16 and MUC20 expression in datasets of other cancers (Additional file 10: Figure S7). MUC4 expression is increased in bladder cancer vs ANT (GSE13507, p < 0.01). MUC20 is increased in lung adenocarcinoma vs normal samples (GSE30219, p < 0.05). MUC4 and MUC20 expression is increased in colorectal cancer vs normal mucosae (GSE40967, p < 0.01). MUC16 and MUC20 relative expression is increased in ovarian adenocarcinoma (GSE14407, p < 0.01 and p < 0.05 respectively). ROC curves of MUC4, MUC16, MUC20 and the MUC4 + MUC16 + MUC20 combination were established using the GSE28735 dataset. The combination of MUC4 + MUC16 + MUC20 produced a high specificity of 97.78% (88.23–99.94) and a moderate sensitivity of 55.56% (40–70.36) (likelihood ratio = 25) (Fig. 8b). Similar results were obtained for GSE16515 with 93.75% specificity and 69.44% sensitivity (positive likelihood ratio = 11.11) (not shown). The MUC16 AUROC was similar to that of MUC4 + MUC16 + MUC20 in the GSE28735 dataset but harbored a lower specificity/sensitivity in GSE16515.
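The likelihood ratios quoted above are consistent with the standard definition of the positive likelihood ratio, sensitivity / (1 − specificity). The short check below applies that formula to the reported percentages; the function name is hypothetical, and the results differ slightly from the quoted values only because the inputs are rounded.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity); both given as fractions."""
    if specificity >= 1.0:
        raise ValueError("LR+ is undefined when specificity is 100%")
    return sensitivity / (1.0 - specificity)

# Reported values for the MUC4 + MUC16 + MUC20 combination.
lr_gse28735 = positive_likelihood_ratio(0.5556, 0.9778)  # about 25
lr_gse16515 = positive_likelihood_ratio(0.6944, 0.9375)  # about 11.11
```

A high LR+ reflects the high specificity of the combination: a positive result is much more likely in tumor than in ANT tissue, even though sensitivity remains moderate.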
Altogether, this suggests that the MUC4/MUC16/MUC20high signature would be useful for identifying patients with the worst prognosis in several carcinomas, notably pancreatic, stomach and colon cancers.
The TCGA and the CCLE have provided a tremendous amount of publicly available data combining gene expression information related to clinical outcome. Web-based tools allow the scientific community to perform powerful large scale genomic analysis and propose new biomarkers or new therapeutic targets. In the present report, we analyzed MUC4 expression systematically in all organs and confirmed its aberrant expression in associated carcinoma. We identified 187 genes for which the expression is correlated with MUC4 expression. These genes are involved in cell adhesion, cell–cell junctions, glycosylation and cell signaling. MUC4 was also correlated with MUC16 and MUC20 membrane-bound mucins. This combination is associated with a poorer overall survival in different cancers including pancreatic, colon and stomach cancers suggesting MUC4/MUC16/MUC20 as a poor prognostic signature for these cancers.
Previous works have shown that MUC4 is altered in normal, premalignant and malignant epithelia of the digestive tract. The mechanisms underlying this alteration of expression are diverse and involve regulators such as growth factors, cytokines, demethylation of promoters and miRNA [28,29,30,31,32]. In the present manuscript we also observe that the MUC4 gene is amplified in 13% of cancer cell lines. We also found a mild correlation between alteration of MUC4 copy number and MUC4 expression, suggesting that gene amplification could also mediate this aberrant MUC4 expression. This kind of regulation is scarcely described in the literature. In TCGA, we confirmed that MUC4 expression was observed mainly in human carcinomas including bladder, cervix, head and neck, lung, ovarian, pancreatic, prostate and stomach carcinomas. For most of these organs, high MUC4 expression was associated with a poorer overall survival. MUC4 is one of the most differentially expressed genes in pancreatic cancer that are thought to be potential clinical targets. Recently, a meta-analysis based on 1900 patients from 18 studies showed that MUC4 overexpression was associated with tumor stage, tumor invasion and lymph node metastasis. A worse overall survival was observed in MUC4-overexpressing patients with biliary tract carcinoma (HR 2.41), pancreatic cancer (HR 2.01), and colorectal cancer (HR 1.73). Using the TCGA cohorts, we extended this finding to lung adenocarcinoma, lung squamous carcinoma, ovarian cancer, skin cancer and stomach cancer. The authors noted that a limitation of this meta-analysis was the insufficient statistical power of some eligible studies. The large-scale genomic approach of TCGA helps us to overcome this limitation. Based on available TCGA datasets, a mucin mutation map was generated with the cBioPortal Mutation Mapper. MUC4 mutations were notably observed in Kidney Clear Cell Renal Carcinoma (20–45%) and were correlated with survival outcomes.
Mutations were rarely described in pancreatic cancer, the main MUC4-overexpressing model. Because of the very large size of the MUC4 gene, the probability of acquiring a mutation could be increased. MUC4 is among the most mutated genes upon stress exposure such as nicotine treatment or aging [36, 37]. The enrichment of MUC4 mutations could be related to the fact that the first risk factor for kidney cancer is smoking and that kidney cancer is diagnosed at older ages (65 years). Pancreatic cancer shares these characteristics but harbors a very rare mutation occurrence (3%), suggesting that aging could be specific to cancers such as kidney or lung and that overexpression is more important for other cancers. So far, the functional consequences of MUC4 mutation remain to be elucidated.
We and others have investigated the biological roles of MUC4 in various cancers such as pancreatic, ovarian, esophageal and lung cancers. MUC4 was shown to promote tumor aggressiveness as it induces proliferation, migration, invasion, EMT, cell stemness and chemoresistance [9, 11,12,13,14]. In the present work, we showed that MUC4 expression was correlated with genes involved in cell adhesion and cell–cell junctions, such as those encoding integrins and cadherin-type proteins. As a membrane-bound mucin, MUC4 is thought to act on cell–cell and cell–ECM interactions. Because of its huge extracellular domain that profoundly modifies steric hindrance, MUC4 may alter migration, invasion and adherence properties. Overexpression of the rat homologue of MUC4, sialomucin complex (SMC), leads to suppression of cell adhesion. Notably, MUC4 overexpression disrupts the adherens junctions and leads to partial delocalization of E-cadherin to the apical surface of the cell, causing loss of cell polarity. Moreover, interactions between MUC4 glycans and galectin-3 were shown to also mediate docking of circulating tumor cells to the surface of endothelial cells. The alteration of cell adhesion induced by MUC4 is one of the first steps toward the metastatic process. MUC4 expression was also correlated with several genes encoding glycosylation enzymes or glycoproteins. This essential set of genes is involved in a wide range of cellular functions including cell adhesion, barrier role, interaction with selectins of endothelial cells or regulation of cell signaling [5, 44]. Glycan-associated antigens are commonly associated with the survival of patients with gastrointestinal cancer. Alteration of MUC4 glycosylation is proposed to play a substantial role in binding properties mediated by the extracellular subunit of MUC4 and the NIDO domain. One should note that the expression of these genes is only correlated with that of MUC4; a direct regulatory mechanism remains to be demonstrated in future studies.
In regulating these major biological properties, MUC4 has been commonly associated with alterations of cell signaling, notably of the MAPK, NF-kB or FAK signaling pathways. Interestingly, we observed that MUC4 expression is highly correlated with genes encoding proteins that contain Src Homology 2 (SH2) or Src Homology 3 (SH3) domains. The family of intracellular adaptor signaling proteins is characterized by one SH2 and at least one SH3 domain and is crucial for the effective integration of intracellular and extracellular stimuli.
It is interesting to note that MUC4 expression is not correlated with MUC1 that is a major membrane-bound mucin commonly overexpressed in cancer [48, 49]. In the US, it was estimated that 900 000 cancers, out of 1 400 000, harbor overexpression of MUC1 highlighting its attractiveness as a therapeutic target. This could be explained by different regulatory mechanisms such as different signaling pathways or different miRNA regulating the two mucins.
MUC16 is the peptide part of the CA125 serum marker for ovarian cancer. MUC16 is a very large mucin (22 000 amino acids (aa)) that is heavily glycosylated and facilitates ovarian cancer. MUC20 is a small mucin (500 aa) mostly expressed in the renal proximal tubule and deregulated in several cancers such as colorectal or ovarian cancers, where it favors aggressiveness [17, 18]. MUC16/CA125 is routinely used in clinics, unlike MUC4 and MUC20. In the present manuscript, we showed that the expression of MUC16 and MUC20 is positively correlated with that of MUC4 and that the combined MUC4/MUC16/MUC20high expression is associated with an increased hazard ratio and reduced overall survival, suggesting a potential for this signature as a prognostic marker for several carcinomas and notably pancreatic, stomach and colon cancer. Biomarkers for pancreatic cancer are needed for detection and evaluation of response to therapy. Unfortunately, the marker currently used (CA19.9) lacks the sensitivity or specificity to be used in cancer diagnosis. Similarly, established biomarkers with adequate sensitivity and specificity are lacking for gastric cancer. The need for biomarkers is less urgent for colorectal cancer since several predictive/prognostic/diagnostic biomarkers have been described.
The present work highlights the relationship between MUC4/MUC16/MUC20 expression and overall survival. This signature could be proposed as a prognostic marker. MUC4 is expressed from the earliest stage (PanIN1A) of pancreatic cancer but is not specific enough. The potential of the MUC4/MUC16/MUC20 combination as a diagnostic marker is not known and remains to be investigated. Moreover, the development of unsupervised algorithms will allow the identification of new, larger signatures that are not selected a priori, leading to better prognostic and predictive performance. Genome-wide unsupervised computational procedures applied to discovery datasets will help to define candidate signatures, which will subsequently be validated on a number of independent datasets. For example, multi-platform analysis using TCGA datasets helped to characterize the complex molecular landscape of PDAC. Another meta-analysis approach based on PDAC datasets allowed the identification of a 5-gene classifier signature (TMPRSS4, AHNAK2, POSTN, ECT2, SERPINB5) with 95% sensitivity and 89% specificity in discriminating PDAC from non-tumor samples. Interestingly, TMPRSS4 and SERPINB5 both belong to the list of genes correlated with MUC4 expression.
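For readers less familiar with the classifier metrics quoted above, the following minimal sketch shows how sensitivity and specificity are computed from true and predicted labels. The labels are invented for illustration and do not reproduce the results of the cited classifier.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = tumor, 0 = non-tumor)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one tumor and one non-tumor sample")
    sensitivity = tp / (tp + fn)  # fraction of tumors correctly detected
    specificity = tn / (tn + fp)  # fraction of non-tumors correctly excluded
    return sensitivity, specificity

# Hypothetical labels: 4 tumor samples followed by 5 non-tumor samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

A marker with high sensitivity but low specificity detects most tumors at the cost of many false positives.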
We systematically analyzed MUC4 expression in all organs in the TCGA and CCLE large-scale databases and confirmed its aberrant expression in the associated carcinomas and its impact on patient survival. Moreover, 187 genes (involved in cell adhesion, cell–cell junctions, glycosylation and cell signaling) were correlated with MUC4. Among them are the membrane-bound mucins MUC16 and MUC20, and the MUC4/MUC16/MUC20 combination is associated with poorer overall survival in different cancers, including pancreatic, colon and stomach cancers, suggesting MUC4/MUC16/MUC20 as a poor-prognosis signature for these cancers. Its potential as a new biomarker remains to be investigated.
AUROC: area under the receiver operating characteristic curve
CCLE: cancer cell line encyclopedia
PDAC: pancreatic ductal adenocarcinoma
ROC: receiver operating characteristic
TCGA: the cancer genome atlas
Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, et al. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483(7391):603–7.
Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2012;2(5):401–4.
Gao J, Aksoy BA, Dogrusoz U, Dresdner G, Gross B, Sumer SO, et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal. 2013;6(269):pl1.
Aguirre-Gamboa R, Gomez-Rueda H, Martinez-Ledesma E, Martinez-Torteya A, Chacolla-Huaringa R, Rodriguez-Barrientos A, et al. SurvExpress: an online biomarker validation tool and database for cancer gene expression data using survival analysis. PLoS ONE. 2013;8(9):e74250.
Corfield AP. Mucins: a biologically relevant glycan barrier in mucosal protection. Biochim Biophys Acta. 2015;1850(1):236–52.
Dekker J, Rossen JW, Buller HA, Einerhand AW. The MUC family: an obituary. Trends Biochem Sci. 2002;27(3):126–31.
Porchet N, Nguyen VC, Dufosse J, Audie JP, Guyonnet-Duperat V, Gross MS, et al. Molecular cloning and chromosomal localization of a novel human tracheo-bronchial mucin cDNA containing tandemly repeated sequences of 48 base pairs. Biochem Biophys Res Commun. 1991;175(2):414–22.
Jonckheere N, Skrypek N, Frenois F, Van Seuningen I. Membrane-bound mucin modular domains: from structure to function. Biochimie. 2013;95(6):1077–86.
Jonckheere N, Skrypek N, Merlin J, Dessein AF, Dumont P, Leteurtre E, et al. The mucin MUC4 and its membrane partner ErbB2 regulate biological properties of human CAPAN-2 pancreatic cancer cells via different signalling pathways. PLoS ONE. 2012;7(2):e32232.
Jonckheere N, Skrypek N, Van Seuningen I. Mucins and pancreatic cancer. Cancers (Basel). 2010;2(4):1794–812.
Bruyere E, Jonckheere N, Frenois F, Mariette C, Van Seuningen I. The MUC4 membrane-bound mucin regulates esophageal cancer cell proliferation and migration properties: implication for S100A4 protein. Biochem Biophys Res Commun. 2011;413(2):325–9.
Skrypek N, Duchene B, Hebbar M, Leteurtre E, van Seuningen I, Jonckheere N. The MUC4 mucin mediates gemcitabine resistance of human pancreatic cancer cells via the Concentrative Nucleoside Transporter family. Oncogene. 2013;32(13):1714–23.
Bafna S, Kaur S, Momi N, Batra SK. Pancreatic cancer cells resistance to gemcitabine: the role of MUC4 mucin. Br J Cancer. 2009;101(7):1155–61.
Kaur S, Kumar S, Momi N, Sasson AR, Batra SK. Mucins in pancreatic cancer and its microenvironment. Nat Rev Gastroenterol Hepatol. 2013;10(10):607–20.
Duraisamy S, Ramasamy S, Kharbanda S, Kufe D. Distinct evolution of the human carcinoma-associated transmembrane mucins, MUC1, MUC4 and MUC16. Gene. 2006;373:28–34.
Bafna S, Kaur S, Batra SK. Membrane-bound mucins: the mechanistic basis for alterations in the growth and survival of cancer cells. Oncogene. 2010;29(20):2893–904.
Chen CH, Wang SW, Chen CW, Huang MR, Hung JS, Huang HC, et al. MUC20 overexpression predicts poor prognosis and enhances EGF-induced malignant phenotypes via activation of the EGFR-STAT3 pathway in endometrial cancer. Gynecol Oncol. 2013;128(3):560–7.
Xiao X, Wang L, Wei P, Chi Y, Li D, Wang Q, et al. Role of MUC20 overexpression as a predictor of recurrence and poor outcome in colorectal cancer. J Transl Med. 2013;11:151.
Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, et al. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 2013;45(6):580–5.
Ardlie KG, Deluca DS, Segrè AV, Sullivan TJ, Young TR, Gelfand ET, et al. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science. 2015;348(6235):648–60.
da Huang W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4(1):44–57.
da Huang W, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37(1):1–13.
Szklarczyk D, Morris JH, Cook H, Kuhn M, Wyder S, Simonovic M, et al. The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res. 2017;45(D1):D362–8.
Yonezawa S, Higashi M, Yamada N, Yokoyama S, Kitamoto S, Kitajima S, et al. Mucins in human neoplasms: clinical pathology, gene expression and diagnostic application. Pathol Int. 2011;61(12):697–716.
Pei H, Li L, Fridley BL, Jenkins GD, Kalari KR, Lingle W, et al. FKBP51 affects cancer cell response to chemotherapy by negatively regulating Akt. Cancer Cell. 2009;16(3):259–66.
Zhang G, Schetter A, He P, Funamizu N, Gaedcke J, Ghadimi BM, et al. DPEP1 inhibits tumor cell invasiveness, enhances chemosensitivity and predicts clinical outcome in pancreatic ductal adenocarcinoma. PLoS ONE. 2012;7(2):e31507.
Jonckheere N, Van Seuningen I. The membrane-bound mucins: from cell signalling to transcriptional regulation and expression in epithelial cancers. Biochimie. 2010;92(1):1–11.
Andrianifahanana M, Singh AP, Nemos C, Ponnusamy MP, Moniaux N, Mehta PP, et al. IFN-gamma-induced expression of MUC4 in pancreatic cancer cells is mediated by STAT-1 upregulation: a novel mechanism for IFN-gamma response. Oncogene. 2007;26(51):7251–61.
Jonckheere N, Perrais M, Mariette C, Batra SK, Aubert JP, Pigny P, et al. A role for human MUC4 mucin gene, the ErbB2 ligand, as a target of TGF-beta in pancreatic carcinogenesis. Oncogene. 2004;23(34):5729–38.
Vincent A, Ducourouble MP, Van Seuningen I. Epigenetic regulation of the human mucin gene MUC4 in epithelial cancer cell lines involves both DNA methylation and histone modifications mediated by DNA methyltransferases and histone deacetylases. FASEB J. 2008;22(8):3035–45.
Yamada N, Nishida Y, Tsutsumida H, Goto M, Higashi M, Nomoto M, et al. Promoter CpG methylation in cancer cells contributes to the regulation of MUC4. Br J Cancer. 2009;100(2):344–51.
Lahdaoui F, Delpu Y, Vincent A, Renaud F, Messager M, Duchene B, et al. miR-219-1-3p is a negative regulator of the mucin MUC4 expression and is a tumor suppressor in pancreatic cancer. Oncogene. 2015;34(6):780–8.
Iacobuzio-Donahue CA, Ashfaq R, Maitra A, Adsay NV, Shen-Ong GL, Berg K, et al. Highly expressed genes in pancreatic ductal adenocarcinomas: a comprehensive characterization and comparison of the transcription profiles obtained from three major technologies. Cancer Res. 2003;63(24):8614–22.
Huang X, Wang X, Lu SM, Chen C, Wang J, Zheng YY, et al. Clinicopathological and prognostic significance of MUC4 expression in cancers: evidence from meta-analysis. Int J Clin Exp Med. 2015;8(7):10274–83.
King RJ, Yu F, Singh PK. Genomic alterations in mucins across cancers. Oncotarget. 2017. https://doi.org/10.18632/oncotarget.17934.
Bavarva JH, Tae H, McIver L, Garner HR. Nicotine and oxidative stress induced exomic variations are concordant and overrepresented in cancer-associated genes. Oncotarget. 2014;5(13):4788–98.
Bavarva JH, Tae H, McIver L, Karunasena E, Garner HR. The dynamic exome: acquired variants as individuals age. Aging (Albany NY). 2014;6(6):511–21.
Hunt JD, van der Hel OL, McMillan GP, Boffetta P, Brennan P. Renal cell carcinoma in relation to cigarette smoking: meta-analysis of 24 studies. Int J Cancer. 2005;114(1):101–8.
Hayat MJ, Howlader N, Reichman ME, Edwards BK. Cancer statistics, trends, and multiple primary cancer analyses from the Surveillance, Epidemiology, and End Results (SEER) Program. Oncologist. 2007;12(1):20–37.
Hollingsworth MA, Swanson BJ. Mucins in cancer: protection and control of the cell surface. Nat Rev Cancer. 2004;4(1):45–60.
Komatsu M, Tatum L, Altman NH, Carothers Carraway CA, Carraway KL. Potentiation of metastasis by cell surface sialomucin complex (rat MUC4), a multifunctional anti-adhesive glycoprotein. Int J Cancer. 2000;87(4):480–6.
Pino V, Ramsauer VP, Salas P, Carothers Carraway CA, Carraway KL. Membrane mucin Muc4 induces density-dependent changes in ERK activation in mammary epithelial and tumor cells: role in reversal of contact inhibition. J Biol Chem. 2006;281(39):29411–20.
Senapati S, Chaturvedi P, Chaney WG, Chakraborty S, Gnanapragassam VS, Sasson AR, et al. Novel INTeraction of MUC4 and galectin: potential pathobiological implications for metastasis in lethal pancreatic cancer. Clin Cancer Res. 2011;17(2):267–74.
Pinho SS, Reis CA. Glycosylation in cancer: mechanisms and clinical implications. Nat Rev Cancer. 2015;15(9):540–55.
Baldus SE, Hanisch FG. Biochemistry and pathological importance of mucin-associated antigens in gastrointestinal neoplasia. Adv Cancer Res. 2000;79:201–48.
Hanson RL, Hollingsworth MA. Functional consequences of differential O-glycosylation of MUC1, MUC4, and MUC16 (downstream effects on signaling). Biomolecules. 2016;6(3):34.
Reebye V, Frilling A, Hajitou A, Nicholls JP, Habib NA, Mintz PJ. A perspective on non-catalytic Src homology (SH) adaptor signalling proteins. Cell Signal. 2012;24(2):388–92.
Kufe DW. Functional targeting of the MUC1 oncogene in human cancers. Cancer Biol Ther. 2009;8(13):1197–203.
Kufe DW. Mucins in cancer: function, prognosis and therapy. Nat Rev Cancer. 2009;9(12):874–85.
Yin BW, Lloyd KO. Molecular cloning of the CA125 ovarian cancer antigen: identification as a new mucin, MUC16. J Biol Chem. 2001;276(29):27371–5.
Kleeff J, Korc M, Apte M, La Vecchia C, Johnson CD, Biankin AV, et al. Pancreatic cancer. Nat Rev Dis Primers. 2016;2:16022.
Ajani JA, Lee J, Sano T, Janjigian YY, Fan D, Song S. Gastric adenocarcinoma. Nat Rev Dis Primers. 2017;3:17036.
Kuipers EJ, Grady WM, Lieberman D, Seufferlein T, Sung JJ, Boelens PG, et al. Colorectal cancer. Nat Rev Dis Primers. 2015;1:15065.
TCGA-Network. Integrated genomic characterization of pancreatic ductal adenocarcinoma. Cancer Cell. 2017;32(2):185–203.e13.
Bhasin MK, Ndebele K, Bucur O, Yee EU, Otu HH, Plati J, et al. Meta-analysis of transcriptome data identifies a novel 5-gene pancreatic adenocarcinoma classifier. Oncotarget. 2016;7(17):23263–81.
NJ conceived and designed the analysis. NJ analyzed the data. NJ and IVS wrote and edited the paper. Both authors read and approved the final manuscript.
We are grateful to M. Foster and A. Turner for helpful contributions and Dr B Neve, Dr A. Vincent, Dr R. Vasseur (Inserm UMR-S1172, Lille) for their critical reading of the manuscript.
The authors declare that they have no competing interests.
Availability of data and materials
All data are publicly available and were extracted from the TCGA Research Network (http://cancergenome.nih.gov/), the Genotype-Tissue Expression (GTEx) project (http://www.GTEXportal.org/) and the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/).
Consent to publish
Ethics approval and consent to participate
Our work is supported by grants from la Ligue Nationale Contre le Cancer (Comités 59, 62, 80, IVS, NJ), from SIRIC ONCOLille, Grant INCaDGOS-Inserm 6041 (IVS, NJ) and from région Nord-Pas de Calais “Contrat de Plan Etat Région” CPER Cancer 2007–2013 (IVS).
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Jonckheere, N., Van Seuningen, I. Integrative analysis of the cancer genome atlas and cancer cell lines encyclopedia large-scale genomic databases: MUC4/MUC16/MUC20 signature is associated with poor survival in human carcinomas. J Transl Med 16, 259 (2018) doi:10.1186/s12967-018-1632-2
- Patient survival