To compare differences between fresh frozen samples obtained from patients with advanced-stage CRC, we performed panomics by TruSeq™ Amplicon sequencing, Clariom D arrays, quantitative mass spectrometric profiling, and 2-dimensional gel electrophoresis. For all patients, primary tumour tissue was collected before chemotherapy. All patients had synchronous liver metastasis at the time of diagnosis (Table 1), and except for one patient (P2), all metastatic tissues were taken before exposure to chemotherapy. For P4, screening six different primary tumour locations was also possible.
Characterization of paired patient samples using gene expression data and mutation analysis
To identify genomic features, we analysed mutational data of the selected cohort (targeted sequencing). Frequently mutated genes were APC (100%), GNA11 (100%), TP53 (100%), ERBB2 (75%), KRAS (62%), and ATM (62%). All 24 genes with a detectable mutation are presented in Additional file 1: Fig. S2. We did not observe any significantly unbalanced distribution between malignant groups and individual patients when considering the mutated genes.
Gene expression profiles for 25,161 genes were retrieved from microarrays and visualized using the top 2000 most variable genes (Fig. 1a). Remarkably, the patient and non-disease groups clustered together. In total, 130 genes were identified as significantly differentially expressed between NM and corresponding T samples (q-value < 0.01; |log2 FC| > 2; 62 more highly expressed in NM, 68 more highly expressed in T; Fig. 1b) and 154 genes between NM and corresponding LM (81 more highly expressed in NM, 73 more highly expressed in LM, Fig. 1c). Conversely, no differences in gene expression between LM and T were detected (Fig. 1d).
Proteomic characterization of paired patient samples
Using a robust label-free workflow, all samples showed a high proteome depth and were included in subsequent analysis (Additional file 1: Fig. S3).
A total of 2885 proteins were identified by LC‒MS/MS. After removing missing values, unsupervised clustering using PCA for 2686 protein groups revealed one close cluster of all NM samples (Fig. 2a). In line with the transcriptomics data, patients (P1-P4) and not disease groups (T, LM) presented distinct similarities pointing to individual clinical phenotypes. Comparison between NM and T samples yielded 71 significantly differentially expressed proteins (q-value < 0.01 and |log2 FC| > 1), with 68 being (96%) up-regulated and three (4%) down-regulated in NM samples (Fig. 2b). Similarly, in NM and LM comparison, 69 proteins were significantly differentially expressed (q-value < 0.01 and |log2 FC| > 1): 61 (88%) up-regulated and eight (12%) down-regulated in NM (Fig. 2c). The chloride channel accessory 1 protein (CLCA1) was detected at an exceptionally significant low level in both T and LM compared to NM (log2FCTvsNM = − 21.010 and log2FCLMvsNM = − 20.888).
The most surprising aspect of the data was that no protein could be detected as being significantly expressed between T and LM, highlighting inter-patient heterogeneity (Fig. 2d).
Comparison of gene and protein expression data and validation of the global proteome by 2-D DIGE
Closer inspection of the gene (n = 25,161) and protein (n = 2686) expression data showed an overlap of three genes/proteins for the NM vs. T comparison (CA1, CLCA1, MATN2) and five for the NM vs. LM comparison (AHCYL2, CA1, CLCA1, FCGBP, MATN2) that were significantly expressed between the groups. All genes/proteins were characterized by higher levels in normal material, as plotted based on gene and protein abundances (Fig. 3). In agreement, increased expression of all targeted genes were associated with good prognosis in CRC when using data from Human Protein Atlas [31, 32].
Two-dimensional gel electrophoresis was performed to validate the clustering results for the proteome profiling. SameSpots software detected 1334 spots per gel. PCA based on these results indicated a qualitatively similar result, suggesting a clear distinction between patients and sample groups (Additional file 1: Fig. S4).
The individual progression proteomic landscape in individual patients
Based on the group comparison results, each primary tumour's transcriptome and proteome were compared with paired adjacent mucosa and associated liver metastasis. All tumour samples correlated phenotypically closely with their corresponding liver metastases but presented different gene/protein profiles compared to their adjacent normal mucosa. The Pearson correlation for gene expression and protein abundance between the primary tumours and paired liver metastases compared to their adjacent mucosa was 0.6854 < r < 0.8281 for the proteome and 0.6114 < r < 0.8133 for the transcriptome (Additional file 1: Figs. S5–S8).
Patient 1: female, right-sided CRC (pT4)
Proteome analysis of NM1, T1, and LM1 revealed proteome expression changes between the initial diagnosis and surgery for metastasis after one month. Of the 2,686 evaluable proteins, 294 showed |log2 FC| > 2 in the NM1 vs. T1 comparison and 251 in the NM1 vs. LM1 comparison (Additional file 1: Fig. S5a). The comparison between T1 and LM1 revealed 55 contrasting proteins (Additional file 1: Fig. S5b). Additional file 2: Table S1 lists the top differentially expressed proteins of the three two-group comparisons. Among proteins with higher abundance in LM1 than T1, CD74 (log2 FC = 2.179) is reported to be an oncogene (Network of Cancer Genes 7.0, http://ncg.kcl.ac.uk) that promotes tumour growth and metastasis in various cancer types. Tenascin C (TNC), the second detectable oncogene, was more highly expressed in T1 (log2 FC = 6.522) and LM1 (log2 FC = 2.930) than in NM1 but showed lower levels in LM1 than in T1 (log2 FC = − 3.592).
At the RNA level, 30 transcripts in the NM1 vs. T1 comparison and 34 in the NM1 vs. LM1 comparison were differentially expressed at |log2 FC| > 1 (Additional file 1: Fig. S5c). Contrasting RNAs were found in the T1 vs. LM1 comparison, with no genes showing |log2 FC| > 1 (Additional file 1: Fig. S5d). Additional file 2: Table S2 lists the top differentially expressed RNAs with either higher or lower gene expression for each two-group comparison. Among RNAs with a higher level in LM1 than T1, SFRP4 was previously reported as an oncogene (Network of Cancer Genes 7.0).
The overlap of differentially expressed genes and proteins showed three molecules for the NM1 vs. T1 comparison (CA1, CLCA1, ZG16) and six for the NM1 vs. LM1 comparison (CA1, CLCA1, FABP4, MUC2, PIGR, ZG16). The abundance of CEACAM5, which has been widely applied in clinical detection of liver metastasis of CRC, showed almost no differential expression between NM1, T1, and LM1 at RNA and protein levels (Additional file 1: Fig. S5).
Patient 2: male, left-sided CRC (pT1)
Proteome analysis of samples from another patient (NM2/T2 and LM2) indicated proteome expression changes between initial diagnosis and metastasis surgery after 36 months. The patient underwent several chemotherapies during that timeframe. In total, 183 proteins in the NM2 vs. T2 comparison and 265 in the NM2 vs. LM2 comparison were differentially expressed (|log2 FC| > 2, Additional file 1: Fig. S6a and b). The comparison between T2 and LM2 revealed only 70 contrasting proteins. Additional file 2: Table S3 provides the top differentially expressed proteins with either a higher or lower protein concentration for each comparison. Among proteins with a higher level in LM2 than T2, SRSF3 (Serine and Arginine Rich Splicing Factor, log2 FC = 2.013) is the only reported oncogene (Network of Cancer Genes 7.0). Interestingly, SRSF3 was also strongly more highly expressed in T2 (log2 FC = 2.944) and LM2 (log2 FC = 4.956) than in NM2. CEACAM5, a potential marker for CRC progression, was also strongly upregulated in T2 and LM2 compared to NM2.
Regarding gene expression data, 53 RNAs revealed |log2 FC| > 1 in the NM2 vs. T2 comparison and 59 in the NM2 vs. LM2 comparison. The comparison of T2 vs. LM2 revealed 13 differentially expressed RNAs (Additional file 1: Fig. S6c and d). The top differentially expressed RNAs of the three two-group comparisons are presented in Additional file 2: Table S4. The overlap of differentially expressed genes and proteins showed nine features for the NM2 vs. T2 comparison (CA1, CLCA1, CPA3, EPB41L3, JCHAIN, MATN2, MUC2, OGN, SULT1B1) and eight for the NM2 vs. LM2 comparison (CA1, CLCA1, MATN2, MEP1A, MUC2, PIGR, SULT1B1, ZG16). PIGR is an annotated healthy driver (Network of Cancer Genes 7.0) with lower RNA (log2 FC = − 1.165) and protein (log2 FC = − 2.676) levels in LM2 than NM2.
Patient 3: male, left-sided CRC (pT3)
Proteome analysis of the samples from a third patient (NM3/T3 and LM3) indicated proteome expression changes between initial diagnosis and metastasis surgery after one month. In total, 328 proteins in the NM3 vs. T3 comparison and 264 in the NM3 vs. LM3 comparison were differentially expressed (|log2 FC| > 2, Additional file 1: Fig. S7a and b). In total, 83 contrasting proteins were obtained through the comparison between T3 and LM3. Additional file 2: Table S5 gives the top 15 differentially expressed proteins (low & high), including the two strongly expressed oncogenes, i.e., CD74 and TNC, and the clinically applicable biomarker for CRC liver metastasis CEACAM5.
Furthermore, 19 RNAs in the NM3 vs. T3 comparison and 35 in the NM3 vs. LM3 comparison were differentially expressed (|log2 FC| > 1). The comparison between T3 and LM3 revealed only nine differentially expressed genes (the top genes are presented in Additional file 2: Table S6). Interestingly, only CA1 was found at a lower level in both malignant tissues than in adjacent mucosa at both RNA and protein levels (RNA level: log2 FC = − 1.208 and − 1.402; protein level: log2 FC = − 2.170 and − 3.133, Additional file 1: Fig. S7c and d).
Patient 4: female, right-sided CRC (pT4)
Samples from the fourth patient were retrieved during one simultaneous surgery. This opened the possibility of not merely collecting one sample of the tumour and its paired liver metastasis but six distinct tumour partitions. Semi-quantitative protein profiling by LC‒MS/MS revealed 329 proteins to be differentially expressed in NM4 vs. T4 (pooled) and 249 proteins in NM4 vs. LM4 comparison (|log2 FC| > 2). Comparing T4 and LM4 showed 155 differentially expressed proteins (Additional file 1: Fig. S8a and b, top proteins are shown in Additional file 2: Table S7). Interestingly, almost all tumour-associated proteins, including the oncogenes DEK and CHD4 and the tumour-suppressor gene MYH9, were expressed at lower levels in LM4 than in T4. In contrast, CEACAM5 was more highly expressed in LM4. Particular attention should be given to TNC, SRSF3, and OLFM4, which presented the highest protein expression in T4 and LM4 compared to NM4.
Based on transcriptomic analysis, 35 RNAs showed |log2 FC| > 1 in the NM4 vs. T4 comparison; 61 showed |log2 FC| > 1 in the NM4 vs. LM4 comparison. Comparing T4 vs. LM4 revealed only four differentially expressed targets (Additional file 1: Fig. S8c and d). Additional file 2: Table S8 lists the top differentially expressed RNAs for the three two-group comparisons. The overlap of genes and proteins showed two molecules for the NM4 vs. T4 comparison (OGN, ORM1), one for the NM4 vs. LM4 comparison (CA1), and one for the T4 vs. LM4 comparison (FGG).
The intra-tumoral proteomic landscape in samples from patient 4
Six distinct parts of the primary tumour of P4 were analysed to assess how different tumour biopsies might display intra-tumoral heterogeneity at the transcriptome and proteome levels. Interestingly, the Pearson correlation between T4(1–6) vs. NM4 and LM4 vs. NM4 ranged from 0.3596 < r < 0.7032 at the protein level and from 0.2100 < r < 0.8043 at the transcriptome level (Additional file 1: Figs. S9 and S10), highlighting different RNA and protein compositions for the six biopsies. Frequently mutated genes were ERBB2 (100%), APC (100%), GNA11 (100%), KRAS (100%), PIK3CA (100%), SMAD4 (100%), and TP53 (100%). All 17 genes with a detectable mutation are presented in Additional file 1: Fig. S11.
Functional enrichment analysis of detected proteins
Gene set enrichment analysis using HALLMARK gene sets was performed to obtain more comprehensive insight into the biological and functional characteristics of cancer progression in individual patients. Because of the small effect size, RNA expression data were not used for functional enrichment. With the threshold of FDR < 0.05, the protein expression differences indicated distinct activated pathways during the normal-to-tumour-to-metastasis transition for individual patients.
Pathway analysis for P1 revealed xenobiotic metabolism, coagulation, and KRAS signalling as the top enriched hallmark gene sets in LM1 compared to NM1 and T; P2 showed activated pathways associated with angiogenesis, coagulation, and the complement system. Proteomic profiling for P3 resulted in activated pathways related to the epithelial-mesenchymal transition (EMT), myogenesis, and angiogenesis, P4 showed signatures correlating with oxidative phosphorylation, reactive oxygen species pathway, and bile acid metabolism (Fig. 4). The most striking results from the data are that: (i) individual pathways in a given patient is utilized during metastasis, and (ii) the most activated pathways were detected in T vs. LM comparisons.