Mass spectrometry-based analysis of formalin-fixed, paraffin-embedded distal cholangiocarcinoma identifies stromal thrombospondin-2 as a potential prognostic marker

Distal cholangiocarcinoma is an aggressive malignancy with a dismal prognosis. Diagnostic and prognostic biomarkers for distal cholangiocarcinoma are lacking. The aim of the present study was to identify differentially expressed proteins between distal cholangiocarcinoma and normal bile duct samples. A workflow utilizing discovery mass spectrometry and verification by parallel reaction monitoring was used to analyze surgically resected formalin-fixed, paraffin-embedded samples from distal cholangiocarcinoma patients and normal bile duct samples. Bioinformatic analysis was used for functional annotation and pathway analysis. Immunohistochemistry was performed to validate the expression of thrombospondin-2 and investigate its association with survival. In the discovery study, a total of 3057 proteins were identified. Eighty-seven proteins were found to be differentially expressed (q < 0.05 and fold change ≥ 2 or ≤ 0.5); 31 proteins were upregulated and 56 were downregulated in the distal cholangiocarcinoma samples compared to controls. Bioinformatic analysis revealed an abundance of differentially expressed proteins associated with the tumor reactive stroma. Parallel reaction monitoring verified 28 proteins as upregulated and 18 as downregulated in distal cholangiocarcinoma samples compared to controls. Immunohistochemical validation revealed thrombospondin-2 to be upregulated in distal cholangiocarcinoma epithelial and stromal compartments. In paired lymph node metastases samples, thrombospondin-2 expression was significantly lower; however, stromal thrombospondin-2 expression was still frequent (72%). Stromal thrombospondin-2 was an independent predictor of poor disease-free survival (HR 3.95, 95% CI 1.09–14.3; P = 0.037). Several proteins without prior association with distal cholangiocarcinoma biology were identified and verified as differentially expressed between distal cholangiocarcinoma and normal bile duct samples. These proteins can be further evaluated to elucidate their biomarker potential and role in distal cholangiocarcinoma carcinogenesis. Stromal thrombospondin-2 is a potential prognostic marker in distal cholangiocarcinoma.


Introduction
Cholangiocarcinoma (CCA), an epithelial tumor arising along the biliary tree, is a low-incidence malignancy accounting for approximately 3% of all gastrointestinal cancers [1]. CCA is a notably aggressive malignancy with a current overall 5-year survival rate of less than 5% [2]. Surgical resection is the only treatment that offers curative potential. However, an invasive growth pattern and the absence of early symptoms contribute to presentation with metastatic disease in a majority of patients [3]. Based on its anatomical location along the biliary tree, CCA is classified as intrahepatic (iCCA) or extrahepatic (eCCA) which is further divided into the perihilar (pCCA) and distal (dCCA) subtypes [4]. Anatomically, dCCA is located between the insertion of the cystic duct and the ampulla of Vater [4,5].
Mounting evidence points to a significant difference between the different subtypes with regards to not only clinical management but also tumor biology and molecular characteristics [3,6,7]. However, the majority of previous biomarker studies do not take this subclassification into account [3,8]. Currently, only one biomarker, namely, plasma carbohydrate antigen  is in clinical use for the diagnosis of CCA. The sensitivity and specificity of CA 19-9 are not sufficient to allow screening of unselected patient populations [9].
A large effort has elucidated the spectra of genomic and transcriptomic alterations associated with malignancy along the biliary tree [6,7]. However, changes at the mRNA level do not necessarily correlate with changes at the protein level [10,11], and studies of the proteomic alterations that develop during carcinogenesis can provide complementary information and help identify new biomarkers [12,13]. Modern mass spectrometry (MS) platforms utilizing online liquid-phase separation with tandem MS (LC-MS/MS) have been used for protein identification and quantification in human sample cohorts, enabling quantitative comparison of thousands of proteins [13][14][15]. Traditionally, MS-based proteomics has been performed using fresh frozen tissues. However, it has been demonstrated that the more widely available formalin-fixed, paraffinembedded (FFPE) tissues can also be utilized successfully [16,17].
A discovery LC-MS/MS experiment can identify a large number of differentially expressed proteins (DEPs) between the investigated biological conditions providing information regarding proteomic alterations associated with malignancy as well as biomarker candidates. However, few identified potential biomarkers in cancer research have been able to progress to clinical utilization, in part due to the difficulty in validating the large number of biomarkers [18]. Parallel reaction monitoring (PRM) is an MS technique developed for targeted quantification [19,20]. PRM has been shown to provide improved quantitative precision and to be able to quantify proteins at lower concentrations than a typical discovery LC-MS/MS experiment. Thus, PRM can be used for large scale verification of proteins identified from discovery experiments [21,22].
The aim of the present study was to use a combined discovery LC-MS/MS and PRM-verification workflow to analyze resected dCCA and normal bile duct FFPE samples to identify DEPs. As a secondary aim, further validation of thrombospondin-2 (THBS2) expression was performed using immunohistochemistry (IHC).

Study population
All consecutive patients who underwent surgery with curative intent for dCCA at the Department of Surgery, Skåne University Hospital, Lund and Malmö, between January 2000 and December 2015 were identified from hospital records. Only patients with no neoadjuvant treatment and no 30-day mortality were included. Sixty-four patients were identified. FFPE materials from 59 patients were available and subsequently retrieved from the local biobank at the Department of Pathology, Skåne University Hospital, Lund and Malmö. All samples underwent histopathological reevaluation by an experienced gastrointestinal pathologist (A.S) blinded to the original assessment. The cohort has been previously described [23,24]. Staging was based on the American Joint Committee on Cancer (AJCC) 7th edition [5]. R1 resection was defined as cancer growth < 1 mm from the resection margin. Demographical and clinical data were retrospectively acquired through patients' medical records. Patients who received at least 3 cycles of postoperative chemotherapy were coded as having received adjuvant treatment. The clinicopathological data of the cohort are presented in Table 1. Patients were postoperatively followed for up to 5 years. Survival analysis was censored at 5 years after surgery for event-free patients. Survival status was recorded in September 2019.
Keywords: Distal cholangiocarcinoma, Biliary tract cancer, Mass spectrometry, Parallel reaction monitoring, Biomarker, Stroma, Thrombospondin-2 Disease-free survival (DFS) was defined as the time until clinical diagnosis of dCCA recurrence or death from any cause. Overall survival (OS) was defined as the time until death of any cause.
The control tissues were identified from a prospective database of pancreaticobiliary surgery maintained at the Department of Surgery, Skåne University Hospital, Lund and Malmö. Patients who underwent pancreaticoduodenectomy due to a benign diagnosis without proximity to the distal bile duct were identified.

MS tissue area selection
Samples were sectioned into 10-μm sections. Hematoxylin and eosin (H&E)-stained slides were used to mark the areas of interest in each slide for macrodissection. Areas of interest in the dCCA samples were defined as areas containing only tumor epithelium and intratumoral stromal compartment without any adjacent tissues. The normal bile duct tissues were retrieved from the region between the cystic duct and the ampulla of Vater. The bile duct tissues were histopathologically evaluated, and any sample with dysplasia or inflammation of the bile duct was excluded. The macrodissected area was located from the epithelial lining and up to, but not including the adventitia or pancreatic parenchyma. Macrodissection was carried out using a scalpel with the marked H&E slides as a template for serial sections. An area corresponding to approximately 3 full slides was utilized when available. In some samples with scarce material the maximum available amount was retrieved although it did not correspond to 3 full sections. The material was stored at 4 °C.

MS sample preparation
Twenty dCCA samples from the ten patients each with the worst and best OS were selected for the discovery study. The selection was performed to allow for additional comparison between prognostic groups that did not reveal significant results (data not shown).
All samples were processed in a manner blinded to patient identity and outcome. Preparation of all samples was performed in parallel. For deparaffinization and protein retrieval, macrodissected samples were incubated in 1 ml of EnVision FLEX Target Retrieval Solution (High pH) (K8004, Dako, Glostrup, Denmark) at a dilution of 1:50 at 97 °C for 10 min. Samples were then centrifuged at 14,000 rcf at 4 °C for 10 min. The supernatants were removed and the deparaffinization steps were repeated. Samples were resuspended in 150 μl of 500 mM Tris-HCL (pH 8), followed by transfer to a new tube and incubation at 90 °C for 1 h. Then, 150 μl of 6 M guanidine-HCl in 50 mM ammonium bicarbonate (AMBIC) was added, followed by probe sonication for 2 × 4 min on ice. The samples were then centrifuged at 14,000 rcf for 10 min at 24 °C, and the supernatants were transferred to a new tube. Proteins were reduced by the addition of 6 μl of 1 M dithiothreitol and incubated for 1 h at 56 °C. Alkylation was performed by the addition of 20 μl of 1 M iodoacetamide for 30 min in dark. Precipitation was carried out overnight using pure ethanol at a ratio (v/v) (sample:ethanol) of 1:9, and the samples were stored at − 20 °C. Following centrifugation at 14,000 rcf for 15 min at 4 °C the supernatants were carefully discarded, and the

Liquid chromatography conditions
The liquid chromatography (LC) instrument used was an EASY-nLC (Thermo Fisher Scientific, Rockford, IL, USA). Previously prepared material from one patient was measured repeatedly at regular intervals as a quality control. One microgram of sample was injected and measured with a flow rate of 300 nL/minute, and a two-column setup consisting of an Acclaim PepMap RSLC column (75 µm × 25 cm) as the analytical column and Acclaim PepMap 100 column (75 µm × 2 cm) as the precolumn (both from Thermo Fisher Scientific, Rockford, IL, USA) was used for the separation. The LC gradient was created using solvent A (0.1% formic acid) and solvent B (0.1% formic acid in acetonitrile). Based on manual evaluation of the number of peptides eluted at different time points, a nonlinear gradient was developed. The nonlinear gradient started at 5% B and increased to 22% B at 95 min and 36% B at 150 min. All measurements in the discovery study were performed in technical duplicates. Samples were measured in a randomized order.

MS conditions
The samples were analyzed using a Q Exactive Plus mass spectrometer (Thermo Fisher Scientific, Rockford, IL, USA). The equipped ion source was an EASY-Spray (Thermo Fisher Scientific, Rockford, IL, USA). The system was operated in positive mode, data-dependent acquisition (DDA) was used. For peptide identification, a full MS survey scan was collected in the Orbitrap. Fifteen data-dependent higher energy collision dissociation MS/ MS scans of the most intense precursors were performed.
The spray voltage was set to 1.75 kV, and the capillary temperature was 300 °C. Moreover, the S-lens radiofrequency level was fixed at 50. MS1 survey scans of the eluting peptides were executed with a resolution of 70,000, recording a window between m/z 350.0 and 1800. The automatic gain control (AGC) target was set to 1 × 10 6 and the maximum injection time was 100 ms. The normalized collision energy (NCE) was set to 25.0% for all scans. The resolution of the data-dependent MS 2 scans was fixed to 35,000, and the values for the AGC target were set to 1 × 10 6 with a maximum injection time of 120 ms.

Protein identification and quantification
The software Proteome Discoverer (PD) (version 1.4) (Thermo Fisher Scientific, Rockford, IL, USA) was used for protein identification. The selection of spectra employed the following settings: minimal and maximal precursor mass, 350 and 5000 Da, respectively; and signal-to-noise threshold of 1.5. Parameters for Sequest HT searches were set as follows: precursor mass tolerance, 10 ppm; fragment mass tolerance, 0.02 Da; trypsin; 1 missed cleavage site; UniProt human database; dynamic modification; oxidation (+ 15.995 Da; (M, P)); static modification; carbamidomethyl (+ 57.021 Da; (C)). A percolator was used for the processing node and the false discovery rate (FDR) was set to 1%. Proteins were identified using at least two peptides. A precursor ion area detector was used for quantification, and each protein was quantified from the average area of the 3 most abundant peptides identified for that particular protein.
To increase the number of candidate proteins for PRM validation, we also performed protein identification and quantification using two alternative software programs, MaxQuant and OpenMS. The settings used for each software program are presented in Additional file 1.

PRM verification
To verify the DEPs identified from the discovery study, a targeted proteomic study was performed using PRM. Selection of suitable proteins for inclusion in the PRM study was performed through evaluation of the DEPs after quantification using PD, MaxQuant and OpenMS. In addition, proteins successfully quantified in only dCCA samples or controls respectively were considered. All proteins were submitted to a literature review, and proteins previously found to be dysregulated in CCA or with a known association to cancer biology were prioritized for inclusion. Peptide selection was based on the data from the PD evaluation of the discovery study. Only peptides with no missing cleavages were included. Peptide properties (charge state, precursor m/z, retention time) were extrapolated from PD.
Materials from the 20 dCCA samples used in the discovery study and 13 controls were prepared anew (additional material was available from 8 control samples analyzed in the discovery study, and 5 new samples were added). Sample preparation was performed in accordance with the same protocol utilized in the discovery study. The same LC instrumentation and settings and MS instrumentation used in the discovery study were used for the PRM study. Measurements were performed in technical triplicates when possible. For samples with a low amount of material, one or two technical replicates were deemed acceptable. For the measurements, 1 µg of sample was injected into the instrument. PRM acquisition was performed without retention time scheduling over the complete chromatographic run. The MS 2 resolution was set at 70,000, with the AGC target set to 5 × 10 5 and a maximum injection time of 70 ms. The chromatic peak width was 30 s. The NCE was set to 26.0% and the isolation window was 2.0 m/z. MS1 ion chromatogram extraction and relative quantification were performed using Skyline [25].

IHC
The entire dCCA cohort (N = 59), including paired lymph node metastases in available samples (N = 26) and control samples (N = 10), was selected for IHC validation. FFPE samples were sectioned to 4 µm, deparaffinized in xylene and rehydrated in graded ethanol solutions. Antigen retrieval was performed using EnVision FLEX Target Retrieval Solution (low pH) (K8005, Dako, Glostrup, Denmark) at 97 °C for 20 min using an automated PT Link (Dako, Glostrup, Denmark). Endogenous peroxidase activity was quenched with 0.3% hydrogen peroxide and 1% methanol in phosphate-buffered saline for 10 min at room temperature. After blocking with 5% goat serum and avidin/biotin blocking kit (SP-2001, Vector Labs, Burlingame, CA, USA) the sections were incubated with primary polyclonal rabbit anti-THBS2 antibody diluted 1:100 (Ab112543, Abcam, Cambridge, UK) overnight at 4 °C. A biotinylated goat anti-rabbit secondary antibody diluted 1:200 (BA-1000, Vector Labs, Burlingame, CA, USA) was added to the slides and incubated for 1 h at room temperature. The sections were subsequently incubated with avidin-biotin-peroxidase complex Vectastain Elite ABC (PK-6100, Vector Laboratories, Burlingame, CA, USA) for 30 min. For color development the slides were exposed to the chromogen 3,3´diaminobenzidine (SK-4100, Vector Laboratories, Burlingame, CA, USA) for 8 min. Following counterstaining with Mayer's hematoxylin, the sections were dehydrated with graded ethanol solutions and mounted with Pertex. Omission of primary antibody was used as a negative control. Placental tissue sections were included as positive controls. IHC staining was evaluated by an experienced gastrointestinal pathologist (A.S). The predominant staining pattern in the intratumoral area was evaluated.
Staining was reviewed under light microscopy at 200× magnification. Given that both epithelial and stromal immunoreactivity was evident, separate scoring was performed for both compartments. Expression in > 10% of cells was recorded as positive expression. The staining intensity was scored as 0 (negative), 1+ (low), 2+ (moderate) or 3+ (strong).

Statistical analysis and bioinformatics
Statistical processing of the MS data from the discovery and PRM study was done using Perseus [26] and Graph-Pad Prism 8. In the discovery study, the mean value of technical replicates was used. The data was filtered, only proteins quantified in at least 50% of the dCCA and control samples were used for further analysis. Values were log2 transformed and normalized by subtracting the median protein intensities in each sample. Missing values were replaced by imputing random numbers drawn from a normal distribution similar to the measured data (width = 0.3, downshift = 0). DEPs between the dCCA samples and controls were identified using a twotailed t-test followed by multiple testing permutationbased FDR. Finally, proteins with a q value (adjusted p-value) < 0.05 (FDR 5%) and fold change (FC) ≥ 2 or ≤ 0.5 were considered significantly differentially expressed.
For statistical evaluation of the PRM data, the mean values of technical replicates were used. All data were log2 transformed and normalization to the average value of peptides quantifying the housekeeping proteins glyceraldehyde 3-phosphate dehydrogenase (GAPDH) (GALQ-NIIPASTGAAK, LISWYDNEFGYSNR) and tubulin beta chain (TUBB) (ISVYYNEATGGK) in each sample was performed. Group comparison of DEPs was performed identically to the discovery study. Gene ontology (GO) [27] classification was performed using the PANTHER online bioinformatics tool [28]. The PANTHER overrepresentation test of functional annotation and pathways was used. In addition the DAVID bioinformatics tool [29] was used to perform additional pathway analysis against the KEGG [30] and REACTOME [31] databases. All enrichment analysis were performed against the background of the total number of proteins identified in the study. FDR adjustment for multiple testing was employed, and a q < 0.05 (FDR 5%) was used to indicate significance in all enrichment tests. Protein-protein interactions were investigated using the STRING online bioinformatics tool [32]. An interaction score ≥ 0.7 (high confidence) was required for inclusion in the model. Stata MP statistical package version 14.2 was used for analysis of IHC data. Comparison of continuous baseline

Discovery study
A workflow including discovery LC-MS/MS followed by PRM verification using macrodissected archived FFPE samples was used to identify DEPs between dCCA and controls. For the discovery study materials from 20 dCCA samples and 10 controls were prepared for analysis. Four control samples could not be measured due to a low protein yield. The clinicopathological data of the patients whose samples were analyzed are presented in Additional file 2. In total, 3037 proteins were identified (Additional file 3). In the dCCA samples, 2967 proteins were identified, and in the controls, 1501 proteins were identified. After removal of all proteins with quantifiable values in fewer than 50% of the samples in each group, a total of 836 proteins remained. Principal component analysis (PCA) of the 836 proteins reveled the evident separation of the groups. Components 1 and 2 accounted for 19.1% and 9.9% of total variation, respectively (Fig. 1). A total of 87 DEPs (q < 0.05, FC ≥ 2 or ≤ 0.5) were identified (Additional file 4). Of the 87 DEPs, 31 proteins were found to be upregulated in the dCCA samples relative to controls and 56 were downregulated. In addition to quantification using PD, quantification with MaxQuant and OpenMS generated 109 and 135 DEPs, respectively (Additional file 4).

Bioinformatic analysis
Two-way unsupervised hierarchical clustering was applied to the 87 DEPs, and the results were visualized in a heat-map (Fig. 2). Two clusters of proteins that clearly separated dCCA and control samples were evident. GO analysis of the DEPs revealed the molecular functions in which the DEPs where most frequently involved to be binding, catalytic activity and molecular function regulation (Fig. 3a). The cellular components in which the DEPs were most frequently involved were cell, extracellular region and protein-containing complex (Fig. 3b). The biological processes in which the DEPs were most frequently involved were metabolic process, biological regulation and biological adhesion (Fig. 3c). PANTHER-overrepresentation test was performed on the 87 DEPs. No statistically significant enrichment with regards to molecular function or biological process was seen. When cellular components were analyzed, collagen trimer, extracellular space, extracellular matrix (ECM) and its lineage parent extracellular region and extracellular region part were significantly enriched in the DEPs. PANTHER pathway analysis revealed that the integrin signaling pathway was significantly enriched in the DEPs. KEGG pathway analysis revealed that protein digestion and absorption and ECM-receptor interaction were significantly enriched in the DEPs. When pathway enrichment was analyzed with REACTOME, ECM proteoglycans, integrin cell surface interactions, collagen degradation, assembly of collagen fibrils and other multimeric structures, collagen biosynthesis and modifying enzymes, neural cell adhesion molecule 1 interactions, ECM organization, signaling by platelet-derived growth factor and scavenging by class A receptors were found to be significantly enriched in the DEPs. Detailed results of the enrichment analysis are presented in Additional file 5. The STRING-database was employed to identify protein-protein interactions within the DEPs. In tota1, 46 protein interactions were found, and these interactions were significantly enriched based on the given nodes (P < 0.001). A complex pattern of protein-protein interactions was found, and clusters of ECM proteins, blood components and cytoskeleton-associated proteins could be discerned (Fig. 4).

Targeted verification
For PRM verification, in total 33 patient samples were selected (20 dCCA samples and 13 controls). Minute number of samples, many with low protein yields allowed for 25 samples to comply with the PRM analysis criteria, comprising 16 dCCA and 9 control samples. Clinicopathological data of the samples analyzed by PRM are presented in Additional file 2. The median coefficient of variation (CV)% of technical replicates for each protein ranged from 3 to 23%. A total of 170 peptides mapping to 91 proteins, including the housekeeping proteins GAPDH and TUBB, were included in the PRM spectral library (Additional file 6). A total of 122 peptides mapping to 79 proteins were successfully quantified (Additional file 7).
In total, 65 different peptides were found to be differentially expressed (q < 0.05, FC ≥ 2 or ≤ 0.5). These peptides mapped to 46 proteins, of which 19 were identified as differentially expressed by 2 different peptides, and 27 proteins were identified as differentially expressed based on 1 peptide. Thirty-nine peptides mapping to 28 proteins were found to be upregulated, and 26 peptides mapping to 18 proteins were found to be downregulated when dCCA samples were compared to controls ( Table 2). Previous associations between the identified proteins and CCA are presented in Table 2. Hierarchical clustering revealed a good separation between dCCA and controls based on the DEPs from the PRM verification (Fig. 5).

THBS2 expression in dCCA
The most upregulated protein verified by PRM was THBS2 (Fig. 6), which had not previously been studied in CCA; THBS2 was selected for further validation using IHC. THBS2 showed a membranous/cytoplasmic staining pattern in epithelial cells and stromal fibroblasts. In the normal bile ducts epithelial THBS2 immunoreactivity was focally positive but predominantly negative. Weak focal stromal expression was also detectable but predominantly negative. Occasional intratumoral lymphocytic infiltration with positive THBS2 expression was noted but not evaluated further. Among the dCCA samples, epithelial immunoreactivity was absent in 6 samples (10%) and scored as 1+ in 30 samples (51%), 2+ in 13 samples (32%) and 3+ in 4 samples (7%). In dCCA stromal compartment, THBS2 expression was absent in 5 samples (8%) and scored as 1+ in 18 samples (31%), 2+ in 25 samples (42%) and 3+ in 11 samples (19%). A weak correlation between epithelial and stromal THBS2 immunoreactivity was observed (r s = 0.32; P = 0.013). The proportion of samples positive for epithelial THBS2 decreased from 25 (96%) to 14 (54%) when primary tumors and paired lymph node metastasis samples were compared (P = 0.001). The proportion of samples positive for stromal THBS2 expression decreased from 25 (96%) to 18 (72%) when primary tumors and paired lymph node metastasis samples were compared (P = 0.031). Representative examples of IHC-staining are presented in Fig. 7.
The distribution of clinicopathological variables between dCCA samples in the epithelial and stromal compartments with positive and negative THBS2 expression is presented in Additional file 8. Epithelial THBS2 positivity was associated with blood vessel invasion (P = 0.004), and stromal THBS2 positivity was associated with R1 resection

Extracellular matrix Proteins
Cytoskeletonassociated proteins Fig. 4 Protein-protein interactions of the 87 differentially expressed proteins (q < 0.05, FC ≥ 2 or ≤ 0.5) between the distal cholangiocarcinoma samples and controls obtained using the STRING database. Discernable clusters of extracellular matrix proteins, blood components and cytoskeleton-associated proteins are highlighted    1.09-14.3; P = 0.037) and tended to be associated with OS (HR 3.34, 95% CI 0.94-11.8; P = 0.062).

Discussion
In this study, we identified DEPs between resected dCCA and normal bile duct samples using macrodissected archived FFPE tissues. A workflow using discovery LC-MS/MS followed by PRM verification of selected candidates was performed. In total, 46 proteins were successfully verified. Bioinformatic analysis highlighted alterations to the tumor reactive stroma (TRS) in dCCA. THBS2 was identified as upregulated in dCCA using MS and further validated to be upregulated in dCCA epithelium and stroma using IHC. There was a significantly lower proportion of epithelial and stromal THBS2 in paired lymph node metastases compared to primary tumors. However, stromal THBS2 expression in lymph node metastases was frequent (72%). Stromal THBS2 was significantly associated with poor DFS (HR 3.95, 95% CI 1.09-14.3; P = 0.037) when adjusted for confounding variables.
One of the characteristics of CCA tumor biology is the generation of a rich TRS [33]. The TRS consists of different cell types such as cancer-associated fibroblasts (CAFs) and tumor-associated macrophages; aberrant lymphatic vasculature; and a remodeled ECM, often with a dense desmoplastic reaction. Signaling interactions between cancer cells and the TRS play an important role in CCA carcinogenesis and treatment resistance [34]. GO analysis of DEPs from the discovery study revealed their enrichment in several TRS components. In addition, the majority of enriched pathways have been implicated in cancer-TRS interactions. This finding highlights the importance of the proteomic alterations to the TRS that occurs in dCCA.
Notably, using IHC, we found that the overexpression of THBS2 in dCCA identified by MS, could be due to both stromal and epithelial THBS2 overexpression. THBS2 is a primarily extracellular protein and has been shown to impact cellular functions such as angiogenesis [35], apoptosis [36], cytoskeletal organization and ECM remodeling [37]. THBS2 has been found to have both pro-and antitumorigenic functions and an association with both good and poor prognosis has been found in different cancers [38,39]. In pancreatic cancer, THBS2 expression in cancer cells in vitro inhibited invasiveness through downregulation of matrix metalloproteinase-9 and urokinase-type plasminogen activator [40]. In contrast, stromal THBS2 expression in pancreatic stellate cells was found to promote cancer cell invasiveness in coculture experiments [41]. Discrepancies between the regulation and prognostic implications of epithelial and stromal THBS2 have been described [42,43]. Recently, Kim et al. [44] demonstrated that plasma THBS2 is a promising diagnostic biomarker for pancreatic cancer. Given the biological similarities of dCCA and pancreatic cancer [45], further evaluation of THBS2 as a potential diagnostic biomarker of dCCA should be encouraged. We have not found any previous study in which the expression of THBS2 in paired lymph node metastases was investigated. Both epithelial THBS2 and stromal THBS2 were significantly less frequent in the lymph node metastases than in primary tumors. The loss of epithelial THBS2 during the dCCA metastatic process could be due to a tumor-suppressive function. Although significantly lower, stromal THBS2 expression was frequently retained in lymph node metastases (72%). The maintenance of THBS2 stromal cells through the metastatic process suggests a function of the THBS2 stroma in dCCA metastatic development. A large amount of stroma in lymph node metastases has previously been associated with poor survival in various cancers [46]. We note that among the most substantially downregulated proteins according to PRM verification were several proteins with a known TRS association. Proteins belonging to the small leucine-rich proteoglycan family such as decorin, lumican, prolargin, dermatopontin and mimecan were all substantially downregulated. Small leucinerich proteoglycans are known to be involved in cellular proliferation, differentiation, survival, adhesion, migration, the inflammatory response and cancer development [47]. Decorin has a tumor-suppressive effect and has been extensively studied as a therapeutic target in epithelial cancers [48]. Furthermore, decorin was previously shown to be downregulated and associated with negative prognosis in eCCA. In addition, decorin treatment could reverse CCA proliferation, invasion and migration in vitro [49]. Mimecan is downregulated in several solid cancers and has a tumor-suppressive function through its interaction with epidermal growth factor receptor signaling and tumoral immune infiltration [50,51]. Mimecan was also found to be downregulated in CCA [52]. Dermatopontin has been shown to be involved in transforming growth factor-β signaling, collagen fibrillogenesis, cell adhesion, and cell proliferation [53]. Additionally, dermatopontin has a tumor-suppressive effect [54] and was found to be downregulated in gallbladder cancer [55]. Prolargin has not been extensively studied in a cancer context however, low prolargin expression has been associated with poor survival in pancreatic cancer [56]. Lumican is known to modulate collagen fibrillogenesis and integrin signaling and has been shown to have a tumorsuppressive effect in several cancers [57]. To the best of our knowledge, neither prolargin nor lumican has been associated with dCCA. Other substantially downregulated proteins include microfibril-associated glycoprotein 4, fibulin-5 and extracellular superoxide dismutase. Microfibril-associated glycoprotein 4 is known to be downregulated in the majority of human cancers with prognostic significance and was predicted to impact cell proliferation and elastic fiber formation [58]. Microfibrilassociated glycoprotein 4 was found to be downregulated in CAFs isolated from iCCA [59]. Fibulin-5 is downregulated in several human cancers and has a tumor-suppressive function mediated through its interaction with matrix metalloproteinases [60]. Furthermore, fibulin-5 was found to be downregulated in CCA [52]. Extracellular superoxide dismutase, an important enzyme in the antioxidant defense system, is frequently downregulated in cancer and has a tumor-suppressive role [61]; however, it has not been described in CCA biology. In addition, catalase another essential member of the antioxidant defense system was found to be downregulated [62]. When looking at the most upregulated proteins identified in our study, several proteins, such as protein S100-P [63,64], carcinoembryonic antigen-related cell adhesion molecule 6 [65], keratin, type I cytoskeletal 17 [66,67], thymidine phosphorylase [68,69], heat shock protein 90 alpha/beta [70] and olfactomedin-4 [71,72], are well established as upregulated proteins and tentative biomarkers in CCA. The correct identification and quantification of known biomarkers gives us confidence in the biological relevance of the proteins identified with our study design. Other substantially upregulated proteins without previous association with CCA include calponin-2, collagen triple helix repeat-containing protein 1 and ero1-like protein alpha. Calponin-2 is an actin cytoskeleton-associated regulatory protein that can inhibit cell proliferation and migration. Calponin-2 has not been extensively studied in cancer, however, it was found to be a prognostic factor and to have tumorsuppressive effects in pancreatic and prostate cancers [73,74]. Collagen triple helix repeat-containing protein 1 is involved in the regulation of cell motility through the regulation of collagen deposition. It has an oncogenic effect and has been shown to be upregulated and associated with poor prognosis in several cancers [75]. Ero1like protein alpha, which is important for disulfide bond formation in secreted molecules, is upregulated in several cancer forms and has been shown to play an important role in tumor-mediated immunosuppression [76].
Several previous studies have employed various MSbased proteomics approaches to identify biomarkers of cholangiocarcinoma through analysis of resected human tissues [72,[77][78][79][80][81][82][83][84][85][86][87][88]. The majority of these previous studies have investigated iCCA or a mixture of different CCA subtypes. A study by Maeda et al. [72] was the first study to analyze an eCCA-only cohort. To the best of our knowledge, our study represents the first proteomic characterization of a dCCA-only cohort.
Some limitations to the present study are noted. A small number of samples were analyzed. All material was retrospectively acquired and stored for various time periods prior to analysis. however, archival storage time of FFPE tissues has not been found to negatively impact MS based proteomics [89][90][91]. For the MS analysis, tumor and control samples were unmatched with regards to sex due to the limited number of tissues available for use as controls. Although we cannot exclude some bias as a result of this imbalance, the effect is likely minor. We choose to use normal tissue controls as compared to the more widely used and widely available morphologically normal tumor adjacent tissues since comparative studies have found molecular aberrations in normal tumor adjacent tissues compared to normal tissues [92]. This is hypothesized to be caused by field cancerization or microenvironment alterations influenced by the tumor [93]. Thus, normal controls can help identify additional biomarkers compared to normal tumor adjacent tissues [92]. In the PRM verification, a housekeeping normalization was used. Since there is currently no consensus on the most suitable housekeeping proteins in cholangiocarcinoma, we chose to use GAPDH and TUBB as housekeeping proteins since they are commonly used for this purpose and because suitable peptides were available in the discovery study data. However, expression of classic housekeeping genes was found to be less stable than assumed, especially in cancer tissues, which thus can be a source of bias [94]. IHC validation was also performed with a relatively small number of samples, particularly with regard to paired lymph node metastases. The survival analysis was underpowered for the detection of anything other than a large prognostic effect, especially as few cases were negative for THBS2. Notably, however, the effect size of the HR for stromal THBS2 was large.

Conclusions
An MS-based workflow was used to identify and verify several proteins without a previous association with dCCA biology. The identified proteins can be further investigated to elucidate their function and potential as biomarkers in dCCA. THBS2 was validated as frequently expressed in the epithelium and stroma of dCCA. Stromal THBS2 is a potential prognostic marker; additionally, it was frequently retained in paired lymph node metastases. The use of stromal THBS2 as a prognostic marker in dCCA should be validated using separate larger cohorts, additionally the potential of THBS2 as a diagnostic biomarker in dCCA should be evaluated.