Non-invasive biomarkers for early diagnosis of pancreatic cancer risk: metabolite genomewide association study based on the KCPS-II cohort

Abstract

Background

Pancreatic cancer is a lethal disease with a high mortality rate. The difficulty of early diagnosis is one of its primary causes. Therefore, we aimed to discover non-invasive biomarkers that facilitate the early diagnosis of pancreatic cancer risk.

Methods

The study subjects were randomly selected from the Korean Cancer Prevention Study-II and matched by age, sex, and blood collection point [pancreatic cancer incidence (n = 128) vs. control (n = 256)]. The baseline serum samples were analyzed by non-targeted metabolomics, and XGBoost was used to select significant metabolites related to pancreatic cancer incidence. A genomewide association study (GWAS) of the selected metabolites was conducted to identify valuable single nucleotide polymorphisms (SNPs). Moderation and mediation analyses were conducted to explore the variables related to pancreatic cancer risk.

Results

Eleven discriminant metabolites were selected by applying a cut-off of 4.0 to the XGBoost feature importance. Five SNPs were significant in both the metabolite-GWAS (p ≤ 5 × 10⁻⁶) and the logistic regression analysis. Among them, the paired metabolites of rs2370981, rs55870181, and rs72805402 displayed different network patterns with clinical/biochemical indicators when effect allele carriers were compared with non-carriers. In addition, we demonstrated an indirect effect of rs59519100 on pancreatic cancer risk mediated by γ-glutamyl tyrosine, whose association with pancreatic cancer risk was affected by smoking status. The predictive ability for pancreatic cancer was highest for the model combining five SNPs and four paired metabolites with the conventional risk factors (AUC: 0.738 [0.661–0.815]).

Conclusions

Signatures involving the metabolites and SNPs discovered in the present research may be closely associated with the pathogenesis of pancreatic cancer and may serve as predictive biomarkers enabling early pancreatic cancer diagnosis and therapy.

Introduction

The pancreas is an organ responsible for producing digestive juices and regulating blood glucose levels. Pancreatic cancer is highly lethal because early diagnosis is challenging and the likelihood of metastasis to other organs is very high [1]. Pancreatic cancer accounts for approximately 3% of all cancers in the United States, and it is more common in men than in women [2]. According to the National Statistical Office of Korea, 6931 people (3600 men and 3331 women) died from pancreatic cancer, accounting for 8.4% of all cancer cases in 2021 [3].

The cause of pancreatic cancer is unclear, but smoking, being overweight, diabetes, and a relevant family history are risk factors. Smoking is a crucial risk factor for chronic pancreatic cancer [4]. In a study involving 2009 pancreatic cancer cases and 1532 controls from the International Pancreatic Cancer Cohort, smokers showed a 1.72-fold higher risk of pancreatic cancer than non-smokers. In addition, the risk of pancreatic cancer reportedly increased with the number of cigarettes smoked [5].

Several studies on pancreatic cancer have been conducted recently. Currently, the most widely used single tumor marker for pancreatic cancer is carbohydrate antigen (CA) 19–9, which is noted in 80% of all pancreatic cancer patients. However, because its specificity is too low for screening tests, it is usually used to determine the stage and prognosis of pancreatic cancer or to monitor its recurrence [6, 7]. In addition, Hwang et al. [8] suggested that miR-21 expression is closely related to anticancer drug resistance and can be applied to predict anticancer drug resistance and clinical outcomes in Korean pancreatic cancer patients. However, no biomarkers are yet available for the early diagnosis or early detection of pancreatic cancer risk.

Multi-omics is a method of comprehensively analyzing the data generated at various molecular levels, such as the genome, transcriptome, proteome, and metabolome; it has been applied in multiple fields of disease research [9, 10]. This approach can provide systemic clues for understanding the underlying metabolic changes that occur over the course of a disease. Indeed, proteomics on genetically engineered mouse models with early and advanced stages of pancreatic cancer identified candidate proteome markers applicable to early detection [11]. Moreover, for ovarian cancer, which is mainly diagnosed at a late stage, multi-omics technology has been widely used to discover several valuable biomarkers for early diagnosis [12].

This study aims to discover non-invasive biomarkers for predicting pancreatic cancer risk through multi-omics technology. Genotyping data and non-targeted metabolite screening data from Korean subjects in the Korean Cancer Prevention Study (KCPS)-II were analyzed integratively using diverse statistical methods. We expect that our findings, including genomic and metabolomic biomarkers, can serve as a basis for research on the pathogenesis of pancreatic cancer.

Materials and methods

Study population

The study subjects were selected from the KCPS-II cohort. Briefly, the KCPS-II subjects were recruited through 18 health promotion centers across South Korea from April 2004. After their enrollment, hospital admission records, death registries, and National Cancer Center registry data were collected during the follow-up period. Written informed consent for cohort registration and secondary research was obtained from all cohort subjects, and their blood samples were collected.

For the current research, subjects aged 25–71 years were randomly selected from the KCPS-II. We formed two groups matched in a 1:2 ratio by age, sex, and blood collection point [pancreatic cancer incidence group (n = 128) vs. control (n = 256)]. Subjects who were cancer-free at the time of enrollment but later developed pancreatic cancer during the follow-up period were assigned to the case group.

All procedures in the current research involving human participants were performed in accordance with the ethical standards of the Institutional Review Board at the Yonsei University Health System under the Helsinki Declaration [IRB Number: 4-2022-1136].

Smoking history

Each participant answered a self-administered questionnaire concerning their smoking habits (never-smoker = 0, ex-smoker = 1, or current smoker = 2). The smoking amount of current smokers was also investigated, but because of several missing values, these data were not used in this study.

Metabolome analysis

Non-targeted metabolomics

UHPLC-MS/MS analysis

The prepared serum samples were precipitated with cold acetonitrile (Wako Pure Chemical Industries, Osaka, Japan) (1:3, v/v) and centrifuged for 15 min (13,000 rpm, 4 ℃). The supernatant was then separated and dried in a vacuum concentrator (HyperVAC-MAX, Hanil Scientific Inc., Gimpo, Korea) without heating. Next, the residue was reconstituted in 200 μL of 10% methanol (J.T. Baker® Chemicals; Avantor Performance Materials, Inc., Radnor, PA, USA) and filtered through a 0.45-μm polyvinylidene difluoride syringe filter. L-Leucine-1-13C (Sigma-Aldrich, Saint Louis, MO, USA) was used as an internal standard (ISTD). The quality control (QC) sample was prepared by pooling all the serum samples and following the same steps.

The serum samples were injected into an Acquity UPLC-BEH-C18 column (Waters, Milford, MA, USA) connected to a Thermo UHPLC system (Ultimate 3000 BioRS; Dionex, Thermo Fisher Scientific, Bremen, Germany). The column temperature was maintained at 50 ℃. Two mobile phases [A, 0.1% formic acid in LC–MS-grade water (Thermo Fisher Scientific, Fair Lawn, NJ, USA); B, 0.1% formic acid in LC–MS-grade methanol (Thermo Fisher Scientific, Fair Lawn, NJ, USA)] were run as a 17-min gradient to separate the compounds in the samples. A Q Exactive Plus Orbitrap (Thermo Fisher Scientific, Waltham, MA, USA) was coupled to the UHPLC system for detection. MS was performed in positive electrospray ionization mode (ESI+) with a collision energy of 30, a spray voltage of 3.5 kV, a nitrogen sheath gas flow rate of 60 (arbitrary units), and an auxiliary gas flow rate of 20 (arbitrary units). Data were collected in full scan-ddms2 mode with a scan range of 80–1000 mass-to-charge (m/z).

A QC sample was measured with every 10th prepared serum sample to monitor sensitivity and reproducibility. In addition, the intra-assay and inter-assay variations were assessed using replicate results of the QC samples collected over a few days.

Identification of metabolites

Compound Discoverer 3.2 software (Thermo Fisher Scientific, San Jose, CA, USA) was used for processing the raw spectra. Alignment and normalization were performed in the program using the QC samples. Features detected in < 80% of the QC samples were discarded. Processed features were identified with reference to the online databases ChemSpider (http://www.chemspider.com), LIPID MAPS (https://www.lipidmaps.org), mzCloud (https://www.mzcloud.org), and Kyoto Encyclopedia of Genes and Genomes (KEGG; https://www.genome.jp/kegg).
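The QC-based feature filter described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the data layout (a dict of QC intensities with `None` for a non-detection), and the example values are hypothetical and are not taken from Compound Discoverer or from the study's code.

```python
def filter_features_by_qc_detection(qc_intensities, min_rate=0.8):
    """Keep features detected in at least `min_rate` of the QC injections.

    qc_intensities maps a feature name to its list of QC intensities;
    None marks a QC injection in which the feature was not detected.
    """
    kept = []
    for feature, values in qc_intensities.items():
        detected = sum(v is not None for v in values)
        if values and detected / len(values) >= min_rate:
            kept.append(feature)
    return kept


# Hypothetical QC table with five QC injections per feature.
qc = {
    "feature_A": [1.2e5, 1.1e5, 1.3e5, 1.2e5, 1.25e5],  # detected 5/5
    "feature_B": [8.0e4, None, 7.5e4, 8.2e4, 7.9e4],    # detected 4/5
    "feature_C": [None, None, 3.0e3, 2.8e3, 3.1e3],     # detected 3/5
}
print(filter_features_by_qc_detection(qc))
```

Here `feature_C` is dropped because it appears in only 60% of the QC injections.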

Genotyping

DNA was genotyped using the KORV1.0–96 Array (Affymetrix, Santa Clara, CA, USA) provided by the K-CHIP consortium and the Affymetrix Genomewide Human SNP Array 5.0 (Affymetrix Inc.). For quality control, markers with a high missing rate (> 5%), individuals with a high missing rate (> 5%), and SNPs with a minor allele frequency < 0.05 or a significant deviation from Hardy–Weinberg equilibrium (p < 1.0E−6) were excluded.
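As a rough illustration of the per-SNP thresholds above, the following sketch checks the missing rate, minor allele frequency, and Hardy–Weinberg equilibrium for one marker from its genotype counts. The helper names are hypothetical, and the Hardy–Weinberg test here is a simple 1-df chi-square approximation chosen for brevity; the source does not state which test was used, and genotype QC tools often use an exact test instead.

```python
import math


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """Approximate HWE p-value from a 1-df chi-square goodness-of-fit test."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def snp_passes_qc(n_aa, n_ab, n_bb, n_missing,
                  max_missing=0.05, min_maf=0.05, hwe_p_min=1e-6):
    """Apply the marker-level thresholds quoted in the text to one SNP."""
    n_called = n_aa + n_ab + n_bb
    n_total = n_called + n_missing
    if n_called == 0 or n_missing / n_total > max_missing:
        return False
    p = (2 * n_aa + n_ab) / (2 * n_called)
    if min(p, 1 - p) < min_maf:
        return False
    return hwe_chi2_p(n_aa, n_ab, n_bb) >= hwe_p_min


print(snp_passes_qc(100, 200, 100, 0))  # genotypes in perfect HWE
print(snp_passes_qc(390, 10, 0, 0))     # minor allele too rare
```

The per-individual missing-rate filter works the same way, applied across markers for each subject.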

Statistical analysis

All statistical analyses were conducted using SPSS 26 (IBM Corp, Armonk, NY, USA), R 4.1.3, and Python 3.9.12. We performed independent t-tests and Mann–Whitney U-tests to evaluate the differences in the clinical/biochemical variables between the two groups. The skewed variables were logarithmically transformed. For nominal variables, a Chi-square test was applied. The data are expressed as the mean ± SE, and two-tailed p-values < 0.05 were considered to indicate statistical significance.

For multivariate analyses, the normalized metabolite data were exported from Compound Discoverer 3.2. After Pareto scaling and logarithmic transformation, the eXtreme Gradient Boosting (XGBoost) model was fitted using Python. The log-loss function was used as the objective for the binary outcome (control, 0; case, 1). To optimize the model hyperparameters and help prevent overfitting, we limited the maximum tree depth and eta while increasing n_estimators; a model with too few weak learners (n_estimators) and deep trees may fit noise, and reducing eta diminishes the contribution of each tree to the model. As a result, the XGBoost model was fitted with the following parameters to achieve a high AUC in the test set: n_estimators, 50; learning rate, 0.15; alpha, 0.001; max depth, 2; min child weight, 5; and eta, 0.1.
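The preprocessing step can be illustrated with a minimal sketch for a single metabolite. It assumes that the log transform is applied before Pareto scaling (Pareto scaling mean-centers the data, so the reverse order would produce negative values that cannot be logged); the source does not spell out the order, and the function name and example values are hypothetical.

```python
import math
import statistics


def log_pareto_scale(values):
    """Natural-log transform, then Pareto-scale one metabolite's intensities.

    Pareto scaling mean-centers each value and divides it by the square
    root of the standard deviation.
    """
    logged = [math.log(v) for v in values]
    mean = statistics.fmean(logged)
    sd = statistics.stdev(logged)
    if sd == 0:
        return [0.0 for _ in logged]
    return [(x - mean) / math.sqrt(sd) for x in logged]


# Hypothetical intensities for one metabolite across five samples.
scaled = log_pareto_scale([1200.0, 950.0, 4300.0, 1800.0, 2100.0])
print([round(x, 3) for x in scaled])
```

Compared with unit-variance scaling, dividing by the square root of the standard deviation keeps more of the original variance structure while still damping very abundant features.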

Metabolite-GWAS was performed using PLINK 2.0. Next, logistic regression analysis adjusted for age and sex was performed to evaluate the association between the significant SNPs and pancreatic cancer. The predictive ability for pancreatic cancer of the biomarkers discovered in this study was assessed through regression analysis. Furthermore, we tested whether smoking status is a significant moderator of the association between metabolites (independent variable) and pancreatic cancer incidence (dependent variable) using the p-value of the coefficient of the interaction term (metabolite × smoking status). In addition, we conducted a mediation analysis to test whether a metabolite is a significant mediator of the association between smoking status (independent variable) and pancreatic cancer incidence (dependent variable) using the mediate function in the R mediation package. The Python and R codes used in the current research are provided in Additional file 2: Data S1.
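The moderation test described above can be written as a single regression equation. The form below is a sketch, not the authors' exact specification: it assumes a logistic link for the binary outcome and a single numeric smoking term, where Y is pancreatic cancer incidence, M the metabolite level, and S the smoking status.

```latex
\operatorname{logit} \Pr(Y = 1)
  = \beta_0 + \beta_1 M + \beta_2 S + \beta_3 (M \times S)
  + \beta_4\,\mathrm{age} + \beta_5\,\mathrm{sex}
```

Under this formulation, smoking status is considered a moderator when the interaction coefficient β₃ differs significantly from zero.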

A network model was created in the carrier and the non-carrier groups of effect alleles so as to visualize the relationships between clinical/biochemical indicators and paired metabolites of each SNP based on partial correlation. To reflect the difference in the quantitative abundance between the pancreatic cancer incidence and control groups, we calculated the z-score of each variable.
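To make the idea of a partial correlation edge concrete, the sketch below computes a first-order partial correlation between two variables while controlling for a single third variable. This is a simplification for illustration: the source does not state which variables were conditioned on, and the function names and toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Toy data: x and y are related only through the shared variable z.
z = [-1.0, -1.0, 1.0, 1.0]
x = [0.0, -2.0, 2.0, 0.0]
y = [0.0, -2.0, 0.0, 2.0]
print(round(pearson(x, y), 3), round(partial_corr(x, y, z), 3))
```

In this toy case the raw correlation between x and y is 0.5, but it vanishes once z is controlled for, which is why a partial correlation network shows fewer indirect edges than a plain correlation network.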

Results

Anthropometric and clinical/biochemical characteristics at the baseline

After excluding 35 subjects without genotyping data, 349 subjects were included in the final analysis [pancreatic cancer incidence group (n = 113) vs. control (n = 236)]. The baseline characteristics of the total subjects are presented in Table 1. No significant differences were noted between the pancreatic cancer incidence and control groups. To summarize, the mean age was 52.4 years in the pancreatic cancer incidence group and 52.7 years in the control group (p from t-test = 0.991). The pancreatic cancer incidence group included 77.0% male and 23.0% female, while the control group included 73.7% male and 26.3% female, indicating no significant difference between the groups (p = 0.511). No statistical difference was noted in BMI, with the pancreatic cancer incidence and control groups showing respective mean values of 24.6 and 24.3 (p = 0.238). In addition, the two groups showed no significant difference in CA 19–9 (pancreatic cancer incidence group, 20.0 ± 2.48; control group, 8.37 ± 0.526; p = 0.346). The Chi-squared test confirmed the lack of any significant difference in the frequency of current smokers between the two groups (pancreatic cancer incidence group, 31.7%; control group, 30.3%; p = 0.116).

Table 1 Baseline clinical and biochemical characteristics of subjects

Discriminant metabolites between the pancreatic cancer incidence and control groups

Among the 3165 detected features from MS, 173 metabolites were identified. A heatmap comparing the abundance of identified metabolites between the pancreatic cancer incidence and control groups is shown in Additional file 1: Figure S1.

Before establishing the XGBoost model, the subjects were divided into training and test sets at a 6:4 ratio using a fixed random seed (Additional file 2: Data S2). In the training set, 68 individuals from the pancreatic cancer incidence group and 141 from the control group were included. There was no significant difference in the age and sex distribution between these two groups. The proportion of current smokers was 30.9% in the pancreatic cancer incidence group and 30.5% in the control group, a statistically significant difference (p = 0.018). In the test set, 45 individuals were from the pancreatic cancer incidence group, while 95 were from the control group. There were no significant differences in terms of age, gender, or smoking status between these two groups.
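A seeded 6:4 split of this kind can be sketched with the standard library alone. The seed value, the function name, and the choice to round the test set size up are assumptions made for illustration; the authors' actual split is in their supplementary code and is not reproduced here.

```python
import math
import random


def split_indices(n, test_fraction=0.4, seed=0):
    """Shuffle indices 0..n-1 with a fixed seed and split them train/test."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n * test_fraction)  # assumption: test size rounded up
    return sorted(indices[n_test:]), sorted(indices[:n_test])


train_idx, test_idx = split_indices(349)
print(len(train_idx), len(test_idx))
```

With 349 subjects this gives 209 training and 140 test subjects, matching the set sizes reported in the text. Because the shuffle is seeded, rerunning the function returns the same partition.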

We fitted XGBoost on the training dataset (n = 209) and calculated the feature importance for identifying the effect of metabolites on the fitted model. As a result, 11 metabolites that considerably differed between the groups were selected (feature importance ≥ 4.0), as summarized in Table 2. The levels of serum eicosa-11,14,17-trienoic acid, kynurenic acid, γ-glutamyl tyrosine, lysoPE(18:0/0:0), trans-3'-hydroxy cotinine, and L-leucine were found to be elevated in the pancreatic cancer incidence group. In contrast, the pancreatic cancer incidence group had lower N(6)-methyllysine, palmitic amide, adipic acid, 9-decenoylcarnitine, and 5α-pregnane-3,20-dione levels than the control group.

Table 2 Identification of meaningful metabolites using XGBoost

The performance values of the XGBoost model on the training and test sets are shown in Additional file 2: Data S2. The training set had an accuracy of 0.952, precision of 0.983, recall of 0.868, and AUC of 0.998. In the case of the test set, an accuracy of 0.671, precision of 0.471, recall of 0.178, and AUC of 0.640 were recorded.
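The relationship between these metrics is easier to see from confusion-matrix counts. The counts below are not reported in the source; they are back-calculated here from the stated group sizes (68 cases and 141 controls in training, 45 cases and 95 controls in testing) so that they reproduce the reported accuracy, precision, and recall after rounding.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Inferred counts (an assumption, see the note above), not published values.
train_metrics = classification_metrics(tp=59, fp=1, fn=9, tn=140)
test_metrics = classification_metrics(tp=8, fp=9, fn=37, tn=86)

for name, metrics in (("train", train_metrics), ("test", test_metrics)):
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Read this way, a test-set recall of 0.178 corresponds to roughly 8 of the 45 incident cases being flagged, while the gap between training and test performance points to overfitting.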

Metabolite-genomewide association analysis

Using the 11 selected metabolites, we conducted a metabolite-GWAS. We generated Manhattan plots to identify significant SNPs and performed linkage disequilibrium clumping with a threshold of p ≤ 5 × 10⁻⁶ to account for the correlation between nearby genetic variants. Logistic regression analysis was then performed to assess the association of these SNPs with the incidence of pancreatic cancer (Table 3). In particular, the G allele of rs2370981, mapped to NRXN3 and strongly related to eicosa-11,14,17-trienoic acid, was identified as a protective allele for pancreatic cancer [OR = 0.371, p = 0.043]. The other four notable SNPs (i.e., rs59519100, rs11164375, rs72805402, and rs55870181) were all associated with a higher risk of pancreatic cancer; rs59519100 showed a significant association with γ-glutamyl tyrosine, rs11164375 with lysoPE(18:0/0:0), and rs72805402 (mapped to ZNF503) and rs55870181 with L-leucine. Manhattan plots for these are presented in Additional file 1: Figure S2.

Table 3 Genome-wide association analysis of pancreatic cancer-related metabolites

Network analysis between metabolomic biomarkers and clinical/biochemical indicators

We divided the subjects into each SNP’s effect allele carrier and non-carrier groups. Then, clinical/biochemical indicators and pair metabolites of the SNP were used to create network models based on the z-score obtained after comparing the pancreatic cancer incidence and control groups for each variable and the partial correlation values between them (Fig. 1).

Fig. 1

The network between metabolites and clinical/biochemical indicators in each SNP group. ALB albumin, ALP alkaline phosphatase, ALT alanine aminotransferase, AST aspartate aminotransferase, BIL bilirubin, BMI body mass index, BUN blood urea nitrogen, CHO total cholesterol, CRE creatinine, DBP diastolic blood pressure, FBS fasting blood sugar, GGT gamma-glutamyltransferase, HDL high-density lipoprotein, LDL low-density lipoprotein, SBP systolic blood pressure, TG triglyceride, URIC uric acid, WBC white blood cell. Each node represents a metabolite or a clinical/biochemical indicator; an edge between two nodes indicates a partial correlation. The color of each node represents the z-score from the comparison of the pancreatic cancer incidence and control groups. Positive and negative correlations are represented by light-red and light-blue edges, respectively. Thicker edges represent stronger correlations between the two nodes

As a result, the paired metabolites of rs2370981, rs55870181, rs59519100, and rs72805402 displayed significantly different partial correlation network patterns with the clinical/biochemical indicators when the effect allele carrier and non-carrier groups of each SNP were compared. In summary, the risk allele carriers of rs2370981 showed several significant partial correlations that were not detected in the non-risk allele carriers: eicosa-11,14,17-trienoic acid with low-density lipoprotein (LDL) (r = 0.613, p = 0.045), alanine aminotransferase (ALT) (r = 0.632, p = 0.037), white blood cell (r = 0.816, p = 0.002), body mass index (r = −0.636, p = 0.036), and creatinine (r = −0.67, p = 0.024). Moreover, a significant negative partial correlation between γ-glutamyl tyrosine and aspartate aminotransferase (AST) (r = −0.237, p = 0.049) was observed in the risk allele carriers of rs59519100. Finally, L-leucine exhibited notable partial correlations with a few clinical/biochemical indicators. In the risk allele carriers of rs55870181, L-leucine correlated with diastolic blood pressure (r = 0.18, p = 0.046) and glucose (r = −0.259, p = 0.004). In addition, in the non-risk allele carriers of rs72805402, L-leucine positively correlated with the blood urea nitrogen level (r = 0.137, p = 0.049) and negatively correlated with high-density lipoprotein (r = −0.146, p = 0.035).

Mediation and moderation analyses

Mediation analysis, after adjusting for age and sex, was conducted on the selected metabolites and SNP biomarkers for pancreatic cancer. We noted significant outcomes in the association between γ-glutamyl tyrosine and rs59519100. Although rs59519100 showed no significant direct effect on pancreatic cancer incidence (β = 0.069, p = 0.242), γ-glutamyl tyrosine mediated the indirect effect of rs59519100 on pancreatic cancer incidence (β = 0.056, p = 0.002) with causal mediation effects of 44.6% relative to the total effect (Fig. 2).
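The proportion mediated can be checked approximately from the two coefficients quoted above. This is a back-of-the-envelope sketch that assumes the total effect is the sum of the direct and indirect effects; the function name is hypothetical. A simulation-based estimate, such as the one produced by the R mediate function, need not match this simple ratio of rounded coefficients exactly.

```python
def proportion_mediated(indirect, direct):
    """Share of the total effect (direct + indirect) carried by the mediator."""
    total = indirect + direct
    return indirect / total


# Coefficients quoted in the text: indirect 0.056, direct 0.069.
share = proportion_mediated(indirect=0.056, direct=0.069)
print(f"{100 * share:.1f}%")
```

The ratio of the rounded coefficients comes to about 44.8%, close to the reported 44.6%.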

Fig. 2

Mediation and moderation analysis. The result of the mediation analysis is presented in the blue circle and that of the moderation analysis in the red circle. Adjusted odds ratios (AOR) and confidence intervals are indicated with points and lines on the graph. Variables marked with a are derived from the age- and sex-adjusted model. The variable marked with b is derived from the age-, sex-, and smoking status-adjusted model

Next, we conducted a moderation analysis, adjusting for age and sex, to explore the effect of smoking status as a moderator of the association among γ-glutamyl tyrosine, rs59519100, and pancreatic cancer (Fig. 2). The level of γ-glutamyl tyrosine was negatively associated with pancreatic cancer risk (β = −0.504, p < 0.001), and this association was maintained after adjusting for smoking status (β = −0.508, p < 0.001). When the interaction term (smoking status × γ-glutamyl tyrosine) was added to the linear model, it was positively associated with pancreatic cancer risk (β = 0.666, p = 0.033). In other words, smoking status affected the association between γ-glutamyl tyrosine and pancreatic cancer risk. Smoking did not significantly moderate the other associations (Additional file 1: Figure S3).

Evaluation of the predictive power as a biomarker for pancreatic cancer

Figure 3 depicts the prediction models built from conventional risk factors and the significant biomarkers identified in the present research. In the total subjects (n = 349), the prediction model consisting of age, sex, and CA 19–9 had an area under the curve (AUC) of 0.569 [0.484–0.654]. The conventional model with age, sex, smoking status (never, ever, current), and CA 19–9 had an AUC of 0.564 [0.480–0.649]. On adding five SNP biomarkers (i.e., rs2370981, rs59519100, rs11164375, rs72805402, and rs55870181) and four metabolic biomarkers (i.e., eicosa-11,14,17-trienoic acid, γ-glutamyl tyrosine, lysoPE(18:0/0:0), and L-leucine) to the conventional model, the AUC improved to 0.702 [0.640–0.763]. The highest AUC of 0.738 [0.661–0.815] was observed in the final model consisting of all variables (i.e., age, sex, smoking status, CA 19–9, rs2370981, rs59519100, rs11164375, rs72805402, rs55870181, eicosa-11,14,17-trienoic acid, γ-glutamyl tyrosine, lysoPE(18:0/0:0), and L-leucine). Furthermore, the model using the variables that were significant in the mediation and moderation analyses (i.e., age, sex, smoking status, γ-glutamyl tyrosine, and rs59519100) had an AUC of 0.651 [0.588–0.713], which was within the range of the previously described models.

Fig. 3

ROC curves for the prediction of pancreatic cancer in total subjects. Prediction models in the total subjects (n = 349), training set (n = 209), and test set (n = 140). The variables utilized in each model are different, and each model is displayed in a different color

The trend in prediction performance was similar when the training (n = 209) and test (n = 140) sets were analyzed separately. In both sets, the final model, in which the metabolic and SNP biomarkers were added to the conventional model, exhibited the strongest predictive power, a considerable improvement over the conventional model. In the training set, the final model had an AUC of 0.843 [0.769–0.918], whereas the conventional model had an AUC of 0.625 [0.526–0.725]. In the test set, the final model had an AUC of 0.734 [0.618–0.850], while the conventional model had an AUC of 0.568 [0.416–0.719].
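For readers less familiar with the AUC values compared above, the sketch below computes an AUC directly from its rank interpretation: the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The function name and toy data are hypothetical, and the source does not describe how its AUCs or confidence intervals were computed.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of case/control pairs ranked correctly.

    labels: 1 for cases, 0 for controls; scores: predicted risk per subject.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy example: two controls followed by two cases.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In the toy example three of the four case/control pairs are ordered correctly, giving an AUC of 0.75; a value of 0.5 corresponds to ranking no better than chance.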

Discussion

We discovered four metabolites (i.e., eicosa-11,14,17-trienoic acid, γ-glutamyl tyrosine, lysoPE(18:0), and L-leucine) and five SNPs (i.e., rs2370981, rs59519100, rs11164375, rs72805402, and rs55870181) with the potential to act as predictive biomarkers for pancreatic cancer using metabolite-GWAS analysis. As the current study used data obtained from subjects before the onset of pancreatic cancer, no significant difference was noted between the two groups in terms of CA 19–9, which is mainly used to determine the prognosis, treatment effects, and recurrence of pancreatic cancer. Moreover, the AUC of the conventional model for predicting pancreatic cancer using age, gender, smoking status, and CA 19–9 was 0.564 [0.480–0.649]. However, when the four metabolites and five SNPs identified in this study were combined, the predictive power for pancreatic cancer increased to 0.702 [0.640–0.763], and, when CA 19–9 was integrated, the predictive power was the highest, with an AUC of 0.738 [0.661–0.815]. In other words, combining CA 19–9, which is not extensively used in screening tests owing to its low specificity, with the biomarkers revealed in our study could improve the predictive potential for the early detection of pancreatic cancer risk. Furthermore, the partial correlation network between each paired metabolite and the clinical/biochemical indicators revealed significantly different patterns between the effect allele carrier and non-carrier groups of rs2370981, rs55870181, rs59519100, and rs72805402, suggesting that metabolism involving these metabolic biomarkers is associated with genetic predisposition.

Among them, the indirect effect of rs59519100 mediated by γ-glutamyl tyrosine on pancreatic cancer risk was demonstrated through mediation analysis. Furthermore, the association between γ-glutamyl tyrosine and pancreatic cancer risk was affected by smoking status. γ-Glutamyl tyrosine is a dipeptide composed of γ-glutamate and tyrosine, a product of incomplete proteolytic breakdown. Although dipeptides have some physiological effects, the metabolic function of γ-glutamyl tyrosine is unclear. We observed a higher serum level of γ-glutamyl tyrosine in the pancreatic cancer incidence group. Abnormal levels of γ-glutamyl dipeptides have been linked to several metabolic disorders in epidemiological studies [13, 14]. Similarly, metabolomics studies have discovered several γ-glutamyl dipeptides related to oxidative stress and dysregulated lipid profiles [15, 16], as they are involved in the γ-glutamyl cycle for regenerating intracellular glutathione. As γ-glutamyltransferase (GGT) detoxicates glutathione, increased GGT activity is an important marker of increased oxidative stress. γ-Glutamyl tyrosine, observed in our study, may also contribute to the biochemical pathways inducing oxidative stress.

Unexpectedly, γ-glutamyl tyrosine was not significantly correlated with the levels of GGT, ALT, and AST across all the subjects of the present study (data not shown). However, a negative partial correlation between γ-glutamyl tyrosine and AST (r = −0.237, p = 0.049) was identified in the risk allele carriers of rs59519100. In other words, subjects with the rs59519100 risk allele showed a high risk of developing pancreatic cancer, and AST and γ-glutamyl tyrosine pointed to metabolic alterations in its etiology. As the liver enzymes (i.e., GGT, ALT, and AST) are closely related to each other, the significance of AST could be connected with the mechanisms of γ-glutamyl tyrosine linked to GGT. Indeed, pancreatic ductal adenocarcinoma patients with elevated AST levels had a considerably shorter overall survival than those with lower AST levels [17]. Furthermore, we discovered a novel SNP, rs59519100, significantly associated with γ-glutamyl tyrosine, in relation to the risk of pancreatic cancer. Further study is therefore needed to clarify the underlying mechanisms of these valuable biomarkers.

Intriguingly, through moderation analysis, we demonstrated that smoking status significantly affected the association between γ-glutamyl tyrosine and pancreatic cancer risk. An association between smoking status and γ-glutamyl tyrosine has not yet been reported, although liver enzymes (such as GGT, AST, and ALT), which are possibly connected to γ-glutamyl tyrosine, have shown some evidence of association with smoking habits. Zhang et al. [18] determined that smoking and alcohol drinking synergistically elevated GGT levels in Chinese subjects [19, 20]. In a mouse model, maternal smoking exposure during pregnancy increased the severity of non-alcoholic steatohepatitis in offspring mice by increasing their serum ALT, AST, total cholesterol, and triglyceride levels and modulating the phosphorylation of AMP-activated protein kinase [21]. Elucidating the exact metabolic pathways through which smoking modulates these biomarkers could facilitate precision medicine or management for pancreatic cancer.

The next notable biomarker is L-leucine, which belongs to the branched-chain amino acids (BCAAs). The breakdown of BCAAs, mainly stored as tissue protein, provides a source for synthesizing other molecules. Consistent with some previous reports, serum L-leucine was elevated in the prediagnostic serum of the pancreatic cancer incidence group when compared to the control in our research. Mayers et al. observed that subjects with elevated circulating BCAAs in the prediagnostic plasma had more than a two-fold increased risk of pancreatic ductal adenocarcinoma (PDAC) [22]. The leading cause of this increase in plasma BCAAs is tissue protein degradation exceeding the systemic requirement for BCAAs [22, 23], which often occurs in metabolic diseases [24]. Moreover, abnormal physiological functions of the pancreas, including that related to insulin secretion, could directly modulate tissue protein degradation, including that of BCAAs. In all the study subjects, L-leucine was found to be negatively correlated with the levels of glucose (r = −0.113, p = 0.034), LDL (r = −0.130, p = 0.015), and uric acid (r = −0.118, p = 0.031) (data not shown). These findings indicate that higher leucine levels in the pancreatic cancer incidence group may closely reflect the condition of the pancreas during disease progression.

Furthermore, one of the two SNPs associated with L-leucine was mapped to a gene: rs72805402 mapped to ZNF503 (Zinc Finger Protein 503), which functions as a transcriptional repressor. Leucine-rich residues in the SCAN domain of zinc finger proteins participate in protein–protein interactions, thereby inducing various transcription activities [25]. ZNF503 has been reported to act as an essential regulator during development and during tumor initiation in multiple carcinomas [26, 27], but not in pancreatic cancer. Therefore, our data provide a candidate gene for diagnostic and therapeutic strategies for pancreatic cancer. The different network patterns in the risk allele carrier and non-carrier groups provide comprehensive insight into the relationships among SNPs, metabolites, and clinical indicators in pancreatic cancer incidence.

Finally, eicosa-11,14,17-trienoic acid, associated with rs2370981 mapped to NRXN3 (neurexin 3), is a long-chain fatty acid on which very few articles have been published [28]. NRXN3 encodes receptor and cell adhesion molecules mainly involved in the nervous system [29]. Therefore, most mutations in this gene have been reported in neurological diseases; several associations with carcinoma have also been reported, albeit not in pancreatic cancer. Interestingly, hypermethylation of ZNF582, which belongs to the same zinc finger protein class as the gene associated with L-leucine in our research, regulated the transcription of NRXN3 in nasopharyngeal carcinoma [30]. In addition, the changes in the NRXN3 protein level in the brain cerebrospinal fluid derived from Huntington’s disease agreed with the protein and mRNA levels of ZNF503 [31]. Based on this literature, we suggest that SNPs of the two genes discovered in our study could synergistically affect pancreatic cancer risk.

Several limitations of this study should be noted. First, the study was designed without classifying the pancreatic cancer type. If the results were replicated in blood samples collected with information on pancreatic cancer stage and type, the biomarkers identified in the present study could be considered robust for pancreatic cancer. Second, the sample size was small for a GWAS. A larger sample would provide greater statistical power and might reveal additional meaningful biomarkers. Third, establishing causality and interpreting the underlying mechanisms linking the biomarkers was challenging with our study design; instead, we performed moderation, mediation, and network analyses. Additional experimental research is therefore warranted to elucidate the exact pathogenic mechanisms behind the discovered associations. Fourth, the effect of smoking was analyzed using only self-reported smoking status. It is therefore necessary to examine the impact of other smoking variables, such as the duration and amount of tobacco use.

Despite these limitations, this is the first study to employ metabolite-GWAS for pancreatic cancer in the Korean population. We identified four metabolites (i.e., eicosa-11,14,17-trienoic acid, γ-glutamyl tyrosine, lysoPE(18:0), and L-leucine) and five SNPs (i.e., rs2370981, rs59519100, rs11164375, rs72805402, and rs55870181) with potential for use as predictive biomarkers of pancreatic cancer risk. In particular, we noted an indirect effect of rs59519100 on pancreatic cancer risk that was mediated by γ-glutamyl tyrosine and affected by smoking status. Indeed, smoking status affected the newly discovered pathway linking γ-glutamyl tyrosine to pancreatic cancer risk. The difference in network pattern according to the presence or absence of the SNP risk allele is also noteworthy. We therefore believe that the present results can serve as a basis for precision medicine and management of pancreatic cancer.
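To make the notion of an indirect (mediated) effect concrete, the sketch below estimates it as the product of two regression paths (SNP to metabolite, and metabolite to outcome adjusted for the SNP) with a percentile bootstrap interval. This is a simplified illustration under stated assumptions, not the analysis performed in this study: it uses ordinary least squares with a continuous outcome, whereas pancreatic cancer incidence is binary, and the function names `mediation_paths` and `bootstrap_indirect_ci` and all simulated values are hypothetical.

```python
import random


def _cross(u, v):
    """Sum of centered cross-products of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def mediation_paths(x, m, y):
    """Return (a, b, direct, indirect) from two ordinary least-squares fits."""
    sxx, smm, sxm = _cross(x, x), _cross(m, m), _cross(x, m)
    sxy, smy = _cross(x, y), _cross(m, y)
    det = sxx * smm - sxm ** 2
    if sxx == 0 or det == 0:
        raise ValueError("degenerate sample: X is constant or X and M are collinear")
    a = sxm / sxx                           # path a: X -> M
    b = (sxx * smy - sxm * sxy) / det       # path b: M -> Y, adjusted for X
    direct = (smm * sxy - sxm * smy) / det  # direct effect: X -> Y, adjusted for M
    return a, b, direct, a * b


def bootstrap_indirect_ci(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect a*b."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            est = mediation_paths([x[i] for i in idx],
                                  [m[i] for i in idx],
                                  [y[i] for i in idx])
        except ValueError:
            continue  # skip degenerate resamples
        estimates.append(est[3])
    if not estimates:
        raise ValueError("no usable bootstrap resamples")
    estimates.sort()
    low = estimates[int((alpha / 2) * len(estimates))]
    high = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return low, high


# Simulated toy data, NOT study data: x = risk allele count, m = metabolite
# level, y = a continuous stand-in for the outcome.
rng = random.Random(42)
x = [rng.choice([0, 1, 2]) for _ in range(300)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.8 * mi + rng.gauss(0, 1) for mi in m]
a, b, direct, indirect = mediation_paths(x, m, y)
ci_low, ci_high = bootstrap_indirect_ci(x, m, y)
print(f"indirect = {indirect:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

A bootstrap interval that excludes zero is commonly read as evidence of mediation. Testing whether smoking status changes this indirect effect (moderated mediation) would additionally require interaction terms, which are omitted here for brevity.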

Availability of data and materials

Some or all datasets generated during and/or analyzed during the current study are not publicly available, but can be made available from the corresponding author upon reasonable request.

Abbreviations

ALT: Alanine aminotransferase

AST: Aspartate aminotransferase

AUC: Area under the curve

BCAAs: Branched-chain amino acids

CA: Carbohydrate antigen

ESI: Electrospray ionization mode

GGT: Gamma-glutamyltransferase

ISTD: Internal standard

KCPS: Korean Cancer Prevention Study

KEGG: Kyoto Encyclopedia of Genes and Genomes

LDL: Low-density lipoprotein

NRXN3: Neurexin 3

PDAC: Pancreatic ductal adenocarcinoma

QC: Quality control

SNP: Single nucleotide polymorphism

XGBoost: EXtreme Gradient Boosting

ZNF503: Zinc finger protein 503

References

  1. Hassan MM, Bondy ML, Wolff RA, Abbruzzese JL, Vauthey JN, Pisters PW, et al. Risk factors for pancreatic cancer: case-control study. Am J Gastroenterol. 2007;102(12):2696–707. https://doi.org/10.1111/j.1572-0241.2007.01510.x.

  2. Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer statistics, 2022. CA Cancer J Clin. 2022;72(1):7–33. https://doi.org/10.3322/caac.21708.

  3. Statistics Korea. Korean Statistical Information Service database: Cause of death statistics in 2021. 2022.

  4. Yadav D, Lowenfels AB. The epidemiology of pancreatitis and pancreatic cancer. Gastroenterology. 2013;144(6):1252–61. https://doi.org/10.1053/j.gastro.2013.01.068.

  5. Lynch SM, Vrieling A, Lubin JH, Kraft P, Mendelsohn JB, Hartge P, et al. Cigarette smoking and pancreatic cancer: a pooled analysis from the pancreatic cancer cohort consortium. Am J Epidemiol. 2009;170(4):403–13. https://doi.org/10.1093/aje/kwp134.

  6. Luo G, Jin K, Deng S, Cheng H, Fan Z, Gong Y, et al. Roles of CA19-9 in pancreatic cancer: biomarker, predictor and promoter. Biochim Biophys Acta Rev Cancer. 2021;1875(2):188409. https://doi.org/10.1016/j.bbcan.2020.188409.

  7. Ge L, Pan B, Song F, Ma J, Zeraatkar D, Zhou J, et al. Comparing the diagnostic accuracy of five common tumour biomarkers and CA19-9 for pancreatic cancer: a protocol for a network meta-analysis of diagnostic test accuracy. BMJ Open. 2017;7(12):e018175. https://doi.org/10.1136/bmjopen-2017-018175.

  8. Hwang JH, Voortman J, Giovannetti E, Steinberg SM, Leon LG, Kim YT, et al. Identification of microRNA-21 as a biomarker for chemoresistance and clinical outcome following adjuvant therapy in resectable pancreatic cancer. PLoS ONE. 2010;5(5):e10630. https://doi.org/10.1371/journal.pone.0010630.

  9. Hasin Y, Seldin M, Lusis A. Multi-omics approaches to disease. Genome Biol. 2017;18(1):83. https://doi.org/10.1186/s13059-017-1215-1.

  10. Pettini F, Visibelli A, Cicaloni V, Iovinelli D, Spiga O. Multi-omics model applied to cancer genetics. Int J Mol Sci. 2021;22(11):5751. https://doi.org/10.3390/ijms22115751.

  11. Faca VM, Song KS, Wang H, Zhang Q, Krasnoselsky AL, Newcomb LF, et al. A mouse to human search for plasma proteome changes associated with pancreatic tumor development. PLoS Med. 2008;5(6):e123. https://doi.org/10.1371/journal.pmed.0050123.

  12. Xiao Y, Bi M, Guo H, Li M. Multi-omics approaches for biomarker discovery in early ovarian cancer diagnosis. EBioMedicine. 2022;79:104001. https://doi.org/10.1016/j.ebiom.2022.104001.

  13. Comte B, Monnerie S, Brandolini-Bunlon M, Canlet C, Castelli F, Chu-Van E, et al. Multiplatform metabolomics for an integrative exploration of metabolic syndrome in older men. EBioMedicine. 2021;69:103440. https://doi.org/10.1016/j.ebiom.2021.103440.

  14. Saoi M, Sasaki K, Sagawa H, Abe K, Kogiso T, Tokushige K, et al. High throughput screening of serum γ-Glutamyl dipeptides for risk assessment of nonalcoholic steatohepatitis with impaired glutathione salvage pathway. J Proteome Res. 2020;19(7):2689–99. https://doi.org/10.1021/acs.jproteome.9b00405.

  15. Zheng Y, Yu B, Alexander D, Steffen LM, Boerwinkle E. Human metabolome associates with dietary intake habits among African Americans in the atherosclerosis risk in communities study. Am J Epidemiol. 2014;179(12):1424–33. https://doi.org/10.1093/aje/kwu073.

  16. Zierer J, Kastenmüller G, Suhre K, Gieger C, Codd V, Tsai PC, et al. African Americans in the atherosclerosis risk in communities study. Am J Epidemiol. 2014;179(12):1424–33. https://doi.org/10.1093/aje/kwu073.

  17. He M, Liu Y, Huang H, Wu J, Wu J, Wang R, et al. Serum aspartate aminotransferase is an adverse prognostic indicator for patients with resectable pancreatic ductal adenocarcinoma. Lab Med. 2023. https://doi.org/10.1093/labmed/lmad014.

  18. Zhang Z, Ma L, Geng H, Bian Y. Effects of smoking, and drinking on serum gamma-glutamyl transferase levels using physical examination data: a cross-sectional study in Northwest China. Int J Gen Med. 2021;14:1301–9. https://doi.org/10.2147/IJGM.S301900.

  19. Wannamethee SG, Shaper AG. Cigarette smoking and serum liver enzymes: the role of alcohol and inflammation. Ann Clin Biochem. 2010;47(Pt 4):321–6. https://doi.org/10.1258/acb.2010.009303.

  20. Csordas A, Bernhard D. The biology behind the atherothrombotic effects of cigarette smoke. Nat Rev Cardiol. 2013;10(4):219–30. https://doi.org/10.1038/nrcardio.2013.8.

  21. Yang D, Kim JW, Jeong H, Kim MS, Lim CW, Lee K, et al. Effects of maternal cigarette smoke exposure on the progression of nonalcoholic steatohepatitis in offspring mice. Toxicol Res. 2022;39(1):91–103. https://doi.org/10.1007/s43188-022-00153-1.

  22. Mayers JR, Wu C, Clish CB, Kraft P, Torrence ME, Fiske BP, et al. Elevation of circulating branched-chain amino acids is an early event in human pancreatic adenocarcinoma development. Nat Med. 2014;20(10):1193–8. https://doi.org/10.1038/nm.3686.

  23. Ferguson D, Eichler SJ, Yiew NKH, Colca JR, Cho K, Patti GJ, et al. Mitochondrial pyruvate carrier inhibition initiates metabolic crosstalk to stimulate branched chain amino acid catabolism. Mol Metab. 2023;70:101694. https://doi.org/10.1016/j.molmet.2023.101694.

  24. Sivanand S, Vander Heiden MG. Emerging roles for branched-chain amino acid metabolism in cancer. Cancer Cell. 2020;37(2):147–56. https://doi.org/10.1016/j.ccell.2019.12.011.

  25. Li X, Han M, Zhang H, Liu F, Pan Y, Zhu J, et al. Structures and biological functions of zinc finger proteins and their roles in hepatocellular carcinoma. Biomark Res. 2022;10(1):2. https://doi.org/10.1186/s40364-021-00345-1.

  26. Yin G, Liu Z, Wang Y, Sun L, Wang L, Yao B, et al. ZNF503 accelerates aggressiveness of hepatocellular carcinoma cells by down-regulation of GATA3 expression and regulated by microRNA-495. Am J Transl Res. 2019;11(6):3426–37.

  27. Shahi P, Wang CY, Lawson DA, Slorach EM, Lu A, Yu Y, et al. ZNF503/Zpo2 drives aggressive breast cancer progression by down-regulation of GATA3 expression. Proc Natl Acad Sci U S A. 2017;114(12):3169–74. https://doi.org/10.1073/pnas.1701690114.

  28. The Human Metabolome Database. https://hmdb.ca/metabolites/HMDB0244373

  29. Kamal N, Jafari Khamirani H, Dara M, Dianatpour M. NRXN3 mutations cause developmental delay, movement disorder, and behavioral problems: CRISPR edited cells based WES results. Gene. 2023;867:147347. https://doi.org/10.1016/j.gene.2023.147347.

  30. Zhao Y, Hong XH, Li K, Li YQ, Li YQ, He SW, et al. ZNF582 hypermethylation promotes metastasis of nasopharyngeal carcinoma by regulating the transcription of adhesion molecules Nectin-3 and NRXN3. Cancer Commun. 2020;40(12):721–37. https://doi.org/10.1002/cac2.12104.

  31. Fang Q, Strand A, Law W, Faca VM, Fitzgibbon MP, Hamel N, et al. Brain-specific proteins decline in the cerebrospinal fluid of humans with Huntington disease. Mol Cell Proteomics. 2009;8(3):451–66. https://doi.org/10.1074/mcp.M800231-MCP200.

Acknowledgements

Not applicable

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science and ICT of the Korea government (MSIT) [NRF-2022R1A6A3A01085831].

Author information

Contributions

YH designed the study, conducted experimental analyses, performed statistical analyses, wrote the draft, and revised the manuscript. KJJ designed the study and interpreted the data. UK performed statistical analyses and interpreted the data. CIJ conducted experimental analyses and interpreted the data. KL interpreted the data. SHJ designed the study and provided samples. All authors carefully reviewed the final manuscript and approved it for publication.

Corresponding author

Correspondence to Sun Ha Jee.

Ethics declarations

Ethical approval and consent to participate

All procedures in the studies involving human participants were performed in accordance with the ethical standards of the Institutional Review Board at the Yonsei University Health System under the Helsinki Declaration [IRB number: 4-2022-1136]. Paper-based informed consent forms, stored in a document system after obtaining the necessary signatures, were used to record the intent and to identify the will of the subjects to participate in the research.

Consent for publication

Not applicable.

Competing interests

The authors declare no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Figure S1. Heatmap of metabolite abundance in each group. Figure S2. Manhattan plot from GWAS. Figure S3. Moderation effect of smoking on association between metabolite and pancreatic cancer risk.

Additional file 2: Data S1. Python and R codes used in the current research. Data S2. Characteristics of the divided set from XGBoost.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Han, Y., Jung, K.J., Kim, U. et al. Non-invasive biomarkers for early diagnosis of pancreatic cancer risk: metabolite genomewide association study based on the KCPS-II cohort. J Transl Med 21, 878 (2023). https://doi.org/10.1186/s12967-023-04670-x

  • DOI: https://doi.org/10.1186/s12967-023-04670-x

Keywords