Bidirectional Mendelian randomization analysis of the genetic association between primary lung cancer and colorectal cancer

Background With the development and popularization of low-dose chest CT technology, the diagnosis and survival rates of patients with early lung cancer (LC) have significantly improved. The occurrence of colorectal cancer (CRC) as a second primary cancer (SPC) in primary lung cancer (PLC) survivors has become an essential factor affecting the prognosis of early LC. This study explored the potential genetic association between PLC and CRC, laying a foundation for developing SPC-CRC prevention strategies after primary early LC.
Methods Using a two-sample bidirectional Mendelian randomization (MR) design, this study systematically screened genetic instrumental variables (IVs) from genome-wide association studies (GWAS) of PLC and CRC, applied inverse variance weighted (IVW) as the main method to assess the incidence association between the two cancers, and used a variety of other MR methods for supplementary analysis. Finally, the genetic risk score (GRS) method was used for secondary analysis to further verify the robustness of the results.
Results In the forward MR analysis from LC to CRC, 20 genetic IVs for overall LC, 15 genetic IVs for squamous cell lung carcinoma (LUSC), and 10 genetic IVs for adenocarcinoma of the lung (LUAD) were screened. In the reverse MR analysis from CRC to LC, 47 genetic IVs for overall CRC, 37 for colon cancer, and 25 for rectal cancer were screened. The IVW method and a variety of other MR methods all found that overall LC and CRC were significantly associated at the genetic level. Subgroup analysis also showed that LUSC was associated with CRC. The results of the GRS method were consistent with those of the main analysis, confirming the robustness of the study.
Summary Our MR study found an association between LC and CRC, with an increased risk of SPC-CRC following PLC, especially LUSC. Our study provides an essential basis for the precise prevention of SPC-CRC after PLC, suggesting that clinicians should pay more attention to the population with a history of PLC, monitor closely for the incidence of SPC-CRC, and carry out intervention and treatment as soon as possible.
Supplementary Information The online version contains supplementary material available at 10.1186/s12967-023-04612-7.


Introduction
Lung cancer (LC) is a highly prevalent malignancy and is the foremost cause of cancer-related mortality worldwide [1]. Non-small-cell lung cancer (NSCLC) is the predominant histological subtype of LC, accounting for 76% of LC. It encompasses a diverse range of cancer types, with the largest subgroups being adenocarcinoma of lung (LUAD) and squamous cell lung carcinoma (LUSC) [2]. With the advancement and widespread utilization of low-dose chest CT, the diagnostic rate for primary LC (PLC) has significantly increased, leading to a substantial number of patients being diagnosed with early-stage LC. Statistics from the Japanese Joint Committee of LC Registry Database indicate that in 2010, 18,973 patients received treatment for PLC in Japan. Among them, stage I patients accounted for 78.9% of the total [3,4]. These data suggest that early-stage LC will become the predominant population for LC management with the widespread implementation of low-dose chest CT screening in high-risk groups.
Currently, surgery is the recommended treatment for patients diagnosed with stage I-IIIA NSCLC [5]. Lobectomy is considered the standard surgical approach and has been associated with a 5-year overall survival rate of 77-92% for clinical stage IA, 68% for IB, 60% for IIA, 53% for IIB, and 36% for IIIA [6]. In recent years, the Japan Clinical Oncology Group (JCOG) has conducted a series of prospective clinical studies on surgical treatment strategies for early-stage LC. The most influential of these, JCOG0802, enrolled stage IA LC patients whose tumors were less than 2 cm in diameter with a solid component greater than 50%. The findings indicate that segmental resection and lobectomy have comparable efficacy, with 5-year survival rates exceeding 90% (94.3% for segmental resection vs 91.1% for lobectomy). Further analysis of the causes of death revealed that second primary cancer (SPC) was the second leading cause of mortality after LC itself. SPC was also one of the main factors contributing to the better 5-year survival rate of patients undergoing segmental resection compared with lobectomy, with colorectal cancer (CRC) being the most common type among all SPCs [7]. In addition, the National Cancer Institute conducted a multicenter intergroup trial for NSCLC, revealing that approximately 15% of stage I patients develop SPCs. Of particular concern in post-operative early-stage NSCLC patients is CRC, which ranks as the second most lethal SPC [8]. The studies above indicate that as the early diagnosis and treatment system for PLC gradually improves, patients can attain long-term survival following surgery, but the occurrence of SPCs poses a significant threat to postoperative survival. Observational studies suggest that CRC is one of the main types of SPC after PLC surgery. However, because of the inherent limitations of observational studies, such as confounding factors, whether there is an association between PLC and the development of CRC at the genetic level remains to be established [9].
Mendelian randomization (MR) is a widely utilized method of etiological inference in genetic epidemiology [10]. In recent years, with the further development of MR research methods, MR has increasingly become an ideal approach for gene-level studies to infer pathogenic associations between two complex diseases. For example, in 2021, Li et al. explored the association between rheumatoid arthritis and Parkinson's disease using a two-sample MR analysis based on a large-sample genome-wide association study (GWAS) [11]. In the same year, Zhu et al. used MR to investigate the association between polycystic ovary syndrome and breast cancer and found that polycystic ovary syndrome was strongly associated with the development of triple-negative breast cancer [12].
In this study, we aim to utilize GWAS data of PLC and CRC to elucidate the correlation between these two cancers at the genetic level through a bidirectional two-sample MR analysis. Our study will provide a foundation for developing prevention strategies for CRC after early-stage PLC surgery in clinical practice.

Materials and methods
An overview of the MR study design is displayed in Fig. 1. We estimated the causal effects between LC and CRC using inverse variance weighted (IVW), which was the primary method of analysis in this study. We used the genetic risk score (GRS) to validate the main results. We also applied various two-sample MR sensitivity analysis methods to validate the results, including simple median, weighted median, MR-robust adjusted profile score (MR-RAPS), and MR-pleiotropy residual sum and outlier (MR-PRESSO).

Sources of data
The genetic instrumental variables (IVs) for LC were derived from the largest PLC GWAS to date, published by James D. McKay and colleagues, which genotyped 14,803 cases and 12,262 controls of European descent on the OncoArray and combined the results with previously published data in an aggregated GWAS analysis of LC on 29,266 patients and 56,450 controls [13]. For the reverse analysis, we obtained CRC-risk genetic IVs from two recent meta-analyses of GWASs on CRC risk [14]. The GWAS summary statistics of LC and CRC were downloaded from the public website "open GWAS" (https://gwas.mrcieu.ac.uk/). We used only freely accessible summarized data in this study; therefore, this study did not require ethical approval.

Selection of IVs
The MR analysis evaluates the effect of a predictor on an outcome. A valid IV must satisfy three assumptions: it must be (a) associated with the exposure (the "relevance" assumption); (b) independent of the outcome given the exposure (the "exclusion restriction"); and (c) independent of all confounders, both observed and unobserved (the "exchangeability" assumption) [15,16]. If an IV is associated with a confounder of the exposure and outcome, these assumptions are violated, which may lead to bias and erroneous conclusions. Therefore, genetic IVs for overall LC, LUSC, LUAD, overall CRC, colon cancer and rectal cancer were constructed according to the following criteria [17]: (a) r² measure of linkage disequilibrium (LD) among IVs < 0.01 within a 500 kb window (genetic variants in close genomic locations tend to be co-inherited, a phenomenon known as LD; when LD exists among the genetic variants used as IVs, the information they provide is not independent and the effect estimate will be biased); (b) P value below the genome-wide significance level of the corresponding study (5 × 10⁻⁸, the threshold used in GWAS to indicate an association between a single-nucleotide polymorphism (SNP) and disease); (c) minor allele frequency (MAF) > 0.01 (the variant is present in more than 1% of the population); (d) non-palindromic SNPs (palindromic SNPs are those whose alleles read the same on the forward and reverse strands of DNA, in opposite directions; for such SNPs it is often not possible to infer from allele frequency whether the effect allele is reported on the forward or the reverse strand); (e) removal of IVs associated with confounding factors using PhenoScanner (in an MR analysis, an IV may be associated with the outcome through confounding factors, and if such associations are not excluded, the results will be affected) [18].
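The screening criteria above can be sketched as a simple filter over summary statistics. This is a minimal illustration, not the pipeline used in the study: the `Snp` record, the `screen_ivs` helper, and the `rs_demo_*` identifiers and values are hypothetical, LD clumping (criterion a) is assumed to have been done upstream, and the confounder list is assumed to come from a separate PhenoScanner lookup.

```python
from dataclasses import dataclass

P_THRESHOLD = 5e-8      # genome-wide significance level (criterion b)
MAF_THRESHOLD = 0.01    # minimum minor allele frequency (criterion c)
PALINDROMIC_PAIRS = {frozenset("AT"), frozenset("CG")}


@dataclass
class Snp:
    rsid: str
    effect_allele: str
    other_allele: str
    eaf: float   # effect allele frequency
    pval: float  # P value for the SNP-exposure association


def is_palindromic(snp: Snp) -> bool:
    """True when the two alleles are complementary bases (A/T or C/G)."""
    alleles = frozenset((snp.effect_allele.upper(), snp.other_allele.upper()))
    return alleles in PALINDROMIC_PAIRS


def screen_ivs(snps, confounder_rsids=frozenset()):
    """Apply criteria (b) to (e); LD clumping (a) is assumed done upstream."""
    kept = []
    for snp in snps:
        maf = min(snp.eaf, 1.0 - snp.eaf)
        if snp.pval >= P_THRESHOLD:
            continue  # not genome-wide significant
        if maf <= MAF_THRESHOLD:
            continue  # variant too rare
        if is_palindromic(snp):
            continue  # strand cannot be resolved reliably
        if snp.rsid in confounder_rsids:
            continue  # associated with a known confounder
        kept.append(snp)
    return kept


# Hypothetical candidates, one per exclusion rule plus one that passes.
candidates = [
    Snp("rs_demo_1", "A", "G", 0.30, 1e-9),
    Snp("rs_demo_2", "A", "T", 0.30, 1e-9),    # palindromic
    Snp("rs_demo_3", "C", "T", 0.005, 1e-9),   # MAF too low
    Snp("rs_demo_4", "C", "T", 0.40, 1e-5),    # P value too high
    Snp("rs_demo_5", "G", "A", 0.20, 1e-10),   # flagged as confounder-associated
]
selected = screen_ivs(candidates, confounder_rsids={"rs_demo_5"})
print([s.rsid for s in selected])
```

Keeping each rule as a separate early exit makes it easy to count how many variants each criterion removes, which is how the Results section reports the screening.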

MR analyses
The principal analyses were conducted using the inverse variance weighted (IVW) approach. The IVW method, the most commonly used method for MR analysis, uses a meta-analytic approach to combine the ratio estimates of individual SNPs, weighted by their inverse variances, into a single estimate of the effect of the risk factor on the outcome [19,20]. A ratio estimate is the effect of a single SNP on the outcome divided by its effect on the risk factor (with all associations assumed to be log-linear) [21]. The IVW method provides reliable estimates when all IVs are valid, that is, when they meet the three core MR assumptions described above. IVW methods include the fixed-effects IVW and the random-effects IVW. If heterogeneity exists in the MR analysis, we apply the random-effects IVW, which is less prone to bias from weaker SNP-exposure associations [22]. Additionally, the weighted median, simple median, MR-PRESSO, MR-RAPS and MR-Egger methods are used to assess whether LC and CRC are associated at the genetic level, and P < 0.05 is considered statistically significant.
The weighted median and simple median methods have a high tolerance for pleiotropic genetic variants and can obtain relatively stable effect values even when nearly half of the IVs are invalid. The key distinction between the two methods lies in how the median is computed: the simple median method assigns equal weight to all ratio estimates, whereas the weighted median method assigns a weight to each estimate [22,23]. The MR-PRESSO method assumes that at least 50% of the genetic variants are valid IVs and that horizontal pleiotropy satisfies the InSIDE (Instrument Strength Independent of Direct Effect) assumption. In addition to identifying outlier genetic IVs, the MR-PRESSO method can also provide an adjusted estimate after removal of the outlier variants [24]. In summary, the MR-PRESSO approach has three primary components [23,25]: (1) the "MR-PRESSO global test" to identify the extent of horizontal pleiotropy; (2) the "MR-PRESSO outlier test" to exclude aberrant genetic variants (outliers) and estimate the corrected results; (3) the "MR-PRESSO distortion test" to assess whether the estimates before and after correction differ.
MR-RAPS with a Huber loss function can model the random-effects distribution of pleiotropic effects. Taking into account both systematic and idiosyncratic pleiotropy, the MR-RAPS method performed well in numerical evaluations. It is recommended as a practical tool for routine MR analysis, especially when both the exposure and the outcome are complex traits [26]. The MR-Egger regression method performs a weighted linear regression of the outcome coefficients on the exposure coefficients; it can detect some violations of the standard instrumental variable assumptions and provides an effect estimate that is robust to some of these violations [27].
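The ratio, IVW and median estimators described above can be sketched from summary statistics alone. The code below is an illustrative sketch, not the software used in the study: it assumes first-order standard errors for the ratio estimates (outcome standard error divided by the absolute exposure effect), a multiplicative random-effects correction for IVW, and the interpolated weighted median; all function names and the numbers in the usage lines are hypothetical.

```python
import math


def wald_ratios(beta_exp, beta_out, se_out):
    """Per-SNP ratio estimates with first-order standard errors."""
    ratios = [bo / be for be, bo in zip(beta_exp, beta_out)]
    ses = [so / abs(be) for be, so in zip(beta_exp, se_out)]
    return ratios, ses


def ivw(beta_exp, beta_out, se_out, random_effects=False):
    """Inverse-variance weighted average of the ratio estimates."""
    ratios, ses = wald_ratios(beta_exp, beta_out, se_out)
    weights = [1.0 / s ** 2 for s in ses]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    se = math.sqrt(1.0 / total)
    if random_effects and len(ratios) > 1:
        # Multiplicative random effects: inflate the SE by the
        # over-dispersion Q / (n - 1), never shrinking it below 1.
        q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
        se *= math.sqrt(max(1.0, q / (len(ratios) - 1)))
    return estimate, se


def weighted_median(values, weights):
    """Interpolated weighted median; equal weights give the simple median."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    v = [values[i] for i in order]
    w = [weights[i] for i in order]
    total = sum(w)
    cum, running = [], 0.0
    for wi in w:
        running += wi
        cum.append((running - 0.5 * wi) / total)  # centred percentile
    if cum[0] >= 0.5:
        return v[0]
    for k in range(1, len(v)):
        if cum[k] >= 0.5:
            frac = (0.5 - cum[k - 1]) / (cum[k] - cum[k - 1])
            return v[k - 1] + frac * (v[k] - v[k - 1])
    return v[-1]


# Hypothetical log-odds summary statistics for three SNPs.
beta_exp = [0.10, 0.20, 0.15]
beta_out = [0.02, 0.05, 0.03]
se_out = [0.01, 0.01, 0.02]

log_or, se = ivw(beta_exp, beta_out, se_out, random_effects=True)
ratios, ses = wald_ratios(beta_exp, beta_out, se_out)
print("IVW odds ratio:", math.exp(log_or))
print("Simple median:", weighted_median(ratios, [1.0] * len(ratios)))
print("Weighted median:", weighted_median(ratios, [1.0 / s ** 2 for s in ses]))
```

Because the associations are log-linear, the pooled estimate is a log odds ratio and is exponentiated for reporting.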

Genetic risk scores (GRSs)
To validate the above MR results, we conducted a secondary analysis by applying the GRS method. We conducted the analyses utilizing R (version 3.5.3) with the "gtx" R package (version 0.0.8 for Windows), whose grs.summary module has the GRS function. The grs.summary module uses only single-SNP association summary data obtained from the results of the GWAS analysis, which is similar to a method that regresses an outcome onto an additive GRS [25,28]. For uncorrelated SNPs, the causal estimate α can be estimated by
$$\alpha \approx \frac{\sum_i \omega_i \beta_i \, se_{\beta_i}^{-2}}{\sum_i \omega_i^{2} \, se_{\beta_i}^{-2}},$$
and the standard error se α can be estimated by
$$se_{\alpha} \approx \sqrt{\frac{1}{\sum_i \omega_i^{2} \, se_{\beta_i}^{-2}}}.$$
Here, ω denotes the estimated effects on the intermediate trait or biomarker, and β values are estimated effects on the response variable or outcome with standard errors se β [28].
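The summary-statistic GRS estimate can be illustrated in a few lines. This sketch assumes the usual approximation for uncorrelated SNPs, α ≈ Σ ω β se_β⁻² / Σ ω² se_β⁻² with se_α ≈ sqrt(1 / Σ ω² se_β⁻²); it is not the gtx implementation, and the function name `grs_summary_estimate`, the normal-approximation P value and the input numbers are assumptions made for illustration.

```python
import math


def grs_summary_estimate(omega, beta, se_beta):
    """Causal estimate, standard error and two-sided P value from summary data.

    omega   : SNP effects on the exposure (intermediate trait)
    beta    : SNP effects on the outcome
    se_beta : standard errors of the outcome effects
    """
    inv_var = [1.0 / s ** 2 for s in se_beta]
    numerator = sum(o * b * iv for o, b, iv in zip(omega, beta, inv_var))
    denominator = sum(o ** 2 * iv for o, iv in zip(omega, inv_var))
    alpha = numerator / denominator
    se_alpha = math.sqrt(1.0 / denominator)
    # Two-sided P value under a normal approximation (an assumption here).
    z = alpha / se_alpha
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return alpha, se_alpha, p_value


# Hypothetical inputs for three uncorrelated SNPs.
alpha, se_alpha, p_value = grs_summary_estimate(
    omega=[0.10, 0.20, 0.15],
    beta=[0.02, 0.05, 0.03],
    se_beta=[0.01, 0.01, 0.02],
)
print(alpha, se_alpha, p_value)
```

With first-order weights this estimate coincides with the fixed-effects IVW estimate, which is why the GRS analysis serves as a consistency check rather than an independent method.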

Horizontal pleiotropy and heterogeneity test
MR-Egger regression and Cochran's Q test were applied to assess pleiotropy and heterogeneity, respectively. An MR-Egger intercept with a P value of less than 0.05 was taken to indicate possible horizontal pleiotropy. If the P value of Cochran's Q test was less than 0.05, the final MR results were taken from the multiplicative random-effects IVW model. Leave-one-out sensitivity analysis was also performed to further assess the independent influence of each IV. We considered a P value of less than 0.05 to indicate a statistically significant genetic association between exposures and outcomes. The strength of the association between each SNP and the exposure was evaluated using the F statistic [29]. An IV was not considered weak if its F statistic was > 10 (Additional file 3: Table S2).
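The heterogeneity and instrument-strength checks can be sketched as follows. This is an illustrative sketch under stated assumptions: Cochran's Q is computed on the per-SNP ratio estimates around their inverse-variance weighted mean, the P value comes from a chi-square distribution with n − 1 degrees of freedom, and the per-SNP F statistic uses the common approximation (β/se)², which the paper does not spell out. The helper names and example numbers are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution for integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y)
    # Recurrence Q(a + 1, y) = Q(a, y) + y^a e^{-y} / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def cochran_q(ratios, ses):
    """Cochran's Q, its degrees of freedom and P value."""
    weights = [1.0 / s ** 2 for s in ses]
    pooled = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - pooled) ** 2 for w, r in zip(weights, ratios))
    df = len(ratios) - 1
    return q, df, chi2_sf(q, df)


def f_statistic(beta_exposure, se_exposure):
    """Approximate per-SNP F statistic; values above 10 suggest a strong IV."""
    return (beta_exposure / se_exposure) ** 2


# Hypothetical ratio estimates and standard errors for four SNPs.
q, df, p = cochran_q([0.20, 0.25, 0.10, 0.40], [0.10, 0.05, 0.08, 0.12])
use_random_effects = p < 0.05   # switch to random-effects IVW if heterogeneous
print(q, df, p, use_random_effects)
print(f_statistic(0.10, 0.02))
```

The `use_random_effects` flag mirrors the decision rule in the text: a Cochran's Q P value below 0.05 leads to the multiplicative random-effects IVW model.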

MR analysis results of CRC to LC

Screen and validation of IVs
In the CRC to LC MR analysis, 56 overall CRC, 45 colon cancer and 29 rectal cancer IVs reached genome-wide significance in the GWAS (5 × 10⁻⁸). A single palindromic SNP was identified within the SNP datasets (overall CRC, colon cancer and rectal cancer: rs11874392). Based on the LD status between genetic variant loci, 50, 39, and 25 independent IVs associated with overall CRC, colon cancer and rectal cancer, respectively, were selected (5 overall CRC, 5 colon cancer and 3 rectal cancer IVs were not LD independent; r² < 0.01, window = 500 kb). IVs associated with confounders were removed using the PhenoScanner database (smoking: rs597808; alcohol consumption: rs174533; BMI: rs1446585, rs597808, rs174533, rs1446585). Ultimately, we identified 47 genetic IVs for overall CRC, 37  S2).
In our MR study of overall CRC to LUSC, we did not observe a significant genetic correlation between the two diseases (IVW: OR = 1.1206; 95% CI 0.

MR results of colon cancer to LC
In  4E, 5E).

Horizontal pleiotropy and heterogeneity test
In the LC overall and LUSC to rectal cancer MR analysis, Cochran's Q tests showed some heterogeneity among the LC overall and LUSC IVs (LC overall: Q = 40.737, P = 0.003; LUSC: Q = 32.833, P = 0.003; Additional file 2: Table S1). The leave-one-out plot indicated that no single SNP drove the genetic association in the LC overall and LUSC to rectal cancer MR analysis (Additional file 1: Fig. S1). No heterogeneity was found in any other MR analysis group.
The MR-Egger regression analysis showed that horizontal pleiotropy of the IVs was present in the LUAD to CRC overall and colon cancer MR analysis (CRC overall: P = 0.019; colon cancer: P = 0.048; Additional file 2: Table S1). No IVs with horizontal pleiotropy were found by the MR-PRESSO method in the LUAD to CRC overall and colon cancer MR analysis. No horizontal pleiotropy was found in any other MR analysis group. 1). The result of GRS CRC to LC was consistent with the above MR results of CRC to LC.

Discussion
SPC refers to the occurrence of a new primary cancer in an individual previously diagnosed with and treated for another cancer. In recent years, advancements in cancer prevention, diagnosis, and treatment have significantly increased the number of early-stage cancer patients receiving prompt and effective care. As a result, long-term survival has improved notably, with 14.5 million individuals surviving early-stage cancers in the United States alone in 2014 [24]. Previous research has demonstrated that the incidence of SPC is significantly higher in cancer patients than in the general population and tends to increase with longer survival times. After 20 years or more of follow-up, over 19% of patients are likely to develop SPC [33]. Regarding PLC, early-stage patients have a 1.7-fold higher risk of developing SPC than the general population, and approximately 13.4-22% of patients will develop SPC [34,35]. As the incidence of SPC following early LC surgery is progressively increasing, researchers have shown significant interest in studying the morbidity, treatment, and prognosis of SPCs. Given that CRC has the highest morbidity and mortality rate among SPCs, investigating the association between PLC and CRC can aid in identifying high-risk patients for early screening after LC surgery and providing timely and effective treatment, ultimately improving patient survival rates.
The etiology of SPC remains uncertain, and observational studies indicate that genetic predisposition, environmental influences, and lifestyle choices may all contribute to the development of SPC. Previous observational studies have suggested a possible association between PLC and CRC [36]. However, owing to various confounding factors and the challenges of conducting large-scale case-control and cohort studies, whether there is indeed an association between PLC and CRC, and to what extent, remains to be explored. A study by Zhou et al. [37], based on the SEER database, reported that patients with LC had a 19% higher risk of developing CRC than the general population, and patients with LUSC had a 38% higher risk of CRC than the general population, whereas there was no difference in the risk of CRC between patients with LUAD and the general population. In contrast, Su et al.'s retrospective study found no increased risk of CRC among survivors of PLC [38]. Meanwhile, in 2009, Noura et al. surveyed 301 patients with CRC to assess the occurrence of post-operative SPC (extra-CRC). The results showed that the incidence of postoperative extra-CRC in CRC patients was significantly higher than that in the general population, especially for LC. During the 10-year follow-up period, a total of 40 cases of secondary primary extra-CRC (including LC, stomach cancer, liver cancer, etc.) occurred, of which 8 cases (20%) were LC, ranking first [39]. The present study takes a novel approach, using a two-sample MR design to explore the association between PLC and CRC.
In our study, we identified, for the first time, a significant association between CRC and the occurrence of overall LC and LUSC through a stratified two-sample MR analysis of PLC. We found an increased risk of SPC-CRC following PLC, especially LUSC. As for the underlying reasons, a PLC GWAS conducted by McKay et al. in 2017 demonstrated significant genomic differences between LUAD and LUSC, despite both belonging to NSCLC, suggesting potentially distinct mechanisms for the development of LUAD and LUSC [13]. Furthermore, multiple previous studies have indicated the presence of shared signaling pathways between LUSC and CRC, such as the PI3K pathway [40,41] and the FGFR1 pathway [42,43], implying potential common genetic origins and developmental processes between these two cancer types.
The 2021 United States Preventive Services Task Force (USPSTF) [44] recommends that all adults aged 50 to 75 undergo CRC screening. For individuals with a family history of CRC, obesity, long-term smoking, or heavy alcohol consumption, regular screening is recommended because of the higher risk of developing CRC. Additionally, even in the absence of these risk factors, the USPSTF recommends starting CRC screening at age 45, with options including annual high-sensitivity guaiac-based fecal occult blood test (gFOBT) or fecal immunochemical test (FIT), stool DNA-FIT testing every 1 to 3 years, computed tomography colonography every 5 years, flexible sigmoidoscopy every 5 years, flexible sigmoidoscopy every 10 years combined with annual FIT, and colonoscopy every 10 years. Our research conclusions are consistent with the results of previous observational studies [37]. Therefore, for individuals with a history of PLC, regular screening should be conducted, including fecal occult blood test, digital rectal examination, and colonoscopy. Close attention should be paid to the occurrence of SPC-CRC in order to initiate early intervention and treatment.
Our MR study has several advantages. Firstly, to the best of our knowledge, this is the first study to evaluate the genetic association between LC and CRC based on a two-sample MR analysis with large-scale GWAS data. Compared with previous observational studies, MR analysis can effectively reduce potential bias, including confounding and reverse causation, thus strengthening causal inference. Secondly, the GWAS datasets of LC and CRC were predominantly based on populations of European ancestry, which helps minimize the impact of population stratification. Furthermore, we systematically screened confounding factors associated with PLC and CRC using the PhenoScanner database and eliminated IVs associated with confounding factors to avoid potential horizontal pleiotropy of the genetic IVs. Meanwhile, the MR-Egger and MR-PRESSO (outlier-corrected) methods were used to further examine the influence of pleiotropy and ensure the reliability of the results [45,46]. In addition, Cochran's Q test and the leave-one-out method were employed to examine heterogeneity among IVs. If Cochran's Q test detected no significant heterogeneity, a fixed-effects IVW model was used for association estimation; if significant heterogeneity existed, a random-effects IVW model was applied to ensure the accuracy of the results [22,47]. Finally, besides employing the IVW method as the primary analysis approach, we also used the GRS method as a secondary analysis. Moreover, various complementary MR methods were employed to ensure the accuracy of the results, including the weighted median, simple median, MR-RAPS, and MR-PRESSO methods.
However, we would like to acknowledge some limitations. Firstly, the study included a single population (predominantly of European ancestry), and the generalizability of the results to the whole population remains to be verified. Secondly, although a series of strict steps were used to identify outlier variants and avoid horizontal pleiotropy, we are still unable to completely eliminate its impact, which may be due to the complex and unclear biological function of many genetic variants. Thirdly, the analysis of LC and rectal cancer achieved a statistical power of more than 80%, whereas the analysis of LC and colon cancer had a statistical power of less than 80%. Larger sample sizes and more advanced methods are needed to corroborate the results with adequate statistical power. Finally, GWAS can provide new insights into the genes involved in PLC-CRC, but mechanistic studies are needed to better understand the pathophysiology.
In summary, this study has established a genetic association between PLC and CRC, which provides an essential basis for the precise prevention of SPC-CRC after PLC, suggesting that we should pay more attention to the incidence of SPC-CRC and carry out intervention and treatment as soon as possible.

Fig. 1 Study design and overview of our Mendelian randomization (MR) study. LC lung cancer, CRC colorectal cancer, MAF minor allele frequency, IVW inverse-variance weighted, MR-PRESSO Mendelian Randomization Pleiotropy RESidual Sum and Outlier, MR-RAPS Mendelian Randomization robust adjusted profile score, GRS Genetic risk scores, LUAD adenocarcinoma of lung, LUSC squamous cell lung carcinoma

Fig. 2 Forest plot of the Two-Sample Mendelian Randomization study based on the MR method from LC to CRC. A, B, C Mendelian randomization estimates of genetically predicted overall LC on CRC (A), CC (B) and RC (C) risk. D, E, F Mendelian randomization estimates of genetically predicted LUSC on CRC (D), CC (E) and RC (F) risk. G, H, I Mendelian randomization estimates of genetically predicted LUAD on CRC (G), CC (H) and RC (I) risk. LC lung cancer, CRC colorectal cancer, IVW inverse variance weighted, MR-PRESSO Mendelian Randomization Pleiotropy RESidual Sum and Outlier, MR-RAPS Mendelian Randomization robust adjusted profile score, LUAD adenocarcinoma of lung, LUSC squamous cell lung carcinoma, CC colon cancer, RC rectal cancer

Fig. 3 The scatterplots represent genetic IVs association between LC and CRC (Forward MR analysis). A, B, C Plots of the effect size of each single nucleotide polymorphism (SNP) of overall LC on CRC (A), CC (B) and RC (C) risk. D, E, F Plots of the effect size of each single nucleotide polymorphism (SNP) of LUSC on CRC (D), CC (E) and RC (F) risk. G, H, I Plots of the effect size of each single nucleotide polymorphism (SNP) of LUAD on CRC (G), CC (H) and RC (I) risk. LC lung cancer, CRC colorectal cancer, IVW inverse variance weighted, LUAD adenocarcinoma of lung, LUSC squamous cell lung carcinoma, CC colon cancer, RC rectal cancer

Fig. 5 The scatterplots represent genetic IVs association between CRC and LC (Reverse MR analysis). A, B, C Plots of the effect size of each single nucleotide polymorphism (SNP) of CRC on overall LC (A), LUSC (B) and LUAD (C) risk. D, E, F Plots of the effect size of each single nucleotide polymorphism (SNP) of CC on overall LC (D), LUSC (E) and LUAD (F) risk. G, H, I Plots of the effect size of each single nucleotide polymorphism (SNP) of RC on overall LC (G), LUSC (H) and LUAD (I) risk. LC lung cancer, CRC colorectal cancer, IVW inverse variance weighted, LUAD adenocarcinoma of lung, LUSC squamous cell lung carcinoma, CC colon cancer, RC rectal cancer
colon cancer and overall LC MR study, we did not obtain any statistically significant association between colon cancer and overall LC at genetic level (IVW:

Table 1
The effects of the GRS LC on CRC and the GRS CRC on LC