Identification of a novel pathogenic MSH2 mutation and new DNA repair gene variants: investigation of a Tunisian Lynch syndrome family with discordant twins

Background Lynch syndrome (LS) is a highly penetrant inherited cancer predisposition syndrome, characterized by autosomal dominant inheritance and germline mutations in DNA mismatch repair genes. Although many genetic variants have been identified in various populations, penetrance is highly variable and the reasons for this have not been fully elucidated. This study investigates whether, in addition to pathogenic mutations, environmental and low-penetrance genetic risk factors may modify the phenotype in a Tunisian LS family. Patients and methods A Tunisian family with a strong history of colorectal cancer (CRC), fulfilling the Amsterdam I criteria for the diagnosis of Lynch syndrome, was referred for oncogenetic counseling. The index case was a man diagnosed with CRC at the age of 33 years. His monozygotic twin was diagnosed with Crohn disease at the age of 35 years. His paternal uncle was diagnosed with CRC at 47 years of age. Immunohistochemical (IHC) labeling for the four proteins of the mismatch repair (MMR) system (MLH1, MSH2, MSH6 and PMS2) was performed for the index case. Targeted sequencing of MSH2, MLH1 and a panel of 85 DNA repair genes was performed for the index case and his unaffected father. Results IHC showed loss of MSH2 expression but not of MLH1, MSH6 or PMS2 expression. Genomic DNA screening by targeted DNA repair gene sequencing revealed a pathogenic MSH2 mutation (c.1552C>T; p.Q518X), confirmed by Sanger sequencing. This mutation was considered the likely cause of the loss of MSH2 expression and was found in first- and second-degree relatives. The index case smokes and consumes alcohol. Moreover, he carries numerous variants in other DNA repair genes that are not shared with his unaffected father. Conclusion In the investigated Tunisian family, we confirmed LS by IHC, molecular and in silico investigations. 
We identified a novel pathogenic mutation described for the first time in Tunisia. These results enrich the spectrum of previously reported pathogenic mutations in LS families. Our study provides new arguments for the interpretation of MMR expression patterns and highlights new risk modifier genes potentially implicated in CRC. The twin discordance reported in this work underscores that disease penetrance could be influenced by both genetic background and environmental factors. Electronic supplementary material The online version of this article (10.1186/s12967-019-1961-9) contains supplementary material, which is available to authorized users.


Background
The surge in colorectal cancer (CRC) incidence among young adults is particularly alarming. Among patients with early-onset CRC, approximately 30% have tumors harboring mutations associated with hereditary cancer predisposition syndromes, and 20% have familial CRC [1].
Lynch syndrome (LS) is considered the most common form of hereditary CRC [2]. It is an autosomal dominant syndrome subdivided into LS I, or site-specific colonic cancer, and LS II, in which colonic cancer is accompanied by extracolonic cancers such as gastric, endometrial, biliary, pancreatic and urinary tract carcinomas [3].
This syndrome accounts for 2 to 6% of all CRC and is known to increase the risk of other cancers in family members. The estimated lifetime risk ranges from 50 to 80% for CRC and from 40 to 60% for endometrial cancer [4]. Because LS has no specific clinical symptoms, there is an important need to identify consistent molecular markers for its early diagnosis and prognosis. In addition, it is of crucial importance to identify the mutational profile associated with LS in the Tunisian population, so that oncogenetic counseling can rely on genetic tests specific to this population. This will help detect individuals and families at high risk of developing LS early and will consequently reduce disease-related mortality and morbidity. Indeed, LS guidelines outline specific surveillance and monitoring protocols based on MMR gene mutation testing and MMR protein expression profiles [5,6]. The MMR system is composed of four proteins working in pairs (the MLH1/PMS2 and MSH2/MSH6 heterodimers). These dimers migrate to the nucleus to bind DNA. Heterodimer formation is crucial for the stability, nuclear migration and function of the complex [7].
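Because each MMR protein works within a heterodimer, IHC loss patterns are usually read in pairs. The following minimal Python sketch encodes the widely used rule that loss of an obligate partner (MLH1 or MSH2) explains co-loss of its dependent partner (PMS2 or MSH6), while isolated loss of PMS2 or MSH6 points to that gene itself. The function name `suspected_gene` is hypothetical, and the sketch deliberately ignores MLH1 promoter hypermethylation, EPCAM deletions and equivocal or cytoplasmic staining.

```python
from typing import Set

MMR_PROTEINS = {"MLH1", "PMS2", "MSH2", "MSH6"}
# Obligate partner -> dependent partner in each heterodimer (MutL-alpha, MutS-alpha).
HETERODIMERS = {"MLH1": "PMS2", "MSH2": "MSH6"}


def suspected_gene(lost: Set[str]) -> str:
    """Suggest the MMR gene most likely altered, given proteins with lost nuclear staining.

    Hypothetical helper: ignores MLH1 promoter hypermethylation, EPCAM deletions
    and equivocal staining, all of which matter in practice.
    """
    lost = {p.upper() for p in lost}
    unknown = lost - MMR_PROTEINS
    if unknown:
        raise ValueError(f"not an MMR protein: {sorted(unknown)}")
    if not lost:
        return "pMMR (no loss)"
    # Loss of an obligate partner explains co-loss of its dependent partner.
    for primary, dependent in HETERODIMERS.items():
        if primary in lost and lost <= {primary, dependent}:
            return primary
    # Isolated loss of a dependent partner points to that gene itself.
    if len(lost) == 1:
        return next(iter(lost))
    return "unusual pattern: review staining"


print(suspected_gene({"MSH2", "MSH6"}))  # MSH2
print(suspected_gene({"PMS2"}))          # PMS2
```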
LS is characterized by point mutations and/or large rearrangements in DNA MMR genes [8], resulting in loss of MMR complex function and microsatellite instability (MSI) [9,10]. High-penetrance mutations confer predisposition to CRC in hereditary syndromes, with a risk of developing CRC of about 50-80% [11,12]. However, LS penetrance varies widely, and this variability depends essentially on low-penetrance mutations and environmental factors. Recently, germline mutations in DNA repair genes (DRGs) have been reported in sporadic CRC, but their contribution to CRC risk and susceptibility is still unclear. Germline mutations in DRGs previously known to be linked to other inherited diseases could be involved in familial CRC predisposition [13]. Moreover, both germline and somatic variants in the exonuclease domains of DNA polymerase ε (POLE) and polymerase δ (POLD1) have been reported to affect proofreading function and lead to an ultramutated phenotype [14]. Germline POLE variants can result in an LS phenotype and microsatellite-unstable CRCs. The exact effect of germline POLE/POLD1 variants, however, remains unclear [14][15][16][17].
The identification of an inherited mutation plays a crucial role in identifying individuals and families at risk for LS, who are then referred for oncogenetic counseling [18]. However, neither the mutated MMR gene nor the mutation type is associated with the age of onset or the cancer type [3]. It is therefore of crucial importance to search for other DRGs that could increase CRC risk in patients with a strong family history. Moreover, understanding the underlying genotype-phenotype correlation in LS provides significant insights for oncogenetic counseling in familial CRC.
So far, very few clinical studies and genetic reports on Tunisian patients with LS have been published [19,20]. Moussa et al. [20] identified pathogenic MMR gene mutations in only 11 of 31 Tunisian families with suspected LS. Given the limitations of this previous Tunisian study, the list of CRC susceptibility genes could be expanded with new DNA repair genes. In this study, our main goal is to identify germline mutations associated with LS in a Tunisian CRC family with monozygotic twins and to assess factors associated with increased cancer risk.

Patients
This study was conducted according to the Declaration of Helsinki and approved by the Institutional Review Board (IRB) of Institut Pasteur de Tunis. Five individuals belonging to the same large Tunisian family were investigated after written informed consent (Fig. 1). This family fulfills the Amsterdam I criteria for the diagnosis of Lynch syndrome. The index case was a man (CRCNab3) referred for a molecular diagnosis of LS to the Gastroenterology Department of Mohamed Tahar Maamouri Medical Hospital in Nabeul, Tunisia. He was diagnosed at 33 years of age with a well-differentiated adenocarcinoma of the transverse colon (pT3 N1a; 6 cm × 6 cm × 2 cm) and treated with hemicolectomy. The index case has a monozygotic (MZ) twin diagnosed with Crohn disease at the age of 35 (CRCNab4) and a brother who recently experienced gastrointestinal discomfort but is considered healthy (CRCNab5). Their father developed lymphoid hyperplasia of the right colon without clinical significance (CRCNab2). Their paternal uncle (CRCNab1) was diagnosed at 47 years of age with a well-differentiated Lieberkühnian adenocarcinoma of the sigmoid colon (T3N0MX) and treated with sigmoidectomy. The index case and his two brothers have smoking and/or alcohol consumption habits. The other investigated relatives neither smoke nor consume alcohol.
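The Amsterdam I criteria mentioned above can be expressed as a small rule check. The sketch below is a simplified, hypothetical Python encoding applied to a toy pedigree (identifiers, ages and relationships are invented, not those of Fig. 1): at least three relatives with CRC, one of them a first-degree relative of the other two, at least two successive generations affected, at least one diagnosis before age 50, and familial adenomatous polyposis (FAP) excluded. Histological verification of the tumors, which the criteria also require, is not modelled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Set


@dataclass(frozen=True)
class Relative:
    ident: str                     # pedigree label, e.g. "II-1" (hypothetical)
    generation: int                # 1 = oldest generation
    crc_age: Optional[int] = None  # age at CRC diagnosis; None if unaffected


def meets_amsterdam_i(relatives: Iterable[Relative],
                      first_degree: Set[FrozenSet[str]],
                      fap_excluded: bool = True) -> bool:
    """Simplified check of the Amsterdam I criteria on a toy pedigree."""
    cases = [r for r in relatives if r.crc_age is not None]
    # At least three relatives with CRC, and FAP excluded.
    if len(cases) < 3 or not fap_excluded:
        return False
    # One affected relative is a first-degree relative of two other affected ones.
    hub = any(
        sum(frozenset({c.ident, o.ident}) in first_degree
            for o in cases if o.ident != c.ident) >= 2
        for c in cases
    )
    # At least two successive generations affected.
    gens = {c.generation for c in cases}
    successive = any(g + 1 in gens for g in gens)
    # At least one CRC diagnosed before age 50.
    early = any(c.crc_age < 50 for c in cases)
    return hub and successive and early


# Toy family: an affected parent (I-1) and three children, two of them affected.
family = [Relative("I-1", 1, 62), Relative("II-1", 2, 45),
          Relative("II-2", 2, 55), Relative("II-3", 2)]
pairs = {frozenset(p) for p in [("I-1", "II-1"), ("I-1", "II-2"), ("I-1", "II-3"),
                                 ("II-1", "II-2"), ("II-1", "II-3"), ("II-2", "II-3")]}
print(meets_amsterdam_i(family, pairs))  # True
```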

Immunohistochemical study
To assess the expression of MMR proteins, we performed immunohistochemical labeling on Formalin

Targeted DNA repair gene (DRG) panel design
Library preparation for next-generation sequencing (NGS) was performed using a custom design of the recently developed HaloPlex assay version that incorporates molecular barcodes for high-sensitivity sequencing (HaloPlex HS). Using SureDesign (Agilent Technologies Inc.), probes were generated to cover the exons and 15 bp of the flanking intronic sequences of a total of 87 candidate genes known to be involved in DNA repair disorders (the list of all analyzed genes is provided in Additional file 1). sg/) and UMD predictor (http://umd-predictor.eu/). Variants not previously reported in healthy controls and classified as pathogenic were evaluated for sequencing depth and visually inspected using the Integrative Genomics Viewer (IGV) before validation by Sanger sequencing.

Sanger sequencing
PCR amplifications were performed on genomic DNA (gDNA) following standard protocols, followed by Sanger sequencing on an automated sequencer (ABI 3500; Applied Biosystems, Foster City, CA) using a cycle sequencing reaction kit (BigDye Terminator kit, Applied Biosystems). Data were analyzed with BioEdit Sequence Alignment Editor version 7.2.5. As the POLE and POLD1 genes were not included in the HaloPlex gene panel, Sanger sequencing was used to screen for the following hotspot pathogenic mutations: p.L424V in exon 13 of POLE and p.S478N in exon 11 of POLD1 [15].

Immunohistochemical pattern
IHC for CRCNab3 showed loss of MSH2 nuclear expression in tumor and stromal cells, cytoplasmic staining for MSH6 and PMS2, and incomplete nuclear staining for MLH1. The sporadic CRC sample with proficient MMR (pMMR) showed positive nuclear staining for all four proteins in tumor cells as well as in adjacent normal cells (Fig. 2).

Pathogenic mutation detected by targeted DNA repair genes panel
Variants not previously reported in healthy controls and classified as pathogenic in ClinVar were evaluated for sequencing depth and visually inspected using the Integrative Genomics Viewer (IGV). After the filtering strategies described above, an MSH2 mutation (c.1552C>T; p.Q518X) was detected in both the index case (CRCNab3) and his father (CRCNab2), and it was selected, among the other identified variants, as a candidate for Sanger validation.
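The filtering criteria named above can be sketched as a simple predicate over variant records. In this Python sketch, the `Variant` fields, the depth cut-off `MIN_DEPTH = 20` and all records other than the MSH2 change are hypothetical; the study does not state its depth threshold, and IGV inspection remains a manual step after this filter.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

MIN_DEPTH = 20  # hypothetical: the depth cut-off used in the study is not stated


@dataclass(frozen=True)
class Variant:
    gene: str
    hgvs_c: str
    in_healthy_controls: bool  # previously reported in healthy controls?
    clinvar: str               # ClinVar clinical significance label
    depth: int                 # read depth at the variant position


def sanger_candidates(variants: Iterable[Variant],
                      min_depth: int = MIN_DEPTH,
                      accepted: FrozenSet[str] = frozenset({"Pathogenic"})) -> List[Variant]:
    """Keep variants absent from healthy controls, classified pathogenic and well covered.

    Survivors would still be inspected visually in IGV before Sanger validation.
    """
    return [v for v in variants
            if not v.in_healthy_controls
            and v.clinvar in accepted
            and v.depth >= min_depth]


calls = [
    Variant("MSH2", "c.1552C>T", False, "Pathogenic", 150),  # depth is illustrative
    Variant("GENE_X", "c.100A>G", True, "Benign", 200),      # hypothetical record
    Variant("GENE_Y", "c.200G>A", False, "Pathogenic", 8),   # hypothetical, low depth
]
print([v.gene for v in sanger_candidates(calls)])  # ['MSH2']
```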

Analysis of variants carried by the proband (CRCNab3) and absent in his father (CRCNab2)
Eighty-seven variants in 60 genes were detected in the proband (CRCNab3) but not in his father (CRCNab2). After the filtering steps (Table 1), fifteen non-shared variants were retained: 13 exonic variants, 1 splice-site variant and 1 frameshift variant. The detected variants are listed in Table 2.
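Identifying variants carried by the proband but not by his father is essentially a set difference on normalized variant keys. The Python sketch below uses toy call sets made of variants named in this article (the real call sets are much larger); `non_shared` and `summarize` are hypothetical helpers. In practice, a variant missing from the father's calls may reflect low coverage at that position rather than true absence, so depth at those positions would also need checking.

```python
from typing import Set, Tuple

VariantKey = Tuple[str, str]  # (gene symbol, HGVS cDNA change)


def non_shared(proband: Set[VariantKey], parent: Set[VariantKey]) -> Set[VariantKey]:
    """Variants called in the proband but absent from the parent's call set."""
    return proband - parent


def summarize(variants: Set[VariantKey]) -> Tuple[int, int]:
    """Return (number of variants, number of distinct genes carrying them)."""
    return len(variants), len({gene for gene, _ in variants})


# Toy call sets built from variants named in this article; the real sets are larger.
proband = {
    ("MSH2", "c.1552C>T"), ("ERCC2", "c.360+3G>T"), ("ERCC4", "c.2422G>T"),
    ("ERCC5", "c.8T>G"), ("BRCA2", "c.7810C>A"), ("RECQL4", "c.2120G>T"),
    ("SLX4", "c.742G>T"), ("NF1", "c.3259C>T"), ("NF1", "c.6623C>A"),
}
father = {("MSH2", "c.1552C>T")}
print(summarize(non_shared(proband, father)))  # (8, 7)
```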

Sanger sequencing
The MSH2 mutation in exon 10 (c.1552C>T; p.Q518X) was confirmed by Sanger sequencing (Fig. 4). We first confirmed it in the index case (CRCNab3) and then in all investigated first- and second-degree relatives (CRCNab2, CRCNab1, CRCNab4 and CRCNab5). This substitution creates a premature stop codon (stop-gain variant, p.Q518X). The global and local minor allele frequencies (MAF) of this variant are shown in Table 3. In addition, neither p.L424V in POLE nor p.S478N in POLD1 was found by Sanger sequencing in the investigated family members.
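The link between c.1552C>T and p.Q518X follows from codon arithmetic: coding position N lies in codon ⌈N/3⌉, so position 1552 is the first base of codon 518. The reference codon is not given in the text, but both glutamine codons (CAA, CAG) become stop codons (TAA, TAG) when their first base changes from C to T. The Python sketch below, with hypothetical helper names and illustrative codons, checks this against the standard genetic code.

```python
from typing import Tuple

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def locate(c_pos: int) -> Tuple[int, int]:
    """Map a 1-based coding position (c.N) to (codon number, position within codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def substitute(codon: str, pos: int, ref: str, alt: str) -> Tuple[str, str]:
    """Apply a single-base substitution to a codon; return (ref, alt) amino acids."""
    if codon[pos - 1] != ref:
        raise ValueError(f"expected {ref} at codon position {pos}, found {codon[pos - 1]}")
    mutant = codon[:pos - 1] + alt + codon[pos:]
    return CODON_TABLE[codon], CODON_TABLE[mutant]


print(locate(1552))                    # (518, 1)
print(substitute("CAG", 1, "C", "T"))  # ('Q', '*')
print(substitute("CAA", 1, "C", "T"))  # ('Q', '*')
```

The same arithmetic is consistent with other changes reported in this article; for example, ERCC4 c.2422G>T falls on the first base of codon 808, where G>T turns an alanine codon (GCN) into a serine codon (TCN), matching p.A808S.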
In silico prediction of the effect of the p.Q518X mutation on MSH2/MSH6 dimerization
We performed an in silico prediction of the potential effect of this mutation on MSH2 protein structure and function. Figure 5 highlights the pathogenic effect of the identified MSH2 mutation on MSH2/MSH6 heterodimerization. MutSα consists of MSH2 and MSH6, whose heterodimer recognizes damaged DNA (Fig. 5a). We mapped the functional segments downstream of the premature stop codon in dark blue (Fig. 5b). Their absence results in loss of interactions between different regions within the heterodimer (loss of protein-protein interaction, of DNA-protein interaction and of nuclear translocation activity). The structural analysis was based on the MSH2/MSH6 complex structure of Warren et al. [21] (PDB code: 2O8B). This finding is consistent with the IHC pattern showing cytoplasmic MSH6 accumulation and loss of MSH2 expression.
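Mapping which segments lie downstream of the premature stop codon amounts to comparing domain boundaries with the last retained residue. In the Python sketch below, the MSH2 length of 934 residues is the canonical UniProt value, but the domain boundaries are approximate, illustrative placeholders rather than the segments mapped in Fig. 5b, and the helper name `truncation_effect` is hypothetical.

```python
from typing import Dict, Tuple

MSH2_LENGTH = 934  # residues in canonical human MSH2 (UniProt P43246)


def truncation_effect(stop_codon: int,
                      domains: Dict[str, Tuple[int, int]]) -> Dict[str, str]:
    """Classify each domain (1-based, inclusive residue range) relative to a stop gain.

    A stop at codon N leaves residues 1..N-1 in the translated product; nonsense-
    mediated mRNA decay, which may prevent any product from being made, is ignored.
    """
    last_kept = stop_codon - 1
    status = {}
    for name, (start, end) in domains.items():
        if end <= last_kept:
            status[name] = "retained"
        elif start > last_kept:
            status[name] = "lost"
        else:
            status[name] = "partially lost"
    return status


# Approximate, illustrative boundaries only; check against UniProt or PDB 2O8B before reuse.
DEMO_DOMAINS = {
    "N-terminal domain": (1, 124),
    "ATPase domain": (620, 855),
    "C-terminal region": (856, 934),
}
print(truncation_effect(518, DEMO_DOMAINS))
print(f"retained fraction: {(518 - 1) / MSH2_LENGTH:.2f}")  # 0.55
```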

Discussion
LS penetrance is highly variable, and the reasons for this have not been fully elucidated. Peters et al. [22] stated that it remains critical to uncover the complete genetic architecture of CRC in order to understand the etiology of the disease more fully. In the Tunisian population, germline MMR mutations are responsible for at least 35.5% of CRC in patients with a personal or family history suggestive of Lynch syndrome [20]. In the only Tunisian molecular study, by Moussa et al. [20], the entire coding regions, splice junctions and promoter regions of MLH1 and MSH2 were screened for point mutations. The following mutations were described in their Tunisian LS families: MSH2 (p.Gln402X, p.Pro472ThrfsX4, p.Arg243Gln, p.Ser281X and p.Gly713ArgfsX4) and MLH1 (…) [20]. MSH6 was also analyzed, but no mutation was found. In that study, 64.5% (20/31) of families with suspected LS remained without an identified MMR gene mutation, and the authors therefore suggested that other genes could predispose to non-polyposis CRC. In this context, we describe here a Tunisian family with a strong history of colon cancer affecting three generations, with a tumor spectrum specific to the LS I form (Fig. 1). Genetic investigation using a targeted DNA repair gene sequencing panel revealed, among the detected variants, a single-nucleotide substitution (c.1552C>T) in MSH2 in the proband (CRCNab3) and in his father (CRCNab2). This substitution was confirmed by Sanger sequencing (Fig. 4) and was also identified in the 47-year-old paternal uncle, diagnosed with sigmoid CRC, and in the three first-degree relatives of the proband cited above. This mutation is identified here for the first time in a Tunisian LS family; it had previously been reported once, by Fidalgo et al. [23], in the index case of a Portuguese LS family.
Fidalgo et al. [23] confirmed the pathogenicity of this mutation using several approaches, including the protein truncation test (PTT), single-strand conformation polymorphism (SSCP), heteroduplex analysis (HA) and denaturing gradient gel electrophoresis (DGGE). This mutation introduces a premature stop codon (p.Q518X) and is already listed as pathogenic in the InSiGHT variant databases (https://www.insight-group.org/variants/databases/). The distribution of this rare mutation could be explained either by the fact that only one molecular LS study has been carried out in Tunisia or by the scarcity of this mutation worldwide. The in silico prediction of the effect of this mutation on the MSH2·MSH6 heterodimer, which is crucial for MMR complex function [21,24], showed that it disrupts allosteric interactions between different regions within the heterodimer: loss of the MSH2 ATPase domain (loss of nuclear translocation capacity), loss of interaction with EXO1 and loss of DNA-protein interaction. This is reflected in the IHC expression profile by loss of MSH2 nuclear expression and cytoplasmic MSH6 accumulation. In almost all published articles using IHC analysis, the MSI phenotype is assigned following loss of expression of MLH1 or MSH2 [20,25,26]. The CRCNab3 phenotype was consistent with a deficient MMR system (dMMR) linked to LS. The in situ functional effect of this mutation (c.1552C>T; p.Q518X) was confirmed by the observed immunohistochemical pattern. Our IHC results agree with the molecular findings, supporting the evidence that MMR protein loss is explained by the pathogenic mutation in the corresponding MMR gene. Moreover, in IHC interpretation guidelines, cytoplasmic MMR staining has no established significance [27][28][29], and there are as yet no data indicating that its presence reflects protein deficiency. 
Our results provide evidence that cytoplasmic staining could be taken into account when evaluating loss of function within the MSH2/MSH6 heterodimer. To the best of our knowledge, this is the first Tunisian study describing the effect of the MSH2 mutation c.1552C>T (p.Q518X) on MSH2/MSH6 heterodimerization, confirmed by IHC. Furthermore, it has been reported that in MSI cases the presence of the BRAF V600E hotspot mutation excludes the diagnosis of LS, and the clinical utility of combining MMR and BRAF status is well established [30]. In this context, no somatic BRAF V600E mutation was detected in the index case, supporting the diagnosis of LS. In this work, the investigated family members share the same pathogenic MSH2 mutation (c.1552C>T; p.Q518X) but have different phenotypes, suggesting an important role for the microenvironment and/or other DRG mutations. Both CRC cases (CRCNab1 and CRCNab3) carry the pathogenic MSH2 mutation. CRCNab3 smokes and drinks alcohol, whereas CRCNab1 does not. These two CRC patients also differ in tumor location within the colon and in age at disease onset. The 65-year-old unaffected carrier father (CRCNab2) had a healthy lifestyle, in contrast to his 34-year-old unaffected carrier son (CRCNab5), who smokes and consumes alcohol.
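The combined use of MMR and BRAF status can be sketched as a triage function. This Python sketch assumes the common convention that BRAF V600E is interpreted mainly in MLH1-deficient tumors, where it suggests sporadic MLH1 promoter hypermethylation; the function name and output strings are hypothetical, and real screening algorithms include further steps.

```python
from typing import Iterable


def lynch_triage(lost_proteins: Iterable[str], braf_v600e: bool) -> str:
    """Simplified triage combining MMR IHC losses and tumor BRAF V600E status.

    Assumes BRAF V600E is interpreted mainly in MLH1-deficient tumors, where it
    points to sporadic MLH1 promoter hypermethylation rather than LS.
    """
    lost = {p.upper() for p in lost_proteins}
    if not lost:
        return "pMMR: LS unlikely on IHC alone"
    if "MLH1" in lost:
        if braf_v600e:
            return "likely sporadic (BRAF V600E, MLH1-deficient)"
        return "MLH1-deficient, BRAF wild type: consider MLH1 methylation and germline testing"
    return "dMMR without MLH1 loss: refer for germline MMR testing"


# An MSH2-deficient tumor without BRAF V600E, as in the index case.
print(lynch_triage({"MSH2"}, braf_v600e=False))
```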
Interestingly, MZ twins provide a model for investigating environmental effects on disease development and progression [31,32]. The proband's MZ twin (CRCNab4) carries the pathogenic MSH2 mutation (c.1552C>T; p.Q518X) but does not consume alcohol. He developed Crohn disease (CD) at 35 years of age. Chronic inflammation is known to create a microenvironment that favors disease progression [33]. dos Santos [34] showed that a pro-inflammatory state is the cornerstone of the association between CD and CRC, supporting the view that CD might be a risk factor for CRC. She added that a family history of CRC is an important factor that doubles the risk of CRC in patients with CD. In our study, the twins' discordant habits lead us to suggest that these habits could directly or indirectly affect DNA changes independently of mutational status. Alcohol consumption and cigarette smoking are considered major risk factors for gastrointestinal cancers, including colorectal cancer [35]. The World Cancer Research Fund and the American Institute for Cancer Research suggest that excessive alcohol consumption increases the risk of colorectal cancer. Based on cumulative evidence from epidemiological studies, colorectal cancer has been listed as a pathology linked to alcohol intake and cigarette smoking [36]. This MZ twin discordance points to the important role of environmental and modifiable factors, through gene-environment interactions, in CRC prevention [37]. Studies of gene-environment interactions in families are crucial for providing insights into prevention strategies against CRC [38,39]. Public health policies to prevent this cancer should include modification of alcohol intake habits, especially among individuals at increased risk [35,40].
Carcinogenesis models in sporadic and hereditary CRC are based on the accumulation of mutations, which is the critical determinant of tumorigenesis [41]. Through genome-wide association studies, it has become possible to evaluate the role of common low-penetrance genetic modifiers and how they affect disease expression within families or among individuals with similar MMR gene status [22]. Donald et al. [42] conducted a meta-analysis of common low-penetrance genetic polymorphisms to better understand their association with CRC risk in individuals belonging to LS families. They found no consistent evidence that the LS phenotype is influenced by low-penetrance modifiers. Weigl et al. [43] showed that both family history and identified genetic variants carry essential risk information, and that their combination offers great potential for CRC risk stratification. In this context, besides the pathogenic MSH2 mutation (c.1552C>T; p.Q518X) shared by the index case and his father, Table 2 summarizes other pathogenic variants identified in the index case but not shared with his father.
It is widely recognized that environmental carcinogens induce DNA damage, which can in turn induce genomic instability [44]. The bulky DNA adducts generated by tobacco carcinogens are mainly repaired by nucleotide excision repair (NER), the main pathway for repairing bulky DNA lesions and maintaining genomic stability. Several key proteins are involved in this process, including ERCC2 (XPD), which unwinds the DNA strands at the damaged site in the 5′-3′ direction, while the damaged DNA is excised on the 5′ side by the XPF (ERCC4)-ERCC1 heterodimer and on the 3′ side by ERCC5 (XPG), which is a neighbor of MSH2 and RECQL4 in the interaction network (Fig. 5). Aberrant expression of key NER factors alters NER capacity, thereby threatening genomic stability and integrity [45]. In our study, we noted the following variants: ERCC2 (exon 5: c.360+3G>T), ERCC4 (exon 11: c.2422G>T; p.A808S) and ERCC5 (exon 1: c.8T>G; p.V3G). These NER pathway alterations, present in the index case but not shared with his father, could therefore reduce DNA repair efficacy and might enhance colorectal cancer risk. Only a few studies have examined the contribution of SNPs in NER pathway genes to CRC risk [46,47]. Our study is the first Tunisian one to highlight that variants in some members of the xeroderma pigmentosum (XP) gene family could play an important role in increased colorectal cancer risk.
Other cellular DNA repair pathways, such as base excision repair (BER), double-strand break repair (DSBR) and homologous recombination repair (HRR), also play important roles in carcinogenesis by repairing single- and double-strand DNA breaks induced by smoking, ionizing radiation and other DNA-damaging agents [47]. BRCA2 is a member of the HRR pathway, which restores the integrity of double-strand DNA breaks [48]. Inherited mutations in HRR genes have long been known to increase the risk of several cancers, including breast, ovarian, prostate and pancreatic cancers [49]. Risch et al. [50] reported an increased risk of colon cancer in BRCA2 families. We identified a BRCA2 exon 17 variant (c.7810C>A; p.L2604M) in the index case.
BRCA2 is co-expressed with RecQ protein-like 4 (RECQL4), a key member of the RecQ family that plays an important role in the initiation of DNA replication, the progression of stalled replication forks, telomere maintenance and the repair of DNA double-strand breaks via the HRR pathway [51]. Mutations of the RECQL4 gene are associated with the rare type II Rothmund-Thomson syndrome, which carries a propensity for osteosarcoma [52]. However, recent studies have shown that RECQL4 acts as a tumor promoter in some cancers, such as prostate, colorectal and breast cancers [52][53][54][55]. We detected a c.2120G>T (p.C707F) variant in exon 13 of RECQL4. This is the first description of this variant in patients with CRC. SLX4 (structure-specific endonuclease subunit) encodes a Fanconi anemia-related protein that is required for the repair of specific types of DNA lesions and is critical for cellular responses to replication fork failure. Lee et al. [56] suggested that frameshift mutations of SLX4 may play a cancer-related role in a limited number of CRCs. We found an SLX4 exon 3 variant (c.742G>T; p.E248X).
NF1, which acts as a tumor suppressor gene [57], is co-expressed with RECQL4 and BRCA2. To date, the association between NF1 and adenocarcinoma of the gastrointestinal tract has been thought to be coincidental. However, Li et al. [58] suggested that germline mutations in NF1 can occur in somatic cells and contribute to cancer development. Indeed, Seminog and Goldacre [59] observed that NF1 patients were at high risk of colon and rectosigmoid junction cancer compared with the general population. We detected NF1 variants in exon 25 (c.3259C>T; p.P1087S) and exon 43 (c.6623C>A; p.A2208D).
These actionable genes are not part of the recommended germline testing for individuals with familial CRC. Figure 3 shows their protein-protein interactions, supporting the hypothesis that other variants rarely described in CRC might partly explain the phenotypic difference between the father and his son (CRCNab3). Thus, patients carrying multiple low-penetrance SNPs could experience an additive effect that increases CRC risk through gene-gene interactions. Confirmation of these variants by Sanger sequencing would be an important next step for LS genetic profiling.
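One common way to formalize an additive effect of multiple low-penetrance variants is a log-additive model, in which per-allele odds ratios multiply. The Python sketch below assumes independent effects (no gene-gene interaction term, unlike the interaction hypothesis above) and uses invented odds ratios that are not estimates for any variant reported in this study; `combined_odds_ratio` is a hypothetical helper.

```python
import math
from typing import Mapping


def combined_odds_ratio(per_allele_or: Mapping[str, float],
                        dosage: Mapping[str, int]) -> float:
    """Log-additive (multiplicative) combination of per-allele odds ratios.

    Assumes independent effects: no gene-gene interaction term is modelled.
    """
    log_or = 0.0
    for variant, odds_ratio in per_allele_or.items():
        d = dosage.get(variant, 0)  # number of risk alleles carried (0, 1 or 2)
        if d not in (0, 1, 2):
            raise ValueError(f"dosage for {variant} must be 0, 1 or 2, got {d}")
        log_or += d * math.log(odds_ratio)
    return math.exp(log_or)


# Placeholder odds ratios, not estimates for any variant reported in this study.
HYPOTHETICAL_OR = {"var_A": 1.20, "var_B": 1.10, "var_C": 1.15}
carrier = {"var_A": 1, "var_B": 2, "var_C": 0}
print(round(combined_odds_ratio(HYPOTHETICAL_OR, carrier), 3))  # 1.452
```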

Conclusion
Overall, a better understanding of the genotype-phenotype correlation associated with LS may enable personalized oncogenetic counseling and risk management for individuals with particular mutational profiles. As promoting universal LS screening is an aim of the International Mismatch Repair Consortium (IMRC), our results contribute to the research conducted in this field. Further studies are needed, with particular attention to low-penetrance modifier variants, to better define the genotype-phenotype correlation and the risk of colorectal carcinoma in the LS context. Further conclusions regarding CRC risk should be based on a larger series of patients and families.

Additional file
Additional file 1. Targeted DNA repair genes panel list (87 genes).