Genome-wide association study identifies genetic susceptibility loci and pathways of radiation-induced acute oral mucositis

Background Radiation-induced oral mucositis (OM) is one of the most common acute complications for head and neck cancer. Severe OM is associated with radiation treatment breaks, which harms successful tumor management. Radiogenomics studies have indicated that genetic variants are associated with adverse effects of radiotherapy. Methods A large-scale genome-wide scan was performed in 1467 nasopharyngeal carcinoma patients, including 753 treated with 2D-CRT from Genetic Architecture of the Radiotherapy Toxicity and Prognosis (GARTP) cohort and 714 treated with IMRT (192 from the GARTP and 522 newly recruited). Subgroup analysis by radiotherapy technique was further performed in the top associations. We also performed physical and regulatory mapping of the risk loci and gene set enrichment analysis of the candidate target genes. Results We identified 50 associated genomic loci and 64 genes via positional mapping, expression quantitative trait locus (eQTL) mapping, chromatin interaction mapping and gene-based analysis, and 36 of these loci were replicated in subgroup analysis. Interestingly, one of the top loci located in TNKS, a gene relevant to radiation toxicity, was associated with increased OM risk with OR = 3.72 of the lead SNP rs117157809 (95% CI 2.10–6.57; P = 6.33 × 10−6). Gene set analyses showed that the 64 candidate target genes were enriched in the biological processes of regulating telomere capping and maintenance and telomerase activity (Top P = 7.73 × 10−7). Conclusions These results enhance the biological understanding of radiotherapy toxicity. The association signals enriched in telomere function regulation implicate the potential underlying mechanism and warrant further functional investigation and potential individual radiotherapy applications.

treatment outcomes [2]. Due to the high tumor control of nasopharyngeal carcinoma (NPC) by radiotherapy, more attentions has been paid to the adverse effects of radiotherapy, especially radiation-induced OM [3].
In a large proportion of patients, the use of opioid analgesics does not adequately palliate symptoms. Therefore, symptomatic management of mucositis is insufficient to avoid negative clinical outcomes, and there is a clear need for agents that reduce the incidence of mucositis [1]. Radiogenomics studies have suggested that common genetic variants are associated with radiotherapy adverse effects, and a single-nucleotide polymorphism-based predictive assay along with clinical factors could be used to estimate the risk of a patient with cancer developing adverse effects from radiotherapy. Such an assay could be used for personalized therapy and for the prevention of severe adverse effects, which could improve quality of life for patients [4].
For adverse reactions to radiotherapy, tailoring treatment dose by genetic risk is considered to achieve individualized treatment. It proposed a hypothesis that germline genetics contribute to the development of radiation injury. So far, the mechanisms of radiation-induced normal tissue toxicity are complex and are not fully understood. However, it has been reported that there are at least 14 canonical pathways taking part in the development of OM in patients treated with radiochemotherapy [5]. Our aim is to identify new loci and pathways associated with the development of radiation-induced OM through a genome-wide association approach in a population from southern China.

Study objects
The participants were recruited from two sections. 960 subjects were screened between 2005 and 2007 from the GARTP study (Genetic Architecture of the Radiotherapy Toxicity and Prognosis, registered with http://www.chict r.org.cn/, ChiCTR-ROC-17012658), according to the following criteria: pathologically confirmed NPC, previously untreated, no previous radiotherapy and/or chemotherapy, receiving the whole course of radical radiotherapy, and adult (age older than 18). For the patients who were treated by 2D-CRT, the accumulated radiation doses to the primary tumor were 68-76Gy with two Gy per fraction. For the patients treated by IMRT, the prescribed treatment protocol was 68-70Gy for 30-33 fractions to the planning target volume (PTV) of gross tumor volume of the primary (GTV-P) and 64-68Gy for 30-33 fractions to the PTV of nodal gross tumor volume (GTV-N). We recruited additional 553 NPC patients from the Sun Yat-sen University Cancer Center (SYSUCC) of China in 2006-2014. One patient was excluded because he did not complete the whole course of radiotherapy. The characteristics of patients, including age, gender, TNM stage (using 2009 7th UICC/AJCC staging system), radiation technique and treatment scheme were recorded.

Mucositis evaluation
Oral mucositis caused by radiotherapy was observed and recorded. It was evaluated and classified as grade 0-5 based on the acute radiation toxicity grading criterion of the Radiation Therapy Oncology Group or European Organization for Research and Treatment of Cancer (RTOG/EORTC) [6]. According to the grading results, we divided the patients into two groups: severe OM (grade ≥ 3) and mild OM (grade ≤ 2).

Genotyping, quality control and imputation
Genomic DNA was extracted from whole peripheral blood samples using a commercial DNA extraction kit (Qiagen) and was quantified using PicoGreen reagent (Invitrogen). We genotyped GARTP study samples on the Human610-Quad Chip and others on Infinium Global Screening Array-24 BeadChip. Genotyping and quality control for the Human610-Quad chip can be found in our previous publication [7]. For the Infinium Global Screening Array-24, we generated a cluster file using our in-house data including about 2000 samples from our cancer center, and called genotypes according to the manufacture's protocol [8,9]. The variants with low call rates, poor clustering metrics or extreme heterozygosity rate were manually re-clustered or removed. We then performed quality control at sample level and at SNP level according to the following criteria: (1) individuals level: call rate < 95%, gender discrepancies, heterozygosity rate outliers (> 6 sd.), unexpected duplicates or probable relatives based on pairwise identity by descent (PI_HAT > 0.5), and population stratification outliers (> 6 sd.); (2) SNPs level: non-autosomal chromosomes, call rate < 95%, minor allele frequencies (MAF) < 0.001, and deviated from Hardy-Weinberg equilibrium (HWE) (P < 10 −12 ). All filtered samples were imputed by a twostage imputation approach, using SHAPEIT2 [10] for phasing and IMPUTE2 [11] for imputation. The imputation was performed in 5-Mb nonoverlapping intervals. SNPs with a frequency > 1% and that were imputable with INFO > 0.8 were included in the downstream analysis.
We then merged overlapping SNPs and conducted further quality control to the SNPs. We excluded SNPs with call rates < 95%, deviated from Hardy-Weinberg equilibrium (P < 10 −12 ), or MAF < 0.01. We performed quality control filtering using PLINK 1.09 [12]. Finally, a total of 1467 patients (945 patients from the GARTP study and 522 patients recruited in 2006-2014) and 3,968,928 genetic variants were analyzed in GWAS.

Genome-wide association analyses
Univariate logistic regression analyses were performed by comparison of clinical factors with OM. Considering the collinearity among the clinical variables, multivariate regression analysis was further performed with the filtered variables. The significantly associated clinical factors in multivariate regression were adjusted in genomewide association analyses. Genome-wide association analyses were performed under additive genetic effects assumption, using a logistic regression model adjusting treatment scheme, radiation technology and the first five eigenvectors of principal components as covariates. We also created quantile-quantile plot and Manhattan plot using the R package "qqman". A quantile-quantile plot was used to evaluate the overall significance of the GWAS, and the deviation of the observed versus the expected distribution of the P values was represented by the inflation factor (λ GC ). Considering the different incidence rates in the different radiation technology subgroups (2D-CRT and IMRT), we performed further association analysis using logistic models, only adjusting for the treatment scheme in the two subgroups, respectively to examine the top variants.

Genomic risk loci and functional annotation
Functional annotation was performed with FUMA [13], an online platform for the functional mapping of genetic variants. We first defined 'independent significant SNPs' as those surpassing a predefined threshold P value (1 × 10 −4 ) and showing moderate to low linkage disequilibrium (r 2 < 0.6). We further defined 'lead SNPs' as the subset of independent SNPs (r 2 < 0.1). Additionally, we defined genomic risk loci by merging LD blocks of independent significant SNPs that have close physical position (< 250 kb). All known SNPs in the 1000 genome data that have (r 2 > 0.6) with any of the independent significant SNP were included for annotation, and the region containing all of these 'candidate SNPs' was considered to be a single independent genomic locus. All LD information was calculated from 1000G phase3 East Asian population [14].
Functional consequences for the SNPs were obtained by performing ANNOVAR [15] gene-based annotation using Ensembl genes. SNPs were matched according to chromosome, position, reference, and alternative alleles, and were annotated by CADD scores (scores > 12.37 indicate deleterious SNP [16]), RegulomeDB scores [17] (lower scores indicate higher potentiality of regulatory function), and by chromatin states predicted by hidden Markov model based on 5 chromatin marks for 127 epigenomes in the Roadmap Epigenomics Project (lower scores ≤ 7 represent higher accessibility of the genomic regions). CADD scores integrate diverse annotations into a single measure that correlates with pathogenicity, disease severity, experimentally measured regulatory effects and complex trait associations.

Gene mapping
SNPs in genomic risk loci were mapped to genes in FUMA using three strategies.
First, position mapping was based on the physical distances (within a 10-kb window) from known proteincoding genes in the human reference assembly (GRCh37 or hg19). The second strategy, eQTL mapping, used information from three data repositories (GTEx [18], Blood eQTL browser [19], and BIOS QTL browser [20]) and mapped SNPs to genes based on a significant eQTL association (i.e., where the expression of the gene is associated with allelic variation at the SNP). eQTL mapping was based on cis-eQTLs (local regulatory effect within 1 Mb). A false discovery rate (FDR) of 0.05 was applied to define significant eQTL association. The third strategy, chromatin interaction mapping, mapped SNPs to the promoter regions of genes based on significant chromatin interactions. This type of mapping was a 3D DNA-DNA interaction between the SNP region and a gene region, without a distance boundary. FUMA currently contains Hi-C data for 21 tissue/cell types from the study [21]. Because chromatin interactions are often defined in a certain resolution (40 kb), an interaction region may span multiple genes. Hence, this method would map all SNPs within these regions to genes in the corresponding interaction region. To prioritize candidate genes, we integrated predicted enhancers and promoters in certain tissue and cell types from the Roadmap Epigenomics Project [22], including blood, gastrointestinal tissue and skin. Using the information, FUMA selected chromatin interactions for which one region involved in the interaction overlapped with predicted enhancers and the other overlapped with predicted promoters 250 bp upstream and 500 bp downstream of the TSS of a gene. We used an FDR of 1 × 10 −6 to define significant interactions.

Gene set analysis
Genes implicated by mapping of GWAS SNPs were further investigated using the GENE2FUNC procedure in FUMA, which provides hypergeometric tests of enrichment of the list of mapped genes in MSigDB gene sets [23], including BioCarta, KEGG, Reactome, and Gene Oncology (GO). The adjusted P value (FDR) for gene set enrichment analysis was supplied by the Benjamini-Hochberg method. The threshold of adjusted P-value was 0.05. The minimum number of input genes overlapping with a tested gene set to be reported as significant was two.

Population characteristics
The study population was composed of 1467 NPC. Of those, 349 patients (23.79%) developed severe OM (grade ≥ 3) after radiotherapy, and no grade 5 mucositis was observed. The clinical characteristics were analyzed by univariate logistic regression. Tumor stage, clinical stage, radiation technique and treatment scheme were reported to be associated with severe OM ( Table 1). The severe OM incidence rates were significantly different for patients who received different radiation therapies, with rates of 14.9% and 33.2% for patients treated with 2D-CRT and IMRT, respectively. The clinical characteristics of different subgroups (2D-CRT and IMRT) are shown in Additional file 1: Table S1. Compared with radiotherapy alone, patients treated with induction chemotherapy and/or adjuvant chemotherapy showed similar OM risk with an OR of 1.07 (95% CI 0.55-2.09). However, patients treated with concurrent chemoradiotherapy had a higher risk of severe OM with an OR of 6.96 (95% CI 4.50-10.77) compared to patients treated with radiotherapy alone. The results of multivariate logistic regression indicated that different radiation techniques and treatment schemes were significant clinical factors for the incidence of radiation-induced OM (Additional file 2: Table S2) and were considered as covariates in the GWAS.

SNPs associated with severe oral mucositis
A total of 3,968,928 SNPs were included in the genomewide association analysis under an additive assumption using a logistic regression model, adjusting radiation technique and treatment scheme. The distribution of the observed versus the expected P values are shown in the quantile-quantile plot with λ GC = 1.01 (Additional file 3: Figure S1). The three top lead SNPs were rs9484606 in the intergenic region of chromosome 6 (OR = 1.70, 95% CI 1.36-2.13, P = 2.98 × 10 −6 , VTA1/ADGRG6), rs16876733 in the intergenic region of chromosome 7 (OR = 1.95, 95% CI 1.47-2.59, P = 3.05 × 10 −6 , PER4/NDUFA4), and rs117157809 in the intron region of TNKS (OR = 3.72, 95% CI 2.10-6.57, P = 6.33 × 10 −6 ), respectively (Fig. 1). We selected the independent SNPs with P-value < 1×10 −4 in GWAS analysis using all patients, and further examined them in the two treatment subgroups (2D-CRT and IMRT) under the threshold of P = 0.05. The resulting significant SNPs are shown in Table 2.

Gene mapping
Using three gene mapping strategies (position mapping, eQTL mapping and chromatin interaction mapping) in FUMA, we further mapped the significant association variants to genes and identified 50 genomic risk loci and 64 mapped genes associated with radiation-induced oral mucositis (Additional files 4, 5: Table S3, S4). The results of the overlapped SNPs and genes in the subgroup analysis are shown in Table 2.
The two genes IKBKAP and DHTKD1 were mapped by all three strategies. IKBKAP was located at the chromosome 9 locus, and its lead SNP rs10816756 was located in the intron of the gene with an OR of 1.87 for the minor allele (95% CI 1.38-2.53, P = 5.77 × 10 −5 ). Two SNPs rs2230794 and rs76846430, located at the exon of IKB-KAP, were both in complete LD with rs10816756 (r 2 = 1, Fig. 2). rs2230794 is a missense variant (OR = 1.83, 95% CI 1.35-2.49, P = 9.85 × 10 −5 ), and rs76846430 is a splicesite variant (OR = 1.78, 95% CI 1.30-2.42, P = 2.7 × 10 −4 ). We further performed expression quantitative trait locus (eQTL) analysis and found that with the increasing number of risk alleles of rs10816756, there was a higher mRNA level of IKBKAP in the whole peripheral blood (Additional file 6: Table S5). We further analyzed chromatin functional interaction in the risk loci, and 8 genes were identified to interact with the chromatin at that site, such as IKBKAP, KLF4, and RAD23B ( Fig. 3 and Additional file 7: Table S6). The other gene, DHTKD1, was located at the chromosome 10 locus, and its lead SNP rs7068532 was in the intron region with an OR of the minor allele of 1.60 (95% CI 1.28-2.00, P = 3.33 × 10 −5 ). eQTL analysis indicated that the whole peripheral blood mRNA level of DHTKD1 decreased when the risk alleles increased (Additional file 6: Table S5). The risk locus had a chromatin interaction with the gene DHTKD1 (Additional file 8: Figure S2 and Additional files 5, 6: Tables S4,  S5). In addition, we searched the 64 mapped genes in the GeneRIF dataset to examine their relevance to radiation-induced OM. There are two low frequency variants, rs117157809 and rs6814005, which were located in the introns of TNKS and MAPK10, respectively. Patients carrying the minor alleles of rs117157809 tended to have higher risks of developing severe OM with a per allele OR of 3.72 (95% CI 2.10-6.57, P = 6.33 × 10 −6 ). The association was consistent with an OR of 3.34 (95% CI 2.31-4.84, P = 1.10 × 10 −3 ) in patients treated by IMRT and with an OR of 5.06 (95% CI 3.13-8.18, P = 6.99 × 10 −4 ) in patients treated by 2D-CRT. The SNP rs79488099 located in the 3′-UTR of TNKS was in modest LD with the lead SNP rs117157809 (rs79488099: r 2 = 0.63, P = 1.48 × 10 −4 ), and the CADD score of rs79488099 was 16.17 indicating a deleterious mutation. A similar result for rs6814005 is shown in Table 2. Another lead SNP, rs13227327, was located in the intron of SDK1 with an OR of the minor allele of 1.62 (95% CI 1.30-2.03, P = 2.24 × 10 −5 ). In that risk loci, two SNPs, rs601424 and rs671694, were both located in the exons of SDK1 and were in modest LD with rs13227327 (rs601424: r 2 = 0.69, P = 4.54 × 10 −4 ; rs671694: r 2 = 0.69, P = 3.31 × 10 −4 ; Additional file 9: Figure S3).

Gene-set based analysis
The FUMA tool implicated 64 genes providing more extensive information on the likely consequences of relevant genetic variants. Gene-set based analysis was performed using these genes to further evaluate the underlying disease mechanisms responsible for the genetic signals. The 20 significant GO biological processes are listed in Additional file 10: Table S7. Among those gene sets, there were 8 GO gene sets involved in the regulation of telomere or telomerase activity, including   18:224 in the regulation of telomere capping (P = 7.73 × 10 −7 ), the positive regulation of telomerase activity (P = 2.13 × 10 −6 ), and the positive regulation of telomere maintenance (P = 1.77 × 10 −5 ). Four genes (TNKS, NEK2, NBN and KLF4) recurred in these GO sets. In addition, 4 of 20 significant gene sets were involved in DNA metabolism, biosynthesis and replication. One of those pathways was the Wnt signaling pathway, which was reported to regulate radioresistance [24].

Discussion
In the present study, we found that some clinical factors were associated with radiation-induced OM, especially chemotherapy concurrent with radiotherapy, which is promoted for cancer control and is widely used for the cancer treatment [25]. However, it greatly increases the radiation sensitivity of normal tissue and the occurrence of radiation-induced OM. Adjusting these clinical factors, our genome-wide association study identified 50 risk loci and 64 mapped genes by using a total sample size of nearly 1500. Many of the OM-associated genes are involved in telomere biological processes, including telomere capping, maintenance and telomerase activity, while some other genes participate in DNA biological processes, including DNA metabolism, biosynthesis and replication.
To our knowledge, this study is the largest GWAS for oral mucositis in NPC patients treated with radiotherapy.
Case, RTOG grade ≥ 3; Control, RTOG grade ≤ 2 SNP, single nucleotide polymorphism; CHR, chromosome; MA, minor allele; MAF, minor allele frequency; OR, odds ratio for minor allele; 95% CI, 95% confidence interval a The subgroup of patients who received intensity modulated radiation therapy b The subgroup of patients who received two-dimensional conventional radiotherapy Published studies have investigated the associations between SNPs and radiation-induced OM in head and neck cancer and in nasopharyngeal carcinoma. Most of them evaluated the associations by adopting the strategy of candidate genes. Those candidate genes and pathways include DNA damage and repair involved in doublestrand breaks repair genes [26] and the base excision repair pathway [27]. Other important cellular signaling pathways include the Wnt/β-catenin pathway [28], cell cycle regulated genes, the NF-κB pathways [29], angiogenesis-related genes [30], and GAS5 lncRNAs [31]. Those studies were based on small sample sizes of 100-500 and evaluated limited number of SNPs in the candidate genes, and the significant SNPs were found at the level of an uncorrected P value of 0.05. Only one study showed genome-wide level analysis in a total of 24 patients with NPC [32]. For the studies of other radiation-induced side effects, many of the susceptibility genes identified were implicated in DNA damage response and repair pathways, oxidative stress and apoptosis [33]. Our gene sets-based analysis identified the Wnt signaling pathway, which has been reported to be related with radiation-induced OM [28]. More importantly, we identified other potential pathways, including telomere and DNA biological processes. In particular, the gene set of telomere biological process has a significant impact on the radiosensitivity of patients. It has been demonstrated that telomere dysfunction is correlated with delayed DNA break repair kinetics and with sensitivity to ionizing radiation. For example, the telomerase-deficient mouse models demonstrated that short telomeres determined a condition of hypersensitivity to ionizing radiation, and consequently, had a decreased survival rate [34]. In vitro experiments have suggested that irradiation sensitivity of non-transformed human epithelial cells is augmented with telomere dysfunction because short dysfunctional telomeres interfered with efficient DNA repair by joining radiationinduced DNA broken ends, and it also reduced the repair fidelity of DNA broken-ends [35]. Furthermore, the study has formulated that the importance of telomeres in predicting individual radiosensitivity of cancer patients [36].
In this study, the gene sets related with telomere function such as telomere capping, maintenance and telomerase activity were mainly implicated by TNKS, NEK2, KLF4 and NBN. These genes were reported to be linked to the radiation-induced damage. For example, depletion of TNKS is associated with a defective damage response observed by degraded proteasome-mediated DNA-PKcs, including increased sensitivity to ionizing radiation-induced mutagenesis, chromosome aberration (terminal deletion), telomere fusion, and cell killing [37]. The activity of NEK2 was reported to be inhibited by ionizing radiation, and this response was dependent on ATM and on PP1 binding to NEK2 [38]. The absence of NEK2 promoted apoptosis and reduced cell numbers [39]. Other evidence indicated that NEK2 promoted glioma stem cell radioresistance through the regulation of EZH2 [40]. In addition, KLF4 was reported to prevent centrosome amplification and to exhibit antiapoptotic activity following γ-radiation-induced DNA damage [41,42]. It has been further confirmed in animal experiments that KLF4 was a radio-protective factor for the intestine following γ-radiation-induced gut injury in mice [43]. NBN has been reported to have an association with radiation-induced oral mucositis [44], and its mutation was found in the radiosensitivity-related syndrome and Nijmegen breakage syndrome [45]. In vitro studies have demonstrated that the influence of NBN on radiation hypersensitivity was accompanied by enhanced γ-radiation-induced apoptosis in human lymphoblastoid cells [46]. Our finding further elucidated the underlying mechanisms of radiation-induced damage.
Some genes that are significant in both of the treatment subgroups are of further interest, including IKBKAP, SDK1 and MAPK10. IKBKAP is located on chromosome 9 in risk locus 26. rs10816756 is the lead SNP of that locus, having a modest sign of P = 5.77 × 10 −5 . IKB-KAP was identified as a scaffold protein that plays a role in the regulation of activation of the mammalian stress response via the c-Jun N-terminal kinase (JNK)-signaling pathway [47], and JNK signaling pathway has been suggested to be involved in radiation-induced OM pathobiology [48]. SDK1, an adhesion molecule, is activated by cellular stress especially in conditions with the reactive oxygen species [49]. In addition, the gene MAPK10, a member of the MAPK signaling pathway, is capable of Fig. 3 Cross-locus interactions for genomic regions associated with radiation-induced oral mucositis. Circos plots showing genes on chromosome 9 that were implicated through the genomic risk loci (blue areas) by positional mapping, by chromatin interaction mapping (orange font), eQTL mapping (green font), or by both chromatin interaction and eQTL mapping (red font). The outer layer shows a Manhattan plot containing the -log 10 P-value of each SNP in the GWAS analysis of radiation-induced oral mucositis (n = 1467 individuals)