Identification of potential drug targets for rheumatoid arthritis from genetic insights: a Mendelian randomization study

Cao, Yu; Yang, Ying; Hu, Qingfeng; Wei, Guojun

doi:10.1186/s12967-023-04474-z

Research
Open access
Published: 11 September 2023

Identification of potential drug targets for rheumatoid arthritis from genetic insights: a Mendelian randomization study

Yu Cao^1,2^na1,
Ying Yang²^na1,
Qingfeng Hu¹ &
…
Guojun Wei¹

Journal of Translational Medicine volume 21, Article number: 616 (2023) Cite this article

8670 Accesses
9 Citations
2 Altmetric
Metrics details

Abstract

Introduction

Rheumatoid arthritis (RA) is a chronic inflammatory illness that mostly affects the joints of the hands and feet and can reduce life expectancy by an average of 3 to 10 years. Although tremendous progress has been achieved in the treatment of RA, a large minority of patients continue to respond poorly to existing medications, owing in part to a lack of appropriate therapeutic targets.

Methods

To find therapeutic targets for RA, a Mendelian randomization (MR) was performed. Cis-expression quantitative trait loci (cis-eQTL, exposure) data were obtained from the eQTLGen Consortium (sample size 31,684). Summary statistics for RA (outcome) were obtained from two largest independent cohorts: sample sizes of 97,173 (22,350 cases and 74,823 controls) and 269,377 (8279 cases and 261,098), respectively. Colocalisation analysis was used to test whether RA risk and gene expression were driven by common SNPs. Drug prediction and molecular docking was further used to validate the medicinal value of drug targets.

Results

Seven drug targets were significant in both cohorts in MR analysis and supported by localization. PheWAS at the gene level showed only ATP2A1 associated with other traits. These genes are strongly associated with immune function in terms of biological significance. Molecular docking showed excellent binding for drugs and proteins with available structural data.

Conclusion

This study identifies seven potential drug targets for RA. Drugs designed to target these genes have a higher chance of success in clinical trials and is expected to help prioritise RA drug development and save on drug development costs.

Introduction

Rheumatoid arthritis (RA) is a chronic inflammatory immune-mediated disease characterized by synovitis and cartilage destruction, primarily affecting the synovial membranes, tendon sheaths, and synovial bursae of the joints [1], and manifesting as joint pain, stiffness, swelling, deformity, and functional impairment [2]. It has a roughly 1% worldwide prevalence and is the 42nd most common debilitating disease in the world [3]. Its prevalence is rising as the world's population ages. Because of the high mortality and morbidity of RA, patients have a terrible quality of life and a huge economic burden on society. The National Audit Office estimates that RA costs the UK roughly £560 million per year in healthcare expenditures, not counting the cost of sickness absence and work-related impairment [4].

There is no cure for rheumatoid arthritis, so patients need to take long-term medication to reduce their signs and symptoms. These medications are often classified as disease-modifying antirheumatic drugs, glucocorticoids, nonsteroidal anti-inflammatory medicines and biologic drugs [5]. At the same time, the treatment of RA has improved considerably thanks to new targeted therapies that interfere with specific pathways of inflammation and immune response, such as antibodies against inflammatory cytokines or antibodies against immune cell surface molecules [6]. Nonetheless, a considerable minority of RA patients continue to fail to react to existing therapies [7]. In other words, despite the significant improvements in the treatment of RA with the introduction of biological and targeted synthetic disease-modifying anti-rheumatic drugs, a significant proportion of patients remain asymptomatic. These patients can be considered as 'Difficult-to-Treat' RA patients, and this group of patients represents a huge treatment challenge for both the healthcare team and the patient. In addition to new management strategies, optimal care for these patients requires new therapeutic drug targets [6].

Incorporating genetics into medication development might be one of the most effective strategies to enhance this process, as genetically backed therapies are far more likely to succeed in clinical trials [8,9,10]. Proteins encoded by druggable genes have become targets for drugs, or potentially for small molecules or monoclonal antibodies [11, 12]. Although genome-wide association studies (GWAS) have been efficient in identifying SNPs (Single Nucleotide Polymorphisms) linked with RA risk [13,14,15], GWAS methods cannot consistently identify causative genes and directly drive medication development.

Mendelian randomization (MR) is a statistical analysis of genetics that can be used to predict drug efficacy by mimicking randomised controlled trials [16,17,18,19]. SNPs (Expression Quantitative Trait Loci, eQTLs) linked to changes in gene expression may be comparable to long-term exposure to medications that target the encoded proteins [11, 20]. The GWAS for outcome (RA) may then be used to derive association data between the same genetic variations (SNPs) and disease (RA). Using MR, it is possible to integrate data on SNP-gene expression and SNP-disease connection to establish a causal relationship between exposure and outcome. A strong MR research may be designed using publically accessible data from two large-scale GWAS because exposure and outcome can be evaluated in two independent cohorts. Furthermore, people are randomly allocated variations with either high or low expression levels of druggable genes after fertilization, and individuals are often ignorant of their genotype, therefore MR investigations are akin to blinded trials [21]. However, it is important to acknowledge the limitations of the Mendelian randomization (MR) approach used in this study. One potential limitation is the presence of horizontal pleiotropy, where the instrumental variables may have direct effects on outcomes other than the exposure of interest. This can introduce bias in the estimation of causal effects. Violations of instrumental variable assumptions can also affect the validity of MR analyses. It is crucial to carefully consider and address these limitations and potential sources of bias in the interpretation of the results.

In this study, we propose several innovative findings in the field of rheumatoid arthritis (RA) research. Firstly, we identify new therapeutic targets for RA using the Mendelian randomization approach and large-scale genome-wide association study data. This approach combines cis-eQTL and RA risk association data to establish a causal relationship between exposure and outcome, improving drug efficacy prediction. Secondly, we validate the pharmacological activity of seven potential RA drug targets through drug prediction and molecular docking studies. These targets, closely related to immune function, are assessed for feasibility and potential drug candidates by evaluating their binding affinity and interaction patterns with drugs.

Additionally, we conduct gene co-localization analysis to confirm the shared driving factors between potential therapeutic targets and RA risk. This analysis helps determine the causal relationship between treatment targets and the disease, excluding potential confounding factors. Furthermore, our PheWAS analysis explores the associations between potential therapeutic targets and other characteristics, providing valuable insights into their multifunctionality and potential impact mechanisms for further research and the development of related treatment strategies. Lastly, gene enrichment analysis and protein–protein interaction network construction reveal the functional characteristics and biological relevance of potential therapeutic targets, deepening our understanding of their mechanisms in RA development and treatment.

In summary, our study offers important insights into the discovery of new therapeutic targets for RA. By integrating Mendelian randomization, drug prediction, gene co-localization analysis, PheWAS, gene enrichment analysis, and protein–protein interaction network construction, we provide valuable guidance for the development of more effective and targeted treatment approaches.

Materials and Methods

Exposure data

eQTLs data were obtained from eQTLGen Consortium (https://eqtlgen.org/). In brief, the eQTLGen data set contained 16,987 genes and 31,684 cis-eQTLs in blood samples from mostly healthy European individuals. A full description of the data can be found in the original article [22]. On March 13, 2023, the entire cis-eQTLs data and allele frequency statistics were acquired from the eQTLGen consortium. The list of druggable genes is from a previous study. It was designed as a computational method and combined with data from many existing genome-wide association studies to identify druggable proteins, linking them to known drugs, to propose 4463 druggable genes [12]. Considering that eQTLs are closer to the gene of interest in drug development studies and have more direct regulation of gene expression, the eQTLs used in this study were limited to SNPs with 5 kb upstream of the starting point or 5 kb downstream of the endpoint of a druggable gene. eQTLs for 2554 druggable genes were finally obtained.

Outcome data

GWAS data for the RA discovery Cohort were obtained from a previous large multi-ethnic study [14]. This study included 35,871 RA cases and 24,0149 controls from 37 cohorts with European, East Asian, African, South Asian and Arab ancestry. All RA cases satisfied the 1987 American College of Rheumatology (ACR) criteria [23] or the 2010 ACR/European League Against Rheumatism criteria [24], or were diagnosed with RA by a professional rheumatologist. 31,963 of the 35,871 cases had a known seropositive status; of these, 27,448 were seropositive and 4,515 were seronegative (rheumatoid factor or anti-citrullinated peptide antibodies were used to determine seropositivity). Detailed information on the data can be found in the original literature [14]. For consistency with the exposure data, GWAS data from the European pedigree sample of 22,350 cases and 74,823 controls were selected as outcome data for this study. GWAS data for the RA replication cohort containing 8279 cases and 261,098 controls were obtained from FinnGen Release 8 (https://www.finngen.fi/en) [25], which were released in December 2022.

Mendelian randomisation analysis

The R package TwoSampleMR (version 0.5.6) was used to conclude the MR analysis [26]. Exposure and outcome data were imported and harmonised using the R package built-in function (harmonise_data). The genetic instrumental variables used for MR analysis were subject to three MR assumptions: (1) SNPs were directly linked with exposure (i.e. highly associated with at least one gene expression, FDR < 0.05); (2) SNPs were not associated with exposure-outcome confounders; and (3) SNPs affected outcome through exposure only [27]. Therefore, several quality controls were performed in this study. First, the exposure and outcome groups were nearly entirely composed of people of European ancestry, reducing any potential bias from population stratification. Second, to decrease bias caused by weak instrumental factors, instrumental variables having a F statistic of less than 10 [F = (beta/se)2] were deleted [28]. When there was genetic linkage between SNPs, a reference panel from the 1000 Genome Project [29] was used to remove linked SNPs at r2 < 0.2 and a cropping range of 10,000 Kb, retaining the most significant SNPs (with the smallest p-value) and ensuring independence between SNPs. Finally, Steiger filtering approach was used to exclude genes with more SNP explained outcome (RA) variance than exposed variation.

In the main analysis, MR estimates were calculated for each SNP using the Wald ratio method, and for genes with multiple instrumental variables, SNP estimates were meta-analysed using inverse variance weighted (IVW), MR-Egger and weighted median methods. IVW assumes that all genetic instruments are valid, and its statistical power is the highest of all methods when the assumption is valid [30]. The weighted median technique allows some (50%) instrumental variables to be invalid by weighting the MR estimates produced by each SNP according to their magnitude and providing an overall MR estimate based on the median with bootstrapped standard errors [31]. MR Egger allows for the presence of horizontal pleiotropy, i.e. some SNPs may have an influence on the outcome via a different pathway than the exposure of interest, albeit at the expense of diminished statistical power. The MR Egger intercept test (MR-Egger intercept test) can be used to determine whether or not there is horizontal pleiotropy present [32]. When these three methods yield directionally consistent estimates it indicates that pleiotropy does not bias the IVW estimates. For genes containing more than two instrumental variables, MR-Egger intercept tests were performed to test for the presence of horizontal pleiotropy (judged at P < 0.05), and Cochran's Q was determined using both the IVW and MR-Egger techniques to test for heterogeneity amongst Wald ratios [33]. The Bonferroni correction was applied to determine the adjusted significance threshold for multiple testing, taking into account the false positives caused by multiple testing. In the discovery cohort, a P value below 1.96E−5 (P = 0.05/2554) was defined as significant. For significant genes, quality control was performed by checking that the three methods were consistent in the direction of estimated effect and that the MR-Egger test had no horizontal pleiotropy. Significant genes that passed quality control were repeatedly validated in the FinnGen cohort, and associations with a P value below 0.0017 (P = 0.05/29) were considered significant.

Colocalisation analysis

For genes that were significant in both cohorts, colocalisation analysis of RA risk was performed using the R package coloc [34]. Analyses were performed using SNPs harmonised by TwoSampleMR package with default priori probabilities: p1 = 1E−4, p2 = 1E−4, p12 = 1E−5. P1, p2, and p12 are predefined probability that the SNP in the test area is substantially linked with gene expression, RA risk, or both. The posterior probabilities derived from the colocalization analysis correspond to one of five hypotheses: PPH0, SNPs are not associated with either trait; PPH1, SNPs are associated with gene expression but not with RA risk; PPH2, associated with RA risk but not with gene expression; PPH3, associated with RA risk and gene expression but driven by different SNPs; PPH4, associated with RA risk and gene expression, was driven by common SNPs. The threshold of significance for colocalisation was set at PPH4 > 0.80, and genes that colocalised with RA could be considered as potential drug target genes.

Phenome-wide association analysis

In order to further evaluate the horizontal pleiotropy of potential drug targets and possible side effects, a phenome-wide association study (PheWAS) was performed on AstraZeneca PheWAS Portal (https://azphewas.com/) [35]. The original study used data of ~ 15.5 K binary and ~ 1.5 K continuous phenotypes from a subset of approximately 450,000 exome sequencing participants published by UK Biobank. The full construction methodology can be found in the original article [35]. We performed multiple corrections and set a threshold of 2E−9 (as the default in the AstraZeneca PheWAS Portal) to account for the potential for false positives.

Enrichment analysis

To investigate the functional characteristics and biological relevance of the identified prospective therapeutic target genes, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment study were done using the R package clusterProfiler [36], Pathview [37]. GO includes three terms: biological process (BP), molecular function (MF), and cellular component (CC). The KEGG pathway can provide metabolic pathway information.

Protein interaction network construction

By evaluating and analyzing protein–protein interaction (PPI) networks, one can gain a better knowledge of how one protein interacts with another intracellularly. In this study, the PPI network was built using STRING with a confidence score of 0.4 as the minimum needed interaction score and all other parameters left at default levels [38]. PPI results were further visualised by Cytoscape (V3.9.1) [39]. In addition, GeneMANIA (https://genemania.org/) was also used for PPI analysis [40].

Candidate drug prediction

Assessing protein-drug interactions is important to understand whether target genes can be used as actual drug targets. This study will use the Drug Signatures Database (DSigDB, http://dsigdb.tanlab.org/DSigDBv1.0/) [41] to accomplish this. Specifically, with 22,527 gene sets and 17,389 distinct compounds spanning 19,531 genes, DSigDB is a sizable database that connects medicines and other chemicals to their target genes. The identified target genes are uploaded to DSigDB and drug candidates can be predicted to assess the medicinal activity of the target genes.

Molecular docking

To further understand the effect of drug candidates on drug target genes and the druggability of target genes, this study further performed molecular docking at the atomic level to assess the binding energy and interaction pattern between drug candidates and their targets. Molecular docking simulations allow us to analyze the binding affinity and mode of interaction between ligands and drug targets. By identifying ligands with high binding affinity and favorable interaction patterns, we can prioritize drug targets for further experimental validation and optimize the design of potential drug candidates.In this study, AutodockVina 1.2.2 (http://autodock.scripps.edu/), a computerised protein–ligand docking software, was used to perform molecular docking of the top 5 significant drugs and the proteins encoded by the corresponding target genes [42]. Drug structure data were obtained from the PubChem Compound Database [43] (https://pubchem.ncbi.nlm.nih.gov/) and the corresponding IDs have been displayed in Table 4. Protein structure data were downloaded from the PDB (Protein Data Bank, http://www.rcsb.org/) and the corresponding PDB IDs can be viewed in Table 4 (except for ATP2A1 protein from EMBL-EBI (https://www.ebi.ac.uk/) Alphafold-based structure data). The final structures were obtained for five proteins and four drugs. To begin, all water molecules were removed from the protein and ligand files and polar hydrogen atoms were added. The grid boxes were centered to encompass each protein's structural domains and to allow for unrestricted molecular mobility. A grid point distance of 0.05 nm and a pocket size of 30 Å × 30 Å × 30 Å were used to build up the docking pocket. The entire molecular docking process was visualised in the model by Autodock Vina 1.2.2. Flow chart of this study was presented in Fig. 1.

Results

Thirty-two genes causally significantly associated with RA risk during the discovery phase

The largest current GWAS for RA traits was a large multi-ethnic study in 2022, containing 22,350 cases and 74,823 controls of European ancestry [14]. As shown in Figs. 2, 3, in the discovery cohort, the expression of 32 genes was causally related with RA risk (p < 1.96E−5 = 0.05/2554, Bonferroni correction for 2554 drug targets). However, when sensitivity analyses were performed, the BRSK1 gene had inconsistent direction of effect values across the three methods (Additional file 1: Table S1), and the CD226 and MMEL1 genes failed the horizontal pleiotropy test (P < 0.05, Additional file 1: Table S2), so these three genes were excluded from subsequent analyses. The results of the heterogeneity test (Additional file 1: Table S3) indicated heterogeneity between the SNPs for ITPR3, HLA-DPA1, C5, MMEL1 and CTLA4.

Replication phase 12 genes remain significant in an independent RA cohorts

In the replication phase this study used GWAS data from the Finnish FinnGen database containing 8279 cases and 261,098 controls of European ancestry. The MR analysis was performed in the same way as in the discovery cohort. Using the Wald ratio or IVW approach, the genetic predicted expression of 12 genes was replicated as causally connected to RA risk, as shown in Fig. 4 (significant after Bonferroni correction at P < 0.0017 = 0.05/29). The CTLA4 gene was excluded from subsequent analyses due to inconsistent direction of effect values across the three methods (Additional file 1: Table S4). The horizontal pleiotropy test showed no significant pleiotropy for any of the genes (Additional file 1: Table S5). In the test for heterogeneity (Additional file 1: Table S6), the HLA-DRB1, ITPR3, CTLA4, FCGR2B, CCR6 and HLA-DPA1 genes showed inter-SNP heterogeneity. In addition, the HLCS gene was excluded from subsequent analyses due to inconsistency in the direction of effect estimates between the discovery and replication phase.

Colocalization analysis

Previous studies have suggested that significant MR outcomes may arise from a locus in which SNPs are in close linkage disequilibrium and in which the SNP-exposure and SNP-outcome associations arise from two different causal SNPs, and therefore may lead to inferred false positive results [44]. Colocalisation analysis can be used to explore whether exposure and outcome share the same causal SNP when SNPs are clearly associated with both exposure and outcome [34]. Evidence suggests that proteins that have undergone both MR and colocalisation tests have greater potential to become drug targets and are more likely to be approved [45]. Therefore, the decision was made in this study to use discovery phase data to perform colocalisation on the 11 genes proposed in the previous analysis. As shown in Table 1, seven of the 11 proteins (CCR6, HLA-DPA1, HLA-DRB1, IFNGR2, C5, ATP2A1 and FEN1) showed strong evidence of colocalisation with RA (PP.H4 > 0.8) and could be candidate drug target genes.

Table 1 Colocalization results of eQTLs for 11 genes with RA-associated SNPs

Full size table

PheWAS

To further assess whether the seven potential drug target genes identified would have beneficial or deleterious effects on other traits and whether there was potential pleiotropy that was not captured by the MR-Egger intercept test, this study used 17,361 dichotomous phenotypes and 1419 quantitative phenotypes from the AstraZeneca PheWAS Portal database [35] to perform PheWAS at the gene level. PheWAS results can be interpreted as the association of genetically determined protein expression with specific diseases or traits. As shown in Table 2 and Additional file 1: Figs. S1–S14, with the exception of ATP2A1, none of the six drug targets were significantly associated with other traits at the gene level (P < 5E−8 for genomic association), suggesting that the potential side effects of drugs acting on these targets and the presence of horizontal pleiotropy in these genes are likely to be small, further indicating the reliability of the results of this study. In contrast, ATP2A1 was positively associated with Hand Grip Strength and negatively associated with Impedance of Arm, suggesting that RA drugs acting on ATP2A1 gene may affect both traits, and that the MR analysis of the ATP2A1 gene may have a pleiotropic effect on the results.

Table 2 Traits Significantly associated with APT2A1 using AstraZeneca PheWAS portal

Full size table