
Genetic risk factors for severe and fatigue dominant long COVID and commonalities with ME/CFS identified by combinatorial analysis



Long COVID is a debilitating chronic condition that has affected over 100 million people globally. It is characterized by a diverse array of symptoms, including fatigue, cognitive dysfunction and respiratory problems. Studies have so far largely failed to identify genetic associations, the mechanisms behind the disease, or any common pathophysiology with other conditions such as myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) that present with similar symptoms.


We used a combinatorial analysis approach to identify combinations of genetic variants significantly associated with the development of long COVID and to examine the biological mechanisms underpinning its various symptoms. We compared two subpopulations of long COVID patients from Sano Genetics’ Long COVID GOLD study cohort, focusing on patients with severe or fatigue dominant phenotypes. We evaluated the genetic signatures previously identified in an ME/CFS population against this long COVID population to understand similarities with other fatigue disorders that may be triggered by a prior viral infection. Finally, we also compared the output of this long COVID analysis against known genetic associations in other chronic diseases, including a range of metabolic and neurological disorders, to understand the overlap of pathophysiological mechanisms.


Combinatorial analysis identified 73 genes that were highly associated with at least one of the long COVID populations included in this analysis. Of these, 9 genes have prior associations with acute COVID-19, and 14 were differentially expressed in a transcriptomic analysis of long COVID patients. A pathway enrichment analysis revealed that the biological pathways most significantly associated with the 73 long COVID genes were mainly aligned with neurological and cardiometabolic diseases.

Expanded genotype analysis suggests that specific SNX9 genotypes are a significant contributor to the risk of or protection against severe long COVID infection, but that the gene-disease relationship is context dependent and mediated by interactions with KLF15 and RYR3.

Comparison of the genes uniquely associated with the Severe and Fatigue Dominant long COVID patients revealed significant differences between the pathways enriched in each subgroup. The genes unique to Severe long COVID patients were associated with immune pathways such as myeloid differentiation and macrophage foam cells. Genes unique to the Fatigue Dominant subgroup were enriched in metabolic pathways such as MAPK/JNK signaling. We also identified overlap in the genes associated with Fatigue Dominant long COVID and ME/CFS, including several involved in circadian rhythm regulation and insulin regulation. Overall, 39 SNPs associated in this study with long COVID can be linked to 9 genes identified in a recent combinatorial analysis of ME/CFS patients from UK Biobank.

Among the 73 genes associated with long COVID, 42 are potentially tractable for novel drug discovery approaches, with 13 of these already targeted by drugs in clinical development pipelines. For example, from this analysis we identified TLR4 antagonists as repurposing candidates with the potential to protect against long-term cognitive impairment pathology caused by SARS-CoV-2. We are currently evaluating the repurposing potential of these drug targets for use in treating long COVID and/or ME/CFS.


This study demonstrates the power of combinatorial analytics for stratifying heterogeneous populations in complex diseases that do not have simple monogenic etiologies. These results build upon the genetic findings from combinatorial analyses of severe acute COVID-19 patients and an ME/CFS population and we expect that access to additional independent, larger patient datasets will further improve the disease insights and validate potential treatment options in long COVID.


Post COVID-19 condition (or long COVID) is a debilitating syndrome that the World Health Organization (WHO) estimates affects up to 20% of people infected by SARS-CoV-2 [1]. Other more recent studies put the prevalence of long-term symptoms (over 3 months post-infection) in COVID-19 patients even higher [2], with all estimates implying that over 100 million patients have been affected by the condition globally [3]. Even though symptoms decline for most patients over time, some patients still experienced symptoms such as post-exertional malaise or postural tachycardia syndrome (POTS) [4] up to 2 years after infection [5], and the long-term health consequences of long COVID remain unknown, with suggestions of a doubling of the risk of developing cardiovascular issues [6].

Reports indicate an extensive array of symptoms associated with long COVID [7], with the most common being fatigue and post-exertional malaise (PEM) [8], cognitive dysfunction [9], mood disturbances [10] and respiratory problems [11]. However, establishing a precise diagnosis for either of these diseases has proved challenging, in large part due to the complexity and diversity of their clinical presentation and their effects across multiple organ systems. In an attempt to provide some definitive metrics, a recent study developed a data-driven scoring framework for diagnosing long COVID based on the available symptom data [12].

Although many studies have investigated the genetic risks underlying long COVID, only one genome-wide association study (GWAS) has identified a single risk locus around the lead variant in FOXP4 [13, 14]. Studies that used combinatorial analytical approaches to delineate genetic risk factors in similarly heterogeneous populations have demonstrated more success, for example in severe COVID-19 [15] and ME/CFS [16].

Combinatorial analytics approaches identify combinations of features that together (rather than individually) are associated with the disease phenotype [17]. They capture the non-linear effects of interactions between multiple genes (and exogenous factors if available). These signals are distinct from and complementary to the monogenic, linear additive associations of single SNPs found by GWAS. In complex (multifactorial and heterogeneous) diseases these non-linear combinatorial signals are significantly more important in understanding disease biology than in relatively monogenic disorders such as many cancers and rare genetic disorders [18, 19].

In this study we used combinatorial analytics to identify disease risk signatures (combinations of genetic variants significantly associated with the development of long COVID) and explored the biological mechanisms with which they are involved. We investigated subpopulations of long COVID patients who had experienced either severe disease or a fatigue dominant phenotype, to compare the underlying genes and pathways that explain some of the heterogeneous manifestations of the disease.

We also compared the output of this study against our previous ME/CFS analysis [16] to understand similarities in post-viral fatigue and other phenotypes experienced by subsets of long COVID patients. Finally, we compared the pathways that were significantly enriched in this genetic analysis of long COVID against known genetic associations in other chronic diseases that are predominantly autoimmune, neurological and/or metabolic in nature, to evaluate any common pathophysiological mechanisms that might be shared by long COVID.


Sano Genetics GOLD study dataset

Genotypic and phenotypic data for both cases and controls included in this study were generated from Sano Genetics’ Long COVID GOLD study [20]. Eligible participants (n = 1996), recruited between 2020 and 2022, provided saliva samples for an at-home Sano DNA Test (evaluated via Illumina Global Screening Array with Multi-disease drop-in panel) and completed a questionnaire hosted on the Sano Genetics platform detailing their acute COVID-19 and long COVID symptoms (if experienced), as well as basic demographic data and other chronic health conditions (see Additional file 1).

Symptom based score for long COVID severity

Given the heterogeneity of post-COVID symptoms reported by the GOLD study and other previous studies, we developed a data-driven scoring method to characterize the severity of self-reported symptoms. We analyzed participant reported scores for each available long COVID symptom experienced pre- and post-acute COVID-19, including breathlessness, fatigue, degree of muscle pain and change in mental health (see Additional file 5: Table S1 for more details). A ‘Total Change’ score was generated for each patient from the sum of the reported differences across symptoms pre- and post-COVID.
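As an illustration only, the ‘Total Change’ score described above can be sketched as a sum of post-minus-pre differences. The symptom names, the numeric scale and the function name `total_change` are hypothetical; the actual questionnaire items are those listed in Additional file 5: Table S1.

```python
from typing import Mapping


def total_change(pre: Mapping[str, float], post: Mapping[str, float]) -> float:
    """Sum of (post - pre) over the symptoms scored at both time points."""
    return sum(post[symptom] - pre[symptom] for symptom in pre if symptom in post)


# Hypothetical participant: names and scale are illustrative, not from the study.
pre = {"breathlessness": 1, "fatigue": 2, "muscle_pain": 0, "mental_health": 1}
post = {"breathlessness": 3, "fatigue": 8, "muscle_pain": 4, "mental_health": 2}

score = total_change(pre, post)
print(score)
```

A ‘Fatigue Change’ score would follow the same pattern, restricted to the fatigue-related items.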

Cohort characteristics

At the time of analysis, a total of 1829 individuals in the GOLD study had a self-reported COVID-19 diagnosis. This COVID-19 cohort had a median age of 50 years [interquartile range (IQR) = 40–60] and median COVID-19 recovery time of 169 days [IQR = 14–507.5] (Table 1). It consisted of 61.1% females, and 92.6% of participants self-reported their ethnicity as ‘White’. The most prevalent self-reported comorbidities (prior to or after COVID-19) in the cohort were anxiety or panic attacks (30.0%), depression (26.2%), asthma (25.5%), eczema (18.6%) and migraines (17.4%).

Table 1 Characteristics of the two long COVID cohorts derived from the GOLD study dataset

The GOLD study cohort included in this analysis was recruited between January 2020 and November 2022. Using the Office for National Statistics (ONS) COVID-19 Infection Survey data [21], we have estimated the most prevalent circulating SARS-CoV-2 variant in the UK that each participant was most likely to be exposed to when they contracted COVID-19 (Fig. 1). This demonstrates that the majority (65%) of samples included in the study were most likely infected with the wildtype strain. The study does not include any participants who contracted any of the more recent SARS-CoV-2 variants that emerged in 2023.

Fig. 1

Monthly distribution of the first self-reported COVID-19 diagnosis for the 1829 individuals included in Sano Genetics’ Long COVID GOLD study (2020–2022)

Of those confirmed to have had COVID-19, 1345 (73.5%) reported fatigue symptoms, 1135 (62.1%) reported symptoms linked to concentration, 1124 (61.5%) reported short-term memory symptoms and 714 (39.0%) reported breathlessness. The median ‘Total Change’ symptom score for the cohort was 15 [IQR = 2–35] (Additional file 5: Figure S1).

In the dataset, 1489 (81.4%) individuals provided free-text responses on other symptoms that they experienced since their illness that were not covered elsewhere in the questionnaire. The most frequently reported symptoms included loss of smell, headache, pain, tinnitus, loss of taste, dizziness, insomnia and postural tachycardia syndrome (POTS) (see Additional file 5: Table S2 and Additional file 5: Figure S2). Following COVID infection, 353 (19.3%) individuals reported reducing their working hours while 359 (19.6%) people discontinued working altogether post-illness.

Long COVID cohorts

We defined two long COVID case populations from the GOLD study based on self-reported symptom changes 3 months post COVID-19: ‘Severe’ long-haulers, who reported the greatest variety and severity of symptoms, and ‘Fatigue Dominant’ cases, who reported predominantly fatigue-associated long COVID symptoms.

The World Health Organization defines long COVID patients as those experiencing one or more symptoms post initial COVID-19 infection. However, the cohort in the GOLD study that met these criteria displayed a great range in the severity and length of self-reported symptoms experienced post COVID-19. We therefore focused on the more ‘severe’ long haulers who reported the greatest degree of symptoms, as these are likely to be the patients whose long COVID symptoms do not diminish over time without pharmaceutical intervention.

The ‘Fatigue Dominant’ cohort was chosen primarily due to its phenotypic similarity with ME/CFS, allowing us to explore potential commonalities between the diseases based on our previously published combinatorial analysis for ME/CFS [16].

The number and overlap in cases and controls included in the two datasets are included in Additional file 5: Figure S3.

Severe long COVID cohort

The Severe long COVID cohort (n = 1,323 where cases = 459 and controls = 864) was selected using the difference in scores reported pre- and post-acute COVID-19 for three long COVID symptom groups, namely respiratory, fatigue and mental health. Severe cases were defined as those with a ‘Total Change’ score for these symptoms greater than or equal to the upper quartile of the distribution. The controls in this study were defined as samples with a ‘Total Change’ score greater than or equal to 0 but below the median of the distribution.
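The case and control thresholds above can be sketched as follows. This is a minimal illustration, not the study's code: the function name `assign_groups` and the choice of the "inclusive" quantile interpolation are assumptions, since the text does not say how the quartiles were computed.

```python
import statistics


def assign_groups(scores):
    """Label each score as 'case', 'control' or 'excluded'.

    case:     score >= upper quartile of the distribution
    control:  0 <= score < median of the distribution
    excluded: everything else (negative scores, or between median and upper quartile)
    """
    # Quantile method is an assumption; other interpolation rules shift the cut-offs slightly.
    _, median, upper_quartile = statistics.quantiles(scores, n=4, method="inclusive")
    labels = []
    for score in scores:
        if score >= upper_quartile:
            labels.append("case")
        elif 0 <= score < median:
            labels.append("control")
        else:
            labels.append("excluded")
    return labels


# Toy 'Total Change' scores, purely illustrative.
example_scores = [-1, 0, 1, 2, 3, 4, 5, 6, 7]
example_labels = assign_groups(example_scores)
print(list(zip(example_scores, example_labels)))
```

Participants falling between the median and the upper quartile belong to neither group, which is consistent with the case and control counts summing to less than the full COVID-19 cohort.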

Fatigue dominant long COVID cohort

The Fatigue Dominant cohort (n = 1,386 where cases = 477 and controls = 909) was selected using only a subset of symptoms relating to fatigue in the scores (‘Fatigue Change’) reported for pre- and post-acute COVID-19 symptoms (see Additional file 5: Table S1). The controls in this study were defined as samples with a ‘Fatigue Change’ score greater than or equal to 0 but below the median of the distribution.

The characteristics of the two cohorts are described in Table 1, Fig. 1, Fig. 2 and Additional file 5: Figure S4.

Fig. 2

Distribution of (a) the ‘Total Change’ score for cases and controls in the Severe long COVID cohort and (b) the ‘Fatigue Change’ score (a component of the ‘Total Change’ score) in the Fatigue Dominant long COVID cohort

Dataset QC

The two case–control datasets underwent a series of quality control (QC) procedures before they were analyzed using the PrecisionLife platform.

Standard variant-level and sample-level QC procedures were applied to the dataset (comprising 696,382 SNPs) as described in the Genotype Quality Control section in Supplementary Information. Due to the small sample size of the two long COVID cohorts, the genotype data were filtered to exclude SNPs with minor allele frequency (MAF) < 5%. Very low frequency SNPs were removed as significant combinations involving rare variants are especially infrequent. This filter also increases the statistical power of combinatorial analysis to detect genotype-disease associations by reducing the amount of false discovery rate (FDR) correction required when testing multiple SNP-genotype combinations. Following QC, the Severe dataset comprised 283,478 SNPs and the Fatigue Dominant dataset contained 283,444 SNPs.
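The MAF filter can be sketched in a few lines, assuming genotypes are coded as 0, 1 or 2 copies of one allele with `None` for missing calls. The function names and the toy SNP identifiers are hypothetical; in practice this step is usually run with a dedicated tool rather than hand-written code.

```python
def minor_allele_frequency(genotypes):
    """MAF for one SNP from 0/1/2 allele counts; None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    allele_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is rarer.
    return min(allele_freq, 1 - allele_freq)


def filter_by_maf(snp_genotypes, threshold=0.05):
    """Keep SNPs whose MAF is at least the threshold (i.e. exclude MAF < 5%)."""
    return {
        snp: genotypes
        for snp, genotypes in snp_genotypes.items()
        if minor_allele_frequency(genotypes) >= threshold
    }


# Toy data: identifiers and genotypes are illustrative only.
toy = {
    "snp_common": [0, 0, 1, 2, None, 1],  # MAF = 4 / 10 = 0.4
    "snp_rare": [0] * 19 + [1],           # MAF = 1 / 40 = 0.025
}
kept = filter_by_maf(toy)
print(sorted(kept))
```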

Combinatorial analytics using the PrecisionLife platform

The PrecisionLife combinatorial analysis platform enables hypothesis-free identification of high-order combinatorial features (known as disease signatures), which may include multiple SNP genotypes and/or other multi-modal features in combination. These disease signatures capture both the linear and non-linear effects of genetic and molecular interaction networks and enable the identification of associations including those that are only relevant to a subgroup of patients. We have previously validated this analytical approach across a variety of complex chronic diseases where it has identified more associations with increased explanation of observed disease variance and reproducibility than comparable GWAS studies [15,16,17].

In the combinatorial analytics approach, disease signatures are identified and statistically validated in ‘layers’ of increasing combinatorial complexity, i.e., singletons, pairs, triplets etc. (also known as combinatorial order). Each disease signature is validated multiple times using several statistical tests at each stage of the process to avoid false positives. A more detailed description of the mining and validation stages is given in our previous ME/CFS study [16].

We applied the PrecisionLife platform to both long COVID case–control datasets in a hypothesis-free manner to identify combinations of SNP genotypes that are strongly associated with the development of long COVID symptoms when they co-occur in the same patient. The method prioritizes SNP genotype combinations that have high odds ratios, low p-values (p < 0.05) and high prevalence (> 5%) in long COVID cases. A permutation-based approach was used to compare the observed properties of the most highly associated SNP-genotype combinations to the null distribution for randomized datasets [22], with p-value cut-offs based on a specified threshold (Benjamini–Hochberg FDR of 0.05) after multiple testing correction. Combinations passing these tests were reported as validated long COVID disease signatures. Finally, a merged network (disease architecture) view is generated by clustering all validated disease signatures based on their co-occurrence in patients in the dataset.
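For readers unfamiliar with the Benjamini–Hochberg step mentioned above, the following is a generic sketch of the procedure at an FDR of 0.05. It is a textbook implementation written for illustration and is not taken from the PrecisionLife platform; the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected at FDR alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= k * alpha / m.
    largest_passing_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * alpha / m:
            largest_passing_rank = rank

    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_passing_rank:
            rejected[index] = True
    return rejected


# Illustrative p-values for five candidate combinations.
example_p = [0.001, 0.045, 0.02, 0.5, 0.008]
print(benjamini_hochberg(example_p))
```

Because the per-rank threshold scales with the number of tests, removing low-frequency SNPs before mining reduces the number of combinations tested and therefore relaxes the cut-off each remaining combination must meet.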

SNPs found in multiple disease signatures often form the central hub of the disease architecture (see Fig. 3). These are termed ‘critical SNPs’ if the corresponding networks pass a further permutation-based statistical test. Potential critical SNPs are scored using a Random Forest (RF) algorithm with a fivefold cross-validation framework to assess the accuracy with which they predict the case–control split in the dataset.

Fig. 3

Conceptual representation of features, combinations and disease signatures that form part of PrecisionLife’s combinatorial analytics methodology. In the case of the long COVID study all features were SNP genotypes, but other feature types, e.g., a patient’s expression level of a specific protein, medication history or clinical features such as their eosinophil level, can also be used, independently from or in combination with the genotype data

A cascade mapping process was used to map all the critical SNPs identified in the validated disease signatures to the human reference genome (GRCh38) [23]. SNPs identified in the coding region of a gene (or genes) were mapped directly to this gene and any remaining SNPs within 2kb upstream or 0.5kb downstream were mapped to the nearest gene(s). Due to the uncertainty about the wide range of cells and tissues that have been implicated in long COVID etiology [7], genes assigned by either expression quantitative trait loci (eQTLs) or chromatin interaction (Hi-C) data were not specifically prioritized for further analysis (as they would likely be in other indications) to avoid capturing any spurious associations from non-trait-related tissues or cells. Genes that could additionally be mapped using only eQTL or Hi-C data from the critical SNPs were observed and reported in Additional file 2, although these were not further evaluated.
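The first two tiers of the cascade (direct overlap, then a 2 kb upstream / 0.5 kb downstream window) might be sketched as below. Several simplifications are assumptions of this sketch rather than details from the text: the whole gene span stands in for the coding region, upstream and downstream are interpreted relative to strand, and the `Gene` class, `map_snp` function and gene names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def map_snp(chrom, pos, genes, upstream=2000, downstream=500):
    """Return the gene names a SNP maps to, or an empty list if none qualify."""
    same_chrom = [g for g in genes if g.chrom == chrom]

    # Tier 1: SNP lies inside a gene (gene span used as a stand-in for coding region).
    inside = [g.name for g in same_chrom if g.start <= pos <= g.end]
    if inside:
        return inside

    # Tier 2: SNP lies in the flanking window; keep the nearest gene(s).
    nearby = []
    for g in same_chrom:
        if g.strand == "+":
            low, high = g.start - upstream, g.end + downstream
        else:  # on the minus strand, upstream is at higher coordinates
            low, high = g.start - downstream, g.end + upstream
        if low <= pos <= high:
            distance = g.start - pos if pos < g.start else pos - g.end
            nearby.append((distance, g.name))
    if not nearby:
        return []
    nearest = min(distance for distance, _ in nearby)
    return [name for distance, name in nearby if distance == nearest]


# Hypothetical gene models for illustration.
toy_genes = [
    Gene("GENE_A", "1", 10_000, 20_000, "+"),
    Gene("GENE_B", "1", 30_000, 40_000, "-"),
]
print(map_snp("1", 8_500, toy_genes))
```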

Finally, a semantic knowledge graph, including data from over 50 public data sources (see Additional file 5: Table S3), was used to annotate the SNPs and genes, including data on prior genetic associations to disease, chromosomal location, tissue expression profiles, splice variants, mouse phenotypes, protein function/structure, known active chemistry and any pre-existing scientific literature or clinical trials among other attributes. This allows us to generate evidence-backed mechanism of action hypotheses as to each genetic variant’s potential impact on a patient’s long COVID phenotype.

Ancestry analysis

Ancestry inference for the samples in the GOLD study was performed using GRAF-pop [24]. To maximize the number of samples included in each case–control dataset, samples of all ancestries were included in the analysis. Since ancestry-specific analyses could not be performed due to limited samples in each cohort, we performed a logistic regression analysis to control for confounding effects of population structure. Any disease signatures that were no longer significantly associated with case–control status (p < 0.05 with Bonferroni correction) in a logistic regression that also includes a binary ancestry variable for white-European/other ancestry were considered false positives and removed from further analysis.

Assessing causality with expanded genotypes analysis

The disease signatures output by the PrecisionLife platform represent combinations of SNP genotypes that are significantly enriched in cases relative to controls. Expanded genotypes analysis (“EGA”) tests how the genotype of a critical SNP from the disease signature affects the odds of disease when the genotypes of all interacting SNPs are held constant.

For each disease signature, we first assign patients to one of the possible combinations of the component SNP genotypes (the “expanded genotype signatures”). In the example illustrated in Fig. 4, the validated disease signature is comprised of two SNPs, each in one of 3 states (0, 1 and 2), which can generate 9 (3²) expanded genotype signatures. For combinations of 3, 4, and 5 SNPs, the number of expanded genotype signatures is 27, 81, and 243 respectively. We then calculate the disease odds for patients with each expanded genotype signature.
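The enumeration of expanded genotype signatures and the per-signature disease odds can be sketched as follows. The function names and toy patients are hypothetical, and disease odds are taken here as cases divided by controls within each signature, which is an assumption about the exact definition used.

```python
from collections import Counter
from itertools import product


def expanded_signatures(n_snps):
    """All 3**n_snps genotype combinations for n SNPs, each coded 0, 1 or 2."""
    return list(product((0, 1, 2), repeat=n_snps))


def disease_odds(patients):
    """Cases / controls for each observed signature.

    patients: iterable of (genotype_tuple, is_case) pairs.
    Signatures with no controls are omitted because the odds are undefined.
    """
    cases, controls = Counter(), Counter()
    for signature, is_case in patients:
        if is_case:
            cases[tuple(signature)] += 1
        else:
            controls[tuple(signature)] += 1
    return {
        signature: cases[signature] / controls[signature]
        for signature in set(cases) | set(controls)
        if controls[signature] > 0
    }


for k in (2, 3, 4, 5):
    print(k, len(expanded_signatures(k)))

# Toy patients: ((critical SNP genotype, interacting SNP genotype), is_case).
toy_patients = [((0, 1), True), ((0, 1), False), ((0, 1), True), ((2, 2), False)]
print(disease_odds(toy_patients))
```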

Fig. 4

Hypothetical example of an expanded genotypes analysis for a disease signature comprised of two SNPs. After controlling for the confounding effects of the interacting SNP genotype, patients with one or two copies of the critical SNP minor allele (genotypes “1” and “2”) have consistently elevated odds of disease relative to patients with the wild type genotype (“0”) at the critical SNP

For a given critical SNP of interest, we identify sets of expanded genotype signatures that share the same genotypes for all interacting SNPs (the blocks separated by the horizontal lines in Fig. 4). We calculate the “EGA odds ratio” by dividing the disease odds ratio for an expanded genotype signature with a copy of the critical SNP minor allele by the disease odds ratio for the matching expanded genotype signature with the critical SNP homozygous wild type genotype.
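A minimal sketch of this comparison is given below, assuming a dictionary that maps each expanded genotype signature to its per-signature odds measure. The names `ega_odds_ratios` and `direction_summary`, and the toy values, are hypothetical; the direction summary simply counts how many ratios fall above or below 1 and is not the study's categorization scheme.

```python
def ega_odds_ratios(signature_odds, critical_index):
    """Ratio of each minor-allele signature to its matching wild-type signature.

    signature_odds: {genotype_tuple: odds measure for that signature}
    critical_index: position of the critical SNP within each tuple
    """
    ratios = {}
    for signature, value in signature_odds.items():
        if signature[critical_index] == 0:
            continue  # wild-type signatures serve as the baseline
        baseline_signature = (
            signature[:critical_index] + (0,) + signature[critical_index + 1:]
        )
        baseline = signature_odds.get(baseline_signature)
        if baseline:  # skip missing or zero baselines
            ratios[signature] = value / baseline
    return ratios


def direction_summary(ratios):
    """'risk' if all ratios > 1, 'protective' if all < 1, otherwise 'mixed'."""
    above = sum(r > 1 for r in ratios.values())
    below = sum(r < 1 for r in ratios.values())
    if above and not below:
        return "risk"
    if below and not above:
        return "protective"
    return "mixed"


# Toy values: index 0 is the critical SNP, index 1 the interacting SNP.
toy_odds = {
    (0, 0): 0.2, (1, 0): 0.4, (2, 0): 0.5,
    (0, 1): 0.5, (1, 1): 1.0, (2, 1): 1.5,
}
toy_ratios = ega_odds_ratios(toy_odds, critical_index=0)
print(toy_ratios, direction_summary(toy_ratios))
```

In this toy example the minor allele raises the odds within both blocks of the interacting SNP, even though the absolute odds in the first block remain low.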

Due to the small number of patients associated with individual expanded genotype signatures, we may have insufficient statistical power to directly test whether the EGA odds ratios are significantly different from one. Instead, the primary aim of the EGA is to test whether the observed directionality of the relationship between the critical SNP minor allele and disease phenotype is consistent across all or most expanded genotype signatures. If the critical SNP genotype does not affect disease, then we expect the minor allele genotype will be randomly associated with increased odds of disease for some expanded genotype signatures and decreased odds of disease for others, with no consistent biological pattern.

In the hypothetical example shown in Fig. 4, the EGA reveals that the critical SNP minor allele is consistently associated with elevated disease risk after controlling for the genotype of the interacting SNP. This pattern holds even though patients with the critical SNP minor allele have below average odds of disease when they also possess the wild type genotype at the interacting SNP. By controlling for the confounding effects of the interacting SNP, EGA allows us to gain a better understanding of the relationship between the critical SNP and disease.

Each disease signature was assigned to one of the following seven categories based on the broad patterns observed from the EGA: universally causative, universally protective, SNP-specific causative, SNP-specific protective, combination-specific causative, combination-specific protective, or ambiguous. Definitions of each category are provided in Additional file 5: Table S5. Across these categories, the designations “Causative” and “Protective” do not necessarily guarantee that the specific critical SNP identified in the analysis directly affects disease risk. Due to low SNP coverage, the critical SNP could potentially be a neutral marker that is in strong linkage disequilibrium with the true biological variant.

We excluded all expanded genotype signatures which occurred in fewer than 15 patients from the EGA. Likewise, we did not consider disease signatures comprised of 4 or 5 SNPs due to the limited statistical power provided by the size of the available datasets. There are 81 possible expanded genotype signatures for a combination of 4 SNPs, which corresponds to only 17 patients per expanded genotype signature. More problematically, there are 243 possible expanded genotype signatures for a combination of 5 SNPs, which corresponds to fewer than 6 patients per expanded genotype signature. The stochastic noise associated with such small sample sizes makes it very difficult to identify broad patterns across the full set of expanded genotype pairs.

Phenotype enrichment analysis

The available clinical data from the questionnaire were used to evaluate the long COVID patient profiles associated with each of the disease signatures generated by the analysis. We calculated the statistical significance of the association of a particular phenotype with a set of long COVID cases with shared genetic variants when compared against the rest of the case population. The two-proportion Z-test was used for categorical variables, such as severity of acute COVID-19 and comorbidities, and the Mann–Whitney U test [25] for any continuous variables, such as participant reported scores that reflect change in symptoms pre- and post-COVID-19. Statistical associations were corrected for multiple testing using the Benjamini–Hochberg method.
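A standard pooled two-proportion Z-test, as would be applied to a categorical phenotype, can be sketched with the standard library alone. The function name and the example counts are hypothetical, and the pooled-variance, two-sided form is an assumption about the variant used.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided Z-test for a difference between two proportions.

    x1 / n1: count with the phenotype among cases carrying the signature
    x2 / n2: count with the phenotype among the remaining cases
    Returns (z, p_value).
    """
    pooled = (x1 + x2) / (n1 + n2)
    variance = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    if variance == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (x1 / n1 - x2 / n2) / math.sqrt(variance)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative counts: 60 of 100 signature carriers vs 40 of 100 other cases.
z_stat, p_val = two_proportion_z_test(60, 100, 40, 100)
print(round(z_stat, 3), round(p_val, 5))
```

The resulting p-values across all phenotype and signature pairs would then be passed through a multiple-testing correction such as Benjamini–Hochberg.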

Overlap analysis (“seeded” approach)

We evaluated the genetic overlap between the Severe and Fatigue Dominant cohorts by taking the SNPs identified in the hypothesis-free analysis for one dataset (seed SNPs) and testing whether any combinations involving them are also significantly associated with disease risk in the second dataset when analyzed by the PrecisionLife platform (see section “Combinatorial analytics using the PrecisionLife platform”).

This hypothesis-driven or ‘seeded’ approach was performed in addition to a direct gene overlap analysis between the two cohorts. This approach mitigates the effects of stochastic differences in dataset composition when defining the combinatorial search space explored in our analyses. The number of possible SNP-genotype combinations is so extensive that it is impossible to sample the entirety of the space. This implies that true associations may remain unreported because they were not tested when the dataset was analyzed using the hypothesis-free approach.

We also employed this technique when evaluating the overlap between the genes identified in our analysis of the UK Biobank ME/CFS population and the two long COVID cohorts generated from the GOLD study. Due to the low SNP overlap (n = 42,500) between the arrays used to genotype the ME/CFS and long COVID datasets, we performed a seeded analysis using 383 SNPs in the Severe and Fatigue Dominant GOLD datasets that were within 10kb upstream or downstream of the original 14 ME/CFS genes.
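Selecting seed SNPs by proximity to a gene list amounts to a simple window query. The sketch below assumes SNP positions and gene spans on the same genome build; the function name, identifiers and coordinates are hypothetical.

```python
def select_seed_snps(snps, genes, window=10_000):
    """Return {snp_id: [gene names]} for SNPs within `window` bp of any gene.

    snps:  {snp_id: (chrom, position)}
    genes: {gene_name: (chrom, start, end)}
    """
    selected = {}
    for snp_id, (chrom, pos) in snps.items():
        hits = [
            name
            for name, (gene_chrom, start, end) in genes.items()
            if gene_chrom == chrom and start - window <= pos <= end + window
        ]
        if hits:
            selected[snp_id] = hits
    return selected


# Hypothetical coordinates for illustration only.
toy_gene_spans = {"GENE_X": ("3", 100_000, 150_000)}
toy_snps = {
    "snp_1": ("3", 95_000),   # 5 kb upstream: selected
    "snp_2": ("3", 161_000),  # 11 kb downstream: not selected
    "snp_3": ("4", 120_000),  # different chromosome: not selected
}
seeds = select_seed_snps(toy_snps, toy_gene_spans)
print(seeds)
```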

Cross disease analysis

Cross disease analysis can provide insights into potential drug repurposing opportunities or development of common therapies. We compared the genes that were significantly associated with Severe and Fatigue Dominant long COVID against a variety of other chronic diseases to identify shared pathophysiological mechanisms. These diseases included neurodegenerative, mental and behavioural disorders, cardiovascular, gastrointestinal, autoimmune and metabolic diseases (see Additional file 5: Tables S8, S9). Disease-associated genes identified for each indication group are those with known genetic links reported in OpenTargets [26] (v 23.02, February 2023 release). Only genes with strong target-disease genetic association scores (> 0.9 out of 1.0) have been used in this analysis for each indication group.

Enrichment analysis was performed using the g:Profiler tool [27] to determine pathways and biological processes that are significantly associated with the disease-associated genes for each indication group (p < 0.05, p-value correction for multiple testing using Benjamini-Hochberg). This allows us to explore up/downstream of individual gene targets to identify biological processes that are impacted across diseases.


GWAS analysis

We evaluated the significance of individual genetic variants associated with the two long COVID datasets (Severe and Fatigue Dominant) using a standard GWAS analysis with PLINK [28]. As can be observed from the two Manhattan plots (Additional file 5: Figure S5), no SNP from either of the two cohorts reached the genome-wide significance threshold (p < 5 × 10⁻⁸).

Cohort analysis

To determine whether there was a correlation between the circulating SARS-CoV-2 variant and long COVID symptom severity, we plotted the ‘Total Change’ score for all study participants, including cases and controls, against the month they first contracted COVID-19 (Fig. 5). As defined in our Severe long COVID cohort, the greater the ‘Total Change’ score, the greater the degree of severity in long COVID symptoms experienced by the participant.

Fig. 5

Variation of the long COVID symptom-based ‘Total Change’ scores with COVID-19 diagnosis for 1829 individuals in the Sano Genetics’ Long COVID GOLD study (2020–2022)

This analysis shows a significant decrease in symptom severity over time, although the correlation coefficient is low, potentially due to data variability (Additional file 5: Figure S6).

Hypothesis free combinatorial analysis

Using the PrecisionLife combinatorial analysis platform, we identified 86 disease associated critical SNPs for the Severe cohort and 84 for the Fatigue Dominant cohort, mapping to 43 and 36 genes respectively (Table 2). A total of 74 unique genes were associated with at least one of the long COVID cohorts, including 5 genes which were identified in both the Severe and Fatigue Dominant cohorts.

Table 2 Summary of PrecisionLife combinatorial analysis results on Severe and Fatigue Dominant long COVID cohorts generated from the GOLD study

The disease signatures associated with each cohort were all combinations of 2 or more SNP genotypes, i.e., they were all combinatorial signals, predominantly involving combinations of 3–5 SNPs, that could not have been identified using GWAS (Fig. 6). An example of one of the disease signatures identified in the analysis of the Severe long COVID cohort is shown in Table 3. None of the SNPs identified in disease signatures were observed to be in linkage disequilibrium (LD) with each other.

Fig. 6

Distribution of combinatorial order (i.e., number of component SNPs) for the validated combinatorial disease signatures from the Severe and Fatigue Dominant long COVID cohorts

Table 3 Example of one of the combinatorial disease signatures identified by the PrecisionLife combinatorial analysis of the Severe long COVID cohort

All cases included in the analysis possessed at least one of the disease signatures found to be significant in the hypothesis-free study of their cohort. The complete list of genetic variants and their mapped genes identified from this study is given in Additional file 2, and the disease architectures are shown in Fig. 7.

Fig. 7

Disease architecture diagrams representing (a) the Severe and (b) Fatigue Dominant long COVID patient populations generated by the PrecisionLife platform. Each circle represents a disease-associated SNP genotype, and edges represent their co-association in patients in disease signature(s). The critical SNP genotypes identified in each case population are highlighted in dark green

Upon further evaluation, 118 (10%) disease signatures identified in the Severe cohort and 120 (8.4%) signatures in the Fatigue Dominant cohort consisted of SNPs that could be mapped to genes with shared biological functions or pathways (see Additional file 3).

As there was a limited number of cases and controls of non-European ancestry (see Additional file 5: Table S6) in each of the two datasets, we evaluated the output to identify any disease signatures that may be confounded by population structure effects rather than reflecting a true disease signal.

All disease signatures in the Severe cohort passed the ancestry confounder analysis. We identified 129 (9%) disease signatures in the Fatigue Dominant cohort that did not pass the ancestry confounder check (Additional file 5: Table S7). However, when we removed the SNPs and mapped genes represented only by these potentially confounded disease signatures (and not also by one or more additional true disease signatures), only one gene (AC005005.1), associated with the Fatigue Dominant cohort and linked to the critical SNP rs4820946, was eliminated from the final disease-associated gene lists. This reduced the number of genes found from 74 to 73.
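The filtering rule described here (drop a gene only if every signature supporting it failed the ancestry check) can be sketched as a set difference. This is an illustrative sketch under assumed data structures; the signature and gene identifiers are hypothetical and the actual confounder test is not reproduced.

```python
from typing import Dict, Set


def genes_to_eliminate(signature_genes: Dict[str, Set[str]],
                       confounded: Set[str]) -> Set[str]:
    """Genes supported only by signatures that failed the ancestry check."""
    in_confounded: Set[str] = set()
    in_passing: Set[str] = set()
    for sig_id, genes in signature_genes.items():
        if sig_id in confounded:
            in_confounded |= genes
        else:
            in_passing |= genes
    # A gene is kept if at least one passing signature also supports it.
    return in_confounded - in_passing


# Hypothetical mapping of signatures to their mapped genes.
signature_genes = {
    "sig1": {"GENE_A", "GENE_B"},
    "sig2": {"GENE_B", "GENE_C"},
    "sig3": {"GENE_C", "GENE_D"},
}
confounded = {"sig1"}  # signatures that did not pass the ancestry check

print(genes_to_eliminate(signature_genes, confounded))
```

Here GENE_B survives because a passing signature also contains it, which mirrors why 129 flagged signatures led to the removal of only a single gene.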

The cohort analysis indicates that fewer than 15% of cases that were assigned to either one or both long COVID case groups were hospitalized with severe COVID-19 or reported co-associated chronic diseases such as diabetes, cardiovascular disease or cognitive impairment. This meant that the number of cases with these phenotypes was too low to identify any associations, such as COVID-19 severity or a particular comorbidity, with genetic disease signatures.

Enrichment analysis of the fatigue, respiratory and mental health symptom-based scores for the Severe long COVID patients was used to investigate the clinical characteristics of the disease signatures identified in the Severe cohort study. Unfortunately, the population sizes were too small to reach statistical significance (p < 0.05) after multiple-testing correction (see Additional file 4).

From the two independent hypothesis-free analyses of the datasets, we identified SNP genotypes mapping to 5 genes that were found to be significantly associated with disease in both the Severe and Fatigue Dominant long COVID cohorts. For each gene, more than 70% of cases from both cohorts possessed at least one disease signature containing an associated SNP (Table 4). These genes have a range of different functions and potential mechanism of action hypotheses as to their role in the development of long COVID.

Table 4 List of genes significantly associated with long COVID in both the Severe and Fatigue Dominant cohorts

Seeded analysis to test overlap between long COVID cohorts

The two independent analyses of the Fatigue Dominant and Severe cohorts indicated that 5 genes were strongly associated with long COVID in both cohorts. We performed two seeded analyses to understand if any additional genes identified in either the Fatigue or Severe cohorts were also significant in the other population.

This approach revealed that 28/43 genes identified in the Severe cohort were also significantly associated with disease in the Fatigue Dominant cohort, and 25/35 genes from the original Fatigue Dominant analysis were also associated in the Severe cohort. This left 15 genes unique to the Severe cohort and 10 genes unique to the Fatigue Dominant cohort.
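The gene counts in this section can be cross-checked with simple arithmetic. The sketch below assumes that the 43-gene and 35-gene lists from the two independent analyses overlap only in the 5 genes found in both cohorts, which is consistent with the total of 73 unique genes reported for the study; the variable names are illustrative.

```python
# Counts reported in the text.
severe_total, fatigue_total = 43, 35
severe_also_in_fatigue = 28     # Severe genes significant in the Fatigue Dominant cohort (seeded)
fatigue_also_in_severe = 25     # Fatigue Dominant genes significant in the Severe cohort (seeded)
shared_independent = 5          # genes found by both independent analyses

severe_unique = severe_total - severe_also_in_fatigue
fatigue_unique = fatigue_total - fatigue_also_in_severe

# Assumption: the two original lists overlap only in the 5 shared genes.
distinct_genes = severe_total + fatigue_total - shared_independent

print(severe_unique, fatigue_unique, distinct_genes)  # 15 10 73
```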

The unique genes, the percentage of total cases they were associated with, and their biological functions are summarized in Tables 5 and 6.

Table 5 List of genes that were uniquely associated with the Severe case cohort
Table 6 List of genes that were uniquely associated with the Fatigue Dominant case cohort

A comparative pathway enrichment analysis using the g:Profiler tool revealed that there were significant differences in the biological pathways associated with the lists of unique genes from the Severe and Fatigue Dominant cohorts (Fig. 8). Genes that were uniquely associated with the Severe long COVID cohort were more likely to be found in immune pathways such as myeloid differentiation, macrophage foam cells and lipid signaling pathways. Genes that were uniquely associated with the Fatigue Dominant cohort were linked to metabolic pathways such as JNK/MAPK signaling cascades.

Fig. 8

Pathway enrichment plot for disease-associated genes found in the Severe and Fatigue Dominant long COVID cohorts. GeneRatio represents the ratio of genes found in the pathway compared to the genes associated with a cohort and p.adjust represents the p-value adjusted for multiple testing. The dots in the plot are colour-coded based on their corresponding p.adjust values
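For readers unfamiliar with the quantities in this plot, the following is a generic sketch of an over-representation test and a multiple-testing adjustment. It assumes a hypergeometric model and the Benjamini-Hochberg procedure; the g:Profiler tool applies its own statistics and correction, which may differ, and the pathway and background sizes below are hypothetical.

```python
from math import comb
from typing import List


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): n query genes drawn from N background genes, K of them in the pathway."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: 3 of 15 cohort-unique genes fall in a 120-gene pathway,
# against an assumed background of 20,000 genes.
k, n, K, N = 3, 15, 120, 20000
gene_ratio = k / n
print(gene_ratio, enrichment_p(k, n, K, N))
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

GeneRatio here corresponds to k / n, and the adjusted values play the role of p.adjust in the plot.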

Comparison of long COVID with ME/CFS

We also used the seeded analysis approach to test for overlap between disease signatures associated with long COVID and those associated with ME/CFS in our previous study [16].

Taking the list of SNPs within genes that were identified to be significant within the UK Biobank ME/CFS population, we found that 24 SNPs were also associated with long COVID in the Severe cohort. Of these 24 SNPs, 9 were critical (RF scored) within the Severe long COVID population, mapping to 5 genes (Table 7).

Table 7 List of critical SNPs significantly associated with long COVID in the Severe and Fatigue Dominant long COVID cohorts that can be linked to genes identified in a combinatorial analysis of UK Biobank ME/CFS patients

In the Fatigue Dominant cohort, 27 SNPs were associated with long COVID, of which 12 were shared with the Severe cohort (Additional file 5: Table S4). Seven of these 27 SNPs were critical (RF scored) SNPs within the Fatigue Dominant long COVID cases, mapping to 5 genes previously found in the ME/CFS study (Table 7).

Comparison of long COVID genes identified with acute COVID-19 studies

Whilst few GWAS-significant variants have so far been identified in long COVID [59], we compared the 73 unique genes identified in our long COVID studies against the literature for any evidence of involvement in severe COVID-19 and/or long COVID. Based on a review of available publications in PubMed and other data sources such as OpenTargets, at least 9 of these genes have prior associations with acute COVID-19, such as differential expression and genetic susceptibility findings (Table 8).

Table 8 Known associations of genes identified in either one or both of the cohorts of long COVID patients with acute COVID-19

We also compared our results against the blood derived gene expression signatures associated with post-acute sequelae identified by Thompson et al. [64]. There are several key differences between the studies—Thompson et al. recruited individuals hospitalized with severe acute COVID-19 infection, whereas the majority of individuals in our study experienced milder forms of the disease (Table 1). We are also drawing comparisons from a transcriptomic study derived from whole blood against a combinatorial study of germline genetic variants.

Nonetheless, we found that 14 of the 73 genes (Severe = 7 and Fatigue Dominant = 7) identified in our analyses were also differentially expressed at the transcriptomic level in patients experiencing long COVID (Additional file 5: Table S10).

Overlap between long COVID and other diseases

We identified genes with known genetic associations across a wide range of complex diseases including neurodegenerative, mental or behavioral, cardiovascular, gastrointestinal, autoimmune and metabolic diseases (see Additional file 5: Tables S8 and S9). We evaluated the degree of overlap at a biological process level (using mapping of genes to biological processes in Gene Ontology [65, 66]) to identify the common pathophysiological mechanisms that are shared between those diseases and long COVID.

27 biological processes are significantly enriched in the 73 long COVID genes identified in this analysis, of which 19 processes are also significantly enriched in at least one other indication group (Additional file 5: Table S13). Based on these 19 pathways, long COVID genes shared the greatest number of biological processes (> 50%) with cardiovascular disease and mental or behavioral disease followed by gastrointestinal disease, neurodegenerative disease, autoimmune disease and metabolic disease (Fig. 9, Additional file 5: Table S13).

Fig. 9

Heatmap plot showing 19 biological processes (Gene Ontology biological process terms) shared between 73 long COVID genes identified in the GOLD cohort and genes with genetic evidence in one or more indication groups (neurodegenerative, mental or behavioral, cardiovascular, gastrointestinal, autoimmune and metabolic disorders). For each indication group, only the significantly enriched biological processes (p < 0.05) are shown in blue and the intensity of the color is based on the p values of the Gene Ontology term in each indication group

Expanded genotypes analysis to detect causal features

We conducted expanded genotypes analysis (EGA) for all Severe cohort RF scored genes (see Tables 3 and 4) found in disease signatures with 2 or 3 SNP genotypes. These comprise 5 genes corresponding to 23 disease signatures, including a disease signature that contains two RF scored genes (see Table 8).

We found that the critical SNP is universally protective across at least 2 validated disease signatures for 3 of the 5 RF scored genes (ADIPOQ, NOL4, and PDE6C). That is, when we control for the genotypes at the interacting SNPs, expanded genotype signatures featuring at least one copy of the critical SNP minor allele are consistently associated with lower odds of severe long COVID relative to expanded genotype signatures with the homozygous wild type genotype for the critical SNP. In all but one of the remaining disease signatures for these genes, the critical SNP minor allele is most often associated with decreased odds of severe long COVID, with narrow exceptions: i.e., when it fails to co-occur with the minor allele of an interacting SNP (“SNP-specific protective effect”) or when it co-occurs with a specific set of genotypes at multiple interacting SNPs (“combination-specific causative effect”).

The critical SNP minor alleles for these three genes are typically associated with decreased risk of severe long COVID, which either implies that they represent broadly protective variants or causative variants that are in LD with the wild type allele at the genotyped SNP. This relationship only becomes apparent, however, when we control for the confounding effects of other causative and/or protective variants. Only one validated disease signature for these three genes fails to exhibit a consistent biological association between the critical SNP minor allele and disease, indicating a potential false positive.
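The logic of controlling for interacting genotypes can be illustrated with a stratified comparison. This is a simplified sketch and not the authors' expanded genotypes analysis: the strata, counts and category labels are hypothetical, and zero cells are not handled.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int]                  # (cases, controls)
Strata = Dict[str, Dict[str, Counts]]     # stratum -> counts by critical SNP status


def stratum_odds_ratio(carrier: Counts, wild_type: Counts) -> float:
    """Odds ratio of minor-allele carriers vs. homozygous wild type in one stratum."""
    (a, b), (c, d) = carrier, wild_type
    return (a * d) / (b * c)


def classify(strata: Strata) -> str:
    """Label the critical SNP by the direction of its effect across all strata."""
    ratios = [stratum_odds_ratio(s["carrier"], s["wild_type"]) for s in strata.values()]
    if all(r < 1 for r in ratios):
        return "universally protective"
    if all(r > 1 for r in ratios):
        return "consistently risk-increasing"
    return "context dependent"


# Hypothetical counts, stratified by the genotype at one interacting SNP.
strata = {
    "interacting SNP wild type": {"carrier": (10, 40), "wild_type": (30, 60)},
    "interacting SNP minor allele": {"carrier": (5, 25), "wild_type": (20, 40)},
}
print(classify(strata))
```

With these made-up counts the carrier odds ratio is below 1 in both strata (0.5 and 0.4), so the critical SNP would be labeled universally protective.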

In contrast, the gene SNX9 is consistently associated with more complex interactions that highlight the combinatorial dynamics of disease. For example, we identified a disease signature comprising three SNPs that is associated with strongly elevated odds of long COVID. This disease signature includes:

  • critical SNP rs2025994 located approximately 40 kb upstream of the SNX9 coding region

  • interacting SNP rs6777173 located 12 kb upstream of KLF15

  • interacting SNP rs11072524 located in an intron of RYR3

We also found that the SNX9 minor allele offers significant protection against the risk of long COVID among patients who possess a copy of the minor allele at either interacting SNP (i.e., a SNP-specific protective effect). That is, patients with the SNX9 heterozygous or homozygous minor allele genotype consistently have lower odds of developing severe long COVID than patients with the SNX9 homozygous wild type genotype, after controlling for the confounding effects of the genotypes at the two interacting SNPs (see Table 9). Due to the small sample sizes associated with many expanded genotype signatures, these individual comparisons are not statistically significant. However, if we pool all patients in this cohort, then patients with a copy of the SNX9 minor allele have significantly lower odds of disease than patients who are homozygous for the SNX9 wild type allele (odds ratio = 0.52, 41 cases/134 controls vs. 316 cases/532 controls, Fisher’s Exact Test p = 0.00047; note that these totals include patients with rare expanded genotype signatures not shown in Table 9).

Table 9 Expanded Genotypes Analysis results for 5 RF-scored genes identified in the Severe cohort linked to disease signatures of 2 or 3 SNPs (one disease signature contains SNPs associated with two genes)

A different pattern arises among patients who are homozygous for the wild type genotype at both interacting SNPs (Table 10). Here, patients with a copy of the SNX9 minor allele have higher odds of disease than patients who are homozygous for the SNX9 wild type allele (odds ratio = 1.86, 19 cases/22 controls vs. 74 cases/160 controls), although the odds ratio is not statistically significant (Fisher’s Exact Test p = 0.075).
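The two comparisons reported above can be recomputed from their case and control counts. The sketch below uses the sample odds ratio and a plain two-sided Fisher's exact test (summing the probabilities of all tables no more likely than the observed one). It assumes this is comparable to the test the authors ran, so the last digit may differ from the reported values.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Rows: SNX9 minor-allele carriers, SNX9 homozygous wild type.
# Columns: cases, controls (counts taken from the text).
pooled = (41, 134, 316, 532)
both_wild_type = (19, 22, 74, 160)

for label, table in (("pooled", pooled),
                     ("wild type at both interacting SNPs", both_wild_type)):
    print(label, round(odds_ratio(*table), 3), fisher_exact_two_sided(*table))
```

The opposite directions of the two odds ratios (below 1 when pooled, above 1 in the double wild type stratum) are what the text describes as a context-dependent effect.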

Together these results suggest that the SNX9 genotype is a significant contributor to the risk of severe long COVID infection, but that the gene-disease relationship is context dependent and mediated by interactions with KLF15 and RYR3. Similar non-linear interactions are represented by three additional disease signatures comprising the same SNX9 critical SNP and different interacting SNPs. Monogenic approaches such as GWAS that do not consider these gene–gene interactions can fail to detect potentially important drivers of disease.

Finally, the expanded genotypes analysis did not provide any additional insight into the relationship between DLC1 and disease. This could indicate that the biological relationship between DLC1 and disease is highly complex or that the result is a false positive. However, the disease signatures associated with strongly elevated odds of severe long COVID all contain the rare homozygous minor allele genotype for the DLC1 critical SNP. Due to small sample sizes, we were unable to analyze other expanded genotype signatures containing the potentially causative genotype. Thus, the ambiguous results may reflect the fact that the relationship between the DLC1 minor allele and long COVID does not carry over into heterozygous patients.

Table 10 Assessing the effects of the SNX9 rs2025994 genotype on severe long COVID when controlling for the genotypes of the interacting SNPs rs6777173 (KLF15) and rs11072524 (RYR3)

Evaluation of potential novel drug targets and repurposing opportunities

We evaluated the genes identified in the study to find potential novel drug targets and their associated mechanistic patient stratification biomarkers (the disease signatures that connect patient subgroups with the mechanistic etiology for their disease). As described in our previous ME/CFS paper, the use of combinatorial analytics to identify novel targets has been validated in other diseases such as ALS, where these novel targets have demonstrated disease modifying activity in in vitro models [67].

Of the 73 unique genes found across the two cohorts, 42 are potentially tractable targets for drug development strategies based on annotations from OpenTargets (defined by a score of greater than 0 across at least one metric for tractability), see Additional file 5: Table S11. This includes 26 targets that are suited to an antibody approach and 18 that are amenable to modulation by small molecules.
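The tractability criterion stated here (a score greater than 0 on at least one metric) amounts to a simple filter. The sketch below uses a hypothetical in-memory score table with made-up gene names and metric labels; it does not call the OpenTargets API or reproduce its schema.

```python
from typing import Dict, List


def tractable_targets(scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Genes with a score above 0 on at least one tractability metric."""
    return sorted(gene for gene, metrics in scores.items()
                  if any(value > 0 for value in metrics.values()))


# Hypothetical tractability scores per modality.
scores = {
    "GENE_A": {"antibody": 1.0, "small_molecule": 0.0},
    "GENE_B": {"antibody": 0.0, "small_molecule": 0.0},
    "GENE_C": {"antibody": 0.5, "small_molecule": 0.7},
}
print(tractable_targets(scores))  # ['GENE_A', 'GENE_C']
```

Because a gene can score on more than one modality, the per-modality counts need not sum to the total: 26 antibody and 18 small-molecule targets among 42 tractable genes imply that at least 2 genes are suited to both approaches.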

Most (> 90%) of the genes are expressed in a wide range of tissues (Additional file 5: Figure S7) although the expression profile of the genes in specific cell types is variable (Additional file 5: Figure S8). Approximately 44% (n = 30) of the genes are expressed in inhibitory neurons followed by 41% in excitatory neurons (n = 28) and 40% in oligodendrocyte precursor cells (n = 27).

Using a systematic repositioning approach [68], we identified 13 long COVID targets that already have drugs in clinical development. As these drugs or development candidates may require fewer preclinical studies and already have a known safety profile, they could represent a quicker and de-risked strategy for developing potential new treatments. We are exploring the repurposing potential of these compounds for the treatment of long COVID and ME/CFS (where appropriate).

For example, this analysis identified TLR4 as an attractive repurposing candidate. Our analysis indicates that 52% of cases included in the Severe long COVID cohort had at least one disease signature containing a variant in TLR4, and there is additional supporting evidence that inhibition of TLR4 in a mouse model prevents long-term cognitive pathology, such as synapse elimination and memory deficits, caused by the SARS-CoV-2 Spike protein [47]. Clinical studies have already shown that antagonizing TLR4 signaling dampens the pathological cytokine storm observed in patients with severe acute COVID-19 and reduces mortality rates in hospitalized COVID-19 patients [69, 70]. Our analysis suggests that antagonism of TLR4 may also have therapeutic effects in the long-term pathology caused by SARS-CoV-2.

We performed a search of the GlobalData [71] database to further understand the number and stage of development of TLR4 antagonists that are in clinical pipelines. This revealed a total of 88 unique drugs that target TLR4 (either singularly or as part of a combination therapy), including 8 in development for acute COVID-19, the most advanced of which (Paridiprubart, Edesa Biotech Inc) is currently being evaluated in a Phase 3 study in hospitalized COVID-19 patients with Acute Respiratory Distress Syndrome (ARDS) [72].


As an approach to identify the drivers of the complex disease biology of long COVID, combinatorial analytics yields more useful signal than GWAS. No SNPs reached the genome-wide significance threshold in either the Severe or Fatigue Dominant cohorts. This underlines the difficulties involved in using monogenic analysis approaches to understanding disease associated genetic variants and mechanistic etiologies in heterogeneous and polygenic diseases, especially with small datasets.

Using combinatorial analytics, we identified 73 unique genes in a long COVID population and highlighted the relevance of subsets of these genes to the different sub-cohorts of the disease population. At least 9 of the genes identified in this study have been linked to acute COVID-19, and despite key differences in the study designs, we also observe that 14 of the 73 genes were differentially expressed in a transcriptomic analysis of long COVID patients. We can form strong mechanism of action hypotheses for each gene’s role in the development of long COVID.

Splitting the population into two long COVID subtypes, Severe and Fatigue Dominant, allowed us to explore the genetic and biological differences underpinning different clinical manifestations. The comparative pathway enrichment analysis identified differences in pathways between the genes uniquely associated with the Severe long COVID group and those uniquely associated with the Fatigue Dominant phenotype (Fig. 8). The greater number of genes involved in immune response in the Severe long COVID cohort may also indicate a more severe form of the acute infection. This may arise as a result of patients experiencing higher viral loads than average, as we identified 4 genes that have been functionally linked to SARS-CoV-2 host response and/or acute severe COVID-19 (Table 5).

The pathway enrichment analysis also highlighted an overrepresentation of genes involved in macrophage foam cell differentiation. The formation of foam cells leading to a profibrotic macrophage phenotype is critical in the development of atherosclerosis [73]. However, there is also evidence that profibrotic pulmonary macrophages contribute to acute respiratory distress syndrome (ARDS) and lung injury associated with patients with severe COVID-19 [74].

The genes that were associated only with the Fatigue Dominant long COVID cohort are enriched in MAPK and JNK signaling cascades as well as other metabolic processes involved in mitochondrial function and cellular respiration (Table 6). As discussed in our previous ME/CFS paper, dysregulated mitochondrial function, resulting in the inability to increase respiration rates in response to increased demand from stressors such as exercise [75], may result in the post-exertional malaise (PEM) that is a hallmark of ME/CFS. The finding of similar pathways in the Fatigue Dominant long COVID cohort suggests that these patients may also struggle to meet energy demands.

It is known that NK cell effector function (cytotoxic activity) regulated by MAPK signaling cascades, including via the c-Jun N-terminal kinase (JNK) [76] signaling pathway, is dysregulated within patients with ME/CFS, who exhibit reduced NK cell cytotoxic activity [77]. Further work will be required to confirm if similar pathological events occur in patients who develop fatigue dominant long COVID.

When we evaluated the degree of similarity between the genes associated with ME/CFS and long COVID, we found 13 critical SNPs (39 in total) within at least one of the long COVID populations that could be mapped to a gene previously associated with ME/CFS.

In both Severe and Fatigue Dominant long COVID populations, we identified SNPs mapping to the genes ATP9A, INSR, CLOCK, SLC15A4 and GPC5. All of these genetic variants were found in a higher proportion of the Fatigue Dominant and Severe long COVID populations than in the ME/CFS case group. This finding may indicate that the long COVID case group defined by fatigue symptoms is more homogeneous than the self-reported UK Biobank ME/CFS population, which likely includes a mix of viral and non-viral triggers of chronic fatigue symptoms.

We found that the CLOCK gene is significantly associated with Fatigue Dominant long COVID and ME/CFS. CLOCK (Circadian Locomotor Output Cycles Kaput) is an important regulator of circadian rhythm, disruptions of which have been associated with pain, insomnia, insulin resistance, immunological function and impaired mitochondrial function [78,79,80,81,82]. Interestingly, one of the most common variants identified in ~ 86% of the long COVID Fatigue Dominant population mapped to the gene NLGN1. NLGN1 is also transcriptionally activated by CLOCK in the forebrain [83], which could indicate multiple genetic contributions to dysregulated circadian rhythm in long COVID.

Of the remaining 4 genes common between long COVID and ME/CFS, we identified 3 common variants in the genes ATP9A, INSR and SLC15A4 in both Severe and Fatigue Dominant cohorts (Table 7).

SLC15A4 encodes a transmembrane transporter that has previously been associated with inflammatory autoimmune diseases such as systemic lupus erythematosus in genome-wide association studies [84, 85]. However, SLC15A4 also plays a key role in mitochondrial function, with knockdown of the gene resulting in impaired autophagy and mitochondrial membrane potential under cellular stress [86].

We also hypothesized that the genetic variants in ATP9A and INSR both contribute to dysregulated insulin signaling in subgroups of ME/CFS patients. Type 2 diabetes-related signaling pathways and insulin resistance were also a key theme within the genes associated with long COVID, and 11 of the gene targets identified in this analysis have prior associations with type 2 diabetes in the OpenTargets database (Additional file 5: Table S12). Metabolic dysfunction and type 2 diabetes may increase risk of developing severe acute COVID-19 [87] and epidemiological studies have demonstrated that there is an increased risk of developing diabetes post COVID-19 compared against controls who had not been infected with SARS-CoV-2 [88]. Furthermore, increased incidence of insulin resistance and glycemic dysregulation was observed in patients 2 months post COVID-19 and in long COVID patients [32, 89].

Several of the biological processes that genes identified in this study are significantly enriched for—such as foam cell differentiation—are also associated with known genetic links to metabolic diseases such as type 2 diabetes (Figure). Metabolic dysfunction has a variety of biological consequences, including increased levels of chronic inflammation, dysregulated immune response to acute infection, endothelial cell dysfunction and defects in coagulation pathways. All of these have been linked to long COVID and severe acute COVID-19 pathogenesis [90].

It is therefore plausible that patients with genetic variants that predispose them to metabolic dysfunction and insulin resistance are more likely to suffer from long term pathological sequelae after the acute phase of COVID-19 infection. From these findings we would indeed expect this population to have increased rates of new-onset type 2 diabetes compared to the non-long COVID population. Unfortunately, longitudinal health record data after the survey was completed was not available to validate this hypothesis in this analysis.

Similarities in indications observed from the cross-disease analysis have also highlighted shared pathways and biological processes associated with genetic drivers of these indications. The results are supported by common clinical manifestations reported in long COVID studies. Of the 27 pathways significantly enriched in the long COVID genes identified in this analysis, 16 (60%) are associated with gene targets previously associated with mental or behavioral disease (Figure 9). This includes indications such as major depressive disorder, anxiety disorder and schizophrenia. A recent meta-analysis of over 10,000 patients indicated that neurological and neuropsychiatric symptoms, such as brain fog, attention deficits and fatigue, were some of the most reported 3 months after acute COVID-19 [91]. This analysis may indicate some of the genetic underpinnings of these manifestations post-COVID.

Study limitations

There are several limitations to this study. The most obvious is that the available datasets, even in a disease as topical, prevalent and debilitating as long COVID, are still very small. Notwithstanding the improved sensitivity offered by the combinatorial analytics approach, this inevitably limits the statistical power of the study.

The most challenging limitation is the poor representation of diverse ancestries, which is essential to gain a deeper understanding of the variability of disease etiology and achieve a level of health equity. As demonstrated by the cohort analysis, even though considerable effort was made to recruit as diverse a population as possible, the majority of participants recruited to the GOLD study were of self-reported white Caucasian ancestry. It is evident that long COVID is a highly heterogeneous disease with a variety of different symptoms, clinical presentations and underlying disease mechanisms including neurological and metabolic dysregulation. From this dataset, we cannot understand the varying prevalence of these symptoms, or the effects that different genetic ancestries, socioeconomic factors, pathogen exposure levels or geographical differences may have in influencing the risk and presentation of long COVID in different ancestries.

Our cohort analysis also revealed that the incidence of other comorbidities (such as type 2 diabetes, cardiovascular disease etc.) was lower than expected for a cohort with the same average age as the long COVID population. This may indicate a degree of ‘otherwise healthy’ volunteer bias that limits this dataset as a representative sample of long COVID. Alternatively, it could reflect a problem with under-reporting of other medical conditions within the self-reported questionnaire.

All the non-genomic data was self-reported by the participants via a questionnaire upon recruitment to the study, including long COVID symptoms, level of acute COVID-19 severity and medical history. Unfortunately, no further EHR/primary care data was available. This method for reporting the degree of long COVID symptoms experienced is likely to be more subjective and prone to memory lapses and retrospective interpretation than direct and concurrent clinical information. This creates challenges in identifying the most relevant clusters of long COVID symptoms (e.g., respiratory, fatigue, GI etc.) and evaluating the severity of those symptoms experienced by different subgroups of cases.

We were unable to fully evaluate some of the most significant consequences and secondary diagnoses associated with long COVID disease. In particular, we would have liked to evaluate the specific drivers underlying the development of POTS, which was only recorded as part of participants’ free-text responses and not captured in the main questionnaire. In the absence of consistent diagnosis and clinical reporting for POTS, we attempted to analyze the symptoms that patients reported when recruited to the study. Tachycardia, dizziness, palpitations, brain fog and even in some cases POTS were recorded but in insufficient numbers for a meaningful analysis.

Hospital admission with a more severe form of acute COVID-19 has previously been identified as a risk factor for the development of long COVID [92]. We were unable to test this finding, as fewer than 10% of any of our case cohorts were hospitalized with COVID-19. As a result, there was insufficient data available to explore if long COVID cases with the 9 variants mapped to genes previously associated with acute COVID-19 (Table 8) were more likely to have experienced a more severe form of acute COVID-19.

Finally, there is some emerging evidence that vaccination against COVID-19 may be protective against the development of long COVID [93]. The analysis of our cohort does show a small but significant reduction in the severity of symptoms over the course of the period Jan 2020-Nov 2022. The majority of cases included in our study were first infected in 2020 or early 2021 (pre-widespread vaccination) but the questionnaire did not contain any questions regarding vaccination/booster status, or if the participants contracted acute COVID-19 before or after vaccination. As such, we are unable to evaluate the effect of vaccination on long COVID development within this cohort.

There is some additional evidence that omicron variants are less likely to cause long-term symptoms even after adjusting for vaccine status [94] and the evidence available in this study does not contradict that suggestion. However, as demonstrated by our cohort analysis, it was difficult to assess the association of SARS-CoV-2 variant status with long COVID risk due to the limited number of study participants recruited who contracted COVID-19 when omicron and more recent variants were the most prevalent in the UK, the limited amount of information on repeat infections, or vaccination/booster status.

Conclusions and future perspectives

The results of this study, while encouraging and building consistently on findings in ME/CFS and other diseases with related symptomology, still need to be validated and replicated within an independent long COVID population, which ideally would have much deeper clinical phenotype and longitudinal history information.

Various groups have been collecting large acute COVID-19 and long COVID patient datasets over the last 3 years and we hope that they will now make the individual patient level data available to the wider research community quickly. We can realistically expect that analyzing an independent, larger and more detailed patient dataset using combinatorial analytics approaches will further improve the disease insights that we are gaining in long COVID, offering routes forward to alleviate the massive unmet medical need which has blighted the lives of millions of patients.

Availability of data and materials

All data sources are described in the Supplementary Information, and no new source data were collected. Only data from existing GOLD and UK Biobank study cohorts were analyzed. All datasets generated during the study are described in the Supplementary Data section and/or available from the corresponding author upon reasonable request.


  1. World Health Organization WHO Fact Sheets Post COVID-19 condition (Long COVID) available from Accessed 8 Oct 2023

  2. O’Mahoney LL, Routen A, Gillies C, Ekezie W, Welford A, Zhang A, Karamchandani U, Simms-Williams N, Cassambai S, et al. The prevalence and long-term health effects of long Covid among hospitalised and non-hospitalised populations: a systematic review and meta-analysis. EClinicalMedicine. 2022;1(55):101762.;59:101959.

  3. WHO Coronavirus (COVID-19) Dashboard, last Accessed 4 June 2023

  4. Mallick D, Goyal L, Chourasia P, Zapata MR, Yashi K, Surani S. COVID-19 induced postural orthostatic tachycardia syndrome (POTS): a review. Cureus. 2023;15(3):e36955.

  5. Ballouz T, Menges D, Anagnostopoulos A, Domenghino A, Aschmann HE, Frei A, Fehr JS, Puhan MA. Recovery and symptom trajectories up to two years after SARS-CoV-2 infection: population based, longitudinal cohort study. BMJ. 2023;31(381):e074425.

  6. Lee J, Kothari AS, Bhatt G, Gupta N, Ali AE, Najam N, Mazroua M, Mansoor T, Amal T, Elsaban M, Deo R. Cardiac complications among long COVID patients: a systematic review and meta-analysis. J Am Coll Cardiol. 2023;81(8):2115–2115.

  7. Davis HE, McCorkell L, Vogel JM, Topol EJ. Long COVID: major findings, mechanisms and recommendations. Nat Rev Microbiol. 2023;21(3):133–46. (Epub 2023 Jan 13. Erratum in: Nat Rev Microbiol. 2023 Jun;21(6):408).

  8. Ceban F, Ling S, Lui LMW, Lee Y, Gill H, Teopiz KM, Rodrigues NB, Subramaniapillai M, Di Vincenzo JD, Cao B, Lin K, Mansur RB, Ho RC, Rosenblat JD, Miskowiak KW, Vinberg M, Maletic V, McIntyre RS. Fatigue and cognitive impairment in Post-COVID-19 syndrome: a systematic review and meta-analysis. Brain Behav Immun. 2022;101:93–135.

  9. Harrison PJ, Taquet M. Neuropsychiatric disorders following SARS-CoV-2 infection. Brain. 2023;146(6):2241–7.

  10. Kubota T, Kuroda N, Sone D. Neuropsychiatric aspects of long COVID: a comprehensive review. Psychiatry Clin Neurosci. 2023;77(2):84–93. (Epub 2022 Dec 12).

  11. Vanichkachorn G, Newcomb R, Cowl CT, Murad MH, Breeher L, Miller S, Trenary M, Neveau D, Higgins S. Post-COVID-19 syndrome (long haul syndrome): description of a multidisciplinary clinic at mayo clinic and characteristics of the initial patient cohort. Mayo Clin Proc. 2021;96(7):1782–91.

  12. Thaweethai T, Jolley SE, Karlson EW, Levitan EB, Levy B, McComsey GA, McCorkell L, Nadkarni GN, Parthasarathy S, RECOVER Consortium, et al. Development of a definition of postacute sequelae of SARS-CoV-2 infection. JAMA. 2023;329(22):1934–46.

  13. Lammi V, Ollila HM, Long COVID Host Genetics Initiative. Tackling long COVID using international host genetics research collaboration. Sleep Med. 2022;100:S64–5.

  14. Lammi V, Nakanishi T, Jones SE, Long COVID Host Genetics Initiative, et al. Genome-wide association study of long COVID. Preprint at medRxiv. 2023.

  15. Taylor K, Das S, Pearson M, Kozubek J, Pawlowski M, Jensen CE, Skowron Z, Møller GL, Strivens M, Gardner S. Analysis of genetic host response risk factors in severe COVID-19 patients. Preprint at medRxiv. 2020.

  16. Das S, Taylor K, Kozubek J, Sardell J, Gardner S. Genetic risk factors for ME/CFS identified using combinatorial analysis. J Transl Med. 2022;20(1):598.

  17. Gardner S. Combinatorial analytics: an essential tool for the delivery of precision medicine and precision agriculture. Artif Intell Life Sci. 2021;1:100003.

  18. Tam V, Patel N, Turcotte M, Bossé Y, Paré G, Meyre D. Benefits and limitations of genome-wide association studies. Nat Rev Genet. 2019;20(8):467–84.

  19. Walsh R, Tadros R, Bezzina CR. When genetic burden reaches threshold. Eur Heart J. 2020;41(39):3849–55.

  20. Sano Genetics. GOLD Study Overview. Accessed 8 Oct 2023.

  21. UK Government Guidance. COVID-19 Response: Living with COVID-19. Accessed 8 Oct 2023.

  22. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple hypothesis testing. J R Stat Soc B. 1995;57:289–300.

  23. Howe KL, Achuthan P, Allen J, Allen J, Alvarez-Jarreta J, Amode MR, Armean IM, Azov AG, Bennett R, Bhai J, et al. Ensembl 2021. Nucleic Acids Res. 2021;49(D1):D884–91.

  24. Jin Y, Schäffer AA, Sherry ST, Feolo M. Quickly identifying identical and closely related subjects in large databases using genotype data. PLoS ONE. 2017;12(6):e0179106.

  25. MacFarland TW, Yates JM. Mann–Whitney U test. In: MacFarland TW, Yates JM, editors. Introduction to nonparametric statistics for the biological sciences using R. Cham: Springer International Publishing; 2016. p. 103–32.

  26. Ochoa D, Hercules A, Carmona M, Suveges D, Baker J, Malangone C, Lopez I, Miranda A, Cruz-Castillo C, Fumis L, Bernal-Llinares M, Tsukanov K, Cornu H, Tsirigos K, Razuvayevskaya O, Buniello A, Schwartzentruber J, Karim M, Ariano B, Martinez Osorio RE, Ferrer J, Ge X, Machlitt-Northen S, Gonzalez-Uriarte A, Saha S, Tirunagari S, Mehta C, Roldán-Romero JM, Horswell S, Young S, Ghoussaini M, Hulcoop DG, Dunham I, McDonagh EM. The next-generation open targets platform: reimagined, redesigned, rebuilt. Nucleic Acids Res. 2023;51(D1):D1353–9.

  27. Raudvere U, Kolberg L, Kuzmin I, Arak T, Adler P, Peterson H, Vilo J. g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. 2019;47(W1):W191–8.

  28. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, De Bakker PI, Daly MJ, Sham PC. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75.

  29. de Goede KE, Harber KJ, Gorki FS, Verberk SGS, Groh LA, Keuning ED, Struys EA, van Weeghel M, Haschemi A, de Winther MPJ, van Dierendonck XAMH, Van den Bossche J. d-2-Hydroxyglutarate is an anti-inflammatory immunometabolite that accumulates in macrophages after TLR4 activation. Biochim Biophys Acta Mol Basis Dis. 2022;1868(9):166427.

  30. Mosharaf MP, Reza MS, Kibria MK, Ahmed FF, Kabir MH, Hasan S, Mollah MNH. Computational identification of host genomic biomarkers highlighting their functions, pathways and regulators that influence SARS-CoV-2 infections and drug repurposing. Sci Rep. 2022;12(1):4279.

  31. Hayashi Y. Metabolic impact of glucagon deficiency. Diabetes Obes Metab. 2011;13(Suppl 1):151–7. (PMID: 21824269).

  32. Al-Hakeim HK, Al-Rubaye HT, Jubran AS, Almulla AF, Moustafa SR, Maes M. Increased insulin resistance due to Long COVID is associated with depressive symptoms and partly predicted by the inflammatory response during acute infection. Braz J Psychiatry. 2023;45(3):205–15.

  33. Ustinova M, Peculis R, Rescenko R, Rovite V, Zaharenko L, Elbere I, Silamikele L, Konrade I, Sokolovska J, Pirags V, Klovins J. Novel susceptibility loci identified in a genome-wide association study of type 2 diabetes complications in population of Latvia. BMC Med Genomics. 2021;14(1):18.

  34. Yadav A, Kataria MA, Saini V, Yadav A. Role of leptin and adiponectin in insulin resistance. Clin Chim Acta. 2013;18(417):80–4.

  35. Al-Kuraishy HM, Al-Gareeb AI, Bungau SG, Radu AF, Batiha GE. The potential molecular implications of adiponectin in the evolution of SARS-CoV-2: Inbuilt tendency. J King Saud Univ Sci. 2022;34(8):102347.

  36. Dorighello GG, Assis LHP, Rentz T, Morari J, Santana MFM, Passarelli M, Ridgway ND, Vercesi AE, Oliveira HCF. Novel Role of CETP in macrophages: reduction of mitochondrial oxidants production and modulation of cell immune-metabolic profile. Antioxidants. 2022;11(9):1734.

  37. Zhang Y, Li G. A tumor suppressor DLC1: The functions and signal pathways. J Cell Physiol. 2020;235(6):4999–5007.

  38. Ma M, Brunal AA, Clark KC, Studtmann C, Stebbins K, Higashijima SI, Pan YA. Deficiency in the cell-adhesion molecule dscaml1 impairs hypothalamic CRH neuron development and perturbs normal neuroendocrine stress axis function. Front Cell Dev Biol. 2023;16(11):1113675.

  39. Chan KR, Koh CWT, Ng DHL, Qin S, Ooi JSG, Ong EZ, Zhang SLX, Sam H, Kalimuddin S, Low JGH, Ooi EE. Early peripheral blood MCEMP1 and HLA-DRA expression predicts COVID-19 prognosis. EBioMedicine. 2023;89:104472.

  40. Papadopoulos KI, Papadopoulou A, Aw TC. Beauty and the beast: host microRNA-155 versus SARS-CoV-2. Hum Cell. 2023;36(3):908–22.

  41. Ahmed FF, Reza MS, Sarker MS, Islam MS, Mosharaf MP, Hasan S, Mollah MNH. Identification of host transcriptome-guided repurposable drugs for SARS-CoV-1 infections and their validation with SARS-CoV-2 infections by using the integrated bioinformatics approaches. PLoS ONE. 2022;17(4):e0266124.

  42. Jiao Y, Kong N, Wang H, Sun D, Dong S, Chen X, Zheng H, Tong W, Yu H, Yu L, Huang Y, Wang H, Sui B, Zhao L, Liao Y, Zhang W, Tong G, Shan T. PABPC4 broadly inhibits coronavirus replication by degrading nucleocapsid protein through selective autophagy. Microbiol Spectr. 2021;9(2):e0090821.

  43. Zhang Y, Ozono S, Tada T, Tobiume M, Kameoka M, Kishigami S, Fujita H, Tokunaga K. MARCH8 targets cytoplasmic lysine residues of various viral envelope glycoproteins. Microbiol Spectr. 2022;10(1):e0061821.

  44. Chen CH, Chen YC, Huang CH, Wang SH, Lin JS, Lo SC, Huang CC. Exploring potential proteomic biomarkers for prognosis of infective endocarditis through profiled autoantibodies by an immunomics protein array technique. Heart Surg Forum. 2020;23(5):E555–73.

  45. Ish-Shalom E, Meirow Y, Sade-Feldman M, Kanterman J, Wang L, Mizrahi O, Klieger Y, Baniyash M. Impaired SNX9 expression in immune cells during chronic inflammation: prognostic and diagnostic implications. J Immunol. 2016;196(1):156–67.

  46. Bendris N, Schmid SL. Endocytosis, metastasis and beyond: multiple facets of SNX9. Trends Cell Biol. 2017;27(3):189–200.

  47. Fontes-Dantas FL, Fernandes GG, Gutman EG, De Lima EV, Antonio LS, Hammerle MB, Mota-Araujo HP, Colodeti LC, Araújo SMB, Froz GM, da Silva TN, Duarte LA, Salvio AL, Pires KL, Leon LAA, Vasconcelos CCF, Romão L, Savio LEB, Silva JL, da Costa R, Clarke JR, Da Poian AT, Alves-Leon SV, Passos GF, Figueiredo CP. SARS-CoV-2 Spike protein induces TLR4-mediated long-term cognitive dysfunction recapitulating post-COVID-19 syndrome in mice. Cell Rep. 2023;42(3):112189. (Epub 2023 Feb 17).

  48. Choudhury A, Mukherjee S. In silico studies on the comparative characterization of the interactions of SARS-CoV-2 spike glycoprotein with ACE-2 receptor homologs and human TLRs. J Med Virol. 2020;92(10):2105–13.

  49. Piehler A, Kaminski WE, Wenzel JJ, Langmann T, Schmitz G. Molecular structure of a novel cholesterol-responsive A subclass ABC transporter, ABCA9. Biochem Biophys Res Commun. 2002;295(2):408–16.

  50. Park S, Song J, Baek IJ, Jang KY, Han CY, Jun DW, Kim PK, Raught B, Jin EJ. Loss of Acot12 contributes to NAFLD independent of lipolysis of adipose tissue. Exp Mol Med. 2021;53(7):1159–69.

  51. Liu R, Liu X, Bai X, Xiao C, Dong Y. Different expression of lipid metabolism-related genes in Shandong black cattle and Luxi cattle based on transcriptome analysis. Sci Rep. 2020;10(1):21915.

  52. Van Deveire KN, Scranton SK, Kostek MA, Angelopoulos TJ, Clarkson PM, Gordon PM, Moyna NM, Visich PS, Zoeller RF, Thompson PD, Devaney JM, Gordish-Dressman H, Hoffman EP, Maresh CM, Pescatello LS. Variants of the ankyrin repeat domain 6 gene (ANKRD6) and muscle and physical activity phenotypes among European-derived American adults. J Strength Cond Res. 2012;26(7):1740–8.

  53. Dibley MG, Formosa LE, Lyu B, Reljic B, McGann D, Muellner-Wong L, Kraus F, Sharpe AJ, Stroud DA, Ryan MT. The mitochondrial acyl-carrier protein interaction network highlights important roles for LYRM family members in complex I and mitoribosome assembly. Mol Cell Proteomics. 2020;19(1):65–77.

  54. Shen T, Miao Y, Ding C, Fan W, Liu S, Lv Y, Gao X, De Boevre M, Yan L, Okoth S, De Saeger S, Song S. Activation of the p38/MAPK pathway regulates autophagy in response to the CYPOR-dependent oxidative stress induced by zearalenone in porcine intestinal epithelial cells. Food Chem Toxicol. 2019;131:110527.

  55. Killackey SA, Bi Y, Soares F, Hammi I, Winsor NJ, Abdul-Sater AA, Philpott DJ, Arnoult D, Girardin SE. Mitochondrial protein import stress regulates the LC3 lipidation step of mitophagy through NLRX1 and RRBP1. Mol Cell. 2022;82(15):2815-2831.e5.

  56. Larhammar M, Huntwork-Rodriguez S, Rudhard Y, Sengupta-Ghosh A, Lewcock JW. The Ste20 family kinases MAP4K4, MINK1, and TNIK converge to regulate stress-induced JNK signaling in neurons. J Neurosci. 2017;37(46):11074–84.

  57. Georgiadou M, Ivaska J. Tensins: bridging AMP-activated protein kinase with integrin activation. Trends Cell Biol. 2017;27(10):703–11.

  58. Westmuckett AD, Thacker KM, Moore KL. Tyrosine sulfation of native mouse Psgl-1 is required for optimal leukocyte rolling on P-selectin in vivo. PLoS ONE. 2011;6(5):e20406.

  59. Schulte E. 68. Untangling genetic risk factors of long covid: work of the international covid-19 host genetics initiative. Eur Neuropsychopharmacol. 2022;63:e82.

  60. Satu MS, Khan MI, Rahman MR, Howlader KC, Roy S, Roy SS, Quinn JMW, Moni MA. Diseasome and comorbidities complexities of SARS-CoV-2 infection with common malignant diseases. Brief Bioinform. 2021;22(2):1415–29.

  61. OpenTargets. Evidence for GPC6 in COVID-19. Accessed 8 Oct 2023.

  62. Schultheiß C, Paschold L, Willscher E, Simnica D, Wöstemeier A, Muscate F, Wass M, Eisenmann S, Dutzmann J, Keyßer G, Gagliani N, Binder M. Maturation trajectories and transcriptional landscape of plasmablasts and autoreactive B cells in COVID-19. iScience. 2021;24(11):103325. (Epub 2021 Oct 23).

  63. Glessner JT, Chang X, Mentch F, Qu H, Abrams DJ, Thomas A, Sleiman PMA, Hakonarson H. COVID-19 in pediatrics: genetic susceptibility. Front Genet. 2022;16(13):928466.

  64. Thompson RC, Simons NW, Wilkins L, Cheng E, Del Valle DM, Hoffman GE, Cervia C, Fennessy B, Mouskas K, Francoeur NJ, et al. Molecular states during acute COVID-19 reveal distinct etiologies of long-term sequelae. Nat Med. 2023;29(1):236–46.

  65. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25(1):25–9.

  66. The Gene Ontology Consortium. The gene ontology knowledgebase in 2023. Genetics. 2023;224(1):iyad031.

  67. Stopford MJ, Allen SP, Ferraiuolo L. A high-throughput and pathophysiologically relevant astrocyte-motor neuron co-culture assay for amyotrophic lateral sclerosis therapeutic discovery. Bio Protoc. 2019;9(17):e3353.

  68. Das S, Taylor K, Beaulah S, Gardner S. Systematic indication extension for drugs using patient stratification insights generated by combinatorial analytics. Patterns. 2022;3(6):100496.

  69. Mukherjee S. Toll-like receptor 4 in COVID-19: friend or foe? Future Virol. 2022. (Epub 2022 Apr 19).

  70. Liu ZM, Yang MH, Yu K, Lian ZX, Deng SL. Toll-like receptor (TLRs) agonists and antagonists for COVID-19 treatments. Front Pharmacol. 2022;7(13):989664.

  71. GlobalData. Pharma Market Data and Insights. Accessed 8 Oct 2023.

  72. Clinical Development of EB05 for the Treatment of ARDS. Presented at the ARDS Drug Development Summit, July 14, 2022. Accessed 8 Oct 2023.

  73. Tabas I, Bornfeldt KE. Macrophage phenotype and function in different stages of atherosclerosis. Circ Res. 2016;118(4):653–67.

  74. Wendisch D, Dietrich O, Mari T, von Stillfried S, Ibarra IL, Mittermaier M, Mache C, Chua RL, Knoll R, Timm S, Brumhard S, Deutsche COVID-19 OMICS Initiative (DeCOI), et al. SARS-CoV-2 infection triggers profibrotic macrophage responses and lung fibrosis. Cell. 2021;184(26):6243-6261.e27.

  75. Tomas C, Brown A, Strassheim V, Elson JL, Newton J, Manning P. Cellular bioenergetics is impaired in patients with chronic fatigue syndrome. PLoS ONE. 2017;12(10):e0186802.

  76. Trotta R, Fettucciari K, Azzoni L, Abebe B, Puorro KA, Eisenlohr LC, Perussia B. Differential role of p38 and c-Jun N-terminal kinase 1 mitogen-activated protein kinases in NK cell cytotoxicity. J Immunol. 2000;165(4):1782–9.

  77. Huth TK, Staines D, Marshall-Gradisnik S. ERK1/2, MEK1/2 and p38 downstream signalling molecules impaired in CD56 dim CD16+ and CD56 bright CD16 dim/- natural killer cells in chronic fatigue Syndrome/myalgic encephalomyelitis patients. J Transl Med. 2016;21(14):97.

  78. de Goede P, Wefers J, Brombacher EC, Schrauwen P, Kalsbeek A. Circadian rhythms in mitochondrial respiration. J Mol Endocrinol. 2018;60(3):R115–30.

  79. Schmitt K, Grimm A, Dallmann R, Oettinghaus B, Restelli LM, Witzig M, Ishihara N, Mihara K, Ripperger JA, Albrecht U, Frank S, Brown SA, Eckert A. Circadian control of DRP1 activity regulates mitochondrial dynamics and bioenergetics. Cell Metab. 2018;27(3):657-666.e5.

  80. Oosterman JE, Wopereis S, Kalsbeek A. The circadian clock, shift work, and tissue-specific insulin resistance. Endocrinology. 2020;161(12):bqaa180.

  81. Orozco-Solis R, Aguilar-Arnal L. Circadian regulation of immunity through epigenetic mechanisms. Front Cell Infect Microbiol. 2020;13(10):96.

  82. Labrecque N, Cermakian N. Circadian clocks in the immune system. J Biol Rhythms. 2015;30(4):277–90.

  83. Hannou L, Bélanger-Nelson E, O’Callaghan EK, Dufort-Gervais J, Ballester Roig MN, Roy PG, Beaulieu JM, Cermakian N, Mongrain V. Regulation of the neuroligin-1 gene by clock transcription factors. J Biol Rhythms. 2018;33(2):166–78.

  84. Wang C, Ahlford A, Järvinen TM, Nordmark G, Eloranta ML, Gunnarsson I, Svenungsson E, Padyukov L, Sturfelt G, Jönsen A, Bengtsson AA, Truedsson L, Eriksson C, Rantapää-Dahlqvist S, Sjöwall C, Julkunen H, Criswell LA, Graham RR, Behrens TW, Kere J, Rönnblom L, Syvänen AC, Sandling JK. Genes identified in Asian SLE GWASs are also associated with SLE in caucasian populations. Eur J Hum Genet. 2013;21(9):994–9. (Epub 2012 Dec 19).

  85. He CF, Liu YS, Cheng YL, Gao JP, Pan TM, Han JW, Quan C, Sun LD, Zheng HF, Zuo XB, Xu SX, Sheng YJ, Yao S, Hu WL, Li Y, Yu ZY, Yin XY, Zhang XJ, Cui Y, Yang S. TNIP1, SLC15A4, ETS1, RasGRP3 and IKZF1 are associated with clinical features of systemic lupus erythematosus in a Chinese Han population. Lupus. 2010;19(10):1181–6.

  86. Kobayashi T, Nguyen-Tien D, Ohshima D, Karyu H, Shimabukuro-Demoto S, Yoshida-Sugitani R, Toyama-Sorimachi N. Human SLC15A4 is crucial for TLR-mediated type I interferon production and mitochondrial integrity. Int Immunol. 2021;33(7):399–406.

  87. Scherer PE, Kirwan JP, Rosen CJ. Post-acute sequelae of COVID-19: a metabolic perspective. Elife. 2022;23(11):e78200.

  88. Xie Y, Al-Aly Z. Risks and burdens of incident diabetes in long COVID: a cohort study. Lancet Diabetes Endocrinol. 2022;10(5):311–21. (Epub 2022 Mar 21).

  89. Montefusco L, Ben Nasr M, D’Addio F, Loretelli C, Rossi A, Pastore I, Daniele G, Abdelsalam A, Maestroni A, Dell’Acqua M, Ippolito E, Assi E, Usuelli V, Seelam AJ, Fiorina RM, Chebat E, Morpurgo P, Lunati ME, Bolla AM, Finzi G, Abdi R, Bonventre JV, Rusconi S, Riva A, Corradi D, Santus P, Nebuloni M, Folli F, Zuccotti GV, Galli M, Fiorina P. Acute and long-term disruption of glycometabolic control after SARS-CoV-2 infection. Nat Metab. 2021;3(6):774–85. (Epub 2021 May 25).

  90. Chen X, Chen Y, Wu C, Wei M, Xu J, Chao YC, Song J, Hou D, Zhang Y, Du C, Li X, Song Y. Coagulopathy is a major extrapulmonary risk factor for mortality in hospitalized patients with COVID-19 with type 2 diabetes. BMJ Open Diabetes Res Care. 2020;8(2):e001851.

  91. Premraj L, Kannapadi NV, Briggs J, Seal SM, Battaglini D, Fanning J, Suen J, Robba C, Fraser J, Cho SM. Mid and long-term neurological and neuropsychiatric manifestations of post-COVID-19 syndrome: a meta-analysis. J Neurol Sci. 2022;434:120162. (Epub 2022 Jan 29).

  92. Tsampasian V, Elghazaly H, Chattopadhyay R, et al. Risk factors associated with post-COVID-19 condition: a systematic review and meta-analysis. JAMA Intern Med. 2023.

  93. Byambasuren O, Stehlik P, Clark J, Alcorn K, Glasziou P. Effect of covid-19 vaccination on long covid: systematic review. BMJ Med. 2023;2(1):e000385.

  94. Antonelli M, Pujol JC, Spector TD, Ourselin S, Steves CJ. Risk of long COVID associated with delta versus omicron variants of SARS-CoV-2. Lancet. 2022;399(10343):2263–4.


Acknowledgements

The research described in this article was conducted using data from Sano Genetics’ Long COVID GOLD study, and we thank the Sano team for their help in preparing these data. Special thanks to Anastasia Lankina and Mark Strivens, who provided input into the manuscript; Gert Møller and Claus Erik Jensen, who initially developed the combinatorial analytics methodology; and the rest of the PrecisionLife team.


Funding

The project was funded entirely by PrecisionLife Ltd.

Author information



Contributions

KT, MP, SD, KC, and JS performed the analysis, and SG, KT, SD, and JS contributed to writing the manuscript. All authors consent to publication.

Corresponding author

Correspondence to Steve Gardner.

Ethics declarations

Ethics approval and consent to participate

The Sano Genetics GOLD study has approval from the Wales Research Ethics Committee (REC) (IRAS 291221). Consent to participate has been received from all participants.

Competing interests

K.T., M.P., S.D., K.C., J.S., and S.G. are employees of PrecisionLife, Ltd. S.G. is a shareholder of PrecisionLife, Ltd.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Data dictionary for the Sano GOLD study.

Additional file 2.

Critical SNPs found in Long COVID study.

Additional file 3.

Pathway enrichment of long COVID Signatures.

Additional file 4.

Severe Cohort Disease signatures.

Additional file 5.

Supplementary Material section.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Taylor, K., Pearson, M., Das, S. et al. Genetic risk factors for severe and fatigue dominant long COVID and commonalities with ME/CFS identified by combinatorial analysis. J Transl Med 21, 775 (2023).
