Genetic risk factors for severe and fatigue dominant long COVID and commonalities with ME/CFS identified by combinatorial analysis

Background Long COVID is a debilitating chronic condition that has affected over 100 million people globally. It is characterized by a diverse array of symptoms, including fatigue, cognitive dysfunction and respiratory problems. Studies have so far largely failed to identify genetic associations, the mechanisms behind the disease, or any common pathophysiology with other conditions such as myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) that present with similar symptoms. Methods We used a combinatorial analysis approach to identify combinations of genetic variants significantly associated with the development of long COVID and to examine the biological mechanisms underpinning its various symptoms. We compared two subpopulations of long COVID patients from Sano Genetics’ Long COVID GOLD study cohort, focusing on patients with severe or fatigue dominant phenotypes. We evaluated the genetic signatures previously identified in an ME/CFS population against this long COVID population to understand similarities with other fatigue disorders that may be triggered by a prior viral infection. Finally, we also compared the output of this long COVID analysis against known genetic associations in other chronic diseases, including a range of metabolic and neurological disorders, to understand the overlap of pathophysiological mechanisms. Results Combinatorial analysis identified 73 genes that were highly associated with at least one of the long COVID populations included in this analysis. Of these, 9 genes have prior associations with acute COVID-19, and 14 were differentially expressed in a transcriptomic analysis of long COVID patients. A pathway enrichment analysis revealed that the biological pathways most significantly associated with the 73 long COVID genes were mainly aligned with neurological and cardiometabolic diseases. Expanded genotype analysis suggests that specific SNX9 genotypes are a significant contributor to the risk of or protection against severe long COVID infection, but that the gene-disease relationship is context dependent and mediated by interactions with KLF15 and RYR3. Comparison of the genes uniquely associated with the Severe and Fatigue Dominant long COVID patients revealed significant differences between the pathways enriched in each subgroup. The genes unique to Severe long COVID patients were associated with immune pathways such as myeloid differentiation and macrophage foam cells. Genes unique to the Fatigue Dominant subgroup were enriched in metabolic pathways such as MAPK/JNK signaling. We also identified overlap in the genes associated with Fatigue Dominant long COVID and ME/CFS, including several involved in circadian rhythm regulation and insulin regulation. Overall, 39 SNPs associated in this study with long COVID can be linked to 9 genes identified in a recent combinatorial analysis of ME/CFS patient from UK Biobank. Among the 73 genes associated with long COVID, 42 are potentially tractable for novel drug discovery approaches, with 13 of these already targeted by drugs in clinical development pipelines. From this analysis for example, we identified TLR4 antagonists as repurposing candidates with potential to protect against long term cognitive impairment pathology caused by SARS-CoV-2. We are currently evaluating the repurposing potential of these drug targets for use in treating long COVID and/or ME/CFS. Conclusion This study demonstrates the power of combinatorial analytics for stratifying heterogeneous populations in complex diseases that do not have simple monogenic etiologies. These results build upon the genetic findings from combinatorial analyses of severe acute COVID-19 patients and an ME/CFS population and we expect that access to additional independent, larger patient datasets will further improve the disease insights and validate potential treatment options in long COVID. Supplementary Information The online version contains supplementary material available at 10.1186/s12967-023-04588-4.


Introduction
Post COVID-19 condition (or long COVID) is a debilitating syndrome that the World Health Organization (WHO) estimates affects up to 20% of people infected by SARS-CoV-2 [1].Other more recent studies put the prevalence of long-term symptoms (over 3 months post-infection) in COVID-19 patients even higher [2], with all estimates implying that over 100 million patients have been affected by the condition globally [3].Even though symptoms decline for most patients over time, some patients still experienced symptoms such as post-exertional malaise or postural tachycardia syndrome (POTS) [4] up to 2 years after infection [5], and the long-term health consequences of long COVID remain unknown, with suggestions of a doubling of the risk of developing cardiovascular issues [6].
Reports indicate an extensive array of symptoms associated with long COVID [7], with the most common being fatigue and post-exertional malaise (PEM) [8], cognitive dysfunction [9], mood disturbances [10] and respiratory problems [11].However, establishing a precise diagnosis for either of these diseases has proved challenging, in large part due to the complexity and diversity of their clinical presentation and their effects across multiple organ systems.In an attempt to provide some definitive metrics, a recent study developed a data-driven scoring framework for diagnosing long COVID based on the available symptom data [12].
Although many studies have investigated the genetic risks underlying long COVID, only one genome-wide association study (GWAS) has identified a single risk locus around the lead variant in FOXP4 [13,14].Studies that used combinatorial analytical approaches to delineate genetic risk factors in similarly heterogenous populations have demonstrated more success, for example in severe COVID-19 [15] and ME/CFS [16].
Combinatorial analytics approaches identify combinations of features that together (rather than individually) are associated with the disease phenotype [17].They capture the non-linear effects of interactions between multiple genes (and exogenous factors if available).These signals are distinct from and complementary to the monogenic, linear additive associations of single SNPs found by GWAS.In complex (multifactorial and heterogenous) diseases these non-linear combinatorial signals are significantly more important in understanding disease biology than in relatively monogenic disorders such as many cancers and rare genetic disorders [18,19].
In this study we used combinatorial analytics to identify disease risk signatures (combinations of genetic variants significantly associated with the development of long COVID) and explored the biological mechanisms with which they are involved.We investigated subpopulations of long COVID patients who had experienced either severe disease or a fatigue dominant phenotype, to compare the underlying genes and pathways that explain some of the heterogenous manifestations of the disease.
We also compared the output of this study against our previous ME/CFS analysis [16] to understand similarities in post-viral fatigue and other phenotypes experienced by subsets of long COVID patients.Finally, we compared the pathways that were significantly enriched in this genetic analysis of long COVID against known genetic associations in other chronic diseases that are predominantly autoimmune, neurological and/or metabolic in nature, to evaluate any common pathophysiological mechanisms that might be shared by long COVID.

Sano Genetics GOLD study dataset
Genotypic and phenotypic data for both cases and controls included in this study were generated from Sano Genetics' Long COVID GOLD study [20].Eligible participants (n = 1996), recruited between 2020 and 2022, provided saliva samples for an at-home Sano DNA Test (evaluated via Illumina Global Screening Array with Multi-disease drop-in panel) and completed a questionnaire hosted on the Sano Genetics platform detailing their acute COVID-19 and long COVID symptoms (if experienced), as well as basic demographic data and other chronic health conditions (see Additional file 1).

Symptom based score for long COVID severity
Given the heterogeneity of post-COVID symptoms reported by the GOLD study and other previous studies, we developed a data-driven scoring method to characterize the severity of self-reported symptoms.We analyzed participant reported scores for each available long COVID symptom experienced pre-and post-acute COVID-19, including breathlessness, fatigue, degree of muscle pain and change in mental health (see Additional file 5: Table S1 for more details).A 'Total Change' score was generated for each patient from the sum of the reported differences across symptoms pre-and post-COVID.

Cohort characteristics
At the time of analysis, a total of 1829 individuals in the GOLD study had a self-reported COVID-  The GOLD study cohort included in this analysis was recruited between January 2020 and November 2022.Using the Office for National Statistics (ONS) COVID-19 Infection Survey data [21], we have estimated the most prevalent circulating SARS-CoV-2 variant in the UK that each participant was most likely to be exposed to when they contracted COVID-19 (Fig. 1).This demonstrates that the majority (65%) of samples included in the study were most likely infected with the wildtype strain.The study does not include any participants who contracted any of the more recent SARS-CoV-2 variants that emerged in 2023.
In the dataset, 1489 (81.4%) individuals provided freetext responses on other symptoms that they experienced since their illness that were not covered elsewhere in the questionnaire.The most frequently reported symptoms included loss of smell, headache, pain, tinnitus, loss of taste, dizziness, insomnia and postural tachycardia syndrome (POTS) (see Additional file 5: Table S2 and Additional file 5: Figure S2).Following COVID infection, 353 (19.3%) individuals reported reducing their working hours while 359 (19.6%) people discontinued working altogether post-illness.

Long COVID cohorts
We defined two long COVID case populations from the GOLD study based on self-reported symptom changes 3 months post COVID-19-'Severe' long-haulers who reported the greatest variety and severity of symptoms and 'Fatigue Dominant' cases who reported predominantly fatigue-associated long COVID symptoms.
The World Health Organization defines long COVID patients as those experiencing one or more symptoms post initial COVID-19 infection.However, the cohort in the GOLD study that met these criteria displayed a great range in the severity and length of self-reported symptoms experienced post COVID-19.Instead, we aimed to focus on the more 'severe' long haulers who reported the greatest degree of symptoms experienced as these are likely to be the patients experiencing long COVID symptoms that do not diminish over time without pharmaceutical intervention.
The Fatigue Dominant' cohort was chosen primarily due to their phenotypic similarity with ME/CFS, allowing us to explore potential commonalities between the diseases based on our previously published combinatorial analysis for ME/CFS [16].
The number and overlap in cases and controls included in the two datasets are included in Additional file 5: Figure S3.

Severe long COVID cohort
The Severe long COVID cohort (n = 1,323 where cases = 459 and controls = 864) was selected using the difference in scores reported pre-and post-acute COVID-19 for three long COVID symptom groups-namely, respiratory, fatigue and mental health.Severe cases were defined as those with a 'Total Change' score for these symptoms greater than or equal to the upper quartile of the distribution.The controls in this study were defined as samples with a 'Total Change' score greater than or equal to 0 but below the median of the distribution.

Fatigue dominant long COVID cohort
The Fatigue Dominant cohort (n = 1,386 where cases = 477 and controls = 909) was selected using only a subset of symptoms relating to fatigue in the scores ('Fatigue Change') reported for pre-and post-acute COVID-19 symptoms (see Additional file 5: Table S1).
The controls in this study were defined as samples with a 'Fatigue Change' score greater than or equal to 0 but below the median of the distribution.
The characteristics of the two cohorts are shown in Fig. 2 and described in Table 1, Fig. 1, Fig. 2 and Additional file 5: Figure S4.

Dataset QC
The two case-control datasets underwent a series of quality control (QC) procedures before they were analyzed using the PrecisionLife platform.
Standard variant-level and sample-level QC procedures were applied to the dataset (comprising of 696,382 SNPs) as described in the Genotype Quality Control section in Supplementary Information.Due to the small sample size of the two long COVID cohorts, the genotype data was filtered to exclude SNPs with minor allele frequency (MAF) < 5%.Very low frequency SNPs were removed as significant combinations involving rare variants are especially infrequent.This filter also increases the statistical power of combinatorial analysis to detect genotype-disease associations by reducing the amount of false discovery rate (FDR) correction required when testing multiple SNP-genotype combinations.Following QC, the Severe dataset comprised of 283,478 SNPs and the Fatigue Dominant dataset contained 283,444 SNPs.

Combinatorial analytics using the PrecisionLife platform
The PrecisionLife combinatorial analysis platform enables hypothesis-free identification of high-order combinatorial features (known as disease signatures), which may include multiple SNP genotypes and/or other multimodal features in combination.These disease signatures capture both the linear and non-linear effects of genetic and molecular interaction networks and enable the identification of associations including those that are only relevant to a subgroup of patients.We have previously validated this analytical approach across a variety of complex chronic diseases where it has identified more associations with increased explanation of observed disease variance and reproducibility than comparable GWAS studies [15][16][17].
In the combinatorial analytics approach, disease signatures are identified and statistically validated in 'layers' of increasing combinatorial complexity, i.e., singletons, pairs, triplets etc. (also known as combinatorial order).Each disease signature is validated multiple times using several statistical tests at each stage of the process to avoid false positives.A more detailed description of the mining and validation stages is given in our previous ME/ CFS study [16].
We applied the PrecisionLife platform to both long COVID case-control datasets in a hypothesis-free manner to identify combinations of SNP genotypes that are strongly associated with the development of long COVID symptoms when they co-occur in the same patient.The method prioritizes SNP genotype combinations that have high odds ratios, low p-values (p < 0.05) and high prevalence (> 5%) in long COVID cases.A permutation-based approach was used to compare the observed properties of the most highly associated SNP-genotype combinations to the null distribution for randomized datasets [22], with p-value cut-offs based on a specified threshold (Benjamini-Hochberg FDR of 0.05) after multiple testing correction.Combinations passing these tests were reported as validated long COVID disease signatures.
Finally, a merged network (disease architecture) view is generated by clustering all validated disease signatures based on their co-occurrence in patients in the dataset.
SNPs found in multiple disease signatures often form the central hub of the disease architecture (see Fig. 3).These are termed 'critical SNPs' if the corresponding networks pass a further permutation-based statistical test.Potential critical SNPs are scored using a Random Forest (RF) algorithm with a fivefold cross-validation framework to assess the accuracy with which they predict the case-control split in the dataset.
A cascade mapping process was used to map all the critical SNPs identified in the validated disease signatures to the human reference genome (GRCh38) [23].SNPs identified in the coding region of a gene (or genes) were mapped directly to this gene and any remaining SNPs within 2kb upstream or 0.5kb downstream were mapped to the nearest gene(s).Due to the uncertainty about the wide range of cells and tissues that have been implicated in long COVID etiology [7], genes assigned by either expression quantitative trait loci (eQTLs) or chromatin interaction (Hi-C) data were not specifically prioritized for further analysis (as they would likely be in other indications) to avoid capturing any spurious associations from non-trait-related tissues or cells.Genes that could additionally be mapped using only eQTL or Hi-C data from the critical SNPs were observed and reported in Additional file 2, although these were not further evaluated.
Finally, a semantic knowledge graph, including data from over 50 public data sources (see Additional file 5: Table S3), was used to annotate the SNPs and genes, including data on prior genetic associations to disease, chromosomal location, tissue expression profiles, splice variants, mouse phenotypes, protein function/structure, Fig. 3 Conceptual representation of features, combinations and disease signatures that form part of PrecisionLife's combinatorial analytics methodology.In the case of the long COVID study all features were SNP genotypes, but other feature types, e.g., a patient's expression level of a specific protein, medication history or clinical features such as their eosinophil level, can also be used, independently from or in combination with the genotype data known active chemistry and any pre-existing scientific literature or clinical trials among other attributes.This allows us to generate evidence-backed mechanism of action hypotheses as to each genetic variant's potential impact on a patient's long COVID phenotype.

Ancestry analysis
Ancestry inference for the samples in the GOLD study was performed using GRAF-pop [24].To maximize the number of samples included in each case-control dataset, samples of all ancestries were included in the analysis.Since ancestry-specific analyses could not be performed due to limited samples in each cohort, we performed a logistic regression analysis to control for confounding effects of population structure.Any disease signatures that were no longer significantly associated with casecontrol status (p < 0.05 with Bonferroni FDR correction) in a logistic regression that also includes a binary ancestry variable for white-European/other ancestry were considered false positives and removed from further analysis.

Assessing causality with expanded genotypes analysis
The disease signatures output by the PrecisionLife platform represent combinations of SNP genotypes that are significantly enriched in cases relative to controls.Expanded genotypes analysis ("EGA") tests how the genotype of a critical SNP from the disease signature affects the odds of disease when the genotypes of all interacting SNPs are held constant.
For each disease signature, we first assign patients to one of the possible combinations of the component SNP genotypes (the "expanded genotype signatures").In the example illustrated in Fig. 4, the validated disease signature is comprised of two SNPs, each in one of 3 states (0, 1 and 2), which can generate 9 (3 2 ) expanded genotype signatures.For combinations of 3, 4, and 5 SNPs, the number of expanded genotypes signatures is 27, 81, and 243 respectively.We then calculate the disease odds for patients with each expanded genotype signature.
For a given critical SNP of interest, we identify sets of expanded genotype signatures that share the same genotypes for all interacting SNPs (the blocks separated by the horizontal lines in Fig. 4).We calculate the "EGA odds ratio" by dividing the disease odds ratio for an expanded genotype signature with a copy of the critical SNP minor allele by the disease odds ratio for the matching expanded genotype signature with the critical SNP homozygous wild type genotype.
Due to the small number of patients associated with individual expanded genotype signatures, we may have insufficient statistical power to directly test whether the EGA odds ratios are significantly different from zero.Instead, the primary aim of the EGA is to test whether the observed directionality of the relationship between the critical SNP minor allele and disease phenotype is consistent across all or most expanded genotype signatures.If the critical SNP genotype does not affect disease, then we expect the minor allele genotype will be randomly associated with increased odds of disease for some expanded genotype signatures and decreased odds of disease for others, with no consistent biological pattern.
In the hypothetical example shown in Fig. 4, the EGA reveals that the critical SNP minor allele is consistently associated with elevated disease risk after controlling for the genotype of the interacting SNP.This pattern holds even though patients with the critical SNP minor allele have below average odds of disease when they also possess the wild type genotype at the interacting SNP.By controlling for the confounding effects of the interacting SNP, EGA allows us to gain a better understanding of the relationship between the critical SNP and disease.
Each disease signature was assigned to one of the following seven categories based on the broad patterns observed from the EGA: universally causative, universally protective, SNP-specific causative, SNP-specific protective, combination-specific causative, combinationspecific protective, or ambiguous.Definitions of each Fig. 4 Hypothetical example of an expanded genotypes analysis for a disease signature comprised of two SNPs.After controlling for the confounding effects of the interacting SNP genotype, patients with one or two copies of the critical SNP minor allele (genotypes "1" and "2") have consistently elevated odds of disease relative to patients with the wild type genotype ("0") at the critical SNP category are provided in Additional file 5: Table S5.Across these categories, the designation of "Causative" and "Protective" do not necessarily guarantee that the specific critical SNP identified in the analysis directly affects disease risk.Due to low SNP coverage, the critical SNP could potentially be a neutral marker that is in strong linkage disequilibrium with the true biological variant.
We excluded all expanded genotype signatures which occurred in fewer than 15 patients from the EGA.Likewise, we did not consider disease signatures comprised of 4 or 5 SNPs due to the limited statistical power provided by the size of the available datasets.There are 81 possible expanded genotype signatures for a combination of 4 SNPs, which corresponds to only 17 patients per expanded genotype signature.More problematically, there are 243 possible expanded genotype signatures for a combination of 5 SNPs, which corresponds to fewer than 6 patients per expanded genotype signature.The stochastic noise associated with such small sample sizes make it very difficult to identify broad patterns across the full set of expanded genotype pairs.

Phenotype enrichment analysis
The available clinical data from the questionnaire was used to evaluate the long COVID patient profiles associated with each of the disease signatures generated by the analysis.We calculated the statistical significance of the association of a particular phenotype with a set of long COVID cases with shared genetic variants when compared against the rest of the case population.The two proportions Z-test was used for categorical variables, such as severity of acute COVID-19 and comorbidities, and Mann-Whitney U [25] for any continuous variables, such as participant reported scores that reflect change in symptoms pre-and post-COVID-19.Statistical associations were corrected for multiple testing using Benjamini-Hochberg method.

Overlap analysis ("seeded" approach)
We evaluated the genetic overlap between the Severe and Fatigue Dominant cohorts by taking the SNPs identified in the hypothesis-free analysis for one dataset (seed SNPs) and testing whether any combinations involving them are also significantly associated with disease risk in the second dataset when analyzed by the PrecisionLife platform (see section "Combinatorial analytics using the PrecisionLife platform").
This hypothesis-driven or 'seeded' approach was performed in addition to a direct gene overlap analysis between the two cohorts.This approach mitigates the effects of stochastic differences in dataset composition when defining the combinatorial search space explored in our analyses.The number of possible SNP-genotype combinations is so extensive that it is impossible to sample the entirety of the space.This implies that true associations may remain unreported because they were not tested when the dataset was analyzed using the hypothesis-free approach.
We also employed this technique when evaluating the overlap between the genes identified in our analysis of the UK Biobank ME/CFS population and the two long COVID cohorts generated from the GOLD study.Due to the low SNP overlap (n = 42,500) between the arrays used to genotype the ME/CFS and long COVID datasets, we performed a seeded analysis using 383 SNPs in the Severe and Fatigue GOLD dataset that were within 10kb up or downstream of the original 14 ME/CFS genes.

Cross disease analysis
Cross disease analysis can provide insights into potential drug repurposing opportunities or development of common therapies.We compared the genes that were significantly associated with Severe and Fatigue Dominant long COVID against a variety of other chronic diseases to identify shared pathophysiological mechanisms.These diseases included neurodegenerative, mental and behavioural disorders, cardiovascular, gastrointestinal, autoimmune and metabolic diseases (see Additional file 5: Tables S8, S9).Disease-associated genes identified for each indication group are those with known genetic links reported in OpenTargets [26] (v 23.02, February 2023 release).Only genes with strong target-disease genetic association scores (> 0.9 out of 1.0) have been used in this analysis for each indication group.
Enrichment analysis was performed using the g:Profiler tool [27] to determine pathways and biological processes that are significantly associated with the disease-associated genes for each indication group (p < 0.05, p-value correction for multiple testing using Benjamini-Hochberg).This allows us to explore up/downstream of individual gene targets to identify biological processes that are impacted across diseases.

GWAS analysis
We evaluated the significance of individual genetic variants associated with the two long COVID datasets (Severe and Fatigue Dominant) using a standard GWAS analysis with PLINK [28].As can be observed from the two Manhattan plots (Additional file 5: Figure S5), no SNP from either of the two cohorts reached the genomewide significance threshold (p < 5 × 10 −8 ).

Cohort analysis
To determine whether there was a correlation between circulation SARS-CoV-2 variant and long COVID symptom severity, we plotted the 'Total Change' score for all study participants, including cases and controls, against the month they first contracted COVID-19 (Fig. 5).As defined in our Severe long COVID cohort, the greater the 'Total Change' score, the greater the degree of severity in long COVID symptoms experienced by the participant.
This analysis shows a significant decrease in symptom severity over time, although the correlation coefficient is low, potentially due to data variability (Additional file 5: Figure S6).

Hypothesis free combinatorial analysis
Using the PrecisionLife combinatorial analysis platform, we identified 86 disease associated critical SNPs for the Severe cohort and 84 for the Fatigue Dominant cohort, mapping to 43 and 36 genes respectively (Table 2).A total of 74 unique genes were associated with at least one of the long COVID cohorts, including 5 genes which were identified in both the Severe and Fatigue Dominant cohorts.
The disease signatures associated with each cohort were all combinations of 2 or more SNP genotypes, i.e., they were all combinatorial signals, predominantly involving combinations of 3-5 SNPs, that could not have been identified using GWAS (Fig. 6).An example of one of the disease signatures identified in the analysis of the Severe long COVID cohort is shown in Table 3.None of  the SNPs identified in disease signatures were observed to be in linkage disequilibrium (LD) with each other.All cases included in the analysis possessed at least one of the disease signatures found to be significant in the hypothesis-free study of its cohort.The complete list of genetic variants and their mapped genes identified from this study are listed in Additional File 2 (Fig. 7).
Upon further evaluation, 118 (10%) disease signatures identified in the Severe cohort and 120 (8.4%) signatures in the Fatigue Dominant cohort comprised of SNPs that could be mapped to genes with shared biological functions or pathways (see Additional file 3).As there were limited number of cases and controls of non-European ancestry (see Additional file 5: Table S6) in each of the two datasets, we evaluated the output to identify any disease signatures that may be confounded by population structure effects rather than reflecting a true disease signal.
All disease signatures in the Severe cohort passed the ancestry confounder analysis.We identified 129 (9%) disease signatures in the Fatigue Dominant cohorts that did not pass the ancestry confounder check (Additional file 5: Table S7).However, when we removed the SNPs and mapped genes represented only by these potentially confounding disease signatures (and not also by one or more additional true disease signatures), only one gene (AC005005.1)associated with the Fatigue Dominant cohort linked to the critical SNP, rs4820946, was eliminated from all final disease associated gene lists.This reduced the 74 genes found to 73.
The cohort analysis indicates that fewer than 15% of cases that were assigned to either one or both long COVID case groups were hospitalized with severe COVID-19 or reported co-associated chronic diseases such as diabetes, cardiovascular disease or cognitive impairment.This meant that the number of cases with these phenotypes was too low to identify any associations, such as COVID-19 severity or a particular comorbidity, with genetic disease signatures.
Enrichment analysis of the fatigue, respiratory and mental health symptom-based scores for the Severe long COVID patients was used to investigate the clinical characteristics of the disease signatures identified in the Severe cohort study.Unfortunately, the population sizes were too small to reach statistical significance (p < 0.05) after multiple-testing correction (see Additional file 4).
From the two independent hypothesis-free analyses of the datasets, we identified SNP genotypes mapping to 5 genes that were found to be significantly associated with disease in both the Severe and Fatigue Dominant long COVID cohorts.For each gene, more than 70% of cases from both cohorts possessed at least one disease signature containing an associated SNP (Table 4).These genes have a range of different functions and potential mechanism of action hypotheses as to their role in the development of long COVID.

Seeded analysis to test overlap between long Covid cohorts
The two independent analyses of the Fatigue Dominant and Severe cohorts indicated that 5 genes were strongly associated with long COVID in both cohorts.We performed two seeded analyses to understand if any additional genes identified in either the Fatigue or Severe cohorts were also significant in the other population.
This approach revealed that 28/43 genes identified in the Severe cohort were also significantly associated with disease in the Fatigue Dominant cohort, and 25/35 genes from the original Fatigue Dominant analysis were also associated in the Severe cohort.This left 15 genes unique   to the Severe cohort and 10 genes unique to the Fatigue Dominant cohort.
The unique genes, the percentage of total cases they were associated with, and their biological functions are summarized in Tables 5 and 6.

CLOCK
-rs62303689 A comparative pathway enrichment analysis using the g:Profiler tool revealed that there were significant differences in the biological pathways associated with the lists of unique genes from the Severe and Fatigue Dominant cohorts (Fig. 8).Genes that were uniquely associated with the Severe long COVID cohort were more likely to be found in immune pathways such as myeloid differentiation, macrophage foam cells and lipid signaling pathways.Genes that were uniquely associated with the Fatigue Dominant cohort were linked to metabolic pathways such as JNK/MAPK signaling cascades.

Comparison of long COVID with ME/CFS
We also used the seeded analysis approach to test for overlap between disease signatures associated with long COVID and those associated with ME/CFS in our previous study [16].
Taking the list of SNPs within genes that were identified to be significant within the UK Biobank ME/CFS population, we found that 24 SNPs were also associated with long COVID in the Severe cohort.Of these 24 SNPs, 9 were critical (RF scored) within the Severe long COVID population, mapping to 5 genes (Table 7).
In the Fatigue Dominant cohort, 27 SNPs were associated with long COVID, of which 12 SNPs were also common with the Severe cohort (Additional file 5: Table S4).7 of these 27 SNPs were critical (RF scored) SNPs within the Fatigue Dominant long COVID cases, mapping to 5 genes previously found in the ME/CFS study (Table 7).

Comparison of long COVID genes identified with acute COVID-19 studies
Whilst few GWAS significant variants have so far been identified in long COVID [59], we sought to compare the 73 unique genes identified in our long COVID studies against the literature for any evidence within severe COVID-19 and/or long COVID.Of these genes, at least 9 have prior associations-such as differential expression and genetic susceptibility analyses-to acute COVID-19 after reviewing available publications in PubMed and other data sources such as OpenTargets (Table 8).
We also compared our results against the blood derived gene expression signatures associated with post-acute sequelae identified by Thompson et al. [64].There are several key differences between the studies-Thompson et al. recruited individuals hospitalized with severe acute COVID-19 infection, whereas the majority of individuals in our study experienced milder forms of the disease (Table 1).We are also drawing comparisons from a transcriptomic study derived from whole blood against a combinatorial study of germline genetic variants.
Nonetheless, we found that 14 of the 73 genes (Severe = 7 and Fatigue Dominant = 7) identified in our analyses were also differentially expressed at the transcriptomic level in patients experiencing long COVID (Additional file 5: Table S10).

Overlap between long COVID and other diseases
We identified genes with known genetic associations across a wide range of complex diseases including neurodegenerative, mental or behavioral, cardiovascular, gastrointestinal, autoimmune and metabolic diseases (see Additional file 5: Tables S8 and S9).We evaluated the degree of overlap at a biological process level (using mapping of genes to biological processes in Gene Ontology [65,66]) to identify the common pathophysiological mechanisms that are shared between those diseases and long COVID.27 biological processes are significantly enriched in the 73 long COVID genes identified in this analysis, of which 19 processes are also significantly enriched in at least one other indication group (Additional file 5: Table S13).Based on these 19 pathways, long COVID genes shared the greatest number of biological processes (> 50%) with cardiovascular disease and mental or behavioral disease followed by gastrointestinal disease, neurodegenerative disease, autoimmune disease and metabolic disease (Fig. 9, Additional file 5: Table S13).

Expanded genotypes analysis to detect causal features
We conducted expanded genotypes analysis (EGA) for all Severe cohort RF scored genes (see Tables 3 and 4) found in disease signatures with 2 or 3 SNP genotypes.These comprise 5 genes corresponding to 23 disease signatures, including a disease signature that contains two RF scored genes (see Table 8).
We found that the critical SNP is universally protective across at least 2 validated disease signatures for 3 of the 5 RF scored genes (ADIPOQ, NOL4, and PDE6C).That is, when we control for the genotypes at the interacting SNPs, expanded genotype signatures featuring at least one copy of the critical SNP minor allele are consistently associated with lower odds of severe long COVID relative to expanded genotype signatures with the homozygous wild type genotype for the critical SNP.In all but one of the remaining disease signatures for these genes, the critical SNP minor allele is most often associated with decreased odds of severe long COVID, with narrow exceptions: i.e., when it fails to co-occur with the minor allele of an interacting SNP ("SNP-specific protective effect") or when it co-occurs with a specific set of genotypes at multiple interacting SNPs ("combination-specific causative effect").
The critical SNP minor alleles for these three genes are typically associated with decreased risk of severe long COVID, which either implies that they represent broadly protective variants or causative variants that are in LD with the wild type allele at the genotyped SNP.This relationship only becomes apparent, however, when we control for the confounding effects of other causative and/or protective variants.Only one validated disease signature In contrast, the gene SNX9 is consistently associated with more complex interactions that highlight the combinatorial dynamics of disease.For example, we identified a disease signature comprising three SNPs that is associated with strongly elevated odds of long COVID.This disease signature includes: • critical SNP rs2025994 located approximately 40kb upstream of the SNX9 coding region • interacting SNP rs6777173 located 12 kb upstream of KLF15 • interacting SNP rs11072524 located in an intron of RYR3 We also found that the SNX9 minor allele offers significant protection against the risk of long COVID among patients who possess a copy of the minor allele at either interacting SNP (i.e., a SNP-specific protective effect).That is, patients with the SNX9 heterozygous or homozygous minor allele genotype consistently have lower odds of developing severe long COVID than patients with the SNX9 homozygous wild type genotype, after controlling for the confounding effects of the genotypes at the two interacting SNPs (see Table 9).Due to the small sample sizes associated with many expanded genotype signatures, these individual comparisons are not statistically significant.However, if we pool all patients in this cohort, then patients with a copy of the SNX9 minor allele have significantly lower odds of disease than patients who are homozygous for the SNX9 wild type allele (odds ratio = 0.52, 41 cases/134 controls vs. 316 cases/532 controls, Fisher's Exact Test p = 0.00047; note that these totals include patients with rare expanded genotype signatures not shown in Table 9).
A different pattern arises among patients who are homozygous for the wild type genotype at both interacting SNPs (Table 10).Here, patients with a copy of the SNX9 minor allele have higher odds of disease than patients who are homozygous for the SNX9 wild type allele (odds ratio = 1.86, 19 cases/22 controls vs. 74 Table 9 Expanded Genotypes Analysis results for 5 RF-scored genes identified in the Severe cohort linked to disease signatures of 2 or 3 SNPs (one disease signature contains SNPs associated with two genes) Universally causative minor allele 0 0 0 0 0 Universally protective minor allele 0 3 0 2 2 Minor allele with SNP-specific causative effect 2 0 0 0 0 Minor allele with SNP-specific protective effect 2 3 0 0 2 Minor allele with combination-specific causative effect 0 3 0 2 0 Minor allele with combination-specific protective effect 0 0 0 0 0 Ambiguous/no consistent effect 0 1 2 0 0 Table 10 Assessing the effects of the SNX9 rs2025994 genotype on severe long COVID when controlling for the genotypes of the interacting SNPs rs6777173 (KLF15) and rs11072524 (RYR3) We present comparisons for genotype combinations that are present in more than 10 patients (which excludes the 18 patients who are homozygous for the SNX9 minor allele).None of the EGA odds ratios for the individual comparisons are statistically significant after correcting for multiple testing.However, among patients who possess a copy of the minor allele at either interacting SNP, presence of the SNX9 minor allele consistently results in lower odds of disease relative to the homozygous wild type genotype (Fisher's Exact Test p = 0.00047).Among patients who possess only wild type alleles for both interacting SNPs, presence of the SNX9 minor allele results in higher odds of disease relative to the homozygous wild type genotype, but the difference is not statistically significant (Fisher's Exact Test p = 0.075) cases/160 controls), although the odds ratio is not statistically significant (Fisher's Exact Test p = 0.075).Together these results suggest that the SNX9 genotype is a significant contributor to the risk of severe long COVID infection, but that the gene-disease relationship is context dependent and mediated by interactions with KLF15 and RYR3.Similar non-linear interactions are represented by three additional disease signatures comprised of the same SNX9 critical SNP and different interacting SNPs.Monogenic approaches such as GWAS that do not consider these gene-gene interactions can fail to detect potentially important drivers of disease.
Finally, the expanded genotypes analysis did not provide any additional insight into the relationship between DLC1 and disease.This could indicate that the biological relationship between DLC1 is highly complex or that the result is a false positive.However, the disease signatures associated with strongly elevated odds of severe long COVID all contain the rare homozygous minor allele genotype for the DLC1 critical SNP.Due to small sample sizes, we were unable to analyze other expanded genotype signatures containing the potentially causative genotype.Thus, the ambiguous results may reflect the fact that the relationship between the DLC1 minor allele and long COVID does not carry over into heterozygous patients.

Evaluation of potential novel drug targets and repurposing opportunities
We evaluated the genes identified in the study to find potential novel drug targets and their associated mechanistic patient stratification biomarkers (the disease signatures that connect patient subgroups with the mechanistic etiology for their disease).As described in our previous ME/CFS paper, the use of combinatorial analytics to identify novel targets has been validated in other diseases such as ALS, where these novel targets have demonstrated disease modifying activity in in vitro models [67].
Of the 73 unique genes found across the two cohorts, 42 are potentially tractable targets for drug development strategies based on annotations from OpenTargets (defined by a score of greater than 0 across at least one metric for tractability), see Additional file 5: Table S11.This includes 26 targets that are suited to an antibody approach and 18 that are amenable to modulation by small molecules.
Most (> 90%) of the genes are expressed in a wide range of tissues (Additional file 5: Figure S7) although the expression profile of the genes in specific cell types is variable (Additional file 5: Figure S8).Approximately 44% (n = 30) of the genes are expressed in inhibitory neurons followed by 41% in excitatory neurons (n = 28) and 40% in oligodendrocyte precursor cells (n = 27).
Using a systematic repositioning approach [68], we identified 13 long COVID targets that already have drugs in clinical development.As these drugs or development candidates may require fewer preclinical studies and already have a known safety profile, they could represent a quicker and de-risked strategy for developing potential new treatments.We are exploring the repurposing potential of these compounds for the treatment of long COVID and ME/CFS (where appropriate).From this analysis for example, we identified TLR4 as an attractive repurposing candidate.Our analysis indicates that 52% of cases included the Severe long COVID cohort had at least one disease signature containing a variant in TLR4 and there is additional supporting evidence that inhibition of TLR4 in a mouse model prevents long term cognitive pathology such as synapse elimination and memory deficits that is caused by the SARS-CoV-2 Spike protein [47].Clinical studies have already shown that antagonizing TLR4 signaling dampens the pathological cytokine storm observed in patients with severe acute COVID-19 and reduces mortality rates in hospitalized COVID-19 patients [69,70].However, our analysis also indicates that antagonism of TLR4 may demonstrate therapeutic effects in long term pathology caused by SARS-CoV-2.
We performed a search of the GlobalData [71] database to further understand the number and stage of development of TLR4 antagonists that are in clinical pipelines.This revealed a total of 88 unique drugs that target TLR4 (either singularly or as part of a combination therapy), including 8 in development for acute COVID-19, the most advanced of which (Paridiprubart, Edesa Biotech Inc) is currently being evaluated in a Phase 3 study in hospitalized COVID-19 patients with Acute Respiratory Distress Syndrome (ARDS) [72].

Discussion
As an approach to identify the drivers of the complex disease biology of long COVID, combinatorial analytics yields more useful signal than GWAS.No SNPs reached the genome-wide significance threshold in either the Severe or Fatigue Dominant cohorts.This underlines the difficulties involved in using monogenic analysis approaches to understanding disease associated genetic variants and mechanistic etiologies in heterogeneous and polygenic diseases, especially with small datasets.
Using combinatorial analytics, we identified 73 unique genes in a long COVID population and highlighted the relevance of subsets of these genes to the different subcohorts of the disease population.At least 9 of the genes identified in this study have been linked to acute COVID-19, and despite key differences in the study designs, we also observe that 14 of the 73 genes were differentially expressed in a transcriptomic analysis of long COVID patients.We can form strong mechanism of action hypotheses for each gene's role in the development of long COVID.
Splitting the population into two long COVID subtypes, Severe and Fatigue Dominant, allowed us to explore the genetic and biological differences underpinning different clinical manifestations.The comparative pathway enrichment analysis identified differences in pathways between the genes uniquely associated with the Severe long COVID group and those uniquely associated with the Fatigue Dominant phenotype (Figure 8).The greater number of genes involved in immune response in the Severe long COVID cohort may also indicate a more severe form of the acute infection.This may potentially arise as a result of patients experiencing higher viral loads than average, as we identified 4 genes that have been functionally linked to SARS-CoV-2 host response and/or acute severe COVID-19 (Table 5).
The pathway enrichment analysis also highlighted an overrepresentation of genes involved in macrophage foam cell differentiation.The formation of foam cells leading to a profibrotic macrophage phenotype is critical in the development of atherosclerosis [73].However, there is also evidence that profibrotic pulmonary macrophages contribute to acute respiratory distress syndrome (ARDS) and lung injury associated with patients with severe COVID-19 [74].
The genes that were associated only with the Fatigue Dominant long COVID cohort are enriched in MAPK and JNK signaling cascades as well as other metabolic processes involved in mitochondrial function and cellular respiration (Table 6).As discussed in our previous ME/CFS paper, dysregulated mitochondrial function, resulting in the inability to increase respiration rates in response to increased demand from stressors such as exercise [75], may result in the post-exertional malaise (PEM) that is a hallmark of ME/CFS.The finding of similar pathways in the Fatigue Dominant long COVID cohort suggests that these patients may also struggle to meet energy demands.
It is known that NK cell effector function (cytotoxic activity) regulated by MAPK signaling cascades, including via the c-Jun N-terminal kinase (JNK) [76] signaling pathway, is dysregulated within patients with ME/ CFS, who exhibit reduced NK cell cytotoxic activity [77].Further work will be required to confirm if similar pathological events occur in patients who develop fatigue dominant long COVID.
When we evaluated the degree of similarity between the genes associated with ME/CFS and long COVID, we found 13 critical SNPs (39 in total) within at least one of the long COVID populations that could be mapped to a gene previously associated with ME/CFS.
In both Severe and Fatigue Dominant long COVID populations, we identified SNPs mapping to the genes ATP9A, INSR, CLOCK, SLC15A4 and GPC5.All of these genetic variants were found in a higher proportion of the Fatigue Dominant and Severe long COVID populations than in the ME/CFS case group.This finding may indicate that the long COVID case group defined by fatigue symptoms is more homogenous than those within the self-reported UK Biobank ME/CFS population, which likely includes a mix of viral and non-viral triggers of chronic fatigue symptoms.
We found that the CLOCK gene is significantly associated with Fatigue Dominant long COVID and ME/CFS.CLOCK (Circadian Locomotor Output Cycles Kaput) is an important regulator of circadian rhythm, disruptions of which have been associated with pain, insomnia, insulin resistance, immunological function and impaired mitochondrial function [78][79][80][81][82]. Interestingly, one of the most common variants identified in ~ 86% of the long COVID Fatigue Dominant population mapped to the gene NLGN1.NLGN1 is also transcriptionally activated by CLOCK in the forebrain [83], which could indicate multiple genetic contributions to dysregulated circadian rhythm in long COVID.
Of the remaining 4 genes common between long COVID and ME/CFS, we identified 3 common variants in the genes ATP9A, INSR and SLC15A4 in both Severe and Fatigue Dominant cohorts (Table 7).
SLC15A4 encodes a transmembrane transport that has previously been associated with inflammatory autoimmune diseases such as systemic lupus erythematosus from genome-wide association studies [84,85].However, SLC15A4 also plays a key role in mitochondrial function, with knock down of the gene resulting in impaired autophagy and mitochondrial membrane potential under cellular stress [86].
We also hypothesized that the genetic variants in ATP9A and INSR both contribute to dysregulated insulin signaling in subgroups of ME/CFS patients.Type 2 diabetes-related signaling pathways and insulin resistance were also a key theme within the genes associated with long COVID, and 11 of the gene targets identified in this analysis have prior associations with type 2 diabetes in the OpenTargets database (Additional file 5: Table S12).Metabolic dysfunction and type 2 diabetes may increase risk of developing severe acute COVID-19 [87] and epidemiological studies have demonstrated that there is an increased risk of developing diabetes post COVID-19 compared against controls who had not been infected with SARS-CoV-2 [88].Furthermore, increased incidence of insulin resistance and glycemic dysregulation was observed in patients 2 months post COVID-19 and in long COVID patients [32,89].
Several of the biological processes that genes identified in this study are significantly enriched for-such as foam cell differentiation-are also associated with known genetic links to metabolic diseases such as type 2 diabetes (Figure ).Metabolic dysfunction has a variety of biological consequences, including increased levels of chronic inflammation, dysregulated immune response to acute infection, endothelial cell dysfunction and defects in coagulation pathways.All of these have been linked to long COVID and severe acute COVID-19 pathogenesis [90].
It is therefore plausible that patients with genetic variants that predispose them to metabolic dysfunction and insulin resistance are more likely to suffer from long term pathological sequelae after the acute phase of COVID-19 infection.From these findings we would indeed expect this population to have increased rates of new-onset type 2 diabetes compared to the non-long COVID population.Unfortunately, longitudinal health record data after the survey was completed was not available to validate this hypothesis in this analysis.
Similarities in indications observed from the cross-disease analysis have also highlighted shared pathways and biological processes associated with genetic drivers of these indications.The results are supported by common clinical manifestations reported in long COVID studies.Of the 27 pathways significantly enriched in the long COVID genes identified in this analysis, 16 (60%) are associated with gene targets previously associated with mental or behavioral disease (Figure 9).This includes indications such as major depressive disorder, anxiety disorder and schizophrenia.A recent meta-analysis of over 10,000 patients indicated that neurological and neuropsychiatric symptoms, such as brain fog, attention deficits and fatigue, were some of the most reported 3 months after acute COVID-19 [91].This analysis may indicate some of the genetic underpinnings of these manifestations post-COVID.

Study limitations
There are several limitations to this study.The most obvious is that the available datasets, even in a disease as topical, prevalent, and debilitating as long COVID are still very small, which notwithstanding the improved sensitivity offered by the combinatorial analytics approach, inevitably poses limits on the statistical power of the study.
The most challenging limitation is the poor representation of diverse ancestries, which is essential to gain a deeper understanding of the variability of disease etiology and achieve a level of health equity.As demonstrated by the cohort analysis, even though considerable effort was made to recruit as diverse a population as possible, the majority of participants recruited to the GOLD study were of self-reported white Caucasian ancestry.It is evident that long COVID is a highly heterogeneous disease with a variety of different symptoms, clinical presentations and underlying disease mechanisms including neurological and metabolic dysregulation.From this dataset, we cannot understand the varying prevalence of these symptoms, or the effects that different genetic ancestries, socioeconomic factors, pathogen exposure levels or geographical differences may have in influencing the risk and presentation of long COVID in different ancestries.
Our cohort analysis also revealed that the incidence of other comorbidities (such as type 2 diabetes, cardiovascular disease etc.) was lower than expected for a cohort with the same average age as the long COVID population.This may indicate a degree of 'otherwise healthy' volunteer bias that limits this dataset as a representative sample of long COVID.Alternatively, it could reflect a problem with under-reporting of other medical conditions within the self-reported questionnaire.
All the non-genomic data was self-reported by the participants via a questionnaire upon recruitment to the study, including long COVID symptoms, level of acute COVID-19 severity and medical history.Unfortunately, no further EHR/primary care data was available.This method for reporting the degree of long COVID symptoms experienced is likely to be more subjective and prone to memory lapses and retrospective interpretation than direct and concurrent clinical information.This creates challenges in identifying the most relevant clusters of long COVID symptoms (e.g., respiratory, fatigue, GI etc.) and evaluating the severity of those symptoms experienced by different subgroups of cases.
We were unable to fully evaluate some of the most significant consequences and secondary diagnoses associated with long COVID disease.In particular, we would have liked to evaluate the specific drivers underlying the development of POTS, which was only recorded as part of participants' free-text responses and not captured in the main questionnaire.In the absence of consistent diagnosis and clinical reporting for POTS, we attempted to analyze the symptoms that patients reported when recruited to the study.Tachycardia, dizziness, palpitations, brain fog and even in some cases POTS were recorded but in insufficient numbers for a meaningful analysis.
Hospital admission with a more severe form of acute COVID-19 has previously been identified as a risk factor for the development of long COVID [92].We were unable to test this finding, as fewer than 10% of any of our case cohorts were hospitalized with COVID-19.As a result, there was insufficient data available to explore if long COVID cases with the 9 variants mapped to genes previously associated with acute COVID-19 (Table 8) were more likely to have experienced a more severe form of acute COVID-19.
Finally, there is some emerging evidence that vaccination against COVID-19 may be protective against the development of long COVID [93].The analysis of our cohort does show a small but significant reduction in the severity of symptoms over the course of the period Jan 2020-Nov 2022.The majority of cases included in our study were first infected in 2020 or early 2021 (prewidespread vaccination) but the questionnaire did not contain any questions regarding vaccination/booster status, or if the participants contracted acute COVID-19 before or after vaccination.As such, we are unable to evaluate the effect of vaccination on long COVID development within this cohort.
There is some additional evidence that omicron variants are less likely to cause long-term symptoms even after adjusting for vaccine status [94] and the evidence available in this study does not contradict that suggestion.However, as demonstrated by our cohort analysis, it was difficult to assess the association of SARS-CoV-2 variant status with long COVID risk due to the limited number of study participants recruited who contracted COVID-19 when omicron and more recent variants were the most prevalent in the UK, the limited amount of information on repeat infections, or vaccination/ booster status.

Conclusions and future perspectives
The results of this study, while encouraging and building consistently on findings in ME/CFS and other diseases with related symptomology, still need to be validated and replicated within an independent long COVID population, which ideally would have much deeper clinical phenotype and longitudinal history information.
Various groups have been collecting large acute COVID-19 and long COVID patient datasets over the last 3 years and we hope that they will now make the individual patient level data available to the wider research community quickly.We can realistically expect that analyzing an independent, larger and more detailed patient dataset using combinatorial analytics approaches will further improve the disease insights that we are gaining in long COVID, offering routes forward to alleviate the massive unmet medical need which has blighted the lives of millions of patients.

Fig. 1
Fig. 1 Monthly distribution of the first self-reported COVID-19 diagnosis for the 1829 individuals included in Sano Genetics' Long COVID GOLD study (2020-2022)

Fig. 2
Fig. 2 Distribution of the (a) 'Total Change' score for cases and controls in the Severe long COVID and (b) 'Fatigue Change' (part of 'Total Change' score) score in the Fatigue Dominant long COVID cohorts

Fig. 6 Fig. 7
Fig. 6 Distribution of combinatorial order (i.e., number of component SNPs) for the validated combinatorial disease signatures from the Severe and Fatigue Dominant long COVID cohorts

Fig. 8
Fig. 8 Pathway enrichment plot for disease-associated genes found in the Severe and Fatigue Dominant long COVID cohorts.GeneRatio represents the ratio of genes found in the pathway compared to the genes associated with a cohort and p.adjust represents the p-value adjusted for multiple testing.The dots in the plot are colour-coded based on their corresponding p.adjust values

Fig. 9
Fig. 9 Heatmap plot showing 19 biological processes (Gene Ontology biological process terms) shared between 73 long COVID genes identified in the GOLD cohort and genes with genetic evidence in one or more indication groups (neurodegenerative, mental or behavioral, cardiovascular, gastrointestinal, autoimmune and metabolic disorders).For each indication group, only the significantly enriched biological processes (p < 0.05) are shown in blue and the intensity of the color is based on the p values of the Gene Ontology term in each indication group

Table 1
Characteristics of the two long COVID cohorts derived from the GOLD study datasetData for fields marked with asterisk (*) were not available for all individuals.Comorbidities marked with † were consistently over-represented in cases compared to controls in all cohorts Severe long COVID n = 1323 Fatigue dominant long COVID n = 1386 Cases (n = 459) Controls (n = 864) Cases (n = 477) Controls (n = 909)

Table 4
List of genes significantly associated with long COVID in both the Severe and Fatigue Dominant

Table 5
List of genes that were uniquely associated with the Severe case cohort

Table 6
List of genes that were uniquely associated with the Fatigue Dominant case cohort

Table 7
List of critical SNPs significantly associated with long COVID in the Severe and Fatigue Dominant long COVID cohorts that can be linked to genes identified in a combinatorial analysis of UK Biobank ME/CFS patients

Table 8
Known associations of genes identified in either one or both of the cohorts of long COVID patients with acute COVID-19