Integration of gene expression, clinical, and epidemiologic data to characterize Chronic Fatigue Syndrome



Chronic fatigue syndrome (CFS) has no diagnostic clinical signs or diagnostic laboratory abnormalities and it is unclear if it represents a single illness. The CFS research case definition recommends stratifying subjects by co-morbid conditions, fatigue level and duration, or functional impairment. But to date, this analysis approach has not yielded any further insight into CFS pathogenesis. This study used the integration of peripheral blood gene expression results with epidemiologic and clinical data to determine whether CFS is a single or heterogeneous illness.


CFS subjects were grouped by several clinical and epidemiological variables thought to be important in defining the illness. Statistical tests and cluster analysis were used to distinguish CFS subjects and identify differentially expressed genes. These genes were identified only when CFS subjects were grouped according to illness onset and the majority of genes were involved in pathways of purine and pyrimidine metabolism, glycolysis, oxidative phosphorylation, and glucose metabolism.


These results provide a physiologic basis that suggests CFS is a heterogeneous illness. The differentially expressed genes imply fundamental metabolic perturbations that will be further investigated, and they illustrate the power of microarray technology for furthering our understanding of CFS.


Chronic fatigue syndrome (CFS) is defined solely by self-reported symptoms and associated disability. There are no characteristic physical signs or diagnostic laboratory abnormalities. Diagnosis of CFS requires clinical evaluation to rule out other medical or psychiatric conditions that could cause or contribute to the patient's complaints [1]. Indeed, it remains unclear whether CFS represents a unique disease or a common clinical end-point of diverse pathologic processes.

CFS has been hypothesized to involve an abnormal response to infection, immunologic dysfunction, dysregulation of the hypothalamic-pituitary-adrenal axis, and dysautonomia, yet no biologic and physiologic perturbations have been reproducibly detected. This may reflect poor specificity of the case definition, patient selection bias, or other study design issues. Clearly, discovery of laboratory markers that improve the specificity of case ascertainment or differentiate groups within the CFS classification would increase the possibility of identifying pathogenic mechanisms.

The international CFS research guideline recommends that cases be stratified before analysis by several variables including co-morbid conditions, current level and total duration of fatigue, current level of functional impairment and type of fatigue onset [1]. People with CFS often describe a sudden onset to their illness, having become sick over one or two days, while others recount a gradual onset in which the symptom complex develops over weeks or months. Studies indicate that stress history [2] and recovery [3] appear to vary with mode of onset. Another approach is to group subjects based on symptoms. A recent study identified two subgroups, one with higher energy levels and fewer accompanying symptoms and another with significantly lower energy levels [4]. Deciphering the physiologic basis for CFS would go far in assessing the heterogeneity of the illness and would advance diagnosis and treatment.

Unique gene expression profiles have been found in cancer [5], chronic inflammatory/allergic diseases [6, 7], autoimmune disorders (e.g., rheumatoid arthritis) [8], and multiple sclerosis [9]. We have previously shown that peripheral blood mononuclear cell (PBMC) gene expression profiles can distinguish the majority of CFS cases from non-fatigued controls [10]. In this study, we measured levels of gene expression in 23 persons with CFS identified in the general Wichita population. Our objective was to determine if integration of gene expression results with clinical and epidemiologic data would identify CFS subgroups.


Study Design

This study adhered to human experimentation guidelines of the U.S. Department of Health and Human Services. All participants were volunteers who gave informed consent. The Centers for Disease Control and Prevention Human Subjects Committee approved study protocols.

Forty-three CFS subjects were identified in a survey of the adult population of Wichita, Kansas [11]. CFS subjects fulfilled all criteria of the CFS research case definition [1]. The clinical evaluation was used to identify any co-morbid conditions and to detect the presence of exclusionary diagnoses. These included Major Depressive Disorder with Melancholic/Psychotic features, psychosis, alcohol/drug addiction, bulimia/anorexia and medical conditions including cancer, hepatitis or pregnancy. We obtained information concerning current disability, duration of illness, type of fatigue onset, and number and nature of accompanying symptoms. We also obtained blood samples, as described below. Because only 6 CFS subjects were men, we limited the present study to women. Of the 37 female CFS subjects, 5 were excluded because of lack of sample, 7 were excluded because of poor quality RNA, and 2 were excluded because of poor quality of array hybridization, leaving 23 women for analysis.

Table 1 lists the clinical and epidemiologic variables used in our analysis. Onset of illness was defined as sudden (self-reported development of fatigue in less than 1 week) or gradual (developing fatigue over more than 1 month). Only one woman reported that her fatigue developed between 1 week and 1 month (Table 1), so her microarray results were used only in the cluster analysis. Age was categorized as ≤50 or >50 years old, and duration of illness was categorized as ≤10 or >10 years (grouping into different periods did not alter the results). Body Mass Index (BMI) was categorized as normal (≤24.9 kg/m²), overweight (25 – 29.9 kg/m²), or obese (30 – 39.9 kg/m²) [12].

Table 1 Clinical and epidemiological characteristics of 23 CFS women.

Gene Expression Profiling

Nucleic acid extraction

During the clinical evaluation, a 10 ml blood sample was obtained and PBMC were isolated using LSM® Lymphocyte Separation Media (ICN Biomedicals, Costa Mesa, CA). Cells were washed, counted and stored for viability in liquid nitrogen as described [13]. Total RNA was extracted using the RNAqueous™ kit (Ambion Inc., Austin, TX) and the quality and quantity were assessed as previously described [14].

Preparation and hybridization of labelled cDNA

Biotinylated cDNA synthesis from 1 μg of total RNA was performed as previously described [14]. The cDNA probe was hybridized to the Atlas™ Human 3.8I oligonucleotide glass microarrays (CLONTECH Laboratories, Inc., Palo Alto, CA) using the Ventana Discovery™ system and their ChipMap™ kit (Ventana Medical Systems, Tucson, AZ). Hybridization was for 12 hours at 42°C, followed by three 10 minute stringency washes in 0.1X SSC at 42°C. Anti-biotin antibodies conjugated to RLS™ particles (Genicon Sciences Corporation, San Diego, CA) were used for signal detection as previously described [14]. The slides were archived and images captured using the GSD-501™ scanner (Genicon Sciences Corporation, San Diego, CA), and analyzed with ArrayVision™ RLS image analysis software (Genicon Sciences Corporation).

Data analysis

The scanned TIFF images were processed using ArrayVision™ (Imaging Research Inc., Ontario, Canada). Features deemed unsuitable for accurate quantitation because of artefacts, poor morphology, or uneven hybridization were excluded from further analysis. A median background value was calculated around each feature and subtracted from the mean signal to give the net signal for the respective gene. Data were uploaded into the CDC MAdB web-based analysis package, where background-adjusted intensity values were scaled and normalized to the 75th percentile. Values were log2 transformed and mean centered to fit the data to a Gaussian distribution.
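The preprocessing steps just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MAdB implementation: the function names, the floor of 1.0 applied to net signals before taking logarithms, the linear-interpolation percentile, and the choice to mean center each gene across arrays are all assumptions, since the text does not specify them.

```python
import math
from statistics import fmean


def percentile(values, q):
    """Percentile (q in [0, 100]) of a non-empty list, by linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def net_signal(mean_signal, median_background, floor=1.0):
    """Subtract the local median background from each feature's mean signal.

    The floor (an assumption) keeps values positive so log2 is defined.
    """
    return [max(s - b, floor) for s, b in zip(mean_signal, median_background)]


def normalize_arrays(net_by_array):
    """Scale each array to its 75th percentile, log2 transform, then
    mean center each gene across arrays (assumed centering direction).

    net_by_array: one list of positive net signals per array, genes in
    the same order on every array.
    """
    logged = []
    for net in net_by_array:
        p75 = percentile(net, 75)
        logged.append([math.log2(v / p75) for v in net])
    n_genes = len(logged[0])
    gene_means = [fmean(arr[g] for arr in logged) for g in range(n_genes)]
    return [[arr[g] - gene_means[g] for g in range(n_genes)] for arr in logged]
```

Scaling to a per-array percentile removes overall brightness differences between hybridizations, so two arrays that differ only by a constant factor give identical normalized profiles.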

We initially examined gene expression intensities for all 23 CFS subjects using the one-class analysis component of the Significance Analysis of Microarrays (SAM) program [15] to determine if the mean gene expression for each of 3,800 genes differed from zero. In the one-class analysis we used false discovery rates (FDR) of up to 25%. SAM was also used for a two-class analysis to compare the mean differences between the gene intensity values categorised by the clinical and epidemiologic variables listed in Table 1. An FDR of 5% was used for two-class analysis.
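For orientation, the statistic that SAM uses to rank genes in a two-class comparison, the relative difference d(i) of Tusher et al. [15], can be sketched as follows. The function name and any example values are hypothetical. SAM itself chooses the constant s0 from the data and estimates the FDR by permuting class labels; neither step is reproduced in this sketch.

```python
import math
from statistics import fmean


def sam_relative_difference(group1, group2, s0=0.0):
    """Two-class SAM relative difference for one gene.

    d = (mean1 - mean2) / (s + s0), where s is the pooled standard error
    of the difference in means and s0 is a small positive constant that
    damps genes with very low variance. Here s0 is simply a parameter.
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = fmean(group1), fmean(group2)
    # Pooled sum of squared deviations around each group's own mean.
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    s = math.sqrt((1.0 / n1 + 1.0 / n2) * ss / (n1 + n2 - 2))
    return (m1 - m2) / (s + s0)
```

A larger s0 shrinks every d(i) toward zero, with the strongest effect on genes whose standard error is small, which is why low-intensity genes with tiny variances do not dominate the ranking.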

To identify distinct gene clusters we performed a two-way hierarchical cluster analysis as described by Eisen et al. [16]. The dendrograms were viewed using TreeView [16]. All genes identified by SAM were submitted to Onto-Express (version 2) [17] to identify current gene ontology classifications. Onto-Express was chosen because it interprets the probability that a particular molecular function, biological process or cellular component occurs by chance in the context of the genes represented on the microarray being used.
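As an illustration of the clustering step, the sketch below performs agglomerative average-linkage clustering in one dimension (for example, over subjects); applying it to both genes and subjects gives a two-way clustering. It is not the software of Eisen et al. [16]: the function names are hypothetical, the distance 1 minus the Pearson correlation is an assumed metric that the text does not state, and average linkage is computed here as the mean of all pairwise distances between cluster members.

```python
import math
from statistics import fmean


def pearson_distance(x, y):
    """1 - Pearson correlation; assumes neither profile is constant."""
    mx, my = fmean(x), fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return 1.0 - num / den


def average_linkage(profiles):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples.

    Leaves are numbered 0..n-1; each merge creates a new cluster id
    n, n+1, ... in the order the merges happen.
    """
    clusters = {i: [i] for i in range(len(profiles))}
    merges = []
    next_id = len(profiles)
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for ai in range(len(ids)):
            for bi in range(ai + 1, len(ids)):
                a, b = ids[ai], ids[bi]
                # Average of all member-to-member distances.
                d = fmean(pearson_distance(profiles[i], profiles[j])
                          for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges
```

The merge history is the information a dendrogram displays: early merges with small distances correspond to short branches joining similar profiles.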

The standard statistical t-test (assuming unequal variances) and the nonparametric Wilcoxon rank sum test were used in conjunction with the SAM two-class analysis to examine the potential differences in gene expression with respect to the variables outlined in Table 1. For the t-test and Wilcoxon test statistical significance was set at a p-value <0.01.
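The two conventional tests can be sketched for a single gene as follows. This is an illustration only: the function names are hypothetical, the rank sum test uses the large-sample normal approximation without a tie correction to the variance (the text does not say whether an exact or approximate test was used), and the t-test function returns the statistic and Welch-Satterthwaite degrees of freedom rather than a p-value, which requires a t distribution not available in the Python standard library.

```python
import math
from statistics import NormalDist, fmean, variance


def welch_t(a, b):
    """t statistic and degrees of freedom for unequal variances."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (fmean(a) - fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def rank_sum_z(a, b):
    """Wilcoxon rank sum z score and two-sided p (normal approximation)."""
    pooled = sorted((v, i) for i, v in enumerate(list(a) + list(b)))
    ranks = [0.0] * len(pooled)
    start = 0
    while start < len(pooled):
        end = start
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        midrank = (start + end) / 2 + 1  # tied values share the mean rank
        for k in range(start, end + 1):
            ranks[pooled[k][1]] = midrank
        start = end + 1
    n1, n2 = len(a), len(b)
    w = sum(ranks[:n1])  # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (w - mean_w) / sd_w
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p
```

Applied gene by gene to the sudden and gradual onset groups, a gene would be flagged when the resulting p-value falls below the 0.01 threshold given above.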


Differential gene expression

One-class analysis of gene expression data

Application of this method to the 23 CFS subjects identified no genes with expression variance statistically greater than the average that would provide evidence for heterogeneity of the CFS sample.

Two-class analysis of data

The 23 CFS subjects were grouped with respect to the variables listed in Table 1 and the mean differences between their gene expression values then compared. This approach identified 117 genes that were differentially expressed when the CFS subjects were grouped by onset type (Table 2). Two-class analysis did not detect any differentially expressed genes at a false discovery rate of 5% when comparing any other variable listed in Table 2. Both the t-test and the Wilcoxon test results were similar to the two-class analysis and there was considerable overlap among the genes detected by these 3 tests for type of fatigue onset. In total, 95/117 genes identified by two-class analysis were detected by either t-test or Wilcoxon test. Analysis by age, illness duration, number of CFS symptoms, illness group and BMI identified a few differentially expressed genes (Table 2), but there were no common genes across statistical tests, and no overlap with any of the 117 genes that differentiated onset type. For this reason, only the 117 genes identified by two-class analysis were examined further.

Table 2 Identification of differentially expressed genes in CFS subjects by clinical or epidemiologic variables.

Hierarchical cluster analysis of expression profiles

Figures 1 and 2 display the two-way hierarchical cluster analysis of the 117 genes. The majority of subjects clustered according to onset type and the genes fell into two distinct clusters. Expression of 19 of the 117 genes was increased in the gradual compared to sudden onset group, while the expression of the remaining 98 genes was decreased.

Figure 1

Hierarchical clustering of the differential gene expression patterns for gradual compared with sudden onset in CFS subjects. Matrix of the two-dimensional hierarchical clustering of genes and CFS subjects stratified on syndrome onset. Each row represents the hybridization results for a single gene, and each column represents a CFS subject. Transcript levels that are statistically different between onset types are shown above (red) and below (green) the mean.

Figure 2

Hierarchical clustering of the differential gene expression patterns for gradual compared with sudden onset in CFS subjects. Dendrograms showing average-linkage hierarchical clustering of CFS subjects. A blue circle indicates a subject with a sudden onset of CFS symptoms; a yellow circle indicates a gradual onset. The black circle indicates the subject whose onset fell between the definitions of sudden and gradual onset.

Gene ontology

Figure 3 summarizes the functional classification of all 117 differentially expressed genes with respect to cluster group. Twenty-four genes are associated with metabolism (p < 0.01, hypergeometric probability distribution test). Twenty of these genes were down-regulated in the gradual onset cluster, and they were mainly involved in regulation of glycolysis, glucose and disaccharide metabolism, oxidative phosphorylation, amino acid biosynthesis, and purine or pyrimidine metabolism. Of the 19 up-regulated genes, some were involved in metabolism, but they were not statistically significant. The 7 genes involved in RNA processing were, however, statistically significant in this group (p < 0.01, hypergeometric probability distribution test).
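The hypergeometric test quoted above asks how likely it is to see at least k genes from a functional category among the n selected genes, given that K of the N genes on the array belong to that category. A minimal sketch follows; the function name is hypothetical and this is not the Onto-Express implementation. Of the values in the example call, N = 3,800, n = 117 and k = 24 come from the text, whereas K (the number of array genes annotated to metabolism) is not reported, so the value 400 is a placeholder used only to show the call.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) when n genes are drawn without replacement from N,
    of which K belong to the category of interest."""
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible combinations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Example call; K=400 is a placeholder, not a value from the study.
p_metabolism = hypergeom_upper_tail(N=3800, K=400, n=117, k=24)
```

Using the genes on the array as the reference set, rather than the whole genome, is what makes the probability specific to the microarray being used.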

Figure 3

Functional categories of the 117 genes selected by SAM program. These genes are differentially expressed and segregate the gradual from sudden onset of fatigue CFS subjects. Red bars represent the 98 genes for which expression was lower in gradual onset subjects; the green bars represent the 19 genes that had increased expression in the gradual onset group. Genes may be classified in more than one functional group.


CFS is thought to be a heterogeneous illness because no single cause has been identified and because various physiologic stressors, such as infection, trauma and toxins, may trigger its development in susceptible individuals. A major difficulty in identifying etiologies for CFS is that the case definition requires a minimum illness duration of six months. In most studies, subjects have been ill for many years, so initial disease triggers may be hard to detect or may no longer be present. In addition, in many diseases, factors associated with disability are distinct from causative factors. Biomarkers have the potential to give clues to disease etiology as well as mode of action.

In an attempt to determine whether CFS was a single or heterogeneous illness, we used microarrays to profile the expression of 3,800 genes in 23 women with CFS. We analyzed the array data using three statistical tests: 1) a program specifically designed for the analysis of microarray data (SAM), 2) a parametric t-test, and 3) a nonparametric rank sum test. One-class analysis by SAM failed to detect differences in gene expression profiles of the CFS subjects because many of the genes introduced noise into the process, masking the differences that were evident in two-class analysis. In the two-class analysis the only variable that differentiated the CFS subjects was type of fatigue onset, that is, whether the women described their fatigue as occurring suddenly over the course of a week, or gradually, over the course of months. Different gene expression profiles among those who describe a difference in illness onset imply distinct etiological or triggering events and show that these differences are maintained well into the disease process. None of the other variables thought to be important in characterizing and defining CFS had any differentially expressed genes associated with them when CFS subjects were grouped accordingly. Interestingly, this is not the first time that type of fatigue onset has distinguished people with CFS. DeLuca et al. [18] showed that CFS subjects with gradual onset tend to develop CFS-type physical symptoms as a variant of a psychiatric disorder, while CFS patients with sudden onset may be more closely associated with a non-psychiatric etiology (i.e. a viral or infectious etiology). Mawle et al. [19] reported that CFS patients with gradual onset had more major life events occurring in the year prior to onset than did patients with sudden onset.

In this study the 1994 CFS research case definition [1] was strictly used in designating CFS caseness; therefore most psychiatric conditions (other than Major Depressive Disorder, which is comorbid in many people with CFS or any chronic illness) were exclusionary. We believe that this considerably reduced the other possible symptoms or conditions that may be highly correlated with fatigue and could potentially confound our data.

Our findings of differentially expressed metabolic and RNA processing genes make both biologic and physiologic sense relative to CFS. We identified differences in purine and pyrimidine metabolism, glycolysis, oxidative phosphorylation, and glucose metabolism. Oxidative phosphorylation and the ATP generated by this process are the major source of energy for the normal function of most cells in the body. Metabolic changes are known to take place, and in some instances drive the pathophysiology of a number of chronic diseases. Subjects with sudden onset CFS often describe an infectious, viral-like illness as the initiating process. It is well-known that many RNA processing proteins are central to the effective action of the antiviral interferon [20]. Alterations in effective antimicrobial responses may also explain the chronic fatigue state.

The nature of the specimen determines the view of the disease revealed by gene expression profiling. In CFS there are no anatomical lesions to sample. Peripheral blood is an accessible source of circulating cells that reflect systemic changes, so it is a good starting point to profile diseases that have no lesions, or lesions that are inaccessible. However, peripheral blood mononuclear cells are themselves very heterogeneous, including B and T lymphocytes, monocytes, and natural killer cells. Changes in gene expression could be due to changes in the cellular composition as well as to differences in cellular activities. However, several groups, including our own [21, 22], have surveyed the magnitude of variation in gene expression patterns of peripheral blood and found it to be fairly limited. This study, as well as an earlier study of PBMCs in CFS [13], indicates that profiling of peripheral blood does detect relevant gene expression differences. Fractionation of the PBMC population may give different insights into the disease process, and will be important to further characterize the pathophysiology of CFS.

The study must be interpreted with caution, as the number of subjects is small and the genes profiled represent a fraction of those potentially of importance. However, these data do support the idea that CFS is a heterogeneous illness with a biochemical basis to explain the fatigue. Different gene expression profiles among those who describe a difference in illness onset imply distinct etiological or triggering events and show that these differences are maintained well into the disease process. The results in this study demonstrate the utility of gene expression profiling to characterize an illness at the biological and physiological level. This should advance the cause of defining CFS at a molecular level, resulting in diagnosis and possible identification of causative agents.


Although the full implication and biologic significance of the differentially expressed genes discussed above are not yet completely understood, the genes may serve as a platform to further explore relevant mechanisms of pathogenesis and improve the understanding of the molecular basis of CFS. It will be important to discover how these differential patterns relate to non-CFS subjects and to expand the number of genes examined. Our work shows that microarrays are an important tool in understanding the wide spectrum of genes likely involved in complex diseases such as CFS.


  1. Fukuda K, Straus SE, Hickie I, Sharpe MC, Dobbins JG, Komaroff A: The chronic fatigue syndrome: a comprehensive approach to its definition and study. International Chronic Fatigue Syndrome Study Group. Ann Intern Med. 1994, 121: 953-959.

  2. Reyes M, Dobbins JG, Mawle AC, Steele L, Gary HE, Malani H, Schmid S, Fukuda K, Stewart J, Nisenbaum R, Reeves WC: Risk factors for CFS: a case control study. Journal of Chronic Fatigue Syndrome. 1996, 2: 17-33.

  3. Reyes M, Dobbins JG, Nisenbaum R, Subedar N, Randall B, Reeves WC: Chronic fatigue syndrome progression and self-defined recovery: Evidence from the CDC surveillance system. Journal of Chronic Fatigue Syndrome. 1999, 5: 17-27.

  4. Nisenbaum R, Reyes M, Unger ER, Reeves WC: Factor analysis of symptoms among subjects with unexplained chronic fatigue: What can we learn about chronic fatigue syndrome?. J Psychosom Res. 2003, in press.

  5. Nutt CL, Mani DR, Betensky RA, Tamayo P, Cairncross JG, Ladd C, Pohl U, Hartmann C, McLaughlin ME, Batchelor TT, Black PM, von Deimling A, Pomeroy SL, Golub TR, Louis DN: Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Res. 2003, 63: 1602-1607.

  6. Hoffmann KF, McCarty TC, Segal DH, Chiaramonte M, Hesse M, Davis EM, Cheever AW, Meltzer PS, Morse HC,III, Wynn TA: Disease fingerprinting with cDNA microarrays reveals distinct gene expression profiles in lethal type 1 and type 2 cytokine-mediated inflammatory reactions. FASEB J. 2001, 15: 2545-2547.

  7. Benson M, Carlsson B, Carlsson LM, Mostad P, Svensson PA, Cardell LO: DNA microarray analysis of transforming growth factor-beta and related transcripts in nasal biopsies from patients with allergic rhinitis. Cytokine. 2002, 18: 20-25. 10.1006/cyto.2002.1012.

  8. Van Der Pouw Kraan TC, Van Gaalen FA, Huizinga TW, Pieterman E, Breedveld FC, Verweij CL: Discovery of distinctive gene expression profiles in rheumatoid synovium using cDNA microarray technology: evidence for the existence of multiple pathways of tissue destruction and repair. Genes Immun. 2003, 4: 187-196. 10.1038/sj.gene.6363975.

  9. Mycko MP, Papoian R, Boschert U, Raine CS, Selmaj KW: cDNA microarray analysis in multiple sclerosis lesions: detection of genes associated with disease activity. Brain. 2003, 126: 1048-1057. 10.1093/brain/awg107.

  10. Vernon SD, Unger ER, Dimulescu IM, Rajeevan M, Reeves WC: Utility of the blood for gene expression profiling and biomarker discovery in chronic fatigue syndrome. Dis Markers. 2002, 18: 193-199.

  11. Reyes M, Nisenbaum R, Hoaglin DC, Unger ER, Emmons C, Randall B, Stewart JA, Abbey S, Jones JF, Gantz N, Minden S, Reeves WC: Prevalence and incidence of chronic fatigue syndrome in Wichita, Kansas. Arch Intern Med. 2003, 163: 1530-1536. 10.1001/archinte.163.13.1530.

  12. Clinical Guidelines on the Identification, Evaluation, and Treatment of Overweight and Obesity in Adults--The Evidence Report. National Institutes of Health. Obes Res. 1998, 6 Suppl 2: 51S-209S.

  13. Campbell C, Vernon SD, Karem KL, Nisenbaum R, Unger ER: Assessment of normal variability in peripheral blood gene expression. Dis Markers. 2002, 18: 201-206.

  14. Ojaniemi H, Evengard B, Lee DR, Unger ER, Vernon SD: Impact of RNA extraction from limited samples on microarray results. Biotechniques. 2003, 35: in press.

  15. Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A. 2001, 98: 5116-5121. 10.1073/pnas.091062498.

  16. Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A. 1998, 95: 14863-14868. 10.1073/pnas.95.25.14863.

  17. Khatri P, Draghici S, Ostermeier GC, Krawetz SA: Profiling gene expression using onto-express. Genomics. 2002, 79: 266-270. 10.1006/geno.2002.6698.

  18. DeLuca J, Johnson SK, Ellis SP, Natelson BH: Sudden vs gradual onset of chronic fatigue syndrome differentiates individuals on cognitive and psychiatric measures. J Psychiatr Res. 1997, 31: 83-90. 10.1016/S0022-3956(96)00052-0.

  19. Mawle AC, Nisenbaum R, Dobbins JG, Gary H.E.,Jr., Stewart JA, Reyes M, Steele L, Schmid DS, Reeves WC: Immune responses associated with chronic fatigue syndrome: a case-control study. J Infect Dis. 1997, 175: 136-141.

  20. Samuel CE: Antiviral actions of interferons. Clin Microbiol Rev. 2001, 14: 778-809. 10.1128/CMR.14.4.778-809.2001.

  21. Whitney AR, Diehn M, Popper SJ, Alizadeh AA, Boldrick JC, Relman DA, Brown PO: Individuality and variation in gene expression patterns in human blood. Proc Natl Acad Sci U S A. 2003, 100: 1896-1901. 10.1073/pnas.252784499.

  22. Cheung VG, Conlin LK, Weber TM, Arcaro M, Jen KY, Morley M, Spielman RS: Natural variation in human gene expression assessed in lymphoblastoid cells. Nat Genet. 2003, 33: 422-425. 10.1038/ng1094.



The authors sincerely thank Dr. William C. Reeves for his guidance, expert advice and the many discussions had on interpreting the results in a public health context. His constructive input for drafting this manuscript and his skilled leadership are much appreciated.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Suzanne D Vernon.

Additional information

Authors' contributions

TW contributed to the design of the experimental approach, performed the experimental component and the analysis of the data and drafted the manuscript. RN gave statistical advice, did the non-parametric analysis and contributed to the manuscript preparation. ERU and SDV contributed to the conception and design of this study, assisted in the analysis and assisted in drafting the manuscript. All authors read and approved the final manuscript.

Declaration of competing interests

None declared.

About this article

Cite this article

Whistler, T., Unger, E.R., Nisenbaum, R. et al. Integration of gene expression, clinical, and epidemiologic data to characterize Chronic Fatigue Syndrome. J Transl Med 1, 10 (2003).
