Integration of gene expression, clinical, and epidemiologic data to characterize Chronic Fatigue Syndrome
© Whistler et al 2003
Received: 30 October 2003
Accepted: 01 December 2003
Published: 01 December 2003
Chronic fatigue syndrome (CFS) has no diagnostic clinical signs or diagnostic laboratory abnormalities and it is unclear if it represents a single illness. The CFS research case definition recommends stratifying subjects by co-morbid conditions, fatigue level and duration, or functional impairment. But to date, this analysis approach has not yielded any further insight into CFS pathogenesis. This study used the integration of peripheral blood gene expression results with epidemiologic and clinical data to determine whether CFS is a single or heterogeneous illness.
CFS subjects were grouped by several clinical and epidemiological variables thought to be important in defining the illness. Statistical tests and cluster analysis were used to distinguish CFS subjects and identify differentially expressed genes. These genes were identified only when CFS subjects were grouped according to illness onset and the majority of genes were involved in pathways of purine and pyrimidine metabolism, glycolysis, oxidative phosphorylation, and glucose metabolism.
These results provide a physiologic basis that suggests CFS is a heterogeneous illness. The differentially expressed genes imply fundamental metabolic perturbations that will be further investigated, and they illustrate the power of microarray technology for furthering our understanding of CFS.
Chronic fatigue syndrome (CFS) is defined solely by self-reported symptoms and associated disability. There are no characteristic physical signs or diagnostic laboratory abnormalities. Diagnosis of CFS requires clinical evaluation to rule out other medical or psychiatric conditions that could cause or contribute to the patient's complaints. Indeed, it remains unclear whether CFS represents a unique disease or a common clinical end-point of diverse pathologic processes.
CFS has been hypothesized to involve an abnormal response to infection, immunologic dysfunction, dysregulation of the hypothalamic-pituitary-adrenal axis, and dysautonomia, yet no biologic or physiologic perturbations have been reproducibly detected. This may reflect poor specificity of the case definition, patient selection bias, or other study design issues. Clearly, discovery of laboratory markers that improve the specificity of case ascertainment or differentiate groups within the CFS classification would increase the possibility of identifying pathogenic mechanisms.
The international CFS research guideline recommends that cases be stratified before analysis by several variables, including co-morbid conditions, current level and total duration of fatigue, current level of functional impairment, and type of fatigue onset. People with CFS often describe a sudden onset to their illness, having become sick over one or two days, while others recount a gradual onset in which the symptom complex develops over weeks or months. Studies indicate that stress history and recovery appear to vary with mode of onset. Another approach is to group subjects based on symptoms. A recent study identified two subgroups, one with higher energy levels and fewer accompanying symptoms and another with significantly lower energy levels. Deciphering the physiologic basis for CFS would go far in assessing the heterogeneity of the illness and would advance diagnosis and treatment.
Unique gene expression profiles have been found in cancer, chronic inflammatory/allergic diseases [6, 7], autoimmune disorders (e.g., rheumatoid arthritis), and multiple sclerosis. We have previously shown that peripheral blood mononuclear cell (PBMC) gene expression profiles can distinguish the majority of CFS cases from non-fatigued controls. In this study, we measured levels of gene expression in 23 persons with CFS identified in the general Wichita population. Our objective was to determine if integration of gene expression results with clinical and epidemiologic data would identify CFS subgroups.
This study adhered to human experimentation guidelines of the U.S. Department of Health and Human Services. All participants were volunteers who gave informed consent. The Centers for Disease Control and Prevention Human Subjects Committee approved study protocols.
Forty-three CFS subjects were identified in a survey of the adult population of Wichita, Kansas. CFS subjects fulfilled all criteria of the CFS research case definition. The clinical evaluation was used to identify any co-morbid conditions and to detect the presence of exclusionary diagnoses. These included Major Depressive Disorder with Melancholic/Psychotic features, psychosis, alcohol/drug addiction, bulimia/anorexia and medical conditions including cancer, hepatitis or pregnancy. We obtained information concerning current disability, duration of illness, type of fatigue onset, and number and nature of accompanying symptoms. We also obtained blood samples, as described below. Because only 6 CFS subjects were men, we limited the present study to women. Of the 37 female CFS subjects, 5 were excluded because of lack of sample, 7 because of poor quality RNA, and 2 because of poor quality of array hybridization, leaving 23 women for analysis.
Table 1. Clinical and epidemiological characteristics of 23 CFS women. Columns: type of fatigue onset (a); age, years (b); duration of illness (b); no. of CFS symptoms (b); body mass index (c); illness group (d).
During the clinical evaluation, a 10 ml blood sample was obtained and PBMC were isolated using LSM® Lymphocyte Separation Media (ICN Biomedicals, Costa Mesa, CA). Cells were washed, counted and stored for viability in liquid nitrogen as described. Total RNA was extracted using the RNAqueous™ kit (Ambion Inc., Austin, TX) and the quality and quantity were assessed as previously described.
Biotinylated cDNA synthesis from 1 μg of total RNA was performed as previously described. The cDNA probe was hybridized to the Atlas™ Human 3.8I oligonucleotide glass microarrays (CLONTECH Laboratories, Inc., Palo Alto, CA) using the Ventana Discovery™ system and their ChipMap™ kit (Ventana Medical Systems, Tucson, AZ). Hybridization was for 12 hours at 42°C, followed by three 10 minute stringency washes in 0.1X SSC at 42°C. Anti-biotin antibodies conjugated to RLS™ particles (Genicon Sciences Corporation, San Diego, CA) were used for signal detection as previously described. The slides were archived and images captured using the GSD-501™ scanner (Genicon Sciences Corporation, San Diego, CA), and analyzed with ArrayVision™ RLS image analysis software (Genicon Sciences Corporation).
The scanned TIFF images were processed using ArrayVision™ (Imaging Research Inc., Ontario, Canada). Features deemed unsuitable for accurate quantitation because of artefacts, poor morphology, or uneven hybridization were excluded from further analysis. A median background value was calculated around each feature and subtracted from the mean signal to give the net signal for the respective gene. Data were uploaded into the CDC MAdB web-based analysis package, where background-adjusted intensity values were scaled and normalized to the 75th percentile. Values were log2 transformed and mean centered to fit the data to a Gaussian distribution.
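The preprocessing steps described above can be illustrated with a minimal Python sketch. It is not the MAdB implementation: the function name `normalize_array`, the `floor` applied before taking logarithms, the choice of the inclusive quantile method, and the assumption that mean centering is done within one array are all illustrative choices that the text does not specify.

```python
import math
from statistics import mean, quantiles


def normalize_array(mean_signal, median_background, floor=1.0):
    """Background-subtract, scale to the 75th percentile, log2-transform
    and mean-center the features of a single array (hypothetical helper)."""
    # Net signal = mean feature signal minus the local median background.
    # Flooring at `floor` is an assumption so that log2 is always defined.
    net = [max(s - b, floor) for s, b in zip(mean_signal, median_background)]

    # Scale so that the 75th percentile of the array equals 1.
    q75 = quantiles(net, n=4, method="inclusive")[2]
    scaled = [v / q75 for v in net]

    # Log2 transform, then subtract the array mean (assumed per-array centering).
    logged = [math.log2(v) for v in scaled]
    center = mean(logged)
    return [v - center for v in logged]


signal = [110.0, 220.0, 420.0, 810.0]
background = [10.0, 20.0, 20.0, 10.0]
normalized = normalize_array(signal, background)
```

Under the per-array centering assumed here, the scaling constant cancels in log space; the scaling step is kept only to mirror the order of operations given in the text.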
We initially examined gene expression intensities for all 23 CFS subjects using the one-class analysis component of the Significance Analysis of Microarrays (SAM) program to determine if the mean gene expression for each of 3,800 genes differed from zero. In the one-class analysis we used false discovery rates (FDR) of up to 25%. SAM was also used for a two-class analysis to compare the mean differences between the gene intensity values categorised by the clinical and epidemiologic variables listed in Table 1. An FDR of 5% was used for two-class analysis.
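SAM scores each gene with a relative difference statistic. The following Python sketch shows the two-class score, assuming the form published for SAM: the difference in group means divided by a pooled standard error plus a small constant s0. The function name, the default value of `s0`, and the example values are illustrative assumptions. The SAM program additionally chooses s0 from the data and estimates the FDR by permuting group labels, neither of which is shown here.

```python
import math
from statistics import mean


def sam_relative_difference(group_a, group_b, s0=0.0):
    """Two-class relative difference d = (mean_a - mean_b) / (s + s0)
    for one gene (hypothetical helper, not the SAM program itself)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = mean(group_a), mean(group_b)

    # Pooled sum of squared deviations around each group mean.
    ss = sum((x - mean_a) ** 2 for x in group_a)
    ss += sum((x - mean_b) ** 2 for x in group_b)

    # Gene-specific scatter (pooled standard error of the mean difference).
    s = math.sqrt((1 / n_a + 1 / n_b) * ss / (n_a + n_b - 2))

    # s0 damps large scores that arise only from very small scatter.
    return (mean_a - mean_b) / (s + s0)


sudden = [1.0, 2.0, 3.0]    # made-up log2 intensities for one gene
gradual = [0.0, 1.0, 2.0]
d_plain = sam_relative_difference(sudden, gradual)
d_damped = sam_relative_difference(sudden, gradual, s0=1.0)
```

A larger `s0` shrinks the score toward zero, which is why genes with low expression variance are less likely to dominate the ranked list.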
To identify distinct gene clusters we performed a two-way hierarchical cluster analysis as described by Eisen et al. The dendrograms were viewed using TreeView, http://rana.lbl.gov/EisenSoftware.htm. All genes identified by SAM were submitted to Onto-Express (version 2), http://vortex.cs.wayne.edu:8080/index.jsp, to identify current gene ontology classifications. Onto-Express was chosen because it estimates the probability that a particular molecular function, biological process or cellular component occurs by chance in the context of the genes represented on the microarray being used.
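The idea of judging a gene ontology category against the genes present on the array can be sketched as an over-representation test. The example below uses a hypergeometric tail probability; this is an assumed model chosen for illustration, since the text does not state which distribution Onto-Express applies, and the function name and counts are hypothetical.

```python
from math import comb


def hypergeom_tail(array_genes, category_on_array, selected, selected_in_category):
    """Probability of seeing at least `selected_in_category` genes from a
    category among `selected` genes drawn without replacement from an
    array of `array_genes` genes (hypothetical helper)."""
    upper = min(category_on_array, selected)
    total = comb(array_genes, selected)
    # Sum the hypergeometric probabilities for k = observed .. maximum possible.
    hits = sum(
        comb(category_on_array, k) * comb(array_genes - category_on_array, selected - k)
        for k in range(selected_in_category, upper + 1)
    )
    return hits / total


# Toy numbers: 10 genes on the array, 4 in the category,
# 3 genes selected as differentially expressed, 2 of them in the category.
p_enriched = hypergeom_tail(10, 4, 3, 2)
```

The reference set is the array content rather than the whole genome, so a category that is heavily represented on the array is not flagged merely because many of its genes were measured.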
The standard statistical t-test (assuming unequal variances) and the nonparametric Wilcoxon rank sum test were used in conjunction with the SAM two-class analysis to examine the potential differences in gene expression with respect to the variables outlined in Table 1. For the t-test and the Wilcoxon test, statistical significance was set at a p-value <0.01.
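For a single gene, the two conventional tests can be sketched as follows. This is a standard-library illustration under stated assumptions, not the software used in the study: `welch_t` returns only the statistic and the Welch-Satterthwaite degrees of freedom (a p-value would need a t-distribution from a statistics package), and `rank_sum_test` uses the normal approximation without a tie correction, which is crude for small groups. All names and values are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_test(a, b):
    """Wilcoxon rank sum W for group a and a two-sided normal-approximation p-value."""
    n_a, n_b = len(a), len(b)
    ranks = midranks(list(a) + list(b))
    w = sum(ranks[:n_a])
    mu = n_a * (n_a + n_b + 1) / 2
    sigma = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (w - mu) / sigma
    return w, 2 * (1 - NormalDist().cdf(abs(z)))


gradual = [1.0, 2.0, 3.0]   # made-up log2 intensities for one gene
sudden = [4.0, 5.0, 6.0]
t_stat, t_df = welch_t(gradual, sudden)
w_stat, w_p = rank_sum_test(gradual, sudden)
```

With 3,800 genes tested per variable, a per-gene threshold of p < 0.01 still allows some false positives by chance, which is one reason to read these tests alongside the FDR-controlled SAM results.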
Application of the one-class SAM analysis to the 23 CFS subjects identified no genes with expression variance statistically greater than the average that would provide evidence for heterogeneity of the CFS sample.
Table: Identification of differentially expressed genes in CFS subjects by clinical or epidemiologic variables. Column headers: clinical or epidemiologic variable; t-test (unequal variance) (b). Variables tested: type of fatigue onset (gradual or sudden); illness group (1 or 2); age (≤50 or >50 years); body mass index (<25 or >30); no. of symptoms (4 or ≥6); duration of illness (≤10 or >10).
CFS is thought to be a heterogeneous illness: no single cause has been identified, and various physiologic stressors such as infection, trauma and toxins are thought to trigger its development in susceptible individuals. A major difficulty in identifying etiologies for CFS is that the case definition requires a minimum duration of six months of illness. In most studies, subjects have been ill for many years, so the initial disease triggers are difficult to detect or are no longer present. In addition, in many diseases, factors associated with disability are distinct from causative factors. Biomarkers have the potential to give clues to disease etiology as well as mode of action.
In an attempt to determine whether CFS was a single or heterogeneous illness, we used microarrays to profile the expression of 3,800 genes in 23 women with CFS. We analyzed the array data using three statistical tests: 1) a program specifically designed for the analysis of microarray data (SAM), 2) a parametric t-test, and 3) a nonparametric rank sum test. One-class analysis by SAM failed to detect differences in gene expression profiles of the CFS subjects because many of the genes introduced noise into the process, masking the differences that were evident in two-class analysis. In the two-class analysis the only variable that differentiated the CFS subjects was type of fatigue onset, that is, whether the women described their fatigue as occurring suddenly, over the course of a week, or gradually, over the course of months. Different gene expression profiles among those who describe a difference in illness onset imply distinct etiological or triggering events, and show that these differences are maintained well into the disease process. None of the other variables thought to be important in characterizing and defining CFS had any differentially expressed genes associated with them when CFS subjects were grouped accordingly. Interestingly, this is not the first time that type of fatigue onset has distinguished people with CFS. DeLuca et al. showed that CFS subjects with gradual onset tend to develop CFS-type physical symptoms as a variant of a psychiatric disorder, while CFS patients with sudden onset may be more closely associated with a non-psychiatric etiology (i.e. a viral or infectious etiology). Mawle et al. reported that CFS patients with gradual onset had more major life events occurring in the year prior to onset than did patients with sudden onset.
In this study the 1994 CFS research case definition was strictly used in designating CFS caseness; therefore most psychiatric conditions (other than Major Depressive Disorder, which is comorbid in many people with CFS or any chronic illness) were exclusionary. We believe that this considerably reduced the other possible symptoms or conditions that may be highly correlated with fatigue and could potentially confound our data.
Our findings of differentially expressed metabolic and RNA processing genes make both biologic and physiologic sense relative to CFS. We identified differences in purine and pyrimidine metabolism, glycolysis, oxidative phosphorylation, and glucose metabolism. Oxidative phosphorylation and the ATP generated by this process are the major source of energy for the normal function of most cells in the body. Metabolic changes are known to take place in a number of chronic diseases and in some instances drive their pathophysiology. Subjects with sudden onset CFS often describe an infectious, viral-like illness as the initiating process. It is well known that many RNA processing proteins are central to the effective action of the antiviral interferon. Alterations in effective antimicrobial responses may also explain the chronic fatigue state.
The nature of the specimen determines the view of the disease revealed by gene expression profiling. In CFS there are no anatomical lesions to sample. Peripheral blood is an accessible source of circulating cells that reflect systemic changes, so it is a good starting point to profile diseases that have no lesions, or lesions that are inaccessible. However, peripheral blood mononuclear cells are themselves very heterogeneous, including B and T lymphocytes, monocytes, and natural killer cells. Changes in gene expression could be due to changes in the cellular composition as well as to differences in cellular activities. However, several groups, including our own [21, 22], have surveyed the magnitude of variation in gene expression patterns of peripheral blood and found it to be fairly limited. This study, as well as an earlier study of PBMCs in CFS, indicates that the peripheral blood does detect relevant gene expression differences. Fractionation of the PBMC population may give different insights into the disease process, and will be important to further characterize the pathophysiology of CFS.
The study must be interpreted with caution, as the number of subjects is small and the genes profiled represent a fraction of those potentially of importance. However, these data do support the idea that CFS is a heterogeneous illness with a biochemical basis to explain the fatigue. Different gene expression profiles among those who describe a difference in illness onset imply distinct etiological or triggering events, and show that these differences are maintained well into the disease process. The results in this study demonstrate the utility of gene expression profiling to characterize an illness at the biological and physiological level. This should advance the cause for defining CFS at a molecular level, resulting in diagnosis and possible identification of causative agents.
Although the full implication and biologic significance of the differentially expressed genes discussed above are not yet completely understood, the genes may serve as a platform to further explore relevant mechanisms of pathogenesis and improve the understanding of the molecular basis of CFS. It will be important to discover how these differential patterns relate to non-CFS subjects and to expand the number of genes examined. Our work shows that microarrays are an important tool in understanding the wide spectrum of genes likely involved in complex diseases such as CFS.
The authors sincerely thank Dr. William C. Reeves for his guidance, expert advice and the many discussions had on interpreting the results in a public health context. His constructive input for drafting this manuscript and his skilled leadership are much appreciated.
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.