Vernon, S. and Reeves, W. 2006. The challenge of integrating disparate high-content data: epidemiological, clinical and laboratory data collected during an in-hospital study of chronic fatigue syndrome.
This paper provides an introduction to the recent efforts by the CDC to produce a model for CFS. These efforts resulted in the simultaneous publication of 14 research papers in the journal Pharmacogenomics in April 2006. Almost 20 researchers, most of them working for free, analyzed a large data set gathered from a 2-day hospital visit by CFS patients and controls in Wichita, Kansas in 2003.
Vernon and Reeves, the two lead investigators overseeing this effort, begin the paper by placing CFS in the context of other complex diseases such as AIDS, asthma and cancer. Refreshingly they also, almost for the first time that I can remember, posit a link between the fatigue seen in CFS and that often seen in cancer.
They posit that ‘complex’ chronic diseases such as these cause alterations in the homeostatic mechanisms in the body. The CDC proposes that these processes are so complex that they require a multi-disciplinary approach to study them.
The approach the CDC took was to take a fairly large set of CFS patients and controls, subject them to a wide variety of tests including gene expression tests, and then basically give the data to four groups of investigators and ask them to come up with models that would contribute to the classification, diagnosis and treatment of CFS.
For several years now the CDC has been using a new (and rather arduous) method for finding its study participants called community sampling. Designed to eliminate recruitment bias and provide ‘bullet-proof’ control groups, community sampling involves randomly telephoning large numbers of people in order to build a randomly recruited representative set of study participants.
In this case the CDC first screened 56,000 people in Wichita, Kansas for CFS via telephone. Of these they took a core group of 7162 people with fatigue and evaluated them further via telephone and in a clinic over three years. After discarding those with exclusionary factors (another disease, most psychiatric disorders, idiopathic fatigue) they were left with only 70 people who had CFS. Of these 53 were willing to participate in the study.
(But were they CFS patients? See below*). They were matched with 58 controls of the same sex, age, race and body mass index. The CDC also included 59 people who were fatigued but did not meet the criteria for CFS, 41 people with depression who met the criteria for CFS, and 39 people with depression and fatigue but not CFS. According to this paper 99 people with CFS participated in these studies.
The three additional comparison groups were:
- CFS with depression – 41
- Fatigued but not CFS – 59
- Fatigued, not CFS, with depression – 39
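To get a feel for how steeply community sampling narrows the pool, the screening numbers quoted above can be turned into stage-by-stage percentages. This is only a rough sketch: the counts come from this article, while the stage labels and the `funnel_rates` helper are my own illustrative names, not anything the CDC used.

```python
# Screening funnel as reported in this article (labels are illustrative).
funnel = [
    ("screened by telephone", 56_000),
    ("fatigued, evaluated further", 7_162),
    ("met CFS criteria after exclusions", 70),
    ("willing to participate", 53),
]

def funnel_rates(stages):
    """Return (label, count, share of previous stage, share of first stage)."""
    first = stages[0][1]
    prev = first
    rows = []
    for label, count in stages:
        rows.append((label, count, count / prev, count / first))
        prev = count
    return rows

for label, count, of_prev, of_first in funnel_rates(funnel):
    print(f"{label:36s} {count:6d}  {of_prev:8.2%} of previous  {of_first:8.3%} of screened")
```

On these figures roughly one in eight people screened reported fatigue, but fewer than 1% of that fatigued group ended up classified as having CFS, which helps explain why the method is so expensive per patient.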
When we look at the physical characteristics of the groups we see that the CFS group was quite ‘large’; about 85% of them were overweight, with 43% being obese and 12% morbidly obese. Since obesity, and morbid obesity in particular, is itself a risk factor for many problems, this factor alone could have skewed the results.
The CDC controlled for this factor, however, by having an equally heavy healthy control group (81% overweight). The median age was about 50, about 85% were women and the group was almost entirely white (95%).
How does this stack up against obesity rates in the nation as a whole? The CDC estimates that about 65% of US adults are overweight and about 30% are obese. In relative terms this puts the overweight rate in the CFS group about 30% above the national figure (85% versus 65%) and the obesity rate about 43% above it (43% versus 30%) – perhaps not a surprising finding given the debility, isolation and stress associated with CFS.
The higher median age of the CFS patients probably also contributed to the increase. One of the studies indicates this could partially reflect impaired metabolism in CFS patients. On the other hand Caucasians have a lower obesity rate than some minorities and this population was almost entirely white.
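How much "higher than normal" the CFS group looks depends on whether one takes the difference in percentage points or the relative increase. The short sketch below works out both from the percentages quoted above; the `compare_rates` helper is just an illustrative name, and the national figures are the approximate ones cited in this article.

```python
def compare_rates(group_pct, national_pct):
    """Return (difference in percentage points, relative increase in percent)."""
    points = group_pct - national_pct
    relative = (group_pct / national_pct - 1) * 100
    return points, relative

# (CFS group %, approximate national %) as quoted in the article.
figures = {"overweight": (85, 65), "obese": (43, 30)}

for label, (group_pct, national_pct) in figures.items():
    points, relative = compare_rates(group_pct, national_pct)
    print(f"{label}: +{points} points, {relative:.1f}% higher in relative terms")
```

Either way the gap is substantial, but the relative figures are the larger ones and are more sensitive to the national baseline chosen.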
The tests are obviously an extremely important part of the study. It is these data points, after all, that the investigators will use to come up with their models. Obviously the more data points the better – but there are financial limits; the CDC had to choose which data points they felt would contribute most to increasing our understanding of CFS.
Over two days all participants underwent the following:
- Physical tests – temperature, height, weight and body mass index
- Laboratory tests – Complete blood count (CBC) (c-reactive protein, ALT, SGPT, albumin, AP, AST, bilirubin, calcium, CO2, chloride, creatinine, glucose, potassium, TP, sodium, BUN); HPA axis – salivary cortisol, androstenedione, SHBG, testosterone, ACTH, DHEA, DHEA-S, T3, reverse T3, T4, TSH, insulin-like growth factor, estradiol and progesterone (women); Cytokines – TNF-α, IL-6, sR-IL-6; Catecholamines – norepinephrine, epinephrine, normetanephrine, neuropeptide Y; Mineralocorticoids – renin, aldosterone
- Autonomic Nervous System status – blood pressure, heart rate – lying down, standing
- Psychiatric evaluation
- Medical Outcomes Survey Short Form – measures functional impairment
- Multi-dimensional Fatigue Inventory – general, physical, mental fatigue, etc.
- CDC Symptom Inventory
- Cambridge Neuropsychological Test Automated Battery – cognitive test that measures short-term memory, pattern recognition, reaction time, etc.
- Two-night sleep study
- Peripheral blood gene expression – 20,000 genes
(Gene polymorphism data was not mentioned in this introduction.)
There’s a lot here – the CDC called it ‘an exhaustive list’ of clinical, epidemiological and laboratory data. If one excludes the gene expression results, however, this was not the largest set of laboratory measures ever done on CFS patients in a study.
A 1-week intensive twin study done by the Buchwald team at the University of Washington measured pathogen prevalence and immune factors and included brain imaging, sophisticated tests of orthostasis and aerobic functioning, and a three-day sleep study.
The larger size of this study apparently precluded such an intensive effort. This study was unique in several ways, however. It is the first attempt to integrate gene expression data with laboratory and clinical data to build a statistical model of CFS. That, indeed, must have been exhausting given the enormous amounts of data generated by the gene expression studies. Some of these studies are a level of complexity above anything that has been attempted before.
The laboratory data covered a wide array of neuroendocrine factors plus other markers believed to reflect the allostatic status of the cardiovascular, immune and other systems in the body. C-reactive protein, for instance, is used to assess the inflammatory status of the body in one study.
While the laboratory data on neuroendocrine factors is extensive the data on the immune system and cardiovascular systems is not. The immune data mostly consists of pro-inflammatory cytokines known to interact with the neuroendocrine system. Some systems are not covered at all; there are no measures of oxidative stress, for instance, in these studies.
With the exception of the gene expression data much of this data is not new. Vernon and Reeves noted that studies in CFS (including the gene expression studies) have generally only uncovered ‘subtle’ perturbations occurring in different systems, in particular, the central nervous system, immune system and metabolism.
Given this history the CDC could not have expected to find other than subtle abnormalities in their laboratory data. They hoped, however, that an analysis of this large data set would reveal patterns of disruption that would differentiate CFS patients from controls.
The CDC, at this point anyway, seems to believe the problem in CFS, as we know it today, is the result of multiple failures, some perhaps subtle, that combine to create the illness known as CFS. The CDC’s choice to focus on the neuroendocrine system indicates that they believe the interactions in this large and complex system are probably central to the disease. This system regulates many of the processes that occur in the body including those of the immune system.
This is not the first time researchers have attempted to differentiate CFS patients from controls using common laboratory measures such as the complete blood count (CBC). In what was termed a ‘landmark paper’ by the editors of the Journal of Chronic Fatigue Syndrome, Suhadolnik et al. reported in 2004 that some CBC and immune measures, including several involving the RNase L pathway, were able to differentiate CFS patients from controls.
Four teams analyzed the data. Their makeup was novel; Team 1 had computer science, physics and statistics experts as well as an immunologist and psychiatrist; Team 2 had chemical engineering and bioengineering experts as well as an immunologist, pathologist and molecular biologist; Team 3 had mathematics and computational chemistry experts as well as a cardiologist and an infectious disease specialist; and Team 4 had mathematics and bioinformatics experts as well as an epidemiologist and pediatrician.
Each treated the data very differently. Team 1, which produced four papers, attempted to delineate the heterogeneity present in CFS. Team 2, which produced three papers, attempted to find the central factors that differentiated the four subgroups.
Team 3, which produced two papers, used the symptom questions to determine the validity of the current CFS classification. Team 4 used the lab, gene and genetic data to differentiate the four different groups according to their allostatic loads.
A different approach
The CDC believes three factors can explain why our understanding of CFS has progressed so slowly over the past 20 years.
First they believe that patient recruitment from specialty and referral clinics results in ‘recruitment bias’. They believe that not only are different kinds of patients drawn to CFS clinics in general, but that each clinic has its unique set of patients. This could make comparing results between studies problematic.
They went so far as to say that this approach, which has dominated CFS research, ‘precludes (a) critical comparison of results’, i.e. makes it impossible to critically compare results from one research group to the next.
Secondly they believe the control groups so important in the study process have been mostly flawed; either they are not present or they are ‘controls of convenience’. Getting healthy controls can be difficult. Often they come from workers in a hospital or students and may or may not be matched to CFS patients with regard to sex, age or body mass index.
Third they believe the process of diagnosing CFS (i.e. the definition) is flawed.
Most researchers would surely agree with all three points. The first two are standard problems in many research studies but it is possible given the heterogeneity in the CFS population that they are accentuated in CFS research studies.
If CFS is full of subsets then it is possible that certain subsets could be drawn towards certain clinics. Patients with depression might be more likely to end up in clinics led by psychologists, etc. A consensus seems to be gathering that the biggest problem in CFS research is a less than precise definition that allows for the inclusion of subsets which end up obscuring research findings.
The CDC cast a rather wide net; their inclusion of patients with idiopathic fatigue meant they took a look at the causes of fatigue in general. Their inclusion of CFS patients with major depression – a subset of patients often excluded from research studies – further broadened their sample base.
One could argue that a larger sample of just CFS patients would have aided them greatly in finding verifiable subsets. Oddly enough, however, the CDC appears to have been handicapped in this regard by their sampling protocol. In the end, if I am reading the data correctly, they only had 70 pure CFS patients that were eligible for this study.
One of the goals of the CDC’s approach was to ‘capture’ that heterogeneity and thus provide a better classification scheme for CFS. This can only be done when using large numbers of CFS patients and large data sets. This is an admirable goal but one wonders if 99 CFS patients, 41 with depression, was a large enough number to do so.
The conclusion in Vernon and Reeves’ overview is fascinating. It says that the integration of different body system measures and clinical features will allow us to identify the subsets present in CFS and the disturbed physiological pathways at work in them; that it will demonstrate that CFS (and other illnesses) with disabling fatigue can be medically explained.
The CDC believes that algorithms (complex formulas) that integrate multi-systemic data will produce an objective diagnostic marker, decipher the pathophysiology and create custom therapies for CFS patients.
Given the future tense used it’s obvious that none of the above were achieved in these studies; i.e. they didn’t identify verifiable subsets, they didn’t medically explain CFS, they didn’t give us a diagnostic marker, etc. Vernon and Reeves do, however, believe the results indicate that they are on the right track and that given time they will; they are embarking on new studies in Georgia to expand and verify their results.
- The CDC’s Pharmacogenomics Studies II: The Allostatic Stress In CFS
- The Pharmacogenomics Studies on Chronic Fatigue Syndrome (ME/CFS) III: The Gene Expression Studies
- The CDC’s Pharmacogenomics Studies IV: Heredity and Chronic Fatigue Syndrome (ME/CFS)
- The Pharmacogenomics Studies on Chronic Fatigue Syndrome (ME/CFS) V: The Subsets
Reeves, W., Wagner, D., Nisenbaum, R., Jones, J., Gurbaxani, B., Solomon, L., Papanicolaou, D., Unger, E., Vernon, S. and C. Heim. 2005. Chronic fatigue syndrome – a clinically empirical approach to its definition and study. BMC Medicine 3:19.
Suhadolnik, R. A., Peterson, D., Reichenbach, N., Roen, G., Metzger, M., McCahan, J., O’Brien, K., Welsch, S., Gabriel, J., Gaughan, J. and N. McGregor. 2004. Clinical and biochemical characteristics differentiating chronic fatigue syndrome from major depression and healthy control populations: relation to dysfunction of the RNase L pathway. Journal of Chronic Fatigue Syndrome 12: 5-35.
*Studying CFS or something else, and does it matter?
Did the CDC analyze CFS patients or somebody else? The overview states that the CFS patients were identified as having CFS not in 2002/2003 when the study was done but during the initial 1997-2000 survey. The Fukuda test is just a series of quick questions that takes no more than a minute to answer. Why did the CDC not refer to the current status of its study participants?
We can answer this question by examining other studies these patients appear to have participated in. One such study that took place at the same time found that only 13% of the people labeled with CFS from 1997-2000 still met the criteria for CFS in 2002/2003 (Reeves et al. 2005).
Almost 60% of them were now labeled as being ISF – ill but with insufficient symptoms or fatigue to meet the Fukuda definition. Eleven percent of them were in remission (presumably well) and 17 percent had developed exclusionary conditions that prevented them from taking part in the study.
The reduced numbers of CFS patients available were offset only a little bit by patients who had formerly been classified as having ISF but who had worsened and now were classified as CFS. It seems likely that the great majority of the ‘CFS’ patients the CDC used for the Pharmacogenomics studies would not have qualified as CFS patients according to the Fukuda definition.
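As a quick sanity check, the follow-up percentages quoted above can be tallied to see how much of the original CFS group they account for. This is a minimal sketch using only the figures in this article; "almost 60%" is entered as 60, which is my own rounding, and the dictionary labels are illustrative.

```python
# 2002/2003 status of people classified as CFS in 1997-2000, in percent.
status_2003 = {
    "still met CFS criteria": 13,
    "ISF (insufficient symptoms or fatigue)": 60,  # article says "almost 60%"
    "in remission": 11,
    "developed exclusionary conditions": 17,
}

total = sum(status_2003.values())
no_longer_cfs = total - status_2003["still met CFS criteria"]

print(f"categories account for about {total}% of the original group")
print(f"about {no_longer_cfs}% no longer met the CFS criteria")
```

The categories add up to just over 100%, which is consistent with rounding, so the four labels appear to cover essentially the whole original group.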
The CDC did not in 2002 have the option of simply pulling more CFS patients out of referral clinics. Because they were committed to getting their patients using a random sampling approach they had the option of mounting another expensive random sampling effort or using the patients they had. In the end they used the patients they had but chose not to refer to their current CFS status.
One must note that the CDC’s approach is much more sophisticated than has been attempted before and that they may be paying the price for that sophistication. It is possible that the CDC is simply uncovering an aspect of CFS that we haven’t seen before – that CFS patients do over time typically slide between the different disease labels we’ve given them.
Almost all CFS studies define their sample population and then immediately do their study. None that I am aware of have defined their sample set at one point and then attempted to use it three to five years later. If they had, it’s entirely possible, perhaps likely, that they would have run into the same problem the CDC did.
It is also important to note that there is nothing sacrosanct about the Fukuda definition. Almost everyone agrees it is unsatisfactory – it’s too vague, it allows for the presence of undifferentiated subsets, its priorities are off, the symptom set it uses is probably inadequate and misleading; in short, it’s a poor basis for defining a disease.
One study found that in a set of fatigued patients the Fukuda definition did a poor job of elucidating illness severity, i.e. some of the people who didn’t fulfill the criteria for CFS were actually sicker than some of those who did. It may not, in the long run, mean that much that a person does or does not over time fulfill the Fukuda definition.
These were not patients who were mildly ill or were simply tired; measures of disability, symptom severity and illness duration were quite high in both the ‘CFS’ and the ‘ISF’ patients. The average duration of illness in the subsets study was 12 years and the CFS and ISF patients were for the most part easily differentiated in that study from the well participants.
Most of the transitioning took place from CFS to ISF, not from CFS or ISF to well. (Almost 20% of the ISF patients were, however, determined to be in remission; thus they did transition to wellness fairly frequently.)
It is also possible that the CDC has uncovered a population of CFS patients who, while they may or may not be well, are able to transition more quickly out of the disease than the CFS patients that find their way to the clinics and centers. This is probably not an unforeseen finding; most chronic diseases have varying degrees of debilitation, and the most severe cases, naturally, are seen more often by physicians.
There are other indications that these are not the same kind of CFS patients that have shown up in studies to date. A much higher percentage than normal of these CFS patients, for instance, had a gradual onset of their disease (80%).
(Thanks to Mary Schweitzer for bringing up these questions.)