CDC versus CDC: Which Data to Believe?
Posted by Henry Bauer on 2008/08/15
I’ve often criticized, in many connections, the fallacy of accepting outputs from computer models as though they were reliable data. I’ve also noted several times that the so-called “Surveillance Reports” published by the Centers for Disease Control and Prevention (CDC) have, since the late 1990s, increasingly featured estimates rather than reported numbers (for example, see Table 33, below, from The Origin, Persistence and Failings of HIV/AIDS Theory, and the following pages in the book).
Another egregious example of estimates taking the place of reported numbers turned up as I was looking into information about deaths from “AIDS” (= “HIV disease”). That led me to remember that bureaucracies are ill suited to doing, assessing, managing, or reporting matters scientific: bureaucracies are not good at self-criticism; internal disagreements are wherever possible hidden from outsiders and settled by political rather than scientifically substantive negotiations. That’s part of the reason why 21st-century science is becoming riddled with knowledge monopolies and research cartels.
The Centers for Disease Control and Prevention is a sizeable bureaucracy. Some 16 units report to the Director:
Within the Coordinating Center for Infectious Diseases reside four National Centers, for:
— Immunization and Respiratory Diseases (NCIRD)
— Zoonotic, Vector-Borne, and Enteric Diseases (NCZVED)
— HIV/AIDS, Viral Hepatitis, STD, and TB Prevention (NCHHSTP)
— Preparedness, Detection, and Control of Infectious Diseases (NCPDCID)
NCHHSTP houses a variety of programs under 6 “topics”:
— Sexually Transmitted Diseases
— Viral Hepatitis
— Global AIDS
— BOTUSA (Botswana-USA).
[That “HIV/AIDS” and “Sexually Transmitted Diseases” are separate “topics” does not, regrettably, mean that the CDC has now acknowledged that HIV/AIDS is not sexually transmitted.]
Within (presumably) the “HIV/AIDS” topic is the Division of HIV/AIDS Prevention, which has published HIV/AIDS Surveillance Reports.
Within the Coordinating Center for Health Information and Service (CCHIS) reside three National Centers:
— Health Statistics (NCHS)
— Public Health Informatics (NCPHI) (has 5 divisions)
— Health Marketing (NCHM)
[For anyone who is not squeamish about bureaucratic and PR jargon, I recommend highly the explanation of what “health marketing” is (and if you can explain what the explanation means, please let me know)]
Evidently the publishers of the HIV/AIDS Surveillance Reports are quite a few bureaucratic steps away from the National Center for Health Statistics, which publishes the National Vital Statistics Reports (NVSR) and annual summaries of Health, United States (HUS). Perhaps that explains why the data in the Surveillance Reports differ so much from those in NVSR and HUS.
Take the instance of deaths in 2004 from “HIV disease”.
NVSR 56 #5, 20 November 2007, using “information from all death certificates filed in the 50 states and the District of Columbia”, lists by age group (in its Table 1) the numbers of recorded deaths, and the death rates per 100,000, for the ten leading causes of death in each group. “Human immunodeficiency virus (HIV) disease” appears as one of those ten leading causes only between ages 20 and 54. The table lists 160 deaths among 20-24-year-olds, 1468 deaths among ages 25-34, 4826 deaths among ages 35-44, and 4422 deaths among ages 45-54.
However, numbers for some of the other age groups can be calculated because the death rates for them are supplied in Health, United States, 2007: With Chartbook on Trends in the Health of Americans (National Center for Health Statistics, Hyattsville, MD: 2007). Appendix I confirms what is said in NVSR: “Numbers of . . . deaths from the vital statistics system represent complete counts . . . . Therefore, they are not subject to sampling error”. Table 42 [also featured in an earlier post, “HIV DISEASE” IS NOT AN ILLNESS, 19 March 2008] is for deaths from HIV disease:
* Rates based on fewer than 20 deaths are considered unreliable and are not shown.
(Note again, under the heading of Table 42, “Data are based on death certificates”.)
These rates allow calculation of the actual numbers of HIV-disease deaths for age groups from 5 through 84 years of age (column F, Table I below). The NVSR gives not only numbers but also the corresponding rates for each age group, so the factor connecting rate and number can be calculated (column D). The factor is independent of the particular disease but varies with age: it is simply the number of individuals within that age group in the whole population, divided by 100,000. Together with the numbers already given in NVSR, this yields numbers of deaths for the whole range from 5 to 84 years of age (column G).
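The arithmetic behind that calculation can be sketched in a few lines of Python. This is only an illustration of the method: the function names are my own, and the input figures are invented round numbers, not values taken from NVSR or HUS.

```python
def population_factor(deaths: float, rate_per_100k: float) -> float:
    """Factor connecting rate and number for one age group.

    Derived from any cause of death for which both the count and the
    rate per 100,000 are published; equals population / 100,000.
    """
    if rate_per_100k <= 0:
        raise ValueError("rate must be positive")
    return deaths / rate_per_100k


def deaths_from_rate(rate_per_100k: float, factor: float) -> int:
    """Recover an approximate death count from a published rate."""
    return round(rate_per_100k * factor)


# HYPOTHETICAL figures, for illustration only (not from NVSR or HUS):
# some cause with 16,000 deaths at a rate of 40.0 per 100,000 ...
factor = population_factor(deaths=16_000, rate_per_100k=40.0)

# ... and an HIV-disease rate of 0.5 per 100,000 in the same age group.
hiv_deaths = deaths_from_rate(rate_per_100k=0.5, factor=factor)

print(factor, hiv_deaths)
```

Because published rates are rounded to one decimal place, counts recovered this way are approximate, particularly where the rate is small.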
Now compare those numbers with the estimates published in Table 7 of HIV/AIDS Surveillance Report, volume 18, “Cases of HIV infection and AIDS in the United States and Dependent Areas, 2006” (presenting data “reported to CDC through June 2007”):
For 2004, here is a comparison of the numbers from these two sources within CDC:
The estimates from the CDC are on average 21% greater than the actually recorded numbers. Moreover, the error varies with age group in a remarkably regular way; one that exaggerates the median age of death by more than 3 years.
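For readers who want to run this kind of comparison themselves, here is a minimal sketch of the percentage calculation. The age groups and numbers are invented placeholders, not the figures from either CDC source. "On average" can be read two ways (the mean of the per-group percentages, or the excess of the summed totals), so the sketch computes both.

```python
def percent_excess(estimate: float, recorded: float) -> float:
    """Percentage by which an estimate exceeds the recorded count."""
    return 100.0 * (estimate - recorded) / recorded


# HYPOTHETICAL placeholder data, for illustration only.
recorded = {"25-34": 1000, "35-44": 4000}
estimated = {"25-34": 1100, "35-44": 5000}

# Excess for each age group separately.
by_group = {
    age: percent_excess(estimated[age], recorded[age]) for age in recorded
}

# Two possible readings of "on average".
mean_of_groups = sum(by_group.values()) / len(by_group)
overall = percent_excess(sum(estimated.values()), sum(recorded.values()))

print(by_group, mean_of_groups, overall)
```

The two averages generally differ, because the aggregate figure weights each age group by its number of deaths.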
Now, Table 7 in the Surveillance Report does have this caveat, in small print in a footnote to the Table: “These numbers do not represent reported case counts. Rather, these numbers are point estimates, which result from adjustments of reported case counts. The reported case counts have been adjusted for reporting delays and for redistribution of cases in persons initially reported without an identified risk factor, but not for incomplete reporting” [emphasis added]. Incomplete reporting for 2004 should hardly be a problem, however, in a publication that presents data “reported to CDC through June 2007”; nor would incomplete reporting vary with age group in this remarkable manner. It would be more random.
Such “adjustments” 3 and 4 years after the event are no rarity in these CDC HIV/AIDS publications. For example, deaths “reported” for the 1980s were “adjusted” downwards in wholesale fashion more than half-a-dozen years later, thereby altering the fact that the earlier data had shown deaths to have been leveling off, see Table 33, p. 221 in The Origin, Persistence and Failings of HIV/AIDS Theory:
Note how “reported” deaths for the years through 1986 somehow decreased dramatically between the 1988 report and the 1989 report. Such re-writing of historical facts will be familiar to students of the former Soviet Union, but it is not normally found in scientific publications.
At any rate, CDC unapologetically—indeed, without admitting it or drawing attention to it—routinely publishes considerably revised “estimates”; for example (Table III), for deaths in 2002 as given in the 2005 and 2006 Surveillance Reports. Table 7 in the 2006 Report does not warn that numbers for as far back as 2002 are different from those for the same years in the 2005 Report.
The Technical Notes do warn: “Tabulations of deaths of persons with AIDS (Table 7) do not reflect actual counts of deaths reported to the surveillance system. Rather, the estimates are based on numbers of reported deaths, which have been adjusted for delays in reporting”.
The estimates may be based on reported deaths; but if so, then they are very loosely based on them indeed, since they differ by as much as 38% in some age groups, see Table II above. That adjustments from one year to the next are so similar in percentage terms for the various age groups (Table III); that the differences between actual counts and “estimates” vary in such regular fashion with age (Table II); and that the numbers given are “point estimates” all indicate that the estimates are arrived at by means of some sort of overarching algorithm, computer model, or graphical representation, with—presumably—periodic adjustment of some of the assumptions or parameters defining the model. However, when estimates, no matter how derived, are claimed to be “based on numbers of reported deaths”, one expects that the mode of estimating will be progressively refined over the years to bring the estimates closer to the actual numbers. That has evidently not been the case here: estimated “data” for deaths for 2004 are shockingly different from the reports based on death certificates (Table II).
Once again, or rather as usual, HIV/AIDS “researchers” imply greater accuracy than is warranted. The “point estimates” in Table III differ from year to year by a couple of percent, so the numbers should never be written to more than 3 significant figures. When they differ from actual numbers as much as in Table II, even two significant figures give a false impression.
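To make the point about significant figures concrete, here is a small sketch of rounding a count to a stated number of significant figures. The helper name and the sample number are my own illustrations, not taken from any CDC table.

```python
from math import floor, log10


def round_sig(value: float, sig: int) -> float:
    """Round value to `sig` significant figures."""
    if value == 0:
        return 0.0
    # Position of the leading digit decides how many places to keep.
    digits = sig - 1 - floor(log10(abs(value)))
    return round(value, digits)


# HYPOTHETICAL count, for illustration only.
count = 15798
three_sig = round_sig(count, 3)
two_sig = round_sig(count, 2)

print(three_sig, two_sig)
```

A number printed to five figures invites the reader to trust all five; rounding it first shows how little of it the underlying uncertainty supports.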
The overall description at the beginning of the Surveillance Report is also misleading: “Data are presented for cases of HIV infection and AIDS reported to CDC through June 2007. All data are provisional.” Nothing here about “estimates”, and the reader who scans without careful attention to fine-print footnotes and Technical Notes could easily believe, given that numbers are given to four and five significant figures, that these really are “reported” “data”, not computer garbage-output emanating from invalid models. Nor are readers referred to NVSR or HUS; the only mention of either is in the Technical Notes and does not refer to Table 7: “The population denominators used to compute these rates for the 50 states and the District of Columbia were based on official postcensus estimates for 2006 from the U.S. Census Bureau and bridged-race estimates for 2006 obtained from the National Center for Health Statistics.”
Why would one publish estimates when actual numbers are reported by a sibling unit in the same bureaucracy? After all, death certificates are a legal requirement, and information from them should be as trustworthy as demographic data ever can be. Is it coincidental that the HIV/AIDS specialists always overestimate?