those wild, wacky Covid-19 data points

I have noticed for a while that the CDC’s Covid-19 data doesn’t agree with other sources, and those other sources don’t agree with each other either. Looking at my home city (and county) of Philadelphia, the CDC’s numbers have been consistently higher for many months. This matters because government agencies, employers (including mine), and individuals are basing decisions on these numbers, often the CDC numbers.

Let’s look at today’s numbers for Philadelphia. I’ll look just at “confirmed cases” because that seems to be the figure most readily available and most frequently updated by all sources, although really I think we should be focused more on deaths at this point, because deaths (although morbid) give you some information on cases and vaccination/immunity combined. In other words, if cases are high but deaths are low, you would have an annoyance but not a major problem. Nonetheless, let’s look at those cases for Philadelphia today! I’m writing this on Sunday, November 21, 2021. I’m using the links from my coronavirus tracker post.

  • CDC: 111.55 / 100,000 population / 7 days (data from November 13-19)
  • Pennsylvania state health department: 86.4 / 100,000 population / 7 days (data from November 12-18)
  • Covid Act Now: 116.2 / 100,000 population / 7 days (data from November 20, which they describe as a 7-day average provided by the New York Times)
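
To put the gap in perspective, here is a quick sketch in Python that expresses each source's number as a percentage above the state health department's number. The rates are copied from the list above; the function name `percent_above` and the choice of the state figure as the baseline are just my assumptions for illustration, not anything the sources themselves do.

```python
# Rates copied from the list above:
# confirmed cases per 100,000 population per 7 days, Philadelphia.
rates = {
    "CDC": 111.55,
    "Pennsylvania state health department": 86.4,
    "Covid Act Now": 116.2,
}


def percent_above(rate, baseline):
    """Return how far `rate` sits above `baseline`, as a percentage of `baseline`."""
    return (rate - baseline) / baseline * 100.0


# Assumption for illustration: treat the state number as the baseline.
baseline = rates["Pennsylvania state health department"]
gaps = {source: percent_above(rate, baseline) for source, rate in rates.items()}

for source, gap in gaps.items():
    print(f"{source}: {rates[source]:.2f} per 100,000 ({gap:+.1f}% vs. state)")
```

By this arithmetic the CDC number is roughly 29% above the state number and the Covid Act Now number is roughly 34% above it, which is not a rounding-error kind of difference.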

There are a number of things that could explain differences in the numbers. First, the time periods the data represent vary slightly by source. Second, the sources may differ on whether the data represent the date the test was done, the date the test was reported, or the estimated date of infection. Generally I think what is reported is the date the test was done. This is hard data of a sort, but it introduces a time lag as numerous and scattered labs report their data. The data you are looking at might not yet include all the results that will eventually be attributed to a given day, and it might be corrected retroactively, meaning that if you check a week from now what today’s number was, you might see a different number than you see today. Finally, when reporting data for a location like a county, it may matter whether they are reporting all tests done in that county or matching tests to the home addresses (or employer addresses?) of the individuals tested. Philadelphia, for example, has a huge health care industry with a lot of commuters not just from surrounding counties in Pennsylvania but from parts of New Jersey and Delaware. (States were never the right entities to track this pandemic; it should obviously be done by entities covering metro areas.)

If all the sources were using similar data but using slightly different time periods or calculation methods, I would expect some differences but I would expect the differences to be random. The state health department numbers are consistently lower, however. I am hoping that might be because they are doing a better job of matching tests to home addresses.
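
One way to check the “random versus consistently lower” idea, if you had a run of same-day numbers from two sources, is a simple sign test: count how often one source is lower than the other and ask how likely a split that lopsided would be if it were a coin flip each day. Below is a minimal sketch in Python. The function name `sign_test` and its inputs are hypothetical; I have not run this on actual CDC or state data.

```python
from math import comb


def sign_test(series_a, series_b):
    """Two-sided sign test on paired values, ignoring ties.

    Returns (times a < b, times a > b, p-value) under the assumption
    that either source is equally likely to be the lower one on any day.
    """
    if len(series_a) != len(series_b):
        raise ValueError("series must be the same length")
    lower = sum(a < b for a, b in zip(series_a, series_b))
    higher = sum(a > b for a, b in zip(series_a, series_b))
    n = lower + higher
    if n == 0:
        # No informative pairs, so there is no evidence either way.
        return lower, higher, 1.0
    k = min(lower, higher)
    # Probability of a split at least this lopsided, in either direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return lower, higher, min(1.0, 2 * tail)
```

You would call it as `sign_test(state_rates, cdc_rates)` with two equal-length lists of rates for the same dates. A very small p-value would suggest the gap is systematic rather than random. One caveat: rolling 7-day figures on consecutive days overlap heavily, so they are not independent, and it would be safer to compare non-overlapping weeks.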
