Math is Sometimes Hard-ish: Tracking Down Some Base COVID Numbers

Readers of this blog may recall that, early in this Covidiocy, I, as a numbers/analysis/model-builder guy, wanted to compare the total number of deaths at the end of the year with projected deaths, because that would make it possible to put a box around the total number of deaths *from* COVID versus deaths attributed to the virus, or, as the CDC puts it, deaths ‘involving’ the virus. In statistical analyses of any complexity, one can never arrive at the ‘real’ number with anything like certainty: once you’ve got a lot of moving parts, you’ve introduced too much uncertainty. In this case, the CDC will end up with something like 3 million death certificates this year. Each of these certificates will have been filled out by a fallible human being. Most deaths in this and any year involve elderly, very sick people with lots of health problems, so a huge number (I’d guess the vast bulk) of ‘causes of death’ involve judgment calls by the doctor or coroner, such that attributing a death to COVID is in many cases inescapably uncertain.

Let me take 2 examples from my own life. My father died of pneumonia. He got it in a nursing home, which he was in due to a series of strokes leading to dementia. The strokes were preceded over the years by a number of heart attacks. He was 88 years old, and in my last conversation with his doctor, I was told that he was experiencing a general collapse of his systems: he was old, and sick, and his body was simply shutting down.

So, what would a layman say he died of? What does a doctor put on a death cert?

My sister had severe, crippling arthritis for decades. She also spent the first half of her life morbidly obese and smoked. She lost weight and quit smoking in her 40s. She had to take immunosuppressants for her arthritis just to be able to stand up. After decades of this, she developed cancer, which eventually killed her. Years of immunosuppressants severely damage the body’s ability to fight off cancers, and so cancer is very common among those being treated for severe arthritis. She was 73.

So, what killed her? What goes on the death certificate?

CDC rules, which are designed to facilitate compilation of statistics, require the medical person filling out the death certificate to put something down according to pre-specified categories. ‘Old Age’ isn’t one of the categories. There’s a part 1, which lists whatever the attending physician thinks the immediate causes of death were, and a part 2, which lists contributing causes. As explained here.

The way the CDC collects statistics off death certs is a species of forced ranking, a very dubious statistical practice. ‘Old age’ and ‘I don’t know’ are not categories; the doctor may be stone certain or highly doubtful of the cause of death, but the system records everything as if it is known – both cases come off as certain, once they hit the database: only yes or no answers allowed. Over-certainty is enforced by the very mechanisms used to collect the data.

This is why the CDC numbers list deaths ‘involving’ COVID: if COVID shows up on the death certificate, some guy at the CDC is going to enter that into their systems, and some data analyst is going to query that data, and what he’s going to get is every death where COVID appears on the death cert in either part 1 or part 2. So the question ‘how many people died *of* COVID in 2020?’ is simply not answerable from the available data in any but a very general sense. What we don’t know and cannot reasonably assume includes:

  • Were the rules consistently applied across time and space? Did a doctor in New York in early April, a coroner in Nebraska in June, and a doctor in California in November each apply the rules in the same way? There are reasons to suspect not.
  • Did the rules, and how people understood them, stay the same? Again, rules were changed at least a couple of times.
  • Did the doctor correctly characterize the role COVID played in the deceased’s death? I.e., did the person die *of* Severe Acute Respiratory Syndrome or its complications, or was he simply diagnosed with COVID while dying of something else? Or (what I think happens, but hey, I can’t be sure) did a very sick person catch something very much like the flu that pushed them over the edge?

Note here that I’m assuming no pressure, no panic, just people trying to do a very difficult job.
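To make the ‘involving’ point concrete, here is a minimal sketch of the kind of query described above. The record layout (a `DeathCertificate` with free-text `part1` and `part2` lists) and the helpers `involves` and `in_part1` are hypothetical simplifications, not the CDC’s actual schema or coding, and the three certificates are invented. It shows how counting any mention on the certificate yields a bigger number than counting only Part 1.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DeathCertificate:
    # Hypothetical, simplified layout: free-text condition names, not the CDC's real coding
    part1: list[str]  # chain of conditions leading directly to death
    part2: list[str] = field(default_factory=list)  # other contributing conditions


def involves(cert: DeathCertificate, condition: str) -> bool:
    """'Involving' count: the condition appears anywhere, Part 1 or Part 2."""
    return condition in cert.part1 or condition in cert.part2


def in_part1(cert: DeathCertificate, condition: str) -> bool:
    """Narrower count: the condition appears in Part 1."""
    return condition in cert.part1


# Toy records, invented for illustration
certs = [
    DeathCertificate(part1=["pneumonia", "COVID-19"]),
    DeathCertificate(part1=["cancer"], part2=["COVID-19", "arthritis"]),
    DeathCertificate(part1=["heart attack"], part2=["dementia"]),
]

print(sum(involves(c, "COVID-19") for c in certs))  # 2
print(sum(in_part1(c, "COVID-19") for c in certs))  # 1
```

Neither count tells you whether COVID actually *caused* the death; both just reflect what the certifier wrote down.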

All this is to say: the upper limit to the number of COVID deaths will be the number of people who died in 2020 above the number of people who would have died anyway. If, say, 2.93 million US deaths were expected in 2020, and 3.03 million deaths occurred, then, at most, 100,000 people died of COVID.

More or less. There are some other factors that could affect this, which range, in my expert opinion, from unlikely to far-fetched. These other possible factors come down to the following: that the lockdowns, masks, or some other factors (e.g., people being extra careful this year about washing their hands) reduced overall deaths from other causes to the point where COVID deaths could be much higher and yet still not push the overall death totals higher.

Offsetting this claim would be the more plausible claim that the stress of lockdowns, job loss, constant abject terror, and deferred or skipped medical treatment due to these other factors would cause a significant number of additional deaths in themselves. These are, I think, the most stressful times since the end of WWII.

Putting it algebraically:

Actual deaths – expected deaths – additional non-COVID deaths + non-COVID deaths prevented by lockdowns, etc. = COVID deaths.

Thus, for example:

  • 3,030,000 actual deaths
  • MINUS 2,930,000 expected deaths (running total: 100,000)
  • MINUS 150,000 additional non-COVID deaths (running total: -50,000)
  • PLUS 100,000 non-COVID deaths prevented (running total: 50,000)
  • EQUALS 50,000 COVID deaths

So, in this example, the maximum number of deaths caused by COVID is 50,000.
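The algebra above fits in a one-line function. This is a minimal sketch; the function name `max_covid_deaths` and its parameter names are my own labels, and the inputs are the illustrative figures from the example list, not real data.

```python
def max_covid_deaths(actual, expected, extra_non_covid=0, prevented_non_covid=0):
    """Upper bound on deaths caused by COVID:
    actual - expected - additional non-COVID deaths + non-COVID deaths prevented."""
    return actual - expected - extra_non_covid + prevented_non_covid


# Illustrative figures from the example list (not real data)
print(max_covid_deaths(3_030_000, 2_930_000, 150_000, 100_000))  # 50000

# With both adjustments left at zero, this reduces to the raw excess
print(max_covid_deaths(3_030_000, 2_930_000))  # 100000
```

The two adjustment terms pull in opposite directions, which is exactly why they matter: everything hinges on how big each one is, and those are the least certain numbers in the equation.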

I stress that the only numbers at all certain here are the total number of 2020 deaths and the projected number of 2020 deaths. While the CDC does in fact keep track of excess deaths by category, so that it is in theory possible to come up with an Additional Non-COVID Deaths number, or even lives saved via, for example, reduced traffic fatalities because people drove less, those numbers will suffer from the same uncertainty as the COVID death numbers themselves: they will represent whatever the attending physician or coroner put on the death cert. This uncertainty can be mitigated to some extent, but not eliminated. The number of non-COVID deaths prevented by the response to COVID is even more speculative, to put it generously. We’ll get into this later in the analysis.

The whole point of this exercise is to focus attention on the numbers we do know with some confidence, which are less subject to human judgment and error, and to acknowledge that other numbers are inescapably uncertain and will always have a large element of human judgment involved in them.

The next challenge is getting those numbers together. Two should be easy in theory: total 2020 deaths should be available soon; projected 2020 deaths are what we’ll look at now. The other numbers – non-COVID excess deaths and non-COVID lives saved – are going to take some work.

Thanks to reader daledykes, who sent me this link, which pointed me to this CDC document, which explains something I’ve long wondered about: how can the UN/WHO projection for 2020 US deaths (888/100K) be materially different from the CDC’s estimate (858/100K)? As I’ve observed here, the UN estimate looks very much like a projection of existing trends: there’s an obvious upward trend in US death rates over the last few years, presumably corresponding to an aging population? Or? My attempts to decipher the CDC’s numbers (admittedly, I didn’t spend a ton of time on it) led me to conclude that they were doing some fancy adjusting somewhere, because they weren’t extrapolating from the trend obvious in the UN data.
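Why would a trend projection and an average of past years disagree? Here is a minimal sketch comparing the two baselines. The death rates are hypothetical, evenly rising placeholders (NOT actual CDC or UN figures), and the helper names are my own. When rates are rising, the plain average of past years lags behind where the trend line says the next year will land.

```python
from statistics import mean


def five_year_average(rates):
    """Baseline as a plain average of the prior years' rates."""
    return mean(rates)


def linear_trend_projection(years, rates, target_year):
    """Ordinary least-squares line through (year, rate), extrapolated to target_year."""
    x_bar, y_bar = mean(years), mean(rates)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, rates)) / sum(
        (x - x_bar) ** 2 for x in years
    )
    return y_bar + slope * (target_year - x_bar)


# Hypothetical death rates per 100K (placeholders, NOT real CDC/UN data)
years = [2015, 2016, 2017, 2018, 2019]
rates = [840, 850, 860, 870, 880]

print(five_year_average(rates))                     # 860
print(linear_trend_projection(years, rates, 2020))  # 890.0
```

With a steady upward trend, the average sits at the middle year (2017 here), three years behind the year being projected, so it understates the trend projection by three years’ worth of slope.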

And here’s the explanation:

Weekly numbers of deaths by age group (0–24, 25–44, 45–64, 65–74, 75–84, and ≥85 years) and race/ethnicity (Hispanic or Latino [Hispanic], non-Hispanic White [White], non-Hispanic Black or African American [Black], non-Hispanic Asian [Asian], non-Hispanic American Indian or Alaska Native [AI/AN], and other/unknown race/ethnicity, which included non-Hispanic Native Hawaiian or other Pacific Islander, non-Hispanic multiracial, and unknown) were used to examine the difference between the weekly number of deaths occurring in 2020 and the average number occurring in the same week during 2015–2019. These values were used to calculate an average percentage change in 2020 (i.e., above or below average compared with past years), over the period of analysis, by age group and race and Hispanic ethnicity.

Excess Deaths Associated with COVID-19, by Age and Race and Ethnicity — United States, January 26–October 3, 2020

Instead of following the obvious trendline, the CDC uses an average over the previous 5 years. The difference between 888/100K and 858/100K results in a built-in number of excess deaths of just under 100K.
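The ‘just under 100K’ is simple arithmetic: the gap between the two rates, times the population. A minimal sketch, assuming the 330M population figure I’ve been using (see the addendum); the function name is my own.

```python
def built_in_excess(trend_rate_per_100k, baseline_rate_per_100k, population):
    """Deaths implied by the gap between two per-100K death rates for a given population."""
    # Multiply before dividing so round numbers stay exact in floating point
    return (trend_rate_per_100k - baseline_rate_per_100k) * population / 100_000


# UN/WHO trend rate vs. CDC 5-year-average rate, assuming a 330M population
print(built_in_excess(888, 858, 330_000_000))  # 99000.0
```

A 30-per-100K gap is small as a rate, but across 330 million people it works out to about 99,000 deaths.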

I saw no explanation of this decision. The analysts, who would have been fired from any non-government job for proposing such a nonsensical and unneeded ‘adjustment’ to projected numbers on which life-and-death decisions will partly be based, simply chose to use a lower number (the average of the previous 5 years) rather than a higher number (the extrapolation of an obvious trend). Sure, an aging (or not) population, a different mix of subpopulations, or perhaps some other reasonable idea might cause someone to propose deviating from the trend. I would say, as a freaking EXPERT in this sort of thing, that the uncertainty in any of these numbers probably dwarfs any possible value in making such adjustments. It’s an estimate. Go simple, so that people looking at your numbers are clear on what you’re doing. Clarity is more valuable than the illusion of greater ‘accuracy’ created by geeky adjustments.

Me? I would have used the past 5 years to get an average weekly distribution of deaths, applied the 888/100K rate to the population to get total projected 2020 deaths, spread that total across the weeks according to the distribution to get projected weekly deaths, and then compared actual weekly deaths to those numbers to determine the excess. Easy-peasy, and CLEAR.
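Here is a minimal sketch of the method just described, under stated assumptions: the function names are hypothetical, the weekly distribution is taken by pooling each week’s deaths across the past years (one reasonable reading of ‘average weekly distribution’), and the 4-week ‘years’ and toy population stand in for real 52-week CDC counts.

```python
def projected_weekly_deaths(past_years, rate_per_100k, population):
    """Spread a rate-based annual projection over weeks using the historical weekly pattern.

    past_years: list of years, each a list of weekly death counts of equal length.
    """
    n_weeks = len(past_years[0])
    # Pool each week's deaths across past years; their proportions give the weekly distribution
    week_totals = [sum(year[w] for year in past_years) for w in range(n_weeks)]
    grand_total = sum(week_totals)
    # Projected deaths for the whole target year, from the chosen death rate
    annual_projection = rate_per_100k * population / 100_000
    # Each week gets its historical share of the annual projection
    return [annual_projection * wt / grand_total for wt in week_totals]


def weekly_excess(actual_weekly, projected_weekly):
    """Actual minus projected deaths, week by week."""
    return [a - p for a, p in zip(actual_weekly, projected_weekly)]


# Toy 4-week "years" and a toy population (not real data)
past = [[32, 18, 22, 28], [28, 22, 18, 32]]
projected = projected_weekly_deaths(past, rate_per_100k=1_000, population=100_000)
print(projected)  # [300.0, 200.0, 200.0, 300.0]

excess = weekly_excess([310, 190, 230, 300], projected)
print(excess, sum(excess))  # [10.0, -10.0, 30.0, 0.0] 30.0
```

The appeal is that every step is visible: the seasonal shape comes from history, the level comes from one stated rate, and the excess is a plain subtraction.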

Here’s a graph from the numbers:

The CDC uses per 10K, which is why the decimal place shifted. I used an OpenDocs spreadsheet, with which I am not familiar. I could have made this prettier in Excel, but the point should be clear.

To be clear: if deaths followed the simple trend obvious from the above data, the CDC would, according to their methodology, record 100K excess deaths. There are 100K excess deaths built into the calculations.

It’s not just that we now need to reduce any 2020 excess death numbers coming out of the CDC by 100K, it’s that only a fool would trust their analysis of anything after seeing this level of incompetence (to be generous).

Conclusion: I’m going with the UN/trendline numbers going forward.

I’d be happy to hear any critiques of this analysis and methodology. Next, I’ll try to get to the bottom of total 2020 deaths.

ADDENDUM: One other critical piece: what was the population of the US in 2020? I’ve been using 330M. The 2020 Census shows 332,601,000. Fifteen minutes of searching didn’t turn up what number the CDC is using. It should be obvious that, since all these death rates are being applied to some base population to produce expectations, it would behoove the report to state what that number is. But it doesn’t, unless I missed it. I recall (for what that’s worth) seeing a 326M number during one of my earlier forays into the CDC numbers (I need to start copying down EVERYTHING), but can’t confirm. If the numbers were (not saying they are, just hypothetically here) off by that 6M, that’s 6,000,000 × 888/100,000 = 53,280 deaths. In other words, we’d be reducing the excess by another 50K. I’ll keep looking.
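Because expected deaths are just a rate times a population, the choice of population base feeds straight into the excess. A minimal sketch; the function name is mine, and 332M is a rounded figure chosen to keep the hypothetical 6M gap against the 326M number I can’t confirm.

```python
def expected_deaths(rate_per_100k, population):
    """Expected deaths for a year: death rate per 100K applied to a population base."""
    return rate_per_100k * population / 100_000


# Same 888/100K rate, two hypothetical population bases 6M apart
low_base = expected_deaths(888, 326_000_000)
high_base = expected_deaths(888, 332_000_000)
print(high_base - low_base)  # 53280.0
```

If expectations were built on the lower base while actual deaths came from the larger population, every one of those roughly 53K deaths would show up as ‘excess’.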

Author: Joseph Moore

Enough with the smarty-pants Dante quote. Just some opinionated blogger dude.

2 thoughts on “Math is Sometimes Hard-ish: Tracking Down Some Base COVID Numbers”

  1. Joseph, thanks for looking into this. I have a friend who’s a total stats geek. He doesn’t wholly agree with Murphy. But he does agree that the CDC’s working off five-year average is whack. And I’ve also noted that the UN/WHO projection of .888 is much closer to Murphy’s extrapolation than the five-year average.
