# Estimating Covid Deaths

Twenty-five percent of all Covid cases in the United States occurred in the month of November. Many healthcare professionals are concerned about the impact of a Thanksgiving surge. With total cases skyrocketing, people wonder what the eventual outcome will be. How many hospitalizations? How many deaths?

While forecasting sounds simple, the math is actually quite complex. There is a lag between when a case occurs and when a hospitalization or death occurs. Deaths and illnesses in the United States also follow a consistent seasonal pattern. Taking seasonality and lag into account complicates forecasting.

When faced with difficult and complex problems, people often rely on heuristics to understand the problems. Heuristics break down complex problems into simpler ones that produce a 'good enough' answer. We can use heuristics to predict Covid outcomes. Our answers won't be perfect, but we'll get a rule of thumb that is 'good enough' for everyday use.

Problems that trend, have seasonality, or lag between events are often referred to as 'time series analysis' problems. They are common in stock market analysis, sales forecasting, and climatology. Many of the models rely on something called regression to the mean. Regression to the mean is the idea that unusually high or low values tend to be followed by values closer to the average. Sports provide a good example. On any given day, a bad professional team can beat a good professional team. If the bad team has a great day when the good team has a terrible one, the bad team can win. Given enough games, however, the better team will win out. This is why so many sports rely on multiple rounds or games for playoffs or matchups.

Time series analysis relies on this same type of mathematical trend. When something is well above its statistical norm, the expectation is that values will decline to meet that norm. When values are well below the norm, the expectation is that they will rise to meet it. For example, if stock prices are trading well above their statistical average, the expectation is that they will soon drop back toward that average. In other words, they revert (or regress) to their average.

This is different from gambling on independent events like the spins of a roulette wheel. Just because there have been five red spins in a row doesn't mean the roulette game is 'due' for a black. Every spin is independent of the prior spins. An athlete, on the other hand, plays within his or her skill level. They cannot play above or below their level indefinitely. Eventually, they trend towards their true skill level.

Covid outcomes operate within the confines of the disease's mortality and infection rates. We can expect outcomes to occur within a band that reflects those rates. If cases are well above the band, they will eventually revert to the 'normal' level. If they are below, they will rise up to meet the level. If you think about it, this makes sense. When cases rise, people take steps to reduce the effect of the virus and cases eventually start to go down. When cases dip, people drop their guard, take fewer precautions, and cases go back up.

We can use this knowledge to create a simple rule of thumb that predicts Covid deaths based on the number of cases each day. In order to do so, we need to calculate the average number of deaths per case. Normally, this is a simple matter of dividing the number of deaths by the number of cases. In our scenario, however, this won't work. People don't test positive for Covid and then die immediately. There is a lag between when symptoms start and when death occurs. __According to the CDC__, the lag from onset to death averages 19 days. To calculate the percentage of people who die from Covid based on the number of positive cases, we need to divide the number of deaths today by the number of cases 19 days ago. You can do this in a spreadsheet or program by shifting the cases forward by 19 days. This will line up today's death count with the case count from 19 days ago. Divide each death count by the shifted case count, and you'll have a list of percentages representing the number of deaths per case.
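If you'd rather do the shift in a program than a spreadsheet, here is a minimal Python sketch of the idea. The function name `lagged_death_ratios` and the toy numbers are made up for illustration; only the 19-day lag comes from the text above. The example uses a 2-day lag so you can check the alignment by eye.

```python
LAG_DAYS = 19  # onset-to-death lag quoted in the text


def lagged_death_ratios(cases, deaths, lag=LAG_DAYS):
    """Divide each day's deaths by the case count from `lag` days earlier.

    `cases` and `deaths` are equal-length lists of daily counts, oldest first.
    Returns one ratio per day from index `lag` onward. A day whose lagged
    case count is zero yields None instead of raising.
    """
    if len(cases) != len(deaths):
        raise ValueError("cases and deaths must cover the same days")
    ratios = []
    for day in range(lag, len(deaths)):
        earlier_cases = cases[day - lag]
        ratios.append(deaths[day] / earlier_cases if earlier_cases else None)
    return ratios


# Made-up numbers with a 2-day lag: deaths on day 2 line up with cases on day 0.
toy_cases = [100, 200, 400, 800]
toy_deaths = [0, 0, 2, 4]
toy_ratios = lagged_death_ratios(toy_cases, toy_deaths, lag=2)
print(toy_ratios)  # [0.02, 0.02]
```

Notice that the first `lag` days produce no ratio at all, because there are no cases far enough back to pair them with.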

For our analysis, we'll use a seven-day moving average of deaths and a seven-day moving average of cases for the calculations. A seven-day moving average is a common way of smoothing the big swings in data. The data is taken from the __CDC website__. As you look at the charts for cases and deaths, you'll notice a dip around Thanksgiving. Unfortunately, this isn't due to a drop in cases or deaths but to holiday reporting. Many agencies don't report over the holiday, so cases are underreported during that time and overreported for the week or two following. This doesn't mean the data is untrustworthy. It just means that data collectors have families and holidays and days off just like everybody else. After they get a holiday, it takes time to catch up. The data has just about caught up, but is still a little bit behind.
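For anyone curious what the smoothing looks like in code, here is a small sketch. It assumes a trailing average (each value is the mean of that day and the six days before it); the article's data could just as well use a centered window, so treat that choice and the function name `moving_average` as assumptions. The toy counts are invented to mimic a holiday gap followed by a catch-up.

```python
def moving_average(values, window=7):
    """Trailing moving average: each output is the mean of the latest `window` values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            # Drop the value that just slid out of the window.
            running_total -= values[i - window]
        if i >= window - 1:
            averages.append(running_total / window)
    return averages


# Invented counts: two days with nothing reported, then two catch-up days.
toy_counts = [70, 70, 70, 0, 0, 140, 140, 70]
smoothed = moving_average(toy_counts)
print(smoothed)  # [70.0, 70.0]
```

The gap and the catch-up cancel out inside the window, which is exactly why a moving average helps with reporting hiccups, as long as the catch-up lands within the same seven days.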

We can display deaths divided by cases in the following chart. In the early days of the virus, there was insufficient testing, so the ratio of deaths to cases was abnormally high. Using this early data will skew our percentage calculation. If you look at the chart, the percentages seem to stabilize around mid to late July. To eliminate the skew from the early counts, we'll use July 21st as the cutoff day for our percentage calculations. Making this adjustment, the average percentage of deaths to cases comes to 1.72%.
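Applying the cutoff is just a filter followed by an average. The sketch below assumes the ratios are stored as `(date, ratio)` pairs and that the year is 2020; the text only gives the month and day, so the year, the function name `average_ratio_since`, and the toy values are all assumptions for illustration.

```python
from datetime import date
from statistics import mean

CUTOFF = date(2020, 7, 21)  # July 21st from the text; the year is an assumption


def average_ratio_since(dated_ratios, cutoff=CUTOFF):
    """Mean of the deaths-per-case ratios dated on or after `cutoff`."""
    kept = [
        ratio
        for day, ratio in dated_ratios
        if day >= cutoff and ratio is not None
    ]
    if not kept:
        raise ValueError("no ratios on or after the cutoff")
    return mean(kept)


# Made-up ratios: the April value is inflated, as it would be with scarce testing.
toy_dated_ratios = [
    (date(2020, 4, 1), 0.080),
    (date(2020, 7, 21), 0.018),
    (date(2020, 9, 1), 0.016),
]
toy_average = average_ratio_since(toy_dated_ratios)
print(f"{toy_average:.2%}")
```

Without the cutoff, the single inflated April value would more than double the toy average, which is the skew the cutoff is meant to avoid.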

To improve our understanding of the results, it helps to develop a range. The standard deviation tells us how much individual results typically vary around the average. The standard deviation for our data is 0.0018, or about 0.18 percentage points. Statistically, if the data is roughly normally distributed, going two standard deviations above and below the average captures about 95% of the outcomes in our data set. So, if we take our standard deviation, multiply it by two, and add it to the average percentage value, we get about 2%. If we multiply the standard deviation by two and subtract it from our average, we get about 1.35%. Based on this information, our range of expected outcomes is 1.35% to 2%. To make the math easy, if there were 10,000 cases today, we'd expect between 135 (10,000 times 1.35%) and 200 (10,000 times 2%) deaths 19 days from now, with the actual estimate being 172. We can put all of this together in a chart, compare it to actual numbers, and see how close we get.
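Here is the same arithmetic as a short sketch. It reads the standard deviation as 0.0018 expressed as a fraction (0.18 percentage points), since that is the reading consistent with the 1.35% to 2% range quoted above. The function name `death_range` is made up for illustration.

```python
def death_range(cases, avg_ratio, std_dev, k=2):
    """Return (low, expected, high) deaths for a given case count.

    The band is the average ratio plus or minus `k` standard deviations.
    """
    low = cases * (avg_ratio - k * std_dev)
    expected = cases * avg_ratio
    high = cases * (avg_ratio + k * std_dev)
    return low, expected, high


low, expected, high = death_range(10_000, avg_ratio=0.0172, std_dev=0.0018)
print(round(low), round(expected), round(high))  # 136 172 208
```

The unrounded band comes out to 136 to 208 deaths per 10,000 cases. The article rounds those percentages to 1.35% and 2% to keep the mental math simple, which gives 135 to 200.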

As you'd expect, in the early days, we're off by quite a bit, because the shortage of tests creates an abnormally high mortality rate. Once we get into July, our model works pretty well until we hit the Thanksgiving Day hiccup. You can see, though, that the actual and predicted are converging, and the two will be back in sync within a week or so.

Since we shifted the cases forward by 19 days, we can use the number of cases from the last 19 days to predict the next 19 days. Starting 19 days ago, we multiply the cases from that day by our percentages and get the range for today. Eighteen days ago will give us the range for tomorrow. Seventeen days ago gives us the range two days from now. You get the idea. I'm writing this December 6th, so today's cases give us the range for Christmas Eve. The chart below shows our predictions. It predicts 2,700 deaths for tomorrow and 3,200 deaths for Christmas Eve. If we add up each day, we end up with projections of 42,610 to 65,524 total deaths between now and Christmas, with 54,067 total deaths expected.
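The projection step can be sketched in a few lines. The case counts below are invented, the function name `project_deaths` is hypothetical, and the ratios are the rounded 1.35%, 1.72%, and 2% figures from earlier. The list is assumed to be oldest first and to end with today's count.

```python
def project_deaths(recent_cases, low_ratio, avg_ratio, high_ratio, lag=19):
    """Turn recent daily case counts into death ranges `lag` days later.

    `recent_cases` is oldest first and ends with today's count.
    Returns rows of (days_from_today, low, expected, high).
    """
    n = len(recent_cases)
    if n > lag + 1:
        raise ValueError("cases older than the lag project into the past")
    rows = []
    for i, cases in enumerate(recent_cases):
        days_ago = n - 1 - i
        days_ahead = lag - days_ago  # cases from 19 days ago map to today
        rows.append(
            (days_ahead, cases * low_ratio, cases * avg_ratio, cases * high_ratio)
        )
    return rows


# Invented case counts for the last three days, ending today.
toy_recent_cases = [150_000, 160_000, 180_000]
toy_rows = project_deaths(toy_recent_cases, 0.0135, 0.0172, 0.02)
for days_ahead, low, expected, high in toy_rows:
    print(f"+{days_ahead} days: {low:,.0f} to {high:,.0f} (expected {expected:,.0f})")

# Summing a column gives the total over the projected days.
toy_expected_total = sum(row[2] for row in toy_rows)
```

Feeding in a full window of real seven-day averages instead of the toy list, and summing each column, is how you would get totals like the ones quoted above.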

The unfortunate thing about our model is that it's based on events that have already occurred. The people in the projected death column are already sick. Barring some event that dramatically alters the course of the virus, these victims are already on their path. There is comfort in knowing that over 98% of the people who test positive today will survive, but that is small comfort for those whose loved ones are in the 2%.

We also need to recognize that--epidemiologically--these are big numbers. Influenza averages about 35,000 deaths per year. At the expected total above, we are about to go through roughly one and a half flu seasons' worth of deaths in less than three weeks. I am publishing this on the anniversary of Pearl Harbor, which killed 2,403 Americans. Covid is killing more Americans every day than died at Pearl Harbor. Covid is currently the third leading cause of death in the country.

Seeing the number of cases and calculating the cost two to three weeks down the road has been reduced to a simple multiplication problem. It should motivate each of us to take every precaution necessary to protect our friends and loved ones, and it should give us hope and satisfaction when the calculations show us turning the corner.