Introduction

Background

This section gives some basic information about the data set used and the idea of mortality rate. Skip it to go directly to the RLS approach.

State-Level Data

As of today we have 4,689 observations across 55 unique states (the count includes territories such as Guam), with 9 columns for each observation, ranging in date from 2020-01-21 to 2020-05-26.

A few sample observations are shown below.

date        state          fips  cases  deaths  incr.cases  incr.deaths  smooth.incr.cases  smooth.incr.deaths
2020-04-03  New Hampshire    33    540       7          61            2                 54                1.33
2020-03-09  Vermont          50      1       0           0            0                 NA                  NA
2020-05-02  Montana          30    454      16           2            0                  1                0.00
2020-05-02  Hawaii           15    611      16           1            0                  2                0.00
2020-03-18  Illinois         17    286       1         127            0                 42                0.33
2020-05-24  Connecticut       9  40468    3693         446           18                392               37.00
2020-03-24  Minnesota        27    262       1          27            0                 31                0.00
2020-03-24  Texas            48    857      11         129            4                115                2.00
2020-01-30  California        6      2       0           0            0                 NA                0.00
2020-04-15  Virginia         51   6499     195         329           41                410               18.00
2020-05-06  Texas            48  35441     985        1155           30               1062               31.67
2020-02-19  Massachusetts    25      1       0           0            0                  0                0.00
2020-03-19  Georgia          13    282      10          89            7                 40                3.00
2020-04-21  Georgia          13  19189     810         742           43                768               46.67
2020-04-13  Guam             66    719       6           3            1                 74                0.67

Defining Mortality Rate

In theory, the mortality rate in any time period is (cumulative deaths)/(cumulative cases) during that period, assuming that we measure both metrics (cases and deaths) correctly in each time period and attribute each to the correct time period.

  • The true number of cases is not really known; we only observe reported cases.

A naive approach to computing mortality rate is simply to compute (# deaths)/(# cases) on any given day. Of course, these numbers change every day, and in particular their ratio may change every day, so the result is a vector of daily values rather than a single rate.

Notice that the mortality rate (visualized by state above) varies by state and is rather unstable over time. True, over a 2-month period one should expect some variation in mortality rate, driven by levels of congestion in the health system, innovations in care, and other factors. However, such factors should cause a few discrete jumps: once an innovation or regime change occurs, its effect should be to move the mortality rate (perhaps over a few days rather than just one) to a new stable level. A definition of mortality rate in terms of cumulative deaths and cases is simply not consistent with this expectation, because each day’s cumulative ratio blends in all earlier days, so a change in the underlying rate shows up only as a slow drift rather than a move to a new level.

Therefore, it makes sense to compute mortality rate from daily data, and then deal with the challenge of temporal variation in this rate. Our goal is to identify a definition and computational method that minimize the noise or variation in mortality rate, that is, one that yields as stable a metric as possible.
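To make the argument concrete, here is a small Python sketch on synthetic data (not the COVID data set): 100 cases per day, with 2 deaths per day for 30 days and then 4 per day after a hypothetical regime change. The function names are illustrative, and deaths are attributed to same-day cases for simplicity.

```python
def cumulative_rate(daily_cases, daily_deaths):
    """Percent mortality on each day using running totals of cases and deaths."""
    rates, total_cases, total_deaths = [], 0, 0
    for c, d in zip(daily_cases, daily_deaths):
        total_cases += c
        total_deaths += d
        rates.append(100 * total_deaths / total_cases)
    return rates


def daily_rate(daily_cases, daily_deaths):
    """Percent mortality on each day using only that day's counts."""
    return [100 * d / c for c, d in zip(daily_cases, daily_deaths)]


# Synthetic series: the true rate jumps from 2% to 4% on day 30.
cases = [100] * 60
deaths = [2] * 30 + [4] * 30

cum = cumulative_rate(cases, deaths)
day = daily_rate(cases, deaths)
print(day[29], day[30])   # daily rate before and right after the change
print(cum[39], cum[59])   # cumulative rate 10 and 30 days after the change
```

The daily ratio moves to the new 4% level immediately, while 30 days after the change the cumulative ratio (3.0%) has covered only half the distance from 2% to 4%.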

Mortality Rate with Recursive Least Squares

We’ll use the quantreg and RLS packages for testing. The analysis uses the smoothed incremental (i.e., daily) cases and deaths and is restricted to a few states for now. We eliminate data for (date, state) pairs where cases are 0 or NA.
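A minimal Python sketch of the filtering step, assuming each (date, state) row is a dict keyed by the column names used here and that NA is represented as None or a float NaN. Which cases column the filter applies to is not stated; the default field="smooth.incr.cases" is an assumption (it is the column that later serves as a denominator), and the function name is hypothetical. The sample rows come from the first table.

```python
import math


def drop_zero_or_missing(rows, field="smooth.incr.cases"):
    """Keep only rows whose `field` is a usable, nonzero number.

    A value counts as missing if it is absent, None (R's NA), or a float NaN.
    """
    kept = []
    for row in rows:
        value = row.get(field)
        if value is None:
            continue
        if isinstance(value, float) and math.isnan(value):
            continue
        if value == 0:
            continue
        kept.append(row)
    return kept


rows = [
    {"date": "2020-03-09", "state": "Vermont", "smooth.incr.cases": None},
    {"date": "2020-03-24", "state": "Texas", "smooth.incr.cases": 115},
    {"date": "2020-02-19", "state": "Massachusetts", "smooth.incr.cases": 0},
]
print([r["state"] for r in drop_zero_or_missing(rows)])
```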

date        state    cases  deaths  smooth.incr.cases  smooth.incr.deaths  lead1.incr.deaths  lead5.incr.deaths  lead6.incr.deaths  lead9.incr.deaths
2020-03-27  Arizona    665      15                 94                 3.0                3.0                3.7                5.0                9.7
2020-03-28  Arizona    773      15                103                 3.0                3.3                5.0                5.7                8.7
2020-03-29  Arizona    929      18                116                 3.3                1.7                5.7                8.0                8.0
2020-03-30  Arizona   1169      20                140                 1.7                3.0                8.0                9.7                5.3
2020-03-31  Arizona   1298      24                149                 3.0                3.7                9.7                8.7                7.3
2020-04-01  Arizona   1413      29                151                 3.7                5.0                8.7                8.0                6.7
2020-04-02  Arizona   1600      35                156                 5.0                5.7                8.0                5.3               10.7
2020-04-03  Arizona   1769      41                166                 5.7                8.0                5.3                7.3                9.3
2020-04-04  Arizona   2019      53                182                 8.0                9.7                7.3                6.7                8.3
2020-04-05  Arizona   2269      64                183                 9.7                8.7                6.7               10.7                6.3
2020-04-06  Arizona   2465      67                194                 8.7                8.0               10.7                9.3                8.3
2020-04-07  Arizona   2575      77                194                 8.0                5.3                9.3                8.3               10.0
2020-04-08  Arizona   2726      80                188                 5.3                7.3                8.3                6.3               14.3
2020-04-09  Arizona   3018      89                208                 7.3                6.7                6.3                8.3               13.0
2020-04-10  Arizona   3112      97                182                 6.7               10.7                8.3               10.0               12.0
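The incremental and smoothed columns can be derived from the cumulative ones. The Python sketch below assumes that incr.* is the day-over-day difference and that smoothing is a trailing moving average; the function names are hypothetical. For deaths, a 3-day trailing mean reproduces the Arizona smooth.incr.deaths values for 2020-03-30 through 2020-04-01 in the table above, but the smoothing applied to cases is not stated and may differ.

```python
def increments(cumulative):
    """Daily changes of a cumulative series; the first day has no prior value."""
    return [None] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def trailing_mean(values, window=3):
    """Mean of the last `window` values; None until a full, complete window exists."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            out.append(None)
        else:
            out.append(sum(chunk) / window)
    return out


# Cumulative Arizona deaths, 2020-03-27 through 2020-04-01, from the table above.
deaths = [15, 15, 18, 20, 24, 29]
daily = increments(deaths)
smooth = trailing_mean(daily)
print(daily)
print([None if s is None else round(s, 1) for s in smooth])
```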

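The data set df.RLS has 9 states (Arizona, California, Florida, Maryland, Michigan, Minnesota, New York, Texas, Washington). For deaths, we recognize that deaths on date t should really be attributed to the cases on date t-k, i.e., cases are a leading indicator of deaths. The leadk.incr.deaths columns hold, for each date t, the smoothed daily deaths on date t+k; for example, lead1.incr.deaths on 2020-03-28 (3.3) equals smooth.incr.deaths on 2020-03-29.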
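A Python sketch of how the leadk.incr.deaths columns relate to smooth.incr.deaths, assuming each state's rows are sorted by date with one row per day (the function name lead is hypothetical). On the Arizona sample it reproduces the lead columns shown in the table.

```python
def lead(values, k):
    """Shift a series k steps forward in time: out[t] = values[t + k].

    Positions past the end of the series become None. Apply this to one
    state's date-sorted series at a time so values never leak across states.
    """
    return [values[t + k] if t + k < len(values) else None for t in range(len(values))]


# Arizona smooth.incr.deaths, 2020-03-27 through 2020-04-10, from the table above.
smooth_deaths = [3.0, 3.0, 3.3, 1.7, 3.0, 3.7, 5.0, 5.7,
                 8.0, 9.7, 8.7, 8.0, 5.3, 7.3, 6.7]
lead1 = lead(smooth_deaths, 1)
lead9 = lead(smooth_deaths, 9)
print(lead1[:3])   # compare with lead1.incr.deaths for 2020-03-27..29
print(lead9[:6])   # compare with lead9.incr.deaths for 2020-03-27..04-01
```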

Let’s start by computing a naive mortality rate, defined simply as 100*deaths/cases for each state.

The notable thing here is how unstable the mortality rate is over time: if you were to measure it on any particular day, the picture you get could be quite far from the truth, or from what you might see on another day.
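The naive rate can be sketched in Python on the first six Arizona rows of the table. This assumes cases and deaths are the cumulative columns for one state, sorted by date; returning None when cases are 0 is a choice made here, not something the analysis specifies. Even over these six days the rate moves between roughly 1.7% and 2.3%.

```python
def naive_rate(cases, deaths):
    """100 * cumulative deaths / cumulative cases for each date (None if no cases)."""
    return [100 * d / c if c else None for c, d in zip(cases, deaths)]


# Cumulative Arizona counts, 2020-03-27 through 2020-04-01, from the table above.
cases = [665, 773, 929, 1169, 1298, 1413]
deaths = [15, 15, 18, 20, 24, 29]
print([round(r, 2) for r in naive_rate(cases, deaths)])
```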

Recursive Least Squares - Testing for One State
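The exact RLS setup is not shown here. As one plausible version, this is a minimal scalar recursive least squares sketch in Python for the no-intercept model leadk.incr.deaths ≈ b · smooth.incr.cases for one state, where b plays the role of the mortality rate. It is not the RLS package's implementation; the function name and the forgetting factor lam (not mentioned in the analysis) are assumptions, and the check below uses synthetic data.

```python
def rls_proportional(xs, ys, lam=1.0, delta=1e6):
    """Recursive least squares for the no-intercept model y = b * x.

    Returns the running estimate of b after each usable (x, y) pair.
    lam is a forgetting factor: 1.0 weights all past days equally, values
    below 1 discount older days. delta is the initial variance of b
    (a large value means a weak prior around b = 0).
    """
    b, P = 0.0, delta
    estimates = []
    for x, y in zip(xs, ys):
        if x is None or y is None:
            continue  # e.g. lead values past the end of the series
        gain = P * x / (lam + x * P * x)
        b += gain * (y - b * x)          # correct b by the prediction error
        P = (P - gain * x * P) / lam     # update the variance of b
        estimates.append(b)
    return estimates


# Synthetic check: 100 cases/day, deaths jump from 2 to 4 per day on day 20.
cases = [100] * 40
deaths = [2] * 20 + [4] * 20
plain = rls_proportional(cases, deaths)             # lam = 1
forgetting = rls_proportional(cases, deaths, 0.8)   # discounts older days
print(100 * plain[-1], 100 * forgetting[-1])
```

With lam=1 the final estimate is the pooled 3% over all 40 days, while lam=0.8 ends close to the new 4% level. A forgetting factor below 1 is one way to let the estimate settle at a new stable level after a regime change, which is the behavior the Background section argues a mortality rate should show.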

Recursive Least Squares - Multiple States

Log-Log RLS

Now let’s run a log-log recursive least squares regression. Let y represent the smoothed daily deaths k days later (a leadk.incr.deaths column), and x represent the smoothed daily cases (smooth.incr.cases). We believe the relationship between them is y = b*x, where b = mortality rate. Taking logs of both sides, we have log(y) = log(b) + log(x).

We’ll run the RLS on

log(y) ~ int + slope * log(x)

so that “int” = log(b), and hence b = exp(int). If the proportional relationship y = b*x holds, the estimated slope should be close to 1. Here’s what we get.

1-day gap from cases to deaths

5-day gap from cases to deaths

6-day gap from cases to deaths

9-day gap from cases to deaths
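The log-log regression above can be sketched in Python as a two-parameter recursive least squares on the regressors (1, log x). This is an illustration under assumptions, not the quantreg or RLS package implementation; the function name rls_loglog and the lam/delta parameters are hypothetical. Pairs with a missing or nonpositive value are skipped because the log is undefined there (smoothed deaths of 0.00 do occur in the data). The synthetic check uses y = 0.03 * x exactly, so int should approach log(0.03) and slope should approach 1.

```python
import math


def rls_loglog(xs, ys, lam=1.0, delta=1e6):
    """Recursive least squares for log(y) = int + slope * log(x).

    Returns a list of (int, slope) estimates, one per usable observation.
    Pairs with a missing or nonpositive x or y are skipped (log undefined).
    """
    theta = [0.0, 0.0]                  # current [int, slope]
    P = [[delta, 0.0], [0.0, delta]]    # 2x2 covariance; large = weak prior
    history = []
    for x, y in zip(xs, ys):
        if x is None or y is None or x <= 0 or y <= 0:
            continue
        phi = [1.0, math.log(x)]        # regressors: intercept term and log(x)
        target = math.log(y)
        Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1],
                P[1][0] * phi[0] + P[1][1] * phi[1]]
        denom = lam + phi[0] * Pphi[0] + phi[1] * Pphi[1]
        gain = [Pphi[0] / denom, Pphi[1] / denom]
        error = target - (theta[0] * phi[0] + theta[1] * phi[1])
        theta = [theta[0] + gain[0] * error, theta[1] + gain[1] * error]
        # P <- (P - gain * phi^T P) / lam; phi^T P equals Pphi because P is symmetric.
        P = [[(P[i][j] - gain[i] * Pphi[j]) / lam for j in range(2)]
             for i in range(2)]
        history.append((theta[0], theta[1]))
    return history


xs = [50, 80, 120, 200, 310]          # synthetic daily cases
ys = [0.03 * x for x in xs]           # deaths at an exact 3% rate
intercept, slope = rls_loglog(xs, ys)[-1]
print(100 * math.exp(intercept), slope)
```

For a real state, xs would be smooth.incr.cases and ys one of the leadk.incr.deaths columns; the running mortality rate in percent is then 100 * exp(int) for each entry of the returned history.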