Introduction

Background

This section gives some basic information about the data set and the idea of mortality rate. Skip it to go directly to the RLS approach.

State-Level Data

As of today we have 4689 observations across 55 unique states (the count includes territories such as Guam), with 9 columns for each observation, ranging in date from 2020-01-21 to 2020-05-26.

A few sample observations are shown below.

date	state	fips	cases	deaths	incr.cases	incr.deaths	smooth.incr.cases	smooth.incr.deaths
2020-04-03	New Hampshire	33	540	7	61	2	54	1.33
2020-03-09	Vermont	50	1	0	0	0	NA	NA
2020-05-02	Montana	30	454	16	2	0	1	0.00
2020-05-02	Hawaii	15	611	16	1	0	2	0.00
2020-03-18	Illinois	17	286	1	127	0	42	0.33
2020-05-24	Connecticut	9	40468	3693	446	18	392	37.00
2020-03-24	Minnesota	27	262	1	27	0	31	0.00
2020-03-24	Texas	48	857	11	129	4	115	2.00
2020-01-30	California	6	2	0	0	0	NA	0.00
2020-04-15	Virginia	51	6499	195	329	41	410	18.00
2020-05-06	Texas	48	35441	985	1155	30	1062	31.67
2020-02-19	Massachusetts	25	1	0	0	0	0	0.00
2020-03-19	Georgia	13	282	10	89	7	40	3.00
2020-04-21	Georgia	13	19189	810	742	43	768	46.67
2020-04-13	Guam	66	719	6	3	1	74	0.67

Defining Mortality Rate

In theory, the mortality rate in any time period = (cumulative deaths)/(cumulative cases) during that period, assuming that we’re measuring both metrics (cases and deaths) correctly in each time period, and that we’re attributing each metric to the correct time period.

Cases are not really known

A naive approach for computing mortality rate is simply, on any given day, to compute (# deaths)/(# cases). Of course, these numbers will change every day – and in particular the ratio might change every day – so the answer will be a vector of numbers rather than a particular rate.

Notice that the mortality rate (visualized by state above) varies by state but is rather unstable over time. True, during a 2-month period one should expect some variation in mortality rate, based on levels of congestion in the health system, innovations in care, and other factors. However, such factors should cause a few discrete jumps, and a definition of mortality rate in terms of cumulative deaths and cases is simply not consistent with this expectation. Moreover, once an innovation or regime change occurs, its effect should be to move the mortality rate (perhaps over a period of a few days rather than just one) to a new stable level.

Therefore, it makes sense to compute mortality rate in terms of daily data, and then deal with the challenge of temporal variations in this rate. Our goal is to identify a definition and computational method that will minimize the noise or variation in mortality rate - that is, yield a stable metric as much as possible.
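The report does not say how the smooth.* columns were built, so the following Python sketch is only one plausible reading of "daily data": difference the cumulative counts to get daily increments, then take a trailing mean. The function names, the trailing alignment, and the window length of 3 are all assumptions.

```python
def daily_increments(cumulative):
    """Day-over-day change in a cumulative series; the first day has no predecessor."""
    return [None] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def trailing_mean(values, window=3):
    """Mean of the current value and the (window - 1) values before it.

    Returns None until a full window of known values is available.
    """
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            out.append(None)
        else:
            out.append(sum(chunk) / window)
    return out


# Cumulative Arizona deaths, 2020-03-27 through 2020-04-02, copied from this report.
az_deaths = [15, 15, 18, 20, 24, 29, 35]
incr = daily_increments(az_deaths)       # daily deaths; first entry is None
smooth = trailing_mean(incr, window=3)   # assumed window length
```

With window=3 the last four smoothed values agree, after rounding, with smooth.incr.deaths for Arizona on 2020-03-30 through 2020-04-02 in the table later in this report. The smoothing applied to cases may use a different window.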

Mortality Rate with Recursive Least Squares

We’ll use the quantreg and RLS packages for testing. The analysis uses the smoothed incremental (i.e., daily) cases and deaths, and is restricted to a few states for now. We’ll eliminate data for any (date, state) pair where cases = 0 or NA.

date	state	cases	deaths	smooth.incr.cases	smooth.incr.deaths	lead1.incr.deaths	lead5.incr.deaths	lead6.incr.deaths	lead9.incr.deaths
2020-03-27	Arizona	665	15	94	3.0	3.0	3.7	5.0	9.7
2020-03-28	Arizona	773	15	103	3.0	3.3	5.0	5.7	8.7
2020-03-29	Arizona	929	18	116	3.3	1.7	5.7	8.0	8.0
2020-03-30	Arizona	1169	20	140	1.7	3.0	8.0	9.7	5.3
2020-03-31	Arizona	1298	24	149	3.0	3.7	9.7	8.7	7.3
2020-04-01	Arizona	1413	29	151	3.7	5.0	8.7	8.0	6.7
2020-04-02	Arizona	1600	35	156	5.0	5.7	8.0	5.3	10.7
2020-04-03	Arizona	1769	41	166	5.7	8.0	5.3	7.3	9.3
2020-04-04	Arizona	2019	53	182	8.0	9.7	7.3	6.7	8.3
2020-04-05	Arizona	2269	64	183	9.7	8.7	6.7	10.7	6.3
2020-04-06	Arizona	2465	67	194	8.7	8.0	10.7	9.3	8.3
2020-04-07	Arizona	2575	77	194	8.0	5.3	9.3	8.3	10.0
2020-04-08	Arizona	2726	80	188	5.3	7.3	8.3	6.3	14.3
2020-04-09	Arizona	3018	89	208	7.3	6.7	6.3	8.3	13.0
2020-04-10	Arizona	3112	97	182	6.7	10.7	8.3	10.0	12.0

The data set df.RLS has 9 states (Arizona, California, Florida, Maryland, Michigan, Minnesota, New York, Texas, Washington). For deaths, we recognize that deaths on date t should really be attributed to the cases on date t-k, i.e., cases are a leading indicator of deaths.
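The lead columns in the table above can be read as the smoothed daily deaths shifted k days back, so that each row pairs cases on date t with deaths on date t+k. A minimal Python sketch of that shift follows; the helper name `lead` is hypothetical and is not taken from the report's own code.

```python
def lead(values, k):
    """Shift a series k steps toward the past.

    Element t of the result holds values[t + k]; the tail is padded with None.
    """
    return values[k:] + [None] * min(k, len(values))


# smooth.incr.deaths for Arizona, 2020-03-27 through 2020-04-10, from the table above.
az_smooth_deaths = [3.0, 3.0, 3.3, 1.7, 3.0, 3.7, 5.0, 5.7,
                    8.0, 9.7, 8.7, 8.0, 5.3, 7.3, 6.7]

lead1 = lead(az_smooth_deaths, 1)
lead5 = lead(az_smooth_deaths, 5)
lead6 = lead(az_smooth_deaths, 6)
lead9 = lead(az_smooth_deaths, 9)
```

For the first row (2020-03-27) this gives 3.0, 3.7, 5.0 and 9.7, the same values as lead1, lead5, lead6 and lead9 in the table.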

Let’s start by computing a naive mortality rate, defined simply as 100*deaths/cases for each state.
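As an illustration of this definition, here is a small Python sketch applied to four of the Arizona rows from the table above. The function name `naive_rate` is hypothetical, and returning None for zero cases is an assumption that mirrors the filtering described earlier.

```python
# (date, cumulative cases, cumulative deaths) for Arizona, copied from the table above.
az = [
    ("2020-03-27", 665, 15),
    ("2020-03-31", 1298, 24),
    ("2020-04-05", 2269, 64),
    ("2020-04-10", 3112, 97),
]


def naive_rate(cases, deaths):
    """Naive mortality rate in percent; None when there are no cases to divide by."""
    if not cases:
        return None
    return 100.0 * deaths / cases


rates = {date: naive_rate(c, d) for date, c, d in az}
for date, rate in rates.items():
    print(date, round(rate, 2))
```

For these four rows the ratio moves between roughly 1.8% and 3.1% within two weeks.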

The notable thing here is how unstable the mortality rate is over time – in other words, if you were to measure it on any particular day, the picture you get could be quite far from the truth, or from what you might see on another day.

Recursive Least Squares - Testing for One State

Recursive – Multi state

Log-Log RLS

Now let’s run a log-log recursive least squares regression. Let y represent “daily change in daily smoothed-lagged deaths”, and x represent “daily change in smoothed cases”. We believe the relationship between them is y = b*x, where b = mortality rate. Taking logs of both sides, we have log(y) = log(b) + log(x).

We’ll run the RLS on

log(y) ~ int + slope * log(x)

so that the “int” = log(b), hence b = exp(int). (The model y = b*x corresponds to a slope of 1; the regression leaves the slope free.) Here’s what we get.
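To make the recursion concrete, here is a minimal Python sketch of a textbook recursive least squares update with a forgetting factor, applied to the log-log model. It is not the RLS package used in this report; `rls_fit`, `forgetting`, and `p0` are hypothetical names, and the data are synthetic (deaths set to exactly 5% of cases) so that the expected answer is known.

```python
import math


def rls_fit(xs, ys, forgetting=1.0, p0=1e6):
    """Recursive least squares for ys ~ intercept + slope * xs.

    Returns the (intercept, slope) estimate after each observation.
    """
    theta = [0.0, 0.0]                   # [intercept, slope]
    P = [[p0, 0.0], [0.0, p0]]           # large p0 means a weak prior on theta
    path = []
    for x, y in zip(xs, ys):
        phi = [1.0, x]                   # regressor vector
        Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1],
                P[1][0] * phi[0] + P[1][1] * phi[1]]
        denom = forgetting + phi[0] * Pphi[0] + phi[1] * Pphi[1]
        K = [Pphi[0] / denom, Pphi[1] / denom]             # gain vector
        err = y - (theta[0] * phi[0] + theta[1] * phi[1])  # prediction error
        theta = [theta[0] + K[0] * err, theta[1] + K[1] * err]
        # P <- (P - K * phi^T P) / forgetting; phi^T P equals Pphi^T since P is symmetric
        P = [[(P[i][j] - K[i] * Pphi[j]) / forgetting for j in range(2)]
             for i in range(2)]
        path.append((theta[0], theta[1]))
    return path


# Synthetic check: y = 0.05 * x, so exp(intercept) should approach 0.05 and slope 1.
cases = [50.0, 80.0, 120.0, 200.0, 150.0, 300.0, 260.0, 400.0]
deaths = [0.05 * c for c in cases]
path = rls_fit([math.log(c) for c in cases], [math.log(d) for d in deaths])
intercept, slope = path[-1]
b_hat = math.exp(intercept)
```

Setting `forgetting` below 1 down-weights older observations, which is one way to let the estimated rate drift over time. The inputs must be strictly positive before taking logs, which is consistent with dropping rows where cases = 0 or NA.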

Covid19 Analysis using New York Times Data

HKB

5/24/2020

1-day gap from cases to deaths

5-day gap from cases to deaths

6-day gap from cases to deaths

9-day gap from cases to deaths