Introduction

Data

This section gives some basic information about the data sets used.

Data on Covid Metrics

There are several metrics one could look at. The most basic is “cases” (known positive test results), but we know this is an inaccurate and conservative estimate of true “infections”, since infections that are never tested go uncounted.

How inaccurate? That depends on the amount of testing being done. As testing increases, one would expect the case count to move closer to the true number of infections.

Note that cases(t) is noisy because of reporting and aggregation errors. The cases reported on day t could well represent a mix of positive results from tests taken on earlier days, such as day t-5.

At the end of the funnel are deaths. These counts are generally accurate, and their attribution to a date is also quite accurate.

For both cases and deaths, the cumulative total at time t is the sum of the daily increments from time 0 to t.

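In between, we have hospitalizations, which can be reported either as in.hospital (patients in hospital on each day) or as a total over time. Note that, unlike cases, total hospitalizations at time t is NOT equal to the sum of in.hospital from 0 to t, because a patient who stays several days is counted in in.hospital on each of those days. The same holds for ICU and ventilator numbers.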

date          2020-04-21  2020-04-22  2020-04-23  2020-04-24  2020-04-25  2020-04-26
state                 CA          CA          CA          CA          CA          CA
cases              33261       35396       37369       39254       41137       42164
tests             300100      465327      482097      494173      506035      526084
in.hosp             4886        4984        4929        4880        4847        4928
in.ICU              1502        1551        1531        1521        1458        1473
on.Vent               NA          NA          NA          NA          NA          NA
deaths              1268        1354        1469        1562        1651        1710
incr.cases          2283        2135        1973        1885        1883        1027
incr.deaths           60          86         115          93          89          59

Note that these data are at the state level, and the display above is a transpose of the data structure: in the underlying data each date is a row and each metric is a column.
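As a quick check of the relationships described above, the following minimal sketch rebuilds the cumulative series from the daily increments, using the California values shown in the table. The helper name `rebuild_cumulative` is made up for this illustration. The last line sums the daily in.hosp counts to show why that sum is not a count of total hospitalizations.

```python
from itertools import accumulate

# California sample from the table above (2020-04-21 to 2020-04-26).
cases = [33261, 35396, 37369, 39254, 41137, 42164]
incr_cases = [2283, 2135, 1973, 1885, 1883, 1027]
deaths = [1268, 1354, 1469, 1562, 1651, 1710]
incr_deaths = [60, 86, 115, 93, 89, 59]
in_hosp = [4886, 4984, 4929, 4880, 4847, 4928]


def rebuild_cumulative(first_total, increments):
    """Rebuild cumulative totals from the first day's total plus later increments.

    The first increment is skipped because it is already included in first_total.
    """
    return list(accumulate(increments[1:], initial=first_total))


# Both comparisons hold for this sample: cumulative(t) = cumulative(t-1) + increment(t).
cases_ok = rebuild_cumulative(cases[0], incr_cases) == cases
deaths_ok = rebuild_cumulative(deaths[0], incr_deaths) == deaths

# Summing a daily census counts each patient once per day spent in hospital,
# so the result is patient-days, not the number of people ever hospitalized.
patient_days = sum(in_hosp)

print(cases_ok, deaths_ok, patient_days)
```

The sum of in.hosp over these six days is a number of patient-days, which is much larger than any plausible count of distinct admissions over the same period.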

Data on NPIs

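Now let’s get data on some non-pharmaceutical interventions (NPIs). We’ll use the mobility data put out by Google. There are 6 columns of data, each giving the “% change from baseline” in the amount of mobile phone activity measured in one of 6 categories of place: retail and recreation, grocery and pharmacy, parks, transit stations, workplaces, and residential. Each day is compared with a baseline expectation for that day of the week, which Google defines as the median value over the five weeks from 3 January to 6 February 2020.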
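To make “% change from baseline” concrete, here is a minimal sketch of the arithmetic. Google publishes only the percentage, not the underlying activity levels, so the function name and the input levels below are hypothetical and serve only to show how a value such as -49 should be read.

```python
def pct_change_from_baseline(observed, baseline):
    """Percent change of an observed activity level relative to its baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return round(100 * (observed - baseline) / baseline)


# Hypothetical activity levels, chosen only to illustrate the arithmetic.
retail = pct_change_from_baseline(observed=51, baseline=100)
residential = pct_change_from_baseline(observed=121, baseline=100)

print(retail, residential)
```

A negative value means less activity than the baseline for that day, and a positive value means more.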

row                          65          66          67          68          69          70
state.name           California  California  California  California  California  California
date                 2020-04-29  2020-04-30  2020-05-01  2020-05-02  2020-05-03  2020-05-04
county                       ""          ""          ""          ""          ""          ""
retail.rec.chg              -49         -47         -47         -50         -50         -44
grocery.pharm.chg           -15         -12          -9          -8         -12         -10
parks.chg                   -27         -24         -23         -30         -21         -17
transit.st.chg              -50         -50         -48         -43         -45         -48
workplaces.chg              -50         -50         -48         -32         -32         -47
residential.chg              21          21          21          14          12          19
state                        CA          CA          CA          CA          CA          CA

In working with the Google Mobility data, which reports each day's change in a metric relative to a baseline at the county level, we assumed that rows with county="" represent the state as a whole. The example above is a transpose of a subset of the data.
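The county="" assumption amounts to a simple row filter. The sketch below illustrates it on a few records. The first two rows use values from the table above, the county-level row is hypothetical, and the function name `state_level` is invented for this example.

```python
rows = [
    {"state": "CA", "county": "", "date": "2020-04-29", "retail.rec.chg": -49},
    {"state": "CA", "county": "", "date": "2020-04-30", "retail.rec.chg": -47},
    # Hypothetical county-level row, included only to show what gets filtered out.
    {"state": "CA", "county": "Example County", "date": "2020-04-29", "retail.rec.chg": -40},
]


def state_level(records):
    """Keep only rows whose county field is empty (assumed to be state-wide data)."""
    return [r for r in records if not r["county"]]


state_rows = state_level(rows)
print(len(state_rows), [r["date"] for r in state_rows])
```

If the assumption were wrong, for example if an empty county marked a missing value instead of a state-wide aggregate, this filter would silently keep the wrong rows, so it is worth checking that each state has exactly one such row per date.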

Combining Metrics and NPI data

row                         153         154
date                 2020-08-03  2020-08-04
state.x                      CA          CA
cases                    514901      519427
tests                   8184696     8305713
in.hosp                    7629        7630
in.ICU                     2069        2082
on.Vent                      NA          NA
deaths                     9388        9501
incr.cases                 5739        4526
incr.deaths                  32         113
state.name           California  California
county                       ""          ""
retail.rec.chg              -26         -27
grocery.pharm.chg            -7          -6
parks.chg                    15          19
transit.st.chg              -40         -40
workplaces.chg              -43         -43
residential.chg              13          13
state.y                      CA          CA
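The combined table carries columns from both sources. The state.x and state.y names are what R's merge() produces when both inputs have a state column that is not part of the join key. The sketch below is not the code used here. It is an illustration in Python of the same kind of join, with the assumption that rows are matched on both state and date, which avoids the duplicated state column. The function name `merge_by_date` is invented, and the values come from the table above.

```python
metrics = [
    {"state": "CA", "date": "2020-08-03", "cases": 514901, "deaths": 9388},
    {"state": "CA", "date": "2020-08-04", "cases": 519427, "deaths": 9501},
]
mobility = [
    {"state": "CA", "date": "2020-08-03", "retail.rec.chg": -26, "workplaces.chg": -43},
    {"state": "CA", "date": "2020-08-04", "retail.rec.chg": -27, "workplaces.chg": -43},
]


def merge_by_date(metric_rows, mobility_rows):
    """Inner join of the two tables on the (state, date) pair."""
    mobility_by_key = {(r["state"], r["date"]): r for r in mobility_rows}
    merged = []
    for m in metric_rows:
        match = mobility_by_key.get((m["state"], m["date"]))
        if match is not None:  # rows without a partner are dropped
            merged.append({**match, **m})
    return merged


combined = merge_by_date(metrics, mobility)
print(combined[0]["cases"], combined[0]["retail.rec.chg"])
```

Because this is an inner join, dates present in only one source drop out of the combined data.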

Exploring the Data

For the states that have recorded more than 50,000 cases, what level of lockdown (or activity reduction) did they experience? The states are:

NY, NJ, MA, IL, CA, PA, MI, TX, FL, MD, GA, VA, NC, AZ, LA, OH, TN, SC, IN, WA, WI, MN, MS, MO, NV, CT
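Selecting these states is a threshold filter on the latest cumulative case count per state. The sketch below shows one way to do it. The CA figure is the 2020-08-04 value from the combined table, while the "XX" and "YY" entries are hypothetical placeholders, and the function name `states_over` is invented for this example.

```python
THRESHOLD = 50000

latest_cases = {
    "CA": 519427,  # from the combined table (2020-08-04)
    "XX": 12000,   # hypothetical placeholder state below the threshold
    "YY": 50000,   # hypothetical placeholder exactly at the threshold
}


def states_over(case_counts, threshold=THRESHOLD):
    """Return states with strictly more than `threshold` cases, largest first."""
    selected = [s for s, n in case_counts.items() if n > threshold]
    return sorted(selected, key=lambda s: case_counts[s], reverse=True)


selected_states = states_over(latest_cases)
print(selected_states)
```

The comparison is strict, matching “more than 50000”, so a state sitting exactly at the threshold is excluded.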

Changes in Transit Activity

Changes in Retail Activity

Changes in Workplace Activity