Evaluation of the Model

Model Assumptions:
Normality
lock2 %>%
  ggplot(aes(sample = pop_density)) +
  stat_qq() +
  stat_qq_line() +
  theme_classic()

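The continuous explanatory variable (pop_density) looks decidedly non-normal. Strictly speaking, the normality assumption applies to the residuals rather than to the predictors, but a heavily skewed predictor can produce high-leverage points. Let’s try taking its log and see if that improves the appearance.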

lock2 %>%
  ggplot(aes(sample = l_pop_density)) +
  stat_qq() +
  stat_qq_line() +
  theme_classic()

Well, that looks better.
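
The code that creates l_pop_density isn’t shown here. A minimal sketch, assuming it is simply the natural log of pop_density. The `toy` data frame and the `skew` helper are made up for illustration and only stand in for lock2:

```r
# Toy stand-in for lock2 (hypothetical values, not the real data)
toy <- data.frame(pop_density = c(1.3, 11, 56, 110, 420, 1200))

# Assumption: l_pop_density is the natural log of pop_density
toy$l_pop_density <- log(toy$pop_density)

# Rough skewness measure: values near 0 mean a more symmetric distribution
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew(toy$pop_density)    # long right tail before the transform
skew(toy$l_pop_density)  # closer to symmetric after the transform
```

With lock2 the equivalent step would presumably be `lock2 %>% mutate(l_pop_density = log(pop_density))`, provided no state has a density of zero.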

Run the model again with logs:

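# Refit with log population density; `0 +` suppresses the intercept
m2 <- lm(coefs ~ 0 + l_pop_density + school_close +
           mask_pub_require + grp_ban + travel_restrict +
           daycare_close + bar_rest_close + retail_close, data = lock2)
# Note: this prints the original model m1 (see the Call line below), not m2
summary(m1)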
## 
## Call:
## lm(formula = coefs ~ 0 + pop_density + school_close + mask_pub_require + 
##     grp_ban + travel_restrict + daycare_close + bar_rest_close + 
##     retail_close, data = lock2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.25995 -0.09416 -0.00639  0.03246  1.02656 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)   
## pop_density           0.0003293  0.0001171   2.811  0.00768 **
## school_closeno_order  0.0133458  0.2376669   0.056  0.95551   
## school_closeopen      0.0622903  0.2611046   0.239  0.81269   
## school_closepartial  -0.0663001  0.2576594  -0.257  0.79829   
## mask_pub_require1    -0.0106164  0.0730868  -0.145  0.88526   
## grp_ban>50           -0.0424904  0.1060837  -0.401  0.69095   
## grp_banall            0.0905460  0.0722371   1.253  0.21750   
## travel_restrict1     -0.0794097  0.0653034  -1.216  0.23129   
## daycare_close1        0.0301930  0.1088303   0.277  0.78291   
## bar_rest_close1       0.0371167  0.2723330   0.136  0.89229   
## retail_close1         0.0102528  0.1522711   0.067  0.94666   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1942 on 39 degrees of freedom
## Multiple R-squared:  0.4831, Adjusted R-squared:  0.3373 
## F-statistic: 3.313 on 11 and 39 DF,  p-value: 0.002745

Compare AICs:

AIC(m1, m2)
##    df        AIC
## m1 12 -10.405246
## m2 12  -9.394987

m1 has the slightly lower AIC (-10.41 versus -9.39), so we’ll go with that.
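
For readers who want to see where those numbers come from: AIC is -2 times the log-likelihood plus 2 times the number of estimated parameters, and the parameter count includes the residual variance (which is why 11 coefficients give df = 12 above). A small sketch on the built-in mtcars data, not the lockdown data. The models `fit_raw` and `fit_log` and the helper `aic_by_hand` are made up for illustration:

```r
# Illustration on built-in data; these are not the lockdown models
fit_raw <- lm(mpg ~ 0 + hp + factor(cyl), data = mtcars)
fit_log <- lm(mpg ~ 0 + log(hp) + factor(cyl), data = mtcars)

# AIC by hand: -2 * log-likelihood + 2 * number of estimated parameters
aic_by_hand <- function(fit) {
  ll <- logLik(fit)
  k  <- attr(ll, "df")  # coefficients plus the residual variance
  -2 * as.numeric(ll) + 2 * k
}

c(raw = aic_by_hand(fit_raw), log = aic_by_hand(fit_log))
AIC(fit_raw, fit_log)   # should agree with the hand calculation
```

Both models use the same response and the same rows, which is what makes their AIC values comparable. Lower is better.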

Heterogeneity

lock2 %>%
  mutate(residuals = resid(m1)) %>% 
  ggplot(aes(x = residuals)) +
    geom_histogram(binwidth = .08) +
    theme_classic()

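The residuals appear to be roughly normally distributed, apart from a few large positive outliers that will be addressed later. A histogram speaks to normality rather than to constant variance, so homogeneity is judged from the plot of residuals against fitted values: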
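
A histogram can be backed up with a formal test of the residuals. Here is a minimal sketch using base R’s shapiro.test() on simulated data. The data and the injected outlier are assumptions made for illustration, not results from m1:

```r
set.seed(1)

# Simulated regression data with one large positive outlier added on purpose
x <- runif(50)
y <- 2 * x + rnorm(50, sd = 0.2)
y[50] <- y[50] + 3

fit <- lm(y ~ x)

# Shapiro-Wilk test: a small p-value suggests the residuals are not normal
sw <- shapiro.test(resid(fit))
sw$statistic
sw$p.value
```

For the model above the call would be `shapiro.test(resid(m1))`. With outliers like New York and New Jersey in the data, a rejection would not be surprising, so the test is best read alongside the plot rather than instead of it.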

lock2 %>%
  ggplot(aes(x = fitted(m1), y = resid(m1))) +
  geom_point() +
  theme_classic() +
  geom_hline(yintercept = 0)

The spread of the residuals looks relatively even.
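
The visual check can be paired with a simple auxiliary regression: regress the squared residuals on the fitted values and see whether they are related. This is a rough, hand-rolled check in the spirit of the Breusch-Pagan test (Koenker’s studentized form), using the fitted values as the only auxiliary regressor. The helper `spread_check` and the simulated data are assumptions for illustration:

```r
set.seed(42)

# Simulated data whose noise grows with x (deliberately heteroscedastic)
n <- 200
x <- runif(n, 1, 10)
y_fan <- 1 + 0.5 * x + rnorm(n, sd = 0.3 * x)
fit_fan <- lm(y_fan ~ x)

# n * R^2 from the auxiliary regression, compared to chi-square with 1 df
spread_check <- function(fit) {
  e2  <- resid(fit)^2
  fv  <- fitted(fit)
  aux <- lm(e2 ~ fv)
  stat <- length(e2) * summary(aux)$r.squared
  c(statistic = stat, p_value = pchisq(stat, df = 1, lower.tail = FALSE))
}

out_fan <- spread_check(fit_fan)
out_fan
```

For the model above the call would be `spread_check(m1)`. A small p-value would suggest the spread is not even after all.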

Independence

lock2 %>% 
  ggplot(aes(x = coefs, y = residuals(m1))) +
  geom_point() +
  theme_classic() +
  geom_hline(yintercept = 0)

There doesn’t appear to be any clear pattern, so there is no obvious violation of independence.
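
One caveat when reading a plot of residuals against the observed outcome: in ordinary least squares the two are positively correlated by construction, while residuals are uncorrelated with the predictors and the fitted values. Plotting residuals against each predictor is therefore often more informative. A small sketch on simulated data (all names and values here are made up for illustration):

```r
set.seed(7)

# Simulated data from a correctly specified model with an intercept
x <- rnorm(100)
y <- 1 + 0.5 * x + rnorm(100)
fit <- lm(y ~ x)

# Residuals vs observed outcome: positive correlation by construction
cor_res_y <- cor(resid(fit), y)
cor_res_y
sqrt(1 - summary(fit)$r.squared)  # the same number

# Residuals vs predictor: zero up to floating-point error
cor_res_x <- cor(resid(fit), x)
cor_res_x
```

The identity in the middle holds for models fit with an intercept. The practical point carries over: a trend in residuals against the outcome is expected and is not by itself a sign of dependence.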

You probably noticed the outliers (New York and New Jersey). Here’s a more formal look at the Cook’s distances for the model (observation 32 is New York, 39 is New Jersey).

gg_cooksd(m1)
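
gg_cooksd() is not a base R function (it is assumed here to come from the lindia package). The same distances can be pulled out with base R’s cooks.distance(). A minimal sketch on the built-in mtcars data, using the common 4/n rule of thumb as a cutoff. The model and the cutoff are illustrative choices, not the ones behind the plot above:

```r
# Illustration on built-in data; this is not the lockdown model
fit <- lm(mpg ~ wt + hp, data = mtcars)

# One Cook's distance per observation
cd <- cooks.distance(fit)

# 4/n is a common rule of thumb, not a hard rule
threshold <- 4 / nobs(fit)
flagged <- cd[cd > threshold]

# Most influential observations first
sort(flagged, decreasing = TRUE)
```

For the model above, `sort(cooks.distance(m1), decreasing = TRUE)` would list the observations by influence, which makes it easy to confirm which rows correspond to New York and New Jersey.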

Our model appears to be a reasonable linear model that doesn’t greatly violate the assumptions of linear regression. I think it is important to leave the outliers in, since New York and New Jersey are important players in the story, and removing them would more likely result in an overfitted model. The adjusted R² is 0.34, so the model explains about 34% of the variation in the outcome, and other variables not measured here might produce a more accurate model. Because the model was fit without an intercept, R computes R² relative to zero rather than to the mean of the outcome, so this figure should be read with some caution. Population density was the only explanatory variable that was statistically significant (p = 0.008). It would be interesting to see whether variation in state requirements for vaccination and for universal testing had any influence on the outcome.

I chose deaths rather than cases as the outcome variable because the daily case count (the number of people testing positive) includes people who may have been infected even months in the past, people who will not have symptoms or get sick, and false positives. A death is a strictly binary event, so the death count avoids the problems encountered when using cases as the outcome.

This model tends to cast some doubt on the efficacy of mass lockdown measures in influencing the outcome of Covid-19 infections.