Load packages
Import data
Lawsuits data
State lawsuits data came from HAK. StateCases.csv was prepared in cleanUp.Rmd.
Excess deaths by the Economist
It was imported from the sars2pack package.
State Government responses to COVID
It was imported from the sars2pack package.
Prepare data for regression
- Removed the policy_binary dataset. Some policies were enacted in almost all states, so when the data are broken down by individual lawsuit type (NOS), such a policy becomes a near-perfect predictor. For example, for the business-closure NOS, all but one state had a Gather policy.
- Removed Habeas-related NOS. These are suits by prisoners seeking release from prison and are irrelevant to this study.
The policy_mandate dataset is from the University of Washington and has:
- 17 different policies
- 50 states and DC
- Not every state has all 17 policies as mandates. Alaska has the most with 14, while South Dakota has the fewest with 4.
- Some policies are mandates (499) and others are recommendations (95)
The policy_mandate dataset is from the University of Washington and has:
- 17 columns: 16 binary policy variables plus state
- only 16 policies (out of 17 in total), because one policy was dropped: no state mandated it
- 51 rows: 50 states plus DC
The COVID_deaths_us dataset is from the Economist and has:
- 53 rows: 50 states, DC, New York City, and the United States
- 2 columns: region (states and others) and COVID_deaths
- COVID_deaths was calculated as (total_deaths - expected_deaths)/10000, i.e., excess deaths in units of 10,000 deaths
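The COVID_deaths formula can be made concrete with a minimal R sketch. The region names and death counts below are made up for illustration. Only the formula itself comes from the description above.

```r
# Hypothetical death counts; only the formula comes from the report
excess <- data.frame(
  region = c("State A", "State B"),
  total_deaths = c(120000, 45000),
  expected_deaths = c(90000, 40000)
)

# COVID_deaths: excess deaths expressed in units of 10,000 deaths
excess$COVID_deaths <- (excess$total_deaths - excess$expected_deaths) / 10000

print(excess[, c("region", "COVID_deaths")])
```

A value of 3 therefore means 30,000 more deaths than expected. That is the unit in which the regression coefficients on COVID_deaths are read.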
The state_stats dataset has:
- 51 rows: 50 states plus DC
- 24 columns: 21 variables of characteristics plus state, abbr, fips
- Removed abbr and fips because they are perfectly correlated with state and would cause problems in the regression. Also removed pop2000 because it is highly correlated with pop2010.
- Missing values were imputed using the KNN method. Five variables (murder, robbery, agg_assault, larceny, and motor_theft) each had 3 missing values, all for the same states (Kansas, Kentucky, and Montana).
- After preprocessing, state_stats has only 17 variables: 16 characteristic variables plus state.
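The report does not show which KNN implementation was used. The base R sketch below illustrates the general idea under a few stated assumptions:
- Columns are standardized first.
- The distance between two states uses only the variables both have observed.
- A missing value is replaced by the mean of the k nearest states that do have that variable.

The function name knn_impute and the toy data are hypothetical. A packaged implementation may scale, weight, or choose k differently.

```r
# Hypothetical helper: k-nearest-neighbour imputation for a numeric data frame
knn_impute <- function(df, k = 3) {
  x <- as.matrix(df)
  z <- scale(x)                 # standardise columns; NAs are ignored by scale()
  out <- x
  for (i in which(rowSums(is.na(x)) > 0)) {
    donors <- setdiff(seq_len(nrow(x)), i)
    # RMS distance over the standardised columns observed in both rows
    d <- vapply(donors, function(j) {
      shared <- !is.na(z[i, ]) & !is.na(z[j, ])
      if (!any(shared)) return(Inf)
      sqrt(mean((z[i, shared] - z[j, shared])^2))
    }, numeric(1))
    for (col in which(is.na(x[i, ]))) {
      has_value <- !is.na(x[donors, col])
      cand <- donors[has_value]
      # the k closest donors that actually observed this variable
      nearest <- cand[order(d[has_value])][seq_len(min(k, length(cand)))]
      out[i, col] <- mean(x[nearest, col])
    }
  }
  as.data.frame(out)
}

# Toy example: row 3 is missing murder; its two nearest rows by pop are 2 and 1
stats <- data.frame(
  pop = c(1, 2, 3, 10, 11),
  murder = c(1, 2, NA, 8, 9)
)
imputed <- knn_impute(stats, k = 2)
print(imputed)
```

To apply this to state_stats, the state column would have to be set aside first, because the sketch expects only numeric columns.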
The data_NOS dataset has:
- 51 rows and 27 columns after removing Habeas-related NOS
- The original dataset had 53 rows: 50 states, DC, the US Virgin Islands, and NA
- Values in NOS variables represent the number of lawsuits
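The data_NOS cleanup can be sketched in base R. This assumes a wide table with a state column and one column per NOS label; the miniature table and its values are hypothetical. The sketch keeps only the 50 states plus DC, using base R's built-in state.name vector. It then drops every column whose label contains "Habeas".

```r
# Hypothetical miniature of data_NOS: one row per jurisdiction, one column per NOS
data_NOS <- data.frame(
  state = c("Alabama", "Alaska", "US Virgin Islands", NA),
  `Business Closur` = c(3, 1, 0, 2),
  `Habeas/Confinem` = c(40, 5, 1, 0),
  `None; Habeas/Con` = c(12, 2, 0, 1),
  check.names = FALSE
)

# Keep only the 50 states plus DC (state.name ships with base R); NA is dropped too
keep_rows <- data_NOS$state %in% c(state.name, "District of Columbia")
# Drop every NOS column whose label mentions Habeas
keep_cols <- !grepl("Habeas", names(data_NOS), fixed = TRUE)

data_NOS <- data_NOS[keep_rows, keep_cols]
print(data_NOS)
```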
The NOS_stateFacts dataset after merging has:
- 51 rows: 50 states and DC
- 44 columns: 16 variables on state characteristics from state_stats, 27 NOS variables from data_NOS, and def_name.
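The merge step could look like the following base R sketch. It assumes both tables share a state column spelled identically. The toy columns and values are hypothetical. def_name is omitted because the text does not say where it comes from.

```r
# Hypothetical inputs sharing a `state` key
state_stats <- data.frame(
  state = c("Alabama", "Alaska"),
  pop2010 = c(100, 200)          # placeholder values, not census figures
)
data_NOS <- data.frame(
  state = c("Alaska", "Alabama"),
  `Business Closur` = c(1, 3),
  check.names = FALSE
)

# Inner join on state: rows are matched by name, not by position
NOS_stateFacts <- merge(state_stats, data_NOS, by = "state")
print(NOS_stateFacts)
```

Joining by name rather than by position matters here because the two inputs are not guaranteed to list states in the same order.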
Explore data
lawsuits

- Outliers are present in some NOS: 1) None; Habeas/Con, 2) Habeas/Confinem

Removing Habeas-related NOS reveals:
- Some NOS have positively skewed distributions with a long right tail: e.g., Business Closure, Voting; Civil Rights.
- Unlawful Termination has a bimodal distribution.
- Some NOS have an approximately normal distribution.
- Not every NOS has a distribution displayed, because some have too few observations (e.g., a single suit).
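Skewness can also be checked numerically. Here is a minimal sketch of the moment-based sample skewness in base R. Positive values indicate a long right tail like the one described for Business Closure. The function name and example counts are illustrative, not taken from the report.

```r
# Moment-based sample skewness: mean cubed deviation / (mean squared deviation)^1.5
skewness <- function(x) {
  x <- x[!is.na(x)]
  dev <- x - mean(x)
  mean(dev^3) / mean(dev^2)^1.5
}

# Hypothetical lawsuit counts with one large value, i.e. a long right tail
right_tailed <- c(1, 1, 1, 2, 10)
print(skewness(right_tailed))   # positive
print(skewness(c(1, 2, 3)))     # 0 for a symmetric sample
```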




Scatterplot of lawsuits and excess deaths after controlling for population
Government responses to COVID
- There are 17 different state policies.
- Not every state enacted all 17 government policies. For example, Alaska tops the list with 16 policies. New York has 12.




- All 50 states and the District of Columbia mandated two restrictions, RestaurantRestrict and EmergDec, which appear at the top of the plot.
- Only a few states mandated travel-related restrictions.
- For documentation of the state policies, see https://github.com/COVID19StatePolicy/SocialDistancing/
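Counts such as "Alaska has the most mandated policies" and "all states mandated EmergDec" can be computed from row and column sums of a binary state-by-policy table. The sketch below assumes a policy_mandate-style data frame with a state column and one 0/1 column per policy. The three states and their 0/1 values are hypothetical, not the real data. It is also an assumption that the regression's n_policies was built this way.

```r
# Hypothetical 0/1 mandate table (1 = mandated); values are illustrative only
policy_mandate <- data.frame(
  state = c("Alaska", "New York", "South Dakota"),
  EmergDec = c(1, 1, 1),
  RestaurantRestrict = c(1, 1, 1),
  CaseIsolation = c(1, 0, 0)
)
policy_cols <- setdiff(names(policy_mandate), "state")

# Number of mandated policies per state (a candidate n_policies predictor)
n_policies <- rowSums(policy_mandate[, policy_cols])
names(n_policies) <- policy_mandate$state

# Number of states mandating each policy, most common first
states_per_policy <- sort(colSums(policy_mandate[, policy_cols]), decreasing = TRUE)

print(n_policies)
print(states_per_policy)
```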
Are any particular state COVID policies associated with any individual type of lawsuits?

- Points in the plot represent 50 states plus D.C.
- The plot is generated from a dataset that covers 50 states plus D.C., 18 different state policy types, and 25 different NOS.
- The plot shows no states that did not mandate EmergDec and RestaurantRestrict.
- The most interesting case is CaseIsolation (case-based isolation orders). There is a sharp divide between states that mandated the restriction and those that didn't. The typical state that mandated it had only about 2 Business Closur lawsuits, while the typical state that didn't had nearly 10.

- In the case of Voting;Civil Ri, no state policy stands out as an important factor in determining the number of lawsuits.

- In the case of Other;Civil Rig, no state policy stands out as an important factor in determining the number of lawsuits.
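The "typical state" comparison for CaseIsolation can be read as a comparison of group medians, though using the median is an assumption. The sketch below uses base R's tapply() to compute the median number of Business Closur lawsuits by mandate status. The six states and their counts are hypothetical.

```r
# Hypothetical data for six states: CaseIsolation mandate (1/0) and suit counts
cases <- data.frame(
  CaseIsolation = c(1, 1, 1, 0, 0, 0),
  business_closure_suits = c(1, 2, 3, 8, 10, 12)
)

# Median suits per group: the "typical state" with and without the mandate
med <- tapply(cases$business_closure_suits, cases$CaseIsolation, median)
print(med)
```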
Regression
Aggregate data for regression
Correlation
Highly correlated predictors shouldn’t be included together in the same regression. Pay particular attention to COVID_deaths as it appears to be the most powerful predictor.

Correlation: zooming in on crime variables
- Not surprisingly, crime variables are highly correlated among themselves.
- They are also correlated with tr_deaths_no_alc, one of our predictors. This means tr_deaths_no_alc should be removed if a crime variable is included in the regression model.
- Crime variables tend to show a negative association with lawsuits. However, no crime variable is more strongly associated with lawsuits than tr_deaths_no_alc, so replacing the traffic deaths variable with a crime variable is not expected to improve model performance.
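One way to screen for predictors that shouldn't enter the same regression is to compute the correlation matrix and list every pair above a cutoff. This is a minimal base R sketch. The 0.8 cutoff is an assumption, and the six-state values are hypothetical, chosen so that murder tracks tr_deaths_no_alc closely.

```r
# Hypothetical values for six states (not the real data)
predictors <- data.frame(
  tr_deaths_no_alc = c(1, 2, 3, 4, 5, 6),
  murder = c(1.1, 2.1, 2.9, 4.2, 5.0, 6.1),
  COVID_deaths = c(3, 1, 4, 1, 5, 9)
)

cm <- cor(predictors)

# Flag each pair once (upper triangle) whose |r| exceeds an assumed cutoff of 0.8
idx <- which(abs(cm) > 0.8 & upper.tri(cm), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r = cm[idx]
)
print(high_pairs)
```

Only the upper triangle is scanned, so each pair is reported once and the diagonal (every variable with itself) is skipped.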
Regression model by all lawsuits
# run regression
model_linear <- lm(suits_perState ~ COVID_deaths+n_policies, data = regData)
# examine results
glance(model_linear)
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.451 0.428 40.8 19.7 5.58e-7 2 -260. 528. 536.
## # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
tidy(model_linear)
## # A tibble: 3 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -53.9 50.8 -1.06 0.294
## 2 COVID_deaths 21.7 3.48 6.23 0.000000111
## 3 n_policies 3.98 4.39 0.906 0.370
- This linear model had the best fit among the specifications tried. Other combinations of variables and log transformations did not produce a better fit.
- The linear regression model was built to explain the number of COVID-19-related lawsuits across the 50 states plus DC. In other words, the data have only 51 rows.
- Of the two predictors, only COVID_deaths is significant, with a p-value of about 1.1e-7, far below 1%. The number of state government COVID policies (n_policies) is not significant even at 10% (p = 0.370).
- COVID_deaths has the expected positive sign. COVID-19 lawsuits against a state increase by an average of about 21.7 cases per 10,000 excess deaths in the state.
- The estimated coefficient on the number of state government policies is 3.98 lawsuits per policy. However, its standard error (4.39) exceeds the estimate, so with only 51 observations the effect cannot be distinguished from zero.
- The linear model explains 45.1% of the variation in the number of lawsuits by state (adjusted R-squared = 0.428).
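To show how the coefficients translate into lawsuit counts, here is a small worked example in R. It plugs the rounded point estimates from the tidy(model_linear) output above into the fitted equation for a hypothetical state. The helper name predict_suits and the input values are illustrative assumptions.

```r
# Point estimates copied from tidy(model_linear) above (rounded as printed)
b <- c(intercept = -53.9, COVID_deaths = 21.7, n_policies = 3.98)

# Hypothetical helper: fitted number of lawsuits for one state
predict_suits <- function(covid_deaths, n_policies) {
  b[["intercept"]] + b[["COVID_deaths"]] * covid_deaths + b[["n_policies"]] * n_policies
}

# A hypothetical state with 50,000 excess deaths (COVID_deaths = 5) and 10 policies
print(round(predict_suits(5, 10), 1))   # 94.4
# One more unit of COVID_deaths (10,000 excess deaths) adds the slope, 21.7
print(round(predict_suits(6, 10) - predict_suits(5, 10), 1))
```

With the fitted object available, `predict(model_linear, newdata = data.frame(COVID_deaths = 5, n_policies = 10))` would give the same answer up to rounding of the coefficients.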
Classification model by all lawsuits
A classification model was not fitted because every state had at least one COVID-related lawsuit filed against it. A classification model requires a dependent variable with binary outcomes (here, sued vs. not sued), and this outcome has no variation across states.
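A quick check of whether a binary outcome has any variation is to tabulate it with both levels kept. The sketch below uses hypothetical counts for five states. An empty FALSE class is exactly the situation that rules out a classification model.

```r
# Hypothetical lawsuit counts for five states; every state has at least one suit
suits_perState <- c(12, 3, 45, 1, 7)

# A binary "sued" outcome, keeping both levels so empty classes stay visible
sued <- factor(suits_perState > 0, levels = c(FALSE, TRUE))
print(table(sued))   # 0 FALSE, 5 TRUE
```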
Regression of lawsuits by NOS


Major findings from the preliminary linear regressions
- For some NOS, the model explains essentially none of the variation in lawsuits. This is not surprising because these NOS have only a few cases.
- In only two NOS, 1) Other;Labor & E and 2) Business Closur, did these variables explain more than 50% of the variation in the number of state lawsuits.
- COVID_deaths, (total deaths - expected deaths)/10,000, is significant at 10% in most NOS and has a positive sign. For example, Business Closur lawsuits against the state increased by nearly 2 cases per 10,000 excess deaths.
- Interestingly, COVID_deaths isn't significant for two NOS: 1) Insurance and 2) Price gouging; C.
- Instead, Insurance and Price gouging; C lawsuits were explained by the number of state government COVID responses, which is significant at 10% and has the expected sign. Insurance lawsuits against the state increased by 1.7 cases per 10 state government restrictions.
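The report's per-NOS regression code is not shown. The sketch below assumes each NOS count is regressed separately on the same two predictors as the aggregate model. It then collects R-squared and the two p-values into one table, which is the information the findings above draw on. The six-state data are hypothetical.

```r
# Hypothetical data for six states: two predictors and two NOS lawsuit counts
nos_data <- data.frame(
  COVID_deaths = c(0.5, 1, 2, 3, 4, 6),
  n_policies = c(8, 10, 9, 12, 11, 13),
  `Business Closur` = c(1, 2, 4, 5, 8, 11),
  `Price gouging; C` = c(0, 1, 0, 2, 1, 3),
  check.names = FALSE
)
nos_cols <- c("Business Closur", "Price gouging; C")

# Fit one linear model per NOS and collect fit and significance measures
fit_one <- function(nos) {
  # Backticks let the formula use NOS labels containing spaces or semicolons
  f <- as.formula(paste0("`", nos, "` ~ COVID_deaths + n_policies"))
  s <- summary(lm(f, data = nos_data))
  data.frame(
    NOS = nos,
    r.squared = s$r.squared,
    p_COVID_deaths = s$coefficients["COVID_deaths", "Pr(>|t|)"],
    p_n_policies = s$coefficients["n_policies", "Pr(>|t|)"]
  )
}

results <- do.call(rbind, lapply(nos_cols, fit_one))
print(results)
```

Building the formula from a string keeps one code path for all NOS columns. With only 51 states, though, NOS that have a handful of suits will yield unstable estimates, as noted above.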