Analyze state lawsuits

Load packages
Import data
Preprep data for regression
Explore data
- lawsuits
- Government responses to COVID
Regression

Load packages

Import data

Lawsuits data

State lawsuits data came from HAK. StateCases.csv was prepared in cleanUp.Rmd.

Excess deaths by the Economist

It was imported from the sars2pack package.

State Government responses to COVID

It was imported from the sars2pack package.

Preprep data for regression

Removed the policy_binary dataset because some policies were enacted in almost all states and, when viewed by individual type of lawsuit, such a policy comes close to being a perfect predictor (though not literally one). For example, for the business closure NOS, all but one state had Gather.
Removed Habeas-related NOS. These are prisoners suing to get out of prison and are irrelevant to this study.

policy_mandate dataset is from the University of Washington and has

17 different policies
50 states and DC
Not all states have all 17 policies with a mandate. Alaska has the most with 14, while South Dakota has the fewest with 4.
Some policies are a mandate (499) and others are a recommendation (95)
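
The per-state counts above can be reproduced with a simple tabulation. The sketch below is illustrative only: the column names state, policy, and mandate and all the rows are assumptions, not the actual layout of the University of Washington data.

```r
# Hypothetical long-format toy data: one row per state-policy pair.
policy_mandate <- data.frame(
  state   = c("Alaska", "Alaska", "Alaska", "South Dakota", "South Dakota"),
  policy  = c("EmergDec", "RestaurantRestrict", "CaseIsolation",
              "EmergDec", "RestaurantRestrict"),
  mandate = c(TRUE, TRUE, TRUE, TRUE, FALSE)  # FALSE = recommendation only
)

# How many rows are mandates versus recommendations.
mandate_counts <- table(policy_mandate$mandate)

# Number of mandated policies per state, largest first.
mandated   <- policy_mandate[policy_mandate$mandate, ]
n_mandated <- sort(table(mandated$state), decreasing = TRUE)
print(n_mandated)
```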

policy_binary dataset is from the University of Washington and has

17 columns: 16 policy binary variables plus state
Only 16 of the 17 policies appear because one policy was dropped: it is not mandated in any state
51 rows: 50 states plus DC

COVID_deaths_us dataset is from the Economist and has:

53 rows: 50 states, DC, New York City, and the United States
2 columns: region (states and others) and COVID_deaths
COVID_deaths was calculated by (total_deaths - expected_deaths)/10000
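
As a minimal sketch of the calculation above, in base R. The two rows of toy data are invented for illustration. Only the names region, total_deaths, expected_deaths, and COVID_deaths come from the text.

```r
# Hypothetical toy data; the real values come from the Economist dataset.
deaths <- data.frame(
  region          = c("State A", "State B"),
  total_deaths    = c(65000, 20000),
  expected_deaths = c(50000, 19000)
)

# Excess deaths expressed in units of 10,000 deaths.
deaths$COVID_deaths <- (deaths$total_deaths - deaths$expected_deaths) / 10000

# Keep only the two columns described for COVID_deaths_us.
COVID_deaths_us <- deaths[, c("region", "COVID_deaths")]
print(COVID_deaths_us)
```

Because of the division by 10,000, a regression coefficient on COVID_deaths reads as a change per 10,000 excess deaths.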

state_stats dataset has:

51 rows: 50 states plus DC
24 columns: 21 variables of characteristics plus state, abbr, fips
Removed abbr and fips. They would cause problems in regression as they are perfectly correlated with state. Removed pop2000 as well because it is correlated with pop2010.
Missing values were imputed using the KNN method. Five variables (murder, robbery, agg_assault, larceny, and motor_theft) each had 3 missing values, for the same three states (Kansas, Kentucky, and Montana).
After preprocessing, state_stats has only 17 variables: 16 characteristics variables plus state.
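
To illustrate what KNN imputation does, here is a small base R sketch. It is not the routine used in this analysis, which most likely came from a package. The helper knn_impute, the choice of k = 2, and the toy values are all assumptions made for the example.

```r
# Minimal k-nearest-neighbour imputation for an all-numeric data frame.
# Hypothetical helper written for illustration only.
knn_impute <- function(df, k = 2) {
  complete_cols <- names(df)[colSums(is.na(df)) == 0]
  # Distances between rows, using only the standardized, fully observed columns.
  d <- as.matrix(dist(scale(df[, complete_cols, drop = FALSE])))
  for (col in setdiff(names(df), complete_cols)) {
    donors <- which(!is.na(df[[col]]))          # rows with an observed value
    for (i in which(is.na(df[[col]]))) {
      nearest <- donors[order(d[i, donors])][seq_len(min(k, length(donors)))]
      df[i, col] <- mean(df[nearest, col])      # average of the k nearest donors
    }
  }
  df
}

# Toy data: row 2 is missing murder; its closest rows are 1 and 3.
toy <- data.frame(
  pop2010 = c(1, 2, 3, 10, 11),
  income  = c(1, 2, 3, 10, 11),
  murder  = c(4, NA, 6, 20, 22)
)
imputed <- knn_impute(toy, k = 2)
print(imputed)
```

The missing value is filled with the average of the two most similar rows, so states with similar characteristics supply the estimate.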

data_NOS dataset has:

51 rows and 27 columns after removing Habeas-related NOS
Original dataset had 53 rows: 50 states plus DC plus US Virgin Islands and NA
Values in NOS variables represent the number of lawsuits

NOS_stateFacts after merging has:

51 rows: 50 states and DC
44 columns: 16 variables on state characteristics from state_stats, 27 NOS variables from data_NOS, and def_name.
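
The merge described above might look like the following sketch. Whether the original used base merge() or a dplyr join is not stated, so the call is an assumption, and the toy rows and values are invented.

```r
# Hypothetical miniature versions of the two tables.
state_stats <- data.frame(
  state   = c("Alaska", "New York", "Texas"),
  pop2010 = c(0.7, 19.4, 25.1)
)
data_NOS <- data.frame(
  state     = c("Alaska", "New York", "Texas", "US Virgin Islands"),
  Insurance = c(1, 12, 9, 1)
)

# Inner join on state: rows without a match in both tables are dropped.
NOS_stateFacts <- merge(state_stats, data_NOS, by = "state")
print(dim(NOS_stateFacts))
```

With an inner join, a row such as US Virgin Islands that has no counterpart in state_stats does not survive the merge.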

Explore data

lawsuits

Outliers are present in some NOS: 1) None; Habeas/Con, 2) Habeas/Confinem

Removing Habeas-related NOS reveals:

Some NOS have a positively skewed distribution with a long tail to the right: e.g., Business Closure, Voting; Civil Rights.
Unlawful Termination has a bimodal distribution.
Some NOS have a normal distribution.
Not every NOS has a distribution displayed, because some have too few observations (e.g., a single suit).

Scatterplot for lawsuits and excess deaths after controlling for population

Government responses to COVID

There are 17 different state policies.
Not every state enacted all 17 government policies. For example, Alaska tops the list with 16 policies. New York has 12.

All 50 states and the District of Columbia mandated two restrictions: RestaurantRestrict and EmergDec (see the top of the plot).
Only a few states mandated travel-related restrictions.
For documentation of the state policies, see https://github.com/COVID19StatePolicy/SocialDistancing/

Are any particular state COVID policies associated with any individual type of lawsuits?

Points in the plot represent 50 states plus D.C.
The plot is generated from a data set that represents 50 states plus D.C., 18 different state policy types, and 25 different NOS.
The plot shows that there are no states that did not mandate EmergDec and RestaurantRestrict.
The most interesting case is CaseIsolation (case-based isolation orders). There is a clear divide between states that mandated the restriction and those that didn’t. The typical state that mandated the restriction had only about 2 Business Closur lawsuits, while the typical state that didn’t had nearly 10.
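
A comparison like the one above can be checked numerically by summarizing lawsuits within each policy group. The sketch below assumes "typical" means the median, which the text does not confirm. The column name BusinessClosure and the toy values are invented.

```r
# Hypothetical toy data: whether a state mandated CaseIsolation and its
# number of business closure lawsuits.
toy <- data.frame(
  CaseIsolation   = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  BusinessClosure = c(1, 2, 3, 8, 10, 12)
)

# Median number of lawsuits in each group (names are "FALSE" and "TRUE").
group_medians <- tapply(toy$BusinessClosure, toy$CaseIsolation, median)
print(group_medians)
```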

In the case of Voting;Civil Ri, no state policy stands out as an important factor in determining the number of lawsuits.

In the case of Other;Civil Rig, no state policy stands out as an important factor in determining the number of lawsuits.

Regression

Aggregate data for regression

Correlation

Highly correlated predictors shouldn’t be included together in the same regression. Pay particular attention to COVID_deaths as it appears to be the most powerful predictor.

Correlation: zooming in on crime variables

Not surprisingly, crime variables are highly correlated among themselves.
They are also correlated with tr_deaths_no_alc, one of our predictors. This means we should remove tr_deaths_no_alc if we want to include a crime variable in the regression model.
Crime variables tend to show a negative association with lawsuits. However, no crime variable shows a stronger association with lawsuits than tr_deaths_no_alc, so replacing traffic deaths with a crime variable is not expected to improve model performance.
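
One way to screen for predictors that should not enter the same regression is to list the pairs whose absolute correlation exceeds a cutoff. The sketch below uses invented values, and the 0.8 cutoff is an assumption, not a threshold taken from this analysis.

```r
# Hypothetical toy predictors: murder and robbery move together,
# n_policies does not.
toy <- data.frame(
  murder     = c(1, 2, 3, 4, 5, 6),
  robbery    = c(1.1, 2.0, 2.9, 4.2, 5.1, 5.8),
  n_policies = c(3, 1, 2, 3, 1, 2)
)

cor_mat <- cor(toy)

# Look only at the upper triangle so each pair is reported once.
high <- which(abs(cor_mat) > 0.8 & upper.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cor_mat)[high[, "row"]],
  var2 = colnames(cor_mat)[high[, "col"]],
  r    = cor_mat[high]
)
print(high_pairs)
```

Each row of high_pairs names two variables of which only one should be kept in a given model.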

Regression model by all lawsuits

# run regression (glance() and tidy() below come from the broom package)
library(broom)
model_linear <- lm(suits_perState ~ COVID_deaths + n_policies, data = regData)

# examine results
glance(model_linear)

## # A tibble: 1 x 12
##   r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.451         0.428  40.8      19.7 5.58e-7     2  -260.  528.  536.
## # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

tidy(model_linear)

## # A tibble: 3 x 5
##   term         estimate std.error statistic     p.value
##   <chr>           <dbl>     <dbl>     <dbl>       <dbl>
## 1 (Intercept)    -53.9      50.8     -1.06  0.294      
## 2 COVID_deaths    21.7       3.48     6.23  0.000000111
## 3 n_policies       3.98      4.39     0.906 0.370

This linear model has the best fit. I tried many other combinations of variables as well as log transformations, but none produced a better fit.
The linear regression model was built to explain the number of COVID-19 related lawsuits in the 50 states plus DC. In other words, the data has only 51 rows.
Of the two predictors, only COVID_deaths is significant (p < 0.001). The number of state government COVID policies is not significant even at the 10% level (p = 0.370).
COVID_deaths has the expected sign: COVID-19 lawsuits against the state increase by an average of 21.7 cases per unit of COVID_deaths, i.e., per 10,000 excess deaths in the state.
The coefficient on the number of state government policies is positive (about 4.0 additional lawsuits per policy), but with a standard error of 4.39 it cannot be distinguished from zero in a sample of only 51 observations.
The linear model explains 45.1% of the variation in the number of lawsuits across states (adjusted R-squared: 42.8%).
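
For readers who want to check significance levels and explained variance directly, the same quantities can be read from summary() in base R. The sketch below runs on simulated data, not on regData, so its numbers are unrelated to the results reported here. The simulation parameters are arbitrary assumptions.

```r
set.seed(42)

# Simulated stand-in for regData with the same column names (51 rows).
toy <- data.frame(
  COVID_deaths = runif(51, 0, 5),
  n_policies   = sample(4:14, 51, replace = TRUE)
)
toy$suits_perState <- 20 * toy$COVID_deaths + 4 * toy$n_policies +
  rnorm(51, sd = 30)

fit <- lm(suits_perState ~ COVID_deaths + n_policies, data = toy)
s   <- summary(fit)

# Coefficient table: estimate, standard error, t value, and p-value per term.
coef_table <- s$coefficients

# Which terms are significant at the 10% level.
significant_10 <- coef_table[, "Pr(>|t|)"] < 0.10

# Share of variance explained, plain and adjusted.
r2     <- s$r.squared
adj_r2 <- s$adj.r.squared
print(coef_table)
```

Comparing significant_10 and r2 against the written interpretation is a quick guard against prose drifting away from the fitted model.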

Classification model by all lawsuits

A classification model was not fit because every state had at least one COVID-related lawsuit filed against it. A classification model requires a dependent variable with binary outcomes, and here only one outcome occurs.
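
The point can be verified in a couple of lines: if lawsuit counts are converted to a had-a-lawsuit indicator, only one class appears. The counts below are invented for illustration.

```r
# Hypothetical lawsuit counts per state; none is zero.
suits_perState <- c(12, 3, 45, 1, 7)

# Binary outcome: did the state have at least one lawsuit?
has_suit <- suits_perState > 0

# A classifier needs both classes; here there is only one.
n_classes <- length(unique(has_suit))
print(table(has_suit))
```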

Regression of lawsuits by NOS

Major findings from the preliminary linear regressions

In some NOS, the model doesn’t explain lawsuits at all. This is not surprising because these NOS only have a few cases.
Only in two NOS, 1) Other;Labor & E and 2) Business Closur, did the predictors explain more than 50% of the variation in the number of state lawsuits.
COVID_deaths, defined as (total deaths - expected deaths)/10,000, is significant at 10% in most NOS and has a positive sign. Business closure lawsuits against the state increased by nearly 2 cases per 10,000 COVID deaths.
Interestingly, COVID_deaths isn’t significant for two NOS: 1) Insurance, and 2) Price gouging; C.
Instead, Insurance and Price gouging; C were explained by the number of state government COVID responses, which is significant at 10% and has the expected sign. Insurance lawsuits against the state increased by 1.7 cases per 10 state government restrictions.
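
Fitting one regression per NOS can be sketched as a loop over the outcome columns. The example assumes the same two predictors (COVID_deaths and n_policies) are used for every NOS, which the text does not state explicitly. The NOS column names and the simulated data are hypothetical.

```r
set.seed(7)

# Simulated stand-in data: 51 rows, two predictors, two NOS outcome columns.
toy <- data.frame(
  COVID_deaths = runif(51, 0, 5),
  n_policies   = sample(4:14, 51, replace = TRUE)
)
toy$BusinessClosure <- rpois(51, lambda = 1 + 2 * toy$COVID_deaths)
toy$Insurance       <- rpois(51, lambda = 0.2 * toy$n_policies)

nos_cols <- c("BusinessClosure", "Insurance")

# One linear model per NOS, built from the column name.
fits <- lapply(nos_cols, function(nos) {
  lm(reformulate(c("COVID_deaths", "n_policies"), response = nos), data = toy)
})
names(fits) <- nos_cols

# R-squared per NOS, and a terms-by-NOS matrix of p-values.
r_squared <- sapply(fits, function(f) summary(f)$r.squared)
p_values  <- sapply(fits, function(f) summary(f)$coefficients[, "Pr(>|t|)"])
print(round(r_squared, 3))
```

Collecting r_squared and p_values in one place makes it easy to see which NOS the predictors explain and which terms clear the 10% level.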