Introduction.

Each state or territory government in the USA has considerable power over the policies and public health measures used to manage the Covid-19 pandemic within its borders. Throughout the pandemic, the two major political parties have often approached this differently and implemented public health orders of varying severity, e.g. the length and strictness of lockdowns. This report therefore investigates whether one party's policies were associated with a lower mean covid mortality rate in its states than the other party's. To do this, the number of covid deaths per 100,000 people is found for each state. The states are then grouped by whether they are Republican or Democrat led, and the mean death rate per 100,000 population is compared between the two major parties.

map of political affiliation of state executives

Problem Statement

This report will investigate whether the party in government in each state, and consequently the general differences in public health response between the two political parties, had any effect on the covid death rate per 100,000 people.

The mean death rate from the start of the pandemic (22/01/2020) to 10/10/2021 will be calculated for the states led by each of the two political parties across America.
An initial inspection using side-by-side boxplots will be done to see whether there is any visible difference between red and blue states, and whether there is any skewness.
Hypothesis testing will then be performed using a two-sample t-test to compare the mean death rate of Democrat-led states against Republican-led ones.

Covid-19 deaths per state

Data

Population data was collected from the US Census Bureau 2019 population estimates
Covid-19 death count data was collected from usafacts.org

Data is imported: (population + covid-19 death count)

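library(tidyverse); library(readxl)  # read_csv(), dplyr verbs, ggplot2; read_excel()
# Columns 3 and 632 of the USAFacts file hold State and the cumulative deaths on 2021-10-10
covid_deaths_usafacts <- read_csv("covid_deaths_usafacts.csv")[c(3, 632)]
censusdata <- read_excel("censusdata.xlsx", skip = 3)[6:56, c(1, 13)]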

Death count per state was calculated

covid_death <- covid_deaths_usafacts%>% group_by(State) %>% summarise(deaths = sum(`2021-10-10`))

State variable was converted to corresponding abbreviations & datasets are joined by state

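# state.abb is ordered by state name, so "DC" (District of Columbia) is inserted after "DE"
states <- append(state.abb, "DC", after = 8)
# factor() sorts the census names alphabetically, which matches the order of `states`
censusdata$...1 <- censusdata$...1 %>% factor(labels = states) %>% as.character()
data <- censusdata %>% left_join(covid_death, by = c('...1' = "State"))
colnames(data) <- c('State', "population (2019)", "No. of covid deaths\n(22/01/2020-10/10/2021)")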
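
The conversion to abbreviations relies on the census names and the abbreviation vector being in the same alphabetical order. A minimal sketch of that assumption, together with a name-based lookup as a more explicit alternative (the `lookup` vector is illustrative and not part of the original analysis):

```r
# Base R ships state.name / state.abb for the 50 states, ordered by state name.
# "District of Columbia" sorts between Delaware (8th) and Florida,
# so "DC" is inserted after position 8.
states <- append(state.abb, "DC", after = 8)
states[8:10]

# Hypothetical alternative: look abbreviations up by name instead of by position.
lookup <- setNames(c(state.abb, "DC"), c(state.name, "District of Columbia"))
picked <- unname(lookup[c("Delaware", "District of Columbia", "Florida")])
picked
```

A lookup by name does not depend on row order, so it would still work if the census rows were sorted differently.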

Data Cont.

Covid-19 deaths per 100,000 people for each state was calculated:

data <- data %>% mutate(`Deaths per 100,000 people` =
  .data[["No. of covid deaths\n(22/01/2020-10/10/2021)"]] / `population (2019)` * 100000)
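
As a quick check of the rate calculation, here is the same arithmetic for a single state, using the Alabama figures reported in this report's data summary (the variable names are illustrative):

```r
# Alabama: population (2019) and covid deaths up to 10/10/2021, as reported in the data summary
al_population <- 4903185
al_deaths <- 14756

# Deaths per 100,000 people = deaths / population * 100,000
al_rate <- al_deaths / al_population * 100000
round(al_rate, 4)
```

The result should agree with the Alabama row of the summary table up to rounding.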

Political affiliation of state executive variable was generated:

dem <- c('CA','CO','CT','DE','DC','HI','IL','KS','LA','ME','MI','MN','NV','NJ','NM','NY','NC','OR','PA','RI','VA','WA',"WI","KY")
# Montana is dropped (see note below); every other state not listed in `dem` is Republican led
data <- data %>%
  filter(State != 'MT') %>%
  mutate(`Political affiliation of state executive` =
           ifelse(State %in% dem, 'Democratic Party', 'Republican Party'))

NOTE: Montana was removed from the dataset because it had both a Democrat and a Republican state executive during the data interval.

Data Summary

Variables:

State: State abbreviation
Population (2019): State population using 2019 census estimates
No. of covid deaths: State total deaths caused by covid from 22/01/2020 to 10/10/2021
Deaths per 100,000 people: State covid-19 mortality per 100,000 people in the given period
Political affiliation of state executive: Indicator showing whether the state is run by a Democrat or a Republican

State	population (2019)	No. of covid deaths (22/01/2020-10/10/2021)	Deaths per 100,000 people	Political affiliation of state executive
AL	4903185	14756	300.9472	Republican Party
AK	731545	566	77.3705	Republican Party
AZ	7278717	20319	279.1563	Republican Party
AR	3017804	7810	258.7975	Republican Party

Boxplot visualisation

Boxplot visualisation shows:
- One outlier was detected in the red states data
- Both the red and blue states data appear approximately normal (needs further investigation)

pal <- c('#457b9d', '#e63946')  # blue for Democratic Party, red for Republican Party
plot <- ggplot(data = data, aes(x = `Deaths per 100,000 people`, y = `Political affiliation of state executive`, fill = `Political affiliation of state executive`)) +
  geom_boxplot(show.legend = FALSE) + coord_flip() + labs(title = "Covid deaths in Democrat and Republican states") + theme(
    panel.background = element_rect(fill = '#f1faee'), plot.background = element_rect(fill = '#f1faee')) +
  scale_fill_manual(values = pal)
plot

Initial inspection shows a slight difference in median deaths per 100,000 people between red and blue states.

Outlier removal.

Outlier present in republican state data was removed (<5% of data):

data_rep <- data %>% subset(`Political affiliation of state executive` == "Republican Party")
# Flag values more than 1.5 * IQR below Q1 or above Q3
is_outlier <- function(x) {
  return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}
loc <- which(is_outlier(data_rep$`Deaths per 100,000 people`))
out_state <- as.character(data_rep$State[loc])
# %in% also behaves correctly when zero or several states are flagged
data <- data %>% filter(!State %in% out_state)
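
To illustrate the 1.5 × IQR rule used above, here is a minimal sketch on a small made-up vector (the `toy` values are invented for illustration and are not taken from the covid data):

```r
# Same rule as in the report: flag values beyond 1.5 * IQR from the quartiles
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

toy <- c(10, 12, 13, 14, 15, 16, 40)  # made-up values

# Lower and upper fences: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR
fences <- unname(quantile(toy, c(0.25, 0.75))) + c(-1.5, 1.5) * IQR(toy)
flagged <- toy[is_outlier(toy)]

fences   # 8 and 20 for this vector
flagged  # 40
```

Note that `quantile()` uses R's default (type 7) definition here, which can place the fences slightly differently from the hinges drawn by a boxplot.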

Descriptive Statistics

Check for missing and special values:

na <- is.na(data)                 # logical matrix with one cell per value
nan <- sapply(data, is.nan)
inf <- sapply(data, is.infinite)
which(inf | nan | na)             # positions of any missing, NaN or infinite cells

## integer(0)

No missing/special values found

Descriptive statistics:

table1 <- data %>%
  group_by(`Political affiliation of state executive`) %>%
  summarise(Min = min(`Deaths per 100,000 people`, na.rm = TRUE),  # per-group minimum, not data$...
            Q1 = quantile(`Deaths per 100,000 people`, probs = .25, na.rm = TRUE),
            Median = median(`Deaths per 100,000 people`, na.rm = TRUE),
            Q3 = quantile(`Deaths per 100,000 people`, probs = .75, na.rm = TRUE),
            Max = max(`Deaths per 100,000 people`, na.rm = TRUE),
            Mean = mean(`Deaths per 100,000 people`, na.rm = TRUE),
            SD = sd(`Deaths per 100,000 people`, na.rm = TRUE),
            n = n(),
            Missing = sum(is.na(`Deaths per 100,000 people`)))
kbl(table1) %>% kable_classic(html_font = "Timesnewroman") %>% kable_styling(font_size = 16)

Political affiliation of state executive	Min	Q1	Median	Q3	Max	Mean	SD	n	Missing
Democratic Party	58.33861	151.6586	202.8300	234.3376	304.8748	191.6956	68.15866	24	0
Republican Party	58.33861	182.5347	221.0928	254.5051	329.6542	213.8196	60.80173	26	0

Hypothesis Testing

A two-sample t-test is used to determine whether there is a significant difference between the mean covid-19 deaths per 100,000 people in Republican and Democrat states.
Statistical hypotheses:
\[H_0: \mu_1 - \mu_2 = 0\]
\[H_A: \mu_1 - \mu_2 \ne 0\]
\(\mu_1\) : mean covid-19 deaths per 100,000 people in Republican states

\(\mu_2\) : mean covid-19 deaths per 100,000 people in Democrat states

Assumptions:

Both populations are independent
Both populations are normally distributed (see slide 11 & 12)
Equal variance (see slide 13)

Normality testing

data_rep <- data %>% subset(`Political affiliation of state executive` == "Republican Party")
# stat_qq_*() come from the qqplotr package; theme_light() goes first so the custom backgrounds are kept
plot1 <- ggplot(data = data_rep, mapping = aes(sample = `Deaths per 100,000 people`)) + stat_qq_band(alpha = 0.5, distribution = "norm", fill = '#e63946') + stat_qq_line() + stat_qq_point() +
           theme_light() +
           theme(panel.background = element_rect(fill = '#f1faee'),
                 plot.background = element_rect(fill = '#f1faee')) +
           labs(title = "Q-Q plot for republican data", x = "Theoretical quantiles", y = "Deaths per 100,000 people")
plot1

Non-normality is generally characterised by a pronounced S-shape. As only 2 points (with no continuing trend) fall outside the 95% confidence band for the normal quantiles, the data show only a very minor departure from normality towards the lower tail of the distribution. A two-sample t-test can still be performed, as it is generally robust against minor departures from normality and tends to maintain the desired significance level (e.g. 0.05) even if normality is not strictly met.
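
For readers unfamiliar with how a normal Q-Q plot is built, the following base-R sketch shows the coordinates behind it. The sample `x` is made up for illustration; in this report it would be one group's deaths per 100,000 people.

```r
# Made-up sample values, for illustration only
x <- c(212, 187, 254, 231, 198, 176, 243, 221, 205, 268)
n <- length(x)

theoretical <- qnorm(ppoints(n))  # expected standard-normal quantiles
observed <- sort(x)               # ordered sample values

# Points close to a straight line (correlation near 1) are consistent with normality
qq_cor <- cor(theoretical, observed)
qq_cor
```

A pronounced S-shape in the plotted points would lower this correlation and signal heavier or lighter tails than the normal distribution.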

Normality testing cont.

data_dem <- data %>% subset(`Political affiliation of state executive` == "Democratic Party")
# stat_qq_*() come from the qqplotr package; theme_light() goes first so the custom backgrounds are kept
plot2 <- ggplot(data = data_dem, mapping = aes(sample = `Deaths per 100,000 people`)) + stat_qq_band(alpha = 0.5, distribution = "norm", fill = '#457b9d') + stat_qq_line() + stat_qq_point() +
           theme_light() +
           theme(panel.background = element_rect(fill = '#f1faee'),
                 plot.background = element_rect(fill = '#f1faee')) +
           labs(title = "Q-Q plot for democrat data", x = "Theoretical quantiles", y = "Deaths per 100,000 people")
plot2

As seen in the Q-Q plot above, all the points of the Democrat data are within the 95% confidence band of the normal distribution, meaning normality can be assumed and a two-sample t-test can be performed on the datasets.

Homogeneity of Variance

The assumption of equal variance is tested using Levene's test:
\[H_0: \sigma_1^2 - \sigma_2^2 = 0\]
\[H_A: \sigma_1^2 - \sigma_2^2 \ne 0\]

tab2 <- leveneTest(data$`Deaths per 100,000 people`~data$`Political affiliation of state executive`)
kbl(tab2) %>% kable_classic( html_font = "Timesnewroman") %>% kable_styling(font_size = 16)

	Df	F value	Pr(>F)
group	1	0.6740034	0.4157182
	48	NA	NA

The p-value for Levene's test of equal variance in covid deaths per 100,000 people between red and blue states was 0.416. This value is greater than 0.05, therefore we fail to reject \(H_0\) and assume equal variance.
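
As background on what Levene's test computes: with its default median centring, `leveneTest()` from the car package is equivalent to a one-way ANOVA on each value's absolute deviation from its group median. A minimal base-R sketch on made-up data (the `values` and `group` vectors are illustrative only):

```r
# Two made-up groups of five values each
values <- c(1, 2, 3, 4, 5, 2, 4, 6, 8, 10)
group <- factor(rep(c("a", "b"), each = 5))

# Absolute deviation of each value from its own group median
abs_dev <- abs(values - ave(values, group, FUN = median))

# One-way ANOVA on the deviations gives the Levene F statistic and p-value
fit <- anova(lm(abs_dev ~ group))
f_value <- fit[["F value"]][1]
p_value <- fit[["Pr(>F)"]][1]
c(F = f_value, p = p_value)
```

A large F (small p-value) would indicate that the groups differ in spread, which is the evidence against equal variance.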

Two-sample t-test - Assuming Equal Variance

t.test(data = data,`Deaths per 100,000 people`~`Political affiliation of state executive`,var.equal = TRUE,
       alternative = "two.sided")

## 
##  Two Sample t-test
## 
## data:  Deaths per 100,000 people by Political affiliation of state executive
## t = -1.213, df = 48, p-value = 0.2311
## alternative hypothesis: true difference in means between group Democratic Party and group Republican Party is not equal to 0
## 95 percent confidence interval:
##  -58.79523  14.54738
## sample estimates:
## mean in group Democratic Party mean in group Republican Party 
##                       191.6956                       213.8196

p > 0.05, therefore we fail to reject \(H_0\).

There is no statistically significant difference between the mean covid-19 mortality rates of red and blue states.
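
The test statistic can also be reconstructed by hand from the group summaries, which makes the pooled-variance calculation explicit. The sketch below uses the means, standard deviations and group sizes from the descriptive statistics table (the variable names are illustrative):

```r
# Summary statistics as reported in the descriptive statistics table
mean_dem <- 191.6956; sd_dem <- 68.15866; n_dem <- 24
mean_rep <- 213.8196; sd_rep <- 60.80173; n_rep <- 26

# Pooled variance and standard error of the difference (equal variances assumed)
df <- n_dem + n_rep - 2
pooled_var <- ((n_dem - 1) * sd_dem^2 + (n_rep - 1) * sd_rep^2) / df
se_diff <- sqrt(pooled_var * (1 / n_dem + 1 / n_rep))

# t statistic, two-sided p-value and 95% confidence interval (Democratic minus Republican)
t_stat <- (mean_dem - mean_rep) / se_diff
p_value <- 2 * pt(-abs(t_stat), df)
ci <- (mean_dem - mean_rep) + c(-1, 1) * qt(0.975, df) * se_diff
round(c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2]), 4)
```

These values should agree with the `t.test()` output above up to rounding.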

Discussion - RESULTS

A two-sample t-test was used to test for a significant difference between the mean covid mortality rates of red and blue states.
Inspection of normal Q-Q plots showed that the Democrat data exhibited normality, whilst the Republican data showed a very minor deviation; because t-tests are robust to such departures, normality can be assumed.
Levene's test of homogeneity of variances showed no statistically significant difference between the variances of the two groups.
The t-test assuming equal variance found the difference in mean values between the Democrat and Republican data to NOT be statistically significant.
t(df=48)=−1.213, p=.2311, 95% CI for the difference in means [−58.80, 14.55]

Overall, the investigation found no evidence that the political affiliation of a state's executive has an effect on covid mortality in that state.

Discussion - Strengths and Limitations

Strengths:

Whole-population data is used rather than a sample, so there is no risk of sampling error
Covid data is very accurate and easy to come by

Limitations:

Relatively small data set
The relationship between the variables is heavily affected by multiple other factors:
- Health responses could have varied between individual democrat states and the same for republican states.
- Governor executive power varies by state
- Vaccine hesitancy

Discussion - future investigations

As a much larger proportion of Democratic states were the initial ports of entry for covid-19 in January to March 2020, investigating the difference in means on a monthly or yearly basis might be useful.
An investigation that takes into account whether the party in power in each state controls the state houses, as well as the party of the elected governor, might also provide further insight.
An additional investigation into the effectiveness of public health measures, such as mask rules or lockdown length depending on daily case numbers, could also be carried out.

Conclusion

Despite a higher mean death rate in Republican states on initial inspection, the difference between the mean covid deaths per 100,000 people in Republican-run states and in states run by Democrat governors was not statistically significant. Therefore, no insight was gained into the effectiveness of either political party's ability to reduce the covid mortality rate.

References

U.S. Census Bureau. (n.d.). Population and Housing Unit Estimates Datasets. Retrieved October 13, 2021, from https://www.census.gov/programs-surveys/popest/data/data-sets.html
CDC COVID Data Tracker. (n.d.). Retrieved October 13, 2021, from https://covid.cdc.gov/covid-data-tracker/#cases_deathsinlast7days
List of current United States governors - Wikipedia. (n.d.). Retrieved October 13, 2021, from https://en.wikipedia.org/wiki/List_of_current_United_States_governors
US COVID-19 cases and deaths by state | USAFacts. (n.d.). Retrieved October 13, 2021, from https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/
Donkey vs Elephant: The Democratic And Republican Symbols. (n.d.). Retrieved October 13, 2021, from https://fabrikbrands.com/donkey-vs-elephant-meaning-of-the-democratic-and-republican-symbols/

Covid mortality in America

Mean covid mortality per 100,000 people of red and blue states across the USA