library(readr)
library(dplyr)
library(ggplot2)
Voterdata <- read.csv("C:/Users/12055/Documents/3_24AbbreviatedVoterDatasetLabeled.csv")

Data and Research Question

I will be analyzing data from the provided US voter data set. I hypothesize that 2016 voters who placed greater importance on religion were more likely to vote for Donald Trump than for Hillary Clinton. If there is evidence of a relationship between stated religious importance (the independent variable) and the choice between Donald Trump and Hillary Clinton (the dependent variable), we will better understand one factor that played a part in the unexpected 2016 election results.

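# Keep the two variables of interest and drop respondents outside the
# four importance levels or the two major-party candidates
voterdata <- Voterdata %>%
  select(ReligiousImportance, Vote2016) %>%
  filter(
    ReligiousImportance %in% c("Not at all Important", "Not too Important", "Somewhat Important", "Very Important"),
    Vote2016 %in% c("Hillary Clinton", "Donald Trump")
  )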

Below is a preview of the data.

head(voterdata)
##    ReligiousImportance        Vote2016
## 1   Somewhat Important Hillary Clinton
## 2       Very Important    Donald Trump
## 3 Not at all Important Hillary Clinton
## 4       Very Important    Donald Trump
## 5 Not at all Important Hillary Clinton
## 6       Very Important Hillary Clinton

Relationship of Interest

The crosstab is consistent with my hypothesis. Among voters who responded that religion was very important to them, about 61% voted for Donald Trump compared to 39% who voted for Hillary Clinton. By contrast, of voters who responded that religion was not at all important to them, 25% voted for Donald Trump, compared to 75% who voted for Hillary Clinton. Trump's share rises at each step across the four importance levels (25%, 41%, 54%, 61%).

# Row proportions: each row (importance level) sums to 1
table(voterdata$ReligiousImportance, voterdata$Vote2016) %>%
  prop.table(1) %>%
  round(2)
##                       
##                        Donald Trump Hillary Clinton
##   Not at all Important         0.25            0.75
##   Not too Important            0.41            0.59
##   Somewhat Important           0.54            0.46
##   Very Important               0.61            0.39
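
As a check on the percentages quoted above, the row proportions can be recomputed directly from the cell counts. This is only a sketch: it retypes the counts from the Observed Values output in this report instead of reading the CSV, and the object names `counts` and `row_props` are my own.

```r
# Cell counts retyped from the Observed Values output in this report
counts <- matrix(
  c( 318,  977,
     427,  609,
     997,  853,
    1728, 1100),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    ReligiousImportance = c("Not at all Important", "Not too Important",
                            "Somewhat Important", "Very Important"),
    Vote2016 = c("Donald Trump", "Hillary Clinton")
  )
)

# margin = 1 divides each cell by its row total, so every row sums to 1
row_props <- round(prop.table(counts, margin = 1), 2)
print(row_props)
```

Using `margin = 1` matters here: it answers "of voters at this importance level, what share chose each candidate?", which is the comparison the hypothesis is about.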

Expected Values

The null hypothesis is that there is no relationship between stated religious importance and choice of candidate in 2016, meaning that the two variables are independent of each other. The table below shows the number of voters we would expect in each cell if the null hypothesis were true.

# Select the expected counts by name instead of by position
chisq.test(voterdata$ReligiousImportance, voterdata$Vote2016)["expected"]
## $expected
##                              voterdata$Vote2016
## voterdata$ReligiousImportance Donald Trump Hillary Clinton
##          Not at all Important     641.1257        653.8743
##          Not too Important        512.9006        523.0994
##          Somewhat Important       915.8939        934.1061
##          Very Important          1400.0799       1427.9201
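
To make clear where these expected counts come from, here is a minimal sketch that rebuilds them by hand. Under independence, each expected count is (row total × column total) / grand total. The counts are retyped from the Observed Values output in this report, and the names `observed` and `expected` are my own.

```r
# Observed counts retyped from the Observed Values output in this report
observed <- matrix(
  c( 318,  977,
     427,  609,
     997,  853,
    1728, 1100),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    ReligiousImportance = c("Not at all Important", "Not too Important",
                            "Somewhat Important", "Very Important"),
    Vote2016 = c("Donald Trump", "Hillary Clinton")
  )
)

# outer() forms every row total x column total product;
# dividing by the grand total gives the count expected under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
print(round(expected, 4))
```

For example, the first cell is 1295 × 3470 / 7009, which is about 641.13 and agrees with the `chisq.test()` output above.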

Observed Values

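When we compare the observed values with the expected values, we see marked differences, suggesting that stated religious importance and choice of candidate are not independent of each other. For example, only 318 voters who said religion was not at all important voted for Donald Trump, compared with about 641 expected under independence, while 1,728 voters who said religion was very important voted for him, compared with about 1,400 expected.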

# Select the observed counts by name instead of by position
chisq.test(voterdata$ReligiousImportance, voterdata$Vote2016)["observed"]
## $observed
##                              voterdata$Vote2016
## voterdata$ReligiousImportance Donald Trump Hillary Clinton
##          Not at all Important          318             977
##          Not too Important             427             609
##          Somewhat Important            997             853
##          Very Important               1728            1100

Chi-Squared Test for Independence

Based on the results of the chi-squared test, there is a statistically significant relationship between stated religious importance and choice of candidate in 2016 (X-squared = 517.36, df = 3). The p-value is reported as less than 2.2e-16, far below the predetermined significance level of 0.05, so I reject the null hypothesis.

chisq.test(voterdata$ReligiousImportance, voterdata$Vote2016)
## 
##  Pearson's Chi-squared test
## 
## data:  voterdata$ReligiousImportance and voterdata$Vote2016
## X-squared = 517.36, df = 3, p-value < 2.2e-16
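
The test statistic can also be reproduced by hand, which shows how the observed and expected tables feed into it. This is a sketch, not the original analysis code: the counts are retyped from the Observed Values output in this report, and the names `observed`, `expected`, `chi_sq`, `df`, and `p_value` are my own.

```r
# Observed counts retyped from the Observed Values output in this report
observed <- matrix(
  c( 318,  977,
     427,  609,
     997,  853,
    1728, 1100),
  nrow = 4, byrow = TRUE
)

# Expected counts under independence: row total x column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson's statistic: sum over all cells of (O - E)^2 / E
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) x (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

print(round(chi_sq, 2))
print(df)
print(p_value < 2.2e-16)
```

The two cells for "Not at all Important" and the two for "Very Important" contribute most of the statistic, which matches the pattern seen in the crosstab.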