library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
Voterdata <- read.csv("C:/Users/12055/Documents/3_24AbbreviatedVoterDatasetLabeled.csv")
I analyze data from the provided US voter data set. I hypothesize that 2016 voters who placed greater importance on religion were more likely to vote for Donald Trump than for Hillary Clinton. Evidence of a relationship between stated religious importance (independent variable) and the choice between Donald Trump and Hillary Clinton (dependent variable) would help explain one factor behind the unexpected 2016 election results.
voterdata <- Voterdata %>%
  select(ReligiousImportance, Vote2016) %>%
  filter(ReligiousImportance %in% c("Not at all Important", "Not too Important", "Somewhat Important", "Very Important"),
         Vote2016 %in% c("Hillary Clinton", "Donald Trump"))
head(voterdata)
## ReligiousImportance Vote2016
## 1 Somewhat Important Hillary Clinton
## 2 Very Important Donald Trump
## 3 Not at all Important Hillary Clinton
## 4 Very Important Donald Trump
## 5 Not at all Important Hillary Clinton
## 6 Very Important Hillary Clinton
The crosstab of row proportions supports my hypothesis. Among voters who said religion was very important to them, about 61% voted for Donald Trump and 39% voted for Hillary Clinton. By contrast, among voters who said religion was not at all important to them, 25% voted for Donald Trump and 75% voted for Hillary Clinton.
table(voterdata$ReligiousImportance, voterdata$Vote2016) %>%
  prop.table(1) %>%
  round(2)
##
## Donald Trump Hillary Clinton
## Not at all Important 0.25 0.75
## Not too Important 0.41 0.59
## Somewhat Important 0.54 0.46
## Very Important 0.61 0.39
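The row proportions can be re-derived without the CSV file. The sketch below is a check under one assumption: the observed counts are typed in by hand from the `$observed` table printed in the chi-square output of this report, and the object names `observed` and `row_props` are illustrative rather than part of the original analysis.

```r
# Observed counts copied by hand from the $observed table in this report
# (rows: religious importance; columns: 2016 vote).
observed <- matrix(
  c(318,  977,
    427,  609,
    997,  853,
    1728, 1100),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    ReligiousImportance = c("Not at all Important", "Not too Important",
                            "Somewhat Important", "Very Important"),
    Vote2016 = c("Donald Trump", "Hillary Clinton")
  )
)

# margin = 1 divides each cell by its row total, so each row describes
# how voters within one religious-importance group split their votes.
row_props <- round(prop.table(observed, margin = 1), 2)
print(row_props)
```

Using `margin = 1` (rows) rather than `margin = 2` (columns) matters here: the hypothesis is about how vote choice varies across levels of religious importance, so each religious-importance group should sum to 1.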
The null hypothesis is that there is no relationship between stated religious importance and choice of candidate in 2016. In other words, under the null hypothesis the two variables are independent of each other.
chisq.test(voterdata$ReligiousImportance, voterdata$Vote2016)["expected"]
## $expected
## voterdata$Vote2016
## voterdata$ReligiousImportance Donald Trump Hillary Clinton
## Not at all Important 641.1257 653.8743
## Not too Important 512.9006 523.0994
## Somewhat Important 915.8939 934.1061
## Very Important 1400.0799 1427.9201
Comparing the expected and observed counts for stated religious importance and choice of candidate shows a marked difference, suggesting that the two variables are not independent of each other. Two cells illustrate this:
Among voters who said religion was very important to them, the expected number of Donald Trump votes under independence is 1,400.08, while the observed count is higher at 1,728.
Among voters who said religion was not at all important to them, the expected number of Donald Trump votes under independence is 641.13, while the observed count is lower at 318.
chisq.test(voterdata$ReligiousImportance, voterdata$Vote2016)["observed"]
## $observed
## voterdata$Vote2016
## voterdata$ReligiousImportance Donald Trump Hillary Clinton
## Not at all Important 318 977
## Not too Important 427 609
## Somewhat Important 997 853
## Very Important 1728 1100
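To make the comparison concrete, the expected counts can be rebuilt by hand from the observed table: under independence, each expected cell equals (row total × column total) / grand total. The sketch below assumes the observed counts are entered manually from the output above; the names `observed`, `expected`, and `differences` are illustrative.

```r
# Observed counts copied by hand from the $observed table above.
observed <- matrix(
  c(318,  977,
    427,  609,
    997,  853,
    1728, 1100),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    ReligiousImportance = c("Not at all Important", "Not too Important",
                            "Somewhat Important", "Very Important"),
    Vote2016 = c("Donald Trump", "Hillary Clinton")
  )
)

# Expected count under independence: row total * column total / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
print(round(expected, 4))

# Positive values mean more voters than independence predicts,
# negative values mean fewer.
differences <- observed - expected
print(round(differences, 1))
```

The sign pattern of `differences` is what the hypothesis predicts: the Donald Trump column is below expectation for the two lower-importance groups and above it for the two higher-importance groups.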
Based on the results of the chi-square test, there is a statistically significant relationship between stated religious importance and choice of candidate in 2016 (X-squared = 517.36, df = 3). The p-value is less than 2.2e-16, far below the predetermined significance level of 0.05, so I reject the null hypothesis.
chisq.test(voterdata$ReligiousImportance, voterdata$Vote2016)
##
## Pearson's Chi-squared test
##
## data: voterdata$ReligiousImportance and voterdata$Vote2016
## X-squared = 517.36, df = 3, p-value < 2.2e-16
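The test statistic and p-value can also be reproduced step by step, which shows where `X-squared = 517.36` and `df = 3` come from. This is a sketch under the assumption that the observed counts are entered by hand from the `$observed` output in this report; `chi_sq`, `df`, and `p_value` are illustrative names.

```r
# Observed counts copied by hand from the $observed table in this report.
observed <- matrix(
  c(318,  977,
    427,  609,
    997,  853,
    1728, 1100),
  nrow = 4, byrow = TRUE
)

# Expected counts under independence.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic: sum over all cells of (observed - expected)^2 / expected.
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1).
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-square distribution.
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

print(c(chi_sq = chi_sq, df = df))
print(p_value < .Machine$double.eps)
```

The `2.2e-16` in the printed result is not the p-value itself. It is R's machine epsilon (`.Machine$double.eps`), which the print method uses as a lower reporting bound, so the output reads `p-value < 2.2e-16` whenever the true p-value is smaller than that.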