library(readr)
## Warning: package 'readr' was built under R version 3.6.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.6.3
voterdata <- read.csv("F:/Voter Data 2019.csv")
I compare white and black voters, identified by the variable “race_2019” in the voter data, on whether they approve or disapprove of Donald Trump as president, which is recorded in the variable “trumpapp_2019” in the same data. For this assignment, respondents who answered “Strongly Approve” or “Somewhat Approve” are classified as approving of Donald Trump as president. Those who answered “Strongly Disapprove” or “Somewhat Disapprove” are classified as disapproving. Respondents of other races and those who answered “Don't Know” are excluded from the comparison.

Data Preparation

new_voterdata <- voterdata %>%
  mutate(race = ifelse(race_2019==1,"white",
                ifelse(race_2019==2,"black", NA)),
         approvetrump = ifelse(trumpapp_2019==1, "Approve",
                        ifelse(trumpapp_2019==2, "Approve",
                        ifelse(trumpapp_2019==3, "Disapprove",
                        ifelse(trumpapp_2019==4, "Disapprove",
                        ifelse(trumpapp_2019==5, "Don't Know", NA))))))%>%
  select(race, approvetrump)%>%
  filter(race %in% c("white", "black"), approvetrump %in% c("Approve", "Disapprove"))
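
The nested ifelse() calls above can be hard to read. As a minimal sketch of the same recoding, assuming the codes used above (1 = white, 2 = black; 1 and 2 = approve; 3 and 4 = disapprove), dplyr's case_when() expresses each rule on its own line. The data frame toy_voterdata is a made-up stand-in for the real file, not actual survey data.

```r
library(dplyr)

# Hypothetical toy rows standing in for the real voter file
toy_voterdata <- data.frame(
  race_2019     = c(1, 2, 3, 1, 2),
  trumpapp_2019 = c(1, 4, 2, 5, NA)
)

toy_recoded <- toy_voterdata %>%
  mutate(
    # Codes with no matching rule fall through to NA
    race = case_when(
      race_2019 == 1 ~ "white",
      race_2019 == 2 ~ "black"
    ),
    approvetrump = case_when(
      trumpapp_2019 %in% c(1, 2) ~ "Approve",
      trumpapp_2019 %in% c(3, 4) ~ "Disapprove"
    )
  ) %>%
  select(race, approvetrump) %>%
  # Keep only rows where both recoded variables are present
  filter(!is.na(race), !is.na(approvetrump))

toy_recoded
```

Because unmatched codes become NA, the “Don't Know” answers and other races drop out in the final filter() step without needing their own label.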

Computing the Percentage Distributions If the Two Variables Are Independent of Each Other

table(new_voterdata$race)%>%
  prop.table()%>%
  round(2)
## 
## black white 
##  0.09  0.91
table(new_voterdata$approvetrump)%>%
  prop.table()%>%
  round(2)
## 
##    Approve Disapprove 
##       0.45       0.55
The two tables above give the marginal proportions for race and for approval. If the two variables were independent, each combined share would be the product of its two marginal proportions: about 4% of respondents would be black and approve of Donald Trump as president (0.09 × 0.45), 5% would be black and disapprove (0.09 × 0.55), 41% would be white and approve (0.91 × 0.45), and 50% would be white and disapprove (0.91 × 0.55).
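
The multiplication behind these percentages can be sketched in base R. The two vectors below simply copy the rounded marginal proportions printed above; the names race_share, approval_share and expected_share are chosen here for illustration.

```r
# Rounded marginal proportions copied from the two tables above
race_share     <- c(black = 0.09, white = 0.91)
approval_share <- c(Approve = 0.45, Disapprove = 0.55)

# Under independence, each joint proportion is the product of its marginals
expected_share <- outer(approval_share, race_share)
round(expected_share, 2)
```

outer() multiplies every approval share by every race share, so the result has the same layout as the observed two-way table and the two can be compared cell by cell.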

Actual Observations

table(new_voterdata$approvetrump, new_voterdata$race)%>%
  prop.table()%>%
  round(2)
##             
##              black white
##   Approve     0.01  0.45
##   Disapprove  0.08  0.46
Comparing the percentages computed under independence with the observed percentages above, the black respondents differ by about 3 percentage points in each cell: 1% approve of Donald Trump as president where 4% was expected, and 8% disapprove where 5% was expected.
Similarly, the white respondents differ by about 4 percentage points on these rounded figures: 45% approve where 41% was expected, and 46% disapprove where 50% was expected.
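
Because the proportions above are rounded to two decimals, the gaps read from them can be off by about a point. A minimal sketch that recomputes the gaps from the cell counts, using the observed counts that chisq.test() reports later in this report; the object names are chosen here for illustration.

```r
# Observed counts, filled column by column (black first, then white)
observed <- matrix(
  c(44, 458, 2449, 2548), nrow = 2,
  dimnames = list(approvetrump = c("Approve", "Disapprove"),
                  race = c("black", "white"))
)

# Observed joint proportions
observed_share <- prop.table(observed)

# Joint proportions expected under independence (product of the marginals)
expected_share <- outer(rowSums(observed_share), colSums(observed_share))

# Gap between observed and expected, in proportions
round(observed_share - expected_share, 3)
```

In a 2 × 2 table the four gaps always have the same size and differ only in sign, so one number summarises how far the data are from independence.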

Chi-Square Test

chisq.test(new_voterdata$race, new_voterdata$approvetrump)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  new_voterdata$race and new_voterdata$approvetrump
## X-squared = 296.5, df = 1, p-value < 2.2e-16
Based on the results of the chi-square test above, the p-value is less than 2.2e-16 (that is, below 0.00000000000000022), which is far below 0.05. This implies that there is a statistically significant relationship between a person's race and whether he or she approves or disapproves of Donald Trump as president.
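
To see where the statistic comes from, the sketch below recomputes it by hand from the observed counts reported by chisq.test() in this report, applying the same Yates continuity correction. The matrix observed and the names yates_stat and p_value are chosen here for illustration.

```r
# Observed counts, filled column by column (black first, then white)
observed <- matrix(
  c(44, 458, 2449, 2548), nrow = 2,
  dimnames = list(approvetrump = c("Approve", "Disapprove"),
                  race = c("black", "white"))
)

# Expected counts under independence: row total * column total / n
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Yates' correction subtracts 0.5 from each absolute deviation before squaring
yates_stat <- sum((abs(observed - expected) - 0.5)^2 / expected)

# Upper-tail probability of a chi-square distribution with 1 degree of freedom
p_value <- pchisq(yates_stat, df = 1, lower.tail = FALSE)

yates_stat   # about 296.5, matching X-squared above
p_value < 2.2e-16
```

The table here is transposed relative to the call above (approval in rows rather than columns), which does not change the statistic.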

Null-Hypothesis Table

chisq.test(new_voterdata$race, new_voterdata$approvetrump)[7]
## $expected
##                   new_voterdata$approvetrump
## new_voterdata$race   Approve Disapprove
##              black  227.5843   274.4157
##              white 2265.4157  2731.5843

Observed Values Table

chisq.test(new_voterdata$race, new_voterdata$approvetrump)[6]
## $observed
##                   new_voterdata$approvetrump
## new_voterdata$race Approve Disapprove
##              black      44        458
##              white    2449       2548
Based on the two tables generated above, the observed values for the black respondents differ substantially from the values expected under the null hypothesis: far fewer approve of him as president than expected (44 observed versus about 228 expected), and far more disapprove (458 observed versus about 274 expected). For the white respondents, the observed value is higher than expected for those who approve of him as president (2449 versus about 2265) and lower than expected for those who disapprove (2548 versus about 2732).
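
Positional indexing such as [6] and [7] depends on the order of the elements in the list that chisq.test() returns. As a sketch, the test can be stored once and its components read by name. The matrix observed is rebuilt here from the counts in the observed values table, and the Pearson residuals are an optional extra that shows which cells depart most from the null hypothesis.

```r
# Observed counts, filled column by column (black first, then white)
observed <- matrix(
  c(44, 458, 2449, 2548), nrow = 2,
  dimnames = list(approvetrump = c("Approve", "Disapprove"),
                  race = c("black", "white"))
)

# Run the test once and reuse the result
test_result <- chisq.test(observed)

test_result$observed                          # same as element [6]
test_result$expected                          # same as element [7]
test_result$observed - test_result$expected   # raw gap in each cell
test_result$residuals                         # (observed - expected) / sqrt(expected)
```

The residuals scale each gap by the size of its expected count, so the two cells for black respondents stand out even though the raw gaps are the same size in all four cells.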