library(readr)
library(dplyr)
library(ggplot2)
voterdata <- read.csv("F:/Voter Data 2019.csv")
The two groups from the dataset that I want to compare are people who describe their political views as liberal and people who describe them as conservative, as recorded in the variable “ideo5_2017”. For this assignment, respondents who describe their views as “very liberal” or “liberal” are classified as liberals, and those who describe their views as “very conservative” or “conservative” are classified as conservatives; all other responses are excluded. The continuous variable on which I compare the two groups is their feeling toward immigrants (ft_immig_2017).
Data Preparation
new_voterdata <- voterdata %>%
  mutate(
    # ideo5_2017 codes 1-2 -> "Liberal", 4-5 -> "Conservative", anything else -> NA
    politicalviewpoint = case_when(
      ideo5_2017 %in% c(1, 2) ~ "Liberal",
      ideo5_2017 %in% c(4, 5) ~ "Conservative",
      TRUE ~ NA_character_
    ),
    # Recode 997 in the immigrant feeling score to NA
    ft_immig_2017 = ifelse(ft_immig_2017 == 997, NA, ft_immig_2017)
  ) %>%
  select(ft_immig_2017, politicalviewpoint) %>%
  # Keep the two groups and scores of at most 100 (rows with NA scores are dropped too)
  filter(politicalviewpoint %in% c("Liberal", "Conservative"), ft_immig_2017 <= 100)
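To sanity-check the recoding, here is a minimal base-R sketch on a small hypothetical data frame. The codes and scores are invented for illustration and are not taken from the voter file. It applies the same rules as the data preparation step: codes 1 and 2 become "Liberal", codes 4 and 5 become "Conservative", a score of 997 becomes NA, and only rows with a group and a score of at most 100 are kept.

```r
# Hypothetical toy data: NOT from the 2019 voter file
toy <- data.frame(
  ideo5_2017    = c(1, 2, 3, 4, 5, 6, 1, 5),
  ft_immig_2017 = c(90, 75, 60, 40, 997, 50, 101, 20)
)

# Same mapping as the dplyr pipeline: 1-2 -> Liberal, 4-5 -> Conservative, else NA
toy$politicalviewpoint <- ifelse(toy$ideo5_2017 %in% c(1, 2), "Liberal",
                          ifelse(toy$ideo5_2017 %in% c(4, 5), "Conservative", NA))

# 997 is recoded to missing before filtering
toy$ft_immig_2017[toy$ft_immig_2017 == 997] <- NA

# Keep the two groups and scores of at most 100;
# which() drops rows where the condition is NA, as dplyr::filter() does
keep <- which(toy$politicalviewpoint %in% c("Liberal", "Conservative") &
              toy$ft_immig_2017 <= 100)
toy_clean <- toy[keep, c("ft_immig_2017", "politicalviewpoint")]
print(toy_clean)
```

Only the rows with codes 1, 2, 4 and 5 and valid scores survive; the moderate-style code 3, the code 6, the 997 score, and the out-of-range 101 are all dropped.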
Comparing the Averages
new_voterdata %>%
  group_by(politicalviewpoint) %>%
  summarize(avgft_immig_2017 = mean(ft_immig_2017, na.rm = TRUE))
## # A tibble: 2 x 2
## politicalviewpoint avgft_immig_2017
## <chr> <dbl>
## 1 Conservative 50.6
## 2 Liberal 77.0
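The table rounds the means to 50.6 and 77.0. A small sketch of the size of the gap, using the unrounded means that the t-test output further down reports (copied by hand, not recomputed from the data):

```r
# Unrounded group means as reported by t.test() on new_voterdata
avg_liberal      <- 76.98157
avg_conservative <- 50.59953

# Gap in average feeling toward immigrants, in ft_immig_2017 points
gap <- avg_liberal - avg_conservative
round(gap, 2)  # 26.38
```

This difference of about 26.4 points is the quantity whose uncertainty the t-test quantifies.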
Comparing the Distributions (Visualization)
new_voterdata %>%
  ggplot() +
  geom_histogram(aes(x = ft_immig_2017)) +
  facet_wrap(~politicalviewpoint)

Generating Sampling Distributions
liberal_data <- new_voterdata %>%
  filter(politicalviewpoint == "Liberal")
conservative_data <- new_voterdata %>%
  filter(politicalviewpoint == "Conservative")
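# 10,000 means of random samples of 40 liberal respondents (drawn without replacement)
replicate(10000, sample(liberal_data$ft_immig_2017, 40) %>%
            mean(na.rm = TRUE)) %>%
  data.frame() %>%
  rename("mean" = 1) %>%
  ggplot() + geom_histogram(aes(x = mean), fill = "blue")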

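# 10,000 means of random samples of 40 conservative respondents (drawn without replacement)
replicate(10000, sample(conservative_data$ft_immig_2017, 40) %>%
            mean(na.rm = TRUE)) %>%
  data.frame() %>%
  rename("mean" = 1) %>%
  ggplot() + geom_histogram(aes(x = mean), fill = "red")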
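The sampling-distribution histograms can also be checked numerically against the Central Limit Theorem. This is a minimal sketch that assumes a hypothetical, left-skewed population of 0 to 100 scores generated with rbeta(); it is not the survey data. It uses the same resampling scheme as above (10,000 means of samples of 40, drawn without replacement) and compares the spread of those means with the CLT prediction sigma / sqrt(n).

```r
set.seed(2019)  # arbitrary seed so the simulation is reproducible

# Hypothetical, left-skewed population of scores on a 0-100 scale
population <- round(100 * rbeta(3000, shape1 = 5, shape2 = 2))
N <- length(population)

n_draws <- 10000
n <- 40

# Same scheme as above: means of samples of size 40, drawn without replacement
sample_means <- replicate(n_draws, mean(sample(population, n)))

# CLT prediction for the standard error of the mean, using the population sd
# (divisor N); the finite-population correction is close to 1 here and is ignored
sigma  <- sqrt(sum((population - mean(population))^2) / N)
se_clt <- sigma / sqrt(n)

c(simulated_se = sd(sample_means), clt_se = se_clt)
c(simulated_mean = mean(sample_means), population_mean = mean(population))
```

Even though the population is skewed, the simulated means center on the population mean with a spread close to sigma / sqrt(40), which is the behavior the histograms above illustrate.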

t-Test
t.test(ft_immig_2017~politicalviewpoint, data = new_voterdata)
##
## Welch Two Sample t-test
##
## data: ft_immig_2017 by politicalviewpoint
## t = -32.549, df = 3562.6, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -27.97121 -24.79286
## sample estimates:
## mean in group Conservative mean in group Liberal
## 50.59953 76.98157
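The Welch statistic, its fractional degrees of freedom (3562.6 above), the p-value and the confidence interval all follow from each group's size, mean and variance. A minimal sketch on two simulated groups (hypothetical sizes, means and standard deviations, not the survey data) that recomputes them by hand and checks the results against t.test(), which runs the Welch version by default (var.equal = FALSE):

```r
set.seed(1)  # arbitrary seed; both groups are simulated, not survey data
conservative <- rnorm(60, mean = 50, sd = 30)  # hypothetical group
liberal      <- rnorm(50, mean = 60, sd = 20)  # hypothetical group

# Squared standard error of each group mean: s^2 / n
v_c <- var(conservative) / length(conservative)
v_l <- var(liberal) / length(liberal)
se_diff <- sqrt(v_c + v_l)

# Welch t statistic for Conservative minus Liberal (same order as the output above)
diff_means <- mean(conservative) - mean(liberal)
t_manual   <- diff_means / se_diff

# Welch-Satterthwaite degrees of freedom (generally not a whole number)
df_manual <- (v_c + v_l)^2 /
  (v_c^2 / (length(conservative) - 1) + v_l^2 / (length(liberal) - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_manual  <- 2 * pt(-abs(t_manual), df_manual)
ci_manual <- diff_means + c(-1, 1) * qt(0.975, df_manual) * se_diff

res <- t.test(conservative, liberal)  # Welch two-sample t-test
rbind(manual  = c(t_manual, df_manual, p_manual),
      t.test  = c(res$statistic, res$parameter, res$p.value))
```

Because the group variances are not assumed equal, the degrees of freedom come from the Satterthwaite formula rather than n1 + n2 - 2, which is why the output reports a non-integer df.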
Conclusion
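Based on the group averages and the histograms of the observed scores, people who describe their political views as liberal reported warmer feelings toward immigrants than people who describe themselves as conservative (mean of 76.98 versus 50.60, a gap of about 26.4 points). This suggests that liberals in this sample are, on average, more favorable toward immigrants than conservatives.

The histograms of the simulated sampling distributions (10,000 sample means, each from 40 randomly drawn respondents) look approximately normal for both groups. This is what the Central Limit Theorem predicts for samples of this size, and it supports using a t-test to compare the means. In the Welch two-sample t-test, the p-value is reported as p < 2.2e-16; R prints this when the p-value is smaller than about 2.2e-16, so the actual value is even smaller, and in any case far below the conventional 0.05 significance level. Therefore, the difference in mean feeling toward immigrants between liberals and conservatives is statistically significant. The 95% confidence interval indicates that the conservative mean is roughly 24.8 to 28.0 points lower than the liberal mean.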
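A note on reading the test output: "p-value < 2.2e-16" is an upper bound, printed when the p-value falls below machine precision (.Machine$double.eps, about 2.22e-16), not the p-value itself. A minimal sketch that recomputes the two-sided p-value from the reported t = -32.549 and df = 3562.6 and compares it with the conventional 0.05 threshold:

```r
t_obs  <- -32.549   # t statistic reported by t.test above
df_obs <- 3562.6    # Welch degrees of freedom reported above
alpha  <- 0.05      # conventional significance level

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_obs), df_obs)

# The same p-value on the log10 scale, which stays readable for tiny values
log10_p <- (log(2) + pt(-abs(t_obs), df_obs, log.p = TRUE)) / log(10)

p_value < alpha                   # TRUE: the difference is significant at the 5% level
p_value < .Machine$double.eps     # TRUE: this is why R prints "< 2.2e-16"
format.pval(p_value, digits = 4)  # the "< ..." style of display used in test printouts
log10_p
```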