Introduction
We are going to use data to back up common assertions about what liberals and conservatives believe regarding government involvement and global warming. For our analysis we will use data from the ANES 2016, a national survey that is conducted before and after every presidential election. We will look at a variable that asks respondents to place themselves on a scale describing how liberal or conservative their beliefs are, with 1 being extremely liberal and 7 being extremely conservative. We will use a statistical measure called Pearson’s Chi-squared test to test whether two categorical variables are related to each other. The larger the chi-squared value (and therefore the smaller the p-value), the stronger the evidence that the two variables are related.
Another statistic we’ll be using is the Pearson correlation coefficient (r), which indicates whether there is a linear relationship between two variables. An r value of 0 indicates no linear relationship, r close to 1 indicates a strong positive linear relationship, and r close to -1 indicates a strong negative linear relationship.
First we load the libraries and data that we will be using. We then replace negative values with NA, because in this data set they indicate a non-response: either the person refused to respond or didn’t know how to answer the question. We also have to check each variable for other special codes that could interfere with our analysis. Since the variable we will be sticking with is ideology, we remove the responses coded 99. Finally, we double check the range to make sure we have the correct values.
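To make the two statistics concrete, here is a minimal sketch on made-up numbers (not ANES data). It assumes the asymptotic chi-squared distribution for a 7 x 7 table, which has 36 degrees of freedom; the test statistics and toy vectors are arbitrary illustrative choices.

```r
# Toy illustration only: none of these numbers come from the ANES data.

# A 7 x 7 contingency table has (7 - 1) * (7 - 1) = 36 degrees of freedom.
df <- (7 - 1) * (7 - 1)

# Upper-tail p-values for increasingly large chi-squared statistics.
stats <- c(20, 50, 100)
p_values <- pchisq(stats, df = df, lower.tail = FALSE)
print(data.frame(statistic = stats, p_value = p_values))

# Pearson's r on exactly linear toy data.
x <- 1:7
r_pos <- cor(x, 2 * x + 1)  # points on an increasing line
r_neg <- cor(x, 8 - x)      # points on a decreasing line
print(c(r_pos = r_pos, r_neg = r_neg))
```

The p-values shrink as the statistic grows, and the two toy correlations sit at the extremes of the scale for r.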
# Installation stuff
# install.packages("devtools")
# install.packages("dplyr")
# devtools::install_github("jamesmartherus/anesr")
library(anesr)
library("ggpubr")
library(tidyverse)
library(MASS)
data(timeseries_2016)
anes16 <- timeseries_2016
rm(timeseries_2016)
# Replace all negative values (non-response codes) with NA
clean <- function(x){ifelse(x < 0, NA, x)}
anes16_clean <- anes16 %>%
mutate(across(everything(), clean))
anes16_clean <- anes16_clean %>%
filter(V161126 != 99) %>%
mutate(Ideology = V161126)
Our range for Ideology is from 1 to 7 as expected, so let’s get started!
## Ideology
## [1,] 1
## [2,] 7
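The cleaning steps can be tried on a small scale first. The sketch below uses a hypothetical toy data frame (the `ideology` column and its values are invented) to show how negative codes become NA, how 99 is filtered out, and how the range can then be checked.

```r
library(dplyr)

# Hypothetical toy responses: negative codes mark non-responses,
# and 99 is a special code that is not part of the 1-7 scale.
toy <- data.frame(ideology = c(1, 4, -8, 7, 99, -9, 3))

# Same helper as in the analysis: negative values become NA.
clean <- function(x) ifelse(x < 0, NA, x)

toy_clean <- toy %>%
  mutate(across(everything(), clean)) %>%
  filter(ideology != 99)  # filter() also drops rows where the condition is NA

# Check that only values on the 1-7 scale remain.
toy_range <- range(toy_clean$ideology)
print(toy_range)
```

Note that `filter()` keeps only rows where the condition is TRUE, so the NA rows created by `clean()` are removed in the same step.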
Services a government should provide
A common belief is that people who are liberal are more likely to think that the government should offer more services. Let’s see whether the data support this claim.
The variable V161178 in the ANES survey records how many services a respondent thinks the government should provide, with 1 meaning that the government should provide many fewer services and 7 meaning that the government should provide many more services. A value of 99 indicates that the respondent hasn’t thought about it much, so we have to filter it out.
First we verify that the range is between 1 and 7, which it is. Then we perform a Chi-squared test.
The null hypothesis for the test is that there is no relationship between the two variables. Let the significance level be 0.001, which means that we accept a 0.1% risk of concluding that a relationship exists when there is none.
ide_gov <- anes16_clean %>%
dplyr::select(Ideology, government_service = V161178) %>%
filter(government_service != 99) %>%
drop_na()
range_gov_serv <- ide_gov %>%
dplyr::select(government_service) %>%
sapply(range)
range_gov_serv
## government_service
## [1,] 1
## [2,] 7
tble <- table(ide_gov$Ideology, ide_gov$government_service)
chisq.test(tble, simulate.p.value = TRUE)
##
## Pearson's Chi-squared test with simulated p-value (based on 2000
## replicates)
##
## data: tble
## X-squared = 1368.4, df = NA, p-value = 0.0004998
Since the p-value (about 0.0005) is below our significance level of 0.001, we reject the null hypothesis and conclude that there is a relationship between the two variables.
Let’s plot the data.
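The reported p-value of 0.0004998 is worth a second look. It is the smallest value a simulated p-value can take with 2000 replicates. The sketch below reproduces that arithmetic, based on how R's `chisq.test` computes Monte Carlo p-values; the larger replicate count is an arbitrary illustrative choice.

```r
# Number of Monte Carlo replicates reported in the test output.
B <- 2000

# With simulate.p.value = TRUE the p-value is
#   (1 + number of simulated statistics at least as large as the observed one) / (B + 1).
# If no simulated table is as extreme as the data, that count is 0.
p_min <- (1 + 0) / (B + 1)
print(signif(p_min, 4))

# More replicates lower this floor (1e5 is an arbitrary choice for illustration).
p_min_large <- 1 / (1e5 + 1)
print(signif(p_min_large, 4))
```

In other words, the test tells us that none of the 2000 simulated tables was as extreme as the observed one. To resolve smaller p-values, one option would be a larger `B` argument in `chisq.test`.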
r <- cor.test(ide_gov$Ideology, ide_gov$government_service)
ggscatter(ide_gov, x = 'Ideology', y = 'government_service',
add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "pearson",
xlab = "Increasing Conservative Ideology", ylab = "Government Service")
## `geom_smooth()` using formula 'y ~ x'
The graph looks odd because both variables take only the whole-number values 1 through 7, so many of the points fall on top of each other. As a result the points don’t appear to be correlated, even though r is -0.56, which indicates the negative correlation we expected. People who lean liberal are more likely to think that the government should do more to offer services.
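One way to make overlapping points visible is to scale each point by how many responses share it. The following is a minimal sketch on simulated data (the `toy` data frame and its negative trend are invented for illustration), using `geom_count()` from ggplot2 rather than the `ggscatter()` call used above.

```r
library(ggplot2)

# Simulated 1-7 responses with a built-in negative trend (not ANES data).
set.seed(1)
n <- 500
ideology <- sample(1:7, n, replace = TRUE)
service <- pmin(7, pmax(1, round(8 - ideology + rnorm(n, sd = 1.5))))
toy <- data.frame(ideology = ideology, service = service)

# geom_count() sizes each point by the number of observations at that location,
# so heavily populated combinations stand out.
p <- ggplot(toy, aes(x = ideology, y = service)) +
  geom_count(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Increasing Conservative Ideology", y = "Government Service")
print(p)
```

Adding a small random offset with `geom_jitter()` is another common option; which one reads better depends on how many points overlap.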
Global warming
A common belief is that liberals are more likely to think global warming exists. The variable V161221 records the response people gave about global warming, with 1 indicating that it has probably been happening and 2 indicating that it has probably not been happening.
Our null hypothesis is the same as before: there is no relationship between the variables. Our significance level is again 0.001. A stricter level such as 0.0001 could not be met by this test, because with 2000 replicates the simulated p-value cannot fall below 1/2001, or about 0.0005.
ide_glb <- anes16_clean %>%
dplyr::select(Ideology, global_warming= V161221) %>%
drop_na()
tble <- table(ide_glb$Ideology, ide_glb$global_warming)
chisq.test(tble, simulate.p.value = TRUE)
##
## Pearson's Chi-squared test with simulated p-value (based on 2000
## replicates)
##
## data: tble
## X-squared = 441.68, df = NA, p-value = 0.0004998
cor.test(ide_glb$Ideology, ide_glb$global_warming)
##
## Pearson's product-moment correlation
##
## data: ide_glb$Ideology and ide_glb$global_warming
## t = 20.166, df = 3270, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3017546 0.3627152
## sample estimates:
## cor
## 0.3325823
ggscatter(ide_glb, x = 'Ideology', y = 'global_warming',
add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "pearson",
xlab = "Increasing Conservative Ideology", ylab = "Increase in Global Warming Skepticism")
## `geom_smooth()` using formula 'y ~ x'
We reject the null hypothesis and conclude that there is a relationship between the variables. The r value is 0.33; because 2 codes the answer that global warming has probably not been happening, the positive sign means that skepticism rises with conservatism. While this indicates that liberals are more likely to believe in global warming, the correlation is smaller in magnitude than the correlation with the government providing services.
Through these examples we have seen how commonly held beliefs were supported by the data, and how we could quantify each relationship in order to compare it with another. Specifically, we saw that the linear relationship between conservative ideology and support for fewer government services (r = -0.56) was stronger than the linear relationship between liberal ideology and belief in global warming (r = 0.33).
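As a sanity check on the correlation output, the t statistic can be recomputed from the reported r and degrees of freedom using the standard relation t = r * sqrt(df) / sqrt(1 - r^2). The sketch below takes both inputs from the printed output rather than from the data.

```r
# Values copied from the printed cor.test output.
r  <- 0.3325823  # sample correlation
df <- 3270       # degrees of freedom (number of complete pairs minus 2)

# t statistic for testing that the true correlation is zero.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)
print(round(t_stat, 3))

# Two-sided p-value from the t distribution with df degrees of freedom.
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
print(p_value < 2.2e-16)
```

The recomputed statistic agrees with the reported t = 20.166 up to rounding, so a moderate r of 0.33 is still very strong evidence against zero correlation with this many respondents.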