Chapter 18 - Chi Square
Load the libraries and data
library(tidyverse)
library(openintro)
library(tidymodels)
data("gss")
A question in two variables
Does level of education have an association with political party affiliation?
Chi Square
When the data form a two-way table, we can use the \(\chi^2\) distribution to test whether the two variables are associated.
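For reference, the statistic behind this test compares each observed cell count with the count expected under independence. The notation below is introduced here for illustration and does not appear in the original notes: \(O_{ij}\) is the observed count in row \(i\) and column \(j\), \(E_{ij}\) is the expected count, \(n\) is the total sample size, and \(r\) and \(c\) are the numbers of rows and columns.

\[
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{n}
\]

Under the null hypothesis of independence, the statistic approximately follows a \(\chi^2\) distribution with \((r - 1)(c - 1)\) degrees of freedom. With two education levels and four observed party categories, that gives \((2 - 1)(4 - 1) = 3\).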
Start with a bar plot
# Visualize distribution
gss |>
  ggplot(aes(x = partyid, fill = college)) +
  # Add bar layer of proportions
  geom_bar(position = "fill")
The education proportions for each party look relatively similar.
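To put numbers behind the visual impression, the within-party proportions can be tabulated directly. This is a minimal sketch, assuming `gss` is the data set shipped with the infer package (which tidymodels attaches); the name `party_props` is introduced here for illustration.

```r
library(tidyverse)
library(infer)  # assumption: gss comes from infer

# Count each party/education combination, then convert the counts
# to proportions within each party
party_props <- gss |>
  count(partyid, college) |>
  group_by(partyid) |>
  mutate(prop = n / sum(n)) |>
  ungroup()

party_props
```

Each party's proportions sum to 1, so the `prop` column corresponds to the bar segments drawn with `position = "fill"`.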
Remove position = "fill" to show counts
gss |>
  ggplot(aes(x = partyid, fill = college)) +
  # Add bar layer of counts
  geom_bar()
# From previous step
Obs <- gss |>
  select(college, partyid) |>
  table()
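The two-way table of observed counts is all that is needed to compute the chi-squared statistic by hand: the expected count in each cell is its row total times its column total divided by the sample size. The sketch below assumes `gss` comes from infer; `obs`, `expected`, and `chi_sq` are illustrative names, and `droplevels()` is used so that the unused `DK` party level does not produce a column of zeros.

```r
library(infer)  # assumption: gss comes from infer

# Observed counts, without unused factor levels
obs <- table(droplevels(gss$college), droplevels(gss$partyid))

# Expected counts under independence: (row total * column total) / n
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Chi-squared statistic: sum of (observed - expected)^2 / expected
chi_sq <- sum((obs - expected)^2 / expected)
chi_sq
```

The result should agree with the `X-squared` value that `chisq.test(obs)` reports for the same table.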
# Convert table back to tidy df
Obs |>
  # Tidy the table
  tidy() |>
  # Expand out the counts
  uncount(n)
Warning: 'tidy.table' is deprecated.
Use 'tibble::as_tibble()' instead.
See help("Deprecated")
# A tibble: 500 × 2
college partyid
<chr> <chr>
1 no degree dem
2 no degree dem
3 no degree dem
4 no degree dem
5 no degree dem
6 no degree dem
7 no degree dem
8 no degree dem
9 no degree dem
10 no degree dem
# ℹ 490 more rows
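The deprecation warning in the output suggests `tibble::as_tibble()` as the replacement for `tidy()` on a table. A minimal sketch of the same conversion, assuming `gss` comes from infer and using the illustrative names `obs` and `obs_tidy`:

```r
library(tidyverse)
library(infer)  # assumption: gss comes from infer

# Two-way table of observed counts
obs <- gss |>
  select(college, partyid) |>
  table()

# as_tibble() turns the table into one row per cell with a count column n;
# uncount() then repeats each row n times (cells with n = 0 are dropped)
obs_tidy <- obs |>
  as_tibble() |>
  uncount(n)

obs_tidy
```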
Perform a chi-squared hypothesis test
# Create one permuted data set
perm_1 <- gss |>
  # Specify the variables of interest
  specify(college ~ partyid) |>
  # Set up the null
  hypothesize(null = "independence") |>
  # Generate a single permuted data set
  generate(reps = 1, type = "permute")
Dropping unused factor levels DK from the supplied explanatory variable
'partyid'.
perm_1
Response: college (factor)
Explanatory: partyid (factor)
Null Hypothesis: independence
# A tibble: 500 × 3
# Groups: replicate [1]
college partyid replicate
<fct> <fct> <int>
1 degree ind 1
2 no degree rep 1
3 degree ind 1
4 no degree ind 1
5 degree rep 1
6 no degree rep 1
7 degree dem 1
8 degree ind 1
9 degree rep 1
10 no degree dem 1
# ℹ 490 more rows
# Visualize permuted data
ggplot(perm_1, aes(x = partyid, fill = college)) +
# Add bar layer
geom_bar()
# Compute chi-squared stat
chisq.test(perm_1$partyid, perm_1$college)
Warning in chisq.test(perm_1$partyid, perm_1$college): Chi-squared
approximation may be incorrect
Pearson's Chi-squared test
data: perm_1$partyid and perm_1$college
X-squared = 1.5636, df = 3, p-value = 0.6677
With a p-value of 0.668, there is no compelling evidence of an association between holding a college degree and political party affiliation. Because perm_1 is a single random permutation of the data, the statistic and p-value will differ from run to run.
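A single permuted data set gives only one draw from the null distribution. One way to extend the workflow above into a full permutation test is to compute the chi-squared statistic on the observed data and compare it with the statistics from many permutations. This is a sketch under stated assumptions, not the chapter's own code: `gss` is taken from infer, the seed and the choice of 1000 replicates are arbitrary, and `obs_stat`, `null_dist`, and `p_value` are illustrative names.

```r
library(tidyverse)
library(infer)  # assumption: gss comes from infer

set.seed(18)  # arbitrary seed, for reproducibility only

# Chi-squared statistic for the observed data
obs_stat <- gss |>
  specify(college ~ partyid) |>
  hypothesize(null = "independence") |>
  calculate(stat = "Chisq")

# Null distribution: the same statistic on 1000 permuted data sets
null_dist <- gss |>
  specify(college ~ partyid) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  calculate(stat = "Chisq")

# Proportion of permuted statistics at least as large as the observed one
p_value <- null_dist |>
  get_p_value(obs_stat = obs_stat, direction = "greater")

p_value
```

The direction is `"greater"` because only large values of the chi-squared statistic count as evidence against independence.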