Chapter 18 - Chi Square

Author

R Saidi

Load the libraries and data

library(tidyverse)
library(openintro)
library(tidymodels)
data("gss")

A question in two variables

Does level of education have an association with political party affiliation?

Chi Square

When the data form a two-way table of counts, we can test for an association between the two variables using the \(\chi^2\) statistic and its distribution.
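For reference, the statistic compares each observed cell count with the count expected if the two variables were independent. The symbols \(O_{ij}\), \(E_{ij}\), \(r\), \(c\) and \(n\) are notation introduced here for illustration and do not come from the chapter:

\[
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{n}
\]

Here \(O_{ij}\) is the observed count in row \(i\) and column \(j\), \(E_{ij}\) is the expected count under independence, and \(n\) is the grand total. For a table with \(r\) rows and \(c\) columns, the reference distribution has \((r - 1)(c - 1)\) degrees of freedom. For example, a 2 × 4 table gives 3 degrees of freedom.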

Start with a bar plot

# Visualize distribution
gss |>
  ggplot(aes(x = partyid, fill = college)) +
  # Add bar layer of proportions
  geom_bar(position = "fill")

The education proportions for each party look relatively similar.
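To back up the visual impression with numbers, the proportions behind the filled bars can be tabulated directly. This is a minimal sketch: it assumes the `gss` data come from the infer package (attached through tidymodels), and `party_props` is a name chosen here for illustration.

```r
library(dplyr)
library(infer)  # assumption: gss is the data set shipped with infer

data("gss", package = "infer")

# Share of each education level within each party
party_props <- gss |>
  count(partyid, college) |>
  group_by(partyid) |>
  mutate(prop = n / sum(n)) |>
  ungroup()

party_props
```

Within each party the `prop` values sum to 1, so they correspond to the segment heights in the filled bar plot.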

Remove position = "fill" to show counts instead of proportions

gss |>
  ggplot(aes(x = partyid, fill = college)) +
  # Add bar layer of counts
  geom_bar()

# Build a two-way table of observed counts
Obs <- gss |>
  select(college, partyid) |>
  table()
  
# Convert table back to tidy df
Obs |>
  # Tidy the table (as_tibble() replaces the deprecated tidy() method for tables)
  as_tibble() |>
  # Expand out the counts
  uncount(n)
# A tibble: 500 × 2
   college   partyid
   <chr>     <chr>  
 1 no degree dem    
 2 no degree dem    
 3 no degree dem    
 4 no degree dem    
 5 no degree dem    
 6 no degree dem    
 7 no degree dem    
 8 no degree dem    
 9 no degree dem    
10 no degree dem    
# ℹ 490 more rows
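Under independence, the expected count in each cell is the row total times the column total divided by the grand total. The sketch below computes those expected counts and the resulting statistic by hand. It assumes `gss` is the data set shipped with infer, and the names `obs_tab`, `expected` and `chi_sq_by_hand` are chosen here for illustration. The unused `DK` party level is dropped first so that no column has an expected count of zero.

```r
library(infer)  # assumption: gss is the data set shipped with infer

data("gss", package = "infer")

# Observed two-way table, with unused factor levels removed
obs_tab <- table(
  college = droplevels(gss$college),
  partyid = droplevels(gss$partyid)
)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(obs_tab), colSums(obs_tab)) / sum(obs_tab)

# Chi-squared statistic: sum over cells of (observed - expected)^2 / expected
chi_sq_by_hand <- sum((obs_tab - expected)^2 / expected)

expected
chi_sq_by_hand
```

R warns that the chi-squared approximation may be incorrect when some expected counts are small, so inspecting `expected` is a quick way to see which cells are responsible.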

Perform a chi-square hypothesis test

# Create one permuted data set
perm_1 <- gss |>
  # Specify the variables of interest
  specify(college ~ partyid) |>
  # Set up the null
  hypothesize(null = "independence") |>
  # Generate a single permuted data set
  generate(reps = 1, type = "permute")
Dropping unused factor levels DK from the supplied explanatory variable
'partyid'.
perm_1
Response: college (factor)
Explanatory: partyid (factor)
Null Hypothesis: independence
# A tibble: 500 × 3
# Groups:   replicate [1]
   college   partyid replicate
   <fct>     <fct>       <int>
 1 degree    ind             1
 2 no degree rep             1
 3 degree    ind             1
 4 no degree ind             1
 5 degree    rep             1
 6 no degree rep             1
 7 degree    dem             1
 8 degree    ind             1
 9 degree    rep             1
10 no degree dem             1
# ℹ 490 more rows
# Visualize permuted data
ggplot(perm_1, aes(x = partyid, fill = college)) +
  # Add bar layer
  geom_bar()

# Compute chi-squared stat
chisq.test(perm_1$partyid, perm_1$college)
Warning in chisq.test(perm_1$partyid, perm_1$college): Chi-squared
approximation may be incorrect

    Pearson's Chi-squared test

data:  perm_1$partyid and perm_1$college
X-squared = 1.5636, df = 3, p-value = 0.6677

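With a p-value of 0.6677 for this permuted data set, there is no compelling evidence of an association between having a college degree and political party affiliation. That is expected here: permuting breaks any link between the two variables, and the exact p-value changes from one permutation to the next.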
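A single permuted data set only shows what one draw under the null hypothesis looks like. A minimal sketch of the full permutation test compares the statistic from the observed data with the statistics from many permutations. It assumes `gss` is the data set shipped with infer. The seed, the choice of 1000 replicates, and the names `obs_stat`, `null_dist` and `p_val` are illustrative choices, not taken from the chapter.

```r
library(infer)  # assumption: gss is the data set shipped with infer

data("gss", package = "infer")
set.seed(2024)  # arbitrary seed so the permutations are reproducible

# Chi-squared statistic for the observed data
obs_stat <- gss |>
  specify(college ~ partyid) |>
  hypothesize(null = "independence") |>
  calculate(stat = "Chisq")

# Null distribution: the same statistic on 1000 permuted data sets
null_dist <- gss |>
  specify(college ~ partyid) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  calculate(stat = "Chisq")

# Proportion of permuted statistics at least as large as the observed one
p_val <- null_dist |>
  get_p_value(obs_stat = obs_stat, direction = "greater")

p_val
```

Because larger values of the statistic indicate a stronger departure from independence, the p-value uses the upper tail only. This approach does not rely on the chi-squared approximation that triggered the warning above.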