Chi-Square Test of Independence for Two Factor Variables

When you have two factor (categorical) variables, each with multiple levels, you can use the chi-square test of independence to assess whether they are related (there is some association) or independent (they are not associated).

setwd("~/Dropbox/Data General/GSS")  #Set your working directory to whatever folder holds GSS.csv
options(scipen = 999)  #Turn off scientific notation
options(warn = -1)  #Suppress warnings (note: this also hides chisq.test()'s warning about small expected counts)
options(digits = 8)  #Print up to 8 significant digits
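x <- read.csv("GSS.csv", stringsAsFactors = TRUE)  #Read text columns as factors (no longer the default since R 4.0) so summary() reports counts per level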

I'm interested in whether knowledge of the North American Free Trade Agreement (NAFTA) is associated with support for it, or whether the two are independent of each other (no relationship). The GSS dataset has two variables, both from 1996, that record answers to these questions:

summary(x$nafta1)  #what do you know about NAFTA?
##          a lot       not much nothing at all    quite a bit           NA's 
##            324           1115            420            695          52533
summary(x$nafta2)  #will NAFTA benefit or not benefit the United States?
##                  benefits          does not benefit 
##                       377                       270 
##                 dont know have never heard of nafta 
##                       581                       120 
##                      NA's 
##                     53739

The chi-square test of independence is run with the function chisq.test(), applied to a simple frequency table (contingency table) of the two variables. The null hypothesis is that the two variables are independent. A small p-value is evidence against independence, that is, evidence of a systematic relationship. A large p-value means only that the data are consistent with independence; it does not demonstrate that the variables are independent.

table <- table(x$nafta1, x$nafta2)  #make a simple contingency table of frequencies
table  #print the table
##                 
##                  benefits does not benefit dont know
##   a lot                88               68        25
##   not much            111               75       359
##   nothing at all        4                3        92
##   quite a bit         173              124       104
##                 
##                  have never heard of nafta
##   a lot                                  0
##   not much                              16
##   nothing at all                        99
##   quite a bit                            5
chisq.test(table)  #test research hypothesis that there is an association (null = independence)
## 
##  Pearson's Chi-squared test
## 
## data:  table 
## X-squared = 779.5446, df = 9, p-value < 0.00000000000000022
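
To make clear what chisq.test() is doing, the statistic can be reproduced by hand. The sketch below is only an illustration: it retypes the counts from the table printed above into a matrix (the names obs, expected, stat, df and p_value are my own, not part of the GSS data), computes the expected count for each cell under independence, and sums the squared deviations. The resulting statistic and degrees of freedom should agree with the X-squared and df reported by chisq.test().

```r
# Observed counts copied from the contingency table printed above
obs <- matrix(
  c( 88,  68,  25,  0,
    111,  75, 359, 16,
      4,   3,  92, 99,
    173, 124, 104,  5),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    knowledge = c("a lot", "not much", "nothing at all", "quite a bit"),
    opinion   = c("benefits", "does not benefit", "dont know",
                  "have never heard of nafta")
  )
)

# Expected count under independence: (row total * column total) / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson statistic: sum over all cells of (observed - expected)^2 / expected
stat <- sum((obs - expected)^2 / expected)

# Degrees of freedom: (number of rows - 1) * (number of columns - 1)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)

# p-value: upper-tail probability of the chi-square distribution
p_value <- pchisq(stat, df = df, lower.tail = FALSE)

round(stat, 4)  #the test statistic
df              #the degrees of freedom
p_value         #the p-value
```

Looking at the expected matrix is also a quick way to check the usual rule of thumb that expected counts should not be too small (often stated as at least 5 per cell), which matters here because warnings were switched off at the top of the script.
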

The test returns an extremely small p-value (p < 0.00000000000000022), so we reject the null hypothesis of independence and conclude that levels of knowledge about NAFTA are related to evaluations of NAFTA. Notice, however, that this test does not allow us to parse specific relationships between categories, nor does it say how strong the relationship is. It only indicates whether or not there is an overall association.
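
One common way to look past the overall result, offered here as a sketch rather than as part of the test itself, is to inspect the standardized residuals that chisq.test() stores in its result, alongside row percentages. The counts are again retyped from the table above, and the object names (obs, result, row_pct) are my own. By a widely used convention, cells with a standardized residual beyond roughly +2 or -2 hold noticeably more or fewer cases than independence would predict.

```r
# Observed counts copied from the contingency table printed above
obs <- matrix(
  c( 88,  68,  25,  0,
    111,  75, 359, 16,
      4,   3,  92, 99,
    173, 124, 104,  5),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    knowledge = c("a lot", "not much", "nothing at all", "quite a bit"),
    opinion   = c("benefits", "does not benefit", "dont know",
                  "have never heard of nafta")
  )
)

result <- chisq.test(obs)

# Standardized residuals: positive = more cases than expected under
# independence, negative = fewer cases than expected
round(result$stdres, 2)

# Row percentages: the distribution of opinions within each knowledge level
row_pct <- 100 * prop.table(obs, margin = 1)
round(row_pct, 1)
```

Reading the two tables together shows which combinations of knowledge and opinion account for the large chi-square value, which the single test statistic cannot show.
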