Chi Square Test of Independence

Data Set

I will use the built-in data set "survey" from the "MASS" package in R.

The Smoke column in the data set records the students' smoking habits, while the Exer column records their exercise level.

The allowed values in Smoke are "Heavy", "Regul" (regularly), "Occas" (occasionally) and "Never". 

As for Exer, they are "Freq" (frequently), "Some" and "None".

We can tally the students' smoking habits against their exercise levels with the table function in R.

The result is called the contingency table of the two variables.

# load the MASS package
library(MASS)

# build the contingency table for the two variables of interest
tbl <- table(survey$Smoke, survey$Exer) 

# viewing the contingency table 
tbl
##        
##         Freq None Some
##   Heavy    7    1    3
##   Never   87   18   84
##   Occas   12    3    4
##   Regul    9    1    7
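
The raw counts can be easier to read with totals and row proportions alongside them. The following is a minimal sketch using base R's addmargins and prop.table; the names tbl_totals and row_props are illustrative and not part of the original analysis.

```r
library(MASS)

# rebuild the contingency table so this block runs on its own
tbl <- table(survey$Smoke, survey$Exer)

# append row and column totals ("Sum") to the table
tbl_totals <- addmargins(tbl)
tbl_totals

# share of each exercise level within each smoking category (each row sums to 1)
row_props <- prop.table(tbl, margin = 1)
round(row_props, 3)
```

Note that table drops observations with a missing value in either column, so the grand total can be smaller than the number of rows in survey.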

Let us set up the hypothesis

Hypothesis

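Test whether the students' smoking habit is independent of their exercise level at the .05 significance level. The null hypothesis is that the two variables are independent; the alternative hypothesis is that they are associated.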

Chi Square Test

# Apply the Chi Square Test to the contingency table
chisq.test(tbl) 
## 
##  Pearson's Chi-squared test
## 
## data:  tbl
## X-squared = 5.4885, df = 6, p-value = 0.4828

Applying chisq.test to the contingency table "tbl" gives a p-value of 0.4828.
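
To see where these numbers come from, the statistic can be reproduced by hand. The sketch below assumes the standard Pearson formula: the expected count of each cell under independence is (row total × column total) / n, the statistic is the sum of (observed - expected)^2 / expected over all cells, and the degrees of freedom are (rows - 1) × (columns - 1). The variable names (expected, x2, df, p_value) are illustrative.

```r
library(MASS)

# rebuild the contingency table so this block runs on its own
tbl <- table(survey$Smoke, survey$Exer)
n <- sum(tbl)

# expected counts under independence: row total * column total / n
expected <- outer(rowSums(tbl), colSums(tbl)) / n

# Pearson chi-squared statistic
x2 <- sum((tbl - expected)^2 / expected)

# degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(tbl) - 1) * (ncol(tbl) - 1)

# upper-tail probability of the chi-squared distribution
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

c(statistic = x2, df = df, p.value = p_value)
```

With a 4 x 3 table the degrees of freedom are 3 × 2 = 6, which matches the df reported by chisq.test.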

Interpretation

As the p-value 0.4828 is greater than the .05 significance level, we do not reject the null hypothesis that the students' smoking habit is independent of their exercise level.
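
One caveat: on this table R also warns that the "Chi-squared approximation may be incorrect", because several expected cell counts are small (the "Heavy", "Occas" and "Regul" rows contain few students). The sketch below is one possible way to check this, not part of the original analysis: it inspects the expected counts stored in the test result and, as an alternative, asks chisq.test for a Monte Carlo p-value through its simulate.p.value argument. The seed and the number of replicates B are arbitrary choices.

```r
library(MASS)

# rebuild the contingency table so this block runs on its own
tbl <- table(survey$Smoke, survey$Exer)

# keep the test result; the warning is suppressed here only to keep the output clean
res <- suppressWarnings(chisq.test(tbl))

# expected counts under independence, as computed by chisq.test
round(res$expected, 2)

# number of cells whose expected count falls below 5
small_cells <- sum(res$expected < 5)
small_cells

# alternative: p-value estimated by Monte Carlo simulation
set.seed(1)  # arbitrary seed, for reproducibility only
sim <- chisq.test(tbl, simulate.p.value = TRUE, B = 10000)
sim
```

If the simulated p-value is also well above .05, the conclusion above is unchanged. Merging sparse categories before testing is another common option.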