Using this built-in dataset, we shall test the association between smoking and exercise.
Test the hypothesis that the students' smoking habit is independent of their exercise level, at the 0.05 significance level.
library(MASS) # load the MASS package
tbl = table(survey$Smoke, survey$Exer)
tbl # the contingency table
##
##         Freq None Some
##   Heavy    7    1    3
##   Never   87   18   84
##   Occas   12    3    4
##   Regul    9    1    7
# Notice the small cell sizes
class(survey$Smoke)
## [1] "factor"
# Reorder the factor levels. Use factor(..., levels = ) for this:
# assigning to levels() would relabel the observations instead of reordering them.
survey$Smoke <- factor(survey$Smoke, levels = c('Never','Occas','Regul','Heavy'))
survey$Exer <- factor(survey$Exer, levels = c('None','Some','Freq'))
chisq.test(tbl)
## Warning in chisq.test(tbl): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: tbl
## X-squared = 5, df = 6, p-value = 0.5
We applied the chisq.test() function to the contingency table tbl and found the p-value to be 0.4828 (shown rounded to 0.5 in the output above).
Since the p-value exceeds the 0.05 significance level, we fail to reject the null hypothesis that smoking habit is independent of exercise level.
The warning message in the output above is due to the small cell counts in the contingency table.
To avoid the warning, we can combine the second and third columns of tbl, that is, merge the 'None' and 'Some' exercise levels into one category.
# Don't throw out the raw data; make a 'derived variable' instead.
survey$Exer2 <- survey$Exer
levels(survey$Exer2) <- list(Rare = c('None','Some'), Freq = 'Freq')
chisq.test(survey$Smoke, survey$Exer2)
##
## Pearson's Chi-squared test
##
## data: survey$Smoke and survey$Exer2
## X-squared = 3, df = 3, p-value = 0.4
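The link between small cells and the warning can be checked directly. The following is a minimal sketch, not part of the original solution: it rebuilds the contingency table from the counts printed earlier (so it runs without the MASS package) and inspects the expected counts under independence. The names `expected`, `res` and `small_cells` are illustrative only. chisq.test() issues its warning when any expected count is below 5.

```r
# Rebuild the contingency table from the counts printed earlier
# (same values as table(survey$Smoke, survey$Exer)).
tbl <- matrix(c( 7,  1,  3,
                87, 18, 84,
                12,  3,  4,
                 9,  1,  7),
              nrow = 4, byrow = TRUE,
              dimnames = list(Smoke = c("Heavy", "Never", "Occas", "Regul"),
                              Exer  = c("Freq", "None", "Some")))

# Expected count of each cell under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(tbl), colSums(tbl)) / sum(tbl)

# chisq.test() stores the same matrix in its $expected component
res <- suppressWarnings(chisq.test(tbl))
all.equal(unname(expected), unname(res$expected))

# Cells with an expected count below 5 are the ones behind the warning
small_cells <- which(expected < 5, arr.ind = TRUE)
small_cells
```

Looking at the expected counts rather than the observed ones shows which categories are worth merging before the test is rerun.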
A useful command associated with the Chi-Square Test is prop.table(), which converts count data to proportions.
### Overall Proportions
prop.table(tbl)
##
##            Freq    None    Some
##   Heavy 0.02966 0.00424 0.01271
##   Never 0.36864 0.07627 0.35593
##   Occas 0.05085 0.01271 0.01695
##   Regul 0.03814 0.00424 0.02966
### Proportion of Row Variable
prop.table(tbl,1)
##
##           Freq   None   Some
##   Heavy 0.6364 0.0909 0.2727
##   Never 0.4603 0.0952 0.4444
##   Occas 0.6316 0.1579 0.2105
##   Regul 0.5294 0.0588 0.4118
### Proportion of Column Variable
prop.table(tbl,2)
##
##           Freq   None   Some
##   Heavy 0.0609 0.0435 0.0306
##   Never 0.7565 0.7826 0.8571
##   Occas 0.1043 0.1304 0.0408
##   Regul 0.0783 0.0435 0.0714
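To make the role of the second argument of prop.table() concrete, here is a small sketch (an illustration, not part of the original walkthrough). It rebuilds the table from the counts printed earlier so that it runs on its own, then checks what each version sums to. The names `overall`, `by_row` and `by_col` are illustrative.

```r
# Rebuild the contingency table from the counts printed earlier
tbl <- matrix(c( 7,  1,  3,
                87, 18, 84,
                12,  3,  4,
                 9,  1,  7),
              nrow = 4, byrow = TRUE,
              dimnames = list(Smoke = c("Heavy", "Never", "Occas", "Regul"),
                              Exer  = c("Freq", "None", "Some")))

overall <- prop.table(tbl)      # each cell divided by the grand total
by_row  <- prop.table(tbl, 1)   # each cell divided by its row total
by_col  <- prop.table(tbl, 2)   # each cell divided by its column total

sum(overall)      # the whole table sums to 1
rowSums(by_row)   # every row sums to 1
colSums(by_col)   # every column sums to 1

# prop.table(tbl, 1) gives the same result as dividing by the row totals
all.equal(by_row, tbl / rowSums(tbl))
```

Row proportions answer "among heavy smokers, what share exercises frequently?", while column proportions answer "among frequent exercisers, what share smokes heavily?".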