Chi Square Test

The chi-square test is used to test for association in two-way contingency tables.
A contingency table categorizes outcomes into rows and columns.
The test can also be used to compare the distribution of a categorical outcome across several groups.
The null hypothesis is that there are no differences between the groups.
The alternative hypothesis is that there is a difference between the groups.
The question to answer is: are the two variables independent, or is there an association?

Working with Categorical Data

Chi-Square

The table below shows the relationship between gender and party identification in a US state.

|        | Democrat | Independent | Republican | Total |
|--------|---------:|------------:|-----------:|------:|
| Male   |      279 |          73 |        225 |   577 |
| Female |      165 |          47 |        191 |   403 |
| Total  |      444 |         120 |        416 |   980 |

Test for association between gender and party identification at two appropriate significance levels (for example, 5% and 1%) and comment on your results.

Set out the null hypothesis that there is no association between gender and party identification against the alternative that there is. Be careful to get these the correct way round!

H0: There is no association. H1: There is an association.

Work out the expected values under the null hypothesis: expected count = (row total × column total) / grand total. For example, the expected number of male Democrats is (577 × 444) / 980 ≈ 261.4.
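The expected values and the test itself can be worked through in R. The following is a minimal sketch for the gender and party identification table; the object names observed, expected and result are chosen here for illustration and are not part of the original notes.

```r
# Observed counts from the gender by party identification table
observed <- matrix(
  c(279, 73, 225,
    165, 47, 191),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Male", "Female"),
                  Party  = c("Democrat", "Independent", "Republican"))
)

# Expected count for each cell: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 1)

# Pearson chi-squared test of association, df = (2 - 1) * (3 - 1) = 2
result <- chisq.test(observed)
result
```

Comparing result$p.value with 0.05 and with 0.01 gives the conclusion at the two significance levels. For these counts the p-value should fall between the two, which would mean the association is significant at the 5% level but not at the 1% level.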


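chisq.test() can also be applied to a single vector of counts, in which case it performs a goodness-of-fit test: against equal proportions by default, or against the proportions supplied in p.

```r
# Goodness of fit against equal proportions
chisq.test(c(59,20,11,10))

# Goodness of fit against a 9:3:3:1 ratio
chisq.test(c(59,20,11,10), p=c(9/16,3/16,3/16,1/16))
```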
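To see what the second call computes, the statistic can be reproduced by hand. This is a sketch only; the names observed, ratio, expected, statistic and fit are illustrative and do not come from the notes.

```r
observed <- c(59, 20, 11, 10)
ratio    <- c(9, 3, 3, 1) / 16       # hypothesised proportions (sum to 1)

expected  <- sum(observed) * ratio   # expected counts under the null
statistic <- sum((observed - expected)^2 / expected)
df        <- length(observed) - 1
p_value   <- pchisq(statistic, df = df, lower.tail = FALSE)

c(statistic = statistic, df = df, p.value = p_value)

# The built-in test should agree with the manual calculation
fit <- chisq.test(observed, p = ratio)
all.equal(unname(fit$statistic), statistic)
```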


```r
library(MASS)     # load the MASS package

tbl = table(survey$Smoke, survey$Exer)
```

Section 3. Chi-squared Test of Independence

Two random variables x and y are called independent if the probability distribution of one variable is not affected by the value of the other.

Assume $f_{ij}$ is the observed frequency count of events belonging to both the i-th category of x and the j-th category of y.
Also assume $e_{ij}$ to be the corresponding expected count if x and y are independent.
The null hypothesis of independence is rejected if the p-value of the following chi-squared test statistic is less than a given significance level α.

$$\chi^2 = \sum_{i,j} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$

```
> chisq.test(ctbl)

    Pearson's Chi-squared test

data:  ctbl
X-squared = 3.2328, df = 3, p-value = 0.3571
```
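The object ctbl is not defined in these notes. The sketch below assumes it is the smoking by exercise table with the None and Some columns merged, which is the combination suggested later in the Smoking Example. It then computes the test statistic directly from the observed and expected counts. The column label NoneOrSome and the names f, e and chi_sq are illustrative.

```r
library(MASS)  # provides the survey data frame

tbl <- table(survey$Smoke, survey$Exer)

# Assumed construction of ctbl: Freq kept, None and Some merged
ctbl <- cbind(Freq       = tbl[, "Freq"],
              NoneOrSome = tbl[, "None"] + tbl[, "Some"])

f <- ctbl                                      # observed counts f_ij
e <- outer(rowSums(f), colSums(f)) / sum(f)    # expected counts e_ij
chi_sq  <- sum((f - e)^2 / e)                  # chi-squared test statistic
df      <- (nrow(f) - 1) * (ncol(f) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

c(statistic = chi_sq, df = df, p.value = p_value)
```

Under that assumption the manual calculation should agree with the chisq.test(ctbl) output shown above.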

### Smoking Example

Using the survey dataset built into the MASS package, we shall test the association between smoking and exercise.

Test the hypothesis that the students' smoking habit is independent of their exercise level at the 0.05 significance level.

```r
library(MASS)       # load the MASS package
tbl = table(survey$Smoke, survey$Exer)
tbl                 # the contingency table
```

```
##
##         Freq None Some
##   Heavy    7    1    3
##   Never   87   18   84
##   Occas   12    3    4
##   Regul    9    1    7
```

```r
# Notice the small cell sizes
```

```r
class(survey$Smoke)
```

```
## [1] "factor"
```
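Because Smoke is stored as a factor, any change to the order of its levels should leave each observation's value intact. The following sketch uses a small made-up factor (hypothetical data, not taken from survey) to contrast reordering levels with factor() against relabelling them by assigning to levels().

```r
# Toy factor (hypothetical data, not from the survey)
x <- factor(c("Heavy", "Never", "Never", "Occas"))
levels(x)                      # alphabetical by default

# Reordering: factor() with explicit levels keeps each observation's value
reordered <- factor(x, levels = c("Never", "Occas", "Heavy"))
as.character(reordered)

# Relabelling: assigning to levels() renames by position, changing the data
relabelled <- x
levels(relabelled) <- c("Never", "Occas", "Heavy")
as.character(relabelled)
```

In the relabelled copy the observation that was "Heavy" is now recorded as "Never", so the first form is the one to use when only the display order should change.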

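```r
# Sort the factor levels.
# factor(..., levels = ) reorders the levels without changing the data;
# assigning to levels() directly would relabel the observations instead.
survey$Smoke <- factor(survey$Smoke, levels = c("Never", "Occas", "Regul", "Heavy"))
survey$Exer  <- factor(survey$Exer,  levels = c("None", "Some", "Freq"))
```

### Chi Square Test for Independence

```r
chisq.test(tbl)
```

```
## Warning in chisq.test(tbl): Chi-squared approximation may be incorrect
```

```
##
##  Pearson's Chi-squared test
##
## data:  tbl
## X-squared = 5, df = 6, p-value = 0.5
```

* We have applied the chisq.test() function to the contingency table tbl, and found the p-value to be 0.4828 (displayed rounded to 0.5 in the output).
* As the p-value is greater than the 0.05 significance level, we fail to reject the null hypothesis that smoking habit is independent of exercise level.

* The warning message found in the solution above is due to the small cell values in the contingency table.
* To avoid such a warning, we could combine the second and third columns of tbl (None and Some).

```r
# Don't throw out the raw data; make a 'derived variable' instead.
survey$Exer2 <- survey$Exer

levels(survey$Exer2) <- list(Rare = c("None", "Some"), Freq = "Freq")
```

```r
chisq.test(survey$Smoke, survey$Exer2)
```

```
##
##  Pearson's Chi-squared test
##
## data:  survey$Smoke and survey$Exer2
## X-squared = 3.2328, df = 3, p-value = 0.3571
```

* With the two columns combined, every expected count is at least 5 and the warning no longer appears.
* The result matches the one reported earlier for ctbl. The p-value of 0.3571 is still greater than 0.05, so we again fail to reject the null hypothesis.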
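The warning reported for chisq.test(tbl) comes from the expected counts, not the observed ones: R issues it when any expected count is below 5. The sketch below inspects those counts. It also shows a Monte Carlo p-value as one possible alternative to merging columns; that alternative is an addition here and is not used in the notes.

```r
library(MASS)

tbl <- table(survey$Smoke, survey$Exer)

# Expected counts under independence; suppressWarnings() only hides the message
expected <- suppressWarnings(chisq.test(tbl))$expected
round(expected, 2)

# Number of cells with an expected count below 5
sum(expected < 5)

# Alternative (an assumption, not from the notes): simulated p-value
set.seed(1)  # arbitrary seed so the simulation is repeatable
sim <- chisq.test(tbl, simulate.p.value = TRUE, B = 2000)
sim$p.value
```

The simulated p-value does not rely on the chi-squared approximation, so it can be reported alongside the result from the merged table as a check.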

### prop.table()

A useful command associated with the Chi-Square Test is prop.table(), which converts count data to proportions.

```r
### Overall Proportions
prop.table(tbl)
```

```
##
##            Freq    None    Some
##   Heavy 0.02966 0.00424 0.01271
##   Never 0.36864 0.07627 0.35593
##   Occas 0.05085 0.01271 0.01695
##   Regul 0.03814 0.00424 0.02966
```

```r
### Proportion of Row Variable
prop.table(tbl,1)
```

```
##
##           Freq   None   Some
##   Heavy 0.6364 0.0909 0.2727
##   Never 0.4603 0.0952 0.4444
##   Occas 0.6316 0.1579 0.2105
##   Regul 0.5294 0.0588 0.4118
```

```r
### Proportion of Column Variable
prop.table(tbl,2)
```

```
##
##           Freq   None   Some
##   Heavy 0.0609 0.0435 0.0306
##   Never 0.7565 0.7826 0.8571
##   Occas 0.1043 0.1304 0.0408
##   Regul 0.0783 0.0435 0.0714
```
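The three calls above differ only in the denominator used for each cell. A short sketch, with illustrative object names overall, by_row and by_col, makes this explicit and checks that the proportions sum as expected.

```r
library(MASS)

tbl <- table(survey$Smoke, survey$Exer)

overall <- prop.table(tbl)              # each cell / grand total
by_row  <- prop.table(tbl, margin = 1)  # each cell / its row total
by_col  <- prop.table(tbl, margin = 2)  # each cell / its column total

sum(overall)      # 1
rowSums(by_row)   # every row sums to 1
colSums(by_col)   # every column sums to 1

# Row percentages, rounded for reporting
round(100 * by_row, 1)
```

Row proportions are usually the most useful here, since they compare the exercise profile of each smoking group directly.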

Statistics with R

DragonflyStats.github.io
