Topic 10: Chi-squared Tests for Categorical Data


These are the solutions for Computer Lab 11, and use data sourced from Czarniecka-Skubina et al. (2021).


1 Chi-squared Goodness of Fit Test

1.1

No solutions required.

1.2

We can test the following hypotheses:

\(H_0\): The distribution of coffee consumption frequency of Poles is the same as the expected distribution.

\(H_1\): The distribution of coffee consumption frequency of Poles differs from the expected distribution.

1.3

We have 7 categories of coffee consumption frequency, so the test has \(7-1 = 6\) degrees of freedom.

1.4

No solutions required.

1.5

Compare your results with the R output below:

## 
##  Chi-squared test for given probabilities
## 
## data:  obs.freq
## X-squared = 53.26, df = 6, p-value = 1.04e-09

1.6

The test statistic is \(53.26\) and the \(p\)-value is \(1.04 \times 10^{-9}\), which is effectively \(0\). Since the \(p\)-value is less than \(0.05\), we reject \(H_0\) at the 5% level of significance and conclude that the observed distribution of coffee consumption frequency of Poles differs significantly from the expected distribution.
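As a quick cross-check of the decision above, the \(p\)-value can be recomputed from the reported test statistic and degrees of freedom. This is only a sketch: the variable names are illustrative, and because it starts from the rounded statistic \(53.26\), the result should be close to, but need not exactly equal, the value R printed.

```r
# Values taken from the R output above
x_sq <- 53.26        # reported chi-squared test statistic (rounded)
df   <- 7 - 1        # 7 categories, so 6 degrees of freedom

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(x_sq, df = df, lower.tail = FALSE)

# Critical value for a test at the 5% level of significance
crit <- qchisq(0.95, df = df)

print(p_value)
print(crit)

# Both decision rules should agree: reject H0 if either is TRUE
print(p_value < 0.05)
print(x_sq > crit)
```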

1.7

The R output below shows that the expected counts are 150, 75, 75, 150, 300, 450 and 300. Since every expected count is greater than 5:

  • No more than 20% of the categories have an expected count of less than 5.
  • There are no expected counts of 0.

Hence our assumptions are satisfied.

## [1] 150  75  75 150 300 450 300
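The two assumption checks can also be done programmatically rather than by eye. The sketch below starts from the expected counts printed above; the names `expected`, `p0`, `prop_small` and `any_zero` are illustrative, and the hypothesised proportions are inferred here by dividing each expected count by the total, which is an assumption about how the lab specified them.

```r
# Expected counts as printed in the R output above
expected <- c(150, 75, 75, 150, 300, 450, 300)

# Expected counts always sum to the sample size in a goodness of fit test
n <- sum(expected)

# Hypothesised proportions implied by these counts (inferred, see lead-in)
p0 <- expected / n

# Assumption 1: no more than 20% of categories have an expected count below 5
prop_small <- mean(expected < 5)
print(prop_small <= 0.20)

# Assumption 2: no expected count is 0
any_zero <- any(expected == 0)
print(!any_zero)

# With the observed counts stored in obs.freq (not reproduced here),
# chisq.test(obs.freq, p = p0)$expected would return the expected counts.
```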

2 Chi-squared Goodness of Fit Test using a summary data set

How did you go? Check with your computer lab demonstrator if you are unsure.

3 Chi-squared Test of Independence

3.1

Here, our null and alternative hypotheses are:

\(H_0\): There is no association between coffee consumption frequency and age of Poles.

\(H_1\): There is an association between coffee consumption frequency and age of Poles.

3.2

The degrees of freedom for our chi-squared test of independence will be \((5-1) \times (2-1) = 4\), since we have \(5\) rows and \(2\) columns.

3.3

No solutions required.

3.4

Compare your results with the R output provided below:

## 
##  Pearson's Chi-squared test
## 
## data:  table
## X-squared = 130.59, df = 4, p-value < 2.2e-16

3.5

The test statistic is \(130.59\) and the \(p\)-value is less than \(2.2 \times 10^{-16}\), which is effectively \(0\). Since the \(p\)-value is less than \(0.05\), we reject \(H_0\) at the 5% level of significance and conclude that there is an association between coffee consumption frequency and age of Poles.
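R prints `p-value < 2.2e-16` because, by default, it does not display \(p\)-values smaller than machine precision. The sketch below, with illustrative variable names, recomputes the degrees of freedom from the table dimensions and the upper-tail probability from the rounded statistic to show just how small the \(p\)-value is.

```r
# Table dimensions and test statistic taken from the solutions above
n_rows <- 5
n_cols <- 2
x_sq   <- 130.59     # reported chi-squared test statistic (rounded)

# Degrees of freedom for a test of independence
df <- (n_rows - 1) * (n_cols - 1)

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(x_sq, df = df, lower.tail = FALSE)
print(p_value)

# The threshold below which R reports the p-value as "< 2.2e-16"
print(.Machine$double.eps)
print(p_value < .Machine$double.eps)
```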

It would be interesting to conduct this test for segments of the population, to see if this association holds when considering only Poles within a certain age group (e.g. 18-30).

3.6

The R output below shows that the expected counts are 334.08, 183.552, 198.144, 239.616, 196.608, 100.92, 55.448, 59.856, 72.384 and 59.392. Since every expected count is greater than 5:

  • No more than 20% of the cells have an expected count of less than 5.
  • There are no expected counts of 0.

Hence our assumptions are satisfied.

##           [,1]    [,2]
## group1 334.080 100.920
## group2 183.552  55.448
## group3 198.144  59.856
## group4 239.616  72.384
## group5 196.608  59.392
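Each expected count in a test of independence equals (row total \(\times\) column total) / grand total. The sketch below checks this against the table printed above and repeats the two assumption checks. It works only from the printed expected counts, whose row and column sums equal those of the observed table; the variable names are illustrative.

```r
# Expected counts as printed above (matrix() fills column by column)
expected <- matrix(
  c(334.080, 183.552, 198.144, 239.616, 196.608,
    100.920,  55.448,  59.856,  72.384,  59.392),
  nrow = 5, ncol = 2,
  dimnames = list(paste0("group", 1:5), NULL)
)

# Marginal totals and grand total
row_totals <- rowSums(expected)
col_totals <- colSums(expected)
n <- sum(expected)

# Rebuild the expected counts from the margins alone
rebuilt <- outer(row_totals, col_totals) / n
print(all.equal(rebuilt, expected, check.attributes = FALSE))

# Assumption 1: no more than 20% of cells have an expected count below 5
prop_small <- mean(expected < 5)
print(prop_small <= 0.20)

# Assumption 2: no expected count is 0
all_positive <- all(expected > 0)
print(all_positive)
```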

4 Chi-squared Test of Independence using a summary data set

How did you go? Check with your computer lab demonstrator if you are unsure.


That’s everything! If there were any parts you were unsure about, take a look back over the relevant sections of the Topic 10 material.


References

Czarniecka-Skubina, E., M. Pielak, P. Sałek, R. Korzeniowska-Ginter, and T. Owczarek. 2021. “Consumer Choices and Habits Related to Coffee Consumption by Poles.” International Journal of Environmental Research and Public Health 18 (8). https://doi.org/10.3390/ijerph18083948.


These notes have been prepared by Amanda Shaker and Rupert Kuveke. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematical and Physical Sciences and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.