These are the solutions for Computer Lab 11. They use data sourced from Czarniecka-Skubina et al. (2021).
No solutions required.
We can test the following hypotheses:
\(H_0\): There is no significant difference between the observed and expected distribution of proportions of coffee consumption frequency of Poles.
\(H_1\): There is a significant difference between the observed and expected distribution of proportions of coffee consumption frequency of Poles.
We have 7 categories of coffee consumption frequency, so this test has \(7-1 = 6\) degrees of freedom.
No solutions required.
Compare your results with the R output below:
##
## Chi-squared test for given probabilities
##
## data: obs.freq
## X-squared = 53.26, df = 6, p-value = 1.04e-09
The test statistic is \(53.26\) and the \(p\)-value is \(1.04 \times 10^{-9}\), which is approximately \(0\). Since the \(p\)-value is less than \(0.05\), we reject \(H_0\) at the 5% level of significance, and conclude that there is a significant difference between the observed and expected distribution of proportions of coffee consumption frequency of Poles.
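As a quick check on the reported output, the \(p\)-value can be recomputed from the rounded test statistic and the degrees of freedom. The sketch below uses base R's `pchisq()` and `qchisq()`; the variable names `x2`, `df`, `p_value` and `crit` are illustrative and are not taken from the lab code.

```r
# Values copied from the R output above (the statistic is rounded to 2 d.p.)
x2 <- 53.26   # chi-squared test statistic
df <- 7 - 1   # number of categories minus 1

# The p-value is the upper-tail probability of the chi-squared distribution
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

# Critical value for a test at the 5% level of significance
crit <- qchisq(0.95, df = df)

p_value < 0.05   # TRUE means we reject H0 at the 5% level
x2 > crit        # the equivalent decision using the critical value
```

Both comparisons lead to the same decision, since the test statistic exceeds the critical value exactly when the \(p\)-value is below \(0.05\).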
We note from the R code output below that the expected counts are 150, 75, 75, 150, 300, 450, 300. Since all of the expected counts are greater than 5, the requirement that the expected counts be sufficiently large is met. Hence our assumptions are satisfied.
## [1] 150 75 75 150 300 450 300
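The expected-count check can also be done programmatically. The sketch below simply retypes the expected counts printed above; in the lab itself they would typically be taken from the `expected` component of the `chisq.test()` result. The name `exp.counts` is illustrative.

```r
# Expected counts copied from the output above
exp.counts <- c(150, 75, 75, 150, 300, 450, 300)

# The check passes if every expected count exceeds 5
all(exp.counts > 5)

# Dividing by the total recovers the hypothesised proportions
exp.counts / sum(exp.counts)
```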
How did you go? Check with your computer lab demonstrator if you are unsure.
Here, our null and alternative hypotheses are:
\(H_0\): There is no association between coffee consumption frequency and age of Poles.
\(H_1\): There is an association between coffee consumption frequency and age of Poles.
The degrees of freedom for our Chi-square test of independence will be \((5-1) \times (2-1) = 4\), since we have \(5\) rows and \(2\) columns.
No solutions required.
Compare your results with the R output provided below:
##
## Pearson's Chi-squared test
##
## data: table
## X-squared = 130.59, df = 4, p-value < 2.2e-16
The test statistic is \(130.59\) and the \(p\)-value is less than \(2.2 \times 10^{-16}\), which is approximately \(0\). Since the \(p\)-value is less than \(0.05\), we reject \(H_0\) at the 5% level of significance, and conclude that there is an association between coffee consumption frequency and age of Poles.
It would be interesting to conduct this test for segments of the population, to see if this association holds when considering only Poles within a certain age group (e.g. 18-30).
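The decision for the test of independence can be checked directly from the reported test statistic and the dimensions of the table. The sketch below is a minimal illustration using base R; the variable names are assumptions, not the lab's own. It also illustrates why the output reports `p-value < 2.2e-16` rather than an exact value: R's default formatting of \(p\)-values stops at machine precision (`.Machine$double.eps`).

```r
# Values copied from the R output above
x2 <- 130.59                 # chi-squared test statistic (rounded)
df <- (5 - 1) * (2 - 1)      # (rows - 1) x (columns - 1)

# The p-value is the upper-tail probability of the chi-squared distribution
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

# Critical value for a test at the 5% level of significance
crit <- qchisq(0.95, df = df)

p_value < .Machine$double.eps   # TRUE here, so R prints "< 2.2e-16"
x2 > crit                       # TRUE means we reject H0 at the 5% level
```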
We note from the R code output below that the expected counts are 334.08, 183.552, 198.144, 239.616, 196.608, 100.92, 55.448, 59.856, 72.384, 59.392. Since all of the expected counts are greater than 5, the requirement that the expected counts be sufficiently large is met. Hence our assumptions are satisfied.
## [,1] [,2]
## group1 334.080 100.920
## group2 183.552 55.448
## group3 198.144 59.856
## group4 239.616 72.384
## group5 196.608 59.392
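Each expected count in a test of independence is (row total × column total) / grand total. The sketch below reproduces the table above from its margins. Note that the row and column totals are not given in the lab output: they are obtained here by summing the printed expected counts, which works because the expected counts have the same margins as the observed table. The names `row.totals`, `col.totals` and `expected` are illustrative.

```r
# Margins recovered by summing the expected counts printed above
# (an assumption of this sketch; the lab would use the observed table)
row.totals <- c(group1 = 435, group2 = 239, group3 = 258,
                group4 = 312, group5 = 256)
col.totals <- c(1152, 348)
n <- sum(row.totals)   # grand total

# Expected count = (row total x column total) / grand total
expected <- outer(row.totals, col.totals) / n
expected

# The check passes if every expected count exceeds 5
all(expected > 5)
```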
How did you go? Check with your computer lab demonstrator if you are unsure.
These notes have been prepared by Amanda Shaker and Rupert Kuveke. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematical and Physical Sciences and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.