“The purpose of this analysis was to examine whether students’ favorite drink preferences were evenly distributed across four categories: Coffee, Soda, Tea, and Water. Because the variable of interest is categorical, a chi-square goodness-of-fit test was the appropriate statistical method.
The chi-square goodness-of-fit test evaluates whether the observed frequencies in each category differ significantly from the expected frequencies. This test does not require assumptions of normality, as categorical data cannot be represented on a histogram. ”
## Open the packages
library(readxl)
library(ggplot2)
library(rcompanion)
Data Preparation
The dataset (DatasetA2) was imported from an Excel file using the readxl package. The dataset contained responses from 100 students, each reporting their favorite drink. The variable FavoriteDrink included four categories: Coffee, Soda, Tea, and Water.
DatasetA2 <- read_excel("C:/Users/susmi/Downloads/DatasetA2.xlsx")
table(DatasetA2$StudentID, DatasetA2$FavoriteDrink)
##
## Coffee Soda Tea Water
## 1 0 1 0 0
## 2 0 1 0 0
## 3 0 1 0 0
## 4 1 0 0 0
## 5 0 1 0 0
## 6 1 0 0 0
## 7 1 0 0 0
## 8 1 0 0 0
## 9 0 1 0 0
## 10 0 0 1 0
## 11 0 0 0 1
## 12 1 0 0 0
## 13 1 0 0 0
## 14 0 0 1 0
## 15 1 0 0 0
## 16 0 1 0 0
## 17 0 0 0 1
## 18 0 0 1 0
## 19 0 1 0 0
## 20 0 1 0 0
## 21 0 0 1 0
## 22 0 0 0 1
## 23 0 0 1 0
## 24 0 0 1 0
## 25 0 0 1 0
## 26 0 1 0 0
## 27 0 0 0 1
## 28 1 0 0 0
## 29 0 1 0 0
## 30 1 0 0 0
## 31 0 0 1 0
## 32 1 0 0 0
## 33 0 1 0 0
## 34 0 0 0 1
## 35 1 0 0 0
## 36 0 0 1 0
## 37 0 1 0 0
## 38 0 1 0 0
## 39 0 0 1 0
## 40 0 0 0 1
## 41 0 1 0 0
## 42 0 0 0 1
## 43 1 0 0 0
## 44 0 0 1 0
## 45 0 1 0 0
## 46 0 0 1 0
## 47 0 0 1 0
## 48 1 0 0 0
## 49 0 1 0 0
## 50 0 1 0 0
## 51 0 0 0 1
## 52 0 0 1 0
## 53 0 1 0 0
## 54 0 0 1 0
## 55 0 1 0 0
## 56 0 0 0 1
## 57 1 0 0 0
## 58 0 0 1 0
## 59 1 0 0 0
## 60 0 0 1 0
## 61 0 0 1 0
## 62 0 0 0 1
## 63 0 0 0 1
## 64 0 1 0 0
## 65 0 0 1 0
## 66 1 0 0 0
## 67 0 0 1 0
## 68 0 0 1 0
## 69 0 1 0 0
## 70 0 0 1 0
## 71 1 0 0 0
## 72 0 0 1 0
## 73 0 1 0 0
## 74 0 0 1 0
## 75 0 1 0 0
## 76 1 0 0 0
## 77 0 0 0 1
## 78 0 1 0 0
## 79 0 0 0 1
## 80 0 0 0 1
## 81 1 0 0 0
## 82 1 0 0 0
## 83 0 1 0 0
## 84 0 0 0 1
## 85 1 0 0 0
## 86 1 0 0 0
## 87 0 1 0 0
## 88 0 1 0 0
## 89 0 0 0 1
## 90 0 0 1 0
## 91 1 0 0 0
## 92 1 0 0 0
## 93 0 0 1 0
## 94 1 0 0 0
## 95 0 0 0 1
## 96 0 0 1 0
## 97 0 0 1 0
## 98 1 0 0 0
## 99 0 1 0 0
## 100 0 1 0 0
A contingency table is used to show the frequency distribution between two categorical variables and how their categories interact. It helps determine whether there is an association between the variables, especially for a Chi-Square Test of Independence.
table(DatasetA2$FavoriteDrink)
##
## Coffee Soda Tea Water
## 26 29 28 17
A frequency table was created to summarize the number of students who preferred each drink. The observed frequencies were:
Coffee: 26
Soda: 29
Tea: 28
Water: 17
This table provided a clear overview of how students were distributed across the four drink categories.
ggplot(DatasetA2, aes(x = FavoriteDrink, fill = FavoriteDrink)) +
geom_bar() +
labs(
x = "FavoriteDrink",
y = "Frequency",
title = "Distribution of favourite Drinks"
) +
theme(
text = element_text(size = 14),
axis.title = element_text(size = 14),
axis.text = element_text(size = 14),
plot.title = element_text(size = 14),
legend.position = "none"
)
A bar chart was created using ggplot2 to visually display the frequency of each favorite drink category. The bar chart showed that Coffee, Soda, and Tea had similar frequencies, while Water had a noticeably lower frequency. This visual inspection suggested that the distribution might not be perfectly equal; however, a statistical test was required to determine whether these differences were statistically significant.
observed <- c(26, 29, 28,17)
expected <- c(.25, .25, .25, .25)
chisq.test(x = observed, p = expected)
##
## Chi-squared test for given probabilities
##
## data: observed
## X-squared = 3.6, df = 3, p-value = 0.308
table2 <- table(DatasetA2$FavoriteDrink, DatasetA2$StudentID)
chi_result <- chisq.test(table2)
## Warning in chisq.test(table2): Chi-squared approximation may be incorrect
w <- sqrt(chi_result$statistic / sum(table2))
w
## X-squared
## 1.732051
“A chi-square goodness-of-fit test indicated that the observed frequencies were not different from the expected frequencies, χ²(2) = 2.35, p = .308.”
The chi-square goodness-of-fit test indicated that the observed frequencies were not significantly different from the expected frequencies, χ²(3) = 3.60, p = .308.
Because the p-value was greater than the standard alpha level of .05, the null hypothesis was not rejected. This indicates that any differences observed among the drink categories are likely due to random variation rather than a meaningful preference difference.
There was no statistically significant difference in students’ favorite drink preferences. The results suggest that students appear to prefer coffee, soda, tea, and water equally. Although small differences were observed in the frequency counts, these differences were not large enough to be considered statistically significant.