---
title: "Chi-Square Goodness-of-Fit: Dessert Preferences (Scenario 1)"
author: "Geetha Shivani"
date: "November 12, 2025"
output:
  html_document:
    toc: true
    toc_depth: 2
    number_sections: yes
---

Purpose

Test whether the observed dessert preferences match an expected equal distribution across three desserts (Chocolate Cake, Vanilla Cheesecake, Tiramisu).

Hypotheses
- H0: Preferences are equal across desserts (p1 = p2 = p3 = 1/3).
- H1: At least one dessert's preference proportion differs from 1/3.

Data Entry

🔁 Replace the numbers below with your observed counts from the restaurant.

# Observed counts (EDIT THESE if you have your own data)
desserts <- c(50, 45, 30)
names(desserts) <- c("Chocolate_Cake", "Vanilla_Cheesecake", "Tiramisu")

# Expected probabilities for equal preference
expected <- rep(1/3, 3)

desserts

##     Chocolate_Cake Vanilla_Cheesecake           Tiramisu 
##                 50                 45                 30

expected

## [1] 0.3333333 0.3333333 0.3333333

sum_obs <- sum(desserts)
sum_obs

## [1] 125

Quick Visualization

op <- par(mfrow=c(1,2), las=2)
barplot(desserts, main="Observed Counts", ylab="Count")
barplot(expected * sum(desserts), main="Expected Counts (Equal)", ylab="Count")

par(op)

Assumption Check

- Independence of observations: each order is counted once (ensured by the study design).
- Expected counts: all expected counts should be ≥ 5.

exp_counts <- expected * sum(desserts)
exp_counts

## [1] 41.66667 41.66667 41.66667

any_lt5 <- any(exp_counts < 5)
any_lt5

## [1] FALSE
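If the check above had returned TRUE, one fallback available in base R is a Monte Carlo p-value, requested through the simulate.p.value argument of chisq.test(). The sketch below is only an illustration: small_counts is a hypothetical sample (not the restaurant data), and the seed and the number of replicates B are arbitrary choices.

```r
# Hypothetical small sample (NOT the restaurant data); expected counts are 4 each
small_counts <- c(Chocolate_Cake = 7, Vanilla_Cheesecake = 3, Tiramisu = 2)
equal_p <- rep(1/3, 3)

# Confirm that the "expected counts >= 5" rule of thumb is violated here
small_expected <- equal_p * sum(small_counts)
any(small_expected < 5)

# Monte Carlo p-value instead of the chi-square approximation
# (seed and B are arbitrary, chosen only for reproducibility)
set.seed(2025)
gof_sim <- chisq.test(small_counts, p = equal_p,
                      simulate.p.value = TRUE, B = 10000)
gof_sim$p.value
```

The simulated p-value varies slightly with the seed and with B, so it is worth reporting both alongside the result.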

Test: Chi-Square Goodness-of-Fit

gof <- chisq.test(desserts, p = expected, rescale.p = TRUE)
gof

## 
##  Chi-squared test for given probabilities
## 
## data:  desserts
## X-squared = 5.2, df = 2, p-value = 0.07427
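To see where these numbers come from, the statistic and p-value can be reproduced by hand from the observed and expected counts. This is a minimal sketch that repeats the data so it runs on its own; the names x2, p_value, and crit are introduced here for illustration.

```r
# Same counts as in the Data Entry section, repeated so this block is self-contained
desserts <- c(Chocolate_Cake = 50, Vanilla_Cheesecake = 45, Tiramisu = 30)
expected <- rep(1/3, 3)

# Expected counts under H0 and the chi-square statistic
exp_counts <- expected * sum(desserts)
x2 <- sum((desserts - exp_counts)^2 / exp_counts)

# Degrees of freedom = number of categories - 1
df <- length(desserts) - 1

# Upper-tail p-value and the 5% critical value
p_value <- pchisq(x2, df = df, lower.tail = FALSE)
crit <- qchisq(0.95, df = df)

c(X2 = x2, df = df, p = p_value, critical_0.05 = crit)
```

The statistic (5.2) falls below the 5% critical value for 2 degrees of freedom (about 5.99), which is the same decision the p-value of 0.074 gives: H0 is not rejected at the .05 level.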

Effect Size (Cohen’s w)

# Cohen's w for GOF: w = sqrt(X^2 / N), where X^2 = sum((obs - exp)^2 / exp)
# Equivalently, in proportions: w = sqrt( sum( (pi_obs - pi_exp)^2 / pi_exp ) )
pi_obs <- desserts / sum(desserts)
pi_exp <- expected
w <- sqrt( sum((pi_obs - pi_exp)^2 / pi_exp) )
w

## [1] 0.2039608

# Benchmarks: small = .10, medium = .30, large = .50
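Cohen's w can also be obtained from the test statistic as sqrt(X-squared / N), which is a convenient cross-check on the proportion-based calculation. The sketch below repeats the data so it runs on its own; w_from_chisq and w_from_props are illustrative names.

```r
# Same counts as in the Data Entry section, repeated so this block is self-contained
desserts <- c(Chocolate_Cake = 50, Vanilla_Cheesecake = 45, Tiramisu = 30)
expected <- rep(1/3, 3)

gof <- chisq.test(desserts, p = expected)

# Route 1: from the chi-square statistic and total sample size
w_from_chisq <- sqrt(unname(gof$statistic) / sum(desserts))

# Route 2: from observed and expected proportions
w_from_props <- sqrt(sum((desserts / sum(desserts) - expected)^2 / expected))

# Both routes should agree up to floating-point tolerance
all.equal(w_from_chisq, w_from_props)
```

Here both routes give roughly 0.20, which sits between the small and medium benchmarks.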

Post Hoc: Contribution by Category

# Pearson residuals: (observed - expected) / sqrt(expected); their squares sum to X-squared
std_resid <- (desserts - exp_counts) / sqrt(exp_counts)
contrib <- std_resid^2
cbind(Observed=desserts, Expected=round(exp_counts,2), StdResid=round(std_resid,2), Contribution=round(contrib,2))

##                    Observed Expected StdResid Contribution
## Chocolate_Cake           50    41.67     1.29         1.67
## Vanilla_Cheesecake       45    41.67     0.52         0.27
## Tiramisu                 30    41.67    -1.81         3.27
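The StdResid column above uses the formula R labels as Pearson residuals. The object returned by chisq.test() stores these in its residuals component and also provides stdres, which divides by the residual standard deviation sqrt(E * (1 - p)) and is the version usually compared against ±1.96. A short sketch for viewing both side by side, with the data repeated so it runs on its own:

```r
# Same counts as in the Data Entry section, repeated so this block is self-contained
desserts <- c(Chocolate_Cake = 50, Vanilla_Cheesecake = 45, Tiramisu = 30)
gof <- chisq.test(desserts, p = rep(1/3, 3))

# Pearson residuals:      (O - E) / sqrt(E)
# Standardized residuals: (O - E) / sqrt(E * (1 - p))
resid_table <- cbind(Pearson = gof$residuals, Standardized = gof$stdres)
round(resid_table, 2)
```

With equal expected proportions of 1/3, each standardized residual is the Pearson residual divided by sqrt(2/3), so it is somewhat larger in magnitude. Because the overall test is not significant here, category-level residuals are best read as descriptive.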

APA-Style Reporting

cat(sprintf("A chi-square goodness-of-fit test indicated that dessert preferences %s an equal distribution across categories, ",
            ifelse(gof$p.value < 0.05, "differed significantly from", "did not differ significantly from")))

A chi-square goodness-of-fit test indicated that dessert preferences did not differ significantly from an equal distribution across categories,

cat(sprintf("χ²(%d) = %.2f, p = %.3f. ", gof$parameter, gof$statistic, gof$p.value))

χ²(2) = 5.20, p = 0.074.
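APA style usually drops the leading zero for values that cannot exceed 1, such as p. A small helper can handle this when building the sentence; format_p_apa is a hypothetical name introduced here for illustration, not part of any package.

```r
# Hypothetical helper: format a p-value in APA style (no leading zero, floor at .001)
format_p_apa <- function(p) {
  if (p < 0.001) {
    return("p < .001")
  }
  # sprintf() gives e.g. "0.074"; sub() removes the leading zero
  paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
}

format_p_apa(0.07427)
```

The result can be passed to sprintf() with a %s placeholder in place of the numeric p-value.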

cat(sprintf("Effect size was Cohen's w = %.2f. ", w))

Effect size was Cohen’s w = 0.20.

cat("Standardized residuals identify the categories with the largest deviations from expectation (see table above).")

Standardized residuals identify the categories with the largest deviations from expectation (see table above).

Notes for RPubs

Keep this as Scenario 1 in RPubs.
Replace author name and counts before knitting.
If any expected count is below 5, consider combining categories, using a simulated p-value (chisq.test(..., simulate.p.value = TRUE)), or using an exact multinomial test.