Binary Variable: ‘high_rating’
Goal: Model whether a coffee is highly rated (rating of 95 or above) based on price, roast level, and roaster country group
coffee_clean <- coffee_clean %>%
mutate(
high_rating = if_else(rating >= 95, 1, 0)
)
table(coffee_clean$high_rating)
##
## 0 1
## 1742 338
‘100g_USD’ - price per 100g of coffee in USD (continuous)
‘roast_num’ - ordered roast level (integer)
‘is_top_country’ - 1 if the coffee was roasted in a top-10 country, 0 otherwise (binary)
logit_mod <- glm(
high_rating ~ `100g_USD` + roast_num + is_top_country,
data = coffee_clean,
family = binomial
)
summary(logit_mod)
##
## Call:
## glm(formula = high_rating ~ `100g_USD` + roast_num + is_top_country,
## family = binomial, data = coffee_clean)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.359102 1.379100 -2.436 0.0149 *
## `100g_USD` 0.056863 0.006346 8.961 <2e-16 ***
## roast_num -0.279408 0.109508 -2.551 0.0107 *
## is_top_country 1.700220 1.348236 1.261 0.2073
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1846.2 on 2079 degrees of freedom
## Residual deviance: 1723.1 on 2076 degrees of freedom
## AIC: 1731.1
##
## Number of Fisher Scoring iterations: 5
Model Interpretation:
\[ \operatorname{logit}\bigl(\Pr(\text{high_rating} = 1)\bigr) = \beta_0 + \beta_1(\text{100g_USD}) + \beta_2(\text{roast_num}) + \beta_3(\text{is_top_country}) \]
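A minimal sketch of why exponentiating a coefficient gives an odds ratio, which the interpretations that follow rely on. The symbols \(\eta\) (the linear predictor) and \(x_1\) (price per 100g) are notation introduced here for illustration and do not appear in the model output.

\[ \text{odds} = \frac{\Pr(\text{high_rating} = 1)}{1 - \Pr(\text{high_rating} = 1)} = e^{\eta}, \qquad \eta = \beta_0 + \beta_1 x_1 + \beta_2(\text{roast_num}) + \beta_3(\text{is_top_country}) \]

\[ \frac{\text{odds}(x_1 + 1)}{\text{odds}(x_1)} = \frac{e^{\eta + \beta_1}}{e^{\eta}} = e^{\beta_1} \]

So a one-unit increase in a predictor multiplies the odds by \(e^{\beta}\), with the other predictors held fixed.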
exp(coef(logit_mod)["`100g_USD`"])
## `100g_USD`
## 1.058511
\(Estimate = 0.0569\)
\(p < 2 \times 10^{-16}\)
Interpretation: Holding roast level and country group constant, each additional $1 per 100g is associated with about 5.9% higher odds of being highly rated (odds ratio 1.0585). This effect is statistically strong and practically meaningful: higher-priced coffees are more likely to be rated 95 or above.
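Because odds ratios multiply, the per-dollar effect compounds over larger price gaps. As a worked illustration using the fitted coefficient above (the 10 USD gap is a hypothetical comparison, not a value taken from the data):

\[ (e^{0.056863} - 1) \times 100\% \approx 5.85\% \]

\[ e^{10 \times 0.056863} = e^{0.56863} \approx 1.77 \]

Under the model, a coffee priced 10 USD higher per 100g would have roughly 77% higher odds of being highly rated, holding roast level and country group constant.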
exp(coef(logit_mod)["roast_num"])
## roast_num
## 0.7562313
\(Estimate = -0.2794\)
\(p = 0.0107\)
Interpretation: For each one-level increase in roast darkness, the odds of being highly rated (defined here as a rating of 95/100 or above) decrease by approximately 24%, holding price and country group constant. I conclude from this that lighter roasts are more likely to be highly rated, although the evidence (p = 0.0107) is weaker than it is for price.
exp(coef(logit_mod)["is_top_country"])
## is_top_country
## 5.475151
\(Estimate = 1.7002\)
\(p = 0.2073\)
Interpretation: Coffees roasted in a top-10 country have an estimated 5.5 times the odds of being highly rated compared to others, controlling for price and roast. However, the p-value (0.2073) indicates this effect is not statistically significant in this model. Ultimately, the estimated direction is positive, but the uncertainty is large: the standard error (1.348) is nearly as large as the estimate itself.
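The size of that uncertainty can be made concrete with a Wald interval built from the estimate and standard error in the summary table. This is a sketch using the normal approximation with a critical value of 1.96; it is not part of the original output.

\[ 1.7002 \pm 1.96 \times 1.3482 = [-0.942,\ 4.343] \]

\[ [e^{-0.942},\ e^{4.343}] \approx [0.39,\ 76.9] \]

The odds-ratio interval spans 1, which matches the non-significant p-value, and it is wide enough to be consistent with anything from a moderate decrease to a very large increase in the odds.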
Calculate 95% Confidence Interval for ‘100g_USD’
coef_est <- coef(logit_mod)["`100g_USD`"]
se_est <- summary(logit_mod)$coefficients["`100g_USD`", "Std. Error"]
# 95% CI on log-odds scale
lower_log <- coef_est - 1.96 * se_est
upper_log <- coef_est + 1.96 * se_est
c(lower_log, upper_log)
## `100g_USD` `100g_USD`
## 0.04442523 0.06930109
# Convert to odds ratio CI
exp(c(lower_log, upper_log))
## `100g_USD` `100g_USD`
## 1.045427 1.071759
Odds Ratio Confidence Interval:
\[ [1.0454, 1.0718] \]
Holding roast level and country group constant, each additional $1 per 100g of coffee is associated with an increase in the odds of being highly rated by between 4.5% and 7.2%, with 95% confidence.
Because the entire confidence interval lies above 1, the effect of price is:
positive
statistically significant
consistent across plausible values of the coefficient
This means that even after accounting for roast level and top-country status, higher-priced coffees reliably have higher odds of being rated 95 or above. The effect of price is not only statistically significant but also precisely estimated: the interval is narrow, at roughly 4.5% to 7.2% higher odds per $1. From this, I can confidently conclude that the association between price and high ratings in this model is real rather than a product of sampling noise.
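The same Wald calculation can be repeated for every predictor at once. The sketch below re-enters the estimates and standard errors from the summary output by hand so that it runs on its own; the object names `est`, `se`, and `or_table` are illustrative and not part of the original analysis. With the fitted model in memory, `exp(confint.default(logit_mod))` should give the same Wald intervals (plus one for the intercept).

```r
# Estimates and standard errors copied from summary(logit_mod); intercept omitted
est <- c(`100g_USD` = 0.056863, roast_num = -0.279408, is_top_country = 1.700220)
se  <- c(`100g_USD` = 0.006346, roast_num = 0.109508, is_top_country = 1.348236)

# Critical value for a 95% Wald interval (about 1.96)
z <- qnorm(0.975)

# Odds ratios with interval bounds, exponentiated from the log-odds scale
or_table <- data.frame(
  odds_ratio = exp(est),
  lower      = exp(est - z * se),
  upper      = exp(est + z * se)
)

# Flag predictors whose interval excludes 1 (significant at the 5% level)
or_table$excludes_1 <- or_table$lower > 1 | or_table$upper < 1

round(or_table[, 1:3], 3)
or_table$excludes_1
```

The price and roast intervals exclude 1 while the `is_top_country` interval does not, in line with the p-values in the summary table.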