Hypothesis 1: Neyman-Pearson Framework

Main variable: ‘rating’

Group A: Taiwan

Group B: Non-Taiwan

1. Null/Alternative Hypothesis

Null Hypothesis (H₀): The mean rating of Taiwan roasted coffees equals the mean rating of non-Taiwan roasted coffees.

\[ \mu_{\text{Taiwan}} = \mu_{\text{Non-Taiwan}} \]

Alternative Hypothesis (H₁): The mean rating of Taiwan roasted coffees differs from that of non-Taiwan roasted coffees.

\[ \mu_{\text{Taiwan}} \neq \mu_{\text{Non-Taiwan}} \]

Interpretation: Do coffees roasted in Taiwan differ in mean rating from non-Taiwan roasted coffees?

2. Intentional Design

\(\alpha\) = 0.05 (5% false positive risk)
Power = 0.80 (20% false negative risk)
Minimum practically meaningful difference = 1.00 rating point
- A 1.00-point difference is small enough to be realistic but not so small that it is trivial. I did not want to go higher than a full rating point because all reviews fall between 84 and 98, even though the rating scale technically runs from 0 to 100.

3. Sample Size Calculation

power_calc <- power.t.test(
  delta = 1,        # minimum difference in means we care about
  sd    = 2,        # assumed SD of rating
  sig.level = 0.05, # alpha
  power = 0.80,
  type = "two.sample",
  alternative = "two.sided"
)
power_calc

## 
##      Two-sample t test power calculation 
## 
##               n = 63.76576
##           delta = 1
##              sd = 2
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

coffee_clean %>% count(country_group)

## # A tibble: 2 × 2
##   country_group     n
##   <chr>         <int>
## 1 Non-Taiwan     1531
## 2 Taiwan          549

Sample Size Conclusion: The calculation gives n = 63.77 per group, so I need about 64 Taiwan and 64 non-Taiwan roasted coffee observations to reliably detect a 1 rating point difference with 80% power. My data contain 549 Taiwan and 1,531 non-Taiwan observations, far more than the required minimum, so I have enough data to test based on my design and effect size.
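As a cross-check on the power calculation, the required sample size can be approximated by hand with the usual normal-approximation formula, n ≈ 2((z₁₋α/₂ + z₁₋β)·σ/δ)². The sketch below uses only the design values stated above; the variable names are illustrative, and using the smaller group's size (549) for both groups in the second step is a conservative assumption of mine, not part of the original analysis.

```r
# Design values from the "Intentional Design" step
alpha        <- 0.05
target_power <- 0.80
delta        <- 1    # minimum meaningful difference in mean rating
sd_assumed   <- 2    # assumed SD of rating (same assumption as power.t.test above)

# Normal-approximation sample size per group for a two-sided, two-sample test
z_alpha  <- qnorm(1 - alpha / 2)
z_beta   <- qnorm(target_power)
n_normal <- 2 * ((z_alpha + z_beta) * sd_assumed / delta)^2
n_normal

# Smallest difference detectable with 80% power if BOTH groups had only
# 549 observations (conservative: the Non-Taiwan group is actually larger).
# Leaving delta unspecified makes power.t.test() solve for it.
mde <- power.t.test(
  n = 549,
  sd = sd_assumed,
  sig.level = alpha,
  power = target_power,
  type = "two.sample",
  alternative = "two.sided"
)$delta
mde
```

The hand calculation gives about 62.8 per group, slightly below the 63.77 reported by power.t.test() because the normal approximation ignores the extra uncertainty of the t distribution.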

4. Perform Test

t_test_res <- t.test(
  rating ~ country_group,
  data = coffee_clean,
  var.equal = FALSE  # Welch t-test
)
t_test_res

## 
##  Welch Two Sample t-test
## 
## data:  rating by country_group
## t = -6.7582, df = 1183.2, p-value = 2.189e-11
## alternative hypothesis: true difference in means between group Non-Taiwan and group Taiwan is not equal to 0
## 95 percent confidence interval:
##  -0.6099728 -0.3354930
## sample estimates:
## mean in group Non-Taiwan     mean in group Taiwan 
##                 92.98628                 93.45902

Test Summary Insights: On average, coffees roasted in Taiwan are rated 0.47 points higher than non-Taiwan roasted coffees (93.46 vs. 92.99). R reports the difference as Non-Taiwan minus Taiwan, so I flipped the signs to read it as Taiwan minus Non-Taiwan: the 95% confidence interval is [0.335, 0.610]. The interval is quite narrow, meaning the difference is estimated with high precision.
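The sign flip on the confidence interval can be sketched as follows. The numbers are copied by hand from the Welch t-test output above, and the variable names are illustrative; with the real object the same values would come from t_test_res$conf.int and t_test_res$estimate.

```r
# Values copied from the Welch t-test output (Non-Taiwan minus Taiwan)
ci_non_minus_tw <- c(-0.6099728, -0.3354930)
mean_non_tw     <- 92.98628
mean_tw         <- 93.45902

# Negate and reverse the bounds so the interval reads Taiwan minus Non-Taiwan
ci_tw_minus_non   <- rev(-ci_non_minus_tw)
diff_tw_minus_non <- mean_tw - mean_non_tw

# Compare the upper bound with the 1.00-point minimum meaningful difference
min_meaningful  <- 1.00
below_threshold <- ci_tw_minus_non[2] < min_meaningful

round(ci_tw_minus_non, 3)
round(diff_tw_minus_non, 2)
below_threshold
```

With these numbers the whole interval sits below the 1.00-point threshold chosen in the design step, which is worth keeping in mind when judging the practical size of the effect.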

5. Visualization

ggplot(coffee_clean, aes(x = country_group, y = rating, fill = country_group)) +
  geom_boxplot(alpha = 0.7) +
  geom_jitter(width = 0.15, alpha = 0.4) +
  labs(
    title = "Coffee Ratings: Taiwan vs Non-Taiwan",
    x = "Country group",
    y = "Rating"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

6. Conclusion

Reject H₀: There is extremely strong evidence (p = 2.189e-11) that Taiwan roasted coffees have a higher mean rating than non-Taiwan roasted coffees. The effect is small, at 0.47 rating points, and falls below the 1.00-point minimum meaningful difference set in the design, but it is estimated precisely and is statistically convincing because of the large sample size. Overall, the effect is small but real and detectable thanks to the large dataset. The visualization supports this: ratings for coffees roasted in Taiwan sit visibly higher, with very few coffees receiving below a 90/100 rating, whereas non-Taiwan roasted coffees show a much wider spread, with many sub 90/100 ratings.

Hypothesis 2: Fisher’s Significance Testing

Main Variable: ‘high_rating’ (binary: 1 if rating \(\geq\) 93; 0 if rating < 93)

Group A/B: ‘high_price’ & ‘low_price’ (“High price” vs “Low price”)

High price = above median price
Low price = below median price
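A minimal sketch of how the two binary variables could be built from a median split is shown below. The toy data frame and its price column are hypothetical stand-ins, since the grouping code itself is not shown in this section, and sending prices exactly equal to the median to the low price group is an assumption.

```r
# Hypothetical toy data: six coffees with a rating and a price
toy <- data.frame(
  rating = c(90, 93, 95, 92, 94, 96),
  price  = c(4, 6, 9, 5, 12, 15)
)

# Median split on price (ties at the median fall into "Low price" here)
med_price      <- median(toy$price, na.rm = TRUE)
toy$high_price <- ifelse(toy$price > med_price, "High price", "Low price")

# Binary outcome: 1 if rating >= 93, otherwise 0
toy$high_rating <- as.integer(toy$rating >= 93)

# Cross-tabulate price group against the binary rating
table(toy$high_price, toy$high_rating)
```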

1. Null/Alternative Hypothesis

Null Hypothesis (H₀): Proportion of high ratings is the same in both price groups

\[ P_{\text{High price}} = P_{\text{Low price}} \]

Alternative Hypothesis (H₁): The proportions differ

\[ P_{\text{High price}} \neq P_{\text{Low price}} \]

Interpretation: Are high price coffees more likely to receive high ratings?

2. 2x2 Table

tab_price_rating <- table(coffee_clean$high_price, coffee_clean$high_rating)
tab_price_rating

##             
##                0   1
##   High price 220 815
##   Low price  393 652

High price group: \[ P_1 = \frac{815}{1035} = 0.787 \]

Low price group: \[ P_2 = \frac{652}{1045} = 0.624 \]

Difference in proportions: \[ P_1 - P_2 = 0.7874 - 0.6239 \approx 0.164 \]

There is a difference of about 16.4 percentage points.
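The row proportions can also be computed directly from the counts so that no rounding creeps into the difference. This sketch rebuilds the 2x2 table by hand from the output above; the object names are illustrative, and with the real data prop.table(tab_price_rating, margin = 1) should give the same rows.

```r
# Rebuild the 2x2 table from the printed counts (rows = price group,
# columns = high_rating 0/1); matrix() fills column by column
tab <- matrix(
  c(220, 393, 815, 652),
  nrow = 2,
  dimnames = list(
    price       = c("High price", "Low price"),
    high_rating = c("0", "1")
  )
)

# Proportion of each rating outcome within each price group
row_props <- prop.table(tab, margin = 1)

p_high     <- row_props["High price", "1"]
p_low      <- row_props["Low price", "1"]
diff_props <- p_high - p_low

round(c(p_high = p_high, p_low = p_low, diff = diff_props), 4)
```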

3. Perform Fisher / Two-proportion Test

prop_res <- prop.test(
  x = c(
    sum(coffee_clean$high_rating[coffee_clean$high_price == "High price"], na.rm = TRUE),
    sum(coffee_clean$high_rating[coffee_clean$high_price == "Low price"], na.rm = TRUE)
  ),
  n = c(
    sum(coffee_clean$high_price == "High price", na.rm = TRUE),
    sum(coffee_clean$high_price == "Low price", na.rm = TRUE)
  ),
  alternative = "two.sided",
  correct = TRUE
)
prop_res

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(sum(coffee_clean$high_rating[coffee_clean$high_price == "High price"], na.rm = TRUE), sum(coffee_clean$high_rating[coffee_clean$high_price == "Low price"], na.rm = TRUE)) out of c(sum(coffee_clean$high_price == "High price", na.rm = TRUE), sum(coffee_clean$high_price == "Low price", na.rm = TRUE))
## X-squared = 66.104, df = 1, p-value = 4.277e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.1240346 0.2029977
## sample estimates:
##    prop 1    prop 2 
## 0.7874396 0.6239234

Fisher / Two-proportion Test Insights: The 95% confidence interval for the difference in proportions is [0.124, 0.203], which tells us that the share of high price coffees receiving a rating \(\geq\) 93 is 12.4 to 20.3 percentage points higher than the share of low price coffees. Additionally, the p-value (4.277e-16) is far below 0.05, so we reject \(H_0\): there is strong evidence that the probability of receiving a high rating differs between coffees priced above the median and coffees priced below the median.
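The code above runs prop.test(), which is a chi-squared approximation. For comparison, Fisher's exact test could be run on the same counts with fisher.test() from base R. This is a sketch with the 2x2 table re-entered by hand; the dimension labels and object names are illustrative. The high rating column is placed first so that the odds ratio reads as high price versus low price.

```r
# 2x2 table with the "high rating" column first
# (rows = price group, columns = rating outcome)
tab_fisher <- matrix(
  c(815, 652, 220, 393),
  nrow = 2,
  dimnames = list(
    price  = c("High price", "Low price"),
    rating = c("High (>= 93)", "Not high")
  )
)

# Fisher's exact test, two-sided to match the stated alternative hypothesis
fisher_res <- fisher.test(tab_fisher, alternative = "two.sided")

fisher_res$p.value   # exact p-value
fisher_res$estimate  # conditional maximum-likelihood odds ratio
fisher_res$conf.int  # 95% confidence interval for the odds ratio
```

An odds ratio above 1 here means the odds of a high rating are greater in the high price group, which is the same direction as the difference in proportions.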

4. Visualization

prop_df <- coffee_clean %>%
  group_by(high_price) %>%
  summarise(
    prop_high = mean(high_rating, na.rm = TRUE),
    n = n()
  )

ggplot(prop_df, aes(x = high_price, y = prop_high, fill = high_price)) +
  geom_col(alpha = 0.8) +
  scale_fill_manual(values = c("High price" = "green", 
                               "Low price" = "lightgreen")) +
  geom_text(aes(label = round(prop_high, 2)), vjust = -0.5) +
  ylim(0, 1) +
  labs(
    title = "Proportion of High Ratings (≥ 93) by Price Group",
    x = "Price group",
    y = "Proportion high rating"
  ) +
  theme_minimal()

5. Conclusion

Reject H₀: The two-proportion test strongly supports that higher priced coffees have a much higher chance of receiving a high rating than lower priced coffees (78.7% vs. 62.4%). The visual clearly depicts this statistically significant result: the proportion of high ratings is not the same in the two price groups. The bigger picture insight is that higher prices are, on the whole, justified, as higher priced coffees receive high ratings considerably more often than lower priced coffees. It is important for consumers to know that they are more likely to get a highly rated coffee when they buy a higher priced one. Ultimately, this supports the notion that premium pricing aligns with higher quality coffee more often than not, although the test shows an association and does not show that price causes quality.

Week7 Data Dive - Hypothesis Testing

Woods

2026-03-02
