Two different judges were asked to rate a variety of different wines.
The wines were unlabeled to remove any bias that the judges may have towards a particular wine.
Scores range from 0 to 100, with a higher score representing a better wine.
2024-11-01
   wine judge.A judge.B alc.per
1     A      15      21       5
2     B      76      83      13
3     C      77      92      15
4     D      79      81      21
5     E      80      84      20
6     F      82      72      19
7     G      85      73      26
8     H      86      99      27
9     I      93      94      25
10    J      99      91      25
11    K      96      89      24
12    L      98      95      28
The ratings by Judge A and Judge B of the different wines were compared.
Wine A received the lowest score from both judges (15 from Judge A and 21 from Judge B).
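The comparison above can be checked with a short R sketch. The data frame name `wine` and its column names are taken from the `lm()` calls and the table in this report; the `mean.score` column and the `lowest` variable are illustrative names introduced here, not part of the original analysis.

```r
# Scores and alcohol percentages copied from the table in this report
wine <- data.frame(
  wine    = LETTERS[1:12],
  judge.A = c(15, 76, 77, 79, 80, 82, 85, 86, 93, 99, 96, 98),
  judge.B = c(21, 83, 92, 81, 84, 72, 73, 99, 94, 91, 89, 95),
  alc.per = c(5, 13, 15, 21, 20, 19, 26, 27, 25, 25, 24, 28)
)

# Average the two judges' scores (illustrative helper column)
wine$mean.score <- (wine$judge.A + wine$judge.B) / 2

# Wine with the lowest average score across both judges
lowest <- as.character(wine$wine[which.min(wine$mean.score)])
print(lowest)
```

Averaging is only one possible way to combine the two ratings; here both judges also rank Wine A last individually.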
We model each judge's rating as a linear function of alcohol content (%), fitting one model for Judge A and one for Judge B.
The linear regression equation used is as follows:
\[ \hat{y} = \beta_0 + \beta_1 \times \text{a%} \]
where:
\(\hat{y}\) is the predicted score for the judge being modeled (Judge A or Judge B)
\(\beta_0\) is the intercept
\(\beta_1\) is the coefficient for alcohol percentage
a% is the alcohol percentage of the wine
The goodness of fit for the model is represented by the \(R^2\) value.
The \(R^2\) equation used is as follows:
\[ R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2} \]
where:
\(y_i\) are the observed values
\(\hat{y}_i\) are the predicted values
\(\bar{y}\) is the mean of observed values
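As a minimal sketch of the formula above, \(R^2\) can be computed by hand for Judge A and compared with the value reported by `summary()`. The names `ss_res`, `ss_tot`, and `r2_manual` are illustrative and do not appear in the original analysis.

```r
# Alcohol percentage and Judge A's scores, copied from the data table
alc.per <- c(5, 13, 15, 21, 20, 19, 26, 27, 25, 25, 24, 28)
judge.A <- c(15, 76, 77, 79, 80, 82, 85, 86, 93, 99, 96, 98)

fit   <- lm(judge.A ~ alc.per)
y_hat <- fitted(fit)                         # predicted values

ss_res <- sum((judge.A - y_hat)^2)           # residual sum of squares
ss_tot <- sum((judge.A - mean(judge.A))^2)   # total sum of squares
r2_manual <- 1 - ss_res / ss_tot

# Should agree with the R-squared reported by summary()
print(all.equal(r2_manual, summary(fit)$r.squared))
```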
The models were fitted in R with lm(), and summary() was used to report the results for each judge.
# Judge A: score as a function of alcohol percentage
modelA <- lm(judge.A ~ alc.per, data = wine)
r2_A <- summary(modelA)$r.squared
# Judge B: score as a function of alcohol percentage
modelB <- lm(judge.B ~ alc.per, data = wine)
r2_B <- summary(modelB)$r.squared
Call:
lm(formula = judge.A ~ alc.per, data = wine)
Residuals:
Min 1Q Median 3Q Max
-20.9850 -5.1661 0.7908 6.1994 17.2839
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 21.7781 10.9351 1.992 0.074425 .
alc.per 2.8414 0.5046 5.631 0.000218 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 11.4 on 10 degrees of freedom
Multiple R-squared: 0.7603, Adjusted R-squared: 0.7363
F-statistic: 31.71 on 1 and 10 DF, p-value: 0.000218
Call:
lm(formula = judge.B ~ alc.per, data = wine)
Residuals:
Min 1Q Median 3Q Max
-24.0986 -3.6196 0.0082 3.5315 23.8792
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 33.5875 13.6471 2.461 0.03361 *
alc.per 2.3022 0.6297 3.656 0.00442 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 14.23 on 10 degrees of freedom
Multiple R-squared: 0.572, Adjusted R-squared: 0.5292
F-statistic: 13.37 on 1 and 10 DF, p-value: 0.004419
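To illustrate how the estimated coefficients are used, the fitted equations can be evaluated for a hypothetical wine with 20% alcohol. The 20% value is chosen here purely for illustration and is not a prediction reported in the analysis.

\[ \hat{y}_A = 21.7781 + 2.8414 \times 20 \approx 78.61 \]
\[ \hat{y}_B = 33.5875 + 2.3022 \times 20 \approx 79.63 \]

Each additional percentage point of alcohol raises the predicted score by about 2.84 points for Judge A and about 2.30 points for Judge B.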
Figure: score from Judge A plotted against alcohol percentage, with a fitted trend line.
Figure: score from Judge B plotted against alcohol percentage, with a fitted trend line.
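The plotting code is not shown in this report. The sketch below is one way such a figure could be produced with ggplot2, assuming a scatter plot with a linear fit. The `geom_smooth()` call is suggested by the message in the rendered output, but `method = "lm"`, the axis labels, and the name `plot_A` are assumptions.

```r
library(ggplot2)

# Scores and alcohol percentages copied from the data table
wine <- data.frame(
  wine    = LETTERS[1:12],
  judge.A = c(15, 76, 77, 79, 80, 82, 85, 86, 93, 99, 96, 98),
  judge.B = c(21, 83, 92, 81, 84, 72, 73, 99, 94, 91, 89, 95),
  alc.per = c(5, 13, 15, 21, 20, 19, 26, 27, 25, 25, 24, 28)
)

# Scatter plot of Judge A's score with a linear trend line (assumed method)
plot_A <- ggplot(wine, aes(x = alc.per, y = judge.A)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Alcohol (%)", y = "Judge A score")

print(plot_A)
```

Swapping `judge.A` for `judge.B` in `aes()` and the y-axis label gives the corresponding plot for Judge B.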
For Judge A, \(R^2\)=0.76 when comparing alcohol percentage and rating.
For Judge B, \(R^2\)=0.57 when comparing alcohol percentage and rating.
Based on the summaries of the models relating each judge's ratings to alcohol percentage, the p-values for the alcohol coefficient are:
For Judge A, p=0.000218
For Judge B, p=0.004419
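The p-values above can also be pulled directly from the coefficient table rather than read off the printed summary. This is a minimal sketch; `p_A` and `p_B` are illustrative names, and the models are refitted here so the block runs on its own.

```r
# Scores and alcohol percentages copied from the data table
wine <- data.frame(
  judge.A = c(15, 76, 77, 79, 80, 82, 85, 86, 93, 99, 96, 98),
  judge.B = c(21, 83, 92, 81, 84, 72, 73, 99, 94, 91, 89, 95),
  alc.per = c(5, 13, 15, 21, 20, 19, 26, 27, 25, 25, 24, 28)
)

modelA <- lm(judge.A ~ alc.per, data = wine)
modelB <- lm(judge.B ~ alc.per, data = wine)

# coef(summary(model)) is a matrix with one row per term;
# the "Pr(>|t|)" column holds the p-value of each coefficient
p_A <- coef(summary(modelA))["alc.per", "Pr(>|t|)"]
p_B <- coef(summary(modelB))["alc.per", "Pr(>|t|)"]

print(c(judge.A = p_A, judge.B = p_B))
```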
Both p-values are below 0.05, so the relationship between alcohol percentage and score is statistically significant for both judges. This indicates an association between alcohol percentage and each judge's scores; with only 12 wines and no control for other characteristics, it does not by itself show that alcohol content causes the higher ratings.
The higher \(R^2\) for Judge A (0.76, compared with 0.57 for Judge B) means that alcohol content explains more of the variation in Judge A's ratings than in Judge B's.
Additionally, the variation left unexplained by the linear models (\(1 - R^2\), roughly 24% for Judge A and 43% for Judge B) suggests that factors other than alcohol content also influence how a wine is rated.