P-Value in Chemistry: Alcohol Content and Wine Quality

2025-06-08

What is a p-value?

A p-value helps us evaluate evidence against a null hypothesis.
It’s the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis (\(H_0\)) is true.

\[ \begin{aligned} H_0 &: \mu_{\text{High}} = \mu_{\text{Low}} \\ H_A &: \mu_{\text{High}} \ne \mu_{\text{Low}} \end{aligned} \]
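As a minimal sketch of the definition above, a two-sided p-value for a t statistic can be computed from the t distribution with `pt()`. The helper name `two_sided_p` and the values of `t_obs` and `df` are illustrative assumptions, not results from the wine data.

```r
# Two-sided p-value: P(|T| >= |t_obs|) when H0 is true.
# two_sided_p is a hypothetical helper written for this illustration.
two_sided_p <- function(t_obs, df) {
  2 * pt(-abs(t_obs), df = df)
}

t_obs <- 2.0   # illustrative t statistic (assumed, not from the wine data)
df <- 30       # illustrative degrees of freedom (assumed)

p <- two_sided_p(t_obs, df)
print(p)
```

Because the test is two-sided, a statistic of -2.0 gives the same p-value as +2.0.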

Significance and Thresholds

Set a significance level: \(\alpha = 0.05\)
Interpret the p-value: \[ \text{If } p < \alpha, \text{ reject } H_0 \]
Smaller p-values mean stronger evidence against the null hypothesis.
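The decision rule above can be sketched in a few lines. The function name `reject_null` and the example p-values are assumptions made for illustration only.

```r
alpha <- 0.05

# Returns TRUE when H0 is rejected at the given significance level.
# reject_null is a hypothetical helper, not part of base R.
reject_null <- function(p_value, alpha = 0.05) {
  p_value < alpha
}

p_values <- c(0.20, 0.049, 1e-10)   # illustrative p-values, not from the wine data
decisions <- data.frame(p_value = p_values, reject_H0 = reject_null(p_values, alpha))
print(decisions)
```

Note that the comparison is strict, so a p-value exactly equal to \(\alpha\) does not lead to rejection under this rule.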

Dataset and Hypothesis

The data come from the UCI Wine Quality Dataset (Red Wine), which includes:

Alcohol, pH, acidity, and more
Quality scored from 0 to 10

Alternative hypothesis: Mean alcohol content differs between low- and high-quality wines

Null hypothesis: Mean alcohol content does not differ between low- and high-quality wines
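To compare the two groups, each wine needs a High or Low label. The sketch below shows one way to build a `quality_group` column. The cutoff of 6 is an assumption (the cutoff used for the slides is not stated), and the rows are made-up stand-ins for the real data.

```r
# Toy stand-in for the red wine data (made-up rows, for illustration only).
wine <- data.frame(
  alcohol = c(9.4, 9.8, 10.5, 11.2, 9.5, 12.0),
  quality = c(5, 5, 6, 7, 4, 8)
)

# Assumed cutoff: scores of 6 or more count as "High".
cutoff <- 6
wine$quality_group <- factor(
  ifelse(wine$quality >= cutoff, "High", "Low"),
  levels = c("High", "Low")
)

# Group sizes and mean alcohol content per group.
group_sizes <- table(wine$quality_group)
group_means <- tapply(wine$alcohol, wine$quality_group, mean)
print(group_sizes)
print(group_means)
```

With the real file, `wine` would instead be loaded with something like `read.csv("winequality-red.csv", sep = ";")`, where the local file name is an assumption.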

Alcohol vs Quality Group

Higher-quality wines tend to have a higher alcohol percentage.

Alcohol Distribution

Most low-quality wines have less than 10% alcohol.
Alcohol percentage in high-quality wines is distributed fairly evenly between 9% and 12%.

Alcohol vs pH vs Quality

t-test on Alcohol Content

t.test(alcohol ~ quality_group, data = wine)

Comparison: Alcohol content between High and Low wine quality groups
Mean Alcohol Content:
- High Quality: 10.86%
- Low Quality: 9.93%
t-value: 19.78
p-value: < 2.2 × 10⁻¹⁶
Conclusion: Significant difference in alcohol content between the two quality groups (p < 0.05)
95% CI for difference in means: (0.84, 1.02)
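The numbers above can be pulled directly from the object that `t.test()` returns. The sketch below uses simulated data, so its output will not match the slide. The group means are set close to the reported ones, while the sample sizes and standard deviations are assumptions.

```r
set.seed(1)  # simulated data, not the wine measurements

sim <- data.frame(
  alcohol = c(rnorm(200, mean = 10.9, sd = 1.1),   # assumed spread for "High"
              rnorm(200, mean = 9.9, sd = 0.8)),   # assumed spread for "Low"
  quality_group = factor(rep(c("High", "Low"), each = 200),
                         levels = c("High", "Low"))
)

# t.test() runs Welch's unequal-variance test by default (var.equal = FALSE).
res <- t.test(alcohol ~ quality_group, data = sim)

res$estimate    # mean alcohol content per group
res$statistic   # t-value
res$parameter   # Welch-adjusted degrees of freedom
res$p.value     # p-value as a plain number
res$conf.int    # 95% CI for mean(High) - mean(Low)
```

The sign of the t-value and of the confidence interval depends on the factor level order. Here "High" comes first, so the difference is High minus Low.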

Conclusion

The p-value is below 2.2e-16 (less than 0.00000000000000022)
Since p < 0.05, we reject the null hypothesis.
There is strong evidence that alcohol content differs between low and high quality wines.
Higher alcohol percentage is associated with higher wine quality.
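R prints `< 2.2e-16` for very small p-values because that is roughly the machine epsilon for double-precision numbers, so the figure is an upper bound rather than the exact p-value. A brief sketch, with `tiny_p` as an illustrative value:

```r
eps <- .Machine$double.eps   # about 2.2e-16 on standard double-precision platforms
tiny_p <- 1e-40              # illustrative p-value far below eps (assumed)

# Below eps, format.pval() reports an upper bound that starts with "<".
small_label <- format.pval(tiny_p)
# Above eps, the p-value is printed as an ordinary number.
usual_label <- format.pval(0.0312)

print(eps)
print(small_label)
print(usual_label)
```

The exact value is still stored in the test object's `p.value` element when it is needed.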