2025-06-08

What is a p-value?

  • A p-value helps us evaluate evidence against a null hypothesis.
  • It’s the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis (\(H_0\)) is true.

\[ \begin{aligned} H_0 &: \mu_{\text{High}} = \mu_{\text{Low}} \\ H_A &: \mu_{\text{High}} \ne \mu_{\text{Low}} \end{aligned} \]
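To make the "at least as extreme" wording concrete, here is a minimal sketch of how a two-sided p-value follows from a t statistic. The helper name `two_sided_p` and the example values are assumptions for illustration only; they are not taken from the wine analysis.

```r
# Two-sided p-value for an observed t statistic with `df` degrees of freedom.
# `two_sided_p` is a hypothetical helper name, not part of base R.
two_sided_p <- function(t_stat, df) {
  # P(|T| >= |t_stat|) under H0: double the lower-tail area beyond -|t_stat|.
  2 * pt(-abs(t_stat), df = df)
}

two_sided_p(0, df = 20)     # statistic at the centre of the null distribution
two_sided_p(2.5, df = 20)   # a more extreme statistic gives a smaller p-value
two_sided_p(-2.5, df = 20)  # the sign does not matter for a two-sided test
```

Both tails are counted because \(H_A\) is stated with \(\ne\), so large differences in either direction count as evidence against \(H_0\).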

Significance and Thresholds

  • Set a significance level: \(\alpha = 0.05\)

  • Interpret the p-value: \[ \text{If } p < \alpha, \text{ reject } H_0 \]

  • Smaller p-values mean stronger evidence against the null hypothesis.
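One way to see what \(\alpha = 0.05\) means in practice is a small simulation in which \(H_0\) is true by construction. This is a sketch with simulated normal data, not the wine data; the sample size of 30 and the 2000 repetitions are arbitrary choices.

```r
set.seed(1)
alpha <- 0.05

# Both samples come from the same distribution, so H0 is true by construction.
p_values <- replicate(2000, t.test(rnorm(30), rnorm(30))$p.value)

# Share of simulated experiments in which H0 would be (wrongly) rejected.
false_rejection_rate <- mean(p_values < alpha)
false_rejection_rate
```

The rate should land close to \(\alpha\): the significance level is the share of false rejections we accept when the null hypothesis actually holds.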

Dataset and Hypothesis

The data come from the UCI Wine Quality dataset (red wine), which includes:

  • Alcohol, pH, acidity, and more
  • Quality scored from 0 to 10

Alternative hypothesis: Alcohol content differs between low and high quality wines

Null hypothesis: Alcohol content does not differ between low and high quality wines
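Testing this hypothesis requires a two-level grouping variable. The sketch below shows one way to build it. The cutoff of 6 is an assumption, since the slides do not state how "low" and "high" were defined, and the rows are made-up toy values rather than real records; the column names follow the `t.test()` call used later.

```r
# Toy rows with made-up values; the real dataset has many more rows and columns.
wine <- data.frame(
  alcohol = c(9.4, 9.8, 10.5, 11.2, 12.0, 9.5),
  quality = c(5, 5, 6, 7, 8, 4)
)

# ASSUMPTION: scores of 6 and above count as "High"; the cutoff is not given
# in the source and is chosen here only for illustration.
wine$quality_group <- factor(
  ifelse(wine$quality >= 6, "High", "Low"),
  levels = c("High", "Low")
)

table(wine$quality_group)
```

Putting "High" first in `levels` means a formula-based t-test reports the difference as High minus Low.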

Alcohol vs Quality Group

  • Higher quality wines tend to have a higher alcohol percentage.

Alcohol Distribution

  • A majority of low quality wines have less than 10% alcohol.
  • Alcohol percentage in high quality wines is distributed evenly between 9% and 12%.

Alcohol vs pH vs Quality

t-test on Alcohol Content

t.test(alcohol ~ quality_group, data = wine)
  • Comparison: Alcohol content between High and Low wine quality groups
  • Mean Alcohol Content:
    • High Quality: 10.86%
    • Low Quality: 9.93%
  • t-value: 19.78
  • p-value: < 2.2 × 10⁻¹⁶
  • Conclusion: Significant difference in alcohol content between the two quality groups (p < 0.05)
  • 95% CI for difference in means: (0.84, 1.02)
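The quantities listed above can be read off the object that `t.test()` returns. The sketch below uses simulated stand-in data (not the wine data, so the numbers will differ) and also recomputes the t statistic by hand, assuming the default Welch (unequal variance) test.

```r
set.seed(42)

# Simulated stand-in data, NOT the wine data: two groups of 50 values each.
sim <- data.frame(
  alcohol = c(rnorm(50, mean = 11, sd = 1), rnorm(50, mean = 10, sd = 1)),
  quality_group = factor(rep(c("High", "Low"), each = 50),
                         levels = c("High", "Low"))
)

res <- t.test(alcohol ~ quality_group, data = sim)  # Welch test by default

res$estimate   # group means
res$statistic  # t-value
res$p.value    # p-value
res$conf.int   # 95% CI for the difference in means (High minus Low)

# Welch t statistic by hand: difference in means over its standard error.
m <- tapply(sim$alcohol, sim$quality_group, mean)
v <- tapply(sim$alcohol, sim$quality_group, var)
n <- tapply(sim$alcohol, sim$quality_group, length)
t_manual <- (m[["High"]] - m[["Low"]]) /
  sqrt(v[["High"]] / n[["High"]] + v[["Low"]] / n[["Low"]])
t_manual
```

A confidence interval that excludes 0 tells the same story as a p-value below \(\alpha\): the observed difference would be unlikely if the group means were equal.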

Conclusion

  • p-value is less than 2.2e-16, i.e. below 0.00000000000000022
  • Since p < 0.05, we reject the null hypothesis.
  • There is strong evidence that alcohol content differs between low and high quality wines.
  • A higher alcohol percentage is associated with higher wine quality.
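The reported bound of 2.2 × 10⁻¹⁶ is not the exact p-value. It matches R's double-precision machine epsilon, which `format.pval()` uses as its default cutoff for printing very small p-values. A short sketch, where `p_tiny` is a hypothetical value chosen for illustration:

```r
eps <- .Machine$double.eps  # smallest x with 1 + x != 1 in double precision
eps

p_tiny <- 1e-40             # hypothetical p-value far below eps
format.pval(p_tiny)         # printed as "< eps" rather than as an exact value
p_tiny < eps
```

The true p-value may therefore be far smaller than the printed bound; the output only guarantees that it lies below it.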