2025-11-09

What Is Hypothesis Testing?

To understand what hypothesis testing is, it’s important first to understand what a hypothesis is.

A hypothesis or hypothesis statement seeks to explain why something has happened, or what might happen, under certain conditions. It can also be used to understand how different variables relate to each other. Hypotheses are often written as if-then statements; for example, “If this happens, then this will happen.”

Hypothesis testing, then, is a statistical means of testing an assumption stated in a hypothesis. While the specific methodology leveraged depends on the nature of the hypothesis and data available, hypothesis testing typically uses sample data to extrapolate insights about a larger population.

Source: https://online.hbs.edu/blog/post/hypothesis-testing

Hypothesis Testing Framework

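Hypothesis testing helps us decide whether a pattern observed in sample data is statistically significant, that is, unlikely to have arisen by chance alone if the null hypothesis were true. We can explore associations between hair color, eye color, and sex using the HairEyeColor dataset in R, a built-in cross-tabulation of 592 statistics students.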

Let: \[ \begin{aligned} H_0 &: \text{Hair and eye color are independent} \\ H_a &: \text{Hair and eye color are associated} \end{aligned} \]

Our null hypothesis is that hair and eye color are independent, and our alternative hypothesis is that they are associated. Because both variables are categorical, we use Pearson's chi-square test of independence on the Hair-by-Eye contingency table, with counts summed over sex.

Dataset Overview

A sample of rows from the HairEyeColor dataset (Row is the row index in the full data frame)

Row  Hair   Eye    Sex     Freq
2    Brown  Brown  Male      53
20   Blond  Brown  Female     4
28   Blond  Hazel  Female     5
30   Brown  Green  Female    14
5    Black  Blue   Male      11
3    Red    Brown  Male      10
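
A listing like the one above can be produced by flattening the built-in `HairEyeColor` table into a data frame and drawing a few rows at random. This is only a sketch: the seed and the sample size of 6 are assumptions, so the rows drawn will generally differ from those shown.

```r
# HairEyeColor ships with base R (datasets package) as a 4 x 4 x 2 table.
# as.data.frame() converts it to long format with columns Hair, Eye, Sex, Freq.
hec_df <- as.data.frame(HairEyeColor)

set.seed(1)  # arbitrary seed (assumption), used only for reproducibility
hec_sample <- hec_df[sample(nrow(hec_df), 6), ]  # 6 rows drawn without replacement
print(hec_sample)
```

Each row is one Hair/Eye/Sex combination, and `Freq` is the number of students in that cell.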

Hair vs Eye Color by Sex

(ggplot2)
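
One way such a figure could be drawn with ggplot2 is sketched below. The chart type (grouped bars faceted by sex), the axis labels, and the object name `p_bars` are assumptions; the original figure may have been built differently.

```r
library(ggplot2)

# Long-format data: one row per Hair/Eye/Sex combination with its count.
hec_df <- as.data.frame(HairEyeColor)

# Grouped bars: hair color on the x-axis, one bar per eye color, one panel per sex.
p_bars <- ggplot(hec_df, aes(x = Hair, y = Freq, fill = Eye)) +
  geom_col(position = "dodge") +
  facet_wrap(~ Sex) +
  labs(title = "Hair vs Eye Color by Sex",
       x = "Hair color", y = "Count", fill = "Eye color")

print(p_bars)
```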

Chi-Square Test: Hair vs Eye

Hypotheses: \[ \begin{aligned} H_0 &: \text{Hair and eye color are independent} \\ H_a &: \text{Hair and eye color are associated} \end{aligned} \]

\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]

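\[ \begin{aligned} \chi^2 &= \text{chi-square test statistic} \\ O_i &= \text{observed count in cell } i \\ E_i &= \text{expected count in cell } i \text{ under } H_0 \end{aligned} \]

The sum runs over every cell of the Hair-by-Eye contingency table. Under independence, the expected count in a cell is (row total × column total) / grand total.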
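
To make the formula concrete, the statistic can be computed by hand in base R. This is a sketch under one assumption: the test is run on the Hair-by-Eye table collapsed over sex. The variable names are illustrative.

```r
# Collapse over Sex to get the 4 x 4 Hair-by-Eye table of observed counts.
observed <- margin.table(HairEyeColor, c(1, 2))

n <- sum(observed)
# Expected count under independence: (row total * column total) / n.
expected <- outer(rowSums(observed), colSums(observed)) / n

# Pearson's statistic: sum of (O - E)^2 / E over all cells.
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1).
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-square distribution with df degrees of freedom.
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

cat("chi-squared =", round(chi_sq, 2), " df =", df, " p-value =", p_value, "\n")
```

The statistic and degrees of freedom should agree with the values R's built-in test reports for the same table (X-squared = 138.29, df = 9).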

Running the Chi-Square Test

    Pearson's Chi-squared test

data:  hair_eye_color
X-squared = 138.29, df = 9, p-value < 2.2e-16
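
Output of this form comes from base R's `chisq.test()`. The sketch below assumes that `hair_eye_color` (the name shown in the output) is the Hair-by-Eye table summed over sex; the significance level `alpha` is likewise an assumption.

```r
# Hair-by-Eye contingency table, summing counts over Sex (assumed construction).
hair_eye_color <- margin.table(HairEyeColor, c(1, 2))

# Pearson's chi-square test of independence.
result <- chisq.test(hair_eye_color)
print(result)

# Decision rule at an assumed significance level of 0.05.
alpha <- 0.05
if (result$p.value < alpha) {
  message("Reject H0: hair and eye color appear to be associated.")
} else {
  message("Fail to reject H0: no evidence against independence.")
}
```

The `result` object also exposes `result$expected` and `result$residuals`, which show which cells contribute most to the statistic.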

Heatmap of Hair by Eye Color

(ggplot2)
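
A heatmap of this kind could be sketched with `geom_tile()` as follows. The color scale, the cell labels, and the names `hair_eye_df` and `p_heat` are assumptions and not necessarily what the original figure used.

```r
library(ggplot2)

# Hair-by-Eye counts summed over Sex, in long format (columns Hair, Eye, Freq).
hair_eye_df <- as.data.frame(margin.table(HairEyeColor, c(1, 2)))

# One tile per Hair/Eye combination, shaded by count and labeled with the count.
p_heat <- ggplot(hair_eye_df, aes(x = Eye, y = Hair, fill = Freq)) +
  geom_tile(color = "white") +
  geom_text(aes(label = Freq)) +
  scale_fill_gradient(low = "white", high = "steelblue") +
  labs(title = "Heatmap of Hair by Eye Color",
       x = "Eye color", y = "Hair color", fill = "Count")

print(p_heat)
```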

Scatterplot of Hair by Eye Color

(plotly)

Conclusion

We introduced the concept of Hypothesis Testing and explored associations between hair color, eye color, and sex.

We then ran Pearson's chi-square test to assess whether hair color and eye color are independent or associated.

The test gave X-squared = 138.29 on 9 degrees of freedom with a p-value below 2.2e-16, so we reject the null hypothesis of independence at any conventional significance level. The data provide strong evidence that hair color and eye color are associated.