2025-03-06

Introduction to Hypothesis Testing

  • A method for testing claims about a population parameter using sample data
  • Two types of hypotheses:
    • Null Hypothesis (H0): No effect or difference
    • Alternative Hypothesis (H1): Some effect or difference
  • A test can be one-tailed (directional) or two-tailed (non-directional).

In Application

  • One-tailed test example: Blood pressure medication efficacy.
    • \(H_0: \mu_1 = \mu_2\)
    • \(H_1: \mu_1 < \mu_2\)
    • To reject \(H_0\), we need p-value \(<\alpha\).
  • Two-tailed test example: Height comparison between sexes.
    • \(H_0: \mu_1 = \mu_2\)
    • \(H_1: \mu_1 \neq \mu_2\)
    • To reject \(H_0\), we need the two-sided p-value \(< \alpha\); each tail's rejection region has area \(\frac{\alpha}{2}\).
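
In R, the tail of a t-test is chosen with the alternative argument of t.test. The sketch below uses simulated data; the group names, means, and sample sizes are made up for illustration and do not come from a real study.

```r
# Simulated blood-pressure readings; all numbers are illustrative assumptions.
set.seed(1)
treated <- rnorm(30, mean = 120, sd = 10)  # group 1
control <- rnorm(30, mean = 128, sd = 10)  # group 2

# One-tailed test: H1 is mu_1 < mu_2
one_sided <- t.test(treated, control, alternative = "less")

# Two-tailed test: H1 is mu_1 != mu_2
two_sided <- t.test(treated, control, alternative = "two.sided")

alpha <- 0.05
one_sided$p.value < alpha  # reject H0 in the one-tailed test?
two_sided$p.value < alpha  # reject H0 in the two-tailed test?

# The two-sided p-value equals twice the smaller one-sided p-value,
# which is why it is compared with alpha itself.
greater <- t.test(treated, control, alternative = "greater")
all.equal(two_sided$p.value, 2 * min(one_sided$p.value, greater$p.value))
```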

Steps in Hypothesis Testing

  1. State the hypotheses
  2. Choose a significance level (α), such as 0.1, 0.05, or 0.01, depending on how much risk of a false positive is acceptable
    • A common value is 0.05, meaning there’s a 5% chance of rejecting the null hypothesis when it’s actually true (a Type I error)
  3. Select a test statistic
    • t-test for comparing the means of one or two groups with approximately normal, continuous data
    • Chi-square test for categorical data
    • ANOVA for comparing the means of three or more groups
  4. Compute the p-value
  5. Compare p-value with α and make a decision
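
To see what the significance level means in practice, one can simulate many experiments in which the null hypothesis is true and count how often the test rejects it. This is a rough sketch; the sample size of 25 per group and the 2000 repetitions are arbitrary choices.

```r
# Simulate experiments where H0 is true (both groups share the same mean)
# and count how often a t-test rejects at alpha = 0.05.
set.seed(42)
alpha <- 0.05
n_sims <- 2000  # number of simulated experiments (arbitrary choice)

p_values <- replicate(n_sims, {
  x <- rnorm(25)  # group 1, true mean 0
  y <- rnorm(25)  # group 2, true mean 0
  t.test(x, y)$p.value
})

# Proportion of false rejections; this should land near alpha
type1_rate <- mean(p_values < alpha)
type1_rate
```

A smaller α lowers this false-rejection rate, at the cost of making real differences harder to detect.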

Applying Hypothesis Testing to the Iris Dataset

  • Question: Do setosa and versicolor differ in mean sepal length?
  • Hypotheses:
    • \(H_0: \mu_{setosa} = \mu_{versicolor}\)
    • \(H_1: \mu_{setosa} \neq \mu_{versicolor}\)

Visualizing the Iris Dataset

Based on this ggplot, let’s focus on setosa and versicolor.
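
A plot of this kind could be drawn as follows. This is only a sketch: it assumes the ggplot2 package is installed, and the choice of a boxplot of sepal length by species is a guess at a suitable chart type, not necessarily the original figure.

```r
library(ggplot2)

data(iris)

# Boxplot of sepal length for each species (chart type is an assumption)
p <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot() +
  labs(x = "Species", y = "Sepal length (cm)",
       title = "Sepal length by species")
print(p)
```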

Applying Hypothesis Testing to the Iris Dataset

data(iris)
setosa_sepal <- iris$Sepal.Length[iris$Species == "setosa"]
versicolor_sepal <- iris$Sepal.Length[iris$Species == "versicolor"]
t.test(setosa_sepal, versicolor_sepal, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  setosa_sepal and versicolor_sepal
## t = -10.521, df = 86.538, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.1057074 -0.7542926
## sample estimates:
## mean of x mean of y 
##     5.006     5.936

Analyzing the Iris Dataset Results

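  • The t-test gives a p-value < 2.2e-16
  • p-value < 2.2e-16 < \(\alpha\), given \(\alpha = 0.05\) (the p-value reported by t.test is already two-sided, so it is compared with \(\alpha\) directly)
    • Thus we reject \(H_0\) at the 5% significance level and conclude that setosa and versicolor differ in mean sepal length. The 95% confidence interval for the difference in means (-1.106 to -0.754) excludes 0, which agrees with this decision.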
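
As a cross-check, the two-sided p-value can be recomputed from the reported t statistic and degrees of freedom using the t distribution. A small sketch, where t_stat, df, and alpha are illustrative variable names:

```r
t_stat <- -10.521  # t statistic from the Welch t-test output
df     <- 86.538   # degrees of freedom from the same output
alpha  <- 0.05

# Two-sided p-value: probability mass in both tails beyond |t|
p_value <- 2 * pt(-abs(t_stat), df = df)

p_value < alpha  # TRUE means H0 is rejected at the 5% level
```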

What else could we possibly compare?
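
One possibility, offered as a suggestion rather than part of the analysis above, is to compare all three species at once with ANOVA, the test listed earlier for three or more groups. A minimal sketch:

```r
data(iris)

# One-way ANOVA: does mean sepal length differ across the three species?
fit <- aov(Sepal.Length ~ Species, data = iris)
anova_table <- summary(fit)[[1]]
anova_table

# p-value for the Species term, compared with alpha = 0.05
p_value <- anova_table[["Pr(>F)"]][1]
p_value < 0.05
```

A significant result here only says that at least one species mean differs; pairwise follow-up tests would be needed to say which ones.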

Conclusion

  • Hypothesis testing helps make data-driven decisions.
  • Comparing the p-value with α determines statistical significance.
  • Visualization aids interpretation.
  • YOU GOT THIS!