2026-02-08

Hypothesis Testing Overview

Hypothesis testing uses sample data to decide whether a claim about a population is plausible.

Here we test whether a population mean equals a claimed value.

Example: Is the average score equal to 70?

Hypotheses

\[H_0: \mu = 70\]

\[H_1: \mu \ne 70\]

Null (H0): no difference; the population mean is 70.
Alternative (H1): a difference exists; the population mean is not 70 (two-sided).

Test Statistic

\[ t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} \]

A large |t| means the sample mean lies many standard errors from the claimed value, which is evidence against H0.
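
As a minimal sketch of the formula above, the statistic can be computed by hand and cross-checked against R's built-in t.test(). The helper name t_stat and the toy vector are assumptions made up for this illustration; they are not part of the original notes.

```r
# Hypothetical helper (not from the original notes): one-sample t statistic.
t_stat <- function(x, mu0) {
  n <- length(x)
  (mean(x) - mu0) / (sd(x) / sqrt(n))
}

# Made-up toy data, for illustration only.
toy <- c(72, 68, 75, 71, 69)
t_manual <- t_stat(toy, mu0 = 70)

# Cross-check against the built-in test; the two values should agree.
t_builtin <- unname(t.test(toy, mu = 70)$statistic)
all.equal(t_manual, t_builtin)
```

The numerator measures how far the sample mean is from the claimed value; the denominator (the standard error) rescales that distance so it is comparable across sample sizes.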

Generate Example Data

set.seed(1)  # fixes the random draw; seed 1 reproduces the values printed below
scores <- rnorm(40, mean = 74, sd = 10)
scores
##  [1] 67.73546 75.83643 65.64371 89.95281 77.29508 65.79532 78.87429 81.38325
##  [9] 79.75781 70.94612 89.11781 77.89843 67.78759 51.85300 85.24931 73.55066
## [17] 73.83810 83.43836 82.21221 79.93901 83.18977 81.82136 74.74565 54.10648
## [25] 80.19826 73.43871 72.44204 59.29248 69.21850 78.17942 87.58680 72.97212
## [33] 77.87672 73.46195 60.22940 69.85005 70.05710 73.40687 85.00025 81.63176

ggplot Histogram

library(ggplot2)
# Histogram of the sample scores, grouped into 10 bins
ggplot(data.frame(scores), aes(scores)) +
  geom_histogram(bins = 10)

ggplot Boxplot

# Boxplot of the same scores: median, quartiles, and possible outliers
ggplot(data.frame(scores), aes(y = scores)) +
  geom_boxplot()

Plotly Interactive Plot

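library(plotly)
# Normal density with mean 70 (the H0 value) and sd 10, drawn as a line
x <- seq(40, 100, length.out = 200)
plot_ly(x = x, y = dnorm(x, 70, 10), type = "scatter", mode = "lines")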

Hypothesis Test in R

t.test(scores, mu=70)
## 
##  One Sample t-test
## 
## data:  scores
## t = 3.5096, df = 39, p-value = 0.001149
## alternative hypothesis: true mean is not equal to 70
## 95 percent confidence interval:
##  72.08456 77.75596
## sample estimates:
## mean of x 
##  74.92026
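
The printed report above can also be read programmatically. The sketch below shows the standard components of the object returned by t.test(). The seed and the variable name res are assumptions added here so the block runs on its own.

```r
# Assumption: a fixed seed so this block is self-contained and repeatable.
set.seed(1)
scores <- rnorm(40, mean = 74, sd = 10)

res <- t.test(scores, mu = 70)  # returns an object of class "htest"

res$statistic  # t value
res$parameter  # degrees of freedom (n - 1)
res$p.value    # two-sided p-value
res$conf.int   # 95% confidence interval for the mean
res$estimate   # sample mean
```

Pulling out res$p.value directly is convenient when the decision has to be made in code rather than by reading the printed output.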

Sample Statistics

mean(scores)
## [1] 74.92026
sd(scores)
## [1] 8.86667
length(scores)
## [1] 40
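
The mean, standard deviation, and sample size above are all the test needs. As a sketch, the following rebuilds the t statistic, the two-sided p-value, and the 95% confidence interval from those three rounded values. The variable names are illustrative assumptions, and small rounding differences from the t.test() output are possible.

```r
# Summary statistics copied from the output above (rounded as printed).
xbar <- 74.92026
s    <- 8.86667
n    <- 40
mu0  <- 70  # value claimed under H0

se    <- s / sqrt(n)                       # standard error of the mean
t_val <- (xbar - mu0) / se                 # test statistic
p_val <- 2 * pt(-abs(t_val), df = n - 1)   # two-sided p-value
ci    <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * se  # 95% CI

round(c(t = t_val, p = p_val, lower = ci[1], upper = ci[2]), 4)
```

The results should match the t.test() report to the digits shown there: t = 3.5096, p-value = 0.001149, and an interval of roughly 72.08 to 77.76.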

p-value Rule

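At significance level α = 0.05:
If p < 0.05 → reject H0
If p ≥ 0.05 → fail to reject H0
Here p = 0.001149 < 0.05, so we reject H0: the data give evidence that the mean score differs from 70.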
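
The rule can be written as a small function. This is only a sketch: the name decide and its alpha argument are hypothetical and do not come from the original notes.

```r
# Hypothetical helper: apply the p-value rule at significance level alpha.
decide <- function(p, alpha = 0.05) {
  if (p < alpha) "reject H0" else "fail to reject H0"
}

decide(0.001149)  # p-value reported by t.test() in these notes
decide(0.05)      # boundary case: p equal to alpha does not reject
```

Keeping alpha as an argument makes the threshold explicit; it should be chosen before looking at the data.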

Applications

Hypothesis testing is used in medicine, engineering, finance, and data science.

End

Steps:

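1. State the hypotheses (H0 and H1).
2. Compute the test statistic.
3. Get the p-value.
4. Compare the p-value with the significance level.
5. Decide whether to reject H0.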