2025-09-17


Hypothesis Testing

A fundamental test in statistics, with two outcomes:

  • Reject the null hypothesis
  • Fail to reject the null hypothesis

Hypothesis testing is used to literally “test a hypothesis”: we check whether sample data are consistent with a claim someone has made about a population value, such as a mean.


Null and Alternative Hypothesis

  • They are mutually exclusive (only one can be true)
  • The Alternative Hypothesis \((H_1)\) is the claim being tested
  • The Null Hypothesis \((H_0)\) is what is assumed by default
  • Tests are designed to look for evidence in favor of \(H_1\), that is, evidence against \(H_0\)

\[\text{(Null Hypothesis)} \hspace{5mm} H_0 : \mu = \mu_0 \\ \text{(Alternative Hypothesis)} \hspace{5mm} H_1 : \mu \neq \mu_0 \]
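
As a rough illustration of the two-sided hypotheses above, the sketch below runs a one-sample t-test with base R's `t.test()`. The sample `ages`, the null value `mu0`, and the 0.05 significance level are all made up for this example and are not taken from the slides.

```r
# Hypothetical data: 30 simulated ages (illustration only, not real data)
set.seed(1)
ages <- rnorm(30, mean = 41, sd = 5)

mu0 <- 40  # hypothesized mean under H0 (assumed value)

# Two-sided test of H0: mu = mu0 against H1: mu != mu0
result <- t.test(ages, mu = mu0, alternative = "two.sided")

alpha <- 0.05  # assumed significance level
decision <- if (result$p.value < alpha) "reject H0" else "fail to reject H0"

print(result$statistic)  # t statistic computed from the sample
print(result$p.value)    # two-sided p-value
print(decision)
```

Note that the decision is always worded as "reject" or "fail to reject"; the test never confirms \(H_0\).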


Type I vs Type II Errors

  • A Type I Error, or false positive, occurs when the Null Hypothesis \((H_0)\) is rejected even though it is true
  • A Type II Error, or false negative, occurs when the Null Hypothesis \((H_0)\) is not rejected even though it is false

\[\text{(Type I Error)} \hspace{5mm} \alpha = P(\text{Reject } H_0 | H_0 \text{ is true}) \\ \text{(Type II Error)} \hspace{5mm} \beta = P(\text{Fail to reject } H_0 | H_0 \text{ is false}) \]
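
To make \(\alpha\) and \(\beta\) concrete, here is a minimal sketch for a two-sided z-test. It assumes the test statistic is standard normal under \(H_0\) and normal with mean `delta` and standard deviation 1 under \(H_1\). The value of `delta` (the true shift measured in standard errors) is hypothetical.

```r
alpha <- 0.05                    # chosen Type I error rate (assumed)
z_crit <- qnorm(1 - alpha / 2)   # two-sided critical value, about 1.96

# Type I error: probability of landing in either tail when H0 is true
type1 <- pnorm(-z_crit) + (1 - pnorm(z_crit))

# Hypothetical alternative: the true mean sits 2.5 standard errors from mu0
delta <- 2.5

# Type II error: probability the statistic stays between the critical
# values even though H0 is false
beta <- pnorm(z_crit - delta) - pnorm(-z_crit - delta)
power <- 1 - beta                # probability of correctly rejecting H0

print(c(alpha = type1, beta = beta, power = power))
```

A larger `delta` (a bigger true effect or a larger sample) shrinks `beta`, while lowering `alpha` widens the non-rejection region and raises `beta`.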


R Code

  • Below is the code for the Normal Distribution graph on the next slide
  • The graph runs from -4 to 4, and the shaded areas run from -4 to -1.96 on the left side and from 1.96 to 4 on the right side
  • The shaded areas are filled with darkgray and the distribution line is colored red
library(ggplot2)

# Only the x-range matters here; stat_function() evaluates dnorm across it
df <- data.frame(x = c(-4, 4))

ggplot(df, aes(x = x)) +
  # Normal curve
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1), color = "red") +
  # Left rejection region
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1), xlim = c(-4, -1.96),
                geom = "area", fill = "darkgray", alpha = 0.5) +
  # Right rejection region
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1), xlim = c(1.96, 4),
                geom = "area", fill = "darkgray", alpha = 0.5)


Normal Distribution

  • Below is a standard Normal Distribution curve
  • It shows where a test statistic is likely to fall if \(H_0\) is true
  • The shaded “critical regions” mark where \(H_0\) is rejected. In this graph they cover only the two tails beyond \(\pm 1.96\) (about 5% of the area in total), so a test statistic that lands in the unshaded center leads us to fail to reject \(H_0\).


Plotly plot

  • In this interactive version you can hover over the points on the plot in order to visualize the p-value
  • The vertical line represents the test statistic, the value calculated from the sample. \(H_0\) is rejected if the line falls in a critical region and is not rejected otherwise.
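
The link between the test statistic and the p-value can be sketched in a few lines of R. The observed statistic `z_obs` below is a made-up value, and the calculation assumes a two-sided z-test with a standard normal distribution under \(H_0\).

```r
z_obs <- 2.3   # hypothetical observed test statistic
alpha <- 0.05  # assumed significance level

# Two-sided p-value: area in both tails at least as extreme as z_obs
p_value <- 2 * pnorm(-abs(z_obs))

# Two equivalent decision rules
reject_by_p <- p_value < alpha
reject_by_critical_value <- abs(z_obs) > qnorm(1 - alpha / 2)

print(p_value)
print(reject_by_p)
print(reject_by_critical_value)
```

Both rules always agree: the p-value is below `alpha` exactly when the statistic falls inside a critical region.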


Sampling Distribution

  • Below is a histogram showing a distribution of ages
  • It shows the possible values of a test statistic if \(H_0\) is true
  • The distribution is centered at the null value. Our observed test statistic falls near the center, where most of the values lie, so we fail to reject \(H_0\)
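
One way to build intuition for a sampling distribution is to simulate it. The sketch below is not the code behind the histogram on this slide. It assumes a normal population, and the null mean `mu0`, standard deviation `sigma`, sample size `n`, and `observed_mean` are all hypothetical.

```r
set.seed(42)

mu0 <- 40    # null value for the mean age (assumed)
sigma <- 10  # population standard deviation (assumed)
n <- 25      # sample size (assumed)

# Draw many samples under H0 and record each sample mean
sample_means <- replicate(5000, mean(rnorm(n, mean = mu0, sd = sigma)))

observed_mean <- 41.5  # hypothetical observed sample mean

# Histogram of the simulated sampling distribution, with the observed value
hist(sample_means, breaks = 40, main = "Sampling distribution under H0",
     xlab = "Sample mean age")
abline(v = observed_mean, col = "red", lwd = 2)

# Share of simulated means at least as far from mu0 as the observed one
p_sim <- mean(abs(sample_means - mu0) >= abs(observed_mean - mu0))
print(p_sim)
```

When the observed value sits near the center of the histogram, `p_sim` is large and we fail to reject \(H_0\). A value far out in a tail would give a small `p_sim`.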


Conclusion

  • Hypothesis Testing is a critical part of statistics
  • Interestingly, we never accept the null hypothesis; we either reject it or fail to reject it
  • By understanding these concepts we can judge whether the data provide enough evidence against a hypothesis
  • These slides provide a base knowledge of the main topics and graphs related to hypothesis testing