Basics of Hypothesis Testing

11/1/2024

Slide 1: What is Hypothesis Testing?

Hypothesis testing is a statistical method for deciding whether sample data provide enough evidence to support a claim about a population.

It is widely used across many fields.

Slide 2: Key Terms in Hypothesis Testing

Null Hypothesis (\(H_0\)): The default claim that there is no effect or no difference.
Alternative Hypothesis (\(H_1\)): The competing claim that there is some effect or difference.
Significance Level (\(\alpha\)): The probability threshold for rejecting \(H_0\), often set at 0.05.
p-value: The probability of observing data at least as extreme as the data actually observed, assuming \(H_0\) is true.
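
To make the role of \(\alpha\) concrete, here is a small simulation sketch. It assumes normally distributed data for which \(H_0\) is true, so roughly a fraction \(\alpha\) of the tests should reject. The seed, sample size, number of replicates, and variable names are illustrative choices, not part of the original slides.

```r
# Illustrative simulation (all settings are assumptions): draw many samples
# for which H0 (mu = 5.5) is true and record each one-sample t-test p-value.
set.seed(1)
alpha <- 0.05
n_reps <- 5000
p_values <- replicate(n_reps, {
  x <- rnorm(30, mean = 5.5, sd = 0.3)  # H0 holds by construction
  t.test(x, mu = 5.5)$p.value
})

# Fraction of tests that reject H0 even though it is true;
# this should land close to alpha.
false_positive_rate <- mean(p_values < alpha)
print(false_positive_rate)
```

In other words, \(\alpha\) is the rate of false rejections we agree to tolerate when \(H_0\) is actually true.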

Slide 3: Hypothesis Testing Steps

State \(H_0\) and \(H_1\).
Choose \(\alpha\) (commonly 0.05).
Collect data and calculate the test statistic.
Compute the p-value.
Decide to reject or fail to reject \(H_0\) based on the p-value.
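
The five steps above can be sketched in a few lines of R. The measurements below are made up for illustration, and `mu0`, `alpha`, and `decision` are hypothetical names; `t.test()` handles the test statistic and the p-value.

```r
# Hypothetical measurements, invented for this sketch
x <- c(5.2, 5.6, 5.4, 5.1, 5.5, 5.3, 5.7, 5.0)

# Step 1: state H0: mu = 5.5 versus H1: mu != 5.5
mu0 <- 5.5

# Step 2: choose the significance level
alpha <- 0.05

# Steps 3 and 4: t.test() computes the test statistic and the p-value
result <- t.test(x, mu = mu0)
p_value <- result$p.value

# Step 5: compare the p-value with alpha
decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"
print(decision)
```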

Slide 4: Example - One-Sample t-Test (Math)

Suppose we want to test if the average height in a population is 5.5 feet.

Null Hypothesis (\(H_0\)): \(\mu = 5.5\)
Alternative Hypothesis (\(H_1\)): \(\mu \neq 5.5\)
We’ll use a sample to perform a t-test.

The formula for the t-test statistic is:

\[ t = \frac{\bar{X} - \mu}{s / \sqrt{n}} \]

where:

\(\bar{X}\) = sample mean
\(\mu\) = hypothesized mean
\(s\) = sample standard deviation
\(n\) = sample size
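
As a rough check on the formula, the statistic can be computed by hand and compared with what `t.test()` reports. The helper `t_statistic()` and the data vector are hypothetical, introduced only for this sketch.

```r
# Hypothetical helper implementing t = (x_bar - mu) / (s / sqrt(n))
t_statistic <- function(x, mu) {
  x_bar <- mean(x)    # sample mean
  s <- sd(x)          # sample standard deviation (n - 1 denominator)
  n <- length(x)      # sample size
  (x_bar - mu) / (s / sqrt(n))
}

# Made-up measurements for illustration
x <- c(5.2, 5.6, 5.4, 5.1, 5.5, 5.3, 5.7, 5.0)

t_by_hand <- t_statistic(x, mu = 5.5)
t_builtin <- unname(t.test(x, mu = 5.5)$statistic)

# The two values should agree up to floating-point error
print(c(by_hand = t_by_hand, builtin = t_builtin))
```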

Slide 5: Generating a Sample Dataset

set.seed(123)
sample_data <- rnorm(30, mean = 5.4, sd = 0.3)
print(sample_data)

##  [1] 5.231857 5.330947 5.867612 5.421153 5.438786 5.914519 5.538275 5.020482
##  [9] 5.193944 5.266301 5.767225 5.507944 5.520231 5.433205 5.233248 5.936074
## [17] 5.549355 4.810015 5.610407 5.258163 5.079653 5.334608 5.092199 5.181333
## [25] 5.212488 4.893992 5.651336 5.446012 5.058559 5.776144

Slide 6: Conducting the One-Sample t-Test

We will now use a one-sample t-test to test if the mean of our sample data is significantly different from 5.5.

t_test_result <- t.test(sample_data, mu = 5.5)
t_test_result

## 
##  One Sample t-test
## 
## data:  sample_data
## t = -2.124, df = 29, p-value = 0.04232
## alternative hypothesis: true mean is not equal to 5.5
## 95 percent confidence interval:
##  5.275972 5.495766
## sample estimates:
## mean of x 
##  5.385869
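
The numbers in this output can be reproduced from their definitions. The sketch below regenerates the sample with the same `set.seed()` and `rnorm()` calls as the slides and assumes a two-sided test at the 95% confidence level; the names `se`, `t_stat`, `p_value`, and `ci` are illustrative.

```r
# Regenerate the sample exactly as in the slides
set.seed(123)
sample_data <- rnorm(30, mean = 5.4, sd = 0.3)

n <- length(sample_data)
df <- n - 1                                # degrees of freedom
se <- sd(sample_data) / sqrt(n)            # standard error of the mean

# Test statistic for H0: mu = 5.5
t_stat <- (mean(sample_data) - 5.5) / se

# Two-sided p-value: probability of a t at least this extreme under H0
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval for the mean
ci <- mean(sample_data) + c(-1, 1) * qt(0.975, df) * se

print(c(t = t_stat, p = p_value))
print(ci)
```

Note that the confidence interval excludes 5.5, which agrees with the p-value falling below 0.05.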

Slide 7: Visualizing the Sample Data with ggplot

library(ggplot2)
ggplot(data.frame(sample_data), aes(x = sample_data)) +
  geom_histogram(color = "black", fill = "skyblue", bins = 10) +
  labs(title = "Histogram of Sample Data", x = "Height (feet)", y = "Frequency")

Slide 8: p-Value Interpretation (Math)

In hypothesis testing, the p-value helps us decide whether to reject the Null Hypothesis.

If the p-value \(< \alpha\), reject the null hypothesis.
Otherwise, fail to reject the null hypothesis.

The p-value in our t-test result is 0.042.

Since 0.042 is less than \(\alpha = 0.05\), we reject the null hypothesis: the sample mean differs significantly from 5.5 feet at the 5% level.
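
The decision rule is simple enough to write as a one-line function. The helper `decide()` is hypothetical; the p-value is the one reported by `t.test()` earlier in the slides.

```r
# Hypothetical helper encoding the rule: reject H0 when p-value < alpha
decide <- function(p_value, alpha) {
  if (p_value < alpha) "reject H0" else "fail to reject H0"
}

p_value <- 0.04232  # p-value from the t.test() output in the slides

print(decide(p_value, alpha = 0.05))
print(decide(p_value, alpha = 0.01))
```

With a stricter threshold such as \(\alpha = 0.01\), the same p-value would no longer lead to rejection, so the conclusion depends on the significance level chosen before looking at the data.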

Slide 9: 3D Visualization of p-Value Distribution

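This plot shows the two-sided p-value as a function of the t-statistic and the degrees of freedom.

suppressPackageStartupMessages(library(plotly))
# Grid of t-statistics (x axis) and degrees of freedom (y axis)
t_vals <- seq(-4, 4, length.out = 100)
df_vals <- seq(2, 50, length.out = 100)
# Two-sided p-value for each pair; rows index df, columns index t
p_vals <- outer(df_vals, t_vals, function(df, t) 2 * pt(-abs(t), df = df))

plot_ly(x = ~t_vals, y = ~df_vals, z = ~p_vals) %>%
  add_surface() %>%
  layout(title = "3D Visualization of p-Value Distribution",
         scene = list(xaxis = list(title = "t-statistic"),
                      yaxis = list(title = "Degrees of freedom"),
                      zaxis = list(title = "p-value")))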