Hypothesis testing is a statistical method used to decide whether sample data provide sufficient evidence to support a claim about a population.
- Widely used across many different fields.
11/1/2024
Suppose we want to test if the average height in a population is 5.5 feet.
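Stated formally, and assuming the two-sided alternative that R's `t.test()` uses by default, the null and alternative hypotheses for this example can be written as:

\[ H_0: \mu = 5.5 \qquad \text{versus} \qquad H_1: \mu \neq 5.5 \]

Here \(\mu\) denotes the true population mean height in feet.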
The formula for the t-test statistic is:
\[ t = \frac{\bar{X} - \mu}{s / \sqrt{n}} \]
where:
- \(\bar{X}\) = sample mean
- \(\mu\) = hypothesized mean
- \(s\) = sample standard deviation
- \(n\) = sample size
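As a minimal sketch, the formula can be evaluated by hand in R. The block regenerates the same simulated sample used in this document so that it runs on its own; the names `mu0`, `x_bar`, and `t_manual` are illustrative and not part of the original code.

```r
# Recreate the simulated sample so the block is self-contained
set.seed(123)
sample_data <- rnorm(30, mean = 5.4, sd = 0.3)

mu0   <- 5.5                  # hypothesized mean
x_bar <- mean(sample_data)    # sample mean
s     <- sd(sample_data)      # sample standard deviation
n     <- length(sample_data)  # sample size

# t statistic: (sample mean - hypothesized mean) / standard error
t_manual <- (x_bar - mu0) / (s / sqrt(n))
t_manual
```

The value should agree with the `t` statistic that `t.test(sample_data, mu = 5.5)` reports.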
set.seed(123)
sample_data <- rnorm(30, mean = 5.4, sd = 0.3)
print(sample_data)
##  [1] 5.231857 5.330947 5.867612 5.421153 5.438786 5.914519 5.538275 5.020482
##  [9] 5.193944 5.266301 5.767225 5.507944 5.520231 5.433205 5.233248 5.936074
## [17] 5.549355 4.810015 5.610407 5.258163 5.079653 5.334608 5.092199 5.181333
## [25] 5.212488 4.893992 5.651336 5.446012 5.058559 5.776144
We will now use a one-sample t-test to test if the mean of our sample data is significantly different from 5.5.
t_test_result <- t.test(sample_data, mu = 5.5)
t_test_result
## 
##  One Sample t-test
## 
## data:  sample_data
## t = -2.124, df = 29, p-value = 0.04232
## alternative hypothesis: true mean is not equal to 5.5
## 95 percent confidence interval:
##  5.275972 5.495766
## sample estimates:
## mean of x 
##  5.385869
library(ggplot2)

ggplot(data.frame(sample_data), aes(x = sample_data)) +
  geom_histogram(color = "black", fill = "skyblue", bins = 10) +
  labs(title = "Histogram of Sample Data", x = "Height (feet)", y = "Frequency")
In hypothesis testing, the p-value helps us decide whether to reject the Null Hypothesis.
The p-value in our t-test result is 0.042.
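Since it is less than the significance level \(\alpha\) (assuming the conventional \(\alpha = 0.05\)), we reject the Null Hypothesis: the sample mean differs from 5.5 feet by more than chance alone would plausibly explain at that level.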
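The decision rule can be sketched directly in R. This is an illustration under stated assumptions: `alpha <- 0.05` is the conventional level and is not given explicitly in the text, and the variable names are hypothetical. The block rebuilds the sample so it runs independently.

```r
# Recreate the simulated sample so the block is self-contained
set.seed(123)
sample_data <- rnorm(30, mean = 5.4, sd = 0.3)

alpha <- 0.05                 # assumed significance level (not stated in the text)
mu0   <- 5.5                  # hypothesized mean
n     <- length(sample_data)
se    <- sd(sample_data) / sqrt(n)   # standard error of the mean
df    <- n - 1                       # degrees of freedom

t_stat <- (mean(sample_data) - mu0) / se

# Two-sided p-value: probability of a |t| at least this large under the null
p_value <- 2 * pt(-abs(t_stat), df = df)

# Reject the null when the p-value falls below alpha
reject_null <- p_value < alpha

# Matching (1 - alpha) confidence interval for the mean
ci <- mean(sample_data) + c(-1, 1) * qt(1 - alpha / 2, df = df) * se

p_value
reject_null
ci
```

The two views are equivalent for a two-sided test: the null is rejected at level \(\alpha\) exactly when the hypothesized mean falls outside the \(1 - \alpha\) confidence interval.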
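This plot is an illustrative 3D surface built from the standard normal density over a grid of values from -4 to 4; it is a visual aid rather than an empirical distribution of p-values.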
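suppressPackageStartupMessages(library(plotly))

x <- seq(-4, 4, length.out = 100)  # grid of values from -4 to 4
y <- dnorm(x)                      # standard normal density at each grid point
z <- outer(x, y, FUN = "*")        # 100 x 100 matrix with z[i, j] = x[i] * y[j]

plot_ly(x = ~x, y = ~y, z = ~z) %>%
  add_surface() %>%
  layout(title = "3D Visualization of p-Value Distribution")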
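For comparison, the distribution of p-values when the Null Hypothesis is true can be approximated by simulation. The sketch below is not part of the original analysis: the seed, the number of replicates `n_sims`, and the reuse of `sd = 0.3` are assumptions made for illustration. Each replicate draws 30 heights whose true mean really is 5.5 and records the t-test p-value.

```r
set.seed(2024)   # arbitrary seed, chosen only for reproducibility
n_sims <- 2000   # number of simulated samples (an assumption)

# Each replicate samples from a population where the null is true (mean = 5.5)
p_values <- replicate(
  n_sims,
  t.test(rnorm(30, mean = 5.5, sd = 0.3), mu = 5.5)$p.value
)

# Under the null, p-values are uniform on [0, 1], so the quartiles
# should land near 0.25, 0.50 and 0.75
quantile(p_values, probs = c(0.25, 0.50, 0.75))

# Share of replicates that would be (wrongly) rejected at alpha = 0.05
false_positive_rate <- mean(p_values < 0.05)
false_positive_rate
```

If the simulation behaves as expected, roughly 5% of the p-values fall below 0.05, which is what a significance level of \(\alpha = 0.05\) means: the long-run rate of rejecting a true Null Hypothesis.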