Hypothesis Testing

01/29/2024

Introduction to Hypothesis Testing

Hypothesis Testing: A Fundamental Concept in Statistics
Making Inferences About Populations Based on Sample Data
Exploring the Key Concepts and Procedures

Key Concepts in Hypothesis Testing

Significance level (α)
Test statistics
P-value interpretation

Null and Alternative Hypotheses

Formulation of Null and Alternative Hypotheses
- Null Hypothesis (H0): Represents the default assumption. It states that there is no difference or effect in the population.
- Alternative Hypothesis (Ha): Represents what we want to find evidence for. It states that there is a difference or effect in the population.

One-tailed vs. Two-tailed Tests

library(ggplot2)
library(patchwork)

# Create data for illustration
x <- seq(-3, 3, length.out = 1000)
y <- dnorm(x)

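# Data for shaded areas: standard normal critical values at alpha = 0.05
one_tail <- subset(data.frame(x, y), x >= 1.645)      # upper-tailed test
two_tail <- subset(data.frame(x, y), abs(x) >= 1.96)  # two-tailed test (alpha / 2 per tail)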

# Plot for one-tailed test
one_tailed_plot <- ggplot() +
  geom_line(data = data.frame(x, y), aes(x, y), color = "blue") +
  geom_area(data = one_tail, aes(x, y), fill = "skyblue") +
  annotate("text", x = 2, y = 0.15, label = "Rejection Region", 
           size = 2, color = "blue") +
  labs(title = "One-Tailed Test", x = "Z-score", y = "Density") +
  theme_minimal()

# Plot for two-tailed test
two_tailed_plot <- ggplot() +
  geom_line(data = data.frame(x, y), aes(x, y), color = "violetred3") +
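  # Group by tail so the two shaded regions are not joined across the middle
  geom_ribbon(data = transform(two_tail, tail = x < 0),
              aes(x, ymin = 0, ymax = y, group = tail), fill = "violetred1") +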
  annotate("text", x = -2.5, y = 0.15, label = "Rejection Regions", 
           size = 2, color = "violetred3") +
  labs(title = "Two-Tailed Test", x = "Z-score", y = "Density") +
  theme_minimal() 

# Combine plots side by side
combined_plots <- one_tailed_plot + two_tailed_plot

# Print the combined plots
print(combined_plots)
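
The cutoffs 1.645 and 1.96 that bound the shaded regions above are standard normal critical values at α = 0.05. As a minimal sketch (the names `alpha`, `z_one_tailed`, and `z_two_tailed` are illustrative and not from the slides), they can be recovered with base R's `qnorm()`:

```r
# Significance level assumed for illustration
alpha <- 0.05

# One-tailed test: all of alpha sits in a single tail
z_one_tailed <- qnorm(1 - alpha)

# Two-tailed test: alpha is split evenly between the two tails
z_two_tailed <- qnorm(1 - alpha / 2)

round(c(one_tailed = z_one_tailed, two_tailed = z_two_tailed), 3)
```

Changing `alpha` shows how a stricter significance level pushes the rejection region further into the tails.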

Example Hypothesis Testing Scenario

library(ggplot2)
# Simulated data for weight loss (hypothetical)
weight_loss <- c(2.1, 1.8, 2.5, 1.5, 1.9, 2.2, 2.0, 1.7, 2.3, 1.6)
# Plotting the histogram of weight loss
ggplot(data.frame(weight_loss), aes(x = weight_loss)) +
  geom_histogram(binwidth = 0.2, fill = "palevioletred4", color = "black",
                 alpha = 0.8) +
  labs(title = "Histogram of Weight Loss",
       x = "Weight Loss (lbs)",
       y = "Frequency") +
  theme_minimal()

Calculation and Interpretation

Test Statistic: Measures how far the sample data deviate from what the null hypothesis predicts.
p-value: Probability of observing data at least as extreme as the sample, assuming the null hypothesis is true.
Decision: Compare p-value to significance level (α).
- If p-value < α, reject null hypothesis.
- Otherwise, fail to reject it.
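
The decision rule above can be sketched with the weight-loss data from the example scenario. The slides do not state the hypotheses or the significance level, so this sketch assumes H0: mean weight loss = 0 against Ha: mean weight loss > 0, with α = 0.05, and uses base R's `t.test()`:

```r
# Weight-loss data from the example scenario (hypothetical values)
weight_loss <- c(2.1, 1.8, 2.5, 1.5, 1.9, 2.2, 2.0, 1.7, 2.3, 1.6)

alpha <- 0.05  # assumed significance level

# Assumed hypotheses: H0: mean = 0 versus Ha: mean > 0 (one-tailed)
result <- t.test(weight_loss, mu = 0, alternative = "greater")

# Apply the decision rule: reject H0 only when the p-value is below alpha
decision <- if (result$p.value < alpha) "reject H0" else "fail to reject H0"

cat("t =", round(result$statistic, 2),
    " p-value =", signif(result$p.value, 3), "\n")
cat("Decision:", decision, "\n")
```

A different null value or a two-tailed alternative would change the p-value, so the conclusion holds only under the assumptions stated here.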

Visualization in Hypothesis Testing (ggplot)

Visualize test statistic distribution.

library(ggplot2)

# Simulated data for test statistic distribution
set.seed(123)  # fix the seed so the histogram is reproducible
test_statistic <- rnorm(1000)

# Create a histogram to visualize the distribution
ggex1 <- ggplot(data.frame(test_statistic), aes(x = test_statistic)) +
  geom_histogram(binwidth = 0.5, fill = "pink2", color = "black", alpha = 0.8) +
  labs(title = "Distribution of Test Statistic",
       x = "Test Statistic",
       y = "Frequency") +
  theme_minimal()

ggex1
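
The histogram above draws values directly from a standard normal. A complementary sketch, under assumptions that are not in the slides (samples of size 30 from a normal population with known standard deviation 1, and a true null hypothesis of mean 0), builds the same distribution from repeated samples and checks that about 5% of the z statistics fall in the two-tailed rejection region:

```r
set.seed(42)   # reproducible simulation
n <- 30        # assumed sample size
n_sims <- 5000 # number of simulated samples

# z statistic for each simulated sample, drawn with H0 true (mu = 0, sigma = 1)
z_stats <- replicate(n_sims, {
  sample_data <- rnorm(n, mean = 0, sd = 1)
  mean(sample_data) / (1 / sqrt(n))
})

# Share of statistics beyond the two-tailed critical value at alpha = 0.05
critical_value <- qnorm(0.975)
rejection_rate <- mean(abs(z_stats) >= critical_value)
rejection_rate
```

The rejection rate estimates the Type I error rate, which should sit close to the chosen significance level when the null hypothesis is true.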

Visualization in Hypothesis Testing (plotly) - 3D Scatter Plot

Interactive Visualization of p-value and Significance Level

Conclusion

Hypothesis testing is essential for making inferences about populations based on sample data.
Formulating clear null and alternative hypotheses guides hypothesis testing.
Significance level (α) and p-values inform decision-making in hypothesis testing.
Test statistics help in deciding whether to reject or fail to reject the null hypothesis.
Hypothesis testing contributes to evidence-based decision-making, scientific research, quality control, policy-making, and risk assessment.

Mathematical Formulation (1): Test Statistic

Understanding the Formula for the Test Statistic

In hypothesis testing, the test statistic measures evidence against the null hypothesis. For a test about a mean, it is calculated as the difference between the sample mean and the population mean hypothesized under H0, divided by the standard error of the sample mean.

\[ \text{Test Statistic} = \frac{\text{Sample Mean} - \text{Hypothesized Population Mean}}{\text{Standard Error}} \]
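
The ratio above can be computed by hand and cross-checked against `t.test()`. This is a sketch using the weight-loss values from the example scenario; the null value `mu_0 = 0` is an assumption made for illustration, and the standard error is estimated as the sample standard deviation divided by the square root of the sample size:

```r
# Weight-loss data from the example scenario (hypothetical values)
weight_loss <- c(2.1, 1.8, 2.5, 1.5, 1.9, 2.2, 2.0, 1.7, 2.3, 1.6)

mu_0 <- 0  # population mean under H0 (assumed for illustration)

sample_mean <- mean(weight_loss)
standard_error <- sd(weight_loss) / sqrt(length(weight_loss))

# Test statistic: (sample mean - hypothesized mean) / standard error
test_stat <- (sample_mean - mu_0) / standard_error

# Cross-check against the statistic reported by t.test()
t_builtin <- unname(t.test(weight_loss, mu = mu_0)$statistic)
all.equal(test_stat, t_builtin)
```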

Mathematical Formulation (2): P-Value Calculation

Understanding the Formula for P-Value Calculation

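In hypothesis testing, the p-value assesses evidence against the null hypothesis: it is the probability, computed assuming H0 is true, of a test statistic at least as extreme as the one observed. Its exact formula depends on the specific test and on the direction of the alternative. For a two-tailed test with test statistic T and observed value t_obs:

\[ \text{P-Value} = P\left( |T| \ge |t_{\text{obs}}| \mid H_0 \right) \]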
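
As one concrete case, the following sketch assumes a z-test whose statistic is standard normal under H0. The helper `p_value_z()` is hypothetical (it is not from the slides or from any package) and wraps base R's `pnorm()` for the three possible alternatives:

```r
# Hypothetical helper: p-value for a z statistic under a standard normal null
p_value_z <- function(z, alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  switch(alternative,
    two.sided = 2 * pnorm(-abs(z)),             # both tails
    greater   = pnorm(z, lower.tail = FALSE),   # upper tail only
    less      = pnorm(z)                        # lower tail only
  )
}

p_value_z(1.96)                # two-tailed
p_value_z(1.645, "greater")    # upper-tailed
```

Other tests follow the same pattern with a different reference distribution, for example `pt()` for a t statistic.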

R Code Example: Creating a ggplot

R code for creating a ggplot visualization of a simulated test statistic distribution:

# Load required packages
library(ggplot2)

# Simulated data for test statistic distribution
set.seed(123)  # fix the seed so the histogram is reproducible
test_statistic <- rnorm(1000)

# Create a histogram to visualize the distribution
ggex2 <- ggplot(data.frame(test_statistic), aes(x = test_statistic)) +
  geom_histogram(binwidth = 0.5, fill = "lightpink", color = "black", alpha = 0.8) +
  labs(title = "Distribution of Test Statistic",
       x = "Test Statistic",
       y = "Frequency") +
  theme_minimal()

ggex2