01/29/2024

Introduction to Hypothesis Testing

  • Hypothesis Testing: A Fundamental Concept in Statistics
  • Making Inferences About Populations Based on Sample Data
  • Exploring the Key Concepts and Procedures

Key Concepts in Hypothesis Testing

  • Significance level (α)
  • Test statistics
  • P-value interpretation

Null and Alternative Hypotheses

  • Formulation of Null and Alternative Hypotheses
    • Null Hypothesis (H0): Represents the default assumption. It states that there is no difference or effect in the population.
    • Alternative Hypothesis (Ha): Represents what we want to find evidence for. It states that there is a difference or effect in the population.

One-tailed vs. Two-tailed Tests

library(ggplot2)
library(patchwork)

# Create data for illustration
x <- seq(-3, 3, length.out = 1000)
y <- dnorm(x)

# Data for shaded areas
one_tail <- subset(data.frame(x, y), x >= 1.645) 
two_tail <- subset(data.frame(x, y), abs(x) >= 1.96)  

# Plot for one-tailed test
one_tailed_plot <- ggplot() +
  geom_line(data = data.frame(x, y), aes(x, y), color = "blue") +
  geom_area(data = one_tail, aes(x, y), fill = "skyblue") +
  annotate("text", x = 2, y = 0.15, label = "Rejection Region", 
           size = 2, color = "blue") +
  labs(title = "One-Tailed Test", x = "Z-score", y = "Density") +
  theme_minimal() 

# Plot for two-tailed test
two_tailed_plot <- ggplot() +
  geom_line(data = data.frame(x, y), aes(x, y), color = "violetred3") +
  # Group by tail so the fill does not bridge the gap between the two regions
  geom_area(data = two_tail, aes(x, y, group = x > 0), fill = "violetred1") +
  annotate("text", x = -2.5, y = 0.15, label = "Rejection Regions", 
           size = 2, color = "violetred3") +
  labs(title = "Two-Tailed Test", x = "Z-score", y = "Density") +
  theme_minimal() 

# Combine plots side by side
combined_plots <- one_tailed_plot + two_tailed_plot

# Print the combined plots
print(combined_plots)
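
The cutoffs 1.645 and 1.96 used for the shaded regions are the standard normal critical values for α = 0.05. A minimal sketch, assuming α = 0.05 (a value the slide does not state), recovers them with `qnorm()`:

```r
# Assumed significance level (not stated in the slides)
alpha <- 0.05

# One-tailed (upper) test: all of alpha sits in the right tail
z_crit_one <- qnorm(1 - alpha)

# Two-tailed test: alpha is split evenly between both tails
z_crit_two <- qnorm(1 - alpha / 2)

# Rounds to 1.645 (one-tailed) and 1.96 (two-tailed)
round(c(one_tailed = z_crit_one, two_tailed = z_crit_two), 3)
```

The two-tailed cutoff is larger because each tail holds only α/2 of the probability.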

Example Hypothesis Testing Scenario

library(ggplot2)
# Simulated data for weight loss (hypothetical)
weight_loss <- c(2.1, 1.8, 2.5, 1.5, 1.9, 2.2, 2.0, 1.7, 2.3, 1.6)
# Plotting the histogram of weight loss
ggplot(data.frame(weight_loss), aes(x = weight_loss)) +
  geom_histogram(binwidth = 0.2, fill = "palevioletred4", color = "black",
                 alpha = 0.8) +
  labs(title = "Histogram of Weight Loss",
       x = "Weight Loss (lbs)",
       y = "Frequency") +
  theme_minimal()

Calculation and Interpretation

  • Test statistic: Measures how far the sample data deviate from what the null hypothesis predicts.
  • p-value: Probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one obtained.
  • Decision: Compare p-value to significance level (α).
    • If p-value < α, reject null hypothesis.
    • Otherwise, fail to reject it.
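
As a sketch of these steps, the weight-loss sample from the example scenario can be tested with base R's `t.test()`. The hypotheses (H0: mean loss = 0, Ha: mean loss > 0) and α = 0.05 are assumptions made for illustration; the slides do not state them.

```r
# Hypothetical weight-loss data from the example scenario (lbs)
weight_loss <- c(2.1, 1.8, 2.5, 1.5, 1.9, 2.2, 2.0, 1.7, 2.3, 1.6)

alpha <- 0.05  # assumed significance level

# One-sample, one-tailed t-test of H0: mu = 0 against Ha: mu > 0 (assumed)
result <- t.test(weight_loss, mu = 0, alternative = "greater")

result$statistic  # test statistic (t)
result$p.value    # p-value

# Decision rule: reject H0 only when the p-value falls below alpha
decision <- if (result$p.value < alpha) "reject H0" else "fail to reject H0"
decision
```

With this sample the p-value is far below 0.05, so the decision is to reject H0 under the assumed hypotheses.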

Visualization in Hypothesis Testing (ggplot)

  • Visualize test statistic distribution.
library(ggplot2)

# Simulated data for test statistic distribution
# (seed fixed so the histogram is reproducible)
set.seed(123)
test_statistic <- rnorm(1000)

# Create a histogram to visualize the distribution
ggex1 <- ggplot(data.frame(test_statistic), aes(x = test_statistic)) +
  geom_histogram(binwidth = 0.5, fill = "pink2", color = "black", alpha = 0.8) +
  labs(title = "Distribution of Test Statistic",
       x = "Test Statistic",
       y = "Frequency") + 
  theme_minimal()

ggex1
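
One way to read this distribution: if the test statistic is standard normal under H0, roughly a fraction α of its values land in the rejection region by chance alone. A minimal sketch, assuming a two-tailed test at α = 0.05 and a larger simulated sample than the histogram uses so that the estimate is stable:

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Simulate test statistics under H0 (standard normal, as in the histogram)
sim_stats <- rnorm(100000)

# Two-tailed critical value for an assumed alpha of 0.05
alpha <- 0.05
z_crit <- qnorm(1 - alpha / 2)

# Share of simulated statistics that fall in the rejection region
rejection_rate <- mean(abs(sim_stats) >= z_crit)
rejection_rate  # should be close to alpha
```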

Visualization in Hypothesis Testing (plotly) - 3D Scatter Plot

  • Interactive Visualization of p-value and Significance Level

Conclusion

  • Hypothesis testing is essential for making inferences about populations based on sample data.
  • Formulating clear null and alternative hypotheses guides the testing procedure.
  • Significance level (α) and p-values inform decision-making in hypothesis testing.
  • Test statistics help in deciding whether to reject or fail to reject the null hypothesis.
  • Hypothesis testing contributes to evidence-based decision-making, scientific research, quality control, policy-making, and risk assessment.

Mathematical Formulation (1): Test Statistic

  • Understanding the Formula for the Test Statistic

In hypothesis testing, the test statistic measures evidence against the null hypothesis. For a test of a single mean, it’s calculated as the difference between the sample mean and the population mean hypothesized under H0, divided by the standard error of the sample mean.

\[ \text{Test Statistic} = \frac{\text{Sample Mean} - \text{Hypothesized Population Mean}}{\text{Standard Error}} \]
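
The formula can be checked by hand. This sketch reuses the hypothetical weight-loss sample and assumes a null value of 0 for the population mean; because the standard error is estimated from the sample, the result is a t statistic.

```r
# Hypothetical weight-loss data from the example scenario (lbs)
weight_loss <- c(2.1, 1.8, 2.5, 1.5, 1.9, 2.2, 2.0, 1.7, 2.3, 1.6)

mu_0 <- 0  # assumed population mean under H0

# Numerator and denominator of the test statistic
sample_mean <- mean(weight_loss)
standard_error <- sd(weight_loss) / sqrt(length(weight_loss))

test_statistic <- (sample_mean - mu_0) / standard_error
test_statistic

# Cross-check against the statistic reported by t.test()
all.equal(test_statistic, unname(t.test(weight_loss, mu = mu_0)$statistic))
```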

Mathematical Formulation (2): P-Value Calculation

  • Understanding the Formula for P-Value Calculation

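In hypothesis testing, the p-value quantifies the evidence against the null hypothesis: it is the probability, assuming H0 is true, of a test statistic at least as extreme as the one observed. Its formula depends on the specific test.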

\[ \text{P-Value} = \text{Dependent on the Test Being Performed} \]
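
For instance, when the test statistic is a z-score, the p-value is a tail area of the standard normal distribution. A minimal sketch, using an observed z of 1.96 chosen purely for illustration:

```r
z_obs <- 1.96  # illustrative observed z-score (assumed)

# Upper-tailed test: area to the right of z_obs
p_one_tailed <- pnorm(z_obs, lower.tail = FALSE)

# Two-tailed test: area beyond |z_obs| in both tails
p_two_tailed <- 2 * pnorm(abs(z_obs), lower.tail = FALSE)

# Rounds to 0.025 (one-tailed) and 0.05 (two-tailed)
round(c(one_tailed = p_one_tailed, two_tailed = p_two_tailed), 3)
```

For a t statistic, `pt()` with the appropriate degrees of freedom plays the same role as `pnorm()` here.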

R Code Example: Creating a ggplot

  • Showcase R code for creating a ggplot visualization
# Load required packages
library(ggplot2)

# Simulated data for test statistic distribution
# (seed fixed so the histogram is reproducible)
set.seed(123)
test_statistic <- rnorm(1000)

# Create a histogram to visualize the distribution
ggex2 <- ggplot(data.frame(test_statistic), aes(x = test_statistic)) +
  geom_histogram(binwidth = 0.5, fill = "lightpink", color = "black", alpha = 0.8) +
  labs(title = "Distribution of Test Statistic",
       x = "Test Statistic",
       y = "Frequency") +
  theme_minimal()

ggex2