Recap of Parametric Hypothesis Tests & Why Non-Parametric Tests?

1. Introduction

Statistical hypothesis testing is a fundamental concept in data analysis that helps us make informed decisions based on sample data. Traditionally, hypothesis tests are classified into two broad categories:

  1. Parametric Tests: Assume a specific distribution for the data (e.g., normality).
  2. Non-Parametric Tests: Do not assume any particular distribution.

In this document, we will:

  • Recap key parametric tests used in hypothesis testing.
  • Understand their limitations and when they may not be appropriate.
  • Learn why non-parametric tests are valuable and when to use them.


2. Recap of Parametric Hypothesis Tests

Definition

A parametric test is a statistical test that assumes the data follow a specific probability distribution (e.g., the normal distribution). These tests estimate parameters (such as the mean or variance) and rely on assumptions about the population.

Examples of Parametric Tests

Here are some commonly used parametric tests:

| Test | Purpose | Example Use Case |
|---|---|---|
| Z-Test | Tests whether a sample mean differs from a known population mean when the population variance is known. | Checking if the average IQ of students in a school is 100. |
| t-Test | Compares means between one or two samples when the population variance is unknown. | Comparing exam scores between students who took online vs. offline classes. |
| ANOVA (F-Test) | Tests for differences among more than two group means. | Comparing salaries across different industries. |
| Pearson Correlation | Measures linear association between two variables. | Checking the relationship between height and weight. |
| Linear Regression | Models the relationship between one or more predictors and an outcome. | Predicting house prices based on size, location, etc. |
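
To make two rows of this table concrete, here is a minimal sketch using R's built-in mtcars dataset (the choice of wt and mpg as variables is ours, purely for illustration):

# Pearson correlation between car weight (wt) and fuel efficiency (mpg)
cor.test(mtcars$wt, mtcars$mpg)

# Simple linear regression: modeling mpg as a function of weight
summary(lm(mpg ~ wt, data = mtcars))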

Mathematical Formulation of a Parametric Test

Consider a one-sample t-test, which tests if the mean of a sample differs from a hypothesized mean \(\mu_0\):

  • Null Hypothesis (\(H_0\)): \(\mu = \mu_0\)
  • Alternative Hypothesis (\(H_A\)): \(\mu \neq \mu_0\)

The test statistic is:

\[ t = \frac{\bar{X} - \mu_0}{\frac{S}{\sqrt{n}}} \]

where:

  • \(\bar{X}\) = sample mean
  • \(S\) = sample standard deviation
  • \(n\) = sample size

Under \(H_0\), the test statistic follows a t-distribution with \(n-1\) degrees of freedom.
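
As a quick sanity check, this formula can be computed by hand in R and compared with the built-in t.test (the sample values below are made up for illustration):

# Hypothetical sample and hypothesized mean
x <- c(4.2, 5.1, 6.3, 4.8, 5.9, 5.5, 4.4, 6.0)
mu0 <- 5

# t statistic computed directly from the formula above
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))
t_manual

# Should match the statistic reported by t.test
t.test(x, mu = mu0)$statistic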


3. Limitations of Parametric Tests

Despite their usefulness, parametric tests come with strict assumptions:

  1. Normality Assumption: The data must follow a normal distribution.
  2. Equal Variance (Homogeneity of Variance): Variability across groups should be equal.
  3. Independent Observations: Each observation should be independent of others.
  4. Interval or Ratio Data: Parametric tests require numerical data (not ordinal).

What Happens When These Assumptions Are Violated?

  • If data are skewed or contain outliers, parametric tests may give misleading results.
  • If variances are unequal, tests like ANOVA may become invalid.
  • If the sample size is small, the Central Limit Theorem (CLT) cannot be relied on, so the normality assumption may not hold even approximately.

This is where non-parametric tests come to the rescue!


4. Why Non-Parametric Tests?

Definition

A non-parametric test does not assume a specific distribution for the data. Instead, it relies on ranks or medians, making it robust to non-normality, small samples, and outliers.

Advantages of Non-Parametric Tests

  • No Normality Assumption: Works well with skewed data or ordinal data.
  • Handles Outliers: Because they are based on ranks, extreme values do not distort results (see the sketch below).
  • Small-Sample Friendly: Does not require large sample sizes to be reliable.
  • Works with Ordinal Data: Useful for surveys, customer ratings, or Likert scales (1-5).
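
A minimal sketch of the outlier-robustness point, using simulated data (the outlier value 500 and all distribution parameters are arbitrary choices):

# Two independent groups; then replace one value in the second group
# with an extreme outlier
set.seed(1)
a <- rnorm(15, mean = 50, sd = 5)
b <- rnorm(15, mean = 55, sd = 5)
b_out <- c(b[-15], 500)

# The t-test p-value shifts noticeably: the outlier inflates the mean and variance
t.test(a, b)$p.value
t.test(a, b_out)$p.value

# The rank-based test barely changes: 500 merely occupies the largest rank
wilcox.test(a, b)$p.value
wilcox.test(a, b_out)$p.value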

Examples of Non-Parametric Tests

| Non-Parametric Test | Alternative To | Purpose |
|---|---|---|
| Wilcoxon Signed-Rank | One-sample t-test | Tests whether the median of a single sample differs from a known value. |
| Mann-Whitney U Test | Independent t-test | Compares two independent groups when normality is violated. |
| Kruskal-Wallis Test | One-way ANOVA | Compares three or more groups when normality is violated. |
| Spearman’s Rank Correlation | Pearson Correlation | Measures monotonic relationships (not necessarily linear). |
| Friedman Test | Repeated Measures ANOVA | Compares multiple paired samples when normality is violated. |

5. Checking the Normality Assumption in R

The examples below use three of R's built-in datasets (iris, mtcars, and faithful) to illustrate how the normality assumption can be checked with histograms, Q-Q plots, and the Shapiro-Wilk test.

# Chunk options and required packages
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
library(ggplot2)
library(dplyr)
library(knitr)
library(car)
library(ggpubr)
# Load the dataset
data("iris")

# Display the first few rows
kable(head(iris), caption = "Sample of Iris Dataset")
Sample of Iris Dataset

| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
# Load dataset
data("mtcars")

# Convert 'am' to a factor for readability
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))

# Display first few rows
kable(head(mtcars), caption = "First Six Rows of mtcars Dataset")
First Six Rows of mtcars Dataset

| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | Manual | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | Manual | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | Manual | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | Automatic | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | Automatic | 3 | 2 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.460 | 20.22 | 1 | Automatic | 3 | 1 |
# Load the dataset
data("faithful")

# Display the first few rows
head(faithful)
##   eruptions waiting
## 1     3.600      79
## 2     1.800      54
## 3     3.333      74
## 4     2.283      62
## 5     4.533      85
## 6     2.883      55
ggplot(iris, aes(x = Petal.Length, fill = Species)) +
  geom_histogram(bins = 15, alpha = 0.6, position = "identity") +
  facet_wrap(~Species) +
  labs(title = "Histogram of Petal Length by Species", x = "Petal Length", y = "Count") +
  theme_minimal()

ggplot(mtcars, aes(x = mpg, fill = am)) +
  geom_histogram(bins = 10, alpha = 0.6, position = "identity") +
  facet_wrap(~am) +
  labs(title = "Histogram of MPG for Automatic vs. Manual Cars",
       x = "Miles Per Gallon (MPG)", y = "Count") +
  theme_minimal()

Automatic cars show a nearly normal distribution, while manual cars exhibit right skewness, which may violate the normality assumption.

# Histogram of waiting times
ggplot(faithful, aes(x = waiting)) +
  geom_histogram(binwidth = 5, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Waiting Times Between Eruptions",
       x = "Waiting Time (minutes)",
       y = "Frequency") +
  theme_minimal()

ggqqplot(iris, x = "Petal.Length", facet.by = "Species", color = "Species") +
  labs(title = "Q-Q Plot of Petal Length by Species")

ggqqplot(mtcars, x = "mpg", facet.by = "am", color = "am") +
  labs(title = "Q-Q Plot of MPG by Transmission Type")

# Q-Q plot
ggplot(faithful, aes(sample = waiting)) +
  stat_qq() +
  stat_qq_line() +
  labs(title = "Q-Q Plot of Waiting Times",
       x = "Theoretical Quantiles",
       y = "Sample Quantiles") +
  theme_minimal()

# Apply Shapiro-Wilk test separately for each species
shapiro_results <- iris %>%
  group_by(Species) %>%
  summarise(p_value = shapiro.test(Petal.Length)$p.value)

print(shapiro_results)  # note: print() ignores a caption argument; kable() would produce a captioned table
## # A tibble: 3 × 2
##   Species    p_value
##   <fct>        <dbl>
## 1 setosa      0.0548
## 2 versicolor  0.158 
## 3 virginica   0.110

Is the Normality Assumption Valid Based on the Shapiro-Wilk Test Results?

The Shapiro-Wilk test checks whether data follow a normal distribution. The null hypothesis (H0) assumes normality, and we reject H0 if the p-value is less than 0.05.

Interpreting the Shapiro-Wilk Test Results:

| Species | p-value | Decision (α = 0.05) | Interpretation |
|---|---|---|---|
| Setosa | 0.0548 | Fail to Reject H0 | Data appear to follow a normal distribution. |
| Versicolor | 0.1585 | Fail to Reject H0 | Data appear to follow a normal distribution. |
| Virginica | 0.1098 | Fail to Reject H0 | Data appear to follow a normal distribution. |

Since all p-values are greater than 0.05, we fail to reject the null hypothesis for all three species. This suggests that the petal length data are not significantly different from a normal distribution.

Final Verdict: Is the Normality Assumption OK?

✅ Yes! The normality assumption holds for all species.

Since all species pass the Shapiro-Wilk test, using parametric tests like ANOVA is appropriate.

However, for small samples (n < 30), normality tests have limited power and can be unreliable. Always combine them with visual checks (e.g., Q-Q plots, histograms) before drawing final conclusions.

Next Steps

  • 📌 If ANOVA is applied, confirm homogeneity of variance (e.g., using Levene’s test).
  • 📌 If normality was borderline (p ~ 0.05), consider checking histograms and Q-Q plots.
  • 📌 If data were not normally distributed, use the Kruskal-Wallis test instead.

🔍 Final Answer: Normality assumption is OK. Parametric tests are justified. 🚀
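
As a sketch of the first follow-up point, Levene's test for homogeneity of variance can be run with leveneTest from the car package loaded earlier (applied here to the same iris comparison):

# Levene's test: H0 is that Petal.Length variance is equal across species
leveneTest(Petal.Length ~ Species, data = iris)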

# Apply Shapiro-Wilk test separately for each transmission type
shapiro_results <- mtcars %>%
  group_by(am) %>%
  summarise(p_value = shapiro.test(mpg)$p.value)

print(shapiro_results)  # note: print() ignores a caption argument; kable() would produce a captioned table
## # A tibble: 2 × 2
##   am        p_value
##   <fct>       <dbl>
## 1 Automatic   0.899
## 2 Manual      0.536

Both p-values are well above 0.05, so normality is not rejected for mpg in either transmission group, and a parametric two-sample t-test would be reasonable here. Finally, the same test applied to the faithful waiting times:
# Shapiro-Wilk test
shapiro_test <- shapiro.test(faithful$waiting)
shapiro_test
## 
##  Shapiro-Wilk normality test
## 
## data:  faithful$waiting
## W = 0.92215, p-value = 1.015e-10

Conclusion Based on Shapiro-Wilk Normality Test

The Shapiro-Wilk test assesses whether the waiting times in the faithful dataset follow a normal distribution.

  • H0: Data are normally distributed.
  • HA: Data are not normally distributed.

Test Results:

  • Test Statistic: W = 0.92215
  • p-value: 1.015 × 10⁻¹⁰

Interpretation:

Since the p-value is less than 0.05, we reject H0.

This means the waiting time data are not normally distributed. The normality assumption is violated, making parametric tests inappropriate.

Final Decision:

❌ Parametric tests (e.g., t-test, ANOVA) should not be used.

✅ Non-parametric tests (e.g., Wilcoxon rank-sum test, Kruskal-Wallis test) should be applied instead.
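
For example, where a one-sample t-test would have been used on this data, a one-sample Wilcoxon signed-rank test can stand in (the hypothesized median of 70 minutes below is an arbitrary value chosen for illustration):

# Rank-based test of whether the median waiting time differs from 70 minutes
# (ties in the integer-valued waiting times make R fall back to a normal approximation)
wilcox.test(faithful$waiting, mu = 70)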

6. Parametric Tests in Action

6.1 One-Sample t-Test (Testing the Mean of a Single Sample)

Hypothesis

  • H0: μ = μ0
  • HA: μ ≠ μ0

Example: Student Exam Scores

# Simulated exam scores
set.seed(42)
exam_scores <- rnorm(30, mean = 75, sd = 10)

# Perform One-Sample t-Test
t_test_one_sample <- t.test(exam_scores, mu = 75)

# Print test result
t_test_one_sample
## 
##  One Sample t-test
## 
## data:  exam_scores
## t = 0.29933, df = 29, p-value = 0.7668
## alternative hypothesis: true mean is not equal to 75
## 95 percent confidence interval:
##  70.99952 80.37222
## sample estimates:
## mean of x 
##  75.68587

Interpretation

If the p-value is less than 0.05, we reject H0 and conclude that the mean exam score is significantly different from 75.

If the p-value is greater than or equal to 0.05, we fail to reject H0: there is no statistically significant difference between the mean exam score and 75.

Here, p = 0.7668 ≥ 0.05, so we fail to reject H0.
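
For comparison, here is the non-parametric counterpart (the Wilcoxon signed-rank test) on the same simulated scores; with data this close to normal, the two tests should agree:

# Rank-based alternative: tests whether the median score differs from 75
wilcox.test(exam_scores, mu = 75)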

6.2 Two-Sample t-Test (Comparing Two Independent Groups)

Hypothesis

  • H0: μ1 = μ2
  • HA: μ1 ≠ μ2

Example: Traditional vs. Online Teaching Methods

# Simulated teaching method data
traditional <- rnorm(15, mean = 75, sd = 10)
online <- rnorm(15, mean = 78, sd = 9)

# Perform Two-Sample t-Test (var.equal = TRUE assumes equal variances;
# omitting it gives Welch's t-test, which does not)
t_test_two_sample <- t.test(traditional, online, var.equal = TRUE)

# Print test result
t_test_two_sample
## 
##  Two Sample t-test
## 
## data:  traditional and online
## t = -2.0235, df = 28, p-value = 0.05266
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -14.69936545   0.08973592
## sample estimates:
## mean of x mean of y 
##  71.57939  78.88420

Here p = 0.0527 ≥ 0.05, so at the 5% level we (narrowly) fail to reject H0; the result is borderline.
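
If the normality assumption were in doubt here, the Mann-Whitney U test (wilcox.test applied to two independent samples) is the rank-based counterpart:

# Rank-based alternative for two independent groups
wilcox.test(traditional, online)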

6.3 Paired t-Test (Before-After Comparison in a Single Group)

Hypothesis

  • H0: μD = 0
  • HA: μD ≠ 0

Example: Weight Reduction Before & After a Diet

# Simulated before-after weight data
before_diet <- rnorm(20, mean = 80, sd = 5)
after_diet  <- before_diet - rnorm(20, mean = 2, sd = 1)  # Expected weight loss

# Perform Paired t-Test
t_test_paired <- t.test(before_diet, after_diet, paired = TRUE)

# Print test result
t_test_paired
## 
##  Paired t-test
## 
## data:  before_diet and after_diet
## t = 10.142, df = 19, p-value = 4.188e-09
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  1.652319 2.511593
## sample estimates:
## mean difference 
##        2.081956

Since p < 0.05, we reject H0: the diet produced a statistically significant mean weight reduction (about 2.08 units here).
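
The rank-based counterpart for paired data is again the Wilcoxon signed-rank test, this time with paired = TRUE:

# Rank-based alternative for paired (before-after) measurements
wilcox.test(before_diet, after_diet, paired = TRUE)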

6.4 One-Way ANOVA (Comparing Means Across Multiple Groups)

Hypothesis

  • H0: μ1 = μ2 = μ3 (All group means are equal)
  • HA: At least one mean is different

Example: Exam Scores for Different Teaching Methods

# Simulated exam scores for 3 groups
set.seed(42)
method1 <- rnorm(15, mean = 70, sd = 10)
method2 <- rnorm(15, mean = 75, sd = 9)
method3 <- rnorm(15, mean = 78, sd = 8)

# Create data frame
df_anova <- data.frame(
  Method = rep(c("Traditional", "Online", "Hybrid"), each = 15),
  Score = c(method1, method2, method3)
)

# Perform ANOVA
anova_test <- aov(Score ~ Method, data = df_anova)

# Print summary
summary(anova_test)
##             Df Sum Sq Mean Sq F value Pr(>F)
## Method       2    102   51.11   0.482  0.621
## Residuals   42   4450  105.95

Interpretation

If the p-value is less than 0.05, we reject H0 (all group means are equal) and conclude that there is a statistically significant difference in mean exam scores between at least one pair of teaching methods. Note that ANOVA does not tell us which methods differ, only that at least one pair does; a post-hoc test is needed for that.

In this simulated run, p = 0.621 ≥ 0.05, so we fail to reject H0: these data do not show a significant difference among the three methods.
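
Two natural follow-ups, sketched below: a post-hoc comparison (Tukey's HSD) to identify which pairs differ when the ANOVA is significant, and the Kruskal-Wallis test as the rank-based alternative when normality is violated:

# Post-hoc pairwise comparisons (meaningful only when the ANOVA is significant)
TukeyHSD(anova_test)

# Rank-based alternative to one-way ANOVA
kruskal.test(Score ~ Method, data = df_anova)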

7. Parametric vs. Non-Parametric: A Comparison

| Test Type | Parametric Test | Non-Parametric Equivalent |
|---|---|---|
| One Sample Test | One-Sample t-Test | Wilcoxon Signed-Rank Test |
| Two Sample Test | Independent t-Test | Mann-Whitney U Test |
| Paired Sample Test | Paired t-Test | Wilcoxon Signed-Rank Test |
| Multiple Groups | One-Way ANOVA | Kruskal-Wallis Test |
| Association | Pearson Correlation | Spearman Rank Correlation |

8. Hands-On Exercise: Which Test to Use?

Scenario

A researcher collects data from 3 groups of students who used different study techniques and recorded their final exam scores.

Task for Students

  1. Determine whether the data are normally distributed.
  2. If the data are normally distributed, perform a One-Way ANOVA.
  3. If the data are not normally distributed, perform the Kruskal-Wallis test.

Starter Code

# Simulated data
technique1 <- rnorm(12, mean = 70, sd = 10)
technique2 <- rnorm(12, mean = 75, sd = 12)
technique3 <- rnorm(12, mean = 78, sd = 8)

df_exercise <- data.frame(
  Technique = rep(c("Flashcards", "Practice Tests", "Summarization"), each = 12),
  Score = c(technique1, technique2, technique3)
)

# Check normality (pooling all scores is a simplification; checking each
# group separately, or the ANOVA residuals, would be more rigorous)
shapiro.test(df_exercise$Score)
## 
##  Shapiro-Wilk normality test
## 
## data:  df_exercise$Score
## W = 0.91258, p-value = 0.00767
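
Given the Shapiro-Wilk result above (p = 0.00767 < 0.05), step 3 of the task applies; a minimal solution sketch:

# Normality is rejected, so use the rank-based Kruskal-Wallis test
kruskal.test(Score ~ Technique, data = df_exercise)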

9. Conclusion

This document provides a detailed recap of parametric hypothesis tests and explains why non-parametric alternatives are important. 🚀


10. A Brief History of Non-Parametric Statistics

Early Foundations: Pre-20th Century

The roots of nonparametric statistics lie in early probability theory and empirical methods. Some of the key developments before the formalization of statistical hypothesis testing include:

  • Pierre-Simon Laplace (1774): Introduced early ideas on probability distributions, which influenced later statistical methods [@laplace1774memoire].
  • Karl Pearson (1900): Developed the Chi-Square test for categorical data analysis, one of the earliest nonparametric methods [@pearson1900x].
  • Ronald A. Fisher (1925): Established the foundation of modern hypothesis testing, though mainly in parametric contexts [@fisher1925statistical].

Development of Key Nonparametric Tests in the 20th Century

As statisticians recognized the limitations of parametric tests, new nonparametric methods were developed. Below are some of the most important milestones:

  • Spearman (1904): Introduced Spearman’s Rank Correlation as an alternative to Pearson correlation [@spearman1904proof].
  • Kolmogorov (1933): Developed the Kolmogorov-Smirnov test for comparing empirical distributions [@kolmogorov1933sulla]; Smirnov later extended it to the two-sample case.
  • Friedman (1937): Introduced the Friedman test for repeated measures analysis [@friedman1937use].
  • Wilcoxon (1945): Introduced the Wilcoxon Signed Rank Test, one of the first nonparametric alternatives to the paired t-test [@wilcoxon1945individual].
  • Mann and Whitney (1947): Developed the Mann–Whitney U Test for comparing two independent groups without assuming normality [@mann1947test].
  • Kruskal and Wallis (1952): Proposed the Kruskal–Wallis test as a nonparametric counterpart to one-way ANOVA [@kruskal1952use].

Modern Advancements and Computational Approaches

With the advent of computers, nonparametric methods have become more computationally feasible, leading to the development of advanced resampling techniques:

  • Efron (1979): Introduced the bootstrap method, allowing for robust estimation of sampling distributions [@efron1979bootstrap].
  • Monte Carlo Methods: Used extensively to approximate p-values and test statistics in nonparametric hypothesis testing.
  • Bayesian Nonparametrics: Emerging as an alternative framework where priors are placed on infinite-dimensional spaces (e.g., Dirichlet Process).

References