Statistical hypothesis testing is a fundamental concept in data analysis that helps us make informed decisions based on sample data. Traditionally, hypothesis tests are classified into two broad categories: parametric and non-parametric tests.

In this document, we will:

- Recap key parametric tests used in hypothesis testing.
- Understand their limitations and when they may not be appropriate.
- Learn why non-parametric tests are valuable and when to use them.
A parametric test is a statistical test that assumes the data follows a specific probability distribution (e.g., the normal distribution). These tests estimate parameters (such as the mean or variance) and rely on assumptions about the population.
Here are some commonly used parametric tests:
Test | Purpose | Example Use Case |
---|---|---|
Z-Test | Tests whether the mean of a sample differs from a known population mean when population variance is known. | Checking if the average IQ of students in a school is 100. |
t-Test | Compares means between one or two samples when population variance is unknown. | Comparing exam scores between students who took online vs. offline classes. |
ANOVA (F-Test) | Tests for differences among more than two group means. | Comparing salaries across different industries. |
Pearson Correlation | Measures linear association between two variables. | Checking the relationship between height and weight. |
Linear Regression | Models the relationship between one or more predictors and an outcome. | Predicting house prices based on size, location, etc. |
Consider a one-sample t-test, which tests if the mean of a sample differs from a hypothesized mean \(\mu_0\):
The test statistic is:
\[ t = \frac{\bar{X} - \mu_0}{\frac{S}{\sqrt{n}}} \]
where:

- \(\bar{X}\) = Sample mean
- \(S\) = Sample standard deviation
- \(n\) = Sample size
Under \(H_0\), the test statistic follows a t-distribution with \(n-1\) degrees of freedom.
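To make the formula concrete, here is a minimal sketch (using simulated data, with the sample parameters and \(\mu_0\) chosen purely for illustration) that computes the t statistic by hand and verifies it against R's built-in `t.test()`:

# Minimal sketch: compute the one-sample t statistic manually
# (simulated data; mean, sd, and mu0 are hypothetical values)
set.seed(1)
x <- rnorm(25, mean = 102, sd = 15)  # hypothetical sample
mu0 <- 100                           # hypothesized mean

t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))
t_builtin <- unname(t.test(x, mu = mu0)$statistic)

c(manual = t_manual, builtin = t_builtin)  # both values should match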
Despite their usefulness, parametric tests come with strict assumptions:

- Normality: the data (or model residuals) should follow a normal distribution.
- Homogeneity of variance: groups being compared should have similar variances.
- Independence: observations should not influence one another.
- Measurement scale: the data should be on an interval or ratio scale.

When these assumptions are violated, the results of parametric tests can be misleading. This is where non-parametric tests come to the rescue!
A non-parametric test does not assume a specific distribution for the data. Instead, it relies on ranks or medians, making it robust to non-normality, small samples, and outliers.
✅ No Normality Assumption: Works well with skewed data or ordinal data.

✅ Handles Outliers: Since it is based on ranks, extreme values do not distort results.

✅ Small Sample Friendly: Does not require large sample sizes to be reliable.

✅ Works with Ordinal Data: Useful for surveys, customer ratings, or Likert scales (1-5).
Non-Parametric Test | Alternative To | Purpose |
---|---|---|
Wilcoxon Signed-Rank | One-sample t-test | Tests whether the median of a single sample differs from a known value. |
Mann-Whitney U Test | Independent t-test | Compares two independent groups when normality is violated. |
Kruskal-Wallis Test | One-way ANOVA | Compares three or more groups when normality is violated. |
Spearman’s Rank Correlation | Pearson Correlation | Measures monotonic relationships (not necessarily linear). |
Friedman Test | Repeated Measures ANOVA | Compares multiple paired samples when normality is violated. |
# Set global chunk options
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

# Load required packages
library(ggplot2)
library(dplyr)
library(knitr)
library(car)
library(ggpubr)
# Load the dataset
data("iris")
# Display the first few rows
kable(head(iris), caption = "Sample of Iris Dataset")
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa |
4.9 | 3.0 | 1.4 | 0.2 | setosa |
4.7 | 3.2 | 1.3 | 0.2 | setosa |
4.6 | 3.1 | 1.5 | 0.2 | setosa |
5.0 | 3.6 | 1.4 | 0.2 | setosa |
5.4 | 3.9 | 1.7 | 0.4 | setosa |
# Load dataset
data("mtcars")
# Convert 'am' to a factor for readability
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))
# Display first few rows
kable(head(mtcars), caption = "First Six Rows of mtcars Dataset")
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | Manual | 4 | 4 |
Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | Manual | 4 | 4 |
Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | Manual | 4 | 1 |
Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | Automatic | 3 | 1 |
Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | Automatic | 3 | 2 |
Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.460 | 20.22 | 1 | Automatic | 3 | 1 |
# Load the dataset
data("faithful")
# Display the first few rows
head(faithful)
## eruptions waiting
## 1 3.600 79
## 2 1.800 54
## 3 3.333 74
## 4 2.283 62
## 5 4.533 85
## 6 2.883 55
ggplot(iris, aes(x = Petal.Length, fill = Species)) +
geom_histogram(bins = 15, alpha = 0.6, position = "identity") +
facet_wrap(~Species) +
labs(title = "Histogram of Petal Length by Species", x = "Petal Length", y = "Count") +
theme_minimal()
ggplot(mtcars, aes(x = mpg, fill = am)) +
geom_histogram(bins = 10, alpha = 0.6, position = "identity") +
facet_wrap(~am) +
labs(title = "Histogram of MPG for Automatic vs. Manual Cars",
x = "Miles Per Gallon (MPG)", y = "Count") +
theme_minimal()
Automatic cars show a nearly normal distribution.
Manual cars exhibit right skewness, which may violate the assumption of normality.
# Histogram of waiting times
ggplot(faithful, aes(x = waiting)) +
geom_histogram(binwidth = 5, fill = "skyblue", color = "black") +
labs(title = "Histogram of Waiting Times Between Eruptions",
x = "Waiting Time (minutes)",
y = "Frequency") +
theme_minimal()
ggqqplot(iris, x = "Petal.Length", facet.by = "Species", color = "Species") +
labs(title = "Q-Q Plot of Petal Length by Species")
ggqqplot(mtcars, x = "mpg", facet.by = "am", color = "am") +
labs(title = "Q-Q Plot of MPG by Transmission Type")
# Q-Q plot
ggplot(faithful, aes(sample = waiting)) +
stat_qq() +
stat_qq_line() +
labs(title = "Q-Q Plot of Waiting Times",
x = "Theoretical Quantiles",
y = "Sample Quantiles") +
theme_minimal()
# Apply Shapiro-Wilk test separately for each species
shapiro_results <- iris %>%
group_by(Species) %>%
summarise(p_value = shapiro.test(Petal.Length)$p.value)
print(shapiro_results)
## # A tibble: 3 × 2
## Species p_value
## <fct> <dbl>
## 1 setosa 0.0548
## 2 versicolor 0.158
## 3 virginica 0.110
The Shapiro-Wilk test checks whether data follow a normal distribution. The null hypothesis (H0) assumes normality, and we reject H0 if the p-value is less than 0.05.
Species | p-value | Decision (α = 0.05) | Interpretation |
---|---|---|---|
Setosa | 0.0548 | Fail to Reject H0 | Data appear to follow a normal distribution. |
Versicolor | 0.1585 | Fail to Reject H0 | Data appear to follow a normal distribution. |
Virginica | 0.1098 | Fail to Reject H0 | Data appear to follow a normal distribution. |
Since all p-values are greater than 0.05, we fail to reject the null hypothesis for all three species. This suggests that the petal length data are not significantly different from a normal distribution.
✅ Yes! The normality assumption holds for all species.
Since all species pass the Shapiro-Wilk test, using parametric tests like ANOVA is appropriate.
However, for small samples (n < 30), normality tests can be less reliable. Always combine statistical tests with visual methods (e.g., Q-Q plots, histograms) before making final conclusions.
🔍 Final Answer: Normality assumption is OK. Parametric tests are justified. 🚀
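As a follow-up sketch (not part of the original analysis), a one-way ANOVA on petal length could look like the code below. Since ANOVA also assumes homogeneity of variance, `leveneTest()` from the already-loaded car package is included as an additional check:

# Check homogeneity of variance across species (car package)
leveneTest(Petal.Length ~ Species, data = iris)

# One-way ANOVA: does mean petal length differ by species?
iris_anova <- aov(Petal.Length ~ Species, data = iris)
summary(iris_anova)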
# Apply Shapiro-Wilk test separately for each transmission type
shapiro_results <- mtcars %>%
group_by(am) %>%
summarise(p_value = shapiro.test(mpg)$p.value)
print(shapiro_results)
## # A tibble: 2 × 2
## am p_value
## <fct> <dbl>
## 1 Automatic 0.899
## 2 Manual 0.536
# Shapiro-Wilk test
shapiro_test <- shapiro.test(faithful$waiting)
shapiro_test
##
## Shapiro-Wilk normality test
##
## data: faithful$waiting
## W = 0.92215, p-value = 1.015e-10
The Shapiro-Wilk test assesses whether the waiting times in the `faithful` dataset follow a normal distribution.
Since the p-value is less than 0.05, we reject H0.
This means the waiting time data are not normally distributed; indeed, the histogram above shows a distinctly bimodal shape. The normality assumption is violated, making parametric tests inappropriate.
❌ Parametric tests (e.g., t-test, ANOVA) should not be used.
✅ Non-parametric tests (e.g., Wilcoxon rank-sum test, Kruskal-Wallis test) should be applied instead.
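For instance, a one-sample Wilcoxon signed-rank test could compare the median waiting time against a reference value. The sketch below uses mu = 70 minutes, a hypothetical value chosen purely for illustration:

# Non-parametric one-sample test on the median waiting time
# (mu = 70 is a hypothetical reference value, not from the analysis above)
wilcox.test(faithful$waiting, mu = 70)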
# Simulated exam scores
set.seed(42)
exam_scores <- rnorm(30, mean = 75, sd = 10)
# Perform One-Sample t-Test
t_test_one_sample <- t.test(exam_scores, mu = 75)
# Print test result
t_test_one_sample
##
## One Sample t-test
##
## data: exam_scores
## t = 0.29933, df = 29, p-value = 0.7668
## alternative hypothesis: true mean is not equal to 75
## 95 percent confidence interval:
## 70.99952 80.37222
## sample estimates:
## mean of x
## 75.68587
If the p-value is less than 0.05, we reject H0 and conclude that the mean exam score is significantly different from 75.

If the p-value is greater than or equal to 0.05, we fail to reject H0: there is no statistically significant difference between the mean exam score and 75. With the simulated data above, p = 0.7668, so we fail to reject H0.
# Simulated teaching method data
traditional <- rnorm(15, mean = 75, sd = 10)
online <- rnorm(15, mean = 78, sd = 9)
# Perform Two-Sample t-Test (var.equal = TRUE assumes equal group variances)
t_test_two_sample <- t.test(traditional, online, var.equal = TRUE)
# Print test result
t_test_two_sample
##
## Two Sample t-test
##
## data: traditional and online
## t = -2.0235, df = 28, p-value = 0.05266
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -14.69936545 0.08973592
## sample estimates:
## mean of x mean of y
## 71.57939 78.88420
# Simulated before-after weight data
before_diet <- rnorm(20, mean = 80, sd = 5)
after_diet <- before_diet - rnorm(20, mean = 2, sd = 1) # Expected weight loss
# Perform Paired t-Test
t_test_paired <- t.test(before_diet, after_diet, paired = TRUE)
# Print test result
t_test_paired
##
## Paired t-test
##
## data: before_diet and after_diet
## t = 10.142, df = 19, p-value = 4.188e-09
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 1.652319 2.511593
## sample estimates:
## mean difference
## 2.081956
# Simulated exam scores for 3 groups
set.seed(42)
method1 <- rnorm(15, mean = 70, sd = 10)
method2 <- rnorm(15, mean = 75, sd = 9)
method3 <- rnorm(15, mean = 78, sd = 8)
# Create data frame
df_anova <- data.frame(
Method = rep(c("Traditional", "Online", "Hybrid"), each = 15),
Score = c(method1, method2, method3)
)
# Perform ANOVA
anova_test <- aov(Score ~ Method, data = df_anova)
# Print summary
summary(anova_test)
## Df Sum Sq Mean Sq F value Pr(>F)
## Method 2 102 51.11 0.482 0.621
## Residuals 42 4450 105.95
If the p-value is less than 0.05, we reject H0 (the hypothesis that all group means are equal) and conclude that there is a statistically significant difference in mean exam scores between at least one pair of teaching methods.

Note that ANOVA does not tell us *which* teaching methods differ, only that at least one pair does; a post-hoc test is needed to identify the specific pairs.
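A common choice for this follow-up (not run in the original analysis) is Tukey's HSD test, sketched below. With the simulated data above the ANOVA is not significant, so this is shown purely to illustrate the workflow:

# Post-hoc pairwise comparisons between teaching methods
TukeyHSD(anova_test)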
Test Type | Parametric Test | Non-Parametric Equivalent |
---|---|---|
One Sample Test | One-Sample t-Test | Wilcoxon Signed-Rank Test |
Two Sample Test | Independent t-Test | Mann-Whitney U Test |
Paired Sample Test | Paired t-Test | Wilcoxon Signed-Rank Test |
Multiple Groups | One-Way ANOVA | Kruskal-Wallis Test |
Association | Pearson Correlation | Spearman Rank Correlation |
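To illustrate this correspondence (a sketch, not part of the original analysis), both members of a pair can be run on the same data; here, the mpg-by-transmission comparison from the mtcars example:

# Parametric: independent two-sample t-test
t.test(mpg ~ am, data = mtcars)

# Non-parametric equivalent: Mann-Whitney U (Wilcoxon rank-sum) test
wilcox.test(mpg ~ am, data = mtcars)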
A researcher collects final exam scores from 3 groups of students, each of which used a different study technique.
# Simulated data
technique1 <- rnorm(12, mean = 70, sd = 10)
technique2 <- rnorm(12, mean = 75, sd = 12)
technique3 <- rnorm(12, mean = 78, sd = 8)
df_exercise <- data.frame(
Technique = rep(c("Flashcards", "Practice Tests", "Summarization"), each = 12),
Score = c(technique1, technique2, technique3)
)
# Check normality
shapiro.test(df_exercise$Score)
##
## Shapiro-Wilk normality test
##
## data: df_exercise$Score
## W = 0.91258, p-value = 0.00767
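Since the Shapiro-Wilk test rejects normality (p = 0.00767 < 0.05), a non-parametric comparison is appropriate here. A sketch of the natural next step, the Kruskal-Wallis test:

# Normality is violated, so use Kruskal-Wallis instead of one-way ANOVA
kruskal.test(Score ~ Technique, data = df_exercise)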
This document provides a detailed recap of parametric hypothesis tests and explains why non-parametric alternatives are important. 🚀
The roots of nonparametric statistics lie in early probability theory and empirical methods. A frequently cited early example is John Arbuthnot's 1710 analysis of London birth records, which used what is essentially a sign test, long before statistical hypothesis testing was formalized.
As statisticians recognized the limitations of parametric tests, new nonparametric methods were developed. Important milestones include Spearman's rank correlation (1904), the Friedman test (1937), the Wilcoxon signed-rank and rank-sum tests (1945), the Mann-Whitney U test (1947), and the Kruskal-Wallis test (1952).
With the advent of computers, nonparametric methods have become more computationally feasible, leading to the development of advanced resampling techniques such as the jackknife (Quenouille, 1949; Tukey, 1958), permutation tests, and the bootstrap (Efron, 1979).