Permutation Tests

General Idea

Permutation tests are non-parametric methods used to test hypotheses, especially when the assumptions of traditional parametric tests (like normality or equal variances) may not hold. The core idea is that, under the null hypothesis, the group labels assigned to data points are exchangeable. Therefore, we can approximate the null distribution of a test statistic by repeatedly and randomly permuting the group labels and recalculating the statistic for each permutation.

Example: Permutation Test Using PlantGrowth Data

The PlantGrowth dataset records the weight of plants under three conditions: a control group (ctrl) and two treatment groups (trt1 and trt2). We’ll compare the control group (ctrl) to the first treatment group (trt1) using a permutation test.

# Load and subset data
data("PlantGrowth")
subdata <- PlantGrowth[PlantGrowth$group %in% c("ctrl", "trt1"), ]
y <- subdata$weight
group <- as.character(subdata$group)

# Define test statistic
testStat <- function(w, g) mean(w[g == "ctrl"]) - mean(w[g == "trt1"])

# Calculate observed test statistic
observedStat <- testStat(y, group)

# Generate permutation distribution
set.seed(42)
permutations <- replicate(10000, testStat(y, sample(group)))

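# Calculate one-sided p-value: share of permuted statistics at least as large
# as the observed one (alternative: mean weight is higher in ctrl than in trt1)
p_value <- mean(permutations >= observedStat)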

Interpretation

In this test, we assess whether the observed difference in mean weight between the control group (ctrl) and the first treatment group (trt1) could plausibly occur under the null hypothesis that the treatment has no effect on plant weight. By permuting the group labels (i.e., randomly reassigning the “ctrl” and “trt1” labels to the observed weights), we generate a distribution of the test statistic under the null. The proportion of permuted statistics at least as large as the observed statistic gives a one-sided p-value.
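The example above reports a one-sided p-value. As a minimal sketch of a common variant, the code below computes a two-sided p-value by comparing absolute values, and adds 1 to both the numerator and the denominator so that the observed labelling counts as one of the permutations (which keeps a Monte Carlo p-value from being exactly zero). The names nPerm, p_one_sided and p_two_sided are introduced here for illustration only.

```r
# Same setup as the example above
data("PlantGrowth")
subdata <- PlantGrowth[PlantGrowth$group %in% c("ctrl", "trt1"), ]
y <- subdata$weight
group <- as.character(subdata$group)

testStat <- function(w, g) mean(w[g == "ctrl"]) - mean(w[g == "trt1"])
observedStat <- testStat(y, group)

set.seed(42)
nPerm <- 10000  # illustrative number of random permutations
permutations <- replicate(nPerm, testStat(y, sample(group)))

# One-sided p-value with the +1 correction
p_one_sided <- (sum(permutations >= observedStat) + 1) / (nPerm + 1)

# Two-sided p-value: a permuted statistic is "as extreme" if its absolute
# value is at least as large as that of the observed statistic
p_two_sided <- (sum(abs(permutations) >= abs(observedStat)) + 1) / (nPerm + 1)
```

The two-sided version is the safer default when the direction of the effect was not specified before looking at the data.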

Key Advantages:

No assumption of normality; the key requirement is that the group labels are exchangeable under the null hypothesis
Applicable to small samples
Highly flexible — works with any test statistic

Special Instances of Permutation Tests

Rank Sum Test (Mann–Whitney U)

The Rank Sum Test is a non-parametric alternative to the two-sample t-test. Though often taught separately, it can be viewed as a permutation test where the test statistic is based on the ranks of the observations rather than the raw values.

Test statistic: Sum (or average) of ranks in one of the groups.
Null distribution: Obtained by randomly permuting group labels and recalculating rank sums.
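To make the connection concrete, here is a hedged sketch that runs the rank sum test as a permutation test on the same PlantGrowth subset used earlier and compares the result with R's built-in wilcox.test. The helper rankSum and the variable names are assumptions made for this illustration. Because the data contain ties, wilcox.test is called with exact = FALSE, so it returns a normal approximation and the two p-values should agree only approximately.

```r
data("PlantGrowth")
subdata <- PlantGrowth[PlantGrowth$group %in% c("ctrl", "trt1"), ]
y <- subdata$weight
group <- as.character(subdata$group)

# Replace raw values by their ranks (ties receive average ranks)
r <- rank(y)

# Test statistic: sum of the ranks in the control group
rankSum <- function(ranks, g) sum(ranks[g == "ctrl"])
observedW <- rankSum(r, group)

# Permutation distribution of the rank sum
set.seed(42)
nPerm <- 10000
permW <- replicate(nPerm, rankSum(r, sample(group)))

# Two-sided p-value: distance from the expected rank sum under the null
expectedW <- sum(group == "ctrl") * (length(r) + 1) / 2
p_perm <- (sum(abs(permW - expectedW) >= abs(observedW - expectedW)) + 1) /
  (nPerm + 1)

# Built-in test for comparison (normal approximation because of ties)
p_wilcox <- wilcox.test(y[group == "ctrl"], y[group == "trt1"],
                        exact = FALSE)$p.value
```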

Fisher’s Exact Test

This is a classic example of an exact test used for 2×2 contingency tables.

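Test statistic: With the row and column totals fixed, a 2×2 table is determined by a single cell count, which follows a hypergeometric distribution under the null hypothesis of independence. Tables are ordered by their hypergeometric probability.
Null distribution: All possible tables with the observed margins are enumerated, which is equivalent to enumerating every permutation of the labels. A common two-sided p-value is the total probability of the tables that are no more probable than the observed one.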

While not always described this way, Fisher’s Exact Test is conceptually a permutation test for categorical data with small sample sizes.
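The enumeration can be sketched directly with dhyper. The example below is an illustration under stated assumptions: it uses the built-in mtcars data (transmission type am against engine shape vs), which is not part of the original example, and it applies the same small relative tolerance that fisher.test uses when comparing probabilities. The result should match fisher.test up to numerical error.

```r
data("mtcars")
tab <- table(am = mtcars$am, vs = mtcars$vs)

# Fixed margins of the 2x2 table
m <- sum(tab[, 1])            # first column total
n <- sum(tab[, 2])            # second column total
k <- sum(tab[1, ])            # first row total
x_obs <- as.numeric(tab[1, 1])  # observed top-left cell

# Every value the top-left cell can take given the margins
x_all <- max(0, k - n):min(k, m)

# Hypergeometric probability of each possible table under the null
probs <- dhyper(x_all, m, n, k)
p_obs <- dhyper(x_obs, m, n, k)

# Two-sided p-value: total probability of tables that are no more
# probable than the observed one (tolerance guards against rounding)
p_enum <- sum(probs[probs <= p_obs * (1 + 1e-7)])

# Built-in test for comparison
p_fisher <- fisher.test(tab)$p.value
```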

Randomisation Tests

Randomisation tests are permutation tests specifically rooted in experimental designs. They are particularly useful when testing the significance of a treatment effect in randomised trials.

Test statistic: Can be any summary measure (mean difference, median difference, etc.).
Approach: Recompute the test statistic on data whose group labels have been reassigned using the same randomisation scheme as the original experiment. This is justified because the treatment assignment itself was random.

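The terms are often used interchangeably with permutation tests, especially in the context of designed experiments. The distinction is one of justification: a randomisation test rests on the physical random assignment of treatments, whereas a permutation test rests on exchangeability under the null hypothesis.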
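When the experiment is small, every possible assignment can be enumerated instead of sampled, which gives an exact p-value. The sketch below assumes, purely for illustration, that the PlantGrowth control and trt1 plants were assigned by complete randomisation with 10 plants per group. The variable names are introduced here and are not part of the earlier example.

```r
data("PlantGrowth")
subdata <- PlantGrowth[PlantGrowth$group %in% c("ctrl", "trt1"), ]
y <- subdata$weight
isCtrl <- subdata$group == "ctrl"
nCtrl <- sum(isCtrl)
nTrt <- length(y) - nCtrl

# Every way of choosing which plants form the control group:
# one column per assignment, choose(20, 10) = 184756 columns in total
assignments <- combn(length(y), nCtrl)

# Mean difference (ctrl minus trt1) for every assignment, computed
# from the control-group sums to avoid an explicit loop
ctrlSums <- colSums(matrix(y[assignments], nrow = nCtrl))
allStats <- ctrlSums / nCtrl - (sum(y) - ctrlSums) / nTrt

observedStat <- mean(y[isCtrl]) - mean(y[!isCtrl])

# Exact one-sided p-value; the small tolerance treats values that
# differ only by floating-point rounding as ties
p_exact <- mean(allStats >= observedStat - 1e-12)
```

Full enumeration grows quickly with sample size, so random sampling of assignments remains the practical choice for larger experiments.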

Permutation Strategies for Regression

Permutation methods extend naturally to regression and other multivariable models. Several strategies exist:

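Permuting the response: Breaks the relationship between the response and all predictors at once, so it tests the global null hypothesis that no predictor is associated with the response.
Permuting residuals: Fit a reduced model (e.g., without the predictor of interest), permute its residuals, add them back to the fitted values of the reduced model, and refit the full model to the resulting pseudo-response. This approach is often attributed to Freedman and Lane.
Permuting a predictor: Shuffles one predictor while keeping the rest of the data fixed. This breaks its link with the response but also with the other predictors, so it is best suited to predictors that are roughly independent of the others.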

This allows for hypothesis testing about regression coefficients, model fit (e.g., R²), or overall model structure without relying on normality or homoscedasticity assumptions.
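Two of these strategies can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: it uses the built-in mtcars data with the model mpg ~ wt + hp, neither of which appears in the original text, and the helper functions rSquared and tForHp are hypothetical names. The first part permutes the response to test overall fit via R²; the second permutes residuals from a reduced model to test the coefficient of hp.

```r
data("mtcars")
set.seed(42)
nPerm <- 2000  # illustrative number of permutations

# --- Strategy 1: permute the response, test overall fit (R^2) ---
rSquared <- function(yStar) {
  summary(lm(yStar ~ wt + hp, data = mtcars))$r.squared
}
observedR2 <- rSquared(mtcars$mpg)
permR2 <- replicate(nPerm, rSquared(sample(mtcars$mpg)))
p_global <- (sum(permR2 >= observedR2) + 1) / (nPerm + 1)

# --- Strategy 2: permute residuals of a reduced model, test hp ---
tForHp <- function(yStar) {
  fit <- lm(yStar ~ wt + hp, data = mtcars)
  coef(summary(fit))["hp", "t value"]
}
observedT <- tForHp(mtcars$mpg)

# Reduced model leaves out the predictor of interest (hp)
reducedFit <- lm(mpg ~ wt, data = mtcars)
fittedReduced <- fitted(reducedFit)
residReduced <- residuals(reducedFit)

# Pseudo-response: reduced-model fit plus shuffled reduced-model residuals,
# then refit the full model and record the t statistic for hp
permT <- replicate(nPerm, tForHp(fittedReduced + sample(residReduced)))
p_hp <- (sum(abs(permT) >= abs(observedT)) + 1) / (nPerm + 1)
```

Permuting residuals keeps the estimated effect of wt in place, so the resulting p-value addresses hp specifically rather than the model as a whole.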

Final Remarks

Permutation tests offer a robust, assumption-light alternative to classical parametric hypothesis tests. They are especially valuable when sample sizes are small, distributions are unknown or non-normal, or when working with unconventional test statistics.

With the rise of computational tools, permutation tests are now accessible and practical for a wide range of applications — from experimental design and regression modelling to machine learning validation.

They exemplify a core idea in modern statistics: when in doubt, shuffle the data and see what happens.