Nonparametric testing

Often ecological data is messy stuff that fails to meet our statistical assumptions. So what do we do when this is the case? One option is nonparametric tests. These tests have relaxed assumptions about normality and homogeneity of variances. Sampling still needs to be random and independent though!

When to use nonparametric tests?

The most common scenario for using a nonparametric test is that you wanted to use a one-way ANOVA or t-test, but its assumptions aren’t met. So you use a nonparametric test instead.

There is significant debate in the statistical field over whether this is an appropriate use of nonparametric tests. I have to admit that I do this, though. Look into the literature and form your own opinions - I won’t make arguments either way.

Some experimental designs are actually well-suited to nonparametric analysis, especially ones with naturally skewed or otherwise nonnormal distributions.

Nonparametric tests are often described as less ‘powerful’ than parametric tests (like ANOVA, t-test, etc.), where power is the ability to detect a difference that really exists. But it’s actually not that simple. Nonparametric tests can be more powerful than parametric tests in certain situations, such as when data is very skewed.
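One way to see this for yourself is a small simulation. The sketch below is an illustration I've put together, not a standard procedure. It draws two groups from the same strongly right-skewed lognormal distribution and shifts one group up by a fixed amount. It then records how often a t-test and a Wilcoxon test (covered later in this tutorial) each detect the difference at p < 0.05. The sample size, skew (sdlog) and shift are arbitrary assumptions. Change them and rerun to see how the comparison shifts, and check the printed numbers rather than taking either test's advantage for granted.

```r
set.seed(1)  # make the simulation reproducible

n_sims <- 1000  # number of simulated experiments (assumed)
n      <- 15    # observations per group (assumed)
shift  <- 1     # true difference added to the second group (assumed)

p_t <- p_w <- numeric(n_sims)
for (i in seq_len(n_sims)) {
  a <- rlnorm(n, meanlog = 0, sdlog = 1.5)          # strongly right-skewed data
  b <- rlnorm(n, meanlog = 0, sdlog = 1.5) + shift  # same shape, shifted upward
  p_t[i] <- t.test(a, b)$p.value                    # parametric test
  p_w[i] <- wilcox.test(a, b)$p.value               # rank-based test
}

# Proportion of experiments with p < 0.05 = estimated power of each test
power <- c(t_test = mean(p_t < 0.05), wilcoxon = mean(p_w < 0.05))
power
```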

Kruskal-Wallis Test

The Kruskal-Wallis Test (K-W) is roughly equivalent to a one-way ANOVA. The K-W is used to test for differences among groups. It is often said that it tests for differences in medians among groups, but this is not technically true. A more accurate description is that it ranks all of the data together and then tests whether the groups differ in their distributions of ranks, that is, whether values in some groups tend to be larger than values in others. Only when the groups' distributions have a similar shape does this amount to a difference in medians. Frankly, I don’t understand nonparametric mechanics well enough to say anything more than that. The simplest way for me to understand the K-W (and related tests) is that it looks for differences between medians of groups, instead of means.
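To make the ranking idea concrete, here is a sketch that computes the K-W statistic by hand on a tiny made-up dataset. The values and group labels are hypothetical, chosen only for illustration. All observations are ranked together. The statistic is then H = 12/(N(N+1)) × Σ n_i R̄_i² − 3(N+1), where N is the total sample size, n_i is the size of group i and R̄_i is its mean rank. kruskal.test() additionally divides by a correction for tied values. These toy values have no ties, so the two results should agree.

```r
# Hypothetical toy data: three groups of three observations each
y <- c(2.1, 3.4, 1.8, 5.6, 6.2, 4.9, 7.3, 8.0, 6.8)
g <- factor(rep(c("A", "B", "C"), each = 3))

r    <- rank(y)                 # rank all observations together
N    <- length(y)               # total sample size
n_i  <- tapply(r, g, length)    # number of observations per group
Rbar <- tapply(r, g, mean)      # mean rank per group

# Kruskal-Wallis H (no tie correction needed: there are no ties here)
H <- 12 / (N * (N + 1)) * sum(n_i * Rbar^2) - 3 * (N + 1)
H  # 7.2

kruskal.test(y, g)$statistic  # R's built-in version, for comparison
```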

Even if I (and maybe you) don’t properly understand the mechanics of nonparametric tests, we can still continue using them for the analyses that I lay out below. Years of research supports their use for these applications.

K-W Code

Coding a K-W test is super easy, and there are several ways to write the code. For this tutorial, we will use the iris dataset. It comes built into R (in the datasets package), so the dataframe can be called directly without loading anything. For your own data, you can read it into R using read.csv(). Please see the Getting Started with R tutorial.
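If you want to try the read.csv() route without a file of your own, the sketch below writes iris out to a temporary CSV and reads it back in. The file path is a throwaway created with tempfile(). With your own data you would give read.csv() the path to your file instead. Since R 4.0, read.csv() reads text columns as character rather than factor, so it's worth converting your grouping column to a factor explicitly.

```r
# Write iris to a temporary CSV file, standing in for your own data file
path <- tempfile(fileext = ".csv")
write.csv(iris, path, row.names = FALSE)

# Read it back in the same way you would read your own data
mydata <- read.csv(path)
mydata$Species <- factor(mydata$Species)  # make sure the grouping column is a factor

str(mydata)  # check column names and types before analysing
```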

Let’s take a look at it.

View(iris) #opens the data in a spreadsheet-style viewer
head(iris) #prints the first 6 rows to the console

The first 4 columns are numeric data giving sepal length, sepal width, petal length and petal width, all measured in centimeters. The 5th column is a categorical variable: the species of iris measured.

For this analysis, we will see if petal length differs among species. We will use a K-W.

First method

No packages are needed for the function kruskal.test (it is part of R's built-in stats package). The first argument (here, Petal.Length) is the numeric response variable. The second argument (here, Species) is our categorical explanatory variable. I will show two equivalent ways of coding this method. The first uses the attach function to ‘attach’ the data, and the second directly references the dataframe for each variable.

#First Way
attach(iris) #makes it so that R references the iris dataframe by default
kruskal.test(Petal.Length,Species)
detach(iris) #undo attach() when you're done so it doesn't affect later code

#Second Way
kruskal.test(iris$Petal.Length,iris$Species)
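A related option, which I'm suggesting as an alternative rather than something from the original walkthrough, is with(). It evaluates a single expression inside the dataframe, so you get the short column names of attach() without leaving the data attached afterwards. The sketch below checks that it gives the same statistic as the $ version.

```r
# with() looks up Petal.Length and Species inside iris for this one call only
res_with   <- with(iris, kruskal.test(Petal.Length, Species))
res_dollar <- kruskal.test(iris$Petal.Length, iris$Species)

# Both approaches give the same Kruskal-Wallis statistic
c(with = res_with$statistic, dollar = res_dollar$statistic)
```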

Second method

Same function here, but now we write out the formula of the K-W test. The ~ means ‘given’ or ‘explained by’. We directly reference the iris dataset with the data argument.

kruskal.test(Petal.Length~Species,data=iris)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  Petal.Length by Species
## Kruskal-Wallis chi-squared = 130.41, df = 2, p-value < 2.2e-16

K-W results

Each of these K-W methods produces the same result. The Kruskal-Wallis chi-squared (X²) of this test is 130.41, there are two degrees of freedom (three groups minus one), and the p-value is super small. R reports any p-value below 2.2e-16 simply as "< 2.2e-16".
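If you need these numbers for a results table, you can pull them out of the test object instead of copying them from the printout. The sketch below does that. It also shows that the p-value is the upper tail of a chi-squared distribution with the test's degrees of freedom, which is the approximation kruskal.test() uses.

```r
kw <- kruskal.test(Petal.Length ~ Species, data = iris)

kw$statistic  # Kruskal-Wallis chi-squared
kw$parameter  # degrees of freedom (number of groups - 1)
kw$p.value    # the actual value behind "< 2.2e-16"

# Same p-value computed directly from the chi-squared distribution
pchisq(kw$statistic, df = kw$parameter, lower.tail = FALSE)
```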

So at least one of the groups has significantly different petal lengths from the others! But which group could it be? Read the next section to find out (#Rclickbait).

Post-hoc multiple comparisons for K-W

There are multiple post-hoc methods for K-W. I will show you just one, the Nemenyi test, which I use often. Others may be better suited to your study, so poke around. This online discussion might be a good starting point.

#install.packages("PMCMR") #install this package first if you haven't before (only have to once)
library(PMCMR)
###if this package stops working, use PMCMRplus
posthoc.kruskal.nemenyi.test(iris$Petal.Length,iris$Species) #response variable first, then groups

## Warning in posthoc.kruskal.nemenyi.test.default(iris$Petal.Length, iris
## $Species, : Ties are present, p-values are not corrected.

## 
##  Pairwise comparisons using Tukey and Kramer (Nemenyi) test  
##                    with Tukey-Dist approximation for independent samples 
## 
## data:  iris$Petal.Length and iris$Species 
## 
##            setosa  versicolor
## versicolor 1.4e-08 -         
## virginica  < 2e-16 8.6e-08   
## 
## P value adjustment method: none

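Looking at the results, each pairwise comparison has a p-value less than 0.05, so every species differs from the other two. The warning means the data contain tied values, and the p-values were not corrected for those ties. This function defaults to a Tukey-type method (the "Tukey-Dist approximation" named in the output). If you would like to change this to a chi-square method, the function has a dist argument (dist="Chisquare"). Use ?posthoc.kruskal.nemenyi.test to find out how to change this (and other arguments).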
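If you'd rather stay in base R, one alternative is pairwise.wilcox.test(). Note that this is a different procedure from the Nemenyi test. It runs a Wilcoxon rank-sum test for every pair of groups and then adjusts the p-values for multiple comparisons. The Holm adjustment below is my choice for this sketch; see ?p.adjust for the other options. exact = FALSE is passed through to wilcox.test() so it uses the normal approximation instead of warning about ties.

```r
pw <- pairwise.wilcox.test(iris$Petal.Length, iris$Species,
                           p.adjust.method = "holm",  # correct for 3 pairwise tests
                           exact = FALSE)             # normal approximation (data have ties)
pw  # matrix of adjusted p-values, one per pair of species
```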

Wilcoxon test (Mann-Whitney U)

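This is the nonparametric equivalent of a t-test and goes by several names: the Wilcoxon rank-sum test, the Mann-Whitney U test, etc. Don't confuse it with the Wilcoxon signed-rank test. That test is the nonparametric counterpart of a paired t-test, and wilcox.test runs it when you set paired=TRUE.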

It is very similar to a K-W test, except that you only compare between two groups.
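The two tests are in fact closely linked. With exactly two groups, the K-W chi-squared equals the square of the Wilcoxon test's normal-approximation z statistic. This holds provided the Wilcoxon test uses that approximation and no continuity correction. The sketch below checks this on two of the iris species by comparing the p-values.

```r
# Two-species subset; droplevels() removes the now-empty versicolor level
two <- droplevels(subset(iris, Species != "versicolor"))

kw <- kruskal.test(Petal.Length ~ Species, data = two)
wx <- wilcox.test(Petal.Length ~ Species, data = two,
                  exact = FALSE,    # use the normal approximation
                  correct = FALSE)  # no continuity correction

c(kruskal_wallis = kw$p.value, wilcoxon = wx$p.value)  # should match
```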

We will use the iris dataset again, but this time use the subset function to toss out the versicolor species, so that we only have two groups (species) to compare. Note: the ‘!=’ means ‘not equal to’. For other operators (such as equal to, or, etc.), see this link or R's help pages ?Comparison and ?Logic.

irisSubset <- subset(iris,Species!="versicolor")
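One thing to be aware of: subset() drops the versicolor rows but keeps versicolor as an (empty) factor level. The formula interface of wilcox.test discards unused levels on its own, but other functions such as table() will still list versicolor with a count of zero. The sketch below shows this and removes the empty level with droplevels().

```r
irisSubset <- subset(iris, Species != "versicolor")
levels(irisSubset$Species)  # still lists all three species
table(irisSubset$Species)   # versicolor shows up with a count of 0

irisSubset <- droplevels(irisSubset)  # drop factor levels that no longer occur
levels(irisSubset$Species)            # only the two remaining species
```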

Wilcoxon Test Code

Same structure here as with the last K-W method: our response variable (Petal.Length) explained by our explanatory variable (Species).

wilcox.test(Petal.Length~Species,data=irisSubset) #using subsetted data

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  Petal.Length by Species
## W = 0, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0

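There is a significant difference between the two groups! The W = 0 tells us that not a single setosa petal is longer than any virginica petal, so the petal lengths of the two species don't overlap at all.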
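To see where W = 0 comes from, the sketch below computes W by brute force. W counts, over every possible (setosa, virginica) pair, how often the setosa value is the larger one, with ties counting one half. Setosa is the first factor level, which is why W is counted from setosa's side.

```r
setosa    <- iris$Petal.Length[iris$Species == "setosa"]
virginica <- iris$Petal.Length[iris$Species == "virginica"]

# Compare every setosa value with every virginica value (50 x 50 = 2500 pairs)
wins <- outer(setosa, virginica, ">")   # TRUE where the setosa value is larger
ties <- outer(setosa, virginica, "==")  # TRUE where the two values are equal
W <- sum(wins) + 0.5 * sum(ties)
W  # 0

range(setosa)     # the two ranges...
range(virginica)  # ...don't overlap
```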

Boxplots: Visualizing Nonparametric Data

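Graphing averages for data analyzed by nonparametric analyses is inappropriate, because these tests work with ranks rather than means! Therefore we will use a boxplot. Boxplots show medians and quartiles, which are determined by the rank order of the data, so a few extreme values don't pull them around the way they pull a mean.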

boxplot(Petal.Length~Species,data=iris,ylab="Petal Length (cm)",xlab="Iris species",main="")
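If you want the numbers behind the boxes (for a table, say), boxplot() can return them without drawing anything when you set plot = FALSE. Each column of the stats matrix is one species. Row 3 holds the medians, which should match what tapply() gives with median.

```r
b <- boxplot(Petal.Length ~ Species, data = iris, plot = FALSE)

b$stats  # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
b$names  # which species each column belongs to

tapply(iris$Petal.Length, iris$Species, median)  # same as row 3 of b$stats
```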

Want to make the ticks go inward? R does not make it easy.

boxplot(Petal.Length~Species,data=iris,ylab="Petal Length (cm)",xlab="Iris species",main="",yaxt="n",xaxt="n") #yaxt="n" and xaxt="n" suppress the default axes
axis(side=2,tcl=.2,at=NULL,labels=TRUE) #a positive tcl makes ticks point inward; for longer ticks, increase tcl
axis(side=1,tcl=.2,at=c(1,2,3),labels=c("setosa","versicolor","virginica"))

  #increase tick length (tcl) here also
  #the 'at' argument says where to put each tick along the axis, usually integers
  #the 'labels' argument places labels at ticks
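A shorter route, which I'm offering as an alternative rather than the original approach, is to set the tick direction once with par(). The tcl setting also applies to the axes that boxplot() draws itself, so no separate axis() calls are needed. Saving the old settings in op and calling par(op) afterwards puts things back for later plots.

```r
op <- par(tcl = 0.2)  # positive tcl = ticks point inward, for every axis drawn next
boxplot(Petal.Length ~ Species, data = iris,
        ylab = "Petal Length (cm)", xlab = "Iris species", main = "")
par(op)               # restore the previous tick settings
```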

*For more tips for making similar plots, see the Plotting tutorial.

Now with your data

Here is all of the code you should need to do K-W and Wilcoxon tests.

#Kruskal-Wallis test
#First method, first way: attach the data
attach(iris) #makes it so that R references the iris dataframe by default
kruskal.test(Petal.Length,Species)
detach(iris) #undo attach() when you're done
#First method, second way: reference the dataframe directly
kruskal.test(iris$Petal.Length,iris$Species)
#Second method: formula
kruskal.test(Petal.Length~Species,data=iris)
#################################
####   All methods produce   ####
####     same results        ####
#################################

#Post-hoc for Kruskal-Wallis: Nemenyi test
#install.packages("PMCMR") #install this package first if you haven't before (only have to once)
library(PMCMR)
###if this package stops working, use PMCMRplus
posthoc.kruskal.nemenyi.test(iris$Petal.Length,iris$Species) #response variable first, then groups

#Wilcoxon test (Mann-Whitney U)

#Creates the two-species subset used in the example (with your own data, use your own two-group dataframe)
irisSubset <- subset(iris,Species!="versicolor")

#Wilcoxon test code
wilcox.test(Petal.Length~Species,data=irisSubset) #using subsetted data

#Plotting a boxplot
boxplot(Petal.Length~Species,data=iris,ylab="Petal Length (cm)",xlab="Iris species",main="")

#Boxplot with ticks inward
boxplot(Petal.Length~Species,data=iris,ylab="Petal Length (cm)",xlab="Iris species",main="",yaxt="n",xaxt="n")
axis(side=2,tcl=.2,at=NULL,labels=TRUE) #a positive tcl makes ticks point inward; for longer ticks, increase tcl
axis(side=1,tcl=.2,at=c(1,2,3),labels=c("setosa","versicolor","virginica"))
  #increase tick length (tcl) here also
  #the 'at' argument says where to put each tick along the axis, usually integers
  #the 'labels' argument places labels at ticks

Basic Nonparametric Tests

Michael Sinclair

May 21, 2019