Handling violations of assumptions: R code for Chapter 13 examples

Note: This document was converted to R-Markdown from this page by M. Drew LaMar. You can download the R-Markdown here.

Download the R code on this page as a single file here

New methods

The variable names are plucked from the examples further below.

Data transformations

marinelnBiomassRatio <- log(marine$biomassRatio)

Quantile plots

qqnorm(marine$biomassRatio, datax = TRUE)

Mann-Whitney $U$ -test (R uses the equivalent Wilcoxon rank-sum test)

wilcox.test(timeToMating ~ feedingStatus, data = cannibalism)

Other new methods

Shapiro-Wilk test of normality.
Sign test.
Permutation test.

Example 13.1. Biomass in marine reserves

The normal quantile plot, Shapiro-Wilk test of normality, and the log transformation, investigating the ratio of biomass between marine reserves and non-reserve control areas.

Read and inspect the data.

marine <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter13/chap13e1MarineReserve.csv"))
head(marine)

##   biomassRatio
## 1         1.34
## 2         1.96
## 3         2.49
## 4         1.27
## 5         1.19
## 6         1.15

Histogram of biomass ratio (Figure 13.1-4).

hist(marine$biomassRatio, right = FALSE, col = "firebrick", las = 1)

Normal quantile plot of biomass ratio data.

qqnorm(marine$biomassRatio, datax = TRUE)

Commands for a normal quantile plot with more options (Figure 13.1-4):

qqnorm(marine$biomassRatio, datax = TRUE, pch = 16, col = "firebrick", las = 1, ylab = "Biomass ratio", xlab = "Normal quantile", main = "")

Shapiro-Wilk test of normality.

shapiro.test(marine$biomassRatio)

## 
##  Shapiro-Wilk normality test
## 
## data:  marine$biomassRatio
## W = 0.81751, p-value = 8.851e-05

Natural log-transformation and resulting confidence interval of the a mean of marine biomass ratio.

Log-transformation. The function log() takes the natural logarithm of all the elements of a vector or variable. The following command puts the results into a new variable in the same data frame, marine.

marine$lnBiomassRatio <- log(marine$biomassRatio)

Histogram of the log-transformed marine biomass ratio (Figure 13.3-2) .

hist(marine$lnBiomassRatio, right = FALSE, col = "firebrick", las = 1)

95% confidence interval of the mean using the log-transformed data.

t.test(marine$lnBiomassRatio)$conf.int

## [1] 0.3470180 0.6112365
## attr(,"conf.level")
## [1] 0.95

Back-transform lower and upper limits of confidence interval (exp is the inverse of the log function).

exp(t.test(marine$lnBiomassRatio)$conf.int)

## [1] 1.414842 1.842708
## attr(,"conf.level")
## [1] 0.95

Example 13.4. Sexual conflict and speciation

The sign test, comparing the numbers of species in 25 pairs of closely related insect taxa.

Read and inspect the data.

conflict <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter13/chap13e4SexualConflict.csv"))
head(conflict)

##   taxonPair nSpeciesMultipleMating nSpeciesSingleMating difference
## 1         A                     53                   10         43
## 2         B                     73                  120        -47
## 3         C                    228                   74        154
## 4         D                    353                  289         64
## 5         E                    157                   30        127
## 6         F                    300                    4        296

Histogram of the difference in numbers of species (Figure 13.4-1).

hist(conflict$difference, right = FALSE, col = "firebrick", breaks = 50, xlab = "Difference in species number", las = 1)

Count up the frequency of differences that are below, equal to, and above zero. The first command below creates a new variable called zero in the data frame and sets all elements to “equal”. This is just to get started. The next two commands overwrite those elements corresponding to taxon pairs with a difference greater than zero, or less than zero, respectively.

conflict$zero <- "equal"
conflict$zero[conflict$difference > 0] <- "above"
conflict$zero[conflict$difference < 0] <- "below"
conflict$zero <- factor(conflict$zero, levels = c("below", "equal", "above"))
table(conflict$zero)

## 
## below equal above 
##     7     0    18

Sign test. The sign test is just a binomial test. The result includes a confidence interval for the proportion using the Clopper-Pearson method, which isn’t covered in the book.

binom.test(7, n = 25, p = 0.5)

## 
##  Exact binomial test
## 
## data:  7 and 25
## number of successes = 7, number of trials = 25, p-value = 0.04329
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.1207167 0.4938768
## sample estimates:
## probability of success 
##                   0.28

Example 13.5. Cricket sexual cannibalism

The Wilcoxon rank-sum test (equivalent to the Mann-Whitney $U$ -test) comparing times to mating (in hours) of starved and fed female sagebrush crickets. We also apply the permutation test to the same data.

Read and inspect data.

cannibalism <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter13/chap13e5SagebrushCrickets.csv"))
head(cannibalism)

##   feedingStatus timeToMating
## 1       starved          1.9
## 2       starved          2.1
## 3       starved          3.8
## 4       starved          9.0
## 5       starved          9.6
## 6       starved         13.0

Multiple histograms (Figure 13.5-1) using the lattice package

library(lattice)
histogram( ~ timeToMating | feedingStatus, data = cannibalism, layout = c(1,2), col = "firebrick", breaks = seq(0, 100, by = 20), type = "count", xlab = "Time to mating (hours)", ylab = "Frequency")

Commands for multiple histograms in basic R:

oldpar = par(no.readonly = TRUE) # make backup of default graph settings
par(mfrow = c(2,1), oma = c(4, 6, 2, 6), mar = c(3, 4, 3, 2))
hist(cannibalism$timeToMating[cannibalism$feedingStatus == "starved"], col = "firebrick", las = 1, breaks = seq(0, 100, by = 20), main = "starved", ylab = "Frequency")
hist(cannibalism$timeToMating[cannibalism$feedingStatus == "fed"], col = "firebrick", las = 1, breaks = seq(0, 100, by = 20), main = "fed", ylab = "Frequency")
mtext("Time to mating (hours)", side = 1, outer = TRUE, padj = 0)

par(oldpar) # revert to default graph settings

Wilcoxon rank-sum test (equivalent to Mann-Whitney $U$ -test)

wilcox.test(timeToMating ~ feedingStatus, data = cannibalism)

## 
##  Wilcoxon rank sum test
## 
## data:  timeToMating by feedingStatus
## W = 88, p-value = 0.3607
## alternative hypothesis: true location shift is not equal to 0

Permutation test of the difference between mean time to mating of starved and fed crickets.

Begin by calculating the observed difference between means (starved minus fed). The difference is -18.25734 in this data set.

cricketMeans <- tapply(cannibalism$timeToMating, cannibalism$feedingStatus, mean)
cricketMeans

##      fed  starved 
## 35.98462 17.72727

diffMeans <- cricketMeans[2] - cricketMeans[1]
diffMeans

##   starved 
## -18.25734

Decide on the number of permutations.

nPerm <- 10000

Create a loop to permute the data many times (determined by nperm). In the loop, i is just a counter that goes from 1 to nPerm by 1; each permuted difference is saved in the permResult.

permResult <- vector() # initializes
for(i in 1:nPerm){
    # step 1: permute the times to mating
    permSample <- sample(cannibalism$timeToMating, replace = FALSE)
    # step 2: calculate difference betweeen means
    permMeans <- tapply(permSample, cannibalism$feedingStatus, mean)
    permResult[i] <- permMeans[2] - permMeans[1]
    }

Plot null distribution based on the permuted differences (Figure 13.8-1).

hist(permResult, right = FALSE, breaks = 100)

Use the null distribution to calculate an approximate $P$ -value. This is the twice the proportion of the permuted means that fall below the observed difference in means, diffMeans (-18.25734 in this example). The following code calculates the number of permuted means falling below diffMeans.

sum(as.numeric(permResult <= diffMeans))

## [1] 661

These commands obtain the fraction of permuted means falling below diffMeans.

sum(as.numeric(permResult <= diffMeans)) / nPerm

## [1] 0.0661

Finally, multiply by 2 to get the $P$ -value for a two-sided test.

2 * ( sum(as.numeric(permResult <= diffMeans)) / nPerm )

## [1] 0.1322

Handling violations of assumptions: R code for Chapter 13 examples

Michael Whitlock and Dolph Schluter

New methods

Example 13.1. Biomass in marine reserves

Example 13.4. Sexual conflict and speciation

Example 13.5. Cricket sexual cannibalism