Bootstrapping Measures of Biodiversity

This is a mosaic based version of my lab involving using bootstrapping to estimate Shannon's index of biodiversity. I will use an artificial example involving gummi bears and a real life example of amphibian biodiversity collected in 2009 as part of BioMaPS by Glenna Buford and Mandy Main.

First, I create a function called shannon to compute Shannon's index, \[ H=-\sum p_i \log(p_i) \] where \( p_i \) is the observed proportion of species \( i \).

shannon <- function(data) {
    counts <- table(data)
    p.i <- counts/sum(counts)
    H <- (-1) * sum(p.i * log(p.i))
    return(H)
}

Now, let us test the function with a silly artificial example involving gummi bears.

require(mosaic)
gummi.bears <- c(rep("red", 50), rep("green", 30), rep("giant yellow", 20))
tally(gummi.bears)
## 
## giant yellow        green          red        Total 
##           20           30           50          100
shannon(gummi.bears)
## [1] 1.03

Now, let us take a bootstrap sample of our species of gummi bears; that is, a sample with replacement of the same size.

b1 <- resample(gummi.bears)
tally(b1)
## 
## giant yellow        green          red        Total 
##           18           28           54          100
shannon(b1)
## [1] 0.9978

Now let's do this a bunch of times \( B \) and construct a sampling distribution of the diversity of our gummi bears. We then compute the mean, median, standard error, and a 95% bootstrapped confidence interval for Shannon's index. Finally, display it graphically using a histogram, a boxplot, and a density plot.

B <- 1000
sampDist <- do(B) * (shannon(resample(gummi.bears)))
head(sampDist)
##   result
## 1 1.0823
## 2 0.9804
## 3 1.0709
## 4 1.0402
## 5 0.9946
## 6 1.0252
ObsShannon <- shannon(gummi.bears)
ObsShannon
## [1] 1.03
bootMean <- mean(~result, data = sampDist)
bootMean
## [1] 1.021
bootMedian <- median(~result, data = sampDist)
bootMedian
## [1] 1.025
bootSE <- sd(~result, data = sampDist)
bootSE
## [1] 0.03633
bootCI <- qdata(c(0.025, 0.975), vals = result, data = sampDist)
bootCI
##   2.5%  97.5% 
## 0.9402 1.0768
xhistogram(~result, data = sampDist)

plot of chunk unnamed-chunk-4

bwplot(~result, data = sampDist)

plot of chunk unnamed-chunk-4

densityplot(~result, data = sampDist)

plot of chunk unnamed-chunk-4

Let's create a more elaborate function that will not just compute the observed Shannon's index for a dataset, but one that will also compute the bootstrapped mean, median, standard error, and confidence interval for a desired confidence level.

boot.shannon <- function(data, B = 1000, conf.level = 0.95) {
    shannon.boot <- do(B) * shannon(resample(data))
    boot.mean <- mean(~result, data = shannon.boot)
    boot.se <- sd(~result, data = shannon.boot)
    boot.median <- median(~result, data = shannon.boot)
    alpha <- 1 - conf.level
    boot.lci <- qdata(alpha/2, vals = result, data = shannon.boot)
    boot.uci <- qdata(1 - alpha/2, vals = result, data = shannon.boot)
    boot.stats <- c(boot.mean, boot.se, boot.median, boot.lci, boot.uci)
    names(boot.stats) <- c("Mean", "Std.Error", "Median", "Lower CI", "Upper CI")
    return(round(boot.stats, 3))
}

Let's demonstrate using the new function with the gummi.bears data. First, we use the default values of \( B=1000 \) resamples and 95% confidence, then change to \( B=10000 \) trials and 80% confidence.

boot.shannon(gummi.bears)
##      Mean Std.Error    Median  Lower CI  Upper CI 
##     1.020     0.038     1.024     0.932     1.079
boot.shannon(data = gummi.bears, B = 10000, conf.level = 0.8)
##      Mean Std.Error    Median  Lower CI  Upper CI 
##     1.019     0.038     1.024     0.969     1.065

Last, we'll read in Glenna and Mandy's example with amphibians at the Cassity Lane site, in rural Calloway County near homes and farm fields, and the Colson's Overlook site in LBL, which is more wooded and protected'.

cassity.lane <- c(rep("green.frog", 29), rep("red.spotted.newt", 6), rep("bull.frog", 
    3), rep("cricket.frog", 2), "fowlers.toad", "copes.gray.treefrog")
colson.overlook <- c(rep("copes.gray.treefrog", 38), rep("narrowmouth.toad", 
    5), rep("spotted.salamander", 3), "mole.salamander")

shannon(cassity.lane)
## [1] 1.045
boot.shannon(cassity.lane)
##      Mean Std.Error    Median  Lower CI  Upper CI 
##     0.973     0.172     0.977     0.618     1.288
shannon(colson.overlook)
## [1] 0.6678
boot.shannon(colson.overlook)
##      Mean Std.Error    Median  Lower CI  Upper CI 
##     0.635     0.139     0.640     0.350     0.899

In a subsequent document, we'll consider the problem of testing for a difference in diversity between Cassity Lane and Colson's Overlook, along with whether there is a significant difference in the diversity of make & state' (i.e. the make of your car and the state your license plate is from) on cars in student parking lots versus faculty parking lots. It will be fun.