Bootstrapping Measures of Biodiversity

This is a mosaic based version of my lab involving using bootstrapping to estimate Shannon's index of biodiversity. I will use an artificial example involving gummi bears and a real life example of amphibian biodiversity collected in 2009 as part of BioMaPS by Glenna Buford and Mandy Main.

First, I create a function called shannon to compute Shannon's index, \[ H=-\sum p_i \log(p_i) \] where \( p_i \) is the observed proportion of species \( i \).

shannon <- function(data) {
    counts <- table(data)
    p.i <- counts/sum(counts)
    H <- (-1) * sum(p.i * log(p.i))
    return(H)
}

Now, let us test the function with a silly artificial example involving gummi bears.

require(mosaic)
gummi.bears <- c(rep("red", 50), rep("green", 30), rep("giant yellow", 20))
tally(gummi.bears)

## 
## giant yellow        green          red        Total 
##           20           30           50          100

shannon(gummi.bears)

## [1] 1.03

Now, let us take a bootstrap sample of our species of gummi bears; that is, a sample with replacement of the same size.

b1 <- resample(gummi.bears)
tally(b1)

## 
## giant yellow        green          red        Total 
##           18           28           54          100

shannon(b1)

## [1] 0.9978

Now let's do this a bunch of times \( B \) and construct a sampling distribution of the diversity of our gummi bears. We then compute the mean, median, standard error, and a 95% bootstrapped confidence interval for Shannon's index. Finally, display it graphically using a histogram, a boxplot, and a density plot.

B <- 1000
sampDist <- do(B) * (shannon(resample(gummi.bears)))
head(sampDist)

##   result
## 1 1.0823
## 2 0.9804
## 3 1.0709
## 4 1.0402
## 5 0.9946
## 6 1.0252

ObsShannon <- shannon(gummi.bears)
ObsShannon

## [1] 1.03

bootMean <- mean(~result, data = sampDist)
bootMean

## [1] 1.021

bootMedian <- median(~result, data = sampDist)
bootMedian

## [1] 1.025

bootSE <- sd(~result, data = sampDist)
bootSE

## [1] 0.03633

bootCI <- qdata(c(0.025, 0.975), vals = result, data = sampDist)
bootCI

##   2.5%  97.5% 
## 0.9402 1.0768

xhistogram(~result, data = sampDist)

plot of chunk unnamed-chunk-4

bwplot(~result, data = sampDist)

plot of chunk unnamed-chunk-4

densityplot(~result, data = sampDist)

plot of chunk unnamed-chunk-4

Let's create a more elaborate function that will not just compute the observed Shannon's index for a dataset, but one that will also compute the bootstrapped mean, median, standard error, and confidence interval for a desired confidence level.

boot.shannon <- function(data, B = 1000, conf.level = 0.95) {
    shannon.boot <- do(B) * shannon(resample(data))
    boot.mean <- mean(~result, data = shannon.boot)
    boot.se <- sd(~result, data = shannon.boot)
    boot.median <- median(~result, data = shannon.boot)
    alpha <- 1 - conf.level
    boot.lci <- qdata(alpha/2, vals = result, data = shannon.boot)
    boot.uci <- qdata(1 - alpha/2, vals = result, data = shannon.boot)
    boot.stats <- c(boot.mean, boot.se, boot.median, boot.lci, boot.uci)
    names(boot.stats) <- c("Mean", "Std.Error", "Median", "Lower CI", "Upper CI")
    return(round(boot.stats, 3))
}

Let's demonstrate using the new function with the gummi.bears data. First, we use the default values of \( B=1000 \) resamples and 95% confidence, then change to \( B=10000 \) trials and 80% confidence.

boot.shannon(gummi.bears)

##      Mean Std.Error    Median  Lower CI  Upper CI 
##     1.020     0.038     1.024     0.932     1.079

boot.shannon(data = gummi.bears, B = 10000, conf.level = 0.8)

##      Mean Std.Error    Median  Lower CI  Upper CI 
##     1.019     0.038     1.024     0.969     1.065

Last, we'll read in Glenna and Mandy's example with amphibians at the Cassity Lane site, in rural Calloway County near homes and farm fields, and the Colson's Overlook site in LBL, which is more wooded and protected'.

cassity.lane <- c(rep("green.frog", 29), rep("red.spotted.newt", 6), rep("bull.frog", 
    3), rep("cricket.frog", 2), "fowlers.toad", "copes.gray.treefrog")
colson.overlook <- c(rep("copes.gray.treefrog", 38), rep("narrowmouth.toad", 
    5), rep("spotted.salamander", 3), "mole.salamander")

shannon(cassity.lane)

## [1] 1.045

boot.shannon(cassity.lane)

##      Mean Std.Error    Median  Lower CI  Upper CI 
##     0.973     0.172     0.977     0.618     1.288

shannon(colson.overlook)

## [1] 0.6678

boot.shannon(colson.overlook)

##      Mean Std.Error    Median  Lower CI  Upper CI 
##     0.635     0.139     0.640     0.350     0.899

In a subsequent document, we'll consider the problem of testing for a difference in diversity between Cassity Lane and Colson's Overlook, along with whether there is a significant difference in the diversity of make & state' (i.e. the make of your car and the state your license plate is from) on cars in student parking lots versus faculty parking lots. It will be fun.