Bootstrap Example

For today’s example, I will demonstrate how to use the Bootstrap method to help evaluate statistics. The dataset in use will be from Larry Winner’s collection at http://www.stat.ufl.edu/~winner/datasets.html. The data are from R. J. Gladstone’s (1905) “A Study of the Relations of the Brain to the Size of the Head”, Biometrika, Vol. 4, pp105-123.

For this example I will use as a guide, Wikipedia’s article regarding brain size, https://en.wikipedia.org/wiki/Brain_size. The article refer’s to Hays-Gilpin and Whitley’s Reader in Gender Archaeology in stating that the average male adult brain weighs around 1345 grams, while the female brain has a weight of around 1222. Using these figures, we can arrive at a ratio for average male versus female brain size.

avg.males <- 1345
avg.females <- 1222
avg.ratio <- avg.males/avg.females
avg.ratio

## [1] 1.100655

Keeping this in mind, we can proceed to load the data and use a boxplot to compare the distribution of male and female brain weights. The dataset, brainhead.dat, contains 237 rows and the following parameters:

Gender 8 /* 1=Male, 2=Female */
Age Range 16 /* 1=20-46, 2=46+ */
Head size (cm^3) 21-24
Brain weight (grams) 29-32

bh <- read.table('brainhead.dat',sep='')
## Here we add column names to the data
names(bh) <- c('gender','ageRange','headSize','brainWt')
boxplot(brainWt~gender,data=bh,names=c('Males','Females'),main='Male vs. Female brain weights (grams)')

To gain some understanding on the ratio of male to female brain weights, we will perform bootstrap sampling for both male and female sets of measures and calculate ratios. We will then be able to graphically make sense of the data’s distribution. First, we will create variables for both male and female brain weights.

data.male <- bh$brainWt[bh$gender == 1]
data.female <- bh$brainWt[bh$gender == 2]

Then we can write a bit of code that will produce the ratios we are seeking.

boot.ratio <- function(data1,data2){
    ## create variables for the length of each vector
    l1 <- length(data1)
    l2 <- length(data2)
    ## now samples are made of each vector with replacement
    samp1 <- sample(data1,l1,replace=TRUE)
    samp2 <- sample(data2,l2,replace=TRUE)
    ## return a ratio of the mean of each sample
    return(mean(samp1)/mean(samp2))
    }

If coded correctly, appropriate results should occur below.

boot.ratio(data.male,data.female)

## [1] 1.086261

boot.ratio(data.male,data.female)

## [1] 1.094391

boot.ratio(data.male,data.female)

## [1] 1.09255

We can now, using the replicate() function, run the code 10000 times for 10000 sample ratios.

wtRatio <- replicate(10000,boot.ratio(data.male,data.female))

before displaying the data in a histogram, we will ascertain where the 2.5% and 97.5% quantiles are for the data. Then we can plot out the histogram, breaks for the quantiles, and add a point for where the estimate from Wikipedia lies.

qr <- quantile(wtRatio,c(.025,.975))
hist(wtRatio,breaks=100,main='Histogram of male vs. female brain weights (grams)',xlab='male brain wt./female brain wt.')
abline(v=qr[1],col='red',lty='dashed');abline(v=qr[2],col='red',lty='dashed')
points(avg.ratio,0,pch=19,col='red')

We can then see that, though the value from Wikipedia does not fall at the mean which is 1.0923985, it still falls between 2.5% and 95% of the data. This is not suprising as the article did note a fair amount of variability within genders.

Bootstrap Example

Pierre Phanor

September 24, 2015