In this R sessions, we used the bootstrap to estimate standard errors and confidence intervals of predictions of ordinary and mixed-effects linear models. I start with a simple demonstration of the bootstrap, and remind everyone of the concepts population, sample, sampling distribution and standard error.

Also read these notes from previous sessions for more on the bootstrap:

Understanding the bootstrap

Consider a simple random sampling design: there is some (unknown) population of items, and you wish to estimate the population mean by taking a random sample of 10 units. Clearly, if you take multiple random samples the mean estimated for each sample will differ. We know from statistical theory that the distribution of these sample means in fact follows a normal distribution (independent of the actual distribution of the population, this is the Central Limit Theorem!). The distribution of the sample means is an example of a sampling distribution, and it describes how different our subsequent samples are, taken randomly from the population.

Now, the standard deviation of the sampling distribution is known as the standard error. Normally we estimate the standard error using the familiar equation,

\[SE = s / \sqrt{n}\]

where s is the sample standard deviation, that is, the standard deviation calculated from your sample, and n is the sample size (10 in this example).

In this example, we make a population, take a sample and calculate the SE of the mean.

# An example population
Population <- rweibull(100000, shape=2, scale=20)

# A random sample of 10
Sample <- sample(Population, 10, replace=FALSE)

# The mean and SE of the mean
mean(Sample)
## [1] 22.48
sd(Sample)/sqrt(10)
## [1] 3.401

We said in the above that the standard error is the standard deviation of the sampling distribution. Since, in this case, we actually know the entire population (it is Population), and can test this assertion via simulation. We can simply take a sample of 10 a thousand times, calculate the sample mean each time, and look at the standard deviation across these 1000 sample means. Is it similar to our SE calculated for a single sample?

# Use replicate to do something lots of times
r <- replicate(1000, mean(sample(Population, 10, replace=F)))

# This is 'very similar' to the SE calculated above.
sd(r)
## [1] 2.93

I say it is ‘very similar’, of course it is certainly not the same. The SE calculated from the sample is a much worse estimate since it is based only on the sample itself. However, they are certainly close enough that we can believe the method. It is very useful here for yourself to play around with the sample size, and see what happens.

Now to the bootstrap. The simple idea of the bootstrap is that we can resample the sample itself, with replacement, and the resulting distribution is in fact a close estimate of the true sampling distribution. Sampling with replacement means that many of our resamples contain the same unit more than once. This is all rather bizarre, but is turns out to work very well.

# Resample our sample 999 times, and calculate the mean each time.
boot <- replicate(999, mean(sample(Sample, replace=T)))

# As before, the SD of the sampling distribution (here approximated via the bootstrap)
# is the standard error of the mean.
sd(boot)
## [1] 3.264

Confidence intervals

With the bootstrap, we calculate the confidence interval simply as the quantiles of the sampling distribution. Here you can see that, because the sampling distribution is normal, this calculation is very similar to the standard “twice the standard error” method (which is in fact based on the quantiles of the standard normal distribution).

# Bootstrap method
quantile(boot, probs=c(0.025, 0.975))
##  2.5% 97.5% 
## 16.69 29.14
# Pretty similar to twice the standard error calculated for the sample
c(mean(boot) - 2*sd(Sample)/sqrt(10), mean(boot) + 2*sd(Sample)/sqrt(10))
## [1] 15.73 29.33

The chick weight data

The previous example is not practically useful, since you know how to calculate the SE and CI for the population mean anyway. But the bootstrap can be used in pretty much any situation, and is particularly powerful for calculating a confidence interval when your usual assumptions are probably not met. In that case, the classical CI will often be too narrow, which can lead to wrong conclusions about significance.

I will illustrate this with the ChickWeight data (built-in dataset in base R, see ?ChickWeight). In this dataset we have weights over time of chicks being fed one of four diets. The same chicks were reweighed, so we have a clear random effects structure (because the weighings are not independent observations). Suppose I am interested in weight at time=15 for some reason, and want to test whether the diets have a significant effect on weight at that time. If I was just interested in time=0, I could study the fitted intercept of the model, and if I was interested in some centered mean I could use least-square means. But in this case I can use predict.

First I will demonstrate the use of the bootstrap for this situation for lm, where we can compare the resulting SE’s also to predict.lm, if we set se=TRUE.

# Fit a linear model (ignoring repeated measures!)
fit1 <- lm(weight ~ Time*Diet, data=ChickWeight)

# Make dataframe for which we want predictions (recall from the above we are interested in Time=15)
newdat <- data.frame(Time=15, Diet=levels(ChickWeight$Diet))

# And predict from the model, also get SE
p <- predict(fit1, newdat, se=TRUE)

# Inspect the estimated SEs:
p$se.fit
##     1     2     3     4 
## 2.765 3.631 3.631 3.710

Now we do the same using the bootstrap. When bootstrapping, we resample our data, fit the model, and repeat. This is fairly easy to write with lapply or similar, but we can also use bootCase from car, like this:

library(car)

# Fit the model 999 times, apply a function each time. In this case, we predict from the model.
bootfit1 <- bootCase(fit1, function(x)predict(x, newdat), B=999)

# If you inspect bootfit1 you will see it is a matrix with 999 rows, and four columns (the Diets).
# Get the column-wise standard deviations like this.
apply(bootfit1, 2, sd)
##     1     2     3     4 
## 3.700 5.989 5.736 3.265

These four values are the SE of the predicted value of chick weight at Time=15. Compared to the ones we calculated with predict above, they are much larger. It turns out, and you can check this for yourself, that we violated every single assumption in fitting the linear model to the ChickWeight data. Particularly in the case of non-constant variance, the SEs calculated by predict (and lm) are too small (technically this is the case when the variance increases, which is pretty typical).

When your regression assumptions are violated and you cannot find a solution, use the bootstrap as the standard errors calculated with that are still valid.

Bootstrapping a linear mixed-effects model

Finally, we repeat the above using a mixed-effects model. The approach is identical, just the details differ. Also, this may take quite a while on a slower machine (this takes about a minute on my fast desktop).

library(lme4)
## Loading required package: Matrix
## Loading required package: Rcpp
library(lmerTest)
## KernSmooth 2.23 loaded
## Copyright M. P. Wand 1997-2009
## 
## Attaching package: 'lmerTest'
## 
## The following object is masked from 'package:lme4':
## 
##     lmer
## 
## The following object is masked from 'package:stats':
## 
##     step
# Chicks are remeasured.
fit2 <- lmer(weight ~ Time*Diet + (Time|Chick),
             data=ChickWeight)

# quickly test if we need a random Time slope with rand from lmerTest
rand(fit2)
## Analysis of Random effects Table:
##            Chi.sq Chi.DF p.value    
## Time:Chick    688      2  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# bootMer from lme4 is similar to bootCase from car (but poorly documented)
# Make sure to set re.form=NA to output only predictions using the fixed effects.
bootfit2 <- bootMer(fit2, FUN=function(x)predict(x, newdat, re.form=NA),
                    nsim=999)
## Warning: Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
## Warning: Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
# We have to find that the $t element contains the predictions output by bootMer.
apply(bootfit2$t, 2, sd)
##      1      2      3      4 
##  9.005 12.843 12.298 12.344

You may receive some warnings about the model not converging, you can safely ignore that (the model won’t fit to some rare resampled datasets).

Once again the standard errors of the predictions are even larger. This makes sense because we included random Chick effect in the model. The added uncertainty comes from the fact that the Chicks differ a lot in weight at Time=15, and this uncertainty has an effect on the estimates of the fixed effects as well (which in turn, affects the predicted weight of the chicks at Time=15).

Prediction intervals

Finally, instead of just predicting at Time=15, it is now straightforward to predict across the entire range of the data, so we can plot confidence intervals around the prediction.

First we need a function that plots a transparent polygon if we give it three vectors: X, y1 (the upper part of the prediction interval), and y2 (the lower end). We can use polygon but it is rather annoying to use, so I wrote this simplifying function.

library(scales)

# Function to add a polygon if we have an X vector and two Y vectors of the same length.
addpoly <- function(x,y1,y2,col=alpha("lightgrey",0.8),...){
  ii <- order(x)
  y1 <- y1[ii]
  y2 <- y2[ii]
  x <- x[ii]
  polygon(c(x,rev(x)), c(y1, rev(y2)), col=col, border=NA,...)
}

We will redo the bootstrap, since this time we want predictions for all values of Time. If we call predict without a data argument, it will predict for all X values used when fitting the model.

bb <- bootMer(fit2,
              FUN=function(x)predict(x, re.form=NA),
              nsim=500)
## Warning: Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
## Warning: Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
## Warning: Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
# Take quantiles of predictions from bootstrap replicates.
# These represent the confidence interval of the mean at any value of Time.
ChickWeight$lci <- apply(bb$t, 2, quantile, 0.025)
ChickWeight$uci <- apply(bb$t, 2, quantile, 0.975)
ChickWeight$weightpred <- predict(fit2, re.form=NA)

# We will just plot one Diet for illustration
dat <- subset(ChickWeight, Diet == "1")

with(dat, {
  plot(Time,weight)
  addpoly(Time, lci,uci)
  lines(Time, weightpred)
})

plot of chunk unnamed-chunk-9