(Instructor : Nishant Panda)

Additional References

  1. (IB) : An Introduction to the Bootstrap, Efron and Tibshirani, Chapman & Hall/CRC

Introduction

One of the most important tasks in statistical analysis (arguably the most important!) is understanding the properties of an estimator. Typically, one studies toy problems where analytical results can be derived for the estimators at hand. Although this is an important pedagogical exercise, in practice one encounters complicated estimators whose analytical properties cannot be derived (or are difficult to derive). A simulation study then becomes paramount, and the bootstrap is one such extremely useful and popular technique that we will explore in these lectures.

A simple estimator and the plug-in principle.

Let \(X \sim F\) be a random variable (could be multidimensional!) whose c.d.f is given by \(F(x)\). Let us denote the expectation and variance of \(X\) by \(\mu_{F}\) and \(\sigma^{2}_{F}\) to emphasize the distribution.

\[ \mu_{F} = \mathbb{E}_{F}\left[X\right], \qquad \sigma^{2}_{F} =\mathbb{E}_{F}\left[(X - \mu_F)^{2}\right]. \] Let \(X_1, X_2, \dots, X_n\) be a sample of \(X\) of size \(n\). If we want to estimate \(\mu_F\) from this sample, a popular estimator is the sample mean \(\overline{X}\), \[ \overline{X} = \frac{1}{n}\sum\limits_{i = 1}^{n}X_i. \] How “good” an estimator is this? The question is vague right now, but typically we would like to know \(\mathbb{E}\left[\overline{X}\right]\) and \(var\left[\overline{X}\right]\). It is not hard to show that \[ \mathbb{E}\left[\bar{X}\right] = \mu_{F}, \] and \[ var\left[\bar{X}\right] = \frac{\sigma^2_{F}}{n}. \] The standard error \(SE_{F}(\bar{X})\) is then \[ SE_{F}(\bar{X}) = \sqrt{var\left[\bar{X}\right]} = \frac{\sigma_F}{\sqrt{n}}. \]
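Before moving on, here is a small simulation sketch (not from the notes: it assumes a toy distribution \(X \sim \text{Exponential}(2)\), for which \(\sigma_F = 1/2\)) that checks the formula \(SE_{F}(\bar{X}) = \sigma_F/\sqrt{n}\) numerically.

set.seed(1)
n <- 50
# assumed toy distribution: X ~ Exp(rate = 2), so sigma_F = 1/2
xbars <- replicate(5000, mean(rexp(n, rate = 2)))
sd(xbars)          # simulated standard error of the sample mean
(1 / 2) / sqrt(n)  # theoretical SE_F(Xbar) = sigma_F / sqrt(n)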

However, these are precisely the quantities that we don’t know. In particular, we usually don’t know the population distribution (c.d.f) \(F\). Hence, we need to approximate (estimate) these quantities in order to talk about how good an estimator \(\overline{X}\) is. The most common estimate of \(\mu_{F}\) is the sample mean \(\bar{X}\), and of \(\sigma^2_{F}\) is the sample variance \(S^2\). However, these are not the only approximations!

Note that we can get an estimate of \(F\) from the data! We have seen this before. Let \(\widehat{F} = F_e\) be the empirical c.d.f \[ \widehat{F}(x) = \frac{1}{n}\sum\limits_{i = 1}^{n}\mathbb{I}_{(X_i \leq x)}. \] The plug-in principle says that any quantity that depends on \(F\) can be approximated by the corresponding quantity computed from \(\widehat{F}\)! Expectations behave nicely with respect to \(\widehat{F}\). If \(\widehat{F}\) is the empirical c.d.f, then \[ \mathbb{E}_{\widehat{F}}\left[g(X)\right] = \frac{1}{n}\sum\limits_{i = 1}^{n}g(X_i). \]
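As a quick illustration (this snippet is a sketch with made-up data; \(g(x) = x^2\) is an arbitrary choice), the plug-in expectation under the empirical c.d.f is just a sample average:

set.seed(1)
x <- rnorm(20)        # illustrative data
g <- function(x) x^2  # an arbitrary function of X
Fhat <- ecdf(x)       # the empirical c.d.f as a step function
Fhat(0)               # proportion of data points <= 0, i.e. Fhat evaluated at 0
mean(g(x))            # E_Fhat[g(X)] = (1/n) * sum_i g(X_i)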

Let us look at an example:

Example 1 (Approximating the standard error):

Let \(X\sim Bernoulli(p)\) with C.D.F \(F\). Let \(X_1, X_2, \dots, X_n\) be a sample of \(X\) of size \(n\). A good estimator of \(p\) is \(\overline{X}\).

What are \(\mu_F\) and \(\sigma^2_{F}\)? Since \(X\) is Bernoulli, \(\mu_F = p\) and \(\sigma^2_{F} = p(1-p)\).

Note that \[ S = \sum\limits_{i = 1}^{n} X_i \sim Bin(n, p) \]

\(\mathbb{E}\left[\bar{X}\right] = p\) and \(var\left[\bar{X}\right] = \frac{p(1-p)}{n}\). However, we don’t know \(p\). The plug-in principle says that we use an approximate c.d.f \(\widehat{F}\) to approximate these quantities. Let \(\widehat{F} = F_e\) be the empirical c.d.f; then \[ \hat{p} = \mu_{\widehat{F}} = \mathbb{E}_{\widehat{F}}\left[X\right] = \bar{X}. \]

\[ \sigma^2_{\widehat{F}} = var_{\widehat{F}}\left[X \right] = \mathbb{E}_{\widehat{F}}\left[(X - \mu_{\widehat{F}})^2\right] \]

Since \(\widehat{F}\) is the empirical c.d.f, we get \[ \sigma^2_{\widehat{F}} = \frac{1}{n}\sum\limits_{i = 1}^{n}(X_i - \hat{p})^2. \] (Here we also used the plug-in estimate \(\hat{p}\) for \(\mu_{F}\). Note that this estimate of the variance is different from \(S^2\)!) This lets us approximate the standard error: \[ SE_{\widehat{F}}(\bar{X}) = \frac{\sigma_{\widehat{F}}}{\sqrt{n}}. \]
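For concreteness, here is a short sketch of the plug-in computation in Example 1, using simulated Bernoulli data with an assumed true \(p = 0.3\):

set.seed(1)
n <- 100
x <- rbinom(n, size = 1, prob = 0.3)  # simulated Bernoulli sample (assumed p = 0.3)
p.hat <- mean(x)                      # plug-in estimate of mu_F
sigma2.hat <- mean((x - p.hat)^2)     # plug-in variance, equals p.hat * (1 - p.hat)
sqrt(sigma2.hat / n)                  # plug-in standard error of Xbar
sd(x) / sqrt(n)                       # compare: the usual estimate based on S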

In this example, we could explicitly carry out the plug-in estimate. In most real cases, this won’t be possible. The Bootstrap estimate will allow us to estimate the standard errors (and other statistical properties) of an arbitrary estimator \(\widehat{\theta}\).

Overview and notation overload!

We employ statistical theory to derive the sampling distribution of a statistic (in particular, an estimator). From the sampling distribution, we can obtain various statistical properties such as the variance and bias, construct confidence intervals, etc.

Biggest challenge: What if the sampling distribution is impossible to obtain, or what if the asymptotic theory doesn’t hold?

The main idea of bootstrapping is to estimate the sampling distribution of the statistic using the data.

Let \(X \sim F\) with \(F\) being the c.d.f of \(X\). Let \(X_1, X_2, \dots, X_n\) be a sample of size \(n\). Let \(\theta = T(F)\) be some parameter of interest and let \(\widehat{\theta}\) be an estimator. The bootstrap method uses bootstrap samples to estimate statistical properties of \(\widehat{\theta}\) using an estimated c.d.f \(\widehat{F}\). For example, if we are interested in the standard error \(SE_{F}(\widehat{\theta})\), the bootstrap will approximate the plug-in estimate \(SE_{\widehat{F}}(\widehat{\theta})\).

The bootstrap algorithm is as follows:
1. Draw a bootstrap sample of size \(n\) from \(\widehat{F}\): \(y_1^{\ast}, y_2^{\ast}, \dots, y_n^{\ast}\).
2. Compute the statistic \(\widehat{\theta}_{r}^{\ast}\) from the bootstrap sample \(y_1^{\ast}, y_2^{\ast}, \dots, y_n^{\ast}\).
3. Repeat the procedure \(R\) times to get \(\widehat{\theta}_{1}^{\ast}, \dots, \widehat{\theta}_{R}^{\ast}\). Make inferences (standard error, histogram, etc.).

The key step : How do we draw bootstrap samples? This leads to two different approaches.

  1. Non-parametric bootstrap: \(\widehat{F} = F_e\) is the empirical c.d.f. In this case, drawing bootstrap samples is equivalent to drawing samples with replacement from the data! (A minimal hand-rolled sketch is given after this list.)

  2. Parametric bootstrap: \(\widehat{F}\) is obtained from a parametric model (e.g. maximum likelihood) and we use simulation methods (Monte Carlo!) to draw samples from \(\widehat{F}\).
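Before turning to the boot package, here is a minimal hand-rolled sketch of the non-parametric bootstrap (the data are simulated and the sample median is used as the statistic, purely for illustration):

set.seed(1)
x <- rexp(30)  # illustrative data set
R <- 1000      # number of bootstrap replicates

# each replicate: resample the data with replacement and recompute the median
theta.star <- replicate(R, median(sample(x, replace = TRUE)))

sd(theta.star)   # bootstrap estimate of the standard error of the median
hist(theta.star) # bootstrap approximation of the sampling distribution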

A simple illustration using the boot package in R

data("trees")
library(boot)

head(trees$Girth)
## [1]  8.3  8.6  8.8 10.5 10.7 10.8
# our statistic of interest is the sample median; boot() passes the
# resampled indices in d
sample.med <- function (x, d){
  return(median(x[d]))
}

# get the bootstrap object
b <- boot(data = trees$Girth, statistic = sample.med, R = 1000)

b
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = trees$Girth, statistic = sample.med, R = 1000)
## 
## 
## Bootstrap Statistics :
##     original  bias    std. error
## t1*     12.9  -0.299   0.8732133
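The bootstrap replicates \(\widehat{\theta}_{1}^{\ast}, \dots, \widehat{\theta}_{R}^{\ast}\) are stored in b$t, so the reported standard error is just their standard deviation. As a quick follow-up sketch (the exact numbers depend on the random seed), we can recompute it by hand and get a percentile confidence interval with boot.ci:

sd(b$t[, 1])               # matches the "std. error" column reported above
boot.ci(b, type = "perc")  # percentile bootstrap confidence interval for the median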