(Instructor : Nishant Panda)

Additional References

  1. (IB): An Introduction to the Bootstrap, Efron and Tibshirani, Chapman & Hall/CRC

Introduction

Last lecture we used the bootstrap technique to estimate the standard error of an estimator. In this lecture, we go further and see how to use the bootstrap to estimate bias and construct confidence intervals. The following notation from last lecture is repeated here for completeness.

Let \(X \sim F\) be a random variable (possibly multidimensional!) whose c.d.f. is given by \(F(x)\). Let us denote the expectation and variance of \(X\) by \(\mu_{F}\) and \(\sigma^{2}_{F}\) to emphasize the distribution. These are parameters of the distribution \(F\). In general, we write a parameter \(\theta\) as a function of the distribution \(F\), denoted \(\theta = T(F)\) or \(\theta_{F}\) for short. Let \(X_1, X_2, \dots, X_n\) be a random sample of size \(n\) of \(X\). An estimator \(\widehat{\theta}\) is a function of the sample, i.e. \(\widehat{\theta} = h(X_1, X_2, \dots, X_n)\). For example, if \(\theta_{F} = \mu_F\), then \(\widehat{\theta} = \overline{X}\) is an estimator for the mean. Similarly, if \(\theta_{F} = \sigma^{2}_F\), then \(\widehat{\theta} = S^2\) is an estimator for the variance. Let \(\widehat{F}\) be an estimated c.d.f.; then \(\widehat{\theta}_{\widehat{F}}\) is the plug-in estimator for \(\theta\).

As before, if \(\theta_{F} = \sigma^2_{F}\), then \(S^2\) is an estimator for the variance, given by \[ \widehat{\theta} = S^2 = \frac{1}{n-1}\sum\limits_{i = 1}^{n}(X_i - \overline{X})^2. \] But the plug-in estimator using the empirical c.d.f. is \[ \widehat{\theta_{\widehat{F}}} = \sigma^2_{\widehat{F}} = \frac{1}{n}\sum\limits_{i = 1}^{n}(X_i - \overline{X})^2. \]
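
The two differ only by the factor \((n-1)/n\). As a minimal R illustration (the data vector x below is made up purely for this check):

# plug-in variance vs. sample variance: they differ by a factor (n-1)/n
x <- c(2.1, 4.3, 3.7, 5.0, 2.9)        # made-up data, for illustration only
n <- length(x)
s2 <- var(x)                           # S^2, the unbiased estimator
s2.plugin <- mean((x - mean(x))^2)     # plug-in estimator under F hat
all.equal(s2.plugin, (n - 1)/n * s2)   # TRUE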

Estimating Bias through Bootstrapping

Let \(X \sim F\), and let \(X_1, X_2, \dots, X_n\) be a random sample of \(X\). Let \(\widehat{F}\) be an estimated c.d.f., and say we are interested in estimating some parameter \(\theta_{F}\) from the sample through an estimator \(\widehat{\theta}\). The plug-in estimator is denoted by \(\widehat{\theta_{\widehat{F}}}\). We define the bias of the estimator \(\widehat{\theta}\) as

\[ \beta_{F}(\widehat{\theta}, \theta) = \mathbb{E}_{F}\left[\widehat{\theta}\right] - \theta_{F}. \] As you may have guessed, the bootstrap estimate of the bias is given by the plug-in version, \[ \beta_{\widehat{F}}(\widehat{\theta}, \theta) = \mathbb{E}_{\widehat{F}}\left[\widehat{\theta}\right] - \widehat{\theta_{\widehat{F}}}. \]

Note that the expectation can be approximated by Monte Carlo,

\[ \mathbb{E}_{\widehat{F}}\left[\widehat{\theta}\right] \approx \frac{1}{R}\sum\limits_{j = 1}^{R}\widehat{\theta(\mathbf{x^{\ast}_{j}})} \] where the notation is from last lecture. Thus, the bootstrap bias estimate of \(\widehat{\theta}\) is

\[ \beta_{\widehat{F}}(\widehat{\theta}, \theta) \approx \frac{1}{R}\sum\limits_{j = 1}^{R}\widehat{\theta(\mathbf{x^{\ast}_{j}})} - \widehat{\theta_{\widehat{F}}}. \]

Example 1: Estimate the bias of \(S^2\) from last lecture using the bootstrap.

# Observed data from Last Lecture
set.seed(42)
n <- 40
obs.data <- rnorm(n, mean = 5, sd = 2)
# First let us create our estimator
theta.hat <- function(sample){
  return (var(sample))
}
# Set the number of bootstrap samples R
R <- 1e3
# create a vector for sampling distribution
theta.hat.dist <- rep(0, R)

# do the bootstrap to get the sampling distribution
for (boot in 1:R){
  
  # get sample of size n from F hat.
  x.star <- sample(obs.data, n, replace = TRUE)
  
  # get the estimated value from the estimator
  theta.hat.dist[boot] <- theta.hat(x.star)
}
# helper for the plug-in estimator: squared deviation from x_bar
g <- function(x, x_bar){
  (x - x_bar)^2
}

# Compute the bias

# Get the expected value of the estimator from bootstrap
mu.Fhat <- mean(theta.hat.dist)

# Get the plug-in estimator
theta.Fhat <- sum(sapply(obs.data, g, x_bar = mean(obs.data)))/n

# Compute the bias as the difference
bias.boot <- mu.Fhat - theta.Fhat
bias.boot
## [1] -0.03825479

Let us check if we did the above correctly with the boot package in R

library(boot)
# our test statistic is the sample variance
sample.var <- function (x, d){
  return(var(x[d]))
}

# get the bootstrap object
b <- boot(obs.data, sample.var, R)

b
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = obs.data, statistic = sample.var, R = R)
## 
## 
## Bootstrap Statistics :
##     original     bias    std. error
## t1* 5.976929 -0.1527511    1.195859

The bias seems different! The boot package seems to be computing the bias as \[ \beta_{\widehat{F}}(\widehat{\theta}, \theta) = \mathbb{E}_{\widehat{F}}\left[\widehat{\theta}\right] - \widehat{\theta} \]

That is, it does not use the plug-in estimator in the last term but the original estimator. If \(\widehat{\theta}\) were a plug-in estimator, the boot package would have given the same answer. (There is no way for code to know the plug-in estimator corresponding to a general estimator; it can only use the estimator you pass in as an argument.)

# Compute the bias as the difference between
# bootstrap expectation and original estimator
bias.boot.new <- mu.Fhat - var(obs.data)
print(bias.boot.new)
## [1] -0.187678

This should not be a problem when your observed data set is large, but it is something to keep in mind.

(Home Assignment!) For the obs.data here, assume that your estimator is not \(S^2\) but in fact the plug-in estimator \[\sigma^2_{\widehat{F}} = \frac{1}{n}\sum\limits_{i = 1}^{n}(X_i - \overline{X})^2.\] That is, you are estimating \(\sigma^2\) using \(\sigma^2_{\widehat{F}}\). Mimic the code above and compute the standard error and bias of \(\sigma^2_{\widehat{F}}\). Also check with the boot package.

Bootstrap Confidence Interval

Percentile Confidence Intervals

The simplest bootstrap confidence interval can be described as follows. If \(\theta\) is the parameter of interest, \(\widehat{\theta}\) its estimator, and \(\lbrace \widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace\) are the bootstrap estimates for \(1 \leq j \leq R\), then the sampling distribution of \(\widehat{\theta}\) is approximated by the histogram of \(\lbrace \widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace\). Say we want a \(90\%\) confidence interval. Then from the histogram we take the \(5^{th}\) and \(95^{th}\) percentiles, and these form the lower and upper ends of the interval. In math lingo, if we seek a confidence level \((1 - \alpha)\), let \[ m = \left[\frac{\alpha}{2}R\right] \]

where \(\left[\cdot\right]\) denotes the greatest integer function. Then, to compute the \((1 - \alpha)\) confidence interval,

  1. Order the bootstrap estimates \(\lbrace \widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace\) in increasing order \[ \widehat{\theta^{\ast}_{(1)}}, \widehat{\theta^{\ast}_{(2)}}, \ldots, \widehat{\theta^{\ast}_{(R)}} \]

  2. The confidence interval is given by \[ \left(\widehat{\theta^{\ast}_{(m)}}, \widehat{\theta^{\ast}_{(R+1-m)}}\right) \]

This is easier to code than to write!

Example 2: Construct a 95% confidence interval for \(\theta\) using the estimator \(\widehat{\theta}\) from Example 1.

# We already did all of the work here.
bounds <- quantile(theta.hat.dist, c(0.025, (1 - 0.025)))
sprintf("(%s, %s)", round(bounds[1],2), round(bounds[2],2))
## [1] "(3.54, 8.23)"

As a sanity check (that is, using the order statistics directly):

m <- floor(R*0.025)
sorted.thetaH <- sort(c(theta.hat.dist, var(obs.data)))
sprintf("(%s, %s)", round(sorted.thetaH[m], 2), round(sorted.thetaH[(R+1-m)], 2))
## [1] "(3.54, 8.23)"

Let us check this using the boot package

# b was our bootstrap object from before
boot.ci(b, type = "perc")
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = b, type = "perc")
## 
## Intervals : 
## Level     Percentile     
## 95%   ( 3.523,  8.216 )  
## Calculations and Intervals on Original Scale

Studentized Bootstrap

The percentile bootstrap interval can be too narrow in practice (undercoverage). The studentized bootstrap (also known as bootstrap-t) is a technique that mitigates this flaw. First, some theory. For an unbiased estimator \(\widehat{\theta}\) of \(\theta\), if the C.L.T. holds, then \[ Z = \frac{\widehat{\theta} - \theta}{SE} \sim AN(0, 1). \] But \(SE\) is typically not known. “Student” found that if \(\widehat{\theta} = \overline{X}\) for data drawn from a normal distribution, then \[ T = \frac{\widehat{\theta} - \theta}{\widehat{SE}} \sim t_{n-1}. \] For an arbitrary estimator \(\widehat{\theta}\), this need not be true. The bootstrap-t method approximates the distribution of \(T\) directly from the data, using the fact that the distribution of \(\widehat{\theta} - \theta\) is close to that of \(\widehat{\theta}^{\ast} - \widehat{\theta}\), where \(\widehat{\theta}^{\ast}\) is the bootstrap version of \(\widehat{\theta}\), i.e. \[ T \approx \frac{\widehat{\theta}^{\ast} - \widehat{\theta}}{\widehat{SE^{\ast}}} \] where \(\widehat{SE^{\ast}}\) is the plug-in estimate of \(SE^{\ast}\). Here is an algorithm to get the distribution of \(T\):

  1. Get a bootstrap sample \(\mathbf{x^{\ast}} = \left(x_1^{\ast}, x_2^{\ast}, \dots x_n^{\ast}\right)\). Suppose (and this is a big assumption) you can estimate the SE of \(\widehat{\theta^{\ast}}\) for this sample \(\mathbf{x^{\ast}}\), then \[ T({\mathbf{x^{\ast}}}) = \frac{\widehat{\theta(\mathbf{x^{\ast}})} - \widehat{\theta}}{\widehat{SE}(\mathbf{x^{\ast}})} \]
  2. Repeat this some \(R\) times and get the distribution of \(T^{\ast}\).

  3. The confidence interval is then given by \[ \left(\widehat{\theta} - t^{\ast}_{1 - \alpha/2}\widehat{SE},\ \widehat{\theta} - t^{\ast}_{\alpha/2}\widehat{SE}\right) \] where \(t^{\ast}_{q}\) is the \(q\) quantile of the \(T^{\ast}\) distribution and \(\widehat{SE}\) is the standard error estimate computed from the original sample. Note the reversal: the upper quantile gives the lower endpoint.
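
To make the algorithm concrete, here is a minimal sketch in R (boot.t.ci and se.hat are hypothetical names, not from the lecture). It assumes you can supply a function se.hat(x) that estimates the standard error of \(\widehat{\theta}\) from a single sample x; how to get such an estimate is exactly the big question addressed below.

# A minimal sketch of the bootstrap-t interval.
# Assumes: theta.hat(x) computes the estimator, and se.hat(x) estimates
# its standard error from a single sample x (both supplied by the user).
boot.t.ci <- function(x, theta.hat, se.hat, R = 1e3, alpha = 0.05){
  n <- length(x)
  theta.obs <- theta.hat(x)   # estimate from the original sample
  se.obs <- se.hat(x)         # SE estimate from the original sample
  t.star <- rep(0, R)
  for (j in 1:R){
    x.star <- sample(x, n, replace = TRUE)
    # studentize each bootstrap estimate by its own SE
    t.star[j] <- (theta.hat(x.star) - theta.obs) / se.hat(x.star)
  }
  # note the reversal: the upper t quantile gives the lower endpoint
  lower <- theta.obs - quantile(t.star, 1 - alpha/2) * se.obs
  upper <- theta.obs - quantile(t.star, alpha/2) * se.obs
  unname(c(lower, upper))
}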

The big question here is how do we get \(\widehat{SE^{\ast}}\)? This is mind-bending (inception). Let us go step by step.

\[ \widehat{SE^{\ast}} = \sqrt{{var}_{\widehat{F^{\ast}}}\left[\widehat{\theta^{\ast}}\right]} \] Now, \[ {var}_{\widehat{F^{\ast}}}\left[\widehat{\theta^{\ast}}\right] = \mathbb{E}_{\widehat{F^{\ast}}}\left[\left(\widehat{\theta^{\ast}} - \mathbb{E}_{\widehat{F^{\ast}}}\left[ \widehat{\theta^{\ast}}\right]\right)^2\right] \]

Note the \(\widehat{F^{\ast}}\)! If \(\widehat{\theta}\) is complicated, we need to use the bootstrap to estimate \(var_{\widehat{F^{\ast}}}\left[\widehat{\theta^{\ast}}\right]\)! This is the double bootstrap.

In order to get the distribution of a statistic we take bootstrap samples from the data. In order to get the distribution of the bootstrap statistic we take bootstrap samples from the bootstrap data!
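
In code, the inner bootstrap is just another resampling loop, now drawing from the bootstrap sample itself. Here is a minimal sketch of the idea (se.star and R.inner are hypothetical names; the full implementation is deferred to the next lecture):

# Sketch: estimate SE* of theta.hat for one bootstrap sample x.star
# by resampling from x.star itself (the inner, second-level bootstrap).
se.star <- function(x.star, theta.hat, R.inner = 100){
  n <- length(x.star)
  theta.inner <- rep(0, R.inner)
  for (k in 1:R.inner){
    # a second-level sample: drawn from F hat star, not F hat
    x.star.star <- sample(x.star, n, replace = TRUE)
    theta.inner[k] <- theta.hat(x.star.star)
  }
  sd(theta.inner)
}

A function like this could serve as the se.hat argument in the bootstrap-t sketch above, at the cost of R * R.inner resamples in total.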

Population : c.d.f is \(F\), parameter is \(\theta_{F}\)

You sample from \(F\), you get data!

Data : approx c.d.f is \(\widehat{F}\), estimator is \(\widehat{\theta}\)

You sample from \(\widehat{F}\), you get a bootstrap sample!

Bootstrap sample : approx c.d.f is \(\widehat{F}^{\ast}\), Bootstrap estimator is \(\widehat{\theta^{\ast}}\)

Example 3: Let \(X \sim F\) be a random variable and let \(X_1, X_2,\dots, X_n\) be a random sample. Suppose you are interested in estimating the mean of \(X\) by the sample mean. That is, your parameter is \(\theta_{F} = \mu_{F}\) and your estimator is \(\widehat{\theta} = \overline{X}\). Let \(\widehat{F}\) be the empirical c.d.f. What are the standard error \(SE\), the plug-in estimator \(\widehat{\theta_{\widehat{F}}}\), the plug-in estimate of the standard error \(\widehat{SE}\), the bootstrap estimator \(\widehat{\theta^{\ast}}\), and the plug-in estimator of the bootstrap standard error \(\widehat{SE^{\ast}}\)?

  1. The standard error \(SE\).

\[SE(\overline{X}) = \sqrt{var_{F}\left[\overline{X}\right]}\]

Using the fact that \(var_{F}\left[\overline{X}\right] = \frac{\sigma^{2}_{F}}{n}\), we get \[ SE(\overline{X}) = \frac{\sigma_{F}}{\sqrt{n}}. \]

  2. The plug-in estimator \(\widehat{\theta_{\widehat{F}}}\). By definition, the plug-in estimator is the parameter evaluated w.r.t. \(\widehat{F}\): \[ \widehat{\theta_{\widehat{F}}} = \theta_{\widehat{F}} = \mu_{\widehat{F}} \]

Note that, since \(\mu_F = \mathbb{E}_{F}\left[X\right]\), \[ \mu_{\widehat{F}} = \mathbb{E}_{\widehat{F}}\left[X\right] \] This is easy to calculate \[ \mathbb{E}_{\widehat{F}}\left[X\right] = \frac{1}{n}\sum\limits_{i = 1}^{n}X_i = \overline{X} \]

  3. The plug-in estimate of the standard error \(\widehat{SE}\): \[ \text{plug-in } \widehat{SE} = SE_{\widehat{F}} = \sqrt{var_{\widehat{F}}\left[\overline{X}\right]} \] Using the fact that \[ var_{\widehat{F}}\left[\overline{X}\right] = var_{\widehat{F}}\left[\frac{1}{n}\sum\limits_{i = 1}^{n}X_i\right] \] along with the fact that the \(X_i\) are an i.i.d. sample of \(X\), we get \[ var_{\widehat{F}}\left[\overline{X}\right] = \frac{1}{n}var_{\widehat{F}}\left[X\right] = \frac{1}{n}\mathbb{E}_{\widehat{F}}\left[\left(X - \mathbb{E}_{\widehat{F}}\left[X\right]\right)^2\right] \] Thus, \[ var_{\widehat{F}}\left[\overline{X}\right] = \frac{1}{n^2}\sum\limits_{i = 1}^{n}\left(X_i - \overline{X}\right)^2 \]

\[ \text{plug-in } \widehat{SE} = \frac{\sqrt{\sum\limits_{i = 1}^{n}\left(X_i - \overline{X}\right)^2}}{n} \]

  4. The bootstrap estimator \(\widehat{\theta^{\ast}}\). Let \(X_1^{\ast}, X_2^{\ast}, \dots, X_n^{\ast}\) be a bootstrap sample and let \(\widehat{F^{\ast}}\) be the empirical c.d.f. of the bootstrap sample. Then, since \(\widehat{\theta} = h(X_1, X_2, \dots, X_n)\), we have \(\widehat{\theta^{\ast}} = h(X_1^{\ast}, X_2^{\ast}, \dots, X_n^{\ast})\). Thus,

\[ \widehat{\theta^{\ast}} = \overline{X^{\ast}} = \frac{1}{n}\sum\limits_{i = 1}^{n}X_i^{\ast} \]

  5. Convince yourself that \[ \text{plug-in } \widehat{SE^{\ast}} = \frac{\sqrt{\sum\limits_{i = 1}^{n}\left(X_i^{\ast} - \overline{X^{\ast}}\right)^2}}{n} \]
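
As a quick numerical check of the plug-in \(\widehat{SE}\) formula in item 3, using obs.data and n from Example 1:

# plug-in SE of the sample mean: sqrt(sum((X_i - X bar)^2)) / n
x.bar <- mean(obs.data)
se.plugin <- sqrt(sum((obs.data - x.bar)^2)) / n
# equivalently, sd(x)/sqrt(n) scaled by sqrt((n - 1)/n)
all.equal(se.plugin, sqrt((n - 1)/n) * sd(obs.data) / sqrt(n))   # TRUE

The same formula, applied to a bootstrap sample, gives the plug-in \(\widehat{SE^{\ast}}\) in item 5.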

(Home Assignment!): With the same observed data obs.data as in the examples, say you are now interested in the mean and your estimator is \(\overline{X}\). Construct the studentized bootstrap confidence interval for this estimator. You will need Example 3, i.e. you do not need a double bootstrap.

In the next Lecture we will see how to implement double bootstrap to construct a studentized bootstrap confidence interval for \(S^2\).