(Instructor : Nishant Panda)

Additional References

  1. (IB) : An Introduction to the Bootstrap, Efron and Tibshirani, Chapman & Hall/CRC

Introduction

Last lecture we looked at the bootstrap estimate of bias and used the bootstrap to construct a confidence interval. In this lecture we will complete the bootstrap techniques for constructing confidence intervals.

Let \(X \sim F\) be a random variable (could be multidimensional!) whose c.d.f is given by \(F(x)\). Let us denote the expectation and variance of \(X\) by \(\mu_{F}\) and \(\sigma^{2}_{F}\) to emphasize the distribution. These are all parameters of the distribution \(F\). In general, we denote a parameter \(\theta\) as a function of the distribution \(F\), written \(\theta = T(F)\) or \(\theta_{F}\) for short. Let \(X_1, X_2, \dots, X_n\) be a random sample of size \(n\) of \(X\). An estimator \(\widehat{\theta}\) is a function of the sample, i.e., \(\widehat{\theta} = h(X_1, X_2, \dots, X_n)\). For example, if \(\theta_{F} = \mu_F\), then \(\widehat{\theta} = \overline{X}\) is an estimator for the mean. Similarly, if \(\theta_{F} = \sigma^{2}_F\), then \(\widehat{\theta} = S^2\) is an estimator for the variance. Let \(\widehat{F}\) be an estimated c.d.f; then \(\widehat{\theta}_{\widehat{F}}\) is the plug-in estimator for \(\theta\).

As usual, if \(\theta_{F} = \sigma^2_{F}\), then \(S^2\) is an estimator for the variance given by \[ \widehat{\theta} = S^2 = \frac{1}{n-1}\sum\limits_{i = 1}^{n}(X_i - \overline{X})^2. \] But the plug-in estimator using the empirical c.d.f is \[ \widehat{\theta_{\widehat{F}}} = \sigma^2_{\widehat{F}} = \frac{1}{n}\sum\limits_{i = 1}^{n}(X_i - \overline{X})^2. \] By using the bootstrap, we get an approximation of the sampling distribution of our estimator \(\widehat{\theta}\). This sampling distribution lets us estimate many useful quantities, such as the standard error.
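As a quick sanity check, here is a minimal R sketch (with a small simulated sample, purely for illustration) contrasting the two variance estimators:

# minimal sketch: unbiased S^2 vs. the plug-in variance (simulated data)
set.seed(1)                       # for reproducibility
x <- rnorm(20, mean = 5, sd = 2)  # hypothetical sample of size n = 20
n <- length(x)
S2 <- var(x)                          # S^2, with the 1/(n-1) factor
plug.in <- sum((x - mean(x))^2) / n   # plug-in estimate, with the 1/n factor
c(S2 = S2, plug.in = plug.in)         # plug-in is smaller by the factor (n-1)/n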

Bootstrap Confidence Intervals

Let \(X \sim F\). Let \(X_1, X_2, \dots, X_n\) be a random sample of \(X\). Let \(\widehat{F}\) be an estimated c.d.f. Say we are interested in estimating some parameter \(\theta_{F}\) using the sample through an estimator \(\widehat{\theta}\). The plug-in estimator is denoted by \(\widehat{\theta_{\widehat{F}}}\).

The Percentile bootstrap C.I

The percentile bootstrap confidence interval is the simplest to implement and can be explained as follows:

  1. Approximate the sampling distribution of \(\widehat{\theta}\) using bootstrap samples \(\lbrace \mathbf{x^{\ast}_{j}}\rbrace_{j = 1}^{R}\), where \(\mathbf{x^{\ast}_{j}} = \left(x_1^{\ast}, x_2^{\ast}, \dots x_n^{\ast}\right)\) is a sample from \(\widehat{F}\), by computing \(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\).

  2. This gives a vector of \(R\) bootstrap estimates. For a given significance level \(\alpha\), take its \(\alpha/2\) and \((1 - \alpha/2)\) quantiles. These form the lower and upper ends of the percentile bootstrap C.I.

Percentile confidence intervals are robust (they use quantiles rather than means), so they work well when the original data contain outliers. However, they tend to be too narrow, in the sense that they don’t achieve the nominal coverage guarantee of “trapping the true parameter”.
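A minimal sketch of these two steps in R (assuming obs.data holds the observed sample, as in Example 1 below, and taking the sample mean as the estimator):

# percentile bootstrap C.I (sketch; obs.data is assumed to be the observed sample)
R <- 2000
alpha <- 0.05
# Step 1: bootstrap sampling distribution of theta.hat
theta.star <- replicate(R, mean(sample(obs.data, length(obs.data), replace = TRUE)))
# Step 2: the alpha/2 and (1 - alpha/2) quantiles are the interval endpoints
quantile(theta.star, c(alpha/2, 1 - alpha/2))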

The Normal bootstrap C.I

Using asymptotic theory (a normal approximation to the sampling distribution of \(\widehat{\theta}\)), an approximate \((1 - \alpha)\) confidence interval around the estimate \(\widehat{\theta}\) is given by \[ \widehat{\theta} \pm z_{1-\alpha/2}SE_{F}(\widehat{\theta}) \]

Since \(F\) is unknown, we estimate the standard error. In particular, if we use the bootstrap estimate \(SE_{\widehat{F}}(\widehat{\theta})\), we get the normal bootstrap confidence interval \[ \widehat{\theta} \pm z_{1-\alpha/2}SE_{\widehat{F}}(\widehat{\theta}) \]

In Lecture \(9\), we saw how to compute the bootstrap estimate of the standard error \(SE_{\widehat{F}}(\widehat{\theta})\). Thus, to get the normal bootstrap confidence interval:

  1. Approximate the sampling distribution of \(\widehat{\theta}\) using bootstrap samples \(\lbrace \mathbf{x^{\ast}_{j}}\rbrace_{j = 1}^{R}\), where \(\mathbf{x^{\ast}_{j}} = \left(x_1^{\ast}, x_2^{\ast}, \dots x_n^{\ast}\right)\) is a sample from \(\widehat{F}\), by computing \(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\).

  2. See Lecture \(9\) on how to compute \(SE_{\widehat{F}}(\widehat{\theta})\) from the sampling distribution \(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\). If you are in a hurry, just use \[ sd\left(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\right) \]

  3. The confidence interval is then \[ \widehat{\theta} \pm z_{1-\alpha/2}sd\left(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\right) \]
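A minimal sketch of these steps (same assumptions as the percentile sketch above: obs.data is the observed sample and the estimator is the sample mean):

# normal bootstrap C.I (sketch)
R <- 2000
alpha <- 0.05
theta.hat <- mean(obs.data)
theta.star <- replicate(R, mean(sample(obs.data, length(obs.data), replace = TRUE)))
se.boot <- sd(theta.star)  # quick bootstrap SE, as in Step 2
theta.hat + c(-1, 1) * qnorm(1 - alpha/2) * se.boot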

The Bias Corrected normal bootstrap C.I

Here, instead of using the normal confidence interval around the point estimate, we use the confidence interval around the bias-corrected point estimate, given by \[ \widehat{\theta^{BC}} = \widehat{\theta} - \beta_{F}(\widehat{\theta}, \theta) \]

Again, we use the bootstrap estimate of the bias \[ \widehat{\theta^{BC}} \approx \widehat{\theta} - \beta_{\widehat{F}}(\widehat{\theta}, \theta). \]

See Lecture \(10\) for how to compute the approximate bias \(\beta_{\widehat{F}}(\widehat{\theta}, \theta)\). Hence, to get the normal bootstrap bias-corrected confidence interval:

  1. Approximate the sampling distribution of \(\widehat{\theta}\) using bootstrap samples \(\lbrace \mathbf{x^{\ast}_{j}}\rbrace_{j = 1}^{R}\), where \(\mathbf{x^{\ast}_{j}} = \left(x_1^{\ast}, x_2^{\ast}, \dots x_n^{\ast}\right)\) is a sample from \(\widehat{F}\), by computing \(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\).

  2. See Lecture \(9\) on how to compute \(SE_{\widehat{F}}(\widehat{\theta})\) from the sampling distribution \(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\). If you are in a hurry, just use \[ sd\left(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\right) \]

  3. See Lecture \(10\) for how to compute \(\beta_{\widehat{F}}(\widehat{\theta}, \theta)\), the bootstrap estimate of the bias of \(\widehat{\theta}\), from the sampling distribution \(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\).

  4. The confidence interval is then \[ \left(\widehat{\theta} - \beta_{\widehat{F}}(\widehat{\theta}, \theta) \right) \pm z_{1-\alpha/2}sd\left(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\right) \]

Note: If your estimator \(\widehat{\theta}\) is not the plug-in estimator, the bias computed will be slightly off. In that case you may use (with caution!) \[ \beta_{\widehat{F}}(\widehat{\theta}, \theta) = \mathbb{E}_{\widehat{F}}\left[\widehat{\theta}\right] - \widehat{\theta} \] instead of the usual \[ \beta_{\widehat{F}}(\widehat{\theta}, \theta) = \mathbb{E}_{\widehat{F}}\left[\widehat{\theta}\right] - \widehat{\theta_{\widehat{F}}}. \]
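A minimal sketch (again assuming obs.data and the sample mean, which equals its plug-in estimator, so the caution above does not apply here):

# bias-corrected normal bootstrap C.I (sketch)
R <- 2000
alpha <- 0.05
theta.hat <- mean(obs.data)  # for the mean, the estimator equals the plug-in estimator
theta.star <- replicate(R, mean(sample(obs.data, length(obs.data), replace = TRUE)))
bias.boot <- mean(theta.star) - theta.hat  # bootstrap estimate of bias (Lecture 10)
se.boot <- sd(theta.star)                  # bootstrap estimate of SE (Lecture 9)
(theta.hat - bias.boot) + c(-1, 1) * qnorm(1 - alpha/2) * se.boot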

Example 1: Let \(X \sim N(\mu,\sigma^2)\) from lectures \(9\) and \(10\). Say we are interested in estimating \(\mu\) using the sample mean \(\overline{X}\) as the estimator. Given data of size \(n\), compute the percentile bootstrap confidence interval, the normal bootstrap confidence interval, and the bias-corrected bootstrap confidence interval.

# Number of bootstrap samples
R <- 2000
# Let us use the boot package
library(boot)
# boot requires the statistic as a function of the data and an index vector
my.estimator <- function(sample, indices){
  mean(sample[indices])
}
# get the bootstrap object (obs.data is the observed sample from Lectures 9 and 10)
b <- boot(obs.data, my.estimator, R)
# get confidence intervals
boot.ci(b, conf = 0.95, type = c("perc", "norm", "bca"))
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 2000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = b, conf = 0.95, type = c("perc", "norm", "bca"))
## 
## Intervals : 
## Level      Normal             Percentile            BCa          
## 95%   ( 4.181,  5.674 )   ( 4.176,  5.686 )   ( 4.169,  5.681 )  
## Calculations and Intervals on Original Scale

(Home Assignment!): Write your own code to compute all 3 confidence intervals in Example 1 above and check with the boot package.

(Home Assignment!): Now you are interested in estimating \(\sigma^2\) using the plug-in variance \(\sigma^2_{\widehat{F}}\) as your estimator. Write your own code to compute all 3 confidence intervals in Example 1 above and check with the boot package.

The studentized bootstrap C.I (t-bootstrap confidence interval)

To compute the studentized bootstrap, we approximate \[ T = \frac{\widehat{\theta} - \theta}{SE}\approx \frac{\widehat{\theta}^{\ast} - \widehat{\theta}}{\widehat{SE^{\ast}}} \] where, if \(\widehat{\theta} = h(X_1, X_2,\dots, X_n)\), then \(\widehat{\theta^{\ast}} = h(X_1^{\ast}, \dots, X_n^{\ast})\) and \(\widehat{SE^{\ast}}\) is the plug-in estimate of the standard error of \(\widehat{\theta^{\ast}}\). To compute the t-bootstrap confidence interval:

  1. Approximate the sampling distribution of your estimator \(\widehat{\theta}\) using bootstrap samples \(\lbrace \mathbf{x^{\ast}_{j}}\rbrace_{j = 1}^{R}\), where \(\mathbf{x^{\ast}_{j}} = \left(x_1^{\ast}, x_2^{\ast}, \dots x_n^{\ast}\right)\) is a sample from \(\widehat{F}\), by computing \(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\).

  2. For each bootstrap sample \(\mathbf{x^{\ast}_{j}} = \left(x_1^{\ast}, x_2^{\ast}, \dots x_n^{\ast}\right)\), compute the standard error estimate \(\widehat{SE(\mathbf{x^{\ast}_{j}})}\).

  3. Thus, using Steps 1 and 2, compute the distribution \(T^{\ast} = \lbrace T(\mathbf{x^{\ast}_{j}})\rbrace_{j = 1}^{R}\). Note that \(T^{\ast}\) is an approximation of the distribution of \(T\). For a given significance level \(\alpha\), compute the quantiles \(t^{\ast}_{\alpha/2}\) and \(t^{\ast}_{1 - \alpha/2}\).

  4. The t-C.I is given by, \[ \left(\widehat{\theta} - t^{\ast}_{1 - \alpha/2}\widehat{SE}, \widehat{\theta} - t^{\ast}_{\alpha/2}\widehat{SE}\right), \] where \(\widehat{SE}\) is the bootstrap estimate of the standard error of \(\widehat{\theta}\).

Example 2: Compute the t-bootstrap C.I for \(\overline{X}\) from Example \(1\) above.

STEP 0: Create your estimator (as a function of data or \(\widehat{F}\))

my.estimator <- function(sample){
  mean(sample)
}

STEP 1: Approximate the sampling distribution of your estimator \(\widehat{\theta}\) using bootstrap samples \(\lbrace \mathbf{x^{\ast}_{j}}\rbrace_{j = 1}^{R}\), where \(\mathbf{x^{\ast}_{j}} = \left(x_1^{\ast}, x_2^{\ast}, \dots x_n^{\ast}\right)\) is a sample from \(\widehat{F}\), by computing \(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\).

Get bootstrap samples

# Get bootstrap samples
# Number of bootstrap samples
R <- 2000

# Get R bootstrap samples
# create a list to store bootstrap samples
x.star <- lapply(1:R, function(i) sample(obs.data, length(obs.data), replace = TRUE))
# print out x.star[j] for j = 1
x.star[1]
## [[1]]
##  [1]  9.0368474  4.1390617  5.7262568  0.1715847  7.0702070  3.7200102
##  [7]  3.7821472  7.4293494  7.0702070  4.4854612  4.4854612  4.6561653
## [13]  0.1715847  4.7333573  3.7200102  4.4314942  2.2222786  4.6561653
## [19]  4.8106819  3.8706037 -0.3129108  1.5659826  8.7903869  0.1715847
## [25]  8.0230440  9.0368474  3.7200102  6.2657252  6.2657252  3.7821472
## [31]  7.0702070  4.3867228  4.1390617  6.2719008  4.4854612  3.7821472
## [37]  1.4373831  4.4424225  0.1190661  3.7200102

Get the sampling distribution of the estimator, \(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\), from the bootstrap samples.

# Get sampling distribution of your estimator
# theta.hat.star
theta.hat.star <- function(boot.sample){
  mean(boot.sample)
}
# get the sampling distribution
theta.star.dist <- sapply(x.star, theta.hat.star)

STEP 2: For each bootstrap sample \(\mathbf{x^{\ast}_{j}} = \left(x_1^{\ast}, x_2^{\ast}, \dots x_n^{\ast}\right)\), compute the standard error estimate \(\widehat{SE(\mathbf{x^{\ast}_{j}})}\)

Note that if \(\widehat{\theta}\) is \(\overline{X}\), then \[\text{plug-in } \widehat{SE^{\ast}} = \frac{\sqrt{\sum\limits_{i = 1}^{n}\left(X_i^{\ast} - \overline{X^{\ast}}\right)^2}}{n}\]

Get \(\widehat{SE(\mathbf{x^{\ast}_{j}})}\) from bootstrap samples.

# se.star: plug-in SE of the mean for one bootstrap sample
se.star <- function(boot.sample){
  mean.boot.samp <- mean(boot.sample)
  temp.deviation <- boot.sample - mean.boot.samp
  square.deviation <- temp.deviation^2
  sqrt(sum(square.deviation))/length(boot.sample)
}

STEP 3: Thus compute \(T^{\ast} = \lbrace T(\mathbf{x^{\ast}_{j}})\rbrace_{j = 1}^{R}\). \(T^{\ast}\) is an approximation of the distribution of \(T\).

# T.star

se.star.dist <- sapply(x.star, se.star)
T.star <- (theta.star.dist - my.estimator(obs.data))/se.star.dist

For a given significance level \(\alpha\), compute the quantiles \(t^{\ast}_{\alpha/2}\) and \(t^{\ast}_{1 - \alpha/2}\).

# quantiles of T.star
alpha <- 0.05
crit.bounds <- quantile(T.star, c(alpha/2, 1 - alpha/2))

STEP 4: The t-C.I is given by, \[ \left(\widehat{\theta} - t^{\ast}_{1 - \alpha/2}\widehat{SE}, \widehat{\theta} - t^{\ast}_{\alpha/2}\widehat{SE}\right), \] where \(\widehat{SE}\) is the bootstrap estimate of the standard error of \(\widehat{\theta}\).

# get the bootstrap se estimate
se.boot <- sd(theta.star.dist)
# lower bound
lower.bound <- my.estimator(obs.data) - crit.bounds[2]*se.boot
upper.bound <- my.estimator(obs.data) - crit.bounds[1]*se.boot
sprintf("(%s, %s)", round(lower.bound,3), round(upper.bound,3))
## [1] "(4.127, 5.672)"

To run this using the boot package, our estimator function must also return the estimated variance of \(\widehat{\theta^{\ast}}\): boot.ci with type = "stud" expects the second component of the statistic to be a variance.

my.estimator2 <- function(sample, indices){
  theta_hat.star <- mean(sample[indices])
  # second component: plug-in variance of the mean, required by type = "stud"
  var.star <- sum((sample[indices] - theta_hat.star)^2)/(length(sample[indices]))^2
  c(theta_hat.star, var.star)
}
boot.obj <- boot(obs.data, my.estimator2, R)
boot.ci(boot.obj, conf = 0.95, type = "stud")
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 2000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = boot.obj, conf = 0.95, type = "stud")
## 
## Intervals : 
## Level    Studentized     
## 95%   ( 4.094,  5.668 )  
## Calculations and Intervals on Original Scale

Note that if we don’t have a simple formula to estimate the SE, we need to run an inner bootstrap on each bootstrap sample \(\mathbf{x^{\ast}_{j}}\) to estimate \(\widehat{SE(\mathbf{x^{\ast}_{j}})}\).
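Here is a hedged sketch of that nested (double) bootstrap, using the sample median as a hypothetical estimator with no simple SE formula (obs.data, my.stat, and the replication counts below are assumptions for illustration; the cost is about R * R.inner estimator evaluations):

# nested (double) bootstrap sketch: estimate SE*(x*_j) by an inner bootstrap
my.stat <- function(x) median(x)   # hypothetical estimator without a simple SE formula
R <- 500
R.inner <- 50                      # inner loop kept small to control cost
alpha <- 0.05
n <- length(obs.data)
theta.hat <- my.stat(obs.data)
out <- replicate(R, {
  x.star <- sample(obs.data, n, replace = TRUE)
  inner <- replicate(R.inner, my.stat(sample(x.star, n, replace = TRUE)))
  c(theta = my.stat(x.star), se = sd(inner))   # SE* from the inner bootstrap
})
T.star <- (out["theta", ] - theta.hat) / out["se", ]
# Step 4 as before, with the outer bootstrap SE of theta.hat
theta.hat - quantile(T.star, c(1 - alpha/2, alpha/2)) * sd(out["theta", ])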

Example 3: Testing coverage properties. Let \(X \sim N(\mu, \sigma^2)\). Say we are interested in \(e^{\mu}\). Thus our \(\theta_{F} = e^{\mu}\). Our estimator in this case is \(e^{\overline{X}}\). Say we know the true \(\mu, \sigma\). For concreteness, let \(\mu = 0\) and \(\sigma = 1\), so that the true parameter is \(e^{0} = 1\). Using \(95\%\) confidence intervals (as in the code below), check by simulation whether all three C.I methods achieve the nominal coverage, i.e., whether the true parameter \(e^{\mu}\) lands inside the confidence interval \(95\%\) of the time.

num.sim <- 1000
R <- 1000
# indicator vectors: entry i is 1 if the i-th simulated C.I covers the true parameter
normal.ci <- rep(0, num.sim)
perc.ci <- rep(0, num.sim)
bc.ci <- rep(0, num.sim)

my_estimator <- function(sample, indices){
  exp(mean(sample[indices]))
}
mu <- 0
sigma <- 1
true.param <- exp(mu)

n <- 10

for (i in 1:num.sim){
  obs.data.sim <- rnorm(n, mean = mu, sd = sigma)
  # get bootstrap samples and compute the bootstrap estimate of the
  # estimator
  boot.object <- boot(obs.data.sim, my_estimator, R)
  # construct bootstrap confidence intervals
  boot.conf <- boot.ci(boot.object, conf = 0.95, 
                       type = c("perc", "norm","bca"))
  if((boot.conf$normal[2] <= true.param) &&
     (boot.conf$normal[3] >= true.param)){normal.ci[i] = 1}

  if((boot.conf$percent[4] <= true.param) &&
     (boot.conf$percent[5] >= true.param)){perc.ci[i] = 1}

  if((boot.conf$bca[4] <= true.param) &&
     (boot.conf$bca[5] >= true.param)){bc.ci[i] = 1}
}
# check the coverage of each C.I method

missed.normal <- (1 - sum(normal.ci)/num.sim)*100
missed.perc <- (1 - sum(perc.ci)/num.sim)*100
missed.bc <- (1 - sum(bc.ci)/num.sim)*100
missed.normal
## [1] 10.9
missed.perc
## [1] 11.2
missed.bc
## [1] 11

Each method misses roughly \(11\%\) of the time, well above the nominal \(5\%\): with \(n = 10\) and the skewed estimator \(e^{\overline{X}}\), all three intervals undercover.

(Home Assignment!): Run the above example using the studentized t-bootstrap.

(Home Assignment!): Load the nycflights13 package and remove NA items. Take a sample of size 30 from dep_delay and build the 3 confidence intervals (normal, percent, bca). Which C.I method do you think will be better? Test the coverage properties of each of the three methods by running a simulation 1000 times.

library(nycflights13)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# remove NA dep_delays
flights <- filter(flights, !is.na(dep_delay))
# sample of size 30
my.sample <- sample(flights$dep_delay, 30)