(Instructor: Nishant Panda)
Last lecture we looked at the bootstrap estimate of bias and used the bootstrap to construct a confidence interval. In this lecture we will complete the bootstrap techniques for constructing confidence intervals.
Let \(X \sim F\) be a random variable (could be multidimensional!) whose c.d.f is given by \(F(x)\). Let us denote the expectation and variance of \(X\) by \(\mu_{F}\) and \(\sigma^{2}_{F}\) to emphasize the distribution. These are all parameters of the distribution \(F\). In general, we denote a parameter \(\theta\) as a function of the distribution \(F\), written \(\theta = T(F)\) or \(\theta_{F}\) for short. Let \(X_1, X_2, \dots, X_n\) be a random sample of size \(n\) from \(X\). An estimator \(\widehat{\theta}\) is a function of the sample, i.e. \(\widehat{\theta} = h(X_1, X_2, \dots, X_n)\). For example, if \(\theta_{F} = \mu_F\), then \(\widehat{\theta} = \overline{X}\) is an estimator for the mean. Similarly, if \(\theta_{F} = \sigma^{2}_F\), then \(\widehat{\theta} = S^2\) is an estimator for the variance. Let \(\widehat{F}\) be an estimated c.d.f; then \(\widehat{\theta_{\widehat{F}}}\) is the plug-in estimator for \(\theta\).
As usual, if \(\theta_{F} = \sigma^2_{F}\), then \(S^2\) is an estimator for the variance, given by \[ \widehat{\theta} = S^2 = \frac{1}{n-1}\sum\limits_{i = 1}^{n}(X_i - \overline{X})^2. \] But the plug-in estimator using the empirical c.d.f is \[ \widehat{\theta_{\widehat{F}}} = \sigma^2_{\widehat{F}} = \frac{1}{n}\sum\limits_{i = 1}^{n}(X_i - \overline{X})^2. \] By using the bootstrap, we get an approximation of the sampling distribution of our estimator \(\widehat{\theta}\). This sampling distribution lets us estimate many useful quantities, such as the standard error.
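As a quick sketch with made-up numbers, the two variance estimators differ only by the factor \((n-1)/n\):
# compare S^2 with the plug-in variance on a small illustrative sample
x <- c(2.1, 3.4, 1.9, 4.2, 2.8)
n <- length(x)
var(x)                # S^2: divides by n - 1
mean((x - mean(x))^2) # plug-in variance: divides by n
var(x)*(n - 1)/n      # identical to the plug-in estimate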
Let \(X \sim F\). Let \(X_1, X_2, \dots, X_n\) be a random sample of \(X\). Let \(\widehat{F}\) be an estimated c.d.f, and say we are interested in estimating some parameter \(\theta_{F}\) using the sample through an estimator \(\widehat{\theta}\). The plug-in estimator is denoted by \(\widehat{\theta_{\widehat{F}}}\).
The percentile bootstrap confidence interval is the simplest to implement and can be explained as follows:
Approximate the sampling distribution of \(\widehat{\theta}\) using bootstrap samples \(\lbrace \mathbf{x^{\ast}_{j}}\rbrace_{j = 1}^{R}\), where \(\mathbf{x^{\ast}_{j}} = \left(x_1^{\ast}, x_2^{\ast}, \dots x_n^{\ast}\right)\) is a sample from \(\widehat{F}\), by computing \(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\).
This gives a vector of \(R\) bootstrap estimates. For a given significance level \(\alpha\) (confidence level \(1-\alpha\)), get the \(\alpha/2\) and \((1 - \alpha/2)\) quantiles of this vector. These form the lower and upper ends of the percentile bootstrap C.I.
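A minimal sketch in R, assuming obs.data holds the observed sample (as in the boot-package example later) and using the mean as the estimator:
# percentile bootstrap C.I. by hand
R <- 2000
theta.star <- replicate(R, mean(sample(obs.data, length(obs.data), replace = TRUE)))
alpha <- 0.05
# the alpha/2 and 1 - alpha/2 quantiles are the ends of the C.I
quantile(theta.star, c(alpha/2, 1 - alpha/2))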
Percentile confidence intervals are robust (they use quantiles rather than means), so they hold up well when the original data has outliers. However, they tend to be too narrow, in the sense that they don’t achieve the theoretical guarantee of “trapping the true parameter” at the nominal rate.
Using asymptotic theory, an approximate \((1-\alpha)\) confidence interval around the estimate \(\widehat{\theta}\) is given by \[ \widehat{\theta} \pm z_{1-\alpha/2}SE_{F}(\widehat{\theta}) \]
Since \(F\) is unknown, we estimate the standard error. In particular, if we use the bootstrap estimator \(SE_{\widehat{F}}(\widehat{\theta})\), we get the normal bootstrap confidence interval \[ \widehat{\theta} \pm z_{1-\alpha/2}SE_{\widehat{F}}(\widehat{\theta}) \]
In Lecture \(9\), we saw how to compute the bootstrap estimate of the standard error \(SE_{\widehat{F}}(\widehat{\theta})\). Thus, to get the normal bootstrap confidence interval:
Approximate the sampling distribution of \(\widehat{\theta}\) using bootstrap samples \(\lbrace \mathbf{x^{\ast}_{j}}\rbrace_{j = 1}^{R}\), where \(\mathbf{x^{\ast}_{j}} = \left(x_1^{\ast}, x_2^{\ast}, \dots x_n^{\ast}\right)\) is a sample from \(\widehat{F}\), by computing \(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\).
See Lecture \(9\) on how to compute \(SE_{\widehat{F}}(\widehat{\theta})\) from the sampling distribution \(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\). If you are in a hurry, just use \[ sd\left(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\right) \]
The confidence interval is then \[ \widehat{\theta} \pm z_{1-\alpha/2}sd\left(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\right) \]
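A minimal sketch under the same assumptions as the percentile sketch above (obs.data is the observed sample, the mean is the estimator):
# normal bootstrap C.I. by hand
R <- 2000
theta.star <- replicate(R, mean(sample(obs.data, length(obs.data), replace = TRUE)))
theta.hat <- mean(obs.data)
alpha <- 0.05
# point estimate +/- z quantile times the bootstrap SE
theta.hat + c(-1, 1)*qnorm(1 - alpha/2)*sd(theta.star)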
Here, instead of using the normal confidence interval around the point estimate, we use the confidence interval around the bias-corrected point estimate, given by \[ \widehat{\theta^{BC}} = \widehat{\theta} - \beta_{F}(\widehat{\theta}, \theta) \]
Again, we use the bootstrap estimate of the bias \[ \widehat{\theta^{BC}} \approx \widehat{\theta} - \beta_{\widehat{F}}(\widehat{\theta}, \theta). \]
See Lecture \(10\) to compute the approximate bias \(\beta_{\widehat{F}}(\widehat{\theta}, \theta)\). Hence, to get the normal bootstrap bias-corrected confidence interval:
Approximate the sampling distribution of \(\widehat{\theta}\) using bootstrap samples \(\lbrace \mathbf{x^{\ast}_{j}}\rbrace_{j = 1}^{R}\), where \(\mathbf{x^{\ast}_{j}} = \left(x_1^{\ast}, x_2^{\ast}, \dots x_n^{\ast}\right)\) is a sample from \(\widehat{F}\), by computing \(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\).
See Lecture \(9\) on how to compute \(SE_{\widehat{F}}(\widehat{\theta})\) from the sampling distribution \(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\). If you are in a hurry, just use \[ sd\left(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\right) \]
See Lecture \(10\) to compute \(\beta_{\widehat{F}}(\widehat{\theta}, \theta)\), the bootstrap estimate of the bias of \(\widehat{\theta}\), from the sampling distribution \(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\).
The confidence interval is then \[ \left(\widehat{\theta} -\beta_{\widehat{F}}(\widehat{\theta}, \theta) \right) \pm z_{1-\alpha/2}sd\left(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\right) \]
Note: If your estimator \(\widehat{\theta}\) is not the plug-in estimator, the bias computed will be slightly off. In that case you may use (with caution!) \[ \beta_{\widehat{F}}(\widehat{\theta}, \theta) = \mathbb{E}_{\widehat{F}}\left[\widehat{\theta}\right] - \widehat{\theta} \] instead of the usual \[ \beta_{\widehat{F}}(\widehat{\theta}, \theta) = \mathbb{E}_{\widehat{F}}\left[\widehat{\theta}\right] - \widehat{\theta_{\widehat{F}}}. \]
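A minimal sketch under the same assumptions as before; for the mean, \(\widehat{\theta}\) coincides with the plug-in estimator, so both bias formulas agree:
# bias-corrected normal bootstrap C.I. by hand
R <- 2000
theta.star <- replicate(R, mean(sample(obs.data, length(obs.data), replace = TRUE)))
theta.hat <- mean(obs.data)
bias.hat <- mean(theta.star) - theta.hat # bootstrap estimate of the bias
alpha <- 0.05
(theta.hat - bias.hat) + c(-1, 1)*qnorm(1 - alpha/2)*sd(theta.star)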
# Number of bootstrap samples
R <- 2000
# Let us use the boot package
library(boot)
# estimator in the form boot() expects: a function of the data and resample indices
my.estimator <- function(sample, indices){
  mean(sample[indices])
}
# get the bootstrap object (obs.data is the observed sample from earlier)
b <- boot(obs.data, my.estimator, R)
# get confidence intervals
boot.ci(b, conf = 0.95, type = c("perc", "norm", "bca"))
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 2000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = b, conf = 0.95, type = c("perc", "norm", "bca"))
##
## Intervals :
## Level Normal Percentile BCa
## 95% ( 4.181, 5.674 ) ( 4.176, 5.686 ) ( 4.169, 5.681 )
## Calculations and Intervals on Original Scale
To compute the studentized bootstrap, we approximate \[ T = \frac{\widehat{\theta} - \theta}{SE}\approx \frac{\widehat{\theta}^{\ast} - \widehat{\theta}}{\widehat{SE^{\ast}}} \] where, if \(\widehat{\theta} = h(X_1, X_2,\dots, X_n)\), then \(\widehat{\theta^{\ast}} = h(X_1^{\ast}, \dots, X_n^{\ast})\) and \(\widehat{SE^{\ast}}\) is the plug-in estimate of the standard error for \(\widehat{\theta^{\ast}}\). To compute the t-bootstrap confidence interval:
Approximate the sampling distribution of your estimator \(\widehat{\theta}\) using bootstrap samples \(\lbrace \mathbf{x^{\ast}_{j}}\rbrace_{j = 1}^{R}\), where \(\mathbf{x^{\ast}_{j}} = \left(x_1^{\ast}, x_2^{\ast}, \dots x_n^{\ast}\right)\) is a sample from \(\widehat{F}\), by computing \(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\).
For each bootstrap sample \(\mathbf{x^{\ast}_{j}} = \left(x_1^{\ast}, x_2^{\ast}, \dots x_n^{\ast}\right)\), compute the standard error estimate \(\widehat{SE(\mathbf{x^{\ast}_{j}})}\).
Thus using Steps 1 and 2, compute the distribution \(T^{\ast} = \lbrace T(\mathbf{x^{\ast}_{j}})\rbrace_{j = 1}^{R}\). Note that \(T^{\ast}\) is an approximation of the distribution of \(T\). For a given significance level \(\alpha\), compute the quantiles \(t^{\ast}_{\alpha/2}\) and \(t^{\ast}_{1 - \alpha/2}\).
The t-C.I is given by, \[ \left(\widehat{\theta} - t^{\ast}_{1 - \alpha/2}\widehat{SE}, \widehat{\theta} - t^{\ast}_{\alpha/2}\widehat{SE}\right), \] where \(\widehat{SE}\) is the bootstrap estimate of the standard error of \(\widehat{\theta}\).
STEP 0: Create your estimator (as a function of data or \(\widehat{F}\))
my.estimator <- function(sample){
  mean(sample)
}
STEP 1: Approximate the sampling distribution of your estimator \(\widehat{\theta}\) using bootstrap samples \(\lbrace \mathbf{x^{\ast}_{j}}\rbrace_{j = 1}^{R}\), where \(\mathbf{x^{\ast}_{j}} = \left(x_1^{\ast}, x_2^{\ast}, \dots x_n^{\ast}\right)\) is a sample from \(\widehat{F}\), by computing \(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\).
Get bootstrap samples
# Get bootstrap samples
# Number of bootstrap samples
R <- 2000
# Get R bootstrap samples
# create a list to store bootstrap samples
x.star <- lapply(1:R, function(i) sample(obs.data, length(obs.data), replace = TRUE))
# print out x.star[j] for j = 1
x.star[1]
## [[1]]
## [1] 9.0368474 4.1390617 5.7262568 0.1715847 7.0702070 3.7200102
## [7] 3.7821472 7.4293494 7.0702070 4.4854612 4.4854612 4.6561653
## [13] 0.1715847 4.7333573 3.7200102 4.4314942 2.2222786 4.6561653
## [19] 4.8106819 3.8706037 -0.3129108 1.5659826 8.7903869 0.1715847
## [25] 8.0230440 9.0368474 3.7200102 6.2657252 6.2657252 3.7821472
## [31] 7.0702070 4.3867228 4.1390617 6.2719008 4.4854612 3.7821472
## [37] 1.4373831 4.4424225 0.1190661 3.7200102
Get the sampling distribution of the estimator from the bootstrap samples, \(\lbrace\widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace_{j = 1}^{R}\).
# Get sampling distribution of your estimator
# theta.hat.star
theta.hat.star <- function(boot.sample){
mean(boot.sample)
}
# get the sampling distribution
theta.star.dist <- sapply(x.star, theta.hat.star)
STEP 2: For each bootstrap sample \(\mathbf{x^{\ast}_{j}} = \left(x_1^{\ast}, x_2^{\ast}, \dots x_n^{\ast}\right)\), compute the standard error estimate \(\widehat{SE(\mathbf{x^{\ast}_{j}})}\).
Note that if \(\widehat{\theta}\) is \(\overline{X}\), then \[\text{plug-in } \widehat{SE^{\ast}} = \frac{\sqrt{\sum\limits_{i = 1}^{n}\left(X_i^{\ast} - \overline{X^{\ast}}\right)^2}}{n}\]
Get \(\widehat{SE(\mathbf{x^{\ast}_{j}})}\) from bootstrap samples.
# se.star
se.star <- function(boot.sample){
  mean.boot.samp <- mean(boot.sample)
  temp.deviation <- boot.sample - mean.boot.samp
  square.deviation <- temp.deviation^2
  # return the plug-in SE estimate
  sqrt(sum(square.deviation))/length(boot.sample)
}
STEP 3: Thus compute \(T^{\ast} = \lbrace T(\mathbf{x^{\ast}_{j}})\rbrace_{j = 1}^{R}\). \(T^{\ast}\) is an approximation of the distribution of \(T\).
# T.star
se.star.dist <- sapply(x.star, se.star)
T.star <- (theta.star.dist - my.estimator(obs.data))/se.star.dist
For a given significance level \(\alpha\), compute the quantiles \(t^{\ast}_{\alpha/2}\) and \(t^{\ast}_{1 - \alpha/2}\).
# compute the critical quantiles of T.star
alpha <- 0.05
crit.bounds <- quantile(T.star, c(alpha/2, 1-alpha/2))
STEP 4: The t-C.I is given by, \[ \left(\widehat{\theta} - t^{\ast}_{1 - \alpha/2}\widehat{SE}, \widehat{\theta} - t^{\ast}_{\alpha/2}\widehat{SE}\right), \] where \(\widehat{SE}\) is the bootstrap estimate of the standard error of \(\widehat{\theta}\).
# get the bootstrap se estimate
se.boot <- sd(theta.star.dist)
# lower bound
lower.bound <- my.estimator(obs.data) - crit.bounds[2]*se.boot
upper.bound <- my.estimator(obs.data) - crit.bounds[1]*se.boot
sprintf("(%s, %s)", round(lower.bound,3), round(upper.bound,3))
## [1] "(4.127, 5.672)"
To run this using the boot package, our estimator function must also return the estimated variance of the statistic; for type = "stud", boot.ci reads the variance from the second component:
my.estimator2 <- function(sample, indices){
  theta_hat.star <- mean(sample[indices])
  # plug-in variance of the mean: sum of squared deviations over n^2
  var.star <- sum((sample[indices] - theta_hat.star)^2)/(length(sample[indices]))^2
  c(theta_hat.star, var.star)
}
boot.obj <- boot(obs.data, my.estimator2, R)
boot.ci(boot.obj, conf = 0.95, type = "stud")
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 2000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = boot.obj, conf = 0.95, type = "stud")
##
## Intervals :
## Level Studentized
## 95% ( 4.094, 5.668 )
## Calculations and Intervals on Original Scale
Note that if we don’t have a simple closed-form estimate for the SE, we need to run a second, inner bootstrap on each bootstrap sample to estimate the SE.
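A minimal sketch of such a nested (double) bootstrap, assuming obs.data is the observed sample and the mean is the estimator; R.outer and R.inner are illustrative choices:
# studentized bootstrap with a nested bootstrap SE for each resample
R.outer <- 1000
R.inner <- 50
theta.hat <- mean(obs.data)
T.star <- replicate(R.outer, {
  x.star <- sample(obs.data, length(obs.data), replace = TRUE)
  # inner bootstrap: estimate the SE of the estimator on this resample
  inner <- replicate(R.inner, mean(sample(x.star, length(x.star), replace = TRUE)))
  (mean(x.star) - theta.hat)/sd(inner)
})
# the rest proceeds as before: critical quantiles of T.star plus a bootstrap SE
se.boot <- sd(replicate(R.outer, mean(sample(obs.data, length(obs.data), replace = TRUE))))
alpha <- 0.05
crit <- quantile(T.star, c(alpha/2, 1 - alpha/2))
c(theta.hat - crit[2]*se.boot, theta.hat - crit[1]*se.boot)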
To compare the coverage of the normal, percentile, and BCa intervals, we run a small simulation study: repeatedly draw data from a known distribution, build each interval, and record how often the true parameter is missed.
num.sim <- 1000
R <- 1000
normal.ci <- rep(0, num.sim)
perc.ci <- rep(0, num.sim)
bc.ci <- rep(0, num.sim)
my_estimator <- function(sample, indices){
  exp(mean(sample[indices]))
}
mu <- 0
sigma <- 1
true.param <- exp(mu)
n <- 10
for (i in 1:num.sim){
  obs.data.sim <- rnorm(n, mean = mu, sd = sigma)
  # get bootstrap samples and compute the bootstrap estimate of the estimator
  boot.object <- boot(obs.data.sim, my_estimator, R)
  # construct bootstrap confidence intervals
  boot.conf <- boot.ci(boot.object, conf = 0.95,
                       type = c("perc", "norm", "bca"))
  if((boot.conf$normal[2] <= true.param) &&
     (boot.conf$normal[3] >= true.param)){normal.ci[i] = 1}
  if((boot.conf$percent[4] <= true.param) &&
     (boot.conf$percent[5] >= true.param)){perc.ci[i] = 1}
  if((boot.conf$bca[4] <= true.param) &&
     (boot.conf$bca[5] >= true.param)){bc.ci[i] = 1}
}
# check the coverage of each C.I method
missed.normal <- (1 - sum(normal.ci)/num.sim)*100
missed.perc <- (1 - sum(perc.ci)/num.sim)*100
missed.bc <- (1 - sum(bc.ci)/num.sim)*100
missed.normal
## [1] 10.9
missed.perc
## [1] 11.2
missed.bc
## [1] 11
All three methods miss the true parameter roughly \(11\%\) of the time, well above the nominal \(5\%\): with a small sample (\(n = 10\)) and a biased estimator like \(e^{\overline{X}}\), bootstrap intervals can undercover noticeably.
library(nycflights13)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# remove NA dep_delays
flights <- filter(flights, !is.na(dep_delay))
# sample of size 30
my.sample <- sample(flights$dep_delay, 30)
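A closing sketch of applying the same machinery to this sample; the function name mean.delay and R = 2000 are illustrative choices:
# bootstrap C.I.s for the mean departure delay of the sample
mean.delay <- function(sample, indices){
  mean(sample[indices])
}
b.flights <- boot(my.sample, mean.delay, R = 2000)
boot.ci(b.flights, conf = 0.95, type = c("perc", "norm", "bca"))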