(Instructor : Nishant Panda)
Last lecture we used the bootstrap technique to estimate the standard error of an estimator. In this lecture, we will go further and see how we can use the bootstrap to estimate bias and construct confidence intervals. The following notation from last lecture is repeated here for completeness.
Let \(X \sim F\) be a random variable (could be multidimensional!) whose c.d.f is given by \(F(x)\). Let us denote the expectation and variance of \(X\) by \(\mu_{F}\) and \(\sigma^{2}_{F}\) to emphasize the distribution. These are all parameters of the distribution \(F\). In general, we denote a parameter \(\theta\) as a function of the distribution \(F\), written \(\theta = T(F)\) or \(\theta_{F}\) for short. Let \(X_1, X_2, \dots, X_n\) be a random sample of size \(n\) of \(X\). An estimator \(\widehat{\theta}\) is a function of the sample, i.e. \(\widehat{\theta} = h(X_1, X_2, \dots, X_n)\). For example, if \(\theta_{F} = \mu_F\), then \(\widehat{\theta} = \overline{X}\) is an estimator for the mean. Similarly, if \(\theta_{F} = \sigma^{2}_F\), then \(\widehat{\theta} = S^2\) is an estimator for the variance. Let \(\widehat{F}\) be an estimated c.d.f; then \(\widehat{\theta}_{\widehat{F}}\) is the plug-in estimator for \(\theta\).
As usual, if \(\theta_{F} = \sigma^2_{F}\), then \(S^2\) is an estimator for the variance given by \[ \widehat{\theta} = S^2 = \frac{1}{n-1}\sum\limits_{i = 1}^{n}(X_i - \overline{X})^2. \] But the plug-in estimator using the empirical c.d.f is \[ \widehat{\theta_{\widehat{F}}} = \sigma^2_{\widehat{F}} = \frac{1}{n}\sum\limits_{i = 1}^{n}(X_i - \overline{X})^2. \]
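As a quick sanity check in R (a minimal sketch; the sample x and the name n.x are illustrative), the two estimators differ only by the factor \((n-1)/n\):
# the usual sample variance versus the plug-in (empirical c.d.f) variance
x <- rnorm(10)
n.x <- length(x)
var(x)                    # S^2: divides by n - 1
mean((x - mean(x))^2)     # plug-in estimator: divides by n
var(x) * (n.x - 1)/n.x    # matches the plug-in estimator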
Let \(X \sim F\). Let \(X_1, X_2, \dots, X_n\) be a random sample of \(X\), and let \(\widehat{F}\) be an estimated c.d.f. Say we are interested in estimating some parameter \(\theta_{F}\) from the sample through an estimator \(\widehat{\theta}\). The plug-in estimator is denoted by \(\widehat{\theta_{\widehat{F}}}\). We define the bias of the estimator \(\widehat{\theta}\) as
\[ \beta_{F}(\widehat{\theta}, \theta) = \mathbb{E}_{F}\left[\widehat{\theta}\right] - \theta_{F}. \] As you must have guessed by now, the bootstrap estimate of the bias is given by the plug-in estimator, \[ \beta_{\widehat{F}}(\widehat{\theta}, \theta) = \mathbb{E}_{\widehat{F}}\left[\widehat{\theta}\right] - \widehat{\theta_{\widehat{F}}}. \]
Note that the expectation can be approximated by Monte Carlo,
\[ \mathbb{E}_{\widehat{F}}\left[\widehat{\theta}\right] \approx \frac{1}{R}\sum\limits_{j = 1}^{R}\widehat{\theta(\mathbf{x^{\ast}_{j}})} \] where the notation is from last lecture. Thus, the bootstrap bias estimate of \(\widehat{\theta}\) is
\[ \beta_{\widehat{F}}(\widehat{\theta}, \theta) \approx \frac{1}{R}\sum\limits_{j = 1}^{R}\widehat{\theta(\mathbf{x^{\ast}_{j}})} - \widehat{\theta_{\widehat{F}}}. \]
# Observed data from last lecture
set.seed(42)
n <- 40
obs.data <- rnorm(n, mean = 5, sd = 2)

# First let us create our estimator
theta.hat <- function(sample){
  return(var(sample))
}
# Set the number of bootstrap samples R
R <- 1e3
# create a vector for the sampling distribution
theta.hat.dist <- rep(0, R)
# do the bootstrap to get the sampling distribution
for (boot in 1:R){
  # get a sample of size n from F hat
  x.star <- sample(obs.data, n, replace = TRUE)
  # get the estimated value from the estimator
  theta.hat.dist[boot] <- theta.hat(x.star)
}

# create the plug-in estimator function
g <- function(x, x_bar){
  (x - x_bar)^2
}
# Compute the bias
# Get the expected value of the estimator from the bootstrap
mu.Fhat <- mean(theta.hat.dist)
# Get the plug-in estimator
theta.Fhat <- sum(sapply(obs.data, g, x_bar = mean(obs.data)))/n
# Compute the bias as the difference
bias.boot <- mu.Fhat - theta.Fhat
bias.boot
## [1] -0.03825479
Let us check whether we did the above correctly with the boot package in R.
library(boot)
# our test statistic is the sample variance
sample.var <- function(x, d){
  return(var(x[d]))
}
# get the bootstrap object
b <- boot(obs.data, sample.var, R)
b
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = obs.data, statistic = sample.var, R = R)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 5.976929 -0.1527511 1.195859
The bias seems different! The boot package seems to be computing the bias using \[ \beta_{\widehat{F}}(\widehat{\theta}, \theta) = \mathbb{E}_{\widehat{F}}\left[\widehat{\theta}\right] - \widehat{\theta} \]
That is, it does not use the plug-in estimator in the last term but the original estimator. If \(\widehat{\theta}\) were a plug-in estimator, the boot package would have given the same answer. (There is no way for code to know the plug-in estimator corresponding to a general estimator; it can only use the estimator you pass as an argument.)
# Compute the bias as the difference between the
# bootstrap expectation and the original estimator
bias.boot.new <- mu.Fhat - var(obs.data)
print(bias.boot.new)
## [1] -0.187678
This should not be a problem when your observed data set is large, but it is something to keep in mind.
The simplest bootstrap confidence interval can be described as follows. If \(\theta\) is the parameter of interest with estimator \(\widehat{\theta}\), and \(\lbrace \widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace\), \(1 \leq j \leq R\), are the bootstrap estimates, then the sampling distribution of \(\widehat{\theta}\) is approximated by the histogram of \(\lbrace \widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace\). Say we want a \(90\%\) confidence interval. Then, using the histogram, we take the \(5^{th}\) and \(95^{th}\) percentiles as the lower and upper ends of the interval. In math lingo, if we seek a confidence level \((1 - \alpha)\), let \[ m = \left[\frac{\alpha}{2}R\right] \]
where \(\left[\cdot\right]\) denotes the greatest integer function. Then, to compute the \((1 - \alpha)\) confidence interval:
1. Order the bootstrap estimates \(\lbrace \widehat{\theta(\mathbf{x^{\ast}_{j}})}\rbrace\) in increasing order: \[ \widehat{\theta^{\ast}_{(1)}}, \widehat{\theta^{\ast}_{(2)}}, \ldots, \widehat{\theta^{\ast}_{(R)}} \]
2. The confidence interval is given by \[ \left(\widehat{\theta^{\ast}_{(m)}}, \widehat{\theta^{\ast}_{(R+1-m)}}\right) \]
This is easier to code than to write!
# We already did all of the work here.
bounds <- quantile(theta.hat.dist, c(0.025, (1 - 0.025)))
sprintf("(%s, %s)", round(bounds[1], 2), round(bounds[2], 2))
## [1] "(3.54, 8.23)"
As a sanity check (that is, using the actual order statistics):
m <- floor(R * 0.025)
sorted.thetaH <- sort(c(theta.hat.dist, var(obs.data)))
sprintf("(%s, %s)", round(sorted.thetaH[m], 2), round(sorted.thetaH[(R + 1 - m)], 2))
Let us check this using the boot package:
# b was our bootstrap object from before
boot.ci(b, type = "perc")
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = b, type = "perc")
##
## Intervals :
## Level Percentile
## 95% ( 3.523, 8.216 )
## Calculations and Intervals on Original Scale
The percentile bootstrap can be too narrow in practice (undercoverage). The studentized bootstrap (also known as bootstrap-t) is a technique that prevents this flaw of the percentile bootstrap. First, some theory. For an unbiased estimator \(\widehat{\theta}\) of \(\theta\), if the C.L.T holds, then \[ Z = \frac{\widehat{\theta} - \theta}{SE} \sim AN(0, 1). \] But \(SE\) is typically not known. “Student” found that if \(\widehat{\theta} = \overline{X}\) for data drawn from a normal distribution, then \[ T = \frac{\widehat{\theta} - \theta}{\widehat{SE}} \sim t_{n-1}. \] For an arbitrary estimator \(\widehat{\theta}\), this may not be true. The bootstrap-t method tries to get an approximation of \(T\) directly from the data, using the fact that the distribution of \(\widehat{\theta} - \theta\) is close to that of \(\widehat{\theta}^{\ast} - \widehat{\theta}\), where \(\widehat{\theta}^{\ast}\) is the bootstrap version of \(\widehat{\theta}\), i.e. \[ T \approx \frac{\widehat{\theta}^{\ast} - \widehat{\theta}}{\widehat{SE^{\ast}}} \] where \(\widehat{SE^{\ast}}\) is the plug-in estimate of \(SE^{\ast}\). Here is an algorithm to get the distribution of \(T\):
1. Draw a bootstrap sample \(\mathbf{x^{\ast}}\) of size \(n\) from \(\widehat{F}\).
2. Compute the bootstrap estimate \(\widehat{\theta}^{\ast}\) from \(\mathbf{x^{\ast}}\).
3. Compute \(\widehat{SE^{\ast}}\), the plug-in estimate of the standard error of \(\widehat{\theta}^{\ast}\).
4. Compute \[ T^{\ast} = \frac{\widehat{\theta}^{\ast} - \widehat{\theta}}{\widehat{SE^{\ast}}}. \]
Repeat this some \(R\) times and get the distribution of \(T^{\ast}\).
The confidence interval is then given by \[ \left(\widehat{\theta} - t^{\ast}_{1 - \alpha/2}\widehat{SE},\; \widehat{\theta} - t^{\ast}_{\alpha/2}\widehat{SE}\right) \] where \(t^{\ast}_{q}\) denotes the \(q\) quantile of the \(T^{\ast}\) distribution and \(\widehat{SE}\) is the estimated standard error of \(\widehat{\theta}\) computed from the original sample.
The big question here is: how do we get \(\widehat{SE^{\ast}}\)? This is mind-bending (inception). Let us go step by step.
\[ \widehat{SE^{\ast}} = \sqrt{{var}_{\widehat{F^{\ast}}}\left[\widehat{\theta^{\ast}}\right]} \] Now, \[ {var}_{\widehat{F^{\ast}}}\left[\widehat{\theta^{\ast}}\right] = \mathbb{E}_{\widehat{F^{\ast}}}\left[\left(\widehat{\theta^{\ast}} - \mathbb{E}_{\widehat{F^{\ast}}}\left[ \widehat{\theta^{\ast}}\right]\right)^2\right] \]
Note the \(\widehat{F^{\ast}}\)! If \(\widehat{\theta}\) is complicated, we need to use the bootstrap to estimate \(var_{\widehat{F^{\ast}}}\left[\widehat{\theta^{\ast}}\right]\)! This is the double bootstrap.
In order to get the distribution of a statistic we take bootstrap samples from the data. In order to get the distribution of the bootstrap statistic we take bootstrap samples from the bootstrap data!
Population : c.d.f is \(F\), parameter is \(\theta_{F}\).
You sample from \(F\), you get data!
Data : approximate c.d.f is \(\widehat{F}\), estimator is \(\widehat{\theta}\).
You sample from \(\widehat{F}\), you get the bootstrap sample!
Bootstrap sample : approximate c.d.f is \(\widehat{F}^{\ast}\), bootstrap estimator is \(\widehat{\theta^{\ast}}\).
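To make this schematic concrete, here is a minimal sketch of the inner loop of a double bootstrap: for a single bootstrap sample, we resample from it to approximate \(\widehat{SE^{\ast}}\). The helper name inner.se and the inner replicate count R.inner are illustrative choices, not part of any fixed recipe.
# approximate SE* for one bootstrap sample by resampling from it
inner.se <- function(x.star, estimator, R.inner = 100){
  inner.dist <- rep(0, R.inner)
  for (k in 1:R.inner){
    # sample from F hat star: resample the bootstrap sample itself
    x.star.star <- sample(x.star, length(x.star), replace = TRUE)
    inner.dist[k] <- estimator(x.star.star)
  }
  # the sd of the inner bootstrap distribution approximates SE*
  return(sd(inner.dist))
}
# example: SE* of the sample variance for one bootstrap sample
x.star <- sample(obs.data, n, replace = TRUE)
inner.se(x.star, theta.hat)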
Consider the special case \(\widehat{\theta} = \overline{X}\), where everything can be written in closed form:
1. By definition, \[SE(\overline{X}) = \sqrt{var_{F}\left[\overline{X}\right]}.\]
2. Using the fact that \(var_{F}\left[\overline{X}\right] = \frac{\sigma^{2}_{F}}{n}\), we get \[ SE(\overline{X}) = \frac{\sigma_{F}}{\sqrt{n}}. \]
3. Note that, since \(\mu_F = \mathbb{E}_{F}\left[X\right]\), we have \(\mu_{\widehat{F}} = \mathbb{E}_{\widehat{F}}\left[X\right]\). This is easy to calculate: \[ \mathbb{E}_{\widehat{F}}\left[X\right] = \frac{1}{n}\sum\limits_{i = 1}^{n}X_i = \overline{X}. \]
4. Hence, \[ \text{plug-in } \widehat{SE} = \frac{\sqrt{\sum\limits_{i = 1}^{n}\left(X_i - \overline{X}\right)^2}}{n}, \] and similarly \[ \widehat{\theta^{\ast}} = \overline{X^{\ast}} = \frac{1}{n}\sum\limits_{i = 1}^{n}X_i^{\ast}. \]
5. Convince yourself that \[ \text{plug-in } \widehat{SE^{\ast}} = \frac{\sqrt{\sum\limits_{i = 1}^{n}\left(X_i^{\ast} - \overline{X^{\ast}}\right)^2}}{n}. \]
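Putting the pieces together for this special case \(\widehat{\theta} = \overline{X}\), where the plug-in \(\widehat{SE}\) and \(\widehat{SE^{\ast}}\) above are available in closed form (so no double bootstrap is needed), here is a minimal sketch of a studentized bootstrap confidence interval; the names se.plugin, theta.hat.mean and t.star are illustrative.
# plug-in standard error of the mean: sqrt(sum((x - mean(x))^2))/n
se.plugin <- function(x){
  sqrt(sum((x - mean(x))^2))/length(x)
}
theta.hat.mean <- mean(obs.data)
se.hat <- se.plugin(obs.data)
# build the distribution of T* = (theta.hat* - theta.hat)/SE*
t.star <- rep(0, R)
for (boot in 1:R){
  x.star <- sample(obs.data, n, replace = TRUE)
  t.star[boot] <- (mean(x.star) - theta.hat.mean)/se.plugin(x.star)
}
# interval: (theta.hat - t*_{1 - alpha/2} SE.hat, theta.hat - t*_{alpha/2} SE.hat)
alpha <- 0.05
t.quant <- quantile(t.star, c(1 - alpha/2, alpha/2))
c(theta.hat.mean - t.quant[1]*se.hat, theta.hat.mean - t.quant[2]*se.hat)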
In the next lecture, we will see how to implement the double bootstrap to construct a studentized bootstrap confidence interval for \(S^2\).