(Instructor: Nishant Panda)
Last lecture we introduced the bootstrap as a technique to evaluate statistical (frequentist) properties of an estimator. We looked at the plug-in estimator and laid down notation for understanding the bootstrap method. Today we will explore the bootstrap technique to approximate the standard error and bias of an estimator.
Let \(X \sim F\) be a random variable (could be multidimensional!) whose c.d.f. is given by \(F(x)\). Let us denote the expectation, variance and median of \(X\) by \(\mu_{F}\), \(\sigma^{2}_{F}\) and \(med_{F}\) to emphasize the distribution. These are all parameters of the distribution \(F\). In general, we denote a parameter \(\theta\) as a function of the distribution \(F\), written \(\theta = T(F)\) or \(\theta_{F}\) for short. Let \(X_1, X_2, \dots, X_n\) be a random sample of size \(n\) of \(X\). An estimator \(\widehat{\theta}\) is a function of the sample, i.e. \(\widehat{\theta} = h(X_1, X_2, \dots, X_n)\). For example, if \(\theta_{F} = \mu_F\), then \(\widehat{\theta} = \overline{X}\) is an estimator for the mean. Similarly, if \(\theta_{F} = \sigma^{2}_F\), then \(\widehat{\theta} = S^2\) is an estimator for the variance. Let \(\widehat{F}\) be an estimated c.d.f.; then \(\widehat{\theta}_{\widehat{F}} = T(\widehat{F})\) is the plug-in estimator for \(\theta\). As we saw last time, if \(\widehat{F}\) is the empirical c.d.f., then \[ \frac{1}{n}\sum\limits_{i = 1}^{n}(X_i - \overline{X})^2 \]
is the plug-in estimator for the variance. This is different from the usual estimator \(S^2\) (note the \(1/n\) as compared to the \(1/(n-1)\)!).
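As a quick numerical illustration of this difference (a minimal sketch with a made-up vector x, not part of the lecture data):
x <- c(2.1, 3.4, 1.8, 4.0, 2.7)       # any small sample, n = 5
plugin.var <- mean((x - mean(x))^2)   # plug-in estimator: divides by n
usual.var  <- var(x)                  # S^2: divides by n - 1
plugin.var / usual.var                # equals (n - 1)/n = 0.8 here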
Example 1: Let \(X \sim N(\mu, \sigma^2)\). Let \(X_1, X_2, \dots, X_n\) be a random sample of \(X\), and say we are interested in estimating \(\sigma^2\) using the sample variance as the estimator. Let \(F\) be the normal c.d.f. and let us write the mean and the variance as \(\mu_F\) and \(\sigma^2_{F}\). Estimate the standard error (SE) of \(S^2\).
Hence, \(\theta_{F} = \sigma^2\) and \(\widehat{\theta} = S^2\). What is the sampling distribution of \(S^2\)? To compute the SE we need \(\mathbb{E}\left[\widehat{\theta}\right]\) and \(var\left[\widehat{\theta}\right]\). If we know \(F\), we can get estimates (actually exact values, analytically!) by sampling from \(F\). Note that \(S^2\) is an unbiased estimator of \(\sigma^2\).
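In fact, since the data here are normal, the sampling distribution of \(S^2\) is known in closed form (a standard fact, stated without derivation, which we can use to sanity-check the simulation below): \[ \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}, \qquad \mathbb{E}\left[S^2\right] = \sigma^2, \qquad var\left[S^2\right] = \frac{2\sigma^4}{n-1}, \qquad SE_{F}(S^2) = \sigma^2\sqrt{\frac{2}{n-1}}. \] For the simulation below (\(\sigma^2 = 4\), \(n = 40\)) this gives \(SE_{F}(S^2) = 4\sqrt{2/39} \approx 0.906\), which we can compare with the Monte Carlo estimate.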
# Let us fix some observed data.
# Say we have a data of size n = 40
set.seed(42)
n <- 40
# pretend we don't know what mean and sd are!
obs.data <- rnorm(n, mean = 5, sd = 2)  # drawn from the known F
# If we know F, we know mean and sd
mu.F <- 5
sigma.F <- 2
# number of simulation : number of times
# we draw from "population"
num.sim <- 1e3
sam.var <- rep(0, num.sim)
for(sim in 1:num.sim){
# take a sample of size $n$ from N(5,2)
data <- rnorm(n, mean = 5, sd = 2)
# compute the estimator
sam.var[sim] = var(data)
}
# Get the expected value of the estimator.
# this should be close to 4!
exp.thetah <- mean(sam.var)
# Get the SE of the estimator
# Do we even know the formula of the SE of S^2?
se.thetah <- sd(sam.var)
print(exp.thetah)
## [1] 4.064378
print(se.thetah)
## [1] 0.9040893
hist(sam.var, breaks = 21, col = "steelblue",
     xlab = "S^2: sample variance",
     main = "Sampling distribution of sample variance")
Recall that in bootstrap we use the plug-in estimator to find properties of the estimator. (Thus, replace \(F\) with \(\widehat{F}\)!)
Let \(X_1, X_2, \dots, X_n\) be a random sample. Let \(\hat{F}\) be an approximate c.d.f.
Note that \[ SE_{F}(\widehat{\theta}) \approx SE_{\widehat{F}}(\widehat{\theta}) = \sqrt{var_{\widehat{F}}\left[\widehat{\theta}\right]} \] and \[ var_{\widehat{F}}\left[\widehat{\theta}\right] = \mathbb{E}_{\widehat{F}}\left[\left(\widehat{\theta} - \mathbb{E}_{\widehat{F}}\left[\widehat{\theta}\right]\right)^2\right]. \] Let \[ g(\widehat{\theta}) = \left(\widehat{\theta} - \mathbb{E}_{\widehat{F}}\left[\widehat{\theta}\right]\right)^2. \]
Thus, \[ var_{\widehat{F}}\left[\widehat{\theta}\right] = \mathbb{E}_{\widehat{F}}\left[g(\widehat{\theta})\right] \]
How do we compute \(\mathbb{E}_{\widehat{F}}\left[\widehat{\theta}\right]\) and \(\mathbb{E}_{\widehat{F}}\left[g(\widehat{\theta})\right]\)?
We can approximate these using Monte Carlo integration! First note that \(\widehat{\theta}\) is a function of \((X_1, X_2, \dots, X_n)\). Hence, to use Monte Carlo integration:
Get \(R\) samples of size \(n\) from \(\widehat{F}\).
Each sample is of size \(n\); the \(j^{th}\) sample, say, is denoted by \(\mathbf{x^{\ast}_{j}} = \left(x_1^{\ast}, x_2^{\ast}, \dots, x_n^{\ast}\right)\). From this, compute \(\widehat{\theta}(\mathbf{x^{\ast}_{j}})\) and \(g\left(\widehat{\theta}(\mathbf{x^{\ast}_{j}})\right)\).
Then, by Monte Carlo, \[ \mathbb{E}_{\widehat{F}}\left[\widehat{\theta}\right] \approx \frac{1}{R}\sum\limits_{j = 1}^{R}\widehat{\theta}(\mathbf{x^{\ast}_{j}}), \] and
\[ \mathbb{E}_{\widehat{F}}\left[g(\widehat{\theta})\right] \approx \frac{1}{R}\sum\limits_{j = 1}^{R}g\left(\widehat{\theta}(\mathbf{x^{\ast}_{j}})\right). \]
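These steps can be collected into a small helper function (a sketch only; draw.sample and estimator are placeholder arguments introduced here, not names used elsewhere in these notes: the first draws one sample of size \(n\) from \(\widehat{F}\), the second computes \(\widehat{\theta}\) from a sample):
boot.moments <- function(draw.sample, estimator, R){
  # bootstrap replicates of the estimator
  theta.star <- replicate(R, estimator(draw.sample()))
  # Monte Carlo estimate of E_Fhat[theta hat]
  mu.star <- mean(theta.star)
  # Monte Carlo estimate of E_Fhat[g(theta hat)] = var_Fhat[theta hat]
  var.star <- mean((theta.star - mu.star)^2)
  c(mean = mu.star, se = sqrt(var.star))
}
For the non-parametric bootstrap done below, this would be called as boot.moments(function() sample(obs.data, n, replace = TRUE), var, R).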
In the case of the non-parametric bootstrap, \(\widehat{F} = F_e\) is the empirical c.d.f. In that case, any sample \(\mathbf{x^{\ast}} = \left(x_1^{\ast}, x_2^{\ast}, \dots, x_n^{\ast}\right)\) is just a sample drawn from the observed data WITH REPLACEMENT.
In the case of the parametric bootstrap, \(\widehat{F}\) is obtained from a statistical model, e.g. by using maximum likelihood to fit the parameters of a known distribution, and a sample \(\mathbf{x^{\ast}} = \left(x_1^{\ast}, x_2^{\ast}, \dots, x_n^{\ast}\right)\) is just a sample drawn from that fitted distribution.
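Concretely, the only difference between the two is how a bootstrap sample is drawn. Here is a sketch (the parametric version assumes, for illustration, a normal model fitted by maximum likelihood; the names mu.mle and sigma.mle are introduced only for this sketch):
# non-parametric bootstrap: resample the observed data with replacement
x.star <- sample(obs.data, n, replace = TRUE)
# parametric bootstrap: draw from the fitted normal distribution
mu.mle    <- mean(obs.data)
sigma.mle <- sqrt(mean((obs.data - mean(obs.data))^2))  # ML estimate uses 1/n, not 1/(n-1)
x.star    <- rnorm(n, mean = mu.mle, sd = sigma.mle)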
Note that our observed data was of size \(n = 40\). First, let us look at the observed data.
hist(obs.data, breaks = 11, xlab = "observed X",
     main = "histogram of the observed data")
# Check for normality
qqnorm(obs.data)
qqline(obs.data)
# First let us create our estimator
theta.hat <- function(sample){
return (var(sample))
}
# Set the number of bootstrap samples R
R <- 1e3
# create a vector for sampling distribution
theta.hat.dist <- rep(0, R)
# do the bootstrap
for (boot in 1:R){
# get sample of size n from F hat.
x.star <- sample(obs.data, n, replace = TRUE)
# get the estimated value from the estimator
theta.hat.dist[boot] <- theta.hat(x.star)
}
# get the standard error estimate of S^2
# First create function g
g <- function(theta_hat, mu_hat){
(theta_hat - mu_hat)^2
}
# Get the expected value of the estimator
mu.hat <- mean(theta.hat.dist)
# Get the variance of the estimator using the bootstrap Monte Carlo
var.boot <- sum(sapply(theta.hat.dist, g, mu_hat = mu.hat))/R
# Compute the standard error SE as the square root of the variance of the estimator
se.boot <- sqrt(var.boot)
se.boot
## [1] 1.184573
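As a side note (an equivalent way of writing the same computation, not a different method), the bootstrap variance above can be expressed more compactly:
# equivalent, more compact form of var.boot computed above
mean((theta.hat.dist - mu.hat)^2)
# and sd(theta.hat.dist) is nearly identical to se.boot for large R
# (it divides by R - 1 instead of R)
sd(theta.hat.dist)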
Let us check if we did the above correctly with the boot package in R.
library(boot)
# our test statistic is the sample variance
sample.var <- function (x, d){
return(var(x[d]))
}
# get the bootstrap object
b <- boot(obs.data, sample.var, R)
b
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = obs.data, statistic = sample.var, R = R)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 5.976929 -0.153534 1.152481
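For reference (not shown in the lecture, but standard contents of the returned boot object), the printed standard error can be recovered directly from the stored replicates:
# the replicates are stored in b$t and the original statistic in b$t0;
# the printed std. error is just the sample standard deviation of the replicates
sd(b$t[, 1])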
This checks out. But compared to the exact answer (known \(F\) in Example 1) it looks like we overshot! Let us look at the sampling distribution of the estimator as computed from the bootstrap.
hist(theta.hat.dist, breaks = 21, col = "steelblue",
xlab = "S^2: sample variance",
main = "(Bootstrap) Sampling distibution of sample variance") This does look a bit biased!
# create a data frame
df <- data.frame(sam.var, theta.hat.dist)
names(df) <- c("true.dist", "boot.dist")
library(ggplot2)
ggplot(data = df, aes(true.dist)) +
  geom_density(alpha = 0.2, fill = "red") +
  geom_histogram(aes(x = boot.dist, y = ..density..),
                 alpha = 0.4, color = "blue", bins = 20) +
  geom_density(aes(x = boot.dist), alpha = 0.2, fill = "blue") +
  labs(x = "Sample variance")
In the next lecture, we will look at how to estimate the bias of the estimator and use bias correction to account for bootstrap bias.
# Exercise: change this line (for example, to draw a parametric bootstrap sample as sketched earlier).
# get sample of size n from F hat.
x.star <- sample(obs.data, n, replace = TRUE)