Suppose we observe data: \[ X_1, X_2, \ldots, X_n \sim F \] from some unknown distribution \(F\).
We are interested in a parameter: \[ \theta = T(F) \] and its estimator: \[ \hat{\theta} = T(\hat{F}_n) \] such as a mean, variance, or regression parameter.
We want to know the sampling distribution of \(\hat{\theta}\), in order to compute: -
Standard error (SE)
- Bias
- Confidence intervals (CIs)
Often, this is difficult to derive analytically.
Key Point:
This makes almost no assumptions about the shape of \(F\), but it “freezes” the data support at
the observed values.
Now suppose we are willing to assume a parametric model: \[ X_1, X_2, \ldots, X_n \sim f(x; \vartheta) \] (e.g., Normal, Poisson, etc.), where \(\vartheta\) is a parameter (or vector of parameters) to be estimated.
Key Point:
This uses model assumptions more strongly, but can give better results
when the model is appropriate.
boot)We will use the boot package in
R.
The basic function is:
#library(boot)
#boot(data, statistic, R, sim = c("ordinary","parametric",...), ran.gen, mle)
# True parameters
mu_true <- 5
sigma_true <- 2
# Sample size
n <- 50
# Simulate data
x <- rnorm(n, mean = mu_true, sd = sigma_true)
summary(x)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.067 3.881 4.855 5.069 6.396 9.338
bootThe statistic function takes data and an
index (for ordinary bootstrap).
For parametric bootstrap, boot still
expects the same interface, but the index is usually just all indices of
the generated sample.
stat_mean: returns the sample
meanstat_mean_sd: returns both mean and
standard deviation, so we can bootstrap two parameters at oncelibrary(boot)
# Statistic for mean only
stat_mean <- function(data, indices) {
x <- data[indices]
mean(x)
}
# Statistic returning mean and sd
stat_mean_sd <- function(data, indices) {
x <- data[indices]
c(mean = mean(x), sd = sd(x))
}
For a Normal model, the MLEs for mean and sd are just the sample mean and sd.
mu_hat <- mean(x)
sigma_hat <- sd(x)
mu_hat
## [1] 5.068807
sigma_hat
## [1] 1.85174
We will pass these as the mle argument to boot. ## 2.4 Define the
Parametric Generator ran.gen
The parametric generator must be a function of the form:
rnorm_gen <- function(data, mle) {
n <- length(data)
mu <- mle[1]
sigma <- mle[2]
rnorm(n, mean = mu, sd = sigma)
}
Now we call boot with the following arguments:
sim = "parametric"ran.gen = rnorm_genmle = c(mu_hat, sigma_hat)set.seed(123)
boot_norm <- boot(
data = x,
statistic = stat_mean_sd,
R = 2000,
sim = "parametric",
ran.gen = rnorm_gen,
mle = c(mu_hat, sigma_hat)
)
boot_norm
##
## PARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = x, statistic = stat_mean_sd, R = 2000, sim = "parametric",
## ran.gen = rnorm_gen, mle = c(mu_hat, sigma_hat))
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 5.068807 0.001808685 0.2499582
## t2* 1.851740 -0.008013623 0.1843551
The output shows the original statistic (t0) and bootstrap replicates (t).