Comparing Bootstrap Confidence Intervals with t-Intervals Computation Project

Jessica Stuart

Draw a random sample from an exponential distribution with mean 900 to respresent the lifetime of 60 Watt light bulbs.

lambda <- 1/900
pop <- rexp(10000, rate = lambda)

The distribution of the lifetimes in this sample is as follows:

plot(density(pop), xlab = "", main = "Distribution of Lifetimes of 60 Watt Lightbulbs")

plot of chunk unnamed-chunk-2

If we were to generate 800 samples of sample size 15 from this exponential distribution and then calculate 95% t-intervals, what is the proportion of intervals that include the true mean 900? Is this what you would expect? Explain.

sims <- 800
n <- 15
tsamples <- replicate(sims, rexp(n, rate = lambda))
mu = 1/lambda

results <- as.numeric(sims)
t.int <- matrix(FALSE, sims, 2)
for (i in 1:sims) {
    t.int[i, ] <- t.test(tsamples[, i], conf.level = 0.95)$conf.int
    results[i] <- t.int[i, 1] < mu & t.int[i, 2] > mu
}
sum(results)/sims
## [1] 0.9075

I would expect the proportion of intervals that include the true mean to be less than .95 because the confidence intervals are determined with a confidence level of 95% but the data is generated from an exponential distribution, rather than a normal distribution. The result from this simulation supports my claim, as the proportion is less than 0.95.

If we were to calculate 800-95% confidence intervals based on the bootstrap percentile method, what is the proportion of intervals that include the true mean 900?

sims <- 800
n <- 15

boot.ci.pctle <- matrix(FALSE, sims, 2)
results <- as.numeric(sims)
for (i in 1:sims) {
    pop <- rexp(10000, rate = lambda)
    x <- sample(pop, n)
    mu <- mean(pop)
    B <- 1000
    bootsamples <- matrix(sample(x, size = n * B, replace = T), nrow = n, ncol = B)
    bootEst <- apply(bootsamples, 2, mean)
    boot.ci.pctle[i, ] <- quantile(bootEst, probs = c(0.025, 0.975))
    results[i] <- boot.ci.pctle[i, 1] < mu & boot.ci.pctle[i, 2] > mu
}
sum(results)/sims
## [1] 0.8925

If we were to calculate 800-95% confidence intervals based on the studentized bootstrap method, what is the proportion of intervals that include the true mean 900?

sims <- 800
n <- 15

boot.ci.student <- matrix(FALSE, sims, 2)
results <- as.numeric(sims)
for (i in 1:sims) {
    pop <- rexp(10000, rate = lambda)
    x <- sample(pop, n)
    mu <- mean(pop)
    B <- 1000
    bootsamples <- matrix(sample(x, size = n * B, replace = T), nrow = n, ncol = B)
    bootEst <- (apply(bootsamples, 2, mean) - mean(x))/(apply(bootsamples, 2, 
        sd)/sqrt(n))
    boot.ci.student[i, ] <- mean(x) - quantile(bootEst, probs = c(0.975, 0.025)) * 
        sd(x)/sqrt(n)
    results[i] <- boot.ci.student[i, 1] < mu & boot.ci.student[i, 2] > mu
}
sum(results)/sims
## [1] 0.9313

Compare the coverage of the three methods.

The proportion of intervals that covered the true mean varies from simulation to simulation, but generally speaking, the proportion is highest for the studentized bootstrap method and lowest for the bootstrap percentile method, with all three proportions being below the expected value of 0.95 (determined by the fact that the confidence level used was 95%). If the underlying distribution was normal, as opposed to exponential which was used in these simulations, I would expect the proportion of intervals determined using the t-interval method containing the true mean to be closer to the expected value of 0.95, but the proportion of intervals determined using the two bootstrap methods to remain relatively the same since they do not make any assumptions about the underlying distribution.

t-Interval Method

sims <- 800
n <- 15
tsamples <- replicate(sims, rnorm(n, mean = 900, sd = 50))
mu <- 900

results <- as.numeric(sims)
t.int <- matrix(FALSE, sims, 2)
for (i in 1:sims) {
    t.int[i, ] <- t.test(tsamples[, i], conf.level = 0.95)$conf.int
    results[i] <- t.int[i, 1] < mu & t.int[i, 2] > mu
}
sum(results)/sims
## [1] 0.9563

Bootstrap Percentile Method

sims <- 800
n <- 15

boot.ci.pctle <- matrix(FALSE, sims, 2)
results <- as.numeric(sims)
for (i in 1:sims) {
    pop <- rnorm(10000, mean = 900, sd = 50)
    x <- sample(pop, n)
    mu <- mean(pop)
    B <- 1000
    bootsamples <- matrix(sample(x, size = n * B, replace = T), nrow = n, ncol = B)
    bootEst <- apply(bootsamples, 2, mean)
    boot.ci.pctle[i, ] <- quantile(bootEst, probs = c(0.025, 0.975))
    results[i] <- boot.ci.pctle[i, 1] < mu & boot.ci.pctle[i, 2] > mu
}
sum(results)/sims
## [1] 0.9163

Studentized Bootstrap Method

sims <- 800
n <- 15

boot.ci.student <- matrix(FALSE, sims, 2)
results <- as.numeric(sims)
for (i in 1:sims) {
    pop <- rnorm(10000, mean = 900, sd = 50)
    x <- sample(pop, n)
    mu <- mean(pop)
    B <- 1000
    bootsamples <- matrix(sample(x, size = n * B, replace = T), nrow = n, ncol = B)
    bootEst <- (apply(bootsamples, 2, mean) - mean(x))/(apply(bootsamples, 2, 
        sd)/sqrt(n))
    boot.ci.student[i, ] <- mean(x) - quantile(bootEst, probs = c(0.975, 0.025)) * 
        sd(x)/sqrt(n)
    results[i] <- boot.ci.student[i, 1] < mu & boot.ci.student[i, 2] > mu
}
sum(results)/sims
## [1] 0.935

Based on these simulations, I can confirm my above claim.