Understanding Bootstrapping in Statistics

Slide 1: What is Bootstrapping?

Bootstrapping is a resampling technique that estimates the sampling variability of a statistic (for example, the standard error or a confidence interval for the mean) by repeatedly sampling with replacement from the observed data set.

It is especially useful when we do not want to assume a normal distribution, or when the sample is too small to rely on large-sample approximations.

Slide 2: Why Use Bootstrapping?

No assumption of normality
Usable with small samples, as long as the sample is representative of the population
Helps estimate standard errors and confidence intervals
Easily implemented with code

Slide 3: Bootstrapping Process

Start with your observed sample of n values
Resample n values from it with replacement (many times)
Compute your statistic (e.g., mean) for each sample
Analyze the distribution of those statistics
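
The four steps above can be sketched as a small reusable function. This is only an illustration: the helper name bootstrap_stat, the simulated exponential data, and the choice of the median as the statistic are assumptions, not part of the original slides.

```r
# Hypothetical helper: bootstrap any statistic of a numeric vector.
# x    = observed sample
# stat = function computing the statistic of interest
# B    = number of bootstrap resamples
bootstrap_stat <- function(x, stat = mean, B = 1000) {
  replicate(B, stat(sample(x, size = length(x), replace = TRUE)))
}

set.seed(42)
x <- rexp(50, rate = 0.1)   # simulated, skewed example data (assumption)

# Steps 2 and 3: resample with replacement and compute the median each time
boot_medians <- bootstrap_stat(x, stat = median, B = 2000)

# Step 4: inspect the distribution of the resampled statistic
length(boot_medians)        # one median per resample
summary(boot_medians)
```

Swapping stat = median for any other function of a numeric vector reuses the same process unchanged.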

Slide 4: R Code for Resampling

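set.seed(123)  # make the resampling reproducible
library(ggplot2)
library(plotly)

# Simulated sample: 100 draws from a normal distribution (mean 50, sd 10)
data <- rnorm(100, mean = 50, sd = 10)
# 1000 bootstrap resamples; sample() draws length(data) values by default
boot_means <- replicate(1000, mean(sample(data, replace = TRUE)))
df <- data.frame(boot_means)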

Slide 5: Distribution of Bootstrap Means

ggplot(df, aes(x = boot_means)) +
  geom_histogram(fill = "#8C1D40", bins = 30, color = "white") +
  labs(title = "Bootstrap Distribution of the Mean", x = "Mean", y = "Frequency")

Slide 6: Bootstrap Confidence Interval

ci <- quantile(boot_means, c(0.025, 0.975))
ggplot(df, aes(x = boot_means)) + geom_density(fill = "#8C1D40", alpha = 0.4) +
  geom_vline(xintercept = ci, linetype = "dashed", color = "blue") +
  labs(x = "Mean", y = "Density")

Slide 7: Interactive Plotly Histogram

plot_ly(x = ~boot_means, type = "histogram", nbinsx = 30, marker = list(color = "#8C1D40")) %>%
  layout(margin = list(t = 10),  # reduce top margin
    xaxis = list(title = "Bootstrapped Means"),
    yaxis = list(title = "Count")
  )

Slide 8: LaTeX - Bootstrap Standard Error

Let \(x_1, x_2, \dots, x_n\) be the sample.

The bootstrap estimate of standard error is:

\[ \hat{SE}_{boot} = \sqrt{ \frac{1}{B-1} \sum_{b=1}^{B} \left( \bar{x}^{(b)} - \bar{x}_{boot} \right)^2 } \]

Where:
- \(\bar{x}^{(b)}\) = mean of the b-th bootstrap sample
- \(\bar{x}_{boot}\) = average of all bootstrap means
- \(B\) = number of bootstrap samples
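
As a rough check, the formula above can be written out term by term in R and compared with sd(), which uses the same B - 1 denominator. The simulated data and the variable names (se_manual, se_builtin, se_classic) are illustrative assumptions.

```r
set.seed(123)
x <- rnorm(100, mean = 50, sd = 10)   # simulated sample (assumption)
B <- 1000
boot_means <- replicate(B, mean(sample(x, replace = TRUE)))

# Formula written out: deviations of each bootstrap mean from their average
x_bar_boot <- mean(boot_means)
se_manual <- sqrt(sum((boot_means - x_bar_boot)^2) / (B - 1))

# sd() applies the same B - 1 denominator, so it gives the same value
se_builtin <- sd(boot_means)

# Classical standard error of the mean, for comparison
se_classic <- sd(x) / sqrt(length(x))

c(manual = se_manual, builtin = se_builtin, classical = se_classic)
```

For the mean of roughly normal data, the bootstrap and classical estimates should land close to each other; the bootstrap version becomes more valuable for statistics that have no simple standard error formula.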

Slide 9: LaTeX - Bootstrap Confidence Interval

The 95% percentile bootstrap confidence interval is given by:

\[ CI_{boot} = \left[ \text{quantile}_{0.025}(\bar{x}^{(b)}), \ \text{quantile}_{0.975}(\bar{x}^{(b)}) \right] \]

This interval is computed directly from the distribution of resampled statistics.

Advantages:
- Does not assume normality
- Adapts to the shape of your data
- Easy to compute using quantile() in R
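
A minimal sketch of the percentile interval with an adjustable confidence level, shown next to the normal-theory t interval on skewed data. The helper name percentile_ci, its arguments, and the simulated exponential sample are assumptions made for illustration.

```r
# Hypothetical helper: percentile bootstrap CI for the mean
percentile_ci <- function(x, level = 0.95, B = 2000) {
  boot_means <- replicate(B, mean(sample(x, replace = TRUE)))
  alpha <- 1 - level
  # Lower and upper quantiles of the bootstrap means
  quantile(boot_means, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(1)
x <- rexp(40, rate = 1 / 50)              # simulated skewed data (assumption)

ci_boot <- percentile_ci(x)               # percentile bootstrap interval
ci_t    <- as.numeric(t.test(x)$conf.int) # t interval, assumes approximate normality

rbind(bootstrap = ci_boot, t_interval = ci_t)
```

The t interval is symmetric around the sample mean by construction, while the percentile interval is free to be asymmetric when the bootstrap distribution is skewed.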

Slide 10: Conclusion

Bootstrapping is intuitive and powerful
It helps estimate standard errors and confidence intervals without strict assumptions
Works well even with small sample sizes
Visual, flexible, and easy to implement in R

Practice using it on your own datasets. It is a great tool for modern data analysis!