Statistical inference typically relies on theoretical probability distributions (t-distributions, F-distributions, etc.) that depend on assumptions we cannot always verify. Bootstrapping Monte Carlo simulation offers a powerful, assumption-light alternative: instead of deriving a formula, we let the data speak for itself by resampling it thousands of times.
This tutorial walks you through the core ideas and applies them to real business problems drawn from manufacturing, healthcare, finance, retail, and human resources.
Bootstrapping is a resampling technique in which we draw repeated samples with replacement from our observed data. Each resample is the same size as the original dataset, but because sampling is done with replacement, some observations will appear multiple times and others not at all.
Monte Carlo simulation refers to the broader strategy of using many random trials to estimate a quantity (probability, mean, confidence interval) that is analytically difficult to compute.
Together, Bootstrapping Monte Carlo Simulation:
Imagine you collect 9 measurements and want to know “how variable is my sample mean?” Rather than assuming normality and applying a t-formula, you:
set.seed(3301)
original <- c(39, 41, 38, 45, 36, 42, 37, 40, 40)
boot_means <- replicate(10000, mean(sample(original, replace = TRUE)))
ci <- quantile(boot_means, c(0.025, 0.975))
ggplot(data.frame(m = boot_means), aes(x = m)) +
geom_histogram(bins = 40, fill = "steelblue", colour = "white", alpha = 0.85) +
annotate("rect",
xmin = ci[1], xmax = ci[2], ymin = 0, ymax = Inf,
fill = "orange", alpha = 0.2
) +
geom_vline(xintercept = mean(original), colour = "firebrick", linewidth = 1) +
labs(
x = "Bootstrap Sample Mean",
y = "Frequency",
title = "Bootstrap Distribution of the Sample Mean",
subtitle = paste0("Red line = observed mean (", round(mean(original), 2),
"); shaded band = 95% CI [",
round(ci[1], 2), ", ", round(ci[2], 2), "]")
)Bootstrap sampling distribution of the mean from 9 observations. The shaded band marks the 95% bootstrap confidence interval.
Use bootstrapping when:
| Situation | Why bootstrapping helps |
|---|---|
| Small sample sizes | Avoids reliance on the Central Limit Theorem |
| Non-normal data | Theoretical distributions may not apply |
| Complex statistics | Median, IQR, Sharpe ratio — no standard SE formula |
| Comparing two groups without distributional assumptions | More robust than t-test under violations |
| Estimating probability that A beats B | Direct probability statement from simulations |
Bootstrapping is not a cure-all. It assumes your sample is representative of the population. With very small or biased samples, the bootstrap distribution will also be biased.
Every bootstrapping problem in this tutorial follows the same skeleton:
n_boot <- 10000 # number of bootstrap replications
boot_stat <- replicate(n_boot, {
# 1. Resample with replacement
resample <- sample(data_vector, size = length(data_vector), replace = TRUE)
# 2. Compute the statistic of interest
mean(resample) # replace with your statistic
})
# 3. Summarise
quantile(boot_stat, c(0.025, 0.975)) # 95% confidence intervalFor two-sample comparisons (testing whether Group A beats Group B), a slight variant is used: resample each group independently and check whether the difference statistic (e.g., mean_A − mean_B) exceeds zero.
A production engineer runs 9 batches at a high process temperature and 7 batches at a low temperature, recording the product yield from each. The mean yield at high temperature (39.74) exceeds that at low temperature (32.27), but is this difference real or just sampling noise?
Question: What is the probability that high-temperature production yields more than low-temperature production?
high_temp <- c(39, 41, 38, 45, 36, 42, 37, 40, 40)
low_temp <- c(30, 33, 34, 31, 34, 30, 34, 35, 32)
cat("High temperature — n =", length(high_temp),
"| mean =", round(mean(high_temp), 2), "\n")## High temperature — n = 9 | mean = 39.78
## Low temperature — n = 9 | mean = 32.56
set.seed(5512)
n_boot <- 10000
high_beats_low <- replicate(n_boot, {
mean(sample(high_temp, replace = TRUE)) >
mean(sample(low_temp, replace = TRUE))
})
prob_high_better <- mean(high_beats_low)
cat("Estimated probability that high temperature yields more:",
scales::percent(prob_high_better, accuracy = 0.1), "\n")## Estimated probability that high temperature yields more: 100.0%
boot_diffs <- replicate(n_boot, {
mean(sample(high_temp, replace = TRUE)) -
mean(sample(low_temp, replace = TRUE))
})
ggplot(data.frame(d = boot_diffs), aes(x = d)) +
geom_histogram(bins = 50, fill = "steelblue", colour = "white", alpha = 0.85) +
geom_vline(xintercept = 0, colour = "firebrick", linewidth = 1, linetype = "dashed") +
annotate("text", x = 0.3, y = 700, label = "← Low better | High better →",
size = 3.5, colour = "gray30") +
labs(
x = "Difference in Means (High − Low)",
y = "Frequency",
title = "Bootstrap Distribution: High Temp vs. Low Temp Yield Difference",
subtitle = paste0("Proportion of iterations where High > Low: ",
scales::percent(prob_high_better, accuracy = 0.1))
)Distribution of (mean_high − mean_low) across 10,000 bootstrap iterations. Values above zero favour high temperature.
Result
The simulation shows a 100.0% probability that high-temperature processing yields more than low-temperature processing. This is strong evidence — though not certainty — in favour of high-temperature production.
A pharmaceutical company tests a new influenza drug. Of 34 patients who received the drug, 25 felt better and 9 felt worse. Of 19 patients who received a placebo, 11 felt better and 8 felt worse.
Question: What is the probability that the drug is more effective than the placebo (i.e., produces a higher recovery rate)?
# Represent each patient: 1 = felt better, 0 = felt worse
drug <- c(rep(1, 25), rep(0, 9))
placebo <- c(rep(1, 11), rep(0, 8))
cat("Drug group — n =", length(drug),
"| recovery rate =", scales::percent(mean(drug), accuracy = 0.1), "\n")## Drug group — n = 34 | recovery rate = 73.5%
cat("Placebo group — n =", length(placebo),
"| recovery rate =", scales::percent(mean(placebo), accuracy = 0.1), "\n")## Placebo group — n = 19 | recovery rate = 57.9%
set.seed(7743)
drug_beats_placebo <- replicate(10000, {
mean(sample(drug, replace = TRUE)) >
mean(sample(placebo, replace = TRUE))
})
prob_drug_better <- mean(drug_beats_placebo)
cat("Probability that drug recovery rate exceeds placebo:",
scales::percent(prob_drug_better, accuracy = 0.1), "\n")## Probability that drug recovery rate exceeds placebo: 86.6%
boot_rate_diffs <- replicate(10000, {
mean(sample(drug, replace = TRUE)) -
mean(sample(placebo, replace = TRUE))
})
ci_drug <- quantile(boot_rate_diffs, c(0.025, 0.975))
ggplot(data.frame(d = boot_rate_diffs), aes(x = d)) +
geom_histogram(bins = 50, fill = "seagreen", colour = "white", alpha = 0.85) +
geom_vline(xintercept = 0, colour = "firebrick", linewidth = 1, linetype = "dashed") +
labs(
x = "Difference in Recovery Rates (Drug − Placebo)",
y = "Frequency",
title = "Bootstrap Distribution: Drug vs. Placebo Recovery Rate",
subtitle = paste0("95% CI: [", round(ci_drug[1], 3), ", ", round(ci_drug[2], 3),
"] | P(Drug > Placebo) = ",
scales::percent(prob_drug_better, accuracy = 0.1))
)Bootstrap distribution of the difference in recovery rates (Drug − Placebo). Values to the right of the red line indicate the drug outperforms the placebo.
Result
There is a 86.6% probability that the drug’s recovery rate exceeds the placebo’s, with a 95% bootstrap CI of [-0.113, 0.426] for the rate difference. The lower bound of the CI being near zero calls for caution about the drug’s efficacy.
Learning Point: Proportions as Means
Binary outcomes (recovered / not recovered) can be treated as a numeric variable of 0s and 1s. The mean of such a vector is the proportion, so all our bootstrap machinery applies directly.
A fund manager wants to know the beta (systematic risk coefficient) for two stocks — Stock A (a tech company, similar to MSFT) and Stock B (a consumer staples company, similar to PFE) — relative to the market. Beta is estimated as the slope of a regression of stock returns on market returns.
Question: What is the probability that Stock A has a higher beta than Stock B? Provide a 95% confidence interval for each stock’s beta.
set.seed(1984)
n_months <- 36
market_returns <- rnorm(n_months, mean = 0.008, sd = 0.045)
# Stock A: high-beta tech stock (beta ≈ 1.4)
stock_a <- 1.4 * market_returns + rnorm(n_months, 0, 0.03)
# Stock B: low-beta defensive stock (beta ≈ 0.6)
stock_b <- 0.6 * market_returns + rnorm(n_months, 0, 0.02)
returns_df <- data.frame(market = market_returns, stock_a, stock_b)
cat("OLS Beta — Stock A:", round(coef(lm(stock_a ~ market_returns))[2], 3), "\n")## OLS Beta — Stock A: 1.598
## OLS Beta — Stock B: 0.584
set.seed(4421)
boot_betas <- replicate(10000, {
# Resample row indices to keep market/stock pairs together
idx <- sample(n_months, replace = TRUE)
mkt <- market_returns[idx]
beta_a <- unname(coef(lm(stock_a[idx] ~ mkt))[2])
beta_b <- unname(coef(lm(stock_b[idx] ~ mkt))[2])
c(beta_a, beta_b)
}, simplify = TRUE)
beta_a_boot <- boot_betas[1, ]
beta_b_boot <- boot_betas[2, ]
ci_a <- quantile(beta_a_boot, c(0.025, 0.975))
ci_b <- quantile(beta_b_boot, c(0.025, 0.975))
prob_a_higher <- mean(beta_a_boot > beta_b_boot)
cat("Stock A beta — 95% CI: [", round(ci_a[1], 3), ",", round(ci_a[2], 3), "]\n")## Stock A beta — 95% CI: [ 1.384 , 1.793 ]
## Stock B beta — 95% CI: [ 0.453 , 0.7 ]
## P(Beta_A > Beta_B): 100.0%
bind_rows(
data.frame(beta = beta_a_boot, stock = "Stock A (Tech)"),
data.frame(beta = beta_b_boot, stock = "Stock B (Defensive)")
) |>
ggplot(aes(x = beta, fill = stock)) +
geom_histogram(bins = 50, colour = "white", alpha = 0.8, position = "identity") +
facet_wrap(~stock, ncol = 1, scales = "free_y") +
labs(
x = "Bootstrap Beta Estimate",
y = "Frequency",
title = "Bootstrap Distribution of Stock Betas",
fill = NULL
) +
theme(legend.position = "none")Bootstrap distributions of beta estimates for Stock A (tech) and Stock B (defensive). Dashed lines mark the respective 95% CIs.
Result
Stock A’s beta 95% CI of [1.38, 1.79] is clearly above Stock B’s CI of [0.45, 0.7]. The simulation confirms a 100.0% probability that Stock A has higher systematic risk than Stock B.
Key Insight: Resampling Pairs
When two variables are paired (e.g., market return and stock return for the same month), always resample rows together — not the two vectors independently — to preserve the correlation structure.
A retail bank’s credit risk team has data on default rates (proportion of loans defaulted) across 40 branch portfolios. The team wants to report the median default rate with a confidence interval to present to regulators. There is no standard formula for the SE of a median, making this an ideal bootstrap problem.
Question: Construct a 95% bootstrap confidence interval for the median branch-level default rate.
set.seed(9921)
# Simulate 40 branch default rates; right-skewed distribution typical in credit risk
default_rates <- rbeta(40, shape1 = 2, shape2 = 20) # mean ≈ 9%
default_rates <- round(default_rates, 4)
cat("n branches:", length(default_rates), "\n")## n branches: 40
## Mean default rate: 9.04%
## Median default rate: 7.50%
## Std deviation: 6.70%
ggplot(data.frame(r = default_rates), aes(x = r)) +
geom_histogram(bins = 20, fill = "coral", colour = "white", alpha = 0.85) +
geom_vline(xintercept = median(default_rates), colour = "firebrick", linewidth = 1) +
geom_vline(xintercept = mean(default_rates), colour = "steelblue",
linewidth = 1, linetype = "dashed") +
scale_x_continuous(labels = scales::percent_format(accuracy = 1)) +
labs(
x = "Default Rate",
y = "Number of Branches",
title = "Branch-Level Loan Default Rates",
subtitle = "Red = median, Blue dashed = mean"
)Distribution of branch-level default rates. The distribution is right-skewed, which makes the mean a poor summary and the median preferable.
set.seed(3378)
boot_medians <- replicate(10000,
median(sample(default_rates, replace = TRUE))
)
ci_median <- quantile(boot_medians, c(0.025, 0.975))
cat("Observed median default rate:", scales::percent(median(default_rates), 0.01), "\n")## Observed median default rate: 7.50%
cat("95% bootstrap CI: [",
scales::percent(ci_median[1], 0.01), ",",
scales::percent(ci_median[2], 0.01), "]\n")## 95% bootstrap CI: [ 5.92% , 9.25% ]
ggplot(data.frame(m = boot_medians), aes(x = m)) +
geom_histogram(bins = 40, fill = "coral", colour = "white", alpha = 0.85) +
annotate("rect",
xmin = ci_median[1], xmax = ci_median[2], ymin = 0, ymax = Inf,
fill = "gold", alpha = 0.3
) +
geom_vline(xintercept = median(default_rates), colour = "firebrick", linewidth = 1) +
scale_x_continuous(labels = scales::percent_format(accuracy = 0.1)) +
labs(
x = "Bootstrap Median Default Rate",
y = "Frequency",
title = "Bootstrap CI for Median Branch Default Rate",
subtitle = paste0("95% CI: [",
scales::percent(ci_median[1], 0.01), ", ",
scales::percent(ci_median[2], 0.01), "]")
)Bootstrap distribution of the median branch default rate. The shaded region shows the 95% confidence interval.
Result
The median branch default rate is 7.50% with a 95% bootstrap confidence interval of [5.92%, 9.25%]. The bank can report this interval to regulators as a robust, assumption-free estimate.
Why Bootstrap Here?
The median has no simple standard error formula when the population distribution is unknown. The bootstrap solves this elegantly by treating the observed distribution as the population.
An HR manager runs a leadership training programme for 12 employees. Performance scores (out of 100) are recorded before and after the programme. The manager wants to know: What is the probability that the training genuinely improved performance?
This is a paired comparison — each employee serves as their own control — so we bootstrap the differences.
before <- c(62, 71, 55, 68, 74, 60, 58, 80, 66, 73, 69, 57)
after <- c(67, 75, 60, 72, 70, 68, 65, 85, 71, 78, 74, 64)
diff <- after - before
employee_df <- data.frame(
Employee = paste0("E", seq_along(before)),
Before = before,
After = after,
Change = diff
)
employee_df## Employee Before After Change
## 1 E1 62 67 5
## 2 E2 71 75 4
## 3 E3 55 60 5
## 4 E4 68 72 4
## 5 E5 74 70 -4
## 6 E6 60 68 8
## 7 E7 58 65 7
## 8 E8 80 85 5
## 9 E9 66 71 5
## 10 E10 73 78 5
## 11 E11 69 74 5
## 12 E12 57 64 7
set.seed(6614)
boot_mean_diff <- replicate(10000,
mean(sample(diff, replace = TRUE))
)
ci_diff <- quantile(boot_mean_diff, c(0.025, 0.975))
prob_improve <- mean(boot_mean_diff > 0)
cat("Mean score change: ", round(mean(diff), 2), "points\n")## Mean score change: 4.67 points
## 95% bootstrap CI: [ 2.83 , 6 ]
## P(training improved scores): 100.0%
employee_df |>
pivot_longer(cols = c(Before, After), names_to = "Period", values_to = "Score") |>
mutate(Period = factor(Period, levels = c("Before", "After"))) |>
ggplot(aes(x = Period, y = Score, group = Employee)) +
geom_line(colour = "gray60") +
geom_point(aes(colour = Period), size = 3) +
labs(
title = "Employee Performance Scores Before and After Training",
x = NULL,
y = "Performance Score",
colour = NULL
)Before-and-after performance scores for each employee. Lines connect the same individual’s scores.
ggplot(data.frame(d = boot_mean_diff), aes(x = d)) +
geom_histogram(bins = 50, fill = "mediumpurple", colour = "white", alpha = 0.85) +
annotate("rect",
xmin = ci_diff[1], xmax = ci_diff[2], ymin = 0, ymax = Inf,
fill = "gold", alpha = 0.25
) +
geom_vline(xintercept = 0, colour = "firebrick", linewidth = 1, linetype = "dashed") +
labs(
x = "Mean Score Change (After − Before)",
y = "Frequency",
title = "Bootstrap Distribution of Mean Performance Improvement",
subtitle = paste0("95% CI: [", round(ci_diff[1], 2), ", ", round(ci_diff[2], 2),
"] | P(Improvement > 0) = ",
scales::percent(prob_improve, accuracy = 0.1))
)Bootstrap distribution of the mean score improvement. The red dashed line marks zero (no change); the shaded band is the 95% CI.
Result
The mean performance improvement is 4.67 points, and the 95% bootstrap CI of [2.83, 6] lies entirely above zero, providing strong evidence that the training programme improved performance. The simulation gives a 100.0% probability of a genuine improvement.
Paired vs. Independent Samples
When the same subjects are measured twice (before/after), always work with the differences rather than resampling the two groups independently. This removes between-subject variability and sharpens the inference.
| Case | Statistic | Bootstrap Goal | Sample Design |
|---|---|---|---|
| 1 — Manufacturing | Difference in means | P(high temp > low temp) | Two independent groups |
| 2 — Healthcare | Difference in proportions | P(drug > placebo) | Two independent groups |
| 3 — Finance | Beta (regression slope) | CI for each beta; P(A > B) | Paired time series (rows) |
| 4 — Retail Banking | Median | 95% CI for median default rate | One sample |
| 5 — Human Resources | Mean paired difference | P(improvement > 0) | Paired (before/after) |
(Healthcare) 40 patients were treated with Drug X; 28 recovered. 35 patients received a placebo; 18 recovered. What is the probability that Drug X’s recovery rate is at least 10 percentage points higher than the placebo?
(Logistics) A courier company records delivery times (in hours) for two routes: Route A (15 observations) and Route B (18 observations). Use bootstrapping to construct 95% confidence intervals for the median delivery time on each route and test whether Route A is faster.
(Retail) A supermarket chain collected daily revenue data for 52 Saturdays and 52 Sundays. Use bootstrapping to estimate the probability that average Saturday revenue exceeds average Sunday revenue.
(Finance) You have 24 months of returns for a fund. Compute a 95% bootstrap confidence interval for the fund’s Sharpe ratio (mean return ÷ standard deviation of returns).
(HR / Organisational) Eight factory workers’ cholesterol levels were measured before and after a workplace wellness talk. Use bootstrapping on the paired differences to estimate the probability that the talk led to a reduction in cholesterol.
| Worker | Before | After |
|---|---|---|
| 1 | 220 | 210 |
| 2 | 195 | 198 |
| 3 | 250 | 210 |
| 4 | 200 | 199 |
| 5 | 220 | 224 |
| 6 | 260 | 212 |
| 7 | 175 | 179 |
| 8 | 198 | 184 |
The percentile method (used throughout this tutorial) directly uses the 2.5th and 97.5th percentiles of the bootstrap distribution as confidence interval bounds. It is simple and interpretable.
The bias-corrected and accelerated (BCa) bootstrap
adjusts for skewness and bias in the bootstrap distribution. For
moderate-to-large samples with symmetric statistics, the two methods
give very similar results. The boot package in R provides
BCa intervals if needed:
Sampling without replacement would just reproduce the original dataset every time. Sampling with replacement allows some observations to appear 0, 1, 2, or more times in each resample — mimicking the randomness that would occur if you could draw a fresh sample from the population.
Tutorial developed for Data Analytics II, Lagos Business
School. All simulations use set.seed() for
reproducibility.