1 Introduction

Statistical inference typically relies on theoretical probability distributions (t-distributions, F-distributions, etc.) that depend on assumptions we cannot always verify. Bootstrapping Monte Carlo simulation offers a powerful, assumption-light alternative: instead of deriving a formula, we let the data speak for itself by resampling it thousands of times.

This tutorial walks you through the core ideas and applies them to real business problems drawn from manufacturing, healthcare, finance, retail, and human resources.


2 What Is Bootstrapping?

Bootstrapping is a resampling technique in which we draw repeated samples with replacement from our observed data. Each resample is the same size as the original dataset, but because sampling is done with replacement, some observations will appear multiple times and others not at all.

Monte Carlo simulation refers to the broader strategy of using many random trials to estimate a quantity (probability, mean, confidence interval) that is analytically difficult to compute.

Together, Bootstrapping Monte Carlo Simulation:

2.1 Intuition

Imagine you collect 9 measurements and want to know “how variable is my sample mean?” Rather than assuming normality and applying a t-formula, you:

  1. Draw 9 values from your 9 observations with replacement → one resample.
  2. Compute the mean of that resample.
  3. Repeat 10,000 times.
  4. The histogram of those 10,000 means is your sampling distribution — no assumptions needed.
set.seed(3301)
original <- c(39, 41, 38, 45, 36, 42, 37, 40, 40)

boot_means <- replicate(10000, mean(sample(original, replace = TRUE)))

ci <- quantile(boot_means, c(0.025, 0.975))

ggplot(data.frame(m = boot_means), aes(x = m)) +
  geom_histogram(bins = 40, fill = "steelblue", colour = "white", alpha = 0.85) +
  annotate("rect",
    xmin = ci[1], xmax = ci[2], ymin = 0, ymax = Inf,
    fill = "orange", alpha = 0.2
  ) +
  geom_vline(xintercept = mean(original), colour = "firebrick", linewidth = 1) +
  labs(
    x = "Bootstrap Sample Mean",
    y = "Frequency",
    title = "Bootstrap Distribution of the Sample Mean",
    subtitle = paste0("Red line = observed mean (", round(mean(original), 2),
                      "); shaded band = 95% CI [",
                      round(ci[1], 2), ", ", round(ci[2], 2), "]")
  )
Bootstrap sampling distribution of the mean from 9 observations. The shaded band marks the 95% bootstrap confidence interval.

Bootstrap sampling distribution of the mean from 9 observations. The shaded band marks the 95% bootstrap confidence interval.


3 When to Use Bootstrapping

Use bootstrapping when:

Situation Why bootstrapping helps
Small sample sizes Avoids reliance on the Central Limit Theorem
Non-normal data Theoretical distributions may not apply
Complex statistics Median, IQR, Sharpe ratio — no standard SE formula
Comparing two groups without distributional assumptions More robust than t-test under violations
Estimating probability that A beats B Direct probability statement from simulations

Bootstrapping is not a cure-all. It assumes your sample is representative of the population. With very small or biased samples, the bootstrap distribution will also be biased.


4 Core R Pattern

Every bootstrapping problem in this tutorial follows the same skeleton:

n_boot <- 10000   # number of bootstrap replications

boot_stat <- replicate(n_boot, {
  # 1. Resample with replacement
  resample <- sample(data_vector, size = length(data_vector), replace = TRUE)
  # 2. Compute the statistic of interest
  mean(resample)  # replace with your statistic
})

# 3. Summarise
quantile(boot_stat, c(0.025, 0.975))  # 95% confidence interval

For two-sample comparisons (testing whether Group A beats Group B), a slight variant is used: resample each group independently and check whether the difference statistic (e.g., mean_A − mean_B) exceeds zero.


5 Case 1 — Manufacturing: Product Yield at High vs. Low Temperature

5.1 Problem

A production engineer runs 9 batches at a high process temperature and 7 batches at a low temperature, recording the product yield from each. The mean yield at high temperature (39.74) exceeds that at low temperature (32.27), but is this difference real or just sampling noise?

Question: What is the probability that high-temperature production yields more than low-temperature production?

5.2 Data

high_temp <- c(39, 41, 38, 45, 36, 42, 37, 40, 40)
low_temp  <- c(30, 33, 34, 31, 34, 30, 34, 35, 32)

cat("High temperature — n =", length(high_temp),
    "| mean =", round(mean(high_temp), 2), "\n")
## High temperature — n = 9 | mean = 39.78
cat("Low  temperature — n =", length(low_temp),
    "| mean =", round(mean(low_temp), 2), "\n")
## Low  temperature — n = 9 | mean = 32.56

5.3 Bootstrap Simulation

set.seed(5512)
n_boot <- 10000

high_beats_low <- replicate(n_boot, {
  mean(sample(high_temp, replace = TRUE)) >
  mean(sample(low_temp,  replace = TRUE))
})

prob_high_better <- mean(high_beats_low)
cat("Estimated probability that high temperature yields more:", 
    scales::percent(prob_high_better, accuracy = 0.1), "\n")
## Estimated probability that high temperature yields more: 100.0%
boot_diffs <- replicate(n_boot, {
  mean(sample(high_temp, replace = TRUE)) -
  mean(sample(low_temp,  replace = TRUE))
})

ggplot(data.frame(d = boot_diffs), aes(x = d)) +
  geom_histogram(bins = 50, fill = "steelblue", colour = "white", alpha = 0.85) +
  geom_vline(xintercept = 0, colour = "firebrick", linewidth = 1, linetype = "dashed") +
  annotate("text", x = 0.3, y = 700, label = "← Low better  |  High better →",
           size = 3.5, colour = "gray30") +
  labs(
    x = "Difference in Means (High − Low)",
    y = "Frequency",
    title = "Bootstrap Distribution: High Temp vs. Low Temp Yield Difference",
    subtitle = paste0("Proportion of iterations where High > Low: ",
                      scales::percent(prob_high_better, accuracy = 0.1))
  )
Distribution of (mean_high − mean_low) across 10,000 bootstrap iterations. Values above zero favour high temperature.

Distribution of (mean_high − mean_low) across 10,000 bootstrap iterations. Values above zero favour high temperature.

5.4 Interpretation

Result

The simulation shows a 100.0% probability that high-temperature processing yields more than low-temperature processing. This is strong evidence — though not certainty — in favour of high-temperature production.


6 Case 2 — Healthcare: Drug vs. Placebo Effectiveness

6.1 Problem

A pharmaceutical company tests a new influenza drug. Of 34 patients who received the drug, 25 felt better and 9 felt worse. Of 19 patients who received a placebo, 11 felt better and 8 felt worse.

Question: What is the probability that the drug is more effective than the placebo (i.e., produces a higher recovery rate)?

6.2 Data

# Represent each patient: 1 = felt better, 0 = felt worse
drug    <- c(rep(1, 25), rep(0, 9))
placebo <- c(rep(1, 11), rep(0, 8))

cat("Drug group    — n =", length(drug),
    "| recovery rate =", scales::percent(mean(drug), accuracy = 0.1), "\n")
## Drug group    — n = 34 | recovery rate = 73.5%
cat("Placebo group — n =", length(placebo),
    "| recovery rate =", scales::percent(mean(placebo), accuracy = 0.1), "\n")
## Placebo group — n = 19 | recovery rate = 57.9%

6.3 Bootstrap Simulation

set.seed(7743)

drug_beats_placebo <- replicate(10000, {
  mean(sample(drug,    replace = TRUE)) >
  mean(sample(placebo, replace = TRUE))
})

prob_drug_better <- mean(drug_beats_placebo)
cat("Probability that drug recovery rate exceeds placebo:",
    scales::percent(prob_drug_better, accuracy = 0.1), "\n")
## Probability that drug recovery rate exceeds placebo: 86.6%
boot_rate_diffs <- replicate(10000, {
  mean(sample(drug,    replace = TRUE)) -
  mean(sample(placebo, replace = TRUE))
})

ci_drug <- quantile(boot_rate_diffs, c(0.025, 0.975))

ggplot(data.frame(d = boot_rate_diffs), aes(x = d)) +
  geom_histogram(bins = 50, fill = "seagreen", colour = "white", alpha = 0.85) +
  geom_vline(xintercept = 0, colour = "firebrick", linewidth = 1, linetype = "dashed") +
  labs(
    x = "Difference in Recovery Rates (Drug − Placebo)",
    y = "Frequency",
    title = "Bootstrap Distribution: Drug vs. Placebo Recovery Rate",
    subtitle = paste0("95% CI: [", round(ci_drug[1], 3), ", ", round(ci_drug[2], 3),
                      "]  |  P(Drug > Placebo) = ",
                      scales::percent(prob_drug_better, accuracy = 0.1))
  )
Bootstrap distribution of the difference in recovery rates (Drug − Placebo). Values to the right of the red line indicate the drug outperforms the placebo.

Bootstrap distribution of the difference in recovery rates (Drug − Placebo). Values to the right of the red line indicate the drug outperforms the placebo.

6.4 Interpretation

Result

There is a 86.6% probability that the drug’s recovery rate exceeds the placebo’s, with a 95% bootstrap CI of [-0.113, 0.426] for the rate difference. The lower bound of the CI being near zero calls for caution about the drug’s efficacy.

Learning Point: Proportions as Means

Binary outcomes (recovered / not recovered) can be treated as a numeric variable of 0s and 1s. The mean of such a vector is the proportion, so all our bootstrap machinery applies directly.


7 Case 3 — Finance: Estimating a Stock’s Beta with Uncertainty

7.1 Problem

A fund manager wants to know the beta (systematic risk coefficient) for two stocks — Stock A (a tech company, similar to MSFT) and Stock B (a consumer staples company, similar to PFE) — relative to the market. Beta is estimated as the slope of a regression of stock returns on market returns.

Question: What is the probability that Stock A has a higher beta than Stock B? Provide a 95% confidence interval for each stock’s beta.

7.2 Data (Simulated Monthly Returns — 3 years)

set.seed(1984)
n_months <- 36
market_returns <- rnorm(n_months, mean = 0.008, sd = 0.045)

# Stock A: high-beta tech stock (beta ≈ 1.4)
stock_a <- 1.4 * market_returns + rnorm(n_months, 0, 0.03)

# Stock B: low-beta defensive stock (beta ≈ 0.6)
stock_b <- 0.6 * market_returns + rnorm(n_months, 0, 0.02)

returns_df <- data.frame(market = market_returns, stock_a, stock_b)

cat("OLS Beta — Stock A:", round(coef(lm(stock_a ~ market_returns))[2], 3), "\n")
## OLS Beta — Stock A: 1.598
cat("OLS Beta — Stock B:", round(coef(lm(stock_b ~ market_returns))[2], 3), "\n")
## OLS Beta — Stock B: 0.584

7.3 Bootstrap Simulation

set.seed(4421)

boot_betas <- replicate(10000, {
  # Resample row indices to keep market/stock pairs together
  idx    <- sample(n_months, replace = TRUE)
  mkt    <- market_returns[idx]
  beta_a <- unname(coef(lm(stock_a[idx] ~ mkt))[2])
  beta_b <- unname(coef(lm(stock_b[idx] ~ mkt))[2])
  c(beta_a, beta_b)
}, simplify = TRUE)

beta_a_boot <- boot_betas[1, ]
beta_b_boot <- boot_betas[2, ]

ci_a <- quantile(beta_a_boot, c(0.025, 0.975))
ci_b <- quantile(beta_b_boot, c(0.025, 0.975))

prob_a_higher <- mean(beta_a_boot > beta_b_boot)

cat("Stock A beta — 95% CI: [", round(ci_a[1], 3), ",", round(ci_a[2], 3), "]\n")
## Stock A beta — 95% CI: [ 1.384 , 1.793 ]
cat("Stock B beta — 95% CI: [", round(ci_b[1], 3), ",", round(ci_b[2], 3), "]\n")
## Stock B beta — 95% CI: [ 0.453 , 0.7 ]
cat("P(Beta_A > Beta_B):", scales::percent(prob_a_higher, accuracy = 0.1), "\n")
## P(Beta_A > Beta_B): 100.0%
bind_rows(
  data.frame(beta = beta_a_boot, stock = "Stock A (Tech)"),
  data.frame(beta = beta_b_boot, stock = "Stock B (Defensive)")
) |>
  ggplot(aes(x = beta, fill = stock)) +
  geom_histogram(bins = 50, colour = "white", alpha = 0.8, position = "identity") +
  facet_wrap(~stock, ncol = 1, scales = "free_y") +
  labs(
    x = "Bootstrap Beta Estimate",
    y = "Frequency",
    title = "Bootstrap Distribution of Stock Betas",
    fill = NULL
  ) +
  theme(legend.position = "none")
Bootstrap distributions of beta estimates for Stock A (tech) and Stock B (defensive). Dashed lines mark the respective 95% CIs.

Bootstrap distributions of beta estimates for Stock A (tech) and Stock B (defensive). Dashed lines mark the respective 95% CIs.

7.4 Interpretation

Result

Stock A’s beta 95% CI of [1.38, 1.79] is clearly above Stock B’s CI of [0.45, 0.7]. The simulation confirms a 100.0% probability that Stock A has higher systematic risk than Stock B.

Key Insight: Resampling Pairs

When two variables are paired (e.g., market return and stock return for the same month), always resample rows together — not the two vectors independently — to preserve the correlation structure.


8 Case 4 — Retail Banking: Estimating Median Customer Loan Default Rate

8.1 Problem

A retail bank’s credit risk team has data on default rates (proportion of loans defaulted) across 40 branch portfolios. The team wants to report the median default rate with a confidence interval to present to regulators. There is no standard formula for the SE of a median, making this an ideal bootstrap problem.

Question: Construct a 95% bootstrap confidence interval for the median branch-level default rate.

8.2 Data

set.seed(9921)
# Simulate 40 branch default rates; right-skewed distribution typical in credit risk
default_rates <- rbeta(40, shape1 = 2, shape2 = 20)  # mean ≈ 9%
default_rates <- round(default_rates, 4)

cat("n branches:", length(default_rates), "\n")
## n branches: 40
cat("Mean default rate:  ", scales::percent(mean(default_rates), accuracy = 0.01), "\n")
## Mean default rate:   9.04%
cat("Median default rate:", scales::percent(median(default_rates), accuracy = 0.01), "\n")
## Median default rate: 7.50%
cat("Std deviation:      ", scales::percent(sd(default_rates), accuracy = 0.01), "\n")
## Std deviation:       6.70%
ggplot(data.frame(r = default_rates), aes(x = r)) +
  geom_histogram(bins = 20, fill = "coral", colour = "white", alpha = 0.85) +
  geom_vline(xintercept = median(default_rates), colour = "firebrick", linewidth = 1) +
  geom_vline(xintercept = mean(default_rates), colour = "steelblue",
             linewidth = 1, linetype = "dashed") +
  scale_x_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(
    x = "Default Rate",
    y = "Number of Branches",
    title = "Branch-Level Loan Default Rates",
    subtitle = "Red = median, Blue dashed = mean"
  )
Distribution of branch-level default rates. The distribution is right-skewed, which makes the mean a poor summary and the median preferable.

Distribution of branch-level default rates. The distribution is right-skewed, which makes the mean a poor summary and the median preferable.

8.3 Bootstrap Confidence Interval for the Median

set.seed(3378)

boot_medians <- replicate(10000,
  median(sample(default_rates, replace = TRUE))
)

ci_median <- quantile(boot_medians, c(0.025, 0.975))

cat("Observed median default rate:", scales::percent(median(default_rates), 0.01), "\n")
## Observed median default rate: 7.50%
cat("95% bootstrap CI: [",
    scales::percent(ci_median[1], 0.01), ",",
    scales::percent(ci_median[2], 0.01), "]\n")
## 95% bootstrap CI: [ 5.92% , 9.25% ]
ggplot(data.frame(m = boot_medians), aes(x = m)) +
  geom_histogram(bins = 40, fill = "coral", colour = "white", alpha = 0.85) +
  annotate("rect",
    xmin = ci_median[1], xmax = ci_median[2], ymin = 0, ymax = Inf,
    fill = "gold", alpha = 0.3
  ) +
  geom_vline(xintercept = median(default_rates), colour = "firebrick", linewidth = 1) +
  scale_x_continuous(labels = scales::percent_format(accuracy = 0.1)) +
  labs(
    x = "Bootstrap Median Default Rate",
    y = "Frequency",
    title = "Bootstrap CI for Median Branch Default Rate",
    subtitle = paste0("95% CI: [",
                      scales::percent(ci_median[1], 0.01), ", ",
                      scales::percent(ci_median[2], 0.01), "]")
  )
Bootstrap distribution of the median branch default rate. The shaded region shows the 95% confidence interval.

Bootstrap distribution of the median branch default rate. The shaded region shows the 95% confidence interval.

8.4 Interpretation

Result

The median branch default rate is 7.50% with a 95% bootstrap confidence interval of [5.92%, 9.25%]. The bank can report this interval to regulators as a robust, assumption-free estimate.

Why Bootstrap Here?

The median has no simple standard error formula when the population distribution is unknown. The bootstrap solves this elegantly by treating the observed distribution as the population.


9 Case 5 — Human Resources: Impact of a Training Programme

9.1 Problem

An HR manager runs a leadership training programme for 12 employees. Performance scores (out of 100) are recorded before and after the programme. The manager wants to know: What is the probability that the training genuinely improved performance?

This is a paired comparison — each employee serves as their own control — so we bootstrap the differences.

9.2 Data

before <- c(62, 71, 55, 68, 74, 60, 58, 80, 66, 73, 69, 57)
after  <- c(67, 75, 60, 72, 70, 68, 65, 85, 71, 78, 74, 64)
diff   <- after - before

employee_df <- data.frame(
  Employee = paste0("E", seq_along(before)),
  Before = before,
  After  = after,
  Change = diff
)

employee_df
##    Employee Before After Change
## 1        E1     62    67      5
## 2        E2     71    75      4
## 3        E3     55    60      5
## 4        E4     68    72      4
## 5        E5     74    70     -4
## 6        E6     60    68      8
## 7        E7     58    65      7
## 8        E8     80    85      5
## 9        E9     66    71      5
## 10      E10     73    78      5
## 11      E11     69    74      5
## 12      E12     57    64      7

9.3 Bootstrap Simulation

set.seed(6614)

boot_mean_diff <- replicate(10000,
  mean(sample(diff, replace = TRUE))
)

ci_diff      <- quantile(boot_mean_diff, c(0.025, 0.975))
prob_improve <- mean(boot_mean_diff > 0)

cat("Mean score change:  ", round(mean(diff), 2), "points\n")
## Mean score change:   4.67 points
cat("95% bootstrap CI:  [", round(ci_diff[1], 2), ",", round(ci_diff[2], 2), "]\n")
## 95% bootstrap CI:  [ 2.83 , 6 ]
cat("P(training improved scores):", scales::percent(prob_improve, accuracy = 0.1), "\n")
## P(training improved scores): 100.0%
employee_df |>
  pivot_longer(cols = c(Before, After), names_to = "Period", values_to = "Score") |>
  mutate(Period = factor(Period, levels = c("Before", "After"))) |>
  ggplot(aes(x = Period, y = Score, group = Employee)) +
  geom_line(colour = "gray60") +
  geom_point(aes(colour = Period), size = 3) +
  labs(
    title = "Employee Performance Scores Before and After Training",
    x = NULL,
    y = "Performance Score",
    colour = NULL
  )
Before-and-after performance scores for each employee. Lines connect the same individual's scores.

Before-and-after performance scores for each employee. Lines connect the same individual’s scores.

ggplot(data.frame(d = boot_mean_diff), aes(x = d)) +
  geom_histogram(bins = 50, fill = "mediumpurple", colour = "white", alpha = 0.85) +
  annotate("rect",
    xmin = ci_diff[1], xmax = ci_diff[2], ymin = 0, ymax = Inf,
    fill = "gold", alpha = 0.25
  ) +
  geom_vline(xintercept = 0, colour = "firebrick", linewidth = 1, linetype = "dashed") +
  labs(
    x = "Mean Score Change (After − Before)",
    y = "Frequency",
    title = "Bootstrap Distribution of Mean Performance Improvement",
    subtitle = paste0("95% CI: [", round(ci_diff[1], 2), ", ", round(ci_diff[2], 2),
                      "]  |  P(Improvement > 0) = ",
                      scales::percent(prob_improve, accuracy = 0.1))
  )
Bootstrap distribution of the mean score improvement. The red dashed line marks zero (no change); the shaded band is the 95% CI.

Bootstrap distribution of the mean score improvement. The red dashed line marks zero (no change); the shaded band is the 95% CI.

9.4 Interpretation

Result

The mean performance improvement is 4.67 points, and the 95% bootstrap CI of [2.83, 6] lies entirely above zero, providing strong evidence that the training programme improved performance. The simulation gives a 100.0% probability of a genuine improvement.

Paired vs. Independent Samples

When the same subjects are measured twice (before/after), always work with the differences rather than resampling the two groups independently. This removes between-subject variability and sharpens the inference.


10 Bringing It All Together: Key Principles

Summary of all five cases.
Case Statistic Bootstrap Goal Sample Design
1 — Manufacturing Difference in means P(high temp > low temp) Two independent groups
2 — Healthcare Difference in proportions P(drug > placebo) Two independent groups
3 — Finance Beta (regression slope) CI for each beta; P(A > B) Paired time series (rows)
4 — Retail Banking Median 95% CI for median default rate One sample
5 — Human Resources Mean paired difference P(improvement > 0) Paired (before/after)

10.1 Rules of Thumb

  1. Independent groups → resample each group separately, compute the difference statistic each time.
  2. Paired data → compute differences first, then bootstrap the differences as a single vector.
  3. Time-series or regression data → resample rows (observation pairs) to preserve correlation.
  4. One-sample CI → bootstrap the sample directly; use percentile method for the CI.
  5. Number of replications → 10,000 is sufficient for most problems; use 50,000 for precise tail probabilities.

11 Practice Problems

  1. (Healthcare) 40 patients were treated with Drug X; 28 recovered. 35 patients received a placebo; 18 recovered. What is the probability that Drug X’s recovery rate is at least 10 percentage points higher than the placebo?

  2. (Logistics) A courier company records delivery times (in hours) for two routes: Route A (15 observations) and Route B (18 observations). Use bootstrapping to construct 95% confidence intervals for the median delivery time on each route and test whether Route A is faster.

  3. (Retail) A supermarket chain collected daily revenue data for 52 Saturdays and 52 Sundays. Use bootstrapping to estimate the probability that average Saturday revenue exceeds average Sunday revenue.

  4. (Finance) You have 24 months of returns for a fund. Compute a 95% bootstrap confidence interval for the fund’s Sharpe ratio (mean return ÷ standard deviation of returns).

  5. (HR / Organisational) Eight factory workers’ cholesterol levels were measured before and after a workplace wellness talk. Use bootstrapping on the paired differences to estimate the probability that the talk led to a reduction in cholesterol.

    Worker Before After
    1 220 210
    2 195 198
    3 250 210
    4 200 199
    5 220 224
    6 260 212
    7 175 179
    8 198 184

12 Technical Notes

12.1 The Percentile Bootstrap vs. the BCa Bootstrap

The percentile method (used throughout this tutorial) directly uses the 2.5th and 97.5th percentiles of the bootstrap distribution as confidence interval bounds. It is simple and interpretable.

The bias-corrected and accelerated (BCa) bootstrap adjusts for skewness and bias in the bootstrap distribution. For moderate-to-large samples with symmetric statistics, the two methods give very similar results. The boot package in R provides BCa intervals if needed:

library(boot)

# Statistic function must accept data and index vector
mean_stat <- function(x, i) mean(x[i])

b <- boot(data = high_temp, statistic = mean_stat, R = 10000)
boot.ci(b, type = "bca")

12.2 Why “With Replacement”?

Sampling without replacement would just reproduce the original dataset every time. Sampling with replacement allows some observations to appear 0, 1, 2, or more times in each resample — mimicking the randomness that would occur if you could draw a fresh sample from the population.


Tutorial developed for Data Analytics II, Lagos Business School. All simulations use set.seed() for reproducibility.