This workbook provides worked examples (with code, explanations, and interpretation prompts) for:

- Normal distribution & CLT intuition
- Student’s t distribution (small-n inference)
- Chi-square distribution (variance inference)
- Sampling distributions (means, proportions) and how they drive confidence intervals and tests
Each topic is grounded in a concrete case study rather than “agnostic random data,” so the scenarios map to realistic decisions a practitioner might face.
Context. A hospital lab is tracking turnaround time (TAT; minutes from sample receipt to verified result) for a common blood test. Operations wants to certify that average TAT ≤ 60 minutes under normal staffing. We analyze one month of data, stratified by shift.
We’ll also connect to A/B-style sampling distributions with a website conversion example later.
library(tidyverse)
library(scales)
library(patchwork) # for multi-panel plots (install.packages("patchwork") if needed)
Many process means (e.g., daily average TAT) are approximately normal via the Central Limit Theorem (CLT), even when raw individual times are skewed. This enables normal-based intervals/tests on means and simulation of predictive scenarios.
Below we simulate one month of observations by shift (Day, Evening, Night). We include slight right skew on the Night shift to reflect occasional staffing issues. (We keep parameters explicit so you can modify them to match your site.)
set.seed(11)
n_day <- 420 # samples per shift (month aggregated)
n_eve <- 420
n_night <- 380
# Baseline means (mins) and SDs per shift
mu <- c(Day = 56, Evening = 58, Night = 62)
sd <- c(Day = 11, Evening = 12, Night = 15)
# Generate: Day & Evening ~ approximately normal; Night ~ mildly skewed
tat_day <- rnorm(n_day, mean = mu["Day"], sd = sd["Day"])
tat_even <- rnorm(n_eve, mean = mu["Evening"], sd = sd["Evening"])
tat_night <- rlnorm(n_night, meanlog = log(mu["Night"]) - 0.5^2/2, sdlog = 0.5) # right-skewed; the -sdlog^2/2 offset keeps E[X] = mu["Night"]
tat <- tibble(
shift = c(rep("Day", n_day), rep("Evening", n_eve), rep("Night", n_night)),
minutes = c(tat_day, tat_even, tat_night)
) %>%
filter(minutes > 5, minutes < 240) # trim impossible outliers
tat_summary <- tat %>%
group_by(shift) %>%
summarise(n = n(), mean = mean(minutes), sd = sd(minutes), .groups = "drop")
tat_summary
Before applying formal statistical tests, it is useful to visually inspect the raw data. Descriptive visualization lets us quickly see patterns, spread, and differences between groups, and builds intuition for whether differences may be meaningful. Here we display histograms and bar charts with error bars to:

- Identify whether the distributions appear symmetric or skewed.
- Compare group means and assess overlap of confidence intervals.
- Detect potential outliers or unusual patterns that may influence results.
Visual exploration does not replace inference, but it sets the stage for interpreting subsequent analyses and helps communicate results more intuitively to non-technical audiences.
p1 <- ggplot(tat, aes(minutes, fill = shift)) +
geom_histogram(alpha = 0.7, bins = 50, position = "identity") +
facet_wrap(~shift, ncol = 3, scales = "fixed") + # 3 columns = side by side
scale_fill_manual(values = c("Day" = "#2E86AB", "Evening" = "#F18F01", "Night" = "#7CB518")) +
labs(title = "Lab Turnaround Time by Shift", x = "Minutes", y = "Count") +
theme_minimal(base_size = 13) +
theme(legend.position = "none")
p2 <- tat_summary %>%
ggplot(aes(shift, mean, fill = shift)) +
geom_col(width = 0.6) +
geom_errorbar(aes(ymin = mean - 1.96*sd/sqrt(n),
ymax = mean + 1.96*sd/sqrt(n)),
width = 0.12, linewidth = 0.8) +
geom_text(aes(label = sprintf("%.1f", mean)),
vjust = -0.7, fontface = "bold") +
scale_fill_manual(values = c("Day" = "#2E86AB",
"Evening" = "#F18F01",
"Night" = "#7CB518")) +
scale_y_continuous(limits = c(0, 80),
expand = expansion(mult = c(0, 0.05))) + # extend y up to 80
labs(title = "Shift Means with 95% CI (Normal Approx.)",
x = NULL, y = "Mean Minutes") +
theme_minimal(base_size = 13) +
theme(legend.position = "none")
p1 / p2
Interpretation. Day and Evening look roughly symmetric; Night shows a visible right tail. Mean TAT is highest for Night. The error bars suggest Night’s mean may sit above the 60-minute target; we formalize this next.
dnorm / pnorm / qnorm / rnorm provide the density, CDF, quantiles, and random draws, respectively.
mu0 <- 60; sigma0 <- 12
xgrid <- seq(15, 105, by = 0.25)
dens <- dnorm(xgrid, mean = mu0, sd = sigma0)
cdf70 <- pnorm(70, mean = mu0, sd = sigma0)
q975 <- qnorm(0.975, mean = mu0, sd = sigma0)
tibble(x = xgrid, dens = dens) %>%
ggplot(aes(x, dens)) +
# Shaded right tail
geom_area(data = subset(tibble(x = xgrid, dens = dens), x >= q975),
aes(x, dens), fill = "orange", alpha = 0.3) +
# Main density curve
geom_line(color = "#2E86AB", linewidth = 1) +
# Annotations
geom_vline(xintercept = 70, color = "#F18F01", linetype = "dashed") +
annotate("text", x = 72, y = max(dens)*0.8,
label = paste0("P(X ≤ 70) = ", percent(cdf70)), hjust = 0) +
geom_vline(xintercept = q975, color = "#7CB518", linetype = "dotted") +
annotate("text", x = q975 + 1, y = max(dens)*0.6,
label = paste0("97.5th pct ≈ ", round(q975,1), " min"), hjust = 0) +
labs(title = "Normal PDF with Shaded Right Tail",
x = "Minutes", y = "Density") +
theme_minimal(base_size = 13)
Extension questions.
1) If the target is ≤ 60 minutes, what proportion of days exceed 75 minutes under this baseline? (See the sketch below.)
2) How sensitive are these exceedance probabilities to σ?
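A minimal sketch under the baseline parameters assumed above (mu0 = 60, sigma0 = 12); swap in your own estimates as needed.

p_exceed_75 <- 1 - pnorm(75, mean = mu0, sd = sigma0) # upper-tail probability beyond 75 min
p_exceed_75
# Sensitivity of the exceedance probability to sigma
tibble(sigma = c(8, 10, 12, 15, 18)) %>%
  mutate(p_exceed_75 = 1 - pnorm(75, mean = mu0, sd = sigma))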
We examine the sampling distribution for the Night shift mean across repeated samples of size n. Even with skewed raw data, the sample mean tends toward normality as n grows.
night <- tat %>% filter(shift == "Night") %>% pull(minutes)
sample_mean <- function(x, n) mean(sample(x, n, replace = TRUE))
ns <- c(5, 20, 50, 100)
B <- 4000
sim_df <- map_dfr(ns, function(n){
tibble(n = n, mean_hat = replicate(B, sample_mean(night, n)))
})
ggplot(sim_df, aes(mean_hat)) +
geom_histogram(bins = 60, fill = "#D1E8E2", color = "grey30") +
facet_wrap(~n, scales = "free_y") +
labs(title = "CLT in Action for Night Shift Means", x = "Sample mean (minutes)", y = "Count") +
theme_minimal(base_size = 13)
Interpretation. As n increases, the distribution of sample means tightens and becomes more symmetric/normal, despite skew in the raw Night data.
Extension questions.
- At what n would you be comfortable using a t-based confidence interval for the Night mean?
- Compare bootstrap percentile intervals vs. t-intervals for n = 20 and n = 50 (see the sketch below).
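As a starting point for the second question, here is a minimal sketch for n = 20 (one illustrative subsample; results vary with the seed):

set.seed(42)
x20 <- sample(night, 20)
t.test(x20)$conf.int # t-based 95% CI
boot_means <- replicate(4000, mean(sample(x20, replace = TRUE))) # bootstrap resamples of the mean
quantile(boot_means, c(0.025, 0.975)) # bootstrap percentile 95% CI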
When the population σ is unknown (common), and n is modest, inference for the mean uses the t-distribution. Heavier tails reflect extra uncertainty in estimating σ.
night_stats <- tibble(
n = length(night),
xbar = mean(night),
s = sd(night),
se = s / sqrt(n)
)
night_stats
t_res <- t.test(night, mu = 60, alternative = "greater") # H0: mu <= 60 vs H1: mu > 60
t_res
##
## One Sample t-test
##
## data: night
## t = 0.78516, df = 377, p-value = 0.2164
## alternative hypothesis: true mean is greater than 60
## 95 percent confidence interval:
## 58.56078 Inf
## sample estimates:
## mean of x
## 61.30828
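To connect the output to the formulas, a hand computation should reproduce the reported t and p (a sketch; nn is just a local name):

nn <- length(night)
t_stat <- (mean(night) - 60) / (sd(night) / sqrt(nn)) # should match t = 0.785
p_val <- pt(t_stat, df = nn - 1, lower.tail = FALSE) # one-sided upper tail; should match p = 0.216
c(t = t_stat, p = p_val)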
The table and test output summarize our data and inferential steps. Although the observed Night shift mean (61.3) is slightly above 60, the p-value (0.216) exceeds 0.05, so we do not have strong statistical evidence that the true mean exceeds the 60-minute target.
Interpretation. If the p-value is small and the CI lies above 60, Night shift mean TAT likely exceeds target. Decide whether process changes (staffing, automation) are warranted.
Extension questions.
- Repeat the test for Day and Evening. Is only Night problematic?
- Use a Welch two-sample t-test to compare Evening vs Day (see the sketch below). Is the difference operationally meaningful (not just statistically)?
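For the second question, a one-line sketch (R’s t.test applies the Welch correction by default, since var.equal = FALSE):

t.test(minutes ~ shift, data = filter(tat, shift %in% c("Day", "Evening"))) # Welch two-sample t-test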
df <- 15
x <- seq(-4, 4, by = 0.01)
nd <- dnorm(x); td <- dt(x, df = df)
tibble(x, Normal = nd, t_df = td) %>%
pivot_longer(-x, names_to = "dist", values_to = "dens") %>%
ggplot(aes(x, dens, color = dist)) +
geom_line(linewidth = 1) +
scale_color_manual(values = c("Normal" = "#2E86AB", "t_df" = "#F18F01")) +
labs(title = "Normal vs t(df=15)", x = "z / t", y = "Density", color = NULL) +
theme_minimal(base_size = 13)
To assess process variability, we can form CIs for σ² using the chi-square distribution (assumes normality). This matters because variability affects staffing buffers and SLA risk.
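Concretely, for (approximately) i.i.d. normal data the pivot (n − 1)s²/σ² ~ χ²_{n−1}, so a 95% CI for σ² is

( (n − 1)s² / χ²_{1−α/2, n−1} , (n − 1)s² / χ²_{α/2, n−1} )

Note that the upper-tail quantile produces the lower bound; this is exactly what the code below computes.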
day <- tat %>% filter(shift == "Day") %>% pull(minutes)
n <- length(day)
s2 <- var(day)
alpha <- 0.05
# Chi-square quantiles (the upper-tail quantile qL gives the lower variance bound, and vice versa)
qL <- qchisq(1 - alpha/2, df = n - 1)
qU <- qchisq(alpha/2, df = n - 1)
ci_var <- c((n - 1)*s2/qL, (n - 1)*s2/qU)
ci_sd <- sqrt(ci_var)
list(variance_CI = ci_var, sd_CI = ci_sd)
## $variance_CI
## [1] 102.3656 134.2484
##
## $sd_CI
## [1] 10.11759 11.58656
The chi-square method constructs a confidence interval (CI) for the population variance (σ²) of Day shift turnaround times, assuming approximate normality. From our calculation, the 95% CI for σ² runs from roughly 102.4 to 134.2 min², i.e., about 10.1 to 11.6 minutes on the SD scale.
What this means:

- The observed sample variance (s²) is our best estimate, but it fluctuates across samples.
- The CI tells us that, with 95% confidence, the true process variance (and therefore variability in turnaround times) lies between these two bounds.
- Translating to SD is more intuitive: it shows the plausible range for how far individual Day turnaround times typically deviate from the mean.
If the upper bound of the SD CI is close to or exceeds operational tolerance, then even if the mean is acceptable, occasional large deviations may lead to SLA (service-level agreement) breaches. For the Day shift, the CI likely indicates relatively tight variability compared to Night, reinforcing the impression that Day shift is stable and consistent.
Interpretation. If the upper CI for σ is large, even a “good mean” can still yield frequent SLA misses. Consider variance reduction (process standardization, automation).
R utilities for chi-square: dchisq / pchisq / qchisq / rchisq mirror the normal-distribution functions.
df <- c(5, 10, 30)
x <- seq(0.001, 40, by = 0.01)
dens_df <- map_dfr(df, function(k){
tibble(df = paste0("df=", k), x = x, dens = dchisq(x, df = k))
})
ggplot(dens_df, aes(x, dens, color = df)) +
geom_line(linewidth = 1) +
labs(title = "Chi-square PDFs by Degrees of Freedom", x = expression(chi^2), y = "Density", color = NULL) +
theme_minimal(base_size = 13)
Sampling distributions are a cornerstone of inferential statistics because they describe how a statistic (such as a mean, proportion, or variance) behaves across repeated random samples from the same population.
Rather than focusing only on the observed sample, the sampling distribution provides the probabilistic framework that allows us to:

- Quantify uncertainty in estimates (through standard errors).
- Construct confidence intervals for population parameters.
- Conduct hypothesis tests by comparing observed statistics against what would be expected under the null hypothesis.
In practice, sampling distributions explain why even a single sample can tell us something meaningful about the population. By knowing how statistics vary across samples, we can evaluate whether an observed difference is likely due to chance or reflects a real effect.
In this section, we apply the concept to a concrete scenario—comparing conversion rates in an A/B website experiment. We use the sampling distribution of the difference in sample proportions to test whether the new design significantly improves outcomes.
In digital marketing, even small changes to a website’s design can have measurable impacts on customer behavior. One common approach to testing improvements is an A/B experiment, where traffic is split between two versions of a webpage to compare performance. In this case, the company wants to know if the new homepage (Design B) leads to higher conversion rates than the current homepage (Design A). Over the course of one week, thousands of visitors were randomly directed to either version, and purchases were tracked. The resulting proportions—5.0% for Design A and 6.0% for Design B—suggest a potential improvement. However, the key question is whether this observed difference reflects a true underlying effect or if it could simply be explained by random variation. This is where the concept of sampling distributions becomes critical: they allow us to quantify uncertainty and assess whether the difference is statistically significant and practically meaningful.
Context. Marketing tests a new homepage (B) vs the current one (A). Over a week:

- A: 8,100 visits, 405 purchases (5.0%)
- B: 7,900 visits, 474 purchases (6.0%)
We test whether B improves conversion.
A_n <- 8100; A_s <- 405
B_n <- 7900; B_s <- 474
ab <- tibble(
design = c("A", "B"),
purchases = c(A_s, B_s),
visits = c(A_n, B_n),
rate = purchases / visits
)
ab
The tibble summarizes the observed conversion outcomes for the two homepage designs.
At face value, Design B shows an absolute lift of 1 percentage point compared to Design A (from 5% to 6%). In relative terms, this is a 20% improvement in conversion rate. While these differences are encouraging, we must be cautious: they could be due to random chance.
The next step is to formally test whether the observed 1% difference is statistically significant. This requires examining the sampling distribution of the difference in proportions to determine if the improvement is large enough, given the sample sizes, to rule out random variation.
Practical context:

- Even a 1% lift in conversion can translate into significant additional revenue for high-traffic sites.
- However, if the difference is not statistically reliable, acting on it prematurely could misallocate resources.
ggplot(ab, aes(design, rate, fill = design)) +
geom_col(width = 0.55) +
geom_errorbar(aes(ymin = rate - 1.96*sqrt(rate*(1-rate)/visits),
ymax = rate + 1.96*sqrt(rate*(1-rate)/visits)),
width = 0.12) +
geom_text(aes(label = percent(rate, accuracy = 0.01)), vjust = -0.7, fontface = "bold") +
scale_fill_manual(values = c("A" = "#2E86AB", "B" = "#F18F01")) +
scale_y_continuous(labels = percent_format()) +
labs(title = "Conversion Rates with 95% CIs", x = NULL, y = "Conversion Rate") +
theme_minimal(base_size = 13) +
theme(legend.position = "none")
prop.test(x = c(A_s, B_s), n = c(A_n, B_n), correct = FALSE)
##
## 2-sample test for equality of proportions without continuity correction
##
## data: c(A_s, B_s) out of c(A_n, B_n)
## X-squared = 7.703, df = 1, p-value = 0.005513
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.017067685 -0.002932315
## sample estimates:
## prop 1 prop 2
## 0.05 0.06
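To tie this back to sampling distributions: the statistic prop.test reports is the square of the pooled two-proportion z statistic. A hand computation (sketch) should reproduce X-squared ≈ 7.70 and the p-value above:

p_pool <- (A_s + B_s) / (A_n + B_n) # pooled conversion rate under H0
se_pool <- sqrt(p_pool * (1 - p_pool) * (1/A_n + 1/B_n)) # SE of the difference under H0
z <- (B_s/B_n - A_s/A_n) / se_pool
c(z = z, z_squared = z^2, p_two_sided = 2 * pnorm(-abs(z)))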
Interpretation. With large n, the sampling distribution of the difference in proportions is approximately normal (the CLT for proportions). Here the 95% CI (−0.0171, −0.0029) excludes 0 and p ≈ 0.0055, so B likely lifts conversion.
Extension questions.
- Estimate the minimum detectable effect at 80% power for a 1-week test.
- Use a Bayesian Beta-Binomial model to compute P(B > A) and compare to the frequentist result.
Sketches for both follow below.
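These sketches assume roughly 8,000 visits per arm for the power calculation and flat Beta(1, 1) priors for the Bayesian version:

power.prop.test(n = 8000, p1 = 0.05, power = 0.80) # solves for the detectable p2 above p1
set.seed(7)
draws_A <- rbeta(1e5, 1 + A_s, 1 + A_n - A_s) # posterior draws for A's conversion rate
draws_B <- rbeta(1e5, 1 + B_s, 1 + B_n - B_s) # posterior draws for B's conversion rate
mean(draws_B > draws_A) # Monte Carlo estimate of P(B > A)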
Practice prompts.
1) For the lab, suppose leadership wants P(TAT ≤ 60) ≥ 0.9. Using the normal approximation with your estimated mean/σ for Day, is the SLA met? (See the sketch after this list.)
2) For Night, try a log-normal model explicitly and compare the implied mean/variance CI to the t-based one.
3) For the A/B test, run a sequential analysis (e.g., alpha spending) to see how monitoring frequency changes error rates.
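A sketch for prompt 1, reusing the Day shift vector defined in the chi-square section:

p_within_sla <- pnorm(60, mean = mean(day), sd = sd(day)) # normal-approx P(TAT <= 60)
p_within_sla >= 0.9 # with mean ~56 and sd ~11 this is roughly 0.64, short of the 0.9 goal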