1 Overview

This workbook provides worked examples (with code, explanations, and interpretation prompts) for:
- Normal distribution & CLT intuition
- Student’s t distribution (small-n inference)
- Chi-square distribution (variance inference)
- Sampling distributions (means, proportions) and how they drive confidence intervals and tests

Each topic is grounded in a concrete case study rather than “agnostic random data,” so the scenarios map to realistic decisions a practitioner might face.

2 Case Study: Clinical Lab Turnaround Time (TAT)

Context. A hospital lab is tracking turnaround time (minutes from sample receipt to verified result) for a common blood test. Operations wants to certify that average TAT ≤ 60 minutes under normal staffing. We analyze one month of data, stratified by shift.

  • Continuous metric (minutes) → natural to model with continuous distributions
  • Approximate normality often holds for process metrics via aggregation (CLT), but we verify/diagnose it explicitly

We’ll also connect to A/B-style sampling distributions with a website conversion example later.

library(tidyverse)
library(scales)
library(patchwork)   # for multi-panel plots (install.packages("patchwork") if needed)

3 Normal Distribution and Sampling Distributions

3.1 Intro: Why normal models show up

Many process means (e.g., daily average TAT) are approximately normal via the Central Limit Theorem (CLT), even when raw individual times are skewed. This enables normal-based intervals/tests on means and simulation of predictive scenarios.

3.1.1 Synthetic-but-realistic lab TAT data

Below we simulate 30 days of observations by shift (Day, Evening, Night). Day and Evening are drawn from normal distributions; Night is drawn from a right-skewed log-normal distribution to reflect occasional staffing issues. (We keep parameters explicit so you can modify them to match your site.)

set.seed(11)

n_day   <- 420   # samples per shift (month aggregated)
n_eve   <- 420
n_night <- 380

# Baseline means (mins) and SDs per shift
# Note: sd["Night"] is not used below; the log-normal draw sets Night's spread via sdlog
mu <- c(Day = 56, Evening = 58, Night = 62)
sd <- c(Day = 11, Evening = 12, Night = 15)

# Generate: Day & Evening ~ normal; Night ~ log-normal (right-skewed)
tat_day   <- rnorm(n_day,   mean = mu["Day"],   sd = sd["Day"])
tat_even  <- rnorm(n_eve,   mean = mu["Evening"], sd = sd["Evening"])
# meanlog = log(mean) - sdlog^2/2 keeps the Night mean at mu["Night"];
# sdlog = 0.5 implies an SD of about 33 min, not the 15 listed above
tat_night <- rlnorm(n_night, meanlog = log(mu["Night"]) - 0.5^2/2, sdlog = 0.5)

tat <- tibble(
  shift = c(rep("Day", n_day), rep("Evening", n_eve), rep("Night", n_night)),
  minutes = c(tat_day, tat_even, tat_night)
) %>%
  filter(minutes > 5, minutes < 240)   # drop implausible values (<= 5 or >= 240 min)

tat_summary <- tat %>%
  group_by(shift) %>%
  summarise(n = n(), mean = mean(minutes), sd = sd(minutes), .groups = "drop")

tat_summary
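
The Night parameterization can be sanity-checked analytically. For a log-normal with parameters meanlog and sdlog, the mean is exp(meanlog + sdlog²/2) and the SD is the mean times sqrt(exp(sdlog²) − 1). The sketch below uses base R only; the variable names are illustrative and are not part of the workbook's pipeline.

```r
# Check the moments implied by the Night log-normal parameters used above.
target_mean <- 62          # mu["Night"] from the simulation
sdlog       <- 0.5
meanlog     <- log(target_mean) - sdlog^2 / 2

implied_mean <- exp(meanlog + sdlog^2 / 2)              # log-normal mean
implied_sd   <- implied_mean * sqrt(exp(sdlog^2) - 1)   # log-normal SD

round(c(mean = implied_mean, sd = implied_sd), 2)
```

The implied SD is roughly 33 minutes, which is why the Night sample SD lands far above those of Day and Evening.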

3.1.2 Quick descriptive visualization

Before applying formal statistical tests, it is useful to visually inspect the raw data.
Descriptive visualization allows us to quickly see patterns, spread, and differences between groups, and provides intuition for whether differences may be meaningful.

In this case, we display histograms and bar charts with error bars to:
- Identify whether the distributions appear symmetric or skewed.
- Compare group means and assess overlap of confidence intervals.
- Detect potential outliers or unusual patterns that may influence results.

Visual exploration does not replace inference, but it sets the stage for interpreting subsequent analyses and helps communicate results more intuitively to non-technical audiences.

p1 <- ggplot(tat, aes(minutes, fill = shift)) +
  geom_histogram(alpha = 0.7, bins = 50, position = "identity") +
  facet_wrap(~shift, ncol = 3, scales = "fixed") +   # 3 columns = side by side
  scale_fill_manual(values = c("Day" = "#2E86AB", "Evening" = "#F18F01", "Night" = "#7CB518")) +
  labs(title = "Lab Turnaround Time by Shift", x = "Minutes", y = "Count") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none")


p2 <- tat_summary %>%
  ggplot(aes(shift, mean, fill = shift)) +
  geom_col(width = 0.6) +
  geom_errorbar(aes(ymin = mean - 1.96*sd/sqrt(n),
                    ymax = mean + 1.96*sd/sqrt(n)),
                width = 0.12, linewidth = 0.8) +
  geom_text(aes(label = sprintf("%.1f", mean)), 
            vjust = -0.7, fontface = "bold") +
  scale_fill_manual(values = c("Day" = "#2E86AB", 
                               "Evening" = "#F18F01", 
                               "Night" = "#7CB518")) +
  scale_y_continuous(limits = c(0, 80), 
                     expand = expansion(mult = c(0, 0.05))) +  # extend y up to 80
  labs(title = "Shift Means with 95% CI (Normal Approx.)", 
       x = NULL, y = "Mean Minutes") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none")


p1 / p2

Interpretation. Day and Evening look roughly symmetric; Night shows a visible right tail. Mean TAT is highest for Night. Night’s mean (≈61) sits just above the 60-min target, but its error bar (roughly 58 to 65) spans 60, so the plot alone does not settle the question; we’ll formalize this with a t test.


3.2 Normal distribution utilities in R

dnorm / pnorm / qnorm / rnorm support density, CDF, quantiles, and random sampling.

mu0 <- 60; sigma0 <- 12
xgrid <- seq(15, 105, by = 0.25)
dens  <- dnorm(xgrid, mean = mu0, sd = sigma0)
cdf70 <- pnorm(70, mean = mu0, sd = sigma0)
q975  <- qnorm(0.975, mean = mu0, sd = sigma0)

tibble(x = xgrid, dens = dens) %>%
  ggplot(aes(x, dens)) +
  # Shaded right tail
  geom_area(data = subset(tibble(x = xgrid, dens = dens), x >= q975),
            aes(x, dens), fill = "orange", alpha = 0.3) +
  # Main density curve
  geom_line(color = "#2E86AB", linewidth = 1) +
  # Annotations
  geom_vline(xintercept = 70, color = "#F18F01", linetype = "dashed") +
  annotate("text", x = 72, y = max(dens)*0.8,
           label = paste0("P(X ≤ 70) = ", percent(cdf70)), hjust = 0) +
  geom_vline(xintercept = q975, color = "#7CB518", linetype = "dotted") +
  annotate("text", x = q975 + 1, y = max(dens)*0.6,
           label = paste0("97.5th pct ≈ ", round(q975,1), " min"), hjust = 0) +
  labs(title = "Normal PDF with Shaded Right Tail",
       x = "Minutes", y = "Density") +
  theme_minimal(base_size = 13)
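
The quantities annotated on the plot can also be checked numerically. The following is a minimal base R sketch for the same N(60, 12) baseline; the object names (p_le_70, q_975, tail_p, x_back) are illustrative.

```r
# Numeric counterparts of the plot annotations, for the N(60, 12) baseline.
mu0 <- 60; sigma0 <- 12

p_le_70 <- pnorm(70, mean = mu0, sd = sigma0)      # P(X <= 70)
q_975   <- qnorm(0.975, mean = mu0, sd = sigma0)   # 97.5th percentile
tail_p  <- pnorm(q_975, mean = mu0, sd = sigma0, lower.tail = FALSE)  # shaded area

# Round trip: qnorm() inverts pnorm()
x_back <- qnorm(p_le_70, mean = mu0, sd = sigma0)

round(c(p_le_70 = p_le_70, q_975 = q_975, tail_p = tail_p, x_back = x_back), 4)
```

Passing lower.tail = FALSE returns the upper-tail probability directly, which avoids computing 1 - pnorm(...) by hand.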

Extension questions.

  1. If the target is ≤ 60, what proportion of days exceed 75 mins under this baseline?
  2. How sensitive are these exceedance probabilities to σ?


3.3 CLT: Sampling distribution of the mean

We examine the sampling distribution for the Night shift mean across repeated samples of size n. Even with skewed raw data, the sample mean tends toward normality as n grows.

night <- tat %>% filter(shift == "Night") %>% pull(minutes)

sample_mean <- function(x, n) mean(sample(x, n, replace = TRUE))

ns <- c(5, 20, 50, 100)
B  <- 4000

sim_df <- map_dfr(ns, function(n){
  tibble(n = n, mean_hat = replicate(B, sample_mean(night, n)))
})

ggplot(sim_df, aes(mean_hat)) +
  geom_histogram(bins = 60, fill = "#D1E8E2", color = "grey30") +
  facet_wrap(~n, scales = "free_y") +
  labs(title = "CLT in Action for Night Shift Means", x = "Sample mean (minutes)", y = "Count") +
  theme_minimal(base_size = 13)

Interpretation. As n increases, the distribution of sample means tightens and becomes more symmetric/normal, despite skew in the raw Night data.
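
The tightening can be quantified: the CLT predicts that the SD of the sample mean is σ/√n. The sketch below compares that prediction with simulation. As a labeled assumption, it draws directly from the Night log-normal model (mean 62, sdlog 0.5) rather than resampling the `tat` data, so it runs on its own in base R.

```r
# Compare the simulated SD of sample means with the CLT prediction sigma / sqrt(n).
# Assumption: draws come directly from the Night log-normal model, not from `tat`.
set.seed(11)
sdlog   <- 0.5
meanlog <- log(62) - sdlog^2 / 2
sigma   <- 62 * sqrt(exp(sdlog^2) - 1)   # theoretical SD of one observation

ns <- c(5, 20, 50, 100)
B  <- 4000

clt_check <- do.call(rbind, lapply(ns, function(n) {
  means <- replicate(B, mean(rlnorm(n, meanlog, sdlog)))
  data.frame(n = n,
             sd_simulated = sd(means),
             sd_theory    = sigma / sqrt(n))
}))

clt_check$ratio <- clt_check$sd_simulated / clt_check$sd_theory
clt_check
```

A ratio near 1 at every n indicates that the spread follows σ/√n even when the shape at small n is still visibly skewed.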

Extension questions.
- At what n would you be comfortable using a t-based confidence interval for the Night mean?
- Compare bootstrap percentile intervals vs. t-intervals for n = 20 and n = 50.


4 Student’s t Distribution

4.1 Intro

When the population σ is unknown (common), and n is modest, inference for the mean uses the t-distribution. Heavier tails reflect extra uncertainty in estimating σ.

4.1.1 One-sample t test: Is Night shift above 60 minutes?

night_stats <- tibble(
  n  = length(night),
  xbar = mean(night),
  s  = sd(night),
  se = s / sqrt(n)
)

night_stats
t_res <- t.test(night, mu = 60, alternative = "greater")  # H0: mean = 60
t_res
## 
##  One Sample t-test
## 
## data:  night
## t = 0.78516, df = 377, p-value = 0.2164
## alternative hypothesis: true mean is greater than 60
## 95 percent confidence interval:
##  58.56078      Inf
## sample estimates:
## mean of x 
##  61.30828

4.2 Interpretation of Output

The tables and test results provide a structured summary of our data and inferential steps:

  • Shift-level summary (tibble with 3 rows).
    Each row corresponds to one hospital shift (Day, Evening, Night).
    • n gives the sample size.
    • mean is the average turnaround time in minutes.
    • sd is the observed standard deviation.
      We see that Night shift has both the highest mean TAT (≈61 min) and much larger variability (sd ≈ 32) compared to Day and Evening, which are closer to 56–59 minutes with tighter spread.
  • Night shift statistics (tibble with 1 row).
    For the Night data alone:
    • n = 378
    • x̄ = 61.3 minutes (the observed mean)
    • s = 32.4 minutes (the sample standard deviation)
    • se = 1.67 minutes (standard error, i.e., the estimated variability of the mean).
      This standard error will be key in constructing confidence intervals and hypothesis tests.
  • One-sample t-test results.
    The test asks: Is the Night shift mean greater than the 60-minute target?
    • Test statistic: t = 0.79 with df = 377.
    • p-value = 0.2164 (greater than 0.05).
    • 95% CI: (58.6, ∞), a one-sided lower confidence bound.
    • Sample estimate of mean: 61.3 minutes.
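
The reported test can be rebuilt from the summary statistics alone, which makes the role of the standard error explicit. This is a sketch using the rounded values listed above (n, x̄, s), so it matches the t.test() output only approximately; the object names are illustrative.

```r
# Rebuild the one-sample t test from the reported Night summary statistics.
# Inputs are the rounded values shown above, so results match only approximately.
n    <- 378
xbar <- 61.30828
s    <- 32.4
mu_0 <- 60

se      <- s / sqrt(n)
t_stat  <- (xbar - mu_0) / se
df_t    <- n - 1
p_value <- pt(t_stat, df = df_t, lower.tail = FALSE)   # H1: mean > 60
lower   <- xbar - qt(0.95, df = df_t) * se             # one-sided 95% lower bound

round(c(se = se, t = t_stat, p = p_value, lower = lower), 3)
```

Because the alternative is one-sided, the interval uses the 0.95 quantile (not 0.975) and has no finite upper bound.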

4.2.1 Analysis

Although the observed Night shift mean (61.3) is slightly above 60, the p-value > 0.05 means we do not have strong statistical evidence that the true mean exceeds 60 minutes.

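Interpretation. A small p-value together with a CI lying entirely above 60 would indicate that Night shift mean TAT exceeds the target. Here the lower bound (58.6) is below 60, so the data do not show an exceedance; note that failing to reject H0 also does not certify that mean TAT ≤ 60. Decide whether process changes (staffing, automation) are warranted.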

Extension questions.
- Repeat for Day and Evening. Is only Night problematic?
- Use Welch two-sample t to compare Evening vs Day. Is the difference operationally meaningful (not just statistically)?

4.2.2 R utilities for t distribution

df <- 15
x  <- seq(-4, 4, by = 0.01)
nd <- dnorm(x); td <- dt(x, df = df)

tibble(x, Normal = nd, t_df = td) %>%
  pivot_longer(-x, names_to = "dist", values_to = "dens") %>%
  ggplot(aes(x, dens, color = dist)) +
  geom_line(linewidth = 1) +
  scale_color_manual(values = c("Normal" = "#2E86AB", "t_df" = "#F18F01")) +
  labs(title = "Normal vs t(df=15)", x = "z / t", y = "Density", color = NULL) +
  theme_minimal(base_size = 13)
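
The heavier tails translate directly into larger critical values. The sketch below tabulates two-sided 95% critical values for a few degrees of freedom; the chosen df values are illustrative, with 377 matching the Night-shift test.

```r
# Two-sided 95% critical values: t approaches the normal value as df grows.
dfs    <- c(5, 15, 30, 377)
t_crit <- qt(0.975, df = dfs)
z_crit <- qnorm(0.975)

data.frame(df = dfs,
           t_crit = round(t_crit, 3),
           excess_over_z = round(t_crit - z_crit, 3))
```

At df = 377 the t and normal critical values are nearly identical, so the choice between them matters mainly for small samples.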


5 Chi-square Distribution (Variance Inference)

5.1 Intro

To assess process variability, we can form CIs for σ² using the chi-square distribution (assumes normality). This matters because variability affects staffing buffers and SLA risk.

5.1.1 CI for variance: Day shift

day <- tat %>% filter(shift == "Day") %>% pull(minutes)
n   <- length(day)
s2  <- var(day)
alpha <- 0.05

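# Chi-square quantiles (the upper quantile yields the lower CI bound, and vice versa)
qL <- qchisq(1 - alpha/2, df = n - 1)   # upper quantile -> lower bound
qU <- qchisq(alpha/2,     df = n - 1)   # lower quantile -> upper bound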

ci_var <- c((n - 1)*s2/qL, (n - 1)*s2/qU)
ci_sd  <- sqrt(ci_var)
list(variance_CI = ci_var, sd_CI = ci_sd)
## $variance_CI
## [1] 102.3656 134.2484
## 
## $sd_CI
## [1] 10.11759 11.58656

5.1.2 Interpretation of Variance CI (Day Shift)

The chi-square method constructs a confidence interval (CI) for the population variance (σ²) of Day shift turnaround times, assuming approximate normality. From our calculation:

  • Variance CI: (102.4, 134.2) min², the lower and upper bounds on σ².
  • SD CI: (10.1, 11.6) min, the square roots of those bounds, giving a range for the population standard deviation (σ).

What this means:
- The observed sample variance (s²) is our best estimate, but it fluctuates across samples.
- The CI tells us that, with 95% confidence, the true process variance (and therefore variability in turnaround times) lies between these two bounds.
- Translating to SD is more intuitive: it shows the plausible range for how far individual Day turnaround times typically deviate from the mean.
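
For reuse across shifts, the calculation can be wrapped in a small helper. The function name var_ci is hypothetical (it is not from any package), and the demo data are simulated from a normal distribution with the Day-shift parameters rather than taken from `tat`, so the block runs on its own.

```r
# Chi-square CI for a variance, wrapped as a helper (hypothetical name: var_ci).
# Assumes the data are approximately normal.
var_ci <- function(x, conf = 0.95) {
  n     <- length(x)
  s2    <- var(x)
  alpha <- 1 - conf
  lower <- (n - 1) * s2 / qchisq(1 - alpha / 2, df = n - 1)
  upper <- (n - 1) * s2 / qchisq(alpha / 2,     df = n - 1)
  list(s2 = s2, variance_CI = c(lower, upper), sd_CI = sqrt(c(lower, upper)))
}

# Illustration on simulated normal data with the Day-shift parameters
set.seed(11)
demo <- rnorm(420, mean = 56, sd = 11)
res  <- var_ci(demo)
res
```

The interval is not symmetric around s² because the chi-square distribution is right-skewed; raising conf (for example to 0.99) widens it.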

5.1.3 Analysis

If the upper bound of the SD CI is close to or exceeds operational tolerance, then even if the mean is acceptable, occasional large deviations may lead to SLA (service-level agreement) breaches. For the Day shift, the SD CI (about 10.1 to 11.6 minutes) sits far below Night’s sample SD (≈32 minutes), reinforcing the impression that Day shift is stable and consistent.

5.1.4 Extension Questions

  1. Compare this CI for variance across Day, Evening, and Night. Which shift shows the greatest uncertainty?
  2. If management sets a maximum allowable SD (e.g., 15 minutes), does the CI suggest the Day process is reliably within this tolerance?
  3. How would non-normality (e.g., skewed turnaround times) affect the validity of this chi-square variance CI?

Interpretation. If the upper confidence bound for σ is large, even a “good mean” can still yield frequent SLA misses. Consider variance reduction (process standardization, automation).

5.1.5 R utilities for chi-square

df <- c(5, 10, 30)
x  <- seq(0.001, 40, by = 0.01)

dens_df <- map_dfr(df, function(k){
  tibble(df = paste0("df=", k), x = x, dens = dchisq(x, df = k))
})

ggplot(dens_df, aes(x, dens, color = df)) +
  geom_line(linewidth = 1) +
  labs(title = "Chi-square PDFs by Degrees of Freedom", x = expression(chi^2), y = "Density", color = NULL) +
  theme_minimal(base_size = 13)


6 Sampling Distributions in Practice

6.1 Introduction

Sampling distributions are a cornerstone of inferential statistics because they describe how a statistic (such as a mean, proportion, or variance) behaves across repeated random samples from the same population.

Rather than focusing only on the observed sample, the sampling distribution provides the probabilistic framework that allows us to:
- Quantify uncertainty in estimates (through standard errors).
- Construct confidence intervals for population parameters.
- Conduct hypothesis tests by comparing observed statistics against what would be expected under the null hypothesis.

In practice, sampling distributions explain why even a single sample can tell us something meaningful about the population. By knowing how statistics vary across samples, we can evaluate whether an observed difference is likely due to chance or reflects a real effect.

In this section, we apply the concept to a concrete scenario—comparing conversion rates in an A/B website experiment. We use the sampling distribution of the difference in sample proportions to test whether the new design significantly improves outcomes.
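
Before turning to the A/B data, the idea can be made concrete by simulating the sampling distribution of a single proportion. In this sketch the true rate (p = 0.05) and sample size (n = 8100) are assumptions borrowed from Design A in the example that follows; the simulated spread is compared with the textbook standard error sqrt(p(1 − p)/n).

```r
# Simulate the sampling distribution of a sample proportion and compare its
# spread with the textbook standard error sqrt(p * (1 - p) / n).
# Assumption: true rate p = 0.05 with n = 8100 visits (Design A's observed values).
set.seed(11)
p <- 0.05
n <- 8100
B <- 10000

p_hat <- rbinom(B, size = n, prob = p) / n   # B simulated sample proportions

se_simulated <- sd(p_hat)
se_theory    <- sqrt(p * (1 - p) / n)

round(c(mean_p_hat = mean(p_hat), se_simulated = se_simulated, se_theory = se_theory), 5)
```

With a standard error of roughly a quarter of a percentage point per arm, a 1-point gap between two designs is large relative to sampling noise, which is what the formal test assesses.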

6.2 Website Conversion A/B (Proportions)

In digital marketing, even small changes to a website’s design can have measurable impacts on customer behavior. One common approach to testing improvements is an A/B experiment, where traffic is split between two versions of a webpage to compare performance. In this case, the company wants to know if the new homepage (Design B) leads to higher conversion rates than the current homepage (Design A).

Over the course of one week, thousands of visitors were randomly directed to either version, and purchases were tracked. The resulting proportions (5.0% for Design A and 6.0% for Design B) suggest a potential improvement. However, the key question is whether this observed difference reflects a true underlying effect or if it could simply be explained by random variation. This is where the concept of sampling distributions becomes critical: they allow us to quantify uncertainty and assess whether the difference is statistically significant and practically meaningful.

Context. Marketing tests a new homepage (B) vs current (A). Over a week:
- A: 8,100 visits, 405 purchases (5.0%)
- B: 7,900 visits, 474 purchases (6.0%)

We test whether B improves conversion.

A_n <- 8100; A_s <- 405
B_n <- 7900; B_s <- 474

ab <- tibble(
  design = c("A", "B"),
  purchases = c(A_s, B_s),
  visits = c(A_n, B_n),
  rate = purchases / visits
)
ab

6.2.1 Interpretation of Conversion Data

The tibble summarizes the observed conversion outcomes for the two homepage designs:

  • Design A (current homepage):
    • 8,100 visitors
    • 405 purchases
    • Conversion rate = 5.0%
  • Design B (new homepage):
    • 7,900 visitors
    • 474 purchases
    • Conversion rate = 6.0%

At face value, Design B shows an absolute lift of 1 percentage point compared to Design A (from 5% to 6%). In relative terms, this is a 20% improvement in conversion rate. While these differences are encouraging, we must be cautious: they could be due to random chance.

The next step is to formally test whether the observed 1-percentage-point difference is statistically significant. This requires examining the sampling distribution of the difference in proportions to determine if the improvement is large enough, given the sample sizes, to rule out random variation.

Practical context:
- Even a 1-percentage-point lift in conversion can translate into significant additional revenue for high-traffic sites.
- However, if the difference is not statistically reliable, acting on it prematurely could misallocate resources.

6.2.2 Descriptive visualization

ggplot(ab, aes(design, rate, fill = design)) +
  geom_col(width = 0.55) +
  geom_errorbar(aes(ymin = rate - 1.96*sqrt(rate*(1-rate)/visits),
                    ymax = rate + 1.96*sqrt(rate*(1-rate)/visits)),
                width = 0.12) +
  geom_text(aes(label = percent(rate, accuracy = 0.01)), vjust = -0.7, fontface = "bold") +
  scale_fill_manual(values = c("A" = "#2E86AB", "B" = "#F18F01")) +
  scale_y_continuous(labels = percent_format()) +
  labs(title = "Conversion Rates with 95% CIs", x = NULL, y = "Conversion Rate") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none")

6.2.3 Two-proportion test

prop.test(x = c(A_s, B_s), n = c(A_n, B_n), correct = FALSE)
## 
##  2-sample test for equality of proportions without continuity correction
## 
## data:  c(A_s, B_s) out of c(A_n, B_n)
## X-squared = 7.703, df = 1, p-value = 0.005513
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.017067685 -0.002932315
## sample estimates:
## prop 1 prop 2 
##   0.05   0.06

Interpretation. With large n, the sampling distribution of the difference in proportions is approximately normal (CLT for proportions). Here the 95% CI for p_A − p_B, (−0.0171, −0.0029), excludes 0 and the two-sided p-value is 0.0055, so B likely lifts conversion, by roughly 0.3 to 1.7 percentage points.
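
The prop.test() output can be reproduced by hand, which shows where the sampling distribution enters. This sketch assumes the standard large-sample construction: a pooled standard error for the test statistic and an unpooled standard error for the interval. Object names are illustrative.

```r
# Reproduce prop.test(..., correct = FALSE) by hand for the A/B counts.
A_n <- 8100; A_s <- 405
B_n <- 7900; B_s <- 474
p_A <- A_s / A_n
p_B <- B_s / B_n

# Test statistic uses the pooled proportion (valid under H0: p_A = p_B)
p_pool    <- (A_s + B_s) / (A_n + B_n)
se_pooled <- sqrt(p_pool * (1 - p_pool) * (1 / A_n + 1 / B_n))
z         <- (p_A - p_B) / se_pooled
p_two     <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Confidence interval uses the unpooled standard error
se_unpooled <- sqrt(p_A * (1 - p_A) / A_n + p_B * (1 - p_B) / B_n)
ci <- (p_A - p_B) + c(-1, 1) * qnorm(0.975) * se_unpooled

list(z = z, x_squared = z^2, p_value = p_two, ci = ci)
```

The squared z statistic equals the X-squared value reported by prop.test(). For the directional question (does B improve on A?), the one-sided p-value is pnorm(z), half the two-sided value because z is negative here.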

Extension questions.
- Estimate the minimum detectable effect at 80% power for a 1-week test.
- Use a Bayesian Beta-Binomial to compute P(B > A) and compare to the frequentist result.


7 Pulling it together

  • Normal tools are powerful for means via CLT (even with moderate skew).
  • t handles unknown σ and smaller n.
  • Chi-square informs variance planning.
  • Sampling distributions underpin intervals and A/B decisions (means, proportions).

Practice prompts.

  1. For the lab, suppose leadership wants P(TAT ≤ 60) ≥ 0.9. Using the normal approximation with your estimated mean/σ for Day, is the SLA met?
  2. For Night, try a log-normal model explicitly and compare the implied mean/variance CI to the t-based one.
  3. For A/B, run a sequential analysis (e.g., alpha spending) to see how monitoring frequency changes error rates.