This report walks through twelve problems from Chapters 13 and 14 of Bluman’s textbook — sign tests, Wilcoxon rank-sum and signed-rank tests, Kruskal–Wallis, Spearman correlation, and Monte-Carlo simulation. Every chart below is interactive. Try the three simulators at the bottom. All R code lives in the appendix.
The pattern across these problems is striking: most fail to reject the null, not because there’s nothing happening in the data, but because the samples are small and non-parametric tests are deliberately conservative. The four rejections (all in the Table K problems) and the two simulations round out the toolkit.
Section 13-2 · Problem 6 · Sign Test
An athletic director claims the median paid attendance at 20 local football games is 3,000. Is the data consistent with that?
Section 13-2 · Problem 10 · Sign Test (Normal Approx.)
She thinks she sells about 200 tickets a day. In 40 sampled days, 15 were below 200. Evidence for a lower median?
Section 13-3 · Problem 4 · Wilcoxon Rank Sum
Do men and women receive different sentence lengths for the same type of crime?
Section 13-3 · Problem 8 · Wilcoxon Rank Sum
During 1970–1993, did the National League East have a different win distribution than the American League East?
Wilcoxon Signed-Rank · Reject if wₛ ≤ critical value
Four direct critical-value comparisons — the only problems in the assignment that produce rejections.
Section 13-5 · Problem 2 · Kruskal–Wallis
Five countries each from Western Hemisphere, Europe, and Eastern Asia. Do their teenagers’ math scores differ?
Section 13-6 · Problem 6 · Spearman Correlation
Six cities. Are subway and commuter-rail trip volumes related?
Section 14-3 · Problem 16 · Monte-Carlo
Four prizes are placed at random, one per box. How many boxes does a customer need to collect them all?
Section 14-3 · Problem 18 · Monte-Carlo
Tickets show ‘b’ 60% of the time, ‘i’ 30%, ‘g’ just 10%. How many tickets to spell BIG?
Everything below runs in your browser. No R required.
Click a box to “buy” it. Try to collect all four prizes in as few boxes as possible.
P(b) = 60 % · P(i) = 30 % · P(g) = 10 %.
True expected value:
10.96 tickets.
Move the sliders to see how sample size and the count of below-median days shift the test statistic and p-value. This mirrors the lottery problem (Section 13-2 #10).
The complete script that produces every number in this report is
loaded directly from the companion .R file and embedded
below. Click to expand.
# =============================================================================
# ALY 6015 - Assignment 5
# Nonparametric Statistical Methods, Sampling and Simulation
# =============================================================================
# ---------- Packages ----------------------------------------------------------
required <- c("tidyverse", "ggplot2", "scales", "BSDA", "gridExtra")
new_pkgs <- required[!(required %in% installed.packages()[, "Package"])]
if (length(new_pkgs)) install.packages(new_pkgs, repos = "https://cloud.r-project.org")
suppressPackageStartupMessages({
library(tidyverse)
library(ggplot2)
library(scales)
library(BSDA)
library(grid)
library(gridExtra)
})
set.seed(2024)
# ---------- Shared visual style ----------------------------------------------
PAL <- list(primary = "#2C3E50", accent = "#E74C3C",
positive = "#18BC9C", negative = "#F39C12",
neutral = "#95A5A6", highlight = "#3498DB")
theme_report <- function() {
theme_minimal(base_size = 11) +
theme(plot.title = element_text(face = "bold", color = PAL$primary,
size = 13, hjust = 0.5),
plot.subtitle = element_text(color = "#666666", hjust = 0.5),
axis.title = element_text(color = "#444444"),
panel.grid.major = element_line(color = "#DDDDDD",
linetype = "dashed", linewidth = 0.4),
panel.grid.minor = element_blank(),
legend.position = "top")
}
show_plot <- function(p, name = NULL) {
print(p) # draw to the active graphics device (RStudio Plots pane)
}
# =============================================================================
# SECTION 13-2 -- SIGN TEST
# =============================================================================
# -----------------------------------------------------------------------------
# Problem 6 -- Game Attendance
# -----------------------------------------------------------------------------
cat("\n========== Section 13-2, Problem 6: Game Attendance ==========\n")
attendance <- c(6210, 3150, 2700, 3012, 4875, 3540, 6127, 2581, 2642, 2573,
2792, 2800, 2500, 3700, 6030, 5437, 2758, 3490, 2851, 2720)
med_claim <- 3000
n_pos <- sum(attendance > med_claim)
n_neg <- sum(attendance < med_claim)
test_val_p6 <- min(n_pos, n_neg)
cat("H0: median = 3000 (claim) H1: median != 3000\n")
cat("n =", length(attendance), "| + =", n_pos, "| - =", n_neg, "\n")
cat("Test value = min(+, -) =", test_val_p6, "\n")
cat("Critical value (n=20, alpha=0.05, two-tailed, Table J) = 5\n")
cat("Decision: 10 > 5 -> FAIL TO REJECT H0\n")
print(SIGN.test(attendance, md = 3000, alternative = "two.sided"))
# --- Graph ---
df6 <- tibble(game = seq_along(attendance), attend = attendance,
sign = ifelse(attendance > med_claim, "Above 3,000", "Below 3,000"))
p1 <- ggplot(df6, aes(x = game, y = attend, fill = sign)) +
geom_col(width = 0.75, color = "white") +
geom_hline(yintercept = med_claim, color = PAL$primary,
linetype = "dashed", linewidth = 1) +
scale_fill_manual(values = c("Above 3,000" = PAL$positive,
"Below 3,000" = PAL$accent),
name = NULL) +
scale_x_continuous(breaks = 1:20) +
scale_y_continuous(labels = comma) +
annotate("text", x = 5, y = 6500,
label = "Sign test: 10 above, 10 below -> tie -> fail to reject H0",
size = 3.2, fontface = "italic", color = "#555555") +
labs(title = "Paid Attendance vs. Claimed Median (Problem 6)",
x = "Game number", y = "Paid attendance") +
theme_report()
show_plot(p1)
# -----------------------------------------------------------------------------
# Problem 10 -- Lottery Ticket Sales
# -----------------------------------------------------------------------------
cat("\n========== Section 13-2, Problem 10: Lottery Tickets ==========\n")
n <- 40
neg <- 15; pos <- n - neg
X <- min(pos, neg)
z_p10 <- (X + 0.5 - n/2) / (sqrt(n)/2)
cat("H0: median >= 200 H1: median < 200 (claim)\n")
cat("n =", n, "| + =", pos, "| - =", neg, "\n")
cat(sprintf("z = (%d + 0.5 - %g) / (sqrt(%d)/2) = %.4f\n", X, n/2, n, z_p10))
cat("Critical z (one-tailed, alpha=0.05) = -1.645\n")
cat(sprintf("p-value = %.4f\n", pnorm(z_p10)))
cat("Decision: z = -1.4230 > -1.645 -> FAIL TO REJECT H0\n")
# --- Graph ---
xs <- seq(-4, 4, length.out = 600)
df10 <- tibble(z = xs, density = dnorm(xs))
p2 <- ggplot(df10, aes(z, density)) +
geom_area(data = filter(df10, z < -1.645), fill = PAL$accent, alpha = 0.35) +
geom_line(color = PAL$primary, linewidth = 1.1) +
geom_vline(xintercept = -1.645, color = PAL$accent,
linetype = "dashed", linewidth = 0.9) +
geom_vline(xintercept = z_p10, color = PAL$highlight,
linetype = "dotted", linewidth = 1.2) +
geom_point(aes(x = z_p10, y = dnorm(z_p10)), color = PAL$highlight, size = 4) +
annotate("label", x = -3.3, y = 0.20, label = "z* = -1.645",
color = PAL$accent, size = 3.2) +
annotate("label", x = -0.2, y = 0.30,
label = sprintf("Test z = %.3f", z_p10),
color = PAL$highlight, size = 3.4, fontface = "bold") +
annotate("label", x = 2.5, y = 0.12,
label = "Test stat inside\nnon-rejection region\n-> Fail to reject",
color = "#555555", size = 3, fontface = "italic") +
labs(title = "Sign-Test Normal Approximation (Problem 10)",
x = "z", y = "Density") +
theme_report()
show_plot(p2)
# =============================================================================
# SECTION 13-3 -- WILCOXON RANK SUM TEST
# =============================================================================
# -----------------------------------------------------------------------------
# Problem 4 -- Lengths of Prison Sentences
# -----------------------------------------------------------------------------
cat("\n========== Section 13-3, Problem 4: Prison Sentences ==========\n")
males <- c( 8, 12, 6, 14, 22, 27, 32, 24, 26, 19, 15, 13)
females <- c( 7, 5, 2, 3, 21, 26, 30, 9, 4, 17, 23, 12, 11, 16)
prison_df <- tibble(months = c(males, females),
group = c(rep("Male", length(males)),
rep("Female", length(females))))
prison_df$rank <- rank(prison_df$months)
R1 <- sum(prison_df$rank[prison_df$group == "Male"])
n1 <- length(males); n2 <- length(females)
mu_R <- n1 * (n1 + n2 + 1) / 2
sigma_R <- sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z_p4 <- (R1 - mu_R) / sigma_R
cat("H0: no difference (claim) H1: difference\n")
cat("n_males =", n1, "n_females =", n2, "\n")
cat("Sum of ranks (males, smaller n) R =", R1, "\n")
cat(sprintf("mu_R = %.2f sigma_R = %.4f z = %.4f\n", mu_R, sigma_R, z_p4))
cat(sprintf("Median (M) = %.1f Median (F) = %.1f\n",
median(males), median(females)))
cat("Critical z (two-tailed, alpha=0.05) = +/- 1.96\n")
cat("Decision: |1.4916| < 1.96 -> FAIL TO REJECT H0\n")
print(wilcox.test(males, females, exact = FALSE, correct = FALSE))
# --- Graph ---
prison_df$group <- factor(prison_df$group,
levels = c("Male", "Female"),
labels = c("Male (n=12)", "Female (n=14)"))
p3 <- ggplot(prison_df, aes(group, months, fill = group, color = group)) +
geom_violin(alpha = 0.30, linewidth = 0.6) +
geom_boxplot(width = 0.18, fill = "white", outlier.shape = NA, linewidth = 0.7) +
geom_jitter(width = 0.08, height = 0, size = 2.2, alpha = 0.85, stroke = 0.8) +
scale_fill_manual(values = c("Male (n=12)" = PAL$highlight,
"Female (n=14)" = PAL$accent)) +
scale_color_manual(values = c("Male (n=12)" = PAL$highlight,
"Female (n=14)" = PAL$accent)) +
annotate("text", x = 1.5, y = 34,
label = sprintf("Wilcoxon z = %.2f (not significant at alpha=0.05)", z_p4),
fontface = "italic", size = 3.2, color = "#555555") +
labs(title = "Sentence Length by Gender (Problem 4)",
x = NULL, y = "Sentence (months)") +
theme_report() + theme(legend.position = "none")
show_plot(p3)
# -----------------------------------------------------------------------------
# Problem 8 -- Winning Baseball Games
# -----------------------------------------------------------------------------
cat("\n========== Section 13-3, Problem 8: Baseball Wins ==========\n")
NL <- c( 89, 96, 88, 101, 90, 91, 92, 96, 108, 100, 95)
AL <- c(108, 86, 91, 97,100,102, 95,104, 95, 89, 88, 101)
base_df <- tibble(wins = c(NL, AL),
league = c(rep("NL", length(NL)), rep("AL", length(AL))))
base_df$rank <- rank(base_df$wins)
R_NL <- sum(base_df$rank[base_df$league == "NL"])
n1 <- length(NL); n2 <- length(AL)
mu_R <- n1 * (n1 + n2 + 1) / 2
sigma_R <- sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z_p8 <- (R_NL - mu_R) / sigma_R
cat("H0: no difference H1: difference (claim)\n")
cat("n_NL =", n1, " n_AL =", n2, "\n")
cat("R_NL =", R_NL, "\n")
cat(sprintf("mu_R = %.2f sigma_R = %.4f z = %.4f\n", mu_R, sigma_R, z_p8))
cat("Decision: |-0.4308| < 1.96 -> FAIL TO REJECT H0\n")
print(wilcox.test(NL, AL, exact = FALSE, correct = FALSE))
# --- Graph ---
mean_df <- base_df %>% group_by(league) %>% summarise(m = mean(wins), .groups = "drop")
p4 <- ggplot(base_df, aes(wins, fill = league, color = league)) +
geom_histogram(alpha = 0.55, position = "identity", binwidth = 3, color = "white") +
geom_vline(data = mean_df, aes(xintercept = m, color = league),
linetype = "dashed", linewidth = 1, show.legend = FALSE) +
scale_fill_manual(values = c(NL = PAL$primary, AL = PAL$accent),
labels = c(sprintf("AL East (n=12, mean=%.1f)", mean(AL)),
sprintf("NL East (n=11, mean=%.1f)", mean(NL)))) +
scale_color_manual(values = c(NL = PAL$primary, AL = PAL$accent), guide = "none") +
annotate("label", x = 107, y = 2.7,
label = sprintf("Wilcoxon z = %.2f\nFail to reject H0", z_p8),
color = "#555555", size = 3) +
labs(title = "Distribution of Wins: NL East vs AL East (Problem 8)",
x = "Wins per season", y = "Frequency (seasons)", fill = NULL) +
theme_report()
show_plot(p4)
# =============================================================================
# SECTION 13-4 -- TABLE K LOOK-UPS
# =============================================================================
cat("\n========== Section 13-4: Table K Look-ups ==========\n")
table_k <- tribble(
~Problem, ~ws, ~n, ~alpha, ~tails, ~CritVal, ~Decision,
"5", 13, 15, 0.01, "two-tailed", 16, "Reject H0",
"6", 32, 28, 0.025, "one-tailed", 117, "Reject H0",
"7", 65, 20, 0.05, "one-tailed", 60, "Fail to reject H0",
"8", 22, 14, 0.10, "two-tailed", 26, "Reject H0"
)
print(table_k)
# =============================================================================
# SECTION 13-5 -- KRUSKAL-WALLIS H TEST
# =============================================================================
cat("\n========== Section 13-5, Problem 2: Math Literacy ==========\n")
western <- c(527, 406, 474, 381, 411)
europe <- c(520, 510, 513, 548, 496)
eastern <- c(523, 547, 547, 391, 549)
math_df <- tibble(score = c(western, europe, eastern),
region = factor(rep(c("Western Hemisphere", "Europe",
"Eastern Asia"), each = 5),
levels = c("Western Hemisphere", "Europe",
"Eastern Asia")))
kw_p2 <- kruskal.test(score ~ region, data = math_df)
math_df$rank <- rank(math_df$score)
R_west <- sum(math_df$rank[math_df$region == "Western Hemisphere"])
R_eur <- sum(math_df$rank[math_df$region == "Europe"])
R_asia <- sum(math_df$rank[math_df$region == "Eastern Asia"])
cat("R_West =", R_west, " R_Europe =", R_eur, " R_Asia =", R_asia, "\n")
cat(sprintf("H (R) = %.4f p = %.4f\n", kw_p2$statistic, kw_p2$p.value))
cat(sprintf("Critical chi-square (df = 2, alpha = 0.05) = %.3f\n",
qchisq(0.95, df = 2)))
cat("Decision: 4.16 < 5.991 -> FAIL TO REJECT H0\n")
# --- Graph ---
rank_summary <- tibble(region = factor(c("Western Hemisphere", "Europe", "Eastern Asia"),
levels = c("Western Hemisphere","Europe","Eastern Asia")),
rank_sum = c(R_west, R_eur, R_asia))
mean_summary <- math_df %>% group_by(region) %>%
summarise(m = mean(score), .groups = "drop")
p5 <- ggplot(math_df, aes(region, score, fill = region, color = region)) +
geom_boxplot(alpha = 0.30, outlier.shape = NA, width = 0.5, linewidth = 0.6) +
geom_jitter(width = 0.10, size = 2.5, alpha = 0.95, stroke = 0.8) +
geom_point(data = mean_summary, aes(region, m), shape = 23, size = 4,
fill = "black", color = "black", inherit.aes = FALSE) +
geom_label(data = rank_summary, aes(region, 595,
label = paste("Rank sum =", rank_sum)),
inherit.aes = FALSE, color = PAL$primary, fontface = "bold", size = 3.2) +
scale_fill_manual(values = c("Western Hemisphere" = PAL$accent,
"Europe" = PAL$primary,
"Eastern Asia" = PAL$positive)) +
scale_color_manual(values = c("Western Hemisphere" = PAL$accent,
"Europe" = PAL$primary,
"Eastern Asia" = PAL$positive)) +
ylim(370, 605) +
annotate("text", x = 2, y = 378,
label = "Kruskal-Wallis H = 4.16 < chi-sq(2, 0.05) = 5.99 -> Fail to reject H0",
fontface = "italic", size = 3.2, color = "#555555") +
labs(title = "Math Literacy Scores by Region (Problem 2)",
x = NULL, y = "Math literacy score") +
theme_report() + theme(legend.position = "none")
show_plot(p5)
# =============================================================================
# SECTION 13-6 -- SPEARMAN RANK CORRELATION
# =============================================================================
cat("\n========== Section 13-6, Problem 6: Subway vs Rail ==========\n")
subway <- c(845, 494, 425, 313, 108, 41)
rail <- c( 39, 291, 142, 103, 33, 38)
subway_rank <- rank(subway); rail_rank <- rank(rail)
d <- subway_rank - rail_rank; d2 <- d^2
n_cities <- length(subway)
rs_manual <- 1 - (6 * sum(d2)) / (n_cities * (n_cities^2 - 1))
cat("Subway ranks:", subway_rank, "\n")
cat("Rail ranks :", rail_rank, "\n")
cat("sum(d^2) =", sum(d2), " rs (manual) =", round(rs_manual, 4), "\n")
print(cor.test(subway, rail, method = "spearman", exact = FALSE))
cat("Critical value (n=6, alpha=0.05 two-tailed) = +/- 0.886\n")
cat("Decision: |0.6| < 0.886 -> FAIL TO REJECT H0\n")
# --- Graph: two panels combined ---
transit_df <- tibble(city = paste("City", 1:6), subway = subway, rail = rail,
subway_rank = subway_rank, rail_rank = rail_rank)
panel_raw <- ggplot(transit_df, aes(subway, rail, label = city)) +
geom_smooth(method = "lm", se = FALSE, color = PAL$neutral,
linetype = "dashed", linewidth = 0.7) +
geom_point(color = PAL$accent, size = 5) +
geom_text(vjust = -1, fontface = "bold", color = PAL$primary) +
labs(title = "Raw Values",
x = "Subway trips (thousands/day)", y = "Rail trips (thousands/day)") +
theme_report()
panel_rank <- ggplot(transit_df, aes(subway_rank, rail_rank, label = city)) +
geom_abline(slope = 1, intercept = 0, color = PAL$neutral, linetype = "dashed") +
geom_point(color = PAL$highlight, size = 5) +
geom_text(vjust = -1, fontface = "bold", color = PAL$primary) +
scale_x_continuous(breaks = 1:6, limits = c(0.5, 6.5)) +
scale_y_continuous(breaks = 1:6, limits = c(0.5, 6.5)) +
labs(title = sprintf("Spearman View (rs = %.3f)", rs_manual),
x = "Subway rank", y = "Rail rank") +
theme_report()
grid.arrange(panel_raw, panel_rank, ncol = 2,
top = textGrob("Subway vs Commuter Rail - Section 13-6 Problem 6",
gp = gpar(fontsize = 13, fontface = "bold", col = PAL$primary)))
# =============================================================================
# SECTION 14-3 -- MONTE-CARLO SIMULATION
# =============================================================================
# -----------------------------------------------------------------------------
# Problem 16 -- Caramel Corn Prizes
# -----------------------------------------------------------------------------
cat("\n========== Section 14-3, Problem 16: Caramel Corn ==========\n")
collect_prizes <- function(n_prizes = 4) {
collected <- integer(0); count <- 0L
while (length(collected) < n_prizes) {
collected <- union(collected, sample.int(n_prizes, 1)); count <- count + 1L
}
count
}
set.seed(2024)
trials_p16 <- replicate(40, collect_prizes())
mean_p16 <- mean(trials_p16)
theoretical_p16 <- 4 * (1 + 1/2 + 1/3 + 1/4)
cat("40 trials -- box counts:\n"); print(trials_p16)
cat(sprintf("Average boxes needed (n=40) = %.4f\n", mean_p16))
cat(sprintf("Theoretical E[T] = 4*(1+1/2+1/3+1/4) = %.4f\n", theoretical_p16))
# --- Graph: histogram + convergence ---
set.seed(2024)
big_run <- replicate(2000, collect_prizes())
running <- cumsum(big_run) / seq_along(big_run)
hist_panel <- ggplot(tibble(boxes = trials_p16), aes(boxes)) +
geom_histogram(binwidth = 1, fill = PAL$highlight,
color = "white", linewidth = 0.6) +
geom_vline(xintercept = mean_p16, color = PAL$accent,
linetype = "dashed", linewidth = 1) +
geom_vline(xintercept = theoretical_p16, color = PAL$primary,
linetype = "dotted", linewidth = 1.2) +
annotate("label", x = mean_p16, y = 5.5, color = PAL$accent,
label = sprintf("Sample mean = %.2f", mean_p16), size = 3) +
annotate("label", x = theoretical_p16, y = 4.5, color = PAL$primary,
label = sprintf("E[T] = %.3f", theoretical_p16), size = 3) +
labs(title = "Distribution: 40 Trials", x = "Boxes purchased", y = "Frequency") +
theme_report()
conv_panel <- ggplot(tibble(trial = seq_along(big_run), running = running),
aes(trial, running)) +
geom_line(color = PAL$accent, alpha = 0.85, linewidth = 0.5) +
geom_hline(yintercept = theoretical_p16, color = PAL$primary,
linetype = "dashed", linewidth = 1) +
geom_vline(xintercept = 40, color = PAL$neutral,
linetype = "dotted", linewidth = 0.8) +
scale_x_log10() +
labs(title = "Convergence to E[T]", x = "Trial number (log scale)",
y = "Running average") + theme_report()
grid.arrange(hist_panel, conv_panel, ncol = 2,
top = textGrob("Caramel Corn - Coupon Collector (Problem 16)",
gp = gpar(fontsize = 13, fontface = "bold", col = PAL$primary)))
# -----------------------------------------------------------------------------
# Problem 18 -- Lottery "BIG"
# -----------------------------------------------------------------------------
cat("\n========== Section 14-3, Problem 18: Lottery 'BIG' ==========\n")
spell_big <- function() {
have <- character(0); count <- 0L
while (!all(c("b","i","g") %in% have)) {
have <- union(have, sample(c("b","i","g"), 1, prob = c(.6, .3, .1)))
count <- count + 1L
}
count
}
set.seed(2024)
trials_p18 <- replicate(30, spell_big())
mean_p18 <- mean(trials_p18)
p_big <- c(.6, .3, .1)
ET_big <- 1/p_big[1] + 1/p_big[2] + 1/p_big[3] -
1/(p_big[1]+p_big[2]) - 1/(p_big[1]+p_big[3]) -
1/(p_big[2]+p_big[3]) + 1/sum(p_big)
cat("30 trials -- ticket counts:\n"); print(trials_p18)
cat(sprintf("Average tickets needed (n=30) = %.4f\n", mean_p18))
cat(sprintf("Theoretical E[T] (exact formula) = %.4f\n", ET_big))
# --- Graph ---
p8plot <- ggplot(tibble(tickets = trials_p18), aes(tickets)) +
geom_histogram(bins = 15, fill = PAL$positive,
color = "white", linewidth = 0.6) +
geom_vline(xintercept = mean_p18, color = PAL$accent,
linetype = "dashed", linewidth = 1.2) +
geom_vline(xintercept = ET_big, color = PAL$primary,
linetype = "dotted", linewidth = 1.2) +
annotate("label", x = mean_p18 + 8, y = 7.5,
label = sprintf("Sample mean = %.2f (n=30)", mean_p18),
color = PAL$accent, size = 3) +
annotate("label", x = mean_p18 - 6, y = 6,
label = sprintf("True E[T] = %.3f", ET_big),
color = PAL$primary, size = 3) +
annotate("label", x = 50, y = 4,
label = "Heavy right tail -\nrare 'g' (10%) drives\nlarge outcomes",
color = "#555555", fontface = "italic", size = 3) +
labs(title = "BIG Lottery - Tickets Needed (Problem 18)",
x = "Tickets purchased", y = "Frequency") +
theme_report()
show_plot(p8plot)
cat("\nAll eight plots drawn to the Plots pane.\n")
cat("Numerical work complete.\n")
# =============================================================================
# End of Script
# =============================================================================
Built with R · Plotly · DT · vanilla JavaScript · Claude.