# Robust loader for the OpenIntro Kobe dataset
suppressWarnings(rm(kobe))
ok <- try(load(url("https://www.openintro.org/stat/data/kobe.RData")), silent = TRUE)
if (inherits(ok, "try-error") || !exists("kobe")) stop("kobe did not load; re-run or upload kobe.RData and use load('kobe.RData').")
# Normalize to an H/M character vector regardless of column name
if ("shot" %in% names(kobe)) {
  x <- kobe$shot
} else if ("basket" %in% names(kobe)) {
  x <- kobe$basket
} else {
  stop("Neither 'shot' nor 'basket' found.")
}
x <- toupper(as.character(x))
x <- ifelse(x %in% c("H","HIT","1","TRUE"), "H",
            ifelse(x %in% c("M","MISS","0","FALSE"), "M", NA))
x <- x[!is.na(x)] # drop values that are neither a hit nor a miss
cat("Rows:", length(x), " | H:", sum(x=="H"), " | M:", sum(x=="M"), "\n")
## Rows: 133 | H: 58 | M: 75
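If the download fails, the fallback named in the error message can be wrapped so that it does not overwrite other objects in the workspace. A minimal sketch, assuming a local copy of the file has been saved as kobe.RData; the helper name `load_kobe_local` is hypothetical and not part of the lab:

```r
# Hypothetical helper (not part of the original lab): load a local copy of
# kobe.RData into its own environment and return the `kobe` object.
load_kobe_local <- function(path = "kobe.RData") {
  if (!file.exists(path)) stop("File not found: ", path)
  env <- new.env()
  loaded <- load(path, envir = env)  # load() returns the names it restored
  if (!"kobe" %in% loaded) stop("No object named 'kobe' in ", path)
  get("kobe", envir = env)
}

# Usage, assuming the file sits in the working directory:
# kobe <- load_kobe_local("kobe.RData")
```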
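# calc_streak returns the lengths of runs of consecutive H's.
# A run ends at an M or at the end of the vector. Zero-length runs
# (an M with no H before it) are dropped, so only streaks >= 1 are returned.
calc_streak <- function(vec) {
  streaks <- integer(0)
  run <- 0L
  for (i in seq_along(vec)) {
    if (vec[i] == "H") {
      run <- run + 1L
    } else {
      streaks <- c(streaks, run) # an M closes the current run
      run <- 0L
    }
  }
  streaks <- c(streaks, run) # close last run if it ends with H
  streaks[streaks > 0]       # drop zero-length runs
}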
example_vec <- c("H","M","H","H","M","M","H")
example_vec
## [1] "H" "M" "H" "H" "M" "M" "H"
calc_streak(example_vec)
## [1] 1 2 1
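The same run lengths can be obtained without an explicit loop. A sketch using base R's `rle()`, offered as an equivalent under the assumption that the input contains only "H" and "M" (no `NA`); the name `calc_streak_rle` is introduced here for illustration only:

```r
# Illustrative alternative to calc_streak, assuming an NA-free H/M vector.
calc_streak_rle <- function(vec) {
  runs <- rle(vec == "H")                # runs of TRUE (hits) and FALSE (misses)
  as.integer(runs$lengths[runs$values])  # keep only the lengths of hit runs
}

calc_streak_rle(c("H","M","H","H","M","M","H"))
```

On this example vector it returns 1 2 1, the same result as calc_streak.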
Answer Q1 (in words):
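A streak of length 1 is a single made shot bracketed by misses:
the shot before it (if any) was a miss, and the miss that follows ends
the run. A streak of length 0 is a miss that comes directly after
another miss (or opens the sequence), so no shots were made before the
run ended. Note that calc_streak as defined above drops zero-length
streaks, so the results below count only streaks of length 1 or more.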
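To make the length-0 case concrete, here is a sketch of a variant that keeps zero-length streaks instead of dropping them. The name `calc_streak_zeros` is hypothetical; it pads the sequence with a miss at each end and counts the hits between consecutive misses:

```r
# Hypothetical variant that keeps zero-length streaks.
calc_streak_zeros <- function(vec) {
  hit <- c(0, as.integer(vec == "H"), 0)  # pad with a miss on both sides
  miss_pos <- which(hit == 0)             # positions of all misses
  diff(miss_pos) - 1                      # hits between consecutive misses
}

calc_streak_zeros(c("H","M","H","H","M","M","H"))
```

For the example vector this gives 1 2 0 1; the 0 comes from the second of the two back-to-back misses.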
kobe_streaks <- calc_streak(x) # use normalized H/M vector from the Load step
if (length(kobe_streaks) == 0) stop("No streaks found — check that x contains H/M values.")
tbl_kobe <- table(kobe_streaks)
barplot(tbl_kobe,
main = "Kobe Bryant: Streak length distribution",
xlab = "Streak length (consecutive hits)", ylab = "Frequency")
summary(kobe_streaks)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 1.000 1.568 2.000 4.000
Answer Q2 (description):
The distribution is right-skewed. Of Kobe's 37 streaks, 24 have
length 1 (median 1, mean about 1.57), frequencies fall off quickly as
streak length increases, and the longest streak is 4. Long streaks are
rare, which is also what we expect under independent shots.
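The expectation under independence can be written down directly. If each shot is a hit with probability p regardless of earlier shots, a streak that has started continues with probability p and ends with probability 1 - p, so the length L of a positive streak is approximately geometric: P(L = k) = p^(k-1) * (1 - p). A sketch, assuming p = 58/133 from the counts above and ignoring the truncation at the end of the 133-shot sequence:

```r
p <- 58 / 133                      # estimated hit probability from the data
k <- 1:4
geom_prob <- p^(k - 1) * (1 - p)   # P(streak length = k) under independence
names(geom_prob) <- paste0("len", k)
round(geom_prob, 3)
```

Under this approximation a little over half of all streaks (1 - p = 75/133) are expected to have length 1, and each additional hit multiplies the probability by p.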
# Q3 — Estimate hit probability and simulate independent shooting
p_hat <- mean(x == "H") # x is the H/M vector from the Load chunk
p_hat
## [1] 0.4360902
n <- length(x)
set.seed(60603)
sim_shots <- ifelse(runif(n) < p_hat, "H", "M")
sim_streaks <- calc_streak(sim_shots)
table(sim_streaks)
## sim_streaks
## 1 2 3 4
## 24 6 3 2
barplot(table(sim_streaks),
main = sprintf("IID Bernoulli simulation (p = %.3f): streaks", p_hat),
xlab = "Streak length", ylab = "Frequency")
Answer Q3 (compare Kobe vs IID):
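Kobe’s observed distribution and the IID simulation are
qualitatively similar: streaks of length 1 dominate (24 in each), and
streaks of length 3 or 4 are infrequent (7 of Kobe’s 37 streaks versus
5 of the 35 simulated). A single simulated sequence cannot show how much
these counts vary from run to run, so the differences are judged against
repeated simulations in Q4.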
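Because the two sequences produce different numbers of streaks, proportions are easier to compare than raw counts. A small sketch, with the counts copied by hand from the outputs in this report (Kobe: 24, 6, 6, 1; simulation: 24, 6, 3, 2):

```r
# Counts typed in from the printed tables, not recomputed from the data.
counts <- rbind(kobe = c(24, 6, 6, 1),
                sim  = c(24, 6, 3, 2))
colnames(counts) <- paste0("len", 1:4)
props <- prop.table(counts, margin = 1)  # row-wise proportions; each row sums to 1
round(props, 3)
```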
# Q4. Multiple simulations to assess variability of streak frequencies
# Repeat many times; collect frequencies for lengths 1, 2, 3, and 4+ (collapsed)
set.seed(60603)
many <- 500
freq_mat <- replicate(many, {
  sim <- ifelse(runif(n) < p_hat, "H", "M") # one simulated shot sequence
  cs <- calc_streak(sim)
  # Tabulate 1,2,3,4+:
  tabulate(pmin(cs, 4), nbins = 4)
})
rownames(freq_mat) <- c("len1","len2","len3","len4plus")
sim_means <- rowMeans(freq_mat)
sim_sds <- apply(freq_mat, 1, sd)
sim_means
## len1 len2 len3 len4plus
## 19.014 8.114 3.444 2.632
sim_sds
## len1 len2 len3 len4plus
## 3.820340 2.523835 1.696059 1.495677
# Kobe's observed frequencies for the same bins:
kobe_bins <- tabulate(pmin(kobe_streaks, 4), nbins = 4)
names(kobe_bins) <- c("len1","len2","len3","len4plus")
kobe_bins
## len1 len2 len3 len4plus
## 24 6 6 1
Answer Q4 (interpretation): Kobe’s observed bin counts are reasonably close to the simulation averages for IID shooting. Each bin lies within about 1.5 simulation standard deviations of its mean (for example, 24 length-1 streaks against 19.0 ± 3.8, and 6 length-3 streaks against 3.4 ± 1.7). This does not provide strong evidence against independence (i.e., no strong “hot-hand” signal here).
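One way to make "within a plausible range" concrete is to standardize each observed count by the simulation mean and standard deviation. A sketch using the values printed above, typed in by hand; treating |z| < 2 as unremarkable is a rule of thumb assumed here, not a formal test, and the four bins are not independent of one another:

```r
bins     <- c("len1", "len2", "len3", "len4plus")
kobe_obs <- setNames(c(24, 6, 6, 1), bins)                              # observed counts
sim_mean <- setNames(c(19.014, 8.114, 3.444, 2.632), bins)              # simulation means
sim_sd   <- setNames(c(3.820340, 2.523835, 1.696059, 1.495677), bins)   # simulation SDs

z <- (kobe_obs - sim_mean) / sim_sd  # standardized distance from the IID average
round(z, 2)
```

All four values are below 2 in absolute value; the largest is for length-3 streaks, at about 1.5.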
# (Optional) Q5. Simple probability refresher: coin example
set.seed(60603)
coin <- sample(c("H","T"), size = 100, replace = TRUE)
mean(coin == "H") # ~0.5 in the long run, but variable in small samples
## [1] 0.54
Answer Q5 (law of large numbers):
Over many trials, the proportion of heads tends to 0.5, but
short runs can deviate noticeably (0.54 in these 100 flips). In the
same way, apparent streaks can arise under independence.
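The convergence can be seen by tracking the running proportion of heads as flips accumulate. A sketch with 10,000 simulated fair-coin flips; the seed and sample size are arbitrary choices for illustration:

```r
set.seed(1)  # arbitrary seed, chosen only for reproducibility
flips <- sample(c("H", "T"), size = 10000, replace = TRUE)
running_prop <- cumsum(flips == "H") / seq_along(flips)

# Proportion of heads after 10, 100, 1000, and 10000 flips
running_prop[c(10, 100, 1000, 10000)]
```

Early values can sit well away from 0.5, while later values settle close to it.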
# Summary & conclusion
What the IID model predicts: Short streaks are most common; long
streaks occur occasionally.
Observed vs IID: Kobe’s streak distribution is largely
consistent with independence given this sample.
Limitations: We analyze one dataset, the streak
definition matters, and context (defense, shot selection) is ignored, so
results are suggestive rather than definitive.
# Appendix — Reusable function test (sanity check)
# Quick check on a custom vector to confirm calc_streak behavior:
calc_streak(c("M","H","H","H","M","H","M","M","H","H"))
## [1] 3 1 2
# Citation (formula/source)
The independence framework here uses an IID Bernoulli model for
makes/misses; streaks are computed as runs of consecutive H’s, each
ended by an M or by the end of the sequence. This setup follows common
Hot Hand lab variants (OpenIntro/STA labs). Dataset: kobe.RData from
OpenIntro (downloaded at knit time).