This lab covers random variables, simulation, the law of large numbers, and bootstrapping. These topics are worked through here, although they do not have to be submitted as part of the lab.
Basketball players who make several baskets in succession are described as having a “hot hand.” Fans and players have long believed in the hot hand phenomenon, which contradicts the assumption that each shot is independent of the next. However, a 1985 paper by Gilovich, Vallone, and Tversky collected evidence against this belief and concluded that successive shots are independent events. This paper started a great controversy that continues to this day.
The lab will focus on the performance of one player: Kobe Bryant of the Los Angeles Lakers. His performance against the Orlando Magic in the 2009 NBA finals earned him the title “Most Valuable Player” and many spectators commented on how he appeared to show a hot hand.
In the basket column of the kobe data frame, H records a hit (a made basket) and M records a miss. Just looking at the string of hits and misses, it can be difficult to gauge whether Kobe was shooting with a hot hand. One way to approach this is to consider the belief that hot hand shooters tend to go on shooting streaks. For this lab, we define the length of a shooting streak to be the number of consecutive baskets made until a miss occurs.
load(url("http://www.openintro.org/stat/data/kobe.RData"))
head(kobe)
## vs game quarter time
## 1 ORL 1 1 9:47
## 2 ORL 1 1 9:07
## 3 ORL 1 1 8:11
## 4 ORL 1 1 7:41
## 5 ORL 1 1 7:03
## 6 ORL 1 1 6:01
## description basket
## 1 Kobe Bryant makes 4-foot two point shot H
## 2 Kobe Bryant misses jumper M
## 3 Kobe Bryant misses 7-foot jumper M
## 4 Kobe Bryant makes 16-foot jumper (Derek Fisher assists) H
## 5 Kobe Bryant makes driving layup H
## 6 Kobe Bryant misses jumper M
attach(kobe)
basket[1:9]
## [1] "H" "M" "M" "H" "H" "M" "M" "M" "M"
kobe_streak <- calc_streak(basket)
The function calc_streak calculates the lengths of all shooting streaks under the definition given above. A miss that does not follow a hit is counted as a streak of length 0.
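calc_streak <- function(x) {
    # Code hits as 1 and misses as 0
    y <- rep(0, length(x))
    y[x == "H"] <- 1
    # Pad with a miss at each end so every streak is bounded by zeros
    y <- c(0, y, 0)
    # Positions of the misses
    wz <- which(y == 0)
    # Number of hits between consecutive misses
    streak <- diff(wz) - 1
    return(streak)
}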
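To see how the function treats the start and end of a sequence, it can help to run it on a short vector. The sketch below repeats the definition so that it runs on its own and applies it to the first nine shots printed earlier; the variable names first_nine and streaks are only illustrative.

```r
calc_streak <- function(x) {
    y <- rep(0, length(x))
    y[x == "H"] <- 1
    y <- c(0, y, 0)
    wz <- which(y == 0)
    diff(wz) - 1
}

# First nine shots from the kobe data, as printed by basket[1:9]
first_nine <- c("H", "M", "M", "H", "H", "M", "M", "M", "M")
streaks <- calc_streak(first_nine)
streaks
## [1] 1 0 2 0 0 0 0
```

Each miss closes a streak, so a miss that follows another miss contributes a zero, and the padding adds one final (here empty) streak after the last shot.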
A bar plot is preferable to a histogram here because the variable is discrete: each streak length is a count of hits.
barplot(table(kobe_streak), main = "Distribution of Streaks")
While we don’t have any data from a shooter we know to have independent shots, that is very easy to simulate in R. In a simulation, you set the ground rules of a random process and then the computer uses random numbers to generate an outcome that adheres to those rules. As a simple example, you can simulate flipping a fair coin with the following.
outcomes <- c("heads", "tails")
# Simulate a fair coin 100 times with replacement (independent)
sim_fair_coin <- sample(outcomes, size = 100, replace = TRUE)
# How many heads and tails in flipping the coin 100 times
table(sim_fair_coin)
## sim_fair_coin
## heads tails
## 52 48
# Illustrate the law of large numbers
set.seed(121213)
n <- 50000
# Creates a vector x of n randomly chosen 0s and 1s based on a fair coin
x <- sample(0:1, n, replace = TRUE)
# Cumulative sum
s <- cumsum(x)
# Running proportion of heads after each flip
r <- s/(1:n)
# Plot
plot(r, ylim = c(0.4, 0.6), type = "l", xlab = "Number of Coin flips", ylab = "Proportion of heads",
    main = "Flipping a Fair Coin Using the Law of Large Numbers")
# Horizontal reference line
lines(c(0, n), c(0.5, 0.5))
# Estimated probability of heads after n flips
round(r[n], 2)
## [1] 0.5
# Repeat with a general p, a 95% interval, and labels
set.seed(121213)
n <- 50000
p <- 0.5
x <- sample(0:1, n, prob = c(1 - p, p), replace = TRUE)
s <- cumsum(x)
r <- s/(1:n)
upr <- min(1, p + 0.1)
lwr <- max(0, p - 0.1)
plot(r, ylim = c(lwr, upr), type = "l")
lines(c(0, n), c(p, p), col = "darkblue", lty = 2)
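# Half-width of the 95% interval for the heads ratio after n tosses
err <- 1.96 * sqrt(p * (1 - p)/n)
lines(c(1.01 * n, 1.01 * n), c(p + err, p - err), col = "darkgreen", lwd = 2)
# Label color: green if the final ratio is inside the interval, red otherwise
farb <- "darkgreen"
if (abs(p - r[n]) > err) farb <- "red"
text(n, (lwr + p - err)/2, paste("r =", round(r[n], 3)), adj = 1, col = farb)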
title(paste("Heads Ratios up to", n, "Tosses With P(H)=", p))
round(cbind(x, s, r), 5)[1:10, ]
## x s r
## [1,] 1 1 1.0000
## [2,] 1 2 1.0000
## [3,] 0 2 0.6667
## [4,] 1 3 0.7500
## [5,] 1 4 0.8000
## [6,] 1 5 0.8333
## [7,] 0 5 0.7143
## [8,] 1 6 0.7500
## [9,] 0 6 0.6667
## [10,] 1 7 0.7000
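The green segment at the right edge of the plot spans p ± 1.96 * sqrt(p * (1 - p)/n), the range in which the final heads ratio should land about 95% of the time. One way to check this, sketched below, is to repeat the whole experiment many times and count how often the final ratio falls inside the interval. The seed, the smaller n, and the names reps, final_ratios, and coverage are assumptions made for this illustration, and rbinom is used to draw the number of heads in n tosses directly.

```r
set.seed(1)   # arbitrary seed, chosen only for reproducibility
n <- 5000     # tosses per experiment (smaller than above to keep it quick)
p <- 0.5      # probability of heads
reps <- 2000  # number of repeated experiments (illustrative choice)

# Half-width of the 95% interval for the final heads ratio
err <- 1.96 * sqrt(p * (1 - p)/n)

# Final heads ratio of each experiment: number of heads in n tosses, divided by n
final_ratios <- rbinom(reps, size = n, prob = p)/n

# Share of experiments whose final ratio lies within p +/- err
coverage <- mean(abs(final_ratios - p) <= err)
coverage
```

The result should be close to 0.95, with some run-to-run variation that depends on the seed and on reps.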
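If the coin is unfair, say with a probability of heads of 0.2, the running proportion of heads settles near 0.2 instead of 0.5.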
set.seed(121213)
n <- 50000
p <- 0.2
# Equivalent alternative: x <- rbinom(n, 1, p)
x <- sample(0:1, n, replace = TRUE, prob = c(1 - p, p))
s <- cumsum(x)
r <- s/(1:n)
lo <- max(c(0, p - 0.1))
hi <- min(c(1, p + 0.1))
plot(r, ylim = c(lo, hi), type = "l", main = "Toss of an Unfair Coin", ylab = "Proportion of heads")
lines(c(0, n), c(p, p))
r[n]
## [1] 0.1992
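The comment in the code above mentions rbinom as an alternative to sample. A minimal sketch of that variant follows; the seed and the names x_alt and r_alt are chosen here for illustration, and because the two functions need not consume random numbers in the same way, the exact values are not expected to match those above.

```r
set.seed(2024)  # arbitrary seed, an assumption for this sketch
n <- 50000
p <- 0.2

# Each draw is a single Bernoulli trial: 1 = heads with probability p
x_alt <- rbinom(n, size = 1, prob = p)

# Running proportion of heads
r_alt <- cumsum(x_alt)/seq_len(n)
r_alt[n]
```

As with sample, the final proportion should be close to p = 0.2.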
Simulating a basketball player who has independent shots uses the same mechanism that we use to simulate a coin flip. Here we simulate 50,000 shots from an independent shooter with a shooting percentage of 45% (the percentage of hits by Kobe) and track the running proportion of hits.
set.seed(121213)
n <- 50000
outcomes <- c("H", "M")
p <- 0.45
sim_basket <- sample(0:1, n, prob = c(1 - p, p), replace = TRUE)
s <- cumsum(sim_basket)
r <- s/(1:n)
lo <- max(c(0, p - 0.1))
hi <- min(c(1, p + 0.1))
plot(r, ylim = c(lo, hi), type = "l", main = "Proportion of Hits Made by the Independent Shooter",
    ylab = "Proportion of hits")
lines(c(0, n), c(p, p))
r[n]
## [1] 0.4496
# Expected number of hits in 133 shots by the independent shooter
r[n] * 133
## [1] 59.79
outcomes <- c("H", "M")
sim_basket <- sample(outcomes, size = 133, prob = c(0.45, 0.55), replace = TRUE)
sim_streak <- calc_streak(sim_basket)
barplot(table(sim_streak), main = "Distribution of Streaks")
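A single simulated game of 133 shots changes from run to run, so one bar plot says little about what an independent shooter typically produces. One way to get a steadier picture, sketched here, is to repeat the simulation many times and record the longest streak in each run. The seed, the 1000 repetitions, and the names n_games and longest are assumptions made for this illustration; calc_streak is repeated so the block runs on its own.

```r
calc_streak <- function(x) {
    y <- rep(0, length(x))
    y[x == "H"] <- 1
    y <- c(0, y, 0)
    wz <- which(y == 0)
    diff(wz) - 1
}

set.seed(42)     # arbitrary seed, chosen only for reproducibility
n_games <- 1000  # number of simulated games (illustrative choice)

# Longest streak in each simulated game of 133 independent shots at 45%
longest <- replicate(n_games, {
    shots <- sample(c("H", "M"), size = 133, prob = c(0.45, 0.55), replace = TRUE)
    max(calc_streak(shots))
})

# Distribution of the longest streak across simulated games
table(longest)
```

The longest streak in the real data can then be compared against this distribution to judge whether it would be unusual for an independent shooter.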
Let 1 = “H” and 0 = “M”. Then outcomes = c(1, 0), and the mean of any sample of shots is its proportion of hits.
outcomes <- c(1, 0)
B <- 10000
n <- 133
resamples <- matrix(sample(outcomes, n * B, prob = c(0.45, 0.55), replace = TRUE),
    nrow = B, ncol = n)
# Each row is one simulated sample of n shots, so take the mean of each row
means <- apply(resamples, 1, mean)
round(mean(means), 2)
## [1] 0.45
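The center of the simulated proportions recovers 0.45, and their spread is just as informative. The sketch below computes one proportion per simulated sample of 133 shots and compares the standard deviation of those proportions with the theoretical value sqrt(p * (1 - p)/n). The seed and the names sim_means, se_sim, and se_theory are assumptions made for this illustration, and rowMeans is used here in place of apply.

```r
set.seed(7)  # arbitrary seed, chosen only for reproducibility
B <- 10000   # number of simulated samples
n <- 133     # shots per sample
p <- 0.45    # probability of a hit

# B x n matrix: each row is one simulated sample of n shots (1 = hit, 0 = miss)
resamples <- matrix(sample(c(1, 0), n * B, prob = c(p, 1 - p), replace = TRUE),
    nrow = B, ncol = n)

# Proportion of hits in each simulated sample
sim_means <- rowMeans(resamples)

# Simulated versus theoretical standard error of the proportion
se_sim <- sd(sim_means)
se_theory <- sqrt(p * (1 - p)/n)
c(simulated = se_sim, theoretical = se_theory)

# Middle 95% of the simulated proportions
quantile(sim_means, c(0.025, 0.975))
```

The two standard errors should agree closely, and the quantiles give a rough range for the hit proportion of an independent 45% shooter over 133 shots.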