Data Analysis and Statistical Inference Course through Coursera

Duke University

Lab02

This lab covers random variables, simulation, the law of large numbers, and bootstrapping. The exercises are worked through here even though they were not required for submission.

Basketball players who make several baskets in succession are described as having a “hot hand.” Fans and players have long believed in the hot hand phenomenon, which runs counter to the assumption that each shot is independent of the next. However, a 1985 paper by Gilovich, Vallone, and Tversky collected evidence that contradicted this belief and showed that successive shots are independent events. This paper started a great controversy that continues to this day.

The lab focuses on the performance of one player: Kobe Bryant of the Los Angeles Lakers. His performance against the Orlando Magic in the 2009 NBA Finals earned him the title “Most Valuable Player,” and many spectators commented on how he appeared to show a hot hand.

In the basket column of the data frame kobe, H is recorded when he hit the shot (made a basket) and M when he missed. Just looking at the string of hits and misses, it can be difficult to gauge whether Kobe was shooting with a hot hand. One way to approach this is to consider the belief that hot hand shooters tend to go on shooting streaks. For this lab, we define the length of a shooting streak to be the number of consecutive baskets made until a miss occurs.

load(url("http://www.openintro.org/stat/data/kobe.RData"))
head(kobe)
##    vs game quarter time
## 1 ORL    1       1 9:47
## 2 ORL    1       1 9:07
## 3 ORL    1       1 8:11
## 4 ORL    1       1 7:41
## 5 ORL    1       1 7:03
## 6 ORL    1       1 6:01
##                                               description basket
## 1                 Kobe Bryant makes 4-foot two point shot      H
## 2                               Kobe Bryant misses jumper      M
## 3                        Kobe Bryant misses 7-foot jumper      M
## 4 Kobe Bryant makes 16-foot jumper (Derek Fisher assists)      H
## 5                         Kobe Bryant makes driving layup      H
## 6                               Kobe Bryant misses jumper      M
# Refer to the column explicitly instead of using attach()
kobe$basket[1:9]
## [1] "H" "M" "M" "H" "H" "M" "M" "M" "M"
kobe_streak <- calc_streak(kobe$basket)

calc_streak calculates the lengths of all shooting streaks, using the definition of a streak given above. The function is defined as follows.

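calc_streak <- function(x) {
  # Code hits as 1 and misses as 0
  y <- rep(0, length(x))
  y[x == "H"] <- 1
  # Pad with a 0 at each end so the first and last streaks are bounded
  y <- c(0, y, 0)
  # Positions of the misses (and the two padding zeros)
  wz <- which(y == 0)
  # The gap between consecutive zeros, minus 1, is the number of hits between them
  streak <- diff(wz) - 1
  return(streak)
}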
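
As a quick check of the streak definition, the function can be applied to the first nine shots shown earlier (H M M H H M M M M). This is a minimal, self-contained sketch: the vector first_nine is typed in by hand for illustration rather than read from the data frame, and calc_streak is repeated so the block runs on its own.

```r
# calc_streak as defined above, repeated so the example runs on its own
calc_streak <- function(x) {
  y <- rep(0, length(x))
  y[x == "H"] <- 1
  y <- c(0, y, 0)
  wz <- which(y == 0)
  diff(wz) - 1
}

# First nine shots, copied by hand from the output shown earlier
first_nine <- c("H", "M", "M", "H", "H", "M", "M", "M", "M")
streaks <- calc_streak(first_nine)
print(streaks)

# The streak lengths add up to the number of hits
print(sum(streaks) == sum(first_nine == "H"))
```

The result is the streak lengths 1, 0, 2, 0, 0, 0, 0. Each miss closes a streak, so a miss that directly follows another miss contributes a streak of length 0. Because a 0 is also padded onto the end of the sequence, a sequence that ends in a miss yields one extra zero-length streak.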

A bar plot is preferable to a histogram here because the variable is discrete: it counts the number of consecutive hits.

barplot(table(kobe_streak), main = "Distribution of Streaks")

Simulating a Fair Coin Using Law of Large Numbers (LLN)

While we don’t have any data from a shooter we know to have independent shots, that is very easy to simulate in R. In a simulation, you set the ground rules of a random process and then the computer uses random numbers to generate an outcome that adheres to those rules. As a simple example, you can simulate flipping a fair coin with the following.

outcomes <- c("heads", "tails")
# Simulate a fair coin 100 times with replacement (independent)
sim_fair_coin <- sample(outcomes, size = 100, replace = TRUE)
# How many heads and tails in flipping the coin 100 times
table(sim_fair_coin)
## sim_fair_coin
## heads tails 
##    52    48

# Illustrate the law of large numbers
set.seed(121213)
n <- 50000
# Create a vector x of n randomly chosen 0s and 1s (1 = heads) for a fair coin
x <- sample(0:1, n, replace = TRUE)

# Cumulative sum
s <- cumsum(x)

# Vector of heads ratios
r <- s/(1:n)

# Plot
plot(r, ylim = c(0.4, 0.6), type = "l", xlab = "Number of Coin flips", ylab = "Probability the flip is a head", 
    main = "Flipping a Fair Coin Using the Law of Large Numbers")

# Horizontal reference line
lines(c(0, n), c(0.5, 0.5))

# Estimated probability of heads after n flips
round(r[n], 2)
## [1] 0.5

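# Running proportion of heads with a 95% margin of error and labels
set.seed(121213)
n <- 50000
p <- 0.5
x <- sample(0:1, n, prob = c(1 - p, p), replace = TRUE)
s <- cumsum(x)
r <- s/(1:n)
# Plot limits: p +/- 0.1, clipped to [0, 1]
upr <- min(1, p + 0.1)
lwr <- max(0, p - 0.1)
plot(r, ylim = c(lwr, upr), type = "l")
# Dashed reference line at the true probability p
lines(c(0, n), c(p, p), col = "darkblue", lty = 2)
# Half-width of the approximate 95% interval after n tosses
err <- 1.96 * sqrt(p * (1 - p)/n)
# Vertical segment marking p +/- err just past the last toss
lines(c(1.01 * n, 1.01 * n), c(p + err, p - err), col = "darkgreen", lwd = 2)
# Label colour: green if the final ratio is within err of p, red otherwise
farb <- "darkgreen"
if (abs(p - r[n]) > err) farb <- "red"
text(n, (lwr + p - err)/2, paste("r =", round(r[n], 3)), adj = 1, col = farb)
title(paste("Heads Ratios up to", n, "Tosses With P(H)=", p))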

round(cbind(x, s, r), 5)[1:10, ]
##       x s      r
##  [1,] 1 1 1.0000
##  [2,] 1 2 1.0000
##  [3,] 0 2 0.6667
##  [4,] 1 3 0.7500
##  [5,] 1 4 0.8000
##  [6,] 1 5 0.8333
##  [7,] 0 5 0.7143
##  [8,] 1 6 0.7500
##  [9,] 0 6 0.6667
## [10,] 1 7 0.7000
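
The interval drawn in the plot above uses the usual normal-approximation margin of error for a proportion, 1.96 * sqrt(p * (1 - p) / n). The sketch below evaluates that margin for a few sample sizes to show how it shrinks as n grows. The helper name margin and the sample sizes in n_values are assumptions chosen for illustration only.

```r
# Half-width of the approximate 95% interval for a proportion p after n flips
# (margin is a hypothetical helper name, not part of the lab)
margin <- function(p, n) 1.96 * sqrt(p * (1 - p) / n)

p <- 0.5
n_values <- c(100, 1000, 50000)  # illustrative sample sizes (assumed)

margins <- data.frame(n = n_values, margin = round(margin(p, n_values), 4))
print(margins)
```

Since the margin scales with 1/sqrt(n), multiplying the number of flips by 100 narrows the interval by a factor of 10.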

If the coin is unfair, the same simulation applies with a different probability of heads, here p = 0.2.

set.seed(121213)
n <- 50000
p <- 0.2
x <- sample(0:1, n, replace = TRUE, prob = c(1 - p, p))  # Alternatively: x <- rbinom(n, 1, p)
s <- cumsum(x)
r <- s/(1:n)
lo <- max(c(0, p - 0.1))
hi <- min(c(1, p + 0.1))
plot(r, ylim = c(lo, hi), type = "l", main = "Toss of an unfair Coin", ylab = "Probability of a head")
lines(c(0, n), c(p, p))

r[n]
## [1] 0.1992

Simulating the shots by Kobe Bryant

Simulating a basketball player who has independent shots uses the same mechanism that we used to simulate a coin flip. Here we simulate 50,000 shots from an independent shooter with a shooting percentage of 45% (the percentage of hits by Kobe) and track the running proportion of hits.

set.seed(121213)
n <- 50000
p <- 0.45
# 1 = hit, 0 = miss
sim_basket <- sample(0:1, n, prob = c(1 - p, p), replace = TRUE)
s <- cumsum(sim_basket)
r <- s/(1:n)
lo <- max(c(0, p - 0.1))
hi <- min(c(1, p + 0.1))
plot(r, ylim = c(lo, hi), type = "l", main = "Running proportion of hits by the independent shooter", 
    ylab = "Proportion of hits")
lines(c(0, n), c(p, p))

r[n]
## [1] 0.4496
# Expected number of hits in 133 shots at this rate
r[n] * 133
## [1] 59.79
outcomes <- c("H", "M")
sim_basket <- sample(outcomes, size = 133, prob = c(0.45, 0.55), replace = TRUE)
sim_streak <- calc_streak(sim_basket)
barplot(table(sim_streak), main = "Distribution of Streaks")
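
A single simulated series gives only one streak distribution, and it changes from run to run. One way to see what an independent shooter typically produces is to repeat the simulation many times and record the longest streak in each run. The sketch below does this under stated assumptions: the number of repetitions n_sims and the variable names n_shots, p_hit, and longest are illustrative choices, and calc_streak is repeated so the block runs on its own.

```r
set.seed(121213)  # same seed as used elsewhere in this lab; any seed works

# calc_streak as defined earlier, repeated so the example is self-contained
calc_streak <- function(x) {
  y <- rep(0, length(x))
  y[x == "H"] <- 1
  y <- c(0, y, 0)
  wz <- which(y == 0)
  diff(wz) - 1
}

n_shots <- 133   # shots per simulated series
p_hit <- 0.45    # shooting percentage of the independent shooter
n_sims <- 1000   # number of simulated series (assumed for illustration)

# Longest streak in each simulated series of independent shots
longest <- replicate(n_sims, {
  shots <- sample(c("H", "M"), size = n_shots,
                  prob = c(p_hit, 1 - p_hit), replace = TRUE)
  max(calc_streak(shots))
})

# Tabulate how often each longest-streak length occurred
print(table(longest))
```

Passing table(longest) to barplot gives a picture comparable to the streak plots above, and an observed longest streak can then be judged against this simulated distribution.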

Bootstrapping of the independent shooter

Let 1 = “H” and 0 = “M”. Then outcomes = c(1, 0).

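outcomes <- c(1, 0)
B <- 10000
n <- 133
# Each row of resamples is one simulated series of n shots
resamples <- matrix(sample(outcomes, n * B, prob = c(0.45, 0.55), replace = TRUE), 
    B, n)
# Mean of each row, i.e. the hit proportion in each simulated series
means <- rowMeans(resamples)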
round(mean(means), 2)
## [1] 0.45
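
The spread of the simulated proportions is as informative as their mean. The sketch below, a minimal example rather than part of the original lab, compares the standard deviation of the per-series hit proportions with the theoretical standard error sqrt(p * (1 - p) / n) and reports a 95% percentile interval. Note that the series are drawn from a model with fixed p rather than resampled from observed shots, so this is closer to a parametric bootstrap. The names se_sim, se_theory, and interval are assumptions introduced for illustration.

```r
set.seed(121213)  # any seed works; fixed here for reproducibility

B <- 10000  # number of simulated series
n <- 133    # shots per series
p <- 0.45   # probability of a hit

# Each row is one simulated series of n shots (1 = hit, 0 = miss)
resamples <- matrix(sample(c(1, 0), n * B, prob = c(p, 1 - p), replace = TRUE),
                    nrow = B, ncol = n)

# Hit proportion in each simulated series
means <- rowMeans(resamples)

# Simulated versus theoretical standard error of the hit proportion
se_sim <- sd(means)
se_theory <- sqrt(p * (1 - p) / n)
print(round(c(simulated = se_sim, theoretical = se_theory), 4))

# 95% percentile interval for the hit proportion of an independent shooter
interval <- quantile(means, c(0.025, 0.975))
print(interval)
```

The two standard errors should agree closely, and the percentile interval shows the range of hit proportions that a 45% independent shooter would plausibly produce over 133 shots.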