Data 606_Week 3_Lab 2 Response

Question 1

Q1. What does a streak length of 1 mean, i.e. how many hits and misses are in a streak of 1? What about a streak length of 0?

A1. A streak of length 1 is a single basket (H) followed by a miss (M) that ends the streak: one hit and one miss. We would expect relatively few of these if the “hot hand” phenomenon held. A streak of length 0 is a miss (M) that comes directly after another miss or opens the sequence: zero hits and one miss.
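
To make the definition concrete, here is a minimal sketch of how streak lengths can be counted from a sequence of hits and misses. The helper streak_lengths and the toy sequence shots are hypothetical, written only for illustration; they are assumed to follow the same counting rule as the lab's calc_streak rather than to reproduce it exactly.

```r
# Hypothetical helper (not the lab's calc_streak): counts the hits that
# fall between consecutive misses.
streak_lengths <- function(shots) {
  hit <- as.integer(shots == "H")   # 1 for a hit, 0 for a miss
  padded <- c(0, hit, 0)            # treat both ends as misses
  miss_pos <- which(padded == 0)    # positions of every miss
  diff(miss_pos) - 1                # hits between neighbouring misses
}

shots <- c("H", "M", "M", "H", "H", "M")  # toy sequence, assumed for illustration
toy_streaks <- streak_lengths(shots)
print(toy_streaks)
```

In the toy sequence, the opening H M is a streak of 1, the second M is a streak of 0, and H H M is a streak of 2. The final 0 comes from padding the end of the sequence with a miss.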

Question 2

Q2. Describe the distribution of Kobe’s streak lengths from the 2009 NBA finals. What was his typical streak length? How long was his longest streak of baskets?

A2. The distribution is right-skewed: roughly three dozen zero-length streaks (misses), two dozen streaks of a single basket, and about half a dozen each of two-in-a-row and three-in-a-row streaks.

kobe_streak <- calc_streak(kobe$basket)
barplot(table(kobe_streak))

(A2 cont’d) The mean streak length was about .76, i.e. a typical streak was one basket or fewer. Excluding the zero-length streaks (misses), the mean streak length was 1.57.

kobe_MeanAll <- mean(kobe_streak)
kobe_MeanOnlyHits <- mean(kobe_streak[kobe_streak != 0])
cat(kobe_MeanAll, " ", kobe_MeanOnlyHits)

## 0.7631579   1.567568

(A2 cont’d) The longest streak was four baskets. This assumes calc_streak does not carry a streak across games; carrying one across quarters is arguably acceptable under a loose definition of a “hot hand”.

max(kobe_streak)

## [1] 4

Question 3

Q3. In your simulation of flipping the unfair coin 100 times, how many flips came up heads?

A3. I ran the simulation several times with P(H) = .2 and P(T) = .8. First run: 19 heads, 81 tails. Second run: 13 heads, 87 tails. Third run: 23 heads, 77 tails. (The knitted output below comes from a separate run, so its counts differ.)

outcomes <- c("heads", "tails")
sample(outcomes, size = 1, replace = TRUE)

## [1] "tails"

# Unfair coin: P(heads) = .2, P(tails) = .8
sim_unfair_coin <- sample(outcomes, size = 100, replace = TRUE, prob = c(.2, .8))
heads_count <- sum(sim_unfair_coin == "heads")  # number of flips that came up heads
tails_count <- sum(sim_unfair_coin == "tails")  # number of flips that came up tails
cat("H ", heads_count, " T ", tails_count)

## H  26  T  74
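
The spread across these runs is roughly what a binomial model would predict. As a rough check, assuming 100 independent flips with P(H) = .2, the sketch below computes the expected number of heads and its standard deviation. The variable names are illustrative and not part of the lab.

```r
n_flips <- 100
p_heads <- 0.2

expected_heads <- n_flips * p_heads                   # binomial mean: n * p
sd_heads <- sqrt(n_flips * p_heads * (1 - p_heads))   # binomial sd: sqrt(n * p * (1 - p))

# Probability that the head count lands within two sds of the mean
lower <- expected_heads - 2 * sd_heads
upper <- expected_heads + 2 * sd_heads
p_within_2sd <- pbinom(upper, n_flips, p_heads) - pbinom(lower - 1, n_flips, p_heads)

cat(expected_heads, sd_heads, round(p_within_2sd, 3), "\n")
```

Under these assumptions, the observed counts of 13, 19, 23, and 26 heads all fall within two standard deviations of the expected count.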

Question 4

Q4. What change needs to be made to the sample function so that it reflects a shooting percentage of 45%? Make this adjustment, then run a simulation to sample 133 shots. Assign the output of this simulation to a new object called sim_basket.

A4. See below.

outcomes <- c("H", "M")
sim_basket <- sample(outcomes, size = 133, replace = TRUE, prob = c(.45, .55))
H_count <- sum(sim_basket == "H")  # simulated hits
M_count <- sum(sim_basket == "M")  # simulated misses
cat("H ", H_count, " M ", M_count)

## H  64  M  69

On your own

Comparing Kobe Bryant to the Independent Shooter

Using calc_streak, compute the streak lengths of sim_basket.

Q: Describe the distribution of streak lengths. What is the typical streak length for this simulated independent shooter with a 45% shooting percentage? How long is the player’s longest streak of baskets in 133 shots?

A: Compared with Kobe, the simulated distribution has fewer single-shot streaks and a longer tail, with streaks of 3, 4, 5, 6, and 8 in the run I recorded. (Results vary from run to run, so the knitted chart and calculations below may not match these figures.)

sim_streak <- calc_streak(sim_basket)
barplot(table(sim_streak))

(A cont’d) In the run I recorded, the simulated mean including misses was .7, lower than Kobe’s .76; excluding misses, the simulated mean was 2.2, higher than Kobe’s 1.57. (Results vary from run to run, so the knitted output below may not match these figures.)

sim_MeanAll <- mean(sim_streak)
sim_MeanOnlyHits <- mean(sim_streak[sim_streak != 0])
cat(sim_MeanAll, " ", sim_MeanOnlyHits)

## 0.9142857   1.72973

(A cont’d) In the run I recorded, the longest simulated streak was 8, double Kobe’s 4. (Results vary from run to run, so the knitted output below may not match this figure.)

max(sim_streak)

## [1] 5
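
One way to keep written answers and knitted output in agreement is to fix the random seed before sampling. The sketch below is a suggestion rather than part of the original analysis; the seed value 606 is arbitrary, and the names sim_basket_a and sim_basket_b are hypothetical.

```r
outcomes <- c("H", "M")

set.seed(606)  # arbitrary seed, chosen only for illustration
sim_basket_a <- sample(outcomes, size = 133, replace = TRUE, prob = c(.45, .55))

set.seed(606)  # resetting the same seed reproduces the same draws
sim_basket_b <- sample(outcomes, size = 133, replace = TRUE, prob = c(.45, .55))

identical(sim_basket_a, sim_basket_b)
```

With the seed fixed, every knit draws the same 133 shots, so figures quoted in the text stay in step with the charts and calculations.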

Q: If you were to run the simulation of the independent shooter a second time, how would you expect its streak distribution to compare to the distribution from the question above? Exactly the same? Somewhat similar? Totally different? Explain your reasoning.

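A: Somewhat similar. Each run is a fresh random draw, so the exact streak counts will vary, but every run uses the same 45% shooting probability and the same independence assumption, so the overall shape should be alike. If we ran many simulations, we would anticipate that the mean streak lengths of those simulations would themselves follow an approximately normal distribution. Case in point: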

outcomes <- c("H", "M")
sim_basket2 <- sample(outcomes, size = 133, replace = TRUE, prob = c(.45, .55))
H_count2 <- sum(sim_basket2 == "H")  # simulated hits, second run
M_count2 <- sum(sim_basket2 == "M")  # simulated misses, second run
cat("H ", H_count2, " M ", M_count2)

## H  66  M  67

sim_streak2 <- calc_streak(sim_basket2)
barplot(table(sim_streak2))
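
To see how much the streak distribution moves between runs, the simulation can be repeated many times. This is a sketch under stated assumptions: streak_lengths is a hypothetical base-R stand-in for calc_streak, and the 1,000 repetitions and the seed are arbitrary choices.

```r
# Hypothetical stand-in for calc_streak: hits between consecutive misses.
streak_lengths <- function(shots) {
  padded <- c(0, as.integer(shots == "H"), 0)  # pad both ends with a miss
  diff(which(padded == 0)) - 1
}

set.seed(606)  # arbitrary seed so the sketch is repeatable
outcomes <- c("H", "M")

# Mean streak length for each of 1,000 simulated independent shooters
sim_means <- replicate(1000, {
  shots <- sample(outcomes, size = 133, replace = TRUE, prob = c(.45, .55))
  mean(streak_lengths(shots))
})

# Middle 95% of the simulated means, plus the median
quantile(sim_means, c(.025, .5, .975))
```

The spread of sim_means gives a sense of how far a single run's mean streak length can drift from run to run while the underlying shooter stays the same.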

Q: How does Kobe Bryant’s distribution of streak lengths compare to the distribution of streak lengths for the simulated shooter? Using this comparison, do you have evidence that the hot hand model fits Kobe’s shooting patterns? Explain.

A: While the specific distribution varies each time the simulation is run, across the several runs I tried the simulated shooter produced more streaks of two or more baskets than Kobe did. The simulation is built on the premise that each shot is an independent event with a fixed probability of success, so if Kobe had a hot hand we would expect him to achieve more multi-shot streaks than the simulation. Accordingly, these findings do not support the “hot hand” model for Kobe’s shooting.

par(mfrow = c(1, 2))  # two panels side by side
barplot(table(kobe_streak), main = "Kobe")
barplot(table(sim_streak2), main = "Simulated")
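
A single simulated run is a noisy benchmark, so one possible refinement is to compare Kobe's longest streak of 4 against the longest streaks of many simulated independent shooters. The sketch below is illustrative only: streak_lengths is a hypothetical stand-in for calc_streak, and the seed and the 1,000 repetitions are arbitrary assumptions.

```r
# Hypothetical stand-in for calc_streak: hits between consecutive misses.
streak_lengths <- function(shots) {
  padded <- c(0, as.integer(shots == "H"), 0)  # pad both ends with a miss
  diff(which(padded == 0)) - 1
}

set.seed(606)  # arbitrary seed so the sketch is repeatable
outcomes <- c("H", "M")

# Longest streak for each of 1,000 simulated independent shooters
sim_max <- replicate(1000, {
  shots <- sample(outcomes, size = 133, replace = TRUE, prob = c(.45, .55))
  max(streak_lengths(shots))
})

# Share of simulated shooters whose longest streak matches or beats Kobe's 4
share_at_least_4 <- mean(sim_max >= 4)
share_at_least_4
table(sim_max)
```

If most independent shooters reach a streak of 4 or more, Kobe's longest streak would not stand out as unusually long, which would be consistent with the conclusion above.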