Probability Lab

Author

Jack Hegarty

Probability

The Hot Hand

Getting Started

Load Packages

library(tidyverse)
library(openintro)

Data

glimpse(kobe_basket)

Rows: 133
Columns: 6
$ vs          <fct> ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL…
$ game        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ quarter     <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3…
$ time        <fct> 9:47, 9:07, 8:11, 7:41, 7:03, 6:01, 4:07, 0:52, 0:00, 6:35…
$ description <fct> Kobe Bryant makes 4-foot two point shot, Kobe Bryant misse…
$ shot        <chr> "H", "M", "M", "H", "H", "M", "M", "M", "M", "H", "H", "H"…

data("kobe_basket")

Exercise 1

What does a streak length of 1 mean, i.e. how many hits and misses are in a streak of 1? What about a streak length of 0?

Answer: A streak length of 1 means Kobe made one basket and then missed his next shot attempt, so the streak contains one hit and one miss. A streak length of 0 means that, after a miss ended his previous streak, Kobe’s next shot attempt was another miss, so the streak contains zero hits and one miss.
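The streak definition can be checked by hand on a short sequence. The sketch below uses only base R and the first nine shots shown in the glimpse() output; the variable names are illustrative, and it assumes every streak is closed by a miss, so calc_streak may treat the start or end of a sequence differently.

```r
# First nine shots from the glimpse() output above.
shots <- c("H", "M", "M", "H", "H", "M", "M", "M", "M")

# Each streak ends at a miss, so locate the misses first.
miss_positions <- which(shots == "M")

# The hits in a streak are the shots between consecutive misses.
streak_lengths <- diff(c(0, miss_positions)) - 1

streak_lengths
```

For these nine shots the streak lengths are 1, 0, 2, 0, 0, 0: one streak of a single hit, one streak of two hits, and four misses that each follow another miss.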

kobe_streak <- calc_streak(kobe_basket$shot)

ggplot(data = kobe_streak, aes(x = length)) +
  geom_bar()

Exercise 2

Describe the distribution of Kobe’s streak lengths from the 2009 NBA finals. What was his typical streak length? How long was his longest streak of baskets? Make sure to include the accompanying plot in your answer.

Answer: The distribution of Kobe’s streak lengths from the 2009 NBA Finals is strongly right-skewed: the majority of his streaks had length 0 or 1, so a typical streak was very short. His longest streak was 4 made baskets in a row.

ggplot(data = kobe_streak, aes(x = length)) +
  geom_bar()
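A bar plot shows the shape of the distribution, but the typical and longest streak can also be read off numerically. The sketch below is one possible way to do that; summarise_streaks is a hypothetical helper, not part of openintro, and the example vector is made up for illustration and is not Kobe's data.

```r
# Hypothetical helper: summarise a numeric vector of streak lengths.
summarise_streaks <- function(streak_len) {
  list(
    counts  = table(streak_len),   # how often each length occurs
    median  = median(streak_len),  # one measure of a "typical" streak
    longest = max(streak_len)      # longest run of made baskets
  )
}

# Illustrative values only; these are NOT Kobe's actual streaks.
example_lengths <- c(0, 0, 0, 1, 0, 2, 1, 0, 4, 0, 1)
streak_summary <- summarise_streaks(example_lengths)
streak_summary

# With the lab's objects loaded, the call would be:
# summarise_streaks(kobe_streak$length)
```

The median is used here because it is less sensitive to the long right tail than the mean.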

Compared to What?

Simulations in R

coin_outcomes <- c("heads", "tails")
sample(coin_outcomes, size = 1, replace = TRUE)

[1] "tails"

sim_fair_coin <- sample(coin_outcomes, size = 100, replace = TRUE)

sim_fair_coin

  [1] "heads" "heads" "tails" "tails" "heads" "heads" "tails" "heads" "heads"
 [10] "tails" "heads" "heads" "heads" "tails" "heads" "tails" "tails" "heads"
 [19] "heads" "heads" "tails" "heads" "tails" "heads" "heads" "heads" "heads"
 [28] "heads" "heads" "heads" "heads" "tails" "heads" "heads" "heads" "tails"
 [37] "tails" "heads" "tails" "tails" "heads" "heads" "tails" "tails" "tails"
 [46] "heads" "heads" "tails" "heads" "tails" "tails" "heads" "tails" "tails"
 [55] "heads" "tails" "heads" "tails" "tails" "tails" "heads" "tails" "heads"
 [64] "tails" "tails" "tails" "heads" "heads" "tails" "heads" "tails" "tails"
 [73] "tails" "heads" "tails" "heads" "heads" "tails" "tails" "tails" "tails"
 [82] "heads" "heads" "tails" "heads" "tails" "heads" "heads" "tails" "tails"
 [91] "tails" "heads" "tails" "heads" "tails" "tails" "heads" "heads" "heads"
[100] "heads"

table(sim_fair_coin)

sim_fair_coin
heads tails 
   53    47

sim_unfair_coin <- sample(coin_outcomes, size = 100, replace = TRUE, 
                          prob = c(0.2, 0.8))

Exercise 3

In your simulation of flipping the unfair coin 100 times, how many flips came up heads? Include the code for sampling the unfair coin in your response. Since the markdown file will run the code, and generate a new sample each time you Knit it, you should also “set a seed” before you sample. Read more about setting a seed below.

Answer: In my unfair coin simulation, 22 of the 100 flips came up heads, close to the 20 heads expected when the probability of heads is 0.2.

set.seed(042299)
sim_unfair_coin <- sample(coin_outcomes, size = 100, replace = TRUE, 
                          prob = c(0.2, 0.8))

sim_unfair_coin

  [1] "tails" "tails" "heads" "heads" "tails" "tails" "tails" "tails" "heads"
 [10] "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails"
 [19] "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails" "heads"
 [28] "heads" "tails" "heads" "heads" "heads" "tails" "tails" "tails" "tails"
 [37] "tails" "tails" "tails" "tails" "tails" "heads" "tails" "tails" "heads"
 [46] "tails" "heads" "tails" "heads" "tails" "tails" "tails" "tails" "tails"
 [55] "tails" "heads" "tails" "tails" "tails" "tails" "tails" "tails" "heads"
 [64] "tails" "tails" "heads" "tails" "heads" "heads" "tails" "tails" "heads"
 [73] "tails" "tails" "tails" "tails" "tails" "tails" "tails" "heads" "tails"
 [82] "heads" "tails" "tails" "tails" "tails" "tails" "tails" "tails" "heads"
 [91] "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails" "heads"
[100] "tails"

table(sim_unfair_coin)

sim_unfair_coin
heads tails 
   22    78
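To illustrate what setting a seed does, the sketch below draws the unfair-coin sample twice with the same seed and compares the results. It uses the seed value 42299, which is how R reads the literal 042299; the object names first_run, second_run, and third_run are illustrative.

```r
coin_outcomes <- c("heads", "tails")

# Same seed, same call: the two samples are identical.
set.seed(42299)
first_run <- sample(coin_outcomes, size = 100, replace = TRUE,
                    prob = c(0.2, 0.8))

set.seed(42299)
second_run <- sample(coin_outcomes, size = 100, replace = TRUE,
                     prob = c(0.2, 0.8))

identical(first_run, second_run)

# Without resetting the seed, the next draw continues the random stream
# and will generally differ from the first.
third_run <- sample(coin_outcomes, size = 100, replace = TRUE,
                    prob = c(0.2, 0.8))
identical(first_run, third_run)
```

Because the seed fixes the starting point of the random number stream, a knitted document that sets it once produces the same samples on every render, as long as the code that follows is unchanged.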

?sample


Simulating the Independent Shooter

shot_outcomes <- c("H", "M")
sim_basket <- sample(shot_outcomes, size = 1, replace = TRUE)

Exercise 4

What change needs to be made to the sample function so that it reflects a shooting percentage of 45%? Make this adjustment, then run a simulation to sample 133 shots. Assign the output of this simulation to a new object called sim_basket.

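Answer: To reflect a shooting percentage of 45%, we can add a prob argument to sample() that assigns the made-basket element “H” a probability of 0.45 and the missed-shot element “M” a probability of 0.55. The probabilities are matched to the elements of shot_outcomes by position, so their order matters.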

sim_basket <- sample(shot_outcomes, size = 133, replace = TRUE,
                     prob = c(0.45, 0.55))
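A quick sanity check on the prob argument is to look at the share of hits in a much larger simulated sample, which should settle near 0.45. The sketch below assumes an arbitrary seed and a sample size of 100,000, both chosen only for illustration; big_sim and hit_rate are illustrative names.

```r
shot_outcomes <- c("H", "M")

set.seed(1)  # arbitrary seed, used only for this illustration
big_sim <- sample(shot_outcomes, size = 100000, replace = TRUE,
                  prob = c(0.45, 0.55))

# Proportion of simulated shots that are hits.
hit_rate <- mean(big_sim == "H")
hit_rate
```

With only 133 shots the observed proportion can drift further from 0.45, which is worth keeping in mind when comparing a single simulation to Kobe's data.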

More Practice

Comparing Kobe Bryant to the Independent Shooter

Exercise 5

Using calc_streak, compute the streak lengths of sim_basket, and save the results in a data frame called sim_streak.

sim_streak <- calc_streak(sim_basket)

Exercise 6

Describe the distribution of streak lengths. What is the typical streak length for this simulated independent shooter with a 45% shooting percentage? How long is the player’s longest streak of baskets in 133 shots? Make sure to include a plot in your answer.

Answer: The distribution of streak lengths for the independent shooter is strongly right-skewed: the majority of the streaks have length 0 or 1, so a typical streak is very short. The independent shooter’s longest streak was 6 made shots in a row.

ggplot(data = sim_streak, aes(x = length)) +
  geom_bar()

Exercise 7

If you were to run the simulation of the independent shooter a second time, how would you expect its streak distribution to compare to the distribution from the question above? Exactly the same? Somewhat similar? Totally different? Explain your reasoning.

Answer: If I ran the independent shooter simulation a second time, I would expect the streak distribution to be somewhat similar to the previous simulation, but not exactly the same. The parameters of the simulation (133 shots, each with a 45% chance of a hit) would not have changed, so the overall right-skewed shape should persist, but random sampling would change the exact counts and possibly the longest streak.
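The run-to-run variation can be explored by repeating the simulation many times and recording the longest streak each time. This is a sketch under stated assumptions: longest_hit_streak is a hypothetical helper built on base R's rle(), and the seed and the choice of 1,000 repetitions are arbitrary.

```r
shot_outcomes <- c("H", "M")

# Hypothetical helper: longest run of consecutive hits in a shot vector.
longest_hit_streak <- function(shots) {
  runs <- rle(shots)
  hit_runs <- runs$lengths[runs$values == "H"]
  if (length(hit_runs) == 0) 0L else max(hit_runs)
}

set.seed(2)  # arbitrary seed, used only for this illustration
longest <- replicate(1000, {
  sim <- sample(shot_outcomes, size = 133, replace = TRUE,
                prob = c(0.45, 0.55))
  longest_hit_streak(sim)
})

# How often each longest-streak value occurred across the repetitions.
table(longest)
```

The table gives a sense of which longest-streak values are common for an independent 45% shooter over 133 shots, which helps judge whether a single observed value is unusual.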

Exercise 8

How does Kobe Bryant’s distribution of streak lengths compare to the distribution of streak lengths for the simulated shooter? Using this comparison, do you have evidence that the hot hand model fits Kobe’s shooting patterns? Explain.

Answer: Kobe Bryant’s distribution of streak lengths is very similar to the simulated shooter’s distribution of streak lengths: both are right-skewed, with most streaks of length 0 or 1. Using this comparison, there is no evidence that the hot hand model fits Kobe’s shooting patterns in the 2009 NBA Finals. If Kobe’s shooting patterns fit the hot hand model, we would expect a greater frequency of longer streaks than the independent shooter produces, because each made basket would increase the likelihood of making the next one.
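A complementary check is to compare the overall hit rate with the hit rate on shots that immediately follow a hit. For an independent shooter the two should be about equal, while a hot hand would push the second one higher. The sketch below is only an illustration: hit_rate_after_hit is a hypothetical helper, the seed and sample size are arbitrary, and applying it to kobe_basket$shot would treat all shots as one continuous sequence across quarters and games, which is a simplifying assumption.

```r
# Hypothetical helper: share of hits among shots taken right after a hit.
hit_rate_after_hit <- function(shots) {
  previous <- head(shots, -1)  # every shot except the last
  current  <- tail(shots, -1)  # every shot except the first
  mean(current[previous == "H"] == "H")
}

shot_outcomes <- c("H", "M")
set.seed(3)  # arbitrary seed, used only for this illustration
independent <- sample(shot_outcomes, size = 100000, replace = TRUE,
                      prob = c(0.45, 0.55))

overall_rate   <- mean(independent == "H")
after_hit_rate <- hit_rate_after_hit(independent)
c(overall = overall_rate, after_hit = after_hit_rate)

# With the lab data loaded, the same check would be:
# hit_rate_after_hit(kobe_basket$shot)
```

With only 133 shots, a small gap between the two rates could easily arise by chance, so this check supplements the streak comparison rather than replacing it.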