The Hot Hand

Hot hand phenomenon - is each shot independent of the next shot?

Getting Started

Always start by loading packages

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.5     v dplyr   1.0.3
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata

Look at the data set we will be using for this lab

glimpse(kobe_basket)
## Rows: 133
## Columns: 6
## $ vs          <fct> ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ...
## $ game        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ quarter     <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3...
## $ time        <fct> 9:47, 9:07, 8:11, 7:41, 7:03, 6:01, 4:07, 0:52, 0:00, 6...
## $ description <fct> Kobe Bryant makes 4-foot two point shot, Kobe Bryant mi...
## $ shot        <chr> "H", "M", "M", "H", "H", "M", "M", "M", "M", "H", "H", ...
kobe_basket <- kobe_basket

Added this data set to my global environment 133 obs and 6 variables

Exercise 1: What does a streak length of 1 mean, i.e. how many hits and misses are in a streak of 1? What about a streak of zero?

A streak of 1 means there was one hit before the streak ended (streaks end with a single miss). Therefore, a streak of one means one hit and one miss or H M. A streak of zero means there was no hit but a single miss: M.

kobe_streak <- calc_streak(kobe_basket$shot)
ggplot(kobe_streak, aes(x=length)) +
  geom_bar()

Exercise 2: Describe the distribution of Kobe’s streak lengths from the 2009 NBA finals. What was his typical streak length? How long was his longest streak of baskets? Make sure to include the accompanying plot in your answer.

The distribution is skewed to the right such that most data points fall at 0 or 1

The typical streak length was 0 (referring to the bar graph and the table)

His longest streak was 4 (referring to bar graph and max)

kobe_streak %>%
  summarise(Median = median(length),
            Min = min(length),
            Max = max(length))
##   Median Min Max
## 1      0   0   4
table(kobe_streak)
## kobe_streak
##  0  1  2  3  4 
## 39 24  6  6  1

Compared to What?

Two processes are independent if the outcome of one doesn’t effect the outcome of the other

P (shot 1 = H) = 0.45 If not independent: P(shot 2 = H | shot 1 = H) = 0.60 If independent: P(shot 2 = H | shot 1 = H) = 0.45

Simulations in R

Simulate independent shots

coin_outcomes <- c("heads", "tails")
sample(coin_outcomes, size = 1, replace = TRUE)
## [1] "tails"
sim_fair_coin <- sample(coin_outcomes, size = 100, replace = TRUE)
sim_fair_coin
##   [1] "heads" "heads" "tails" "tails" "heads" "tails" "heads" "heads" "tails"
##  [10] "heads" "tails" "tails" "tails" "tails" "tails" "tails" "heads" "tails"
##  [19] "heads" "tails" "heads" "tails" "heads" "heads" "tails" "tails" "heads"
##  [28] "heads" "heads" "heads" "tails" "tails" "tails" "heads" "heads" "tails"
##  [37] "tails" "heads" "tails" "heads" "heads" "tails" "heads" "heads" "tails"
##  [46] "tails" "tails" "tails" "heads" "heads" "heads" "heads" "tails" "heads"
##  [55] "tails" "tails" "heads" "heads" "tails" "heads" "tails" "heads" "tails"
##  [64] "heads" "tails" "heads" "heads" "tails" "heads" "heads" "tails" "heads"
##  [73] "heads" "tails" "heads" "tails" "tails" "heads" "heads" "heads" "tails"
##  [82] "heads" "heads" "heads" "tails" "tails" "heads" "tails" "tails" "heads"
##  [91] "heads" "tails" "tails" "heads" "tails" "heads" "heads" "tails" "heads"
## [100] "heads"
table(sim_fair_coin)
## sim_fair_coin
## heads tails 
##    53    47
sim_unfair_coin <- sample(coin_outcomes, size = 100, replace = TRUE,
                        prob = c(0.2, 0.8))

Exercise 3: In your simulation, how many flips came up heads? Include the code for sampling the unfair coin.

In my simulation, heads came up 17 times.

set.seed(1)
sim_unfair_coin1 <- sample(coin_outcomes, size = 100, replace = TRUE,
                        prob = c(0.2, 0.8))
sim_unfair_coin1
##   [1] "tails" "tails" "tails" "heads" "tails" "heads" "heads" "tails" "tails"
##  [10] "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails" "heads"
##  [19] "tails" "tails" "heads" "tails" "tails" "tails" "tails" "tails" "tails"
##  [28] "tails" "heads" "tails" "tails" "tails" "tails" "tails" "heads" "tails"
##  [37] "tails" "tails" "tails" "tails" "heads" "tails" "tails" "tails" "tails"
##  [46] "tails" "tails" "tails" "tails" "tails" "tails" "heads" "tails" "tails"
##  [55] "tails" "tails" "tails" "tails" "tails" "tails" "heads" "tails" "tails"
##  [64] "tails" "tails" "tails" "tails" "tails" "tails" "heads" "tails" "heads"
##  [73] "tails" "tails" "tails" "heads" "heads" "tails" "tails" "heads" "tails"
##  [82] "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails"
##  [91] "tails" "tails" "tails" "heads" "tails" "tails" "tails" "tails" "heads"
## [100] "tails"
table(sim_unfair_coin1)
## sim_unfair_coin1
## heads tails 
##    17    83

Simulating the Independent Shooter

shot_outcomes <- c("H", "M")
sim_basket <- sample(shot_outcomes, size = 1, replace = TRUE)

Exercise 4: What change needs to be made to the sample function so that it reflects a shooting percentage of 45%? Make this adjustment, then run a simulation to sample 133 shots. Assign output as sim_basket.

I have to add a prob argument with H being 0.45. Additionally, I have to change the size to 133.

shot_outcomes <- c("H", "M")
sim_basket <- sample(shot_outcomes, size = 133, replace = TRUE,
                     prob = c(0.45, 0.55))
table(sim_basket)
## sim_basket
##  H  M 
## 58 75

More Practice

Exercise 5: Using calc_streak, compute the streak lengths of sim_basket, and save results in a data frame called sim_streak

sim_streak <- calc_streak(sim_basket)

Exercise 6: Describe the distribution of streak lengths. What is the typical streak length for this simulated independent shooter with a 45% shooting percentage? How long is the player’s longest streak of baskets in 133 shots? Include plot.

Start off by making a bargraph (since the variables are discrete):

ggplot(sim_streak, aes(x=length)) +
  geom_bar()

The distribution looks very similar to the kobe_streak data frame. The distribution is skewed right. The typical streak length is 0 and the longest streak of baskets is 5.

Exercise 7: If you were to run the simulation again, how would you expect its streak distribution to compare to the distribution above? Exactly the same? Somewhat similar? Totally different? Explain

I would expect the distribution to be somewhat similar. I would expect the same unimodal right skew and I expect the mode would be 0. The exact count in each length would vary and perhaps the extent of the tail would change (could get one length longer or shorter).

Exercise 8: How does Kobe Bryants’s distribution of streak lengths compare to the distribution of streak lengths for the simulated shooter? Using this comparison, do you have evidence that the hot hand model fits Kobe’s shooting patterns? EXplain.

Kobe’s streak length distribution is quite similar to the simulated streak length distribution. In fact, both the median and the mean are about the same. I would think any variation between the two distrubtions is due to chance. Therefore we do not have evidence of the hot hand model when looking at the data provides on Kobe’s shooting patterns. If Kobe did fit the hot hand model, we would have expected a slight peak shift to the right and possible a greater max.

kobe_streak %>%
  summarise(Median_kobe = median(length),
            Min_kobe = min(length),
            Max_kobe = max(length),
            Mean_kobe = mean(length))
##   Median_kobe Min_kobe Max_kobe Mean_kobe
## 1           0        0        4 0.7631579
sim_streak %>%
  summarise(Median_sim = median(length),
            Min_sim = min(length),
            Max_sim = max(length),
            Mean_sim = mean(length))
##   Median_sim Min_sim Max_sim  Mean_sim
## 1          0       0       5 0.7631579