Simulating the Hot Hand

Duncan Gates

03 December, 2020

The Questions at Hand

  • Is each made shot an independent event?

  • What is the frequency of occurrence of the “hot hand” in NBA basketball games?

  • Does the chance of a “hot hand” vary significantly by player position, and how does it compare with simulated players?

The Preceding Literature

  • The canonical study is Gilovich, Vallone, and Tversky (1985), who initiated the debate and attributed belief in the hot hand to general misconceptions of chance.

  • They estimated the probability of making a shot conditioned on the outcome of the previous n shots (all makes or all misses) and found largely negative serial correlations.

  • The selection procedure used there has several biases, the primary one being demonstrated by the following table:

Table
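
One way to see this kind of selection bias is a small enumeration. The sketch below assumes a three-flip fair-coin example, which may differ from the table on the slide; `prop_after_head` is a hypothetical helper name. For every equally likely sequence it computes the share of heads among flips that immediately follow a head, then averages those shares across sequences.

```r
# Assumed illustration: all 2^3 equally likely sequences of three fair flips
# (TRUE = heads, or a made shot).
flips <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), 3)))

# Share of heads among flips that immediately follow a head.
# Returns NA when neither of the first two flips is a head.
prop_after_head <- function(s) {
  prev <- s[-length(s)]  # flips 1..(n - 1)
  nxt  <- s[-1]          # flips 2..n
  if (!any(prev)) return(NA_real_)
  mean(nxt[prev])
}

props <- apply(flips, 1, prop_after_head)
expected_prop <- mean(props, na.rm = TRUE)
expected_prop  # 5/12 (about 0.417), although every flip is 50/50
```

Averaging per-sequence proportions gives each sequence equal weight regardless of how many flips it contributes, which is where the downward bias comes from in this example.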

The Statistical Question

  • A hot hand exists if, within a given game, the following conditional probability exceeds the player’s overall probability of making a shot:

\[P(\text{Shot } n+1 \text{ is made} \mid \text{Player has made the previous } n \text{ shots})\]

  • When events A and B are not independent, this conditional probability has to be computed from the joint probability (for independent events it would simply equal P(B)):

\[P(B \mid A)=\frac{P(A \text{ and } B)}{P(A)}\]
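
As a rough illustration of the formula, the sketch below estimates P(B | A) from a single sequence, taking A as "the previous shot was made" and B as "the current shot is made" (a streak of length one). The `shots` vector is made-up example data, not NBA data.

```r
# Hypothetical shot sequence for one player in one game (TRUE = make)
shots <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)

prev_shot <- shots[-length(shots)]  # event A: shot i was made
next_shot <- shots[-1]              # event B: shot i + 1 was made

p_a         <- mean(prev_shot)              # estimate of P(A)
p_a_and_b   <- mean(prev_shot & next_shot)  # estimate of P(A and B)
p_b_given_a <- p_a_and_b / p_a              # P(B | A) via the formula

# The same quantity computed directly: share of makes among shots after a make
p_direct <- mean(next_shot[prev_shot])
```

Both routes give the same estimate, which can then be compared with the unconditional make rate `mean(shots)`.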

Bias-Corrected Statistics

  • Paired t-test after shifting each shooter’s difference by the corresponding bias

  • Null model: i.i.d. Bernoulli trials with probability of success equal to the player’s observed shooting percentage
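
A minimal sketch of how such a bias could be estimated by simulation, assuming the statistic of interest is the difference between the make rate after k makes and the make rate after k misses. The function names `streak_diff` and `estimate_bias`, and the values of `n_shots`, `p`, `k`, and `n_sims`, are illustrative assumptions rather than the settings used in this analysis.

```r
# Difference between P(make | previous k makes) and P(make | previous k misses)
# for one sequence of shots. Assumes length(shots) > k.
streak_diff <- function(shots, k = 1) {
  n <- length(shots)
  idx <- (k + 1):n  # shots that have k predecessors
  after_makes  <- sapply(idx, function(i) all(shots[(i - k):(i - 1)]))
  after_misses <- sapply(idx, function(i) !any(shots[(i - k):(i - 1)]))
  # NaN when one of the two conditioning sets is empty
  mean(shots[idx][after_makes]) - mean(shots[idx][after_misses])
}

# Expected difference under i.i.d. Bernoulli shooting with success rate p,
# averaged over simulated sequences where the difference is defined.
estimate_bias <- function(n_shots, p, k = 1, n_sims = 1000) {
  diffs <- replicate(n_sims, streak_diff(runif(n_shots) < p, k))
  mean(diffs, na.rm = TRUE)
}

set.seed(2020)
bias <- estimate_bias(n_shots = 100, p = 0.5, k = 3)
bias  # expected to be negative even though the simulated shots are independent
```

Under this setup, a shooter's observed difference would be shifted by subtracting their estimated bias, and the shifted differences could then be passed to `t.test()`.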

Simple Streak Length Code

After grouping by game and player, streak length is computed from the runs of consecutive makes or misses: each make is assigned its position within the current run of makes, and each miss is assigned 0.

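streak_length <- function(x) {
  # rle() gives the lengths of runs of identical values; sequence() counts 1, 2, ... within each run.
  # Multiplying by the logical x zeroes out the counts that belong to misses.
  sequence(rle(x)$lengths) * x
}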
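
To make the behaviour concrete, here is a small worked example on a made-up sequence of shots. The function is repeated so the snippet runs on its own, and the intermediate steps are shown separately.

```r
streak_length <- function(x) {
  sequence(rle(x)$lengths) * x
}

# Hypothetical sequence of shots (TRUE = make)
shots <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)

rle(shots)$lengths        # 2 1 3 2  (run lengths: makes, miss, makes, misses)
sequence(c(2, 1, 3, 2))   # 1 2 1 1 2 3 1 2  (position within each run)
streak_length(shots)      # 1 2 0 1 2 3 0 0  (misses zeroed out)
```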

Player Simulation Code

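Takes size (the number of shots simulated per player), the number of simulated players, and the probabilities of a make and a miss as arguments. It uses a for loop because it is slightly more efficient at much larger sizes, and it assigns the combined result to simulated_players in the global environment.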

library(dplyr)   # mutate(), bind_rows(), %>%
library(tibble)  # tibble()

simulate_players <- function(size, n_sim_players, prob = c(0.5, 0.5)) {
  shot_outcomes <- c(TRUE, FALSE)
  # Build one tibble per simulated player, then combine them after the loop
  players <- vector("list", n_sim_players)
  for (i in seq_len(n_sim_players)) {
    players[[i]] <- tibble(
      isShotMade = sample(shot_outcomes, size = size, replace = TRUE, prob = prob)
    ) %>%
      mutate(streakLength = HotHand::streak_length(isShotMade),
             Label = paste0("Simulated Player #", i))
  }
  simulated_players <- bind_rows(players)
  # Side effect: the result is stored as `simulated_players` in the global environment
  assign("simulated_players", simulated_players, envir = .GlobalEnv)
}

What simulated players look like

simulate_players(500, 10)
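
As a quick numerical check to go with the plots, the sketch below summarises the longest run of makes for the same setup (10 players, 500 shots each, 50% make probability). It uses base R only and a hypothetical helper, `longest_make_streak`, so it does not depend on the HotHand package.

```r
# Longest run of consecutive makes in a logical shot sequence
longest_make_streak <- function(shots) {
  runs <- rle(shots)
  make_runs <- runs$lengths[runs$values]
  if (length(make_runs) == 0) 0L else max(make_runs)
}

set.seed(1)
# One value per simulated player: 500 independent 50/50 shots each
longest <- replicate(
  10,
  longest_make_streak(sample(c(TRUE, FALSE), size = 500, replace = TRUE))
)
summary(longest)
```

Even with fully independent shots, fairly long streaks tend to show up in sequences of this length, which is useful context when reading the streak plots.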

Another Graph vs NBA Players

What do streaks look like

Simulated Shooting Streaks

The Data gets Trickier…

Conclusions

  • A subtle but substantial bias exists in the common measure of the conditional dependence of present outcomes on streaks of past outcomes in sequential data. The magnitude of this “streak selection bias” mostly decreases as the sequence gets longer but increases with streak length.

  • So far, the Bernoulli trials show a difference of 3-13% in magnitude, depending on streak length. For comparison, the gap between the median and the top three-point shooter in the NBA over the last 5 years is about 12%.

  • As a player begins to heat up, their behavior often changes, as does their defender’s. These changes could also be responsible for drops in field goal percentage. Some players may be streaky and others not, and some may simply not be good shooters!

Thanks!