Kobe’s Hot Hand

Data from the 2009 NBA Playoffs

##    vs game quarter time                             description basket
## 1 ORL    1       1 9:47 Kobe Bryant makes 4-foot two point shot      H
## 2 ORL    1       1 9:07               Kobe Bryant misses jumper      M

A streak of length 1 means that Kobe made exactly one shot between two misses. Every streak is ended by a miss. Consecutive misses each count as a separate streak of length 0. If n shots in a row are made after the previous streak ended, the streak has length n.
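
The definition above can be sketched as a short R function. This is only an illustration: `streak_lengths` is a hypothetical name, and it is an assumption that the `calc_streak` function used later in this report follows the same convention of treating the start and end of the data as misses.

```r
# Hypothetical helper: streak lengths for a vector of "H" (hit) / "M" (miss).
# Assumption: the sequence is padded with a miss at both ends, so a trailing
# run of hits still counts as a streak.
streak_lengths <- function(shots) {
  hit <- as.integer(shots == "H")
  # Positions of misses in the padded sequence mark the streak boundaries.
  miss_pos <- which(c(0L, hit, 0L) == 0L)
  # Hits between two consecutive misses = gap between their positions minus one.
  diff(miss_pos) - 1L
}

example_shots <- c("H", "M", "M", "H", "H", "M")
streak_lengths(example_shots)
## [1] 1 0 2 0
```

The final 0 in the example comes from the padding: the sequence ends on a miss, so an empty streak sits between that miss and the padded one.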

Kobe’s shooting percentage for the playoffs was 0.4360902, or about 43.6%.

Kobe’s streak lengths fall into 5 categories, 0 through 4. Their mean is 0.7631579 and their standard deviation is 0.9915432. His longest streak was 4. His most common streak length is 0, and each longer streak is less frequent than the one before. If there is no such thing as a hot hand, his shots are independent, so a streak of length N should be about 0.436 times as likely as a streak of length N-1.
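
The 0.436 ratio follows from the fact that, for independent shots, a streak of length n is n hits followed by a miss. A minimal sketch of that calculation is below. It assumes independence and the streak convention described above, and the variable names are illustrative only.

```r
p <- 0.4360902   # Kobe's shooting proportion, taken from the text above
n <- 0:4         # streak lengths seen in his data

# A streak of length n is n hits followed by a miss: p^n * (1 - p).
# dgeom() counts "failures before the first success", so here a miss plays
# the role of the success, with probability 1 - p.
expected_prob <- dgeom(n, prob = 1 - p)

# Ratio of each probability to the previous one; equals p under independence.
ratios <- expected_prob[-1] / expected_prob[-length(expected_prob)]

# Mean streak length implied by independence.
expected_mean <- p / (1 - p)

round(data.frame(streak = n, expected_prob), 4)
```

Comparing `expected_prob` with the observed proportions of each streak length, and `expected_mean` with the observed mean, gives a first rough check on whether the data depart from independence.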

In our simulation of 1000 coin tosses, 508 came up heads and 492 came up tails.

Our graphs below show a fair coin and an unfair coin each being flipped 1000 times. We can see that the distribution doesn’t settle into its expected proportions until after 100 flips. Even after 100 flips, there is still some fluctuation. This shows the danger of making inferences from insufficient data, especially without methods designed for small data sets.

##  [1] "tails" "heads" "heads" "heads" "heads" "tails" "heads" "tails"
##  [9] "tails" "heads" "heads" "heads" "heads" "heads" "heads"
## 
## heads tails 
##   508   492
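
A coin-flip simulation of this kind could be set up roughly as follows. This is a sketch, not the code that produced the output above: the seed, the 20% heads probability for the unfair coin, and the helper name `running_prop` are all assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed (assumption), used only for reproducibility

coin <- c("heads", "tails")
fair_flips   <- sample(coin, size = 1000, replace = TRUE)
# Assumed unfair coin: heads 20% of the time.
unfair_flips <- sample(coin, size = 1000, replace = TRUE, prob = c(0.2, 0.8))

# Proportion of heads after each flip (hypothetical helper).
running_prop <- function(flips) cumsum(flips == "heads") / seq_along(flips)

plot(running_prop(fair_flips), type = "l", ylim = c(0, 1),
     xlab = "Number of flips", ylab = "Running proportion of heads")
lines(running_prop(unfair_flips), lty = 2)
abline(h = c(0.5, 0.2), col = "grey")  # long-run values for each coin

table(fair_flips)
```

Plotting the running proportion rather than the final counts makes it easier to see how much the early flips wander before the proportion stabilizes.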

When we add prob = c(0.45, 0.55) to our sample function, it simulates a shooter who makes 45% of his baskets. We now have the distribution of Kobe’s actual shots in the 2009 NBA playoffs and the distribution of a simulated player who makes roughly the same percentage (45% versus Kobe’s 43.6%) but does not have a “hot hand”.

outcomes <- c("H", "M")
sim_basket <- sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))
kobe$basket
##   [1] "H" "M" "M" "H" "H" "M" "M" "M" "M" "H" "H" "H" "M" "H" "H" "M" "M"
##  [18] "H" "H" "H" "M" "M" "H" "M" "H" "H" "H" "M" "M" "M" "M" "M" "M" "H"
##  [35] "M" "H" "M" "M" "H" "H" "H" "H" "M" "H" "M" "M" "H" "M" "M" "H" "M"
##  [52] "M" "H" "M" "H" "H" "M" "M" "H" "M" "H" "H" "M" "H" "M" "M" "M" "H"
##  [69] "M" "M" "M" "M" "H" "M" "H" "M" "M" "H" "M" "M" "H" "H" "M" "M" "M"
##  [86] "M" "H" "H" "H" "M" "M" "H" "M" "M" "H" "M" "H" "H" "M" "H" "M" "M"
## [103] "H" "M" "M" "M" "H" "M" "H" "H" "H" "M" "H" "H" "H" "M" "H" "M" "H"
## [120] "M" "M" "M" "M" "M" "M" "H" "M" "H" "M" "M" "M" "M" "H"
sim_basket
##   [1] "H" "M" "H" "H" "M" "H" "H" "H" "M" "H" "M" "M" "M" "M" "H" "M" "M"
##  [18] "H" "H" "M" "H" "M" "M" "H" "H" "H" "M" "M" "H" "M" "H" "M" "H" "M"
##  [35] "M" "H" "H" "H" "H" "M" "H" "M" "H" "H" "M" "H" "H" "H" "M" "M" "M"
##  [52] "M" "H" "M" "M" "M" "M" "H" "H" "M" "M" "M" "H" "M" "M" "M" "H" "M"
##  [69] "M" "M" "M" "M" "H" "H" "M" "M" "M" "H" "M" "H" "H" "M" "M" "H" "H"
##  [86] "H" "M" "M" "H" "H" "H" "M" "M" "M" "M" "M" "H" "H" "H" "M" "M" "M"
## [103] "M" "H" "M" "M" "H" "H" "H" "H" "M" "H" "M" "M" "M" "M" "H" "M" "M"
## [120] "M" "M" "M" "M" "M" "H" "H" "H" "M" "H" "H" "M" "M" "M"

## sim_streak
##  0  1  2  3  4 
## 45 15  7  7  2

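Our simulation and Kobe’s shots look broadly similar. Ours has a steeper drop in frequency from streaks of length 0 to streaks of length 1. Both show nearly equal counts for streaks of length 2 and 3. Taken at face value, this would suggest an unnaturally high tendency to make a third basket after making a second, but the same pattern appears in the independent simulation, so it can arise by chance in a sample this small. To get a better sense of the distribution, we run a simulation with a much higher number of shots: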

sim_basket2 <- sample(outcomes, size = 10000, replace = TRUE, prob = c(0.45, 0.55))
head(sim_basket2)
## [1] "M" "M" "H" "H" "H" "M"
sim_streak2 <- calc_streak(sim_basket2)
barplot(table(sim_streak2), yaxt = 'n', col = '#998f5a')

table(sim_streak2)
## sim_streak2
##    0    1    2    3    4    5    6    7    8    9   12 
## 2895 1416  597  280  141   54   21   11    4    5    1

With the larger simulation as a baseline, we can see that Kobe had relatively fewer streaks of length 2 and relatively more streaks of length 3 than a truly independent shooter would. Once he had made one shot, he was more likely to reach a third than independence would suggest. This raises the tantalizing possibility that Kobe did indeed have a hot hand. Our caution is that this data is not enough to support a definitive conclusion. After all, even our own simulation didn’t look the way we expected after only 133 shots.
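
To see how closely the 10,000-shot simulation tracks independence, its streak counts can be set against the probabilities an independent 45% shooter would produce. The sketch below copies the counts from the table of sim_streak2 above. The variable names are illustrative, and the expected values assume a streak of length n has probability 0.45^n * 0.55.

```r
# Counts copied from the table of sim_streak2 above.
streak_len <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12)
count      <- c(2895, 1416, 597, 280, 141, 54, 21, 11, 4, 5, 1)

# Observed share of all streaks at each length.
observed <- count / sum(count)

# Independent shooter with hit probability 0.45: P(n) = 0.45^n * 0.55.
expected <- dgeom(streak_len, prob = 0.55)

comparison <- data.frame(streak_len,
                         observed = round(observed, 4),
                         expected = round(expected, 4))
comparison
```

The same comparison could be made with Kobe’s streak table, keeping in mind that his table rests on far fewer streaks and so will be much noisier.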

Final Questions

The typical streak for this simulated independent shooter with a 45% shooting percentage is 0. Each streak of length N is a little less than half as likely as a streak of length N-1. The player’s longest streak of baskets in 133 shots is 4. Our larger simulation of 10,000 shots has a streak of length 12.

If we ran the simulation of the independent shooter a second time, we would expect its streak distribution to be similar but not identical. With only 133 shots, it fluctuates a fair amount. If we ran our simulation of 10,000 shots again, its distribution would change much less.

Compared with this simulation, Kobe had relatively fewer streaks of length 2 and relatively more streaks of length 3 than an independent shooter would. This hints at a hot hand, but 133 shots are not enough to establish a definitive conclusion.
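
The point about rerunning the simulation can be checked by repeating it many times and measuring how much one summary of the streak distribution moves around. The sketch below tracks the share of streaks with length 0. The seed, the 200 repetitions, the helper name `prop_zero`, and the streak convention (a miss assumed at both ends of the data) are all assumptions made for illustration.

```r
set.seed(2009)  # arbitrary seed (assumption), used only for reproducibility

# Share of streaks with length 0 in one simulated set of independent shots.
prop_zero <- function(n_shots, p_hit = 0.45) {
  # 1 = hit, 0 = miss
  hit <- sample(c(1L, 0L), size = n_shots, replace = TRUE,
                prob = c(p_hit, 1 - p_hit))
  # Streak lengths: hits between consecutive misses, padding both ends.
  streaks <- diff(which(c(0L, hit, 0L) == 0L)) - 1L
  mean(streaks == 0)
}

# Repeat each simulation 200 times (an arbitrary choice).
small_runs <- replicate(200, prop_zero(133))
large_runs <- replicate(200, prop_zero(10000))

# Spread of the summary across repetitions for each sample size.
c(sd_133 = sd(small_runs), sd_10000 = sd(large_runs))
```

A smaller standard deviation for the 10,000-shot runs would reflect the claim that the larger simulation changes less from run to run.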