Results

Duncan Gates

28 November, 2020

1 Results

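First, some more graphs. The hot hand can be defined mathematically as the conditional probability \[P(\text{shot } n+1 \text{ is made} \mid \text{player has made the previous } n \text{ shots})\]. For any two events A and B with \(P(A) > 0\), conditional probability is given by \[P(B \mid A)=\frac{P(A \text{ and } B)}{P(A)}\]. If A and B are independent, this reduces to \(P(B \mid A) = P(B)\), so a hot hand would show up as a conditional probability that differs from the unconditional one.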
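To make the formula concrete, here is a minimal Python sketch that estimates \(P(B \mid A)\) from a single shot sequence, with A = "previous shot made" and B = "next shot made". The shot log is made up for illustration and is not taken from the NBA data used in this report.

```python
# Hypothetical shot log for one player: 1 = make, 0 = miss (made-up data).
shots = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1]

# Consecutive (previous shot, next shot) pairs.
pairs = list(zip(shots[:-1], shots[1:]))

# A = previous shot was made, B = next shot was made.
p_a = sum(1 for prev, _ in pairs if prev == 1) / len(pairs)
p_a_and_b = sum(1 for prev, nxt in pairs if prev == 1 and nxt == 1) / len(pairs)

# Conditional probability P(B | A) = P(A and B) / P(A).
# Here 4 of the 6 pairs that start with a make also end with a make.
p_b_given_a = p_a_and_b / p_a

# Unconditional probability of making the next shot, for comparison.
p_b = sum(nxt for _, nxt in pairs) / len(pairs)

print(f"P(B|A) = {p_b_given_a:.3f}, P(B) = {p_b:.3f}")
```

Comparing `p_b_given_a` with `p_b` is the informal version of the independence question tested later in this section.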

Below, I look at the chance that a player makes his next shot as a function of streak length, after filtering out the 562,049 cases in which a player missed his first shot. There is a ~59% chance of a player making a shot if he has missed the previous one, and the chance drops rapidly with streak length; streaks longer than 5 makes are very rare. The longest streaks in recent NBA history belong to Serge Ibaka, Klay Thompson, and Thomas Bryant at 14 each, which is where the graph ends, since the chance of a 15th straight make in this data is 0.
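The streak calculation can be sketched in Python as follows. This is only an illustration under my own assumptions: the function name `make_rate_by_streak` is hypothetical, the shot log is made up, streak length is taken to mean the number of consecutive makes immediately before a shot, and the first-shot filter described above is not reproduced.

```python
from collections import defaultdict


def make_rate_by_streak(shots):
    """Return {streak_length: fraction of shots made} for one shot sequence.

    streak_length is the number of consecutive makes immediately
    before the shot (0 means the previous shot was a miss, or that
    this is the first shot in the sequence).
    """
    attempts = defaultdict(int)
    makes = defaultdict(int)
    streak = 0
    for made in shots:
        attempts[streak] += 1
        makes[streak] += made
        # A make extends the streak; a miss resets it.
        streak = streak + 1 if made else 0
    return {k: makes[k] / attempts[k] for k in sorted(attempts)}


# Hypothetical shot log: 1 = make, 0 = miss (made-up data).
shots = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1]
rates = make_rate_by_streak(shots)
print(rates)
```

With real data the streak counter would presumably also need to reset at game boundaries, which this sketch ignores.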

Doing the same operation but grouping by each player individually, the average field goal percentage declines much more slowly.
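One way to read "grouping by each player" is to compute the make rate by streak length separately for every player and then average those rates across players, so that each player counts equally regardless of how many shots he took. The Python sketch below follows that reading. It is an assumption about the aggregation rather than the exact pipeline behind the graph, and the player names and shot logs are hypothetical.

```python
from collections import defaultdict


def make_rate_by_streak(shots):
    """Return {streak_length: fraction of shots made} for one shot sequence."""
    attempts = defaultdict(int)
    makes = defaultdict(int)
    streak = 0
    for made in shots:
        attempts[streak] += 1
        makes[streak] += made
        streak = streak + 1 if made else 0
    return {k: makes[k] / attempts[k] for k in sorted(attempts)}


# Hypothetical shot logs keyed by player: 1 = make, 0 = miss (made-up data).
logs = {
    "Player A": [1, 1, 0, 1],
    "Player B": [0, 1, 1, 1],
}

# Step 1: make rate by streak length, computed per player.
per_player = {name: make_rate_by_streak(s) for name, s in logs.items()}

# Step 2: unweighted average across players for each streak length.
by_streak = defaultdict(list)
for player_rates in per_player.values():
    for streak_length, rate in player_rates.items():
        by_streak[streak_length].append(rate)
average_rate = {k: sum(v) / len(v) for k, v in sorted(by_streak.items())}

print(average_rate)
```

Averaging per-player rates can give a different curve from pooling all shots, because high-volume shooters no longer dominate each streak length.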


Mean shot distance decreases as streak length decreases. One would think this is because centers and “bigs” achieve the highest streaks, but it is actually driven by a somewhat surprising rise in shot distance among point guards, shooting guards, and power forwards.

This creates an interesting issue for the hot hand: different players are doing different things. Shooting guards and point guards run out to the three-point line as soon as they get hot, while centers, power forwards, and small forwards appear to keep getting layups and dunks.

Just to verify this one more time, here are the top 10 streaks of 2015-2020 by shot distance. Montrezl Harrell is messy here because he actually did this twice, so he is plotted separately.

1.1 Statistically Testing Independence

One statistical test for the independence of two frequency distributions is the chi-squared test. I apply it to a simulated player and to NBA player data to see what we get:
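As a rough illustration of the idea, the Python sketch below runs a Pearson chi-squared test of independence on a 2x2 table of (previous shot, next shot) counts for a simulated player. Everything specific here is my own assumption rather than the setup used for the results in this report: the function names, the 45% make probability, the 5,000 simulated shots, and the random seed. No continuity correction is applied.

```python
import math
import random


def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). Assumes every expected count is nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # A 2x2 table has 1 degree of freedom, for which
    # P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def transition_table(shots):
    """Count shot pairs: rows = previous miss/make, columns = next miss/make."""
    table = [[0, 0], [0, 0]]
    for prev, nxt in zip(shots[:-1], shots[1:]):
        table[prev][nxt] += 1
    return table


# Simulated player whose shots are independent by construction
# (assumed 45% make probability, fixed seed for repeatability).
rng = random.Random(2020)
simulated = [1 if rng.random() < 0.45 else 0 for _ in range(5000)]

stat, p_value = chi_squared_2x2(transition_table(simulated))
print(f"chi-squared = {stat:.3f}, p-value = {p_value:.3f}")
```

A small p-value would suggest that the outcome of a shot depends on the outcome of the previous one. For a simulated player like this one, any such result would be due to chance, which is why it makes a useful baseline for the real data.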

A work by Duncan Gates

gatesdu@oregonstate.edu