Probability

Hot Hands

Basketball players who make several baskets in succession are described as having a hot hand. Fans and players have long believed in the hot hand phenomenon, which refutes the assumption that each shot is independent of the next. However, a 1985 paper by Gilovich, Vallone, and Tversky collected evidence that contradicted this belief and showed that successive shots are independent events (http://psych.cornell.edu/sites/default/files/Gilo.Vallone.Tversky.pdf). This paper started a great controversy that continues to this day, as you can see by Googling hot hand basketball.

We do not expect to resolve this controversy today. However, in this lab we'll apply one approach to answering questions like this. The goals for this lab are to (1) think about the effects of independent and dependent events, (2) learn how to simulate shooting streaks in R, and (3) compare a simulation to actual data in order to determine whether the hot hand phenomenon appears to be real.

Getting Started

Our investigation will focus on the performance of one player: Kobe Bryant of the Los Angeles Lakers. His performance against the Orlando Magic in the 2009 NBA finals earned him the title Most Valuable Player and many spectators commented on how he appeared to show a hot hand. Let’s load some data from those games and look at the first several rows.

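# Point R at the folder holding the lab files (adjust this path for your machine)
setwd("~/R/Lab2")
# Load the data; in this lab the file also provides the calc_streak function used below
load("more/kobe.RData")
head(kobe)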
##    vs game quarter time
## 1 ORL    1       1 9:47
## 2 ORL    1       1 9:07
## 3 ORL    1       1 8:11
## 4 ORL    1       1 7:41
## 5 ORL    1       1 7:03
## 6 ORL    1       1 6:01
##                                               description basket
## 1                 Kobe Bryant makes 4-foot two point shot      H
## 2                               Kobe Bryant misses jumper      M
## 3                        Kobe Bryant misses 7-foot jumper      M
## 4 Kobe Bryant makes 16-foot jumper (Derek Fisher assists)      H
## 5                         Kobe Bryant makes driving layup      H
## 6                               Kobe Bryant misses jumper      M

In this data frame, every row records a shot taken by Kobe Bryant. If he hit the shot (made a basket), a hit, H, is recorded in the column named basket, otherwise a miss, M, is recorded.

kobe$basket[1:9]
## [1] "H" "M" "M" "H" "H" "M" "M" "M" "M"

Exercise 1: What does a streak length of 1 mean, i.e., how many hits and misses are in a streak of 1? A streak of length 1 is one hit followed by one miss.

What about a streak length of 0? A streak of length 0 is a miss with no hit before it, for example a miss that immediately follows another miss.
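To make the definition concrete, here is a minimal sketch of how streak lengths can be computed in R. `count_streaks` is a hypothetical helper written for illustration, not the lab's `calc_streak`. It assumes that each miss ends a streak and that the hits after the last miss (possibly none) form one final streak.

```r
# Hypothetical helper (not the lab's calc_streak): lengths of hit streaks.
# Assumption: each miss ends a streak, and the hits after the last miss
# (possibly zero) count as one final streak.
count_streaks <- function(shots) {
  hits <- as.integer(shots == "H")
  # Positions of misses, with a virtual miss added at each end
  miss_pos <- which(c(0L, hits, 0L) == 0L)
  # Number of hits between consecutive misses
  diff(miss_pos) - 1L
}

# Kobe's first nine shots, copied from kobe$basket[1:9] above
first_nine <- c("H", "M", "M", "H", "H", "M", "M", "M", "M")
count_streaks(first_nine)
## [1] 1 0 2 0 0 0 0
```

The first six values agree with the start of `kobe_streak` printed later in this lab. The final 0 appears only because this short excerpt ends with a miss.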

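# calc_streak() returns the length of every hit streak in a vector of "H"/"M" shots
kobe_streak <- calc_streak(kobe$basket)
# Bar plot of how often each streak length occurs
barplot(table(kobe_streak))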

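Exercise 2: Describe the distribution of Kobe's streak lengths from the 2009 NBA finals. The distribution is unimodal and right-skewed, with streak lengths ranging from 0 to 4.

What was his typical streak length? Because the distribution is right-skewed, the median is the better summary. His typical streak length was 0, which is also the most common value. The mean streak length was about 0.76.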

mean(kobe_streak)
## [1] 0.7631579
summary(kobe_streak)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.7632  1.0000  4.0000
# boxplot(kobe_streak)

How long was his longest streak of baskets? It was 4 baskets.

kobe_streak
##  [1] 1 0 2 0 0 0 3 2 0 3 0 1 3 0 0 0 0 0 1 1 0 4 1 0 1 0 1 0 1 2 0 1 2 1 0
## [36] 0 1 0 0 0 1 1 0 1 0 2 0 0 0 3 0 1 0 1 2 1 0 1 0 0 1 3 3 1 1 0 0 0 0 0
## [71] 1 1 0 0 0 1
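The longest streak and the full frequency table can be computed directly instead of read off the printed vector. The sketch below copies the 76 values printed above into a vector so that it runs on its own. In the lab you would call the same functions on `kobe_streak` itself.

```r
# Kobe's streak lengths, copied from the printed kobe_streak above
kobe_streak <- c(1, 0, 2, 0, 0, 0, 3, 2, 0, 3, 0, 1, 3, 0, 0, 0, 0, 0, 1, 1,
                 0, 4, 1, 0, 1, 0, 1, 0, 1, 2, 0, 1, 2, 1, 0,
                 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 2, 0, 0, 0, 3, 0, 1, 0, 1, 2,
                 1, 0, 1, 0, 0, 1, 3, 3, 1, 1, 0, 0, 0, 0, 0,
                 1, 1, 0, 0, 0, 1)

max(kobe_streak)       # longest streak
## [1] 4
median(kobe_streak)    # typical streak for a right-skewed distribution
## [1] 0
table(kobe_streak)     # number of streaks of each length
```

The table reports 39, 24, 6, 6 and 1 streaks of lengths 0 through 4. Most of Kobe's streaks are therefore 0 or 1 shots long.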

Simulations in R

While we don't have any data from a shooter we know to have independent shots, that sort of data is very easy to simulate in R. In a simulation, you set the ground rules of a random process and then the computer uses random numbers to generate an outcome that adheres to those rules. As a simple example, you can simulate flipping a fair coin with the following code:

outcomes <- c("heads", "tails")
sample(outcomes, size = 1, replace = TRUE)
## [1] "tails"
sample(outcomes, size = 1, replace = TRUE)
## [1] "tails"

The vector outcomes can be thought of as a hat with two slips of paper in it: one slip says heads and the other says tails. The function sample draws one slip from the hat and tells us if it was a head or a tail.

Run the second command listed above several times. Just like when flipping a coin, sometimes you’ll get a heads, sometimes you’ll get a tails, but in the long run, you’d expect to get roughly equal numbers of each.
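Each call to `sample` gives a different result. When a simulation needs to be repeatable, for example so that a write-up matches its own printed output, R's random number generator can be fixed with `set.seed`. The sketch below shows that the same seed reproduces the same flips. The seed value 2009 is arbitrary.

```r
outcomes <- c("heads", "tails")

# Resetting to the same seed reproduces the same sequence of "random" flips
set.seed(2009)
first_run <- sample(outcomes, size = 10, replace = TRUE)
set.seed(2009)
second_run <- sample(outcomes, size = 10, replace = TRUE)

identical(first_run, second_run)
## [1] TRUE
```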

If you wanted to simulate flipping a fair coin 100 times, you could either run the function 100 times or, more simply, adjust the size argument, which governs how many samples to draw (the replace = TRUE argument indicates we put the slip of paper back in the hat before drawing again). Save the resulting vector of heads and tails in a new object called sim_fair_coin.

sim_fair_coin <- sample(outcomes, size = 100, replace = TRUE)
sim_fair_coin
##   [1] "tails" "tails" "tails" "heads" "tails" "heads" "tails" "heads"
##   [9] "heads" "heads" "tails" "tails" "heads" "heads" "heads" "tails"
##  [17] "heads" "heads" "tails" "tails" "heads" "heads" "tails" "tails"
##  [25] "heads" "heads" "tails" "heads" "tails" "heads" "heads" "tails"
##  [33] "tails" "tails" "tails" "heads" "heads" "tails" "heads" "tails"
##  [41] "tails" "heads" "heads" "tails" "tails" "heads" "heads" "heads"
##  [49] "heads" "tails" "heads" "heads" "heads" "tails" "tails" "heads"
##  [57] "heads" "heads" "tails" "tails" "tails" "tails" "tails" "heads"
##  [65] "heads" "tails" "tails" "heads" "tails" "tails" "heads" "heads"
##  [73] "heads" "heads" "tails" "heads" "heads" "heads" "heads" "tails"
##  [81] "heads" "tails" "heads" "tails" "tails" "tails" "tails" "heads"
##  [89] "heads" "tails" "tails" "heads" "tails" "tails" "heads" "heads"
##  [97] "heads" "tails" "heads" "heads"
table(sim_fair_coin)
## sim_fair_coin
## heads tails 
##    53    47

Exercise 3: In your simulation of flipping the unfair coin 100 times, how many flips came up heads? 57 flips came up heads.

Since there are only two elements in outcomes, the probability that we “flip” a coin and it lands heads is 0.5.
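The `prob` argument of `sample` changes these probabilities. As an illustration, assume a coin that lands heads only 20% of the time. The 20% figure is an assumed value for this sketch. Passing `prob = c(0.2, 0.8)` weights the two elements of `outcomes` by position. With many flips, the observed proportion of heads should land close to 0.2 but not exactly on it.

```r
outcomes <- c("heads", "tails")

# Assumed unfair coin: P(heads) = 0.2, P(tails) = 0.8
# (prob is matched to the elements of outcomes by position)
set.seed(1)
sim_unfair_coin <- sample(outcomes, size = 10000, replace = TRUE, prob = c(0.2, 0.8))

table(sim_unfair_coin)
mean(sim_unfair_coin == "heads")   # observed proportion of heads, near 0.2
```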

Simulating the Independent Shooter

Simulating a basketball player who has independent shots uses the same mechanism that we use to simulate a coin flip. To simulate a single shot from an independent shooter with a shooting percentage of 50% we type,

outcomes <- c("H", "M")
simbasket <- sample(outcomes, size = 1, replace = TRUE)
simbasket
## [1] "H"

Exercise 4: What change needs to be made to the sample function so that it reflects a shooting percentage of 45%? Make this adjustment, then run a simulation to sample 133 shots. Assign the output of this simulation to a new object called sim_basket.

Simulation of the independent shooter (adding prob = c(0.45, 0.55) to sample gives "H" a 45% chance on each shot)

# prob follows the order of outcomes: P("H") = 0.45, P("M") = 0.55
sim_basket <- sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))
sim_basket
##   [1] "M" "H" "H" "H" "H" "M" "H" "H" "H" "H" "M" "H" "H" "H" "H" "M" "H"
##  [18] "M" "M" "H" "M" "M" "M" "H" "H" "M" "M" "M" "H" "M" "M" "H" "H" "M"
##  [35] "H" "H" "H" "H" "H" "M" "M" "H" "M" "H" "H" "M" "M" "H" "H" "H" "M"
##  [52] "H" "M" "H" "H" "H" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "H"
##  [69] "M" "M" "H" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "H" "M"
##  [86] "M" "M" "M" "M" "H" "H" "H" "M" "M" "M" "M" "M" "M" "H" "M" "M" "M"
## [103] "H" "H" "M" "M" "M" "H" "H" "H" "H" "H" "M" "M" "M" "M" "M" "M" "H"
## [120] "M" "M" "H" "M" "M" "M" "H" "H" "H" "H" "H" "H" "M" "M"
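One way to check that `prob` is doing what we intended is to compare the simulated shooter's hit proportion with 0.45. With only 133 shots the proportion will vary by a few percentage points from run to run. A much longer run pins it down. In the sketch below, the 100,000-shot run and the seed are arbitrary choices.

```r
outcomes <- c("H", "M")
set.seed(45)

# 133 shots, as in the exercise: the hit proportion varies from run to run
sim_basket <- sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))
mean(sim_basket == "H")

# A much longer simulation settles close to the target 45%
many_shots <- sample(outcomes, size = 100000, replace = TRUE, prob = c(0.45, 0.55))
mean(many_shots == "H")
```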

Kobe’s data

kobe$basket
##   [1] "H" "M" "M" "H" "H" "M" "M" "M" "M" "H" "H" "H" "M" "H" "H" "M" "M"
##  [18] "H" "H" "H" "M" "M" "H" "M" "H" "H" "H" "M" "M" "M" "M" "M" "M" "H"
##  [35] "M" "H" "M" "M" "H" "H" "H" "H" "M" "H" "M" "M" "H" "M" "M" "H" "M"
##  [52] "M" "H" "M" "H" "H" "M" "M" "H" "M" "H" "H" "M" "H" "M" "M" "M" "H"
##  [69] "M" "M" "M" "M" "H" "M" "H" "M" "M" "H" "M" "M" "H" "H" "M" "M" "M"
##  [86] "M" "H" "H" "H" "M" "M" "H" "M" "M" "H" "M" "H" "H" "M" "H" "M" "M"
## [103] "H" "M" "M" "M" "H" "M" "H" "H" "H" "M" "H" "H" "H" "M" "H" "M" "H"
## [120] "M" "M" "M" "M" "M" "M" "H" "M" "H" "M" "M" "M" "M" "H"

On your own

Comparing Kobe Bryant to the Independent Shooter

Using calc_streak, compute the streak lengths of sim_basket.

  1. Describe the distribution of streak lengths.
    The distribution is unimodal and right-skewed, with streak lengths ranging from 0 to 6.
ind_streak <- calc_streak(sim_basket)
barplot(table(ind_streak))

1.1 What is the typical streak length for this simulated independent shooter with a 45% shooting percentage? The typical streak length is 0, which is both the median and the most common value. The mean streak length is about 0.72.

mean(ind_streak)
## [1] 0.7179487
summary(ind_streak)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.7179  1.0000  6.0000
1.2 How long is the player's longest streak of baskets in 133
    shots?
    The longest streak is 6 baskets.
ind_streak
##  [1] 0 4 4 4 1 0 1 0 0 2 0 0 1 0 2 5 0 1 2 0 3 1 3 0 0 0 0 0 0 0 0 0 0 1 0
## [36] 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 3 0 0 0 0 0 1 0 0 2 0 0 5 0 0 0 0 0
## [71] 1 0 1 0 0 6 0 0
  2. If you were to run the simulation of the independent shooter a second time, how would you expect its streak distribution to compare to the distribution from the question above? Exactly the same? Somewhat similar? Totally different? Explain your reasoning.

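I would expect the two distributions to be somewhat similar but not identical. Each run draws a new set of random shots, so the exact counts of each streak length and the longest streak will change from run to run. Both runs, however, follow the same rules (independent shots with a 45% chance of a hit), so the overall shape should stay about the same: mostly streaks of 0 or 1, with a long right tail.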
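This expectation can be checked by running the simulation twice and putting the proportions of each streak length side by side. The sketch below is self-contained. It uses a hypothetical helper, `count_streaks`, in place of the lab's `calc_streak`, and it assumes that each miss ends a streak and that the hits after the last miss form one final streak. The seed is arbitrary.

```r
# Hypothetical helper (not the lab's calc_streak): hits before each miss,
# plus the hits after the final miss
count_streaks <- function(shots) {
  miss_pos <- which(c(0L, as.integer(shots == "H"), 0L) == 0L)
  diff(miss_pos) - 1L
}

# One simulated series of 133 independent shots at 45%
simulate_streaks <- function() {
  shots <- sample(c("H", "M"), size = 133, replace = TRUE, prob = c(0.45, 0.55))
  count_streaks(shots)
}

set.seed(1)
run1 <- simulate_streaks()
run2 <- simulate_streaks()

# Proportion of streaks of each length in each run, side by side
streak_levels <- 0:max(c(run1, run2))
p1 <- prop.table(table(factor(run1, levels = streak_levels)))
p2 <- prop.table(table(factor(run2, levels = streak_levels)))
round(rbind(run1 = p1, run2 = p2), 2)
```

The two rows usually share the same overall shape and differ mainly in the rare long streaks.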

  3. How does Kobe Bryant's distribution of streak lengths compare to the distribution of streak lengths for the simulated shooter? Using this comparison, do you have evidence that the hot hand model fits Kobe's shooting patterns? Explain.

The plots below show both distributions of streak lengths. Kobe's streak lengths range from 0 to 4, so his longest streak is 4. The simulated independent shooter's streak lengths range from 0 to 6, so its longest streak is 6. Otherwise the two distributions look alike: both are unimodal and right-skewed, most streaks have length 0 or 1, and the means are close (about 0.76 for Kobe and 0.72 for the simulated shooter). Because Kobe's streaks look much like those of a shooter whose shots are independent, this comparison does not give evidence that the hot hand model fits Kobe's shooting. However, this is a single simulation compared with observational data, so more evidence would be needed to settle the hot hand question.

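# Kobe Bryant's streak lengths (2009 NBA finals)
kobe_streak <- calc_streak(kobe$basket)
barplot(table(kobe_streak), main = "Kobe Bryant", xlab = "Streak length")

# Simulated independent shooter (45% shooting percentage)
ind_streak <- calc_streak(sim_basket)
barplot(table(ind_streak), main = "Independent shooter", xlab = "Streak length")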
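A single simulated series is a thin basis for comparison. One way to go further is to repeat the simulation many times and count how often an independent shooter's longest streak is at least as long as Kobe's longest streak of 4. The sketch below keeps the lab's assumptions (independent shots, a 45% shooting percentage, 133 shots). The helper `count_streaks` is hypothetical and stands in for `calc_streak`. The 1,000 repetitions and the seed are arbitrary choices.

```r
# Hypothetical helper (not the lab's calc_streak): hits before each miss,
# plus the hits after the final miss
count_streaks <- function(shots) {
  miss_pos <- which(c(0L, as.integer(shots == "H"), 0L) == 0L)
  diff(miss_pos) - 1L
}

set.seed(2009)
# Longest streak in each of 1,000 simulated series of 133 independent shots
longest <- replicate(1000, {
  shots <- sample(c("H", "M"), size = 133, replace = TRUE, prob = c(0.45, 0.55))
  max(count_streaks(shots))
})

table(longest)         # distribution of the longest streak
mean(longest >= 4)     # share of simulated shooters with a streak of 4 or more
```

If a large share of the simulated independent shooters match or beat a streak of 4, Kobe's longest streak is not unusual for a shooter without a hot hand.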