Getting Started

download.file("http://www.openintro.org/stat/data/kobe.RData", destfile = "kobe.RData")
load("kobe.RData")
head(kobe)
##    vs game quarter time                                             description
## 1 ORL    1       1 9:47                 Kobe Bryant makes 4-foot two point shot
## 2 ORL    1       1 9:07                               Kobe Bryant misses jumper
## 3 ORL    1       1 8:11                        Kobe Bryant misses 7-foot jumper
## 4 ORL    1       1 7:41 Kobe Bryant makes 16-foot jumper (Derek Fisher assists)
## 5 ORL    1       1 7:03                         Kobe Bryant makes driving layup
## 6 ORL    1       1 6:01                               Kobe Bryant misses jumper
##   basket
## 1      H
## 2      M
## 3      M
## 4      H
## 5      H
## 6      M
kobe$basket[1:9]
## [1] "H" "M" "M" "H" "H" "M" "M" "M" "M"

Exercise 1: What does a streak length of 1 mean, i.e. how many hits and misses are in a streak of 1? What about a streak length of 0?

A streak length of 1 means one hit followed by one miss: the shooter made a single basket and then missed. A streak length of 0 means a miss that is not preceded by a hit, i.e. one miss and no hits.
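As a minimal sketch of this definition, the hypothetical helper below (named `streak_lengths` here; it is not the lab's `calc_streak`, whose handling of the end of the sequence may differ) treats every miss as the end of a streak and counts the hits immediately before it. It is applied to the first nine shots printed above.

```r
# Hypothetical helper for illustration only; not the lab's calc_streak().
# Every miss closes a streak; the streak length is the number of hits
# immediately before that miss. Hits after the final miss are not counted.
streak_lengths <- function(shots) {
  hits <- as.integer(shots == "H")
  padded <- c(0L, hits)            # leading 0 marks the start of the sequence
  miss_pos <- which(padded == 0L)  # start marker plus the position of every miss
  diff(miss_pos) - 1L              # number of hits between consecutive markers
}

first_nine <- c("H", "M", "M", "H", "H", "M", "M", "M", "M")
streak_lengths(first_nine)
## [1] 1 0 2 0 0 0
```

The opening "H" "M" pair is the streak of length 1, and each miss that directly follows another miss contributes a streak of length 0.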

Exercise 2: Describe the distribution of Kobe’s streak lengths from the 2009 NBA finals. What was his typical streak length? How long was his longest streak of baskets?

kobe_streak <- calc_streak(kobe$basket)
barplot(table(kobe_streak))

summary(kobe_streak)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.7632  1.0000  4.0000
sd(kobe_streak)
## [1] 0.9915432
# The distribution of Kobe's streak lengths from the 2009 NBA finals is strongly right-skewed. Most of his streaks sit at the far left of the distribution, at lengths 0 and 1.
# Kobe's typical streak length was short: the median was 0 and the mean was about 0.763. His longest streak of baskets was 4, which occurred fewer than 5 times. The standard deviation of his streak lengths was about 0.992.

Simulations in R

outcomes <- c("heads", "tails")
sample(outcomes, size = 1, replace = TRUE)
## [1] "tails"
sim_fair_coin <- sample(outcomes, size = 100, replace = TRUE)
sim_fair_coin
##   [1] "heads" "heads" "heads" "tails" "heads" "heads" "heads" "heads" "tails"
##  [10] "tails" "tails" "tails" "tails" "heads" "heads" "tails" "tails" "heads"
##  [19] "tails" "heads" "tails" "tails" "heads" "tails" "tails" "tails" "heads"
##  [28] "heads" "heads" "tails" "tails" "heads" "heads" "tails" "heads" "tails"
##  [37] "heads" "tails" "heads" "tails" "heads" "tails" "heads" "tails" "heads"
##  [46] "tails" "heads" "heads" "tails" "heads" "tails" "heads" "tails" "tails"
##  [55] "heads" "heads" "heads" "tails" "tails" "tails" "heads" "heads" "heads"
##  [64] "tails" "tails" "heads" "tails" "heads" "heads" "tails" "heads" "tails"
##  [73] "tails" "heads" "heads" "heads" "heads" "tails" "tails" "heads" "heads"
##  [82] "tails" "tails" "heads" "tails" "heads" "heads" "heads" "tails" "heads"
##  [91] "heads" "heads" "tails" "heads" "heads" "tails" "tails" "heads" "tails"
## [100] "heads"
table(sim_fair_coin)
## sim_fair_coin
## heads tails 
##    54    46

Exercise 3: In your simulation of flipping the unfair coin 100 times, how many flips came up heads?

sim_unfair_coin <- sample(outcomes, size = 100, replace = TRUE, prob = c(0.2, 0.8))
sim_unfair_coin
##   [1] "tails" "heads" "tails" "tails" "tails" "tails" "tails" "heads" "heads"
##  [10] "tails" "tails" "tails" "tails" "heads" "tails" "tails" "tails" "heads"
##  [19] "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails" "heads"
##  [28] "tails" "tails" "tails" "tails" "tails" "heads" "tails" "tails" "tails"
##  [37] "tails" "heads" "tails" "tails" "heads" "tails" "tails" "tails" "heads"
##  [46] "tails" "tails" "tails" "heads" "tails" "tails" "tails" "tails" "tails"
##  [55] "tails" "tails" "tails" "heads" "tails" "tails" "tails" "tails" "heads"
##  [64] "tails" "tails" "tails" "heads" "tails" "tails" "tails" "heads" "heads"
##  [73] "tails" "tails" "tails" "tails" "tails" "heads" "heads" "tails" "tails"
##  [82] "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails" "heads"
##  [91] "tails" "tails" "tails" "tails" "heads" "tails" "tails" "heads" "tails"
## [100] "heads"
table(sim_unfair_coin)
## sim_unfair_coin
## heads tails 
##    22    78
# In my simulation of the unfair coin, 22 flips came up heads and 78 flips came up tails.
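Because `sample()` draws at random, the counts change on every run, so a typed-in answer can easily stop matching the output. A minimal sketch of one way to guard against that: fix the seed (the value 2009 is an arbitrary choice for illustration) and compute the counts from the object rather than typing them.

```r
set.seed(2009)  # arbitrary seed, chosen only for illustration
outcomes <- c("heads", "tails")
sim_unfair_coin <- sample(outcomes, size = 100, replace = TRUE,
                          prob = c(0.2, 0.8))

# Count each outcome directly from the simulated flips.
n_heads <- sum(sim_unfair_coin == "heads")
n_tails <- sum(sim_unfair_coin == "tails")
cat("heads:", n_heads, " tails:", n_tails, "\n")
```

Re-running the block with the same seed reproduces the same flips, so the reported counts stay in step with the printed table.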

Simulating the Independent Shooter

outcomes <- c("H", "M")
sim_basket <- sample(outcomes, size = 1, replace = TRUE)

Exercise 4: What change needs to be made to the sample function so that it reflects a shooting percentage of 45%? Make this adjustment, then run a simulation to sample 133 shots. Assign the output of this simulation to a new object called sim_basket.

outcomes <- c("H", "M")
sim_basket <- sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))
sim_basket
##   [1] "H" "M" "H" "M" "H" "M" "H" "H" "M" "M" "M" "M" "M" "M" "H" "M" "M" "M"
##  [19] "M" "M" "M" "M" "H" "H" "M" "H" "M" "M" "M" "H" "M" "H" "H" "H" "M" "H"
##  [37] "H" "M" "M" "M" "H" "H" "M" "M" "H" "M" "H" "M" "H" "H" "M" "M" "H" "H"
##  [55] "H" "M" "M" "M" "H" "M" "M" "M" "M" "H" "M" "M" "H" "M" "M" "H" "H" "H"
##  [73] "H" "H" "M" "M" "H" "H" "M" "H" "H" "H" "H" "M" "M" "H" "M" "H" "M" "M"
##  [91] "M" "M" "H" "M" "H" "M" "M" "M" "M" "M" "H" "M" "M" "M" "H" "M" "M" "M"
## [109] "H" "H" "H" "H" "M" "M" "H" "H" "H" "M" "H" "H" "M" "M" "H" "M" "H" "M"
## [127] "H" "M" "H" "H" "H" "M" "M"
table(sim_basket)
## sim_basket
##  H  M 
## 59 74
# The size argument needed to be increased from 1 to 133, and a prob argument had to be added so that the probability of a hit ("H") is 0.45 rather than the default of 0.50 (leaving 0.55 for a miss).
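The `prob` values are matched to `outcomes` by position, so swapping the two numbers would silently simulate a 55% shooter. One way to make the pairing explicit, sketched here with a hypothetical named vector `shot_probs`, is to keep each label next to its probability:

```r
# shot_probs is a hypothetical name introduced for this sketch.
shot_probs <- c(H = 0.45, M = 0.55)
# Sanity check that the intended probabilities sum to 1.
stopifnot(isTRUE(all.equal(sum(shot_probs), 1)))

sim_basket <- sample(names(shot_probs), size = 133, replace = TRUE,
                     prob = shot_probs)
prop.table(table(sim_basket))  # observed share of hits and misses
```

The observed shares will be near, but usually not exactly, 0.45 and 0.55 in any single run of 133 shots.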

Comparing Kobe to the Independent Shooter

1. Describe the distribution of streak lengths. What is the typical streak length for this simulated independent shooter with a 45% shooting percentage? How long is the player’s longest streak of baskets in 133 shots?

sim_streak <- calc_streak(sim_basket)
barplot(table(sim_streak))

summary(sim_streak)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.7867  1.0000  5.0000
sd(sim_streak)
## [1] 1.130574
# The distribution of streak lengths for this simulated independent shooter with a 45% shooting percentage is strongly right-skewed. The typical streak length was short (median 0, mean about 0.787), and the player's longest streak of baskets was 5.

2. If you were to run the simulation of the independent shooter a second time, how would you expect its streak distribution to compare to the distribution from the question above? Exactly the same? Somewhat similar? Totally different? Explain your reasoning.

I would expect the distribution to be somewhat similar rather than identical. Firstly, the number of misses will usually be higher than the number of hits because the shooting percentage is below 50%. Secondly, longer streaks are always less likely than shorter ones, because a streak of length k requires k hits followed by a miss: P(streak of 2) = .45 × .45 × .55 ≈ .111 < P(streak of 1) = .45 × .55 = .2475 < P(streak of 0) = .55. In other words, the longer the streak, the lower its probability and therefore its expected frequency. Thus, the overall shape will be essentially the same from run to run. What will vary is mainly the length of the longest streaks, because this is a random simulation. Another simulation might have a longest streak of only 4, for example, while another might produce a considerably longer one.
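For an independent shooter, streak probabilities follow a geometric pattern: a streak of length k needs k hits followed by one miss. A short sketch, assuming the 45% shooting percentage used above, computes them with base R's `dgeom()`, treating the miss as the event that ends the count:

```r
p_hit  <- 0.45
p_miss <- 1 - p_hit

k <- 0:5
# dgeom(k, prob) returns prob * (1 - prob)^k: here, k hits and then a miss.
streak_prob <- dgeom(k, prob = p_miss)
round(setNames(streak_prob, k), 4)

# Expected streak length for this independent shooter.
p_hit / p_miss
```

The first three probabilities are 0.55, 0.2475, and about 0.1114, and the expected streak length is about 0.82, which is why both bar plots pile up at 0 and 1.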

3. How does Kobe Bryant’s distribution of streak lengths compare to the distribution of streak lengths for the simulated shooter? Using this comparison, do you have evidence that the hot hand model fits Kobe’s shooting patterns? Explain.

The overall shapes of the two distributions are very similar. Just like the distribution of Kobe’s streak lengths, the distribution of the simulated shooter’s streak lengths is strongly right-skewed, with the majority of streaks at 0 and 1. The counts of streaks of length 0 and 1 differ somewhat between the two bar plots, but the simulated shooter’s average streak length was about 0.787, only slightly larger than Kobe’s average of 0.763. The simulated shooter’s average was higher partly because, unlike Kobe, the simulated shooter had a streak of length 5. Not surprisingly, the distribution of the simulated shooter’s streaks also had a larger standard deviation, about 1.131 versus about 0.992 for Kobe.

The independent shooter model appears to fit Kobe’s shooting patterns fairly well, given the similarity of the two distributions in shape, mean, and skew. The only notable difference was the simulated shooter’s single streak of 5, one longer than Kobe’s longest streak of 4. Because the simulated shooter’s shots are independent by construction, this similarity means the comparison does not provide evidence that the hot hand model fits Kobe’s shooting.
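One way to judge how unusual Kobe's streaks are is to repeat the independent-shooter simulation many times and see where his summary values fall. The sketch below is an illustration rather than part of the lab: `streak_lengths` is a hypothetical stand-in for `calc_streak` (it counts the hits before each miss), and the seed and the 1,000 repetitions are arbitrary choices. Kobe's mean of 0.7632 and maximum of 4 are taken from the summary output above.

```r
set.seed(133)  # arbitrary seed for reproducibility

# Hypothetical stand-in for calc_streak(): number of hits before each miss.
streak_lengths <- function(shots) {
  miss_pos <- which(c("M", shots) == "M")  # start marker plus every miss
  diff(miss_pos) - 1L
}

# Simulate one independent shooter and summarise the streak lengths.
one_run <- function(n_shots = 133, p_hit = 0.45) {
  shots <- sample(c("H", "M"), size = n_shots, replace = TRUE,
                  prob = c(p_hit, 1 - p_hit))
  s <- streak_lengths(shots)
  c(mean = mean(s), max = max(s))
}

runs <- t(replicate(1000, one_run()))  # 1000 rows, columns "mean" and "max"

# Share of simulated shooters whose values are at least as large as Kobe's.
mean(runs[, "mean"] >= 0.7632)
mean(runs[, "max"]  >= 4)
```

If a large share of independent shooters match or exceed Kobe's values, his streaks are consistent with independent shots.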

However, even though the independent model fits well, whether shots depend on one another is still difficult to confirm from this comparison alone. The simulated shooter’s hits as a percentage of total shots were not exactly 45% (59 of 133, about 44%), and the misses were not exactly 55% (74 of 133, about 56%); if we were to simulate a larger number of shots (e.g. 10,000), the percentage of hits would very likely be closer to 45%. The real question then seems to be: are the streaks of hits merely the effect of random, independent shots, or do they demonstrate that shots depend on prior shots?
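The effect of sample size on the observed hit rate can be checked directly. This sketch (the seed and the sample sizes are arbitrary choices for illustration, and `hit_rate` is a hypothetical helper) simulates the same 45% shooter over increasingly many shots and reports the share of hits in each run:

```r
set.seed(45)  # arbitrary seed for reproducibility

# Hypothetical helper: simulate n_shots independent shots, return the hit share.
hit_rate <- function(n_shots, p_hit = 0.45) {
  shots <- sample(c("H", "M"), size = n_shots, replace = TRUE,
                  prob = c(p_hit, 1 - p_hit))
  mean(shots == "H")
}

sizes <- c(133, 1000, 10000, 100000)
rates <- sapply(sizes, hit_rate)
data.frame(shots = sizes, hit_rate = round(rates, 4))
```

A hit rate that settles near 45% says nothing about dependence between shots; it only shows that the deviation seen in 133 shots is ordinary sampling variation.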