library(stringr)
library(ggplot2)
library(knitr)
load(url("http://www.openintro.org/stat/data/kobe.RData"))
head(kobe)
##    vs game quarter time
## 1 ORL    1       1 9:47
## 2 ORL    1       1 9:07
## 3 ORL    1       1 8:11
## 4 ORL    1       1 7:41
## 5 ORL    1       1 7:03
## 6 ORL    1       1 6:01
##                                               description basket
## 1                 Kobe Bryant makes 4-foot two point shot      H
## 2                               Kobe Bryant misses jumper      M
## 3                        Kobe Bryant misses 7-foot jumper      M
## 4 Kobe Bryant makes 16-foot jumper (Derek Fisher assists)      H
## 5                         Kobe Bryant makes driving layup      H
## 6                               Kobe Bryant misses jumper      M
Question 1. What does a streak length of 1 mean, i.e. how many hits and misses are in a streak of 1? What about a streak length of 0?
shot sequence: H M | M | H H M | M | M | M
streak_length_1: one hit ‘H’ followed by one miss ‘M’ (H M)
streak_length_0: a single miss ‘M’ with no hit before it
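The counting rule can be sketched in a few lines of R. The helper below, `streak_lengths`, is a hypothetical name introduced for illustration; it is not the lab's `calc_streak`, and it may handle edge cases (such as hits after the final miss) differently.

```r
# Hypothetical helper (not the lab's calc_streak): count the hits
# that come before each miss. Every miss closes exactly one streak.
streak_lengths <- function(shots) {
  miss_pos <- which(shots == "M")   # positions of the misses
  diff(c(0, miss_pos)) - 1          # hits between consecutive misses
}

# The example sequence H M | M | H H M | M | M | M
shots <- c("H", "M", "M", "H", "H", "M", "M", "M", "M")
streak_lengths(shots)
# streak lengths: 1 0 2 0 0 0
```

Hits that are never followed by a miss are ignored in this sketch, since no miss closes that streak.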
Question 2. Describe the distribution of Kobe’s streak lengths from the 2009 NBA finals. What was his typical streak length? How long was his longest streak of baskets?
kobe_streak <- calc_streak(kobe$basket)
kobe_streak
##  [1] 1 0 2 0 0 0 3 2 0 3 0 1 3 0 0 0 0 0 1 1 0 4 1 0 1 0 1 0 1 2 0 1 2 1 0
## [36] 0 1 0 0 0 1 1 0 1 0 2 0 0 0 3 0 1 0 1 2 1 0 1 0 0 1 3 3 1 1 0 0 0 0 0
## [71] 1 1 0 0 0 1
qplot(kobe_streak, colour = I("red"), binwidth = 1)  # I() sets a literal colour instead of mapping it

kobe_streak_distribution
longest_streak: 4
distribution: skewed to the right
kobe_most_common_streak_lengths: 0 and 1
most_frequent_streak_length: 0 (a lone miss ‘M’)
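A frequency table makes these observations easy to check. The sketch below retypes the 76 streak lengths printed above so that it runs on its own; `kobe_streak_printed` is a name introduced here for illustration.

```r
# Streak lengths copied from the printed kobe_streak output
kobe_streak_printed <- c(
  1, 0, 2, 0, 0, 0, 3, 2, 0, 3, 0, 1, 3, 0, 0, 0, 0, 0, 1, 1, 0, 4, 1, 0, 1, 0, 1, 0, 1, 2, 0, 1, 2, 1, 0,
  0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 2, 0, 0, 0, 3, 0, 1, 0, 1, 2, 1, 0, 1, 0, 0, 1, 3, 3, 1, 1, 0, 0, 0, 0, 0,
  1, 1, 0, 0, 0, 1
)

table(kobe_streak_printed)                    # count of streaks at each length
max(kobe_streak_printed)                      # longest streak: 4
median(kobe_streak_printed)                   # median streak length: 0
names(which.max(table(kobe_streak_printed)))  # most frequent length: "0"
```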
Question 3. In your simulation of flipping the unfair coin 100 times, how many flips came up heads?
outcomes <- c("heads", "tails")
sim_fair_coin <- sample(outcomes, size = 100, replace = TRUE)
table(sim_fair_coin)
## sim_fair_coin
## heads tails 
##    45    55
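# Weights: 20% heads, 80% tails. The original call had 08 here, which R reads as 8;
# the counts shown below come from that run. With 0.8, expect roughly 20 heads.
sim_unfair_coin <- sample(outcomes, size = 100, replace = TRUE, prob = c(0.2, 0.8))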
table(sim_unfair_coin)
## sim_unfair_coin
## heads tails 
##     1    99
table(sim_unfair_coin)[1]
## heads 
##     1
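Because `sample` treats `prob` as weights and rescales them to sum to 1, a misplaced decimal point changes the coin silently. A minimal sketch, assuming an arbitrary seed and a larger sample than the lab uses, compares the intended weights with mistyped ones; `p_heads` is a hypothetical helper introduced for illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
outcomes <- c("heads", "tails")

# Implied probability of heads after the weights are rescaled
p_heads <- function(w) w[1] / sum(w)
p_heads(c(0.2, 0.8))   # 0.2
p_heads(c(0.2, 8))     # about 0.024

# Empirical check with many flips
flips <- sample(outcomes, size = 10000, replace = TRUE, prob = c(0.2, 0.8))
mean(flips == "heads")  # should land near 0.2
```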
Question 4. What change needs to be made to the sample function so that it reflects a shooting percentage of 45%? Make this adjustment, then run a simulation to sample 133 shots. Assign the output of this simulation to a new object called sim_basket.
outcomes <- c("H", "M")
sim_basket <- sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))
table(sim_basket)
## sim_basket
##  H  M 
## 53 80
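A quick sanity check on a simulation like this is to compare the observed share of hits with the 45% target. The sketch below assumes an arbitrary seed and uses a separate object, `sim_check` (a name introduced here), so its counts will not match the table above.

```r
set.seed(2009)  # arbitrary seed for repeatability
outcomes <- c("H", "M")
sim_check <- sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))

mean(sim_check == "H")   # observed shooting percentage in this run
133 * 0.45               # expected number of hits under the model
```

A single run of 133 shots can land several percentage points away from 45% purely by chance.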

On your own

Comparing Kobe Bryant to the Independent Shooter

Using calc_streak, compute the streak lengths of sim_basket.

Describe the distribution of streak lengths. What is the typical streak length for this simulated independent shooter with a 45% shooting percentage? How long is the player’s longest streak of baskets in 133 shots?
sim_streak <- calc_streak(sim_basket)
sim_streak
##  [1] 0 0 0 0 0 0 0 0 1 0 0 3 1 0 0 0 1 1 0 1 0 0 0 0 0 0 0 1 1 0 1 0 0 0 1
## [36] 1 3 1 2 1 0 4 2 0 0 0 6 0 1 1 0 0 0 0 2 0 1 0 1 0 0 1 0 0 1 1 2 0 1 0
## [71] 0 1 0 7 0 0 0 0 0 1 0
qplot(sim_streak, colour = I("red"), binwidth = 1)  # I() sets a literal colour instead of mapping it

shooter_streak_distribution
longest_streak: 7
distribution: skewed to the right
shooter_most_common_streak_lengths: 0 and 1
most_frequent_streak_length: 0 (a lone miss ‘M’)
If you were to run the simulation of the independent shooter a second time, how would you expect its streak distribution to compare to the distribution from the question above? Exactly the same? Somewhat similar? Totally different? Explain your reasoning.
sim_basket1 <- sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))
sim_streak1 <- calc_streak(sim_basket1)
sim_streak1
##  [1] 1 0 1 1 3 3 0 0 4 0 2 0 3 0 0 0 0 0 2 0 1 0 1 1 0 1 0 0 4 6 2 0 0 2 2
## [36] 0 0 0 3 0 1 0 0 0 0 0 1 0 3 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 2 1 0 0
## [71] 1 0 1 0 0 0 1
qplot(sim_streak1, colour = I("red"), binwidth = 1)  # I() sets a literal colour instead of mapping it

shooter_streak_distribution (second simulation)
longest_streak: 6
distribution: skewed to the right
shooter_most_common_streak_lengths: 0 and 1
most_frequent_streak_length: 0 (a lone miss ‘M’)

Running the simulation of the independent shooter a second time gives the following:

  • Both distributions are skewed to the right
  • In both, streak length 0 is the most frequent
  • The longest streaks differ (7 in the first run, 6 in the second)
  • The counts at each streak length differ between the two runs

The results are somewhat similar. Both runs draw from the same 45% shooting model, so they share the same overall pattern, but the individual shots are random, so the exact counts change from run to run.
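The run-to-run variation can be seen directly by repeating the simulation many times. The sketch below assumes an arbitrary seed and uses `streak_lengths`, a hypothetical stand-in for `calc_streak` that counts the hits before each miss.

```r
set.seed(42)  # arbitrary seed
outcomes <- c("H", "M")

# Hypothetical stand-in for calc_streak: hits before each miss
streak_lengths <- function(shots) diff(c(0, which(shots == "M"))) - 1

# Longest streak in each of 1000 simulated games of 133 shots
longest <- replicate(1000, {
  shots <- sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))
  max(streak_lengths(shots))
})

table(longest)  # how often each longest-streak value occurs
```

The shape of this table stays similar across seeds, while any single game's longest streak moves around, which is the "somewhat similar" behaviour described above.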

How does Kobe Bryant’s distribution of streak lengths compare to the distribution of streak lengths for the simulated shooter? Using this comparison, do you have evidence that the hot hand model fits Kobe’s shooting patterns? Explain.

To compare Kobe Bryant’s distribution of streak lengths with the distribution for the simulated shooter, we rerun the sim_basket sample with a 50/50 chance of a hit instead of 45/55.

sim_basket2 <- sample(outcomes, size = 133, replace = TRUE)
sim_streak2 <- calc_streak(sim_basket2)
qplot(sim_streak2, colour = I("red"), binwidth = 1)  # I() sets a literal colour instead of mapping it

shooter_streak_distribution (50/50 simulation)
longest_streak: 5
distribution: skewed to the right
shooter_most_common_streak_lengths: 0 and 1
most_frequent_streak_length: 0 (a lone miss ‘M’)
summary(kobe_streak)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.7632  1.0000  4.0000
summary(sim_streak2)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  1.0000  0.8873  1.0000  5.0000
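# Draw the two sets of streak lengths side by side; a list lets the groups differ in length,
# so kobe_streak does not need to be truncated or paired with sim_streak2 by position.
boxplot(list(Kobe = kobe_streak, Simulated = sim_streak2), outline = FALSE, ylab = "streak length")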

The results show the following:

  • Both distributions are skewed to the right
  • In both, streak length 0 is the most frequent
  • The longest streaks differ (4 for Kobe, 5 for the simulated shooter)
  • The counts at each streak length differ between the two

Kobe’s streak lengths are very close to the simulated shooter’s streak lengths. Because the simulated shooter’s shots are independent by construction, this similarity does not provide evidence that the hot hand model fits Kobe’s shooting patterns.
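One way to make this comparison more concrete is to ask how often an independent shooter produces summaries like Kobe's (longest streak 4 and mean streak length 0.7632, both taken from the output above). A minimal sketch, assuming a 45% shooter as in Question 4, an arbitrary seed, and the hypothetical helper `streak_lengths` in place of `calc_streak`:

```r
set.seed(7)  # arbitrary seed
outcomes <- c("H", "M")

# Hypothetical stand-in for calc_streak: hits before each miss
streak_lengths <- function(shots) diff(c(0, which(shots == "M"))) - 1

# Mean and longest streak for 2000 simulated games of 133 independent shots
sims <- replicate(2000, {
  s <- streak_lengths(sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55)))
  c(mean = mean(s), longest = max(s))
})

# Share of simulated games whose longest streak is at least Kobe's (4)
mean(sims["longest", ] >= 4)
# Interval covering the middle 95% of simulated mean streak lengths
quantile(sims["mean", ], c(0.025, 0.975))
```

If Kobe's values fall well inside what the independent shooter produces, the streak data alone do not separate him from that model.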