It is widely believed in the sports world, especially in basketball, that the hot hand exists. This lab does not set out to prove or refute the hot hand phenomenon. Instead, it focuses on understanding whether events are dependent or independent, learning to simulate shooting streaks, and comparing the simulation to actual data to assess whether the hot hand exists.
Load the data into the kobe data frame.
load("more/kobe.RData")
head(kobe)
##    vs game quarter time
## 1 ORL    1       1 9:47
## 2 ORL    1       1 9:07
## 3 ORL    1       1 8:11
## 4 ORL    1       1 7:41
## 5 ORL    1       1 7:03
## 6 ORL    1       1 6:01
##                                               description basket
## 1                 Kobe Bryant makes 4-foot two point shot      H
## 2                               Kobe Bryant misses jumper      M
## 3                        Kobe Bryant misses 7-foot jumper      M
## 4 Kobe Bryant makes 16-foot jumper (Derek Fisher assists)      H
## 5                         Kobe Bryant makes driving layup      H
## 6                               Kobe Bryant misses jumper      M
Q 1: What does a streak length of 1 mean, i.e. how many hits and misses are in a streak of 1? What about a streak length of 0?
A: A hit streak of length 1 can be illustrated by the sequence miss, hit, miss.
Taken together, these three events mark out a hit streak of length 1: a single hit bounded by misses. The streak itself contains 1 hit and is ended by 1 miss; the earlier miss closes the previous streak.
A hit streak of length 0 contains no hits: it is a miss that immediately follows another miss (miss, miss), so it consists of 1 miss and 0 hits.
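The streak definition can be sketched in a few lines of R. The helper `streak_lengths()` below is hypothetical and written only for illustration; it is assumed to mirror what the lab's `calc_streak()` does (count the hits between consecutive misses), whose source is not shown here.

```r
# Hypothetical helper, NOT the lab's calc_streak(): counts hits between misses.
streak_lengths <- function(shots) {
  is_hit <- as.integer(shots == "H")
  # Pad with a miss at each end so the first and last streaks are closed off
  padded <- c(0L, is_hit, 0L)
  miss_positions <- which(padded == 0L)
  # Number of hits between two consecutive misses
  diff(miss_positions) - 1L
}

# First six shots from head(kobe): H M M H H M
shots <- c("H", "M", "M", "H", "H", "M")
streak_lengths(shots)
# 1 0 2 0
```

The first three values (1, 0, 2) match the start of Kobe's streak vector. The final 0 appears only because this short example ends on a miss and the padding adds another miss after it.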
Q 2: Describe the distribution of Kobe’s streak lengths from the 2009 NBA finals. What was his typical streak length? How long was his longest streak of baskets?
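A: The following is a numerical and visual summary of the distribution of Kobe’s streak lengths. The distribution is right-skewed: the typical (median) streak length is 0, and his longest streak is 4 baskets.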
kobe_streak <- calc_streak(kobe$basket)
kobe_streak
## [1] 1 0 2 0 0 0 3 2 0 3 0 1 3 0 0 0 0 0 1 1 0 4 1 0 1 0 1 0 1 2 0 1 2 1 0
## [36] 0 1 0 0 0 1 1 0 1 0 2 0 0 0 3 0 1 0 1 2 1 0 1 0 0 1 3 3 1 1 0 0 0 0 0
## [71] 1 1 0 0 0 1
summary(kobe_streak)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.7632 1.0000 4.0000
barplot(table(kobe_streak))
Q 3: In your simulation of flipping the unfair coin 100 times, how many flips came up heads?
A:
#Possible outcomes of coin flips
outcomes <- c("heads", "tails")
# Run the simulation of 100 flips. The coin is unfair: P(heads) = 0.2, P(tails) = 0.8.
# On average about 20 of 100 flips land heads, but any single run will vary.
sim_unfair_coin <- sample(outcomes, size = 100, replace = TRUE, prob = c(0.2, 0.8))
# summarize the landings
sumTable <- table(sim_unfair_coin)
sumTable
## sim_unfair_coin
## heads tails
## 32 68
Heads came up 32 times in this run.
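As a rough guide to how far a single run can land from 20 heads, the expected count and its standard deviation follow from the binomial model. This is a minimal sketch; the variable names are illustrative and not part of the lab.

```r
n_flips <- 100   # number of simulated flips
p_heads <- 0.2   # probability of heads for the unfair coin

# Binomial mean and standard deviation of the number of heads
expected_heads <- n_flips * p_heads
sd_heads <- sqrt(n_flips * p_heads * (1 - p_heads))

expected_heads  # 20
sd_heads        # 4
```

Calling `set.seed()` with any fixed integer before `sample()` makes a particular run reproducible.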
Q 4: What change needs to be made to the sample function so that it reflects a shooting percentage of 45%? Make this adjustment, then run a simulation to sample 133 shots. Assign the output of this simulation to a new object called sim_basket.
A: By default, sample() draws each outcome with equal probability, so without a prob argument the shooter would hit 50% of the time. To reflect a 45% shooting percentage, pass prob = c(0.45, 0.55), so that the probability of a hit is 0.45 and of a miss is 0.55, and set the sample size to 133.
#Possible outcomes of shooting a basket
outcomes <- c("hit", "miss")
# Run the simulation for 133 shots. Probability of a hit is 45% and of a miss is 55%.
sim_basket <- sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))
# summarize the outcomes
sumTable <- table(sim_basket)
sumTable
## sim_basket
## hit miss
## 60 73
In this run, a hit was recorded 60 times and a miss 73 times.
On your own
Q 5: Describe the distribution of streak lengths. What is the typical streak length for this simulated independent shooter with a 45% shooting percentage? How long is the player’s longest streak of baskets in 133 shots?
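A: For the run shown below, the distribution is right-skewed, with most streaks of length 0 or 1. The typical (median) streak length is 1, and the longest streak is 5 baskets.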
#Possible outcomes of shooting a basket
outcomes <- c("H", "M")
# Run the simulation for 133 shots. Probability of a hit is 45% and of a miss is 55%.
sim_basket <- sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))
ind_streak <- calc_streak(sim_basket)
summary(ind_streak)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 1.0000 0.9143 1.0000 5.0000
barplot(table(ind_streak))
Q 6: If you were to run the simulation of the independent shooter a second time, how would you expect its streak distribution to compare to the distribution from the question above? Exactly the same? Somewhat similar? Totally different? Explain your reasoning.
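A: Each run of the simulation draws a new random sample, so the exact streak counts change from run to run. Across repeated runs, I observed the longest streak of baskets ranging from 3 to 11. Because every run uses the same hit probability and independent shots, the overall shape of the distribution stays somewhat similar (mostly streaks of 0 and 1, with a right skew) rather than exactly the same or totally different.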
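One way to see how much the longest streak varies between runs is to repeat the simulation many times. The sketch below uses base R's `rle()` rather than the lab's `calc_streak()`; the function name `longest_hit_streak()`, the seed, and the 1000 repetitions are arbitrary choices made for illustration.

```r
# Longest run of hits in one simulated game of independent shots
longest_hit_streak <- function(n_shots = 133, p_hit = 0.45) {
  shots <- sample(c("H", "M"), size = n_shots, replace = TRUE,
                  prob = c(p_hit, 1 - p_hit))
  runs <- rle(shots)
  # c(0, ...) covers the unlikely case of a game with no hits at all
  max(c(0, runs$lengths[runs$values == "H"]))
}

set.seed(1)  # arbitrary seed, used only so the sketch is reproducible
longest <- replicate(1000, longest_hit_streak())

# How often each longest-streak value occurs across the 1000 simulated games
table(longest)
```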
Q 7: How does Kobe Bryant’s distribution of streak lengths compare to the distribution of streak lengths for the simulated shooter? Using this comparison, do you have evidence that the hot hand model fits Kobe’s shooting patterns? Explain.
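A: For this comparison, the simulated shooter is rerun with equal probabilities of a hit and a miss (50%), which is the default behavior of sample(). Note that Kobe’s streak data imply 58 hits in 133 shots (about 44%), so the 45% simulation from Q 5 is the closer benchmark; a 50% shooter is slightly more likely to produce long streaks.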
#Possible outcomes of shooting a basket
outcomes <- c("H", "M")
# Run the simulation for 133 shots. Hit and miss are equally likely (50% each).
sim_basket <- sample(outcomes, size = 133, replace = TRUE)
ind_streak <- calc_streak(sim_basket)
summary(ind_streak)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.8873 1.0000 6.0000
summary(kobe_streak)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.7632 1.0000 4.0000
# Kobe streak bar plot
barplot(table(kobe_streak))
# Simulated shooter streak bar plot
barplot(table(ind_streak))
Kobe Bryant’s distribution of streak lengths is similar to that of the simulated shooter. The graphs are consistent with each shot being independent: the outcome of one shot does not appear to have any impact on the next. Based on this comparison, there is no evidence that Kobe Bryant’s shooting pattern fits the hot hand model.
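The comparison can also be made against theory. For an independent shooter with hit probability p, the chance of a streak of exactly k hits is p^k (1 - p), a geometric distribution (ignoring the cutoff at the final shot). The sketch below sets Kobe's observed streak proportions beside those probabilities. The streak values are copied from the kobe_streak output printed earlier, and the 0.45 hit probability is the one used in Q 4 and Q 5; the other names are illustrative.

```r
# Streak lengths copied from the kobe_streak output shown earlier
kobe_streak <- c(
  1, 0, 2, 0, 0, 0, 3, 2, 0, 3, 0, 1, 3, 0, 0, 0, 0, 0, 1, 1, 0, 4, 1, 0, 1,
  0, 1, 0, 1, 2, 0, 1, 2, 1, 0,
  0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 2, 0, 0, 0, 3, 0, 1, 0, 1, 2, 1, 0, 1, 0, 0,
  1, 3, 3, 1, 1, 0, 0, 0, 0, 0,
  1, 1, 0, 0, 0, 1
)

p_hit <- 0.45                        # assumed shooting percentage (from Q 4)
lengths_seen <- 0:max(kobe_streak)   # streak lengths 0, 1, ..., longest

# Observed proportion of streaks at each length
observed <- as.numeric(table(factor(kobe_streak, levels = lengths_seen))) /
  length(kobe_streak)

# dgeom(k, prob) is the probability of k "failures" before the first "success";
# here a hit extends the streak and a miss (probability 1 - p_hit) ends it
expected <- dgeom(lengths_seen, prob = 1 - p_hit)

comparison <- data.frame(streak = lengths_seen,
                         observed = round(observed, 3),
                         expected = round(expected, 3))
comparison
```

A clear excess of long streaks in the observed column is the kind of pattern a hot hand would produce. Judging whether any gap is larger than chance variation would need a formal test, which is beyond this sketch.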