Probability

Getting Started

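load("more/kobe.RData")  # provides the kobe data and the calc_streak() helper
kobe_streak <- calc_streak(kobe$basket)  # length of each run of hits

barplot(table(kobe_streak))  # number of streaks of each length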

What does a streak length of 1 mean, i.e. how many hits and misses are in a streak of 1? What about a streak length of 0?

A streak of length 1 means one hit followed by a miss.
A streak of length 0 means a miss with no hit before it, i.e. a miss that immediately follows another miss.
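To make these definitions concrete, here is a small sketch on a hand-made shot sequence. `streak_lengths()` is a hypothetical stand-in written for illustration, not the lab's `calc_streak`; it assumes streaks are runs of hits delimited by misses, with the start and end of the sequence also acting as boundaries.

```r
# Hypothetical stand-in for calc_streak(): lengths of hit runs between misses.
# Assumption: both ends of the sequence act as boundaries, so a miss with no
# hit before it (e.g. right after another miss) yields a streak of length 0.
streak_lengths <- function(shots) {
  is_hit <- as.integer(shots == "H")
  padded <- c(0L, is_hit, 0L)        # treat both ends like misses
  boundaries <- which(padded == 0L)  # positions of misses plus the two ends
  diff(boundaries) - 1               # hits between consecutive boundaries
}

shots <- c("M", "H", "M", "M", "H", "H", "H")
streak_lengths(shots)
# [1] 0 1 0 3
# 0: opening miss, 1: "H M", 0: miss right after a miss, 3: final "H H H"
```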

Describe the distribution of Kobe’s streak lengths from the 2009 NBA finals. What was his typical streak length? How long was his longest streak of baskets?

The distribution is skewed to the right.
About 35% of the time Kobe's streak had length 0 (a miss right after a miss).
Over 20% of the time Kobe hit one shot and then missed the next (streak length 1).
About 5% of his streaks had length 2, and about 5% had length 3.
Only about 1-2% of his streaks had length 4, i.e. four hits in a row.
His typical streak length was 0, and his longest streak was 4 baskets.
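The barplot shows counts, so percentages come from dividing by the total number of streaks. A minimal sketch of doing that directly; `summarize_streaks()` is a hypothetical helper, and the toy vector stands in for `kobe_streak`.

```r
# Hypothetical helper: streak lengths as proportions rather than raw counts,
# plus the typical (most common) and longest streak
summarize_streaks <- function(streaks) {
  counts <- table(streaks)
  list(
    proportions = round(prop.table(counts), 3),
    typical = as.numeric(names(counts)[which.max(counts)]),
    longest = max(streaks)
  )
}

# Usage with the lab data (needs kobe_streak from above):
# summarize_streaks(kobe_streak)

# Toy example: four streaks of length 0, two of length 1, one of length 3
summarize_streaks(c(0, 0, 0, 0, 1, 1, 3))
```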

outcomes <- c("heads", "tails")
sample(outcomes, size = 1, replace = TRUE)

## [1] "tails"

sim_fair_coin <- sample(outcomes, size = 100, replace = TRUE)
# prob is matched to outcomes in order: P(heads) = 0.2, P(tails) = 0.8
sim_unfair_coin <- sample(outcomes, size = 100, replace = TRUE, prob = c(0.2, 0.8))

In your simulation of flipping the unfair coin 100 times, how many flips came up heads?

76 times
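Because `prob = c(0.2, 0.8)` gives heads probability 0.2, the expected number of heads in 100 flips is 100 × 0.2 = 20. A sketch of counting the heads directly (the seed is an arbitrary choice, used only to make the run reproducible):

```r
outcomes <- c("heads", "tails")
set.seed(1)  # arbitrary seed, only for reproducibility
sim_unfair_coin <- sample(outcomes, size = 100, replace = TRUE, prob = c(0.2, 0.8))

table(sim_unfair_coin)                       # counts of heads and tails
n_heads <- sum(sim_unfair_coin == "heads")   # number of heads in this run
n_heads
expected_heads <- 100 * 0.2                  # 20 under P(heads) = 0.2
```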

Simulating the Independent Shooter

What change needs to be made to the sample function so that it reflects a shooting percentage of 45%? Make this adjustment, then run a simulation to sample 133 shots. Assign the output of this simulation to a new object called sim_basket.

Set prob = c(0.45, 0.55) so that a hit ("H") has probability 0.45 and a miss ("M") has probability 0.55.

outcomes <- c("H", "M")
# Each shot is independent: P(H) = 0.45, P(M) = 0.55
sim_basket <- sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))
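A quick sanity check on the simulation, sketched under the same setup: one 133-shot run's hit rate wanders around 0.45, while the average over many runs settles close to it. The seed and the 10,000 replications are arbitrary choices.

```r
outcomes <- c("H", "M")
set.seed(2009)  # arbitrary seed for reproducibility
sim_basket <- sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))

# Hit rate in this single run (varies from run to run)
mean(sim_basket == "H")

# Hit rate averaged over many independent 133-shot runs
long_run <- replicate(10000, {
  shots <- sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))
  mean(shots == "H")
})
mean(long_run)  # close to 0.45
```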

On your own

Comparing Kobe Bryant to the Independent Shooter

Using calc_streak, compute the streak lengths of sim_basket.

Describe the distribution of streak lengths. What is the typical streak length for this simulated independent shooter with a 45% shooting percentage? How long is the player’s longest streak of baskets in 133 shots?

library(ggplot2)  # used for the side-by-side plot below
normal_hand_streak <- calc_streak(sim_basket)
table(normal_hand_streak)

## normal_hand_streak
##  0  1  2  3  4  5  6  8 
## 35 12  7  4  2  2  1  1

barplot(table(normal_hand_streak))

The most common (typical) streak length is 0: 35 of the 64 streaks (about 55%) have length 0.
The distribution is right-skewed.
Very few streaks are 4 or longer: 6 of 64 (about 9%).
The longest streak was 8 baskets in a row.
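Under the independent-shooter model each streak ends at the first miss, so a streak has length exactly k with probability 0.45^k × 0.55, a geometric distribution (R's `dgeom(k, prob = 0.55)` computes the same quantity). A sketch comparing those probabilities with the simulated table above; the observed counts are copied from that output.

```r
p_hit <- 0.45
k <- 0:8
theoretical <- dgeom(k, prob = 1 - p_hit)  # P(streak = k) = 0.45^k * 0.55

# Observed counts copied from table(normal_hand_streak) above (no streak of 7)
observed_counts <- c(35, 12, 7, 4, 2, 2, 1, 0, 1)
observed <- observed_counts / sum(observed_counts)  # 64 streaks in total

round(cbind(k, theoretical, observed), 3)
```

For length 0 the model gives 0.55 against an observed 35/64 ≈ 0.547; the sparse counts in the tail are where a single run deviates most.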

Graph two shooters side by side

# Stack the two streak vectors into one long data frame. The vectors can have
# different lengths, so nothing needs to be truncated or padded.
shot_comparison <- rbind(
  data.frame(shooter = "Kobe", streak = kobe_streak),
  data.frame(shooter = "Independent shooter", streak = normal_hand_streak)
)

ggplot(shot_comparison, aes(x = streak, fill = shooter)) +
  geom_histogram(binwidth = 1, colour = "black", position = "dodge")
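The dodged histogram compares raw counts, and the two shooters need not have the same number of streaks. A sketch of a proportion-based comparison; `compare_streaks()` is a hypothetical helper, and the toy vectors stand in for `kobe_streak` and `normal_hand_streak`.

```r
# Hypothetical helper: share of streaks at each length for two shooters,
# aligned on a common set of lengths so the rows line up
compare_streaks <- function(a, b, labels = c("first", "second")) {
  all_lengths <- 0:max(c(a, b))
  props <- cbind(
    prop.table(table(factor(a, levels = all_lengths))),
    prop.table(table(factor(b, levels = all_lengths)))
  )
  colnames(props) <- labels
  round(props, 3)
}

# Usage with the lab objects:
# compare_streaks(kobe_streak, normal_hand_streak, c("Kobe", "Independent"))

compare_streaks(c(0, 0, 1, 2), c(0, 1, 1, 4), c("A", "B"))
```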

If you were to run the simulation of the independent shooter a second time, how would you expect its streak distribution to compare to the distribution from the question above? Exactly the same? Somewhat similar? Totally different? Explain your reasoning.

I would expect the results to be somewhat similar:
- Streaks of length 0 (a miss right after a miss) should land closest to their expected share, since they are the most common outcome.
- Streaks of 3 or more make up a small share of all streaks, so their counts swing a lot at this sample size. Under the independent model a streak has length exactly 6 with probability 0.45^6 × 0.55 ≈ 0.46%, so with about 64 streaks per 133 shots we would expect only about 0.3 streaks of length 6 per run. Whether one appears is largely luck: Kobe's longest streak was 4, while the simulation above produced one streak of 6 and one of 8.
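One way to check this expectation is to rerun the simulation many times and watch which features stay stable. A sketch, assuming the hypothetical `streak_lengths()` stand-in below counts streaks the same way as `calc_streak`; the seed and the 1,000 runs are arbitrary.

```r
# Hypothetical stand-in for calc_streak(): hit runs delimited by misses,
# with both ends of the sequence acting as boundaries
streak_lengths <- function(shots) {
  padded <- c(0L, as.integer(shots == "H"), 0L)
  diff(which(padded == 0L)) - 1
}

set.seed(42)  # arbitrary seed for reproducibility
runs <- replicate(1000, {
  shots <- sample(c("H", "M"), 133, replace = TRUE, prob = c(0.45, 0.55))
  s <- streak_lengths(shots)
  c(share_zero = mean(s == 0), any_six_plus = any(s >= 6))
})

# The share of length-0 streaks is fairly stable across runs...
summary(runs["share_zero", ])
# ...while whether any streak of 6+ appears changes from run to run
mean(runs["any_six_plus", ])
```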

How does Kobe Bryant’s distribution of streak lengths compare to the distribution of streak lengths for the simulated shooter? Using this comparison, do you have evidence that the hot hand model fits Kobe’s shooting patterns? Explain.

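Kobe's streak-length distribution looks much like that of a simulated independent shooter with a 45% shooting percentage: both are right-skewed, length 0 is the most common streak, and long streaks are rare (Kobe's longest was 4, shorter than the simulated shooter's 8). This comparison gives no evidence that the hot hand model fits Kobe's shooting in the 2009 NBA Finals. Sorry, Kobe.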
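A complementary check targets the hot hand idea directly: if it held, the hit rate right after a hit should be noticeably higher than right after a miss, whereas an independent shooter has about the same rate either way. A sketch with a hypothetical helper, `conditional_hit_rates()`, demonstrated on simulated shots; it assumes shots are coded "H"/"M" as in `kobe$basket`.

```r
# Hypothetical helper: hit rate after a hit vs. after a miss
conditional_hit_rates <- function(shots) {
  prev <- head(shots, -1)  # every shot except the last
  nxt  <- tail(shots, -1)  # every shot except the first
  c(after_hit  = mean(nxt[prev == "H"] == "H"),
    after_miss = mean(nxt[prev == "M"] == "H"))
}

# Usage with the lab data: conditional_hit_rates(kobe$basket)

set.seed(7)  # arbitrary seed for reproducibility
sim <- sample(c("H", "M"), 10000, replace = TRUE, prob = c(0.45, 0.55))
conditional_hit_rates(sim)  # both rates near 0.45 for an independent shooter
```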