library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
head(kobe_basket, n = 10)
## # A tibble: 10 × 6
## vs game quarter time description shot
## <fct> <int> <fct> <fct> <fct> <chr>
## 1 ORL 1 1 9:47 Kobe Bryant makes 4-foot two point shot H
## 2 ORL 1 1 9:07 Kobe Bryant misses jumper M
## 3 ORL 1 1 8:11 Kobe Bryant misses 7-foot jumper M
## 4 ORL 1 1 7:41 Kobe Bryant makes 16-foot jumper (Derek Fish… H
## 5 ORL 1 1 7:03 Kobe Bryant makes driving layup H
## 6 ORL 1 1 6:01 Kobe Bryant misses jumper M
## 7 ORL 1 1 4:07 Kobe Bryant misses 12-foot jumper M
## 8 ORL 1 1 0:52 Kobe Bryant misses 19-foot jumper M
## 9 ORL 1 1 0:00 Kobe Bryant misses layup M
## 10 ORL 1 2 6:35 Kobe Bryant makes jumper H
The streak length is defined as the number of consecutive hits (made shots, H) before a miss (missed shot, M). A streak of length 1 is one hit followed by a miss (H M); a streak of length 0 is a miss with no hit before it (M).
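The definition can be made concrete with a small base-R sketch. `streak_lengths()` is a hypothetical helper written for illustration only; it is not the source of openintro's `calc_streak()`. It treats each miss as the end of a streak and, like `calc_streak()` appears to do (the simulated sequence later in this report has 71 misses and 72 streaks), it also counts the run of shots after the last miss as a final streak.

```r
# Hypothetical helper, NOT openintro's calc_streak(): number of hits
# before each miss, plus the run of hits after the last miss
streak_lengths <- function(shots) {
  # Positions of the misses, with sentinels just before the first shot
  # and just after the last one
  miss_pos <- c(0, which(shots == "M"), length(shots) + 1)
  # The shots strictly between two consecutive markers are all hits,
  # so the gap minus 1 is the streak length
  diff(miss_pos) - 1
}

streak_lengths(c("H", "M", "M", "H", "H"))
## [1] 1 0 2
```

The three streaks are H M (length 1), M (length 0) and the trailing H H (length 2). The streak lengths always sum to the total number of hits.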
The distribution of Kobe’s streak lengths from the 2009 NBA Finals is right-skewed, with a shape resembling a geometric distribution: short streaks (length 0 or 1) are more common than longer ones.
If each shot is an independent hit with probability p = 0.45, the streak length L (hits before a miss) follows a geometric distribution with P(L = k) = p^k(1 - p), so the expected streak length is p/(1 - p) = 0.45/0.55 ≈ 0.82. The value 1/p = 1/0.45 ≈ 2.22 is instead the expected number of shots up to and including the first hit, not the streak length. Kobe’s typical streak is short (0 or 1 hits), as the histogram shows.
To find Kobe’s longest streak, we can take the maximum streak length or read it off the histogram; both give 4.
Answers: typical streak length = short (0 or 1), with expected length 0.45/0.55 ≈ 0.82; longest streak length = max(streak lengths) = 4.
kobe_streak <- calc_streak(kobe_basket$shot)
ggplot(data = kobe_streak, aes(x = length)) +
  geom_bar()
streak_summary <- kobe_streak %>%
  summarise("expected streak length" = 0.45 / 0.55,
            "max/longest streak length" = max(length))
streak_summary
## expected streak length max/longest streak length
## 1 0.8181818 4
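The mean streak length under the independent-shooter model can be checked numerically. This sketch assumes every shot is an independent hit with probability p = 0.45; it compares the closed form p/(1 - p) with a truncated sum over the geometric probabilities. Note that R's `dgeom(k, prob)` gives prob * (1 - prob)^k, so the "success" in `dgeom()` terms is a miss and `prob = 1 - p`.

```r
# Assumed independent-shooter model: P(L = k) = p^k * (1 - p)
p <- 0.45

# Closed-form mean number of hits before a miss
p / (1 - p)
## [1] 0.8181818

# Numerical check: sum k * P(L = k); truncating at k = 200 drops a
# negligible tail because p^200 is tiny
k <- 0:200
sum(k * dgeom(k, prob = 1 - p))

# For contrast, 1 / p is the mean number of shots up to and including
# the first hit, a different quantity
1 / p
## [1] 2.222222
```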
In my simulation, 21 flips came up heads.
coin_outcomes <- c("heads", "tails")
# Set the seed so the simulation gives the same outcome each time
set.seed(245824)
sim_unfair_coin <- sample(coin_outcomes, size = 100, replace = TRUE,
                          prob = c(0.2, 0.8))
table(sim_unfair_coin)
## sim_unfair_coin
## heads tails
## 21 79
To simulate a shooter with a 45% hit rate, add the argument prob = c(0.45, 0.55) to sample(), so that the chance of a hit (H) is 45% and the chance of a miss (M) is 55%.
shot_outcomes <- c("H", "M")
sim_basket <- sample(shot_outcomes,
size = 133,
replace = TRUE,
prob = c(0.45, 0.55)
)
table(sim_basket)
## sim_basket
## H M
## 62 71
sim_streak <- calc_streak(sim_basket)
glimpse(sim_streak)
## Rows: 72
## Columns: 1
## $ length <dbl> 0, 2, 1, 0, 1, 2, 4, 0, 3, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, …
The distribution is right-skewed and resembles a geometric distribution: short streaks occur most often, and the most common streak length is 0 (40 of the 72 streaks). For the simulated shooter the expected streak length is p/(1 - p) = 0.45/0.55 ≈ 0.82.
To find the longest streak, we can take the maximum streak length or read it off the histogram; both give 5.
Answers: typical streak length = short (0 or 1), with expected length 0.45/0.55 ≈ 0.82; longest streak length = max(streak lengths) = 5.
ggplot(data = sim_streak, aes(x = length)) +
  geom_bar()
sim_streak_summary <- sim_streak %>%
  summarise("expected streak length" = 0.45 / 0.55,
            "max/longest streak length" = max(length))
sim_streak_summary
## expected streak length max/longest streak length
## 1 0.8181818 5
I expect the distribution to be somewhat similar for the following reasons:
Randomness: For an independent shooter, each shot’s outcome is determined by chance, so I expect some variation in the streak distribution from one simulation to the next. The distribution will therefore not be exactly the same.
Probability: Because the streak distribution is governed by the probability of a hit, it will not be completely different either, since the second simulation keeps the same probability of success.
Law of Large Numbers: As more shots are simulated with the same parameters, the overall streak distribution converges toward the distribution expected from the specified probability. Individual simulations can still vary because each shot is random.
Note: The answers above assume that exercises 4 to 7 are run without first running exercise 3, which contains the set.seed() call.
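The law-of-large-numbers point can be illustrated by simulating progressively longer shot sequences and watching the mean streak length settle near p/(1 - p) ≈ 0.82. This is a sketch under stated assumptions: shots are independent with p = 0.45, the seed is arbitrary, and `mean_streak()` is a hypothetical helper that splits streaks at misses in the same way as the illustrative `streak_lengths()` idea, not a function from openintro.

```r
# Assumptions: independent shots with hit probability p = 0.45
set.seed(1)  # arbitrary seed, used only for reproducibility
p <- 0.45

# Hypothetical helper: simulate n_shots shots and return the mean
# number of hits per streak (streaks are split at misses)
mean_streak <- function(n_shots) {
  shots <- sample(c("H", "M"), size = n_shots, replace = TRUE,
                  prob = c(p, 1 - p))
  miss_pos <- c(0, which(shots == "M"), n_shots + 1)
  mean(diff(miss_pos) - 1)
}

# Short sequences vary a lot; long ones should land close to p / (1 - p)
means <- sapply(c(133, 1000, 10000, 100000), mean_streak)
means
```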
sim_streak %>% group_by(length) %>% summarise("frequency of streak length" = n())
## # A tibble: 6 × 2
## length `frequency of streak length`
## <dbl> <int>
## 1 0 40
## 2 1 15
## 3 2 9
## 4 3 4
## 5 4 3
## 6 5 1
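One way to check that the simulated streaks look geometric is to set the observed frequencies beside the counts the model predicts. The observed counts below are copied from the frequency table above; the model P(L = k) = (1 - p) * p^k with p = 0.45 is the assumed independent-shooter model, evaluated with R's `dgeom()`.

```r
# Observed counts of streak lengths 0 to 5, from the table above
observed <- c(40, 15, 9, 4, 3, 1)
n_streaks <- sum(observed)  # 72 streaks, matching glimpse()

# Assumed model: P(L = k) = (1 - p) * p^k, i.e. dgeom(k, prob = 1 - p)
p <- 0.45
expected <- n_streaks * dgeom(0:5, prob = 1 - p)

data.frame(length = 0:5,
           observed = observed,
           expected = round(expected, 1))
```

For example, the model predicts 72 * 0.55 = 39.6 streaks of length 0, close to the 40 observed. As a consistency check, sum(observed * 0:5) is 62, the number of hits in the simulated sequence.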
Kobe Bryant’s distribution of streak lengths closely resembles the distribution for the simulated independent shooter, so these data provide no evidence of a hot hand in his shooting. The resemblance is what independence would predict: the simulated shooter’s hit probability was set to 0.45, Kobe’s approximate hit rate, and both sequences contain 133 shots.
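A complementary way to look for a hot hand is to compare the hit rate immediately after a hit with the hit rate immediately after a miss; for an independent shooter the two should be about equal. The sketch below is an illustration, not part of the original analysis: `conditional_hit_rates()` is a hypothetical helper, and it is demonstrated on a long simulated independent sequence with an arbitrary seed.

```r
# Hypothetical helper: hit rate on the shot following a hit vs. the
# shot following a miss
conditional_hit_rates <- function(shots) {
  prev <- head(shots, -1)  # shots 1 .. n-1
  curr <- tail(shots, -1)  # shots 2 .. n (the shot that follows)
  c(after_hit  = mean(curr[prev == "H"] == "H"),
    after_miss = mean(curr[prev == "M"] == "H"))
}

set.seed(2)  # arbitrary seed
sim <- sample(c("H", "M"), size = 100000, replace = TRUE,
              prob = c(0.45, 0.55))
rates <- conditional_hit_rates(sim)
rates  # both rates should be close to 0.45
```

The same function could be applied to `kobe_basket$shot`, with the caveat that the vector spans several games, so a few consecutive pairs cross game boundaries, and with only 133 shots both rates will be noisy.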