Probability
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.5 v dplyr 1.0.3
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
Exercise 1: What does a streak length of 1 mean, i.e. how many hits and misses are in a streak of 1? What about a streak length of 0?
A streak of length 1 consists of 1 hit followed by 1 miss. A streak of length 0 consists of 0 hits and 1 miss, which happens when a miss directly follows another miss.
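The counting rule above can be checked on a short shot sequence. The sketch below is a base R illustration only: `streak_lengths()` is a hypothetical helper written for this example, not the `calc_streak()` function from openintro, and it assumes shots are coded as "H" and "M".

```r
# Hypothetical helper (not part of openintro): count the hits before each miss.
streak_lengths <- function(shots) {
  # Pad with a miss on both ends so the first and last streaks are closed off.
  is_miss <- c(TRUE, shots == "M", TRUE)
  miss_pos <- which(is_miss)
  # Hits between two consecutive misses = gap between their positions minus 1.
  diff(miss_pos) - 1
}

shots <- c("H", "M", "M", "H", "H", "M", "H")
streak_lengths(shots)
## [1] 1 0 2 1
```

Here "H M" is a streak of 1, the lone "M" that follows is a streak of 0, "H H M" is a streak of 2, and the final "H" is counted as a streak of 1 because of the padding.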
Exercise 2: Describe the distribution of Kobe’s streak lengths from the 2009 NBA finals. What was his typical streak length? How long was his longest streak of baskets? Make sure to include the accompanying plot in your answer.
kobe_streak <- calc_streak(kobe_basket$shot)
ggplot(data = kobe_streak, aes(x = length)) +
  geom_bar()

Kobe’s typical streak length in the 2009 NBA finals was 0, and his longest streak of baskets was 4. The overall distribution is right-skewed.
Exercise 3: In your simulation of flipping the unfair coin 100 times, how many flips came up heads? Include the code for sampling the unfair coin in your response. Since the markdown file will run the code, and generate a new sample each time you Knit it, you should also “set a seed” before you sample. Read more about setting a seed below.
set.seed(64588)
coin_outcomes <- c("heads", "tails")
sim_unfair_coin <- sample(coin_outcomes, size = 100, replace = TRUE,
                          prob = c(0.2, 0.8))
sim_unfair_coin
## [1] "tails" "heads" "tails" "tails" "tails" "tails" "heads" "tails" "tails"
## [10] "tails" "tails" "tails" "tails" "tails" "tails" "heads" "tails" "tails"
## [19] "tails" "tails" "tails" "tails" "tails" "heads" "tails" "heads" "heads"
## [28] "tails" "tails" "tails" "heads" "tails" "tails" "tails" "tails" "tails"
## [37] "tails" "tails" "tails" "tails" "tails" "tails" "heads" "tails" "tails"
## [46] "tails" "tails" "heads" "tails" "heads" "tails" "tails" "heads" "tails"
## [55] "heads" "tails" "tails" "tails" "tails" "tails" "heads" "tails" "tails"
## [64] "heads" "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails"
## [73] "heads" "tails" "tails" "tails" "tails" "heads" "tails" "tails" "tails"
## [82] "tails" "tails" "heads" "tails" "tails" "tails" "tails" "tails" "tails"
## [91] "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails"
## [100] "tails"
table(sim_unfair_coin)
## sim_unfair_coin
## heads tails
## 17 83
In my simulation, 17 coin flips came up heads.
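As a rough sanity check, the observed count can be compared with what a binomial model predicts for 100 flips with a 0.2 chance of heads. This is a small sketch of the arithmetic; the variable names are made up for the example, and the observed count of 17 is copied from the table above.

```r
n_flips <- 100
p_heads <- 0.2

expected_heads <- n_flips * p_heads                  # 100 * 0.2 = 20
sd_heads <- sqrt(n_flips * p_heads * (1 - p_heads))  # sqrt(16) = 4

observed_heads <- 17                                 # from table(sim_unfair_coin)
z <- (observed_heads - expected_heads) / sd_heads    # (17 - 20) / 4 = -0.75
z
```

A result of 17 heads is less than one standard deviation below the expected 20, so it is an unremarkable outcome for this unfair coin.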
Exercise 4: What change needs to be made to the sample function so that it reflects a shooting percentage of 45%? Make this adjustment, then run a simulation to sample 133 shots. Assign the output of this simulation to a new object called sim_basket.
To reflect a 45% shooting percentage, the prob argument of sample() must be set to c(0.45, 0.55), so that "H" is drawn with probability 0.45 and "M" with probability 0.55.
shot_outcomes <- c("H", "M")
sim_basket <- sample(shot_outcomes, size = 133, replace = TRUE,
                     prob = c(0.45, 0.55))
sim_basket
## [1] "M" "M" "H" "H" "M" "H" "M" "H" "H" "H" "M" "H" "H" "H" "M" "M" "H" "H"
## [19] "H" "M" "M" "M" "H" "H" "H" "H" "H" "M" "M" "M" "H" "H" "M" "H" "M" "M"
## [37] "M" "M" "M" "M" "H" "M" "M" "H" "M" "H" "M" "H" "H" "M" "M" "M" "H" "M"
## [55] "M" "M" "M" "M" "H" "M" "H" "M" "M" "H" "H" "M" "M" "H" "H" "H" "M" "H"
## [73] "M" "M" "M" "M" "H" "M" "H" "H" "M" "M" "H" "H" "M" "M" "H" "M" "M" "H"
## [91] "H" "H" "H" "H" "M" "H" "H" "H" "M" "H" "M" "M" "H" "H" "M" "H" "M" "H"
## [109] "H" "M" "M" "M" "M" "M" "M" "H" "M" "M" "M" "M" "H" "H" "M" "M" "M" "H"
## [127] "H" "H" "M" "M" "M" "M" "M"
table(sim_basket)
## sim_basket
## H M
## 60 73
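To see how close this sample came to the intended 45%, the counts can be converted to proportions. The sketch below simply re-enters the counts from the table above; `shot_counts` and `hit_share` are names made up for the example.

```r
shot_counts <- c(H = 60, M = 73)             # counts from table(sim_basket)
hit_share <- shot_counts / sum(shot_counts)  # proportions out of 133 shots
round(hit_share, 3)
##     H     M
## 0.451 0.549
```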
Exercise 5: Using calc_streak, compute the streak lengths of sim_basket, and save the results in a data frame called sim_streak.
sim_streak <- calc_streak(sim_basket)
Exercise 6: Describe the distribution of streak lengths. What is the typical streak length for this simulated independent shooter with a 45% shooting percentage? How long is the player’s longest streak of baskets in 133 shots? Make sure to include a plot in your answer.
ggplot(data = sim_streak, aes(x = length)) +
  geom_bar()

The simulated independent shooter’s typical streak length is 0. The longest streak of baskets in the shooter’s 133 shots is 5.
Exercise 7: If you were to run the simulation of the independent shooter a second time, how would you expect its streak distribution to compare to the distribution from the question above? Exactly the same? Somewhat similar? Totally different? Explain your reasoning.
I would expect a second run of the simulation to give a somewhat similar, but not identical, streak distribution. Each run draws 133 independent shots with the same 45% hit probability, so the overall shape should stay the same: right-skewed, with short streaks (especially 0) most common. Because the shots are random, however, the exact count at each streak length and the length of the longest streak will vary from run to run, unless the same seed is set immediately before sampling.
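The run-to-run variation can be illustrated by simulating the shooter twice and tabulating both sets of streaks. This is a minimal base R sketch: `streaks()` is a hypothetical stand-in for `calc_streak()`, `simulate_streaks()` is a helper invented for this example, and the seeds 1 and 2 are arbitrary.

```r
# Hypothetical stand-in for calc_streak(): hits between consecutive misses,
# with a miss padded onto each end of the sequence.
streaks <- function(shots) diff(which(c("M", shots, "M") == "M")) - 1

# Simulate one independent shooter and return the streak lengths.
simulate_streaks <- function(seed, n_shots = 133, p_hit = 0.45) {
  set.seed(seed)  # arbitrary seed, used only to make the example repeatable
  shots <- sample(c("H", "M"), size = n_shots, replace = TRUE,
                  prob = c(p_hit, 1 - p_hit))
  streaks(shots)
}

run_a <- simulate_streaks(1)
run_b <- simulate_streaks(2)

table(run_a)  # streak-length counts for the first run
table(run_b)  # streak-length counts for the second run
```

Comparing the two tables shows whether the overall shape is shared while the individual counts differ.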
Exercise 8: How does Kobe Bryant’s distribution of streak lengths compare to the distribution of streak lengths for the simulated shooter? Using this comparison, do you have evidence that the hot hand model fits Kobe’s shooting patterns? Explain.
Visually, Kobe Bryant’s distribution is very similar to the simulated shooter’s distribution. Since the simulated shooter’s shots are independent, this similarity means I do not have evidence that the hot hand model fits Kobe’s shooting patterns. For example, a streak length of 0 occurs about 38 times in Kobe’s distribution and about 43 times in the simulated distribution. The differences in streak-length counts between the two graphs are not large enough for me to confidently support the hot hand model for Kobe.
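One way to go beyond a visual comparison is to repeat the independent-shooter simulation many times and see how often its longest streak reaches Kobe’s longest streak of 4. The sketch below is only an illustration under stated assumptions: `longest_hit_run()` is a hypothetical helper, the seed 2009 and the 1000 repetitions are arbitrary choices, and the 133 shots and 45% hit probability are taken from the simulation above.

```r
# Hypothetical helper: length of the longest run of consecutive hits.
longest_hit_run <- function(shots) {
  runs <- rle(shots)
  max(c(0, runs$lengths[runs$values == "H"]))  # 0 if there are no hits
}

set.seed(2009)  # arbitrary seed, for repeatability only
n_sims <- 1000  # arbitrary number of repetitions

longest <- replicate(n_sims, {
  shots <- sample(c("H", "M"), size = 133, replace = TRUE,
                  prob = c(0.45, 0.55))
  longest_hit_run(shots)
})

table(longest)      # distribution of the longest streak across simulations
mean(longest >= 4)  # share of simulations with a longest streak of at least 4
```

If a longest streak of 4 or more turns out to be common for the independent shooter, Kobe’s longest streak gives no additional support for the hot hand model.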