library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.7
## v tidyr 1.1.4 v stringr 1.4.0
## v readr 2.1.1 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
glimpse(kobe_basket)
## Rows: 133
## Columns: 6
## $ vs <fct> ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL~
## $ game <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~
## $ quarter <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3~
## $ time <fct> 9:47, 9:07, 8:11, 7:41, 7:03, 6:01, 4:07, 0:52, 0:00, 6:35~
## $ description <fct> Kobe Bryant makes 4-foot two point shot, Kobe Bryant misse~
## $ shot <chr> "H", "M", "M", "H", "H", "M", "M", "M", "M", "H", "H", "H"~
kobe_basket <- c(kobe_basket)
What does a streak length of 1 mean, i.e. how many hits and misses are in a streak of 1? What about a streak length of 0?
A streak of length 1 means one hit followed by one miss. A streak of length 0 means a miss with no hit before it, so it contains zero hits and one miss.
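To make the definition concrete, here is a minimal base-R sketch that counts the hits between consecutive misses. The helper `streak_lengths` is a hypothetical name introduced for illustration. It is not the `openintro` implementation of `calc_streak`, and it assumes that the start and end of the sequence act as boundaries in the same way a miss does.

```r
# Hypothetical helper (illustration only): count the hits between consecutive misses.
streak_lengths <- function(shots) {
  # Code hits as 1 and misses as 0, padding both ends with a 0 so the
  # first and last runs of hits are closed off (an assumption of this sketch).
  is_hit <- c(0, as.integer(shots == "H"), 0)
  boundary_pos <- which(is_hit == 0)
  # The gap between neighbouring boundaries, minus one, is the number of hits.
  diff(boundary_pos) - 1
}

example_shots <- c("H", "M", "M", "H", "H", "M")
streak_lengths(example_shots)
# [1] 1 0 2 0
```

Reading the result left to right: `H M` is a streak of 1, the second `M` is a streak of 0, `H H M` is a streak of 2, and the final 0 comes from the end padding after the last miss.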
Calculate the number of streaks:
kobe_streak <- calc_streak(kobe_basket$shot)
Plot the data:
ggplot(data = kobe_streak, aes(x = length)) +
  geom_bar()
Describe the distribution of Kobe’s streak lengths from the 2009 NBA finals. What was his typical streak length? How long was his longest streak of baskets? Make sure to include the accompanying plot in your answer.
The most common streak length was 0. The longest streak was 4. The distribution is right-skewed.
# Simulations in R
Simulate a coin toss:
set.seed(1209)
coin_outcomes <- c("heads", "tails")
sample(coin_outcomes, size = 1, replace = TRUE)
## [1] "tails"
Simulate a coin toss 100 times:
sim_fair_coin <- sample(coin_outcomes, size = 100, replace = TRUE)
Table of results:
set.seed(1211)
sim_fair_coin
## [1] "heads" "heads" "tails" "tails" "heads" "tails" "heads" "tails" "tails"
## [10] "heads" "tails" "heads" "heads" "heads" "heads" "tails" "tails" "heads"
## [19] "tails" "heads" "heads" "tails" "heads" "tails" "tails" "heads" "heads"
## [28] "tails" "tails" "tails" "tails" "heads" "heads" "heads" "tails" "heads"
## [37] "tails" "heads" "tails" "heads" "heads" "tails" "tails" "heads" "heads"
## [46] "heads" "heads" "tails" "tails" "heads" "heads" "heads" "tails" "tails"
## [55] "tails" "tails" "tails" "heads" "heads" "tails" "heads" "heads" "tails"
## [64] "heads" "tails" "heads" "tails" "tails" "heads" "heads" "heads" "tails"
## [73] "tails" "tails" "tails" "heads" "tails" "heads" "tails" "tails" "heads"
## [82] "tails" "tails" "heads" "heads" "heads" "heads" "tails" "heads" "heads"
## [91] "tails" "heads" "heads" "tails" "tails" "tails" "tails" "heads" "heads"
## [100] "heads"
table(sim_fair_coin)
## sim_fair_coin
## heads tails
## 52 48
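A seed only affects random draws made after it is set, so for a reproducible result `set.seed()` has to run before the call to `sample()`, not before the line that prints or tabulates the result. The sketch below illustrates this with the same coin. The object names `first_run` and `second_run` are introduced here for illustration only.

```r
coin_outcomes <- c("heads", "tails")

# Set the seed immediately before sampling.
set.seed(1211)
first_run <- sample(coin_outcomes, size = 100, replace = TRUE)

# Resetting the same seed and sampling again reproduces the same flips.
set.seed(1211)
second_run <- sample(coin_outcomes, size = 100, replace = TRUE)

identical(first_run, second_run)
# [1] TRUE
```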
Simulate an unfair coin:
sim_unfair_coin <- sample(coin_outcomes, size = 100, replace = TRUE,
                          prob = c(0.2, 0.8))
In your simulation of flipping the unfair coin 100 times, how many flips came up heads? Include the code for sampling the unfair coin in your response. Since the markdown file will run the code, and generate a new sample each time you Knit it, you should also “set a seed” before you sample. Read more about setting a seed below.
Table of results for unfair coin:
set.seed(1212)
sim_unfair_coin
## [1] "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails"
## [10] "tails" "tails" "tails" "tails" "heads" "tails" "tails" "tails" "heads"
## [19] "tails" "heads" "tails" "tails" "tails" "tails" "tails" "heads" "tails"
## [28] "tails" "tails" "tails" "tails" "heads" "tails" "heads" "tails" "tails"
## [37] "tails" "tails" "heads" "heads" "tails" "tails" "tails" "heads" "tails"
## [46] "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails"
## [55] "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails"
## [64] "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails"
## [73] "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails"
## [82] "tails" "tails" "tails" "tails" "heads" "tails" "heads" "tails" "tails"
## [91] "tails" "tails" "heads" "heads" "tails" "tails" "tails" "tails" "tails"
## [100] "heads"
table(sim_unfair_coin)
## sim_unfair_coin
## heads tails
## 14 86
The unfair coin came up heads 14 times in 100 flips.
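As a rough sanity check, the observed count can be compared with what a coin with a 20% chance of heads is expected to produce. This sketch treats the number of heads as binomial, which is an assumption consistent with independent flips. The variable names are introduced for illustration.

```r
n_flips <- 100
p_heads <- 0.2

# Mean and standard deviation of a binomial count.
expected_heads <- n_flips * p_heads                   # 20
sd_heads <- sqrt(n_flips * p_heads * (1 - p_heads))   # 4

# How far the observed 14 heads sits from the expected count, in SD units.
observed_heads <- 14
(observed_heads - expected_heads) / sd_heads          # -1.5
```

A count 1.5 standard deviations below the mean is within ordinary sampling variation for 100 flips.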
shot_outcomes <- c("H", "M")
sim_basket <- sample(shot_outcomes, size = 1, replace = TRUE)
What change needs to be made to the sample function so that it reflects a shooting percentage of 45%? Make this adjustment, then run a simulation to sample 133 shots. Assign the output of this simulation to a new object called sim_basket.
Simulate 45% shooting percentage:
sim_basket <- sample(shot_outcomes, size = 133, replace = TRUE,
                     prob = c(0.45, 0.55))
Table of the data:
set.seed(1012)
sim_basket
## [1] "M" "H" "M" "H" "M" "M" "M" "H" "M" "M" "H" "M" "H" "H" "M" "M" "M" "H"
## [19] "M" "H" "M" "M" "M" "H" "M" "H" "H" "H" "H" "M" "M" "M" "H" "M" "H" "M"
## [37] "H" "H" "M" "M" "H" "H" "M" "M" "H" "M" "H" "H" "M" "M" "M" "H" "H" "M"
## [55] "H" "M" "H" "M" "M" "M" "H" "M" "H" "H" "M" "M" "H" "M" "M" "H" "H" "M"
## [73] "M" "M" "H" "M" "M" "H" "M" "H" "M" "M" "H" "M" "H" "M" "M" "M" "M" "M"
## [91] "H" "M" "M" "M" "H" "H" "M" "M" "M" "M" "M" "M" "H" "H" "H" "M" "H" "M"
## [109] "H" "H" "M" "H" "M" "H" "M" "H" "M" "M" "M" "M" "H" "H" "M" "H" "M" "H"
## [127] "H" "M" "H" "M" "H" "M" "M"
table(sim_basket)
## sim_basket
## H M
## 56 77
Using calc_streak, compute the streak lengths of sim_basket, and save the results in a data frame called sim_streak.
sim_streak <- calc_streak(sim_basket)
Describe the distribution of streak lengths. What is the typical streak length for this simulated independent shooter with a 45% shooting percentage? How long is the player’s longest streak of baskets in 133 shots? Make sure to include a plot in your answer.
ggplot(data = sim_streak, aes(x = length)) +
  geom_bar()
The typical streak length is 0. The longest streak is 4, which is the longest run of consecutive "H" values in the printed sim_basket output.
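For an independent shooter, a streak of length k needs k hits followed by a miss, so its probability is 0.45^k × 0.55. That is a geometric distribution, and it explains why 0 is the typical length. The sketch below computes these theoretical probabilities with `dgeom()`. It ignores the small edge effect of the sequence ending after 133 shots, and the variable names are introduced for illustration.

```r
p_hit <- 0.45

# dgeom(k, prob) gives prob * (1 - prob)^k; here "prob" is the chance of a miss.
streak <- 0:5
prob <- dgeom(streak, prob = 1 - p_hit)

data.frame(streak, prob)
```

Under this model about 55% of streaks have length 0 and about 25% have length 1, so a bar chart that peaks at 0 and falls off quickly is what independence predicts.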
If you were to run the simulation of the independent shooter a second time, how would you expect its streak distribution to compare to the distribution from the question above? Exactly the same? Somewhat similar? Totally different? Explain your reasoning.
It would be somewhat similar but not exactly the same. Each run draws a new random sequence of hits and misses, so the individual streaks differ. Because the shooting percentage (45%) and the number of shots (133) stay the same, the overall shape of the distribution should be similar: mostly streaks of 0 and 1, with a short right tail.
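This can be checked by running the simulation twice with different seeds and tabulating the streak lengths. In this sketch `simulate_streaks` is a hypothetical helper that computes streaks with base R instead of `calc_streak`, and the seeds 1 and 2 are arbitrary.

```r
shot_outcomes <- c("H", "M")

# Hypothetical helper: simulate 133 independent shots and return streak lengths.
simulate_streaks <- function(seed) {
  set.seed(seed)  # arbitrary seed, used only so each run is reproducible
  shots <- sample(shot_outcomes, size = 133, replace = TRUE,
                  prob = c(0.45, 0.55))
  # Pad with 0 at both ends, then count the hits between neighbouring zeros.
  is_hit <- c(0, as.integer(shots == "H"), 0)
  diff(which(is_hit == 0)) - 1
}

run_a <- simulate_streaks(1)
run_b <- simulate_streaks(2)

# Compare the two streak distributions side by side.
table(run_a)
table(run_b)
```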
How does Kobe Bryant’s distribution of streak lengths compare to the distribution of streak lengths for the simulated shooter? Using this comparison, do you have evidence that the hot hand model fits Kobe’s shooting patterns? Explain.
Kobe’s distribution is similar to the simulated shooter’s: in both, the most common streak length is 0 and the counts fall off quickly as the streak length increases. Kobe had noticeably more streaks of length 1 than the simulated shooter, but the remaining frequencies were very similar.
There is no evidence that the hot hand model fits his shooting pattern, because Kobe’s streak distribution looks very much like that of a simulated independent shooter.
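One rough way to back up this comparison is to repeat the independent-shooter simulation many times and see how often its longest streak reaches 4, the longest streak reported for Kobe above. This is a sketch under the same assumptions as the lab (45% shooting, 133 shots). The helper `longest_streak`, the seed, and the choice of 1000 repetitions are all introduced here for illustration.

```r
set.seed(2009)  # arbitrary seed for this sketch

# Hypothetical helper: length of the longest run of consecutive hits.
longest_streak <- function(shots) {
  runs <- rle(shots)
  # c(0, ...) covers the case where there are no hits at all.
  max(c(0, runs$lengths[runs$values == "H"]))
}

# Longest streak in each of 1000 simulated sets of 133 independent shots.
sim_longest <- replicate(1000, {
  shots <- sample(c("H", "M"), size = 133, replace = TRUE,
                  prob = c(0.45, 0.55))
  longest_streak(shots)
})

# Share of simulated shooters whose longest streak is at least 4.
mean(sim_longest >= 4)
table(sim_longest)
```

If a large share of independent shooters reach a streak of 4 or more, a longest streak of 4 is not evidence of a hot hand.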