Probability Lab Hw
Overview:
This is a probability tutorial assignment that examines the “hot hand” phenomenon observed in basketball. The “hot hand” claim contradicts the assumption that each shot is independent of the next. As the OpenIntro tutorial states, this lab has 3 goals: (1) to think about the effects of independent and dependent events, (2) to learn how to simulate shooting streaks in R, and (3) to compare a simulation to actual data in order to determine whether the hot hand phenomenon appears to be real.
Loading the packages:
library(tidyverse)
library(openintro)
Loading the data
This investigation will focus on the performance of one player: Kobe Bryant of the Los Angeles Lakers. His performance against the Orlando Magic in the 2009 NBA Finals earned him the title of Most Valuable Player, and many spectators commented on how he appeared to show a hot hand. The data file we’ll use is called kobe_basket.
data("kobe_basket")
glimpse(kobe_basket)
Rows: 133
Columns: 6
$ vs <fct> ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL…
$ game <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ quarter <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3…
$ time <fct> 9:47, 9:07, 8:11, 7:41, 7:03, 6:01, 4:07, 0:52, 0:00, 6:35…
$ description <fct> Kobe Bryant makes 4-foot two point shot, Kobe Bryant misse…
$ shot <chr> "H", "M", "M", "H", "H", "M", "M", "M", "M", "H", "H", "H"…
Exercise 1: What does a streak length of 1 mean, i.e., how many hits and misses are in a streak of 1? What about a streak length of 0?
Answer: A streak length of 1 means that the player, in this case Kobe Bryant, made one shot and then missed the next, so the streak contains one hit and one miss. Since a streak starts at the first basket made and ends at the next miss, a streak length of 0 means a miss that directly follows another miss: zero hits and one miss.
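The streak definition above can be made concrete with a short base R sketch. The helper name count_streaks is hypothetical, and this is only one way to implement the definition; it is not presented as the source of openintro's calc_streak, whose handling of edge cases may differ.

```r
# Hypothetical helper: streak lengths as runs of hits ("H") ended by a miss ("M").
count_streaks <- function(shots) {
  hits <- as.integer(shots == "H")
  # Pad with a miss on both ends so the first and last streaks are also closed.
  padded <- c(0L, hits, 0L)
  miss_positions <- which(padded == 0L)
  # The number of hits between consecutive misses is the streak length.
  diff(miss_positions) - 1L
}

# First nine shots shown in the glimpse() output above.
example_shots <- c("H", "M", "M", "H", "H", "M", "M", "M", "M")
count_streaks(example_shots)
# [1] 1 0 2 0 0 0 0
```

The opening hit followed by a miss yields a streak of 1, the two consecutive hits yield a streak of 2, and every miss that directly follows another miss yields a streak of 0. The final 0 comes from the closing pad after the last miss.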
Looking at streak lengths for all 133 shots
kobe_streak <- calc_streak(kobe_basket$shot)
ggplot(data = kobe_streak, aes(x = length)) +
  geom_bar()
Exercise 2: Describe the distribution of Kobe’s streak lengths from the 2009 NBA finals. What was his typical streak length? How long was his longest streak of baskets? Make sure to include the accompanying plot in your answer.
Answer: The distribution is right skewed. The typical streak length was 0, and his longest streak of baskets was 4.
ggplot(data = kobe_streak, aes(x = length)) +
  geom_bar()
Simulations in R
We will be conducting a coin toss simulation to lay the foundation for simulation models.
coin_outcomes <- c("heads", "tails")
sample(coin_outcomes, size = 1, replace = TRUE)
[1] "heads"
Simulating 100 coin tosses
sim_fair_coin <- sample(coin_outcomes, size = 100, replace = TRUE)
table(sim_fair_coin)
sim_fair_coin
heads tails
   43    57
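Because sample() draws at random, the counts above will change on each run. A minimal sketch, assuming an arbitrary seed (the value 1 is an illustrative choice, not taken from the lab), shows how to make the 100 tosses reproducible and view them as proportions:

```r
coin_outcomes <- c("heads", "tails")

# Fix the random number generator so the same tosses are drawn on every run.
set.seed(1)  # arbitrary seed, chosen only for illustration
sim_fair_coin <- sample(coin_outcomes, size = 100, replace = TRUE)

toss_counts <- table(sim_fair_coin)
toss_counts               # counts of heads and tails
prop.table(toss_counts)   # the same counts as proportions of the 100 tosses
```

With a fair coin, both proportions should land near 0.5, though rarely exactly on it.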
Simulating an unfair coin
set.seed(022300)
sim_unfair_coin <- sample(coin_outcomes, size = 100, replace = TRUE, prob = c(0.2, 0.8))
Exercise 3: In your simulation of flipping the unfair coin 100 times, how many flips came up heads? Include the code for sampling the unfair coin in your response.
Answer: In the simulation of the unfair coin, 18 flips came up heads (table of results included below).
table(sim_unfair_coin)
sim_unfair_coin
heads tails
   18    82
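A count of 18 heads is close to what the 20% heads probability predicts. As a rough check, assuming the number of heads in 100 independent flips follows a binomial distribution, the expected count and its standard deviation can be worked out directly:

```r
n_flips <- 100
p_heads <- 0.2

expected_heads <- n_flips * p_heads                   # 100 * 0.2 = 20
sd_heads <- sqrt(n_flips * p_heads * (1 - p_heads))   # sqrt(16) = 4

observed_heads <- 18
# Distance of the observed count from the expected count, in standard deviations.
z_score <- (observed_heads - expected_heads) / sd_heads   # (18 - 20) / 4 = -0.5
z_score
```

The observed 18 heads sits half a standard deviation below the expected 20, which is well within ordinary sampling variation.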
Using this model to simulate the independent shooter
shot_outcomes <- c("H", "M")
sim_basket <- sample(shot_outcomes, size = 1, replace = TRUE)
Exercise 4:
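Answer: We need to change both the sample size (from 1 to 133, the number of shots in the data) and the probabilities (to a 45% chance of a hit and a 55% chance of a miss) so that the model reflects Kobe Bryant’s shooting.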
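The effect of these two changes can be checked on a simulated sample. The sketch below uses a separate, hypothetical variable name (check_basket) and an arbitrary seed; with only 133 shots, the simulated hit rate will usually land near 45% but not exactly on it.

```r
shot_outcomes <- c("H", "M")

set.seed(1)  # arbitrary seed, chosen only for illustration
# 133 shots with a 45% chance of a hit ("H") and a 55% chance of a miss ("M").
check_basket <- sample(shot_outcomes, size = 133, replace = TRUE,
                       prob = c(0.45, 0.55))

length(check_basket)        # number of simulated shots
mean(check_basket == "H")   # simulated shooting percentage
```

The order of the prob vector matters: each probability is matched to the element of shot_outcomes in the same position.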
sim_basket <- sample(shot_outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))
Comparing Kobe Bryant to the independent shooter
Exercise 5: Using calc_streak, compute the streak lengths of sim_basket, and save the results in a data frame called sim_streak.
sim_streak <- calc_streak(sim_basket)
Exercise 6: Describe the distribution of streak lengths. What is the typical streak length for this simulated independent shooter with a 45% shooting percentage? How long is the player’s longest streak of baskets in 133 shots? Make sure to include a plot in your answer.
Answer: The distribution of streak lengths is right skewed with the typical streak length being 0. The simulated player’s longest streak length is 7.
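The right skew follows from independence. If each shot is a hit with probability 0.45, a streak of length k requires k hits in a row followed by a miss, so its probability is 0.45^k * 0.55. This is a geometric distribution, available in R's stats package as dgeom(). The sketch ignores the fact that the final streak is cut off by the end of the 133 shots.

```r
p_hit <- 0.45
streak_length <- 0:7

# dgeom(k, prob) gives the probability of k "failures" before the first
# "success". Here a failure is a hit and the success that ends the streak
# is a miss, so prob is the miss probability.
streak_prob <- dgeom(streak_length, prob = 1 - p_hit)

# The same values written out as p_hit^k * (1 - p_hit).
manual_prob <- p_hit^streak_length * (1 - p_hit)

round(data.frame(streak_length, streak_prob, manual_prob), 4)
```

Under this model 55% of streaks have length 0, and the probability shrinks by a factor of 0.45 with each additional hit, which is why 0 is the typical streak length and long streaks are rare.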
ggplot(data = sim_streak, aes(x = length)) +
  geom_bar()
Exercise 7: If you were to run the simulation of the independent shooter a second time, how would you expect its streak distribution to compare to the distribution from the question above? Exactly the same? Somewhat similar? Totally different? Explain your reasoning.
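Answer: I would expect the distribution to be somewhat similar rather than exactly the same. Because the shots are drawn at random, the individual streak counts will change from run to run, but the differences should be minimal and the distribution should remain right skewed. The hit probability stays the same, so most streaks will still have length 0.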
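This expectation can be checked by rerunning the simulation a few times. The sketch below assumes a hypothetical helper, count_streaks, that mirrors the streak definition from Exercise 1 (it is not the openintro calc_streak function), and the seeds are arbitrary.

```r
# Hypothetical helper: number of hits between consecutive misses.
count_streaks <- function(shots) {
  padded <- c(0L, as.integer(shots == "H"), 0L)
  diff(which(padded == 0L)) - 1L
}

# One run of the independent shooter: 133 shots at a 45% hit probability.
simulate_streaks <- function(seed) {
  set.seed(seed)
  shots <- sample(c("H", "M"), size = 133, replace = TRUE,
                  prob = c(0.45, 0.55))
  count_streaks(shots)
}

# Streak-length counts for three independent runs (seeds are arbitrary).
runs <- lapply(1:3, simulate_streaks)
lapply(runs, table)
```

Comparing the three tables side by side shows which features are stable across runs (the overall shape) and which are not (the exact counts and the longest streak).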
Exercise 8: How does Kobe Bryant’s distribution of streak lengths compare to the distribution of streak lengths for the simulated shooter? Using this comparison, do you have evidence that the hot hand model fits Kobe’s shooting patterns? Explain.
Answer: Kobe Bryant’s distribution is essentially identical to that of the simulated shooter, with the exception of a shorter x-axis (a shorter maximum streak length). I do believe that the hot hand model is convincing, and that it greatly resembles the shooting patterns. However, there are human elements and environmental conditions that cannot be programmed into this model. This is perhaps why the computer model stretched the streak for so long: it was relying on a probability that does not change, which I don’t think is the case in real life. Very rarely do you see a basketball player have such a long streak.
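How unusual a long streak is for an independent shooter can be estimated by repeating the simulation many times. The sketch below is one possible check and is not part of the original lab: the helper longest_streak, the 1,000 repetitions, and the seed are all illustrative assumptions. It estimates how often a 45% independent shooter's longest streak in 133 shots reaches 4 (Kobe's longest) or 7 (the simulated shooter's longest).

```r
# Hypothetical helper: longest run of consecutive hits ("H") in a shot vector.
longest_streak <- function(shots) {
  runs <- rle(shots == "H")
  hit_runs <- runs$lengths[runs$values]
  if (length(hit_runs) == 0) 0L else max(hit_runs)
}

set.seed(1)  # arbitrary seed, chosen only for illustration
longest <- replicate(1000, {
  shots <- sample(c("H", "M"), size = 133, replace = TRUE,
                  prob = c(0.45, 0.55))
  longest_streak(shots)
})

mean(longest >= 4)   # share of runs whose longest streak is at least 4
mean(longest >= 7)   # share of runs whose longest streak is at least 7
table(longest)       # full distribution of the longest streak across runs
```

If the first proportion is large, a longest streak of 4 is ordinary for an independent shooter, and the second proportion indicates how rare the simulated streak of 7 is.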