Probability Lab

Author

B Braden

Load Libraries

library(tidyverse)
library(openintro)

Access the data

data("kobe_basket")
glimpse(kobe_basket)
Rows: 133
Columns: 6
$ vs          <fct> ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL…
$ game        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ quarter     <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3…
$ time        <fct> 9:47, 9:07, 8:11, 7:41, 7:03, 6:01, 4:07, 0:52, 0:00, 6:35…
$ description <fct> Kobe Bryant makes 4-foot two point shot, Kobe Bryant misse…
$ shot        <chr> "H", "M", "M", "H", "H", "M", "M", "M", "M", "H", "H", "H"…

Exercise 1

What does a streak length of 1 mean, i.e. how many hits and misses are in a streak of 1? What about a streak length of 0?

A streak length of 1 means 1 hit followed by 1 miss. A streak length of 0 means 0 hits and 1 miss, which happens, for example, when a miss directly follows another miss.
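
To make the counting rule concrete, here is a minimal base-R sketch of how streak lengths can be counted under the definition above. The helper name streak_lengths is hypothetical and is not the calc_streak implementation from openintro. It assumes shots are coded as "H" and "M", and it treats the sequence as if it were bounded by a miss at both ends.

```r
# Hypothetical helper: count the hits between consecutive misses.
streak_lengths <- function(shots) {
  hit <- as.integer(shots == "H")  # 1 for a hit, 0 for a miss
  padded <- c(0, hit, 0)           # assumption: bound the sequence with misses
  miss_pos <- which(padded == 0)   # positions of every miss
  diff(miss_pos) - 1               # number of hits between neighbouring misses
}

shots <- c("H", "M", "M", "H", "H", "M")
streak_lengths(shots)
# 1 0 2 0
```

The first value is the streak of 1 (one hit, then a miss), the second is a streak of 0 (a miss right after a miss), and the third is a streak of 2. The final 0 comes from the padded miss at the end of the sequence.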

Counting Streak Lengths

kobe_streak <- calc_streak(kobe_basket$shot) 

Streak length Distribution

ggplot(data = kobe_streak, aes(x = length)) + geom_bar()

Exercise 2

Describe the distribution of Kobe’s streak lengths from the 2009 NBA finals. What was his typical streak length? How long was his longest streak of baskets? Make sure to include the accompanying plot in your answer.

The distribution of streak lengths is right-skewed: most streaks are short and long streaks are rare. His typical streak length was 0, and his longest streak of baskets was 4. The plot is shown above.
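
The typical and longest streak can also be read off numerically rather than from the plot. The sketch below assumes a data frame with a length column, which is the shape calc_streak returns in this lab. The values in example_streak are made up for illustration and are not Kobe's data.

```r
# Illustrative data only; swap in kobe_streak to summarise the real streaks.
example_streak <- data.frame(length = c(1, 0, 2, 0, 0, 0, 3, 0, 0, 4))

table(example_streak$length)   # how many streaks of each length
median(example_streak$length)  # one measure of the typical streak length
max(example_streak$length)     # the longest streak
```

The median is used here as one reasonable reading of "typical"; the most frequent value in the table is another.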

Simulations in R

coin_outcomes <- c("heads", "tails")
sample(coin_outcomes, size = 1, replace = TRUE)
[1] "heads"
sim_fair_coin <- sample(coin_outcomes, size = 100, replace = TRUE)
sim_fair_coin
  [1] "tails" "tails" "tails" "heads" "heads" "tails" "tails" "tails" "tails"
 [10] "tails" "tails" "tails" "tails" "heads" "heads" "heads" "tails" "heads"
 [19] "tails" "tails" "heads" "heads" "tails" "heads" "heads" "tails" "heads"
 [28] "heads" "tails" "heads" "heads" "heads" "tails" "heads" "tails" "heads"
 [37] "tails" "tails" "tails" "heads" "tails" "heads" "tails" "heads" "heads"
 [46] "heads" "heads" "heads" "heads" "tails" "heads" "tails" "heads" "tails"
 [55] "tails" "tails" "tails" "heads" "heads" "tails" "heads" "tails" "heads"
 [64] "heads" "tails" "heads" "tails" "tails" "tails" "heads" "heads" "tails"
 [73] "heads" "tails" "heads" "heads" "heads" "heads" "heads" "tails" "tails"
 [82] "tails" "heads" "heads" "tails" "heads" "tails" "tails" "heads" "heads"
 [91] "tails" "heads" "tails" "tails" "heads" "tails" "tails" "tails" "tails"
[100] "tails"
table(sim_fair_coin)
sim_fair_coin
heads tails 
   48    52 

Exercise 3

In your simulation of flipping the unfair coin 100 times, how many flips came up heads? Include the code for sampling the unfair coin in your response. Since the markdown file will run the code, and generate a new sample each time you Knit it, you should also “set a seed” before you sample. Read more about setting a seed below.

20 of the 100 flips came up heads.

set.seed(415)
sim_unfair_coin <- sample(coin_outcomes, size = 100, replace = TRUE, prob = c(0.2, 0.8))
sim_unfair_coin
  [1] "tails" "tails" "tails" "tails" "heads" "tails" "tails" "tails" "heads"
 [10] "tails" "tails" "tails" "tails" "tails" "heads" "tails" "tails" "heads"
 [19] "tails" "tails" "tails" "tails" "tails" "heads" "tails" "tails" "tails"
 [28] "tails" "tails" "tails" "heads" "tails" "tails" "tails" "tails" "heads"
 [37] "tails" "tails" "tails" "tails" "heads" "tails" "tails" "tails" "tails"
 [46] "heads" "tails" "tails" "tails" "tails" "tails" "heads" "heads" "heads"
 [55] "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails"
 [64] "tails" "tails" "tails" "tails" "tails" "tails" "heads" "tails" "tails"
 [73] "tails" "tails" "tails" "tails" "tails" "heads" "heads" "tails" "tails"
 [82] "heads" "tails" "tails" "tails" "tails" "heads" "tails" "tails" "tails"
 [91] "tails" "heads" "tails" "heads" "tails" "tails" "tails" "tails" "heads"
[100] "tails"
table(sim_unfair_coin)
sim_unfair_coin
heads tails 
   20    80 

When do we need to use set.seed?

?sample
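
A seed is needed whenever a random result should be reproducible, for example so that the numbers quoted in the text still match the output after the document is knit again. A minimal sketch, with an arbitrary seed value of 123 chosen only for illustration:

```r
coin_outcomes <- c("heads", "tails")

set.seed(123)  # arbitrary seed, chosen for illustration
first_run <- sample(coin_outcomes, size = 10, replace = TRUE)

set.seed(123)  # resetting the same seed replays the same random draws
second_run <- sample(coin_outcomes, size = 10, replace = TRUE)

identical(first_run, second_run)
# TRUE
```

Without the second set.seed call, the two samples would generally differ.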

Simulating the Independent Shooter

shot_outcomes <- c("H", "M")
sim_basket <- sample(shot_outcomes, size = 1, replace = TRUE)
shot_outcomes
[1] "H" "M"

Exercise 4

What change needs to be made to the sample function so that it reflects a shooting percentage of 45%? Make this adjustment, then run a simulation to sample 133 shots. Assign the output of this simulation to a new object called sim_basket.

Using the unfair coin simulation as a guide, I added the prob argument to sample so that a hit has probability 0.45 and a miss has probability 0.55, and I set size = 133.

shot_outcomes <- c("H", "M")
set.seed(5656)
sim_basket <- sample(shot_outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))
sim_basket
  [1] "H" "H" "M" "H" "H" "M" "H" "M" "M" "M" "M" "M" "H" "H" "M" "M" "M" "H"
 [19] "M" "M" "M" "H" "M" "H" "H" "H" "H" "M" "H" "M" "H" "H" "H" "H" "H" "M"
 [37] "M" "H" "H" "H" "H" "H" "M" "M" "M" "M" "M" "H" "M" "H" "M" "H" "M" "H"
 [55] "H" "M" "M" "H" "M" "H" "M" "H" "H" "M" "H" "M" "M" "M" "M" "M" "M" "H"
 [73] "H" "H" "M" "M" "H" "M" "H" "M" "M" "H" "H" "H" "H" "M" "H" "M" "H" "M"
 [91] "M" "M" "M" "M" "M" "M" "M" "H" "M" "M" "H" "M" "M" "H" "M" "M" "H" "H"
[109] "H" "M" "H" "H" "H" "M" "H" "M" "H" "H" "H" "H" "M" "M" "H" "H" "M" "M"
[127] "M" "M" "M" "M" "M" "M" "M"
table(sim_basket)
sim_basket
 H  M 
61 72 
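
As a quick check, the simulated hit rate can be compared with the 45% target. This sketch reuses the counts from the table above rather than rerunning the simulation; the object names counts and observed are illustrative.

```r
# Counts copied from the table(sim_basket) output above.
counts <- c(H = 61, M = 72)

observed <- counts / sum(counts)  # convert counts to proportions
round(observed, 3)
# H = 0.459, M = 0.541
```

The observed hit rate of about 45.9% is close to, but not exactly, the 45% used in prob, which is what random sampling would be expected to produce.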

Exercise 5

Using calc_streak, compute the streak lengths of sim_basket, and save the results in a data frame called sim_streak.

sim_streak <- calc_streak(sim_basket)
ggplot(data = sim_streak, aes(x = length)) + geom_bar()

Exercise 6

Describe the distribution of streak lengths. What is the typical streak length for this simulated independent shooter with a 45% shooting percentage? How long is the player’s longest streak of baskets in 133 shots? Make sure to include a plot in your answer.

The typical streak length is 0, and the longest streak is 5 baskets. The plot is shown above.

Exercise 7

If you were to run the simulation of the independent shooter a second time, how would you expect its streak distribution to compare to the distribution from the question above? Exactly the same? Somewhat similar? Totally different? Explain your reasoning.

I would expect the streak distribution to be somewhat similar to the one above, provided I removed or changed the seed (with the same seed, the result would be exactly the same). The shooting percentage and the number of shots are unchanged, but each run is a new random sample, so the outcome will vary around what the probabilities predict. For example, flipping a fair coin 100 times will not always give exactly 50 heads and 50 tails, but the counts will be close to that. In the same way, the new streak distribution should look roughly like this one without matching it exactly.
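
One way to check this expectation is to simulate the independent shooter several times and compare a summary of each run. The following is a rough sketch in base R. The helper simulate_longest_streak is hypothetical, and it measures streaks with rle() rather than calc_streak, so only runs of hits are counted.

```r
# Hypothetical helper: longest run of hits for one simulated independent shooter.
simulate_longest_streak <- function(n_shots = 133, p_hit = 0.45) {
  shots <- sample(c("H", "M"), size = n_shots, replace = TRUE,
                  prob = c(p_hit, 1 - p_hit))
  runs <- rle(shots)                            # consecutive runs of equal outcomes
  hit_runs <- runs$lengths[runs$values == "H"]  # keep only the runs of hits
  if (length(hit_runs) == 0) 0 else max(hit_runs)
}

set.seed(2024)  # arbitrary seed so the five runs are reproducible
longest <- replicate(5, simulate_longest_streak())
longest
```

The five values will usually differ from one another while staying in a similar range, which is the sense of "somewhat similar" here.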

Exercise 8

How does Kobe Bryant’s distribution of streak lengths compare to the distribution of streak lengths for the simulated shooter? Using this comparison, do you have evidence that the hot hand model fits Kobe’s shooting patterns? Explain.

Under the hot hand model, shots are not independent: making one shot would raise the probability of making the next. Higher probabilities after a hit would lead to longer streaks. However, the independent simulated shooter reached a streak of 5, which Kobe did not in this data (his longest was 4), and both distributions have a typical streak length of 0. This comparison gives no evidence that the hot hand model fits Kobe’s shooting patterns.
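
The dependence claim can be made concrete: under independence, the hit rate immediately after a hit should be about the same as the overall hit rate, while a hot hand would push it higher. Below is a minimal sketch on simulated shots. The helper name hit_rate_after_hit is hypothetical, and applying it directly to kobe_basket$shot would ignore game and quarter boundaries, so it is an illustration rather than a formal test.

```r
# Hypothetical helper: share of hits among shots that directly follow a hit.
hit_rate_after_hit <- function(shots) {
  previous <- head(shots, -1)  # shots 1 to n - 1
  current <- tail(shots, -1)   # shots 2 to n
  mean(current[previous == "H"] == "H")
}

set.seed(99)  # arbitrary seed, for reproducibility of the sketch
sim_shots <- sample(c("H", "M"), size = 10000, replace = TRUE,
                    prob = c(0.45, 0.55))

mean(sim_shots == "H")         # overall hit rate
hit_rate_after_hit(sim_shots)  # hit rate right after a hit
```

For the independent shooter the two rates should be close to each other, since the previous shot carries no information about the next one.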