OpenIntro Probability Lab

Author

A. Diaz-Nova

Load packages using the library() function

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(openintro)
Loading required package: airports
Loading required package: cherryblossom
Loading required package: usdata

A look into the data - Huge emphasis/focus on the performance of Kobe Bryant

glimpse(kobe_basket)
Rows: 133
Columns: 6
$ vs          <fct> ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL…
$ game        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ quarter     <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3…
$ time        <fct> 9:47, 9:07, 8:11, 7:41, 7:03, 6:01, 4:07, 0:52, 0:00, 6:35…
$ description <fct> Kobe Bryant makes 4-foot two point shot, Kobe Bryant misse…
$ shot        <chr> "H", "M", "M", "H", "H", "M", "M", "M", "M", "H", "H", "H"…
# Contains 133 observations and 6 variables
# Number of consecutive baskets made until a miss occurs = Length of a shooting streak

Exercise 1: What does a streak length of 1 mean, i.e. how many hits and misses are in a streak of 1? What about a streak length of 0?

kobe_streak <- calc_streak(kobe_basket$shot)
# calc_streak() is a custom function from the openintro package - it saves the work of counting streaks by hand

Take a look at the distribution

ggplot(data = kobe_streak, aes(x = length)) + geom_bar()

Answer: A streak length of 1 means Kobe hit 1 shot and then missed the next one (1 hit, 1 miss) - A streak length of 0 means Kobe missed right away, with no hit before the miss (0 hits, 1 miss)
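To make the definition concrete, here is a minimal base R sketch of counting hits before each miss. The helper name streak_lengths() is hypothetical and this is not necessarily how calc_streak() is implemented; it only illustrates the idea, using the first nine shots shown in the glimpse() output.

```r
# Hypothetical helper (an illustration, not openintro's calc_streak()):
# count the number of hits that come before each miss.
streak_lengths <- function(shots) {
  # Mark the misses; append a closing "miss" so a trailing run of hits is counted too.
  is_miss <- c(shots == "M", TRUE)
  miss_pos <- which(is_miss)
  # The gap between consecutive misses, minus the miss itself, is the number of hits.
  diff(c(0, miss_pos)) - 1
}

# First nine shots from kobe_basket$shot as printed by glimpse()
first_shots <- c("H", "M", "M", "H", "H", "M", "M", "M", "M")
streak_lengths(first_shots)
# 1 0 2 0 0 0 0
```

The first value, 1, is the "H" followed by "M"; each 0 is a miss with no hit before it.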

Exercise 2: Describe the distribution of Kobe’s streak lengths from the 2009 NBA finals. What was his typical streak length? How long was his longest streak of baskets? Make sure to include the accompanying plot in your answer.

p1 <- kobe_streak |>
  ggplot(aes(x = length)) + 
  geom_boxplot()
p1

Boxplot: I struggled a lot with the dot plot and histogram code, so I used a boxplot. The distribution is skewed to the right, with the majority of the streaks having length 0 or 1 (typical). His longest streak length was 4.

Intro to independence - Two processes are independent only if the outcome of one doesn’t affect the outcome of the other. This also ties into probability: does the first outcome make the second one more or less likely?

Intro to simulations in R

coin_outcomes <- c("heads", "tails")
sample(coin_outcomes, size = 1, replace = TRUE)
[1] "tails"
# coin_outcomes is like a hat with two slips of paper in it: one says heads and the other says tails

Now try simulating a coin 100 times

sim_fair_coin <- sample(coin_outcomes, size = 100, replace = TRUE)
table(sim_fair_coin)
sim_fair_coin
heads tails 
   45    55 
# Use the table function to view the results of the simulation
# The flips above came from a fair coin; now simulate an unfair coin that lands heads 20% of the time instead of 50%
sim_unfair_coin <- sample(coin_outcomes, size = 100, replace = TRUE, prob = c(0.2, 0.8))
table(sim_unfair_coin)
sim_unfair_coin
heads tails 
   18    82 
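# We adjust for the 20% by using the prob argument, which takes a vector of probabilities - 20% for heads and 80% for tails
# Note: set.seed() only makes later random draws reproducible, so it should come before the sample() call it is meant to fix
set.seed(35797)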

In your simulation of flipping the unfair coin 100 times, how many flips came up heads? Include the code for sampling the unfair coin in your response. Since the markdown file will run the code, and generate a new sample each time you Knit it, you should also “set a seed” before you sample. Read more about setting a seed below.

Answer: Flipping the unfair coin 100 times, heads came up 18 times in the run shown above.
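As a quick check on what "setting a seed" does, the sketch below sets the same seed before each of two draws and confirms that the draws match. The names first_run and second_run are only for illustration; the seed value is the one used in this lab.

```r
coin_outcomes <- c("heads", "tails")

# Set the seed BEFORE sampling so the draw can be reproduced.
set.seed(35797)
first_run <- sample(coin_outcomes, size = 100, replace = TRUE, prob = c(0.2, 0.8))

# Resetting to the same seed restarts the random number stream.
set.seed(35797)
second_run <- sample(coin_outcomes, size = 100, replace = TRUE, prob = c(0.2, 0.8))

identical(first_run, second_run)  # TRUE: same seed, same call, same flips
table(first_run)
```

With the seed placed before sample(), the count of heads should stay the same every time the file is knit.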

Simulating the Independent Shooter - Very similar to the mechanics of simulating a coin flip, so now try to simulate a single shot with 50% (fair)

shot_outcomes <- c("H","M")
sim_basket <- sample(shot_outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))
table(sim_basket)
sim_basket
 H  M 
63 70 
# Readjust for the shooting percentage of 45% and run a simulation to sample 133 shots

The simulated shooter's shots are independent by construction, meaning we know the simulated shooter does not have a hot hand.
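One way to see this independence, sketched under the assumption of a much larger simulated sample than the lab uses: compare the overall hit rate with the hit rate right after a hit. The names demo_shots, p_hit and p_hit_after_hit are hypothetical, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, used only so this sketch is reproducible
n <- 100000  # assumption: a large sample so the two proportions are stable

demo_shots <- sample(c("H", "M"), size = n, replace = TRUE, prob = c(0.45, 0.55))

# Pair each shot with the one taken just before it.
previous_shot <- demo_shots[-n]
next_shot <- demo_shots[-1]

p_hit <- mean(demo_shots == "H")                           # overall hit rate
p_hit_after_hit <- mean(next_shot[previous_shot == "H"] == "H")  # hit rate after a hit

c(p_hit = p_hit, p_hit_after_hit = p_hit_after_hit)
```

For an independent shooter both proportions should land close to 0.45; a hot hand would show up as a noticeably higher hit rate after a hit.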

sim_streak <- calc_streak(sim_basket)
# Exercise 5 Using calc_streak compute the streak lengths 
table(sim_streak)
length
 0  1  2  3  4  5  6 
40 14  9  4  2  1  1 

Do an exploratory data analysis based on the simulated streak lengths

p2 <- sim_streak |>
  ggplot(aes(x = length)) + 
  geom_boxplot()
p2

# Exercise 6

Answer: The typical streak length is between 0 and 1. The longest streak length is 6 from the 133 shots
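The summary numbers can be checked directly from the table(sim_streak) output printed earlier. This sketch rebuilds the streak lengths from those counts; sim_counts and sim_lengths are hypothetical names used only here.

```r
# Counts for streak lengths 0 through 6, copied from the table(sim_streak) output
sim_counts <- c(40, 14, 9, 4, 2, 1, 1)
sim_lengths <- rep(0:6, times = sim_counts)

length(sim_lengths)  # 71 streaks in total
median(sim_lengths)  # 0, the typical streak length
max(sim_lengths)     # 6, the longest streak
mean(sim_lengths)    # about 0.89, above the median, which points to right skew
```

The same calls should also work on sim_streak$length, assuming the streak data frame is still in the session.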