Probability

Getting Started

library(tidyverse)

## -- Attaching packages ------------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0

## -- Conflicts ---------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(openintro)

## Loading required package: airports

## Loading required package: cherryblossom

## Loading required package: usdata

glimpse(kobe_basket)

## Rows: 133
## Columns: 6
## $ vs          <fct> ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ...
## $ game        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ quarter     <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3...
## $ time        <fct> 9:47, 9:07, 8:11, 7:41, 7:03, 6:01, 4:07, 0:52, 0:00, 6...
## $ description <fct> Kobe Bryant makes 4-foot two point shot, Kobe Bryant mi...
## $ shot        <chr> "H", "M", "M", "H", "H", "M", "M", "M", "M", "H", "H", ...

Exercise 1

What does a streak length of 1 mean, i.e. how many hits and misses are in a streak of 1? What about a streak length of 0?

A streak length of 1 means that Kobe made exactly one shot and then missed: one hit followed by one miss, with the miss ending the streak. A streak length of 0 means that a miss came directly after the miss that ended the previous streak, so the streak contains zero hits and one miss; it ends before any basket is made.
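The definition above can be made concrete with a small base-R sketch. The helper `streak_lengths()` is a hypothetical name introduced here for illustration; it is not presented as the openintro implementation of `calc_streak()`, only as one way to count the hits between consecutive misses, under the assumption that the sequence is padded with an imaginary miss at each end.

```r
# Hypothetical helper (not part of openintro): count the hits between
# consecutive misses in a vector of "H"/"M" shot outcomes.
streak_lengths <- function(shots) {
  stopifnot(all(shots %in% c("H", "M")))
  hit <- as.integer(shots == "H")   # 1 for a hit, 0 for a miss
  padded <- c(0L, hit, 0L)          # assumption: pad with a miss at each end
  miss_pos <- which(padded == 0L)   # positions of every real or padded miss
  diff(miss_pos) - 1L               # hits strictly between neighbouring misses
}

# First eight shots shown in the glimpse() output above
shots <- c("H", "M", "M", "H", "H", "M", "M", "M")
streak_lengths(shots)
# [1] 1 0 2 0 0 0
```

The leading 1 is the single hit ended by the first miss, and the 0 after it is the second miss arriving with no hit in between. The final 0 comes from the padded miss at the end of this truncated eight-shot sequence.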

kobe_streak <- calc_streak(kobe_basket$shot)

ggplot(data = kobe_streak, aes(x = length)) +
  geom_bar()

Exercise 2

Describe the distribution of Kobe’s streak lengths from the 2009 NBA finals. What was his typical streak length? How long was his longest streak of baskets? Make sure to include the accompanying plot in your answer.

The distribution of Kobe’s streak lengths is skewed to the right. His typical streak length is 0, and his longest streak is 4 baskets, which the bar graph shows he reached about twice.
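The values read off the bar graph can also be checked numerically. The sketch below uses a hypothetical helper, `summarise_streaks()`, and a made-up input vector that is illustrative only and is not Kobe’s data; using the median as the "typical" streak length is an assumption.

```r
# Hypothetical helper: summarise a vector of streak lengths.
summarise_streaks <- function(lengths) {
  list(
    counts  = table(lengths),   # how often each streak length occurs
    typical = median(lengths),  # median taken as the "typical" streak (assumption)
    longest = max(lengths)      # longest streak
  )
}

# Illustrative input only (not Kobe's data)
example_lengths <- c(0, 0, 1, 0, 2, 0, 1, 4, 0)
summarise_streaks(example_lengths)

# With the lab's objects loaded, the same call would be:
# summarise_streaks(kobe_streak$length)
```

Reading the counts for the longest length directly from the table avoids having to estimate bar heights by eye.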

Simulations in R

coin_outcomes <- c("heads", "tails")
sample(coin_outcomes, size = 1, replace = TRUE)

## [1] "heads"

sim_fair_coin <- sample(coin_outcomes, size = 100, replace = TRUE)

sim_fair_coin

##   [1] "tails" "heads" "tails" "tails" "tails" "heads" "tails" "heads" "heads"
##  [10] "tails" "heads" "heads" "heads" "tails" "tails" "heads" "heads" "tails"
##  [19] "tails" "tails" "tails" "heads" "tails" "tails" "tails" "heads" "heads"
##  [28] "tails" "heads" "tails" "tails" "heads" "heads" "heads" "tails" "heads"
##  [37] "tails" "tails" "heads" "tails" "heads" "heads" "tails" "tails" "tails"
##  [46] "tails" "tails" "heads" "heads" "heads" "tails" "heads" "heads" "tails"
##  [55] "heads" "heads" "tails" "heads" "heads" "heads" "heads" "tails" "tails"
##  [64] "heads" "heads" "heads" "tails" "heads" "heads" "tails" "heads" "tails"
##  [73] "tails" "heads" "heads" "heads" "heads" "heads" "tails" "heads" "heads"
##  [82] "heads" "heads" "heads" "tails" "heads" "heads" "tails" "heads" "heads"
##  [91] "heads" "tails" "heads" "tails" "heads" "heads" "heads" "heads" "tails"
## [100] "heads"

table(sim_fair_coin)

## sim_fair_coin
## heads tails 
##    58    42

set.seed(12345)
sim_unfair_coin <- sample(coin_outcomes, size = 100, replace = TRUE, 
                          prob = c(0.2, 0.8))

Exercise 3

In your simulation of flipping the unfair coin 100 times, how many flips came up heads? Include the code for sampling the unfair coin in your response. Since the markdown file will run the code, and generate a new sample each time you Knit it, you should also “set a seed” before you sample. Read more about setting a seed below.

sim_unfair_coin

##   [1] "tails" "heads" "tails" "heads" "tails" "tails" "tails" "tails" "tails"
##  [10] "heads" "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails"
##  [19] "tails" "heads" "tails" "tails" "heads" "tails" "tails" "tails" "tails"
##  [28] "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails"
##  [37] "heads" "heads" "tails" "tails" "tails" "tails" "heads" "tails" "tails"
##  [46] "tails" "tails" "tails" "tails" "tails" "heads" "heads" "tails" "tails"
##  [55] "tails" "tails" "tails" "tails" "tails" "tails" "tails" "tails" "heads"
##  [64] "tails" "heads" "tails" "heads" "tails" "tails" "heads" "tails" "tails"
##  [73] "tails" "tails" "tails" "tails" "heads" "tails" "tails" "tails" "heads"
##  [82] "tails" "tails" "tails" "tails" "tails" "heads" "tails" "heads" "tails"
##  [91] "heads" "heads" "tails" "tails" "tails" "tails" "tails" "tails" "tails"
## [100] "tails"

table(sim_unfair_coin)

## sim_unfair_coin
## heads tails 
##    20    80

Of the 100 flips in the unfair coin simulation, 20 came up heads and 80 came up tails, which matches the 20% probability of heads set in prob.
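A short sketch of why the seed matters, and of counting heads without reading the table by eye. The wrapper `flip_unfair()` is a hypothetical name introduced here; the exact sequence produced for a given seed can differ between R versions, so no counts are shown.

```r
coin_outcomes <- c("heads", "tails")

# Hypothetical wrapper: reset the seed, then flip the unfair coin 100 times.
flip_unfair <- function(seed) {
  set.seed(seed)
  sample(coin_outcomes, size = 100, replace = TRUE, prob = c(0.2, 0.8))
}

first  <- flip_unfair(12345)
second <- flip_unfair(12345)

identical(first, second)  # TRUE: the same seed reproduces the same flips
sum(first == "heads")     # number of heads
mean(first == "heads")    # proportion of heads
```

Without the call to `set.seed()`, each Knit would draw a different sample and the written answer could disagree with the printed table.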

Simulating the Independent Shooter

shot_outcomes <- c("H", "M")
sim_basket <- sample(shot_outcomes, size = 1, replace = TRUE)

Exercise 4

What change needs to be made to the sample function so that it reflects a shooting percentage of 45%? Make this adjustment, then run a simulation to sample 133 shots. Assign the output of this simulation to a new object called sim_basket.

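To reflect a 45% shooting percentage, add the argument prob = c(0.45, 0.55) to the sample() call, so that hits ("H") are drawn with probability 0.45 and misses ("M") with probability 0.55. Setting size = 133 then simulates 133 shots.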

The new code would look like this:

shot_outcomes <- c("H", "M")
sim_basket <- sample(shot_outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))

table(sim_basket)

## sim_basket
##  H  M 
## 72 61

Exercise 5

Using calc_streak, compute the streak lengths of sim_basket, and save the results in a data frame called sim_streak.

sim_streak <- calc_streak(sim_basket)

Exercise 6

Describe the distribution of streak lengths. What is the typical streak length for this simulated independent shooter with a 45% shooting percentage? How long is the player’s longest streak of baskets in 133 shots? Make sure to include a plot in your answer.

ggplot(data = sim_streak, aes(x = length)) +
  geom_bar()

The distribution of streak lengths is skewed to the right. The typical streak length is 0, and the simulated independent shooter’s longest streak in 133 shots is 4 baskets.

Exercise 7

If you were to run the simulation of the independent shooter a second time, how would you expect its streak distribution to compare to the distribution from the question above? Exactly the same? Somewhat similar? Totally different? Explain your reasoning.

I expect the distribution to be somewhat similar but not exactly the same. Each run draws 133 new random shots, so the individual streak counts will change, but the hit probability stays at 45% on every shot. These changes should therefore be minor and should not greatly affect the overall shape of the distribution, which should remain right-skewed with a typical streak length of 0.
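This expectation can be checked by running the simulation twice and comparing the two tables. The sketch below is base R only; `streak_lengths()` and `simulate_streaks()` are hypothetical helpers introduced here rather than openintro functions, and the seed value is arbitrary.

```r
shot_outcomes <- c("H", "M")

# Hypothetical helper: hits between consecutive misses, padding the
# sequence with an imaginary miss at each end (assumption).
streak_lengths <- function(shots) {
  padded <- c(0L, as.integer(shots == "H"), 0L)
  diff(which(padded == 0L)) - 1L
}

# Hypothetical helper: one run of the 45% independent shooter.
simulate_streaks <- function() {
  shots <- sample(shot_outcomes, size = 133, replace = TRUE,
                  prob = c(0.45, 0.55))
  streak_lengths(shots)
}

set.seed(1)  # arbitrary seed, used only so the comparison is reproducible
run_a <- simulate_streaks()
run_b <- simulate_streaks()

table(run_a)  # streak length counts, first run
table(run_b)  # streak length counts, second run
```

The two tables will generally differ in their exact counts while keeping the same overall shape.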

Exercise 8

How does Kobe Bryant’s distribution of streak lengths compare to the distribution of streak lengths for the simulated shooter? Using this comparison, do you have evidence that the hot hand model fits Kobe’s shooting patterns? Explain.

The two distributions are similar: both are skewed to the right, both have a typical streak length of 0, and both have a longest streak of 4. If Kobe had a hot hand, his distribution should show more long streaks than that of the independent shooter, and it does not. In both Kobe’s results and the simulated results, most streaks have length 0, meaning that a miss came directly after another miss. This comparison therefore gives no evidence that the hot hand model fits Kobe’s shooting patterns.
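A single simulated run is only one draw, so one way to put Kobe’s longest streak of 4 in context is to repeat the independent-shooter simulation many times and look at how often a longest streak of at least 4 appears. This goes slightly beyond what the lab asks; it is a sketch in base R, with `streak_lengths()` as a hypothetical helper and with the seed and the number of repetitions chosen arbitrarily.

```r
# Hypothetical helper: hits between consecutive misses, padding the
# sequence with an imaginary miss at each end (assumption).
streak_lengths <- function(shots) {
  padded <- c(0L, as.integer(shots == "H"), 0L)
  diff(which(padded == 0L)) - 1L
}

set.seed(2020)  # arbitrary seed for reproducibility

# Longest streak in each of 1000 simulated sets of 133 shots at 45%
longest <- replicate(1000, {
  shots <- sample(c("H", "M"), size = 133, replace = TRUE,
                  prob = c(0.45, 0.55))
  max(streak_lengths(shots))
})

table(longest)      # distribution of the longest streak across runs
mean(longest >= 4)  # share of runs with a longest streak of 4 or more
```

If a large share of independent shooters reach a streak of 4 or more, Kobe’s longest streak of 4 is not unusual under the independence model.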

Lab 3 - Probability

Maddie Brennan

9/15/2020