DATA606 Week 3 Lab

Getting Started

Load packages

We work within the tidyverse and use data from the openintro package.

library(tidyverse)
library(openintro)

Data

The kobe_basket data frame records the 133 shots Kobe Bryant attempted in the 2009 NBA Finals. The shot column codes each attempt as a hit (“H”, a basket) or a miss (“M”).

glimpse(kobe_basket)

## Rows: 133
## Columns: 6
## $ vs          <fct> ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL…
## $ game        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ quarter     <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3…
## $ time        <fct> 9:47, 9:07, 8:11, 7:41, 7:03, 6:01, 4:07, 0:52, 0:00, 6:35…
## $ description <fct> Kobe Bryant makes 4-foot two point shot, Kobe Bryant misse…
## $ shot        <chr> "H", "M", "M", "H", "H", "M", "M", "M", "M", "H", "H", "H"…

Exercises

Exercise 1

What does a streak length of 1 mean, i.e. how many hits and misses are in a streak of 1? What about a streak length of 0?

A streak length of 1 means two shots: a hit followed by the miss that ends the streak. A streak length of 0 means a single shot: a miss with no hits before it.
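The definition can be made concrete with a small base-R sketch. streak_lengths() is a hypothetical helper written for illustration, not the source of openintro's calc_streak(). It counts the hits immediately before each miss, so trailing hits with no closing miss are ignored here; calc_streak() may handle the final streak differently.

```r
# Hypothetical helper (not openintro's calc_streak): a streak is the run of
# consecutive hits ("H") that a miss ("M") brings to an end.
streak_lengths <- function(shots) {
  miss_idx <- which(shots == "M")
  # Hits before each miss = gap since the previous miss (or the start of the
  # vector), minus one for the miss itself.
  diff(c(0L, miss_idx)) - 1L
}

# "H M" -> streak of 1; "M" -> streak of 0; "H H M" -> streak of 2
streak_lengths(c("H", "M", "M", "H", "H", "M"))  # returns 1 0 2
```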

Exercise 2

Describe the distribution of Kobe’s streak lengths from the 2009 NBA finals. What was his typical streak length? How long was his longest streak of baskets? Make sure to include the accompanying plot in your answer.

The distribution is right-skewed, with the counts dropping off quickly as streak length increases. Kobe’s typical (most frequent) streak length was 0, and his longest streak of baskets was 4 hits.

kobe_streak <- calc_streak(kobe_basket$shot)

ggplot(data = kobe_streak, aes(x = length)) +
  geom_bar() + 
  ggtitle("Kobe's Streak Distribution")
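The plot can be backed up with numbers. The sketch below uses only base R; summarise_streaks() is a hypothetical helper that takes a vector of streak lengths and returns the frequency table, the most frequent length, and the longest streak. The toy input is made up for illustration and is not Kobe's data.

```r
# Hypothetical helper: numeric summary of a vector of streak lengths.
summarise_streaks <- function(streaks) {
  counts <- table(streaks)                              # frequency of each length
  list(
    counts  = counts,
    typical = as.integer(names(which.max(counts))),     # most frequent length (mode)
    longest = max(streaks)                              # longest run of hits
  )
}

# Toy input for illustration only
summarise_streaks(c(0, 0, 0, 1, 1, 2, 4))
```

Applied to the lab data as summarise_streaks(kobe_streak$length), it should report a typical length of 0 and a longest streak of 4, matching the answer above.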

Exercise 3

In your simulation of flipping the unfair coin 100 times, how many flips came up heads? Include the code for sampling the unfair coin in your response.

With the seed set to 43110, 18 of the 100 flips came up heads.

coin_outcomes <- c("heads", "tails")
set.seed(43110)  # fix the random seed so the result is reproducible
# Unfair coin: P(heads) = 0.2, P(tails) = 0.8
sim_unfair_coin <- sample(coin_outcomes, size = 100, replace = TRUE, prob = c(0.2, 0.8))

table(sim_unfair_coin)

## sim_unfair_coin
## heads tails 
##    18    82
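18 heads is close to what the coin's probabilities imply. Treating the number of heads as binomial with n = 100 and p = 0.2 (a standard model for independent flips, added here for context rather than taken from the lab), the expected count and its spread can be computed in base R. The observed 18 sits within one standard deviation of the expected 20.

```r
n <- 100   # number of flips
p <- 0.2   # probability of heads for the unfair coin

expected_heads <- n * p                  # 20
sd_heads <- sqrt(n * p * (1 - p))        # 4
# Probability of seeing 18 or fewer heads under this model
p_18_or_fewer <- pbinom(18, size = n, prob = p)

c(expected = expected_heads, sd = sd_heads, p_le_18 = p_18_or_fewer)
```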

Exercise 4

What change needs to be made to the sample function so that it reflects a shooting percentage of 45%? Make this adjustment, then run a simulation to sample 133 shots. Assign the output of this simulation to a new object called sim_basket.

Add “prob = c(0.45, 0.55)”, which gives “H” a 45% chance and “M” a 55% chance, and change size from 1 to 133. We also set a seed so the result is reproducible.

# Original code
shot_outcomes <- c("H", "M")
sim_basket <- sample(shot_outcomes, size = 1, replace = TRUE)

# Adjusted code
shot_outcomes <- c("H", "M")
set.seed(66161)
sim_basket <- sample(shot_outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))
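A quick sanity check that the adjusted simulation behaves as intended: the share of simulated hits should land near 45%, though any single run of 133 shots will wander somewhat. The block repeats the simulation so it runs on its own.

```r
shot_outcomes <- c("H", "M")
set.seed(66161)
sim_basket <- sample(shot_outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))

# Share of simulated shots that are hits; near 0.45, but not exactly
mean(sim_basket == "H")
table(sim_basket)
```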

Exercise 5

Using calc_streak, compute the streak lengths of sim_basket, and save the results in a data frame called sim_streak.

The streak lengths are computed in the code chunk below:

sim_streak <- calc_streak(sim_basket)

Exercise 6

Describe the distribution of streak lengths. What is the typical streak length for this simulated independent shooter with a 45% shooting percentage? How long is the player’s longest streak of baskets in 133 shots? Make sure to include a plot in your answer.

The simulated shooter’s distribution is also right-skewed and drops off quickly as streak length increases, like Kobe’s. Its most frequent streak length was also 0, and its longest streak of baskets was 6 hits, compared with Kobe’s 4.

ggplot(data = sim_streak, aes(x = length)) +
  geom_bar() + 
  ggtitle("Simulation's Streak Distribution")
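The quick drop-off is what independence predicts. Assuming each shot is an independent hit with probability p (the model behind the simulation), a streak of length k requires k hits followed by a miss, so P(length = k) = p^k (1 - p), a geometric distribution. Each bar is therefore expected to be about p times the height of the one before it; with p = 0.45, that is less than half. The sketch computes these theoretical proportions with base R's dgeom(), where prob is the probability of the streak-ending miss.

```r
p_hit <- 0.45          # shooting percentage used in the simulation
k <- 0:6               # streak lengths to evaluate

# P(length = k) = p_hit^k * (1 - p_hit): k hits, then the streak-ending miss.
# dgeom() counts failures before the first success, so the miss plays the
# role of "success" with prob = 1 - p_hit.
theoretical <- dgeom(k, prob = 1 - p_hit)

# The same values written out explicitly
explicit <- p_hit^k * (1 - p_hit)

round(data.frame(length = k, proportion = theoretical), 3)
```

Multiplying these proportions by the number of streaks, nrow(sim_streak), gives expected bar heights that can be set against the plot.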

Exercise 7

If you were to run the simulation of the independent shooter a second time, how would you expect its streak distribution to compare to the distribution from the question above? Exactly the same? Somewhat similar? Totally different? Explain your reasoning.

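Somewhat similar. Rerunning with the same seed (66161) would reproduce exactly the same shots, but a different seed would change the individual outcomes and therefore the exact counts. Because the underlying 45% hit probability is unchanged, I would still expect a similar right-skewed shape with a mode of 0, though the longest streak could shift, perhaps to five or six baskets.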
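One way to see how much the simulated distribution varies between runs is to repeat the 133-shot simulation many times and record the longest streak each time. This is an illustrative sketch: the seed, the 1,000 repetitions, and the helper name longest_streak() are arbitrary choices, not part of the lab. It uses base R's rle() to find runs of hits.

```r
set.seed(2024)  # arbitrary seed for this illustration

# Hypothetical helper: longest run of consecutive hits in one simulated game
longest_streak <- function(n_shots = 133, p_hit = 0.45) {
  shots <- sample(c("H", "M"), size = n_shots, replace = TRUE,
                  prob = c(p_hit, 1 - p_hit))
  runs <- rle(shots)                             # lengths of runs of identical outcomes
  hit_runs <- runs$lengths[runs$values == "H"]   # keep only runs of hits
  if (length(hit_runs) == 0) 0L else max(hit_runs)
}

longest <- replicate(1000, longest_streak())
table(longest)  # how often each longest-streak value occurred
```

The resulting table shows a spread of values rather than a single number, which is why two runs are expected to be similar in shape but not identical.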

Exercise 8

How does Kobe Bryant’s distribution of streak lengths compare to the distribution of streak lengths for the simulated shooter? Using this comparison, do you have evidence that the hot hand model fits Kobe’s shooting patterns? Explain.

The two distributions are very similar, except that the simulated shooter’s has a slightly longer tail. This comparison gives no evidence that the hot hand model fits Kobe’s shooting. If Kobe had a hot hand at some point in the games, we would expect him to produce streaks longer than an independent shooter with the same hit rate. Instead, his longest streak (4 hits) was shorter than the simulated shooter’s (6 hits), which is consistent with his shots being independent.
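Streak lengths are one lens; another, added here as a supplementary check rather than part of the lab, is to compare the hit rate right after a hit with the hit rate right after a miss. Under independence the two should be similar, while a hot hand would make the first noticeably higher. hit_rate_after() is a hypothetical helper, and the toy sequence is made up for illustration.

```r
# Hypothetical helper: P(hit | previous shot was a hit) vs P(hit | previous was a miss)
hit_rate_after <- function(shots) {
  prev <- head(shots, -1)   # every shot except the last
  curr <- tail(shots, -1)   # every shot except the first
  c(after_hit  = mean(curr[prev == "H"] == "H"),
    after_miss = mean(curr[prev == "M"] == "H"))
}

# Toy sequence for illustration only
hit_rate_after(c("H", "H", "M", "H", "M", "M", "H"))
```

Applied to kobe_basket$shot, this also pairs the last shot of one game with the first shot of the next; a stricter version would split the shots by game first.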

What if Kobe’s true chance of making a basket is 40% rather than 45%, and the mistake here was that the hits from his hot-hand streaks were included in the average we gave the simulation?

My guess is that the human proprioceptive system tires, so shots taken back to back are not truly independent. It is like my beginner’s luck in pool or bowling running out as my proprioceptive system gets fatigued. Or perhaps, when Kobe was on a streak, opponents interfered more, making a miss more likely the longer the streak ran. Or perhaps heightened emotions or dramatic timing made people remember Kobe’s streaks and overlook his misses, so the audience came away believing Kobe had longer streaks than he was mathematically likely to.