library(tidyverse)
library(openintro)
glimpse(kobe_basket)
## Rows: 133
## Columns: 6
## $ vs <fct> ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL…
## $ game <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ quarter <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3…
## $ time <fct> 9:47, 9:07, 8:11, 7:41, 7:03, 6:01, 4:07, 0:52, 0:00, 6:35…
## $ description <fct> Kobe Bryant makes 4-foot two point shot, Kobe Bryant misse…
## $ shot <chr> "H", "M", "M", "H", "H", "M", "M", "M", "M", "H", "H", "H"…
set.seed(271828)
A streak of length 1 is a single hit followed by a miss (HM). A streak of length 0 is a miss with no hit immediately before it, for example the second miss in MM.
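The streak definition can be sketched in base R. The helper below, `streak_lengths`, is a hypothetical name introduced for illustration; it is not part of openintro and is not necessarily how `calc_streak` is implemented. It pads the shot sequence with a miss at each end and measures the gaps between consecutive misses.

```r
# Hypothetical helper (not from openintro): count hits between consecutive misses.
streak_lengths <- function(shots) {
  # 1 for a hit, 0 for a miss, with a sentinel miss added at both ends
  y <- c(0, as.integer(shots == "H"), 0)
  # Positions of misses; the gap between neighbours minus 1 is the streak length
  diff(which(y == 0)) - 1
}

# First nine shots from the glimpse() output above
first_shots <- c("H", "M", "M", "H", "H", "M", "M", "M", "M")
streak_lengths(first_shots)
# 1 0 2 0 0 0 0
```

Because of the sentinel at the end, a sequence that finishes on a miss contributes one final streak of length 0.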
kobe_streak <- calc_streak(kobe_basket$shot)
ggplot(data = kobe_streak, aes(x = length)) + geom_bar()
Describe the distribution:
summary(kobe_streak)
## length
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.7632
## 3rd Qu.:1.0000
## Max. :4.0000
Shape - right-skewed
Center - median of 0, mean of 0.763
Spread - range of 4
Kobe’s longest streak had a length of 4.
coin_outcomes <- c("heads", "tails")
sample(coin_outcomes, size = 1, replace = TRUE)
## [1] "tails"
sim_fair_coin <- sample(coin_outcomes, size = 100, replace = TRUE)
sim_fair_coin
## [1] "heads" "tails" "tails" "heads" "tails" "tails" "tails" "heads" "heads"
## [10] "tails" "tails" "heads" "tails" "heads" "tails" "tails" "tails" "tails"
## [19] "heads" "tails" "heads" "tails" "tails" "heads" "tails" "tails" "tails"
## [28] "heads" "heads" "heads" "tails" "heads" "tails" "tails" "tails" "heads"
## [37] "heads" "heads" "heads" "heads" "heads" "tails" "heads" "heads" "tails"
## [46] "tails" "tails" "heads" "tails" "tails" "tails" "heads" "heads" "heads"
## [55] "tails" "heads" "heads" "tails" "tails" "tails" "heads" "tails" "tails"
## [64] "tails" "tails" "tails" "heads" "heads" "tails" "tails" "tails" "tails"
## [73] "heads" "tails" "tails" "tails" "heads" "heads" "heads" "heads" "tails"
## [82] "tails" "heads" "tails" "tails" "heads" "tails" "tails" "heads" "tails"
## [91] "heads" "heads" "heads" "tails" "tails" "heads" "heads" "tails" "heads"
## [100] "tails"
table(sim_fair_coin)
## sim_fair_coin
## heads tails
## 44 56
sim_unfair_coin <- sample(coin_outcomes, size = 100, replace = TRUE,
                          prob = c(0.2, 0.8))
table(sim_unfair_coin)
## sim_unfair_coin
## heads tails
## 24 76
In this simulation, 24 of the 100 flips came up heads.
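As a rough check on that count, the number of heads in 100 flips with a heads probability of 0.2 follows a binomial distribution, so the expected count and its standard deviation can be computed directly. The variable names below are illustrative.

```r
n_flips <- 100
p_heads <- 0.2

expected_heads <- n_flips * p_heads                   # binomial mean: n * p
sd_heads <- sqrt(n_flips * p_heads * (1 - p_heads))   # binomial sd: sqrt(n * p * (1 - p))

# How far the observed 24 heads sits from the expected count, in sd units
z_observed <- (24 - expected_heads) / sd_heads
c(expected = expected_heads, sd = sd_heads, z = z_observed)
```

The expected count is 20 with a standard deviation of 4, so 24 heads is one standard deviation above expectation, which is ordinary sampling variation.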
Modify the sampling call by adding a prob argument so that a hit occurs with probability 0.45:
shot_outcomes <- c("H", "M")
sim_basket <- sample(shot_outcomes, size = 133, replace = TRUE,
                     prob = c(0.45, 0.55))
table(sim_basket)
## sim_basket
## H M
## 56 77
sim_streak <- calc_streak(sim_basket)
ggplot(data = sim_streak, aes(x = length)) + geom_bar()
### Exercise 6
summary(sim_streak)
## length
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.7179
## 3rd Qu.:1.0000
## Max. :7.0000
Describe the distribution:
Shape - right-skewed
Center - median of 0, mean of 0.718
Spread - range of 7
The player’s longest streak is 7.
If I ran the simulation again, I would expect a broadly similar result: the same right-skewed shape and a median of 0, with a slightly different mean. I would not expect the spread to stay the same, because the maximum streak of 7 is an outlier. With 133 shots, the sample is large enough for repeated runs to produce similar distributions.
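One way to check this expectation is to repeat the simulation many times and look at how the longest streak varies. The sketch below uses base R only; `streak_lengths` is a hypothetical stand-in for `calc_streak`, and the number of repetitions (1000) and the seed are arbitrary choices.

```r
# Hypothetical stand-in for calc_streak(): hits between consecutive misses
streak_lengths <- function(shots) {
  y <- c(0, as.integer(shots == "H"), 0)
  diff(which(y == 0)) - 1
}

# Simulate one independent shooter with 133 shots and P(hit) = 0.45,
# then return that shooter's longest streak
longest_streak <- function(n_shots = 133, p_hit = 0.45) {
  shots <- sample(c("H", "M"), size = n_shots, replace = TRUE,
                  prob = c(p_hit, 1 - p_hit))
  max(streak_lengths(shots))
}

set.seed(271828)  # arbitrary seed, chosen only for reproducibility
longest <- replicate(1000, longest_streak())

# Distribution of the longest streak across repeated simulations
table(longest)
mean(longest >= 7)  # share of runs whose longest streak is at least 7
```

If the share of runs reaching 7 or more is small, that supports treating the maximum as the least stable feature of a single run.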
Kobe’s streak distribution and the simulated one have similar shapes and centers. Kobe’s distribution shows relatively higher frequencies at streak lengths 1 and 3, but I do not consider this compelling evidence for the hot-hand hypothesis. A formal test, such as a chi-square goodness-of-fit test comparing the two streak-length distributions, would be needed to assess the difference quantitatively. Another concern is that the simulation may assume the wrong hit probability for Kobe. An exact binomial test of his observed hit rate against 0.45 would help check this.
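A minimal sketch of the binomial check mentioned above, using `binom.test` from base R's stats package. The count of 58 hits is an assumption: it is not printed in this report but is inferred from the 133 shots and the mean streak length of 0.7632, and it should be confirmed with `table(kobe_basket$shot)` before relying on it.

```r
n_shots <- 133
n_hits <- 58        # assumption: inferred from the summary, not shown in the output
p_assumed <- 0.45   # hit probability used in the simulation

# Exact binomial test of the observed hit rate against the assumed probability
fit <- binom.test(n_hits, n_shots, p = p_assumed)
fit$estimate  # observed proportion of hits
fit$p.value   # a large value gives no evidence against p = 0.45
```

This only checks the assumed hit probability; it says nothing about whether consecutive shots are independent, which is the hot-hand question itself.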