glimpse(kobe_basket)
## Rows: 133
## Columns: 6
## $ vs <fct> ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL…
## $ game <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ quarter <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3…
## $ time <fct> 9:47, 9:07, 8:11, 7:41, 7:03, 6:01, 4:07, 0:52, 0:00, 6:35…
## $ description <fct> Kobe Bryant makes 4-foot two point shot, Kobe Bryant misse…
## $ shot <chr> "H", "M", "M", "H", "H", "M", "M", "M", "M", "H", "H", "H"…
head(kobe_basket)
## # A tibble: 6 × 6
##   vs     game quarter time  description                                    shot
##   <fct> <int> <fct>   <fct> <fct>                                          <chr>
## 1 ORL       1 1       9:47  Kobe Bryant makes 4-foot two point shot        H
## 2 ORL       1 1       9:07  Kobe Bryant misses jumper                      M
## 3 ORL       1 1       8:11  Kobe Bryant misses 7-foot jumper               M
## 4 ORL       1 1       7:41  Kobe Bryant makes 16-foot jumper (Derek Fishe… H
## 5 ORL       1 1       7:03  Kobe Bryant makes driving layup                H
## 6 ORL       1 1       6:01  Kobe Bryant misses jumper                      M
kobe_basket$shot[1:9]
## [1] "H" "M" "M" "H" "H" "M" "M" "M" "M"
I don't know much about basketball (beyond shooting hoops), so I had to look this up. A streak of length 1 means one hit followed by a miss. A streak of length 0 means a miss that comes immediately after the miss that ended the preceding streak.
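To make the definition concrete, here is a minimal sketch of how streak lengths could be counted in base R. The helper name streak_lengths() is hypothetical and this is not the openintro source; it only assumes that a streak is the number of hits between two consecutive misses, which is what the printed output suggests.

```r
# Hypothetical helper for illustration only; not the calc_streak() source.
streak_lengths <- function(shots) {
  # Pad with a miss at each end so the first and last streaks are counted.
  is_miss <- c(TRUE, shots == "M", TRUE)
  miss_pos <- which(is_miss)
  # Hits between two consecutive misses = gap between their positions minus 1.
  diff(miss_pos) - 1
}

# The first nine shots shown above.
first_nine <- c("H", "M", "M", "H", "H", "M", "M", "M", "M")
streak_lengths(first_nine)
## [1] 1 0 2 0 0 0 0
```

The first six values match the start of the streak output for the full data. The final 0 appears only because this excerpt stops at shot 9, so the padding miss closes an empty streak; in the full data that streak continues with the later shots.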
kobe_streak <- calc_streak(kobe_basket$shot)
print(kobe_streak)
## length
## 1 1
## 2 0
## 3 2
## 4 0
## 5 0
## 6 0
## 7 3
## 8 2
## 9 0
## 10 3
## 11 0
## 12 1
## 13 3
## 14 0
## 15 0
## 16 0
## 17 0
## 18 0
## 19 1
## 20 1
## 21 0
## 22 4
## 23 1
## 24 0
## 25 1
## 26 0
## 27 1
## 28 0
## 29 1
## 30 2
## 31 0
## 32 1
## 33 2
## 34 1
## 35 0
## 36 0
## 37 1
## 38 0
## 39 0
## 40 0
## 41 1
## 42 1
## 43 0
## 44 1
## 45 0
## 46 2
## 47 0
## 48 0
## 49 0
## 50 3
## 51 0
## 52 1
## 53 0
## 54 1
## 55 2
## 56 1
## 57 0
## 58 1
## 59 0
## 60 0
## 61 1
## 62 3
## 63 3
## 64 1
## 65 1
## 66 0
## 67 0
## 68 0
## 69 0
## 70 0
## 71 1
## 72 1
## 73 0
## 74 0
## 75 0
## 76 1
ggplot(data = kobe_streak, aes(x = length)) +
  geom_bar()
The longest streak was 4 and the shortest was 0. Streaks of length 0 occurred most often, so 0 is the mode (the typical streak length). The median streak length was 0, since 39 of the 76 streaks had length 0. The distribution in the graph above is right skewed.
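The summary numbers can be checked directly rather than read off the plot. The sketch below rebuilds the streak lengths from counts I tallied by hand from the printed kobe_streak output (an assumption worth double-checking); with the real data frame the same calls would be applied to kobe_streak$length.

```r
# Counts tallied by hand from the printed output above (assumed correct):
# 39 zeros, 24 ones, 6 twos, 6 threes, 1 four = 76 streaks.
kobe_lengths <- rep(0:4, times = c(39, 24, 6, 6, 1))

table(kobe_lengths)    # frequency of each streak length
median(kobe_lengths)   # middle of the 76 sorted lengths
max(kobe_lengths)      # longest streak
sum(kobe_lengths)      # total hits; 133 shots minus this gives the misses
```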
set.seed(100)
coin_outcomes <- c("heads", "tails")
class(coin_outcomes)
## [1] "character"
sample(coin_outcomes, size = 1, replace = TRUE)
## [1] "tails"
sim_fair_coin <- sample(coin_outcomes, size = 100, replace = TRUE)
table(sim_fair_coin)
## sim_fair_coin
## heads tails
## 50 50
data_x <- as.data.frame(coin_outcomes)
data_x
## coin_outcomes
## 1 heads
## 2 tails
set.seed(133)
# set.seed() makes random results reproducible: with the same seed, sample() returns the same sample on every run. Without it, each run produces a different sample.
outcomes <- c("H", "M")
sim_basket <- sample(outcomes, size = 1, replace = TRUE)
sim_basket <- sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))
sim_basket
## [1] "H" "H" "M" "M" "H" "H" "M" "H" "H" "M" "H" "H" "M" "M" "M" "M" "H" "M"
## [19] "M" "H" "H" "H" "H" "M" "H" "H" "M" "M" "H" "H" "M" "M" "M" "H" "H" "M"
## [37] "H" "M" "H" "M" "H" "H" "H" "M" "M" "M" "H" "M" "H" "H" "H" "H" "H" "M"
## [55] "H" "H" "M" "M" "H" "H" "H" "H" "H" "H" "M" "H" "H" "M" "M" "M" "M" "M"
## [73] "H" "H" "H" "H" "H" "M" "H" "M" "H" "H" "H" "H" "M" "M" "H" "M" "H" "H"
## [91] "M" "M" "H" "M" "H" "M" "H" "H" "H" "M" "H" "M" "H" "M" "H" "M" "M" "M"
## [109] "H" "M" "H" "H" "M" "H" "H" "H" "M" "M" "M" "H" "M" "H" "M" "H" "H" "M"
## [127] "H" "M" "M" "H" "M" "M" "M"
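One way to sanity-check the prob argument is to simulate many more shots and look at the share of hits, which should land near 0.45. This is a sketch only: the sample size of 10000 and the name big_sim are my own choices, not part of the lab.

```r
set.seed(133)  # seed reused here only so the check is reproducible
outcomes <- c("H", "M")

# prob is matched to outcomes by position: P("H") = 0.45, P("M") = 0.55.
big_sim <- sample(outcomes, size = 10000, replace = TRUE, prob = c(0.45, 0.55))

# Proportion of hits; with this many shots it should be close to 0.45.
hit_rate <- mean(big_sim == "H")
hit_rate
```

With only 133 shots the hit rate can drift several percentage points from 45% by chance, which is part of why a single simulated game is not an exact copy of the target percentage.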
sim_streak <- calc_streak(sim_basket)
head(sim_streak)
## length
## 1 2
## 2 0
## 3 2
## 4 2
## 5 2
## 6 0
ggplot(data = sim_streak, aes(x = length)) +
  geom_bar()
The longest streak is 6 and the shortest is 0. The distribution is again right skewed, with a median of 1 (25 of the 61 streaks have length 0 and another 16 have length 1). Length 0 is the mode.
sim_streak <- calc_streak(sim_basket)
head(sim_streak)
## length
## 1 2
## 2 0
## 3 2
## 4 2
## 5 2
## 6 0
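I think it would be somewhat similar. I reran the code above and the results are identical, but only because calc_streak() was applied to the same sim_basket vector rather than to a fresh sample. A new simulation would give different individual streaks with a similar overall distribution, because the shooting percentage is still set at 45%.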
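The difference between rerunning with the same seed and drawing a genuinely new sample can be sketched as follows. The helper shoot() and the second seed (2024) are hypothetical and chosen only for illustration.

```r
outcomes <- c("H", "M")

# Hypothetical helper: simulate 133 shots of a 45% shooter for a given seed.
shoot <- function(seed) {
  set.seed(seed)
  sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))
}

run_a <- shoot(133)
run_b <- shoot(133)   # same seed as run_a
run_c <- shoot(2024)  # different, arbitrary seed

identical(run_a, run_b)  # TRUE: the same seed reproduces the same shots
identical(run_a, run_c)  # expected FALSE: a new seed gives a new sequence

# The hit rates should still be in the same neighborhood.
mean(run_a == "H")
mean(run_c == "H")
```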
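According to Investopedia, "the 'hot hand' is the notion where people believe that after a string of successes, an individual or entity is more likely to have continued success." The streak distributions for Kobe Bryant and the simulated independent shooter are similar: both are right skewed with a mode of 0, except that the simulation had a few longer streaks (5 and 6). Since an independent shooter produced streaks at least as long as Kobe's, these data do not show clear evidence of a hot hand.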
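To put the two distributions side by side, the sketch below compares the share of streaks at each length. Both sets of counts were tallied by hand from the output printed in this report, so treat them as assumptions; with the data frames available, table(kobe_streak$length) and table(sim_streak$length) would give the counts directly.

```r
# Hand-tallied counts of streak lengths 0 through 6 (assumed correct).
kobe_counts <- c(`0` = 39, `1` = 24, `2` = 6, `3` = 6, `4` = 1, `5` = 0, `6` = 0)
sim_counts  <- c(`0` = 25, `1` = 16, `2` = 12, `3` = 3, `4` = 2, `5` = 2, `6` = 1)

# Convert counts to proportions so 76 and 61 streaks are comparable.
comparison <- rbind(
  kobe = round(kobe_counts / sum(kobe_counts), 2),
  sim  = round(sim_counts / sum(sim_counts), 2)
)
comparison
```

Proportions are used instead of raw counts because the two shooters have different numbers of streaks (one more than each shooter's number of misses).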