Probability

First, load the data:
load(url("http://s3.amazonaws.com/assets.datacamp.com/course/dasi/kobe.RData"))
head(kobe)
## vs game quarter time
## 1 ORL 1 1 9:47
## 2 ORL 1 1 9:07
## 3 ORL 1 1 8:11
## 4 ORL 1 1 7:41
## 5 ORL 1 1 7:03
## 6 ORL 1 1 6:01
## description basket
## 1 Kobe Bryant makes 4-foot two point shot H
## 2 Kobe Bryant misses jumper M
## 3 Kobe Bryant misses 7-foot jumper M
## 4 Kobe Bryant makes 16-foot jumper (Derek Fisher assists) H
## 5 Kobe Bryant makes driving layup H
## 6 Kobe Bryant misses jumper M
Now look at the variable names in the data set, then at the first few values of the basket variable.
names(kobe)
## [1] "vs" "game" "quarter" "time" "description"
## [6] "basket"
head(kobe$basket)
## [1] "H" "M" "M" "H" "H" "M"
Next we use calc_streak, a custom function written by the DataCamp authors, to calculate the lengths of the shooting streaks in a player's record. The function is not defined in this document, so it is presumably loaded along with the data.
kobe_streak <- calc_streak(kobe$basket)
head(kobe_streak)
## [1] 1 0 2 0 0 0
barplot(table(kobe_streak))
We see that the distribution of Kobe's streak lengths is right skewed: the mode is 0, and longer streaks are much rarer.
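To make the streak definition concrete, here is a minimal sketch of how such a function could work. The name streak_lengths is hypothetical, and this is not necessarily how the DataCamp authors implemented calc_streak. It assumes a streak is the number of consecutive hits ("H") between two misses, counting the runs before the first miss and after the last one, so a miss that directly follows another miss yields a streak of length 0.

```r
# Hypothetical helper, written for illustration only.
streak_lengths <- function(shots) {
  hit <- as.integer(shots == "H")   # 1 for a hit, 0 for a miss
  padded <- c(0L, hit, 0L)          # sentinel misses at both ends
  miss_pos <- which(padded == 0L)   # positions of all misses
  diff(miss_pos) - 1L               # number of hits between consecutive misses
}

# The first six shots shown above
streak_lengths(c("H", "M", "M", "H", "H", "M"))
## [1] 1 0 2 0
```

The result agrees with the first four values of kobe_streak printed above.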
Now we learn about simulations. First we simulate a random variable.
outcomes <- c("heads", "tails")
# Use sample() to draw a sample of the given size from outcomes, with replacement.
sample(outcomes, size = 1, replace = TRUE)
## [1] "tails"
Instead of rerunning the code by hand to see whether the outcomes come out roughly 50-50, we simply increase the sample size.
sample(outcomes, size = 10, replace = TRUE)
## [1] "tails" "tails" "tails" "tails" "heads" "tails" "tails" "tails"
## [9] "tails" "heads"
Now store the result of 100 samples:
sim_fair_coin <- sample(outcomes, size = 100, replace = TRUE)
table(sim_fair_coin)
## sim_fair_coin
## heads tails
## 49 51
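The drift toward 50-50 can be checked directly by computing the share of heads at several sample sizes. This is a small sketch; the seed and the particular sizes are arbitrary choices made here, not part of the original lab.

```r
set.seed(1)  # arbitrary seed so the run is reproducible
outcomes <- c("heads", "tails")
sizes <- c(10, 100, 1000, 10000)

# Share of heads in a fair-coin sample of each size
prop_heads <- sapply(sizes, function(n) {
  mean(sample(outcomes, size = n, replace = TRUE) == "heads")
})
data.frame(size = sizes, prop_heads = prop_heads)
```

Larger samples should typically land closer to 0.5, although any single run can deviate.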
Now we adjust the probabilities of the outcomes, so that heads comes up only 20% of the time.
sim_bad_coin <- sample(outcomes, size = 100, prob = c(0.2, 0.8), replace = TRUE)
head(sim_bad_coin)
## [1] "tails" "tails" "tails" "tails" "tails" "tails"
table(sim_bad_coin)
## sim_bad_coin
## heads tails
## 24 76
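The entries of prob are matched to outcomes by position, so here heads gets 0.2 and tails gets 0.8. A quick way to check the result is to convert the counts to proportions. The seed below is an arbitrary choice, and the names on the probability vector are added only for readability.

```r
set.seed(2)  # arbitrary seed for reproducibility
outcomes <- c("heads", "tails")
p <- c(heads = 0.2, tails = 0.8)  # names are documentation only; matching is by position

sim_bad_coin <- sample(outcomes, size = 100, prob = p, replace = TRUE)
prop.table(table(sim_bad_coin))   # observed shares, to compare against p
```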
Now we simulate a basketball player who makes 45% of his shots, Kobe's shooting percentage, over 133 attempts, the number of shots in the Kobe data set.
outcomes <- c("H", "M")
sim_basket <- sample(outcomes, prob = c(0.45, 0.55), size = 133, replace = TRUE)
table(sim_basket)
## sim_basket
## H M
## 67 66
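The simulated shooter made 67 of 133 shots, noticeably more than 45%. Whether that is unusual can be judged from the binomial distribution, because the number of hits in 133 independent shots with hit probability 0.45 is binomial. A short sketch of the arithmetic:

```r
n <- 133   # attempts, as in the Kobe data
p <- 0.45  # assumed hit probability

n * p                  # expected number of hits: 59.85
sqrt(n * p * (1 - p))  # standard deviation, roughly 5.74

# Probability of 67 or more hits from an independent 45% shooter
pbinom(66, size = n, prob = p, lower.tail = FALSE)
```

So 67 hits is a little more than one standard deviation above the expected count, which is within the ordinary range of chance variation.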
Now we compare Kobe with the simulation, again using the DataCamp authors' calc_streak function:
kobe_streak <- calc_streak(kobe$basket)
sim_streak <- calc_streak(sim_basket)
kobe_streak
## [1] 1 0 2 0 0 0 3 2 0 3 0 1 3 0 0 0 0 0 1 1 0 4 1 0 1 0 1 0 1 2 0 1 2 1 0
## [36] 0 1 0 0 0 1 1 0 1 0 2 0 0 0 3 0 1 0 1 2 1 0 1 0 0 1 3 3 1 1 0 0 0 0 0
## [71] 1 1 0 0 0 1
sim_streak
## [1] 3 1 0 1 4 0 1 1 1 6 5 1 1 0 0 5 1 0 0 0 0 2 3 0 2 2 0 0 0 0 0 3 0 1 2
## [36] 0 0 0 1 0 0 1 4 0 1 0 0 3 1 0 1 0 0 2 0 0 0 0 0 1 0 1 0 3 0 0 2
summary(kobe_streak)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 0.000 0.763 1.000 4.000
summary(sim_streak)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 0 1 1 6
table(kobe_streak)
## kobe_streak
## 0 1 2 3 4
## 39 24 6 6 1
table(sim_streak)
## sim_streak
## 0 1 2 3 4 5 6
## 35 16 6 5 2 2 1
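plot(kobe_streak)  # index plots of the raw streak lengths; these are not very informative
plot(sim_streak)
barplot(table(kobe_streak))  # bar plots of the streak-length counts show the distributions
barplot(table(sim_streak))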
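Because the two records contain different numbers of streaks and different maximum lengths, the comparison is easier when both are shown as proportions on a common axis. The sketch below rebuilds the counts from the two tables printed above, so it runs without the original data. The side-by-side layout is a suggestion made here, not part of the lab.

```r
streak_len  <- 0:6
kobe_counts <- c(39, 24, 6, 6, 1, 0, 0)  # from table(kobe_streak), padded with zeros
sim_counts  <- c(35, 16, 6, 5, 2, 2, 1)  # from table(sim_streak)

# One row per shooter, one column per streak length, as proportions
props <- rbind(Kobe = kobe_counts / sum(kobe_counts),
               Simulated = sim_counts / sum(sim_counts))
colnames(props) <- streak_len

barplot(props, beside = TRUE, legend.text = TRUE,
        xlab = "Streak length", ylab = "Proportion of streaks")
```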
Inspecting the bar plots, the two distributions look broadly similar: both are right skewed with a mode of 0, and the simulated independent shooter actually produced the longer streaks (up to 6, against Kobe's maximum of 4). This comparison therefore gives no evidence that Kobe has a “hot hand”.
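A single simulated record is only one draw, so a more careful comparison repeats the simulation many times and asks how often an independent shooter matches Kobe's longest streak. The sketch below is an extension under stated assumptions: streak_lengths is a hypothetical stand-in for calc_streak, the seed and the 1000 repetitions are arbitrary, and Kobe's longest streak of 4 is taken from the summary above.

```r
set.seed(3)  # arbitrary seed for reproducibility

# Hypothetical stand-in for calc_streak: hits between consecutive misses
streak_lengths <- function(shots) {
  diff(which(c(0L, as.integer(shots == "H"), 0L) == 0L)) - 1L
}

# Longest streak in each of 1000 simulated 133-shot records
max_streaks <- replicate(1000, {
  shots <- sample(c("H", "M"), size = 133, prob = c(0.45, 0.55), replace = TRUE)
  max(streak_lengths(shots))
})

table(max_streaks)       # distribution of the longest streak
mean(max_streaks >= 4)   # share of runs at least as streaky as Kobe's record
```

If that share is large, a longest streak of 4 is what independent shooting would typically produce.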