Data Camp 2

Probability

First, load the data.

load(url("http://s3.amazonaws.com/assets.datacamp.com/course/dasi/kobe.RData"))
head(kobe)
##    vs game quarter time
## 1 ORL    1       1 9:47
## 2 ORL    1       1 9:07
## 3 ORL    1       1 8:11
## 4 ORL    1       1 7:41
## 5 ORL    1       1 7:03
## 6 ORL    1       1 6:01
##                                               description basket
## 1                 Kobe Bryant makes 4-foot two point shot      H
## 2                               Kobe Bryant misses jumper      M
## 3                        Kobe Bryant misses 7-foot jumper      M
## 4 Kobe Bryant makes 16-foot jumper (Derek Fisher assists)      H
## 5                         Kobe Bryant makes driving layup      H
## 6                               Kobe Bryant misses jumper      M

Now look at the variables in the data set and look at the basket variable.

names(kobe)
## [1] "vs"          "game"        "quarter"     "time"        "description"
## [6] "basket"
head(kobe$basket)
## [1] "H" "M" "M" "H" "H" "M"

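Next we use calc_streak, a custom function written by the Data Camp authors, to compute the length of each shooting streak in a player's record. A streak is a run of consecutive hits that ends with a miss, so a miss that directly follows another miss counts as a streak of length 0.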
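
The tutorial does not show the function's source. As a rough illustration, a streak counter of this kind can be written in a few lines of base R. The sketch below is an assumption about how such a function might work, not the course's actual code, and the name calc_streak_sketch is made up for this example. It treats every miss as the end of a streak and counts the hits that came before it.

```r
# Hypothetical re-implementation for illustration only; the course's
# calc_streak may differ in its details.
calc_streak_sketch <- function(shots) {
  hit <- as.integer(shots == "H")   # 1 for a hit, 0 for a miss
  padded <- c(0, hit, 0)            # pad with misses so edge streaks are closed
  miss_pos <- which(padded == 0)    # positions of all misses, padding included
  diff(miss_pos) - 1                # number of hits between consecutive misses
}

# The first six shots shown by head(kobe$basket)
first_six <- c("H", "M", "M", "H", "H", "M")
calc_streak_sketch(first_six)
## [1] 1 0 2 0
```

The result starts with 1, 0, 2, which agrees with the first values of kobe_streak: one hit, then a miss after a miss, then two hits in a row.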

kobe_streak <- calc_streak(kobe$basket)
head(kobe_streak)
## [1] 1 0 2 0 0 0
barplot(table(kobe_streak))

[Plot: bar plot of streak-length frequencies for kobe_streak]

We see that the distribution of Kobe's streak lengths is right skewed: most streaks are short, with a tail toward longer ones. In fact, the mode of the variable is 0.
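
A quick way to check the direction of skew is to compare the mean with the median: a mean above the median points to a longer right tail. The sketch below rebuilds the streak lengths from the frequency counts that table(kobe_streak) reports further down in this post, so it runs without downloading the data. The variable names are made up for this example.

```r
# Frequencies as reported by table(kobe_streak):
# 39 streaks of length 0, 24 of length 1, 6 of length 2, 6 of length 3, 1 of length 4
streak_counts <- c("0" = 39, "1" = 24, "2" = 6, "3" = 6, "4" = 1)
streaks <- rep(0:4, times = streak_counts)

mean(streaks)     # about 0.763, matching summary(kobe_streak)
median(streaks)   # 0
names(streak_counts)[which.max(streak_counts)]  # the mode, "0"
```

The mean sits above the median, which is what a right-skewed distribution looks like.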

Now we learn about simulations. First we simulate a random variable.

outcomes <- c("heads", "tails")
# Use sample() to draw `size` values from outcomes, with replacement.
sample(outcomes, size = 1, replace = TRUE)
## [1] "heads"

Instead of manually rerunning the code over and over to see whether the outcomes come out roughly 50-50, we can simply increase the sample size.

sample(outcomes, size = 10, replace = TRUE)
##  [1] "heads" "tails" "heads" "heads" "tails" "heads" "heads" "heads"
##  [9] "heads" "heads"
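
Because sample() draws at random, the output above will differ from run to run. If reproducible results are wanted, one option is to fix the random seed first with set.seed(). The seed value 42 below is arbitrary and chosen only for this illustration.

```r
outcomes <- c("heads", "tails")

set.seed(42)   # arbitrary seed, only for this illustration
first_run <- sample(outcomes, size = 10, replace = TRUE)

set.seed(42)   # resetting the same seed replays the same draws
second_run <- sample(outcomes, size = 10, replace = TRUE)

identical(first_run, second_run)
## [1] TRUE
```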

Now store the result of 100 samples:

sim_fair_coin <- sample(outcomes, size = 100, replace = TRUE)
table(sim_fair_coin)
## sim_fair_coin
## heads tails 
##    56    44
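
A split of 56 to 44 is well within what a fair coin produces in 100 flips. To see the proportion of heads settle toward 0.5 as the sample grows, the flips can be repeated at several sizes. This is a minimal sketch; the sizes, the seed, and the variable names are arbitrary choices for the example.

```r
outcomes <- c("heads", "tails")
set.seed(1)  # arbitrary seed so the run can be repeated

sizes <- c(10, 100, 1000, 100000)
prop_heads <- sapply(sizes, function(n) {
  flips <- sample(outcomes, size = n, replace = TRUE)
  mean(flips == "heads")   # share of flips that came up heads
})
names(prop_heads) <- sizes
prop_heads
```

Small samples can land far from 0.5, while the largest one should be very close to it.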

Now we want to adjust the probabilities of the outcomes in the random sample.

sim_bad_coin <- sample(outcomes, size = 100, prob = c(0.2, 0.8), replace = TRUE)
head(sim_bad_coin)
## [1] "heads" "tails" "heads" "tails" "tails" "tails"
table(sim_bad_coin)
## sim_bad_coin
## heads tails 
##    23    77
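
Note that sample() matches the prob vector to the outcomes by position, so c(0.2, 0.8) gives heads a probability of 0.2 only because "heads" is listed first. One way to guard against a mix-up is to name the probabilities and index them by the outcomes, as sketched below. The names probs and big_bad_coin are made up for this example, and the larger sample size is used only to make the proportions easier to read.

```r
outcomes <- c("heads", "tails")
probs <- c(tails = 0.8, heads = 0.2)   # deliberately listed in a different order

set.seed(7)  # arbitrary seed
# probs[outcomes] reorders the probabilities to line up with outcomes
big_bad_coin <- sample(outcomes, size = 100000, prob = probs[outcomes],
                       replace = TRUE)
prop.table(table(big_bad_coin))
```

The observed proportions should be close to 0.2 for heads and 0.8 for tails.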

Now we simulate a basketball player who shoots 45%, Kobe's shooting percentage, over 133 attempts, the number of shots in the Kobe data set.

outcomes <- c("H", "M")
sim_basket <- sample(outcomes, prob = c(0.45, 0.55), size = 133, replace = TRUE)
table(sim_basket)
## sim_basket
##  H  M 
## 59 74
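
As a sanity check on the simulated counts, the number of hits for an independent shooter follows a binomial distribution, so its expected value and standard deviation can be computed directly. The variable names below are made up for this example.

```r
n_shots <- 133   # attempts in the Kobe data set
p_hit   <- 0.45  # shooting percentage used in the simulation

expected_hits <- n_shots * p_hit                   # 59.85
sd_hits <- sqrt(n_shots * p_hit * (1 - p_hit))     # about 5.74

c(expected = expected_hits, sd = sd_hits)
```

The 59 hits in the simulation above are within one standard deviation of the expected 59.85.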

Now we compare Kobe with the simulation, again using the Data Camp function calc_streak:

kobe_streak <- calc_streak(kobe$basket)
sim_streak <- calc_streak(sim_basket)
kobe_streak
##  [1] 1 0 2 0 0 0 3 2 0 3 0 1 3 0 0 0 0 0 1 1 0 4 1 0 1 0 1 0 1 2 0 1 2 1 0
## [36] 0 1 0 0 0 1 1 0 1 0 2 0 0 0 3 0 1 0 1 2 1 0 1 0 0 1 3 3 1 1 0 0 0 0 0
## [71] 1 1 0 0 0 1
sim_streak
##  [1] 1 4 1 1 2 3 0 4 0 0 2 0 1 0 0 1 1 1 0 1 1 2 0 2 0 0 0 3 2 0 0 0 0 3 1
## [36] 0 1 0 4 0 0 0 0 1 1 3 0 0 0 0 0 0 1 1 0 1 2 0 2 1 0 1 0 0 0 0 0 0 0 0
## [71] 1 2 0 0 0
summary(kobe_streak)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.000   0.763   1.000   4.000
summary(sim_streak)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.000   0.787   1.000   4.000
table(kobe_streak)
## kobe_streak
##  0  1  2  3  4 
## 39 24  6  6  1
table(sim_streak)
## sim_streak
##  0  1  2  3  4 
## 41 19  8  4  3
plot(kobe_streak)  # index plots: one point per streak, not very informative

[Plot: kobe_streak values plotted by index]

plot(sim_streak)

[Plot: sim_streak values plotted by index]

barplot(table(kobe_streak))  # tabulate first so each bar is one streak length

[Plot: bar plot of kobe_streak]

barplot(table(sim_streak))

[Plot: bar plot of sim_streak]
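
Since the two vectors have different lengths (76 streaks for Kobe, 75 for the simulation), proportions are easier to compare than raw counts. The sketch below copies the frequencies from the table() output above into a small matrix; the variable names are made up for this example.

```r
# Frequencies copied from table(kobe_streak) and table(sim_streak)
kobe_counts <- c(39, 24, 6, 6, 1)
sim_counts  <- c(41, 19, 8, 4, 3)

# One row per shooter, one column per streak length
props <- rbind(kobe = kobe_counts / sum(kobe_counts),
               sim  = sim_counts / sum(sim_counts))
colnames(props) <- 0:4
round(props, 2)
```

Passing the matrix to barplot(props, beside = TRUE, legend.text = TRUE) draws the two distributions side by side.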

Inspecting the bar plots, there is no great difference between the two distributions: Kobe's streak lengths look much like those of a simulated shooter whose shots are independent. These data therefore give no evidence that Kobe has a “hot hand”.
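
A single simulation is only one random draw, so it helps to see how much the result varies between runs. One possible approach, sketched below, repeats the simulation many times and compares Kobe's mean streak length (0.763, from summary(kobe_streak)) with the range of simulated means. The helper streak_lengths is a hypothetical stand-in for calc_streak, and the seed and the 1000 repetitions are arbitrary choices.

```r
# Hypothetical stand-in for calc_streak, assumed for this sketch:
# counts the hits between consecutive misses.
streak_lengths <- function(shots) {
  padded <- c(0, as.integer(shots == "H"), 0)
  diff(which(padded == 0)) - 1
}

set.seed(2024)  # arbitrary seed
sim_means <- replicate(1000, {
  shots <- sample(c("H", "M"), size = 133, prob = c(0.45, 0.55),
                  replace = TRUE)
  mean(streak_lengths(shots))   # mean streak length for one simulated shooter
})

kobe_mean <- 0.763   # mean reported by summary(kobe_streak)
quantile(sim_means, c(0.025, 0.975))   # range covering most simulated means
mean(sim_means >= kobe_mean)           # share of simulations at least as streaky
```

If Kobe's value falls inside the range that the independent shooter produces, that is consistent with the conclusion above.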