To initialize this lab, we first set the working directory and load the kobe data set.

setwd("C:/Users/Robert/Documents/R/win-library/3.2/IS606/labs/Lab2")
load("more/kobe.RData")
kobe_streak <- calc_streak(kobe$basket)

Exercise 1 - What does a streak length of 1 mean, i.e. how many hits and misses are in a streak of 1? What about a streak length of 0?

A streak length of 1 means one hit followed by one miss. A streak length of 0 means a miss with no hit before it, i.e. 0 hits and 1 miss.
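
To make the counting concrete, here is a minimal sketch that counts the hits before each miss on a short made-up sequence. The helper `streak_lengths` is hypothetical and written only for illustration; it is not the lab's `calc_streak`, although it is meant to follow the same idea.

```r
# Hypothetical helper (not the lab's calc_streak): a streak is the number
# of hits ("H") recorded before each miss ("M").
streak_lengths <- function(shots) {
  # Append a miss so that a run of hits at the end is closed off.
  miss_positions <- which(c(shots, "M") == "M")
  # The gap between consecutive misses, minus the miss itself, is the streak.
  diff(c(0, miss_positions)) - 1
}

example_shots <- c("H", "M", "M", "H", "H")  # made-up data
example_streaks <- streak_lengths(example_shots)
example_streaks  # 1 0 2
```

The first streak is 1 (one hit, then a miss), the second is 0 (a miss right after a miss), and the third is 2 (two hits closed off by the end of the sequence).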

Exercise 2 - Describe the distribution of Kobe’s streak lengths from the 2009 NBA finals. What was his typical streak length? How long was his longest streak of baskets?

mean(kobe_streak)
## [1] 0.7631579
barplot(table(kobe_streak))

The distribution is right-skewed: the mode is a streak of 0, and the mean is 0.76. The longest streak was 4 baskets in a row.

Exercise 3 - In your simulation of flipping the unfair coin 100 times, how many flips came up heads?

outcomes <- c("heads", "tails")
sim_unfair_coin <- sample(outcomes, size = 100, replace = TRUE, prob = c(0.2, 0.8))
table(sim_unfair_coin)
## sim_unfair_coin
## heads tails 
##    17    83

In the run shown above, 17 of the 100 flips came up heads, which is close to the expected 20. The exact count changes each time the simulation is run.
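
As a rough check on how close to 20 a run should land, the number of heads in 100 independent flips can be treated as a binomial count. The sketch below assumes that model; the variable names are mine and not part of the lab. The expected count is 20 and the standard deviation is 4, so a result of 17 is less than one standard deviation below the mean.

```r
n_flips <- 100
p_heads <- 0.2

# Mean and standard deviation of a binomial count.
expected_heads <- n_flips * p_heads
sd_heads <- sqrt(n_flips * p_heads * (1 - p_heads))

# How far the observed 17 heads sits from the mean, in standard deviations.
z_observed <- (17 - expected_heads) / sd_heads

# Probability of seeing 17 or fewer heads under this model.
prob_17_or_fewer <- pbinom(17, size = n_flips, prob = p_heads)

c(expected = expected_heads, sd = sd_heads, z = z_observed)
prob_17_or_fewer
```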

Exercise 4 - What change needs to be made to the sample function so that it reflects a shooting percentage of 45%? Make this adjustment, then run a simulation to sample 133 shots. Assign the output of this simulation to a new object called sim_basket.

outcomes <- c("H", "M")
sim_basket <- sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))

On your own

Comparing Kobe Bryant to the Independent Shooter

Using calc_streak, compute the streak lengths of sim_basket.

Describe the distribution of streak lengths. What is the typical streak length for this simulated independent shooter with a 45% shooting percentage? How long is the player’s longest streak of baskets in 133 shots?

independent1 <- calc_streak(sim_basket)
summary(kobe_streak)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.7632  1.0000  4.0000
ld.par <- par(mfrow=c(1, 2))
barplot(table(kobe_streak), main="Kobe Finals Data", xlab="streak")
barplot(table(independent1), main="Simulation 1",xlab="streak")

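The simulated shooter's distribution has a shape similar to Kobe's: most streaks are short, and the frequency falls off quickly as the streak length grows. The longest streak changes from run to run because the sample is random; I have seen it reach 10.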

If you were to run the simulation of the independent shooter a second time, how would you expect its streak distribution to compare to the distribution from the question above? Exactly the same? Somewhat similar? Totally different? Explain your reasoning.

independent2 <- calc_streak(sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55)))
summary(kobe_streak)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.7632  1.0000  4.0000
ld.par <- par(mfrow=c(1, 2))
barplot(table(kobe_streak), main="Kobe Finals Data", xlab="streak")
barplot(table(independent2), main="Simulation 2",xlab="streak")

Above I run a second simulation of the independent shooter with a 45% shooting percentage. Because the shots are sampled at random, I expect each run to be somewhat similar but not exactly the same: the overall shape stays put, while the individual counts and the longest streak vary slightly. Kobe’s streak data appears to fall well within that range of variation. The simulation should be run many more times before drawing any firm statistical conclusion.
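
One way to run the simulation many more times is sketched below. It repeats the 133-shot simulation 1000 times and records the longest run of hits in each. The helper `longest_streak`, the seed, and the number of repetitions are my own choices for illustration; the helper uses base R's `rle` rather than the lab's `calc_streak`.

```r
set.seed(606)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical helper: length of the longest run of consecutive hits.
longest_streak <- function(shots) {
  runs <- rle(shots)
  hit_runs <- runs$lengths[runs$values == "H"]
  if (length(hit_runs) == 0) 0 else max(hit_runs)
}

# Simulate 1000 independent 45% shooters taking 133 shots each.
sim_longest <- replicate(
  1000,
  longest_streak(sample(c("H", "M"), size = 133, replace = TRUE,
                        prob = c(0.45, 0.55)))
)

# Distribution of the longest streak across simulations.
table(sim_longest)

# Share of simulated shooters whose longest streak is at least 4,
# the longest streak in Kobe's data.
mean(sim_longest >= 4)
```

If a large share of simulated shooters reach a streak of 4 or more, Kobe's longest streak is unremarkable under the independence model.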

How does Kobe Bryant’s distribution of streak lengths compare to the distribution of streak lengths for the simulated shooter? Using this comparison, do you have evidence that the hot hand model fits Kobe’s shooting patterns? Explain.

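I believe answering this requires a much larger simulation. To put the two on the same scale, I repeat Kobe's shot sequence 20 times and simulate the same number of independent shots. Repeating the real data adds no new information; it only scales the counts so the two curves can be drawn together.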

kobe_replicated <- rep(kobe$basket, 20)
independent3 <- calc_streak(sample(outcomes, size = length(kobe_replicated), replace = TRUE, prob = c(0.45, 0.55)))
kobe_streak_replicated <- calc_streak(kobe_replicated)
plot(table(independent3), type="l", col="red", ylab="Frequency", xlab="Streak" , main ="Real vs Simulated")
lines(table(kobe_streak_replicated), type="l", col="blue")

The blue line shows the real data and the red line shows the data from the large simulation. There is no substantial difference between the two curves, so this comparison provides no evidence that the hot hand model fits Kobe's shooting patterns.
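
The same comparison can be made without relying on a single random run. For an independent shooter, a streak of length k requires k hits followed by a miss, so streak lengths approximately follow a geometric distribution. The sketch below assumes a 45% hit rate and ignores the effect of the sequence ending; the variable names are mine. Under these assumptions the expected streak length is 0.45 / 0.55, about 0.82, which is close to Kobe's observed mean of 0.76.

```r
p_hit <- 0.45  # assumed shooting percentage of the independent shooter

# P(streak = k) = p_hit^k * (1 - p_hit): k hits, then a miss.
# dgeom counts "failures before the first success", so the miss is the
# "success" here and has probability 1 - p_hit.
streak <- 0:5
theoretical <- dgeom(streak, prob = 1 - p_hit)
names(theoretical) <- streak
round(theoretical, 3)

# Expected streak length under independence.
expected_mean <- p_hit / (1 - p_hit)
expected_mean
```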