R Markdown

setwd('C:/Users/Exped/Desktop/Textbooks/606 Homeworks/Lab material/DATA606-master/inst/labs/Lab2')
load('more/kobe.RData')
head(kobe)
##    vs game quarter time
## 1 ORL    1       1 9:47
## 2 ORL    1       1 9:07
## 3 ORL    1       1 8:11
## 4 ORL    1       1 7:41
## 5 ORL    1       1 7:03
## 6 ORL    1       1 6:01
##                                               description basket
## 1                 Kobe Bryant makes 4-foot two point shot      H
## 2                               Kobe Bryant misses jumper      M
## 3                        Kobe Bryant misses 7-foot jumper      M
## 4 Kobe Bryant makes 16-foot jumper (Derek Fisher assists)      H
## 5                         Kobe Bryant makes driving layup      H
## 6                               Kobe Bryant misses jumper      M
streak = calc_streak(kobe$basket)
barplot(table(streak),col='red')

Exercise 1 - What does a streak length of 1 mean, i.e. how many hits and misses are in a streak of 1? What about a streak length of 0?

Streak length of 1 means Kobe made exactly one basket, before missing the following basket. Streak length of 0 occurs when a miss follows another miss, the second miss is the 0 length streak.

Exercise 2 - Describe the distribution of Kobe’s streak lengths from the 2009 NBA finals. What was his typical streak length? How long was his longest streak of baskets?

The typical streak length was around 1 basket. Its important to remember that while the majority of all streaks are length 0, the mean of the all the streaks is .76, so you can typically expect a shot to be made rather than to me missed. Kobe’s longest streak was 4, he made 4 baskets and missed one at the very end.

Exercise 3 - In your simulation of flipping the unfair coin 100 times, how many flips came up heads?

28 flips became heads.

Exercise 4 - What change needs to be made to the sample function so that it reflects a shooting percentage of 45%? Make this adjustment, then run a simulation to sample 133 shots. Assign the output of this simulation to a new object called sim_basket.

We change the probability attribute of the sample function to “c(.44,.55)”

outcomes = c('H','M')
set.seed(15);
sim_basket = sample(outcomes,size=133,replace=TRUE,prob=c(.45,.55))
print(length(sim_basket)==length(kobe$basket))
## [1] TRUE
streakSample = calc_streak(sim_basket)
barplot(table(streakSample),col='blue')

1. Describe the distribution of streak lengths. What is the typical streak length for this simulated independent shooter with a 45% shooting percentage? How long is the player’s longest streak of baskets in 133 shots?

summary(streakSample)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.500   1.094   2.000   7.000

The distribution has a strong right skew, median of .5 and mean of 1.094. The typical streak length would then be 1.094 baskets which is much higher than Kobe’s .76 typical streak length. The longest streak is of 7 baskets made, ending with one basket missed.

2. If you were to run the simulation of the independent shooter a second time, how would you expect its streak distribution to compare to the distribution from the question above? Exactly the same? Somewhat similar? Totally different? Explain your reasoning.

While not exactly the same, I would expect the same strong right skew towards longer streak length. While the max (longest streak length) would have a significant variance of around 3, I do not expect to see high variance between 1-3 streak length frequency. When independent acts with the same complementary outcomes multiply, the chance of having an outlying result decreases.

3.How does Kobe Bryant’s distribution of streak lengths compare to the distribution of streak lengths for the simulated shooter? Using this comparison, do you have evidence that the hot hand model fits Kobe’s shooting patterns? Explain

Kobe Bryant’s distribution of streak lengths are within a expected variance of the simulated shooter.
The audience who observed Kobe’s hot hand may be remembering his equi numbered streaks of 2 and 3. However this null change in streak frequency is observed in our simulated shooter as well at streaks 3 and 4.
Kobe bryant’s distribution displays less of a “hot hand” then our sample distribution, I note this because of the higher 1 streak distributions, lower 2 streak to 1 streak ratio, and a lower max streak.

table(streak)  # Kobe table
## streak
##  0  1  2  3  4 
## 39 24  6  6  1
table(streakSample) #Simulated table
## streakSample
##  0  1  2  3  4  7 
## 32 14  7  5  5  1
simulatedSimulation = sample(outcomes,size=3000,replace=TRUE,prob=c(.45,.55)) #I remove case variance by widening the sample.
simulatedSimulation = calc_streak(simulatedSimulation)
barplot(table(simulatedSimulation),col='yellow')

#Just messing around, making a function to analyze other shooter sample data, in case we were presented with more data from other players who were noted to exhibit a hot hand.

moreAtizer = function(howMuch,streak,someObject,outcomes=c("H","M")){ 
  
  probability = mean(streak)
  misses = length(someObject[someObject=='M'])
  totalShots = length(someObject)
  probabilityOfMiss = misses/totalShots
  more = sample(outcomes, size=howMuch,replace=TRUE,prob=c(1-probabilityOfMiss,probabilityOfMiss))
  return(more)
}

simulatedKobe = moreAtizer(3000,streak,kobe$basket)
simulatedKobe = calc_streak(simulatedKobe)
table(simulatedKobe)
## simulatedKobe
##   0   1   2   3   4   5   6   7   8 
## 967 435 161  80  40  14   5   3   2