Lab 2 - Probability

By Brian Weinfeld

February 4, 2018

Exercises

#1: What does a streak length of 1 mean, i.e. how many hits and misses are in a streak of 1? What about a streak length of 0?

A streak of length 1 means that Kobe made one basket before missing a shot. A streak of length 0 means that kobe made 0 baskets before missing a shot. Generally, a k-length streak means that Kobe made k baskets before missing a shot, beginning the count of the current streak following the previous missed shot.

#2: Describe the distribution of Kobe’s streak lengths from the 2009 NBA finals. What was his typical streak length? How long was his longest streak of baskets?

kobe_streak <- calc_streak(kobe$basket)
table(kobe_streak)

## kobe_streak
##  0  1  2  3  4 
## 39 24  6  6  1

summary(kobe_streak)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.7632  1.0000  4.0000

Kobe’s streaks are skewed right. His most common streak is 0 then 1 (although those may not necessarily be considered a streak by conventional definition). His longest streak of baskets was 4 in a row. This event occurred one time.

The summary shows that only \(\frac{1}{4}\) of his streaks are above 1.

#3: In your simulation of flipping the unfair coin 100 times, how many flips came up heads?

outcomes <- c("heads", "tails")
sim_unfair_coin <- sample(outcomes, size = 100, replace = TRUE, prob = c(0.2, 0.8))
length(which(sim_unfair_coin == 'heads'))

## [1] 19

#4: What change needs to be made to the sample function so that it reflects a shooting percentage of 45%? Make this adjustment, then run a simulation to sample 133 shots. Assign the output of this simulation to a new object called sim_basket.

outcomes <- c("H", "M")
sim_basket <- sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))

On Your Own

#1: Describe the distribution of streak lengths. What is the typical streak length for this simulated independent shooter with a 45% shooting percentage? How long is the player’s longest streak of baskets in 133 shots?

independent_streak <- calc_streak(sim_basket)
table(independent_streak)

## independent_streak
##  0  1  2  3  4  5 
## 51 22  8  1  1  1

summary(independent_streak)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.5952  1.0000  5.0000

max(independent_streak)

## [1] 5

The distribution of streaks is incredibly similar to Kobe’s. The most common streak is 0 and 1, just as with Kobe.

The summary shows that only about \(\frac{1}{4}\) of his streaks are above 1, again the same as Kobe.

#2: If you were to run the simulation of the independent shooter a second time, how would you expect its streak distribution to compare to the distribution from the question above? Exactly the same? Somewhat similar? Totally different? Explain your reasoning.

#Code adapted from Stackoverflow
repeated_simulations <- replicate(100, calc_streak(sample(outcomes,size=133, replace = TRUE, prob=c(0.45, 0.55))))
data_simulations <- data.frame(mean=sapply(repeated_simulations, mean), sd=sapply(repeated_simulations, sd), max=sapply(repeated_simulations, max))
ggplot(melt(data_simulations), aes(x=value)) + geom_histogram(bins=8) + facet_wrap(~variable, scales='free_x') + labs(title='Mean, Standard Deviation and Max Streaks of 100 Simulations') + xlab('Value') + ylab('Count')

I would except the simulation to be similar but unlikely to be identical. By the law of large numbers as the number of trials increases the probabilities all trend to their theoretical values. As the simulation size of 133 is large I would expect the distributions to all be fairly close to their theoretical values. Random small variations are also to be expected.

I simulated the experiment 100 times and then created a data frame containing the mean, standard deviation and max of each of those simulations. The histogrames for each variable were then plotted. Each one shows some variance in the results but all are roughly normal centered about their theoretical mean.

#3: How does Kobe Bryant’s distribution of streak lengths compare to the distribution of streak lengths for the simulated shooter? Using this comparison, do you have evidence that the hot hand model fits Kobe’s shooting patterns? Explain.

#Adapted form Stackoverflow
tabled_info <- sapply(repeated_simulations, table)
tabled_info <- data.frame(lapply(tabled_info, `length<-`, max(lengths(tabled_info))))
tabled_info[is.na(tabled_info)] <- 0
tabled_info <- sapply(tabled_info, as.numeric) %>%
  apply(1, mean)

to_plot <- data.frame(0:7, tabled_info, table(kobe_streak)[seq(tabled_info)], table(independent_streak)[seq(tabled_info)]) %>%
  subset(select=c(1,2,4,6))
to_plot[is.na(to_plot)] <- 0
names(to_plot) <- c('Streak', 'Repated', 'Kobe', 'Simulated')
ggplot(melt(to_plot, id.var = 'Streak'), aes(x=Streak, y=value, fill=variable)) + geom_bar(stat='identity', position='dodge') + labs(title='Streak Comparison between Kobe Bryant and Simulations') + scale_x_continuous(labels=as.character(to_plot$Streak), breaks=to_plot$Streak)

In order to compare Kobe’s streaks with the simulated streaks I wanted to plot his streaks against the single simulation and the 100 repeated simulations conducted above. The first several lines of code calculates the mean amount of k-streaks across the 100 simulations and the second block of code plots all 3 sets against each other

The results are incredibly convincing. As all three have incredibly simular distributions, it can be stated with confidence that this data does not demonstrate that Kobe has the “hot hand.”