library(tidyverse)
library(openintro)
library(mosaic)
library(ggformula)
data("kobe_basket")
glimpse(kobe_basket)
## Rows: 133
## Columns: 6
## $ vs          <fct> ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL…
## $ game        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ quarter     <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3…
## $ time        <fct> 9:47, 9:07, 8:11, 7:41, 7:03, 6:01, 4:07, 0:52, 0:00, 6:35…
## $ description <fct> Kobe Bryant makes 4-foot two point shot, Kobe Bryant misse…
## $ shot        <chr> "H", "M", "M", "H", "H", "M", "M", "M", "M", "H", "H", "H"…
?kobe_basket

Exercise 1

#The code in this section creates the variable kobe_streak, a data frame with 76 rows, one per streak, whose length column holds the length of each streak. The command gf_bar() is then used to create a bar graph for visually analyzing the streak lengths in kobe_streak.

##ANSWER## A streak length of 1 means that Kobe made exactly one basket and then missed the following attempt. A streak length of 0 means that Kobe made no baskets in that streak: a miss that comes directly after another miss (or on the very first shot).

kobe_streak <- calc_streak(kobe_basket$shot)
gf_bar(~length,data=kobe_streak)
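
#A minimal sketch of how streak lengths of this kind can be counted in base R, to make the definition concrete. The helper streak_lengths() is a hypothetical name introduced here for illustration; it is not the openintro implementation of calc_streak(), although it is meant to follow the same idea: a streak is the number of hits between consecutive misses, so a sequence with k misses yields k + 1 streaks.

```r
# Hypothetical helper (not part of openintro): count hits between misses.
streak_lengths <- function(shots) {
  # Code hits as 1 and misses as 0, with a sentinel miss at each end
  hit <- c(0, as.integer(shots == "H"), 0)
  miss_positions <- which(hit == 0)
  # Number of hits between each pair of consecutive misses
  diff(miss_positions) - 1
}

# First nine shots shown in the glimpse() output above
first_nine <- c("H", "M", "M", "H", "H", "M", "M", "M", "M")
streak_lengths(first_nine)
# → 1 0 2 0 0 0 0
```

#Under this counting rule, 76 streaks from 133 shots would correspond to 75 misses and 58 hits, which agrees with the mean reported by favstats() (58 / 76 ≈ 0.763).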

Exercise 2

#To look into the distribution of kobe_streak, favstats() was used to compute summary statistics for the streak lengths. The code that follows experiments with generating random outcomes. It simulates flipping a fair coin that lands heads or tails, first once and then 100 times, and stores the 100 flips in the variable sim_fair_coin. Finally, the results are tallied with the table() function. With a fair coin the split between heads and tails is close to 50/50; the observed proportions in this run were p(heads)=0.49 and p(tails)=0.51.

##ANSWER## The distribution of Kobe’s streak lengths from the 2009 NBA Finals can be described using a bar graph, a box plot, and summary statistics. Based on the bar graph, the distribution is unimodal, with its peak at 0. Both the bar graph and the box plot show a right skew. The box plot flags a few points as outliers, namely the streaks of length 3 and 4. From favstats(), the median, which indicates the typical streak length, is 0. The longest streak of baskets is the maximum value in the data set, which is 4. favstats() also reports the mean, 0.7631579, the standard deviation, 0.9915432, Q1 = 0, and Q3 = 1.

gf_boxplot(~length,data=kobe_streak)

gf_bar(~length,data=kobe_streak)

favstats(~length, data = kobe_streak)
##  min Q1 median Q3 max      mean        sd  n missing
##    0  0      0  1   4 0.7631579 0.9915432 76       0
coin_outcomes <- c("heads", "tails")
set.seed(94720)
sample(coin_outcomes, size = 1, replace = TRUE)
## [1] "tails"
sim_fair_coin <- sample(coin_outcomes, size = 100, replace = TRUE)
sim_fair_coin
##   [1] "heads" "tails" "tails" "tails" "tails" "tails" "heads" "tails" "heads"
##  [10] "heads" "heads" "heads" "heads" "tails" "heads" "heads" "tails" "heads"
##  [19] "heads" "tails" "heads" "tails" "tails" "tails" "tails" "heads" "tails"
##  [28] "tails" "heads" "heads" "heads" "heads" "tails" "heads" "tails" "heads"
##  [37] "tails" "tails" "heads" "heads" "heads" "heads" "tails" "heads" "heads"
##  [46] "heads" "tails" "heads" "heads" "heads" "heads" "tails" "tails" "heads"
##  [55] "tails" "heads" "tails" "heads" "tails" "heads" "tails" "heads" "heads"
##  [64] "tails" "tails" "tails" "tails" "heads" "heads" "tails" "tails" "heads"
##  [73] "heads" "heads" "tails" "tails" "tails" "tails" "tails" "tails" "tails"
##  [82] "tails" "tails" "tails" "tails" "heads" "heads" "tails" "heads" "tails"
##  [91] "heads" "tails" "heads" "tails" "heads" "tails" "tails" "heads" "tails"
## [100] "heads"
table(sim_fair_coin)
## sim_fair_coin
## heads tails 
##    49    51
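
#The proportions quoted above can be recovered from the table counts. A small sketch using the counts printed by table(); the names flip_counts and flip_props are introduced here for illustration.

```r
# Counts copied from the table() output above
flip_counts <- c(heads = 49, tails = 51)

# Divide each count by the total number of flips
flip_props <- flip_counts / sum(flip_counts)
flip_props
# heads 0.49, tails 0.51
```

#Applied to the simulated vector directly, prop.table(table(sim_fair_coin)) should give the same proportions in one step.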

Exercise 3

#This exercise uses code very similar to Exercise 2. A new variable “sim_unfair_coin” was created to hold outcomes of either heads or tails from simulated coin flips. Since this is an unfair coin, the assignment said to set the probabilities as follows: p(heads)=0.2 and p(tails)=0.8. This is done with the argument prob = c(0.2, 0.8), whose values are matched in order to the entries of coin_outcomes. The size argument was set to 100 to generate 100 flips of the coin, and the table() function was used to display how many heads and tails were recorded.

##ANSWER## The coin landed heads 21 times.

set.seed(27940)
sim_unfair_coin <- sample(coin_outcomes, size = 100, replace = TRUE,
                          prob = c(0.2, 0.8))
table(sim_unfair_coin)
## sim_unfair_coin
## heads tails 
##    21    79
?sample
## Help on topic 'sample' was found in the following packages:
## 
##   Package               Library
##   mosaic                /cloud/lib/x86_64-pc-linux-gnu-library/4.3
##   base                  /opt/R/4.3.3/lib/R/library
## 
## 
## Using the first match ...
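
#As a rough check on whether 21 heads is a plausible result, note that the number of heads in 100 independent flips with p(heads) = 0.2 follows a binomial distribution. The sketch below computes its expected value and standard deviation; the variable names are introduced here for illustration.

```r
n_flips <- 100
p_heads <- 0.2

# Mean and standard deviation of a binomial count
expected_heads <- n_flips * p_heads                  # 20
sd_heads <- sqrt(n_flips * p_heads * (1 - p_heads))  # 4

# How far the observed 21 heads is from the expected count, in SD units
z_observed <- (21 - expected_heads) / sd_heads       # 0.25
z_observed
```

#The observed 21 heads is about a quarter of a standard deviation above the expected 20, which is well within ordinary sampling variation.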

Exercise 4

#The same general code was used to answer this question, except that it now records shot outcomes as either an H (hit) or an M (miss). The variable sim_basket was created to store the simulated shots.

##ANSWER## To have the sample function reflect a shooting percentage of 45%, the prob argument needed to be changed to read “prob = c(0.45, 0.55).” This means there is a probability of 0.45 that the shot will be made and a probability of 0.55 that it will be missed. To create a sample of 133 shots, “size” was set to 133 rather than 1.

set.seed(38564)
shot_outcomes <- c("H", "M")
sim_basket <- sample(shot_outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))
table(sim_basket)
## sim_basket
##  H  M 
## 61 72
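
#Because prob is matched to the outcomes by position, one way to make the pairing harder to get wrong is to keep both in a single named vector. This is only a sketch; shot_probs, sim_check, and the seed value are hypothetical and were not used in the lab.

```r
# Outcome labels and their probabilities kept together
shot_probs <- c(H = 0.45, M = 0.55)

# Guard against probabilities that do not sum to 1
stopifnot(isTRUE(all.equal(sum(shot_probs), 1)))

set.seed(2024)  # arbitrary seed, chosen only for this illustration
sim_check <- sample(names(shot_probs), size = 133, replace = TRUE,
                    prob = shot_probs)

length(sim_check)             # 133 simulated shots
prop.table(table(sim_check))  # observed share of hits and misses
```

#The observed share of hits will vary around 0.45 from seed to seed, just as the 61 hits out of 133 above (about 0.46) does.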

Exercise 5

#The variable “sim_streak” was created to store the computed streak lengths for the simulated shooter, whose record is compared with Kobe’s. It is used in the subsequent exercises to analyze the simulated data.

sim_streak <- calc_streak(sim_basket)

Exercise 6

#This exercise called for an analysis of the streak lengths of the simulated independent shooter. To do this, a bar plot, a box plot, and the favstats() command were used.

##ANSWER## The bar graph and box plot show that this distribution is unimodal, with its peak at 0, and right skewed. The box plot flags some points as outliers, which fall at streaks of 3 and 4. The median indicates the typical streak length for this simulated independent shooter with a 45% shooting percentage, and favstats() reports it as 0. The player’s longest streak of baskets in 133 shots is the maximum value in the data set, which is 4. favstats() also reports the mean, 0.8356164, the standard deviation, 1.09308, Q1 = 0, and Q3 = 1.

gf_bar(~length,data=sim_streak)

gf_boxplot(~length,data=sim_streak)

favstats(~length, data = sim_streak)
##  min Q1 median Q3 max      mean      sd  n missing
##    0  0      0  1   4 0.8356164 1.09308 73       0

Exercise 7

##ANSWER## I would expect the result to be somewhat similar. This is because the probabilities of a hit and a miss are exactly the same, but since the shots are randomly generated, the hits and misses can occur in any order. This can produce different streak lengths each time the code is run with a different seed.

##Below, the same code was run again with a different seed, and the resulting distribution differs from the first. Some values are the same, such as the median and quartiles, but the mean is lower (0.696 versus 0.836) while the standard deviation is nearly unchanged (1.090 versus 1.093). The maximum for this data set is 6, compared with 4 in the first trial.

set.seed(98753)
shot_outcomes <- c("H", "M")
sim_basket2 <- sample(shot_outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))
table(sim_basket2)
## sim_basket2
##  H  M 
## 55 78
sim_streak2 <- calc_streak(sim_basket2)
gf_bar(~length,data=sim_streak2)

gf_boxplot(~length,data=sim_streak2)

favstats(~length, data = sim_streak2)
##  min Q1 median Q3 max      mean       sd  n missing
##    0  0      0  1   6 0.6962025 1.090218 79       0
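
#To see how much the longest streak varies from run to run, the simulation can be repeated many times instead of twice. The sketch below uses base R only; longest_streak() is a hypothetical helper, and the seed and the choice of 1000 repetitions are arbitrary.

```r
# Hypothetical helper: length of the longest run of hits in a shot sequence
longest_streak <- function(shots) {
  runs <- rle(shots)                            # consecutive runs of H and M
  hit_runs <- runs$lengths[runs$values == "H"]  # keep only the runs of hits
  if (length(hit_runs) == 0) 0L else max(hit_runs)
}

set.seed(1)  # arbitrary seed, not one used in the lab
max_streaks <- replicate(
  1000,
  longest_streak(sample(c("H", "M"), size = 133, replace = TRUE,
                        prob = c(0.45, 0.55)))
)

# How often each longest-streak value occurred across the 1000 simulations
table(max_streaks)
```

#A table like this gives a sense of whether maxima such as 4 and 6 are typical for an independent 45% shooter over 133 shots.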

Exercise 8

#The code below uses the bind_rows() command to combine the data from “kobe_streak” and “sim_streak” so that they can be displayed together on one graph. The gf_histogram() command is then used to create the graph, with data set to both so that the streaks from “kobe_streak” and “sim_streak” are shown side by side, filled by source. The number of bins is set to 20 (bins = 20, which is a bin count rather than a bin width), which keeps the histogram readable and makes it easy to compare the two data sets.

##ANSWER## This histogram suggests that Kobe’s streak lengths are distributed very similarly to those of an independent simulated shooter. While there is some variation, both distributions show the same general shape: a peak at 0 and a right skew. The simulated shooter is independent by construction, meaning that the outcome of one shot does not affect the next, or any other shot thereafter. The lab describes a shooter with a hot hand as one who “will have shots that are not independent of one another,” and says “the hot hand model says he will have a higher probability of making his second shot.” Since Kobe’s streak lengths look like those of an independent shooter, this data gives no evidence that he fits this definition, and it is consistent with him not having a hot hand.

both <- bind_rows(
  kobe_streak %>% mutate(source = "kobe_streak"),
  sim_streak %>% mutate(source = "sim_streak")
)
gf_histogram(~ length, data = both, fill = ~source, position = "dodge",
             bins = 20, alpha = 0.7)
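
#For a reference point that does not depend on any single random run: if shots are independent with hit probability 0.45, the number of hits before the next miss follows a geometric distribution (ignoring the final streak, which is cut off by the end of the 133 shots). A sketch using base R's dgeom(); the variable names are introduced here for illustration.

```r
p_hit <- 0.45
streak_values <- 0:4

# P(exactly k hits, then a miss) for an independent shooter;
# dgeom() counts "failures" (hits) before the first "success" (a miss)
theoretical <- dgeom(streak_values, prob = 1 - p_hit)
round(setNames(theoretical, streak_values), 3)

# Expected streak length under independence
expected_mean <- p_hit / (1 - p_hit)  # about 0.818
expected_mean
```

#This theoretical mean of about 0.818 sits between the simulated mean (0.836) and Kobe’s mean (0.763) reported by favstats() above.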

