library(tidyverse)
library(openintro)
library(mosaic)
library(ggformula)
data("kobe_basket")
glimpse(kobe_basket)
## Rows: 133
## Columns: 6
## $ vs <fct> ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL…
## $ game <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ quarter <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3…
## $ time <fct> 9:47, 9:07, 8:11, 7:41, 7:03, 6:01, 4:07, 0:52, 0:00, 6:35…
## $ description <fct> Kobe Bryant makes 4-foot two point shot, Kobe Bryant misse…
## $ shot <chr> "H", "M", "M", "H", "H", "M", "M", "M", "M", "H", "H", "H"…
Exercise 1
#The code in this section creates the variable kobe_streak, a data frame
of 76 streaks with the length of each streak stored in the column length.
The command gf_bar() is then used to create a bar graph of the streak
lengths in kobe_streak.
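##ANSWER## A streak length of 1 means that Kobe made exactly one basket
before missing his next attempt. A streak length of 0 means that Kobe
missed a shot without making a basket first, i.e. the miss came directly
after another miss (or at the very start of the data).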
kobe_streak <- calc_streak(kobe_basket$shot)
gf_bar(~length,data=kobe_streak)
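To make the definition of a streak concrete, here is a minimal base-R sketch of one way such streak lengths could be computed. The function name streak_lengths is hypothetical and is not the openintro implementation of calc_streak(). It only assumes that a streak is the number of consecutive hits that ends with a miss or with the end of the data.

```r
# Hypothetical helper (not from openintro): count the consecutive hits ("H")
# before each miss ("M"), plus the run cut off by the end of the data.
streak_lengths <- function(shots) {
  hit <- as.integer(shots == "H")   # 1 = hit, 0 = miss
  padded <- c(0, hit, 0)            # pad with a miss at both ends
  miss_pos <- which(padded == 0)    # positions of all misses
  diff(miss_pos) - 1                # number of hits between consecutive misses
}

example_shots <- c("H", "M", "M", "H", "H", "M", "M", "M")
streak_lengths(example_shots)
```

For this small example the streak lengths are 1, 0, 2, 0, 0, 0. Under this definition the number of streaks is always one more than the number of misses, which is consistent with the 72 misses and 73 streaks reported for the simulated shooter later in this report.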

Exercise 2
#In order to look into the distribution of kobe_streak, the command
favstats() was used to find summary statistics for the data. The code
following this was used to experiment with generating random outcomes.
It first simulates a single coin flip with sample(), then simulates 100
flips and assigns them to the variable sim_fair_coin. Finally the counts
are displayed using the table() function. With a fair coin the observed
proportions of heads and tails are close to 50/50: in this run there were
49 heads and 51 tails, i.e. observed proportions of 0.49 and 0.51.
##ANSWER## The distribution of Kobe’s streak lengths from the 2009
NBA finals can be described using a bar plot, a box plot, and summary
statistics of the data set. Based on the bar plot above, the data is
unimodal, with the peak at 0. Both the bar plot and the box plot show
that the data has a right skew. The box plot also shows a few outliers,
which are the streaks of 3 and 4. The median reported by the favstats()
command gives the typical streak length, which is 0. The longest streak
of baskets is the maximum value in the data set, which is 4. Additional
statistics from favstats() are the mean of 0.7631579, the standard
deviation of 0.9915432, Q1 = 0, and Q3 = 1.
gf_boxplot(~length,data=kobe_streak)

gf_bar(~length,data=kobe_streak)

favstats(~length, data = kobe_streak)
## min Q1 median Q3 max mean sd n missing
## 0 0 0 1 4 0.7631579 0.9915432 76 0
coin_outcomes <- c("heads", "tails")
set.seed(94720)
sample(coin_outcomes, size = 1, replace = TRUE)
## [1] "tails"
sim_fair_coin <- sample(coin_outcomes, size = 100, replace = TRUE)
sim_fair_coin
## [1] "heads" "tails" "tails" "tails" "tails" "tails" "heads" "tails" "heads"
## [10] "heads" "heads" "heads" "heads" "tails" "heads" "heads" "tails" "heads"
## [19] "heads" "tails" "heads" "tails" "tails" "tails" "tails" "heads" "tails"
## [28] "tails" "heads" "heads" "heads" "heads" "tails" "heads" "tails" "heads"
## [37] "tails" "tails" "heads" "heads" "heads" "heads" "tails" "heads" "heads"
## [46] "heads" "tails" "heads" "heads" "heads" "heads" "tails" "tails" "heads"
## [55] "tails" "heads" "tails" "heads" "tails" "heads" "tails" "heads" "heads"
## [64] "tails" "tails" "tails" "tails" "heads" "heads" "tails" "tails" "heads"
## [73] "heads" "heads" "tails" "tails" "tails" "tails" "tails" "tails" "tails"
## [82] "tails" "tails" "tails" "tails" "heads" "heads" "tails" "heads" "tails"
## [91] "heads" "tails" "heads" "tails" "heads" "tails" "tails" "heads" "tails"
## [100] "heads"
table(sim_fair_coin)
## sim_fair_coin
## heads tails
## 49 51
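The counts above can be turned into proportions, and the proportions settle closer to 0.5 as the number of flips grows. The following is a small base-R sketch of that idea. The seed and the names flips_100, flips_10000, prop_100 and prop_10000 are chosen only for this illustration and are not part of the lab code.

```r
# Base-R sketch: relative frequencies of a simulated fair coin
coin_outcomes <- c("heads", "tails")
set.seed(1)  # arbitrary seed, used only for this illustration

flips_100 <- sample(coin_outcomes, size = 100, replace = TRUE)
flips_10000 <- sample(coin_outcomes, size = 10000, replace = TRUE)

# prop.table() converts the counts from table() into proportions that sum to 1
prop_100 <- prop.table(table(flips_100))
prop_10000 <- prop.table(table(flips_10000))

prop_100
prop_10000
```

With 100 flips a split such as 49/51 is ordinary sampling variation. With 10,000 flips the proportions are expected to be much closer to 0.5.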
Exercise 3
#For this exercise, code very similar to Exercise 2 was used. A new
variable “sim_unfair_coin” was created and assigned outcomes of either
heads or tails to simulate a coin flip. Since this is an unfair coin, the
assignment said to set the probabilities as follows: p(heads)=0.2 and
p(tails)=0.8. This is done with the argument prob = c(0.2, 0.8), whose
order matches the order of coin_outcomes. The size argument was set to
100 in order to generate 100 flips of the coin, and the table() function
was used to display how many heads and tails were recorded.
##ANSWER## The coin landed heads 21 times.
set.seed(27940)
sim_unfair_coin <- sample(coin_outcomes, size = 100, replace = TRUE,
                          prob = c(0.2, 0.8))
table(sim_unfair_coin)
## sim_unfair_coin
## heads tails
## 21 79
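As a quick check on whether 21 heads is a plausible result, the count can be compared with what a binomial model predicts for 100 independent flips with p(heads) = 0.2. This is a sketch only. The variable names are introduced here for illustration and are not part of the lab code.

```r
# Sketch: compare the observed number of heads with the binomial expectation
n_flips <- 100
p_heads <- 0.2
observed_heads <- 21   # count reported by table(sim_unfair_coin)

expected_heads <- n_flips * p_heads                   # 100 * 0.2 = 20
sd_heads <- sqrt(n_flips * p_heads * (1 - p_heads))   # sqrt(16) = 4

# Distance of the observed count from the expectation, in standard deviations
z <- (observed_heads - expected_heads) / sd_heads     # (21 - 20) / 4 = 0.25

# Probability of exactly 21 heads under the binomial model
prob_21 <- dbinom(observed_heads, size = n_flips, prob = p_heads)

c(expected = expected_heads, sd = sd_heads, z = z)
prob_21
```

The observed 21 heads is only a quarter of a standard deviation above the expected 20, so the simulated result is in line with the probabilities that were set.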
Exercise 4
#The same general code was used to answer this question, except now it
is used to record shot outcomes as either H (hit) or M (miss). The
variable sim_basket was created to store this data.
##ANSWER## In order to have the sample function reflect a shooting
percentage of 45%, the prob argument of the command needed to be changed
to read “prob = c(0.45, 0.55).” This means there is a probability of
0.45 that the shot will be made and a probability of 0.55 that it will
be missed. In order to create a sample of 133 shots, the “size” argument
was set to 133 rather than 1.
set.seed(38564)
shot_outcomes <- c("H", "M")
sim_basket <- sample(shot_outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))
table(sim_basket)
## sim_basket
## H M
## 61 72
Exercise 5
#The variable “sim_streak” was created to store the computed streak
lengths for the simulated shooter that is compared with Kobe’s record.
It is used in the subsequent exercises to analyze this data.
sim_streak <- calc_streak(sim_basket)
Exercise 6
#This exercise called for an analysis of the streak lengths of the
simulated independent shooter. In order to do this, a bar plot, a box
plot and the favstats() command were used.
##ANSWER## The bar graph and box plot show that this data is unimodal,
with the peak at 0, and that it has a right skew. There are some
outliers, which fall at streaks of 3 and 4. The median gives the typical
streak length for this simulated independent shooter with a 45% shooting
percentage, and the favstats() command reports this as 0. The player’s
longest streak of baskets in 133 shots is the maximum value in the data
set, which is 4. Additional statistics from favstats() are the mean of
0.8356164, the standard deviation of 1.09308, Q1 = 0, and Q3 = 1.
gf_bar(~length,data=sim_streak)

gf_boxplot(~length,data=sim_streak)

favstats(~length, data = sim_streak)
## min Q1 median Q3 max mean sd n missing
## 0 0 0 1 4 0.8356164 1.09308 73 0
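For reference, the shape seen here (peak at 0, right skew) is what an independent shooter is expected to produce. If each shot is an independent hit with probability 0.45, the number of hits before a miss follows a geometric distribution. The sketch below works this out with base R's dgeom(). It ignores the fact that the last streak is cut off by the end of the 133 shots, so it is an approximation, and the variable names are introduced only for this illustration.

```r
# Sketch: theoretical streak-length distribution for an independent shooter
p_hit <- 0.45
p_miss <- 1 - p_hit

# P(streak = k) = p_miss * p_hit^k, which is dgeom(k, prob = p_miss)
k <- 0:4
theory <- dgeom(k, prob = p_miss)
names(theory) <- k
round(theory, 3)

# Expected streak length: p_hit / p_miss
expected_mean <- p_hit / p_miss
expected_mean
```

The expected streak length under this model is about 0.82, close to the simulated mean of 0.8356164 reported above, and 55% of streaks are expected to have length 0.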
Exercise 7
##ANSWER## I would expect the result to be somewhat similar. This is
because the probabilities of hits and misses would be exactly the same,
but since this is a random generator, the hits and misses can occur in
any order. This could produce different streak lengths each time the
code is run.
##Below, the same code from above was run again with a different seed,
and the distribution is not identical to the last. Some statistics are
the same, such as the median, Q1 and Q3, while the mean differs
(0.6962025 compared with 0.8356164) and the standard deviation changes
only slightly. The maximum for this data set is 6, compared with 4 in
the last trial.
set.seed(98753)
shot_outcomes <- c("H", "M")
sim_basket2 <- sample(shot_outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))
table(sim_basket2)
## sim_basket2
## H M
## 55 78
sim_streak2 <- calc_streak(sim_basket2)
gf_bar(~length,data=sim_streak2)

gf_boxplot(~length,data=sim_streak2)

favstats(~length, data = sim_streak2)
## min Q1 median Q3 max mean sd n missing
## 0 0 0 1 6 0.6962025 1.090218 79 0
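Two runs give only a limited picture of how much the longest streak varies. The sketch below repeats the simulation 1000 times and tabulates the longest streak in each run. It uses base R's rle() rather than calc_streak() so that it runs without openintro. The helper longest_streak and the seed are hypothetical choices made for this illustration.

```r
# Hypothetical helper: longest run of consecutive hits ("H") in a shot vector
longest_streak <- function(shots) {
  runs <- rle(shots)                            # run-length encoding of H/M
  hit_runs <- runs$lengths[runs$values == "H"]  # keep only the runs of hits
  if (length(hit_runs) == 0) 0L else max(hit_runs)
}

set.seed(2024)  # arbitrary seed, used only for this illustration
shot_outcomes <- c("H", "M")

# Simulate 1000 independent shooters, each taking 133 shots at 45%
longest <- replicate(1000, {
  shots <- sample(shot_outcomes, size = 133, replace = TRUE,
                  prob = c(0.45, 0.55))
  longest_streak(shots)
})

table(longest)
```

The table shows how often each maximum streak length occurs across the repeated runs, which helps judge whether maxima such as 4 and 6 are typical for an independent 45% shooter.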
Exercise 8
#The code below uses the bind_rows() command to combine the data from
“kobe_streak” and “sim_streak” in order to display them together on one
graph. Then the gf_histogram() command is used to create the graph, where
data is set to both in order to display the information from both
“kobe_streak” and “sim_streak.” The number of bins is set to 20 (bins =
20) and position = "dodge" places the bars side by side, which keeps the
histogram readable and makes it easy to compare the two data sets.
##ANSWER## This histogram suggests that Kobe’s trends in streak
length are very similar to those of an independent simulated shooter.
While there is some variation, the histogram shows the same general
trend of a peak at 0 and a right skew. As the name of the simulation
indicates, its shots are independent, meaning that the outcome of one
shot does not affect the next, or any other shot thereafter. The lab
defines a shooter with a hot hand as someone who “will have shots that
are not independent of one another,” and “the hot hand model says he will
have a higher probability of making his second shot.” Since Kobe’s streak
lengths are consistent with independent shots, this data does not support
that definition, and therefore gives no evidence that he has a hot hand.
both <- bind_rows(
  kobe_streak %>% mutate(source = "kobe_streak"),
  sim_streak %>% mutate(source = "sim_streak"))
gf_histogram(~ length, data = both, fill = ~source, position = "dodge",
             bins = 20, alpha = 0.7)

