Open_lab3

#1.) Describe the distribution of streak lengths. What is the typical streak length for this simulated independent shooter with a 45% shooting percentage? How long is the player’s longest streak of baskets in 133 shots?
Pokemon <- read.csv("Pokemon.csv")
download.file("http://www.openintro.org/stat/data/kobe.RData", destfile = "kobe.RData")
load("kobe.RData")
kobe_streak<-calc_streak(kobe$basket)
kobe_streak

##  [1] 1 0 2 0 0 0 3 2 0 3 0 1 3 0 0 0 0 0 1 1 0 4 1 0 1 0 1 0 1 2 0 1 2 1 0 0 1 0
## [39] 0 0 1 1 0 1 0 2 0 0 0 3 0 1 0 1 2 1 0 1 0 0 1 3 3 1 1 0 0 0 0 0 1 1 0 0 0 1

outcomes<-c("H","M")
sim_basket<-sample(outcomes, size = 133, replace = TRUE, prob = c(0.45,0.55))
sim_basket

##   [1] "H" "H" "H" "H" "M" "M" "M" "M" "M" "M" "M" "M" "H" "M" "H" "H" "H" "H"
##  [19] "H" "H" "M" "M" "H" "M" "M" "H" "H" "M" "H" "H" "M" "M" "H" "M" "H" "M"
##  [37] "H" "H" "H" "H" "H" "M" "M" "M" "H" "H" "M" "H" "H" "H" "M" "M" "M" "M"
##  [55] "M" "M" "H" "M" "M" "H" "H" "H" "M" "M" "M" "M" "H" "M" "M" "H" "H" "M"
##  [73] "H" "M" "H" "M" "H" "M" "M" "H" "M" "H" "M" "M" "H" "M" "M" "M" "M" "H"
##  [91] "H" "M" "H" "M" "M" "M" "H" "M" "M" "M" "M" "M" "H" "H" "H" "M" "M" "H"
## [109] "M" "M" "M" "H" "H" "H" "H" "M" "H" "M" "H" "H" "M" "H" "M" "H" "H" "M"
## [127] "H" "H" "M" "M" "M" "M" "H"

sim_streak<-calc_streak(sim_basket)
sim_streak

##  [1] 4 0 0 0 0 0 0 0 1 6 0 1 0 2 2 0 1 1 5 0 0 2 3 0 0 0 0 0 1 0 3 0 0 0 1 0 2 1
## [39] 1 1 0 1 1 0 1 0 0 0 2 1 0 0 1 0 0 0 0 3 0 1 0 0 4 1 2 1 2 2 0 0 0 1

barplot(table(sim_streak))
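
The typical and longest streak lengths the question asks about can be read off the streak vector directly. This is a minimal sketch, assuming the values printed above are copied in by hand so that it runs without `kobe.RData`. The name `sim_streak_printed` is introduced here for illustration only.

```r
# Streak lengths copied from the printed sim_streak output above
sim_streak_printed <- c(
  4, 0, 0, 0, 0, 0, 0, 0, 1, 6, 0, 1, 0, 2, 2, 0, 1, 1, 5,
  0, 0, 2, 3, 0, 0, 0, 0, 0, 1, 0, 3, 0, 0, 0, 1, 0, 2, 1,
  1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 2, 1, 0, 0, 1, 0, 0, 0, 0,
  3, 0, 1, 0, 0, 4, 1, 2, 1, 2, 2, 0, 0, 0, 1
)

streak_counts  <- table(sim_streak_printed)   # how often each length occurs
typical_streak <- median(sim_streak_printed)  # one measure of "typical"
longest_streak <- max(sim_streak_printed)     # longest run of baskets

print(streak_counts)
print(typical_streak)
print(longest_streak)
```

For the run printed above, the longest streak is 6 and the median streak length is 0, because more than half of the streaks end immediately with a miss. A new call to `sample()` will give different values.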

#2.) If you were to run the simulation of the independent shooter a second time, how would you expect its streak distribution to compare to the distribution from the question above? Exactly the same? Somewhat similar? Totally different? Explain your reasoning.

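# It depends on the sample size and on the probabilities passed to the code. A second run with the same settings (133 shots, 45% hit probability) should give a somewhat similar streak distribution, but not exactly the same one, because each run draws a new random sample. The larger the sample, the less likely a new run is to reproduce the previous one exactly.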
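
One way to check this is to run the simulation twice with the same settings and compare the two streak tables. The sketch below is self-contained. `streak_lengths()` is a hypothetical stand-in for the lab's `calc_streak()`, written to match the behavior seen in the printed output (one streak per gap between misses, with zeros included), and the seeds are arbitrary.

```r
# Hypothetical stand-in for calc_streak(): counts the hits between misses,
# so a miss directly after a miss gives a streak of length 0
streak_lengths <- function(shots) {
  hit <- c(0, as.integer(shots == "H"), 0)  # pad both ends with a miss
  diff(which(hit == 0)) - 1                 # hits between consecutive misses
}

outcomes <- c("H", "M")

set.seed(1)  # arbitrary seed, used only to make the run reproducible
run_a <- sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))
set.seed(2)  # a different arbitrary seed for the second run
run_b <- sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))

streak_a <- streak_lengths(run_a)
streak_b <- streak_lengths(run_b)

# Same settings, different random draws: compare the two tables
print(table(streak_a))
print(table(streak_b))
print(identical(run_a, run_b))
```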


#3.) How does Kobe Bryant’s distribution of streak lengths compare to the distribution of streak lengths for the simulated shooter? Using this comparison, do you have evidence that the hot hand model fits Kobe’s shooting patterns? Explain.

kobe_streak<-calc_streak(kobe$basket)
kobe_streak

##  [1] 1 0 2 0 0 0 3 2 0 3 0 1 3 0 0 0 0 0 1 1 0 4 1 0 1 0 1 0 1 2 0 1 2 1 0 0 1 0
## [39] 0 0 1 1 0 1 0 2 0 0 0 3 0 1 0 1 2 1 0 1 0 0 1 3 3 1 1 0 0 0 0 0 1 1 0 0 0 1

barplot(table(kobe_streak))
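
To compare the two distributions on the same scale, the streak counts can be turned into proportions and plotted side by side. This is a sketch that assumes both vectors are copied by hand from the printed output above. The names `kobe_printed`, `sim_printed`, and `comparison` are introduced here for illustration.

```r
# Streak lengths copied from the printed kobe_streak output
kobe_printed <- c(
  1, 0, 2, 0, 0, 0, 3, 2, 0, 3, 0, 1, 3, 0, 0, 0, 0, 0, 1,
  1, 0, 4, 1, 0, 1, 0, 1, 0, 1, 2, 0, 1, 2, 1, 0, 0, 1, 0,
  0, 0, 1, 1, 0, 1, 0, 2, 0, 0, 0, 3, 0, 1, 0, 1, 2, 1, 0,
  1, 0, 0, 1, 3, 3, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1
)
# Streak lengths copied from the printed sim_streak output
sim_printed <- c(
  4, 0, 0, 0, 0, 0, 0, 0, 1, 6, 0, 1, 0, 2, 2, 0, 1, 1, 5,
  0, 0, 2, 3, 0, 0, 0, 0, 0, 1, 0, 3, 0, 0, 0, 1, 0, 2, 1,
  1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 2, 1, 0, 0, 1, 0, 0, 0, 0,
  3, 0, 1, 0, 0, 4, 1, 2, 1, 2, 2, 0, 0, 0, 1
)

# Use one common set of streak lengths so the columns line up
lengths_seen <- 0:max(kobe_printed, sim_printed)
kobe_prop <- as.vector(table(factor(kobe_printed, levels = lengths_seen))) /
  length(kobe_printed)
sim_prop <- as.vector(table(factor(sim_printed, levels = lengths_seen))) /
  length(sim_printed)

comparison <- rbind(Kobe = kobe_prop, Simulated = sim_prop)
colnames(comparison) <- lengths_seen
print(round(comparison, 2))

# Grouped bars: one pair of bars per streak length
barplot(comparison, beside = TRUE, legend.text = TRUE,
        xlab = "Streak length", ylab = "Proportion of streaks")
```

In the printed output, Kobe's longest streak is 4 and the simulated shooter's longest streak is 6.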

##Milestone

Pokemon <- read.csv("Pokemon.csv")
#1.) Select a numeric variable from your dataset. Identify a probability question you have about the population from which the data were sampled and the selected numeric variable. Estimate the probability empirically using the data.

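# TRUE for every Pokemon whose generation is above 5 (generation 6)
Pokebowl <- Pokemon$Generation > 5
# Count of generation 6 Pokemon (sum() counts the TRUE values)
Gen_5 <- sum(Pokebowl)
# Empirical probability: share of all rows (800 in this data set)
Probly <- Gen_5 / nrow(Pokemon)
Probly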

## [1] 0.1025

#2.) Sample 1000 values from your data, with replacement, to produce an estimate for your probability question. Is this estimate the same as the one computed empirically in the previous question? Why or why not?

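# Draw 1000 values from the Generation column, with replacement
a <- sample(Pokemon$Generation, size = 1000, replace = TRUE)
# Count resampled values above generation 5, as in the previous question
Gen__replace <- sum(a > 5)
# Divide by the number of values drawn
probb <- Gen__replace / 1000
probb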


#It is not the same, because sampling with replacement draws a new random resample every time, so the estimate changes from run to run. In the first question we counted exactly how many Pokemon are in generation 6, so we already knew their proportion of the whole data set.
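
The run-to-run variation can be seen by repeating the resampling a few times. This is a sketch on stand-in data, since `Pokemon.csv` is not reproduced here. It assumes 800 rows with 82 in generation 6, which is inferred from the 0.1025 computed above. The split across generations 1 to 5 is made up, and the seed is arbitrary.

```r
# Stand-in data (assumption): 800 rows, 82 of them generation 6, matching
# the 0.1025 computed above; the split across generations 1-5 is made up
generation <- c(rep(6, 82), rep(1:5, length.out = 718))

# Exact share of generation 6 in the data
empirical <- mean(generation > 5)

set.seed(2020)  # arbitrary seed, used only to make the run reproducible
# Five separate resamples of 1000 values each, drawn with replacement
estimates <- replicate(5, {
  resample <- sample(generation, size = 1000, replace = TRUE)
  mean(resample > 5)
})

print(empirical)
print(estimates)
```

Each resampled estimate should land near the empirical value without matching it exactly.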

#3.) Because the Pokemon population is much smaller than the population of the USA, a probability estimated from the USA population cannot be applied to this Pokemon data set.

Jittiwat Sermsripong

2/3/2020