Lab 2 Assignment - Xialing Walla

download.file("http://www.openintro.org/stat/data/kobe.RData",destfile ="kobe.RData")
load("kobe.RData")
head(kobe)
##    vs game quarter time
## 1 ORL    1       1 9:47
## 2 ORL    1       1 9:07
## 3 ORL    1       1 8:11
## 4 ORL    1       1 7:41
## 5 ORL    1       1 7:03
## 6 ORL    1       1 6:01
##                                               description basket
## 1                 Kobe Bryant makes 4-foot two point shot      H
## 2                               Kobe Bryant misses jumper      M
## 3                        Kobe Bryant misses 7-foot jumper      M
## 4 Kobe Bryant makes 16-foot jumper (Derek Fisher assists)      H
## 5                         Kobe Bryant makes driving layup      H
## 6                               Kobe Bryant misses jumper      M

#1. Describe the distribution of streak lengths. What is the typical streak length for this simulated independent shooter with a 45% shooting percentage? How long is the player's longest streak of baskets in 133 shots?

A streak length is the number of consecutive baskets made until a miss occurs. The barplot shows that the distribution is right-skewed: long streaks do occur, but only rarely. The most common (typical) streak length for the independent shooter across the 133 simulated shots is 0, and the longest streak is 6 baskets. The simulated streak lengths are very similar to Kobe's.
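The streak definition above can be sketched in a few lines of R. This is only an illustration of the idea, not the lab's own calc_streak() (which is loaded from kobe.RData); the name streak_sketch is hypothetical, and the input is just the first ten of Kobe's shots.

```r
# Hypothetical helper: counts the hits ("H") that fall between consecutive misses ("M").
streak_sketch <- function(shots) {
  hit <- as.integer(shots == "H")   # 1 for a hit, 0 for a miss
  padded <- c(0, hit, 0)            # pad with a "miss" at each end to close open streaks
  miss_pos <- which(padded == 0)    # positions of all misses, including the padding
  diff(miss_pos) - 1                # number of hits between neighbouring misses
}

# First ten entries of kobe$basket, copied from the output of kobe[,6]
first_ten <- c("H", "M", "M", "H", "H", "M", "M", "M", "M", "H")
streak_sketch(first_ten)
# 1 0 2 0 0 0 1
```

The last value is 1 rather than the 3 seen in kobe_streak only because this vector is cut off after the tenth shot.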

kobe[,6]
  [1] "H" "M" "M" "H" "H" "M" "M" "M" "M" "H" "H" "H" "M" "H" "H" "M" "M"
 [18] "H" "H" "H" "M" "M" "H" "M" "H" "H" "H" "M" "M" "M" "M" "M" "M" "H"
 [35] "M" "H" "M" "M" "H" "H" "H" "H" "M" "H" "M" "M" "H" "M" "M" "H" "M"
 [52] "M" "H" "M" "H" "H" "M" "M" "H" "M" "H" "H" "M" "H" "M" "M" "M" "H"
 [69] "M" "M" "M" "M" "H" "M" "H" "M" "M" "H" "M" "M" "H" "H" "M" "M" "M"
 [86] "M" "H" "H" "H" "M" "M" "H" "M" "M" "H" "M" "H" "H" "M" "H" "M" "M"
[103] "H" "M" "M" "M" "H" "M" "H" "H" "H" "M" "H" "H" "H" "M" "H" "M" "H"
[120] "M" "M" "M" "M" "M" "M" "H" "M" "H" "M" "M" "M" "M" "H"
kobe_streak<-calc_streak(kobe$basket)
kobe_streak
 [1] 1 0 2 0 0 0 3 2 0 3 0 1 3 0 0 0 0 0 1 1 0 4 1 0 1 0 1 0 1 2 0 1 2 1 0
[36] 0 1 0 0 0 1 1 0 1 0 2 0 0 0 3 0 1 0 1 2 1 0 1 0 0 1 3 3 1 1 0 0 0 0 0
[71] 1 1 0 0 0 1
table(kobe_streak)
kobe_streak
 0  1  2  3  4 
39 24  6  6  1 
outcomes<-c("H","M")
sim_basket<-sample(outcomes, size = 133, replace = TRUE, prob = c(0.45,0.55))
sim_basket
  [1] "M" "M" "M" "H" "H" "H" "M" "M" "M" "H" "M" "M" "H" "M" "M" "M" "H"
 [18] "H" "H" "M" "M" "M" "M" "H" "M" "M" "M" "H" "H" "M" "M" "H" "H" "H"
 [35] "H" "M" "M" "H" "M" "H" "H" "H" "H" "M" "H" "M" "M" "H" "H" "M" "M"
 [52] "M" "H" "M" "M" "H" "H" "H" "H" "H" "H" "M" "H" "H" "M" "H" "M" "H"
 [69] "M" "H" "M" "H" "M" "M" "H" "M" "H" "H" "M" "H" "H" "M" "M" "M" "M"
 [86] "H" "H" "M" "H" "M" "M" "M" "H" "H" "M" "M" "M" "M" "M" "M" "H" "M"
[103] "M" "H" "H" "H" "M" "M" "M" "H" "M" "M" "H" "H" "H" "H" "M" "M" "H"
[120] "M" "M" "M" "H" "M" "H" "M" "M" "M" "M" "M" "H" "H" "M"
sim_streak<-calc_streak(sim_basket)
sim_streak
 [1] 0 0 0 3 0 0 1 0 1 0 0 3 0 0 0 1 0 0 2 0 4 0 1 4 1 0 2 0 0 1 0 6 2 1 1
[36] 1 1 0 1 2 2 0 0 0 2 1 0 0 2 0 0 0 0 0 1 0 3 0 0 1 0 4 0 1 0 0 1 1 0 0
[71] 0 0 2 0
table(sim_streak)
sim_streak
 0  1  2  3  4  6 
42 17  8  3  3  1 
barplot(table(sim_streak))

[Plot: bar plot of the simulated streak lengths in sim_streak]

#2. If you were to run the simulation of the independent shooter a second time, how would you expect its streak distribution to compare to the distribution from the question above? Exactly the same? Somewhat similar? Totally different? Explain your reasoning.

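The second run should look somewhat similar but not exactly the same. Each of the 133 simulated shots is a random draw made with replacement, so an outcome can be drawn again even though it was selected previously, and the specific streaks change from run to run. Because the hit probability stays at 45%, the overall shape should stay about the same: mostly streaks of 0 and 1, with only a few long streaks. To reproduce exactly the same result, call set.seed() with the same seed before sampling.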
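A minimal sketch of the reproducibility point, assuming the same sample() call used above. The seed value 2024 is arbitrary and not part of the original lab.

```r
outcomes <- c("H", "M")

# Any fixed integer works as a seed; 2024 is an arbitrary choice.
set.seed(2024)
run_a <- sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))

# Resetting the same seed replays the same random draws.
set.seed(2024)
run_b <- sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))

# No reseeding here, so this run continues the random stream and will differ.
run_c <- sample(outcomes, size = 133, replace = TRUE, prob = c(0.45, 0.55))

identical(run_a, run_b)  # TRUE: same seed, same shots
identical(run_a, run_c)
```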

#3. How does Kobe Bryant's distribution of streak lengths compare to the distribution of streak lengths for the simulated shooter? Using this comparison, do you have evidence that the hot hand model fits Kobe's shooting patterns? Explain.

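Kobe's distribution of streak lengths closely mirrors that of the independent shooter, and I would expect this regardless of the number of simulation runs. In both, a streak of 0 is the most common, the counts fall off quickly as streaks get longer, and Kobe's longest streak (4) is actually shorter than the simulated shooter's longest (6). This comparison gives no evidence that the hot hand model fits Kobe's shooting patterns; his shots look independent, so a previous hit does not affect the outcome of the next shot.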
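The comparison can be made more concrete by turning the counts printed by the two table() calls earlier into proportions. The sketch below retypes those counts by hand; the variable names are illustrative, and the simulated counts describe only this one run.

```r
# Counts copied from table(kobe_streak) and table(sim_streak) above
kobe_counts <- c("0" = 39, "1" = 24, "2" = 6, "3" = 6, "4" = 1)
sim_counts  <- c("0" = 42, "1" = 17, "2" = 8, "3" = 3, "4" = 3, "6" = 1)

# Share of all streaks at each length
round(kobe_counts / sum(kobe_counts), 2)
round(sim_counts / sum(sim_counts), 2)

# Mean streak length: sum of (length * count) divided by the number of streaks
kobe_mean <- sum(as.numeric(names(kobe_counts)) * kobe_counts) / sum(kobe_counts)
sim_mean  <- sum(as.numeric(names(sim_counts)) * sim_counts) / sum(sim_counts)
c(kobe = kobe_mean, sim = sim_mean)
```

With these counts the mean streak length is 58/76 (about 0.76) for Kobe and 60/74 (about 0.81) for the simulated shooter, which is consistent with the two distributions being similar.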

#4 Questions - With each purchase of a large pizza at Tony's Pizza, the customer receives a coupon that can be scratched to see if a prize will be awarded. The odds of winning a free soft drink are 1 in 10 and the odds of winning a free large pizza are 1 in 50. You plan to eat lunch tomorrow at Tony's. Compute the following probabilities:

(a) You will win either a large pizza or a soft drink.

1/10 + 1/50
[1] 0.12

(b) You will not win a prize.

1-0.12
[1] 0.88

(c) You will not win a prize on three consecutive visits to Tony's.

.88^3
[1] 0.6815

(d) You will win at least one prize on one of your next three visits to Tony's.

1-.6815
[1] 0.3185

#5 Questions - There are four people being considered for the position of CEO of Dalton Enterprises. Three of the applicants are over 60 years of age. Two are female, of which only one is over 60.

(a) What is the probability that a candidate is over 60 and female?

Only 1 of the 4 candidates is a female over 60, so the probability is 1/4 = 25%.

(b) Given that the candidate is male, what is the probability he is less than 60?

The probability is zero. The problem statement says there are 4 candidates, 3 of them over 60, and only 1 of the 2 females is over 60. So the 2 male candidates must both be over 60.

(c) Given that a person is over 60, what is the probability the person is female?

0.25/0.75 = 1/3, so there is about a 33% chance the person is female, given that the person is over 60.
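The three answers can be checked by listing the four candidates explicitly. This is a sketch under the reading used above (two females, one of them over 60, and two males who are both over 60); the data frame and its column names are made up for illustration.

```r
# One row per candidate, reconstructed from the problem statement
candidates <- data.frame(
  sex    = c("F", "F", "M", "M"),
  over60 = c(TRUE, FALSE, TRUE, TRUE)
)

# (a) P(over 60 and female)
p_a <- mean(candidates$sex == "F" & candidates$over60)

# (b) P(under 60 | male): restrict to the males, then take the share under 60
p_b <- mean(!candidates$over60[candidates$sex == "M"])

# (c) P(female | over 60): restrict to those over 60, then take the share female
p_c <- mean(candidates$sex[candidates$over60] == "F")

c(a = p_a, b = p_b, c = p_c)
```

Conditioning is done here by subsetting the rows first, which matches dividing the joint probability by the marginal one.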

#6 Questions - Albert Pujols of the St. Louis Cardinals had the highest batting average in the 2003 Major League Baseball season. His average was .359. Assume the probability of getting a hit is .359 each time he batted. In a particular game, he batted three times.

(a) What is the probability of not getting any hits in a game?

(1-.359)^3
[1] 0.2634

(b) What is the probability of getting three hits in a game?

.359^3
[1] 0.04627

(c) What is the probability of getting at least one hit in a game?

1-(1-.359)^3
[1] 0.7366
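Parts (a) to (c) can be cross-checked with R's binomial functions, treating the three at-bats as independent trials with hit probability 0.359, as the question assumes. "At least one hit" is the complement of "no hits". The variable names are illustrative.

```r
p_hit     <- 0.359
n_at_bats <- 3

# (a) exactly 0 hits in 3 at-bats
p_none <- dbinom(0, size = n_at_bats, prob = p_hit)

# (b) exactly 3 hits in 3 at-bats
p_three <- dbinom(3, size = n_at_bats, prob = p_hit)

# (c) at least one hit = 1 - P(no hits)
p_at_least_one <- 1 - p_none

# Same quantity from the upper tail: P(X > 0)
p_upper_tail <- pbinom(0, size = n_at_bats, prob = p_hit, lower.tail = FALSE)

round(c(none = p_none, three = p_three, at_least_one = p_at_least_one), 4)
```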

#7 question - I was recently told that I tested positive for a rare (only affects 1 in 10,000 people) and fatal disease (no known cure or treatment). When we were told the news, my wife was devastated - me, not so much. I asked the doctor how accurate the test is. He indicated it is 98% accurate - when you have the disease, the test is positive 98% of the time and when you do not have the disease, the test is negative 98% of the time. Calculate the probability I have the disease given I have tested positive for the disease (hint: the answer is NOT 98%).

I used a tree diagram to organize the outcomes and probabilities for this question. The two marginal probabilities form the primary branches: 1 out of 10,000 people has the disease and 9,999 out of 10,000 do not. Each primary branch then splits by test result.

1/10000 have the disease = 0.0001

have disease and test positive: 98% x 0.0001 = 0.000098
have disease but false negative test: 2% x 0.0001 = 0.000002

9999/10000 will not have the disease = 0.9999

do not have disease but false positive test: 2% x 0.9999 = 0.019998

do not have disease and test negative: 98% x 0.9999 = 0.979902

P(have disease | tests positive) = 0.000098/(0.000098 + 0.019998) = 0.00487659236

So there is only about a 0.5% probability of having the disease, given that the test was positive.
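The tree-diagram arithmetic above is an application of Bayes' rule and can be sketched in R as follows. The variable names are illustrative; the numbers come straight from the question (1 in 10,000 prevalence, 98% accuracy in both directions).

```r
prevalence  <- 1 / 10000  # P(disease)
sensitivity <- 0.98       # P(test positive | disease)
specificity <- 0.98       # P(test negative | no disease)

# Joint probabilities for the two branches that end in a positive test
true_pos  <- sensitivity * prevalence
false_pos <- (1 - specificity) * (1 - prevalence)

# Bayes' rule: P(disease | positive) = true positives / all positives
posterior <- true_pos / (true_pos + false_pos)
round(posterior, 5)
```

The posterior is small because false positives among the 9,999 healthy people far outnumber true positives from the 1 sick person.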