606_week3

library('DATA606')

## 
## Welcome to CUNY DATA606 Statistics and Probability for Data Analytics 
## This package is designed to support this course. The text book used 
## is OpenIntro Statistics, 3rd Edition. You can read this by typing 
## vignette('os3') or visit www.OpenIntro.org. 
##  
## The getLabs() function will return a list of the labs available. 
##  
## The demo(package='DATA606') will list the demos that are available.

## 
## Attaching package: 'DATA606'

## The following object is masked from 'package:utils':
## 
##     demo

library(knitr)
normalPlot()

3.2 What % of a standard normal distribution N(μ = 0, σ = 1) is found in each region? Be sure to draw a graph.

a <- normalPlot(mean =0, sd = 1,bounds=c(-1.13,Inf))

a_percent <- 1-.1292
b <- normalPlot(mean =0, sd = 1,bounds=c(-Inf,.18))

b_percent <- 0.5714
c <- normalPlot(mean =0, sd = 1,bounds=c(8,Inf))

c_percent <- "6.22e-16"
d <- normalPlot(mean =0, sd = 1,bounds=c(-Inf ,.5))

d_percent <- 0.691

paste("Answer for A",a_percent,"Answer for B",b_percent,"Answer for C",c_percent,"Answer for D",d_percent)

## [1] "Answer for A 0.8708 Answer for B 0.5714 Answer for C 6.22e-16 Answer for D 0.691"
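These areas can be cross-checked in base R without a Z table. The following is a minimal sketch, assuming the regions are exactly the ones implied by the `bounds` arguments in the plots; the variable names are illustrative. For the far tail, `pnorm(8, lower.tail = FALSE)` computes the upper-tail area directly, whereas `1 - pnorm(8)` loses precision in floating point and returns about 6.66e-16 instead of about 6.22e-16.

```r
# Areas under N(0, 1) for the regions given by the bounds used in the plots
area_a <- pnorm(-1.13, lower.tail = FALSE)  # P(Z > -1.13)
area_b <- pnorm(0.18)                       # P(Z < 0.18)
area_c <- pnorm(8, lower.tail = FALSE)      # P(Z > 8), computed directly in the upper tail
area_d <- pnorm(0.5)                        # P(Z < 0.5)

round(c(a = area_a, b = area_b, d = area_d), 4)
area_c
```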

3.4

A. Write down the short-hand for these two normal distributions

mean_men <- 4313
sd_men <- 583
mean_women <- 5261
sd_women <- 807
Leo_Z_score <- (4948 - mean_men) / sd_men
Mary_Z_score <- (5513 - mean_women) / sd_women
Leo_Z_score

## [1] 1.089194

Mary_Z_score

## [1] 0.3122677

Men: N(\(\mu\)=4313, \(\sigma\)=583)
Women: N(\(\mu\)=5261, \(\sigma\)=807)

B. What are the Z-scores for Leo’s and Mary’s finishing times? What do these Z-scores tell you?

Z_Leo = 1.09
Z_Mary = 0.31
- A Z-score gives how many standard deviations a finishing time lies above the mean of the racer's group. Both scores are positive, so both racers were slower than their group average.
- About 86% of racers within his group finished ahead of Leo
- About 62% of racers within her group finished ahead of Mary

C Did Leo or Mary rank better in their respective groups? Explain your reasoning.

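Mary ranked better. About 62% of her group finished ahead of her, compared with about 86% of Leo's group ahead of him. Equivalently, her Z-score (0.31) is lower than his (1.09), and lower finishing times are better.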

D What percent of the triathletes did Leo finish faster than in his group?

E What percent of the triathletes did Mary finish faster than in her group?
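Parts D and E follow from the Z-scores in part B. The following is a minimal sketch in base R, assuming the finishing times are nearly normal and that "faster" means a lower time. `pnorm()` with `lower.tail = FALSE` gives the share of the group with a higher (slower) time, which is the share each racer beat. The variable names are illustrative.

```r
# Z-scores from part B
z_leo <- (4948 - 4313) / 583
z_mary <- (5513 - 5261) / 807

# Share of each group with a slower (higher) time than the racer
leo_faster_than <- pnorm(z_leo, lower.tail = FALSE)
mary_faster_than <- pnorm(z_mary, lower.tail = FALSE)

round(100 * c(Leo = leo_faster_than, Mary = mary_faster_than), 1)
```

Under these assumptions Leo finished faster than roughly 14% of his group and Mary faster than roughly 38% of hers.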

F If the distributions of finishing times are not nearly normal, would your answers to parts (b) - (e) change? Explain your reasoning.

Partly. The Z-scores in part (b) can be computed for any distribution, so they would not change. The percentages in parts (c) to (e), however, come from treating each Z-score as a position on a normal curve. If the finishing times are skewed or otherwise far from normal, those percentiles would no longer be accurate.

3.18

(a) The mean height is 61.52 inches with a standard deviation of 4.58 inches. Use this information to determine if the heights approximately follow the 68-95-99.7% Rule.

heights <- c(54,55,56,56,57,58,58,59,60,60,60,61,61,62,62,63,63,63,64,65,65,67,67,69,73)
heights_mean <- mean(heights)
heights_sd <- sd(heights)
# Intervals covering 1, 2, and 3 standard deviations around the mean
std_68 <- c(heights_mean - heights_sd, heights_mean + heights_sd)
std_95 <- c(heights_mean - 2*heights_sd, heights_mean + 2*heights_sd)
std_99.7 <- c(heights_mean - 3*heights_sd, heights_mean + 3*heights_sd)
qqnormsim(heights)

None of the data fall outside 3\(\sigma\) (100% within).
No data fall outside the left tail and 1 data point falls just outside the right tail of 2\(\sigma\) (24 of 25, or 96%, within).
4 samples on each side fall outside 1\(\sigma\) (17 of 25, or 68%, within).
The data approximate the 68-95-99.7% rule very well.
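The counts above can be checked directly rather than by eye. The following is a minimal sketch in base R that computes the share of observations within 1, 2, and 3 standard deviations of the mean; `within_k` is an illustrative name.

```r
heights <- c(54,55,56,56,57,58,58,59,60,60,60,61,61,62,62,63,63,63,64,65,65,67,67,69,73)
m <- mean(heights)
s <- sd(heights)

# Share of observations within k standard deviations of the mean, for k = 1, 2, 3
within_k <- sapply(1:3, function(k) mean(abs(heights - m) <= k * s))
names(within_k) <- c("1 sd", "2 sd", "3 sd")
within_k
```

The three shares can then be compared against 68%, 95%, and 99.7%.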

(b) Do these data appear to follow a normal distribution? Explain your reasoning using the graphs provided below.

The data are very close to normally distributed. As noted above, they follow the 68-95-99.7% rule, and the observations are spread fairly evenly above and below the mean, although there is a slight right skew.

3.22 Defective rate. A machine that produces a special type of transistor (a component of computers) has a 2% defective rate. The production is considered a random process where each transistor is independent of the others.

A. What is the probability that the 10th transistor produced is the first with a defect?

\[(1-p)^{(n-1)}p \]

p <- .02
not_p <- 1 - p
n <- 10
not_p^(n - 1) * p

## [1] 0.01667496

Answer 1.67%

B. What is the probability that the machine produces no defective transistors in a batch of 100?

not_p^100

## [1] 0.1326196

Answer 13.3%
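Parts A and B can be cross-checked with R's built-in distribution functions. The following is a minimal sketch, assuming independent transistors with a 2% defect rate as stated in the exercise; the variable names are illustrative. Note that `dgeom()` counts the non-defective transistors before the first defect, so the 10th transistor corresponds to `x = 9`.

```r
p <- 0.02

# dgeom(x, prob): x = number of non-defective transistors before the first defect
first_defect_on_10th <- dgeom(9, prob = p)

# dbinom(0, 100, p): probability of zero defects among 100 independent transistors
no_defect_in_100 <- dbinom(0, size = 100, prob = p)

c(first_defect_on_10th, no_defect_in_100)
```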

C.1 On average, how many transistors would you expect to be produced before the first with a defect?

\[ \mu= 1/p \]

1/.02

## [1] 50

Answer 50 transistors

C.2 What is the standard deviation?

\[ \sigma= \sqrt{(1-p)/p^2} \]

sigma <- sqrt(.98/(.02^2))
sigma

## [1] 49.49747

Answer 49.5

D. Another machine that also produces transistors has a 5% defective rate where each transistor is produced independent of the others. On average how many transistors would you expect to be produced with this machine before the first with a defect? What is the standard deviation?

\[ \mu= 1/p \]

five_percent <- 1/.05

Answer 20

D.2 What is the standard deviation?

\[ \sigma= \sqrt{(1-p)/p^2} \]

five_percent_sigma <- sqrt(.95/(.05^2))
five_percent_sigma

## [1] 19.49359

Answer 19.49

E. Based on your answers to parts (c) and (d), how does increasing the probability of an event affect the mean and standard deviation of the wait time until success?

Increasing the probability of a defect decreases both the expected wait until the first defective transistor (from 50 to 20) and the standard deviation of that wait (from 49.5 to 19.49).
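The comparison between the two machines can be tabulated with a small helper. The following is a minimal sketch in base R; `geom_summary()` is a hypothetical helper name, and it simply applies the formulas \(\mu = 1/p\) and \(\sigma = \sqrt{(1-p)/p^2}\) used in parts C and D.

```r
# Hypothetical helper: mean and sd of the wait until the first defect
geom_summary <- function(p) {
  c(mean = 1 / p, sd = sqrt((1 - p) / p^2))
}

rates <- c(0.02, 0.05)
summary_table <- sapply(rates, geom_summary)
colnames(summary_table) <- paste0("p=", rates)
summary_table
```

Each column holds the mean and standard deviation for one defect rate, so the effect of raising p can be read across a row.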

3.38 While it is often assumed that the probabilities of having a boy or a girl are the same, the actual probability of having a boy is slightly higher at 0.51. Suppose a couple plans to have 3 kids.

A. Use the binomial model to calculate the probability that two of them will be boys.

\[{n\choose k}p^k(1-p)^{n-k} \]

binomial_prob <- function(n,k,p){
    # choose(n, k) counts the orderings; p^k*(1-p)^(n-k) is the probability of each one
    my_prob <- choose(n,k)*p^k*(1-p)^(n-k)
    return(my_prob) 
}
p <- .51
k <- 2
n <- 3
binomial_prob(n,k,p)

## [1] 0.382347

choose(n,k)

## [1] 3

Answer 38.2%

B. Write out all possible orderings of 3 children, 2 of whom are boys. Use these scenarios to calculate the same probability from part (a) but using the addition rule for disjoint outcomes. Confirm that your answers from parts (a) and (b) match.

sex1	sex2	sex3
boy	boy	girl
boy	girl	boy
girl	boy	boy

\[P(\text{order}_1)+P(\text{order}_2)+P(\text{order}_3) \]
Let \(p_b\) be the probability of a boy, so each ordering has probability \(p_b^2(1-p_b)\):
\[ p_b\,p_b\,(1-p_b) + p_b\,(1-p_b)\,p_b + (1-p_b)\,p_b\,p_b \]
\[ (0.51)(0.51)(0.49)+(0.51)(0.49)(0.51)+(0.49)(0.51)(0.51) \]
\[ 0.127449+0.127449+0.127449=0.382347 \]

Answer: 0.382347, or 38.2%, which matches part A. (Rounding each term to 0.127 before adding gives 0.381.)
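The addition-rule calculation can also be done by enumerating the birth orders in code. The following is a minimal sketch in base R, assuming independent births with P(boy) = 0.51 as stated in the exercise; the 0/1 coding and the variable names are illustrative.

```r
p_boy <- 0.51

# All 2^3 birth orders, coded 1 = boy, 0 = girl
orders <- expand.grid(child1 = 0:1, child2 = 0:1, child3 = 0:1)
two_boys <- orders[rowSums(orders) == 2, ]

# Probability of each ordering: p_boy for every boy, (1 - p_boy) for every girl
order_prob <- apply(two_boys, 1, function(x) prod(ifelse(x == 1, p_boy, 1 - p_boy)))

total_by_addition <- sum(order_prob)                    # addition rule for disjoint orderings
total_by_formula <- dbinom(2, size = 3, prob = p_boy)   # binomial formula from part A
c(total_by_addition, total_by_formula)
```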

C. If we wanted to calculate the probability that a couple who plans to have 8 kids will have 3 boys, briefly describe why the approach from part (b) would be more tedious than the approach from part (a).

The formula in part A counts the possible orderings for us. Listing orderings by hand quickly becomes cumbersome: with 8 children and 3 boys there are \({8\choose 3}\) = 56 orderings to write out. Because every ordering has the same probability, all we need is the count of 56, and the n choose k formula gives it faster than a person can list even a small case.
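To illustrate the scale of the 8-child case, the orderings can be counted and listed in base R. The following is a minimal sketch, assuming P(boy) = 0.51 as in the exercise; the variable names are illustrative.

```r
# Number of ways to place 3 boys among 8 births
n_orderings <- choose(8, 3)

# combn() lists which 3 of the 8 birth positions are boys, one column per ordering
positions <- combn(8, 3)
ncol(positions)

# Binomial probability of exactly 3 boys in 8 births
prob_3_boys_of_8 <- dbinom(3, size = 8, prob = 0.51)
prob_3_boys_of_8
```

Part (b)'s approach would mean writing out and adding all 56 columns of `positions`, whereas the formula needs only the count.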

3.42 Serving in volleyball. A not-so-skilled volleyball player has a 15% chance of making the serve, which involves hitting the ball so it passes over the net on a trajectory such that it will land in the opposing team’s court. Suppose that her serves are independent of each other.

A. What is the probability that on the 10th try she will make her 3rd successful serve?

\[ {10-1\choose 3-1} \times (1-0.15)^{7} \times 0.15^3 \]

prob_event <- (1-.15)^7*.15^3
prob_event

## [1] 0.001081948

combos <- choose(9, 2)
total_prob <- combos*prob_event
total_prob

## [1] 0.03895012

Answer: 3.90%
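This result can be cross-checked against R's negative binomial density. The following is a minimal sketch, assuming independent serves with a 15% success rate as stated in the exercise; the variable names are illustrative. `dnbinom()` is parameterized by the number of failures before the target success, so a 3rd success on the 10th serve corresponds to 7 misses.

```r
p_serve <- 0.15

# dnbinom(x, size, prob): x = failures before the size-th success
# 3rd make on the 10th serve means 7 misses before the 3rd make
third_make_on_10th <- dnbinom(7, size = 3, prob = p_serve)
third_make_on_10th
```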

B. Suppose she has made two successful serves in nine attempts. What is the probability that her 10th serve will be successful?

15%. Her serves are independent, so the outcome of the 10th serve does not depend on the previous nine.

C. Even though parts (a) and (b) discuss the same scenario, the probabilities you calculated should be different. Can you explain the reason for this discrepancy?

The answer to A is the joint probability of a whole sequence: exactly 2 successful serves somewhere in the first 9 attempts and a success on the 10th.
The answer to B takes the first 9 serves as already known, so only the 10th serve is still random, and by independence its probability is 15%.
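The relationship between the two answers can be made explicit by splitting part A into its two pieces. The following is a minimal sketch in base R, assuming independent serves with a 15% success rate; the variable names are illustrative.

```r
p_serve <- 0.15

# Probability of exactly 2 makes in the first 9 serves
two_in_first_nine <- dbinom(2, size = 9, prob = p_serve)

# Part A: 2 makes in the first 9 serves AND a make on the 10th
part_a <- two_in_first_nine * p_serve

# Part B: the first 9 serves are given, so only the 10th serve is uncertain
part_b <- p_serve

c(two_in_first_nine = two_in_first_nine, part_a = part_a, part_b = part_b)
```

Part A is smaller because it also has to account for the chance of reaching the 10th serve with exactly two makes, which part B takes as given.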