MSDS Spring 2018

DATA 606 Statistics and Probability for Data Analytics

Jiadi Li

Chapter 3: Distributions of Random Variables

HW 3: 3.2, 3.4, 3.18, 3.22, 3.38, 3.42

3.2 Area under the curve, Part II

(a)P(Z>-1.13)

1 - pnorm(q = -1.13,mean = 0,sd = 1) #upper-tail probability P(Z > -1.13)
## [1] 0.8707619
x1 <- c(-1.13,seq(-1.13,3.5,0.01),3.5) # x-coordinates of the shaded area, out to the plot limit
y1 <- c(0,dnorm(seq(-1.13,3.5,0.01)),0) # density values, closed at y = 0 on both ends

curve(dnorm(x,0,1), xlim=c(-3.5,3.5)) #draw a curve
polygon(x1,y1,col = "snow3") #add shaded area

(b)P(Z<0.18)

pnorm(q = 0.18,mean = 0,sd = 1) #calculate the probability
## [1] 0.5714237
x2 <- c(-3.5,seq(-3.5,0.18,0.01),0.18) # Create data for the area to shade
y2 <- c(0,dnorm(seq(-3.5,0.18,0.01)),0) 

curve(dnorm(x,0,1), xlim=c(-3.5,3.5)) #draw a curve
polygon(x2,y2,col = "snow3") #add shaded area

(c)P(Z>8)

pnorm(q = 8,mean = 0,sd = 1,lower.tail = FALSE) #upper tail directly; 1 - pnorm(8) loses precision to rounding
## [1] 6.220961e-16
x3 <- c(8,seq(8,9,0.01),9) # Create data for the area to shade
y3 <- c(0,dnorm(seq(8,9,0.01)),0) 

curve(dnorm(x,0,1), xlim=c(-3.5,9)) #draw a curve
polygon(x3,y3,col = "snow3") #add shaded area (too small to be visible)

(d)P(|Z|<0.5)

(pnorm(q = 0.5,mean = 0,sd = 1) - pnorm(q = 0,mean = 0,sd = 1))*2 #calculate the probability
## [1] 0.3829249
x4 <- c(-0.5,seq(-0.5,0.5,0.01),0.5) # Create data for the area to shade
y4 <- c(0,dnorm(seq(-0.5,0.5,0.01)),0) 

curve(dnorm(x,0,1), xlim=c(-3.5,3.5)) #draw a curve
polygon(x4,y4,col = "snow3") #add shaded area
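
As a cross-check on parts (a), (b) and (d), the same areas can be obtained by numerically integrating the standard normal density with base R's `integrate()`. This is only a minimal sketch; the names `area` and `checks` are introduced here for illustration and are not part of the original solution.

```r
# Hypothetical helper: area under the standard normal density between two limits
area <- function(lower, upper) integrate(dnorm, lower, upper)$value

checks <- c(
  a = area(-1.13, Inf),  # P(Z > -1.13)
  b = area(-Inf, 0.18),  # P(Z < 0.18)
  d = area(-0.5, 0.5)    # P(|Z| < 0.5)
)
round(checks, 4)

# The same three probabilities from pnorm(), for comparison
c(a = pnorm(-1.13, lower.tail = FALSE),
  b = pnorm(0.18),
  d = pnorm(0.5) - pnorm(-0.5))
```

Part (c) is left out because the tail beyond 8 is so small that numerical integration adds nothing over `pnorm()`.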

3.4 Triathlon times, Part I

(a)
Men, Ages 30-34: \(N(\mu=4313, \sigma=583)\)
Women, Ages 25-29: \(N(\mu=5261, \sigma=807)\)

(b)
zLeo <- (4948-4313)/583 # Leo: Men, Ages 30-34
zMary <- (5513-5261)/807 # Mary: Women, Ages 25-29

Each Z-score measures how many standard deviations a finishing time lies from the mean of the athlete's own group.

zLeo
## [1] 1.089194
zMary
## [1] 0.3122677

(c)
Mary ranked better than Leo within her group. Both finished slower than their group means (both Z-scores are positive), but Mary's time is only about 0.31 standard deviations above her group's mean, while Leo's is about 1.09 standard deviations above his.

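(d)
pnorm(q = 4948,mean = 4313,sd = 583) # proportion of Leo's group with a shorter (faster) time
## [1] 0.8619658

Leo finished faster than about 13.8% of his group (1 - 0.862).

(e)
pnorm(q = 5513,mean = 5261,sd = 807) # proportion of Mary's group with a shorter (faster) time
## [1] 0.6225814

Mary finished faster than about 37.7% of her group (1 - 0.623).
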
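(f) Yes, the percentile answers would change, because they use the normal model to convert a finishing time into a proportion of the group. The Z-scores themselves can still be computed for any distribution, but they could no longer be turned into percentiles with pnorm.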
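
The shares of each group that Leo and Mary finished faster than can be computed in one step with the upper tail of the normal distribution. The sketch below assumes a hypothetical helper `faster.than`, which is not part of the original solution, and relies on the same normal model as the rest of this exercise.

```r
# Hypothetical helper: share of the group with a longer finishing time than `time`,
# assuming finishing times follow N(mean, sd)
faster.than <- function(time, mean, sd) {
  pnorm(q = time, mean = mean, sd = sd, lower.tail = FALSE)
}

leo  <- faster.than(4948, mean = 4313, sd = 583)  # Men, Ages 30-34
mary <- faster.than(5513, mean = 5261, sd = 807)  # Women, Ages 25-29
round(c(Leo = leo, Mary = mary), 3)
```

Using `lower.tail = FALSE` avoids the separate `1 - pnorm(...)` subtraction.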

3.18 Heights of female college students

#qqnormsim
height <- c(54,55,56,56,57,58,58,59,60,60,60,61,61,62,62,63,63,63,64,65,65,67,67,69,73)
mean(height)
## [1] 61.52
sd(height)
## [1] 4.583667
sim.height <- rnorm(length(height),mean = mean(height),sd = sd(height)) # simulated normal sample; no seed is set, so values change on every run
quantile(height,0.68)
## 68% 
##  63
quantile(sim.height,0.68)
##      68% 
## 63.68313
quantile(height,0.95)
##  95% 
## 68.6
quantile(sim.height,0.95)
##      95% 
## 71.05852
quantile(height,0.997)
##  99.7% 
## 72.712
quantile(sim.height,0.997)
##    99.7% 
## 72.00822

The 68th, 95th and 99.7th percentiles of height and of the simulated normal sample are close, but not identical.
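
The 68-95-99.7% rule can also be checked directly, without simulation, by counting the share of observations that fall within 1, 2 and 3 standard deviations of the mean. A minimal sketch, where `within.k` is a name introduced here for illustration:

```r
height <- c(54,55,56,56,57,58,58,59,60,60,60,61,61,62,62,63,63,63,64,65,65,67,67,69,73)

# Share of heights within k standard deviations of the mean, for k = 1, 2, 3
within.k <- sapply(1:3, function(k) {
  mean(abs(height - mean(height)) <= k * sd(height))
})
names(within.k) <- c("1 sd", "2 sd", "3 sd")
within.k
```

For these 25 heights the shares are 17/25, 24/25 and 25/25, i.e. 68%, 96% and 100%, which is close to what the rule predicts.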

hist(height,probability = TRUE,ylim = c(0,0.1))
x <- 50:75
y <- dnorm(x = x, mean = mean(height), sd = sd(height))
lines(x = x, y = y, col = "blue")

qqnorm(height)
qqline(height)

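The distribution is unimodal and roughly symmetric rather than left-skewed; if anything, the right tail is slightly longer, since the maximum (73) is farther from the mean (61.52) than the minimum (54) is. A normal model is a reasonable approximation for these heights.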

3.22 Defective rate

(a)10\(^{th}\) item is the first one with a defect

defective.rate <- 0.02
(1-defective.rate)^9*defective.rate
## [1] 0.01667496

(b)no defective one in a batch of 100

(1-defective.rate)^100
## [1] 0.1326196

(c)\(\mu=\frac{1}{p}\); \(\sigma^2=\frac{1-p}{p^2}\)

1/defective.rate
## [1] 50
sqrt((1-defective.rate)/defective.rate^2)
## [1] 49.49747

(d)defective rate = 5%, find \(E[x]\) and sd

defective.rate2 <- 0.05
1/defective.rate2
## [1] 20
sqrt((1-defective.rate2)/defective.rate2^2)
## [1] 19.49359
(e) When the probability of a defect increases, both the expected number of items produced up to the first defect and its standard deviation decrease.
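
The hand calculations in this exercise agree with R's built-in geometric distribution. Note that `dgeom()` and `pgeom()` count the failures before the first success, so the 10th item being the first defective one corresponds to `x = 9`. The helper name `geom.summary` is introduced here for illustration only.

```r
defective.rate <- 0.02

# (a) nine good items followed by the first defective one
dgeom(x = 9, prob = defective.rate)

# (b) no defect among 100 items: more than 99 failures before the first success
pgeom(q = 99, prob = defective.rate, lower.tail = FALSE)

# (c), (d) mean and standard deviation of the wait until the first defect
geom.summary <- function(p) c(mean = 1/p, sd = sqrt((1 - p)/p^2))
sapply(c("2%" = 0.02, "5%" = 0.05), geom.summary)
```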

3.38 Male children

(a) P(boy) = 0.51, using the binomial formula
    \(\frac{n!}{k!(n-k)!} p^{k} (1-p)^{n-k}\)
p.boy <- 0.51
k.boy <- 2 #number of successes
n.kid <- 3 #number of independent trials


choose(n.kid,k.boy)*p.boy^k.boy*(1-p.boy)^(n.kid-k.boy)
## [1] 0.382347
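(b) find P(2 boys) by listing all possible birth orders
    ggg, ggb, gbg, gbb, bgb, bgg, bbg, bbb
    Three of these (gbb, bgb, bbg) contain exactly two boys, and each has probability \(p^{2}(1-p)\).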
(p.boy*p.boy*(1-p.boy))*3
## [1] 0.382347
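(c) The method in (b) requires listing every possible ordering, and the number of orderings grows quickly with the number of children, while (a) only requires substituting new values of n and k into the binomial formula.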
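
The two approaches can be compared side by side in R. The sketch below uses the built-in `dbinom()` for the formula and `expand.grid()` to list the birth orders; the names `orders`, `n.boys`, `p.each`, `p.formula` and `p.enum` are introduced here, and the 8-children case at the end is only an illustrative example of how the enumeration grows.

```r
p.boy <- 0.51

# (a) binomial formula via the built-in probability function
p.formula <- dbinom(x = 2, size = 3, prob = p.boy)

# (b) enumerate all 2^3 birth orders and add up those with exactly two boys
orders <- expand.grid(first = c("b","g"), second = c("b","g"), third = c("b","g"),
                      stringsAsFactors = FALSE)
n.boys <- rowSums(orders == "b")                      # boys in each ordering
p.each <- p.boy^n.boys * (1 - p.boy)^(3 - n.boys)     # probability of each ordering
p.enum <- sum(p.each[n.boys == 2])

c(formula = p.formula, enumeration = p.enum)

# Number of orderings that would have to be listed for 3 boys among 8 children
choose(8, 3)
```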

3.42 Serving in volleyball

P(serve) = 15%

(a) 3\(^{rd}\) successful serve on the 10\(^{th}\) try

p.serve <- 0.15
choose(9,2)*(p.serve^2)*(1-p.serve)^(9-2)*p.serve
## [1] 0.03895012
(b) Given 2 successful serves in the first nine attempts, P(10\(^{th}\) is successful) =
p.serve
## [1] 0.15
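(c) The two probabilities differ because (b) conditions on the first nine serves having already happened, so only the tenth serve, which is independent of the others, is random. In (a) all ten serves are random, so the result is the joint probability of two successes in the first nine serves and a success on the tenth.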
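
The calculation in (a) is a negative binomial probability, so it can be checked against R's `dnbinom()`, which counts the failures before the `size`-th success. A minimal sketch; the names `p.a`, `p.manual` and `p.b` are introduced here for illustration.

```r
p.serve <- 0.15

# (a) 3rd success on the 10th serve means 7 misses before the 3rd success
p.a <- dnbinom(x = 7, size = 3, prob = p.serve)

# Same value from the formula used in the solution
p.manual <- choose(9, 2) * p.serve^2 * (1 - p.serve)^(9 - 2) * p.serve

# (b) given the first nine serves, only the 10th serve is random
p.b <- p.serve

c(a = p.a, manual = p.manual, b = p.b)
```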