MSDS Spring 2018

DATA 606 Statistics and Probability for Data Analytics

Jiadi Li

Chapter 3: Distributions of Random Variables

HW 3: 3.2, 3.4, 3.18, 3.22, 3.38, 3.42

3.2 Area under the curve, Part II

(a)P(Z>-1.13)

1 - pnorm(q = -1.13,mean = 0,sd = 1) #upper-tail probability, P(Z > -1.13)

## [1] 0.8707619

x1 <- c(-1.13,seq(-1.13,3.5,0.01),3.5) # Create data for the area to shade
y1 <- c(0,dnorm(seq(-1.13,3.5,0.01)),0) 

curve(dnorm(x,0,1), xlim=c(-3.5,3.5)) #draw a curve
polygon(x1,y1,col = "snow3") #add shaded area

(b)P(Z<0.18)

pnorm(q = 0.18,mean = 0,sd = 1) #calculate the probability

## [1] 0.5714237

x2 <- c(-3.5,seq(-3.5,0.18,0.01),0.18) # Create data for the area to shade
y2 <- c(0,dnorm(seq(-3.5,0.18,0.01)),0) 

curve(dnorm(x,0,1), xlim=c(-3.5,3.5)) #draw a curve
polygon(x2,y2,col = "snow3") #add shaded area

(c)P(Z>8)

pnorm(q = 8,mean = 0,sd = 1,lower.tail = FALSE) #upper tail computed directly; 1 - pnorm(8) loses precision this far out

## [1] 6.220961e-16

x3 <- c(8,seq(8,9,0.01),9) # Create data for the area to shade
y3 <- c(0,dnorm(seq(8,9,0.01)),0) 

curve(dnorm(x,0,1), xlim=c(-3.5,9)) #draw a curve
polygon(x3,y3,col = "snow3") #add shaded area (too small to be visible)

(d)P(|Z|<0.5)

(pnorm(q = 0.5,mean = 0,sd = 1) - pnorm(q = 0,mean = 0,sd = 1))*2 #calculate the probability

## [1] 0.3829249

x4 <- c(-0.5,seq(-0.5,0.5,0.01),0.5) # Create data for the area to shade
y4 <- c(0,dnorm(seq(-0.5,0.5,0.01)),0) 

curve(dnorm(x,0,1), xlim=c(-3.5,3.5)) #draw a curve
polygon(x4,y4,col = "snow3") #add shaded area
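
The plots above all repeat the same steps, so they could be wrapped in one helper. The sketch below is a hypothetical convenience function (`shade_normal` and its arguments are assumptions, not part of the original homework). It draws the standard normal curve, shades the region between two bounds, and returns the shaded probability.

```r
# Hypothetical helper: shade P(lower < Z < upper) under the standard normal curve.
shade_normal <- function(lower = -Inf, upper = Inf, xlim = c(-3.5, 3.5)) {
  # Clip infinite bounds to the plotting window so the polygon can be drawn
  lo <- max(lower, xlim[1])
  hi <- min(upper, xlim[2])
  xs <- seq(lo, hi, length.out = 200)
  curve(dnorm(x, 0, 1), xlim = xlim, ylab = "density")       # draw the curve
  polygon(c(lo, xs, hi), c(0, dnorm(xs), 0), col = "snow3")  # add shaded area
  pnorm(upper) - pnorm(lower)                                # shaded probability
}

pa <- shade_normal(lower = -1.13)               # (a) P(Z > -1.13)
pb <- shade_normal(upper = 0.18)                # (b) P(Z < 0.18)
pd <- shade_normal(lower = -0.5, upper = 0.5)   # (d) P(|Z| < 0.5)
```

Part (c) is left out on purpose: the bound 8 lies outside the default plotting window, and a difference of two `pnorm` values is not accurate for such a small tail.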

3.4 Triathlon times, Part I

(a)
Men, Ages 30-34: \(N(\mu=4313, \sigma=583)\)
Women, Ages 25-29: \(N(\mu=5261, \sigma=807)\)

zLeo <- (4948-4313)/583
zMary <- (5513-5261)/807

Each Z-score measures how many standard deviations an athlete's finishing time lies from the mean of his or her own group.

zLeo

## [1] 1.089194

zMary

## [1] 0.3122677

Mary performed better than Leo relative to her group. Both Z-scores are positive, so both finished slower than their group mean, but Mary's Z-score is smaller, so her time is closer to the mean of her own group.

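pnorm(q = 4948,mean = 4313,sd = 583) #proportion of Leo's group with a shorter (faster) time

## [1] 0.8619658

pnorm(q = 5513,mean = 5261,sd = 807) #proportion of Mary's group with a shorter (faster) time

## [1] 0.6225814

About 86.2% of Leo's group and 62.3% of Mary's group finished faster than they did, so Leo finished faster than about 13.8% of his group and Mary finished faster than about 37.7% of hers.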

Yes, some answers would change if the finishing times were not nearly normal. The Z-scores can be computed for any distribution, but the percentages obtained from pnorm rely on the normal model and would no longer be valid.
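
The Z-scores and tail proportions above can also be computed for both athletes at once. This is a minimal sketch using the numbers from the exercise; the vector names (`time`, `mu`, `sigma`, `z`, `faster_than`) are chosen here for illustration.

```r
# Finishing times and group parameters, copied from the exercise above
time  <- c(Leo = 4948, Mary = 5513)
mu    <- c(Leo = 4313, Mary = 5261)
sigma <- c(Leo = 583,  Mary = 807)

# Z-score of each athlete within his or her own group
z <- (time - mu) / sigma

# Upper tail: proportion of the group with a longer (slower) time,
# i.e. the share of the group each athlete finished faster than
faster_than <- pnorm(z, lower.tail = FALSE)

round(z, 3)
round(faster_than, 3)
```

Using `lower.tail = FALSE` avoids the separate `1 - pnorm(...)` step when the question asks about the proportion of slower finishers.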

3.18 Heights of female college students

#qqnormsim
height <- c(54,55,56,56,57,58,58,59,60,60,60,61,61,62,62,63,63,63,64,65,65,67,67,69,73)

mean(height)

## [1] 61.52

sd(height)

## [1] 4.583667

#simulated normal sample with the same size, mean and sd as height
#(named sim to avoid masking base::sample; no seed was set, so its quantiles vary between runs)
sim <- rnorm(length(height),mean = mean(height),sd = sd(height))
quantile(height,0.68)

## 68% 
##  63

quantile(sim,0.68)

##      68% 
## 63.68313

quantile(height,0.95)

##  95% 
## 68.6

quantile(sim,0.95)

##      95% 
## 71.05852

quantile(height,0.997)

##  99.7% 
## 72.712

quantile(sim,0.997)

##    99.7% 
## 72.00822

The 68th, 95th and 99.7th percentiles of height and of the simulated normal sample are close but not identical, which is expected with only 25 observations.
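
The 68-95-99.7% rule can also be checked directly, by counting the share of observations within 1, 2 and 3 standard deviations of the mean. This is a minimal sketch of that check on the same data; the names `z` and `within` are introduced here for illustration.

```r
height <- c(54,55,56,56,57,58,58,59,60,60,60,61,61,62,62,63,63,63,64,65,65,67,67,69,73)

# Standardize each height using the sample mean and sd
z <- (height - mean(height)) / sd(height)

# Proportion of observations within k standard deviations of the mean, k = 1, 2, 3
within <- sapply(1:3, function(k) mean(abs(z) <= k))
names(within) <- paste0("within ", 1:3, " SD")
within
```

For these 25 heights the proportions are 17/25 = 0.68, 24/25 = 0.96 and 25/25 = 1, which is very close to the rule.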

hist(height,probability = TRUE,ylim = c(0,0.1))
x <- 50:75
y <- dnorm(x = x, mean = mean(height), sd = sd(height))
lines(x = x, y = y, col = "blue")

qqnorm(height)
qqline(height)

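The histogram is unimodal and roughly symmetric, and the normal curve fits it reasonably well. The points in the normal Q-Q plot lie close to the line, with the largest value (73) sitting somewhat above it, so the heights appear approximately normal with at most a mild right skew.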

3.22 Defective rate

(a) P(10\(^{th}\) produced is the first with a defect)

defective.rate <- 0.02
(1-defective.rate)^9*defective.rate

## [1] 0.01667496

(b) P(no defects in a batch of 100)

(1-defective.rate)^100

## [1] 0.1326196
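
Both results can be cross-checked with R's built-in geometric distribution functions. In this sketch a "success" is a defect, and `dgeom`/`pgeom` count the number of non-defective units before the first defect; the variable names `p.first.at.10` and `p.none.in.100` are introduced here for illustration.

```r
defective.rate <- 0.02

# (a) 9 non-defective units, then the first defect on the 10th
p.first.at.10 <- dgeom(9, prob = defective.rate)

# (b) no defect in 100 units: at least 100 non-defective units before the first defect
p.none.in.100 <- pgeom(99, prob = defective.rate, lower.tail = FALSE)

p.first.at.10
p.none.in.100
```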

(c) Mean and standard deviation of the number produced until the first defect: \(\mu=\frac{1}{p}\); \(\sigma=\sqrt{\frac{1-p}{p^2}}\)

1/defective.rate

## [1] 50

sqrt((1-defective.rate)/defective.rate^2)

## [1] 49.49747

(d) With a defective rate of 5%, find \(E[X]\) and the standard deviation

defective.rate2 <- 0.05
1/defective.rate2

## [1] 20

sqrt((1-defective.rate2)/defective.rate2^2)

## [1] 19.49359

As the defective rate increases, both the expected number produced until the first defect and its standard deviation decrease.
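
The comparison between the two rates can be put side by side with a small function. This is a sketch only; `geom_summary` is a hypothetical helper name, and it simply applies the formulas \(\mu=\frac{1}{p}\) and \(\sigma=\sqrt{\frac{1-p}{p^2}}\) used above.

```r
# Hypothetical helper: mean and sd of the number of trials until the first success
geom_summary <- function(p) {
  c(mean = 1 / p, sd = sqrt((1 - p) / p^2))
}

# One column per defective rate, rows "mean" and "sd"
res <- sapply(c("2%" = 0.02, "5%" = 0.05), geom_summary)
round(res, 2)
```

Both rows shrink as the rate goes from 2% to 5%, which matches the conclusion above.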

3.38 Male children

(a) P(boy) = 0.51; binomial probability of k boys among n children:
\(\frac{n!}{k!(n-k)!} p^{k} (1-p)^{n-k}\)

p.boy <- 0.51
k.boy <- 2 #number of successes
n.kid <- 3 #number of independent trials


choose(n.kid,k.boy)*p.boy^k.boy*(1-p.boy)^(n.kid-k.boy)

## [1] 0.382347

(b) Find P(2 boys) by listing all possible orderings; exactly two boys occur in gbb, bgb and bbg.
ggg, ggb, gbg, gbb, bgb, bgg, bbg, bbb

(p.boy*p.boy*(1-p.boy))*3

## [1] 0.382347

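The method in (b) requires listing every possible ordering and adding up the matching ones, which grows quickly with the number of children, while (a) only requires substituting n, k and p into the binomial formula.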
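
Both approaches can be reproduced in a few lines. The sketch below uses the built-in `dbinom` for the formula and `expand.grid` to list the orderings; the names `outcomes`, `n.boys` and `prob` are introduced here for illustration.

```r
p.boy <- 0.51

# Binomial formula via the built-in density
p.formula <- dbinom(2, size = 3, prob = p.boy)

# Enumeration: all 2^3 orderings of boy (b) and girl (g)
outcomes <- expand.grid(first = c("b", "g"), second = c("b", "g"),
                        third = c("b", "g"), stringsAsFactors = FALSE)
n.boys <- rowSums(outcomes == "b")               # boys in each ordering
prob <- p.boy^n.boys * (1 - p.boy)^(3 - n.boys)  # probability of each ordering
p.listed <- sum(prob[n.boys == 2])               # add the orderings with 2 boys

p.formula
p.listed

# With 8 children and 3 boys, the listing approach would need this many orderings
choose(8, 3)
```

With 8 children there are 2^8 = 256 possible orderings, 56 of which contain exactly 3 boys, so the listing approach becomes tedious quickly.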

3.42 Serving in volleyball

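P(successful serve) = 0.15
(a) P(3\(^{rd}\) successful serve occurs on the 10\(^{th}\) attempt)

p.serve <- 0.15
#2 successes in the first 9 attempts, followed by a success on the 10th
choose(9,2)*(p.serve^2)*(1-p.serve)^(9-2)*p.serve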

## [1] 0.03895012

(b) Given 2 successful serves in the first nine attempts, P(10\(^{th}\) is successful) =

p.serve

## [1] 0.15

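The probability in (b) is conditional on the first nine attempts and, because the serves are independent, equals the single-serve probability. The probability in (a) is the joint probability of the whole ten-attempt sequence, so all ten attempts enter the calculation.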
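
The result in (a) is a negative binomial probability, so it can be cross-checked with the built-in `dnbinom`, which takes the number of failures before the target number of successes. This is a minimal sketch; the names `by.formula` and `by.dnbinom` are introduced here for illustration.

```r
p.serve <- 0.15

# Formula used in (a): 2 successes in the first 9 attempts, then a success on the 10th
by.formula <- choose(9, 2) * p.serve^2 * (1 - p.serve)^7 * p.serve

# Negative binomial: 7 failed serves before the 3rd successful one
by.dnbinom <- dnbinom(7, size = 3, prob = p.serve)

by.formula
by.dnbinom
```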