HW 3

Load Library

library('DATA606')
## 
## Welcome to CUNY DATA606 Statistics and Probability for Data Analytics 
## This package is designed to support this course. The text book used 
## is OpenIntro Statistics, 3rd Edition. You can read this by typing 
## vignette('os3') or visit www.OpenIntro.org. 
##  
## The getLabs() function will return a list of the labs available. 
##  
## The demo(package='DATA606') will list the demos that are available.
## 
## Attaching package: 'DATA606'
## The following object is masked from 'package:utils':
## 
##     demo

3.2

normalPlot(bounds=c(-1.13,Inf))

round(1-pnorm(-1.13,0,1),2)
## [1] 0.87

normalPlot(bounds=c(-Inf,.18))

round(pnorm(.18,0,1),2)
## [1] 0.57

normalPlot(bounds=c(8,Inf))

round(1-pnorm(8,0,1),2)
## [1] 0

normalPlot(bounds=c(-.5,.5))

round(pnorm(.5,0,1) - pnorm(-.5,0,1),2)
## [1] 0.38
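
The tail areas above can also be obtained without subtracting from 1. The following is a minimal sketch using base R only; the variable names are illustrative and not part of the original solution.

```r
# Upper-tail area P(Z > -1.13), computed two equivalent ways.
upper_by_complement <- 1 - pnorm(-1.13)
upper_direct <- pnorm(-1.13, lower.tail = FALSE)
all.equal(upper_by_complement, upper_direct)

# Middle area P(-0.5 < Z < 0.5) as the difference of two lower-tail areas.
middle <- diff(pnorm(c(-0.5, 0.5)))
round(c(upper = upper_direct, middle = middle), 2)
```

Using `lower.tail = FALSE` avoids the small loss of precision that `1 - pnorm(...)` can suffer far out in the tail, as in the `Z > 8` case.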

3.4

a)

\[\text{Men, ages 30-34: } N(\mu = 4313,\ \sigma = 583)\]
\[\text{Women, ages 25-29: } N(\mu = 5261,\ \sigma = 807)\]

b)

Zleo <- (4948-4313)/583
Zleo  
## [1] 1.089194
Zmary <- (5513-5261)/807
Zmary  
## [1] 0.3122677

These Z-scores show that Leo finished 1.09 and Mary 0.31 standard deviations above the mean of their groups. Because these are finishing times, both were slower than their group average.
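
Written out, both scores come from the same standardization, using the group means and standard deviations from part a):

\[Z = \frac{x - \mu}{\sigma}, \qquad Z_{Leo} = \frac{4948 - 4313}{583} \approx 1.09, \qquad Z_{Mary} = \frac{5513 - 5261}{807} \approx 0.31\]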

c)

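round(1 - pnorm(4948, 4313, 583), 4)  # share of Leo's group with slower (larger) times
## [1] 0.138
round(1 - pnorm(5513, 5261, 807), 4)  # Mary's time is 5513 seconds, as in part b)
## [1] 0.3774

Lower times are better, so Mary ranked better: she finished faster than about 38% of her group, while Leo finished faster than only about 14% of his.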

d)

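1 - pnorm(4948, 4313, 583)
## [1] 0.1380342

Leo finished faster than about 14% of his group. The lower-tail value, pnorm(4948, 4313, 583) = 0.86, is the share who finished ahead of him.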

e)

round(1 - pnorm(5513, 5261, 807), 4)  # Mary's time is 5513 seconds, as in part b)
## [1] 0.3774

Mary finished faster than about 38% of her group.

f)

Answer b) would not change, since Z-scores can be computed for any distribution. Parts c) to e) could not be answered this way, because the pnorm probabilities rely on the normal model.

3.18

a)

url <- "https://raw.githubusercontent.com/jbryer/DATA606Fall2017/master/Data/Data%20from%20openintro.org/Ch%203%20Exercise%20Data/fheights.txt"
fheights <- read.table(url, header = TRUE, stringsAsFactors = FALSE) 

onesd <- 61.52 + (1*4.58)
twosd <- 61.52 + (2*4.58)
threesd <- 61.52 + (3*4.58)

sum(fheights$heighs < onesd)/length(fheights$heighs)
## [1] 0.8333333
sum(fheights$heighs < twosd)/length(fheights$heighs)
## [1] 0.9583333
sum(fheights$heighs < threesd)/length(fheights$heighs)
## [1] 1

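The checks above give the share of heights below the mean plus 1, 2, and 3 standard deviations: 83%, 96%, and 100%. For a normal distribution the 68-95-99.7 rule implies about 84%, 97.5%, and 99.85% below those cutoffs, so the heights follow the rule approximately.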
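
The 68-95-99.7 rule is usually stated for the two-sided interval around the mean. The following is a minimal sketch of that check, assuming simulated heights rather than the fheights data; `within_k_sd` is a hypothetical helper name and the seed is arbitrary.

```r
# Share of observations within k standard deviations of the mean (two-sided).
within_k_sd <- function(x, k, m = mean(x), s = sd(x)) {
  mean(abs(x - m) <= k * s)
}

set.seed(606)  # arbitrary seed, for reproducibility only
# Simulated values, NOT the fheights data; mean and SD taken from part a).
sim_heights <- rnorm(1000, mean = 61.52, sd = 4.58)

# Proportions within 1, 2, and 3 SDs; compare against 0.68, 0.95, 0.997.
sapply(1:3, function(k) within_k_sd(sim_heights, k))
```

The same helper could be applied to the height column of `fheights` to compare the two-sided shares directly with 68%, 95%, and 99.7%.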

b)

qqnorm(fheights$heighs)

hist(fheights$heighs,probability = TRUE)
lines(50:75,dnorm(50:75,mean(fheights$heighs),sd(fheights$heighs)), col="blue")

qqnormsim(fheights$heighs)

The data appear to follow a normal distribution. The histogram is roughly symmetrical, and the normal probability plot resembles the randomly generated simulations.

3.22

a)

((1-.02)^9) * (.02)
## [1] 0.01667496

b)

(1-.02)^100
## [1] 0.1326196

c)

Ex <- 1/.02
Ex
## [1] 50
sd_x <- sqrt((1-0.02)/0.02^2)  # named sd_x to avoid reusing the name of the built-in sd()
sd_x
## [1] 49.49747

d)

Ex_new <- 1/.05  
Ex_new
## [1] 20
sd_new <- sqrt((1-0.05)/0.05^2)
sd_new
## [1] 19.49359

e)

Increasing the probability of a defect decreases both the mean and the standard deviation of the wait until the first defect.
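
The hand calculations in parts a) to d) can be cross-checked against R's built-in geometric functions. The following is a minimal sketch; `geom_summary` is a hypothetical helper name, and it assumes part a) refers to the 10th item, as the exponent 9 suggests.

```r
# Geometric model: number of trials until the first defect, with defect probability p.
geom_summary <- function(p) {
  c(mean = 1 / p, sd = sqrt((1 - p) / p^2))
}

# dgeom() counts failures BEFORE the first success, so a first defect on the
# 10th item corresponds to 9 non-defective items.
p_tenth <- dgeom(9, prob = 0.02)
all.equal(p_tenth, (1 - 0.02)^9 * 0.02)

# No defects in 100 items means more than 99 failures before the first success.
p_none_100 <- pgeom(99, prob = 0.02, lower.tail = FALSE)
all.equal(p_none_100, (1 - 0.02)^100)

# Mean and SD side by side for p = 0.02 and p = 0.05.
sapply(c(0.02, 0.05), geom_summary)
```

Note that `dgeom()` and `pgeom()` are parameterized by the number of failures, not the trial number, which is why the arguments are 9 and 99 rather than 10 and 100.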

3.38

a)

dbinom(x=2,size=3,prob=.51)
## [1] 0.382347

b)

B <- .51 #probability of boy
G <- 1-B #probability of girl

#The three birth orders with exactly 2 boys among 3 kids
P_kid1 <- B*B*G  
P_kid2 <- B*G*B 
P_kid3 <- G*B*B 

#Total probability
Total_kid <- P_kid1+P_kid2+P_kid3
Total_kid
## [1] 0.382347

The answers for a and b match.

c)

Using the approach from part b would be more tedious, since there would be many more possible birth orders to list and add, making the calculation much longer.
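
To see how quickly the number of orderings grows, here is a minimal sketch. The family size of 8 children with 3 boys is an illustrative assumption, and the variable names are hypothetical.

```r
# Illustrative assumption: 3 boys among 8 children.
n_kids <- 8
n_boys <- 3
p_boy <- 0.51

# Number of distinct birth orders that would have to be listed by hand.
n_orderings <- choose(n_kids, n_boys)

# Each ordering has the same probability, so the scenario sum collapses to
# (number of orderings) * (probability of one ordering).
by_scenarios <- n_orderings * p_boy^n_boys * (1 - p_boy)^(n_kids - n_boys)
by_dbinom <- dbinom(n_boys, size = n_kids, prob = p_boy)
all.equal(by_scenarios, by_dbinom)
```

With 56 orderings instead of 3, writing each scenario out individually is impractical, which is what the binomial formula avoids.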

3.42

a)

p <- 0.15 #probability of a successful serve
n <- 10 #number of attempts
k <- 3 #number of successes
factorial(n - 1) / (factorial(k-1) * (factorial(n - k))) * p^k * (1-p)^(n-k)
## [1] 0.03895012

The probability is about 3.9%.
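
The factorial expression above is the negative binomial probability that the 3rd success occurs on the 10th attempt. The following is a minimal sketch that cross-checks it against base R's `dnbinom()`; the names `by_dnbinom` and `by_formula` are illustrative.

```r
p <- 0.15  # probability of a successful serve
n <- 10    # number of attempts
k <- 3     # number of successes

# dnbinom() takes the number of FAILURES before the k-th success, here n - k.
by_dnbinom <- dnbinom(n - k, size = k, prob = p)

# Same quantity with choose() in place of the factorials.
by_formula <- choose(n - 1, k - 1) * p^k * (1 - p)^(n - k)

all.equal(by_dnbinom, by_formula)
round(by_dnbinom, 4)
```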

b)

Since the serves are independent, the probability is 15%.

c)

Part b looks at the probability of a single serve, which is independent of the others. Part a looks at the joint probability of a particular combination of outcomes across all 10 serves.