#a)
normalPlot(mean = 0, sd = 1, bounds = c(-1.13,Inf))

#b)
normalPlot(mean = 0, sd = 1, bounds = c(-Inf,0.18))

#c)
normalPlot(mean = 0, sd = 1, bounds = c(8,Inf))

#d)
normalPlot(mean = 0, sd = 1, bounds = c(-0.5,0.5))
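##Optional cross-check (not part of the original plots): the shaded areas can
##also be computed directly with pnorm().
pnorm(-1.13, lower.tail = FALSE)  #a) P(Z > -1.13), roughly 0.87
pnorm(0.18)                       #b) P(Z < 0.18), roughly 0.57
pnorm(8, lower.tail = FALSE)      #c) P(Z > 8), essentially 0
pnorm(0.5) - pnorm(-0.5)          #d) P(-0.5 < Z < 0.5), roughly 0.38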

#a)
##Men, Ages 30 - 34
##N(mu = 4313, std = 583)
##Women, Ages 25 - 29
##N(mu = 5261, std = 807)
#b)
leo.x = 4948
male.Mu = 4313
male.std = 583
leo.z = ( (leo.x - male.Mu) / male.std )
leo.z
## [1] 1.089194
mary.x = 5513
female.Mu = 5261
female.std = 807
mary.z = ( (mary.x - female.Mu ) / female.std )
mary.z
## [1] 0.3122677
##Both Z-scores are positive, so both finishing times are above their respective group means. The closer a Z-score is to 0, the closer the value is to the mean, so Mary's time is closer to her group's mean than Leo's is to his.
#c)
##Both finishing times are above their group means in terms of Z-scores, so neither of them beat their group's average. Relatively, Mary did better than Leo since her Z-score is lower than his (a shorter finishing time means a better performance).
#d)
pnorm(leo.x, male.Mu, male.std,lower.tail=FALSE)
## [1] 0.1380342
#e)
pnorm(mary.x, female.Mu, female.std, lower.tail = FALSE)
## [1] 0.3774186
#f)
##Yes, some of the answers would change. The Z-scores in b) and c) could still be computed and compared, since standardizing does not require normality, but the tail probabilities in d) and e) rely on the normal model, so they could not be calculated this way if the distributions are not nearly normal.
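##Illustrative sketch only (made-up skewed data, not the actual race results): if
##the women's finishing times were strongly right-skewed rather than normal, the
##normal-model tail area from e) would overstate the true proportion. The shifted
##exponential below is an arbitrary choice with roughly the same mean and sd.
set.seed(1)
fake.times = (female.Mu - female.std) + rexp(1e5, rate = 1 / female.std)
mean(fake.times > mary.x)                                  #empirical proportion, around 0.27
pnorm(mary.x, female.Mu, female.std, lower.tail = FALSE)   #normal-model answer, around 0.38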
#a)
height <- c(54,55,56,56,57,58,58,59,60,60,60,61,61,62,62,63,63,63,64,65,65,67,67,69,73)
height_mean <- 61.52
height_sd <- 4.58
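##Quick check (added for verification): the hard-coded values match the data.
round(mean(height), 2)  #61.52
round(sd(height), 2)    #4.58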
#1sd
sd1Upp = height_mean + height_sd
sd1Upp
## [1] 66.1
sd1Low = height_mean - height_sd
sd1Low
## [1] 56.94
#2sd
sd2Upp = height_mean + (2*height_sd)
sd2Upp
## [1] 70.68
sd2Low = height_mean - (2*height_sd)
sd2Low
## [1] 52.36
#3sd
sd3Upp = height_mean + (3*height_sd)
sd3Upp
## [1] 75.26
sd3Low = height_mean - (3*height_sd)
sd3Low
## [1] 47.78
##68 rule
length(height[height >= sd1Low & height <= sd1Upp] ) / length(height) * 100
## [1] 68
##95 rule
length(height[height >= sd2Low & height <= sd2Upp] ) / length(height) * 100
## [1] 96
##99.7 rule
length(height[height >= sd3Low & height <= sd3Upp] ) / length(height) * 100
## [1] 100
##Ans: The heights approximately follow the 68-95-99.7 rule: 68% of the observations fall within 1 standard deviation of the mean, 96% fall within 2 standard deviations, and 100% fall within 3 standard deviations.
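##The three proportions above can also be computed in one step (same results,
##just with less repetition): the share of observations within k standard deviations.
sapply(1:3, function(k) mean(abs(height - height_mean) <= k * height_sd) * 100)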
#b)
hist(height, probability = TRUE, col = "skyblue")
x <- 50:80
y <- dnorm(x=x, mean = height_mean, sd = height_sd)
lines(x=x, y=y, col = "red")

qqnormSim(height)

##Ans: The distribution looks pretty much bell-shaped, unimodal, and symmetric, and the 68-95-99.7 rule holds approximately. Also, the Q-Q plot shows that most of the data points fall approximately on the straight line. Therefore, we can say the data follow a normal distribution reasonably well.
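##One more informal check of the normal fit (not required by the exercise): the
##sample quartiles are close to the quartiles implied by N(61.52, 4.58).
quantile(height, c(0.25, 0.75))               #sample quartiles: 58 and 64
qnorm(c(0.25, 0.75), height_mean, height_sd)  #model quartiles: roughly 58.4 and 64.6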
#3.22, 3.38, 3.42
#a)
p <- 0.02
dbinom(1, 10, p)
## [1] 0.1667496
#b)
dbinom(0, 100, p)
## [1] 0.1326196
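##With zero successes the binomial probability reduces to (1 - p)^100, which
##gives the same value as dbinom(0, 100, p) above.
(1 - p)^100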
#c)
1/p
## [1] 50
sqrt((1 - p)/p^2)
## [1] 49.49747
#d)
p <- 0.05
1/p
## [1] 20
sqrt((1 - p)/p^2)
## [1] 19.49359
#e) As p (the probability of success on each trial) increases, both the mean and the standard deviation of the wait time until the first success decrease.
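##Simulation sketch (seed and sample size chosen arbitrarily): rgeom() counts the
##failures before the first success, so adding 1 gives the wait time. The
##simulated means should land near 1/p, matching the pattern described above.
set.seed(42)
mean(rgeom(1e5, 0.02) + 1)  #around 50
mean(rgeom(1e5, 0.05) + 1)  #around 20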
#a)
p <- 0.51
dbinom(2, 3, p)
## [1] 0.382347
#b)
combn(3,2)
##      [,1] [,2] [,3]
## [1,]    1    1    2
## [2,]    2    3    3
dim(combn(3,2))[2] * (p^2) * (1-p)
## [1] 0.382347
#It is true that the answers from a) and b) are the same.
#c)
#The approach from b) is more tedious because you have to list out all of the combinations by hand, whereas the approach from a) takes care of that automatically.
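##A middle ground between the two approaches: choose() counts the combinations so
##they do not have to be listed by hand, and it gives the same probability.
choose(3, 2) * (p^2) * (1 - p)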
#a)
n <- 10
k <- 3
p <- 0.15
dnbinom(n-k, k, p)
## [1] 0.03895012
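##Direct formula check: the 3rd success on the 10th serve means exactly 2 successes
##in the first 9 serves followed by a success, which matches dnbinom(n - k, k, p).
choose(n - 1, k - 1) * (p^k) * ((1 - p)^(n - k))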
#b)
#15%, since the serves are assumed to be independent of each other.
#c)
#Since her serves are independent of each other, the probability of making any single serve is always 15%. In b), the probability that the 10th serve is successful is independent of the previous serves. In a), the question asks for the probability of the kth success occurring on the nth trial, which requires a negative binomial calculation. Therefore, the answers to a) and b) are not the same.
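##The relationship between a) and b) in code: the answer to a) equals the answer to
##b) multiplied by the probability of exactly 2 successes in the first 9 serves.
dbinom(2, 9, p) * p  #same value as dnbinom(n - k, k, p) from a)
p                    #answer to b): the 10th serve on its own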