1) Area under the curve:
library(DATA606)
(a) Z < −1.35
pnorm(-1.35, mean = 0, sd = 1)
## [1] 0.08850799
normalPlot(mean = 0, sd = 1, bounds = c(-Inf, -1.35), tails = FALSE)

8.85% of a standard normal distribution N(µ = 0, σ = 1) is found in this region.
(b) Z > 1.48
1-pnorm(1.48, mean = 0, sd = 1)
## [1] 0.06943662
normalPlot(mean = 0, sd = 1, bounds = c(1.48, Inf), tails = FALSE)

6.94% of a standard normal distribution N(µ = 0, σ = 1) is found in this region.
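As a cross-check on the upper-tail calculation, base R's pnorm also accepts lower.tail = FALSE, which returns P(Z > z) directly instead of requiring the complement. A minimal sketch; the variable names are illustrative and not part of the original solution:

```r
# Upper-tail area P(Z > 1.48), computed two equivalent ways.
z <- 1.48
upper_by_complement <- 1 - pnorm(z)            # complement of the lower tail
upper_direct <- pnorm(z, lower.tail = FALSE)   # upper tail requested directly

# Both values should agree (about 0.0694).
print(round(c(upper_by_complement, upper_direct), 4))
```

Requesting the upper tail directly avoids a subtraction, which can matter numerically for very small tail areas.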
(c) −0.4 < Z < 1.5
pnorm(1.5, mean = 0, sd = 1) - pnorm(-0.4, mean = 0, sd = 1)
## [1] 0.5886145
normalPlot(mean = 0, sd = 1, bounds = c(-0.4, 1.5), tails = FALSE)

58.86% of a standard normal distribution N(µ = 0, σ = 1) is found in this region.
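The area between two Z values can also be checked numerically by integrating the standard normal density over the interval. This is only a sanity check under base R, using integrate and dnorm; the name area_between is illustrative:

```r
# Numerically integrate the standard normal density from -0.4 to 1.5.
area_between <- integrate(dnorm, lower = -0.4, upper = 1.5)$value

# Should be close to 0.5886, i.e. roughly 58.9% of the distribution.
print(round(area_between, 4))
```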
(d) |Z| > 2
2 * (1 - pnorm(2, mean = 0, sd = 1))
## [1] 0.04550026
normalPlot(mean = 0, sd = 1, bounds = c(-2, 2), tails = TRUE)

4.55% of a standard normal distribution N(µ = 0, σ = 1) is found in this region, since |Z| > 2 covers both the lower tail (Z < −2) and the upper tail (Z > 2).
2) Triathlon times:
(a) Write down the short-hand for these two normal distributions?
Ans) Men: Ages 30-34: N(μ=4313,σ=583).
Women: Ages 25-29: N(μ=5261,σ=807).
(b) What are the Z-scores for Leo’s and Mary’s finishing times? What do these Z-scores tell you?
Ans)
(Z_Leo <- (4948-4313)/583)
## [1] 1.089194
(Z_Mary <- (5513-5261)/807)
## [1] 0.3122677
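Leo's time is 1.09 standard deviations above the mean for his group.
Mary's time is 0.31 standard deviations above the mean for her group.
Because these are finishing times, a positive Z-score means each of them finished slower than their group's average.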
(c) Did Leo or Mary rank better in their respective groups? Explain your reasoning.
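Ans) Mary ranked better than Leo within their respective groups. Lower finishing times are better, and Mary's Z-score (0.31) is lower than Leo's (1.09), so her time is closer to her group's mean than Leo's is to his.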
(d) What percent of the triathletes did Leo finish faster than in his group?
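round((1 - pnorm(Z_Leo))*100, 2)
## [1] 13.8
Leo finished faster than about 13.8% of the triathletes in his group; the other 86.2% had lower (faster) times.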
(e) What percent of the triathletes did Mary finish faster than in her group?
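round((1 - pnorm(Z_Mary))*100, 2)
## [1] 37.74
Mary finished faster than about 37.74% of the triathletes in her group; the other 62.26% had lower (faster) times.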
(f) If the distributions of finishing times are not nearly normal, would your answers to parts (b) - (e) change? Explain your reasoning.
Ans) If the distributions are not nearly normal, the answer to part (b) stays the same, since Z-scores can be calculated for any distribution.
However, parts (c) through (e) rely on the normal model to turn Z-scores into ranks and percentiles, so those answers could change.
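To illustrate why the percentile answers depend on the normal model, the sketch below uses a hypothetical right-skewed distribution (an exponential with rate 1, chosen only because its mean and standard deviation are both 1). It is not the triathlon data. The Z-score can still be computed, but the percentile from pnorm no longer matches the true percentile:

```r
# Hypothetical right-skewed distribution: exponential with rate 1,
# so mean = 1 and sd = 1 (an assumption made purely for illustration).
mu <- 1
sigma <- 1
x <- 2                              # an observation one SD above the mean

z <- (x - mu) / sigma               # Z-score: computable for any distribution
normal_pct <- pnorm(z)              # percentile if a normal model is assumed
actual_pct <- pexp(x, rate = 1)     # true percentile under the exponential

print(round(c(z = z, normal = normal_pct, actual = actual_pct), 4))
```

Here the normal model gives about 0.8413 while the true value is about 0.8647, so the Z-score is unchanged but the percentile is not.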
3) Heights of female college students:
(b) Do these data appear to follow a normal distribution? Explain your reasoning using the graphs provided below?
Ans) Yes, the distribution appears to be nearly normal.
This is a little harder to judge from the histogram than from the normal probability plot.
The histogram only roughly follows the normal curve, while the points on the normal probability plot fall close to a straight line. There is one possible outlier on the lower end that is apparent in both graphs, but it is not too extreme.
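The straight-line pattern in a normal probability plot can be reproduced with base R's qqnorm. The sketch below uses simulated values as a stand-in, since the height data are not shown here; the sample size, mean, and standard deviation are assumptions for illustration only:

```r
set.seed(606)

# Simulated stand-in for heights; these are NOT the data from the exercise.
heights_sim <- rnorm(200, mean = 160, sd = 7)

# Theoretical vs. sample quantiles, without drawing the plot.
qq <- qqnorm(heights_sim, plot.it = FALSE)

# A correlation near 1 means the points lie close to a straight line.
qq_cor <- cor(qq$x, qq$y)
print(round(qq_cor, 3))

# To draw the plot itself: qqnorm(heights_sim); qqline(heights_sim)
```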
5) Male children:
(a) Use the binomial model to calculate the probability that two of them will be boys.
dbinom(2, 3, 0.51)
## [1] 0.382347
(b) Write out all possible orderings of 3 children, 2 of whom are boys. Use these scenarios to calculate the same probability from part (a) but using the addition rule for disjoint outcomes. Confirm that your answers from parts (a) and (b) match.
# Addition rule for disjoint outcomes
# P(B) = 0.51, P(G) = 1 - 0.51 = 0.49
# P = P({G,B,B}) + P({B,G,B}) + P({B,B,G})
prob <- ((0.49*0.51*0.51)+(0.51*0.49*0.51)+(0.51*0.51*0.49))
prob
## [1] 0.382347
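The orderings can also be listed programmatically rather than by hand. The following is a minimal sketch using base R's expand.grid; the object names (kids, n_boys, p_each) are illustrative and not part of the original solution:

```r
p_boy <- 0.51

# All 2^3 = 8 possible birth orders for three children.
kids <- expand.grid(first = c("B", "G"),
                    second = c("B", "G"),
                    third = c("B", "G"),
                    stringsAsFactors = FALSE)

# Number of boys in each ordering, and the probability of that ordering.
n_boys <- rowSums(kids == "B")
p_each <- p_boy^n_boys * (1 - p_boy)^(3 - n_boys)

# Keep the orderings with exactly two boys and add their probabilities.
two_boys <- kids[n_boys == 2, ]
print(two_boys)
p_two_boys <- sum(p_each[n_boys == 2])
print(p_two_boys)
```

The sum over the three disjoint orderings should agree with dbinom(2, 3, 0.51).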
(c) If we wanted to calculate the probability that a couple who plans to have 8 kids will have 3 boys, briefly describe why the approach from part (b) would be more tedious than the approach from part (a).
dbinom(3,8,0.51)
## [1] 0.2098355
Number of ways = 8!/(3!(8−3)!) = 8!/(3!5!) = (8)(7)(6)/((3)(2)(1)) = 56
The choose function, choose(8, 3), confirms that there are 56 ways to have 3 boys out of 8 children.
With method (b), the probability of each of those 56 orderings would have to be computed individually and then summed. With method (a), the formula for the binomial distribution computes the probability in one step.
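The comparison between the two methods can be sketched in base R. The code below counts the orderings with choose and combn, then shows that summing over all of them gives the same value as dbinom; the variable names are illustrative:

```r
n <- 8
k <- 3
p_boy <- 0.51

# Number of distinct orderings with exactly 3 boys among 8 children.
n_orderings <- choose(n, k)

# combn lists them explicitly: each column holds the birth positions of the boys.
positions <- combn(n, k)

# Every such ordering has the same probability, so method (b) amounts to
# adding this value once per ordering.
p_one_ordering <- p_boy^k * (1 - p_boy)^(n - k)
p_by_sum <- ncol(positions) * p_one_ordering

print(c(orderings = n_orderings, probability = p_by_sum))
```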