Assignment #3: Distribution

Raj Kumar

Import Libraries

# Good practice: basic housekeeping; clean up the environment before starting new work
rm(list=ls())

# Libraries 
library(DATA606)
## 
## Welcome to CUNY DATA606 Statistics and Probability for Data Analytics 
## This package is designed to support this course. The text book used 
## is OpenIntro Statistics, 3rd Edition. You can read this by typing 
## vignette('os3') or visit www.OpenIntro.org. 
##  
## The getLabs() function will return a list of the labs available. 
##  
## The demo(package='DATA606') will list the demos that are available.
## 
## Attaching package: 'DATA606'
## The following object is masked from 'package:utils':
## 
##     demo
library(StMoSim)
## Loading required package: RcppParallel
## Loading required package: Rcpp
## 
## Attaching package: 'Rcpp'
## The following object is masked from 'package:RcppParallel':
## 
##     LdFlags
library(tidyverse)
## -- Attaching packages -------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 2.2.1     v purrr   0.2.4
## v tibble  1.4.1     v dplyr   0.7.4
## v tidyr   0.8.0     v stringr 1.2.0
## v readr   1.1.1     v forcats 0.3.0
## -- Conflicts ----------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
Exercise: 3.2 Area under the curve

(a): Z > -1.13

Answer = 0.8707619

1 - pnorm(-1.13, mean=0, sd=1)
## [1] 0.8707619
normalPlot(mean = 0, sd = 1, bounds = c(-1.13, Inf), tails = FALSE)

(b): Z < 0.18

Answer = 0.5714237

pnorm(0.18, mean=0, sd=1)
## [1] 0.5714237
normalPlot(mean = 0, sd = 1, bounds = c(-Inf,0.18), tails = FALSE)

(c): Z > 8

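Answer = 6.220961e-16 (effectively 0)

# lower.tail = FALSE avoids the rounding error of 1 - pnorm(8), which returns 6.661338e-16
pnorm(8, mean=0, sd=1, lower.tail=FALSE)
## [1] 6.220961e-16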
normalPlot(mean = 0, sd = 1, bounds = c(8,Inf), tails = FALSE)

(d): |Z| < 0.5

Answer = 0.3829249

pnorm(0.5, mean=0, sd=1) - pnorm(-0.5, mean=0, sd=1)
## [1] 0.3829249
normalPlot(mean = 0, sd = 1, bounds = c(-0.5, 0.5), tails = FALSE)
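For reference, all four areas can be computed in one vector with base R's pnorm(). The labels a to d simply match the parts above. Upper-tail areas use lower.tail = FALSE, which avoids the floating-point cancellation in 1 - pnorm(x). This matters for the far tail in part (c). A minimal sketch:

```r
# Upper tails use lower.tail = FALSE instead of 1 - pnorm(), which loses
# precision far in the tail because of subtraction in double precision.
areas <- c(
  a = pnorm(-1.13, lower.tail = FALSE),  # P(Z > -1.13)
  b = pnorm(0.18),                       # P(Z < 0.18)
  c = pnorm(8, lower.tail = FALSE),      # P(Z > 8)
  d = pnorm(0.5) - pnorm(-0.5)           # P(|Z| < 0.5)
)
print(areas)

# Side by side: the naive subtraction versus the direct upper-tail call
print(c(naive = 1 - pnorm(8), accurate = pnorm(8, lower.tail = FALSE)))
```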

Exercise: 3.4 Triathlon Times

Q(a)

Answer:

Men's group: \[N(\mu=4313, \sigma=583)\]

Women's group: \[N(\mu=5261, \sigma=807)\]

Q(b)

Answer: Mary=0.3122677 and Leo=1.089194

z_mary <- (5513-5261)/807
z_mary 
## [1] 0.3122677
z_leo <- (4948-4313)/583
z_leo
## [1] 1.089194

Q(c)

Answer:

Mary's z-score (0.31) is lower than Leo's (1.09), so her time is fewer standard deviations above her group's mean. A lower time means a faster finish, so Mary ranked better within her group than Leo did within his.

Q(d)

Answer: Leo finished faster than 13.80% of the triathletes in his group.

1 - pnorm(4948, mean=4313, sd=583)
## [1] 0.1380342

Q(e)

Answer: Mary finished faster than 37.74% of the triathletes in her group.

1 - pnorm(5513, mean=5261, sd=807)
## [1] 0.3774186
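Parts (b), (d), and (e) can also be computed together from the z-scores. The share of a normal group slower than an athlete equals the upper-tail area beyond that athlete's z-score. The sketch below collects the values in a data frame. The column names are illustrative.

```r
# Finishing times and group parameters (seconds) from parts (a) and (b)
athletes <- data.frame(
  name  = c("Leo", "Mary"),
  time  = c(4948, 5513),
  mu    = c(4313, 5261),
  sigma = c(583, 807)
)

# z-score: number of standard deviations each time lies above its group mean
athletes$z <- (athletes$time - athletes$mu) / athletes$sigma

# Share of the group with a longer (slower) time, under the normal model
athletes$share_slower <- pnorm(athletes$z, lower.tail = FALSE)

print(athletes)
```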

Q(f)

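Answer: The z-scores in (b) do not depend on normality, because they can be computed for any distribution. The same holds for the comparison in (c), which is based on them. The percentages in (d) and (e), however, come from the normal model. If the distributions were not normal, those answers would change and could not be found with pnorm().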

Exercise: 3.18 Heights of female college students

Q(a)

Answer: mean = 61.52, standard deviation = 4.58

heights <- c(54, 55, 56, 56, 57, 58, 58, 59, 60, 60, 60, 61, 61, 62, 62, 63, 63, 63, 64, 65, 65, 67, 67, 69, 73)
sd_heights <- sd(heights)
mean_heights <- mean(heights)

summary(heights)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   54.00   58.00   61.00   61.52   64.00   73.00
hist(heights)

Q(b)

Answer:

# proportion of the observed heights that lie within k SDs of the sample mean
within_k_sd <- function(k) mean(abs(heights - mean_heights) <= k * sd_heights)

rulevalues <- sapply(1:3, within_k_sd)
rulevalues
## [1] 0.68 0.96 1.00

68% of the heights lie within 1 SD of the mean, 96% within 2 SD, and 100% within 3 SD. These proportions are close to the 68%, 95%, and 99.7% predicted by the 68-95-99.7 rule, so the heights approximately follow a normal distribution.

qqnormSim(heights, nSim=500)

Exercise: 3.22 Defective Rate

Q (a)

Answer: .01667

(1 - .02)^(10 - 1) * .02
## [1] 0.01667496

Q (b)

Answer: 0.1326

value = (1-.02)^(100)
value
## [1] 0.1326196
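Parts (a) and (b) are geometric probabilities, so base R's dgeom() and pgeom() can serve as a cross-check. R parameterizes the geometric distribution by the number of failures before the first success, so "the 10th transistor is the first defective one" corresponds to x = 9. A short sketch:

```r
p_defect <- 0.02

# dgeom(x, prob): probability of x failures before the first success,
# so "first defect on the 10th transistor" means 9 good ones first.
p_a <- dgeom(9, prob = p_defect)

# "No defects among 100" means at least 100 failures before the first
# success, i.e. P(X > 99) for the same geometric variable.
p_b <- pgeom(99, prob = p_defect, lower.tail = FALSE)

print(c(part_a = p_a, part_b = p_b))
```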

Q (c)

Answer: 50

value = 1/.02
value
## [1] 50

Q (d)

Answer: 20

value = 1/0.05
value
## [1] 20

Q (e)

Answer:

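Increasing the probability of a defect decreases both the mean and the standard deviation of the number of transistors produced until the first defect. With a higher defect rate, the first defect tends to come sooner and with less variability.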
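Parts (c) and (d) use the mean 1/p of the geometric distribution. Its standard deviation is sqrt((1 - p) / p^2), which the answers above do not compute. The sketch below tabulates both quantities for the two defect rates, which makes the claim in part (e) concrete. The helper name geom_summary is hypothetical.

```r
# Mean and SD of a geometric distribution that counts the trial on which
# the first success (here, a defective transistor) occurs.
geom_summary <- function(p) {
  c(p = p, mean = 1 / p, sd = sqrt((1 - p) / p^2))
}

# One row per machine: defect rate 2% and 5%
summaries <- rbind(geom_summary(0.02), geom_summary(0.05))
print(summaries)
```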

Exercise: 3.38 Male Children

Q (a) use binomial model

Answer: 0.382347

pb <- 0.51
k <- 2
n <- 3

factorial_n <- factorial(n)
factorial_k <- factorial(k)
factorial_nk <- factorial(n-k)

value <- ( factorial_n / (factorial_k * factorial_nk)) * pb^k * (1-pb)^(n-k)
value
## [1] 0.382347
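The hand-built factorial expression can be checked against base R's choose() and dbinom(). Both evaluate the same binomial probability. A minimal sketch:

```r
p_boy <- 0.51

# choose(n, k) is the binomial coefficient n! / (k! (n - k)!)
manual <- choose(3, 2) * p_boy^2 * (1 - p_boy)^1

# dbinom() evaluates the same binomial probability mass function directly
builtin <- dbinom(2, size = 3, prob = p_boy)

print(c(manual = manual, builtin = builtin))
```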

Q (b)

Answer: 0.382347

P(2 boys) = P(B, B, G) + P(B, G, B) + P(G, B, B) = 3 × 0.51 × 0.51 × 0.49

((1-pb) * pb * pb) * 3
## [1] 0.382347
((1-.51) * .51 * .51) * 3
## [1] 0.382347

Q (c)

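Answer: Approach (b) requires listing every ordering of boys and girls and adding up their probabilities. This becomes tedious quickly, because the number of orderings, choose(n, k), grows rapidly with the number of children. Approach (a) uses the binomial formula, which counts those orderings in a single step.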
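The sketch below shows why enumeration scales poorly. It applies approach (b) to a larger, purely illustrative family of 8 children with 3 boys; these numbers are an assumption and are not taken from the text above. It lists every ordering with combn(), sums their probabilities, and compares the total with a single dbinom() call.

```r
p_boy <- 0.51
n <- 8  # illustrative family size (assumption)
k <- 3  # illustrative number of boys (assumption)

# Each column of combn(n, k) gives the birth positions of the k boys,
# i.e. one ordering of boys and girls.
orderings <- combn(n, k)
n_orderings <- ncol(orderings)

# Approach (b): probability of each ordering, then add them up
prob_each <- apply(orderings, 2, function(pos) {
  is_boy <- seq_len(n) %in% pos
  prod(ifelse(is_boy, p_boy, 1 - p_boy))
})
enumerated <- sum(prob_each)

# Approach (a): one call to the binomial formula
formula_based <- dbinom(k, size = n, prob = p_boy)

print(c(orderings = n_orderings, enumerated = enumerated, formula = formula_based))
```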

Exercise: 3.42 Serving the volleyball

Q (a)

Answer: 0.03895012

choose(9,2)*0.15^3*0.85^7
## [1] 0.03895012
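dnbinom() offers a cross-check on part (a). R's negative binomial counts failures before the size-th success, so "3rd success on the 10th serve" corresponds to 7 failures with size = 3. A sketch:

```r
p_serve <- 0.15

# dnbinom(x, size, prob): probability of x failures before the size-th success.
# The 3rd success on the 10th serve means 7 failures before the 3rd success.
p_third_on_tenth <- dnbinom(7, size = 3, prob = p_serve)

# The same value written out as in the original computation
p_manual <- choose(9, 2) * p_serve^3 * (1 - p_serve)^7

print(c(dnbinom = p_third_on_tenth, manual = p_manual))
```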

Q (b)

Answer: 15%

Q (c)

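Answer: Part (a) asks for the probability of a whole sequence: exactly 2 successes in the first 9 serves, followed by a success on the 10th. Part (b) takes the first nine outcomes as already known. Because serves are independent, the probability that the 10th serve succeeds is then simply 0.15. The two parts ask for different probabilities, which explains the discrepancy.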
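The difference can be made explicit in code. The sketch computes the joint probability from part (a) and then conditions on the first nine serves, as in part (b). It assumes, as the exercise does, that serves are independent with success probability 0.15.

```r
p_serve <- 0.15

# Probability of exactly 2 successes in the first 9 serves
p_two_in_nine <- dbinom(2, size = 9, prob = p_serve)

# Part (a), a joint probability: 2 of the first 9 succeed AND the 10th succeeds
p_joint <- p_two_in_nine * p_serve

# Part (b), a conditional probability:
# P(10th succeeds | 2 of first 9 succeeded) = joint / P(2 of first 9)
p_conditional <- p_joint / p_two_in_nine

print(c(joint = p_joint, conditional = p_conditional))
```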
