#install.packages('StMoSim');
library(StMoSim)
## Loading required package: RcppParallel
## Loading required package: Rcpp
## 
## Attaching package: 'Rcpp'
## The following object is masked from 'package:RcppParallel':
## 
##     LdFlags
#install.packages("devtools")
library(devtools)
#devtools::install_github("jbryer/DATA606")
library(DATA606)
## 
## Welcome to CUNY DATA606 Statistics and Probability for Data Analytics 
## This package is designed to support this course. The text book used 
## is OpenIntro Statistics, 3rd Edition. You can read this by typing 
## vignette('os3') or visit www.OpenIntro.org. 
##  
## The getLabs() function will return a list of the labs available. 
##  
## The demo(package='DATA606') will list the demos that are available.
## 
## Attaching package: 'DATA606'
## The following object is masked from 'package:utils':
## 
##     demo

3.2

  1. Z > -1.13
    1 - 0.1292 = 0.8708 = 87.08%

  2. Z < 0.18
    57.14%

  3. Z > 8
    approximately 0% (the area beyond Z = 8 is too small to appear in a normal table)

  4. |Z| < 0.5
    0.5: 0.6915
    -0.5: 0.3085
    0.6915 - 0.3085 = 0.383 = 38.3%
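
The table lookups above can be cross-checked with base R's `pnorm()`. This is a minimal sketch; the variable names (`p_a` through `p_d`) are illustrative and not part of the original solution.

```r
# Standard normal probabilities for the four parts of 3.2
p_a <- 1 - pnorm(-1.13)               # P(Z > -1.13)
p_b <- pnorm(0.18)                    # P(Z < 0.18)
p_c <- pnorm(8, lower.tail = FALSE)   # P(Z > 8), effectively zero
p_d <- pnorm(0.5) - pnorm(-0.5)       # P(|Z| < 0.5)

round(c(a = p_a, b = p_b, c = p_c, d = p_d), 4)
```

Using `lower.tail = FALSE` for the upper tail avoids the loss of precision that `1 - pnorm(8)` would suffer for such a small probability.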

#3.2a
a.x <- c(-1.13,seq(-1.13,3,0.01),3) 
a.y <- c(0,dnorm(seq(-1.13,3,0.01)),0) 
curve(dnorm(x,0,1),xlim=c(-3,3),main='Z > -1.13', xlab = '(a)', ylab = '') 
polygon(a.x,a.y,col='lightgreen')

#3.2b
b.x <- c(-3,seq(-3,0.18,0.01),0.18) 
b.y <- c(0,dnorm(seq(-3,0.18,0.01)),0) 
curve(dnorm(x,0,1),xlim=c(-3,3),main='Z < 0.18', xlab = '(b)', ylab = '') 
polygon(b.x,b.y,col='lightgreen')

#3.2c
c.x <- c(8,seq(8,10,0.01),10) 
c.y <- c(0,dnorm(seq(8,10,0.01)),0) 
curve(dnorm(x,0,1),xlim=c(-3,10),main='Z > 8', xlab = '(c)', ylab = '') 
polygon(c.x,c.y,col='lightgreen')

#3.2d
d.x <- c(-0.5,seq(-0.5,0.5,0.01),0.5) 
d.y <- c(0,dnorm(seq(-0.5,0.5,0.01)),0) 
curve(dnorm(x,0,1),xlim=c(-3,3),main='|Z| < 0.5', xlab = '(d)', ylab = '') 
polygon(d.x,d.y,col='lightgreen')
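
The four plots above repeat the same curve-and-polygon pattern, so it could be wrapped in a small helper. The function below is a hypothetical sketch (`shade_normal` and its arguments are not from the original); it draws the standard normal curve, shades the region between `from` and `to`, and invisibly returns the shaded area.

```r
# Hypothetical helper: shade the area under the standard normal curve
shade_normal <- function(from, to, xlim = c(-3, 3), main = "", col = "lightgreen") {
  xs <- seq(from, to, length.out = 200)                 # grid that ends exactly at `to`
  curve(dnorm(x), xlim = xlim, main = main, xlab = "", ylab = "")
  polygon(c(from, xs, to), c(0, dnorm(xs), 0), col = col)
  invisible(pnorm(to) - pnorm(from))                    # shaded area
}

area_d <- shade_normal(-0.5, 0.5, main = "|Z| < 0.5")
area_d
```

For a one-sided region such as Z > -1.13, pass the edge of the plotting window as the other bound, for example `shade_normal(-1.13, 3)`. The returned area then covers only the plotted range.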

3.4

  a) \[Men: N(\mu = 4313, \sigma = 583)\]

\[Women: N(\mu = 5261, \sigma = 807)\]

  b) \[Z = \frac{X - \mu}{\sigma}\]

Leo: 4948 \[Z_{Leo} = \frac{4948 - 4313}{583} = 1.089193\] Leo’s Z-score tells us that his time is 1.09 standard deviations above (slower than) the men’s mean of 4313.

Mary: 5513 \[Z_{Mary} = \frac{5513 - 5261}{807} = 0.312267\] Mary’s Z-score tells us that her time is 0.31 standard deviations above (slower than) the women’s mean of 5261.

  c) Mary ranked better within her group. A lower finishing time is better, and her Z-score (0.31) is lower than Leo’s (1.09), so a smaller share of women finished faster than Mary than the share of men who finished faster than Leo.

  d) Leo finished faster than 13.79% of male triathletes, since 1 - 0.8621 = 0.1379 for Z = 1.09.

  e) Mary finished faster than 37.83% of female triathletes, since 1 - 0.6217 = 0.3783 for Z = 0.31.

  f) If the distributions of finishing times were not nearly normal, the Z-scores in part (b) would not change, because a Z-score depends only on the mean and standard deviation and can be computed for any distribution. The percentages in parts (d) and (e), and the comparison in part (c) that rests on them, would likely change, because they rely on the normal model to convert Z-scores into probabilities.
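
The Z-scores and upper-tail percentages for Leo and Mary can be reproduced in R. This is a minimal sketch with illustrative variable names. Because the Z-scores are not rounded to two decimals before the lookup, the results can differ slightly from normal-table values.

```r
# Z-scores relative to each athlete's own group
z_leo  <- (4948 - 4313) / 583
z_mary <- (5513 - 5261) / 807

# Share of each group with a higher (slower) time, i.e. the share each athlete beat
faster_leo  <- pnorm(z_leo,  lower.tail = FALSE)
faster_mary <- pnorm(z_mary, lower.tail = FALSE)

round(c(z_leo = z_leo, z_mary = z_mary,
        faster_leo = faster_leo, faster_mary = faster_mary), 4)
```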

3.18

fheight <- c(54,55,56,56,57,58,58,59,60,60,60,61,61,62,62,63,63,63,64,65,65,67,67,69,73)
fheightmean <- mean(fheight)
fheightsd <- sd(fheight)

hist(fheight, probability = TRUE, ylim = c(0, 0.09), main = "Histogram of Height of Female College Students", xlab = "Height", col = "lightblue")
x <- 40:80
y <- dnorm(x = x, mean = fheightmean, sd = fheightsd)
lines(x = x, y = y, col = "red")

qqnormsim(fheight)

With a mean of 61.52 and a standard deviation of 4.58, the 68-95-99.7 rule predicts that roughly 68% of heights fall between 56.94 and 66.10 inches, 95% between 52.36 and 70.68 inches, and 99.7% between 47.78 and 75.26 inches. In the sample, 17 of the 25 heights (68%) fall within one standard deviation of the mean, 24 of 25 (96%) within two, and all 25 (100%) within three, so these data approximately follow the 68-95-99.7 rule.
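
The 68-95-99.7 check can also be done by counting rather than by reading the histogram. The sketch below recomputes the share of observations within one, two, and three standard deviations of the mean; `within_k` is an illustrative name.

```r
fheight <- c(54,55,56,56,57,58,58,59,60,60,60,61,61,62,62,63,63,63,64,65,65,67,67,69,73)
m <- mean(fheight)
s <- sd(fheight)

# Proportion of heights within k standard deviations of the mean, k = 1, 2, 3
within_k <- sapply(1:3, function(k) mean(abs(fheight - m) <= k * s))
names(within_k) <- c("1 SD", "2 SD", "3 SD")
within_k
```

With only 25 observations each proportion moves in steps of 4 percentage points, so exact agreement with 68%, 95%, and 99.7% should not be expected.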

  1. These data do appear to follow a normal distribution. The histogram is roughly bell-shaped, and the points in the normal QQ plot follow the line fairly closely. The QQ plot made from our data also looks very similar to the ones created with normally distributed data.
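
If the DATA606 package is not available, a normal QQ plot can also be drawn with base R's `qqnorm()` and `qqline()`. This is a minimal sketch and does not reproduce the comparison plots built from simulated normal data.

```r
fheight <- c(54,55,56,56,57,58,58,59,60,60,60,61,61,62,62,63,63,63,64,65,65,67,67,69,73)

# qqnorm() draws the plot and invisibly returns the plotted coordinates
qq <- qqnorm(fheight, main = "Normal QQ Plot of Female Heights")
qqline(fheight, col = "red")   # reference line through the first and third quartiles
```

Points that stay close to the reference line support the normal model; systematic curvature at either end would suggest skew or heavy tails.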

3.22

  1. \[(1 - 0.02)^9 \times 0.02 = 0.01667\]
  2. \[0.98^{100} = 0.1326\]
  3. Mean: \[\frac{1}{0.02} = 50\] Standard Deviation: \[\sqrt{\frac{1-0.02}{0.02^2}} = \sqrt{2450} = 49.4975\]
  4. Mean: \[\frac{1}{0.05} = 20\] Standard Deviation: \[\sqrt{\frac{1-0.05}{0.05^2}} = \sqrt{380} = 19.4936\]
  5. Increasing the probability of an event decreases both the mean and the standard deviation of the wait time until the first success. The distribution becomes more compact, so on average fewer trials are needed before a success occurs, and the number of trials varies less.
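
The geometric calculations above can be checked in R. In this sketch `geom_mean` and `geom_sd` are hypothetical helper names. `dgeom()` counts the failures before the first success, so a first success on the 10th trial corresponds to 9 failures.

```r
p <- 0.02

p_10th     <- dgeom(9, prob = p)   # first success on the 10th trial
p_none_100 <- (1 - p)^100          # no successes in 100 trials

# Mean and standard deviation of the number of trials until the first success
geom_mean <- function(p) 1 / p
geom_sd   <- function(p) sqrt((1 - p) / p^2)

round(c(p_10th = p_10th, p_none_100 = p_none_100), 5)
round(c(mean_02 = geom_mean(0.02), sd_02 = geom_sd(0.02),
        mean_05 = geom_mean(0.05), sd_05 = geom_sd(0.05)), 4)
```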

3.38

dbinom(2, 3, 0.51)
## [1] 0.382347
  1. Girl Boy Boy
    Boy Girl Boy
    Boy Boy Girl
    \[0.51^2 \times 0.49 \times 3 = 0.382347\] This matches the dbinom() result above.
  2. The approach from part (b) would be more tedious than the approach from part (a), because we would have to write out every possible ordering of the 8 children. The binomial calculation from part (a) needs only a single step.
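
The two approaches can be compared directly in R for the three-child case. This sketch enumerates every birth order with `expand.grid()`, sums the probabilities of the orderings with exactly two boys, and compares the total with `dbinom()`. The names `kids`, `n_boys`, and `p_enum` are illustrative.

```r
p_boy <- 0.51

# All 2^3 = 8 possible birth orders for three children
sexes <- c("Boy", "Girl")
kids <- expand.grid(first = sexes, second = sexes, third = sexes,
                    stringsAsFactors = FALSE)

n_boys <- rowSums(kids == "Boy")                        # boys in each ordering
prob   <- p_boy^n_boys * (1 - p_boy)^(3 - n_boys)       # probability of each ordering

p_enum  <- sum(prob[n_boys == 2])    # add up the orderings with exactly two boys
p_binom <- dbinom(2, 3, p_boy)       # binomial formula

c(enumeration = p_enum, binomial = p_binom)
```

For eight children the same enumeration would have 2^8 = 256 rows, which is why writing the scenarios out by hand becomes impractical.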

3.42

choose(9,2)
## [1] 36

a) \[36 \times 0.15^3 \times 0.85^7 = 36 \times 0.003375 \times 0.320577 = 0.03895\]
b) The probability that her 10th serve will be successful, given that she has made two successful serves in nine attempts, is 0.15, or 15%. All serves are independent of one another, so this probability does not change from attempt to attempt.
c) The difference arises because a) accounts for the outcomes of all ten serves (two successes somewhere in the first nine, then a success on the tenth), while b) concerns only the 10th serve, given what has already happened.
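
The probability in part a) is a negative binomial probability, so it can be cross-checked with `dnbinom()`. This is a minimal sketch with illustrative variable names. `dnbinom()` counts the failures before the target number of successes, so a third success on the 10th serve corresponds to 7 failures.

```r
p <- 0.15

# Manual calculation: choose which 2 of the first 9 serves succeed, then succeed on the 10th
p_manual <- choose(9, 2) * p^3 * (1 - p)^7

# Same quantity from the negative binomial density: 7 failures before the 3rd success
p_nbinom <- dnbinom(7, size = 3, prob = p)

c(manual = p_manual, dnbinom = p_nbinom)
```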