library(DATA606)
##
## Welcome to CUNY DATA606 Statistics and Probability for Data Analytics
## This package is designed to support this course. The text book used
## is OpenIntro Statistics, 3rd Edition. You can read this by typing
## vignette('os3') or visit www.OpenIntro.org.
##
## The getLabs() function will return a list of the labs available.
##
## The demo(package='DATA606') will list the demos that are available.
#Z > -1.13
normalPlot(bounds = c(-1.13, Inf)); 1 - pnorm(-1.13)
## [1] 0.8707619
#Z < 0.18
normalPlot(bounds = c(-Inf, 0.18)); pnorm(0.18)
## [1] 0.5714237
#Z > 8
normalPlot(bounds = c(8, Inf)); 1 - pnorm(8)
## [1] 6.661338e-16
#|Z| < 0.5
normalPlot(bounds = c(-0.5, 0.5)); pnorm(0.5) - pnorm(-0.5)
## [1] 0.3829249
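The same four probabilities can be checked with base R alone. The sketch below does not call `normalPlot()`; it relies only on `pnorm()` and its `lower.tail` argument, and the variable names are illustrative. Requesting the upper tail directly also avoids the rounding error in `1 - pnorm(8)`, where the subtraction happens at the limit of double precision.

```r
# Tail areas under the standard normal curve, base R only.
p_a <- pnorm(-1.13, lower.tail = FALSE)  # P(Z > -1.13)
p_b <- pnorm(0.18)                       # P(Z < 0.18)
p_c <- pnorm(8, lower.tail = FALSE)      # P(Z > 8), no subtraction from 1
p_d <- pnorm(0.5) - pnorm(-0.5)          # P(|Z| < 0.5)

round(c(a = p_a, b = p_b, d = p_d), 4)
p_c  # slightly smaller than the value obtained from 1 - pnorm(8)
```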
Men: \(N(\mu = 4313, \sigma = 583)\); Women: \(N(\mu = 5261, \sigma = 807)\)
#LEO
(4948 - 4313) / 583
## [1] 1.089194
#MARY
(5513 - 5261) / 807
## [1] 0.3122677
Leo finished 1.09 standard deviations above the mean for his group, and Mary finished 0.31 standard deviations above the mean for her group.
Since this is a race, a lower time is better. Mary’s Z-score of 0.31 is lower than Leo’s 1.09, so Mary ranked better within her group than Leo did within his.
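The two standardizations can be wrapped in a small helper so the comparison is explicit. This is a minimal sketch; `z_score` is a hypothetical function name, not part of the DATA606 package.

```r
# Hypothetical helper: standardize an observation against its group.
z_score <- function(x, mu, sigma) (x - mu) / sigma

z_leo  <- z_score(4948, mu = 4313, sigma = 583)
z_mary <- z_score(5513, mu = 5261, sigma = 807)

# Lower finishing time is better, so the smaller Z-score ranks higher.
round(c(Leo = z_leo, Mary = z_mary), 2)
z_mary < z_leo
```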
#LEO
1 - pnorm(1.09)
## [1] 0.1378566
Leo finished faster than 13.8% of the runners in his group.
#MARY
1 - pnorm(0.31)
## [1] 0.3782805
Mary finished faster than 37.8% of the runners in her group.
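As a cross-check, `pnorm()` accepts the group mean and standard deviation directly, so the finishing times can be used without first rounding the Z-scores. The variable names below are illustrative. Because the Z-scores are not rounded to two decimals here, the results differ slightly in the third or fourth decimal from the values above.

```r
# Share of each group that was slower than Leo / Mary (upper tail of the
# finishing-time distribution), using the raw times instead of rounded Z.
leo_faster_than  <- pnorm(4948, mean = 4313, sd = 583, lower.tail = FALSE)
mary_faster_than <- pnorm(5513, mean = 5261, sd = 807, lower.tail = FALSE)

round(c(Leo = leo_faster_than, Mary = mary_faster_than), 3)
```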
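The Z-scores would not change, because they depend only on the mean and standard deviation, but the percentiles would, because they were computed from the normal model.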
The mean height is 61.52 inches with a standard deviation of 4.58 inches. Use this information to determine if the heights approximately follow the 68-95-99.7% Rule.
68% of the data is within 1 standard deviation of the mean: \(61.52 - 4.58 = 56.94\) to \(61.52 + 4.58 = 66.10\)
96% of the data is within 2 standard deviations of the mean: \(61.52 - (2 \times 4.58) = 52.36\) to \(61.52 + (2 \times 4.58) = 70.68\)
100% of the data is within 3 standard deviations of the mean: \(61.52 - (3 \times 4.58) = 47.78\) to \(61.52 + (3 \times 4.58) = 75.26\)
These proportions are close to 68%, 95%, and 99.7%, so the heights approximately follow the 68-95-99.7% Rule.
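The interval endpoints can be computed in one step, and the observed proportions can be counted from the raw data. The sketch below is only an illustration: the raw heights are not reproduced here, so `prop_within` is a hypothetical helper that would be applied to the height vector if it were available.

```r
m <- 61.52  # sample mean (inches)
s <- 4.58   # sample standard deviation (inches)
k <- 1:3

# Endpoints of mean +/- k standard deviations for k = 1, 2, 3.
intervals <- data.frame(k = k, lower = m - k * s, upper = m + k * s)
intervals

# Hypothetical helper: proportion of x within k standard deviations of m.
prop_within <- function(x, k, m = mean(x), s = sd(x)) {
  mean(abs(x - m) <= k * s)
}
```

Calling `prop_within(heights, 1)` on a height vector named `heights` (an assumed name) would give the first of the three proportions.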
The histogram is not perfectly symmetric, but the distribution is unimodal. The points on the Q-Q plot fall on or close to a straight line, with only mild departures at both ends, so the heights appear to be approximately normal.
((1 - 0.02) ^ 9) * 0.02
## [1] 0.01667496
1.67% chance.
0.98 ^ 100
## [1] 0.1326196
13.3% chance.
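Both results can also be obtained from R's built-in distribution functions, which is a useful check on the hand-written formulas. Note that `dgeom()` counts the number of failures before the first success, so a first success on the 10th trial corresponds to 9 failures. The variable names are illustrative.

```r
p <- 0.02

# First success on the 10th trial: 9 failures, then a success.
p_tenth <- dgeom(9, prob = p)

# No successes at all in 100 independent trials.
p_none_in_100 <- dbinom(0, size = 100, prob = p)

round(c(tenth = p_tenth, none_in_100 = p_none_in_100), 4)
```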
avgt <- 1 / 0.02
avgt
## [1] 50
sdt <- sqrt((1 - 0.02)/ 0.02 ^ 2)
sdt
## [1] 49.49747
avgt2 <- 1 / 0.05
avgt2
## [1] 20
sdt2 <- sqrt((1 - 0.05)/ 0.05 ^ 2)
sdt2
## [1] 19.49359
When p is higher, both the expected wait time until the first success and its standard deviation are lower.
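The two calculations use the same formulas for the geometric distribution, mean \(1/p\) and standard deviation \(\sqrt{(1-p)/p^2}\), so they can be folded into one function. A minimal sketch; `geom_summary` is a hypothetical name.

```r
# Hypothetical helper: mean and SD of the number of trials until the
# first success, for success probability p.
geom_summary <- function(p) {
  c(mean = 1 / p, sd = sqrt((1 - p) / p^2))
}

# Compare the two success probabilities side by side.
wait_times <- rbind(p_0.02 = geom_summary(0.02), p_0.05 = geom_summary(0.05))
wait_times
```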
Male children. While it is often assumed that the probabilities of having a boy or a girl are the same, the actual probability of having a boy is slightly higher at 0.51. Suppose a couple plans to have 3 kids.
\(\#\text{ of scenarios} \times P(\text{single scenario})\)
\(\binom{n}{k} \, p^k (1 - p)^{n-k}\)
p <- 0.51
n <- 3
k <- 2
nCr <- factorial(n)/(factorial(k) * factorial(n-k))
psinglescenario <- (p^k)*((1 - p)^(n-k))
nCr * psinglescenario
## [1] 0.382347
\((B, B, G)\) \((B, G, B)\) \((G, B, B)\)
B = 0.51
G = 0.49
(B^2 * G) + (B * G * B) + (G * B^2)
## [1] 0.382347
It’s a match.
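Both approaches can be reproduced without typing the scenarios by hand. The sketch below enumerates all eight birth orders with `expand.grid()` and compares the total for exactly two boys against `dbinom()`; the column and variable names are illustrative.

```r
p <- 0.51  # probability of a boy

# All 2^3 possible birth orders for three children.
outcomes <- expand.grid(first = c("B", "G"), second = c("B", "G"),
                        third = c("B", "G"), stringsAsFactors = FALSE)
n_boys <- rowSums(outcomes == "B")

# Probability of each ordering, then the total for exactly two boys.
prob <- p^n_boys * (1 - p)^(3 - n_boys)
by_enumeration <- sum(prob[n_boys == 2])
by_dbinom <- dbinom(2, size = 3, prob = p)

c(enumeration = by_enumeration, dbinom = by_dbinom)
```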
There are 56 different ways of ordering the kids in this scenario, so listing each one and adding the probabilities would be far more tedious.
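The count of 56 is consistent with choosing 3 positions out of 8. The scenario itself is not stated here, so the values `n = 8` and `k = 3` in the sketch below are an assumption made only for illustration.

```r
# Assumption: 3 boys among 8 children (not stated in the text above).
n <- 8
k <- 3
p <- 0.51

n_orderings <- choose(n, k)  # number of distinct orderings
n_orderings

# The binomial formula handles all orderings in a single call.
prob_k_boys <- dbinom(k, size = n, prob = p)
prob_k_boys
```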
p <- 0.15
n <- 10
k <- 3
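# The k-th success is fixed on the n-th trial, so only the first n - 1
# trials are arranged: choose k - 1 successes among them.
nCr <- choose(n - 1, k - 1)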
psinglescenario2 <- (p^k)*((1 - p)^(n-k))
nCr * psinglescenario2
## [1] 0.03895012
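R's `dnbinom()` gives the same negative binomial probability. It is parameterized by the number of failures before the target number of successes, so the 3rd success on the 10th trial corresponds to 7 failures with `size = 3`. The variable names in this sketch are illustrative.

```r
p <- 0.15  # probability of a successful serve
n <- 10    # trial on which the k-th success occurs
k <- 3     # number of successes

# Manual formula: choose(n - 1, k - 1) * p^k * (1 - p)^(n - k)
manual <- choose(n - 1, k - 1) * p^k * (1 - p)^(n - k)

# Built-in: x counts failures (n - k) before the k-th success.
builtin <- dnbinom(n - k, size = k, prob = p)

c(manual = manual, builtin = builtin)
```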
15% chance.
Part (a) follows a negative binomial distribution, and one of its conditions is that the last trial is a success, which was assumed in (a). In (b), each serve is independent of the others, so the earlier serves do not affect the 10th, and it has a 15% chance of success.