library('DATA606') # Load the package
##
## Welcome to CUNY DATA606 Statistics and Probability for Data Analytics
## This package is designed to support this course. The text book used
## is OpenIntro Statistics, 3rd Edition. You can read this by typing
## vignette('os3') or visit www.OpenIntro.org.
##
## The getLabs() function will return a list of the labs available.
##
## The demo(package='DATA606') will list the demos that are available.
##
## Attaching package: 'DATA606'
## The following object is masked from 'package:utils':
##
## demo
vignette(package='DATA606') # Lists vignettes in the DATA606 package
## no vignettes found
vignette('os3') # Loads a PDF of the OpenIntro Statistics book
## Warning: vignette 'os3' not found
data(package='DATA606') # Lists data available in the package
normalPlot(mean = 0 , sd = 1, bounds = c(-1.13, 4), tails = FALSE)
1-0.13
## [1] 0.87
normalPlot(mean = 0 , sd = 1, bounds = c(-4, 0.18), tails = FALSE)
0.57
## [1] 0.57
normalPlot(mean = 0 , sd = 1, bounds = c(.8,4), tails = FALSE)
1-0.78
## [1] 0.22
normalPlot(mean = 0 , sd = 1, bounds = c(-0.5, 0.5), tails = FALSE)
# Area for Z < -0.5
0.3085
## [1] 0.3085
# Area for Z < 0.5
0.6915
## [1] 0.6915
AreaZ <- 0.6915 - 0.3085
AreaZ
## [1] 0.383
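The table lookups above can be cross-checked with base R's `pnorm()`, which returns the standard normal cumulative probability directly. This is a sketch for verification only; the variable names `p_a` to `p_d` are illustrative and not part of the original solution.

```r
# Cross-check of the four standard normal areas using pnorm() from base R
p_a <- pnorm(-1.13, lower.tail = FALSE)   # P(Z > -1.13)
p_b <- pnorm(0.18)                        # P(Z < 0.18)
p_c <- pnorm(0.8, lower.tail = FALSE)     # P(Z > 0.8)
p_d <- pnorm(0.5) - pnorm(-0.5)           # P(-0.5 < Z < 0.5)

round(c(a = p_a, b = p_b, c = p_c, d = p_d), 4)
```

The results should agree with the table-based answers to about two decimal places; small differences come from rounding the table entries before subtracting.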
leo_Z <- (4948 - 4313)/583
leo_Z
## [1] 1.089194
Mary_Z <- (5513 - 5261)/807
Mary_Z
## [1] 0.3122677
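Leo’s Z score is 1.09, so his finishing time was 1.09 standard deviations above the mean for his group in the men’s race.
Mary’s Z score is 0.31, so her finishing time was 0.31 standard deviations above the mean for her group in the women’s race.
Because these are finishing times, a lower Z score means a faster race relative to the group. Mary’s Z score is lower than Leo’s, so Mary ranked better within her group than Leo did within his.
The percentile for Leo’s time is 0.8621, which means about 86.2% of triathletes in his group finished faster than Leo, and he finished faster than about 13.8% of them.
The percentile for Mary’s time is 0.6217, which means about 62.2% of triathletes in her group finished faster than Mary, and she finished faster than about 37.8% of them.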
If the finishing times were not normally distributed, we could still answer (b) and (c), because a Z score can be computed for any distribution. However, the answers to (d) and (e) would change, because they rely on the normal probability table.
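The percentile reasoning can be sketched with `pnorm()` in place of the table. This assumes, as in the calculations above, that the values are finishing times (so lower is better) and that times in each group are approximately normal; the variable names are illustrative.

```r
# Z scores from the finishing times used above: (observed - group mean) / group SD
leo_z  <- (4948 - 4313) / 583
mary_z <- (5513 - 5261) / 807

# Under a normal model, pnorm(z) is the share of the group with a LOWER (faster) time
share_ahead_of_leo  <- pnorm(leo_z)
share_ahead_of_mary <- pnorm(mary_z)

# The complement is the share with a higher (slower) time, i.e. the share each athlete beat
share_leo_beat  <- 1 - share_ahead_of_leo
share_mary_beat <- 1 - share_ahead_of_mary

round(c(leo_beat = share_leo_beat, mary_beat = share_mary_beat), 3)
```

Using the unrounded Z scores gives values slightly different from the table, which rounds Z to two decimals first.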
The range within 1 SD of the mean runs from 61.52 - 4.58 = 56.94 to 61.52 + 4.58 = 66.10, and 17 of the 25 women fall in it, so 68% of women are within 1 SD. The range within 2 SD runs from 61.52 - 2 x 4.58 = 52.36 to 61.52 + 2 x 4.58 = 70.68, and 24 of the 25 women fall in it, so 96% are within 2 SD. Almost all are within 3 SD. Hence the heights approximately follow the 68-95-99.7% rule.
17/25
## [1] 0.68
61.52 - (2*4.58)
## [1] 52.36
61.52 + (2*4.58)
## [1] 70.68
24/25
## [1] 0.96
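As a rough check, the interval bounds and the shares a normal model would predict can be tabulated side by side with the observed shares. The mean, SD, and counts come from the text; the data frame `rule_check` and its column names are illustrative, and the 3 SD observed share is left as `NA` because no count is given for it.

```r
m <- 61.52  # sample mean height from the text
s <- 4.58   # sample standard deviation from the text
k <- 1:3    # number of standard deviations from the mean

rule_check <- data.frame(
  k        = k,
  lower    = m - k * s,
  upper    = m + k * s,
  expected = pnorm(k) - pnorm(-k)  # share a normal model places within k SDs
)

# Observed shares reported in the text for k = 1 and k = 2 (17 and 24 of 25 women)
rule_check$observed <- c(17 / 25, 24 / 25, NA)
rule_check
```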
Probability of no defect in the batch of 100 = (0.98)^100 = 0.1326 = 13.26%
On average, how many transistors would you expect to be produced before the first with a defect?
\(\mu = 1/0.02 = 50\)
\(\sigma = \sqrt{(1-0.02)/0.02^{2}} = 49.5\)
\(\mu = 1/0.05 = 20\)
\(\sigma = \sqrt{(1-0.05)/0.05^{2}} = 19.5\)
As the probability of a defect increases, a defect tends to occur sooner, so the expected number of transistors produced up to the first defect falls. Here the probability rose by a factor of 2.5 (from 0.02 to 0.05), and the mean and SD each fell to less than half of their previous values (from 50 to 20, and from 49.5 to 19.5).
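The effect of the defect probability on the mean and SD can be sketched with a small helper. `geom_summary()` is a hypothetical function written for this illustration; it only wraps the two geometric distribution formulas used above.

```r
# Mean and SD of the number of trials up to and including the first success,
# for a geometric distribution with success probability p
geom_summary <- function(p) {
  c(mean = 1 / p, sd = sqrt((1 - p) / p^2))
}

low  <- geom_summary(0.02)  # defect probability 0.02
high <- geom_summary(0.05)  # defect probability 0.05

low / high  # how many times larger the mean and SD are at p = 0.02
```

Both ratios are above 2, which matches the observation that the mean and SD fall to less than half when the probability goes from 0.02 to 0.05.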
#a
(0.98)^9 * 0.02
## [1] 0.01667496
#b
(0.98)^100
## [1] 0.1326196
#c
mn <- 1/0.02
mn
## [1] 50
sd <- sqrt((1-0.02)/0.02^2)
sd
## [1] 49.49747
#d
mn2 <- 1/0.05
mn2
## [1] 20
sd2 <- sqrt((1-0.05)/0.05^2)
sd2
## [1] 19.49359
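Parts (a) and (b) can also be cross-checked with R's built-in geometric distribution functions. Note that `dgeom()` and `pgeom()` count the number of failures before the first success, so the arguments are shifted by one relative to the trial number. The variable names here are illustrative.

```r
p <- 0.02  # defect probability from the text

# (a) The 10th transistor is the first defective one:
#     9 non-defective transistors (failures) come before the first defect.
p_tenth_is_first <- dgeom(9, prob = p)

# (b) No defects among the first 100:
#     more than 99 failures occur before the first defect.
p_none_in_100 <- pgeom(99, prob = p, lower.tail = FALSE)

c(a = p_tenth_is_first, b = p_none_in_100)
```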
While it is often assumed that the probabilities of having a boy or a girl are the same, the actual probability of having a boy is slightly higher at 0.51. Suppose a couple plans to have 3 kids.
The probability that exactly two of them will be boys is about 0.38.
p_boy <- .51 # probability that a child is a boy
Pr <- dbinom(2,3,p_boy) # probability of exactly 2 boys among 3 kids
Pr
## [1] 0.382347
The orderings with exactly two boys are: bbg, bgb, gbb
## addition rule for disjoint outcomes
(0.51*0.51*0.49) + (0.51*0.49*0.51) + (0.49*0.51*0.51)
## [1] 0.382347
factorial(5)
## [1] 120
#(c)
n<- 8
r<- 3
factorial(n)/(factorial(r)*factorial(n-r))
## [1] 56
The number of different ways a couple with 8 kids can have exactly 3 boys is 56, so with approach (b) we would have to calculate the probability of each of the 56 orderings and add them all up.
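To illustrate why the addition-rule approach becomes tedious, the 56 orderings can be enumerated with `combn()` and summed, then compared with the single `dbinom()` call. This is a sketch using the probability of a boy (0.51) from the problem statement; the variable names are illustrative.

```r
n <- 8        # number of kids
r <- 3        # number of boys
p_boy <- 0.51 # probability that a child is a boy

# Each column of combn(n, r) is one choice of birth positions for the 3 boys
arrangements   <- combn(n, r)
n_arrangements <- ncol(arrangements)

# Every ordering with 3 boys and 5 girls has the same probability
p_one_ordering <- p_boy^r * (1 - p_boy)^(n - r)

# Addition rule for disjoint outcomes: add that probability once per ordering
p_by_addition <- sum(rep(p_one_ordering, n_arrangements))

# Binomial formula in a single step
p_by_binomial <- dbinom(r, n, p_boy)

c(orderings = n_arrangements, addition = p_by_addition, binomial = p_by_binomial)
```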
Serving in volleyball. A not-so-skilled volleyball player has a 15% chance of making the serve, which involves hitting the ball so it passes over the net on a trajectory such that it will land in the opposing team’s court. Suppose that her serves are independent of each other.
# Probability of 2 successes in 9 trials
pr <- 0.15
pr2 <- dbinom(2,9,pr)
# probability that the 10th try will be the 3rd success
pr2*pr
## [1] 0.03895012
Since the serves are independent, the probability that her 10th serve will be successful remains 0.15.
Part (a) asks for the probability of a combined outcome: exactly 2 successes in the first 9 serves and a success on the 10th. Part (b) asks only for the probability that the 10th serve succeeds.
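The two-step calculation for part (a) is an instance of the negative binomial distribution, so it can be cross-checked with `dnbinom()`. In R, `dnbinom(x, size, prob)` gives the probability of `x` failures before the `size`-th success; a third success on the 10th serve means 7 failures. The variable names are illustrative.

```r
p_serve <- 0.15  # probability of a successful serve, from the problem statement

# Two-step approach: exactly 2 successes in the first 9 serves, then a success on the 10th
p_two_step <- dbinom(2, 9, p_serve) * p_serve

# Negative binomial: 7 failures occur before the 3rd success
p_negbin <- dnbinom(7, size = 3, prob = p_serve)

c(two_step = p_two_step, negative_binomial = p_negbin)
```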