library('DATA606')          # Load the package
## 
## Welcome to CUNY DATA606 Statistics and Probability for Data Analytics 
## This package is designed to support this course. The text book used 
## is OpenIntro Statistics, 3rd Edition. You can read this by typing 
## vignette('os3') or visit www.OpenIntro.org. 
##  
## The getLabs() function will return a list of the labs available. 
##  
## The demo(package='DATA606') will list the demos that are available.
## 
## Attaching package: 'DATA606'
## The following object is masked from 'package:utils':
## 
##     demo
vignette(package='DATA606') # Lists vignettes in the DATA606 package
## no vignettes found
vignette('os3')             # Loads a PDF of the OpenIntro Statistics book
## Warning: vignette 'os3' not found
data(package='DATA606')     # Lists data available in the package

Chapter 3 - Distributions of Random Variables

Practice: 3.1 (see normalPlot), 3.3, 3.17 (use qqnormsim from lab 3), 3.21, 3.37, 3.41

Graded: 3.2 (see normalPlot), 3.4, 3.18 (use qqnormsim from lab 3), 3.22, 3.38, 3.42

3.2 Area under the curve

normalPlot(mean = 0 , sd = 1, bounds = c(-1.13, 4), tails = FALSE)

1-0.13 
## [1] 0.87
normalPlot(mean = 0 , sd = 1, bounds = c(-4, 0.18), tails = FALSE)

0.57
## [1] 0.57
normalPlot(mean = 0 , sd = 1, bounds = c(.8,4), tails = FALSE)

1-0.78
## [1] 0.22
normalPlot(mean = 0 , sd = 1, bounds = c(-0.5, 0.5), tails = FALSE)

#Area for Z < -0.5 
0.3085
## [1] 0.3085
# Area for Z < 0.5
0.6915
## [1] 0.6915
AreaZ <- 0.6915 - 0.3085
AreaZ
## [1] 0.383
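
The table lookups above can be cross-checked with base R's `pnorm()`, the standard normal CDF. This is a minimal sketch rather than part of the graded solution; the variable names are chosen here for illustration, and the bound 0.8 in the third area is simply the one passed to `normalPlot` above.

```r
# Cross-check of the four areas using pnorm() from base R (stats).
# Variable names are illustrative and not part of the DATA606 package.
area_a <- pnorm(-1.13, lower.tail = FALSE)  # P(Z > -1.13)
area_b <- pnorm(0.18)                       # P(Z < 0.18)
area_c <- pnorm(0.8, lower.tail = FALSE)    # P(Z > 0.8)
area_d <- pnorm(0.5) - pnorm(-0.5)          # P(-0.5 < Z < 0.5)

round(c(a = area_a, b = area_b, c = area_c, d = area_d), 4)
```

These values should agree with the table-based answers to about two decimal places; small differences come from rounding in the table lookups.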
##qqnormsim

3.4 Triathlon Times

  1. Men: \(N(\mu = 4313, \sigma = 583)\)
    Women: \(N(\mu = 5261, \sigma = 807)\)
leo_Z <- (4948 - 4313)/583
leo_Z
## [1] 1.089194
Mary_Z <- (5513 - 5261)/807
Mary_Z
## [1] 0.3122677
  1. Leo’s Z score is 1.09, so his time is 1.09 standard deviations above the mean for the Men’s race.
    Mary’s Z score is 0.31, so her time is 0.31 standard deviations above the mean for the Women’s race.

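  2. A shorter finishing time is better, so a lower Z score means a better rank. Mary’s Z score (0.31) is lower than Leo’s (1.09), so Mary ranked better within her race group than Leo did within his.

  3. The percentile for Leo’s time is 0.8621, which means 86.21% of triathletes in his group had shorter times. Leo therefore finished faster than 1 - 0.8621 = 0.1379, or 13.79%, of his group.

  4. The percentile for Mary’s time is 0.6217, which means 62.17% of triathletes in her group had shorter times. Mary therefore finished faster than 1 - 0.6217 = 0.3783, or 37.83%, of her group.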

  5. If the finishing times were not nearly normal, we could still answer (b) and (c), because Z scores can be computed and compared for any distribution. The answers to (d) and (e) would change, because they rely on the normal probability table.
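
The Z scores and tail areas for both athletes can be sketched in a few lines of base R. This is an illustrative cross-check under the normal models stated in the problem, with variable names chosen here; note that the lower tail of a finishing-time distribution holds the faster athletes and the upper tail holds the slower ones.

```r
# Z scores and tail areas for the two finishing times (in seconds).
# All variable names are illustrative.
leo_time  <- 4948; men_mean   <- 4313; men_sd   <- 583
mary_time <- 5513; women_mean <- 5261; women_sd <- 807

z_leo  <- (leo_time  - men_mean)   / men_sd
z_mary <- (mary_time - women_mean) / women_sd

# Lower tail: share of the group with a shorter (faster) time.
pct_faster_than_leo  <- pnorm(z_leo)
pct_faster_than_mary <- pnorm(z_mary)

# Upper tail: share of the group with a longer (slower) time.
pct_slower_than_leo  <- pnorm(z_leo,  lower.tail = FALSE)
pct_slower_than_mary <- pnorm(z_mary, lower.tail = FALSE)

round(c(z_leo = z_leo, z_mary = z_mary,
        slower_than_leo = pct_slower_than_leo,
        slower_than_mary = pct_slower_than_mary), 4)
```

Because `pnorm()` uses the unrounded Z scores, the results can differ from the table values in the third decimal place.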

3.18. Heights of Female college students

  1. \(N(\mu = 61.52, \sigma = 4.58)\)

Within 1 SD of the mean, from 61.52 - 4.58 = 56.94 to 61.52 + 4.58 = 66.10, there are 17 of the 25 heights, so 68% of the women are within 1 SD. Within 2 SD, from 61.52 - 2 x 4.58 = 52.36 to 61.52 + 2 x 4.58 = 70.68, there are 24 of the 25 heights, so 96% are within 2 SD. Almost all are within 3 SD. Hence the heights approximately follow the 68-95-99.7% rule.

  1. The curve drawn over the histogram is roughly symmetric and bell-shaped. Most of the data fall under the curve, with a few outliers, so the distribution is nearly normal.
17/25
## [1] 0.68
61.52 - (2*4.58)
## [1] 52.36
61.52 + (2*4.58)
## [1] 70.68
24/25
## [1] 0.96
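
The interval bounds used for the 68-95-99.7% check can be tabulated together with the coverage a normal model predicts. This is a small sketch using only the mean and SD given in the problem; the object names are illustrative.

```r
# Intervals mean +/- k*SD for k = 1, 2, 3, alongside the share of
# observations a normal distribution places inside each interval.
# hgt_mean and hgt_sd are from the problem; other names are illustrative.
hgt_mean <- 61.52
hgt_sd   <- 4.58
k <- 1:3

rule <- data.frame(
  k      = k,
  lower  = hgt_mean - k * hgt_sd,
  upper  = hgt_mean + k * hgt_sd,
  normal = pnorm(k) - pnorm(-k)   # theoretical share within k SDs
)
rule
```

The observed shares of 17/25 and 24/25 can then be compared against the `normal` column row by row.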

3.22 Defective Rate

  1. The probability that the 10th transistor produced is the first with a defect, that is, the first 9 have no defect and the 10th has one.
    Since the transistors are independent, with P(defect) = 0.02 and P(no defect) = 0.98, we have P(no defect in the first 9) = (0.98)^9, so P(first defect on the 10th) = (0.98)^9 * 0.02 = 0.0167 = 1.67%
  1. Probability of no defect in the batch of 100 = (0.98)^100 = 0.1326 = 13.26%

  2. On average, how many transistors would you expect to be produced before the first with a defect?

\(\mu = 1 / 0.02 = 50\)

\(\sigma = \sqrt{(1 - 0.02) / 0.02^{2}} \approx 49.5\)

  1. Another machine that also produces transistors has a 5% defective rate where each transistor is produced independent of the others. On average how many transistors would you expect to be produced with this machine before the first with a defect? What is the standard deviation?

\(\mu = 1 / 0.05 = 20\)

\(\sigma = \sqrt{(1 - 0.05) / 0.05^{2}} \approx 19.5\)

  1. Based on your answers to parts (c) and (d), how does increasing the probability of an event affect the mean and standard deviation of the wait time until success?

As the probability of the event increases, the event tends to occur sooner, so the expected wait until its first occurrence gets shorter and less variable. Here the probability was multiplied by 2.5 (from 0.02 to 0.05), and the mean and SD both fell to about 40% of their previous values (50 to 20, and 49.5 to 19.5).

#a
(0.98)^9 * 0.02
## [1] 0.01667496
#b
(0.98)^100
## [1] 0.1326196
#c
mn <- 1/0.02
mn
## [1] 50
sd <- sqrt((1-0.02)/0.02^2)
sd
## [1] 49.49747
#d
mn2 <- 1/0.05
mn2
## [1] 20
sd2 <- sqrt((1-0.05)/0.05^2)
sd2
## [1] 19.49359
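
The same answers can be reproduced with R's built-in distribution functions instead of hand-written powers. The sketch below assumes the geometric model described above; `geom_summary` is a hypothetical helper defined here for illustration, not a function from any package.

```r
# Geometric-distribution cross-check.
# dgeom(x, prob) gives the probability of x failures before the first
# success, so "first defect on the 10th transistor" corresponds to x = 9.
p_defect <- 0.02
p_first_defect_on_10th <- dgeom(9, prob = p_defect)

# No defects in a batch of 100: zero successes in 100 binomial trials.
p_no_defect_in_100 <- dbinom(0, size = 100, prob = p_defect)

# Hypothetical helper: mean and SD of the number of trials up to and
# including the first success, for success probability p.
geom_summary <- function(p) {
  c(mean = 1 / p, sd = sqrt((1 - p) / p^2))
}

geom_summary(0.02)
geom_summary(0.05)
```

Wrapping the mean and SD formulas in one function avoids repeating them for each machine and avoids reusing the name `sd`, which masks the base R function of the same name.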

3.38 Male children.

While it is often assumed that the probabilities of having a boy or a girl are the same, the actual probability of having a boy is slightly higher at 0.51. Suppose a couple plans to have 3 kids.

  1. Use the binomial model to calculate the probability that two of them will be boys.

The probability that exactly two of the three children will be boys is 0.38.

Pr <- .51
Pr <- dbinom(2,3,Pr)
Pr
## [1] 0.382347
  1. Write out all possible orderings of 3 children, 2 of whom are boys. Use these scenarios to calculate the same probability from part (a) but using the addition rule for disjoint outcomes. Confirm that your answers from parts (a) and (b) match.
    Below are the three possible orderings, each with a probability of (0.51 * 0.51 * 0.49).

bbg bgb gbb

## addition rule for disjoint outcomes

(0.51*0.51*0.49) + (0.51*0.49*0.51) + (0.49*0.51*0.51)
## [1] 0.382347
factorial(5)
## [1] 120
#(c)

n<- 8
r<- 3
factorial(n)/(factorial(r)*factorial(n-r))
## [1] 56
  1. If we wanted to calculate the probability that a couple who plans to have 8 kids will have 3 boys, briefly describe why the approach from part (b) would be more tedious than the approach from part (a).

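There are 56 different orderings in which a couple with 8 kids can have 3 boys, so approach (b) would require writing out all 56 orderings, computing the probability of each, and adding the 56 terms. The binomial formula in approach (a) handles the counting in one step.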
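
To make the comparison concrete, the two approaches can be run side by side for 3 boys among 8 children. This is an illustrative sketch with variable names chosen here; it uses `combn()` to enumerate the positions of the boys rather than writing the orderings out by hand.

```r
# Counting orderings for 3 boys among 8 children.
# p_boy is from the problem; the other names are illustrative.
p_boy  <- 0.51
n_kids <- 8
n_boys <- 3

# Each column of combn() is one choice of birth positions for the boys.
orderings   <- combn(n_kids, n_boys)
n_orderings <- ncol(orderings)

# Approach (b): every ordering has the same probability, added once
# per ordering.
p_one_ordering <- p_boy^n_boys * (1 - p_boy)^(n_kids - n_boys)
p_by_addition  <- sum(rep(p_one_ordering, n_orderings))

# Approach (a): the binomial formula does the counting in one step.
p_by_binomial <- dbinom(n_boys, size = n_kids, prob = p_boy)

c(orderings = n_orderings, addition = p_by_addition, binomial = p_by_binomial)
```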

3.42

Serving in volleyball. A not-so-skilled volleyball player has a 15% chance of making the serve, which involves hitting the ball so it passes over the net on a trajectory such that it will land in the opposing team’s court. Suppose that her serves are independent of each other.

  1. What is the probability that on the 10th try she will make her 3rd successful serve?
# Probability of 2 successes in 9 trials
pr <- 0.15
pr2 <- dbinom(2,9,pr)

# probability that the 10th try will be the 3rd success
pr2*pr
## [1] 0.03895012
  1. Suppose she has made two successful serves in nine attempts. What is the probability that her 10th serve will be successful?

Since these are independent trials, the probability that her 10th serve will be successful remains 0.15

  1. Even though parts (a) and (b) discuss the same scenario, the probabilities you calculated should be different. Can you explain the reason for this discrepancy?

Part (a) asks for the probability of a combined outcome: exactly 2 successes in the first 9 tries and a success on the 10th try. Part (b) takes the first nine serves as given and asks only for the probability that the 10th serve is successful.
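
Part (a) is an instance of the negative binomial distribution, so it can also be checked with base R's `dnbinom()`. The sketch below is a cross-check with illustrative variable names, not a replacement for the calculation above.

```r
# Negative binomial cross-check for part (a).
# dnbinom(x, size, prob) gives the probability of x failures before the
# size-th success, so "3rd success on the 10th try" means x = 7 failures.
p_serve <- 0.15
p_third_on_10th <- dnbinom(7, size = 3, prob = p_serve)

# Same value built by hand: 2 successes in the first 9 tries, then a
# success on the 10th.
p_by_hand <- dbinom(2, size = 9, prob = p_serve) * p_serve

# Part (b): with independent serves, the first nine outcomes do not
# change the chance on the 10th serve.
p_tenth_given_history <- p_serve

c(part_a = p_third_on_10th, by_hand = p_by_hand, part_b = p_tenth_given_history)
```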