The following are the graded problems for the Distributions of Random Variables chapter of OpenIntro Statistics. My answers to the questions are in bold, and each header corresponds to a question.
In triathlons, it is common for racers to be placed into age and gender groups. Friends Leo and Mary both completed the Hermosa Beach Triathlon, where Leo competed in the Men, Ages 30 - 34 group while Mary competed in the Women, Ages 25 - 29 group. Leo completed the race in 1:22:28 (4948 seconds), while Mary completed the race in 1:31:53 (5513 seconds). Obviously Leo finished faster, but they are curious about how they did within their respective groups. Can you help them? Here is some information on the performance of their groups:
\[ N_{\text{men 30 to 34}}(\mu=4313, \sigma=583) \] \[ N_{\text{women 25 to 29}}(\mu=5261, \sigma=807) \]
z_leo <- (4948-4313)/583
z_leo
## [1] 1.089194
z_mary <- (5513-5261)/807
z_mary
## [1] 0.3122677
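Did Leo or Mary rank better in their respective groups? Explain your reasoning. Mary, because a lower time is better and her Z score (0.31) is lower than Leo's (1.09). She finished 0.31 standard deviations slower than her group's mean, while Leo finished 1.09 standard deviations slower than his.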
What percent of the triathletes did Leo finish faster than in his group? About 14%
# The triathletes Leo beat have longer times, so this is the area to the right of Leo's time
1 - pnorm(z_leo)
## [1] 0.1380342
1 - pnorm(z_mary)
## [1] 0.3774186
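As a cross-check, the same tail areas can be computed without standardizing by hand, since `pnorm()` accepts `mean` and `sd` arguments and `lower.tail = FALSE` returns the area to the right. This is only a minimal sketch; the names `p_leo_beat` and `p_mary_beat` are mine and do not appear in the original solution.

```r
# Share of each group with a slower (larger) finishing time,
# computed directly from the raw times in seconds.
p_leo_beat  <- pnorm(4948, mean = 4313, sd = 583, lower.tail = FALSE)
p_mary_beat <- pnorm(5513, mean = 5261, sd = 807, lower.tail = FALSE)

# Round for display; these should agree with 1 - pnorm(z) above
round(c(leo = p_leo_beat, mary = p_mary_beat), 4)
```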
Below are heights of 25 female college students.
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
54 | 55 | 56 | 56 | 57 | 58 | 58 | 59 | 60 | 60 | 60 | 61 | 61 | 62 | 62 | 63 | 63 | 63 | 64 | 65 | 65 | 67 | 67 | 69 | 73 |
heights <- c(54, 55, 56, 56, 57, 58, 58, 59, 60, 60, 60, 61, 61, 62, 62, 63, 63, 63, 64, 65, 65, 67, 67, 69, 73)
mu <- mean(heights)
sigma <- sd(heights)
for (n_sigma in 1:3) {
  # Flag heights strictly within n_sigma standard deviations of the mean
  within <- heights > (mu - n_sigma * sigma) & heights < (mu + n_sigma * sigma)
  # The mean of a logical vector is the share of TRUEs
  print(mean(within))
}
## [1] 0.68
## [1] 0.96
## [1] 1
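To judge these shares against the 68-95-99.7% rule, it may help to place them next to the exact areas a normal distribution puts within 1, 2 and 3 standard deviations. The sketch below recomputes the observed shares and pairs them with `pnorm(k) - pnorm(-k)`; the names `observed`, `expected` and `k` are my own.

```r
heights <- c(54, 55, 56, 56, 57, 58, 58, 59, 60, 60, 60, 61, 61, 62, 62,
             63, 63, 63, 64, 65, 65, 67, 67, 69, 73)
mu <- mean(heights)
sigma <- sd(heights)
k <- 1:3

# Observed share of heights strictly within k standard deviations of the mean
observed <- sapply(k, function(n_sigma) mean(abs(heights - mu) < n_sigma * sigma))

# Share a normal distribution places within k standard deviations
expected <- pnorm(k) - pnorm(-k)

round(data.frame(k, observed, expected), 4)
```

The observed shares (0.68, 0.96, 1) sit close to the normal-model values of roughly 0.683, 0.954 and 0.997, which is consistent with the heights being approximately normal.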
A machine that produces a special type of transistor (a component of computers) has a 2% defective rate. The production is considered a random process where each transistor is independent of the others.
probability_of_defect <- 0.02
probability_of_10th_first_defect <- 1
# Nine non-defective transistors in a row
for (i in 1:9) {
  probability_of_10th_first_defect <- probability_of_10th_first_defect * (1 - probability_of_defect)
}
# Followed by a defective tenth transistor
probability_of_10th_first_defect <- probability_of_10th_first_defect * probability_of_defect
probability_of_10th_first_defect
## [1] 0.01667496
This matches the geometric distribution formula, \((1-p)^{n-1}\times p\), with \(n = 10\) and \(p = 0.02\):
((1 - 0.02)^(10-1))*0.02
## [1] 0.01667496
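Base R also ships the geometric density as `dgeom()`, which could serve as a third check. Note that `dgeom()` is parameterized by the number of failures before the first success, so the 10th transistor being the first defect corresponds to 9 failures. The names `p_defect` and `p_10th_first` are mine.

```r
p_defect <- 0.02

# dgeom(x, prob) is the probability of x failures before the first success,
# i.e. (1 - prob)^x * prob. Nine good transistors precede the first defect.
p_10th_first <- dgeom(9, prob = p_defect)
p_10th_first
```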
probability_of_no_defect <- 1
# One hundred non-defective transistors in a row
for (i in 1:100) {
  probability_of_no_defect <- probability_of_no_defect * (1 - probability_of_defect)
}
probability_of_no_defect
## [1] 0.1326196
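The loop above multiplies \(1 - p\) by itself 100 times, so the same number should come out of a single power, or from the binomial density with zero defects in 100 trials. A short sketch, with variable names of my own choosing:

```r
p_defect <- 0.02

# Closed form: 100 independent non-defective transistors
p_none_power <- (1 - p_defect)^100

# Same event as "0 defects in 100 trials" under a binomial model
p_none_binom <- dbinom(0, size = 100, prob = p_defect)

c(power = p_none_power, binomial = p_none_binom)
```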
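# SD of the number of transistors produced up to and including the first defect
# (geometric distribution): sqrt((1 - p) / p^2), assuming that is what the question asks for
sigma <- sqrt((1 - probability_of_defect) / probability_of_defect^2)
sigma
## [1] 49.49747
# Same calculation for a 5% defective rate
sigma <- sqrt((1 - 0.05) / 0.05^2)
sigma
## [1] 19.49359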
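Assuming the quantity of interest here is the wait until the first defective transistor, the standard geometric results are a mean of \(1/p\) and a standard deviation of \(\sqrt{(1-p)/p^2}\). The simulation below is a rough sanity check of those closed forms for the 2% machine. The seed, the sample size and the variable names are arbitrary choices of mine.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible
p_defect <- 0.02

# rgeom() returns the number of non-defective items before the first defect,
# so add 1 to get the position of the first defective transistor.
wait <- rgeom(1e5, prob = p_defect) + 1

# Simulated values should land near the closed forms (50 and about 49.5)
c(simulated_mean = mean(wait), closed_form_mean = 1 / p_defect)
c(simulated_sd = sd(wait), closed_form_sd = sqrt(1 - p_defect) / p_defect)
```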
While it is often assumed that the probabilities of having a boy or a girl are the same, the actual probability of having a boy is slightly higher at 0.51. Suppose a couple plans to have 3 kids.
k <- 2
n <- 3
p <- 0.51
((factorial(n)/(factorial(k)*factorial(n-k))) * p^k)*(1-p)^(n-k)
## [1] 0.382347
p_gbb <- (1 - p) * p * p
p_bgb <- p * (1 - p) * p
p_bbg <- p * p * (1 - p)
p_gbb + p_bgb + p_bbg
## [1] 0.382347
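Both routes above are the binomial probability of exactly 2 boys in 3 births, so base R's `dbinom()` and `choose()` should reproduce the result. A minimal sketch; `p_two_boys` and `p_manual` are names I introduced.

```r
n <- 3     # children
k <- 2     # boys
p <- 0.51  # probability of a boy

# Built-in binomial density
p_two_boys <- dbinom(k, size = n, prob = p)

# Same formula with choose() in place of the factorials
p_manual <- choose(n, k) * p^k * (1 - p)^(n - k)

c(dbinom = p_two_boys, manual = p_manual)
```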
A not-so-skilled volleyball player has a 15% chance of making the serve, which involves hitting the ball so it passes over the net on a trajectory such that it will land in the opposing team’s court. Suppose that her serves are independent of each other.
# Compute the chance of exactly 2 successful serves in the first 9 attempts
k <- 2
n <- 9
p <- 0.15
p_2_out_of_9 <- ((factorial(n)/(factorial(k)*factorial(n-k))) * p^k)*(1-p)^(n-k)
# Now factor in the chance of her 10th being a success
p_2_out_of_9 * p
## [1] 0.03895012
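The calculation above (two successes somewhere in the first nine serves, then a success on the tenth) is the negative binomial probability of the 3rd success arriving on the 10th trial. In base R, `dnbinom()` is parameterized by the number of failures before the target number of successes, which is 7 here. The variable name `p_third_on_tenth` is mine.

```r
p <- 0.15

# dnbinom(x, size, prob): probability of x failures before the size-th success.
# Third success on the 10th serve means 7 failed serves before it.
p_third_on_tenth <- dnbinom(7, size = 3, prob = p)

# Same value from the binomial-times-p construction used above
p_check <- dbinom(2, size = 9, prob = p) * p

c(dnbinom = p_third_on_tenth, binomial_times_p = p_check)
```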
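Suppose she has made two successful serves in nine attempts. What is the probability that her 10th serve will be successful? 15%, because each serve is independent of the others.
Even though parts (a) and (b) discuss the same scenario, the probabilities you calculated should be different. Can you explain the reason for this discrepancy? The probability in (b) is conditional: the outcomes of the first nine serves are already known, so only the 10th serve is random, and by independence its chance of success is 0.15. The probability in (a) covers the whole sequence of ten serves before any have been made: exactly two successes in the first nine and then a success on the 10th. That joint probability is much smaller.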