# Load standard libraries
library(tidyverse)
In triathlons, it is common for racers to be placed into age and gender groups. Fred and Catarina both completed the Hermosa Beach Triathlon, where Fred competed in the Men, Ages 30 - 34 group while Catarina competed in the Women, Ages 25 - 29 group. Fred completed the race in 1:22:28 (4948 seconds), while Catarina completed the race in 1:31:53 (5513 seconds). They are curious about how they did within their respective groups.
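Here is some information on the performance of their groups: finishing times in the Men, Ages 30 - 34 group have a mean of 4313 seconds and a standard deviation of 583 seconds, and finishing times in the Women, Ages 25 - 29 group have a mean of 5261 seconds and a standard deviation of 807 seconds. The calculations that follow treat both distributions as approximately normal.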
Remember: a better performance corresponds to a faster finish.
z_Fred <- (4948-4313)/583
z_Catarina <- (5513-5261)/807
cat("\nZ Score for Fred = ", z_Fred)
##
## Z Score for Fred = 1.089194
cat("\nZ Score for Catarina = ", z_Catarina)
##
## Z Score for Catarina = 0.3122677
pnorm(4948,4313,583)
## [1] 0.8619658
pnorm(5513,5261,807)
## [1] 0.6225814
Under the normal model (cumulative distribution function) for the men's finishing times, Fred's time falls at about the 86th percentile: it is longer than the times of about 86% of the racers in his group. Because a shorter time is a better performance, about 86% of his group did better in the race than Fred.
Under the normal model (cumulative distribution function) for the women's finishing times, Catarina's time falls at about the 62nd percentile: it is longer than the times of about 62% of the racers in her group. Therefore, about 62% of her group did better in the race than Catarina.
1-pnorm(4948,4313,583)
## [1] 0.1380342
1-pnorm(5513,5261,807)
## [1] 0.3774186
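The z-scores, percentiles and upper-tail fractions computed so far are three views of the same calculation. The following is a minimal sketch that puts them side by side; the helper name `summarise_finish` is hypothetical, and it assumes, as the `pnorm()` calls above do, that finishing times in each group are approximately normal.

```r
# Hypothetical helper: z-score plus both tail shares for one finishing time,
# assuming the group's finishing times are approximately normal.
summarise_finish <- function(time, mean, sd) {
  z <- (time - mean) / sd
  c(z = z,
    slower_than = pnorm(z),                      # share of the group with shorter times
    faster_than = pnorm(z, lower.tail = FALSE))  # share of the group with longer times
}

fred <- summarise_finish(4948, mean = 4313, sd = 583)
catarina <- summarise_finish(5513, mean = 5261, sd = 807)
round(rbind(Fred = fred, Catarina = catarina), 4)
```

Catarina has the smaller z-score, so relative to her own group she ranked better than Fred did in his, even though her raw time is longer.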
In the following situations we assume that half of the specified population is male and the other half is female.
cat("\nProbability of sampling two females in a row when sampling with replacement =",(5/10)*(5/10))
##
## Probability of sampling two females in a row when sampling with replacement = 0.25
cat("\nProbability of sampling two females in a row when sampling without replacement =",(5/10)*(4/9))
##
## Probability of sampling two females in a row when sampling without replacement = 0.2222222
cat("\nProbability of sampling two females in a row when sampling with replacement =",0.5*0.5)
##
## Probability of sampling two females in a row when sampling with replacement = 0.25
cat("\nProbability of sampling two females in a row when sampling without replacement =",(0.5)*(4999/9999))
##
## Probability of sampling two females in a row when sampling without replacement = 0.249975
This assumption holds and is demonstrated in the parts above. When the population was small (10 people), the probability of drawing two females in a row differed noticeably between sampling with replacement (0.25) and without replacement (about 0.222). When the population was large (10000 people), the two probabilities were nearly the same (0.25 versus about 0.249975).
Thus, it is reasonable to treat individuals who are sampled from a large population as independent for all practical purposes.
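To see how quickly the two sampling schemes converge, the same arithmetic can be repeated for several population sizes. This is a minimal sketch; the helper `p_two_females` and the intermediate sizes 100 and 1000 are illustrative assumptions, and each population is taken to be exactly half female.

```r
# Hypothetical helper: exact probability of drawing two females in a row
# from a population of size n that is exactly half female.
p_two_females <- function(n, replace = TRUE) {
  females <- n / 2
  if (replace) {
    (females / n)^2                              # second draw sees the same population
  } else {
    (females / n) * ((females - 1) / (n - 1))    # one female already removed
  }
}

sizes <- c(10, 100, 1000, 10000)
data.frame(
  n = sizes,
  with_replacement = p_two_females(sizes, replace = TRUE),
  without_replacement = p_two_females(sizes, replace = FALSE)
)
```

The gap between the two columns shrinks as `n` grows, which is the pattern the small and large populations above illustrate.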
You are given the following hypotheses: \(H_0: \mu = 34\), \(H_A: \mu > 34\). We know that the sample standard deviation is 10 and the sample size is 65. For what sample mean would the p-value be equal to 0.05? Assume that all conditions necessary for inference are satisfied.
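z <- qnorm(0.95) # an upper-tail p-value of 0.05 puts the cutoff at the 95th percentile
# standard error: se = s / sqrt(n), with sample standard deviation s = 10 and n = 65
se <- 10/(65^0.5)
# z = (x - mu) / se
# => x = z * se + mu, with mu = 34 under the null hypothesis
x <- (z*se) + 34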
x
## [1] 36.04019
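As a sanity check, the cutoff can be fed back through `pnorm()` to confirm that its upper-tail probability is 0.05. This is a minimal sketch using the same normal approximation as the calculation above; the variable names `mu0`, `s`, `n`, `x_bar` and `p_value` are illustrative.

```r
# Recompute the cutoff sample mean, then check its upper-tail p-value.
mu0 <- 34          # mean under the null hypothesis
s <- 10            # sample standard deviation
n <- 65            # sample size
se <- s / sqrt(n)  # standard error of the sample mean

x_bar <- mu0 + qnorm(0.95) * se
p_value <- pnorm(x_bar, mean = mu0, sd = se, lower.tail = FALSE)
c(x_bar = x_bar, p_value = p_value)
```

Because the standard deviation is estimated from the sample, a t-distribution with 64 degrees of freedom (`qt(0.95, df = 64)`) is an alternative that would give a slightly larger cutoff. That is a different modeling choice from the one made above.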