Before we start, let’s write some functions to draw a normal distribution with certain areas of the plot shaded based on z-score.
Modified code from here:
http://rstudio-pubs-static.s3.amazonaws.com/78857_86c2403ca9c146ba8fcdcda79c3f4738.html
# Draw a standard normal density curve over the range -max_abs_sd to +max_abs_sd.
plot_setup <- function(mytitle, max_abs_sd){
  plot(seq(-1*max_abs_sd, max_abs_sd, length=200),
       dnorm(seq(-1*max_abs_sd, max_abs_sd, length=200), mean=0, sd=1),
       type="l", xlab="Z-score", ylab="Density", main=mytitle)
  abline(h=0)
}
# Shade the area under the standard normal curve below ("less") or above
# ("more") a given z-score.
shade_more_or_less_than_z <- function(z, direction, max_abs_sd){
  if(direction == "less"){
    x = seq(-1*max_abs_sd, z, length=100)
    y = dnorm(x, mean=0, sd=1)
    polygon(c(-1*max_abs_sd, x, z), c(0, y, 0), col="grey")
  }
  if(direction == "more"){
    x = seq(z, max_abs_sd, length=100)
    y = dnorm(x, mean=0, sd=1)
    polygon(c(z, x, max_abs_sd), c(0, y, 0), col="grey")
  }
}
# Shade the area under the standard normal curve between two z-scores.
shade_between_z <- function(z1, z2){
  x = seq(z1, z2, length=100)
  y = dnorm(x, mean=0, sd=1)
  polygon(c(z1, x, z2), c(0, y, 0), col="grey")
}
A function to compute the factorial of a number may also be useful.
Inspired by the code here: https://www.datamentor.io/r-programming/examples/factorial
# Compute num! with a simple loop (returns 1 for num = 0).
# Note that this masks base R's built-in factorial().
factorial <- function(num){
  factorial = 1
  if(num >= 1){
    for(i in 1:num) {factorial = factorial * i}
  }
  return(factorial)
}
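Since this definition masks base R's built-in factorial(), here is a quick sanity check against the base version, just to confirm the loop behaves as expected:
factorial(5)          # 120
base::factorial(5)    # also 120
factorial(0)          # 1, since the loop is skipped when num < 1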
# Draw the four shaded regions (a-d) in a 2x2 grid of plots.
par(mfrow=c(2,2))
plot_setup("a",4)
shade_more_or_less_than_z(-1.13,"more",4)
plot_setup("b",4)
shade_more_or_less_than_z(0.18,"less",4)
plot_setup("c",10)
shade_more_or_less_than_z(8,"more",10)
abline(v=8,lty=2)
plot_setup("d",4)
shade_between_z(-0.5,0.5)
proportion_a <- 1 - pnorm(-1.13)           # area above z = -1.13
proportion_b <- pnorm(0.18)                # area below z = 0.18
proportion_c <- 1 - pnorm(8)               # area above z = 8
proportion_d <- pnorm(0.5) - pnorm(-0.5)   # area between z = -0.5 and z = 0.5
round(proportion_a*100,digits=2)
## [1] 87.08
round(proportion_b*100,digits=2)
## [1] 57.14
signif(proportion_c*100,3)
## [1] 6.66e-14
round(proportion_d*100,digits=2)
## [1] 38.29
1. a. For Leo's group, mu = 4313 and sigma = 583; for Mary's group, mu = 5261 and sigma = 807.
2. b. See code. Leo's z-score of 1.1 indicates that his finishing time was 1.1 standard deviations above his group's mean. Mary's z-score of 0.3 indicates that her finishing time was 0.3 standard deviations above her group's mean.
3. c. Mary performed better within her group. A lower z-score is better here, since a lower finishing time is better, so Mary's z-score of 0.3 is better than Leo's z-score of 1.1.
4. d. See code. The pnorm function gives the proportion of triathletes who finished faster than Leo, so we take 1 minus this value to get the proportion that Leo finished faster than. We find that Leo finished faster than 14% of the triathletes in his group.
5. e. See code. Mary finished faster than 38% of the triathletes in her group.
6. f. If the distributions were not normal, the answer to b would stay the same, and the answers to d and e would definitely change. The answer to c may or may not change, depending on whether the men's and women's finishing times follow similarly shaped distributions. The answers to d and e would change because we could no longer assume that the proportion of people below or above a given z-score matches what a normal distribution predicts.
# Compute a z-score: how many standard deviations a value lies from the mean.
zscore <- function(value, mu, sigma){
  deviation <- value - mu
  return(deviation/sigma)
}
zscore_Leo <- zscore(4948,4313,583)
round(zscore_Leo,digits=3)
## [1] 1.089
zscore_Mary <- zscore(5513,5261,807)
round(zscore_Mary,digits=3)
## [1] 0.312
round((1 - pnorm(zscore_Leo))*100)
## [1] 14
round((1 - pnorm(zscore_Mary))*100)
## [1] 38
For 25 individuals, the 68-95-99.7% rule would mean that about 17 individuals (68% of 25) should be within 1 standard deviation of the mean.
95% of 25 is 23.75, so at least 23 (and more likely 24) individuals should be within 2 standard deviations of the mean.
99.7% of 25 is 24.925, so we would expect all individuals to be within 3 standard deviations of the mean under this rule.
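As a quick check of this arithmetic, the expected counts under the rule can be computed directly:
# Expected number of the 25 individuals within 1, 2 and 3 standard deviations
# of the mean: 17, 23.75 and 24.925.
c(0.68, 0.95, 0.997) * 25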
heights <- c(54, 55, 56, 56, 57, 58, 58, 59, 60, 60, 60, 61, 61, 62, 62, 63, 63, 63, 64, 65, 65, 67, 67, 69, 73)
# Compute a z-score for each height, using the sample mean and standard deviation.
height_zscores <- rep(NA, times=length(heights))
for(i in 1:length(heights)){
  height_zscores[i] <- zscore(heights[i], mean(heights), sd(heights))
}
length(which(abs(height_zscores) < 1))
## [1] 17
length(which(abs(height_zscores) < 2))
## [1] 24
length(which(abs(height_zscores) < 3))
## [1] 25
The heights do indeed appear to follow the 68-95-99.7% rule.
The data also appear to follow a normal distribution according to the graphs provided: the normal probability plot lies close to a straight line, and the histogram is unimodal and symmetric. There is one mild outlier at the right end of the normal probability plot, but overall the heights still appear to be normally distributed.
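The graphs were provided with the question, but similar plots could be produced from the heights vector along these lines (a rough sketch, not the original figure code):
# Histogram and normal probability (Q-Q) plot of the heights.
par(mfrow=c(1,2))
hist(heights, xlab="Height", main="Heights")
qqnorm(heights)
qqline(heights)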
# Probability (as a percentage) that the 10th item inspected is the first one
# with a defect: nine non-defective items (0.98 each) followed by one
# defective item (0.02).
percent_probability_tenth_is_first_with_defect <- (.98^9 * .02)*100
round(percent_probability_tenth_is_first_with_defect,digits=2)
## [1] 1.67
# Probability (as a percentage) that a batch of 100 items contains no defects.
percent_probability_100_with_no_defect <- (0.98^100)*100
round(percent_probability_100_with_no_defect,digits=2)
## [1] 13.26
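These values can also be cross-checked with base R's geometric distribution functions. Note that dgeom() and pgeom() count the number of failures (non-defective items) before the first success (a defect), which is how we map the problem onto them here:
# dgeom(9, 0.02): probability of exactly 9 non-defective items before the
# first defect, i.e. the 10th item is the first defective one.
dgeom(9, prob=0.02)*100
# 1 - pgeom(99, 0.02): probability of more than 99 non-defective items before
# the first defect, i.e. no defects among the first 100 items.
(1 - pgeom(99, prob=0.02))*100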
# Mean and standard deviation of the geometric distribution: the expected
# number of trials until the first success is 1/p, with standard deviation
# sqrt((1-p)/p^2).
mean_events_before_success <- function(myp){
  return(1/myp)
}
sd_events_before_success <- function(myp){
  myconverse = 1 - myp
  return(sqrt(myconverse/(myp^2)))
}
mean_events_before_success(.02)
## [1] 50
sd_events_before_success(.02)
## [1] 49.49747
mean_events_before_success(.05)
## [1] 20
sd_events_before_success(.05)
## [1] 19.49359
Increasing the probability of success decreases both the mean and the standard deviation of the wait time until the first success.
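To illustrate this trend, the two helper functions can be evaluated over a few arbitrary probabilities:
# Mean and SD of the wait time until the first success for several values of p;
# both shrink as p grows.
p_values <- c(0.02, 0.05, 0.1, 0.25, 0.5)
sapply(p_values, mean_events_before_success)
sapply(p_values, sd_events_before_success)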
# Binomial probability of exactly k successes in n trials with success
# probability p: choose(n, k) * p^k * (1-p)^(n-k).
binomial_probability <- function(k, n, p){
  n_minus_k <- n - k
  converse_p <- 1 - p
  permutations <- factorial(n)/(factorial(k)*factorial(n_minus_k))
  return(permutations*p^k*converse_p^n_minus_k)
}
binomial_probability(2,3,0.51)
## [1] 0.382347
Having two boys could occur as BBG, BGB, or GBB. So we get the probability like so.
0.51 * 0.51 * 0.49 + 0.51 * 0.49 * 0.51 + 0.49 * 0.51 * 0.51
## [1] 0.382347
Exact same result as using the binomial probability formula.
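As another cross-check, base R's dbinom() computes the same binomial probability directly:
# dbinom(2, 3, 0.51): probability of exactly 2 boys among 3 children, with
# each child being a boy with probability 0.51.
dbinom(2, size=3, prob=0.51)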
If we were to instead calculate the probability of 3 boys out of 8, we would be looking at a lot more permutations.
The number of permutations is given by the first part of the formula. Let’s calculate for this instance.
# Number of orderings of k successes among n trials: choose(n, k),
# i.e. n! / (k! * (n-k)!).
num_permutations_in_binomial_probability <- function(k, n){
  n_minus_k <- n - k
  permutations <- factorial(n)/(factorial(k)*factorial(n_minus_k))
  return(permutations)
}
num_permutations_in_binomial_probability(3,8)
## [1] 56
We would have to write out 56 different terms, one for each permutation, and add them together.
The probability of each permutation is the same (0.51 * 0.51 * 0.51 * 0.49 * 0.49 * 0.49 * 0.49 * 0.49), just with the factors in a different order each time, so it makes sense to simply multiply this probability by 56.
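This count, and the resulting probability of 3 boys out of 8 children, can be cross-checked against base R's choose() and dbinom():
# choose(8, 3) counts the orderings directly and should equal 56.
choose(8, 3)
# Probability of exactly 3 boys out of 8 children: 56 identical terms,
# i.e. 56 * 0.51^3 * 0.49^5.
dbinom(3, size=8, prob=0.51)
56 * 0.51^3 * 0.49^5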
# Negative binomial probability that the k-th success occurs on the n-th
# trial: choose(n-1, k-1) * p^k * (1-p)^(n-k).
negative_binomial_probability <- function(k, n, p){
  coefficient_numerator <- factorial(n - 1)
  coefficient_denominator <- factorial(k - 1)*factorial(n - k)
  coefficient <- coefficient_numerator/coefficient_denominator
  n_minus_k <- n - k
  converse_p <- 1 - p
  return(coefficient*p^k*converse_p^n_minus_k)
}
negative_binomial_probability(3,10,0.15)
## [1] 0.03895012
There is a 3.895% chance that on the 10th try she will make her 3rd successful serve.
Compare this to the scenario where she has already made two successful serves in nine attempts. Here, the probability that her 10th serve will be successful is much higher, at 15%.
In the second scenario, we are only asking for the probability of a single event (making the serve), given that the previous events (making two serves in her first nine attempts) have already occurred. In the full negative binomial probability, we are not assuming that she has already made those first two serves. The probability of her making 2 serves in 9 attempts is given by the binomial probability with k = 2 and n = 9, which is about 0.26. If we multiply this by the probability that any given serve is a success (0.15), we get exactly the same value as the negative binomial equation gives us.
binomial_probability(2,9,0.15)
## [1] 0.2596674
binomial_probability(2,9,0.15)*0.15
## [1] 0.03895012
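Finally, the same result can be cross-checked against base R's dnbinom(), which parameterizes the negative binomial by the number of failures before the target number of successes (here, 7 unsuccessful serves before the 3rd successful one):
# dnbinom(7, size=3, prob=0.15): probability of exactly 7 failures before the
# 3rd success, i.e. the 3rd successful serve comes on the 10th attempt.
dnbinom(7, size=3, prob=0.15)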