A local weather app developer insists that 90% of London students carry an umbrella on days with a rain forecast. A data-minded student observes 75 randomly chosen students on a rainy morning and counts 61 umbrellas. Is this evidence that fewer than 90% actually come prepared? Conduct a hypothesis test to check this.
Our Null Hypothesis is that the true proportion of London students that carry an umbrella is 0.9.
Our Alternative Hypothesis comes from the question, which asks whether “fewer than 90% actually come prepared” with an umbrella. The word “fewer” suggests we use an alternative that says the true proportion is less than 0.9. This is a one-sided alternative hypothesis.
To recap: Null Hypothesis: \(p = 0.9\) Alternative Hypothesis: \(p < 0.9\)
We record the proportion claimed by the null hypothesis as “p” below:
p <- 0.9
p
## [1] 0.9
Our sample comes from the fact that “75 students were observed, and 61 were carrying umbrellas.” This means our sample size is 75, and our sample proportion is 61/75. We record this below:
n <- 75
p_hat <- 61/75
p_hat
## [1] 0.8133333
The Central Limit Theorem says that the sampling distribution for proportions, with true proportion \(p\) and with sample size \(n\), is approximately a Normal distribution with center \(p\) and standard deviation \(\sqrt{\frac{p(1-p)}{n}}\). Using our formulas and our values of \(p\) and \(n\), we calculate:
Normal_mean <- p
Normal_sd <- sqrt(p*(1-p)/n)
Normal_mean
## [1] 0.9
Normal_sd
## [1] 0.03464102
We see that the model to use is the normal model with a mean of 0.9 and a standard deviation of 0.03464102.
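One way to make this Normal model less abstract is to simulate it. The sketch below is an illustration only, not part of the original solution: the seed, the number of repetitions, and the `sim_` variable names are all assumptions chosen for this example. It draws many samples of 75 students under the null value of 0.9 and compares the spread of the simulated sample proportions with the formula.

```r
# Simulation sketch: many samples of 75 students, assuming p = 0.9 is true.
set.seed(1)          # arbitrary seed, used only so the run is reproducible
sim_n <- 75          # sample size from the problem
sim_p <- 0.9         # proportion claimed by the null hypothesis
sim_reps <- 10000    # number of simulated samples (an assumption)

# Each draw is a count of umbrella carriers; dividing by n gives a proportion.
sim_p_hats <- rbinom(sim_reps, size = sim_n, prob = sim_p) / sim_n

mean(sim_p_hats)                    # should land close to 0.9
sd(sim_p_hats)                      # should land close to the value below
sqrt(sim_p * (1 - sim_p) / sim_n)   # the formula from the Normal model
```

The simulated mean and standard deviation will not match the formula exactly, but with this many repetitions they should agree to about two decimal places.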
We calculate the z-score of our sample value \(\hat{p}\) by using our z-score formula, which takes the sample value, subtracts the center of the normal model, and then divides by the standard deviation of the normal model. The computation is below:
p_hat_zscore <- (p_hat - Normal_mean)/Normal_sd
p_hat_zscore
## [1] -2.501851
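Written out with the rounded values from above, the same computation looks roughly like this (the rounding to four significant figures is only for display):

\[ z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \approx \frac{0.8133 - 0.9}{0.03464} \approx -2.50 \]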
So as we can see, the z-score is -2.501851. We calculate the corresponding P-value using the pnorm() function in R. We could also have computed it using a z-table.
P_value <- pnorm(p_hat_zscore, mean=0, sd=1)
## pnorm() gives us the area on the normal model to the LEFT of the z-score.
## This is what we need for this particular problem, since our alternative
## hypothesis was one-sided on the left.
## If our alternative was 2-sided, we would need to double (this number).
## If our alternative was 1-sided on the right, we would need to take 1-(this number)
P_value
## [1] 0.006177293
As we can see, the P-value is 0.006177293.
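As a cross-check on the hand computation, base R's `prop.test()` can run the same test in one call. This is a sketch of an alternative route, not the method used in this solution. Setting `correct = FALSE` switches off the continuity correction, which the z-test above does not use; the name `check` is just a label chosen for this example.

```r
# Cross-check with prop.test(); correct = FALSE turns off the continuity
# correction so the result lines up with the plain z-test.
check <- prop.test(x = 61, n = 75, p = 0.9,
                   alternative = "less", correct = FALSE)

check$p.value           # one-sided P-value
sqrt(check$statistic)   # size of the z-score (the X-squared statistic is z^2)
```

Note that `prop.test()` reports a chi-squared statistic, so its square root gives the magnitude of the z-score without its sign.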
Finally, we draw our conclusion. A P-value of 0.006 means that IF THE NULL HYPOTHESIS WERE TRUE, so that the true proportion of London students that carry an umbrella was 0.9, then a sample that looks like ours (where 61 out of 75 students carried an umbrella), or one that is more extreme than ours, would be quite rare. It would occur only about 0.6% of the time. This is less than 1%, and is well below our alpha threshold of 0.05 (or 5%).
This leads us to reject the null hypothesis in favor of the alternative hypothesis. The data suggest that the true proportion of students who carry umbrellas is less than 0.9, which would explain why our sample proportion was so low.
It is rumored that 20% of students ever fall asleep on the Tube. However, a sociology student interviews 80 London undergraduates and 23 admit to having done so. Is this evidence that something other than 20% of students nap underground? Construct a hypothesis test to check this.
Our Null Hypothesis is that the true proportion of students that fall asleep on the Tube is 0.2.
Our Alternative Hypothesis comes from the question, which asks whether “something other than 20% of students nap underground”. The phrase “something other than” suggests we use an alternative that says the true proportion is not equal to 0.2. This is a two-sided alternative hypothesis.
To recap: Null Hypothesis: \(p = 0.2\) Alternative Hypothesis: \(p \neq 0.2\)
We record the proportion claimed by the null hypothesis as “p” below:
p <- 0.2
p
## [1] 0.2
Our sample comes from the fact that “80 students were asked, and 23 slept on the tube.” This means our sample size is 80, and our sample proportion is 23/80. We record this below:
n <- 80
p_hat <- 23/80
p_hat
## [1] 0.2875
The Central Limit Theorem says that the sampling distribution for proportions, with true proportion \(p\) and with sample size \(n\), is approximately a Normal distribution with center \(p\) and standard deviation \(\sqrt{\frac{p(1-p)}{n}}\). Using our formulas and our values of \(p\) and \(n\), we calculate:
Normal_mean <- p
Normal_sd <- sqrt(p*(1-p)/n)
Normal_mean
## [1] 0.2
Normal_sd
## [1] 0.04472136
We see that the model to use is the normal model with a mean of 0.2 and a standard deviation of 0.04472136.
We calculate the z-score of our sample value \(\hat{p}\) by using our z-score formula, which takes the sample value, subtracts the center of the normal model, and then divides by the standard deviation of the normal model. The computation is below:
p_hat_zscore <- (p_hat - Normal_mean)/Normal_sd
p_hat_zscore
## [1] 1.956559
So as we can see, the z-score is 1.956559. We calculate the corresponding P-value using the pnorm() function in R. We could also have computed it using a z-table.
P_value <- 1-pnorm(p_hat_zscore, mean=0, sd=1)
## pnorm() gives us the area on the normal model to the LEFT of the z-score.
## Our z-score is positive, so the tail we need is on the RIGHT,
## which is why we take 1-pnorm() here.
## Since our alternative is 2-sided, we double this one-tail area (0.02519964).
P_value*2
## [1] 0.05039928
As we can see, the P-value is 0.0504.
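The two-sided P-value can also be written so that it does not depend on whether the z-score is positive or negative. The sketch below recomputes the z-score from the raw numbers so that it stands alone; the names `z_two_sided` and `p_two_sided` are chosen only for this illustration. It then cross-checks the result against base R's `prop.test()` with the continuity correction switched off.

```r
# Two-sided P-value that works for a z-score of either sign:
# take the tail beyond -|z| and double it.
z_two_sided <- (23/80 - 0.2) / sqrt(0.2 * 0.8 / 80)
p_two_sided <- 2 * pnorm(-abs(z_two_sided))
p_two_sided

# Cross-check with prop.test(), continuity correction switched off.
prop.test(x = 23, n = 80, p = 0.2,
          alternative = "two.sided", correct = FALSE)$p.value
```

Both lines should agree with the doubled tail area computed above.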
Finally, we can construct our conclusion. Because the P-value of 0.0504 is slightly greater than our alpha threshold of 0.05, we fail to reject the null hypothesis. There is not sufficient evidence to conclude that the proportion of Tube nappers differs from 20%. Failing to reject the null hypothesis does not prove that the proportion is exactly 20%.
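Both problems follow the same steps, so they can be collected into one function. The helper below is a hypothetical sketch, not something defined elsewhere in these notes: the name `one_prop_z_test` and its arguments are assumptions made for illustration. It uses the same Normal model and picks the tail according to the alternative hypothesis.

```r
# Hypothetical helper (not part of the original worked examples):
# one-proportion z-test using the Normal model described above.
one_prop_z_test <- function(x, n, p0,
                            alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  p_hat <- x / n                       # sample proportion
  se <- sqrt(p0 * (1 - p0) / n)        # sd of the Normal model under the null
  z <- (p_hat - p0) / se               # z-score of the sample proportion
  p_value <- switch(alternative,
    less      = pnorm(z),                      # area to the left
    greater   = pnorm(z, lower.tail = FALSE),  # area to the right
    two.sided = 2 * pnorm(-abs(z)))            # both tails
  list(p_hat = p_hat, z = z, p_value = p_value)
}

# The two problems from this section:
umbrella <- one_prop_z_test(61, 75, p0 = 0.9, alternative = "less")
tube <- one_prop_z_test(23, 80, p0 = 0.2, alternative = "two.sided")
umbrella$p_value
tube$p_value
```

The two printed values should match the P-values found by hand in each problem.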