Problem 1: Umbrella Optimism.

A local weather app developer insists that 90% of London students carry an umbrella on days with a rain forecast. A data-minded student observes 75 randomly chosen students on a rainy morning and counts 61 umbrellas. Is this evidence that fewer than 90% actually come prepared? Conduct a hypothesis test to check this.

(a)

Our Null Hypothesis is that the true proportion of London students that carry an umbrella is 0.9.

Our Alternative Hypothesis comes from the question, which asks whether “fewer than 90% actually come prepared” with an umbrella. The word “fewer” tells us to use an alternative that says the true proportion is less than 0.9. This is a one-sided alternative hypothesis.

To recap:

Null Hypothesis: \(p = 0.9\)
Alternative Hypothesis: \(p < 0.9\)

We record the hypothesized proportion (the value of \(p\) under the null hypothesis) as “p” below:

p <- 0.9

p
## [1] 0.9

(b)

Our sample comes from the problem statement: 75 students were observed, and 61 were carrying umbrellas. This means our sample size is 75, and our sample proportion is 61/75. We record this below:

n <- 75
p_hat <- 61/75

p_hat
## [1] 0.8133333

(c)

The Central Limit Theorem says that the sampling distribution for proportions, with true proportion \(p\) and with sample size \(n\), is approximately a Normal distribution with center \(p\) and standard deviation \(\sqrt{\frac{p(1-p)}{n}}\). Using this formula and our values of \(p\) and \(n\), we calculate:

Normal_mean <- p
Normal_sd <- sqrt(p*(1-p)/n)

Normal_mean
## [1] 0.9
Normal_sd
## [1] 0.03464102

We see that the model to use is the normal model with a mean of 0.9 and a standard deviation of 0.03464102.
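
As a rough sanity check on this model, we can simulate many samples and compare the spread of the simulated sample proportions with the formula. This is only a sketch: the seed, the number of simulations, and the names `sim_counts`, `sim_p_hat`, `sim_mean`, `sim_sd`, and `formula_sd` are choices made here for illustration.

```r
# Simulation sketch: draw many samples of size n from a population where the
# null value p = 0.9 holds, and look at the resulting sample proportions.
set.seed(1)          # arbitrary seed, only for reproducibility
p <- 0.9
n <- 75
n_sims <- 10000      # number of simulated samples (an arbitrary choice)

# rbinom() returns the umbrella count in each simulated sample
sim_counts <- rbinom(n_sims, size = n, prob = p)
sim_p_hat <- sim_counts / n

sim_mean <- mean(sim_p_hat)            # should land close to p
sim_sd <- sd(sim_p_hat)                # should land close to formula_sd
formula_sd <- sqrt(p * (1 - p) / n)    # the standard deviation from the formula

sim_mean
sim_sd
formula_sd
```

If the normal model is reasonable, the simulated mean and standard deviation should sit close to the values computed from the formula.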

(d)

We calculate the z-score of our sample value \(\hat{p}\) by using our z-score formula, which takes the sample value, subtracts the center of the normal model, and then divides by the standard deviation of the normal model. The computation is below:

p_hat_zscore <- (p_hat - Normal_mean)/Normal_sd

p_hat_zscore
## [1] -2.501851

As we can see, the z-score is -2.501851. We calculate the corresponding P-value using the pnorm() function in R. We could also have looked it up in a z-table.

P_value <- pnorm(p_hat_zscore, mean=0, sd=1)

## pnorm() gives us the area on the normal model to the LEFT of the z-score.

## This is what we need for this particular problem, since our alternative
## hypothesis was one-sided on the left.

## If our alternative was 2-sided, we would need to double (this number).
## If our alternative was 1-sided on the right, we would need to take 1-(this number)

P_value
## [1] 0.006177293

As we can see, the P-value is 0.006177293.
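
One way to double-check this by machine is base R's prop.test() function. The sketch below assumes we switch off the continuity correction (correct = FALSE) so that it matches the plain z-test; `check`, `z_from_check`, and `p_from_check` are names introduced here.

```r
# Cross-check with base R's prop.test(); correct = FALSE turns off the
# continuity correction so the result lines up with the plain z-test.
check <- prop.test(x = 61, n = 75, p = 0.9,
                   alternative = "less", correct = FALSE)

# prop.test() reports a chi-squared statistic, which equals z^2 here.
# The sample proportion is below 0.9, so we take the negative square root.
z_from_check <- -sqrt(unname(check$statistic))
p_from_check <- check$p.value

z_from_check
p_from_check
```

Both values should agree with the z-score and P-value computed by hand, up to rounding.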

(e)

Finally, we draw our conclusion. A P-value of 0.006 means that IF THE NULL HYPOTHESIS WERE TRUE, so that the true proportion of London students who carry an umbrella was 0.9, then a sample like ours (where 61 out of 75 students carried an umbrella), or one even more extreme, would be quite rare. It would occur only about 0.6% of the time. This is less than 1%, and is well below our alpha threshold of 0.05 (or 5%).

This leads us to reject the null hypothesis in favor of the alternative hypothesis. The data provide evidence that the true proportion of students who carry umbrellas is less than 0.9, which would explain why our sample proportion was so low.


Problem 2: Tube Nappers.

It is rumored that 20% of students have fallen asleep on the Tube at some point. However, a sociology student interviews 80 London undergraduates, and 23 admit to having done so. Is this evidence that something other than 20% of students nap underground?

(a)

The claim is “other than 20%,” so our alternative is two-sided.

Null Hypothesis: \(p = 0.20\)
Alternative Hypothesis: \(p \neq 0.20\)

p <- 0.20
p
## [1] 0.2

(b)

We have 80 students sampled, with 23 saying yes. So:

n <- 80
p_hat <- 23/80
p_hat
## [1] 0.2875

(c)

We use the normal model with:

Mean = \(p\)
SD = \(\sqrt{\frac{p(1-p)}{n}}\)

Normal_mean <- p
Normal_sd <- sqrt(p*(1-p)/n)

Normal_mean
## [1] 0.2
Normal_sd
## [1] 0.04472136

(d)

Compute the z-score and two-sided P-value:

p_hat_zscore <- (p_hat - Normal_mean)/Normal_sd
p_hat_zscore
## [1] 1.956559

Two-sided P-value:

P_value <- 2*(1 - pnorm(p_hat_zscore))
P_value
## [1] 0.05039928
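
The formula 2*(1 - pnorm(z)) works here because our z-score is positive; with a negative z-score it would return a number above 1. A version that works for either sign, sketched below, folds the z-score onto the left tail first. The names `z` and `P_value_two_sided` are introduced here for illustration.

```r
# Sign-agnostic two-sided P-value: move the z-score to the left tail with
# -abs(), then double the left-tail area.
z <- (23/80 - 0.20) / sqrt(0.20 * (1 - 0.20) / 80)
P_value_two_sided <- 2 * pnorm(-abs(z))

P_value_two_sided
```

For this problem the result matches the P-value computed with 2*(1 - pnorm(z)).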

(e)

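If the P-value is below 0.05, we reject the null hypothesis; otherwise we fail to reject. Our P-value of 0.0504 is just above the alpha threshold of 0.05, so we fail to reject the null hypothesis.
We interpret this P-value in context: if exactly 20% of all students nap on the Tube, a sample proportion at least as far from 0.20 as ours (23/80) would occur about 5% of the time. That is not quite unusual enough to count as evidence that the true proportion differs from 20%, although the result is borderline.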
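
The decision rule can be written as a small helper. This is a sketch: `decide` is a hypothetical name introduced here, and its default of 0.05 is the alpha threshold used in Problem 1.

```r
# Hypothetical helper: compare a P-value with a significance level alpha.
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) {
    "reject the null hypothesis"
  } else {
    "fail to reject the null hypothesis"
  }
}

# Two-sided P-value for 23 out of 80 against a null proportion of 0.20
P_value <- 2 * pnorm(-abs((23/80 - 0.20) / sqrt(0.20 * 0.80 / 80)))

decide(P_value)                 # uses alpha = 0.05
decide(P_value, alpha = 0.10)   # a looser threshold, shown only for comparison
```

Because the P-value sits so close to 0.05, the decision depends on the threshold chosen in advance: at alpha = 0.10 the same data would lead us to reject.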

Problem 3: Pret vs Tesco Lunch Wars.

To see whether Pret really dominates Tesco amongst London student lunches, 180 study-abroad students are surveyed. Of these, 100 say they prefer Pret A Manger over Tesco. Construct a 90% confidence interval for the proportion of students who prefer Pret over Tesco. We will do this in steps:

(a)

Here the sample proportion is:

p_hat <- 100/180
p_hat
## [1] 0.5555556

Our sample size is:

n <- 180
n
## [1] 180

(b)

The standard error is:

SE <- sqrt(p_hat*(1-p_hat)/n)

SE
## [1] 0.03703704

(c)

For a 90% confidence interval, the critical value is \(z^* = 1.645\).

z_star <- 1.645
z_star
## [1] 1.645
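
The value 1.645 is a rounded table value. As a sketch of where it comes from, the qnorm() function returns the z-score that leaves a given area to its left; `conf_level` and `z_star_exact` are names introduced here.

```r
# Critical value for a central confidence level: a 90% interval leaves
# (1 - 0.90)/2 = 0.05 in each tail, so we want the 0.95 quantile.
conf_level <- 0.90
z_star_exact <- qnorm(1 - (1 - conf_level) / 2)
z_star_exact          # agrees with 1.645 to three decimal places

# The same expression with a 95% level gives the familiar 1.96 (rounded)
qnorm(1 - (1 - 0.95) / 2)
```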

(d)

The full confidence interval is:

ME <- z_star * SE
Lower_int <- p_hat - ME
Upper_int <- p_hat + ME

Lower_int
## [1] 0.4946296
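Upper_int
## [1] 0.6164815

We are 90% confident that the true proportion of students who prefer Pret over Tesco is between about 0.495 and 0.616. Because this interval includes 0.5, the survey does not clearly show that a majority prefer Pret.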
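
Steps (a) through (d) can be bundled into one function. The sketch below is a hypothetical helper, `prop_ci`, not a base R function. It uses the exact qnorm() critical value rather than the rounded 1.645, so its endpoints differ very slightly from the ones computed step by step.

```r
# Hypothetical helper: z-interval for a single proportion.
# x = number of "yes" answers, n = sample size.
prop_ci <- function(x, n, conf_level = 0.95) {
  p_hat <- x / n
  SE <- sqrt(p_hat * (1 - p_hat) / n)          # standard error
  z_star <- qnorm(1 - (1 - conf_level) / 2)    # critical value
  ME <- z_star * SE                            # margin of error
  c(lower = p_hat - ME, upper = p_hat + ME)
}

pret_ci <- prop_ci(100, 180, conf_level = 0.90)
pret_ci
```

The same call with different counts and confidence level, for example prop_ci(30, 100, conf_level = 0.95), handles other problems of this type.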

Problem 4: The Traditional Sunday Roast.

A group of American students wants to know how many of their London classmates actually eat a traditional Sunday roast each week. They survey 100 students and find that 30 do. Construct a 95% confidence interval for the proportion of students who keep this weekly tradition.

(a)

Here the sample proportion is:

p_hat <- 30/100
p_hat
## [1] 0.3

Our sample size is:

n <- 100
n
## [1] 100

(b)

The standard error is:

SE <- sqrt(p_hat*(1-p_hat)/n)

SE
## [1] 0.04582576

(c)

For a 95% confidence interval, the critical value is \(z^* = 1.96\).

z_star <- 1.96
z_star
## [1] 1.96

(d)

The full confidence interval is:

ME <- z_star * SE
Lower_int <- p_hat - ME
Upper_int <- p_hat + ME

Lower_int
## [1] 0.2101815
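Upper_int
## [1] 0.3898185

We are 95% confident that between about 21% and 39% of students eat a traditional Sunday roast each week.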
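
To see what “95% confident” means in practice, we can simulate the whole procedure many times. This is a sketch under a loud assumption: we pretend the true proportion is 0.30, which we do not actually know. The seed, the number of simulations, and all variable names are choices made here for illustration.

```r
# Coverage sketch: build a 95% interval from each of many simulated samples
# and count how often the interval contains the assumed true proportion.
set.seed(2)          # arbitrary seed, only for reproducibility
true_p <- 0.30       # ASSUMED true proportion, for illustration only
n <- 100
n_sims <- 5000       # number of simulated surveys (an arbitrary choice)

x <- rbinom(n_sims, size = n, prob = true_p)
p_hat_sim <- x / n
SE_sim <- sqrt(p_hat_sim * (1 - p_hat_sim) / n)
lower <- p_hat_sim - 1.96 * SE_sim
upper <- p_hat_sim + 1.96 * SE_sim

# Fraction of simulated intervals that capture true_p
coverage <- mean(lower <= true_p & true_p <= upper)
coverage
```

The coverage should come out in the neighborhood of 0.95: the confidence level describes how often the method captures the true proportion over many samples, not the chance that one particular interval is right.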