One-sample means with the normal distribution
We’ve taken a random sample of 100 runners from a race called the Cherry Blossom Run in Washington, DC, which was a race with 16,924 participants. The sample data for the 100 runners is summarized in Table 4.1, histograms of the run time and age of participants are in Figure 4.2, and summary statistics are available in Table 4.3.
Goal: create a 95% confidence interval for the average time it takes runners in the Cherry Blossom Run to complete the race.
First, we import the dataset summarizing the finishing times, age, gender, state, and other data about each runner:
run10 <- read.delim("run10.txt")
View(run10)
For the case of a single mean, the standard error of the sample mean can be calculated as \[SE = \frac{\sigma}{\sqrt{n}}\]
where \(\sigma\) is the population standard deviation and \(n\) is the sample size. Generally we use the sample standard deviation, denoted by \(s\), in place of the population standard deviation when we compute the standard error: \[SE = \frac{s}{\sqrt{n}}.\]
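As a quick numerical check of this formula, the sketch below plugs in the sample standard deviation and the value of \(n\) that appear in the calculations later in this section (\(s = 15.92716\), \(n = 16924\)). The variable names are ours, chosen only for illustration.

```r
# Summary values reported later in this section (typed in, not recomputed from the data file)
s <- 15.92716   # sample standard deviation of time, in minutes
n <- 16924      # the n used in the z-score formula in this section

# Standard error of the sample mean: SE = s / sqrt(n)
se <- s / sqrt(n)
se
```

The result is a little over a tenth of a minute, which is why even a difference of about one minute between the sample mean and the null value produces a large test statistic.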
Is the typical US runner getting faster or slower over time? We consider this question in the context of the Cherry Blossom Run, comparing runners in 2006 and 2012. Technological advances in shoes, training, and diet might suggest runners would be faster in 2012. An opposing viewpoint might say that with the average body mass index on the rise, people tend to run slower. In fact, all of these components might be influencing run time. The average time for all runners who finished the Cherry Blossom Run in 2006 was \(93.29\) minutes (\(93\) minutes and about \(17\) seconds). Using data from 100 participants in the 2012 Cherry Blossom Run, we want to determine whether runners in this race are getting faster or slower, versus the other possibility that there has been no change.
- What are appropriate hypotheses for this context?
Unknown standard errors
If we want to run a hypothesis test, we will have a null hypothesis about the true value of the population mean \(\mu\). For example,
\[H_{0}: \mu = 93.29\text{ minutes}\]
Now we gather a sample and compute the sample mean:
mean(run10$time)
[1] 94.51919
We would like to be able to compare the sample mean \(\bar{y}\) to the hypothesized value 93.29 using a z score:
\[z = \frac{(\bar{y} - \mu)}{\sigma/\sqrt{n}} = \frac{(94.519 - 93.29)}{\sigma/\sqrt{16924}}.\]
However, we have a problem: we don’t know the true value of \(\sigma\). We just know the standard deviation of our sample, not of all US runners.
The best we can do with a sample is calculate this z score replacing the unknown \(\sigma\) with the sample standard deviation \(s\), 15.9271567.
sd(run10$time)
[1] 15.92716
\[z = \frac{(\bar{y} - \mu)}{s/\sqrt{n}} = \frac{(94.519-93.29)}{15.93/\sqrt{16924}} = 10.04.\]
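The arithmetic in this formula can be reproduced in a few lines of R. This is a minimal sketch that types in the summary values printed above rather than reading run10.txt; the names y_bar and mu0 are ours.

```r
# Values copied from the output above
y_bar <- 94.51919   # sample mean of time, in minutes
mu0   <- 93.29      # null value: the 2006 average, in minutes
s     <- 15.92716   # sample standard deviation
n     <- 16924      # the n used in the formula above

# z score with s standing in for the unknown sigma
z <- (y_bar - mu0) / (s / sqrt(n))
round(z, 2)
```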
- Using this z-score, decide whether to reject or fail to reject the null hypothesis.
Introducing the \(t\) distribution
The problem is that \(s\) is not a perfect estimate of \(\sigma\). We saw earlier that \(s\) is usually close to \(\sigma\), but \(s\) has its own sampling variability. That means the test above, which treated \(\sigma\) as known and equal to 15.93, does not match the situation we actually face when we run a hypothesis test.
In order to address this issue, we’ll use a distribution that’s similar to a normal distribution (it looks like a bell curve) but has a somewhat different shape. This distribution is called the \(t\) distribution and is especially useful for smaller datasets.
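One way to see the difference in shape is to compare cutoff values. The sketch below, using only base R, finds the value that leaves 2.5% in the upper tail for a few \(t\) distributions and for the standard normal. The degrees of freedom chosen here (5, 30, 100) are arbitrary illustrations, not values from the race data.

```r
# Illustrative degrees of freedom (our choice, not from the data)
dfs <- c(5, 30, 100)

# Upper 2.5% cutoffs: t distributions versus the standard normal
t_cutoffs <- qt(0.975, df = dfs)
z_cutoff  <- qnorm(0.975)

data.frame(df = dfs, t_cutoff = t_cutoffs, z_cutoff = z_cutoff)
```

The \(t\) cutoffs are always larger than the normal cutoff because the \(t\) distribution has thicker tails, and they shrink toward the normal value as the degrees of freedom grow.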
Since time is a numerical variable, we can also summarize it using favstats:
library(mosaic)
favstats(~ time, data = run10)
Here is a histogram.
ggplot(run10, aes(x = time)) +
    geom_histogram(binwidth = 5)

And here is a QQ plot.
ggplot(run10, aes(sample = time)) +
    geom_qq()

- Does it appear that the time it took for runners to complete the Cherry Blossom Run follows a normal distribution? Explain your answer.
Even though the time seems to follow a normal distribution, it’s safest to use a \(t\) distribution as our sampling distribution in order to account for our error in estimating the population standard deviation \(\sigma\) using the sample standard deviation \(s\).
The \(t\) distribution has only one defining characteristic: the number of degrees of freedom \(df\). The number of degrees of freedom is one less than the sample size:
\[ df=n-1 \]
Compute and report the test statistic.
library(broom)
time_test <- t.test(~ time, data = run10, mu = 93.29)
time_test_tidy <- tidy(time_test)
time_test_tidy
Our test statistic (the equivalent of a z-score in a \(t\) distribution, called a \(t\) score) is:
t1 <- time_test_tidy$statistic
t1
[1] 10.03999
The t score is 10.0399853.
Commentary: In addition to identifying the variable of interest (using the tilde) and the data frame, the t.test command requires the null value with the argument mu. We then tidy the results as usual.
The tidy output stores the following numbers: the sample mean (estimate), the test statistic (statistic), the P-value (p.value), the degrees of freedom (parameter), and limits for the confidence interval (conf.low and conf.high). The test statistic is the t score that we stored above as t1.
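To see what t.test is doing, it can help to compute the same statistic by hand. The sketch below uses a small simulated sample (hypothetical numbers, not the race data) and base R's vector interface to t.test rather than the formula interface used above, so it runs without run10.txt or any extra packages.

```r
# Hypothetical data: 25 simulated finishing times, for illustration only
set.seed(1)
x   <- rnorm(25, mean = 95, sd = 16)
mu0 <- 93.29          # null value from the hypothesis above
n   <- length(x)

# t score by hand: (sample mean - null value) / (s / sqrt(n))
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Same test using base R's t.test
fit <- t.test(x, mu = mu0)

c(manual = t_manual,
  t.test = unname(fit$statistic),
  df     = unname(fit$parameter))
```

The two statistics agree, and the reported degrees of freedom equal \(n - 1\).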
Plot the null distribution.
pdist("t", df = time_test_tidy$parameter,
      q = c(-t1, t1),
      invisible = TRUE)

- Would you choose to reject or fail to reject the null hypothesis based on the plot of the null distribution above? Explain your answer.
Notice that essentially all of the null distribution is less extreme than the test statistic \(t = 10.03999\). This backs up our decision to reject the null hypothesis.
Commentary: The pdist command is the same as always except now we use a "t" model. We know there are 16,923 degrees of freedom, but we might as well use the value stored in the tidy output (the parameter variable) to make our code reusable. Finally, note that q = c(-t1, t1) will plot in both tails of the distribution since we’re running a two-sided test. Of course, since the t score is crazy huge, it doesn’t actually appear in the resulting graph.
Calculate and interpret the P-value.
P1 <- time_test_tidy$p.value
P1
[1] 1.18438e-23
\(P < 0.001\). If US runners truly averaged a time of \(93.29\) minutes, there would be a \(1.18438 \times 10^{-21}\)% chance of seeing data at least as extreme as what we saw.
Commentary: When the P-value is this small, remember that it is traditional to report simply \(P < 0.001\).
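The two-sided P-value can also be recovered directly from the \(t\) score and the degrees of freedom. This is a minimal sketch using base R's pt, with the rounded t score and the degrees of freedom typed in from the output above; because of that rounding, the result should be close to, but not necessarily identical to, the stored p.value.

```r
# Values copied from the output above
t_score <- 10.03999
df      <- 16923

# Two-sided P-value: area in both tails beyond |t|
p_two_sided <- 2 * pt(-abs(t_score), df = df)
p_two_sided

# Far below the 0.001 reporting threshold
p_two_sided < 0.001
```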
Conclusion
State the statistical conclusion.
We reject the null hypothesis.
State (but do not overstate) a contextually meaningful conclusion.
Identify the possibility of either a Type I or Type II error and state what making such an error means in the context of the hypotheses.
Confidence interval
Check the relevant conditions to ensure that model assumptions are met.
Conditions for using the t distribution
Random
10%
Nearly normal: satisfied if either
The data themselves look nearly normal, with no strong skew or outliers; or
Our sample size is at least 30, and there is no more than a moderate amount of skew.
- Are the conditions for using a \(t\) distribution satisfied for this dataset?
Calculate the confidence interval.
time_test_tidy$conf.low
[1] 94.27922
time_test_tidy$conf.high
[1] 94.75917
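These limits follow the usual recipe of sample mean plus or minus a critical \(t\) value times the standard error. The sketch below reproduces them from the summary values printed earlier in this section; the names t_star and ci are ours.

```r
# Values copied from earlier output in this section
y_bar <- 94.51919   # sample mean
s     <- 15.92716   # sample standard deviation
n     <- 16924      # n used in the calculations above

se     <- s / sqrt(n)               # standard error of the mean
t_star <- qt(0.975, df = n - 1)     # critical value for 95% confidence

# 95% confidence interval: y_bar +/- t_star * SE
ci <- c(lower = y_bar - t_star * se,
        upper = y_bar + t_star * se)
round(ci, 5)
```

With this many degrees of freedom, t_star is very close to the familiar normal value of about 1.96.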
State (but do not overstate) a contextually meaningful interpretation.
Explain how the confidence interval reinforces the conclusion of the hypothesis test.
