A researcher wants to know whether the mean height of St. Ann's HS seniors differs from the national average of 67". To test this, a random sample of 8 St. Ann's seniors is selected and their heights are measured.
\[ H_{0}: \mu = 67 \text{ inches} \\ H_{a}: \mu \neq 67 \text{ inches} \]
Their heights (in inches) are: 70, 66, 69, 69, 68, 70, 73, 69
sample_heights <- c(70,66,69,69,68,70,73,69)
stem(sample_heights, scale=2)
The decimal point is at the |
66 | 0
67 |
68 | 0
69 | 000
70 | 00
71 |
72 |
73 | 0
This kind of plot is easy to do by hand, of course, and serves as a check of whether the data is roughly normally distributed.
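A normal Q-Q plot is another common way to eyeball normality. The following is a minimal sketch using base R's `qqnorm()` and `qqline()`; the variable name `qq_points` is only illustrative, and with just 8 observations any such check is rough at best.

```r
sample_heights <- c(70, 66, 69, 69, 68, 70, 73, 69)

# Plot sample quantiles against theoretical normal quantiles;
# qqnorm() also returns the plotted coordinates invisibly
qq_points <- qqnorm(sample_heights, main = "Normal Q-Q plot of sample heights")

# Reference line through the first and third quartiles
qqline(sample_heights)

# Points lying close to the line suggest rough normality;
# the correlation of the plotted points gives a crude numeric summary
cor(qq_points$x, qq_points$y)
```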
mean(sample_heights)
[1] 69.25
sd(sample_heights)
[1] 1.982
Our sample has a mean of 69.25" and a sample standard deviation* of 1.98".
*Note: This SD (dividing by n - 1) is an estimate of the standard deviation of the whole St. Ann's senior population. The SD of the 8 sampled heights themselves (dividing by n) is only 1.85".
Population Variance \[ \begin{equation} \label{E:population variance} \sigma^{2}_{x} = \frac{\sum\limits_{i=1}^{n} \left(x_{i} - \bar{x}\right)^{2}} {n} \end{equation} \] Sample Variance \[ \begin{equation} \label{E:sample variance} s^{2}_{x} = \frac{\sum\limits_{i=1}^{n} \left(x_{i} - \bar{x}\right)^{2}} {n-1} \end{equation} \]
Population SD \[ \begin{equation} \label{E:population sd} \sigma_{x} = \sqrt{\frac{\sum\limits_{i=1}^{n} \left(x_{i} - \bar{x}\right)^{2}} {n}} \end{equation} \] Sample SD \[ \begin{equation} \label{E:sd} s_{x} = \sqrt{\frac{\sum\limits_{i=1}^{n} \left(x_{i} - \bar{x}\right)^{2}} {n-1}} \end{equation} \]
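To see the two formulas side by side, here is a small sketch that computes both SDs by hand for the sample above. The variable names (`sum_sq`, `sd_divide_n`, `sd_divide_n_minus_1`) are illustrative only; note that R's built-in `sd()` uses the n - 1 version.

```r
sample_heights <- c(70, 66, 69, 69, 68, 70, 73, 69)
n <- length(sample_heights)

# Sum of squared deviations from the sample mean (27.5 for these data)
sum_sq <- sum((sample_heights - mean(sample_heights))^2)

# Dividing by n: the SD of these 8 values themselves (about 1.85)
sd_divide_n <- sqrt(sum_sq / n)

# Dividing by n - 1: the estimate of the population SD (about 1.98)
sd_divide_n_minus_1 <- sqrt(sum_sq / (n - 1))

# The n - 1 version agrees with sd()
all.equal(sd_divide_n_minus_1, sd(sample_heights))
```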
Estimate the standard deviation of the full population with the sample SD. \[ \begin{equation} \label{E:estimate of population SD} \hat{\sigma} = s_{x} \end{equation} \] Then divide by the square root of n to find the standard deviation of the means of random samples of size n (the standard error of the mean). \[ \begin{equation} \label{E:estimate of SDof sample average} \hat{\sigma}_{\bar{x}} = \frac{\hat{\sigma}}{\sqrt{n}} \end{equation} \]
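The square-root-of-n rule can be checked with a quick simulation. This sketch assumes a hypothetical normal population with mean 67 and SD 2 (values chosen purely for illustration, not taken from the data) and compares the spread of many simulated sample means with the value the formula predicts.

```r
set.seed(1)     # arbitrary seed, only for reproducibility
n <- 8
pop_mean <- 67  # assumed population mean (hypothetical)
pop_sd <- 2     # assumed population SD (hypothetical)

# Draw many random samples of size n and record each sample's mean
sample_means <- replicate(10000, mean(rnorm(n, mean = pop_mean, sd = pop_sd)))

# Empirical SD of the sample means
sd(sample_means)

# Value predicted by the formula: sigma / sqrt(n), about 0.707
pop_sd / sqrt(n)
```

The two numbers should agree closely; the remaining gap is simulation noise.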
\[ \text{z-score / z-statistic} = \frac{x - \mu}{\sigma_{x}} \\ \text{t-score / t-statistic} = \frac{\bar{x} - \mu_0}{\hat{\sigma}_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\frac{s_{x}}{\sqrt{n}}} \] where: \[ \bar{x} = \text{sample mean} \\ \mu_0 = \text{null hypothesis mean} \\ \hat{\sigma}_{\bar{x}} = \text{estimate of the standard deviation of sample means given the null hypothesis} \\ s_{x} = \text{sample standard deviation (dividing by } n-1 \text{)} \]
t_stat <- (mean(sample_heights) - 67)/
(sd(sample_heights)/sqrt(8))
t_stat
[1] 3.211
While z-statistics follow a normal distribution, t-statistics follow a distribution with fatter tails, reflecting the added uncertainty from estimating the standard deviation rather than knowing it.
The more degrees of freedom used to estimate the spread around the mean, the more accurately that spread is known. As the degrees of freedom approach infinity, the t distribution converges to the normal distribution, since the standard deviation is then essentially known.
degrees of freedom (dof) = n - 1 (since the SD is estimated around the sample mean, which imposes one constraint)
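One way to see the fatter tails shrink is to compare two-sided 95% critical values from `qt()` with the normal value from `qnorm()`. The degrees of freedom chosen below are arbitrary illustrations, apart from 7, which matches this sample.

```r
dfs <- c(7, 30, 100, 1000)

# Critical values that cut off 2.5% in the upper tail
t_crit <- qt(0.975, df = dfs)
z_crit <- qnorm(0.975)

# The t critical value is always larger than the normal one,
# and the gap shrinks as the degrees of freedom grow
data.frame(df = dfs, t_crit = t_crit, excess_over_z = t_crit - z_crit)
```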
1-pt(t_stat, df=7) # one-tailed p-value (upper tail)
[1] 0.007421
2*(1-pt(t_stat, df=7)) # two-tailed p-value
[1] 0.01484
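The `2*(1-pt(...))` form relies on the t statistic being positive. A slightly more general sketch, which also works when the sample mean falls below the null value, takes the absolute value and asks `pt()` for the upper tail directly. The names `mu_0` and `p_two_sided` are illustrative.

```r
sample_heights <- c(70, 66, 69, 69, 68, 70, 73, 69)
n <- length(sample_heights)
mu_0 <- 67  # null hypothesis mean

t_stat <- (mean(sample_heights) - mu_0) / (sd(sample_heights) / sqrt(n))

# Upper-tail area beyond |t|, doubled to cover both tails
p_two_sided <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)
p_two_sided
```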
mean(sample_heights)+qt(c(.025,.975),df=7)*
(sd(sample_heights)/sqrt(8)) # 95% confidence interval
[1] 67.59 70.91
mean(sample_heights)+qt(c(.1,.9),df=7)*
(sd(sample_heights)/sqrt(8)) # 80% confidence interval
[1] 68.26 70.24
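Both intervals follow the same recipe: sample mean plus or minus a t quantile times the standard error. As a sketch, the recipe can be wrapped in a small helper; `t_conf_int` and its `conf_level` argument are hypothetical names introduced here, not part of base R.

```r
# Hypothetical helper: t-based confidence interval for a mean
t_conf_int <- function(x, conf_level = 0.95) {
  n <- length(x)
  alpha <- 1 - conf_level
  se <- sd(x) / sqrt(n)  # estimated standard error of the mean
  mean(x) + qt(c(alpha / 2, 1 - alpha / 2), df = n - 1) * se
}

sample_heights <- c(70, 66, 69, 69, 68, 70, 73, 69)
ci_95 <- t_conf_int(sample_heights, conf_level = 0.95)
ci_80 <- t_conf_int(sample_heights, conf_level = 0.80)
ci_95
ci_80
```

Using `length(x)` instead of a hard-coded 8 and 7 keeps the calculation correct if the sample size changes.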
t.test(sample_heights, mu=67)
One Sample t-test
data: sample_heights
t = 3.211, df = 7, p-value = 0.01484
alternative hypothesis: true mean is not equal to 67
95 percent confidence interval:
67.59 70.91
sample estimates:
mean of x
69.25
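The printed summary can also be read programmatically. This sketch stores the result of `t.test()` and pulls out individual components, then reruns the test with `conf.level = 0.80` to get the 80% interval; the names `result` and `result_80` are illustrative.

```r
sample_heights <- c(70, 66, 69, 69, 68, 70, 73, 69)

result <- t.test(sample_heights, mu = 67)
result$statistic  # t statistic
result$parameter  # degrees of freedom
result$p.value    # two-sided p-value
result$conf.int   # 95% confidence interval by default

# Same test, reporting an 80% confidence interval instead
result_80 <- t.test(sample_heights, mu = 67, conf.level = 0.80)
result_80$conf.int
```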