One Sample t test

The Data

A researcher is interested in whether the mean height of St. Ann's HS seniors differs from the national average of 67". To test this, a sample of 8 St. Ann's seniors is selected at random and their heights are measured.

\[ H_{0}: \mu = 67 \text{ inches} \\ H_{a}: \mu \neq 67 \text{ inches} \]

Their heights (in inches) are: 70, 66, 69, 69, 68, 70, 73, 69

sample_heights <- c(70,66,69,69,68,70,73,69)

First Plot the Data

stem(sample_heights, scale=2)

  The decimal point is at the |

  66 | 0
  67 | 
  68 | 0
  69 | 000
  70 | 00
  71 | 
  72 | 
  73 | 0

This kind of plot is easy to do by hand, of course, and serves as a check of whether the data is roughly normally distributed.
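
As a complementary check on normality, a normal quantile-quantile plot can be sketched with base R's `qqnorm()` and `qqline()`. This is an addition rather than part of the original slides; the names `qq` and `qq_cor`, and the use of a correlation as a rough numeric summary, are assumptions made for illustration. With only 8 observations, any such check is suggestive at best.

```r
sample_heights <- c(70, 66, 69, 69, 68, 70, 73, 69)

# Compute theoretical vs. observed quantiles without drawing anything yet
qq <- qqnorm(sample_heights, plot.it = FALSE)

# Draw the Q-Q plot; points close to the reference line suggest rough normality
plot(qq$x, qq$y, xlab = "Theoretical quantiles", ylab = "Height (inches)")
qqline(sample_heights)

# Rough numeric summary (an illustrative choice, not a formal test):
# correlation between theoretical and observed quantiles
qq_cor <- cor(qq$x, qq$y)
qq_cor
```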

Some Calculations

mean(sample_heights)
[1] 69.25
sd(sample_heights)
[1] 1.982

Our sample has a mean of 69.25" and a sample standard deviation* of 1.98".

*Note: This SD (dividing by n - 1) is an estimate of the standard deviation of the St. Ann's senior population. The SD of the sample itself (dividing by n) is only 1.85".

Two Variance Equations

Population Variance

\[ \begin{equation} \label{E:population variance} \sigma^{2}_{x} = \frac{\sum\limits_{i=1}^{n} \left(x_{i} - \bar{x}\right)^{2}} {n} \end{equation} \]

Sample Variance

\[ \begin{equation} \label{E:sample variance} s^{2}_{x} = \frac{\sum\limits_{i=1}^{n} \left(x_{i} - \bar{x}\right)^{2}} {n-1} \end{equation} \]

Two SD Equations

Population SD

\[ \begin{equation} \label{E:population sd} \sigma_{x} = \sqrt{\frac{\sum\limits_{i=1}^{n} \left(x_{i} - \bar{x}\right)^{2}} {n}} \end{equation} \]

Sample SD

\[ \begin{equation} \label{E:sd} s_{x} = \sqrt{\frac{\sum\limits_{i=1}^{n} \left(x_{i} - \bar{x}\right)^{2}} {n-1}} \end{equation} \]
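
The two SD formulas can be checked by hand against the height data. The sketch below is illustrative only; the variable names `ss`, `sd_n_minus_1` and `sd_n` are assumptions introduced here. Note that R's `sd()` uses the n - 1 version.

```r
sample_heights <- c(70, 66, 69, 69, 68, 70, 73, 69)
n <- length(sample_heights)

# Sum of squared deviations from the sample mean
ss <- sum((sample_heights - mean(sample_heights))^2)

sd_n_minus_1 <- sqrt(ss / (n - 1))  # sample SD, the value sd() returns
sd_n         <- sqrt(ss / n)        # SD of the sample itself (dividing by n)

c(sd_n_minus_1 = sd_n_minus_1, sd_n = sd_n)
```

The first value should agree with `sd(sample_heights)`; the second is the smaller figure that results from dividing by n.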

Standard Deviation in the Sample Mean

Estimate the standard deviation of the full population with the sample SD. \[ \begin{equation} \label{E:estimate of population SD} \hat{\sigma} = s_{x} \end{equation} \] Then divide by the square root of n to estimate the standard deviation of the means of random samples of size n (the standard error of the mean). \[ \begin{equation} \label{E:estimate of SDof sample average} \hat{\sigma_{\bar{x}}} = \frac{\hat{\sigma}}{\sqrt{n}} \end{equation} \]
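
Applied to the height data, the two steps might look like the following sketch. The names `sigma_hat` and `se_mean` are assumptions chosen to mirror the symbols in the equations.

```r
sample_heights <- c(70, 66, 69, 69, 68, 70, 73, 69)
n <- length(sample_heights)

sigma_hat <- sd(sample_heights)   # estimate of the population SD
se_mean   <- sigma_hat / sqrt(n)  # estimated SD of means of samples of size n
se_mean
```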

t statistic v. z statistic

\[ \text{z-score / z-statistic} = \frac{x - \mu}{\sigma_{x}} \\ \text{t-score / t-statistic} = \frac{\bar{x} - \mu_0}{\hat{\sigma_{\bar{x}}}} = \frac{\bar{x} - \mu_0}{\frac{s_{x}}{\sqrt{n}}} \]

where:

\[ \bar{x} = \text{sample mean} \\ \mu_0 = \text{null hypothesis mean} \\ \hat{\sigma_{\bar{x}}} = \text{estimate of the standard deviation of sample means, given the null hypothesis} \\ s_{x} = \text{sample standard deviation (dividing by } n-1 \text{)} \]

Code for Calculating the t statistic

t_stat <- (mean(sample_heights) - 67)/
  (sd(sample_heights)/sqrt(8))
t_stat
[1] 3.211

The t distribution

While z-statistics follow a normal distribution, t statistics follow a distribution with fatter tails due to added uncertainty since the standard deviation was estimated rather than known.

The more degrees of freedom used to estimate the spread around the mean, the more accurately it is known. With an infinite number of degrees of freedom, t statistics simply follow a normal distribution since the standard deviation is essentially known.

degrees of freedom (dof) = n - 1 (since the SD is estimated around the sample mean, which imposes one constraint)
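
One way to see the convergence to the normal distribution is to compare upper 2.5% cutoffs from `qt()` with the normal cutoff from `qnorm()`. The particular degrees of freedom below are arbitrary choices for illustration, as are the names `dfs`, `t_cutoffs` and `z_cutoff`.

```r
dfs <- c(2, 7, 30, 1000)

# Upper 2.5% cutoffs: a larger cutoff means fatter tails
t_cutoffs <- qt(0.975, df = dfs)
z_cutoff  <- qnorm(0.975)

data.frame(df = dfs, t_cutoff = t_cutoffs, z_cutoff = z_cutoff)
```

The t cutoffs shrink toward the normal cutoff as the degrees of freedom grow, but stay above it.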

The Shape of t distributions

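
A plot comparing t densities with the standard normal can be sketched in base R as follows. This is not necessarily the figure from the original slide; the degrees of freedom, axis range and line styles are assumptions chosen for illustration.

```r
x <- seq(-4, 4, length.out = 201)
dfs <- c(1, 3, 7, 30)

# Standard normal as the reference curve (thicker line)
plot(x, dnorm(x), type = "l", lwd = 2,
     xlab = "statistic", ylab = "density")

# Overlay t densities; line type distinguishes the degrees of freedom
for (i in seq_along(dfs)) {
  lines(x, dt(x, df = dfs[i]), lty = i + 1)
}

legend("topright",
       legend = c("normal", paste("t, df =", dfs)),
       lty = seq_len(length(dfs) + 1),
       lwd = c(2, rep(1, length(dfs))))
```

Curves with fewer degrees of freedom sit lower at the center and higher in the tails than the normal curve.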

p-values

t_stat <- (mean(sample_heights) - 67)/
  (sd(sample_heights)/sqrt(8))
t_stat
[1] 3.211
1-pt(t_stat, df=7) #one tailed
[1] 0.007421
2*(1-pt(abs(t_stat), df=7)) #two tailed; abs() keeps this valid if t_stat is negative
[1] 0.01484
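
The two-tailed p-value can also be approximated by simulation: draw many samples of size 8 under the null hypothesis and count how often the t statistic is at least as extreme as the observed one. This is a rough sketch under stated assumptions: heights are taken to be normally distributed, the population SD of 2 is an arbitrary choice (the t statistic does not depend on it), and the seed, `n_sims` and all variable names are hypothetical.

```r
set.seed(1)          # arbitrary seed, for reproducibility only
n <- 8
n_sims <- 100000
observed_t <- 3.211  # t statistic reported above

# Each column is one simulated sample drawn under the null (mean 67)
sims <- matrix(rnorm(n * n_sims, mean = 67, sd = 2), nrow = n)

sim_means <- colMeans(sims)
sim_sds   <- apply(sims, 2, sd)
sim_t     <- (sim_means - 67) / (sim_sds / sqrt(n))

# Fraction of simulated t statistics at least as extreme as the observed one
p_sim <- mean(abs(sim_t) >= observed_t)
p_sim
```

The simulated fraction should land close to the two-tailed value from `pt()`, up to simulation noise.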

Confidence Intervals around Sample Mean

mean(sample_heights)+qt(c(.025,.975),df=7)*
  (sd(sample_heights)/sqrt(8)) # 95%
[1] 67.59 70.91

mean(sample_heights)+qt(c(.1,.9),df=7)*
  (sd(sample_heights)/sqrt(8)) # 80%
[1] 68.26 70.24
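
The same calculation can be wrapped in a small function so the confidence level is a parameter. `t_conf_int` is a hypothetical helper written for this example, not a base R function.

```r
sample_heights <- c(70, 66, 69, 69, 68, 70, 73, 69)

# Hypothetical helper: two-sided t confidence interval for a mean
t_conf_int <- function(x, level = 0.95) {
  n <- length(x)
  alpha <- 1 - level
  se <- sd(x) / sqrt(n)  # estimated SD of the sample mean
  mean(x) + qt(c(alpha / 2, 1 - alpha / 2), df = n - 1) * se
}

ci_95 <- t_conf_int(sample_heights)        # 95%
ci_80 <- t_conf_int(sample_heights, 0.80)  # 80%
ci_95
ci_80
```

Deriving the degrees of freedom and the tail probabilities from `x` and `level` avoids hard-coding 8, 7, .025 and .975.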

For the truly lazy...

t.test(sample_heights, mu=67)

    One Sample t-test

data:  sample_heights
t = 3.211, df = 7, p-value = 0.01484
alternative hypothesis: true mean is not equal to 67
95 percent confidence interval:
 67.59 70.91
sample estimates:
mean of x 
    69.25
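
The object returned by `t.test()` is a list, so individual results can be pulled out by name rather than read off the printed summary. The sketch below also uses the `alternative` and `conf.level` arguments to get a one-tailed p-value and an 80% interval; the names `res`, `res_greater` and `res_80` are assumptions.

```r
sample_heights <- c(70, 66, 69, 69, 68, 70, 73, 69)

res <- t.test(sample_heights, mu = 67)

res$statistic  # t statistic
res$parameter  # degrees of freedom
res$p.value    # two-tailed p-value
res$conf.int   # 95% confidence interval

# One-tailed test (true mean greater than 67) and an 80% interval
res_greater <- t.test(sample_heights, mu = 67, alternative = "greater")
res_80      <- t.test(sample_heights, mu = 67, conf.level = 0.80)

res_greater$p.value
res_80$conf.int
```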