Hypothesis testing

Hypothesis testing is concerned with making decisions using data
A null hypothesis is specified that represents the status quo, usually labeled $ H_0 $
The null hypothesis is assumed true and statistical evidence is required to reject it in favor of a research or alternative hypothesis

Example

A respiratory disturbance index of more than $ 30 $ events / hour, say, is considered evidence of severe sleep disordered breathing (SDB).
Suppose that in a sample of $ 100 $ overweight subjects with other risk factors for sleep disordered breathing at a sleep clinic, the mean RDI was $ 32 $ events / hour with a standard deviation of $ 10 $ events / hour.
We might want to test the hypothesis that
- $ H_0 : \mu = 30 $
- $ H_a : \mu > 30 $
- where $ \mu $ is the population mean RDI.

Hypothesis testing

The alternative hypotheses are typically of the form $ < $, $ > $ or $ \neq $
Note that there are four possible outcomes of our statistical decision process

Truth	Decide	Result
$ H_0 $	$ H_0 $	Correctly accept null
$ H_0 $	$ H_a $	Type I error
$ H_a $	$ H_a $	Correctly reject null
$ H_a $	$ H_0 $	Type II error

Discussion

Consider a court of law; the null hypothesis is that the defendant is innocent
We require evidence to reject the null hypothesis (convict)
If we require little evidence, then we would increase the percentage of innocent people convicted (type I errors); however we would also increase the percentage of guilty people convicted (correctly rejecting the null)
If we require a lot of evidence, then we increase the percentage of innocent people let free (correctly accepting the null) while we would also increase the percentage of guilty people let free (type II errors)

Example

Consider our example again
A reasonable strategy would reject the null hypothesis if $ \bar X $ was larger than some constant, say $ C $
Typically, $ C $ is chosen so that the probability of a Type I error, $ \alpha $, is $ .05 $ (or some other relevant constant)
$ \alpha $ = Type I error rate = Probability of rejecting the null hypothesis when, in fact, the null hypothesis is correct

Example continued

\[ \begin{align} 0.05 & = P\left(\bar X \geq C ~|~ \mu = 30 \right) \\ & = P\left(\frac{\bar X - 30}{10 / \sqrt{100}} \geq \frac{C - 30}{10/\sqrt{100}} ~|~ \mu = 30\right) \\ & = P\left(Z \geq \frac{C - 30}{1}\right) \\ \end{align} \]

Hence $ (C - 30) / 1 = 1.645 $ implying $ C = 31.645 $
Since our mean is $ 32 $ we reject the null hypothesis
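
The cutoff can be checked numerically. The following is a minimal sketch in R using only the numbers from the example; the variable names are chosen here for illustration.

```r
# Values taken from the RDI example above
mu0   <- 30    # mean under the null hypothesis
s     <- 10    # sample standard deviation
n     <- 100   # sample size
xbar  <- 32    # observed sample mean
alpha <- 0.05  # Type I error rate

se <- s / sqrt(n)                  # standard error of the mean
C  <- mu0 + qnorm(1 - alpha) * se  # cutoff on the original scale
C                                  # about 31.645
xbar > C                           # TRUE, so we reject the null
```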

Discussion

In general we don't convert $ C $ back to the original scale
We would just reject because the Z-score, which is how many standard errors the sample mean is above the hypothesized mean, \[ \frac{32 - 30}{10 / \sqrt{100}} = 2 \] is greater than $ 1.645 $
Or, more generally, we reject whenever $ \sqrt{n} (\bar X - \mu_0) / s > Z_{1-\alpha} $

General rules

The $ Z $ test for $ H_0:\mu = \mu_0 $ versus
- $ H_1: \mu < \mu_0 $
- $ H_2: \mu \neq \mu_0 $
- $ H_3: \mu > \mu_0 $
Test statistic $ TS = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} $
Reject the null hypothesis when
- $ TS \leq -Z_{1 - \alpha} $
- $ |TS| \geq Z_{1 - \alpha / 2} $
- $ TS \geq Z_{1 - \alpha} $
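
The three rejection rules can be sketched as one small R function. The helper `z_test_rejects` and its argument names are hypothetical, written for this illustration; it is not a function from base R or from the lecture.

```r
# Hypothetical helper (an assumption for illustration): TRUE when the Z test rejects H_0
z_test_rejects <- function(xbar, mu0, s, n, alpha = 0.05,
                           alternative = c("greater", "less", "two.sided")) {
  alternative <- match.arg(alternative)
  ts <- (xbar - mu0) / (s / sqrt(n))   # test statistic TS
  switch(alternative,
         less      = ts <= -qnorm(1 - alpha),
         two.sided = abs(ts) >= qnorm(1 - alpha / 2),
         greater   = ts >= qnorm(1 - alpha))
}

# RDI example, where TS = 2
z_test_rejects(32, 30, 10, 100, alternative = "greater")    # compares 2 with about 1.645
z_test_rejects(32, 30, 10, 100, alternative = "two.sided")  # compares 2 with about 1.96
z_test_rejects(32, 30, 10, 100, alternative = "less")       # compares 2 with about -1.645
```

With the example's numbers, the "greater" and "two.sided" alternatives reject while "less" does not.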

Notes

We have fixed $ \alpha $ to be low, so if we reject $ H_0 $, either our model is wrong or there is a low probability that we have made an error
We have not fixed the probability of a type II error, $ \beta $; therefore we tend to say "Fail to reject $ H_0 $" rather than accepting $ H_0 $
Statistical significance is not the same as scientific significance
The region of TS values for which you reject $ H_0 $ is called the rejection region

More notes

The $ Z $ test requires the assumptions of the CLT and for $ n $ to be large enough for it to apply
If $ n $ is small, then Gosset's $ T $ test is performed in exactly the same way, with the normal quantiles replaced by the appropriate Student's $ T $ quantiles and $ n-1 $ df
The probability of rejecting the null hypothesis when it is false is called power
Power is used a lot to calculate sample sizes for experiments
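
As a rough illustration of power, the sketch below computes the probability that the one sided Z test from the RDI example rejects when the true mean is $ 32 $. The alternative value `mua` is an assumption made here for illustration; the lecture does not specify one.

```r
mu0 <- 30               # mean under the null hypothesis
mua <- 32               # ASSUMED true mean under the alternative (illustration only)
se  <- 10 / sqrt(100)   # standard error from the example

C <- mu0 + qnorm(0.95) * se   # reject when the sample mean exceeds C

# Power = P(Xbar > C) when Xbar is approximately normal with mean mua
power <- pnorm(C, mean = mua, sd = se, lower.tail = FALSE)
power   # roughly 0.64
```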

Example reconsidered

Consider our example again. Suppose that $ n= 16 $ (rather than $ 100 $). Then consider that \[ .05 = P\left(\frac{\bar X - 30}{s / \sqrt{16}} \geq t_{1-\alpha, 15} ~|~ \mu = 30 \right) \]
So that our test statistic is now $\sqrt{16}(32 - 30) / 10 = 0.8 $, while the critical value is $ t_{1-\alpha, 15} = 1.75 $.
We now fail to reject.
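
A quick check of these numbers in R, as a sketch using only the values from the example:

```r
n    <- 16
ts   <- sqrt(n) * (32 - 30) / 10   # test statistic, 0.8
crit <- qt(1 - 0.05, df = n - 1)   # one sided critical value, about 1.753
ts >= crit                         # FALSE, so we fail to reject
```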

Two sided tests

Suppose that we would reject the null hypothesis if in fact the mean was too large or too small
That is, we want to test the alternative $ H_a : \mu \neq 30 $ (doesn't make a lot of sense in our setting)
Then note \[ \alpha = P\left(\left. \left|\frac{\bar X - 30}{s /\sqrt{16}}\right| > t_{1-\alpha/2,15} ~\right|~ \mu = 30\right) \]
That is, we will reject if the test statistic, $ 0.8 $, is either too large or too small, but the critical value is calculated using $ \alpha / 2 $
In our example the critical value is $ 2.13 $, so we fail to reject.
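
The two sided critical value can be reproduced the same way. This is a sketch with the example's numbers; the variable names are illustrative.

```r
n     <- 16
alpha <- 0.05
ts    <- sqrt(n) * (32 - 30) / 10         # test statistic, 0.8
crit  <- qt(1 - alpha / 2, df = n - 1)    # two sided critical value, about 2.13
abs(ts) > crit                            # FALSE, so we fail to reject
```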

T test in R

library(UsingR); data(father.son)
# One sample t test on the son-minus-father height differences
t.test(father.son$sheight - father.son$fheight)


    One Sample t-test

data:  father.son$sheight - father.son$fheight
t = 11.79, df = 1077, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 0.831 1.163
sample estimates:
mean of x 
    0.997

Connections with confidence intervals

Consider testing $ H_0: \mu = \mu_0 $ versus $ H_a: \mu \neq \mu_0 $
Take the set of all possible values of $ \mu_0 $ for which you fail to reject $ H_0 $; this set is a $ (1-\alpha)100\% $ confidence interval for $ \mu $
The same works in reverse; if a $ (1-\alpha)100\% $ interval contains $ \mu_0 $, then we fail to reject $ H_0 $
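
The correspondence can be illustrated with the $ n = 16 $ version of the RDI example. The sketch below builds the $ T $ interval from the summary statistics and compares it with the two sided test decision; the variable names are chosen here for illustration.

```r
xbar <- 32; s <- 10; n <- 16   # summary statistics from the example
mu0 <- 30; alpha <- 0.05

se   <- s / sqrt(n)
crit <- qt(1 - alpha / 2, df = n - 1)

ci <- xbar + c(-1, 1) * crit * se          # 95% T interval, about 26.67 to 37.33
inside  <- ci[1] <= mu0 && mu0 <= ci[2]    # is mu0 inside the interval?
rejects <- abs((xbar - mu0) / se) > crit   # does the two sided test reject?

inside    # TRUE
rejects   # FALSE: mu0 is in the interval, so we fail to reject
```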

Exact binomial test

Recall this problem: suppose a friend has $ 8 $ children, $ 7 $ of whom are girls and none are twins
Perform the relevant hypothesis test of $ H_0 : p = 0.5 $ versus $ H_a : p > 0.5 $
- What is the relevant rejection region so that the probability of rejecting when $ H_0 $ is true is (less than) 5%?

Rejection region	Type I error rate
[0 : 8]	1
[1 : 8]	0.9961
[2 : 8]	0.9648
[3 : 8]	0.8555
[4 : 8]	0.6367
[5 : 8]	0.3633
[6 : 8]	0.1445
[7 : 8]	0.0352
[8 : 8]	0.0039
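
The table can be reproduced with the binomial upper tail probability. This is a minimal sketch; the variable names are illustrative.

```r
n  <- 8     # number of children
p0 <- 0.5   # probability of a girl under the null hypothesis

lower <- 0:n   # lower end k of each rejection region [k : 8]

# Type I error rate of the region [k : 8] is P(X >= k) under H_0
type1 <- pbinom(lower - 1, size = n, prob = p0, lower.tail = FALSE)

data.frame(region      = paste0("[", lower, " : ", n, "]"),
           type1_error = round(type1, 4))
```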

Notes

It's impossible to get an exact 5% level test for this case due to the discreteness of the binomial.
- The closest is the rejection region [7 : 8]
- Any alpha level lower than 0.0039 is not attainable.
For larger sample sizes, we could do a normal approximation, but you already knew this.
How to construct a two sided test isn't obvious.
- Given a way to do two sided tests, we could take the set of values of $ p_0 $ for which we fail to reject to get an exact binomial confidence interval (called the Clopper/Pearson interval, BTW)
For these problems, people always create a P-value (next lecture) rather than computing the rejection region.
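
In R, `binom.test` from the base `stats` package carries out the exact binomial test. The sketch below applies it to the example of $ 7 $ girls out of $ 8 $ children; its confidence interval is the Clopper/Pearson interval.

```r
# One sided exact test of H_0: p = 0.5 versus H_a: p > 0.5
bt <- binom.test(7, 8, p = 0.5, alternative = "greater")
bt$p.value   # P(X >= 7) under H_0, about 0.0352 as in the [7 : 8] row

# Two sided 95% exact (Clopper/Pearson) confidence interval for p
ci <- binom.test(7, 8)$conf.int
ci
```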
