Hypothesis testing

Statistical hypothesis testing is the formal inferential framework for choosing between hypotheses. The null hypothesis, $H_0$, is assumed true, and statistical evidence is required to reject it in favor of a research or alternative hypothesis, $H_a$.

Example

A respiratory disturbance index of more than $30$ events / hour, say, is considered evidence of severe sleep disordered breathing (SDB).
Suppose that in a sample of $100$ overweight subjects with other risk factors for sleep disordered breathing at a sleep clinic, the mean RDI was $32$ events / hour with a standard deviation of $10$ events / hour.
We might want to test the hypothesis that
- $H_0 : \mu = 30$
- $H_a : \mu > 30$
- where $\mu$ is the population mean RDI.

The alternative hypotheses are typically of the form $<$, $>$ or $\neq$.
Note that there are four possible outcomes of our statistical decision process:

Truth	Decide	Result
$H_0$	$H_0$	Correctly accept null
$H_0$	$H_a$	Type I error
$H_a$	$H_a$	Correctly reject null
$H_a$	$H_0$	Type II error

Example

Consider our example again
A reasonable strategy would reject the null hypothesis if $\bar X$ was larger than some constant, say $C$
Typically, $C$ is chosen so that the probability of a Type I error, $\alpha$, is $.05$ (or some other relevant constant)
$\alpha$ = Type I error rate = Probability of rejecting the null hypothesis when, in fact, the null hypothesis is correct

\[ \begin{align}
0.05 & = P\left(\bar X \geq C ~|~ \mu = 30 \right) \\
& = P\left(\frac{\bar X - 30}{10 / \sqrt{100}} \geq \frac{C - 30}{10/\sqrt{100}} ~|~ \mu = 30\right) \\
& = P\left(Z \geq \frac{C - 30}{1}\right)
\end{align} \]
- Hence $(C - 30) / 1 = 1.645$, implying $C = 31.645$
- Since our mean is $32$, we reject the null hypothesis
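The cutoff derived above can be reproduced numerically. The following is a minimal sketch in R using the example's numbers ($n = 100$, standard deviation $10$, $\mu_0 = 30$). The variable names are illustrative, and the simulation at the end is an added sanity check that assumes normally distributed data; it is not part of the original derivation.

```r
# Values from the RDI example; variable names are illustrative.
mu0   <- 30    # mean under the null hypothesis
s     <- 10    # standard deviation, treated as known for the Z test
n     <- 100   # sample size
xbar  <- 32    # observed sample mean
alpha <- 0.05  # desired Type I error rate

# Cutoff C such that P(Xbar >= C | mu = mu0) = alpha
C <- mu0 + qnorm(1 - alpha) * s / sqrt(n)
reject <- xbar >= C
print(round(C, 3))
print(reject)

# Sanity check by simulation (assumes normal data): how often does the
# rule "reject when Xbar >= C" fire when the null is actually true?
set.seed(1)
sim_means  <- replicate(10000, mean(rnorm(n, mean = mu0, sd = s)))
type1_rate <- mean(sim_means >= C)
print(type1_rate)
```

The simulated rejection rate should land close to the nominal 5%, which is what choosing $C$ this way is meant to guarantee.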

General rules

The $Z$ test for $H_0:\mu = \mu_0$ versus
- $H_1: \mu < \mu_0$
- $H_2: \mu \neq \mu_0$
- $H_3: \mu > \mu_0$
Test statistic $TS = \frac{\bar X - \mu_0}{S / \sqrt{n}}$
Reject the null hypothesis when
- $TS \leq -Z_{1 - \alpha}$
- $|TS| \geq Z_{1 - \alpha / 2}$
- $TS \geq Z_{1 - \alpha}$
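The three rejection rules can be collected into one small function. This is only a sketch: `z_test` is a hypothetical helper written for illustration (it does not come from any package), and it assumes the test statistic is the standardized sample mean $(\bar X - \mu_0) / (S / \sqrt{n})$, as in the RDI example.

```r
# Hypothetical helper (not from a package): Z test from summary statistics.
z_test <- function(xbar, mu0, s, n, alpha = 0.05,
                   alternative = c("less", "two.sided", "greater")) {
  alternative <- match.arg(alternative)
  # Standardized distance of the sample mean from the null value
  ts <- (xbar - mu0) / (s / sqrt(n))
  # Apply the rejection rule that matches the alternative
  reject <- switch(alternative,
    less      = ts <= -qnorm(1 - alpha),
    two.sided = abs(ts) >= qnorm(1 - alpha / 2),
    greater   = ts >= qnorm(1 - alpha)
  )
  list(statistic = ts, alternative = alternative, reject = reject)
}

# RDI example: xbar = 32, mu0 = 30, s = 10, n = 100
res_less    <- z_test(32, 30, 10, 100, alternative = "less")
res_two     <- z_test(32, 30, 10, 100, alternative = "two.sided")
res_greater <- z_test(32, 30, 10, 100, alternative = "greater")
print(c(less = res_less$reject,
        two.sided = res_two$reject,
        greater = res_greater$reject))
```

Working from summary statistics keeps the sketch close to the formulas; with raw data one would compute `xbar`, `s` and `n` first.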

Using the father.son dataset from the UsingR package, we can test whether the population mean of sons’ heights equals the population mean of fathers’ heights. Because the observations are paired, we take the difference within each pair and test whether the mean difference is 0 or nonzero. We do that by calling t.test on father.son$sheight - father.son$fheight.


## 
##  One Sample t-test
## 
## data:  father.son$sheight - father.son$fheight
## t = 11.789, df = 1077, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.8310296 1.1629160
## sample estimates:
## mean of x 
## 0.9969728

Here the t statistic is 11.79 on 1077 degrees of freedom, with a p-value below 2.2e-16, so we reject the null hypothesis that the mean difference is 0. Because the confidence interval (0.83 to 1.16) is expressed in the units of the data, you can also judge whether the range of plausible differences is of practical significance.
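As a rough check on how the pieces of the printed output fit together, the sketch below recovers the standard error from the reported confidence interval and then recomputes the t statistic. It uses only the numbers printed in the output (no access to the raw data is assumed), and the variable names are illustrative.

```r
# Numbers copied from the t.test output; names are illustrative.
est <- 0.9969728                 # mean of the differences
ci  <- c(0.8310296, 1.1629160)   # 95 percent confidence interval
df  <- 1077                      # degrees of freedom (n - 1)

# A 95% t interval is est +/- qt(0.975, df) * se, so se can be backed out
crit       <- qt(0.975, df)
half_width <- diff(ci) / 2
se         <- half_width / crit

# t statistic for H0: mean difference = 0
t_stat <- est / se
print(round(t_stat, 2))

# Two-sided decision at the 5% level
print(abs(t_stat) >= crit)
```

The recomputed statistic should agree with the reported `t` up to rounding, which shows that the interval and the test are built from the same estimate and standard error.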

Example reconsidered

Consider our example again. Suppose that $n= 16$ (rather than $100$). Then consider that

\[ .05 = P\left(\frac{\bar X - 30}{s / \sqrt{16}} \geq t_{1-\alpha, 15} ~|~ \mu = 30 \right) \]
- So that our test statistic is now $\sqrt{16}(32 - 30) / 10 = 0.8$, while the critical value is $t_{1-\alpha, 15} = 1.75$.
- We now fail to reject.
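The small-sample version can be checked the same way. This is a minimal sketch with the example's numbers and $n = 16$; the variable names are illustrative, and the only change from the Z test is that the critical value comes from the t distribution with $n - 1$ degrees of freedom.

```r
# RDI example with a smaller sample; names are illustrative.
mu0   <- 30
s     <- 10
n     <- 16
xbar  <- 32
alpha <- 0.05

# Test statistic: the standard error is s / sqrt(n) = 10 / 4 = 2.5
ts <- (xbar - mu0) / (s / sqrt(n))

# One-sided critical value from the t distribution with n - 1 = 15 df
crit <- qt(1 - alpha, df = n - 1)

reject <- ts >= crit
print(c(statistic = ts, critical = round(crit, 3)))
print(reject)
```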

Connections with confidence intervals

Consider testing $H_0: \mu = \mu_0$ versus $H_a: \mu \neq \mu_0$.
The set of all possible values of $\mu_0$ for which you fail to reject $H_0$ is a $(1-\alpha)100\%$ confidence interval for $\mu$.
The same works in reverse: if a $(1-\alpha)100\%$ interval contains $\mu_0$, then we fail to reject $H_0$.
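The duality can be illustrated with the RDI numbers ($n = 100$). The sketch below is an illustration under the normal (Z) approximation, with illustrative variable names: it builds the two-sided 95% interval, then scans a grid of candidate null values and checks that the ones the test fails to reject are the ones inside the interval.

```r
# RDI example; names are illustrative.
xbar  <- 32
s     <- 10
n     <- 100
alpha <- 0.05

se <- s / sqrt(n)
z  <- qnorm(1 - alpha / 2)

# Two-sided (1 - alpha) confidence interval for mu
ci <- xbar + c(-1, 1) * z * se
print(round(ci, 2))

# Decision for mu0 = 30, made two ways
mu0 <- 30
reject_by_test <- abs((xbar - mu0) / se) >= z
reject_by_ci   <- mu0 < ci[1] || mu0 > ci[2]
print(c(test = reject_by_test, interval = reject_by_ci))

# Scan candidate null values: "fail to reject" should match "inside the CI"
grid           <- seq(28, 36, by = 0.01)
fail_to_reject <- abs((xbar - grid) / se) < z
in_ci          <- grid > ci[1] & grid < ci[2]
print(all(fail_to_reject == in_ci))
```

Note that the one-sided test from the earlier example and this two-sided test use different critical values, so they need not always agree at the same $\alpha$.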

Exact binomial test

Suppose a friend has $8$ children, $7$ of whom are girls and none are twins
- Perform the relevant hypothesis test. $H_0 : p = 0.5$ $H_a : p > 0.5$
- What is the relevant rejection region so that the probability of rejecting is (less than) 5%?

print(pbinom(-1, size = 8, p = .5, lower.tail = FALSE))

## [1] 1

print(pbinom( 0, size = 8, p = .5, lower.tail = FALSE))

## [1] 0.9960938

print(pbinom( 1, size = 8, p = .5, lower.tail = FALSE))

## [1] 0.9648438

print(pbinom( 2, size = 8, p = .5, lower.tail = FALSE))

## [1] 0.8554688

print(pbinom( 3, size = 8, p = .5, lower.tail = FALSE))

## [1] 0.6367187

print(pbinom( 4, size = 8, p = .5, lower.tail = FALSE))

## [1] 0.3632813

print(pbinom( 5, size = 8, p = .5, lower.tail = FALSE))

## [1] 0.1445313

print(pbinom( 6, size = 8, p = .5, lower.tail = FALSE))

## [1] 0.03515625

print(pbinom( 7, size = 8, p = .5, lower.tail = FALSE))

## [1] 0.00390625

It’s impossible to get an exact 5% level test for this case due to the discreteness of the binomial:

- The closest is the rejection region [7 : 8], which has a Type I error rate of 0.0352.
- Any alpha level lower than 0.0039063 is not attainable.
- For larger sample sizes, we could do a normal approximation.
- A two sided test isn’t obvious.
  - Given a way to do two sided tests, we could take the set of values of $p_0$ for which we fail to reject to get an exact binomial confidence interval (called the Clopper/Pearson interval)
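The same calculation can be done more compactly. The sketch below is one possible way to do it in base R, with illustrative variable names: it tabulates the Type I error rate of every rejection region of the form [k : 8], picks the smallest k whose rate is at most 5%, and then uses `binom.test` from the `stats` package, whose reported confidence interval is documented as following the Clopper and Pearson procedure.

```r
# 7 girls out of 8 children; names are illustrative.
girls <- 7
kids  <- 8

# Type I error rate of each rejection region [k : 8] under p = 0.5,
# i.e. P(X >= k) = P(X > k - 1)
k     <- 0:kids
type1 <- pbinom(k - 1, size = kids, prob = 0.5, lower.tail = FALSE)
print(data.frame(region_start = k, type1_error = type1))

# Smallest k whose region has a Type I error rate of at most 5%
smallest_k <- min(k[type1 <= 0.05])
print(smallest_k)

# One-sided exact test: the p-value is P(X >= 7) under the null
res <- binom.test(girls, kids, p = 0.5, alternative = "greater")
print(res$p.value)

# The default two-sided call reports an exact binomial interval for p
ci <- binom.test(girls, kids, p = 0.5)$conf.int
print(ci)
```

Since 7 falls inside the region [7 : 8], the one-sided test rejects at the 5% level, consistent with the one-sided p-value being below 0.05.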


Hypothesis_testing

Linda Angulo Lopez

08/12/2020


For these problems, people usually compute a P-value rather than the rejection region.