Hypothesis testing

M. Drew LaMar
February 17, 2016

“…a hypothesis test tells us whether the observed data are consistent with the null hypothesis, and a confidence interval tells us which hypotheses are consistent with the data.”

- William C. Blackwelder

Hypothesis testing (general)

Definition: Hypothesis testing compares data to what we would expect to see if a specific null hypothesis were true. If the data are too unusual, compared to what we would expect to see if the null hypothesis were true, then the null hypothesis is rejected.

Definition: A null hypothesis is a specific statement about a population parameter made for the purpose of argument.

Definition: The alternative hypothesis includes all other feasible values for the population parameter besides the value stated in the null hypothesis.

Hypothesis testing (Problem #25)

Can parents distinguish their own children by smell alone? To investigate, Porter and Moore (1981) gave new T-shirts to children of nine mothers. Each child wore his or her shirt to bed for three consecutive nights. During the day, from waking until bedtime, the shirts were kept in individually sealed plastic bags. No scented soaps or perfumes were used during the study. Each mother was then given the shirt of her child and that of another, randomly chosen child and asked to identify her own by smell.

Discuss: What is the null hypothesis? alternative hypothesis?

Answer: With \( p \) the probability of choosing correctly,
\[ H_{0}: \ p = 0.5 \] \[ H_{A}: \ p \neq 0.5 \]

Hypothesis testing (how it's done)

Definition: The test statistic is a number calculated from the data that is used to evaluate how compatible the data are with the result expected under the null hypothesis.

Definition: The null distribution is the sampling distribution of outcomes for a test statistic under the assumption that the null hypothesis is true.

Definition: A \( P \)-value is the probability of obtaining the data (or data showing as great or greater difference from the null hypothesis) if the null hypothesis were true.
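
One way to make the null distribution concrete is to simulate it. The sketch below assumes the setup of Problem #25 (nine mothers, each guessing correctly with probability 0.5 under \( H_{0} \)) and repeats the study many times on the computer. The function name, the number of trials, and the seed are illustrative choices, not part of the original study.

```python
import random
from collections import Counter

def simulate_null_distribution(n_mothers=9, p_null=0.5, n_trials=100_000, seed=1):
    """Estimate the null distribution of the number of correct identifications."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    counts = Counter()
    for _ in range(n_trials):
        # One simulated study: each mother is correct with probability p_null.
        n_correct = sum(rng.random() < p_null for _ in range(n_mothers))
        counts[n_correct] += 1
    # Convert counts to relative frequencies for 0, 1, ..., n_mothers correct.
    return {k: counts[k] / n_trials for k in range(n_mothers + 1)}

null_dist = simulate_null_distribution()
for k, prob in null_dist.items():
    print(f"{k} correct: {prob:.3f}")
```

The relative frequencies approximate the sampling distribution of the test statistic when \( H_{0} \) is true. More trials give a closer approximation.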

Hypothesis testing (how it's done)

  • State the hypotheses.
  • Compute the test statistic.
  • Determine the \( P \)-value.
  • Draw the appropriate conclusions.

Hypothesis testing (Problem #25)

Can parents distinguish their own children by smell alone? To investigate, Porter and Moore (1981) gave new T-shirts to children of nine mothers. Each child wore his or her shirt to bed for three consecutive nights. During the day, from waking until bedtime, the shirts were kept in individually sealed plastic bags. No scented soaps or perfumes were used during the study. Each mother was then given the shirt of her child and that of another, randomly chosen child and asked to identify her own by smell. Eight of nine mothers identified their children correctly.

Discuss: What test statistic should you use?

Answer: The number of mothers with correct identifications.

Hypothesis testing (Problem #25)

The following figure shows the null distribution for the number of mothers out of nine guessing correctly.

Discuss: If \( H_{0} \) were true, what is the probability of exactly eight correct identifications?

Answer: Pr[number correct = 8] = 0.018

Discuss: If \( H_{0} \) were true, what is the probability of obtaining eight or more correct identifications?

Answer: Pr[number correct \( \geq \) 8] = 0.018 + 0.002 = 0.02

Discuss: What is the \( P \)-value?

Answer: \( P = 2\times(0.02) = 0.04 \)
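
The probabilities above can be checked directly from the binomial distribution. This is a minimal sketch assuming the number correct follows a binomial distribution with \( n = 9 \) and \( p = 0.5 \) under \( H_{0} \); the helper name `binomial_pmf` is illustrative. The two-sided \( P \)-value doubles the upper tail, as in the answer above.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p_null, observed = 9, 0.5, 8

# Probability of exactly eight correct identifications under H0.
pr_exactly_8 = binomial_pmf(observed, n, p_null)

# Probability of eight or more correct identifications under H0.
pr_8_or_more = sum(binomial_pmf(k, n, p_null) for k in range(observed, n + 1))

# Two-sided P-value: double the one-tailed probability (capped at 1).
p_value = min(1.0, 2 * pr_8_or_more)

print(f"Pr[number correct = 8]  = {pr_exactly_8:.3f}")
print(f"Pr[number correct >= 8] = {pr_8_or_more:.3f}")
print(f"P-value                 = {p_value:.3f}")
```

Without intermediate rounding the \( P \)-value is 20/512, about 0.039, which rounds to the 0.04 shown above.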

Hypothesis testing (Problem #25)

So, P = 0.04. Is that good?

https://youtu.be/7jSE3JANx14?t=4m29s

Definition: The significance level, \( \alpha \), is the probability used as a criterion for rejecting the null hypothesis. If the \( P \)-value is less than or equal to \( \alpha \), then the null hypothesis is rejected. If the \( P \)-value is greater than \( \alpha \), then the null hypothesis is not rejected.

Definition: A result is considered statistically significant when \( P \)-value \( \leq \alpha \).

Definition: A result is considered not statistically significant when \( P \)-value \( > \alpha \).

Hypothesis testing (Problem #25)

Discuss: Given \( \alpha = 0.05 \), \( \{H_{0}: \ p = 0.5\} \), and \( P \)-value of 0.04, what is the appropriate conclusion?

Answer: Reject \( H_{0} \). There is evidence that mothers consistently identify their own children correctly by smell.

LOTS of confusion about P-values

“We want to know if results are right, but a p-value doesn’t measure that. It can’t tell you the magnitude of an effect, the strength of the evidence or the probability that the finding was the result of chance.”

Christie Aschwanden

http://fivethirtyeight.com/pvalue

“Belief that ‘statistical significance’ can alone discriminate between truth and falsehood borders on magical thinking.”

Cohen

LOTS of confusion about P-values

Recommended practice

Measure and report precision and effect size separately (the \( P \)-value is a summary measure that mixes them):

  • Present the magnitude of effect through the use of measures such as rates, risk differences, and odds ratios.
  • Report precision with standard errors or confidence intervals.
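
As a sketch of this practice for Problem #25 (8 of 9 mothers correct), the code below reports the estimated proportion as the effect size together with an approximate 95% confidence interval as the measure of precision. The Agresti-Coull "plus four" interval is an assumed choice here; the slides do not specify an interval method, and the function name is illustrative.

```python
from math import sqrt

def agresti_coull_interval(successes, n, z=1.96):
    """Approximate 95% confidence interval for a proportion (plus-four method)."""
    # Add two successes and two failures before estimating the proportion.
    p_adj = (successes + 2) / (n + 4)
    half_width = z * sqrt(p_adj * (1 - p_adj) / (n + 4))
    # A proportion cannot fall outside [0, 1], so clip the limits.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)

successes, n = 8, 9
estimate = successes / n  # observed proportion of correct identifications
lower, upper = agresti_coull_interval(successes, n)

print(f"Estimated proportion correct: {estimate:.2f}")
print(f"Approximate 95% CI: ({lower:.2f}, {upper:.2f})")
```

With only nine mothers the interval is wide, which the \( P \)-value alone would not reveal.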

Caveats

  • Statistical significance is NOT the same as biological importance.
  • Effect sizes are important. Large sample sizes can lead to statistically significant results even when the effect size is small!
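
The second caveat can be illustrated with a small calculation. The sketch below uses made-up data (not from Porter and Moore): the same observed proportion, 52% correct, tested against \( H_{0}: p = 0.5 \) with a sample of 100 and with a sample of 10,000. The function name is illustrative, and the two-sided \( P \)-value is obtained by doubling one tail, as in Problem #25.

```python
from math import comb

def two_sided_p_value(successes, n):
    """Two-sided P-value for H0: p = 0.5, doubling the more extreme tail."""
    # The null distribution is symmetric at p = 0.5, so use the upper tail
    # at whichever count is further from n / 2.
    k = max(successes, n - successes)
    tail_count = sum(comb(n, j) for j in range(k, n + 1))
    # Exact integer arithmetic until the final division avoids overflow.
    return min(1.0, 2 * tail_count / 2**n)

# Hypothetical samples with the same effect size (52% correct).
p_small_sample = two_sided_p_value(52, 100)
p_large_sample = two_sided_p_value(5200, 10_000)

print(f"n = 100:    P = {p_small_sample:.3f}")
print(f"n = 10000:  P = {p_large_sample:.6f}")
```

The effect size is identical in both cases; only the sample size changes whether the result falls below \( \alpha = 0.05 \).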

Errors in Hypothesis Testing

Definition: Type I error is rejecting a true null hypothesis. The probability of a Type I error is given by \[ \mathrm{Pr[Reject} \ H_{0} \ | \ H_{0} \ \mathrm{is \ true}] = \alpha \]

Definition: Type II error is failing to reject a false null hypothesis. The probability of a Type II error is given by \[ \mathrm{Pr[Do \ not \ reject} \ H_{0} \ | \ H_{0} \ \mathrm{is \ false}] = \beta \]

Errors in Hypothesis Testing - Power

Definition: The power of a statistical test (denoted \( 1-\beta \)) is given by \[ \begin{aligned} \mathrm{Pr[Reject} \ H_{0} \ | \ H_{0} \ \mathrm{is \ false}] & = 1-\beta \\ & = 1 - \mathrm{Pr[Type \ II \ error]} \end{aligned} \]
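
For the nine-mother test in Problem #25, both error probabilities can be computed exactly. This is a sketch under stated assumptions: \( \alpha = 0.05 \), a two-sided \( P \)-value found by doubling the smaller tail, and a hypothetical true probability of 0.9 for the power calculation (the slides do not give a value for the alternative). Function names are illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_p_value(k, n, p_null=0.5):
    """Two-sided P-value: double the smaller of the two tail probabilities."""
    lower = sum(binomial_pmf(j, n, p_null) for j in range(0, k + 1))
    upper = sum(binomial_pmf(j, n, p_null) for j in range(k, n + 1))
    return min(1.0, 2 * min(lower, upper))

n, alpha = 9, 0.05

# Outcomes (numbers correct) that would lead to rejecting H0.
rejection_region = [k for k in range(n + 1) if two_sided_p_value(k, n) <= alpha]

# Type I error rate: probability of landing in the rejection region when H0 is true.
type_i_rate = sum(binomial_pmf(k, n, 0.5) for k in rejection_region)

# Power: probability of landing in the rejection region when H0 is false.
p_true = 0.9  # hypothetical true probability of a correct identification
power = sum(binomial_pmf(k, n, p_true) for k in rejection_region)
beta = 1 - power  # Type II error rate for this alternative

print(f"Reject H0 when number correct is in {rejection_region}")
print(f"Type I error rate = {type_i_rate:.3f}")
print(f"Power = {power:.3f}, Type II error rate = {beta:.3f}")
```

Because the number correct is discrete, the achieved Type I error rate here is somewhat below the nominal \( \alpha \). Power depends on the assumed true value of \( p \): values closer to 0.5 give lower power.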

Probability of errors in hypothesis testing

  • \( \alpha \) is the significance level
  • \( 1-\beta \) is the power

Statistical power example

https://qubeshub.org/tools/statpowerviz/