Questions, Hypotheses, and Predictions

M. Drew LaMar
September 11, 2020

“Good experimental design is all about maximizing the amount of information that we can get, given the resources that we have available.”

- Ruxton & Colegrave

Office Hours

Office hours:

Monday, 10 am: 992 5898 9870
Tuesday, 4 pm: 985 7103 7625 (Professor Jusino)
Wednesday, 11 am: 948 6796 7161
Friday, 1 pm: 988 4294 3670

Buddy Has a Hypothesis!

Alright, back to hypotheses...

Definition: A hypothesis is a clear statement articulating a plausible candidate explanation for observations.

Observation \( \rightarrow \) Question \( \rightarrow \) Hypothesis \( \rightarrow \) Prediction

Q 2.1: Suggest some hypotheses that could explain the observation that people drive faster on the journey to work than on the way home.

A 2.1:
(1) Time management - have a lot of work to do.
(2) Energy difference - newly caffeinated vs. exhausted.

Example: Chimp activity

Question: Why does chimp activity vary during the day?

Hypothesis: Chimp activity pattern is affected by feeding regime.

Prediction: The fraction of time that a chimp spends moving around will be higher in the hour around feeding time than at other times of day.

Alternate hypothesis (statistical): \( p_{f} > p_{t} \), where \( p_{f} \) is the fraction of time that a chimp spends moving around in the hour around feeding time, with \( p_{t} \) the same metric for all other hours of the day.

Null hypothesis (statistical): \( p_{f} \leq p_{t} \).

Example: Multiple hypotheses

Question: Why do whelks group?

Hypothesis #1: Whelks group for shelter from wave action.

Prediction #1: Whelks are more likely to be found in groups in areas sheltered from wave action.

Hypothesis #2: Whelks group for feeding.

Prediction #2: Whelks are more likely to be found in groups in areas of higher food density.

Note: Multiple hypotheses can lead to the same prediction.

Hypothesis #1b: Whelks are more vulnerable to predators in sheltered areas, but grouping provides protection from predators.

Concept maps of causation

Possibility 1: Neither hypothesis is true and the observed patterns are due to something else entirely.

Concept maps of causation

Possibility 2: Predation is true and shelter is false.

Concept maps of causation

Possibility 3: Predation is false and shelter is true.

Concept maps of causation

Possibility 4: Both predation and shelter are true.

"Good" experimental design

“No matter how the study is organized, the important thing is that the best study will be the one that allows us to tease apart the influence of the different hypothesized influences on grouping behavior.”

- Ruxton & Colegrave

Experimental design example: Factorial experiment with predation and wave action.

With wave action and without predation.
Without wave action and with predation.
With wave action and predation (interaction effect).
Without wave action or predation (control).

Differing levels of predation and wave action?

3 levels of predation and 3 levels of wave action = 9 different experiments!!!

Oh, and don't forget about replication (sample size)
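
The count of treatment combinations, and how replication multiplies the number of experimental units, can be sketched in a few lines. The level names and the number of replicates below are illustrative assumptions; the slides only specify three levels of each factor.

```python
from itertools import product

# Hypothetical level names: the slides only say "3 levels" of each factor.
predation_levels = ["none", "low", "high"]
wave_levels = ["none", "low", "high"]
replicates = 4  # assumed number of replicate units per treatment

# Every pairing of one predation level with one wave-action level.
treatments = list(product(predation_levels, wave_levels))

# One row per experimental unit: (predation, wave action, replicate number).
design = [(p, w, r) for (p, w) in treatments for r in range(1, replicates + 1)]

print(len(treatments))  # number of treatment combinations
print(len(design))      # number of experimental units once replicated
```

With these assumed values there are 9 treatment combinations and 36 experimental units, which is why adding levels to a factorial design gets expensive quickly.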

The elusive "good idea"

“Perhaps the key to having really novel ideas is just to keep your eyes and ears open and try and question the things you see around you.”

- Ruxton & Colegrave

Switching from answers to questions

“We encourage you to design experiments that are interesting because of the question they ask more than because of the specific answer to the question that emerges from the data.”

- Ruxton & Colegrave

Yeah, but is it worth it to explore a question?

Three To-Dos:

Ask yourself: Is it possible to explore this question in a way to obtain a valuable answer? (i.e. do you have the resources? Careful, though: What is valuable?)
Ask yourself: Are you committed to obtaining an answer? (note: this is different than excited - excitement comes and goes, but commitment sticks)
Bounce ideas off others you respect, but don't rely solely on their responses!!!

Understanding Science

http://undsci.berkeley.edu/

Self-awareness to satisfy the sceptic

“You should think of the Devil's advocate as a highly intelligent but sceptical person. If there is a weakness in your argument, then they will find it.”

- Ruxton & Colegrave

Experimental vs observational

Causation vs correlation

https://xkcd.com/552/

The causal soup - Systems Biology

http://www.nature.com/ni/journal/v12/n8/full/ni.2067.html

Issues with correlational results

Correlation issues - Reverse causation

Correlation issues - Confounding variable

Example (A and B may be correlated because C affects both):

A = tail length
B = # matings
C = territorial quality
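
A small simulation can illustrate how a confounding variable produces a correlation. This is only a sketch under stated assumptions: C is assumed to drive both A and B, A has no direct effect on B, and all values are in arbitrary standardized units rather than real measurements.

```python
import random

def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 5000

# Assumed causal structure: C -> A and C -> B, with no arrow between A and B.
c = [rng.gauss(0, 1) for _ in range(n)]   # territorial quality
a = [ci + rng.gauss(0, 1) for ci in c]    # tail length
b = [ci + rng.gauss(0, 1) for ci in c]    # number of matings

r_ab = pearson(a, b)

# Subtract C's contribution (known here because we simulated it).
a_resid = [ai - ci for ai, ci in zip(a, c)]
b_resid = [bi - ci for bi, ci in zip(b, c)]
r_resid = pearson(a_resid, b_resid)

print(round(r_ab, 2), round(r_resid, 2))
```

A and B come out clearly correlated even though neither affects the other, and the correlation all but vanishes once C's contribution is removed.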

Manipulative studies

“The only way to be certain of removing problems with third variables is to carry out experimental manipulations.”

- Ruxton & Colegrave

Difficulties:

Sometimes not possible to manipulate, for practical or ethical reasons.
Not technically feasible to carry out a manipulative study.

Note: Correlational studies can be used as a first step towards a manipulative study. Also, if you are interested in natural variation, an observational study is a great way to go.

Hypothesis testing (general)

Definition: Hypothesis testing compares data to what we would expect to see if a specific null hypothesis were true. If the data are too unusual, compared to what we would expect to see if the null hypothesis were true, then the null hypothesis is rejected.

Definition: A null hypothesis is a specific statement about a population parameter made for the purpose of argument.

Definition: The alternative hypothesis includes all other feasible values for the population parameter besides the value stated in the null hypothesis.

Hypothesis testing (Problem #25)

Can parents distinguish their own children by smell alone? To investigate, Porter and Moore (1981) gave new T-shirts to children of nine mothers. Each child wore his or her shirt to bed for three consecutive nights. During the day, from waking until bedtime, the shirts were kept in individually sealed plastic bags. No scented soaps or perfumes were used during the study. Each mother was then given the shirt of her child and that of another, randomly chosen child and asked to identify her own by smell.

Discuss: What is the null hypothesis? alternative hypothesis?

Answer: With \( p \) the probability of choosing correctly,
\[ H_{0}: \ p = 0.5 \] \[ H_{A}: \ p \neq 0.5 \]

Hypothesis testing (how it's done)

Definition: The test statistic is a number calculated from the data that is used to evaluate how compatible the data are with the result expected under the null hypothesis.

Definition: The null distribution is the sampling distribution of outcomes for a test statistic under the assumption that the null hypothesis is true.

Definition: A \( P \)-value is the probability of obtaining the data (or data showing as great or greater difference from the null hypothesis) if the null hypothesis were true.

Hypothesis testing (how it's done)

State the hypotheses.
Compute the test statistic.
Determine the \( P \)-value.
Draw the appropriate conclusions.

Hypothesis testing (Problem #25)

Discuss: What test statistic should you use?

Answer: The number of mothers with correct identifications.

Hypothesis testing (Problem #25)

The following figure shows the null distribution for the number of mothers out of nine guessing correctly.

Discuss: If \( H_{0} \) were true, what is the probability of exactly eight correct identifications?

Answer: Pr[number correct = 8] = 0.018
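
The null distribution can also be computed directly. A minimal sketch, assuming the nine mothers choose independently so that the number correct under \( H_{0} \) is binomial with \( n = 9 \) and \( p = 0.5 \):

```python
from math import comb

n = 9     # number of mothers
p0 = 0.5  # probability of a correct choice under the null hypothesis

# Binomial probability of exactly k correct identifications, for k = 0..9.
null_dist = {k: comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)}

for k, prob in null_dist.items():
    print(k, round(prob, 3))
```

The entry for 8 is 9/512, which rounds to the 0.018 quoted in the answer.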

Discuss: If \( H_{0} \) were true, what is the probability of obtaining eight or more correct identifications?

Answer: Pr[number correct \( \geq \) 8] = 0.018 + 0.002 = 0.02

Discuss: What is the \( P \)-value?

Answer: \( P = 2\times(0.02) = 0.04 \)
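
The same calculation without intermediate rounding can be sketched as follows. It assumes a binomial null distribution and uses the doubling rule from the answer above (two-sided \( P \) = 2 × one tail); `observed = 8` is the count used in the slide's calculation.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p0 = 9, 0.5
observed = 8  # number of correct identifications used in the slide's calculation

# Upper tail: results at least as extreme as the one observed.
upper_tail = sum(binom_pmf(k, n, p0) for k in range(observed, n + 1))

# Two-sided P-value by doubling the tail, capped at 1.
p_value = min(1.0, 2 * upper_tail)

print(round(upper_tail, 4), round(p_value, 4))
```

The exact value is 20/512, which rounds to the 0.04 reported above.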

Hypothesis testing (Problem #25)

So, P = 0.04. Is that good?

https://www.youtube.com/watch?v=0SoIrBvk9ic

Definition: The significance level, \( \alpha \), is the probability used as a criterion for rejecting the null hypothesis. If the \( P \)-value is less than or equal to \( \alpha \), then the null hypothesis is rejected. If the \( P \)-value is greater than \( \alpha \), then the null hypothesis is not rejected.

Definition: A result is considered statistically significant when \( P \)-value \( \leq \alpha \).

Definition: A result is considered not statistically significant when \( P \)-value \( > \alpha \).

Hypothesis testing (Problem #25)

Discuss: Given \( \alpha = 0.05 \), \( \{H_{0}: \ p = 0.5\} \), and \( P \)-value of 0.04, what is the appropriate conclusion?

Answer: Reject \( H_{0} \). There is evidence that mothers can identify their own children by smell more often than expected by chance.

LOTS of confusion about P-values

“We want to know if results are right, but a p-value doesn’t measure that. It can’t tell you the magnitude of an effect, the strength of the evidence or the probability that the finding was the result of chance.”

Christie Aschwanden

http://fivethirtyeight.com/pvalue

“Belief that ‘statistical significance’ can alone discriminate between truth and falsehood borders on magical thinking.”

Cohen

LOTS of confusion about P-values

A Dirty Dozen: Twelve P-Value Misconceptions, by Steven Goodman
[http://dx.doi.org/10.1053/j.seminhematol.2008.04.003]

Recommended practice

Measure and report precision and effect size separately (the \( P \)-value is a summary measure that mixes them):

Present the magnitude of effect through the use of measures such as rates, risk differences, and odds ratios.
Report precision with standard errors or confidence intervals.

Caveats

Statistical significance is NOT the same as biological importance.
Effect sizes are important. Large sample sizes can lead to statistically significant results, even though the effect size is small!
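
The sample-size caveat can be illustrated with a rough sketch. The observed proportion (0.51), the null value (0.5), and the two sample sizes are made-up numbers, and the \( P \)-value uses a normal approximation to the binomial rather than an exact test.

```python
from math import erfc, sqrt

def two_sided_p_normal(p_hat, p0, n):
    """Approximate two-sided P-value for a proportion (normal approximation)."""
    se = sqrt(p0 * (1 - p0) / n)  # standard error under the null hypothesis
    z = (p_hat - p0) / se         # standardized distance from the null value
    return erfc(abs(z) / sqrt(2)) # equals 2 * (1 - Phi(|z|))

# Same tiny effect (0.51 vs 0.50), two very different sample sizes.
p_small = two_sided_p_normal(p_hat=0.51, p0=0.5, n=100)
p_large = two_sided_p_normal(p_hat=0.51, p0=0.5, n=100_000)

print(p_small)  # far above 0.05
print(p_large)  # far below 0.05
```

The effect size is identical in both cases; only the precision changes, which is why effect size and precision are worth reporting separately.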

Errors in Hypothesis Testing

Definition: Type I error is rejecting a true null hypothesis. The probability of a Type I error is given by \[ \mathrm{Pr[Reject} \ H_{0} \ | \ H_{0} \ \mathrm{is \ true}] = \alpha \]

Definition: Type II error is failing to reject a false null hypothesis. The probability of a Type II error is given by \[ \mathrm{Pr[Do \ not \ reject} \ H_{0} \ | \ H_{0} \ \mathrm{is \ false}] = \beta \]
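
For a concrete case, the Type I error rate of the nine-mothers test can be computed exactly. This sketch assumes a binomial null distribution and the doubled-tail two-sided \( P \)-value used earlier. Because the count is discrete, the actual rate may fall somewhat below the nominal \( \alpha \).

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_p(k, n, p0):
    """Two-sided P-value: twice the smaller tail, capped at 1."""
    lower = sum(binom_pmf(j, n, p0) for j in range(0, k + 1))
    upper = sum(binom_pmf(j, n, p0) for j in range(k, n + 1))
    return min(1.0, 2 * min(lower, upper))

n, p0, alpha = 9, 0.5, 0.05

# Outcomes for which H0 would be rejected at level alpha.
rejection_region = [k for k in range(n + 1) if two_sided_p(k, n, p0) <= alpha]

# Probability of landing in that region when H0 is actually true.
type_i_rate = sum(binom_pmf(k, n, p0) for k in rejection_region)

print(rejection_region, round(type_i_rate, 4))
```

Here the rejection region is 0, 1, 8, or 9 correct, and the actual Type I error rate is 20/512, a little under 0.05.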

Errors in Hypothesis Testing - Power

Definition: The power of a statistical test (denoted \( 1-\beta \)) is given by \[ \begin{align*} \mathrm{Pr[Reject} \ H_{0} \ | \ H_{0} \ \mathrm{is \ false}] & = 1-\beta \\ & = 1 - \mathrm{Pr[Type \ II \ error]} \end{align*} \]
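
Power depends on what the truth actually is. As a sketch, suppose (hypothetically) that mothers identify their child correctly with probability 0.9; the value 0.9 is an assumption for illustration, not an estimate from the study. The test is the same nine-mother binomial test with the doubled-tail \( P \)-value and \( \alpha = 0.05 \).

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_p(k, n, p0):
    """Two-sided P-value: twice the smaller tail, capped at 1."""
    lower = sum(binom_pmf(j, n, p0) for j in range(0, k + 1))
    upper = sum(binom_pmf(j, n, p0) for j in range(k, n + 1))
    return min(1.0, 2 * min(lower, upper))

n, p0, alpha = 9, 0.5, 0.05
p_true = 0.9  # hypothetical true probability of a correct identification

# Outcomes that lead to rejecting H0.
rejection_region = [k for k in range(n + 1) if two_sided_p(k, n, p0) <= alpha]

# Power: probability of rejecting H0 when p_true is the truth.
power = sum(binom_pmf(k, n, p_true) for k in rejection_region)
beta = 1 - power  # probability of a Type II error

print(round(power, 3), round(beta, 3))
```

Even with an effect this strong, nine mothers give a power of only about 0.77, so a true effect would be missed roughly one time in four.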

Power analysis

Power of a statistical test is a function of
     - Significance level \( \alpha \)
     - Variability of data
     - Sample size
     - Effect size

Desired power is set by researcher (typically 80%)
Significance level set by researcher
Effect size (signal) and data variability (noise) can be estimated by previous studies or pilot studies
Sample size is then calculated to achieve desired power given previous fixed attributes
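
The last step can be sketched as a simple search for the smallest sample size that reaches the desired power. The inputs are assumptions for illustration: a binomial test of \( p = 0.5 \), a hypothetical true value of 0.75 standing in for an effect size estimated from a pilot study, \( \alpha = 0.05 \), and a target power of 80%.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_p(k, n, p0):
    """Two-sided P-value: twice the smaller tail, capped at 1."""
    lower = sum(binom_pmf(j, n, p0) for j in range(0, k + 1))
    upper = sum(binom_pmf(j, n, p0) for j in range(k, n + 1))
    return min(1.0, 2 * min(lower, upper))

def power(n, p0, p_true, alpha):
    """Probability of rejecting H0: p = p0 when the true value is p_true."""
    region = [k for k in range(n + 1) if two_sided_p(k, n, p0) <= alpha]
    return sum(binom_pmf(k, n, p_true) for k in region)

def smallest_n(p0, p_true, alpha, target_power, n_max=200):
    """Smallest n (up to n_max) whose power meets the target, else None."""
    for n in range(1, n_max + 1):
        if power(n, p0, p_true, alpha) >= target_power:
            return n
    return None

# Hypothetical inputs: effect size from a pilot study, conventional alpha and power.
n_needed = smallest_n(p0=0.5, p_true=0.75, alpha=0.05, target_power=0.80)
print(n_needed)
```

Because the binomial count is discrete, power does not rise perfectly smoothly with sample size, so a slightly larger sample can occasionally dip back below the target. Check a few sample sizes around the one returned before settling on a final number.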

Statistical power visualized (http://rpsychologist.com/d3/NHST/)