P Values: What are these critters?
Tyler McDaniel
5/13/2018
Some Learning Goals
- Emergence of P < 0.05
- Methods, results, and inferential replicability
- Understanding of when to use P values

Psychology

- 97% of original studies statistically significant
- 36% of replication studies statistically significant
Psychology

- Success better predicted by strength of original evidence than characteristics of research teams

- Methods reproducibility: same data, same tools
- Results reproducibility: same methods, new study
- Inferential reproducibility: investigate claims
So what is a P value?
- Probability that one would get a result as extreme (as the current one)
- or more extreme,
- assuming the null hypothesis is true,
- and the effect is constant,
- if the experiment were replicated infinitely,
- under perfect conditions.
- (Or something like that)
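The "as extreme or more extreme, assuming the null hypothesis is true" part of the definition can be sketched with a coin-flip example. This is a minimal illustration, not a general-purpose test: the fair-coin null, the 60-heads-in-100-flips data, and the function name `binomial_p_value` are all assumptions made for the example.

```python
from math import comb


def binomial_p_value(heads: int, flips: int) -> float:
    """Two-sided P value for a fair-coin null hypothesis (P(heads) = 0.5)."""
    expected = flips / 2
    observed_distance = abs(heads - expected)
    # Count every outcome at least as far from the expected number of heads
    # as the observed one: "as extreme or more extreme".
    extreme_outcomes = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_distance
    )
    # Under the null, each of the 2**flips head/tail sequences is equally likely.
    return extreme_outcomes / 2 ** flips


# Hypothetical data: 60 heads in 100 flips.
p = binomial_p_value(heads=60, flips=100)
print(f"P value for 60 heads in 100 flips: {p:.4f}")
```

The calculation never asks whether the coin is actually fair. It only asks how surprising the data would be if it were.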
P Value Thresholds
- \( 0.05 \) (social sciences, biomedical sciences, etc.)
- \( 3 \times 10^{-7} \) (high energy physics)
- \( 5 \times 10^{-8} \) (genomics)
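A short sketch of where the two stricter thresholds are commonly said to come from. It assumes the usual readings, which the slides do not state: the physics threshold as a one-sided "five sigma" tail of a normal distribution, and the genomics threshold as 0.05 divided across roughly one million independent tests (a Bonferroni-style correction). The names `one_sided_p`, `five_sigma_p`, and `bonferroni_p` are illustrative.

```python
import math


def one_sided_p(z: float) -> float:
    """Upper-tail probability of a standard normal beyond z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Assumption: "five sigma" is read as a one-sided normal tail probability.
five_sigma_p = one_sided_p(5.0)

# Assumption: about one million independent tests share an overall 0.05 level.
bonferroni_p = 0.05 / 1_000_000

print(f"Five-sigma tail probability: {five_sigma_p:.2e}")
print(f"0.05 spread over one million tests: {bonferroni_p:.0e}")
```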
The Origins of P Values
- Fisher popularized the P value through his work on agricultural experiments
- “This is an arbitrary, but convenient, level of significance for the practical investigator, but it does not mean that he allows himself to be deceived once every twenty experiments. The test of significance only tells him what to ignore, namely all experiments in which significant results are not obtained. He should only claim that a phenomenon is experimentally demonstrable when he knows how to design an experiment so that it will rarely fail to give a significant result. Consequently, isolated significant results which he does not know how to reproduce are left in suspense pending further investigation.” (Fisher, 1929)
But the implications are quite interesting!
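Fisher's "once every twenty experiments" remark can be explored with a small simulation. This is a sketch under stated assumptions: every simulated experiment has no real effect (the null is true), the data are standard normal with a known standard deviation of 1, and a two-sided z-test is used. The function names, sample size, number of experiments, and seed are arbitrary choices for illustration.

```python
import math
import random


def two_sided_p(z: float) -> float:
    """Two-sided P value for a z statistic under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(experiments: int = 4000, sample_size: int = 20,
                        alpha: float = 0.05, seed: int = 1) -> float:
    """Share of null-effect experiments that still reach P < alpha."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(experiments):
        # The true mean is 0, so any "significant" result is a false alarm.
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        mean = sum(sample) / sample_size
        z = mean * math.sqrt(sample_size)  # known standard deviation of 1
        if two_sided_p(z) < alpha:
            significant += 1
    return significant / experiments


rate = false_positive_rate()
print(f"Share of null experiments with P < 0.05: {rate:.3f}")
```

With the threshold at 0.05, roughly one in twenty of these no-effect experiments comes out significant, which is why Fisher leaves an isolated significant result "in suspense" until it can be reproduced.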
ASA Statement
- Can indicate how compatible data are with given model
- Does not measure probability that hypothesis is true or that data were produced by chance alone
- Decisions should not be based only on whether a P value passes a threshold
- Does not measure the size of an effect or the importance of a result
- By itself, not a good measure of evidence for or against a hypothesis
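The point that a P value is not an effect size can be sketched with two hypothetical studies. The numbers are made up for illustration, and the sketch assumes a two-sided z-test with a known standard deviation of 1, treating the observed mean difference as given. The function name `z_test_p` is also an assumption.

```python
import math


def z_test_p(mean_difference: float, sample_size: int, sd: float = 1.0) -> float:
    """Two-sided P value for an observed mean difference (known sd, null of zero)."""
    standard_error = sd / math.sqrt(sample_size)
    z = mean_difference / standard_error
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical study A: tiny effect, enormous sample.
tiny_effect_p = z_test_p(mean_difference=0.01, sample_size=1_000_000)

# Hypothetical study B: large effect, small sample.
large_effect_p = z_test_p(mean_difference=0.5, sample_size=10)

print(f"Tiny effect, n = 1,000,000: P = {tiny_effect_p:.1e}")
print(f"Large effect, n = 10:       P = {large_effect_p:.3f}")
```

The tiny effect clears any conventional threshold while the large one does not, so the P value mostly reflects sample size here. Reporting the effect size alongside it keeps the two apart.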
Closing Thoughts
- P values indicate compatibility with specific model
- Need corroboration for scientific claims
- Conveying significance vs. effect sizes (e.g. * vs. ***)
