P Values: What are these critters?

Tyler McDaniel
5/13/2018

Some Learning Goals

  • Emergence of P < 0.05
  • Methods, results, and inferential reproducibility
  • Understanding when to use P values

Some Background Info


Psychology

  • 97% of original studies reported statistically significant results
  • 36% of replication studies produced statistically significant results
  • Replication success was better predicted by the strength of the original evidence than by characteristics of the research teams

Medicine

  • Methods reproducibility: same data, same tools, same results
  • Results reproducibility: same methods, new study, corroborating results
  • Inferential reproducibility: same conclusions from a reanalysis or an independent replication

So what is a P value?

  • Probability of obtaining a result as extreme as the one observed
  • or more extreme,
  • assuming the null hypothesis is true,
  • and the effect is constant,
  • if the experiment were replicated infinitely many times,
  • under perfect conditions.
  • (Or something like that)
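To make the definition concrete, here is a minimal Python sketch for a hypothetical experiment: 60 heads in 100 coin flips, tested against the null hypothesis of a fair coin. The function names and numbers are illustrative assumptions, not from the original slides. The exact version sums the null probabilities of every outcome at least as far from 50 heads as the observed one; the simulated version replicates the experiment many times under the null, a finite stand-in for "replicated infinitely many times".

```python
import math
import random

def exact_two_sided_p(heads: int, n: int) -> float:
    """Exact two-sided binomial P value under a fair-coin null (p = 0.5).

    Sums the null probability of every outcome at least as far from n/2
    as the observed count, i.e. "as extreme or more extreme".
    """
    observed_distance = abs(heads - n / 2)
    extreme_count = sum(
        math.comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= observed_distance
    )
    return extreme_count / 2 ** n

def simulated_two_sided_p(heads: int, n: int, reps: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo version: replay the experiment many times under the null and
    count how often the simulated result is at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed_distance = abs(heads - n / 2)
    extreme = 0
    for _ in range(reps):
        # each of the n random bits acts as one independent fair coin flip
        sim_heads = bin(rng.getrandbits(n)).count("1")
        if abs(sim_heads - n / 2) >= observed_distance:
            extreme += 1
    return extreme / reps

exact = exact_two_sided_p(60, 100)
approx = simulated_two_sided_p(60, 100)
print(f"exact: {exact:.4f}, simulated: {approx:.4f}")
```

With this many simulated replications the two estimates should agree closely, both landing a little above 0.05.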

P Value Thresholds

  • \( 0.05 \) (social sciences, biomedical sciences, etc.)
  • \( 3 \times 10^{-7} \) (high-energy physics)
  • \( 5 \times 10^{-8} \) (genomics)
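A short Python sketch of the usual rationale behind the two stricter thresholds, offered as the common explanation rather than a definition from these slides: the physics value is roughly the one-sided normal tail beyond five standard deviations (the "5 sigma" convention), and the genomics value corresponds to a Bonferroni correction of 0.05 for about one million independent tests.

```python
import math

def upper_tail_normal(z: float) -> float:
    """One-sided upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Particle physics: the "5 sigma" discovery convention, read as a one-sided tail.
five_sigma_p = upper_tail_normal(5.0)

# Genomics: 0.05 split across roughly one million independent tests (Bonferroni).
genome_wide_p = 0.05 / 1_000_000

print(f"5 sigma (one-sided): {five_sigma_p:.2e}")
print(f"genome-wide:         {genome_wide_p:.0e}")
```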

The Origins of P Values

  • Fisher introduced the P value for use in agricultural research
  • “This is an arbitrary, but convenient, level of significance for the practical investigator, but it does not mean that he allows himself to be deceived once every twenty experiments. The test of significance only tells him what to ignore, namely all experiments in which significant results are not obtained. He should only claim that a phenomenon is experimentally demonstrable when he knows how to design an experiment so that it will rarely fail to give a significant result. Consequently, isolated significant results which he does not know how to reproduce are left in suspense pending further investigation.” (Fisher, 1929)
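In modern terms, an experiment that "will rarely fail to give a significant result" is one with high statistical power (a Neyman-Pearson concept, not Fisher's own wording). This Python sketch assumes a two-sided one-sample z-test with a known standard deviation; the effect size of 0.5 standard deviations and the 90% power target are hypothetical choices for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def z_test_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Probability that a repeat of the experiment gives P < alpha, for a
    two-sided one-sample z-test when the true standardized effect is effect_size."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5
    # significant if the test statistic falls in either rejection region
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)

def smallest_n_for_power(effect_size: float, target: float = 0.9, alpha: float = 0.05) -> int:
    """Smallest sample size whose power reaches the target."""
    n = 1
    while z_test_power(effect_size, n, alpha) < target:
        n += 1
    return n

for n in (10, 20, 40):
    print(n, round(z_test_power(0.5, n), 3))
print("n for 90% power:", smallest_n_for_power(0.5))
```

When the true effect is zero, the "power" collapses to alpha itself: a significant result then appears about once every twenty experiments, which is exactly the deception Fisher warns against.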

This is pretty boring

But the implications are quite interesting!

ASA Statement

  • Can indicate how incompatible the data are with a specified statistical model
  • Does not measure the probability that the hypothesis is true or that the data were produced by chance alone
  • Conclusions and decisions should not be based only on whether a P value passes a threshold
  • Does not measure the size of an effect or the importance of a result
  • By itself, does not provide a good measure of evidence for a model or hypothesis
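The point that a P value is not the probability that the hypothesis is true can be checked with Bayes' rule. In this hypothetical Python sketch, the share of tested hypotheses that are truly null and the typical power of studies are assumed numbers, not estimates for any real field. With 90% true nulls and 80% power, about 36% of "significant" results come from true nulls, far from the 5% one might naively read off the threshold.

```python
def prob_null_given_significant(prior_null: float, alpha: float, power: float) -> float:
    """Share of significant results that come from true null hypotheses:
    P(H0 | P < alpha) = prior_null * alpha / P(P < alpha)."""
    false_positives = prior_null * alpha        # true nulls that cross the threshold
    true_positives = (1 - prior_null) * power   # real effects that cross it
    return false_positives / (false_positives + true_positives)

# Hypothetical field: 90% of tested hypotheses are null, studies have 80% power.
print(round(prob_null_given_significant(prior_null=0.9, alpha=0.05, power=0.8), 2))
```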

Alternatives?

Closing Thoughts

  • P values indicate the compatibility of the data with a specific model
  • Scientific claims need corroboration
  • Conveying significance (e.g., * vs. ***) is not the same as conveying effect size
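A smaller P value (more stars) does not imply a larger effect. This hypothetical Python comparison assumes a one-sample z-test with a known standard deviation of 1 and contrasts a tiny effect measured in a very large sample with a much larger effect in a small sample: the tiny effect earns the far smaller P value.

```python
import math

def one_sample_z_p(mean_diff: float, sd: float, n: int) -> float:
    """Two-sided P value for a one-sample z-test with known standard deviation."""
    z = mean_diff / (sd / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * P(Z > |z|)

# Hypothetical studies: tiny effect in a huge sample vs. large effect in a small one.
tiny_effect = one_sample_z_p(mean_diff=0.01, sd=1.0, n=1_000_000)
large_effect = one_sample_z_p(mean_diff=0.5, sd=1.0, n=20)
print(f"effect 0.01 SD, n = 1,000,000: P = {tiny_effect:.1e}")
print(f"effect 0.50 SD, n = 20:        P = {large_effect:.3f}")
```

Reporting the effect estimate (ideally with a confidence interval) alongside the P value keeps these two studies from looking like the same kind of finding.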