The Origins of P Values
Fisher popularized the P value for use in agricultural experiments
“This is an arbitrary, but convenient, level of significance for the practical investigator, but it does not mean that he allows himself to be deceived once every twenty experiments. The test of significance only tells him what to ignore, namely all experiments in which significant results are not obtained. He should only claim that a phenomenon is experimentally demonstrable when he knows how to design an experiment so that it will rarely fail to give a significant result. Consequently, isolated significant results which he does not know how to reproduce are left in suspense pending further investigation.” (Fisher, 1929)
This is pretty boring
But the implications are quite interesting!
ASA Statement
P values can indicate how incompatible the data are with a specified statistical model
P values do not measure the probability that a hypothesis is true, or that the data were produced by chance alone
Decisions should not be based only on whether a P value passes a threshold
Proper inference requires full reporting and transparency
A P value does not measure effect size or the importance of a result
A P value alone should not be used as the sole measure of evidence for a hypothesis
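The point that a P value differs from effect size can be illustrated with a small sketch. It assumes a one-sample z test with a known standard deviation of 1, and the effect of 0.1 standard deviations and the sample sizes are hypothetical values chosen only for illustration. The same effect gives very different P values as the sample grows.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided P value for a standard normal test statistic z."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi.
    return math.erfc(abs(z) / math.sqrt(2.0))


def z_for_mean(effect_in_sd: float, n: int) -> float:
    """z statistic for a sample mean when the population SD is known to be 1."""
    return effect_in_sd * math.sqrt(n)


effect = 0.1  # hypothetical effect: one tenth of a standard deviation
for n in (100, 1_000, 10_000):
    p = two_sided_p_from_z(z_for_mean(effect, n))
    print(f"n={n:>6}  effect={effect}  p={p:.3g}")
```

The effect is identical in every row; only the sample size changes, which is why a small P value by itself says little about how large or important an effect is.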
Discussion
How do we define P values?
How do we interpret (non)significant results?
What do * vs. *** mean?
Part 2: Alternative Approaches
Why 0.05?
In fields where P < 0.05 is considered the threshold for significance, changing the standard to 0.005 would make P-hacking and naive data manipulation more difficult
For psychology, the Bayesian prior odds of \( H_1 \) relative to \( H_0 \) are roughly 1:10
A two-sided P value of 0.05 corresponds to a Bayes factor of about 2.5 to 3.4 in favor of \( H_1 \)
This means that, according to a Bayesian, the posterior odds of \( H_1 \) relative to \( H_0 \) for a study deemed “significant” might be only \( 3 \times \frac{1}{10} = \frac{3}{10} \), so \( H_0 \) is still favored by more than 3 to 1!
By contrast, a two-sided P value of 0.005 corresponds to a Bayes factor of between 14 and 26
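The arithmetic above multiplies a Bayes factor by the prior odds to get posterior odds. A minimal sketch of that calculation follows, using the 1:10 prior odds and the Bayes factors quoted on this slide. The function names are illustrative, and the conversion from odds to probability is the standard odds / (1 + odds).

```python
def posterior_odds(bayes_factor: float, prior_odds: float) -> float:
    """Posterior odds of H1 relative to H0 = Bayes factor x prior odds."""
    return bayes_factor * prior_odds


def odds_to_probability(odds: float) -> float:
    """Convert odds in favor of H1 into the probability of H1."""
    return odds / (1.0 + odds)


prior_odds = 1 / 10  # prior odds of H1 relative to H0, as quoted for psychology

cases = [
    ("P = 0.05,  Bayes factor about 3", 3.0),
    ("P = 0.005, Bayes factor 14", 14.0),
    ("P = 0.005, Bayes factor 26", 26.0),
]
for label, bf in cases:
    odds = posterior_odds(bf, prior_odds)
    prob = odds_to_probability(odds)
    print(f"{label}: posterior odds = {odds:.2f}, P(H1) = {prob:.2f}")
```

Under these assumptions, a result at P = 0.05 leaves \( H_1 \) less likely than \( H_0 \), while a result at P = 0.005 makes \( H_1 \) the more likely hypothesis.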
Racial disparities in math achievement for Black and Hispanic youth have been linked to school segregation (Ready & Silander, 2011)
Three times as many Black and Hispanic students as white students attend intensely segregated schools; these schools are associated with high levels of poverty, higher teacher mobility, less qualified teachers, and fewer resources (Orfield & Lee, 2005)
Extremely few Black and Hispanic 4th-8th graders live in districts where test scores are at or above the national average (Reardon, 2017)
Model Robustness
Variables: Algebra, Prop_wh, Mftotal, Security, Mentalhealth, APClasses, Corporal, Teachers, Athletics
Example: how does the number of certified algebra teachers vary with the proportion of white students?
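One way to probe robustness for a question like this is to refit the same model under every combination of control variables and see how much the coefficient of interest moves. The sketch below does that on synthetic data, not the real dataset. It borrows column names from the slide, but treating `Algebra` as the outcome, `Prop_wh` as the predictor, and `Security`, `APClasses`, and `Athletics` as controls is an assumption, as are the generated values and the hypothetical `ols` helper. In practice a statistics library would replace the hand-written solver.

```python
import itertools
import random


def ols(columns, y):
    """Least-squares fit of y on the given columns plus an intercept.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    and returns [intercept, b_1, ..., b_k].
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic data only: the slope of 2.0 is an assumption for this sketch.
rng = random.Random(0)
n = 500
TRUE_SLOPE = 2.0
data = {
    "Prop_wh": [rng.random() for _ in range(n)],
    "Security": [rng.gauss(0, 1) for _ in range(n)],
    "APClasses": [rng.gauss(0, 1) for _ in range(n)],
    "Athletics": [rng.gauss(0, 1) for _ in range(n)],
}
data["Algebra"] = [
    1.0 + TRUE_SLOPE * data["Prop_wh"][i] + 0.5 * data["APClasses"][i] + rng.gauss(0, 1)
    for i in range(n)
]

# Refit Algebra ~ Prop_wh + (subset of controls) for every subset of controls.
controls = ["Security", "APClasses", "Athletics"]
estimates = {}
for size in range(len(controls) + 1):
    for subset in itertools.combinations(controls, size):
        cols = [data["Prop_wh"]] + [data[name] for name in subset]
        estimates[subset] = ols(cols, data["Algebra"])[1]  # slope on Prop_wh

for subset, slope in estimates.items():
    print(f"controls={list(subset)!s:<40} Prop_wh slope={slope:+.3f}")
```

If the slope on `Prop_wh` keeps roughly the same sign and size across all specifications, the finding is less likely to be an artifact of one particular choice of controls.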