The year: 2123. A mega-team of authors, whose number is describable only as a transfinite cardinal, has published a paper suggesting retiring the concept of “compatibility”. The mega-team decries the early-21st-century switch from “significance” to “compatibility” intervals, because it turns out that intervals, with their inside and outside, are the most dichotomous reporting option. In spite of early warnings, authors began to describe results as being “compatible” with the null when the interval contained 0. Compatibility became a technical term in statistics meaning that a value was within a 95% confidence interval. These implications are all obvious in hindsight to the mega-team in 2123. They suggest using \(p\) values, which do not require dichotomization and can be computed for any hypothesis. They remind everyone that values on both sides of \(\alpha\) (which in 2123 has been lowered several times and is now only describable as an infinitesimal) are similar, and helpfully recommend humility. Meanwhile, to avoid reading about this argument yet again, someone somewhere designs a cyborg to travel back in time and convince Ronald Fisher to go into politics instead of science, with predictably disastrous results.

This is tongue-in-cheek, but the point is serious: the problem is not significance. Significance is just a word to describe a result far away from where we’d expect it to be; there is nothing inherently dichotomous about it. One need only look at the interval advocacy literature to see the same issues: fallacious acceptance of values inside intervals and “vote counting” (e.g., Velicer et al. 2008). Moving to intervals reboots the education cycle, requiring us to build a literature explaining the same issues with intervals, as if they were new (Morey et al. 2016).
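To make the point about intervals concrete, here is a minimal sketch of how asking whether a value lies inside a 95% interval is the same yes/no question as asking whether its \(p\) value exceeds 0.05. It assumes a simple normal-theory (z) setting with a known standard error. The estimate, the standard error, and the function names are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def p_value(mean_hat, se, mu0):
    """Two-sided z-test p value for the hypothesis that the true mean is mu0."""
    z = (mean_hat - mu0) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def confidence_interval(mean_hat, se, level=0.95):
    """Normal-theory confidence interval for the mean."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return mean_hat - z_crit * se, mean_hat + z_crit * se

# Hypothetical estimate and standard error (assumptions, not real data).
mean_hat, se = 0.30, 0.20
lo, hi = confidence_interval(mean_hat, se)

# A p value can be computed for any hypothesized value, not just 0.
for mu0 in (-0.2, 0.0, 0.3, 0.6, 0.8):
    p = p_value(mean_hat, se, mu0)
    inside = lo <= mu0 <= hi
    print(f"mu0={mu0:+.1f}  p={p:.3f}  inside interval: {inside}  p > 0.05: {p > 0.05}")
```

In this setting the last two columns always agree, so reporting only the interval's endpoints keeps the cutoff and discards the graded information that the \(p\) values carry.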

Rather, the core issues are opportunism, lack of transparency, and poor training. Even if everyone heeded the call to “Retire statistical significance”, we would find ourselves back here in the future. This misdiagnosis will prolong the problem rather than ameliorate it.

References

Morey, Richard D., Rink Hoekstra, Jeffrey N. Rouder, Michael D. Lee, and Eric-Jan Wagenmakers. 2016. “The Fallacy of Placing Confidence in Confidence Intervals.” Psychonomic Bulletin & Review 23 (1): 103–23.

Velicer, Wayne F., Geoff Cumming, Joseph L. Fava, Joseph S. Rossi, James O. Prochaska, and Janet Johnson. 2008. “Theory Testing Using Quantitative Predictions of Effect Size.” Applied Psychology 57 (4): 589–608. http://dx.doi.org/10.1111/j.1464-0597.2008.00348.x.