Psychological Theories and Empirical Research: Closing the Loop for Better Science

Carolinas Psychology Conference 2017

Matthew T. McBee

April 22, 2017

These slides are available at www.rpubs.com/mmcbee.

Some background context

Replication Crisis in psychology and other health / behavioral sciences

Open Science Movement

Nullius In Verba

The Circle of Life

Falsifiability

Popper argued that falsifiability is what distinguishes science from non-science.

A falsificationist orientation is a state of mind.

Gozer the Destructor

Choose and Perish

Statistical Hypotheses

Typically \(H_1\), the alternative or research hypothesis, is what the theory says should happen.

Its foil is \(H_0\), the null.

Statistical Hypothesis Tests

The procedure underlying statistical tests is:

  1. Assume that \(H_0\) is true.

  2. Calculate a p-value that describes how unusual your data are given \(H_0\).

  3. If p is smaller than a pre-specified criterion (\(\alpha\)), then reject \(H_0\). We have achieved statistical significance.
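
The three steps above can be sketched in base R. This is a minimal illustration on simulated data, not an analysis from the talk: the seed, the sample size, and the true relationship between x and y are all assumptions made for the example.

```r
# Simulated data; seed, sample size, and true slope are illustrative assumptions.
set.seed(42)
n <- 50
x <- rnorm(n)
y <- 0.3 * x + rnorm(n)

alpha <- 0.05  # the pre-specified criterion from step 3

# Steps 1 and 2: cor.test() assumes H0: rho = 0 and computes the p-value
# for the observed correlation under that assumption.
result <- cor.test(x, y)
p_value <- result$p.value

# Step 3: reject H0 only if p falls below alpha.
reject_h0 <- p_value < alpha
cat("r =", round(unname(result$estimate), 3),
    " p =", signif(p_value, 3),
    " reject H0:", reject_h0, "\n")
```

Note that alpha is fixed before the data are examined; choosing it afterwards would defeat the purpose of step 3.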

Making Inferences from Statistical Results to Theories

The “Nil” Null Hypothesis

Cohen (1994) coined the phrase “nil hypothesis” to describe the typical null of zero effect.

Best-case scenario, the directional hypothesis.

We can test a different kind of “nil” null hypothesis when we can at least commit to a directional hypothesis.

Is this really the best that we can do?

Paradox

In the absence of specific predictions, we can only test the nil hypothesis.

Paradoxically, our test becomes less falsificationist as sample size increases!

Meehl (1967)

“In the physical sciences, the usual result of an improvement in experimental design, instrumentation, or numerical mass of data, is to increase the difficulty of the ‘observational hurdle’ which the physical theory of interest must successfully surmount; whereas, in psychology and some of the allied behavior sciences, the usual effect of such improvement in experimental precision is to provide an easier hurdle for the theory to surmount.” – Meehl (1967), p. 103

Non-directional “nil hypothesis” significance test for correlation.

\(H_0: \rho = 0\)

\(H_1: \rho \ne 0\)

Statistically significant estimated correlation coefficients versus sample size

In nil hypothesis tests, statistically significant results are interpreted as supportive of a theory.

Interpretation of estimated correlation coefficients by sample size

As the sample size (\(n\)) approaches \(\infty\), the theory rules out less of the parameter space.
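
One way to see this numerically is to compute the smallest |r| that reaches significance in a two-sided nil test at each sample size. The sketch below uses the standard relationship between r and the t statistic with n - 2 degrees of freedom; the sample sizes and alpha = .05 are illustrative choices.

```r
# Smallest absolute sample correlation that is significant in a
# two-sided test of H0: rho = 0, from t = r * sqrt(df) / sqrt(1 - r^2).
critical_r <- function(n, alpha = 0.05) {
  df <- n - 2
  t_crit <- qt(1 - alpha / 2, df)
  t_crit / sqrt(t_crit^2 + df)
}

n_values <- c(10, 30, 100, 1000, 10000)
crit <- critical_r(n_values)

# Share of the possible range of r (from -1 to 1) that would be read
# as support for the theory, i.e. |r| above the critical value.
summary_table <- data.frame(
  n = n_values,
  critical_r = round(crit, 3),
  share_supportive = round(1 - crit, 3)
)
print(summary_table)
```

The critical value shrinks toward zero as n grows, so nearly any nonzero estimate ends up counted as support.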

Directional Hypothesis Tests

If we can at least derive the sign of the correlation from theory, we can do a directional hypothesis test.

\(H_1: \rho > 0\)

\(H_0: \rho \le 0\)

Statistically significant correlation coefficients versus sample size, directional case

In both of these cases, the statistical test becomes less strict as sample size increases!
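
The two cases can be compared side by side by computing the significance threshold for r under each test. This is a sketch with illustrative sample sizes and alpha = .05; the `directional` argument is a name introduced here for the example.

```r
# Critical sample correlation for a nil test of rho = 0.
# directional = TRUE puts all of alpha in the upper tail (H1: rho > 0).
critical_r <- function(n, alpha = 0.05, directional = FALSE) {
  df <- n - 2
  tail_prob <- if (directional) alpha else alpha / 2
  t_crit <- qt(1 - tail_prob, df)
  t_crit / sqrt(t_crit^2 + df)
}

n_values <- c(10, 30, 100, 1000, 10000)
thresholds <- data.frame(
  n = n_values,
  two_sided = round(critical_r(n_values), 3),
  one_sided = round(critical_r(n_values, directional = TRUE), 3)
)
print(thresholds)
```

Both columns fall toward zero as n increases. The directional test at least rules out every negative correlation, but any positive estimate eventually passes.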

On point predictions

Imagine that we could derive some quantitative expectations from psychological theories:

For example:

Testing point predictions

Assume that our theory implies that the correlation between two variables should be \(\rho=.5\).
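
A point prediction like this can be tested with the Fisher z transformation, treating \(\rho = .5\) as the null value. The sketch below is one common approach, not necessarily the one used in the talk; the observed correlation (0.35) and sample size (200) are hypothetical values chosen for illustration.

```r
# Two-sided z test of H0: rho = rho0 using the Fisher z transformation.
# atanh(r) is approximately normal with standard error 1 / sqrt(n - 3).
test_point_prediction <- function(r, n, rho0) {
  z <- (atanh(r) - atanh(rho0)) * sqrt(n - 3)
  p <- 2 * pnorm(-abs(z))
  list(z = z, p_value = p)
}

# Hypothetical study result (assumed values, not from the talk).
res <- test_point_prediction(r = 0.35, n = 200, rho0 = 0.5)
cat("z =", round(res$z, 2), " p =", signif(res$p_value, 2), "\n")
```

Here a small p-value counts against the theory, because the theory's own prediction is the hypothesis being tested.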

Everything changes!

When testing point predictions:

Most importantly, our hypothesis tests become much more falsificationist.

Testing a point prediction

Interpretation of estimated correlation coefficients by sample size

Resolving the paradox

The paradox is now resolved: larger samples lead to stronger, stricter tests of theories.
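
To illustrate, the sketch below computes the range of sample correlations that would not reject a point prediction of \(\rho = .5\) at several sample sizes, using the Fisher z approximation. The sample sizes and alpha = .05 are illustrative assumptions.

```r
# Range of sample correlations that a two-sided Fisher z test of
# H0: rho = rho0 would NOT reject at the given alpha.
consistent_r_range <- function(rho0, n, alpha = 0.05) {
  half_width <- qnorm(1 - alpha / 2) / sqrt(n - 3)
  tanh(atanh(rho0) + c(-1, 1) * half_width)
}

n_values <- c(20, 100, 1000, 10000)
widths <- numeric(length(n_values))
for (i in seq_along(n_values)) {
  rng <- consistent_r_range(0.5, n_values[i])
  widths[i] <- diff(rng)
  cat("n =", n_values[i], ": r between",
      round(rng[1], 3), "and", round(rng[2], 3), "\n")
}
```

The band of estimates compatible with the theory narrows around .5 as n grows, which is the opposite of what happens in the nil hypothesis test.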

Decision errors become reversed:

Barriers to point predictions

So what can we do?

Bounded predictions

Often a theory cannot predict a specific value for a parameter, but it can rule out certain ranges.

For example:

Testing bounded predictions

\(H_0: \rho > 0.5\)

\(H_1: \rho \le 0.5\)
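
A test of these hypotheses can be sketched as a one-sided Fisher z test against the boundary value of 0.5. This is one possible implementation, not necessarily the one behind the slides; the observed correlation (0.30) and sample size (150) are hypothetical.

```r
# One-sided test of the bounded prediction: the theory says rho exceeds
# `bound`, so only estimates far enough BELOW the bound count against it.
test_bounded_prediction <- function(r, n, bound) {
  z <- (atanh(r) - atanh(bound)) * sqrt(n - 3)
  p <- pnorm(z)  # lower-tail probability
  list(z = z, p_value = p)
}

# Hypothetical study result (assumed values, not from the talk).
res <- test_bounded_prediction(r = 0.30, n = 150, bound = 0.5)
cat("z =", round(res$z, 2), " p =", signif(res$p_value, 2), "\n")
```

An estimate above the bound gives a large p-value here, so only results that fall well short of the prediction lead to rejection.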

Statistically significant correlation coefficients versus sample size, bounded prediction

Testing interval predictions

We use theory to place a lower and an upper bound on a parameter that we cannot derive precisely from theory. For example:

Hypotheses

Testing an interval prediction
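
One way to carry out such a test is to compare a confidence interval for r against the predicted interval. The sketch below assumes hypothetical theoretical bounds of 0.3 and 0.6 and a three-way decision rule; neither is taken from the talk.

```r
# Confidence interval for a correlation via the Fisher z transformation.
fisher_ci <- function(r, n, level = 0.95) {
  half_width <- qnorm(1 - (1 - level) / 2) / sqrt(n - 3)
  tanh(atanh(r) + c(-1, 1) * half_width)
}

# Compare the CI with the theory's predicted interval [lower, upper].
assess_interval_prediction <- function(r, n, lower, upper, level = 0.95) {
  ci <- fisher_ci(r, n, level)
  if (ci[2] < lower || ci[1] > upper) {
    "evidence against the theory"      # CI lies entirely outside the interval
  } else if (ci[1] >= lower && ci[2] <= upper) {
    "consistent with the theory"       # CI lies entirely inside the interval
  } else {
    "inconclusive"                     # CI straddles a bound
  }
}

# Hypothetical results, with assumed bounds of 0.3 and 0.6.
outcome_far   <- assess_interval_prediction(0.15, 400, 0.3, 0.6)
outcome_in    <- assess_interval_prediction(0.45, 400, 0.3, 0.6)
outcome_small <- assess_interval_prediction(0.45, 30, 0.3, 0.6)
print(c(outcome_far, outcome_in, outcome_small))
```

With a small sample the interval is too wide to settle the question either way, so larger samples again make the test more decisive.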

Interpretation of estimated correlation coefficients by sample size, interval prediction case

Implementing Tests of Point, Boundary, and Interval Predictions

Confidence Intervals for Correlation Coefficients

Most statistical applications do not report standard errors or confidence intervals for correlation coefficients.

R can do it using the CIr() function in the package psychometric.

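library(psychometric)  # provides CIr()
# 95% confidence interval for r = 0.57 observed in a sample of n = 98
CIr(r=0.57, n=98, level=0.95)
## [1] 0.4189640 0.6903431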
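
If the psychometric package is not installed, the same interval can be obtained in base R with the Fisher z transformation. The helper name `fisher_ci` below is introduced here for illustration and is not part of any package.

```r
# Base-R confidence interval for a correlation via Fisher's z:
# transform r with atanh(), add and subtract the normal quantile times
# the standard error 1 / sqrt(n - 3), then back-transform with tanh().
fisher_ci <- function(r, n, level = 0.95) {
  z <- atanh(r)
  se <- 1 / sqrt(n - 3)
  z_crit <- qnorm(1 - (1 - level) / 2)
  tanh(z + c(-1, 1) * z_crit * se)
}

ci <- fisher_ci(r = 0.57, n = 98, level = 0.95)
print(ci)
```

For r = 0.57 and n = 98 this agrees with the CIr() result shown above to at least four decimal places.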

For SPSS users, use the free online calculator at http://vassarstats.net/rho.html to calculate the CI.

Vassarstats online calculator

Confidence Intervals for Group Mean Differences

R users can use the cohen.d() function in the effsize package to calculate confidence intervals around Cohen’s d effect sizes.

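library(effsize)  # provides cohen.d()
# Y is the outcome and grp the two-level grouping variable in the data frame `data`
cohen.d(Y~grp, data=data)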
## 
## Cohen's d
## 
## d estimate: 0.3060114 (small)
## 95 percent confidence interval:
##        inf        sup 
## 0.02406885 0.58795387

- The hypothesized lower-boundary value of \(\delta=0.7\) lies above the upper bound of the estimated confidence interval (0.588).

- Thus we have evidence against the theory.
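
The comparison can be written out explicitly. The sketch below simply reuses the interval printed above and the hypothesized lower boundary of 0.7; the variable names are introduced here for illustration.

```r
# Confidence interval for Cohen's d as reported in the output above.
ci_d <- c(lower = 0.02406885, upper = 0.58795387)

# The theory predicts an effect of at least this size.
delta_lower_bound <- 0.7

# The prediction is contradicted when the entire interval falls below the bound.
theory_contradicted <- unname(ci_d["upper"] < delta_lower_bound)
gap <- delta_lower_bound - unname(ci_d["upper"])

cat("Theory contradicted:", theory_contradicted,
    " (upper limit falls short of the bound by", round(gap, 3), ")\n")
```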

Daniel Lakens has compiled some great resources on how to calculate confidence intervals around effect sizes.

http://daniellakens.blogspot.com/2014/06/calculating-confidence-intervals-for.html

Another Analytic Tool

Exploratory Software for Confidence Intervals (ESCI; Cumming, 2016) runs under Excel and calculates confidence intervals around many effect sizes.

http://thenewstatistics.com/itns/esci/

Closing the Loop

Summing Up

Closing Thought

Science is not self-correcting. We have to correct it.