Evidence-based practice in Mississippi's adult criminal justice system

PEER staff
10/27/17

Overview

Today I want to achieve two things:

  • Discuss the standards for evidence-based practice used in Mississippi.

  • Show examples of those standards applied to adult criminal justice programs in use – or available for use – in this state.

But before we begin...

“Evidence-based practice” is kind of like “the right outcome” of a trial.

Nobody's against it – but people have very different opinions of what it is!

But they're alike in another way: not all opinions are created equal.

Anecdotes and evidence

Here's what I hear a lot, and I'm sure you do too:

“Our program definitely works. Just look at Timmy! He went through it, and turned his whole life around!”

Timmy and a couple of his friends

[Figure]

The rest of Timmy's class

[Figure]

The point

With enough participants and normal conditions, you're pretty much guaranteed to have a few Timmies even in a bad program!

That's why stories about Timmy and his friends shouldn't convince you. In other words:

Anecdotes aren't good evidence!
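Here's a quick sketch of why, in R. All the numbers are invented for illustration: suppose a program does nothing at all, and each of 500 participants has a 10% chance of turning things around on their own.

    # Invented numbers, for illustration only: a program with NO real effect,
    # where each of 500 participants has a 10% baseline chance of success anyway.
    set.seed(1)
    rbinom(1, size = 500, prob = 0.10)
    # Roughly 50 success stories -- every one of them a "Timmy" --
    # from a do-nothing program.

Even a useless program hands you dozens of glowing anecdotes.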

Fortunately, MS law steps in.

MISS. CODE ANN. §27-103-159 gives some relevant definitions, including:

“Evidence-based program” shall mean a program or practice that has had multiple site random controlled trials across heterogeneous populations demonstrating that the program or practice is effective for the population.

Let's break that down.

An evidence-based program has had:

  • Multiple-site
  • random controlled
  • trials (plural)
  • across heterogeneous populations
  • demonstrating effectiveness.

Why all those requirements?

Each of these requirements answers a common-sense question we'd naturally ask:

  • effectiveness: Does the program do something?
  • trials: Is whatever the program does distinguishable from noise and error?
  • random controlled: Is what the program does because of the program?
  • multiple-site, heterogeneous populations: Does the program's effect generalize to us?

The importance of effectiveness and generalizability should be pretty clear.

A little more on trials

Why do statistical trials? Why not just compare the numbers?

Imagine two programs doing the same thing.

  • One of them achieves 40 units of effect.
  • The other achieves 80 units of effect.

Here's what we probably imagine...

[Figure]

But here's what could be happening!

[Figure]

So the takeaway:

Simple numeric comparisons – my number is bigger than your number – are meaningless without context!

Rigorous statistical trials provide that context.
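To see it concretely, here's a toy R simulation (again, all numbers invented): two programs with exactly the same true effect, each measured noisily on a handful of participants.

    # Invented numbers: two programs with the SAME true effect (60 units),
    # measured noisily on just 5 participants each.
    set.seed(2)
    program_a <- rnorm(5, mean = 60, sd = 30)
    program_b <- rnorm(5, mean = 60, sd = 30)
    mean(program_a); mean(program_b)  # raw averages can easily land far apart...
    t.test(program_a, program_b)      # ...while the trial finds no real difference.

The raw averages may look like a clear winner and a clear loser; the trial tells you the "difference" is just noise.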

But one more point: Why randomized, controlled trials?

RCTs and causation

The short answer: RCTs are our best method of establishing that A causes B.

Imagine you’re a researcher for a shoe company; you’re testing a running shoe that is supposed to shave time off a sprint.

So you set up a test: Runners in your shoes versus runners in some different shoe.

Shoe trials

After statistical analysis, it turns out the group with your shoe crossed the finish line significantly before the other group!

So we've now satisfied the “trial” requirement. Congratulations!

Shoe trials

But wait: It turns out that you had your group running 100m, while the comparison group ran 200m!

Obviously, this comparison wasn’t fair.

Even if the results are good, it doesn’t seem to be because of the shoe.

Statistical control and fairness

When you hear people talk about “controlling for confounding variables,” this is all they mean!

(Statistical) control = making sure everybody has the same starting line before comparing them.

It’s basic fairness!

Statistical control continued

There are several ways to control for confounding variables. For instance:

  • Simple physical setup of the trial
    • Don’t use different length tracks
  • Various mathematical methods
    • Multiply short-track group time by two (a sketch of this kind of adjustment follows below)
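Here's a minimal sketch of the mathematical kind of control, in R with made-up data: finish times depend only on track length, and “our” shoe happens to be tested mostly on the short track.

    # Made-up data: no true shoe effect; times depend only on track length,
    # but "our" shoe was mostly tested on the short track (a confounder).
    set.seed(3)
    shoe  <- rep(c("ours", "theirs"), each = 20)
    track <- c(sample(c(100, 200), 20, TRUE, prob = c(0.9, 0.1)),
               sample(c(100, 200), 20, TRUE, prob = c(0.1, 0.9)))
    time  <- 0.11 * track + rnorm(40, sd = 0.5)
    coef(lm(time ~ shoe))          # naive comparison: a big, bogus "shoe effect"
    coef(lm(time ~ shoe + track))  # control for track length: the effect vanishes

Add the confounder to the model and the phantom shoe effect disappears. But notice what we had to do: we had to know about track length in order to adjust for it.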

Statistical control continued

These methods of control can be very sophisticated. But there's a problem:

  • You have to know that a confounding variable exists in order to control for it.
  • And it's impossible to know ahead of time what all the confounding variables are!

A relevant quote

“… the golden rule of causal analysis: No causal claim can be established by a purely statistical method, be it propensity scores, regression, stratification, or any other distribution-based design.”

-Judea Pearl, “Causality,” p. 350

RCTs and causation

Well-conducted random assignment guarantees that all possible confounding variables are randomly distributed among conditions – which is to say, there’s no systematic correlation between any trait and group membership!

Which means the groups, overall, start and finish on the same lines…

Which lets us assume that if they finish at different times, it's because of the program!
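A toy R sketch of that guarantee (the trait and the groups are invented): randomly assign 1,000 people to two groups, then check a trait we never measured or “controlled for” at all.

    # Random assignment balances even traits we never measured or thought about.
    set.seed(4)
    speed <- rnorm(1000)                      # some unmeasured trait
    group <- sample(c("A", "B"), 1000, TRUE)  # coin-flip group assignment
    tapply(speed, group, mean)                # group averages come out nearly equal
    t.test(speed ~ group)                     # no systematic difference between groups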

So to summarize:

The MS standard for evidence-based practice is the gold standard. Research quality drops off dramatically the more of these requirements you drop.

  • In medicine: 50-80% of positive results in initial clinical trials are overturned by subsequent RCTs (Ioannidis (2005), Zia et al. (2005))
  • In business: 80-90% of new products and strategies tested under RCTs by Google and Microsoft have found no significant effects (Manzi (2012))
  • In education: 91% of rigorous RCTs commissioned by the Institute of Education Sciences found weak or no positive effects (CEBP (2013))

But sometimes, the perfect is the enemy of the good!

Gold is rare, though. What if we don't have any and still need to act?

MISS. CODE ANN. §27-103-159 provides some loose definitions of less rigorous alternatives:

  • “Research-based program” shall mean a program or practice that has some research demonstrating effectiveness, but that does not yet meet the standard of evidence-based practices.
  • “Promising practices” shall mean a practice that presents, based upon preliminary information, potential for becoming a research-based or evidence-based program or practice.

But these definitions are very loose!

So to make things easier...

We've adopted an existing scale to rate research below the MS standard of evidence:

The Maryland Scientific Methods scale!

The MSM scale?

Described by Farrington et al. (2002) in Evidence-Based Crime Prevention.

It's a five-point ordinal scale – 1 is the worst, 5 is the best!

It rates our general ability to draw conclusions from the study.

  • Or said another way: it rates what threats to our desired conclusions are ruled out.

The MSM scale (and remaining threats at each level)

  1. Simple descriptive association
    • causal direction, confounders
  2. Pre-post testing
    • confounders
  3. Control group
    • nonequivalence of groups
  4. Control group plus high-quality statistical controls
    • inadequate control
  5. Randomized control group
    • inappropriate implementation and analysis
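For working purposes, the scale can also be treated as data. Here's a hypothetical R encoding (our own sketch for tagging studies in a review; the names are ours, not from the statute or from Farrington et al.):

    # A hypothetical encoding of the MSM scale, e.g. for tagging studies in a review.
    msm <- data.frame(
      level  = 1:5,
      design = c("simple descriptive association",
                 "pre-post testing",
                 "control group",
                 "control group + statistical controls",
                 "randomized control group"),
      remaining_threats = c("causal direction, confounders",
                            "confounders",
                            "nonequivalence of groups",
                            "inadequate control",
                            "implementation and analysis")
    )
    subset(msm, level >= 3)  # our cutoff for "high-quality research"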

The MSM scale

It's not safe to draw causal inferences from any study below level 3!

So that's where we've drawn our line for “High-quality research”…

(Although you should always want the gold standard if possible!)

So finally... applications!

  1. The research basis of existing adult correctional programs in MS
    • Exhibit 2, pp. 5-6 of the Results First MS brief
  2. The cost-effectiveness of existing programs backed by high-quality research
    • Exhibit 3, p. 9 of the brief
  3. Programs backed by high-quality research not currently in use in MS
    • Appendix F, pp. 28-31 of the brief
  4. Re-entry programs backed by high-quality research
  5. RID and community-based Thinking for a Change

References

Coalition for Evidence-Based Policy (2013). Randomized Controlled Trials Commissioned by the Institute of Education Sciences Since 2002: How Many Found Positive Versus Weak or No Effects. Retrieved from http://coalition4evidence.org/wp-content/uploads/2013/06/IES-Commissioned-RCTs-positive-vs-weak-or-null-findings-7-2013.pdf

Farrington, D.P., Gottfredson, D.C., Sherman, L.W., & Welsh, B.C. (2002). The Maryland Scientific Methods Scale. In Farrington, D.P., MacKenzie, D.L., Sherman, L.W., & Welsh, B.C. (Eds.), Evidence-Based Crime Prevention (pp. 13-21). London: Routledge.

Ioannidis, J.P.A. (2005). Contradicted and Initially Stronger Effects in Highly Cited Clinical Research. Journal of the American Medical Association, 294(2), 218-228.

Manzi, J. (2012). Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society. New York: Perseus Books Group.

Pearl, J. (2009). Causality (2nd ed.). Cambridge: Cambridge University Press.

Zia, M. I., Siu, L. L., Pond, G. R., & Chen, E. X. (2005). Comparison of Outcomes of Phase II Studies and Subsequent Randomized Control Studies Using Identical Chemotherapeutic Regimens. Journal of Clinical Oncology, 23(28), 6982-6991.