This document is one of a series designed to illustrate how the R statistical computing environment can be used in various types of social work research. In this report, we present examples of how to use R and the effectsize and metafor packages to compute various effect sizes commonly used in intervention research and reported in meta-analysis studies.
Two other Using R in Social Work Research monographs provide helpful information about the use of effect sizes and other related intervention research topics:
Before getting to effect sizes, however, let’s do a quick review of what the design of an intervention study might look like. The following is the diagram for a randomized two-group pre-post research design (an RCT). In an RCT, a randomly selected pool of respondents is randomly assigned to a program (treatment) group or a control group. Both groups are measured on the outcome before the intervention (pre) and then after the intervention (post). The diagram also works for quasi-experimental studies, the only difference being that these studies do not use random selection and random assignment procedures. (Important note: there is a wide range of possible research designs available for intervention research; in this monograph we focus on RCTs.)
The real ‘heart of the matter’ for a study is the effect size, represented by the two-headed arrow. An effect size is a measure that quantifies the difference between people in the study who experienced the intervention and people who were in the control group. In an intervention study, the hypothesis tested is that people in the intervention group will be better off or improved as a result of the intervention compared to people who did not get the intervention. The effect size indicates how much better or improved they actually are.
There are two general families of effect sizes used in intervention research (remember, these are represented by the two-headed arrow in the RCT diagram) and in meta-analysis studies of interventions: effect sizes for dichotomous outcomes and effect sizes for continuous outcomes.
These outcome types determine what statistical test and effect size should be used to assess intervention success.
The four effect sizes based on dichotomous outcome data are the risk difference, the risk ratio, the odds ratio, and the phi coefficient.
The results of a study with a dichotomous outcome can be summarized in a 2x2 table called a cross-tabulation. The general format for a study with two groups and a dichotomous outcome is shown as follows:
The cross-tabulation is formed by the experimental and control groups in the study (the rows) and the dichotomous outcome of positive effect or no effect (the columns). Crossing these two variables creates a set of cells, which are defined as follows:
The four popular effect sizes for dichotomous outcomes we identified above are simple functions of these cell values.
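To make this concrete, here is a minimal sketch, using hypothetical cell counts, of how each effect size can be computed with basic R arithmetic. The cells follow the table above: a and b are the experimental group’s positive-effect and no-effect counts, and c and d are the corresponding control group counts.

a <- 30; b <- 10  ## experimental group: positive effect, no effect (hypothetical)
c <- 15; d <- 25  ## control group: positive effect, no effect (hypothetical)
risk_exp <- a / (a + b)  ## proportion with a positive effect, experimental group
risk_ctl <- c / (c + d)  ## proportion with a positive effect, control group
risk_exp - risk_ctl      ## risk difference
risk_exp / risk_ctl      ## risk ratio
(a / b) / (c / d)        ## odds ratio
(a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))  ## phi coefficient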
Let’s say we conducted a study using an intervention designed to help at-risk first-time mothers who, among other things, may be experiencing postnatal depression. The intervention used, Caring for 2, is evidence-based and focuses on building self-efficacy and postnatal depression prevention. We used a randomized two-group pre-post experimental study (RCT) with 60 participants in each group. One outcome of the study was expressed as a dichotomy: at post-test, moms were diagnosed either as having no depression symptoms (PD No in the table below) or as having depression symptoms (PD Yes in the table below).
The analysis of these results is straightforward. First, we load various R packages including the effectsize and metafor packages:
library(dplyr)      ## Data management
library(knitr)      ## Creates Markdown files
library(effectsize) ## Computes various effect sizes
library(metafor)    ## Meta-analysis
Next, we load the data file:
pd <- read.csv("care2.csv") ##load a .csv file
head(pd, 3) ## Examine the first few records
## group pd
## 1 care2 no
## 2 care2 no
## 3 care2 no
We examine our cross-tabulated data, looking at both cell frequencies and cell proportions.
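One way to produce these tables, assuming the pd data frame loaded above, is with the base R functions xtabs() and prop.table():

tab <- xtabs(~ group + pd, data = pd)  ## cell frequencies
tab
prop.table(tab, margin = 1)            ## cell proportions within each group (row)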
## pd
## group no yes
## care2 48 12
## control 21 39
## pd
## group no yes
## care2 0.80 0.20
## control 0.35 0.65
The effectsize package does not compute a risk difference value. However, it is easily computed by simple subtraction using built-in R math functions. For example, we see in the proportion table that .80 (80 percent) of the Caring for 2 moms did not have depression symptoms at post-test, compared to .35 (35 percent) of the control moms. The simple difference (.80 - .35 = .45) is the risk difference effect size.
(risk_diff <- .80 - .35)
## [1] 0.45
We then compute other effect sizes using the effectsize package.
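The calls producing the output below would look something like the following sketch, which assumes the cross-tabulation is stored in tab as above (the table’s orientation determines which group and outcome category serve as the reference, so results should be checked against the proportion table):

riskratio(tab)  ## risk ratio with 95% CI
oddsratio(tab)  ## odds ratio with 95% CI
phi(tab)        ## phi coefficient with 95% CI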
## Risk ratio | 95% CI
## -------------------------
## 2.96 | [1.78, 4.92]
## Odds ratio | 95% CI
## --------------------------
## 7.43 | [3.25, 16.96]
## Phi | 95% CI
## -------------------
## 0.46 | [0.30, 1.00]
##
## - One-sided CIs: upper bound fixed at (1).
We interpret these effect sizes as follows:
Our conclusion is that Caring for 2 was effective in impacting postnatal depression. There was both a statistical and practical difference between the program group and the control group in favor of the program group. Both our odds ratio and phi values were considered large effect sizes.
Like other professions, social work needs a systematic accumulation of evidence about effective practices and interventions. This need spans a wide range of issues and problems with which the profession is concerned. It is the ‘systematic accumulation’ part of the statement that requires constant attention. Meta-analytic thinking is the conceptual framework for developing this systematic accumulation of evidence. Social work practitioners and researchers alike should be aware that what they do is part of larger efforts to accumulate critical knowledge about what constitutes effective practice.
Meta-analysis is a set of statistical tools that facilitates ‘studies of studies’ and the building of accessible articles and sites where social work practitioners can find helpful resources to inform their practice. Meta-analysis studies use effect sizes to make important cross-study comparisons. In this example, we focus on how our odds ratio fits in the larger context of postnatal depression prevention studies.
We can see in the forest plot that our study of the Caring for 2 intervention compares favorably to other studies of postnatal depression interventions. Our odds ratio is the largest, followed closely by the odds ratio for the Vaughn study (OR = 5.00). One benefit of a meta-analysis is the computation of a weighted average of effect sizes, which provides a summary estimate of how well, overall, the interventions represented in the study impacted an outcome. Using the Cohen odds ratio interpretation framework, the summary OR = 3.19 suggests that the postnatal depression interventions in this meta-analysis have a medium effect overall, with wide OR variability across studies.
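A forest plot like this one can be built with metafor. The following is only a sketch: the study labels, log odds ratios (yi), and sampling variances (vi) are hypothetical stand-ins (apart from the Vaughn and Caring for 2 odds ratios noted above), not the actual meta-analysis data.

ma_dat <- data.frame(
  study = c("Study A", "Study B", "Vaughn", "Caring for 2"),
  yi = log(c(1.9, 2.6, 5.00, 7.43)),  ## log odds ratios
  vi = c(0.15, 0.12, 0.20, 0.18)      ## sampling variances (hypothetical)
)
ma_res <- rma(yi, vi, data = ma_dat, slab = study)  ## random-effects model
forest(ma_res, atransf = exp)  ## forest plot back-transformed to the OR scale
predict(ma_res, transf = exp)  ## summary (weighted average) odds ratio with CI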
The two effect sizes based on continuous outcome data are the mean difference and the standardized mean difference.
The layout for a study using a continuous outcome is simpler than the cross-tabulation we discussed for dichotomous outcomes. We compute means for the experimental and control groups on the continuous outcome measure, as shown in the following table:
If a study uses a continuous outcome on some scale or metric that has a commonly understood meaning (weight, income in dollars, behavioral counts, body mass index, minutes spent exercising, etc.), it is often convenient to express the effect size as a simple difference between the two group means (mean difference) using the following:
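mean difference (MD) = M_E - M_C

where M_E is the mean of the experimental group and M_C is the mean of the control group.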
If a study uses a continuous outcome on a scale or metric that does not necessarily have a commonly understood meaning (most of our areas of interest), researchers use a standardized form called a standardized mean difference. Standardized mean difference effect sizes are computed using the following:
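standardized mean difference (SMD) = (M_E - M_C) / SD_pooled

where SD_pooled is the pooled standard deviation of the two groups.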
You can see in this equation that the standardized mean difference is computed by using the mean difference we noted above in the numerator and a measure of standard deviation (SD) in the denominator. Thus, standardized mean difference effect sizes are expressed in standard deviation units – they are like the z-scores you may remember from your statistics classes.
Let’s say we conducted a study focusing on weight loss for kids in foster care. The intervention used, Bright Bodies/Smart Moves, is evidence-based and uses a lifestyles approach to weight loss and weight control. We used a randomized two-group pre-post experimental study (RCT). The primary outcome of the study was expressed as a continuous measure – a body mass index (BMI) computed at post-test. A lower BMI score is desirable.
For our analysis, we use the R libraries loaded above and load the weight data file:
wdata <- read.csv("weight.csv") ##load a .csv file
head(wdata, 3) ## List the first few records to check the data set
## group pre.bmi post.bmi
## 1 control 18.5 18.6
## 2 control 18.8 18.2
## 3 control 20.3 20.1
As a next step in the analysis, we create a graph (a boxplot) that visually presents the difference between the program and control groups on post-test BMI scores. We can see in this boxplot that the groups differ, with the program group having lower BMI scores than the control group.
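A basic version of the plotting code, sketched here with base R graphics, would be:

boxplot(post.bmi ~ group, data = wdata,
        xlab = "Group", ylab = "Post-test BMI")  ## compare group distributions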
The effectsize package does not compute a mean difference value. However, it is easily computed by simple subtraction, again using built-in R math functions. The mean difference for our data is 20.5 - 18.3 = 2.2. While there are no firm cutoff rules about what constitutes meaningful change in the BMI metric, we considered this difference of 2.2 BMI units to be substantive.
(mean_diff <- 20.5 - 18.3)
## [1] 2.2
Next, we computed an independent samples t-test that assessed the statistical difference between the groups. Our null hypothesis was that there was no difference between groups; our research hypothesis was that there was a difference. We set alpha = .05 and ran the test. Our results indicated that there is a statistically significant difference between groups, with the results favoring our program participants. (Note: the reported p-value is expressed in scientific notation; 8e-04 is .0008, well below our alpha of .05.)
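The standard base R call for this test (Welch’s version is the default) is:

t.test(post.bmi ~ group, data = wdata)  ## Welch two-sample t-test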
##
## Welch Two Sample t-test
##
## data: post.bmi by group
## t = 4, df = 58, p-value = 8e-04
## alternative hypothesis: true difference in means between group control and group program is not equal to 0
## 95 percent confidence interval:
## 0.953 3.427
## sample estimates:
## mean in group control mean in group program
## 20.5 18.3
Finally, we computed a standardized mean difference coefficient called Cohen’s d. Cohen’s d is a measure of the magnitude of the difference between the groups, expressed in standard deviation units. The program group was, on average, .91 standard deviation units lower than the control group on the BMI outcome. Our results were interpreted using the Cohen effect size framework, where d = .2 is a small effect size, d = .5 is a medium effect size, and d = .8 is a large effect size. The obtained d = .91 was a large effect size.
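With the effectsize package, the call would typically look like this (the pooled SD is the default):

cohens_d(post.bmi ~ group, data = wdata)  ## standardized mean difference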
## Cohen's d | 95% CI
## ------------------------
## 0.91 | [0.38, 1.44]
##
## - Estimated using pooled SD.
Our conclusion was that the Bright Bodies/Smart Moves intervention was effective in impacting participant BMI. There was both a statistical and practical difference between the program group and the control group on BMI in favor of the program group. Cohen’s d = .91 was a large effect size.
Finally, because the evaluation was a high-quality experimental design, we submitted it to a reputable social work research journal. Some time later, we noticed that it was included in an updated meta-analysis of weight-control programs for children.
We can see in the forest plot that our study of the Bright Bodies/Smart Moves intervention compares favorably to other studies of lifestyle-based interventions. Our Cohen’s d = .91 is the largest value in the set of studies. The summary Cohen’s d = .68 suggests that the lifestyle interventions in this meta-analysis have a medium effect overall using the Cohen interpretation framework.
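As with the odds ratio example, the following metafor sketch uses hypothetical group means, standard deviations, and sample sizes to show how standardized mean differences can be computed with escalc() and combined:

wt_dat <- data.frame(  ## hypothetical summary statistics for illustration only
  study = c("Study A", "Study B", "Bright Bodies/Smart Moves"),
  m1i = c(20.2, 19.8, 20.5), sd1i = c(2.5, 2.4, 2.4), n1i = c(45, 60, 30),  ## control
  m2i = c(19.0, 18.6, 18.3), sd2i = c(2.4, 2.3, 2.4), n2i = c(45, 60, 30)   ## program
)
wt_es <- escalc(measure = "SMD", m1i = m1i, sd1i = sd1i, n1i = n1i,
                m2i = m2i, sd2i = sd2i, n2i = n2i, data = wt_dat)  ## per-study SMD (yi) and variance (vi)
wt_res <- rma(yi, vi, data = wt_es, slab = study)  ## random-effects model
forest(wt_res)  ## forest plot of standardized mean differences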
The goal of this monograph was to present effect sizes useful for intervention studies. Hopefully, we were able to provide enough detail for you to consider how to make effect size reporting a basic part of your intervention research studies. Both the effectsize package and the metafor package are free and accessible tools that will make your effect size work easier. We should note that there are many more effect sizes available for your research needs. Some estimates suggest there are more than 150 effect sizes available for research work across many disciplines.
We also would like to add that effect sizes are a core feature of something called the ‘new statistics’. The ‘new statistics’ framework has important ties to social work research and practice. Researchers committed to this new analysis framework argue that these are the things we are really interested in from our data:
First, we want to know how ‘big’ our relationships and effects are (not whether they are statistically significant or not). This interest has led to an emphasis on examining and reporting effect sizes.
Next, we want to know how ‘precise’ our effect size estimates are. This interest has led to a focus on the use of confidence intervals in our analysis and reporting.
Further, we want to know how to design studies that will have a high probability of finding relationships or effects. This interest has led to critical attention to power analysis in the design of our studies.
Finally, we want to (or need to) think about our research in the larger context of research about a topic or problem area. This interest has led to something called meta-analytic thinking and to meta-analysis as a strategy for research synthesis and replication.
Together, these components of the ‘new’ statistics strengthen the ‘evidence’ part of evidence-based social work practice.
A good effect size and meta-analysis reference for social work and other researchers:
Ellis, P.D. (2010). The Essential Guide to Effect Sizes: Statistical Power, Meta-Analysis, and the Interpretation of Research Results. New York, NY: Cambridge University Press.