Tutorial: Running meta-analysis in R using the metafor package

This brief tutorial should help you with the first steps in R. In a few guided examples, we are loading some data, calculating effect sizes and conducting a meta-analysis of a fictional data set.

Let's beginn with some R basics:

# define a numerical variable 'x'
x <- 7
# print the value of your variable 'x'
x
[1] 7
# simple calculations with 'x'
x + 8
[1] 15
10 * x
[1] 70
# define a string variable 'my_name'
my_name <- "Jokel"
Loading required package: RCurl
Loading required package: bitops
Loading required package: metafor
Loading required package: Formula
Loading 'metafor' package (version 1.8-0). For an overview and
introduction to the package please type: help(metafor).

I created a fictional data set for this tutorial. It is saved as a google spreadsheet. Thus the RCurl package is used in order to load the data from google drive into R. The data represents cognitive performance of patients and healthy controls in a number of different tests. You can have a look at the data: https://docs.google.com/spreadsheet/pub?key=0AvykV4O1IaendFRSQVVVV3pYN3JZRlNKSzBud0FFa1E&output=html The aim of the present meta-analysis is to investigate differences in cognitive performance between healthy controls and patients. Also we want to check whether those differences are affected by other factors such as the age of the subjects, the year of publication or the cognitive test used.

# load data from google spreadsheet
myCsv <- getURL("https://docs.google.com/spreadsheet/pub?key=0AvykV4O1IaendFRSQVVVV3pYN3JZRlNKSzBud0FFa1E&output=csv")
my_data <- read.csv(textConnection(myCsv), dec = ",")

Let's have a look at how the data looks like.

# look at data
my_data
      study year       test n_controls n_patients age_controls
1  McMullin 1990     n-back         10         10         35.6
2     Meyer 1992 digit_span         75         80         22.2
3  Caraquez 1998     n-back         12         15         39.1
4     Smith 1998     n-back         24         20         29.1
5    Waters 2000 digit_span        102        120         19.1
6   Michael 2005 digit_span         20         19         65.0
7  Swensson 2005     n-back         46         37         25.7
8    Wilson 2007     n-back         12          9         44.4
9    Miller 2009 digit_span         70         74         22.2
10   Takasu 2010     n-back         38         48         55.6
11     Lieu 2012 digit_span        102         99         21.1
12  Franzen 2012     n-back         65         56         67.1
   age_patients mean_controls mean_patients sd_controls sd_patients
1          35.4          1.02          0.77        0.10        0.09
2          23.7         56.00         53.00        2.10        3.40
3          41.1          1.01          0.54        0.30        0.23
4          30.1        121.00        131.00        8.00        7.00
5          22.9         12.00         14.50        3.02        4.09
6          65.1         22.90         15.20        2.40        1.60
7          24.7         99.00         97.00       11.00       12.00
8          45.0         22.00         18.00        0.90        2.20
9          25.1         14.90         13.10        1.50        1.02
10         54.5         17.00         12.00        1.90        2.20
11         23.4         12.00         20.00        4.10        4.50
12         66.9        101.00         67.00        5.00        6.00

In order to work with your data, analyze and manipulate it, you need to be able to select subsets. Our data is stored in the object 'my_data'. You can select subsets of it using squared brackets [].

# select the first row of your data
my_data[1, ]
     study year   test n_controls n_patients age_controls age_patients
1 McMullin 1990 n-back         10         10         35.6         35.4
  mean_controls mean_patients sd_controls sd_patients
1          1.02          0.77         0.1        0.09

# select the fourth column of your data
my_data[, 4]
 [1]  10  75  12  24 102  20  46  12  70  38 102  65

# select the element of the second column and third row
my_data[3, 2]
[1] 1998

In the first step we use the escalc() function to calculate effect sizes and their variance.

my_data <- escalc(n1i = n_controls, n2i = n_patients, m1i = mean_controls, m2i = mean_patients, 
    sd1i = sd_controls, sd2i = sd_patients, data = my_data, measure = "SMD", 
    append = TRUE)

Using the effect sizes and variance we can calculate the random-effects meta-analysis. The results are stored in the object 'ma_model_1'. You can retrieve the results using the summary() function.

ma_model_1 <- rma(yi, vi, data = my_data)
summary(ma_model_1)

Random-Effects Model (k = 12; tau^2 estimator: REML)

  logLik  deviance       AIC       BIC  
-24.4650   48.9300   52.9300   53.7258  

tau^2 (estimated amount of total heterogeneity): 4.8657 (SE = 2.1341)
tau (square root of estimated tau^2 value):      2.2058
I^2 (total heterogeneity / total variability):   98.76%
H^2 (total variability / sampling variability):  80.64

Test for Heterogeneity: 
Q(df = 11) = 604.6327, p-val < .0001

Model Results:

estimate       se     zval     pval    ci.lb    ci.ub          
  1.4446   0.6459   2.2366   0.0253   0.1787   2.7105        * 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

There is a lot of info in the R output above. Most importantly you can see that there is an summary effect size of 1.4446 representing differences between patients and controls. Also this difference is statistically significant with a p=0.0253. In order to visualize the results you can create a forest-plot using the forest() function.

forest(ma_model_1, slab = paste(my_data$study, as.character(my_data$year), sep = ", "))

plot of chunk unnamed-chunk-10

A common way to investigate potential publication bias in a meta-analysis is the funnel plot. Asymmetrical distribution indicates potential publication bias.

funnel(ma_model_1)

plot of chunk unnamed-chunk-11

We could also check if there are differences between the two tests used. First let's visualize the difference using a boxplot.

boxplot(yi ~ test, data = my_data)

plot of chunk unnamed-chunk-12