This brief tutorial should help you with the first steps in R. In a few guided examples, we are loading some data, calculating effect sizes and conducting a meta-analysis of a fictional data set.
Let's beginn with some R basics:
# define a numerical variable 'x'
x <- 7
# print the value of your variable 'x'
x
[1] 7
# simple calculations with 'x'
x + 8
[1] 15
10 * x
[1] 70
# define a string variable 'my_name'
my_name <- "Jokel"
Loading required package: RCurl
Loading required package: bitops
Loading required package: metafor
Loading required package: Formula
Loading 'metafor' package (version 1.8-0). For an overview and
introduction to the package please type: help(metafor).
I created a fictional data set for this tutorial. It is saved as a google spreadsheet. Thus the RCurl package is used in order to load the data from google drive into R. The data represents cognitive performance of patients and healthy controls in a number of different tests. You can have a look at the data: https://docs.google.com/spreadsheet/pub?key=0AvykV4O1IaendFRSQVVVV3pYN3JZRlNKSzBud0FFa1E&output=html The aim of the present meta-analysis is to investigate differences in cognitive performance between healthy controls and patients. Also we want to check whether those differences are affected by other factors such as the age of the subjects, the year of publication or the cognitive test used.
# load data from google spreadsheet
myCsv <- getURL("https://docs.google.com/spreadsheet/pub?key=0AvykV4O1IaendFRSQVVVV3pYN3JZRlNKSzBud0FFa1E&output=csv")
my_data <- read.csv(textConnection(myCsv), dec = ",")
Let's have a look at how the data looks like.
# look at data
my_data
study year test n_controls n_patients age_controls
1 McMullin 1990 n-back 10 10 35.6
2 Meyer 1992 digit_span 75 80 22.2
3 Caraquez 1998 n-back 12 15 39.1
4 Smith 1998 n-back 24 20 29.1
5 Waters 2000 digit_span 102 120 19.1
6 Michael 2005 digit_span 20 19 65.0
7 Swensson 2005 n-back 46 37 25.7
8 Wilson 2007 n-back 12 9 44.4
9 Miller 2009 digit_span 70 74 22.2
10 Takasu 2010 n-back 38 48 55.6
11 Lieu 2012 digit_span 102 99 21.1
12 Franzen 2012 n-back 65 56 67.1
age_patients mean_controls mean_patients sd_controls sd_patients
1 35.4 1.02 0.77 0.10 0.09
2 23.7 56.00 53.00 2.10 3.40
3 41.1 1.01 0.54 0.30 0.23
4 30.1 121.00 131.00 8.00 7.00
5 22.9 12.00 14.50 3.02 4.09
6 65.1 22.90 15.20 2.40 1.60
7 24.7 99.00 97.00 11.00 12.00
8 45.0 22.00 18.00 0.90 2.20
9 25.1 14.90 13.10 1.50 1.02
10 54.5 17.00 12.00 1.90 2.20
11 23.4 12.00 20.00 4.10 4.50
12 66.9 101.00 67.00 5.00 6.00
In order to work with your data, analyze and manipulate it, you need to be able to select subsets. Our data is stored in the object 'my_data'. You can select subsets of it using squared brackets [].
# select the first row of your data
my_data[1, ]
study year test n_controls n_patients age_controls age_patients
1 McMullin 1990 n-back 10 10 35.6 35.4
mean_controls mean_patients sd_controls sd_patients
1 1.02 0.77 0.1 0.09
# select the fourth column of your data
my_data[, 4]
[1] 10 75 12 24 102 20 46 12 70 38 102 65
# select the element of the second column and third row
my_data[3, 2]
[1] 1998
In the first step we use the escalc() function to calculate effect sizes and their variance.
my_data <- escalc(n1i = n_controls, n2i = n_patients, m1i = mean_controls, m2i = mean_patients,
sd1i = sd_controls, sd2i = sd_patients, data = my_data, measure = "SMD",
append = TRUE)
Using the effect sizes and variance we can calculate the random-effects meta-analysis. The results are stored in the object 'ma_model_1'. You can retrieve the results using the summary() function.
ma_model_1 <- rma(yi, vi, data = my_data)
summary(ma_model_1)
Random-Effects Model (k = 12; tau^2 estimator: REML)
logLik deviance AIC BIC
-24.4650 48.9300 52.9300 53.7258
tau^2 (estimated amount of total heterogeneity): 4.8657 (SE = 2.1341)
tau (square root of estimated tau^2 value): 2.2058
I^2 (total heterogeneity / total variability): 98.76%
H^2 (total variability / sampling variability): 80.64
Test for Heterogeneity:
Q(df = 11) = 604.6327, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
1.4446 0.6459 2.2366 0.0253 0.1787 2.7105 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There is a lot of info in the R output above. Most importantly you can see that there is an summary effect size of 1.4446 representing differences between patients and controls. Also this difference is statistically significant with a p=0.0253. In order to visualize the results you can create a forest-plot using the forest() function.
forest(ma_model_1, slab = paste(my_data$study, as.character(my_data$year), sep = ", "))
A common way to investigate potential publication bias in a meta-analysis is the funnel plot. Asymmetrical distribution indicates potential publication bias.
funnel(ma_model_1)
We could also check if there are differences between the two tests used. First let's visualize the difference using a boxplot.
boxplot(yi ~ test, data = my_data)