Luong Nguyen graduated with a Bachelor of Public Health from Hanoi Medical University and is currently a researcher at the Center for Public Health and Ecosystem Research, Hanoi University of Public Health. His research focuses on the link between health and agriculture, food safety, and infectious and zoonotic diseases, with an emphasis on integrative approaches such as One Health and Ecohealth.
This is an R Markdown (Rmd) file, and it is rendered to HTML if you are viewing it on RPubs. The purpose of this document is to give a brief introduction to meta-analysis using R. My main focus is on studies in “Health care and Medicine.”
For the sake of introduction, and to familiarize ourselves with R and RStudio, we will limit ourselves to calculating and pooling the effect sizes of events in single-arm trials, using a dummy dataset.
R is a programming language, like Python, C++, or Java, which you might have heard of before, but it is built specifically for statistical work. It is used mainly by data scientists for data manipulation and visualization. It has also become a language of choice for conducting meta-analysis, thanks to well-supported packages written by contributors all over the world.
Meta-analysis is a quantitative synthesis of the results of multiple studies. It is widely used in the health care sciences to summarize the effects and outcomes of an intervention, which are usually spread across multiple trials. Alongside the qualitative analysis done in systematic reviews and focused reviews, meta-analysis helps us quantify the net effect of an intervention. The subject is quite extensive, and a person learns it gradually over time. A very good resource for understanding the basics is the book “Introduction to Meta-analysis” by Borenstein et al. (2009).
In addition to a basic understanding of meta-analysis, you need to install R and RStudio. Both of these are open source and free to download. The respective websites are https://cran.r-project.org and https://rstudio.com.
After setting up R and RStudio, you need to understand packages in R. Packages are collections of functions, created by developers, which do the work for us in the background and provide us with the output. The packages mainly used for meta-analysis in R include meta and metafor. The meta package is good for beginners because its code and commands are easier to read. It is installed with R's install.packages function.
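The installation command itself does not appear above. As a minimal sketch, assuming the package is installed from CRAN in the usual way, it would look like this:

```r
# Install the meta package from CRAN (needed only once per R installation)
install.packages("meta")
```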
To use a package, we have to load it first. A package is loaded with the library function.
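A minimal sketch of loading the package, assuming meta has already been installed:

```r
# Attach the meta package so that its functions (for example metaprop)
# can be called by name in this R session
library(meta)
```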
One of the things you will need most often while using R is the documentation of a package and of the functions in it. You can open it easily by typing a question mark before the name of the package or function.
After reading the documentation, you will have a basic idea of what the functions in that package do.
The meta package is one of the most comprehensive packages for conducting meta-analysis. From a meta-analysis of single-arm proportions to meta-regression, almost all tasks can be accomplished with it. The important feature missing from this package, and present in the metafor package, is moderator analysis using multiple moderators.
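For instance, the help lookups might look like the sketch below. It uses `help(package = ...)` for the package index, which is an alternative to the question-mark form described above.

```r
# List the help topics of an installed package
help(package = "meta")

# Open the help page of a single function with the question-mark shortcut
# (this form needs the package to be loaded first)
library(meta)
?metaprop
```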
After installing and loading the package, let us take a simple example of meta-analysis in R, using a dummy dataset of single-arm trials that report the rate of an event Y for a drug X. In this article, we will only calculate the effect sizes and generate a default forest plot.
There are many ways to import a dataset into R. The way I prefer is to keep the data in a CSV file created with Microsoft Excel or any other spreadsheet software, then import that file and assign it a name. Let’s call it abc.
The import command opens a window in which you choose the file.
So that you can follow along without a file of your own, copy the following code into R and run it; it will generate the dataset for you.
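The import command is not shown above. One common way to get a file-picker window is `read.csv(file.choose())`, so the sketch below assumes that form. It also includes a non-interactive variant that can be run unattended; the temporary file and the name `abc_demo` are made up for illustration.

```r
# Interactive form (assumed): opens a file-picker window, then reads the chosen CSV
# abc <- read.csv(file.choose())

# Non-interactive variant: write a tiny CSV to a temporary file and read it back.
# The two rows are copied from the dummy dataset; abc_demo is a hypothetical name.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Study,event.e,n.e",
             "Study 1,1,12",
             "Study 2,4,21"), tmp)
abc_demo <- read.csv(tmp)

# Inspect the column names and types of the imported data frame
str(abc_demo)
```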
abc <- structure(list(Study = c("Study 1", "Study 2", "Study 3", "Study 4",
                                "Study 5", "Study 6", "Study 7", "Study 8",
                                "Study 9", "Study 10", "Study 11", "Study 12",
                                "Study 13", "Study 14", "Study 15", "Study 16"),
                      event.e = c(1L, 4L, 6L, 20L, 69L, 22L, 19L, 26L, 13L, 2L, 7L, 2L, 29L, 6L, 11L, 4L),
                      n.e = c(12L, 21L, 43L, 83L, 373L, 219L, 164L, 264L, 102L, 14L, 53L, 60L, 172L, 55L, 49L, 14L)),
                 class = "data.frame", row.names = c(NA, -16L))
Now you can explore the dataset and see its structure, for example with summary(abc):
## Study event.e n.e
## Length:16 Min. : 1.00 Min. : 12.0
## Class :character 1st Qu.: 4.00 1st Qu.: 37.5
## Mode :character Median : 9.00 Median : 57.5
## Mean :15.06 Mean :106.1
## 3rd Qu.:20.50 3rd Qu.:166.0
## Max. :69.00 Max. :373.0
As you can see, our data set consists of 16 studies and it includes the number of events event.e and total population n.e in each study.
metaprop is a function in the meta package which can be used to pool proportions from single-arm studies. You will find more information about the other functions in the meta documentation.
The code for a meta-analysis with the metaprop function, setting several important arguments, is below. We will assign the results of our meta-analysis to another variable; let’s call it def.
# Note: these argument names match the version of meta used here; newer
# releases rename some of them (see ?metaprop for the installed version)
def <- metaprop(event = event.e,
                n = n.e,
                data = abc,
                studlab = Study,
                sm = "Plogit",
                level = 0.95,
                comb.fixed = TRUE, comb.random = TRUE,
                hakn = FALSE,
                method.tau = "DL")
Let us now walk through the code. We have given the metaprop function the inputs it needs to carry out this meta-analysis. You can find the details of all these arguments in the meta documentation, but I will describe them briefly here.
First, we have pointed metaprop to the appropriate columns of the dummy dataset abc that we created in the previous step. event = event.e means that the number of events is in the column event.e, n = n.e means that the sample size is in the column n.e, and data = abc names the dataset. The study labels are in the column Study. sm indicates which summary measure is used to pool the effect sizes; here we have used Plogit, which applies the logit transformation. Other commonly used measures, such as the Freeman-Tukey double arcsine transformation (PFT), are listed with their codes in the meta documentation. level is the confidence level, which we have set to 95%. comb.fixed and comb.random request the fixed effect and random effects meta-analyses; we have set both to TRUE because we want to see the results of both. hakn is set to false, so the Hartung-Knapp adjustment is not applied. method.tau is the method used to estimate the between-study variance; we have used the DerSimonian-Laird estimator (DL), which is the most commonly used.
There are many other arguments, but to begin with we will use only these and leave the others at their defaults.
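To make the logit transformation and the fixed effect pooling concrete, here is a minimal base R sketch of the textbook inverse-variance calculation on the dummy data. It illustrates the technique and is not the internal code of metaprop. The variable names (yi, vi, wi and so on) are chosen for this example, and the pooled estimate can be compared with the fixed effect row that metaprop reports.

```r
# Events and sample sizes copied from the dummy dataset
event.e <- c(1, 4, 6, 20, 69, 22, 19, 26, 13, 2, 7, 2, 29, 6, 11, 4)
n.e     <- c(12, 21, 43, 83, 373, 219, 164, 264, 102, 14, 53, 60, 172, 55, 49, 14)

# Logit-transformed proportion and its approximate variance for each study
yi <- log(event.e / (n.e - event.e))
vi <- 1 / event.e + 1 / (n.e - event.e)

# Inverse-variance (fixed effect) pooling on the logit scale
wi       <- 1 / vi
y_fixed  <- sum(wi * yi) / sum(wi)
se_fixed <- sqrt(1 / sum(wi))

# Back-transform the pooled logit and its 95% confidence limits to proportions
p_fixed  <- plogis(y_fixed)
ci_fixed <- plogis(y_fixed + c(-1, 1) * qnorm(0.975) * se_fixed)

round(c(estimate = p_fixed, lower = ci_fixed[1], upper = ci_fixed[2]), 4)
```

Pooling is done on the logit scale and only the final estimate is back-transformed, which keeps the confidence limits inside the interval from 0 to 1.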
Now, to look at the results of the meta-analysis, type the name you assigned to the results (here, def) and run it with Ctrl+Enter. This prints the details of the meta-analysis, including various statistical results. The output is:
## proportion 95%-CI %W(fixed) %W(random)
## Study 1 0.0833 [0.0021; 0.3848] 0.5 1.2
## Study 2 0.1905 [0.0545; 0.4191] 1.6 3.4
## Study 3 0.1395 [0.0530; 0.2793] 2.5 4.7
## Study 4 0.2410 [0.1538; 0.3473] 7.5 8.6
## Study 5 0.1850 [0.1469; 0.2282] 27.7 12.2
## Study 6 0.1005 [0.0640; 0.1481] 9.8 9.5
## Study 7 0.1159 [0.0712; 0.1750] 8.3 8.9
## Study 8 0.0985 [0.0653; 0.1410] 11.6 10.0
## Study 9 0.1275 [0.0696; 0.2081] 5.6 7.5
## Study 10 0.1429 [0.0178; 0.4281] 0.8 2.0
## Study 11 0.1321 [0.0548; 0.2534] 3.0 5.3
## Study 12 0.0333 [0.0041; 0.1153] 1.0 2.2
## Study 13 0.1686 [0.1159; 0.2331] 11.9 10.1
## Study 14 0.1091 [0.0411; 0.2225] 2.6 4.9
## Study 15 0.2245 [0.1177; 0.3662] 4.2 6.5
## Study 16 0.2857 [0.0839; 0.5810] 1.4 3.1
##
## Number of studies combined: k = 16
##
## proportion 95%-CI
## Fixed effect model 0.1496 [0.1329; 0.1679]
## Random effects model 0.1444 [0.1183; 0.1750]
##
## Quantifying heterogeneity:
## tau^2 = 0.0935 [0.0000; 0.4335]; tau = 0.3058 [0.0000; 0.6584];
## I^2 = 52.2% [15.5%; 73.0%]; H = 1.45 [1.09; 1.92]
##
## Test of heterogeneity:
## Q d.f. p-value
## 31.41 15 0.0077
##
## Details on meta-analytical method:
## - Inverse variance method
## - DerSimonian-Laird estimator for tau^2
## - Jackson method for confidence interval of tau^2 and tau
## - Logit transformation
## - Clopper-Pearson confidence interval for individual studies
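In this result, you will find the quantitative summary of the studies as well as various measures of heterogeneity. The pooled proportion is 0.1496 under the fixed effect model and 0.1444 under the random effects model, and the heterogeneity is moderate (I^2 = 52.2%, Q = 31.41 on 15 degrees of freedom, p = 0.0077).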
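The heterogeneity statistics can be reproduced by hand. The sketch below uses the standard formulas for Cochran's Q, I^2, and the DerSimonian-Laird estimate of tau^2 in base R. It is an illustration under those textbook definitions rather than the package's own code, and the variable names are chosen for this example.

```r
# Events and sample sizes copied from the dummy dataset
event.e <- c(1, 4, 6, 20, 69, 22, 19, 26, 13, 2, 7, 2, 29, 6, 11, 4)
n.e     <- c(12, 21, 43, 83, 373, 219, 164, 264, 102, 14, 53, 60, 172, 55, 49, 14)

# Logit-transformed proportions, their variances, and fixed effect weights
yi <- log(event.e / (n.e - event.e))
vi <- 1 / event.e + 1 / (n.e - event.e)
wi <- 1 / vi

# Cochran's Q: weighted squared deviations from the fixed effect estimate
y_fixed <- sum(wi * yi) / sum(wi)
Q       <- sum(wi * (yi - y_fixed)^2)
df      <- length(yi) - 1

# I^2: share of total variation attributed to between-study heterogeneity
I2 <- max(0, (Q - df) / Q)

# DerSimonian-Laird estimate of the between-study variance tau^2
c_dl <- sum(wi) - sum(wi^2) / sum(wi)
tau2 <- max(0, (Q - df) / c_dl)

# Random effects pooling: tau^2 is added to each study's variance
wi_re    <- 1 / (vi + tau2)
p_random <- plogis(sum(wi_re * yi) / sum(wi_re))

round(c(Q = Q, I2.percent = 100 * I2, tau2 = tau2, p.random = p_random), 4)
```

Because tau^2 is added to every study's variance, the random effects weights are more even than the fixed effect weights, which is why the small studies gain weight in the %W(random) column.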
We can create a default forest plot quite easily using the forest function from the meta package. For the time being, let’s create only a default plot. The forest function has numerous arguments, but we will not go into their details in this article.
The command is forest(def), which draws the plot in the active graphics device.
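As a self-contained sketch, the plot can also be written to an image file. The file name and dimensions below are chosen for illustration. The method is set to "Inverse" explicitly to match the inverse variance method reported in the output above, on the assumption that the default may differ between versions of meta.

```r
library(meta)

# Rebuild the dummy dataset so that this block runs on its own
abc <- data.frame(
  Study   = paste("Study", 1:16),
  event.e = c(1, 4, 6, 20, 69, 22, 19, 26, 13, 2, 7, 2, 29, 6, 11, 4),
  n.e     = c(12, 21, 43, 83, 373, 219, 164, 264, 102, 14, 53, 60, 172, 55, 49, 14)
)

# Pool the logit-transformed proportions with the DerSimonian-Laird estimator
def <- metaprop(event = event.e, n = n.e, studlab = Study, data = abc,
                sm = "PLOGIT", method = "Inverse", method.tau = "DL")

# Draw the default forest plot into a PNG file (hypothetical file name)
png("forest_default.png", width = 900, height = 600)
forest(def)
dev.off()
```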
We have successfully generated the forest plot of our studies. Although single-arm meta-analysis is not commonly used, we have used it to orient ourselves to the workings of R and the basic methodology of meta-analysis in this language.
There are various other things to account for in meta-analysis which are beyond the scope of this paper. So, we will conclude here for now.
Do try to replicate it. It will help you get comfortable with R and RStudio.
A work by Luong Nguyen - 14 June 2020