Meta-Analysis of Epidemiological
Dose-Response Studies with the
dosresmeta R package

useR! 2016, June 2016

Department of Public Health Sciences
Karolinska Institutet
Stockholm, Sweden

Outline

Background and rationale
Statistical methodology
How to present the results
Additional slides

Dose-response meta-analysis

Derive a dose-response curve from published results of multiple studies on the relation between a quantitative exposure (e.g. number of cigarettes, diet) and the occurrence of a health outcome (e.g. mortality risk, incidence of a disease)

Research questions:

Is there any association? What is the shape of the association?
What are the exposure values associated with the best or worst outcome?
What are the factors that can influence the dose–response shape?

Increasing number of publications

plot of chunk unnamed-chunk-3

Several fields of application
Many leading medical and epidemiological journals
Global health organizations and foundations
Measures of public health impact

Motivating example

If any, what is the dose-response association between coffee consumption and all-cause mortality?

# devtools::install_github("alecri/dosresmeta")
library(dosresmeta)
coffee <- get(data(coffee_mort))
head(coffee, n = 10)

   id           author year type dose cases    n  logrr   se gender   area
1   1   LeGrady et al. 1987   ci  0.5    57  249  0.000 0.00      M    USA
2   1   LeGrady et al. 1987   ci  2.5   136  655 -0.288 0.14      M    USA
3   1   LeGrady et al. 1987   ci  4.5   144  619 -0.174 0.14      M    USA
4   1   LeGrady et al. 1987   ci  6.5   115  387  0.086 0.14      M    USA
5   2 Rosengren et al. 1991   ci  0.0    17  192  0.000 0.00      M Europe
6   2 Rosengren et al. 1991   ci  1.5    88 1121 -0.120 0.25      M Europe
7   2 Rosengren et al. 1991   ci  3.5   155 2464 -0.342 0.24      M Europe
8   2 Rosengren et al. 1991   ci  5.5   155 1986 -0.126 0.24      M Europe
9   2 Rosengren et al. 1991   ci  7.5    39  575 -0.267 0.28      M Europe
10  2 Rosengren et al. 1991   ci  9.5    24  427 -0.511 0.28      M Europe

Published results: categorizing the exposure

library(Epi)
library(dplyr)
data(esoph)
summary(esoph$alcgp)

0-39g/day     40-79    80-119      120+ 
       23        23        21        21

model <- glm(cbind(ncases, ncontrols) ~ agegp + tobgp + alcgp, data = esoph, family = binomial())
summary(model)


Call:
glm(formula = cbind(ncases, ncontrols) ~ agegp + tobgp + alcgp, 
    family = binomial(), data = esoph)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.689  -0.562  -0.217   0.231   2.064  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.7800     0.1980   -8.99  < 2e-16 ***
agegp.L       3.0053     0.6522    4.61  4.1e-06 ***
agegp.Q      -1.3379     0.5911   -2.26   0.0236 *  
agegp.C       0.1531     0.4485    0.34   0.7329    
agegp^4       0.0641     0.3088    0.21   0.8356    
agegp^5      -0.1936     0.1954   -0.99   0.3216    
tobgp.L       0.5945     0.1942    3.06   0.0022 ** 
tobgp.Q       0.0654     0.1881    0.35   0.7282    
tobgp.C       0.1568     0.1866    0.84   0.4007    
alcgp.L       1.4919     0.1994    7.48  7.2e-14 ***
alcgp.Q      -0.2266     0.1795   -1.26   0.2068    
alcgp.C       0.2546     0.1591    1.60   0.1094    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 227.241  on 87  degrees of freedom
Residual deviance:  53.973  on 76  degrees of freedom
AIC: 225.5

Number of Fisher Scoring iterations: 6

# published (aggregated) results
data.frame(
  group_by(esoph, alcgp) %>% 
    select(ncases, ncontrols) %>%
    summarise_each(funs(sum)),
  rbind(c(rr = 1, lb = NA, ub = NA),
        unname(ci.exp(model)[10:12, ]))
) %>% mutate(logrr = log(rr), 
             se = (log(ub) - log(lb))/(2*qnorm(.975)))

      alcgp ncases ncontrols  rr   lb  ub logrr   se
1 0-39g/day     29       415 1.0   NA  NA  0.00   NA
2     40-79     75       355 4.4 3.01 6.6  1.49 0.20
3    80-119     51       138 0.8 0.56 1.1 -0.23 0.18
4      120+     45        67 1.3 0.94 1.8  0.25 0.16

Aggregated data

subset(coffee, id == 1)

  id         author year type dose cases   n  logrr   se gender area
1  1 LeGrady et al. 1987   ci  0.5    57 249  0.000 0.00      M  USA
2  1 LeGrady et al. 1987   ci  2.5   136 655 -0.288 0.14      M  USA
3  1 LeGrady et al. 1987   ci  4.5   144 619 -0.174 0.14      M  USA
4  1 LeGrady et al. 1987   ci  6.5   115 387  0.086 0.14      M  USA

Observations are not independent (share reference category)
The RR in the referent is 1

# How many observations in ech study?
with(subset(coffee, se!=0), table(id))

id
 1  2  3  4  5  6  7  8  9 12 13 14 15 16 17 18 19 20 21 22 23 24 
 3  5  4  3  3  3  3  2  2  4  4  5  5  4  4  3  3  3  3  3  5  5

Two-stage dose-response meta-analysis: 1st stage

Log-linear model:

\[ y = X \beta + \epsilon \]

\( y \) : vector of non-referent log \( RR \)

\( X \) : design matrix contains the assigned doses

Model without intercept, so that \( RR = 1|x = 0 \)
\( Cov(\epsilon) \) can be approximated depending on the study design (“cc” case-control, “ir” incidence-rate, “ci” cumulative-incidence)

Additional information needed for each exposure level:
number of cases, and number of subjects (for case-control and cumulative incidence data) or person-time (for incidence rate data)

covar.logrr(cases, n, logrr, se^2, type, data = subset(coffee, id == 1))

      [,1]  [,2]  [,3]
[1,] 0.019 0.013 0.013
[2,] 0.013 0.019 0.013
[3,] 0.013 0.013 0.020

Assuming a linear trend: \( y = \beta_1 x + \epsilon \)

dosresmeta(formula = logrr ~ dose, se = se, type = type, 
           cases = cases, n = n, data = subset(coffee, id == 1))

Call:  dosresmeta(formula = logrr ~ dose, type = type, cases = cases, 
    n = n, data = subset(coffee, id == 1), se = se)

Fixed-effects coefficients:
  dose  
0.0328  

1 study 3 values, 1 fixed and 0 random-effects parameters
 logLik      AIC      BIC  
-0.7914   3.5827   2.6813

lin_i <- lapply(split(coffee, coffee$id), function(d)
   dosresmeta(logrr ~ dose, id, I(se^2), type, cases, n, data = d))
(dat_i <- data.frame(bi = unname(do.call("c", lapply(lin_i, coef))),
                     vi_bi = do.call("c", lapply(lin_i, vcov))))

       bi    vi_bi
1   0.033 0.000447
2  -0.024 0.000356
3  -0.014 0.000058
4  -0.048 0.000619
5  -0.047 0.001219
6  -0.020 0.000113
7  -0.065 0.000422
8  -0.418 0.008903
9  -0.103 0.016835
12 -0.020 0.000071
13 -0.025 0.000054
14 -0.028 0.000114
15 -0.055 0.000045
16 -0.029 0.000543
17 -0.018 0.000109
18 -0.014 0.000176
19 -0.032 0.000608
20 -0.079 0.001991
21 -0.068 0.000194
22 -0.086 0.000435
23 -0.022 0.000012
24 -0.038 0.000023

with(subset(coffee, se !=0), 
     plot(dose, logrr, pch = 1, 
          cex = sqrt((1/se^2))/2))
lapply(lin_i, function(l) 
  abline(a = 0, b = coef(l)))

plot of chunk unnamed-chunk-14

Two-stage dose-response meta-analysis: 2nd stage

\[ b_i \sim N \left( \bar b , v_{b_i} + \tau^2 \right) \]

\[ \hat {\bar b} = \frac{\sum_{i = 1}^K b_i w_i}{\sum_{i = 1}^K w_i} \] \( w_i = (v_{b_i} + \tau^2)^{-1} \)

\( \tau^2 \) is the between-studies heterogeneity
Test for overall association: \( H_0: \bar b = 0 \)
Test (Cochran \( Q \)) and quantify the impact (\( I^2 \)) of statistical heterogeneity

# library(metafor)
# rma(bi, vi_bi, data = dat_i)
lin <- dosresmeta(formula = logrr ~ dose, id = id, se = se, type = type, cases = cases, n = n, data = coffee, method = "mm")
summary(lin)

Call:  dosresmeta(formula = logrr ~ dose, id = id, type = type, cases = cases, 
    n = n, data = coffee, se = se, method = "mm")

Two-stage random-effects meta-analysis
Estimation method: 
Covariance approximation: Greenland & Longnecker

Chi2 model: X2 = 44.4226 (df = 1), p-value = 0.0000

Fixed-effects coefficients
             Estimate  Std. Error        z  Pr(>|z|)  95%ci.lb  95%ci.ub     
(Intercept)   -0.0324      0.0049  -6.6650    0.0000   -0.0419   -0.0229  ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Between-study random-effects (co)variance components
  Std. Dev
    0.0163

Univariate Cochran Q-test for residual heterogeneity:
Q = 77.0088 (df = 21), p-value = 0.0000
I-square statistic = 72.7%

22 studies, 22 values, 1 fixed and 1 random-effects parameters

Every increase of one cup of coffee per day was associated with a pooled \( 100*(1 - \exp({\beta_1})) = 100*(1 - \exp( \) -0.03)) = 3.19% reduction in all-cause mortality (95% CI: 2.26, 4.1).

Modeling non-linearity: Restricted Cubic Splines

library(rms)
k <- with(coffee, quantile(dose, c(.1, .5, .9)))
spl <- dosresmeta(logrr ~ rcs(dose, k), id = id, se = se, type = type, cases = cases, n = n, data = coffee, method = "mm")
summary(spl)

Call:  dosresmeta(formula = logrr ~ rcs(dose, k), id = id, type = type, 
    cases = cases, n = n, data = coffee, se = se, method = "mm")

Two-stage random-effects meta-analysis
Estimation method: 
Covariance approximation: Greenland & Longnecker

Chi2 model: X2 = 209.5069 (df = 2), p-value = 0.0000

Fixed-effects coefficients
                               Estimate  Std. Error        z  Pr(>|z|)  95%ci.lb  95%ci.ub     
rcs(dose, k)dose.(Intercept)    -0.0764      0.0105  -7.3101    0.0000   -0.0969   -0.0559  ***
rcs(dose, k)dose'.(Intercept)    0.1053      0.0229   4.5935    0.0000    0.0604    0.1502  ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Between-study random-effects (co)variance components
                   Std. Dev              Corr
rcs(dose, k)dose     0.0354  rcs(dose, k)dose
rcs(dose, k)dose'    0.0797                -1
(Note: Truncated estimate - 1 negative eigenvalues set to 0)

Univariate Cochran Q-test for residual heterogeneity:
Q = 101.3912 (df = 42), p-value = 0.0000
I-square statistic = 58.6%

22 studies, 44 values, 2 fixed and 1 random-effects parameters

Presenting non-linearity

“Every unit increase is associated with …” not appropriate
Predict RRs for a set of exposure levels \( x^* \) (as compared to \( x_{ref} \))

\[ \widehat {RR} = e^{(X^* - X_{ref}) \hat {\bar \beta}} \]

\[ \text{95% CI: } e^{(X^* - X_{ref}) \hat {\bar \beta} \pm z_{1-\alpha/2} \left[ (X^* - X_{ref})^\top Var(\hat {\bar \beta}) (X^* - X_{ref}) \right]} \]

Tip: use the predict function
Predicted RRs can be both presented in either graphical or tabular form

xref <- 0
newdata <- data.frame(dose = 0:7)
predict(spl, newdata, xref = xref, expo = T)

  rcs(dose, k)dose rcs(dose, k)dose' pred ci.lb ci.ub
1                0             0.000 1.00  1.00  1.00
2                1             0.024 0.93  0.91  0.95
3                2             0.189 0.88  0.85  0.90
4                3             0.605 0.85  0.82  0.88
5                4             1.241 0.84  0.82  0.86
6                5             2.036 0.85  0.83  0.87
7                6             2.924 0.86  0.84  0.88
8                7             3.846 0.88  0.84  0.92

newdata <- data.frame(dose = seq(0, 7, .2))
with(predict(spl, newdata, xref = xref, expo = TRUE),
     matplot(newdata$dose, cbind(pred, ci.ub, ci.lb), 
             type = "l", col = 1, lty = c(1, 2, 2),
             log = "y", ylim = c(.7, 1.1),
             ylab = "Relative risk", 
             xlab = "Coffee consumption, cups/day")
)
with(predict(lin, newdata, xref = xref, expo = TRUE),
     points(dose, pred, type = "l", lty = 3))
rug(coffee$dose)

plot of chunk unnamed-chunk-19

Meta-regression and stratified analysis

To further explain heterogeneity and/or explore differences in dose-response curves
Prespecified study-level characteristics
Limited number of data points: interpret with caution
Stratified (or subgroups) analysis: divide the observations in groups and evaluate differences
Meta-regression: formalization of stratified analysis, + investigation of continuous variables (be aware of pitfall)

spl_regr <- dosresmeta(logrr ~ rcs(dose, k), id, I(se^2), type, cases, n, data = coffee,
                       mod = ~ area, method = "mm")
summary(spl_regr)

Call:  dosresmeta(formula = logrr ~ rcs(dose, k), id = id, v = I(se^2), 
    type = type, cases = cases, n = n, data = coffee, mod = ~area, 
    method = "mm")

Two-stage random-effects meta-regression
Estimation method: 
Covariance approximation: Greenland & Longnecker

Chi2 model: X2 = 187.2975 (df = 6), p-value = 0.0000

Fixed-effects coefficients
                               Estimate  Std. Error        z  Pr(>|z|)  95%ci.lb  95%ci.ub     
rcs(dose, k)dose.(Intercept)    -0.0830      0.0231  -3.5855    0.0003   -0.1283   -0.0376  ***
rcs(dose, k)dose.areaJapan      -0.0580      0.0373  -1.5558    0.1197   -0.1310    0.0151     
rcs(dose, k)dose.areaUSA         0.0126      0.0268   0.4684    0.6395   -0.0400    0.0651     
rcs(dose, k)dose'.(Intercept)    0.0923      0.0421   2.1917    0.0284    0.0098    0.1749    *
rcs(dose, k)dose'.areaJapan      0.1709      0.1008   1.6945    0.0902   -0.0268    0.3685    .
rcs(dose, k)dose'.areaUSA        0.0039      0.0519   0.0747    0.9405   -0.0978    0.1055     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Between-study random-effects (co)variance components
                   Std. Dev              Corr
rcs(dose, k)dose     0.0331  rcs(dose, k)dose
rcs(dose, k)dose'    0.0747           -0.9964

Univariate Cochran Q-test for residual heterogeneity:
Q = 86.2279 (df = 38), p-value = 0.0000
I-square statistic = 55.9%

22 studies, 44 values, 6 fixed and 1 random-effects parameters

waldtest(Sigma = vcov(spl_regr), b = coef(spl_regr), Terms = c(2, 3, 5, 6))

Wald test:
----------

Chi-squared test:
X2 = 9.7, df = 4, P(> X2) = 0.046

newdata <- expand.grid(dose = seq(0, 7, .2), 
                       area = levels(coffee$area))
pred_regr <- predict(spl_regr, newdata, exp = T)

ggplot(newdata, aes(x = dose, y = pred_regr$pred, 
                    col = area)) +
   xlab("Coffee consumption, cups/day") +
   scale_y_continuous(trans = "log", "Relative risk", 
      breaks = seq(.75, 1.1, .05)) + geom_line() + 
  theme_classic()

plot of chunk unnamed-chunk-23

Stratified analysis

Goodness of fit

Deviance (D)
- Total absolute distance between fitted and reported RRs
- Test for model specification
Coefficient of determination (\( R^2 \))
- Descriptive measure of agreement
- Dimensionless measure bounded between 0 and 1
Plot of de-correlated residuals versus exposure
- Visual assessment of the goodness of fit
- Evaluate how the pooled dose–response curve fits the data by exposure levels

source("additionalMethods.R")
models <- list(lin, spl, spl_regr)
data.frame(do.call("rbind", lapply(models, function(m) 
   unlist(gof(m)[c("deviance","R2", "R2adj")]))))

  deviance.D deviance.df deviance.p   R2 R2adj
1        225          78    3.2e-16 0.49  0.48
2        141          77    1.1e-05 0.68  0.67
3        126          73    1.1e-04 0.71  0.69

par(mfrow = c(1, 3))
lapply(models, function(m)
       with(gof(m)$tdata,{
          plot(coffee$dose[coffee$se != 0], residuals, ylim = -c(-5, 5), xlab = "Coffee consumption, cups/day"))
          lines(lowess(coffee$dose[coffee$se != 0], residuals), lwd = 4)
       })
)

plot of chunk unnamed-chunk-26

References

http://alecri.github.io/software/dosresmeta
Crippa A, Orsini N. Multivariate Dose-Response Meta-Analysis: the dosresmeta R Package. 2016. J Stat Softw. In press.
Crippa A, Discacciati A, Larsson SC, Wolk A, Orsini N. Coffee consumption and mortality from all causes, cardiovascular disease, and cancer: a dose-response meta-analysis. American journal of epidemiology. 2014 Aug 24:kwu194.
Greenland S, Longnecker MP. Methods for trend estimation from summarized dose-response data, with applications to meta-analysis. American journal of epidemiology. 1992 Jun 1;135(11):1301-9.
Orsini N, Li R, Wolk A, Khudyakov P, Spiegelman D. Meta-analysis for linear and nonlinear dose-response relations: examples, an evaluation of approximations, and software. American journal of epidemiology. 2012 Jan 1;175(1):66-73.
Berlin JA, Longnecker MP, Greenland S. Meta-analysis of epidemiologic dose-response data. Epidemiology. 1993 May 1;4(3):218-28.
Orsini N, Bellocco R, Greenland S. Generalized least squares for trend estimation of summarized dose-response data. Stata Journal. 2006 Jan 1;6(1):40.
Discacciati A, Crippa A, Orsini N. Goodness of fit tools for dose–response meta‐analysis of binary outcomes. Research synthesis methods. 2015 Jan 1.

Information about R session:

R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.5 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rms_4.5-0        SparseM_1.7      Hmisc_3.17-4     Formula_1.2-1    survival_2.39-2  lattice_0.20-33 
 [7] dplyr_0.5.0      Epi_2.0          dosresmeta_2.0.0 mvmeta_0.4.7     ggplot2_2.1.0    knitr_1.13      

loaded via a namespace (and not attached):
 [1] zoo_1.7-13          splines_3.3.0       etm_0.6-2           colorspace_1.2-6    chron_2.3-47       
 [6] foreign_0.8-66      DBI_0.4-1           RColorBrewer_1.1-2  multcomp_1.4-5      plyr_1.8.4         
[11] stringr_1.0.0       MatrixModels_0.4-1  munsell_0.4.3       gtable_0.2.0        mvtnorm_1.0-5      
[16] codetools_0.2-14    evaluate_0.9        labeling_0.3        latticeExtra_0.6-28 quantreg_5.26      
[21] parallel_3.3.0      TH.data_1.0-7       Rcpp_0.12.5         acepack_1.3-3.3     scales_0.4.0       
[26] formatR_1.4         cmprsk_2.2-7        gridExtra_2.2.1     digest_0.6.9        stringi_1.1.1      
[31] polspline_1.1.12    grid_3.3.0          tools_3.3.0         sandwich_2.3-4      magrittr_1.5       
[36] lazyeval_0.2.0      tibble_1.0          cluster_2.0.4       MASS_7.3-45         Matrix_1.2-6       
[41] data.table_1.9.6    assertthat_0.1      R6_2.1.2            rpart_4.1-10        nnet_7.3-12        
[46] nlme_3.1-127

Meta-Analysis of Epidemiological Dose-Response Studies with the dosresmeta R package