Meta-Analysis of Epidemiological
Dose-Response Studies with the
dosresmeta R package

useR! 2016, June 2016

Alessio Crippa & Nicola Orsini

Department of Public Health Sciences
Karolinska Institutet
Stockholm, Sweden

Outline



  • Background and rationale

  • Statistical methodology

  • How to present the results

  • Additional slides

Dose-response meta-analysis

Derive a dose-response curve from published results of multiple studies on the relation between a quantitative exposure (e.g. number of cigarettes, diet) and the occurrence of a health outcome (e.g. mortality risk, incidence of a disease)

Research questions:

  • Is there any association? What is the shape of the association?

  • What are the exposure values associated with the best or worst outcome?

  • What are the factors that can influence the dose–response shape?

Increasing number of publications

plot of chunk unnamed-chunk-3



  • Several fields of application

  • Many leading medical and epidemiological journals

  • Global health organizations and foundations

  • Measures of public health impact

Motivating example

If any, what is the dose-response association between coffee consumption and all-cause mortality?

# devtools::install_github("alecri/dosresmeta")
library(dosresmeta)
coffee <- get(data(coffee_mort))
head(coffee, n = 10)
   id           author year type dose cases    n  logrr   se gender   area
1   1   LeGrady et al. 1987   ci  0.5    57  249  0.000 0.00      M    USA
2   1   LeGrady et al. 1987   ci  2.5   136  655 -0.288 0.14      M    USA
3   1   LeGrady et al. 1987   ci  4.5   144  619 -0.174 0.14      M    USA
4   1   LeGrady et al. 1987   ci  6.5   115  387  0.086 0.14      M    USA
5   2 Rosengren et al. 1991   ci  0.0    17  192  0.000 0.00      M Europe
6   2 Rosengren et al. 1991   ci  1.5    88 1121 -0.120 0.25      M Europe
7   2 Rosengren et al. 1991   ci  3.5   155 2464 -0.342 0.24      M Europe
8   2 Rosengren et al. 1991   ci  5.5   155 1986 -0.126 0.24      M Europe
9   2 Rosengren et al. 1991   ci  7.5    39  575 -0.267 0.28      M Europe
10  2 Rosengren et al. 1991   ci  9.5    24  427 -0.511 0.28      M Europe

Published results: categorizing the exposure

library(Epi)
library(dplyr)
data(esoph)
summary(esoph$alcgp)
0-39g/day     40-79    80-119      120+ 
       23        23        21        21 
model <- glm(cbind(ncases, ncontrols) ~ agegp + tobgp + alcgp, data = esoph, family = binomial())
summary(model)

Call:
glm(formula = cbind(ncases, ncontrols) ~ agegp + tobgp + alcgp, 
    family = binomial(), data = esoph)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.689  -0.562  -0.217   0.231   2.064  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.7800     0.1980   -8.99  < 2e-16 ***
agegp.L       3.0053     0.6522    4.61  4.1e-06 ***
agegp.Q      -1.3379     0.5911   -2.26   0.0236 *  
agegp.C       0.1531     0.4485    0.34   0.7329    
agegp^4       0.0641     0.3088    0.21   0.8356    
agegp^5      -0.1936     0.1954   -0.99   0.3216    
tobgp.L       0.5945     0.1942    3.06   0.0022 ** 
tobgp.Q       0.0654     0.1881    0.35   0.7282    
tobgp.C       0.1568     0.1866    0.84   0.4007    
alcgp.L       1.4919     0.1994    7.48  7.2e-14 ***
alcgp.Q      -0.2266     0.1795   -1.26   0.2068    
alcgp.C       0.2546     0.1591    1.60   0.1094    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 227.241  on 87  degrees of freedom
Residual deviance:  53.973  on 76  degrees of freedom
AIC: 225.5

Number of Fisher Scoring iterations: 6
# published (aggregated) results
data.frame(
  group_by(esoph, alcgp) %>% 
    select(ncases, ncontrols) %>%
    summarise_each(funs(sum)),
  rbind(c(rr = 1, lb = NA, ub = NA),
        unname(ci.exp(model)[10:12, ]))
) %>% mutate(logrr = log(rr), 
             se = (log(ub) - log(lb))/(2*qnorm(.975)))
      alcgp ncases ncontrols  rr   lb  ub logrr   se
1 0-39g/day     29       415 1.0   NA  NA  0.00   NA
2     40-79     75       355 4.4 3.01 6.6  1.49 0.20
3    80-119     51       138 0.8 0.56 1.1 -0.23 0.18
4      120+     45        67 1.3 0.94 1.8  0.25 0.16

Aggregated data

subset(coffee, id == 1)
  id         author year type dose cases   n  logrr   se gender area
1  1 LeGrady et al. 1987   ci  0.5    57 249  0.000 0.00      M  USA
2  1 LeGrady et al. 1987   ci  2.5   136 655 -0.288 0.14      M  USA
3  1 LeGrady et al. 1987   ci  4.5   144 619 -0.174 0.14      M  USA
4  1 LeGrady et al. 1987   ci  6.5   115 387  0.086 0.14      M  USA
  • Observations are not independent (share reference category)

  • The RR in the referent is 1

# How many observations in ech study?
with(subset(coffee, se!=0), table(id))
id
 1  2  3  4  5  6  7  8  9 12 13 14 15 16 17 18 19 20 21 22 23 24 
 3  5  4  3  3  3  3  2  2  4  4  5  5  4  4  3  3  3  3  3  5  5 

Two-stage dose-response meta-analysis: 1st stage

Log-linear model:

\[ y = X \beta + \epsilon \]

\( y \) : vector of non-referent log \( RR \)

\( X \) : design matrix contains the assigned doses

  • Model without intercept, so that \( RR = 1|x = 0 \)

  • \( Cov(\epsilon) \) can be approximated depending on the study design (“cc” case-control, “ir” incidence-rate, “ci” cumulative-incidence)

Additional information needed for each exposure level:
number of cases, and number of subjects (for case-control and cumulative incidence data) or person-time (for incidence rate data)

covar.logrr(cases, n, logrr, se^2, type, data = subset(coffee, id == 1))
      [,1]  [,2]  [,3]
[1,] 0.019 0.013 0.013
[2,] 0.013 0.019 0.013
[3,] 0.013 0.013 0.020

Assuming a linear trend: \( y = \beta_1 x + \epsilon \)

dosresmeta(formula = logrr ~ dose, se = se, type = type, 
           cases = cases, n = n, data = subset(coffee, id == 1))
Call:  dosresmeta(formula = logrr ~ dose, type = type, cases = cases, 
    n = n, data = subset(coffee, id == 1), se = se)

Fixed-effects coefficients:
  dose  
0.0328  

1 study 3 values, 1 fixed and 0 random-effects parameters
 logLik      AIC      BIC  
-0.7914   3.5827   2.6813  
lin_i <- lapply(split(coffee, coffee$id), function(d)
   dosresmeta(logrr ~ dose, id, I(se^2), type, cases, n, data = d))
(dat_i <- data.frame(bi = unname(do.call("c", lapply(lin_i, coef))),
                     vi_bi = do.call("c", lapply(lin_i, vcov))))

       bi    vi_bi
1   0.033 0.000447
2  -0.024 0.000356
3  -0.014 0.000058
4  -0.048 0.000619
5  -0.047 0.001219
6  -0.020 0.000113
7  -0.065 0.000422
8  -0.418 0.008903
9  -0.103 0.016835
12 -0.020 0.000071
13 -0.025 0.000054
14 -0.028 0.000114
15 -0.055 0.000045
16 -0.029 0.000543
17 -0.018 0.000109
18 -0.014 0.000176
19 -0.032 0.000608
20 -0.079 0.001991
21 -0.068 0.000194
22 -0.086 0.000435
23 -0.022 0.000012
24 -0.038 0.000023
with(subset(coffee, se !=0), 
     plot(dose, logrr, pch = 1, 
          cex = sqrt((1/se^2))/2))
lapply(lin_i, function(l) 
  abline(a = 0, b = coef(l)))

plot of chunk unnamed-chunk-14

Two-stage dose-response meta-analysis: 2nd stage

\[ b_i \sim N \left( \bar b , v_{b_i} + \tau^2 \right) \]

\[ \hat {\bar b} = \frac{\sum_{i = 1}^K b_i w_i}{\sum_{i = 1}^K w_i} \] \( w_i = (v_{b_i} + \tau^2)^{-1} \)

  • \( \tau^2 \) is the between-studies heterogeneity

  • Test for overall association: \( H_0: \bar b = 0 \)

  • Test (Cochran \( Q \)) and quantify the impact (\( I^2 \)) of statistical heterogeneity

# library(metafor)
# rma(bi, vi_bi, data = dat_i)
lin <- dosresmeta(formula = logrr ~ dose, id = id, se = se, type = type, cases = cases, n = n, data = coffee, method = "mm")
summary(lin)
Call:  dosresmeta(formula = logrr ~ dose, id = id, type = type, cases = cases, 
    n = n, data = coffee, se = se, method = "mm")

Two-stage random-effects meta-analysis
Estimation method: 
Covariance approximation: Greenland & Longnecker

Chi2 model: X2 = 44.4226 (df = 1), p-value = 0.0000

Fixed-effects coefficients
             Estimate  Std. Error        z  Pr(>|z|)  95%ci.lb  95%ci.ub     
(Intercept)   -0.0324      0.0049  -6.6650    0.0000   -0.0419   -0.0229  ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Between-study random-effects (co)variance components
  Std. Dev
    0.0163

Univariate Cochran Q-test for residual heterogeneity:
Q = 77.0088 (df = 21), p-value = 0.0000
I-square statistic = 72.7%

22 studies, 22 values, 1 fixed and 1 random-effects parameters

Every increase of one cup of coffee per day was associated with a pooled \( 100*(1 - \exp({\beta_1})) = 100*(1 - \exp( \) -0.03)) = 3.19% reduction in all-cause mortality (95% CI: 2.26, 4.1).

Modeling non-linearity: Restricted Cubic Splines

library(rms)
k <- with(coffee, quantile(dose, c(.1, .5, .9)))
spl <- dosresmeta(logrr ~ rcs(dose, k), id = id, se = se, type = type, cases = cases, n = n, data = coffee, method = "mm")
summary(spl)
Call:  dosresmeta(formula = logrr ~ rcs(dose, k), id = id, type = type, 
    cases = cases, n = n, data = coffee, se = se, method = "mm")

Two-stage random-effects meta-analysis
Estimation method: 
Covariance approximation: Greenland & Longnecker

Chi2 model: X2 = 209.5069 (df = 2), p-value = 0.0000

Fixed-effects coefficients
                               Estimate  Std. Error        z  Pr(>|z|)  95%ci.lb  95%ci.ub     
rcs(dose, k)dose.(Intercept)    -0.0764      0.0105  -7.3101    0.0000   -0.0969   -0.0559  ***
rcs(dose, k)dose'.(Intercept)    0.1053      0.0229   4.5935    0.0000    0.0604    0.1502  ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Between-study random-effects (co)variance components
                   Std. Dev              Corr
rcs(dose, k)dose     0.0354  rcs(dose, k)dose
rcs(dose, k)dose'    0.0797                -1
(Note: Truncated estimate - 1 negative eigenvalues set to 0)

Univariate Cochran Q-test for residual heterogeneity:
Q = 101.3912 (df = 42), p-value = 0.0000
I-square statistic = 58.6%

22 studies, 44 values, 2 fixed and 1 random-effects parameters

Presenting non-linearity

  • “Every unit increase is associated with …” not appropriate

  • Predict RRs for a set of exposure levels \( x^* \) (as compared to \( x_{ref} \))

\[ \widehat {RR} = e^{(X^* - X_{ref}) \hat {\bar \beta}} \]

\[ \text{95% CI: } e^{(X^* - X_{ref}) \hat {\bar \beta} \pm z_{1-\alpha/2} \left[ (X^* - X_{ref})^\top Var(\hat {\bar \beta}) (X^* - X_{ref}) \right]} \]

  • Tip: use the predict function

  • Predicted RRs can be both presented in either graphical or tabular form



xref <- 0
newdata <- data.frame(dose = 0:7)
predict(spl, newdata, xref = xref, expo = T)
  rcs(dose, k)dose rcs(dose, k)dose' pred ci.lb ci.ub
1                0             0.000 1.00  1.00  1.00
2                1             0.024 0.93  0.91  0.95
3                2             0.189 0.88  0.85  0.90
4                3             0.605 0.85  0.82  0.88
5                4             1.241 0.84  0.82  0.86
6                5             2.036 0.85  0.83  0.87
7                6             2.924 0.86  0.84  0.88
8                7             3.846 0.88  0.84  0.92
newdata <- data.frame(dose = seq(0, 7, .2))
with(predict(spl, newdata, xref = xref, expo = TRUE),
     matplot(newdata$dose, cbind(pred, ci.ub, ci.lb), 
             type = "l", col = 1, lty = c(1, 2, 2),
             log = "y", ylim = c(.7, 1.1),
             ylab = "Relative risk", 
             xlab = "Coffee consumption, cups/day")
)
with(predict(lin, newdata, xref = xref, expo = TRUE),
     points(dose, pred, type = "l", lty = 3))
rug(coffee$dose)

plot of chunk unnamed-chunk-19




Meta-regression and stratified analysis

  • To further explain heterogeneity and/or explore differences in dose-response curves

  • Prespecified study-level characteristics

  • Limited number of data points: interpret with caution

  • Stratified (or subgroups) analysis: divide the observations in groups and evaluate differences

  • Meta-regression: formalization of stratified analysis, + investigation of continuous variables (be aware of pitfall)

spl_regr <- dosresmeta(logrr ~ rcs(dose, k), id, I(se^2), type, cases, n, data = coffee,
                       mod = ~ area, method = "mm")
summary(spl_regr)
Call:  dosresmeta(formula = logrr ~ rcs(dose, k), id = id, v = I(se^2), 
    type = type, cases = cases, n = n, data = coffee, mod = ~area, 
    method = "mm")

Two-stage random-effects meta-regression
Estimation method: 
Covariance approximation: Greenland & Longnecker

Chi2 model: X2 = 187.2975 (df = 6), p-value = 0.0000

Fixed-effects coefficients
                               Estimate  Std. Error        z  Pr(>|z|)  95%ci.lb  95%ci.ub     
rcs(dose, k)dose.(Intercept)    -0.0830      0.0231  -3.5855    0.0003   -0.1283   -0.0376  ***
rcs(dose, k)dose.areaJapan      -0.0580      0.0373  -1.5558    0.1197   -0.1310    0.0151     
rcs(dose, k)dose.areaUSA         0.0126      0.0268   0.4684    0.6395   -0.0400    0.0651     
rcs(dose, k)dose'.(Intercept)    0.0923      0.0421   2.1917    0.0284    0.0098    0.1749    *
rcs(dose, k)dose'.areaJapan      0.1709      0.1008   1.6945    0.0902   -0.0268    0.3685    .
rcs(dose, k)dose'.areaUSA        0.0039      0.0519   0.0747    0.9405   -0.0978    0.1055     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Between-study random-effects (co)variance components
                   Std. Dev              Corr
rcs(dose, k)dose     0.0331  rcs(dose, k)dose
rcs(dose, k)dose'    0.0747           -0.9964

Univariate Cochran Q-test for residual heterogeneity:
Q = 86.2279 (df = 38), p-value = 0.0000
I-square statistic = 55.9%

22 studies, 44 values, 6 fixed and 1 random-effects parameters

waldtest(Sigma = vcov(spl_regr), b = coef(spl_regr), Terms = c(2, 3, 5, 6))
Wald test:
----------

Chi-squared test:
X2 = 9.7, df = 4, P(> X2) = 0.046
newdata <- expand.grid(dose = seq(0, 7, .2), 
                       area = levels(coffee$area))
pred_regr <- predict(spl_regr, newdata, exp = T)

ggplot(newdata, aes(x = dose, y = pred_regr$pred, 
                    col = area)) +
   xlab("Coffee consumption, cups/day") +
   scale_y_continuous(trans = "log", "Relative risk", 
      breaks = seq(.75, 1.1, .05)) + geom_line() + 
  theme_classic()

plot of chunk unnamed-chunk-23

Stratified analysis

Goodness of fit

  1. Deviance (D)

    • Total absolute distance between fitted and reported RRs
    • Test for model specification
  2. Coefficient of determination (\( R^2 \))

    • Descriptive measure of agreement
    • Dimensionless measure bounded between 0 and 1
  3. Plot of de-correlated residuals versus exposure

    • Visual assessment of the goodness of fit
    • Evaluate how the pooled dose–response curve fits the data by exposure levels
source("additionalMethods.R")
models <- list(lin, spl, spl_regr)
data.frame(do.call("rbind", lapply(models, function(m) 
   unlist(gof(m)[c("deviance","R2", "R2adj")]))))
  deviance.D deviance.df deviance.p   R2 R2adj
1        225          78    3.2e-16 0.49  0.48
2        141          77    1.1e-05 0.68  0.67
3        126          73    1.1e-04 0.71  0.69
par(mfrow = c(1, 3))
lapply(models, function(m)
       with(gof(m)$tdata,{
          plot(coffee$dose[coffee$se != 0], residuals, ylim = -c(-5, 5), xlab = "Coffee consumption, cups/day"))
          lines(lowess(coffee$dose[coffee$se != 0], residuals), lwd = 4)
       })
)

plot of chunk unnamed-chunk-26

References

  • http://alecri.github.io/software/dosresmeta

  • Crippa A, Orsini N. Multivariate Dose-Response Meta-Analysis: the dosresmeta R Package. 2016. J Stat Softw. In press.

  • Crippa A, Discacciati A, Larsson SC, Wolk A, Orsini N. Coffee consumption and mortality from all causes, cardiovascular disease, and cancer: a dose-response meta-analysis. American journal of epidemiology. 2014 Aug 24:kwu194.

  • Greenland S, Longnecker MP. Methods for trend estimation from summarized dose-response data, with applications to meta-analysis. American journal of epidemiology. 1992 Jun 1;135(11):1301-9.

  • Orsini N, Li R, Wolk A, Khudyakov P, Spiegelman D. Meta-analysis for linear and nonlinear dose-response relations: examples, an evaluation of approximations, and software. American journal of epidemiology. 2012 Jan 1;175(1):66-73.

  • Berlin JA, Longnecker MP, Greenland S. Meta-analysis of epidemiologic dose-response data. Epidemiology. 1993 May 1;4(3):218-28.

  • Orsini N, Bellocco R, Greenland S. Generalized least squares for trend estimation of summarized dose-response data. Stata Journal. 2006 Jan 1;6(1):40.

  • Discacciati A, Crippa A, Orsini N. Goodness of fit tools for dose–response meta‐analysis of binary outcomes. Research synthesis methods. 2015 Jan 1.

Information about R session:

R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.5 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rms_4.5-0        SparseM_1.7      Hmisc_3.17-4     Formula_1.2-1    survival_2.39-2  lattice_0.20-33 
 [7] dplyr_0.5.0      Epi_2.0          dosresmeta_2.0.0 mvmeta_0.4.7     ggplot2_2.1.0    knitr_1.13      

loaded via a namespace (and not attached):
 [1] zoo_1.7-13          splines_3.3.0       etm_0.6-2           colorspace_1.2-6    chron_2.3-47       
 [6] foreign_0.8-66      DBI_0.4-1           RColorBrewer_1.1-2  multcomp_1.4-5      plyr_1.8.4         
[11] stringr_1.0.0       MatrixModels_0.4-1  munsell_0.4.3       gtable_0.2.0        mvtnorm_1.0-5      
[16] codetools_0.2-14    evaluate_0.9        labeling_0.3        latticeExtra_0.6-28 quantreg_5.26      
[21] parallel_3.3.0      TH.data_1.0-7       Rcpp_0.12.5         acepack_1.3-3.3     scales_0.4.0       
[26] formatR_1.4         cmprsk_2.2-7        gridExtra_2.2.1     digest_0.6.9        stringi_1.1.1      
[31] polspline_1.1.12    grid_3.3.0          tools_3.3.0         sandwich_2.3-4      magrittr_1.5       
[36] lazyeval_0.2.0      tibble_1.0          cluster_2.0.4       MASS_7.3-45         Matrix_1.2-6       
[41] data.table_1.9.6    assertthat_0.1      R6_2.1.2            rpart_4.1-10        nnet_7.3-12        
[46] nlme_3.1-127