Alessio Crippa & Nicola Orsini
Department of Public Health Sciences
Karolinska Institutet
Stockholm, Sweden
Background and rationale
Statistical methodology
How to present the results
Additional slides
Derive a dose-response curve from published results of multiple studies on the relation between a quantitative exposure (e.g. number of cigarettes, diet) and the occurrence of a health outcome (e.g. mortality risk, incidence of a disease)
Research questions:
Is there any association? What is the shape of the association?
What are the exposure values associated with the best or worst outcome?
What are the factors that can influence the dose–response shape?
Several fields of application
Many leading medical and epidemiological journals
Global health organizations and foundations
Measures of public health impact
If any, what is the dose-response association between coffee consumption and all-cause mortality?
# devtools::install_github("alecri/dosresmeta")
library(dosresmeta)
coffee <- get(data(coffee_mort))
head(coffee, n = 10)
id author year type dose cases n logrr se gender area
1 1 LeGrady et al. 1987 ci 0.5 57 249 0.000 0.00 M USA
2 1 LeGrady et al. 1987 ci 2.5 136 655 -0.288 0.14 M USA
3 1 LeGrady et al. 1987 ci 4.5 144 619 -0.174 0.14 M USA
4 1 LeGrady et al. 1987 ci 6.5 115 387 0.086 0.14 M USA
5 2 Rosengren et al. 1991 ci 0.0 17 192 0.000 0.00 M Europe
6 2 Rosengren et al. 1991 ci 1.5 88 1121 -0.120 0.25 M Europe
7 2 Rosengren et al. 1991 ci 3.5 155 2464 -0.342 0.24 M Europe
8 2 Rosengren et al. 1991 ci 5.5 155 1986 -0.126 0.24 M Europe
9 2 Rosengren et al. 1991 ci 7.5 39 575 -0.267 0.28 M Europe
10 2 Rosengren et al. 1991 ci 9.5 24 427 -0.511 0.28 M Europe
library(Epi)
library(dplyr)
data(esoph)
summary(esoph$alcgp)
0-39g/day 40-79 80-119 120+
23 23 21 21
model <- glm(cbind(ncases, ncontrols) ~ agegp + tobgp + alcgp, data = esoph, family = binomial())
summary(model)
Call:
glm(formula = cbind(ncases, ncontrols) ~ agegp + tobgp + alcgp,
family = binomial(), data = esoph)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.689 -0.562 -0.217 0.231 2.064
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.7800 0.1980 -8.99 < 2e-16 ***
agegp.L 3.0053 0.6522 4.61 4.1e-06 ***
agegp.Q -1.3379 0.5911 -2.26 0.0236 *
agegp.C 0.1531 0.4485 0.34 0.7329
agegp^4 0.0641 0.3088 0.21 0.8356
agegp^5 -0.1936 0.1954 -0.99 0.3216
tobgp.L 0.5945 0.1942 3.06 0.0022 **
tobgp.Q 0.0654 0.1881 0.35 0.7282
tobgp.C 0.1568 0.1866 0.84 0.4007
alcgp.L 1.4919 0.1994 7.48 7.2e-14 ***
alcgp.Q -0.2266 0.1795 -1.26 0.2068
alcgp.C 0.2546 0.1591 1.60 0.1094
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 227.241 on 87 degrees of freedom
Residual deviance: 53.973 on 76 degrees of freedom
AIC: 225.5
Number of Fisher Scoring iterations: 6
# published (aggregated) results
data.frame(
group_by(esoph, alcgp) %>%
select(ncases, ncontrols) %>%
summarise_each(funs(sum)),
rbind(c(rr = 1, lb = NA, ub = NA),
unname(ci.exp(model)[10:12, ]))
) %>% mutate(logrr = log(rr),
se = (log(ub) - log(lb))/(2*qnorm(.975)))
alcgp ncases ncontrols rr lb ub logrr se
1 0-39g/day 29 415 1.0 NA NA 0.00 NA
2 40-79 75 355 4.4 3.01 6.6 1.49 0.20
3 80-119 51 138 0.8 0.56 1.1 -0.23 0.18
4 120+ 45 67 1.3 0.94 1.8 0.25 0.16
subset(coffee, id == 1)
id author year type dose cases n logrr se gender area
1 1 LeGrady et al. 1987 ci 0.5 57 249 0.000 0.00 M USA
2 1 LeGrady et al. 1987 ci 2.5 136 655 -0.288 0.14 M USA
3 1 LeGrady et al. 1987 ci 4.5 144 619 -0.174 0.14 M USA
4 1 LeGrady et al. 1987 ci 6.5 115 387 0.086 0.14 M USA
Observations are not independent (share reference category)
The RR in the referent is 1
# How many observations in ech study?
with(subset(coffee, se!=0), table(id))
id
1 2 3 4 5 6 7 8 9 12 13 14 15 16 17 18 19 20 21 22 23 24
3 5 4 3 3 3 3 2 2 4 4 5 5 4 4 3 3 3 3 3 5 5
Log-linear model:
\[ y = X \beta + \epsilon \]
\( y \) : vector of non-referent log \( RR \)
\( X \) : design matrix contains the assigned doses
Model without intercept, so that \( RR = 1|x = 0 \)
\( Cov(\epsilon) \) can be approximated depending on the study design (“cc” case-control, “ir” incidence-rate, “ci” cumulative-incidence)
Additional information needed for each exposure level:
number of cases, and number of subjects
(for case-control and cumulative incidence data) or person-time (for incidence rate data)
covar.logrr(cases, n, logrr, se^2, type, data = subset(coffee, id == 1))
[,1] [,2] [,3]
[1,] 0.019 0.013 0.013
[2,] 0.013 0.019 0.013
[3,] 0.013 0.013 0.020
Assuming a linear trend: \( y = \beta_1 x + \epsilon \)
dosresmeta(formula = logrr ~ dose, se = se, type = type,
cases = cases, n = n, data = subset(coffee, id == 1))
Call: dosresmeta(formula = logrr ~ dose, type = type, cases = cases,
n = n, data = subset(coffee, id == 1), se = se)
Fixed-effects coefficients:
dose
0.0328
1 study 3 values, 1 fixed and 0 random-effects parameters
logLik AIC BIC
-0.7914 3.5827 2.6813
lin_i <- lapply(split(coffee, coffee$id), function(d)
dosresmeta(logrr ~ dose, id, I(se^2), type, cases, n, data = d))
(dat_i <- data.frame(bi = unname(do.call("c", lapply(lin_i, coef))),
vi_bi = do.call("c", lapply(lin_i, vcov))))
bi vi_bi
1 0.033 0.000447
2 -0.024 0.000356
3 -0.014 0.000058
4 -0.048 0.000619
5 -0.047 0.001219
6 -0.020 0.000113
7 -0.065 0.000422
8 -0.418 0.008903
9 -0.103 0.016835
12 -0.020 0.000071
13 -0.025 0.000054
14 -0.028 0.000114
15 -0.055 0.000045
16 -0.029 0.000543
17 -0.018 0.000109
18 -0.014 0.000176
19 -0.032 0.000608
20 -0.079 0.001991
21 -0.068 0.000194
22 -0.086 0.000435
23 -0.022 0.000012
24 -0.038 0.000023
with(subset(coffee, se !=0),
plot(dose, logrr, pch = 1,
cex = sqrt((1/se^2))/2))
lapply(lin_i, function(l)
abline(a = 0, b = coef(l)))
\[ b_i \sim N \left( \bar b , v_{b_i} + \tau^2 \right) \]
\[ \hat {\bar b} = \frac{\sum_{i = 1}^K b_i w_i}{\sum_{i = 1}^K w_i} \] \( w_i = (v_{b_i} + \tau^2)^{-1} \)
\( \tau^2 \) is the between-studies heterogeneity
Test for overall association: \( H_0: \bar b = 0 \)
Test (Cochran \( Q \)) and quantify the impact (\( I^2 \)) of statistical heterogeneity
# library(metafor)
# rma(bi, vi_bi, data = dat_i)
lin <- dosresmeta(formula = logrr ~ dose, id = id, se = se, type = type, cases = cases, n = n, data = coffee, method = "mm")
summary(lin)
Call: dosresmeta(formula = logrr ~ dose, id = id, type = type, cases = cases,
n = n, data = coffee, se = se, method = "mm")
Two-stage random-effects meta-analysis
Estimation method:
Covariance approximation: Greenland & Longnecker
Chi2 model: X2 = 44.4226 (df = 1), p-value = 0.0000
Fixed-effects coefficients
Estimate Std. Error z Pr(>|z|) 95%ci.lb 95%ci.ub
(Intercept) -0.0324 0.0049 -6.6650 0.0000 -0.0419 -0.0229 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Between-study random-effects (co)variance components
Std. Dev
0.0163
Univariate Cochran Q-test for residual heterogeneity:
Q = 77.0088 (df = 21), p-value = 0.0000
I-square statistic = 72.7%
22 studies, 22 values, 1 fixed and 1 random-effects parameters
Every increase of one cup of coffee per day was associated with a pooled \( 100*(1 - \exp({\beta_1})) = 100*(1 - \exp( \) -0.03)) = 3.19% reduction in all-cause mortality (95% CI: 2.26, 4.1).
library(rms)
k <- with(coffee, quantile(dose, c(.1, .5, .9)))
spl <- dosresmeta(logrr ~ rcs(dose, k), id = id, se = se, type = type, cases = cases, n = n, data = coffee, method = "mm")
summary(spl)
Call: dosresmeta(formula = logrr ~ rcs(dose, k), id = id, type = type,
cases = cases, n = n, data = coffee, se = se, method = "mm")
Two-stage random-effects meta-analysis
Estimation method:
Covariance approximation: Greenland & Longnecker
Chi2 model: X2 = 209.5069 (df = 2), p-value = 0.0000
Fixed-effects coefficients
Estimate Std. Error z Pr(>|z|) 95%ci.lb 95%ci.ub
rcs(dose, k)dose.(Intercept) -0.0764 0.0105 -7.3101 0.0000 -0.0969 -0.0559 ***
rcs(dose, k)dose'.(Intercept) 0.1053 0.0229 4.5935 0.0000 0.0604 0.1502 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Between-study random-effects (co)variance components
Std. Dev Corr
rcs(dose, k)dose 0.0354 rcs(dose, k)dose
rcs(dose, k)dose' 0.0797 -1
(Note: Truncated estimate - 1 negative eigenvalues set to 0)
Univariate Cochran Q-test for residual heterogeneity:
Q = 101.3912 (df = 42), p-value = 0.0000
I-square statistic = 58.6%
22 studies, 44 values, 2 fixed and 1 random-effects parameters
“Every unit increase is associated with …” not appropriate
Predict RRs for a set of exposure levels \( x^* \) (as compared to \( x_{ref} \))
\[ \widehat {RR} = e^{(X^* - X_{ref}) \hat {\bar \beta}} \]
\[ \text{95% CI: } e^{(X^* - X_{ref}) \hat {\bar \beta} \pm z_{1-\alpha/2} \left[ (X^* - X_{ref})^\top Var(\hat {\bar \beta}) (X^* - X_{ref}) \right]} \]
Tip: use the predict function
Predicted RRs can be both presented in either graphical or tabular form
xref <- 0
newdata <- data.frame(dose = 0:7)
predict(spl, newdata, xref = xref, expo = T)
rcs(dose, k)dose rcs(dose, k)dose' pred ci.lb ci.ub
1 0 0.000 1.00 1.00 1.00
2 1 0.024 0.93 0.91 0.95
3 2 0.189 0.88 0.85 0.90
4 3 0.605 0.85 0.82 0.88
5 4 1.241 0.84 0.82 0.86
6 5 2.036 0.85 0.83 0.87
7 6 2.924 0.86 0.84 0.88
8 7 3.846 0.88 0.84 0.92
newdata <- data.frame(dose = seq(0, 7, .2))
with(predict(spl, newdata, xref = xref, expo = TRUE),
matplot(newdata$dose, cbind(pred, ci.ub, ci.lb),
type = "l", col = 1, lty = c(1, 2, 2),
log = "y", ylim = c(.7, 1.1),
ylab = "Relative risk",
xlab = "Coffee consumption, cups/day")
)
with(predict(lin, newdata, xref = xref, expo = TRUE),
points(dose, pred, type = "l", lty = 3))
rug(coffee$dose)
To further explain heterogeneity and/or explore differences in dose-response curves
Prespecified study-level characteristics
Limited number of data points: interpret with caution
Stratified (or subgroups) analysis: divide the observations in groups and evaluate differences
Meta-regression: formalization of stratified analysis, + investigation of continuous variables (be aware of pitfall)
spl_regr <- dosresmeta(logrr ~ rcs(dose, k), id, I(se^2), type, cases, n, data = coffee,
mod = ~ area, method = "mm")
summary(spl_regr)
Call: dosresmeta(formula = logrr ~ rcs(dose, k), id = id, v = I(se^2),
type = type, cases = cases, n = n, data = coffee, mod = ~area,
method = "mm")
Two-stage random-effects meta-regression
Estimation method:
Covariance approximation: Greenland & Longnecker
Chi2 model: X2 = 187.2975 (df = 6), p-value = 0.0000
Fixed-effects coefficients
Estimate Std. Error z Pr(>|z|) 95%ci.lb 95%ci.ub
rcs(dose, k)dose.(Intercept) -0.0830 0.0231 -3.5855 0.0003 -0.1283 -0.0376 ***
rcs(dose, k)dose.areaJapan -0.0580 0.0373 -1.5558 0.1197 -0.1310 0.0151
rcs(dose, k)dose.areaUSA 0.0126 0.0268 0.4684 0.6395 -0.0400 0.0651
rcs(dose, k)dose'.(Intercept) 0.0923 0.0421 2.1917 0.0284 0.0098 0.1749 *
rcs(dose, k)dose'.areaJapan 0.1709 0.1008 1.6945 0.0902 -0.0268 0.3685 .
rcs(dose, k)dose'.areaUSA 0.0039 0.0519 0.0747 0.9405 -0.0978 0.1055
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Between-study random-effects (co)variance components
Std. Dev Corr
rcs(dose, k)dose 0.0331 rcs(dose, k)dose
rcs(dose, k)dose' 0.0747 -0.9964
Univariate Cochran Q-test for residual heterogeneity:
Q = 86.2279 (df = 38), p-value = 0.0000
I-square statistic = 55.9%
22 studies, 44 values, 6 fixed and 1 random-effects parameters
waldtest(Sigma = vcov(spl_regr), b = coef(spl_regr), Terms = c(2, 3, 5, 6))
Wald test:
----------
Chi-squared test:
X2 = 9.7, df = 4, P(> X2) = 0.046
newdata <- expand.grid(dose = seq(0, 7, .2),
area = levels(coffee$area))
pred_regr <- predict(spl_regr, newdata, exp = T)
ggplot(newdata, aes(x = dose, y = pred_regr$pred,
col = area)) +
xlab("Coffee consumption, cups/day") +
scale_y_continuous(trans = "log", "Relative risk",
breaks = seq(.75, 1.1, .05)) + geom_line() +
theme_classic()
Deviance (D)
Coefficient of determination (\( R^2 \))
Plot of de-correlated residuals versus exposure
source("additionalMethods.R")
models <- list(lin, spl, spl_regr)
data.frame(do.call("rbind", lapply(models, function(m)
unlist(gof(m)[c("deviance","R2", "R2adj")]))))
deviance.D deviance.df deviance.p R2 R2adj
1 225 78 3.2e-16 0.49 0.48
2 141 77 1.1e-05 0.68 0.67
3 126 73 1.1e-04 0.71 0.69
par(mfrow = c(1, 3))
lapply(models, function(m)
with(gof(m)$tdata,{
plot(coffee$dose[coffee$se != 0], residuals, ylim = -c(-5, 5), xlab = "Coffee consumption, cups/day"))
lines(lowess(coffee$dose[coffee$se != 0], residuals), lwd = 4)
})
)
Crippa A, Orsini N. Multivariate Dose-Response Meta-Analysis: the dosresmeta R Package. 2016. J Stat Softw. In press.
Crippa A, Discacciati A, Larsson SC, Wolk A, Orsini N. Coffee consumption and mortality from all causes, cardiovascular disease, and cancer: a dose-response meta-analysis. American journal of epidemiology. 2014 Aug 24:kwu194.
Greenland S, Longnecker MP. Methods for trend estimation from summarized dose-response data, with applications to meta-analysis. American journal of epidemiology. 1992 Jun 1;135(11):1301-9.
Orsini N, Li R, Wolk A, Khudyakov P, Spiegelman D. Meta-analysis for linear and nonlinear dose-response relations: examples, an evaluation of approximations, and software. American journal of epidemiology. 2012 Jan 1;175(1):66-73.
Berlin JA, Longnecker MP, Greenland S. Meta-analysis of epidemiologic dose-response data. Epidemiology. 1993 May 1;4(3):218-28.
Orsini N, Bellocco R, Greenland S. Generalized least squares for trend estimation of summarized dose-response data. Stata Journal. 2006 Jan 1;6(1):40.
Discacciati A, Crippa A, Orsini N. Goodness of fit tools for dose–response meta‐analysis of binary outcomes. Research synthesis methods. 2015 Jan 1.
Information about R session:
R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.5 (El Capitan)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rms_4.5-0 SparseM_1.7 Hmisc_3.17-4 Formula_1.2-1 survival_2.39-2 lattice_0.20-33
[7] dplyr_0.5.0 Epi_2.0 dosresmeta_2.0.0 mvmeta_0.4.7 ggplot2_2.1.0 knitr_1.13
loaded via a namespace (and not attached):
[1] zoo_1.7-13 splines_3.3.0 etm_0.6-2 colorspace_1.2-6 chron_2.3-47
[6] foreign_0.8-66 DBI_0.4-1 RColorBrewer_1.1-2 multcomp_1.4-5 plyr_1.8.4
[11] stringr_1.0.0 MatrixModels_0.4-1 munsell_0.4.3 gtable_0.2.0 mvtnorm_1.0-5
[16] codetools_0.2-14 evaluate_0.9 labeling_0.3 latticeExtra_0.6-28 quantreg_5.26
[21] parallel_3.3.0 TH.data_1.0-7 Rcpp_0.12.5 acepack_1.3-3.3 scales_0.4.0
[26] formatR_1.4 cmprsk_2.2-7 gridExtra_2.2.1 digest_0.6.9 stringi_1.1.1
[31] polspline_1.1.12 grid_3.3.0 tools_3.3.0 sandwich_2.3-4 magrittr_1.5
[36] lazyeval_0.2.0 tibble_1.0 cluster_2.0.4 MASS_7.3-45 Matrix_1.2-6
[41] data.table_1.9.6 assertthat_0.1 R6_2.1.2 rpart_4.1-10 nnet_7.3-12
[46] nlme_3.1-127