This markdown reproduces the main results presented in the paper
Berlin JA, Longnecker MP, Greenland S. Meta-analysis of epidemiologic dose-response data. Epidemiology. 1993 May 1:218-28 (Berlin, Longnecker, and Greenland 1993).
The authors present methods for summarizing epidemiological studies of dose-response associations. In particular, they discuss methods for estimating the dose-response association from either single or multiple studies; for assigning levels to exposure categories; for investigating the effects of study design and subject characteristics on dose-response curves; and for choosing between a fixed and random-effects model. The topics are illustrated using data from case-control studies of the relation between duration of oral contraceptive use and risk of breast cancer.
The packages used throughout this markdown are dosresmeta, knitr, and the collection of packages in tidyverse.
The dataset are contained in the dosresmeta package: oc_breast and cc_ex.
The first topic is how to estimate a curve from summarized dose-response from a single study (Orsini et al. 2006). The methodology is illustrated using one study of the relation between the duration of oral contraceptive use and breast cancer risk (Meirik et al. 1986).
The data are part of the oc_breast data that contains the results from multiple studies.
library(dosresmeta)
data("oc_breast")
Reconstructing data presented in Table 1.
meirik <- subset(oc_breast, author == "Meirik")
meirik %>%
mutate(`Duration categories (assigned dose)` = c("Never (0)", "0-3 (1.5)", "4-7 (5.5)", "8-11 (9.5)", ">=12 (14.4)"),
controls = n - cases,
ci = paste(lb, ub, sep = "-"),
ci = replace(ci, is.na(se), NA)) %>%
select(`Duration categories (assigned dose)`, cases, controls, or, ci) %>%
kable()
| Duration categories (assigned dose) | cases | controls | or | ci |
|---|---|---|---|---|
| Never (0) | 96 | 156 | 1.0 | NA |
| 0-3 (1.5) | 156 | 205 | 1.1 | 0.8-1.6 |
| 4-7 (5.5) | 80 | 93 | 1.2 | 0.8-1.9 |
| 8-11 (9.5) | 51 | 50 | 1.4 | 0.8-2.3 |
| >=12 (14.4) | 39 | 23 | 2.2 | 1.2-4 |
One of the features of summarized dose-response data is the correlation within set of estimated (log) relative risks. The authors discuss the implication of ignoring the covariance structure.
# Model assuming independece
tab1_ind <- dosresmeta(logor ~ duration, type = type, cases = cases, n = n,
se = se, data = meirik, covariance = "indep")
# Model taking into account the covariances
tab1_dep <- dosresmeta(logor ~ duration, type = type, cases = cases, n = n,
se = se, data = meirik, covariance = "gl")
data.frame(
model = c("Weighted least squares", "Weighted least squares with adjustment for covariance"),
slope = c(coef(tab1_dep), coef(tab1_ind)),
`standard error` = c(vcov(tab1_dep)^.5, vcov(tab1_ind)^.5)
) %>% kable(digits = 4)
| model | slope | standard.error |
|---|---|---|
| Weighted least squares | 0.0454 | 0.0174 |
| Weighted least squares with adjustment for covariance | 0.0458 | 0.0156 |
The authors compared the results of an increment of 15 years of use.
rbind(
c("Weighted least squares", predict(tab1_ind, delta = 15, expo = T)),
c("Weighted least squares with adjustment for covariance", predict(tab1_dep, delta = 15, expo = T))
)
delta pred ci.lb ci.ub
[1,] "Weighted least squares" 15 1.989114 1.259205 3.142121
[2,] "Weighted least squares with adjustment for covariance" 15 1.976406 1.185243 3.29568
rbind(
c("Weighted least squares", predict(tab1_ind, delta = 10, expo = T)),
c("Weighted least squares with adjustment for covariance", predict(tab1_dep, delta = 10, expo = T)))
delta pred ci.lb ci.ub
[1,] "Weighted least squares" 10 1.581635 1.166087 2.14527
[2,] "Weighted least squares with adjustment for covariance" 10 1.574892 1.119966 2.214607
So the effect of the covariance adjustment in this example was to increase the width of the 95% confidence interval for the 10-year ln relative risk from 0.61 to 0.682 (12%).
When several studies investigated a common association, meta-analysis can be adopted for combining the study-specific results. The methodology is illustrated using the whole dataset.
oc_breast %>% slice(1:10) %>% kable()
| id | author | year | type | duration | cases | n | or | lb | ub | logor | se | menopause | period |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Paffenbarger | 1977 | cc | 0.00 | 230 | 718 | 1.00 | NA | NA | 0.0000000 | NA | 1 | 1972 |
| 1 | Paffenbarger | 1977 | cc | 0.50 | 81 | 231 | 1.16 | 0.82 | 1.64 | 0.1484200 | 0.1768265 | 1 | 1972 |
| 1 | Paffenbarger | 1977 | cc | 1.50 | 23 | 68 | 0.91 | 0.50 | 1.67 | -0.0943107 | 0.3076513 | 1 | 1972 |
| 1 | Paffenbarger | 1977 | cc | 3.00 | 49 | 102 | 1.76 | 1.07 | 2.94 | 0.5653138 | 0.2578494 | 1 | 1972 |
| 1 | Paffenbarger | 1977 | cc | 5.00 | 30 | 103 | 0.85 | 0.50 | 1.43 | -0.1625189 | 0.2680717 | 1 | 1972 |
| 1 | Paffenbarger | 1977 | cc | 7.20 | 39 | 102 | 1.28 | 0.79 | 2.07 | 0.2468601 | 0.2457369 | 1 | 1972 |
| 2 | Sartwell | 1977 | cc | 0.00 | 262 | 595 | 1.00 | NA | NA | 0.0000000 | NA | 1 | 1972 |
| 2 | Sartwell | 1977 | cc | 0.25 | 4 | 10 | 1.00 | 0.28 | 3.58 | 0.0000000 | 0.6500957 | 1 | 1972 |
| 2 | Sartwell | 1977 | cc | 0.75 | 3 | 8 | 0.92 | 0.22 | 3.89 | -0.0833816 | 0.7328035 | 1 | 1972 |
| 2 | Sartwell | 1977 | cc | 1.50 | 2 | 5 | 0.82 | 0.14 | 4.94 | -0.1984509 | 0.9090673 | 1 | 1972 |
The authors considered different models:
- model 2a and 2f: \(\log \textrm{RR} = \beta_1 \times \textrm{duration}\)
- model 2b and 2g: \(\log \textrm{RR} = \beta_1 \times \textrm{duration} + \beta_2 \times \textrm{duration}^2\)
- model 2d and 2if: \(\log \textrm{RR} = \beta_1 \times \sqrt{\textrm{duration}}\)
both ignoring and taking into accounts the covariances.
## Different dose-response models
mod2a <- dosresmeta(logor ~ duration, id = id, type = type, cases = cases, n = n,
se = se, data = oc_breast, covariance = "indep", method = "fixed")
mod2b <- dosresmeta(logor ~ duration + I(duration^2), id = id, type = type, cases = cases, n = n,
se = se, data = oc_breast, covariance = "indep", method = "fixed")
mod2d <- dosresmeta(logor ~I(sqrt(duration)), id = id, type = type, cases = cases, n = n,
se = se, data = oc_breast, covariance = "indep", method = "fixed")
mod2f <- dosresmeta(logor ~ duration, id = id, type = type, cases = cases, n = n,
se = se, data = oc_breast, covariance = "gl", method = "fixed")
mod2g <- dosresmeta(logor ~ duration + I(duration^2), id = id, type = type, cases = cases, n = n,
se = se, data = oc_breast, covariance = "gl", method = "fixed")
mod2i <- dosresmeta(logor ~I(sqrt(duration)), id = id, type = type, cases = cases, n = n,
se = se, data = oc_breast, covariance = "gl", method = "fixed")
## Auxiliary function
tab2f <- function(mod){
tab <- cbind(t(t(coef(mod))), diag(mod$vcov)^.5, abs(t(t(coef(mod)))/diag(mod$vcov)^.5))
colnames(tab) <- c("coef", "se", "z")
tab
}
data.frame(model = c("2a.", "2b.", "", "2d.", "2f.", "2g.", "", "2i.")) %>%
cbind(do.call("rbind", lapply(list(mod2a, mod2b, mod2d, mod2f, mod2g, mod2i), tab2f))) %>%
kable(digits = c(0, 4, 4, 2))
Warning in data.row.names(row.names, rowsi, i): some row.names duplicated: 4,5,6,7,8 --> row.names NOT used
| model | coef | se | z |
|---|---|---|---|
| 2a. | -0.0102 | 0.0026 | 3.87 |
| 2b. | -0.0272 | 0.0067 | 4.08 |
| 0.0020 | 0.0007 | 2.79 | |
| 2d. | -0.0298 | 0.0068 | 4.36 |
| 2f. | -0.0125 | 0.0029 | 4.36 |
| 2g. | -0.0328 | 0.0066 | 4.98 |
| 0.0023 | 0.0007 | 3.42 | |
| 2i. | -0.0411 | 0.0080 | 5.12 |
Calculating and comparing the pooled relative risks for selected duration values.
data.frame(duration = c(0, 5, 16)) %>%
cbind(
quadratic = predict(mod2g, ., expo = T)[, -c(1:2)],
square_root = predict(mod2i, ., expo = T)[, -1]
) %>%
kable(digits = 2)
| duration | quadratic.pred | quadratic.ci.lb | quadratic.ci.ub | square_root.pred | square_root.ci.lb | square_root.ci.ub |
|---|---|---|---|---|---|---|
| 0 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 5 | 0.90 | 0.86 | 0.93 | 0.91 | 0.88 | 0.94 |
| 16 | 1.05 | 0.89 | 1.25 | 0.85 | 0.80 | 0.90 |
The authors compared different approaches (means, medians, and midpoints) to assign doses to exposure categories. They use data from a case-control study on the relation between alcohol and breast cancer to make the point.
data("cc_ex")
tab3 <- cc_ex %>%
mutate(
mean = c(0, 1.5, 5.1, 21.3),
median = c(0, 1.7, 5.1, 16.3),
mid = c(0, 1.9, 6.0, 10.8),
ci = paste(round(lb, 1), round(ub, 1), sep = "-"),
var = ((log(ub)-log(lb))/(2*qnorm(.975)))^2
)
tab3 %>%
select(gday, mean, median, mid, adjrr, ci, logrr, var) %>%
kable(digits = c(0, 1, 1, 1, 1, 0, 1, 2))
| gday | mean | median | mid | adjrr | ci | logrr | var |
|---|---|---|---|---|---|---|---|
| Ref. | 0.0 | 0.0 | 0.0 | 1.0 | 1-1 | 0.0 | 0.00 |
| <2.5 | 1.5 | 1.7 | 1.9 | 0.8 | 0.5-1.3 | -0.2 | 0.05 |
| 2.5-9.3 | 5.1 | 5.1 | 6.0 | 1.2 | 0.7-1.9 | 0.1 | 0.06 |
| >9.3 | 21.3 | 16.3 | 10.8 | 1.6 | 1-2.5 | 0.5 | 0.06 |
Reproducing the results in Table 3
mod_mean <- dosresmeta(formula = logrr ~ mean, type = "cc", cases = case,
n = n, lb = lb, ub = ub, data= tab3)
mod_median <- dosresmeta(formula = logrr ~ median, type = "cc", cases = case,
n = n, lb = lb, ub = ub, data= tab3)
mod_mid <- dosresmeta(formula = logrr ~ mid, type = "cc", cases = case,
n = n, lb = lb, ub = ub, data= tab3)
do.call("rbind", lapply(list(mod_mean, mod_median, mod_mid), tab2f)) %>%
kable()
| coef | se | z | |
|---|---|---|---|
| mean | 0.0244134 | 0.0106534 | 2.291593 |
| median | 0.0321280 | 0.0140842 | 2.281140 |
| mid | 0.0462183 | 0.0209631 | 2.204740 |
The authors considered the case of meta-regression, where covariates at the study level may explain the heterogeneity across study findings. In the practical example based on the oc_breast data, the two potential sources of heterogeneity were whether or not a study included postmenopausal women, and the final year of case accrual, thought to be a surrogate for the changing formulations of oral contraceptives over time.
The different estimated models were:
- model 4a: \(\log \textrm{RR} = \beta_1 \times \sqrt{\textrm{duration}}\)
- model 4b: \(\log \textrm{RR} = \beta_1 \times \sqrt{\textrm{duration}} + \beta_2 \times \textrm{menopause}\)
- model 4c: \(\log \textrm{RR} = \beta_1 \times \sqrt{\textrm{duration}} + \beta_2 \times \textrm{(period - 1986)}\)
- model 4d: \(\log \textrm{RR} = \beta_1 \times \sqrt{\textrm{duration}} + \beta_2 \times \textrm{menopause} + \beta_3 \times \textrm{(period - 1986)}\)
- model 4e: \(\log \textrm{RR} = \beta_1 \times \sqrt{\textrm{duration}} + \beta_2 \times \textrm{menopause} + \beta_3 \times \textrm{(period - 1986)} + \beta_4 \times \textrm{menopause} \times \textrm{(period - 1986)}\)
mod4a <- dosresmeta(logor ~ I(sqrt(duration)), id = id, type = type, cases = cases, n = n,
se = se, data = oc_breast, method = "fixed")
mod4b <- dosresmeta(logor ~ I(sqrt(duration)), id = id, type = type, cases = cases, n = n,
se = se, data = oc_breast, method = "fixed",
mod = ~ menopause)
mod4c <- dosresmeta(logor ~ I(sqrt(duration)), id = id, type = type, cases = cases, n = n,
se = se, data = oc_breast, method = "fixed",
mod = ~ I(period - 1986))
mod4d <- dosresmeta(logor ~ I(sqrt(duration)), id = id, type = type, cases = cases, n = n,
se = se, data = oc_breast, method = "fixed",
mod = ~ menopause + I(period - 1986))
mod4e <- dosresmeta(logor ~ I(sqrt(duration)), id = id, type = type, cases = cases, n = n,
se = se, data = oc_breast, method = "fixed",
mod = ~ menopause*I(period - 1986))
tab4f <- function(mod){
gof <- dosresmeta:::gof(mod)
tab <- cbind(t(t(coef(mod))), diag(mod$vcov)^.5, gof$deviance$D, gof$deviance$df)
colnames(tab) <- c("coef", "se", "deviance", "df")
round(tab, 4)
}
data.frame(model = c("4a.", "4b.", "", "4c.", "", "4d.", "", "",
"4e.", "", "", "")) %>%
cbind(do.call("rbind", lapply(list(mod4a, mod4b, mod4c, mod4d, mod4e), tab4f))) %>%
mutate(
deviance = replace(deviance, model == "", NA),
df = replace(df, model == "", NA)
) %>% kable(digits = c(0, 4, 4, 0, 0))
Warning in data.row.names(row.names, rowsi, i): some row.names duplicated: 2,4,6,7,8,9,10,11 --> row.names NOT used
| model | coef | se | deviance | df |
|---|---|---|---|---|
| 4a. | -0.0411 | 0.0080 | 218 | 88 |
| 4b. | 0.1175 | 0.0251 | 174 | 87 |
| -0.1766 | 0.0265 | NA | NA | |
| 4c. | -0.0929 | 0.0112 | 174 | 87 |
| -0.0170 | 0.0025 | NA | NA | |
| 4d. | 0.0480 | 0.0281 | 144 | 86 |
| -0.1473 | 0.0270 | NA | NA | |
| -0.0142 | 0.0026 | NA | NA | |
| 4e. | 0.1718 | 0.0383 | 121 | 85 |
| -0.2883 | 0.0401 | NA | NA | |
| 0.0111 | 0.0059 | NA | NA | |
| -0.0313 | 0.0066 | NA | NA |
The final topic covered is the choice between a fixed- and a random-effects model.
Reproducing the results in table 5.
mod5_fixed <- dosresmeta(logor ~ I(sqrt(duration)), id = id, type = type, cases = cases, n = n,
se = se, data = oc_breast, method = "fixed",
mod = ~ menopause*I(period - 1986))
mod5_mm <- dosresmeta(logor ~ I(sqrt(duration)), id = id, type = type, cases = cases, n = n,
se = se, data = oc_breast, method = "mm",
mod = ~ menopause*I(period - 1986))
data.frame(coef_mm = coef(mod5_mm), se_mm = diag(vcov(mod5_mm))^.5,
coef_fixed = coef(mod5_fixed), se_fixed = diag(vcov(mod5_fixed))^.5) %>%
.[c(2:4, 1),] %>%
round(4)
coef_mm se_mm coef_fixed se_fixed
menopause -0.2468 0.0502 -0.2883 0.0401
I(period - 1986) 0.0111 0.0070 0.0111 0.0059
menopause:I(period - 1986) -0.0245 0.0083 -0.0313 0.0066
(Intercept) 0.1720 0.0421 0.1718 0.0383
oc_breast %>%
group_by(id) %>%
filter(row_number()==1) %>%
select(author) %>%
cbind(coef = mod4a$bi, se = unlist(mod4a$Si)^.5) %>%
as.data.frame() %>%
kable(digits = c(0, 0, 5, 5))
| id | author | coef | se |
|---|---|---|---|
| 1 | Paffenbarger | 0.076091876 | 0.07208 |
| 2 | Sartwell | -0.090724132 | 0.16455 |
| 3 | Lees | 0.087833324 | 0.08967 |
| 4 | Ravnihar | -0.062128495 | 0.18727 |
| 5 | Paffenbarger | 0.126795605 | 0.05546 |
| 6 | Kelsey | -0.225524649 | 0.14299 |
| 7 | Harris | -0.059723518 | 0.09848 |
| 8 | Brinton | 0.062919337 | 0.04149 |
| 9 | Janerich | 0.063684420 | 0.04054 |
| 10 | Pike | 0.186057120 | 0.13185 |
| 11 | Vessey | 0.024426839 | 0.04597 |
| 12 | Rosenberg | 0.015408363 | 0.03907 |
| 13 | Miller | 0.071203657 | 0.10590 |
| 14 | Meirik | 0.148012840 | 0.06204 |
| 15 | Ellery | -0.042445450 | 0.11687 |
| 16 | La Vecchia | 0.034509702 | 0.11228 |
| 17 | Paul | -0.111716697 | 0.01090 |
| 18 | Lee | 0.070644893 | 0.08961 |
| 19 | CASH | 0.007116616 | 0.02046 |
| 20 | McPherson YW | 0.091471024 | 0.06911 |
| 21 | McPherson OW | -0.040747827 | 0.05640 |
| 22 | Miller | 0.287003221 | 0.08290 |
| 23 | Jick | -0.013089555 | 0.14784 |
| 24 | Olsson | 0.185356837 | 0.07595 |
Information about R session:
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2 dosresmeta_2.0.0 mvmeta_0.4.7 dplyr_0.7.2 purrr_0.2.2.2 readr_1.1.1
[7] tidyr_0.6.3 tibble_1.3.3 ggplot2_2.2.1 tidyverse_1.1.1 knitr_1.16
loaded via a namespace (and not attached):
[1] Rcpp_0.12.12 highr_0.6 cellranger_1.1.0 compiler_3.4.1 plyr_1.8.4 bindr_0.1
[7] forcats_0.2.0 tools_3.4.1 digest_0.6.12 lubridate_1.6.0 jsonlite_1.5 evaluate_0.10.1
[13] nlme_3.1-131 gtable_0.2.0 lattice_0.20-35 pkgconfig_2.0.1 rlang_0.1.1 psych_1.7.5
[19] yaml_2.1.14 parallel_3.4.1 haven_1.1.0 xml2_1.1.1 httr_1.2.1 stringr_1.2.0
[25] hms_0.3 rprojroot_1.2 grid_3.4.1 glue_1.1.1 R6_2.2.2 readxl_1.0.0
[31] foreign_0.8-69 rmarkdown_1.6 modelr_0.1.1 reshape2_1.4.2 magrittr_1.5 backports_1.1.0
[37] scales_0.4.1 htmltools_0.3.6 rvest_0.3.2 assertthat_0.2.0 mnormt_1.5-5 colorspace_1.3-2
[43] stringi_1.1.5 lazyeval_0.2.0 munsell_0.4.3 broom_0.4.2
Berlin, Jesse A, Matthew P Longnecker, and Sander Greenland. 1993. “Meta-Analysis of Epidemiologic Dose-Response Data.” Epidemiology. JSTOR, 218–28.
Meirik, Olav, Hans-Olov Adami, Thoralf Christoffersen, Eiliv Lund, Reinhold Bergström, and Per Bergsjö. 1986. “Oral Contraceptive Use and Breast Cancer in Young Women: A Joint National Case-Control Study in Sweden and Norway.” The Lancet 328 (8508). Elsevier: 650–54.
Orsini, Nicola, Rino Bellocco, Sander Greenland, and others. 2006. “Generalized Least Squares for Trend Estimation of Summarized Dose-Response Data.” Stata Journal 6 (1). Stata press 4905 LAKEWAY PARKWAY, COLLEGE STATION, TX 77845 USA: 40.