Berlin et al. (1993), Meta-Analysis of Epidemiologic Dose-Response Data

Summary

This markdown reproduces the main results presented in the paper
Berlin JA, Longnecker MP, Greenland S. Meta-analysis of epidemiologic dose-response data. Epidemiology. 1993 May 1:218-28 (Berlin, Longnecker, and Greenland 1993).

The authors present methods for summarizing epidemiological studies of dose-response associations. In particular, they discuss methods for estimating the dose-response association from either single or multiple studies; for assigning levels to exposure categories; for investigating the effects of study design and subject characteristics on dose-response curves; and for choosing between a fixed and random-effects model. The topics are illustrated using data from case-control studies of the relation between duration of oral contraceptive use and risk of breast cancer.

Packages and data

The packages used throughout this markdown are dosresmeta, knitr, and the collection of packages in tidyverse.

The dataset are contained in the dosresmeta package: oc_breast and cc_ex.

Fixed Effects Dose-Response Models for a Single Study

The first topic is how to estimate a curve from summarized dose-response from a single study (Orsini et al. 2006). The methodology is illustrated using one study of the relation between the duration of oral contraceptive use and breast cancer risk (Meirik et al. 1986).

The data are part of the oc_breast data that contains the results from multiple studies.

library(dosresmeta)
data("oc_breast")

Reconstructing data presented in Table 1.

meirik <- subset(oc_breast, author == "Meirik")
meirik %>% 
  mutate(`Duration categories (assigned dose)` = c("Never (0)", "0-3 (1.5)", "4-7 (5.5)", "8-11 (9.5)", ">=12 (14.4)"),
         controls = n - cases, 
         ci = paste(lb, ub, sep = "-"),
         ci = replace(ci, is.na(se), NA)) %>%
  select(`Duration categories (assigned dose)`, cases, controls, or, ci) %>%
  kable()

Duration categories (assigned dose)	cases	controls	or	ci
Never (0)	96	156	1.0	NA
0-3 (1.5)	156	205	1.1	0.8-1.6
4-7 (5.5)	80	93	1.2	0.8-1.9
8-11 (9.5)	51	50	1.4	0.8-2.3
>=12 (14.4)	39	23	2.2	1.2-4

One of the features of summarized dose-response data is the correlation within set of estimated (log) relative risks. The authors discuss the implication of ignoring the covariance structure.

# Model assuming independece
tab1_ind <- dosresmeta(logor ~ duration, type = type, cases = cases, n = n,
                    se = se, data = meirik, covariance = "indep")
# Model taking into account the covariances
tab1_dep <- dosresmeta(logor ~ duration, type = type, cases = cases, n = n,
                       se = se, data = meirik, covariance = "gl")

data.frame(
  model = c("Weighted least squares", "Weighted least squares with adjustment for covariance"),
  slope = c(coef(tab1_dep), coef(tab1_ind)),
  `standard error` = c(vcov(tab1_dep)^.5, vcov(tab1_ind)^.5)
) %>% kable(digits = 4)

model	slope	standard.error
Weighted least squares	0.0454	0.0174
Weighted least squares with adjustment for covariance	0.0458	0.0156

The authors compared the results of an increment of 15 years of use.

rbind(
  c("Weighted least squares", predict(tab1_ind, delta = 15, expo = T)),
  c("Weighted least squares with adjustment for covariance", predict(tab1_dep, delta = 15, expo = T))
)

                                                             delta pred     ci.lb    ci.ub   
[1,] "Weighted least squares"                                15    1.989114 1.259205 3.142121
[2,] "Weighted least squares with adjustment for covariance" 15    1.976406 1.185243 3.29568

rbind(
  c("Weighted least squares", predict(tab1_ind, delta = 10, expo = T)),
  c("Weighted least squares with adjustment for covariance", predict(tab1_dep, delta = 10, expo = T)))

                                                             delta pred     ci.lb    ci.ub   
[1,] "Weighted least squares"                                10    1.581635 1.166087 2.14527 
[2,] "Weighted least squares with adjustment for covariance" 10    1.574892 1.119966 2.214607

So the effect of the covariance adjustment in this example was to increase the width of the 95% confidence interval for the 10-year ln relative risk from 0.61 to 0.682 (12%).

Meta-analysis of Dose-Response Studies with Fixed Effects Model

When several studies investigated a common association, meta-analysis can be adopted for combining the study-specific results. The methodology is illustrated using the whole dataset.

oc_breast %>% slice(1:10) %>% kable()

id	author	year	type	duration	cases	n	or	lb	ub	logor	se	menopause	period
1	Paffenbarger	1977	cc	0.00	230	718	1.00	NA	NA	0.0000000	NA	1	1972
1	Paffenbarger	1977	cc	0.50	81	231	1.16	0.82	1.64	0.1484200	0.1768265	1	1972
1	Paffenbarger	1977	cc	1.50	23	68	0.91	0.50	1.67	-0.0943107	0.3076513	1	1972
1	Paffenbarger	1977	cc	3.00	49	102	1.76	1.07	2.94	0.5653138	0.2578494	1	1972
1	Paffenbarger	1977	cc	5.00	30	103	0.85	0.50	1.43	-0.1625189	0.2680717	1	1972
1	Paffenbarger	1977	cc	7.20	39	102	1.28	0.79	2.07	0.2468601	0.2457369	1	1972
2	Sartwell	1977	cc	0.00	262	595	1.00	NA	NA	0.0000000	NA	1	1972
2	Sartwell	1977	cc	0.25	4	10	1.00	0.28	3.58	0.0000000	0.6500957	1	1972
2	Sartwell	1977	cc	0.75	3	8	0.92	0.22	3.89	-0.0833816	0.7328035	1	1972
2	Sartwell	1977	cc	1.50	2	5	0.82	0.14	4.94	-0.1984509	0.9090673	1	1972

The authors considered different models:
- model 2a and 2f: \(\log \textrm{RR} = \beta_1 \times \textrm{duration}\)
- model 2b and 2g: \(\log \textrm{RR} = \beta_1 \times \textrm{duration} + \beta_2 \times \textrm{duration}^2\)
- model 2d and 2if: \(\log \textrm{RR} = \beta_1 \times \sqrt{\textrm{duration}}\)

both ignoring and taking into accounts the covariances.

## Different dose-response models
mod2a <- dosresmeta(logor ~ duration, id = id, type = type, cases = cases, n = n,
                    se = se, data = oc_breast, covariance = "indep", method = "fixed")
mod2b <- dosresmeta(logor ~ duration + I(duration^2), id = id, type = type, cases = cases, n = n,
                    se = se, data = oc_breast, covariance = "indep", method = "fixed")
mod2d <- dosresmeta(logor ~I(sqrt(duration)), id = id, type = type, cases = cases, n = n,
                    se = se, data = oc_breast, covariance = "indep", method = "fixed")
mod2f <- dosresmeta(logor ~ duration, id = id, type = type, cases = cases, n = n,
                    se = se, data = oc_breast, covariance = "gl", method = "fixed")
mod2g <- dosresmeta(logor ~ duration + I(duration^2), id = id, type = type, cases = cases, n = n,
                    se = se, data = oc_breast, covariance = "gl", method = "fixed")
mod2i <- dosresmeta(logor ~I(sqrt(duration)), id = id, type = type, cases = cases, n = n,
                    se = se, data = oc_breast, covariance = "gl", method = "fixed")

## Auxiliary function
tab2f <- function(mod){
  tab <- cbind(t(t(coef(mod))), diag(mod$vcov)^.5, abs(t(t(coef(mod)))/diag(mod$vcov)^.5))
  colnames(tab) <- c("coef", "se", "z")
  tab
}
data.frame(model = c("2a.", "2b.", "", "2d.", "2f.", "2g.", "", "2i.")) %>% 
  cbind(do.call("rbind", lapply(list(mod2a, mod2b, mod2d, mod2f, mod2g, mod2i), tab2f))) %>%
  kable(digits = c(0, 4, 4, 2))

Warning in data.row.names(row.names, rowsi, i): some row.names duplicated: 4,5,6,7,8 --> row.names NOT used

model	coef	se	z
2a.	-0.0102	0.0026	3.87
2b.	-0.0272	0.0067	4.08
	0.0020	0.0007	2.79
2d.	-0.0298	0.0068	4.36
2f.	-0.0125	0.0029	4.36
2g.	-0.0328	0.0066	4.98
	0.0023	0.0007	3.42
2i.	-0.0411	0.0080	5.12

Calculating and comparing the pooled relative risks for selected duration values.

data.frame(duration = c(0, 5, 16)) %>%
  cbind(
    quadratic = predict(mod2g, ., expo = T)[, -c(1:2)],
    square_root = predict(mod2i, ., expo = T)[, -1]
    ) %>%
  kable(digits = 2)

duration	quadratic.pred	quadratic.ci.lb	quadratic.ci.ub	square_root.pred	square_root.ci.lb	square_root.ci.ub
0	1.00	1.00	1.00	1.00	1.00	1.00
5	0.90	0.86	0.93	0.91	0.88	0.94
16	1.05	0.89	1.25	0.85	0.80	0.90

Exposure Scale Issues

The authors compared different approaches (means, medians, and midpoints) to assign doses to exposure categories. They use data from a case-control study on the relation between alcohol and breast cancer to make the point.

data("cc_ex")
tab3 <- cc_ex %>%
  mutate(
    mean = c(0, 1.5, 5.1, 21.3),
    median = c(0, 1.7, 5.1, 16.3),
    mid = c(0, 1.9, 6.0, 10.8),
    ci = paste(round(lb, 1), round(ub, 1), sep = "-"),
    var = ((log(ub)-log(lb))/(2*qnorm(.975)))^2
  ) 
tab3 %>% 
  select(gday, mean, median, mid, adjrr, ci, logrr, var) %>%
  kable(digits = c(0, 1, 1, 1, 1, 0, 1, 2))

gday	mean	median	mid	adjrr	ci	logrr	var
Ref.	0.0	0.0	0.0	1.0	1-1	0.0	0.00
<2.5	1.5	1.7	1.9	0.8	0.5-1.3	-0.2	0.05
2.5-9.3	5.1	5.1	6.0	1.2	0.7-1.9	0.1	0.06
>9.3	21.3	16.3	10.8	1.6	1-2.5	0.5	0.06

Reproducing the results in Table 3

mod_mean <- dosresmeta(formula = logrr ~ mean, type = "cc", cases = case,
                       n = n, lb = lb, ub = ub, data= tab3)
mod_median <- dosresmeta(formula = logrr ~ median, type = "cc", cases = case,
                       n = n, lb = lb, ub = ub, data= tab3)
mod_mid <- dosresmeta(formula = logrr ~ mid, type = "cc", cases = case,
                         n = n, lb = lb, ub = ub, data= tab3)
do.call("rbind", lapply(list(mod_mean, mod_median, mod_mid), tab2f)) %>%
  kable()

	coef	se	z
mean	0.0244134	0.0106534	2.291593
median	0.0321280	0.0140842	2.281140
mid	0.0462183	0.0209631	2.204740

Identifying Sources of Heterogeneity with Fixed Effects Models

The authors considered the case of meta-regression, where covariates at the study level may explain the heterogeneity across study findings. In the practical example based on the oc_breast data, the two potential sources of heterogeneity were whether or not a study included postmenopausal women, and the final year of case accrual, thought to be a surrogate for the changing formulations of oral contraceptives over time.

The different estimated models were:
- model 4a: \(\log \textrm{RR} = \beta_1 \times \sqrt{\textrm{duration}}\)
- model 4b: \(\log \textrm{RR} = \beta_1 \times \sqrt{\textrm{duration}} + \beta_2 \times \textrm{menopause}\)
- model 4c: \(\log \textrm{RR} = \beta_1 \times \sqrt{\textrm{duration}} + \beta_2 \times \textrm{(period - 1986)}\)
- model 4d: \(\log \textrm{RR} = \beta_1 \times \sqrt{\textrm{duration}} + \beta_2 \times \textrm{menopause} + \beta_3 \times \textrm{(period - 1986)}\)
- model 4e: \(\log \textrm{RR} = \beta_1 \times \sqrt{\textrm{duration}} + \beta_2 \times \textrm{menopause} + \beta_3 \times \textrm{(period - 1986)} + \beta_4 \times \textrm{menopause} \times \textrm{(period - 1986)}\)

mod4a <- dosresmeta(logor ~ I(sqrt(duration)), id = id, type = type, cases = cases, n = n,
                    se = se, data = oc_breast, method = "fixed")
mod4b <- dosresmeta(logor ~ I(sqrt(duration)), id = id, type = type, cases = cases, n = n,
                    se = se, data = oc_breast, method = "fixed",
                    mod = ~ menopause)
mod4c <- dosresmeta(logor ~ I(sqrt(duration)), id = id, type = type, cases = cases, n = n,
                    se = se, data = oc_breast, method = "fixed",
                    mod = ~ I(period - 1986))
mod4d <- dosresmeta(logor ~ I(sqrt(duration)), id = id, type = type, cases = cases, n = n,
                    se = se, data = oc_breast, method = "fixed",
                    mod = ~ menopause + I(period - 1986))
mod4e <- dosresmeta(logor ~ I(sqrt(duration)), id = id, type = type, cases = cases, n = n,
                    se = se, data = oc_breast, method = "fixed",
                    mod = ~ menopause*I(period - 1986))

tab4f <- function(mod){
  gof <- dosresmeta:::gof(mod)
  tab <- cbind(t(t(coef(mod))), diag(mod$vcov)^.5, gof$deviance$D, gof$deviance$df)
  colnames(tab) <- c("coef", "se", "deviance", "df")
  round(tab, 4)
}
data.frame(model = c("4a.", "4b.", "", "4c.", "", "4d.", "", "", 
                     "4e.", "", "", "")) %>% 
  cbind(do.call("rbind", lapply(list(mod4a, mod4b, mod4c, mod4d, mod4e), tab4f))) %>%
  mutate(
    deviance = replace(deviance, model == "", NA),
    df = replace(df, model == "", NA)
    ) %>% kable(digits = c(0, 4, 4, 0, 0))

Warning in data.row.names(row.names, rowsi, i): some row.names duplicated: 2,4,6,7,8,9,10,11 --> row.names NOT used

model	coef	se	deviance	df
4a.	-0.0411	0.0080	218	88
4b.	0.1175	0.0251	174	87
	-0.1766	0.0265	NA	NA
4c.	-0.0929	0.0112	174	87
	-0.0170	0.0025	NA	NA
4d.	0.0480	0.0281	144	86
	-0.1473	0.0270	NA	NA
	-0.0142	0.0026	NA	NA
4e.	0.1718	0.0383	121	85
	-0.2883	0.0401	NA	NA
	0.0111	0.0059	NA	NA
	-0.0313	0.0066	NA	NA

Meta-analysis of Dose-Response Studies with Random Effects Models

The final topic covered is the choice between a fixed- and a random-effects model.
Reproducing the results in table 5.

mod5_fixed <- dosresmeta(logor ~ I(sqrt(duration)), id = id, type = type, cases = cases, n = n,
                     se = se, data = oc_breast, method = "fixed",
                     mod = ~ menopause*I(period - 1986))
mod5_mm <- dosresmeta(logor ~ I(sqrt(duration)), id = id, type = type, cases = cases, n = n,
                         se = se, data = oc_breast, method = "mm",
                         mod = ~ menopause*I(period - 1986))

data.frame(coef_mm = coef(mod5_mm), se_mm = diag(vcov(mod5_mm))^.5,
           coef_fixed = coef(mod5_fixed), se_fixed = diag(vcov(mod5_fixed))^.5) %>%
  .[c(2:4, 1),] %>%
  round(4)

                           coef_mm  se_mm coef_fixed se_fixed
menopause                  -0.2468 0.0502    -0.2883   0.0401
I(period - 1986)            0.0111 0.0070     0.0111   0.0059
menopause:I(period - 1986) -0.0245 0.0083    -0.0313   0.0066
(Intercept)                 0.1720 0.0421     0.1718   0.0383

Appendix Table A1

oc_breast %>%
  group_by(id) %>%
  filter(row_number()==1) %>%
  select(author) %>%
  cbind(coef = mod4a$bi, se = unlist(mod4a$Si)^.5) %>%
  as.data.frame() %>%
  kable(digits = c(0, 0, 5, 5))

id	author	coef	se
1	Paffenbarger	0.076091876	0.07208
2	Sartwell	-0.090724132	0.16455
3	Lees	0.087833324	0.08967
4	Ravnihar	-0.062128495	0.18727
5	Paffenbarger	0.126795605	0.05546
6	Kelsey	-0.225524649	0.14299
7	Harris	-0.059723518	0.09848
8	Brinton	0.062919337	0.04149
9	Janerich	0.063684420	0.04054
10	Pike	0.186057120	0.13185
11	Vessey	0.024426839	0.04597
12	Rosenberg	0.015408363	0.03907
13	Miller	0.071203657	0.10590
14	Meirik	0.148012840	0.06204
15	Ellery	-0.042445450	0.11687
16	La Vecchia	0.034509702	0.11228
17	Paul	-0.111716697	0.01090
18	Lee	0.070644893	0.08961
19	CASH	0.007116616	0.02046
20	McPherson YW	0.091471024	0.06911
21	McPherson OW	-0.040747827	0.05640
22	Miller	0.287003221	0.08290
23	Jick	-0.013089555	0.14784
24	Olsson	0.185356837	0.07595

Information about R session:

R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2     dosresmeta_2.0.0 mvmeta_0.4.7     dplyr_0.7.2      purrr_0.2.2.2    readr_1.1.1     
 [7] tidyr_0.6.3      tibble_1.3.3     ggplot2_2.2.1    tidyverse_1.1.1  knitr_1.16      

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12     highr_0.6        cellranger_1.1.0 compiler_3.4.1   plyr_1.8.4       bindr_0.1       
 [7] forcats_0.2.0    tools_3.4.1      digest_0.6.12    lubridate_1.6.0  jsonlite_1.5     evaluate_0.10.1 
[13] nlme_3.1-131     gtable_0.2.0     lattice_0.20-35  pkgconfig_2.0.1  rlang_0.1.1      psych_1.7.5     
[19] yaml_2.1.14      parallel_3.4.1   haven_1.1.0      xml2_1.1.1       httr_1.2.1       stringr_1.2.0   
[25] hms_0.3          rprojroot_1.2    grid_3.4.1       glue_1.1.1       R6_2.2.2         readxl_1.0.0    
[31] foreign_0.8-69   rmarkdown_1.6    modelr_0.1.1     reshape2_1.4.2   magrittr_1.5     backports_1.1.0 
[37] scales_0.4.1     htmltools_0.3.6  rvest_0.3.2      assertthat_0.2.0 mnormt_1.5-5     colorspace_1.3-2
[43] stringi_1.1.5    lazyeval_0.2.0   munsell_0.4.3    broom_0.4.2

References

Berlin, Jesse A, Matthew P Longnecker, and Sander Greenland. 1993. “Meta-Analysis of Epidemiologic Dose-Response Data.” Epidemiology. JSTOR, 218–28.

Meirik, Olav, Hans-Olov Adami, Thoralf Christoffersen, Eiliv Lund, Reinhold Bergström, and Per Bergsjö. 1986. “Oral Contraceptive Use and Breast Cancer in Young Women: A Joint National Case-Control Study in Sweden and Norway.” The Lancet 328 (8508). Elsevier: 650–54.

Orsini, Nicola, Rino Bellocco, Sander Greenland, and others. 2006. “Generalized Least Squares for Trend Estimation of Summarized Dose-Response Data.” Stata Journal 6 (1). Stata press 4905 LAKEWAY PARKWAY, COLLEGE STATION, TX 77845 USA: 40.