I previously re-analyzed a meta-analysis of exercise for depression. However, a new one has since been published with a larger dataset, and it concludes the opposite of what I concluded last time.
Schuch et al (2016) do not provide the standard errors, but they do provide the sample sizes, d values, confidence intervals (CIs) for d values, and the reported p values. We can infer the standard errors from the width of the CIs. The CIs for d values are symmetric, so whether we use the upper or lower limits should not matter, but we try both to check for errors.
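#infer SEs: half-width of the 95% CI divided by 1.96
#note: with this sign convention both values (and se) come out negative,
#so arrange(-se) further down sorts from most to least precise
d2 %<>% mutate(se_upper = (d - d.upper)/1.96,
se_lower = (d.lower - d)/1.96,
se_delta = se_upper - se_lower,
se = (se_upper + se_lower)/2 #average of the two estimates
)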
#largest discrepancy between upper and lower
max(d2$se_delta)
## [1] 0.00051
We find no major discrepancies.
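The inference step can be illustrated on a single made-up study. This is only a sketch: the numbers are hypothetical (not taken from Schuch et al), the variable names are mine, and it uses the positive half-width convention, so the SE comes out positive.

```r
# Hypothetical study: d = 0.50 with 95% CI [0.108, 0.892] (made-up numbers)
d <- 0.50
d_lower <- 0.108
d_upper <- 0.892

# For a symmetric normal-theory 95% CI, each half-width equals z * SE
z <- qnorm(0.975)  # about 1.96
se_from_upper <- (d_upper - d) / z
se_from_lower <- (d - d_lower) / z

# The two estimates should agree up to rounding of the reported limits
se <- (se_from_upper + se_from_lower) / 2
se_diff <- abs(se_from_upper - se_from_lower)
print(c(se = se, se_diff = se_diff))
```

If the two values differ by much more than rounding error, the reported CI is either not symmetric or was transcribed incorrectly.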
We replicate the random effects meta-analysis using the reported data.
#meta-analyze
meta_main = rma(yi = d, sei = se, data = d2) %T>% print
##
## Random-Effects Model (k = 25; tau^2 estimator: REML)
##
## tau^2 (estimated amount of total heterogeneity): 0.7823 (SE = 0.2746)
## tau (square root of estimated tau^2 value): 0.8845
## I^2 (total heterogeneity / total variability): 89.49%
## H^2 (total variability / sampling variability): 9.51
##
## Test for Heterogeneity:
## Q(df = 24) = 134.1304, p-val < .0001
##
## Model Results:
##
## estimate se zval pval ci.lb ci.ub
## 1.0313 0.1959 5.2637 <.0001 0.6473 1.4152 ***
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#forest plot
GG_forest(meta_main, .names = d2$Study) +
scale_x_continuous(breaks = seq(-1, 6, by = .5))
ggsave("figures/schuch/forest.png")
## Saving 7 x 5 in image
We approximately replicate the overall effect size of about 1: we find d = 1.031, whereas the authors reported 0.987.
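For intuition about what the pooled estimate represents, here is a minimal sketch of random-effects weighting on toy data. The effect sizes and standard errors are made up, and tau^2 is taken as given rather than estimated by REML as rma does, so this shows only the weighting step.

```r
# Toy inputs (made-up numbers, not the Schuch et al data)
d  <- c(0.2, 0.5, 1.5)      # study effect sizes
se <- c(0.10, 0.25, 0.50)   # study standard errors
tau2 <- 0.78                # between-study variance, taken as given here

# Random-effects weights: inverse of (sampling variance + tau^2)
w <- 1 / (se^2 + tau2)
pooled    <- sum(w * d) / sum(w)  # weighted mean effect
pooled_se <- sqrt(1 / sum(w))     # standard error of the pooled effect

# Fixed-effect weights (tau^2 = 0) for comparison
w_fe <- 1 / se^2
pooled_fe <- sum(w_fe * d) / sum(w_fe)

print(c(random = pooled, fixed = pooled_fe))
```

When tau^2 is large relative to the sampling variances, the weights become nearly equal, so small studies with large effects pull the random-effects estimate up much more than they would under fixed-effect weighting.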
Meta-researchers have not yet converged on the best method for detecting or adjusting for publication bias, and with only a small number of studies (k = 25) no method can give a definitive answer. However, the funnel plot is generally considered a good, simple exploratory tool, so we use that:
#plot funnel
GG_funnel(meta_main) +
scale_x_continuous(breaks = seq(-1, 5, by = .5))
ggsave("figures/schuch/funnel.png")
## Saving 7 x 5 in image
#correlation
cor_se_es = cor.test(d2$d, d2$se)
#5 largest studies
meta_top5 = rma(d, sei = se, data = d2 %>% arrange(-se) %>% `[`(1:5, ))
We can immediately see that there is probably a problem with this literature, as there is an obvious relationship between precision and the reported effect size. The correlation is -0.53 [-0.77 to -0.17]. Restricting the analysis to the 5 largest studies produces a markedly smaller estimate than using all the studies: d = 0.253.
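To see why such a relationship points towards publication bias, consider a small simulation. It is a sketch under loud assumptions: the true effect, the range of standard errors and the selection rule (only significant positive results get published) are all invented for illustration. Standard errors are positive here, so the small-study pattern shows up as a positive correlation between d and SE; the sign flips if SE is coded negatively or if precision is used instead.

```r
set.seed(1)
k_all  <- 2000
true_d <- 0.25                   # assumed true effect (illustrative)
se <- runif(k_all, 0.05, 0.60)   # simulated study standard errors
d  <- rnorm(k_all, mean = true_d, sd = se)

# Assumed selection rule: only significant positive results are published
published <- d / se > 1.96

# Without selection, effect size and SE are unrelated
cor_all <- cor(d, se)
# With selection, imprecise studies only appear when their effect is large
cor_pub <- cor(d[published], se[published])

# The unweighted mean of the published studies overstates the true effect
mean_pub <- mean(d[published])

print(c(cor_all = cor_all, cor_pub = cor_pub, mean_pub = mean_pub))
```

In this setup the precise studies are published almost regardless of their result, which is why restricting attention to the largest studies gives an estimate closer to the true effect.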
The funnel plot cannot provide numerical estimates of the true effect size, but there are various methods that can. The most commonly used are the trim and fill methods. They come in 3 variants of which 2 are applicable here.
#trim and fill estimates
meta_trimfill_L0 = trimfill(meta_main, estimator = "L0") %T>% print
##
## Estimated number of missing studies on the left side: 0 (SE = 2.6645)
##
## Random-Effects Model (k = 25; tau^2 estimator: REML)
##
## tau^2 (estimated amount of total heterogeneity): 0.7823 (SE = 0.2746)
## tau (square root of estimated tau^2 value): 0.8845
## I^2 (total heterogeneity / total variability): 89.49%
## H^2 (total variability / sampling variability): 9.51
##
## Test for Heterogeneity:
## Q(df = 24) = 134.1304, p-val < .0001
##
## Model Results:
##
## estimate se zval pval ci.lb ci.ub
## 1.0313 0.1959 5.2637 <.0001 0.6473 1.4152 ***
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
meta_trimfill_R0 = trimfill(meta_main, estimator = "R0") %T>% print
##
## Estimated number of missing studies on the left side: 3 (SE = 2.8284)
## Test of H0: no missing studies on the left side: p-val = 0.0625
##
## Random-Effects Model (k = 28; tau^2 estimator: REML)
##
## tau^2 (estimated amount of total heterogeneity): 1.6013 (SE = 0.4886)
## tau (square root of estimated tau^2 value): 1.2654
## I^2 (total heterogeneity / total variability): 94.21%
## H^2 (total variability / sampling variability): 17.28
##
## Test for Heterogeneity:
## Q(df = 27) = 199.9772, p-val < .0001
##
## Model Results:
##
## estimate se zval pval ci.lb ci.ub
## 0.7578 0.2536 2.9879 0.0028 0.2607 1.2549 **
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Surprisingly, the first variant of trim and fill does not impute any missing studies, and the second variant imputes only 3. Simulation studies of these methods have shown that they generally have low power, meaning that when k is small (as it is here), they sometimes miss even sizable publication bias. For this reason we can’t be very certain about these conclusions.
The large twin control study for exercise makes any very large effect very improbable, but it is congruent with smaller effects such as the one based on the 5 largest studies: d = 0.253.
There is another recent meta-analysis of exercise for depression (Rebar et al 2015), but it meta-analyzed other meta-analyses rather than primary studies, so it was not interesting to re-analyze. However, it did find a much smaller effect size than the Schuch study: d = 0.50.