if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")
pacman::p_load(
metafor,
tidyverse,
gt,
forestplot,
patchwork
)
# ── Global ggplot theme ───────────────────────────────────────────────────────
theme_meta <- function() {
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 14,
margin = margin(b = 6)),
plot.subtitle = element_text(colour = "grey40", size = 11,
margin = margin(b = 12)),
plot.caption = element_text(colour = "grey55", size = 9, hjust = 0),
axis.title = element_text(size = 11),
axis.text = element_text(colour = "grey30"),
panel.grid.minor = element_blank(),
panel.grid.major = element_line(colour = "grey92"),
strip.text = element_text(face = "bold"),
legend.position = "bottom"
)
}
clr_primary <- "#2C6FAC"
clr_pooled <- "#C0392B"
clr_mid <- "#7F8C8D"
clr_fill <- "#D6E8F7"Tuberculosis (TB) remains one of the leading infectious causes of death globally, disproportionately affecting populations in low- and middle-income countries. The BCG (Bacillus Calmette-Guérin) vaccine has been used for decades as the primary preventive intervention against TB, yet its efficacy has been shown to vary substantially across geographic and demographic settings. Understanding the sources of this variability is essential for informing vaccination policy and targeting interventions effectively.
A leading explanation for the geographic pattern centres on exposure to environmental mycobacteria: populations in tropical regions encounter these organisms more frequently, potentially inducing partial cross-immunity that diminishes the relative benefit of BCG vaccination.
This report presents a full random-effects meta-analysis of the BCG
vaccine using the classic 13-trial dataset assembled by Colditz et
al. (1994), available in the metafor package
(dat.bcg). Specifically, the objectives are to:
The analysis uses the dat.bcg dataset from the
metafor package, comprising 13 randomised controlled
trials of BCG vaccination. The primary outcome is the risk ratio
(RR) for tuberculosis infection, comparing vaccinated
participants to unvaccinated controls. Key variables include trial-level
case counts, absolute latitude of the trial site, year of publication,
and method of treatment allocation.
Effect sizes are computed as log risk ratios using
escalc(). A random-effects model is fitted using restricted maximum likelihood (REML) estimation, appropriate given the expected between-study heterogeneity arising from differences in populations, TB exposure, BCG strains, and climate. Heterogeneity is quantified using Cochran’s Q, I², and τ². A funnel plot and Egger’s test assess publication bias. A meta-regression model is fitted with absolute latitude as a continuous moderator. Finally, a leave-one-out analysis examines the influence of individual studies on the pooled estimate.
data("dat.bcg", package = "metafor")
dat <- dat.bcg
dat <- escalc(
measure = "RR",
ai = tpos, bi = tneg,
ci = cpos, di = cneg,
data = dat
)
dat <- dat |>
mutate(
slab = paste(author, year, sep = ", "),
RR = exp(yi),
RR_lo = exp(yi - 1.96 * sqrt(vi)),
RR_hi = exp(yi + 1.96 * sqrt(vi)),
n_treat = tpos + tneg,
n_ctrl = cpos + cneg
)dat |>
select(author, year, ablat, alloc, n_treat, n_ctrl, RR, RR_lo, RR_hi) |>
mutate(across(RR:RR_hi, \(x) round(x, 3))) |>
rename(
Author = author,
Year = year,
`Latitude (°)` = ablat,
Allocation = alloc,
`n (Treated)` = n_treat,
`n (Control)` = n_ctrl,
`RR` = RR,
`95% CI Low` = RR_lo,
`95% CI High` = RR_hi
) |>
gt() |>
tab_header(
title = md("**Study Characteristics and Individual Effect Sizes**"),
subtitle = md("BCG Vaccine Randomised Controlled Trials")
) |>
tab_source_note(
source_note = md(
"Source: Colditz et al. (1994). *JAMA*, 271(9), 698–702."
)
) |>
tab_style(
style = cell_fill(color = "#EAF4FB"),
locations = cells_body(rows = seq(1, nrow(dat), 2))
) |>
tab_style(
style = list(cell_text(weight = "bold"),
cell_fill(color = "#F2F6FB")),
locations = cells_column_labels()
) |>
opt_align_table_header(align = "left") |>
tab_options(
table_body.hlines.color = "transparent",
table.width = pct(100)
) |>
cols_align(align = "center", columns = everything()) |>
cols_align(align = "left", columns = c(Author, Allocation)) |>
fmt_number(columns = c(RR, `95% CI Low`, `95% CI High`), decimals = 3)| Study Characteristics and Individual Effect Sizes | ||||||||
| BCG Vaccine Randomised Controlled Trials | ||||||||
| Author | Year | Latitude (°) | Allocation | n (Treated) | n (Control) | RR | 95% CI Low | 95% CI High |
|---|---|---|---|---|---|---|---|---|
| Aronson | 1948 | 44 | random | 123 | 139 | 0.411 | 0.134 | 1.257 |
| Ferguson & Simes | 1949 | 55 | random | 306 | 303 | 0.205 | 0.086 | 0.486 |
| Rosenthal et al | 1960 | 42 | random | 231 | 220 | 0.260 | 0.073 | 0.919 |
| Hart & Sutherland | 1977 | 52 | random | 13598 | 12867 | 0.237 | 0.179 | 0.312 |
| Frimodt-Moller et al | 1973 | 13 | alternate | 5069 | 5808 | 0.804 | 0.516 | 1.254 |
| Stein & Aronson | 1953 | 44 | alternate | 1541 | 1451 | 0.456 | 0.387 | 0.536 |
| Vandiviere et al | 1973 | 19 | random | 2545 | 629 | 0.198 | 0.078 | 0.499 |
| TPT Madras | 1980 | 13 | random | 88391 | 88391 | 1.012 | 0.895 | 1.145 |
| Coetzee & Berjak | 1968 | 27 | random | 7499 | 7277 | 0.625 | 0.393 | 0.996 |
| Rosenthal et al | 1961 | 42 | systematic | 1716 | 1665 | 0.254 | 0.149 | 0.431 |
| Comstock et al | 1974 | 18 | systematic | 50634 | 27338 | 0.712 | 0.573 | 0.886 |
| Comstock & Webster | 1969 | 33 | systematic | 2498 | 2341 | 1.562 | 0.374 | 6.529 |
| Comstock et al | 1976 | 33 | systematic | 16913 | 17854 | 0.983 | 0.582 | 1.659 |
| Source: Colditz et al. (1994). JAMA, 271(9), 698–702. | ||||||||
Why random-effects? The 13 BCG trials span different populations, geographic settings, BCG strains, and years. It is implausible that they estimate a single common true effect. A random-effects model assumes true effects are drawn from a distribution with mean μ and variance τ², explicitly modelling between-study heterogeneity. REML estimation produces unbiased estimates of τ².
pi_lb <- exp(res$beta - 1.96 * sqrt(res$se^2 + res$tau2))
pi_ub <- exp(res$beta + 1.96 * sqrt(res$se^2 + res$tau2))
tibble(
Group = c(
rep("Model Specification & Rationale", 4),
rep("Pooled Estimate", 7),
rep("Heterogeneity", 7)
),
Parameter = c(
"Model type", "Estimator", "Number of studies (k)", "Rationale",
"Pooled log RR (β)", "Standard error (SE)", "z value",
"p-value (H₀: β = 0)", "95% CI — log RR", "95% CI — RR",
"95% Prediction interval — RR",
"Q statistic", "Degrees of freedom (df)", "p-value (Q test)",
"I²", "H²", "τ²", "τ"
),
Value = c(
"Random-effects",
"REML (Restricted Maximum Likelihood)",
as.character(res$k),
"Trials span different populations, BCG strains, climates & years — single common effect implausible; RE models τ²",
round(res$beta[[1]], 4),
round(res$se, 4),
round(res$zval, 3),
ifelse(res$pval < 0.001, "< 0.001", round(res$pval, 4)),
paste0("[", round(res$ci.lb, 4), ", ", round(res$ci.ub, 4), "]"),
paste0("[", round(exp(res$ci.lb), 3), ", ", round(exp(res$ci.ub), 3), "]"),
paste0("[", round(pi_lb, 3), ", ", round(pi_ub, 3), "]"),
round(res$QE, 3),
as.character(res$k - 1),
ifelse(res$QEp < 0.001, "< 0.001", round(res$QEp, 4)),
paste0(round(res$I2, 2), "% — ",
ifelse(res$I2 < 25, "low",
ifelse(res$I2 < 50, "moderate",
ifelse(res$I2 < 75, "substantial", "considerable"))),
" heterogeneity"),
round(res$H2, 3),
round(res$tau2, 4),
round(sqrt(res$tau2), 4)
)
) |>
gt(groupname_col = "Group") |>
tab_header(
title = md("**Random-Effects Model: Specification & Output (REML)**"),
subtitle = md("BCG Vaccine vs Control")
) |>
tab_style(
style = list(cell_fill(color = "#EAF4FB"),
cell_text(weight = "bold")),
locations = cells_row_groups()
) |>
tab_style(
style = list(cell_text(color = "#C0392B", weight = "bold"),
cell_fill(color = "#FFF0EE")),
locations = cells_body(rows = Parameter %in%
c("95% CI — RR", "95% Prediction interval — RR"))
) |>
tab_style(
style = cell_text(weight = "bold"),
locations = cells_body(rows = Parameter == "I²")
) |>
tab_style(
style = cell_text(style = "italic", color = "grey50", size = px(12)),
locations = cells_body(rows = Parameter == "Rationale")
) |>
tab_style(
style = list(cell_text(weight = "bold"),
cell_fill(color = "#F2F6FB")),
locations = cells_column_labels()
) |>
cols_label(Parameter = "Parameter", Value = "Estimate / Value") |>
opt_align_table_header(align = "left") |>
tab_source_note(
source_note = md(
"REML = Restricted Maximum Likelihood · I² = proportion of variance due to between-study heterogeneity · τ² = between-study variance · PI = 95% prediction interval"
)
) |>
tab_options(
table_body.hlines.color = "transparent",
table.width = pct(100)
) |>
cols_align(align = "left", columns = everything())| Random-Effects Model: Specification & Output (REML) | |
| BCG Vaccine vs Control | |
| Parameter | Estimate / Value |
|---|---|
| Model Specification & Rationale | |
| Model type | Random-effects |
| Estimator | REML (Restricted Maximum Likelihood) |
| Number of studies (k) | 13 |
| Rationale | Trials span different populations, BCG strains, climates & years — single common effect implausible; RE models τ² |
| Pooled Estimate | |
| Pooled log RR (β) | -0.7145 |
| Standard error (SE) | 0.1798 |
| z value | -3.974 |
| p-value (H₀: β = 0) | < 0.001 |
| 95% CI — log RR | [-1.0669, -0.3622] |
| 95% CI — RR | [0.344, 0.696] |
| 95% Prediction interval — RR | [0.155, 1.549] |
| Heterogeneity | |
| Q statistic | 152.233 |
| Degrees of freedom (df) | 12 |
| p-value (Q test) | < 0.001 |
| I² | 92.22% — considerable heterogeneity |
| H² | 12.856 |
| τ² | 0.3132 |
| τ | 0.5597 |
| REML = Restricted Maximum Likelihood · I² = proportion of variance due to between-study heterogeneity · τ² = between-study variance · PI = 95% prediction interval | |
wts <- weights(res)
tabletext <- cbind(
c("Study",
dat$slab,
NA,
"Pooled (RE)"),
c("Lat (°)",
as.character(dat$ablat),
NA,
NA),
c("n (Vax)",
formatC(dat$n_treat, format = "d", big.mark = ","),
NA,
NA),
c("n (Ctrl)",
formatC(dat$n_ctrl, format = "d", big.mark = ","),
NA,
NA),
c("Wt (%)",
formatC(wts, digits = 1, format = "f"),
NA,
NA),
c("RR [95% CI]",
sprintf("%.2f [%.2f, %.2f]", dat$RR, dat$RR_lo, dat$RR_hi),
NA,
sprintf("%.2f [%.2f, %.2f]",
exp(res$beta), exp(res$ci.lb), exp(res$ci.ub)))
)
mean_v <- c(NA, dat$RR, NA, exp(res$beta))
lower_v <- c(NA, dat$RR_lo, NA, exp(res$ci.lb))
upper_v <- c(NA, dat$RR_hi, NA, exp(res$ci.ub))
box_sizes <- c(
NA,
0.25 + 0.75 * (wts - min(wts)) / (max(wts) - min(wts)),
NA,
NA
)
forestplot(
labeltext = tabletext,
mean = mean_v,
lower = lower_v,
upper = upper_v,
is.summary = c(TRUE, rep(FALSE, nrow(dat)), FALSE, TRUE),
boxsize = box_sizes,
graph.pos = 5,
xlog = TRUE,
xticks = c(0.1, 0.25, 0.5, 1, 2, 4),
zero = 1,
xlab = expression(
"Risk Ratio (log scale)" %~~% " \u25c4 Favours vaccine | Favours control \u25ba"
),
hrzl_lines = list(
"1" = gpar(lwd = 1.5, col = "black"), # top border
"2" = gpar(lwd = 1.0, col = "black"), # below header
"15" = gpar(lwd = 0.8, col = "grey60"), # above spacer row
"16" = gpar(lwd = 1.5, col = "black") # bottom border
),
col = fpColors(
box = "#2C6FAC",
line = "#2C6FAC",
summary = "#C0392B",
hrz_lines = "grey80",
zero = "grey40"
),
txt_gp = fpTxtGp(
label = gpar(fontfamily = "sans", cex = 0.82),
ticks = gpar(fontfamily = "sans", cex = 0.80),
xlab = gpar(fontfamily = "sans", cex = 0.85),
title = gpar(fontfamily = "sans", cex = 1.0, fontface = "bold"),
summary = gpar(fontfamily = "sans", cex = 0.88, fontface = "bold",
col = "#C0392B")
),
shapes_gp = fpShapesGp(
default = gpar(lineend = "square", linejoin = "mitre")
),
addpred = TRUE,
clip = c(0.05, 6),
lwd.ci = 1.2,
lwd.zero = 0.8,
lwd.xaxis = 0.8,
ci.vertices = TRUE,
ci.vertices.height = 0.12,
lineheight = unit(7, "mm"),
colgap = unit(3, "mm"),
graphwidth = unit(7, "cm"),
title = sprintf(
"BCG Vaccine vs Control — Random-Effects Meta-Analysis\nPooled RR = %.2f [%.2f, %.2f] I\u00b2 = %.1f%% \u03c4\u00b2 = %.4f",
exp(res$beta), exp(res$ci.lb), exp(res$ci.ub), res$I2, res$tau2
)
)Figure 1. Forest plot of BCG vaccine trials (forestplot package). Square size reflects study weight. Diamond = pooled estimate with 95% CI; wider bar = 95% prediction interval.
Heterogeneity across trials was assessed using Cochran’s Q test, I², and the between-study variance τ².
tibble(
Measure = c("Q statistic", "df", "p-value", "I²", "H²", "τ²", "τ"),
Value = c(
round(res$QE, 2),
res$k - 1,
ifelse(res$QEp < 0.001, "< 0.001", round(res$QEp, 4)),
paste0(round(res$I2, 1), "%"),
round(res$H2, 3),
round(res$tau2, 4),
round(sqrt(res$tau2), 4)
),
Interpretation = c(
"Test statistic for heterogeneity",
"Number of studies minus 1",
"Evidence of significant heterogeneity",
paste0(round(res$I2, 1),
"% of variance due to between-study differences — considerable"),
"Ratio of total to sampling variance",
"Between-study variance on log-RR scale",
"Between-study SD on log-RR scale"
)
) |>
gt() |>
tab_header(
title = md("**Heterogeneity Statistics**"),
subtitle = md("Random-Effects Model (REML)")
) |>
tab_style(
style = list(cell_text(weight = "bold"),
cell_fill(color = "#FFF3CD")),
locations = cells_body(rows = Measure == "I²")
) |>
tab_style(
style = list(cell_text(weight = "bold"),
cell_fill(color = "#F2F6FB")),
locations = cells_column_labels()
) |>
tab_style(
style = cell_fill(color = "#EAF4FB"),
locations = cells_body(rows = seq(1, 7, 2))
) |>
opt_align_table_header(align = "left") |>
tab_options(
table_body.hlines.color = "transparent",
table.width = pct(100)
) |>
cols_align(align = "left", columns = everything()) |>
cols_align(align = "center", columns = Value)| Heterogeneity Statistics | ||
| Random-Effects Model (REML) | ||
| Measure | Value | Interpretation |
|---|---|---|
| Q statistic | 152.23 | Test statistic for heterogeneity |
| df | 12 | Number of studies minus 1 |
| p-value | < 0.001 | Evidence of significant heterogeneity |
| I² | 92.2% | 92.2% of variance due to between-study differences — considerable |
| H² | 12.856 | Ratio of total to sampling variance |
| τ² | 0.3132 | Between-study variance on log-RR scale |
| τ | 0.5597 | Between-study SD on log-RR scale |
Interpretation: I² = 92.2% indicates considerable heterogeneity. The large majority of variation across studies reflects genuine differences in true effects rather than sampling error, justifying the use of a random-effects model and motivating the moderator analysis below.
funnel_df <- dat |> mutate(sei = sqrt(vi))
se_seq <- seq(0, max(funnel_df$sei) * 1.1, length.out = 100)
ci_lo <- res$beta - 1.96 * se_seq
ci_hi <- res$beta + 1.96 * se_seq
funnel_poly <- tibble(
x = c(exp(ci_lo), rev(exp(ci_hi))),
y = c(se_seq, rev(se_seq))
)
ggplot(funnel_df, aes(x = RR, y = sqrt(vi))) +
geom_polygon(
data = funnel_poly, aes(x = x, y = y),
fill = "#EAF4FB", colour = NA, alpha = 0.7
) +
geom_path(
data = tibble(x = exp(ci_lo), y = se_seq),
aes(x = x, y = y), linetype = "dashed", colour = "grey60"
) +
geom_path(
data = tibble(x = exp(ci_hi), y = se_seq),
aes(x = x, y = y), linetype = "dashed", colour = "grey60"
) +
geom_vline(xintercept = exp(res$beta), colour = clr_pooled,
linewidth = 0.8, linetype = "dotted") +
geom_vline(xintercept = 1, colour = "grey40",
linewidth = 0.5, linetype = "dashed") +
geom_point(colour = clr_primary, size = 3, alpha = 0.85) +
scale_x_log10(breaks = c(0.1, 0.25, 0.5, 1, 2, 4)) +
scale_y_reverse(
limits = c(max(funnel_df$sei) * 1.12, 0),
expand = c(0, 0)
) +
labs(
title = "Funnel Plot — BCG Trials",
subtitle = "Assessing small-study effects and publication bias",
x = "Risk Ratio (log scale)",
y = "Standard Error",
caption = "Red dotted line = pooled estimate. Shaded region = 95% pseudo-CI."
) +
theme_meta()Figure 2. Funnel plot for publication bias assessment. Dashed lines = 95% pseudo-confidence region. Asymmetry may suggest selective reporting of small studies with positive results.
The funnel plot is complemented by Egger’s test, which formally tests for funnel plot asymmetry using a regression of the standardised effect on its precision.
egg <- regtest(res)
tibble(
Statistic = c("z statistic", "p-value", "Interpretation"),
Value = c(
round(egg$zval, 3),
round(egg$pval, 4),
ifelse(
egg$pval < 0.05,
"Significant asymmetry detected — possible publication bias",
"No significant asymmetry detected at the 5% level"
)
)
) |>
gt() |>
tab_header(
title = md("**Egger's Test for Funnel Plot Asymmetry**"),
subtitle = "Regression test for publication bias"
) |>
tab_style(
style = cell_text(
weight = "bold",
color = ifelse(egg$pval < 0.05, "#C0392B", "#27AE60")
),
locations = cells_body(rows = Statistic == "Interpretation")
) |>
tab_style(
style = list(cell_text(weight = "bold"),
cell_fill(color = "#F2F6FB")),
locations = cells_column_labels()
) |>
tab_footnote(
footnote = "A non-significant p-value (p > 0.05) suggests no evidence of publication bias.",
locations = cells_column_labels(columns = Value)
) |>
opt_align_table_header(align = "left") |>
tab_options(
table_body.hlines.color = "transparent",
table.width = pct(60)
) |>
cols_align(align = "left", columns = everything())| Egger’s Test for Funnel Plot Asymmetry | |
| Regression test for publication bias | |
| Statistic | Value1 |
|---|---|
| z statistic | -0.803 |
| p-value | 0.4218 |
| Interpretation | No significant asymmetry detected at the 5% level |
| 1 A non-significant p-value (p > 0.05) suggests no evidence of publication bias. | |
Absolute latitude is hypothesised to be a proxy for background exposure to environmental mycobacteria. Trials at higher latitudes (cooler climates) consistently show stronger BCG protection, likely because populations at lower latitudes have pre-existing immunity from environmental mycobacteria that diminishes the relative benefit of vaccination.
tibble(
Parameter = c(
"Intercept (log RR at 0°)",
"Intercept — RR",
"Slope (per degree latitude — log RR)",
"Slope — RR change per degree",
"R² (latitude)",
"Residual I²",
"Residual τ²"
),
Estimate = c(
round(coef(res_mod)[1], 4),
round(exp(coef(res_mod)[1]), 3),
round(coef(res_mod)[2], 5),
paste0(round((1 - exp(coef(res_mod)["ablat"])) * 100, 1), "%"),
paste0(round(res_mod$R2, 1), "%"),
paste0(round(res_mod$I2, 1), "%"),
round(res_mod$tau2, 4)
),
Interpretation = c(
"Log RR at equator (latitude = 0°)",
"Near-null protection at equatorial baseline",
"Each additional degree increases log-RR reduction",
"Further reduction in RR per additional degree of latitude",
"Between-study variance explained by latitude",
"Remaining unexplained heterogeneity",
"Residual between-study variance after adjusting for latitude"
)
) |>
gt() |>
tab_header(
title = md("**Meta-Regression Results**"),
subtitle = md("Moderator: Absolute Latitude (degrees)")
) |>
tab_style(
style = list(cell_text(weight = "bold", color = "#1F4E79"),
cell_fill(color = "#D6E8F7")),
locations = cells_body(rows = Parameter == "R² (latitude)")
) |>
tab_style(
style = list(cell_text(weight = "bold"),
cell_fill(color = "#F2F6FB")),
locations = cells_column_labels()
) |>
tab_style(
style = cell_fill(color = "#EAF4FB"),
locations = cells_body(rows = c(1, 3, 5, 7))
) |>
tab_source_note(
source_note = md(
"REML estimation. R² reflects variance in true effects explained by latitude."
)
) |>
opt_align_table_header(align = "left") |>
tab_options(
table_body.hlines.color = "transparent",
table.width = pct(100)
) |>
cols_align(align = "left", columns = c(Parameter, Interpretation)) |>
cols_align(align = "center", columns = Estimate)| Meta-Regression Results | ||
| Moderator: Absolute Latitude (degrees) | ||
| Parameter | Estimate | Interpretation |
|---|---|---|
| Intercept (log RR at 0°) | 0.2515 | Log RR at equator (latitude = 0°) |
| Intercept — RR | 1.286 | Near-null protection at equatorial baseline |
| Slope (per degree latitude — log RR) | -0.0291 | Each additional degree increases log-RR reduction |
| Slope — RR change per degree | 2.9% | Further reduction in RR per additional degree of latitude |
| R² (latitude) | 75.6% | Between-study variance explained by latitude |
| Residual I² | 68.4% | Remaining unexplained heterogeneity |
| Residual τ² | 0.0764 | Residual between-study variance after adjusting for latitude |
| REML estimation. R² reflects variance in true effects explained by latitude. | ||
lat_seq <- seq(5, 65, by = 1)
pred_mod <- predict(res_mod, newmods = lat_seq, transf = exp)
pred_df <- tibble(
ablat = lat_seq,
fit = pred_mod$pred,
ci_lo = pred_mod$ci.lb,
ci_hi = pred_mod$ci.ub
)
dat2 <- dat |>
mutate(
wt = weights(res_mod),
size = 2 + 5 * (wt - min(wt)) / (max(wt) - min(wt))
)
ggplot() +
geom_ribbon(
data = pred_df,
aes(x = ablat, ymin = ci_lo, ymax = ci_hi),
fill = clr_fill, alpha = 0.7
) +
geom_line(
data = pred_df, aes(x = ablat, y = fit),
colour = clr_pooled, linewidth = 1
) +
geom_hline(yintercept = 1, linetype = "dashed", colour = "grey50") +
geom_point(
data = dat2,
aes(x = ablat, y = RR, size = size),
colour = clr_primary, alpha = 0.85
) +
scale_y_log10(
limits = c(0.05, 3),
breaks = c(0.1, 0.25, 0.5, 1, 2),
labels = c("0.10", "0.25", "0.50", "1.00", "2.00")
) +
scale_size_identity() +
scale_x_continuous(breaks = seq(10, 60, by = 10)) +
annotate(
"text",
x = 50, y = 2.3,
label = sprintf("Slope = %.4f log-RR per degree\nR\u00b2 = %.1f%%",
coef(res_mod)["ablat"], res_mod$R2),
hjust = 0, size = 3.5, colour = clr_pooled
) +
labs(
title = "Meta-Regression: Latitude \u00d7 BCG Efficacy",
subtitle = "Weighted regression; point size proportional to study weight",
x = "Absolute Latitude (degrees)",
y = "Risk Ratio (log scale)",
caption = "Shaded band = 95% CI. Dashed line = no effect (RR = 1)."
) +
theme_meta()Figure 3. Meta-regression of BCG efficacy on absolute latitude. Point size reflects study weight. Shaded band = 95% CI around the regression line.
Interpretation: Each additional degree of absolute latitude is associated with a 2.9% further reduction in the risk ratio. Latitude explains approximately 75.6% of the between-study variance, with 24.4% of heterogeneity remaining unexplained by this single moderator.
The robustness of the pooled estimate is assessed by sequentially omitting each study and re-fitting the model on the remaining 12 trials.
loo <- leave1out(res, transf = exp)
loo_df <- tibble(
study = dat$slab,
est = loo$estimate,
ci_lo = loo$ci.lb,
ci_hi = loo$ci.ub,
i2 = loo$I2
) |>
mutate(study = factor(study, levels = rev(study)))
ggplot(loo_df, aes(y = study, x = est, xmin = ci_lo, xmax = ci_hi)) +
geom_vline(
xintercept = exp(res$beta),
linetype = "dashed", colour = clr_pooled, linewidth = 0.7
) +
geom_errorbarh(height = 0.25, colour = clr_mid, linewidth = 0.5) +
geom_point(aes(colour = i2), size = 3, shape = 18) +
scale_x_log10(breaks = c(0.3, 0.4, 0.5, 0.6, 0.7, 0.8)) +
scale_colour_gradient(low = "#2ECC71", high = "#E74C3C",
name = "I\u00b2 (%)") +
labs(
title = "Leave-One-Out Sensitivity Analysis",
subtitle = "Red dashed line = original pooled estimate",
x = "Pooled RR (log scale)",
y = NULL,
caption = "Point colour indicates residual I\u00b2 after study removal."
) +
theme_meta() +
theme(axis.text.y = element_text(size = 9))Figure 4. Leave-one-out sensitivity analysis. Each row shows the pooled RR when the named study is excluded. Point colour indicates the residual I² after removal.
loo_df |>
mutate(study = as.character(study)) |>
arrange(est) |>
gt() |>
tab_header(
title = md("**Leave-One-Out Results**"),
subtitle = "Pooled RR and residual heterogeneity after each study omission"
) |>
fmt_number(columns = c(est, ci_lo, ci_hi, i2), decimals = 3) |>
cols_label(
study = "Study Omitted",
est = "Pooled RR",
ci_lo = "95% CI Low",
ci_hi = "95% CI High",
i2 = "Residual I² (%)"
) |>
tab_style(
style = cell_fill(color = "#EAF4FB"),
locations = cells_body(rows = seq(1, nrow(loo_df), 2))
) |>
tab_style(
style = list(cell_text(weight = "bold"),
cell_fill(color = "#F2F6FB")),
locations = cells_column_labels()
) |>
tab_footnote(
footnote = md(paste0(
"Red dashed line in Figure 4 = original pooled RR = **",
round(exp(res$beta), 3), "** [",
round(exp(res$ci.lb), 3), ", ",
round(exp(res$ci.ub), 3), "]."
)),
locations = cells_column_labels(columns = est)
) |>
opt_align_table_header(align = "left") |>
tab_options(
table_body.hlines.color = "transparent",
table.width = pct(100)
) |>
cols_align(align = "center", columns = everything()) |>
cols_align(align = "left", columns = study)| Leave-One-Out Results | ||||
| Pooled RR and residual heterogeneity after each study omission | ||||
| Study Omitted | Pooled RR1 | 95% CI Low | 95% CI High | Residual I² (%) |
|---|---|---|---|---|
| TPT Madras, 1980 | 0.452 | 0.317 | 0.643 | 87.031 |
| Comstock et al, 1976 | 0.460 | 0.319 | 0.661 | 92.344 |
| Frimodt-Moller et al, 1973 | 0.466 | 0.320 | 0.678 | 92.763 |
| Comstock & Webster, 1969 | 0.468 | 0.327 | 0.668 | 92.678 |
| Comstock et al, 1974 | 0.469 | 0.319 | 0.688 | 91.811 |
| Coetzee & Berjak, 1968 | 0.477 | 0.324 | 0.701 | 93.213 |
| Stein & Aronson, 1953 | 0.491 | 0.332 | 0.727 | 90.912 |
| Aronson, 1948 | 0.493 | 0.340 | 0.716 | 93.226 |
| Rosenthal et al, 1960 | 0.504 | 0.350 | 0.725 | 92.935 |
| Vandiviere et al, 1973 | 0.519 | 0.365 | 0.740 | 92.278 |
| Ferguson & Simes, 1949 | 0.520 | 0.365 | 0.741 | 92.254 |
| Rosenthal et al, 1961 | 0.520 | 0.363 | 0.747 | 92.232 |
| Hart & Sutherland, 1977 | 0.533 | 0.377 | 0.754 | 90.412 |
| 1 Red dashed line in Figure 4 = original pooled RR = 0.489 [0.344, 0.696]. | ||||
tibble(
Analysis = c(
"Primary RE (REML)",
"With latitude covariate",
"Leave-one-out range"
),
`Pooled RR (95% CI)` = c(
sprintf("%.2f [%.2f, %.2f]",
exp(res$beta), exp(res$ci.lb), exp(res$ci.ub)),
sprintf("Intercept RR: %.2f; Slope: %.4f/°",
exp(coef(res_mod)[1]), coef(res_mod)[2]),
sprintf("%.3f to %.3f", min(loo_df$est), max(loo_df$est))
),
`I² (%)` = c(
round(res$I2, 1),
round(res_mod$I2, 1),
NA_real_
),
`τ²` = c(
round(res$tau2, 4),
round(res_mod$tau2, 4),
NA_real_
),
`p (Heterogeneity)` = c(
ifelse(res$QEp < 0.001, "< 0.001", round(res$QEp, 4)),
ifelse(res_mod$QEp < 0.001, "< 0.001", round(res_mod$QEp, 4)),
"—"
)
) |>
gt() |>
tab_header(
title = md("**Summary of All Models and Sensitivity Analyses**"),
subtitle = md("BCG Vaccine Meta-Analysis")
) |>
sub_missing(missing_text = "—") |>
tab_style(
style = list(cell_fill(color = "#EAF4FB"),
cell_text(weight = "bold")),
locations = cells_body(rows = Analysis == "Primary RE (REML)")
) |>
tab_style(
style = list(cell_text(weight = "bold"),
cell_fill(color = "#F2F6FB")),
locations = cells_column_labels()
) |>
tab_source_note(
source_note = md(
"RE = Random-effects · REML = Restricted Maximum Likelihood · I² = between-study heterogeneity"
)
) |>
opt_align_table_header(align = "left") |>
tab_options(
table_body.hlines.color = "transparent",
table.width = pct(100)
) |>
cols_align(align = "center", columns = everything()) |>
cols_align(align = "left", columns = Analysis)| Summary of All Models and Sensitivity Analyses | ||||
| BCG Vaccine Meta-Analysis | ||||
| Analysis | Pooled RR (95% CI) | I² (%) | τ² | p (Heterogeneity) |
|---|---|---|---|---|
| Primary RE (REML) | 0.49 [0.34, 0.70] | 92.2 | 0.3132 | < 0.001 |
| With latitude covariate | Intercept RR: 1.29; Slope: -0.0291/° | 68.4 | 0.0764 | 0.0012 |
| Leave-one-out range | 0.452 to 0.533 | — | — | — |
| RE = Random-effects · REML = Restricted Maximum Likelihood · I² = between-study heterogeneity | ||||
The meta-analysis of 13 randomised controlled trials provides strong evidence that BCG vaccination significantly reduces the risk of tuberculosis infection. The pooled risk ratio of 0.49 (95% CI: 0.34 to 0.7) indicates a meaningful protective effect of the vaccine compared to no vaccination.
Substantial heterogeneity (I² ≈ 92%) was detected across trials, which is expected given the diversity of study populations, settings, and BCG strains. Meta-regression revealed that absolute latitude explains approximately 76% of this between-study variance, with trials at higher latitudes demonstrating stronger protection. This pattern is consistent with the hypothesis that pre-existing sensitisation to environmental mycobacteria at lower latitudes attenuates vaccine efficacy.
Funnel plot inspection and Egger’s test did not provide strong evidence of publication bias (p = 0.422). Leave-one-out analyses confirmed that the pooled estimate is robust and not unduly influenced by any single trial, with the estimate ranging from 0.452 to 0.533 across all omissions.
In conclusion, BCG vaccination confers significant protection against TB, though the magnitude of benefit varies substantially by geographic context. These findings support continued use of BCG in high-latitude settings and underscore the importance of context-specific evaluation in global vaccination policy.
Colditz GA, Brewer TF, Berkey CS, et al. (1994). Efficacy of BCG vaccine in the prevention of tuberculosis. JAMA, 271(9), 698–702. https://doi.org/10.1001/jama.1994.03510330076038
Viechtbauer W (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48. https://doi.org/10.18637/jss.v036.i03
Higgins JPT, Thompson SG, Deeks JJ, Altman DG (2003). Measuring inconsistency in meta-analyses. BMJ, 327, 557–560.
Egger M, Davey Smith G, Schneider M, Minder C (1997). Bias in meta-analysis detected by a simple, graphical test. BMJ, 315, 629–634.
Analysis conducted in R using the metafor, forestplot, and tidyverse packages. Dataset: Colditz GA et al. (1994). JAMA, 271(9), 698–702.