BCG Vaccine Meta-Analysis

if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")
pacman::p_load(
  metafor,
  tidyverse,
  gt,
  forestplot,
  patchwork
)

# ── Global ggplot theme ───────────────────────────────────────────────────────
theme_meta <- function() {
  theme_minimal(base_size = 12) +
    theme(
      plot.title       = element_text(face = "bold", size = 14,
                                      margin = margin(b = 6)),
      plot.subtitle    = element_text(colour = "grey40", size = 11,
                                      margin = margin(b = 12)),
      plot.caption     = element_text(colour = "grey55", size = 9, hjust = 0),
      axis.title       = element_text(size = 11),
      axis.text        = element_text(colour = "grey30"),
      panel.grid.minor = element_blank(),
      panel.grid.major = element_line(colour = "grey92"),
      strip.text       = element_text(face = "bold"),
      legend.position  = "bottom"
    )
}

clr_primary <- "#2C6FAC"
clr_pooled  <- "#C0392B"
clr_mid     <- "#7F8C8D"
clr_fill    <- "#D6E8F7"

1 Introduction

1.1 Background

Tuberculosis (TB) remains one of the leading infectious causes of death globally, disproportionately affecting populations in low- and middle-income countries. The BCG (Bacillus Calmette-Guérin) vaccine has been used for decades as the primary preventive intervention against TB, yet its efficacy has been shown to vary substantially across geographic and demographic settings. Understanding the sources of this variability is essential for informing vaccination policy and targeting interventions effectively.

A leading explanation for the geographic pattern centres on exposure to environmental mycobacteria: populations in tropical regions encounter these organisms more frequently, potentially inducing partial cross-immunity that diminishes the relative benefit of BCG vaccination.

1.2 Objectives

This report presents a full random-effects meta-analysis of the BCG vaccine using the classic 13-trial dataset assembled by Colditz et al. (1994), available in the metafor package (dat.bcg). Specifically, the objectives are to:

Summarise individual study effect sizes for TB infection risk in vaccinated versus control arms
Pool evidence using a random-effects model (REML) and quantify between-study heterogeneity
Assess potential publication bias using a funnel plot and Egger’s test
Examine absolute latitude as a moderator of BCG efficacy via meta-regression
Evaluate robustness of the pooled estimate using leave-one-out sensitivity analysis

1.3 Dataset

The analysis uses the dat.bcg dataset from the metafor package, comprising 13 randomised controlled trials of BCG vaccination. The primary outcome is the risk ratio (RR) for tuberculosis infection, comparing vaccinated participants to unvaccinated controls. Key variables include trial-level case counts, absolute latitude of the trial site, year of publication, and method of treatment allocation.

1.4 Analytic Approach

Effect sizes are computed as log risk ratios using escalc(). A random-effects model is fitted using restricted maximum likelihood (REML) estimation, appropriate given the expected between-study heterogeneity arising from differences in populations, TB exposure, BCG strains, and climate. Heterogeneity is quantified using Cochran’s Q, I², and τ². A funnel plot and Egger’s test assess publication bias. A meta-regression model is fitted with absolute latitude as a continuous moderator. Finally, a leave-one-out analysis examines the influence of individual studies on the pooled estimate.

2 Data

2.1 Loading and Computing Effect Sizes

data("dat.bcg", package = "metafor")
dat <- dat.bcg

dat <- escalc(
  measure = "RR",
  ai = tpos, bi = tneg,
  ci = cpos, di = cneg,
  data = dat
)

dat <- dat |>
  mutate(
    slab    = paste(author, year, sep = ", "),
    RR      = exp(yi),
    RR_lo   = exp(yi - 1.96 * sqrt(vi)),
    RR_hi   = exp(yi + 1.96 * sqrt(vi)),
    n_treat = tpos + tneg,
    n_ctrl  = cpos + cneg
  )

2.2 Study Characteristics

dat |>
  select(author, year, ablat, alloc, n_treat, n_ctrl, RR, RR_lo, RR_hi) |>
  mutate(across(RR:RR_hi, \(x) round(x, 3))) |>
  rename(
    Author         = author,
    Year           = year,
    `Latitude (°)` = ablat,
    Allocation     = alloc,
    `n (Treated)`  = n_treat,
    `n (Control)`  = n_ctrl,
    `RR`           = RR,
    `95% CI Low`   = RR_lo,
    `95% CI High`  = RR_hi
  ) |>
  gt() |>
  tab_header(
    title    = md("**Study Characteristics and Individual Effect Sizes**"),
    subtitle = md("BCG Vaccine Randomised Controlled Trials")
  ) |>
  tab_source_note(
    source_note = md(
      "Source: Colditz et al. (1994). *JAMA*, 271(9), 698–702."
    )
  ) |>
  tab_style(
    style     = cell_fill(color = "#EAF4FB"),
    locations = cells_body(rows = seq(1, nrow(dat), 2))
  ) |>
  tab_style(
    style     = list(cell_text(weight = "bold"),
                     cell_fill(color = "#F2F6FB")),
    locations = cells_column_labels()
  ) |>
  opt_align_table_header(align = "left") |>
  tab_options(
    table_body.hlines.color = "transparent",
    table.width = pct(100)
  ) |>
  cols_align(align = "center", columns = everything()) |>
  cols_align(align = "left",   columns = c(Author, Allocation)) |>
  fmt_number(columns = c(RR, `95% CI Low`, `95% CI High`), decimals = 3)

Author	Year	Latitude (°)	Allocation	n (Treated)	n (Control)	RR	95% CI Low	95% CI High
Study Characteristics and Individual Effect Sizes
BCG Vaccine Randomised Controlled Trials
Aronson	1948	44	random	123	139	0.411	0.134	1.257
Ferguson & Simes	1949	55	random	306	303	0.205	0.086	0.486
Rosenthal et al	1960	42	random	231	220	0.260	0.073	0.919
Hart & Sutherland	1977	52	random	13598	12867	0.237	0.179	0.312
Frimodt-Moller et al	1973	13	alternate	5069	5808	0.804	0.516	1.254
Stein & Aronson	1953	44	alternate	1541	1451	0.456	0.387	0.536
Vandiviere et al	1973	19	random	2545	629	0.198	0.078	0.499
TPT Madras	1980	13	random	88391	88391	1.012	0.895	1.145
Coetzee & Berjak	1968	27	random	7499	7277	0.625	0.393	0.996
Rosenthal et al	1961	42	systematic	1716	1665	0.254	0.149	0.431
Comstock et al	1974	18	systematic	50634	27338	0.712	0.573	0.886
Comstock & Webster	1969	33	systematic	2498	2341	1.562	0.374	6.529
Comstock et al	1976	33	systematic	16913	17854	0.983	0.582	1.659
Source: Colditz et al. (1994). JAMA, 271(9), 698–702.

3 Random-Effects Meta-Analysis

3.1 Model Specification and Output

Why random-effects? The 13 BCG trials span different populations, geographic settings, BCG strains, and years. It is implausible that they estimate a single common true effect. A random-effects model assumes true effects are drawn from a distribution with mean μ and variance τ², explicitly modelling between-study heterogeneity. REML estimation produces unbiased estimates of τ².

res <- rma(yi = yi, vi = vi, data = dat, method = "REML")

pi_lb <- exp(res$beta - 1.96 * sqrt(res$se^2 + res$tau2))
pi_ub <- exp(res$beta + 1.96 * sqrt(res$se^2 + res$tau2))

tibble(
  Group = c(
    rep("Model Specification & Rationale", 4),
    rep("Pooled Estimate", 7),
    rep("Heterogeneity", 7)
  ),
  Parameter = c(
    "Model type", "Estimator", "Number of studies (k)", "Rationale",
    "Pooled log RR (β)", "Standard error (SE)", "z value",
    "p-value (H₀: β = 0)", "95% CI — log RR", "95% CI — RR",
    "95% Prediction interval — RR",
    "Q statistic", "Degrees of freedom (df)", "p-value (Q test)",
    "I²", "H²", "τ²", "τ"
  ),
  Value = c(
    "Random-effects",
    "REML (Restricted Maximum Likelihood)",
    as.character(res$k),
    "Trials span different populations, BCG strains, climates & years — single common effect implausible; RE models τ²",
    round(res$beta[[1]], 4),
    round(res$se, 4),
    round(res$zval, 3),
    ifelse(res$pval < 0.001, "< 0.001", round(res$pval, 4)),
    paste0("[", round(res$ci.lb, 4), ",  ", round(res$ci.ub, 4), "]"),
    paste0("[", round(exp(res$ci.lb), 3), ",  ", round(exp(res$ci.ub), 3), "]"),
    paste0("[", round(pi_lb, 3), ",  ", round(pi_ub, 3), "]"),
    round(res$QE, 3),
    as.character(res$k - 1),
    ifelse(res$QEp < 0.001, "< 0.001", round(res$QEp, 4)),
    paste0(round(res$I2, 2), "%  — ",
           ifelse(res$I2 < 25, "low",
           ifelse(res$I2 < 50, "moderate",
           ifelse(res$I2 < 75, "substantial", "considerable"))),
           " heterogeneity"),
    round(res$H2, 3),
    round(res$tau2, 4),
    round(sqrt(res$tau2), 4)
  )
) |>
  gt(groupname_col = "Group") |>
  tab_header(
    title    = md("**Random-Effects Model: Specification & Output (REML)**"),
    subtitle = md("BCG Vaccine vs Control")
  ) |>
  tab_style(
    style     = list(cell_fill(color = "#EAF4FB"),
                     cell_text(weight = "bold")),
    locations = cells_row_groups()
  ) |>
  tab_style(
    style     = list(cell_text(color = "#C0392B", weight = "bold"),
                     cell_fill(color = "#FFF0EE")),
    locations = cells_body(rows = Parameter %in%
                             c("95% CI — RR", "95% Prediction interval — RR"))
  ) |>
  tab_style(
    style     = cell_text(weight = "bold"),
    locations = cells_body(rows = Parameter == "I²")
  ) |>
  tab_style(
    style     = cell_text(style = "italic", color = "grey50", size = px(12)),
    locations = cells_body(rows = Parameter == "Rationale")
  ) |>
  tab_style(
    style     = list(cell_text(weight = "bold"),
                     cell_fill(color = "#F2F6FB")),
    locations = cells_column_labels()
  ) |>
  cols_label(Parameter = "Parameter", Value = "Estimate / Value") |>
  opt_align_table_header(align = "left") |>
  tab_source_note(
    source_note = md(
      "REML = Restricted Maximum Likelihood · I² = proportion of variance due to between-study heterogeneity · τ² = between-study variance · PI = 95% prediction interval"
    )
  ) |>
  tab_options(
    table_body.hlines.color = "transparent",
    table.width = pct(100)
  ) |>
  cols_align(align = "left", columns = everything())

Parameter	Estimate / Value
Random-Effects Model: Specification & Output (REML)
BCG Vaccine vs Control
Model Specification & Rationale
Model type	Random-effects
Estimator	REML (Restricted Maximum Likelihood)
Number of studies (k)	13
Rationale	Trials span different populations, BCG strains, climates & years — single common effect implausible; RE models τ²
Pooled Estimate
Pooled log RR (β)	-0.7145
Standard error (SE)	0.1798
z value	-3.974
p-value (H₀: β = 0)	< 0.001
95% CI — log RR	[-1.0669, -0.3622]
95% CI — RR	[0.344, 0.696]
95% Prediction interval — RR	[0.155, 1.549]
Heterogeneity
Q statistic	152.233
Degrees of freedom (df)	12
p-value (Q test)	< 0.001
I²	92.22% — considerable heterogeneity
H²	12.856
τ²	0.3132
τ	0.5597
REML = Restricted Maximum Likelihood · I² = proportion of variance due to between-study heterogeneity · τ² = between-study variance · PI = 95% prediction interval

4 Forest Plot

wts <- weights(res)

tabletext <- cbind(
  c("Study",
    dat$slab,
    NA,
    "Pooled (RE)"),
  c("Lat (°)",
    as.character(dat$ablat),
    NA,
    NA),
  c("n (Vax)",
    formatC(dat$n_treat, format = "d", big.mark = ","),
    NA,
    NA),
  c("n (Ctrl)",
    formatC(dat$n_ctrl,  format = "d", big.mark = ","),
    NA,
    NA),
  c("Wt (%)",
    formatC(wts, digits = 1, format = "f"),
    NA,
    NA),
  c("RR [95% CI]",
    sprintf("%.2f [%.2f, %.2f]", dat$RR, dat$RR_lo, dat$RR_hi),
    NA,
    sprintf("%.2f [%.2f, %.2f]",
            exp(res$beta), exp(res$ci.lb), exp(res$ci.ub)))
)

mean_v  <- c(NA, dat$RR,    NA, exp(res$beta))
lower_v <- c(NA, dat$RR_lo, NA, exp(res$ci.lb))
upper_v <- c(NA, dat$RR_hi, NA, exp(res$ci.ub))

box_sizes <- c(
  NA,
  0.25 + 0.75 * (wts - min(wts)) / (max(wts) - min(wts)),
  NA,
  NA
)

forestplot(
  labeltext   = tabletext,
  mean        = mean_v,
  lower       = lower_v,
  upper       = upper_v,
  is.summary  = c(TRUE, rep(FALSE, nrow(dat)), FALSE, TRUE),
  boxsize     = box_sizes,
  graph.pos   = 5,
  xlog        = TRUE,
  xticks      = c(0.1, 0.25, 0.5, 1, 2, 4),
  zero        = 1,
  xlab        = expression(
    "Risk Ratio (log scale)" %~~% "  \u25c4 Favours vaccine  |  Favours control \u25ba"
  ),
  hrzl_lines = list(
    "1"  = gpar(lwd = 1.5, col = "black"),   # top border
    "2"  = gpar(lwd = 1.0, col = "black"),   # below header
    "15" = gpar(lwd = 0.8, col = "grey60"),  # above spacer row
    "16" = gpar(lwd = 1.5, col = "black")    # bottom border
  ),
  col = fpColors(
    box       = "#2C6FAC",
    line      = "#2C6FAC",
    summary   = "#C0392B",
    hrz_lines = "grey80",
    zero      = "grey40"
  ),
  txt_gp = fpTxtGp(
    label   = gpar(fontfamily = "sans", cex = 0.82),
    ticks   = gpar(fontfamily = "sans", cex = 0.80),
    xlab    = gpar(fontfamily = "sans", cex = 0.85),
    title   = gpar(fontfamily = "sans", cex = 1.0, fontface = "bold"),
    summary = gpar(fontfamily = "sans", cex = 0.88, fontface = "bold",
                   col = "#C0392B")
  ),
  shapes_gp = fpShapesGp(
    default = gpar(lineend = "square", linejoin = "mitre")
  ),
  addpred       = TRUE,
  clip          = c(0.05, 6),
  lwd.ci        = 1.2,
  lwd.zero      = 0.8,
  lwd.xaxis     = 0.8,
  ci.vertices        = TRUE,
  ci.vertices.height = 0.12,
  lineheight    = unit(7, "mm"),
  colgap        = unit(3, "mm"),
  graphwidth    = unit(7, "cm"),
  title = sprintf(
    "BCG Vaccine vs Control — Random-Effects Meta-Analysis\nPooled RR = %.2f  [%.2f, %.2f]   I\u00b2 = %.1f%%   \u03c4\u00b2 = %.4f",
    exp(res$beta), exp(res$ci.lb), exp(res$ci.ub), res$I2, res$tau2
  )
)

Figure 1. Forest plot of BCG vaccine trials (forestplot package). Square size reflects study weight. Diamond = pooled estimate with 95% CI; wider bar = 95% prediction interval.

5 Heterogeneity

Heterogeneity across trials was assessed using Cochran’s Q test, I², and the between-study variance τ².

tibble(
  Measure        = c("Q statistic", "df", "p-value", "I²", "H²", "τ²", "τ"),
  Value          = c(
    round(res$QE, 2),
    res$k - 1,
    ifelse(res$QEp < 0.001, "< 0.001", round(res$QEp, 4)),
    paste0(round(res$I2, 1), "%"),
    round(res$H2, 3),
    round(res$tau2, 4),
    round(sqrt(res$tau2), 4)
  ),
  Interpretation = c(
    "Test statistic for heterogeneity",
    "Number of studies minus 1",
    "Evidence of significant heterogeneity",
    paste0(round(res$I2, 1),
           "% of variance due to between-study differences — considerable"),
    "Ratio of total to sampling variance",
    "Between-study variance on log-RR scale",
    "Between-study SD on log-RR scale"
  )
) |>
  gt() |>
  tab_header(
    title    = md("**Heterogeneity Statistics**"),
    subtitle = md("Random-Effects Model (REML)")
  ) |>
  tab_style(
    style     = list(cell_text(weight = "bold"),
                     cell_fill(color = "#FFF3CD")),
    locations = cells_body(rows = Measure == "I²")
  ) |>
  tab_style(
    style     = list(cell_text(weight = "bold"),
                     cell_fill(color = "#F2F6FB")),
    locations = cells_column_labels()
  ) |>
  tab_style(
    style     = cell_fill(color = "#EAF4FB"),
    locations = cells_body(rows = seq(1, 7, 2))
  ) |>
  opt_align_table_header(align = "left") |>
  tab_options(
    table_body.hlines.color = "transparent",
    table.width = pct(100)
  ) |>
  cols_align(align = "left", columns = everything()) |>
  cols_align(align = "center", columns = Value)

Measure	Value	Interpretation
Heterogeneity Statistics
Random-Effects Model (REML)
Q statistic	152.23	Test statistic for heterogeneity
df	12	Number of studies minus 1
p-value	< 0.001	Evidence of significant heterogeneity
I²	92.2%	92.2% of variance due to between-study differences — considerable
H²	12.856	Ratio of total to sampling variance
τ²	0.3132	Between-study variance on log-RR scale
τ	0.5597	Between-study SD on log-RR scale

Interpretation: I² = 92.2% indicates considerable heterogeneity. The large majority of variation across studies reflects genuine differences in true effects rather than sampling error, justifying the use of a random-effects model and motivating the moderator analysis below.

6 Publication Bias

6.1 Funnel Plot

funnel_df <- dat |> mutate(sei = sqrt(vi))

se_seq <- seq(0, max(funnel_df$sei) * 1.1, length.out = 100)
ci_lo  <- res$beta - 1.96 * se_seq
ci_hi  <- res$beta + 1.96 * se_seq

funnel_poly <- tibble(
  x = c(exp(ci_lo), rev(exp(ci_hi))),
  y = c(se_seq,     rev(se_seq))
)

ggplot(funnel_df, aes(x = RR, y = sqrt(vi))) +
  geom_polygon(
    data = funnel_poly, aes(x = x, y = y),
    fill = "#EAF4FB", colour = NA, alpha = 0.7
  ) +
  geom_path(
    data = tibble(x = exp(ci_lo), y = se_seq),
    aes(x = x, y = y), linetype = "dashed", colour = "grey60"
  ) +
  geom_path(
    data = tibble(x = exp(ci_hi), y = se_seq),
    aes(x = x, y = y), linetype = "dashed", colour = "grey60"
  ) +
  geom_vline(xintercept = exp(res$beta), colour = clr_pooled,
             linewidth = 0.8, linetype = "dotted") +
  geom_vline(xintercept = 1, colour = "grey40",
             linewidth = 0.5, linetype = "dashed") +
  geom_point(colour = clr_primary, size = 3, alpha = 0.85) +
  scale_x_log10(breaks = c(0.1, 0.25, 0.5, 1, 2, 4)) +
  scale_y_reverse(
    limits = c(max(funnel_df$sei) * 1.12, 0),
    expand = c(0, 0)
  ) +
  labs(
    title    = "Funnel Plot — BCG Trials",
    subtitle = "Assessing small-study effects and publication bias",
    x        = "Risk Ratio (log scale)",
    y        = "Standard Error",
    caption  = "Red dotted line = pooled estimate. Shaded region = 95% pseudo-CI."
  ) +
  theme_meta()

Figure 2. Funnel plot for publication bias assessment. Dashed lines = 95% pseudo-confidence region. Asymmetry may suggest selective reporting of small studies with positive results.

6.2 Egger’s Test

The funnel plot is complemented by Egger’s test, which formally tests for funnel plot asymmetry using a regression of the standardised effect on its precision.

A p-value < 0.05 suggests significant asymmetry — possible publication bias or small-study effects.
A p-value ≥ 0.05 suggests no significant asymmetry at the 5% level.

egg <- regtest(res)

tibble(
  Statistic      = c("z statistic", "p-value", "Interpretation"),
  Value          = c(
    round(egg$zval, 3),
    round(egg$pval, 4),
    ifelse(
      egg$pval < 0.05,
      "Significant asymmetry detected — possible publication bias",
      "No significant asymmetry detected at the 5% level"
    )
  )
) |>
  gt() |>
  tab_header(
    title    = md("**Egger's Test for Funnel Plot Asymmetry**"),
    subtitle = "Regression test for publication bias"
  ) |>
  tab_style(
    style     = cell_text(
      weight = "bold",
      color  = ifelse(egg$pval < 0.05, "#C0392B", "#27AE60")
    ),
    locations = cells_body(rows = Statistic == "Interpretation")
  ) |>
  tab_style(
    style     = list(cell_text(weight = "bold"),
                     cell_fill(color = "#F2F6FB")),
    locations = cells_column_labels()
  ) |>
  tab_footnote(
    footnote  = "A non-significant p-value (p > 0.05) suggests no evidence of publication bias.",
    locations = cells_column_labels(columns = Value)
  ) |>
  opt_align_table_header(align = "left") |>
  tab_options(
    table_body.hlines.color = "transparent",
    table.width = pct(60)
  ) |>
  cols_align(align = "left", columns = everything())

Statistic	Value¹
Egger’s Test for Funnel Plot Asymmetry
Regression test for publication bias
z statistic	-0.803
p-value	0.4218
Interpretation	No significant asymmetry detected at the 5% level
¹ A non-significant p-value (p > 0.05) suggests no evidence of publication bias.

7 Meta-Regression: Latitude as Moderator

Absolute latitude is hypothesised to be a proxy for background exposure to environmental mycobacteria. Trials at higher latitudes (cooler climates) consistently show stronger BCG protection, likely because populations at lower latitudes have pre-existing immunity from environmental mycobacteria that diminishes the relative benefit of vaccination.

7.1 Model

res_mod <- rma(
  yi     = yi,
  vi     = vi,
  mods   = ~ ablat,
  data   = dat,
  method = "REML"
)

tibble(
  Parameter = c(
    "Intercept (log RR at 0°)",
    "Intercept — RR",
    "Slope (per degree latitude — log RR)",
    "Slope — RR change per degree",
    "R² (latitude)",
    "Residual I²",
    "Residual τ²"
  ),
  Estimate = c(
    round(coef(res_mod)[1], 4),
    round(exp(coef(res_mod)[1]), 3),
    round(coef(res_mod)[2], 5),
    paste0(round((1 - exp(coef(res_mod)["ablat"])) * 100, 1), "%"),
    paste0(round(res_mod$R2, 1), "%"),
    paste0(round(res_mod$I2, 1), "%"),
    round(res_mod$tau2, 4)
  ),
  Interpretation = c(
    "Log RR at equator (latitude = 0°)",
    "Near-null protection at equatorial baseline",
    "Each additional degree increases log-RR reduction",
    "Further reduction in RR per additional degree of latitude",
    "Between-study variance explained by latitude",
    "Remaining unexplained heterogeneity",
    "Residual between-study variance after adjusting for latitude"
  )
) |>
  gt() |>
  tab_header(
    title    = md("**Meta-Regression Results**"),
    subtitle = md("Moderator: Absolute Latitude (degrees)")
  ) |>
  tab_style(
    style     = list(cell_text(weight = "bold", color = "#1F4E79"),
                     cell_fill(color = "#D6E8F7")),
    locations = cells_body(rows = Parameter == "R² (latitude)")
  ) |>
  tab_style(
    style     = list(cell_text(weight = "bold"),
                     cell_fill(color = "#F2F6FB")),
    locations = cells_column_labels()
  ) |>
  tab_style(
    style     = cell_fill(color = "#EAF4FB"),
    locations = cells_body(rows = c(1, 3, 5, 7))
  ) |>
  tab_source_note(
    source_note = md(
      "REML estimation. R² reflects variance in true effects explained by latitude."
    )
  ) |>
  opt_align_table_header(align = "left") |>
  tab_options(
    table_body.hlines.color = "transparent",
    table.width = pct(100)
  ) |>
  cols_align(align = "left",   columns = c(Parameter, Interpretation)) |>
  cols_align(align = "center", columns = Estimate)

Parameter	Estimate	Interpretation
Meta-Regression Results
Moderator: Absolute Latitude (degrees)
Intercept (log RR at 0°)	0.2515	Log RR at equator (latitude = 0°)
Intercept — RR	1.286	Near-null protection at equatorial baseline
Slope (per degree latitude — log RR)	-0.0291	Each additional degree increases log-RR reduction
Slope — RR change per degree	2.9%	Further reduction in RR per additional degree of latitude
R² (latitude)	75.6%	Between-study variance explained by latitude
Residual I²	68.4%	Remaining unexplained heterogeneity
Residual τ²	0.0764	Residual between-study variance after adjusting for latitude
REML estimation. R² reflects variance in true effects explained by latitude.

7.2 Meta-Regression Plot

lat_seq  <- seq(5, 65, by = 1)
pred_mod <- predict(res_mod, newmods = lat_seq, transf = exp)

pred_df <- tibble(
  ablat = lat_seq,
  fit   = pred_mod$pred,
  ci_lo = pred_mod$ci.lb,
  ci_hi = pred_mod$ci.ub
)

dat2 <- dat |>
  mutate(
    wt   = weights(res_mod),
    size = 2 + 5 * (wt - min(wt)) / (max(wt) - min(wt))
  )

ggplot() +
  geom_ribbon(
    data = pred_df,
    aes(x = ablat, ymin = ci_lo, ymax = ci_hi),
    fill = clr_fill, alpha = 0.7
  ) +
  geom_line(
    data = pred_df, aes(x = ablat, y = fit),
    colour = clr_pooled, linewidth = 1
  ) +
  geom_hline(yintercept = 1, linetype = "dashed", colour = "grey50") +
  geom_point(
    data   = dat2,
    aes(x = ablat, y = RR, size = size),
    colour = clr_primary, alpha = 0.85
  ) +
  scale_y_log10(
    limits = c(0.05, 3),
    breaks = c(0.1, 0.25, 0.5, 1, 2),
    labels = c("0.10", "0.25", "0.50", "1.00", "2.00")
  ) +
  scale_size_identity() +
  scale_x_continuous(breaks = seq(10, 60, by = 10)) +
  annotate(
    "text",
    x = 50, y = 2.3,
    label = sprintf("Slope = %.4f log-RR per degree\nR\u00b2 = %.1f%%",
                    coef(res_mod)["ablat"], res_mod$R2),
    hjust = 0, size = 3.5, colour = clr_pooled
  ) +
  labs(
    title    = "Meta-Regression: Latitude \u00d7 BCG Efficacy",
    subtitle = "Weighted regression; point size proportional to study weight",
    x        = "Absolute Latitude (degrees)",
    y        = "Risk Ratio (log scale)",
    caption  = "Shaded band = 95% CI. Dashed line = no effect (RR = 1)."
  ) +
  theme_meta()

Figure 3. Meta-regression of BCG efficacy on absolute latitude. Point size reflects study weight. Shaded band = 95% CI around the regression line.

Interpretation: Each additional degree of absolute latitude is associated with a 2.9% further reduction in the risk ratio. Latitude explains approximately 75.6% of the between-study variance, with 24.4% of heterogeneity remaining unexplained by this single moderator.

8 Sensitivity Analysis: Leave-One-Out

The robustness of the pooled estimate is assessed by sequentially omitting each study and re-fitting the model on the remaining 12 trials.

loo <- leave1out(res, transf = exp)

loo_df <- tibble(
  study = dat$slab,
  est   = loo$estimate,
  ci_lo = loo$ci.lb,
  ci_hi = loo$ci.ub,
  i2    = loo$I2
) |>
  mutate(study = factor(study, levels = rev(study)))

ggplot(loo_df, aes(y = study, x = est, xmin = ci_lo, xmax = ci_hi)) +
  geom_vline(
    xintercept = exp(res$beta),
    linetype   = "dashed", colour = clr_pooled, linewidth = 0.7
  ) +
  geom_errorbarh(height = 0.25, colour = clr_mid, linewidth = 0.5) +
  geom_point(aes(colour = i2), size = 3, shape = 18) +
  scale_x_log10(breaks = c(0.3, 0.4, 0.5, 0.6, 0.7, 0.8)) +
  scale_colour_gradient(low = "#2ECC71", high = "#E74C3C",
                        name = "I\u00b2 (%)") +
  labs(
    title    = "Leave-One-Out Sensitivity Analysis",
    subtitle = "Red dashed line = original pooled estimate",
    x        = "Pooled RR (log scale)",
    y        = NULL,
    caption  = "Point colour indicates residual I\u00b2 after study removal."
  ) +
  theme_meta() +
  theme(axis.text.y = element_text(size = 9))

Figure 4. Leave-one-out sensitivity analysis. Each row shows the pooled RR when the named study is excluded. Point colour indicates the residual I² after removal.

loo_df |>
  mutate(study = as.character(study)) |>
  arrange(est) |>
  gt() |>
  tab_header(
    title    = md("**Leave-One-Out Results**"),
    subtitle = "Pooled RR and residual heterogeneity after each study omission"
  ) |>
  fmt_number(columns = c(est, ci_lo, ci_hi, i2), decimals = 3) |>
  cols_label(
    study = "Study Omitted",
    est   = "Pooled RR",
    ci_lo = "95% CI Low",
    ci_hi = "95% CI High",
    i2    = "Residual I² (%)"
  ) |>
  tab_style(
    style     = cell_fill(color = "#EAF4FB"),
    locations = cells_body(rows = seq(1, nrow(loo_df), 2))
  ) |>
  tab_style(
    style     = list(cell_text(weight = "bold"),
                     cell_fill(color = "#F2F6FB")),
    locations = cells_column_labels()
  ) |>
  tab_footnote(
    footnote  = md(paste0(
      "Red dashed line in Figure 4 = original pooled RR = **",
      round(exp(res$beta), 3), "** [",
      round(exp(res$ci.lb), 3), ", ",
      round(exp(res$ci.ub), 3), "]."
    )),
    locations = cells_column_labels(columns = est)
  ) |>
  opt_align_table_header(align = "left") |>
  tab_options(
    table_body.hlines.color = "transparent",
    table.width = pct(100)
  ) |>
  cols_align(align = "center", columns = everything()) |>
  cols_align(align = "left",   columns = study)

Study Omitted	Pooled RR¹	95% CI Low	95% CI High	Residual I² (%)
Leave-One-Out Results
Pooled RR and residual heterogeneity after each study omission
TPT Madras, 1980	0.452	0.317	0.643	87.031
Comstock et al, 1976	0.460	0.319	0.661	92.344
Frimodt-Moller et al, 1973	0.466	0.320	0.678	92.763
Comstock & Webster, 1969	0.468	0.327	0.668	92.678
Comstock et al, 1974	0.469	0.319	0.688	91.811
Coetzee & Berjak, 1968	0.477	0.324	0.701	93.213
Stein & Aronson, 1953	0.491	0.332	0.727	90.912
Aronson, 1948	0.493	0.340	0.716	93.226
Rosenthal et al, 1960	0.504	0.350	0.725	92.935
Vandiviere et al, 1973	0.519	0.365	0.740	92.278
Ferguson & Simes, 1949	0.520	0.365	0.741	92.254
Rosenthal et al, 1961	0.520	0.363	0.747	92.232
Hart & Sutherland, 1977	0.533	0.377	0.754	90.412
¹ Red dashed line in Figure 4 = original pooled RR = 0.489 [0.344, 0.696].

9 Summary

tibble(
  Analysis = c(
    "Primary RE (REML)",
    "With latitude covariate",
    "Leave-one-out range"
  ),
  `Pooled RR (95% CI)` = c(
    sprintf("%.2f [%.2f, %.2f]",
            exp(res$beta), exp(res$ci.lb), exp(res$ci.ub)),
    sprintf("Intercept RR: %.2f; Slope: %.4f/°",
            exp(coef(res_mod)[1]), coef(res_mod)[2]),
    sprintf("%.3f to %.3f", min(loo_df$est), max(loo_df$est))
  ),
  `I² (%)` = c(
    round(res$I2, 1),
    round(res_mod$I2, 1),
    NA_real_
  ),
  `τ²` = c(
    round(res$tau2, 4),
    round(res_mod$tau2, 4),
    NA_real_
  ),
  `p (Heterogeneity)` = c(
    ifelse(res$QEp < 0.001, "< 0.001", round(res$QEp, 4)),
    ifelse(res_mod$QEp < 0.001, "< 0.001", round(res_mod$QEp, 4)),
    "—"
  )
) |>
  gt() |>
  tab_header(
    title    = md("**Summary of All Models and Sensitivity Analyses**"),
    subtitle = md("BCG Vaccine Meta-Analysis")
  ) |>
  sub_missing(missing_text = "—") |>
  tab_style(
    style     = list(cell_fill(color = "#EAF4FB"),
                     cell_text(weight = "bold")),
    locations = cells_body(rows = Analysis == "Primary RE (REML)")
  ) |>
  tab_style(
    style     = list(cell_text(weight = "bold"),
                     cell_fill(color = "#F2F6FB")),
    locations = cells_column_labels()
  ) |>
  tab_source_note(
    source_note = md(
      "RE = Random-effects · REML = Restricted Maximum Likelihood · I² = between-study heterogeneity"
    )
  ) |>
  opt_align_table_header(align = "left") |>
  tab_options(
    table_body.hlines.color = "transparent",
    table.width = pct(100)
  ) |>
  cols_align(align = "center", columns = everything()) |>
  cols_align(align = "left",   columns = Analysis)

Analysis	Pooled RR (95% CI)	I² (%)	τ²	p (Heterogeneity)
Summary of All Models and Sensitivity Analyses
BCG Vaccine Meta-Analysis
Primary RE (REML)	0.49 [0.34, 0.70]	92.2	0.3132	< 0.001
With latitude covariate	Intercept RR: 1.29; Slope: -0.0291/°	68.4	0.0764	0.0012
Leave-one-out range	0.452 to 0.533	—	—	—
RE = Random-effects · REML = Restricted Maximum Likelihood · I² = between-study heterogeneity

10 Conclusion

The meta-analysis of 13 randomised controlled trials provides strong evidence that BCG vaccination significantly reduces the risk of tuberculosis infection. The pooled risk ratio of 0.49 (95% CI: 0.34 to 0.7) indicates a meaningful protective effect of the vaccine compared to no vaccination.

Substantial heterogeneity (I² ≈ 92%) was detected across trials, which is expected given the diversity of study populations, settings, and BCG strains. Meta-regression revealed that absolute latitude explains approximately 76% of this between-study variance, with trials at higher latitudes demonstrating stronger protection. This pattern is consistent with the hypothesis that pre-existing sensitisation to environmental mycobacteria at lower latitudes attenuates vaccine efficacy.

Funnel plot inspection and Egger’s test did not provide strong evidence of publication bias (p = 0.422). Leave-one-out analyses confirmed that the pooled estimate is robust and not unduly influenced by any single trial, with the estimate ranging from 0.452 to 0.533 across all omissions.

In conclusion, BCG vaccination confers significant protection against TB, though the magnitude of benefit varies substantially by geographic context. These findings support continued use of BCG in high-latitude settings and underscore the importance of context-specific evaluation in global vaccination policy.

11 References

Colditz GA, Brewer TF, Berkey CS, et al. (1994). Efficacy of BCG vaccine in the prevention of tuberculosis. JAMA, 271(9), 698–702. https://doi.org/10.1001/jama.1994.03510330076038

Viechtbauer W (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48. https://doi.org/10.18637/jss.v036.i03

Higgins JPT, Thompson SG, Deeks JJ, Altman DG (2003). Measuring inconsistency in meta-analyses. BMJ, 327, 557–560.

Egger M, Davey Smith G, Schneider M, Minder C (1997). Bias in meta-analysis detected by a simple, graphical test. BMJ, 315, 629–634.

Analysis conducted in R using the metafor, forestplot, and tidyverse packages. Dataset: Colditz GA et al. (1994). JAMA, 271(9), 698–702.