if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")
pacman::p_load(
  metafor,
  tidyverse,
  gt,
  forestplot,
  patchwork
)

# ── Global ggplot theme ───────────────────────────────────────────────────────
theme_meta <- function() {
  theme_minimal(base_size = 12) +
    theme(
      plot.title       = element_text(face = "bold", size = 14,
                                      margin = margin(b = 6)),
      plot.subtitle    = element_text(colour = "grey40", size = 11,
                                      margin = margin(b = 12)),
      plot.caption     = element_text(colour = "grey55", size = 9, hjust = 0),
      axis.title       = element_text(size = 11),
      axis.text        = element_text(colour = "grey30"),
      panel.grid.minor = element_blank(),
      panel.grid.major = element_line(colour = "grey92"),
      strip.text       = element_text(face = "bold"),
      legend.position  = "bottom"
    )
}

clr_primary <- "#2C6FAC"
clr_pooled  <- "#C0392B"
clr_mid     <- "#7F8C8D"
clr_fill    <- "#D6E8F7"

1 Introduction

1.1 Background

Tuberculosis (TB) remains one of the leading infectious causes of death globally, disproportionately affecting populations in low- and middle-income countries. The BCG (Bacillus Calmette-Guérin) vaccine has been used for decades as the primary preventive intervention against TB, yet its efficacy has been shown to vary substantially across geographic and demographic settings. Understanding the sources of this variability is essential for informing vaccination policy and targeting interventions effectively.

A leading explanation for the geographic pattern centres on exposure to environmental mycobacteria: populations in tropical regions encounter these organisms more frequently, potentially inducing partial cross-immunity that diminishes the relative benefit of BCG vaccination.

1.2 Objectives

This report presents a full random-effects meta-analysis of the BCG vaccine using the classic 13-trial dataset assembled by Colditz et al. (1994), available in the metafor package (dat.bcg). Specifically, the objectives are to:

  1. Summarise individual study effect sizes for TB infection risk in vaccinated versus control arms
  2. Pool evidence using a random-effects model (REML) and quantify between-study heterogeneity
  3. Assess potential publication bias using a funnel plot and Egger’s test
  4. Examine absolute latitude as a moderator of BCG efficacy via meta-regression
  5. Evaluate robustness of the pooled estimate using leave-one-out sensitivity analysis

1.3 Dataset

The analysis uses the dat.bcg dataset from the metafor package, comprising 13 randomised controlled trials of BCG vaccination. The primary outcome is the risk ratio (RR) for tuberculosis infection, comparing vaccinated participants to unvaccinated controls. Key variables include trial-level case counts, absolute latitude of the trial site, year of publication, and method of treatment allocation.

1.4 Analytic Approach

Effect sizes are computed as log risk ratios using escalc(). A random-effects model is fitted using restricted maximum likelihood (REML) estimation, appropriate given the expected between-study heterogeneity arising from differences in populations, TB exposure, BCG strains, and climate. Heterogeneity is quantified using Cochran’s Q, I², and τ². A funnel plot and Egger’s test assess publication bias. A meta-regression model is fitted with absolute latitude as a continuous moderator. Finally, a leave-one-out analysis examines the influence of individual studies on the pooled estimate.


2 Data

2.1 Loading and Computing Effect Sizes

data("dat.bcg", package = "metafor")
dat <- dat.bcg

dat <- escalc(
  measure = "RR",
  ai = tpos, bi = tneg,
  ci = cpos, di = cneg,
  data = dat
)

dat <- dat |>
  mutate(
    slab    = paste(author, year, sep = ", "),
    RR      = exp(yi),
    RR_lo   = exp(yi - 1.96 * sqrt(vi)),
    RR_hi   = exp(yi + 1.96 * sqrt(vi)),
    n_treat = tpos + tneg,
    n_ctrl  = cpos + cneg
  )

2.2 Study Characteristics

dat |>
  select(author, year, ablat, alloc, n_treat, n_ctrl, RR, RR_lo, RR_hi) |>
  mutate(across(RR:RR_hi, \(x) round(x, 3))) |>
  rename(
    Author         = author,
    Year           = year,
    `Latitude (°)` = ablat,
    Allocation     = alloc,
    `n (Treated)`  = n_treat,
    `n (Control)`  = n_ctrl,
    `RR`           = RR,
    `95% CI Low`   = RR_lo,
    `95% CI High`  = RR_hi
  ) |>
  gt() |>
  tab_header(
    title    = md("**Study Characteristics and Individual Effect Sizes**"),
    subtitle = md("BCG Vaccine Randomised Controlled Trials")
  ) |>
  tab_source_note(
    source_note = md(
      "Source: Colditz et al. (1994). *JAMA*, 271(9), 698–702."
    )
  ) |>
  tab_style(
    style     = cell_fill(color = "#EAF4FB"),
    locations = cells_body(rows = seq(1, nrow(dat), 2))
  ) |>
  tab_style(
    style     = list(cell_text(weight = "bold"),
                     cell_fill(color = "#F2F6FB")),
    locations = cells_column_labels()
  ) |>
  opt_align_table_header(align = "left") |>
  tab_options(
    table_body.hlines.color = "transparent",
    table.width = pct(100)
  ) |>
  cols_align(align = "center", columns = everything()) |>
  cols_align(align = "left",   columns = c(Author, Allocation)) |>
  fmt_number(columns = c(RR, `95% CI Low`, `95% CI High`), decimals = 3)
Study Characteristics and Individual Effect Sizes
BCG Vaccine Randomised Controlled Trials
Author Year Latitude (°) Allocation n (Treated) n (Control) RR 95% CI Low 95% CI High
Aronson 1948 44 random 123 139 0.411 0.134 1.257
Ferguson & Simes 1949 55 random 306 303 0.205 0.086 0.486
Rosenthal et al 1960 42 random 231 220 0.260 0.073 0.919
Hart & Sutherland 1977 52 random 13598 12867 0.237 0.179 0.312
Frimodt-Moller et al 1973 13 alternate 5069 5808 0.804 0.516 1.254
Stein & Aronson 1953 44 alternate 1541 1451 0.456 0.387 0.536
Vandiviere et al 1973 19 random 2545 629 0.198 0.078 0.499
TPT Madras 1980 13 random 88391 88391 1.012 0.895 1.145
Coetzee & Berjak 1968 27 random 7499 7277 0.625 0.393 0.996
Rosenthal et al 1961 42 systematic 1716 1665 0.254 0.149 0.431
Comstock et al 1974 18 systematic 50634 27338 0.712 0.573 0.886
Comstock & Webster 1969 33 systematic 2498 2341 1.562 0.374 6.529
Comstock et al 1976 33 systematic 16913 17854 0.983 0.582 1.659
Source: Colditz et al. (1994). JAMA, 271(9), 698–702.

3 Random-Effects Meta-Analysis

3.1 Model Specification and Output

Why random-effects? The 13 BCG trials span different populations, geographic settings, BCG strains, and years. It is implausible that they estimate a single common true effect. A random-effects model assumes true effects are drawn from a distribution with mean μ and variance τ², explicitly modelling between-study heterogeneity. REML estimation produces unbiased estimates of τ².

res <- rma(yi = yi, vi = vi, data = dat, method = "REML")
pi_lb <- exp(res$beta - 1.96 * sqrt(res$se^2 + res$tau2))
pi_ub <- exp(res$beta + 1.96 * sqrt(res$se^2 + res$tau2))

tibble(
  Group = c(
    rep("Model Specification & Rationale", 4),
    rep("Pooled Estimate", 7),
    rep("Heterogeneity", 7)
  ),
  Parameter = c(
    "Model type", "Estimator", "Number of studies (k)", "Rationale",
    "Pooled log RR (β)", "Standard error (SE)", "z value",
    "p-value (H₀: β = 0)", "95% CI — log RR", "95% CI — RR",
    "95% Prediction interval — RR",
    "Q statistic", "Degrees of freedom (df)", "p-value (Q test)",
    "I²", "H²", "τ²", "τ"
  ),
  Value = c(
    "Random-effects",
    "REML (Restricted Maximum Likelihood)",
    as.character(res$k),
    "Trials span different populations, BCG strains, climates & years — single common effect implausible; RE models τ²",
    round(res$beta[[1]], 4),
    round(res$se, 4),
    round(res$zval, 3),
    ifelse(res$pval < 0.001, "< 0.001", round(res$pval, 4)),
    paste0("[", round(res$ci.lb, 4), ",  ", round(res$ci.ub, 4), "]"),
    paste0("[", round(exp(res$ci.lb), 3), ",  ", round(exp(res$ci.ub), 3), "]"),
    paste0("[", round(pi_lb, 3), ",  ", round(pi_ub, 3), "]"),
    round(res$QE, 3),
    as.character(res$k - 1),
    ifelse(res$QEp < 0.001, "< 0.001", round(res$QEp, 4)),
    paste0(round(res$I2, 2), "%  — ",
           ifelse(res$I2 < 25, "low",
           ifelse(res$I2 < 50, "moderate",
           ifelse(res$I2 < 75, "substantial", "considerable"))),
           " heterogeneity"),
    round(res$H2, 3),
    round(res$tau2, 4),
    round(sqrt(res$tau2), 4)
  )
) |>
  gt(groupname_col = "Group") |>
  tab_header(
    title    = md("**Random-Effects Model: Specification & Output (REML)**"),
    subtitle = md("BCG Vaccine vs Control")
  ) |>
  tab_style(
    style     = list(cell_fill(color = "#EAF4FB"),
                     cell_text(weight = "bold")),
    locations = cells_row_groups()
  ) |>
  tab_style(
    style     = list(cell_text(color = "#C0392B", weight = "bold"),
                     cell_fill(color = "#FFF0EE")),
    locations = cells_body(rows = Parameter %in%
                             c("95% CI — RR", "95% Prediction interval — RR"))
  ) |>
  tab_style(
    style     = cell_text(weight = "bold"),
    locations = cells_body(rows = Parameter == "I²")
  ) |>
  tab_style(
    style     = cell_text(style = "italic", color = "grey50", size = px(12)),
    locations = cells_body(rows = Parameter == "Rationale")
  ) |>
  tab_style(
    style     = list(cell_text(weight = "bold"),
                     cell_fill(color = "#F2F6FB")),
    locations = cells_column_labels()
  ) |>
  cols_label(Parameter = "Parameter", Value = "Estimate / Value") |>
  opt_align_table_header(align = "left") |>
  tab_source_note(
    source_note = md(
      "REML = Restricted Maximum Likelihood · I² = proportion of variance due to between-study heterogeneity · τ² = between-study variance · PI = 95% prediction interval"
    )
  ) |>
  tab_options(
    table_body.hlines.color = "transparent",
    table.width = pct(100)
  ) |>
  cols_align(align = "left", columns = everything())
Random-Effects Model: Specification & Output (REML)
BCG Vaccine vs Control
Parameter Estimate / Value
Model Specification & Rationale
Model type Random-effects
Estimator REML (Restricted Maximum Likelihood)
Number of studies (k) 13
Rationale Trials span different populations, BCG strains, climates & years — single common effect implausible; RE models τ²
Pooled Estimate
Pooled log RR (β) -0.7145
Standard error (SE) 0.1798
z value -3.974
p-value (H₀: β = 0) < 0.001
95% CI — log RR [-1.0669, -0.3622]
95% CI — RR [0.344, 0.696]
95% Prediction interval — RR [0.155, 1.549]
Heterogeneity
Q statistic 152.233
Degrees of freedom (df) 12
p-value (Q test) < 0.001
92.22% — considerable heterogeneity
12.856
τ² 0.3132
τ 0.5597
REML = Restricted Maximum Likelihood · I² = proportion of variance due to between-study heterogeneity · τ² = between-study variance · PI = 95% prediction interval

4 Forest Plot

wts <- weights(res)

tabletext <- cbind(
  c("Study",
    dat$slab,
    NA,
    "Pooled (RE)"),
  c("Lat (°)",
    as.character(dat$ablat),
    NA,
    NA),
  c("n (Vax)",
    formatC(dat$n_treat, format = "d", big.mark = ","),
    NA,
    NA),
  c("n (Ctrl)",
    formatC(dat$n_ctrl,  format = "d", big.mark = ","),
    NA,
    NA),
  c("Wt (%)",
    formatC(wts, digits = 1, format = "f"),
    NA,
    NA),
  c("RR [95% CI]",
    sprintf("%.2f [%.2f, %.2f]", dat$RR, dat$RR_lo, dat$RR_hi),
    NA,
    sprintf("%.2f [%.2f, %.2f]",
            exp(res$beta), exp(res$ci.lb), exp(res$ci.ub)))
)

mean_v  <- c(NA, dat$RR,    NA, exp(res$beta))
lower_v <- c(NA, dat$RR_lo, NA, exp(res$ci.lb))
upper_v <- c(NA, dat$RR_hi, NA, exp(res$ci.ub))

box_sizes <- c(
  NA,
  0.25 + 0.75 * (wts - min(wts)) / (max(wts) - min(wts)),
  NA,
  NA
)

forestplot(
  labeltext   = tabletext,
  mean        = mean_v,
  lower       = lower_v,
  upper       = upper_v,
  is.summary  = c(TRUE, rep(FALSE, nrow(dat)), FALSE, TRUE),
  boxsize     = box_sizes,
  graph.pos   = 5,
  xlog        = TRUE,
  xticks      = c(0.1, 0.25, 0.5, 1, 2, 4),
  zero        = 1,
  xlab        = expression(
    "Risk Ratio (log scale)" %~~% "  \u25c4 Favours vaccine  |  Favours control \u25ba"
  ),
  hrzl_lines = list(
    "1"  = gpar(lwd = 1.5, col = "black"),   # top border
    "2"  = gpar(lwd = 1.0, col = "black"),   # below header
    "15" = gpar(lwd = 0.8, col = "grey60"),  # above spacer row
    "16" = gpar(lwd = 1.5, col = "black")    # bottom border
  ),
  col = fpColors(
    box       = "#2C6FAC",
    line      = "#2C6FAC",
    summary   = "#C0392B",
    hrz_lines = "grey80",
    zero      = "grey40"
  ),
  txt_gp = fpTxtGp(
    label   = gpar(fontfamily = "sans", cex = 0.82),
    ticks   = gpar(fontfamily = "sans", cex = 0.80),
    xlab    = gpar(fontfamily = "sans", cex = 0.85),
    title   = gpar(fontfamily = "sans", cex = 1.0, fontface = "bold"),
    summary = gpar(fontfamily = "sans", cex = 0.88, fontface = "bold",
                   col = "#C0392B")
  ),
  shapes_gp = fpShapesGp(
    default = gpar(lineend = "square", linejoin = "mitre")
  ),
  addpred       = TRUE,
  clip          = c(0.05, 6),
  lwd.ci        = 1.2,
  lwd.zero      = 0.8,
  lwd.xaxis     = 0.8,
  ci.vertices        = TRUE,
  ci.vertices.height = 0.12,
  lineheight    = unit(7, "mm"),
  colgap        = unit(3, "mm"),
  graphwidth    = unit(7, "cm"),
  title = sprintf(
    "BCG Vaccine vs Control — Random-Effects Meta-Analysis\nPooled RR = %.2f  [%.2f, %.2f]   I\u00b2 = %.1f%%   \u03c4\u00b2 = %.4f",
    exp(res$beta), exp(res$ci.lb), exp(res$ci.ub), res$I2, res$tau2
  )
)
**Figure 1.** Forest plot of BCG vaccine trials (forestplot package). Square size reflects study weight. Diamond = pooled estimate with 95% CI; wider bar = 95% prediction interval.

Figure 1. Forest plot of BCG vaccine trials (forestplot package). Square size reflects study weight. Diamond = pooled estimate with 95% CI; wider bar = 95% prediction interval.


5 Heterogeneity

Heterogeneity across trials was assessed using Cochran’s Q test, I², and the between-study variance τ².

tibble(
  Measure        = c("Q statistic", "df", "p-value", "I²", "H²", "τ²", "τ"),
  Value          = c(
    round(res$QE, 2),
    res$k - 1,
    ifelse(res$QEp < 0.001, "< 0.001", round(res$QEp, 4)),
    paste0(round(res$I2, 1), "%"),
    round(res$H2, 3),
    round(res$tau2, 4),
    round(sqrt(res$tau2), 4)
  ),
  Interpretation = c(
    "Test statistic for heterogeneity",
    "Number of studies minus 1",
    "Evidence of significant heterogeneity",
    paste0(round(res$I2, 1),
           "% of variance due to between-study differences — considerable"),
    "Ratio of total to sampling variance",
    "Between-study variance on log-RR scale",
    "Between-study SD on log-RR scale"
  )
) |>
  gt() |>
  tab_header(
    title    = md("**Heterogeneity Statistics**"),
    subtitle = md("Random-Effects Model (REML)")
  ) |>
  tab_style(
    style     = list(cell_text(weight = "bold"),
                     cell_fill(color = "#FFF3CD")),
    locations = cells_body(rows = Measure == "I²")
  ) |>
  tab_style(
    style     = list(cell_text(weight = "bold"),
                     cell_fill(color = "#F2F6FB")),
    locations = cells_column_labels()
  ) |>
  tab_style(
    style     = cell_fill(color = "#EAF4FB"),
    locations = cells_body(rows = seq(1, 7, 2))
  ) |>
  opt_align_table_header(align = "left") |>
  tab_options(
    table_body.hlines.color = "transparent",
    table.width = pct(100)
  ) |>
  cols_align(align = "left", columns = everything()) |>
  cols_align(align = "center", columns = Value)
Heterogeneity Statistics
Random-Effects Model (REML)
Measure Value Interpretation
Q statistic 152.23 Test statistic for heterogeneity
df 12 Number of studies minus 1
p-value < 0.001 Evidence of significant heterogeneity
92.2% 92.2% of variance due to between-study differences — considerable
12.856 Ratio of total to sampling variance
τ² 0.3132 Between-study variance on log-RR scale
τ 0.5597 Between-study SD on log-RR scale

Interpretation: I² = 92.2% indicates considerable heterogeneity. The large majority of variation across studies reflects genuine differences in true effects rather than sampling error, justifying the use of a random-effects model and motivating the moderator analysis below.


6 Publication Bias

6.1 Funnel Plot

funnel_df <- dat |> mutate(sei = sqrt(vi))

se_seq <- seq(0, max(funnel_df$sei) * 1.1, length.out = 100)
ci_lo  <- res$beta - 1.96 * se_seq
ci_hi  <- res$beta + 1.96 * se_seq

funnel_poly <- tibble(
  x = c(exp(ci_lo), rev(exp(ci_hi))),
  y = c(se_seq,     rev(se_seq))
)

ggplot(funnel_df, aes(x = RR, y = sqrt(vi))) +
  geom_polygon(
    data = funnel_poly, aes(x = x, y = y),
    fill = "#EAF4FB", colour = NA, alpha = 0.7
  ) +
  geom_path(
    data = tibble(x = exp(ci_lo), y = se_seq),
    aes(x = x, y = y), linetype = "dashed", colour = "grey60"
  ) +
  geom_path(
    data = tibble(x = exp(ci_hi), y = se_seq),
    aes(x = x, y = y), linetype = "dashed", colour = "grey60"
  ) +
  geom_vline(xintercept = exp(res$beta), colour = clr_pooled,
             linewidth = 0.8, linetype = "dotted") +
  geom_vline(xintercept = 1, colour = "grey40",
             linewidth = 0.5, linetype = "dashed") +
  geom_point(colour = clr_primary, size = 3, alpha = 0.85) +
  scale_x_log10(breaks = c(0.1, 0.25, 0.5, 1, 2, 4)) +
  scale_y_reverse(
    limits = c(max(funnel_df$sei) * 1.12, 0),
    expand = c(0, 0)
  ) +
  labs(
    title    = "Funnel Plot — BCG Trials",
    subtitle = "Assessing small-study effects and publication bias",
    x        = "Risk Ratio (log scale)",
    y        = "Standard Error",
    caption  = "Red dotted line = pooled estimate. Shaded region = 95% pseudo-CI."
  ) +
  theme_meta()
**Figure 2.** Funnel plot for publication bias assessment. Dashed lines = 95% pseudo-confidence region. Asymmetry may suggest selective reporting of small studies with positive results.

Figure 2. Funnel plot for publication bias assessment. Dashed lines = 95% pseudo-confidence region. Asymmetry may suggest selective reporting of small studies with positive results.

6.2 Egger’s Test

The funnel plot is complemented by Egger’s test, which formally tests for funnel plot asymmetry using a regression of the standardised effect on its precision.

  • A p-value < 0.05 suggests significant asymmetry — possible publication bias or small-study effects.
  • A p-value ≥ 0.05 suggests no significant asymmetry at the 5% level.
egg <- regtest(res)

tibble(
  Statistic      = c("z statistic", "p-value", "Interpretation"),
  Value          = c(
    round(egg$zval, 3),
    round(egg$pval, 4),
    ifelse(
      egg$pval < 0.05,
      "Significant asymmetry detected — possible publication bias",
      "No significant asymmetry detected at the 5% level"
    )
  )
) |>
  gt() |>
  tab_header(
    title    = md("**Egger's Test for Funnel Plot Asymmetry**"),
    subtitle = "Regression test for publication bias"
  ) |>
  tab_style(
    style     = cell_text(
      weight = "bold",
      color  = ifelse(egg$pval < 0.05, "#C0392B", "#27AE60")
    ),
    locations = cells_body(rows = Statistic == "Interpretation")
  ) |>
  tab_style(
    style     = list(cell_text(weight = "bold"),
                     cell_fill(color = "#F2F6FB")),
    locations = cells_column_labels()
  ) |>
  tab_footnote(
    footnote  = "A non-significant p-value (p > 0.05) suggests no evidence of publication bias.",
    locations = cells_column_labels(columns = Value)
  ) |>
  opt_align_table_header(align = "left") |>
  tab_options(
    table_body.hlines.color = "transparent",
    table.width = pct(60)
  ) |>
  cols_align(align = "left", columns = everything())
Egger’s Test for Funnel Plot Asymmetry
Regression test for publication bias
Statistic Value1
z statistic -0.803
p-value 0.4218
Interpretation No significant asymmetry detected at the 5% level
1 A non-significant p-value (p > 0.05) suggests no evidence of publication bias.

7 Meta-Regression: Latitude as Moderator

Absolute latitude is hypothesised to be a proxy for background exposure to environmental mycobacteria. Trials at higher latitudes (cooler climates) consistently show stronger BCG protection, likely because populations at lower latitudes have pre-existing immunity from environmental mycobacteria that diminishes the relative benefit of vaccination.

7.1 Model

res_mod <- rma(
  yi     = yi,
  vi     = vi,
  mods   = ~ ablat,
  data   = dat,
  method = "REML"
)
tibble(
  Parameter = c(
    "Intercept (log RR at 0°)",
    "Intercept — RR",
    "Slope (per degree latitude — log RR)",
    "Slope — RR change per degree",
    "R² (latitude)",
    "Residual I²",
    "Residual τ²"
  ),
  Estimate = c(
    round(coef(res_mod)[1], 4),
    round(exp(coef(res_mod)[1]), 3),
    round(coef(res_mod)[2], 5),
    paste0(round((1 - exp(coef(res_mod)["ablat"])) * 100, 1), "%"),
    paste0(round(res_mod$R2, 1), "%"),
    paste0(round(res_mod$I2, 1), "%"),
    round(res_mod$tau2, 4)
  ),
  Interpretation = c(
    "Log RR at equator (latitude = 0°)",
    "Near-null protection at equatorial baseline",
    "Each additional degree increases log-RR reduction",
    "Further reduction in RR per additional degree of latitude",
    "Between-study variance explained by latitude",
    "Remaining unexplained heterogeneity",
    "Residual between-study variance after adjusting for latitude"
  )
) |>
  gt() |>
  tab_header(
    title    = md("**Meta-Regression Results**"),
    subtitle = md("Moderator: Absolute Latitude (degrees)")
  ) |>
  tab_style(
    style     = list(cell_text(weight = "bold", color = "#1F4E79"),
                     cell_fill(color = "#D6E8F7")),
    locations = cells_body(rows = Parameter == "R² (latitude)")
  ) |>
  tab_style(
    style     = list(cell_text(weight = "bold"),
                     cell_fill(color = "#F2F6FB")),
    locations = cells_column_labels()
  ) |>
  tab_style(
    style     = cell_fill(color = "#EAF4FB"),
    locations = cells_body(rows = c(1, 3, 5, 7))
  ) |>
  tab_source_note(
    source_note = md(
      "REML estimation. R² reflects variance in true effects explained by latitude."
    )
  ) |>
  opt_align_table_header(align = "left") |>
  tab_options(
    table_body.hlines.color = "transparent",
    table.width = pct(100)
  ) |>
  cols_align(align = "left",   columns = c(Parameter, Interpretation)) |>
  cols_align(align = "center", columns = Estimate)
Meta-Regression Results
Moderator: Absolute Latitude (degrees)
Parameter Estimate Interpretation
Intercept (log RR at 0°) 0.2515 Log RR at equator (latitude = 0°)
Intercept — RR 1.286 Near-null protection at equatorial baseline
Slope (per degree latitude — log RR) -0.0291 Each additional degree increases log-RR reduction
Slope — RR change per degree 2.9% Further reduction in RR per additional degree of latitude
R² (latitude) 75.6% Between-study variance explained by latitude
Residual I² 68.4% Remaining unexplained heterogeneity
Residual τ² 0.0764 Residual between-study variance after adjusting for latitude
REML estimation. R² reflects variance in true effects explained by latitude.

7.2 Meta-Regression Plot

lat_seq  <- seq(5, 65, by = 1)
pred_mod <- predict(res_mod, newmods = lat_seq, transf = exp)

pred_df <- tibble(
  ablat = lat_seq,
  fit   = pred_mod$pred,
  ci_lo = pred_mod$ci.lb,
  ci_hi = pred_mod$ci.ub
)

dat2 <- dat |>
  mutate(
    wt   = weights(res_mod),
    size = 2 + 5 * (wt - min(wt)) / (max(wt) - min(wt))
  )

ggplot() +
  geom_ribbon(
    data = pred_df,
    aes(x = ablat, ymin = ci_lo, ymax = ci_hi),
    fill = clr_fill, alpha = 0.7
  ) +
  geom_line(
    data = pred_df, aes(x = ablat, y = fit),
    colour = clr_pooled, linewidth = 1
  ) +
  geom_hline(yintercept = 1, linetype = "dashed", colour = "grey50") +
  geom_point(
    data   = dat2,
    aes(x = ablat, y = RR, size = size),
    colour = clr_primary, alpha = 0.85
  ) +
  scale_y_log10(
    limits = c(0.05, 3),
    breaks = c(0.1, 0.25, 0.5, 1, 2),
    labels = c("0.10", "0.25", "0.50", "1.00", "2.00")
  ) +
  scale_size_identity() +
  scale_x_continuous(breaks = seq(10, 60, by = 10)) +
  annotate(
    "text",
    x = 50, y = 2.3,
    label = sprintf("Slope = %.4f log-RR per degree\nR\u00b2 = %.1f%%",
                    coef(res_mod)["ablat"], res_mod$R2),
    hjust = 0, size = 3.5, colour = clr_pooled
  ) +
  labs(
    title    = "Meta-Regression: Latitude \u00d7 BCG Efficacy",
    subtitle = "Weighted regression; point size proportional to study weight",
    x        = "Absolute Latitude (degrees)",
    y        = "Risk Ratio (log scale)",
    caption  = "Shaded band = 95% CI. Dashed line = no effect (RR = 1)."
  ) +
  theme_meta()
**Figure 3.** Meta-regression of BCG efficacy on absolute latitude. Point size reflects study weight. Shaded band = 95% CI around the regression line.

Figure 3. Meta-regression of BCG efficacy on absolute latitude. Point size reflects study weight. Shaded band = 95% CI around the regression line.

Interpretation: Each additional degree of absolute latitude is associated with a 2.9% further reduction in the risk ratio. Latitude explains approximately 75.6% of the between-study variance, with 24.4% of heterogeneity remaining unexplained by this single moderator.


8 Sensitivity Analysis: Leave-One-Out

The robustness of the pooled estimate is assessed by sequentially omitting each study and re-fitting the model on the remaining 12 trials.

loo <- leave1out(res, transf = exp)

loo_df <- tibble(
  study = dat$slab,
  est   = loo$estimate,
  ci_lo = loo$ci.lb,
  ci_hi = loo$ci.ub,
  i2    = loo$I2
) |>
  mutate(study = factor(study, levels = rev(study)))

ggplot(loo_df, aes(y = study, x = est, xmin = ci_lo, xmax = ci_hi)) +
  geom_vline(
    xintercept = exp(res$beta),
    linetype   = "dashed", colour = clr_pooled, linewidth = 0.7
  ) +
  geom_errorbarh(height = 0.25, colour = clr_mid, linewidth = 0.5) +
  geom_point(aes(colour = i2), size = 3, shape = 18) +
  scale_x_log10(breaks = c(0.3, 0.4, 0.5, 0.6, 0.7, 0.8)) +
  scale_colour_gradient(low = "#2ECC71", high = "#E74C3C",
                        name = "I\u00b2 (%)") +
  labs(
    title    = "Leave-One-Out Sensitivity Analysis",
    subtitle = "Red dashed line = original pooled estimate",
    x        = "Pooled RR (log scale)",
    y        = NULL,
    caption  = "Point colour indicates residual I\u00b2 after study removal."
  ) +
  theme_meta() +
  theme(axis.text.y = element_text(size = 9))
**Figure 4.** Leave-one-out sensitivity analysis. Each row shows the pooled RR when the named study is excluded. Point colour indicates the residual I² after removal.

Figure 4. Leave-one-out sensitivity analysis. Each row shows the pooled RR when the named study is excluded. Point colour indicates the residual I² after removal.

loo_df |>
  mutate(study = as.character(study)) |>
  arrange(est) |>
  gt() |>
  tab_header(
    title    = md("**Leave-One-Out Results**"),
    subtitle = "Pooled RR and residual heterogeneity after each study omission"
  ) |>
  fmt_number(columns = c(est, ci_lo, ci_hi, i2), decimals = 3) |>
  cols_label(
    study = "Study Omitted",
    est   = "Pooled RR",
    ci_lo = "95% CI Low",
    ci_hi = "95% CI High",
    i2    = "Residual I² (%)"
  ) |>
  tab_style(
    style     = cell_fill(color = "#EAF4FB"),
    locations = cells_body(rows = seq(1, nrow(loo_df), 2))
  ) |>
  tab_style(
    style     = list(cell_text(weight = "bold"),
                     cell_fill(color = "#F2F6FB")),
    locations = cells_column_labels()
  ) |>
  tab_footnote(
    footnote  = md(paste0(
      "Red dashed line in Figure 4 = original pooled RR = **",
      round(exp(res$beta), 3), "** [",
      round(exp(res$ci.lb), 3), ", ",
      round(exp(res$ci.ub), 3), "]."
    )),
    locations = cells_column_labels(columns = est)
  ) |>
  opt_align_table_header(align = "left") |>
  tab_options(
    table_body.hlines.color = "transparent",
    table.width = pct(100)
  ) |>
  cols_align(align = "center", columns = everything()) |>
  cols_align(align = "left",   columns = study)
Leave-One-Out Results
Pooled RR and residual heterogeneity after each study omission
Study Omitted Pooled RR1 95% CI Low 95% CI High Residual I² (%)
TPT Madras, 1980 0.452 0.317 0.643 87.031
Comstock et al, 1976 0.460 0.319 0.661 92.344
Frimodt-Moller et al, 1973 0.466 0.320 0.678 92.763
Comstock & Webster, 1969 0.468 0.327 0.668 92.678
Comstock et al, 1974 0.469 0.319 0.688 91.811
Coetzee & Berjak, 1968 0.477 0.324 0.701 93.213
Stein & Aronson, 1953 0.491 0.332 0.727 90.912
Aronson, 1948 0.493 0.340 0.716 93.226
Rosenthal et al, 1960 0.504 0.350 0.725 92.935
Vandiviere et al, 1973 0.519 0.365 0.740 92.278
Ferguson & Simes, 1949 0.520 0.365 0.741 92.254
Rosenthal et al, 1961 0.520 0.363 0.747 92.232
Hart & Sutherland, 1977 0.533 0.377 0.754 90.412
1 Red dashed line in Figure 4 = original pooled RR = 0.489 [0.344, 0.696].

9 Summary

tibble(
  Analysis = c(
    "Primary RE (REML)",
    "With latitude covariate",
    "Leave-one-out range"
  ),
  `Pooled RR (95% CI)` = c(
    sprintf("%.2f [%.2f, %.2f]",
            exp(res$beta), exp(res$ci.lb), exp(res$ci.ub)),
    sprintf("Intercept RR: %.2f; Slope: %.4f/°",
            exp(coef(res_mod)[1]), coef(res_mod)[2]),
    sprintf("%.3f to %.3f", min(loo_df$est), max(loo_df$est))
  ),
  `I² (%)` = c(
    round(res$I2, 1),
    round(res_mod$I2, 1),
    NA_real_
  ),
  `τ²` = c(
    round(res$tau2, 4),
    round(res_mod$tau2, 4),
    NA_real_
  ),
  `p (Heterogeneity)` = c(
    ifelse(res$QEp < 0.001, "< 0.001", round(res$QEp, 4)),
    ifelse(res_mod$QEp < 0.001, "< 0.001", round(res_mod$QEp, 4)),
    "—"
  )
) |>
  gt() |>
  tab_header(
    title    = md("**Summary of All Models and Sensitivity Analyses**"),
    subtitle = md("BCG Vaccine Meta-Analysis")
  ) |>
  sub_missing(missing_text = "—") |>
  tab_style(
    style     = list(cell_fill(color = "#EAF4FB"),
                     cell_text(weight = "bold")),
    locations = cells_body(rows = Analysis == "Primary RE (REML)")
  ) |>
  tab_style(
    style     = list(cell_text(weight = "bold"),
                     cell_fill(color = "#F2F6FB")),
    locations = cells_column_labels()
  ) |>
  tab_source_note(
    source_note = md(
      "RE = Random-effects · REML = Restricted Maximum Likelihood · I² = between-study heterogeneity"
    )
  ) |>
  opt_align_table_header(align = "left") |>
  tab_options(
    table_body.hlines.color = "transparent",
    table.width = pct(100)
  ) |>
  cols_align(align = "center", columns = everything()) |>
  cols_align(align = "left",   columns = Analysis)
Summary of All Models and Sensitivity Analyses
BCG Vaccine Meta-Analysis
Analysis Pooled RR (95% CI) I² (%) τ² p (Heterogeneity)
Primary RE (REML) 0.49 [0.34, 0.70] 92.2 0.3132 < 0.001
With latitude covariate Intercept RR: 1.29; Slope: -0.0291/° 68.4 0.0764 0.0012
Leave-one-out range 0.452 to 0.533
RE = Random-effects · REML = Restricted Maximum Likelihood · I² = between-study heterogeneity

10 Conclusion

The meta-analysis of 13 randomised controlled trials provides strong evidence that BCG vaccination significantly reduces the risk of tuberculosis infection. The pooled risk ratio of 0.49 (95% CI: 0.34 to 0.7) indicates a meaningful protective effect of the vaccine compared to no vaccination.

Substantial heterogeneity (I² ≈ 92%) was detected across trials, which is expected given the diversity of study populations, settings, and BCG strains. Meta-regression revealed that absolute latitude explains approximately 76% of this between-study variance, with trials at higher latitudes demonstrating stronger protection. This pattern is consistent with the hypothesis that pre-existing sensitisation to environmental mycobacteria at lower latitudes attenuates vaccine efficacy.

Funnel plot inspection and Egger’s test did not provide strong evidence of publication bias (p = 0.422). Leave-one-out analyses confirmed that the pooled estimate is robust and not unduly influenced by any single trial, with the estimate ranging from 0.452 to 0.533 across all omissions.

In conclusion, BCG vaccination confers significant protection against TB, though the magnitude of benefit varies substantially by geographic context. These findings support continued use of BCG in high-latitude settings and underscore the importance of context-specific evaluation in global vaccination policy.


11 References

Colditz GA, Brewer TF, Berkey CS, et al. (1994). Efficacy of BCG vaccine in the prevention of tuberculosis. JAMA, 271(9), 698–702. https://doi.org/10.1001/jama.1994.03510330076038

Viechtbauer W (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48. https://doi.org/10.18637/jss.v036.i03

Higgins JPT, Thompson SG, Deeks JJ, Altman DG (2003). Measuring inconsistency in meta-analyses. BMJ, 327, 557–560.

Egger M, Davey Smith G, Schneider M, Minder C (1997). Bias in meta-analysis detected by a simple, graphical test. BMJ, 315, 629–634.


Analysis conducted in R using the metafor, forestplot, and tidyverse packages. Dataset: Colditz GA et al. (1994). JAMA, 271(9), 698–702.