Meta-regression concepts

Introduction

Meta-regression is a statistical method for combining results from multiple studies. If the studies aim to answer the same research question and produce a quantitative result, we can use meta-regression to obtain an overall estimate considering the totality of the available evidence.

[XX reference previous training on probability distributions and standard errors]

In this training, we discuss several concepts that will be important for running and interpreting the results of meta-regression models. These include:

how meta-regression differs from other types of regression;
quantifying excess variation between studies;
understanding funnel plots;
quantifying uncertainty in the final estimate.

Some readers may be more familiar with the term “meta-analysis.” Meta-regression is nothing other than meta-analysis with covariates, or variables that describe how study characteristics affect the outcome measure.

Within the range of possible models for conducting meta-regression analysis, we focus on linear mixed-effects models as a way to introduce MR-BRT in the next training.

A typical use case for meta-regression

We will occasionally refer back to the following hypothetical example to make the discussion more concrete. Consider a scenario in which researchers are interested in quantifying the effect of a binary risk factor \(r\) on the incidence of a disease \(d\). The notation \(r_1\) indicates exposure to the risk factor and \(r_0\) indicates lack of exposure. The dependent variable, our quantity of interest, is defined as a ratio comparing the disease incidence in the exposed group to the disease incidence in the unexposed group. In other words, the quantity of interest is an incidence rate ratio: \(IRR = \frac{I(d)|r_1}{I(d)|r_0}\). In the scenario, 30 research groups have conducted longitudinal cohort studies each using their local city as a source population.

The particular details of risk factor \(r\) and disease \(d\) are not important for the discussion. The IRR could represent something like the incidence of diarrheal disease among households using well water versus piped water. Or maybe the IRR represents the incidence of asthma attacks among smokers versus non-smokers. The important part is that multiple studies have been conducted assessing the same research question, so we can use a meta-regression model to obtain an overall best-guess estimate for the true IRR.

The “meta” in meta-regression

How does meta-regression (MR) differ from other types of regression, like ordinary least squares (OLS) regression? The defining characteristic of MR is the presence of uncertainty in the dependent variable. This feature reflects the fact that the input data for MR models are often quantitative results from scientific studies that are reported with uncertainty. Because scientific results often come from a regression analysis, the “meta” in meta-regression refers to the idea that we are running a regression model on regression results.

How is uncertainty in the dependent variable used by a meta-regression model? Studies with relatively small uncertainty tend to have larger sample sizes. Because studies with larger sample sizes provide more information about the quantity of interest, they are given greater weight in a meta-regression model. In this sense, as a simple shorthand, one might think of meta-regression as a weighted average of the input data. There are two extenuating factors, however:

First, the dependent variable needs to be transformed into the appropriate probability space before modeling begins. The IRR in our hypothetical scenario needs to be log(IRR) when used as a dependent variable, for example.
Second, while some of the results in “fixed-effects” meta-regression can indeed be calculated as a simple weighted average (after transforming the dependent variable), the standard in epidemiology and other fields is to use mixed-effects meta-regression. The word “mixed” refers to the use of both fixed and random effects in the same model. Random effects quantify the degree of between-study heterogeneity, and estimating random effects as part of the model means that results cannot be calculated simply as a weighted average. This topic is covered in more detail in the section “Quantifying between-study variation.”

Mathematical description

Having covered some topics conceptually, we are now prepared to define all terms in the equations that define a linear mixed-effects meta-regression:

\(y_{ij} = \beta_0 + \beta_1 x_{i1} + u_j + \epsilon_i\)

The term \(y_{ij}\) refers to an observation \(i\) in a study \(j\) that has been log- or logit-transformed using the delta method.
The terms \(\beta_0\) and \(\beta_1\) are a typical intercept and slope as in ordinary least squares regression.
The term \(x_{i1}\) is a covariate, also sometimes called a moderator. For example, from the hypothetical scenario described in the first section, if we believe that log(IRR) should vary as a function of age, then \(x_{i1}\) is the age value corresponding to observation \(i\). A model that does not include any covariates is called an “intercept-only” model.
The term \(\epsilon_i\) is the observation-specific sampling error, or the variation we expect due to low sample size. This is also sometimes called measurement error.
The term \(u_j\) is the random intercept that represents study-specific non-sampling error.

Quantifying between-study variation

What do we mean by “random intercept” and “study-specific non-sampling error”? An assumption of mixed-effects meta-regression models is that, after accounting for variation in \(y_{ij}\) that is due to covariates (e.g. \(x_{i1}\)) and sampling error (i.e. known \(\sigma_i\)), there might be some left-over variation in the data. This extraneous noise is called “non-sampling error”. Non-sampling error often results from natural inconsistencies between the various studies that comprise the input data. A few common examples of those inconsistencies are: the use of source populations that are not completely comparable, different degrees of selection bias between the studies, or subtle differences in the measurement of the risk factor or definition of the disease. A linear mixed-effects model assumes that the between-study variation that comes from these factors roughly follows a Normal distribution as in the following equation:

\(u_j \sim N(0, \gamma)\)

The model estimates the variance (i.e. \(\gamma\) or “gamma”) of this Normal distribution, which in turn implies a study-specific quantity that represents the study’s particular non-sampling error value \(u_j\). This is what is meant by random intercept. The “random” in random intercept refers to the idea that the model treats non-sampling error as a random variable in the sense that the intercepts are a manifestation of an underlying probability distribution.

Finally, the model assumes that sampling error, or the variation we expect due to low sample size, similarly comes from a Normal distribution.

\(\epsilon_i \sim N(0, \sigma^2_i)\)

As discussed in previous sections, the \(\sigma_i\) for a given observation is assumed to be known and given as part of the input data.

After a meta-regression model has been fit and the effects of covariates have been accounted for, it is often useful to quantify the degree to which sampling and non-sampling error contribute to the remaining variation. This information is described by the \(I^2\) statistic, defined as “the percentage of the variability in effect estimates that is due to heterogeneity rather than sampling error (chance)” (https://training.cochrane.org/handbook/current/chapter-10).

A graphical example can be helpful here. First, we create some simulated data representing 10 studies with 4 observations per study and include the effect of a covariate x1.

library(dplyr)
set.seed(0)
k_studies <- 10; n_per_study <- 4
sigma_1 <- 1; tau_1 <- 8 # tau = sqrt(gamma)

df_sim_study <- data.frame(study_id = as.factor(1:k_studies)) %>%
  mutate(study_effect1 = rnorm(n = k_studies, mean = 0, sd = tau_1) ) 

df_sim1 <- do.call("rbind", lapply(1:nrow(df_sim_study), function(i) {
  df_sim_study[rep(i, n_per_study), ] })) %>%
  mutate(
    x1 = runif(n = nrow(.), min = 0, max = 10),
    y1 = 0.9*x1 + study_effect1 + rnorm(nrow(.), mean = 0, sd = sigma_1),
    y1_se = sigma_1 ) %>%
  arrange(x1)

Next we visualize the data; observations in the same study are connected with a line.

with(df_sim1, plot(x1, y1))
grid()
for (id in unique(df_sim1$study_id)) {
  with(filter(df_sim1, study_id == id), lines(x1, y1, lty = 1))
}

Just for illustration, we created data in which the non-sampling error (i.e. between-study heterogeneity) is much larger than the sampling error (i.e. within-study variation after accounting for the effect of covariate x1). When we pass these data to a mixed-effects meta-regression model, it will estimate a single slope and intercept as usual. Each study will also receive its own random intercept, which moves the linear predictor \(\beta_0 + \beta_1 x_{i1}\) (not visualized) vertically such that it roughly matches the level of the particular study. Please see the chapter on MR-BRT for a visualization that shows model predictions using study-specific random intercepts.

Funnel plots and publication bias

Another common way to visualize meta-regression data is with a funnel plot. It requires first fitting a meta-regression model (intercept-only or with covariates) so that residuals can be calculated.

The distinguishing characteristics of a funnel plot are:

The x-axis represents the residuals of a fitted model. The residuals are in transformed space, i.e. log or logit depending on which transformation was applied to the dependent variable before modeling.
The y-axis refers to the magnitude of the observation-specific standard error (SE). The scale is inverted, so the lowest SE values (the observations with the least uncertainty) are toward the top of the plot.
There is a vertical line at x=0.
The lines forming the funnel are 1.96 standard errors away from the vertical line at x=0.

To generate an example, we return to the hypothetical scenario described at the beginning of the chapter (“30 research groups have conducted longitudinal cohort studies each using their local city as a source population”). The studies differ only in their sample sizes. We set the true underlying value \(\mu\) to be \(log(IRR) = 0.2\) or approximately \(IRR = 1.221\). We use the metafor R package to estimate the overall mean only because we have not yet introduced MR-BRT!

library(dplyr)
library(metafor)

set.seed(4)
n_studies <- 30
true_logirr_mean <- 0.2
sd_logirr <- 0.1
# sample_sizes1 <- round(runif(n = n_studies, min = 40, max = 4000), digits = 0)
sample_sizes1 <- sample(c(40, 80, 160, 320, 640, 1280), size = n_studies, replace = TRUE)

df_sim2 <- data.frame(study_id = 1:n_studies) %>%
  mutate(
    sample_size = sample_sizes1,
    se_log_irr = sd_logirr / sqrt(sample_size),
    log_irr = rnorm(n = nrow(.), mean = true_logirr_mean, sd = se_log_irr),
  )

mod1 <- metafor::rma.uni(yi = df_sim2$log_irr, sei = df_sim2$se_log_irr)
beta0 <- summary(mod1)$beta["intrcpt", 1]
df_sim2$residual1 <- df_sim2$log_irr - beta0

metafor::funnel(x = df_sim2$residual1, sei = df_sim2$se_log_irr, xlab = "Residual")

In general, we expect observations with the lowest SE values (i.e. observations with the least uncertainty) to have the smallest residuals and therefore expect them to be closest to the x=0 line. This happens for two reasons. First, scientifically speaking, if the studies are all attempting to discover the same true underlying phenomenon, studies with higher sample sizes should be better at precisely estimating that truth. Second, the vertical line represents the intercept estimated from a meta-regression model in which low-SE observations have the most influence. The line therefore naturally tracks more closely to the low-SE observations.

What can we learn from a funnel plot? Recall that the bounds of the funnel are fixed at 1.96*SE away from the x=0 vertical line, where SE comes from the level indicated on the y-axis. The quantity 1.96 refers to the z-score that defines the 95% confidence interval, so based on sampling error alone we expect approximately 5 percent of observations to fall outside the funnel.

If greater than 5 percent of observations fall outside the funnel, it indicates that there is excess variation in the data not captured by the model. This could result from omitting an important covariate that would otherwise explain the variation, or it could be an indication of a high degree of non-sampling error.
If the observations all fall well within the funnel, it indicates that there is greater consistency among the studies than would be expected based on their measurement error or sample size alone. This might happen if, for example, one large and credible study was published early on and others were accepted for publication only if they agreed with the original study.

More commonly, publication bias comes from the tendency for studies to be accepted for publication if they have large sample sizes and/or estimate a large effect size in the expected direction. In this case, studies with large uncertainty and/or an effect size in the opposite direction are less likely to appear in the published literature. This phenomenon can be observed readily in a funnel plot as in the simulated example below. We take the data set from the previous example but remove observations that have both a negative residual and a low sample size (n<500).

df_sim3 <- df_sim2 %>%
  mutate(
    not_published = residual1 < 0 & sample_size < 500,
    pch_var = ifelse(not_published, 1, 16)
  )
  
mod2 <- metafor::rma.uni(
  yi = log_irr, 
  sei = se_log_irr, 
  data = df_sim3 %>% filter(!not_published)
)
beta0_pubbias <- summary(mod2)$beta["intrcpt", 1]
df_sim3$residual2 <- df_sim3$log_irr - beta0_pubbias

with(df_sim3, funnel(x = residual2, sei = se_log_irr, xlab = "Residual", pch = pch_var))

Studies that occurred but were not published are now shown as empty circles, and published studies are shown as solid circles. The solid circles have the characteristic pattern that suggests the presence of publication bias, a diagonal downward slant in the expected direction of the effect size.

To conduct a statistical test for the presence of publication bias, we can assess whether the standard errors have a statistically significant relationship with residuals. In the absence of publication bias, the sample size of the study (and therefore its standard error) should have no bearing on whether the quantity of interest is high or low. There are many options in the literature for how to specify this test; a common one is known as Egger’s regression test.

Using the data set from the previous funnel plot, we run Egger’s regression test.

df_sim4 <- df_sim3 %>% filter(!not_published)
# source: https://rdrr.io/cran/metafor/man/regtest.html
regtest(x = residual2, sei = se_log_irr, predictor = "sei", data = df_sim4)


Regression Test for Funnel Plot Asymmetry

Model:     mixed-effects meta-regression model
Predictor: standard error

Test for Funnel Plot Asymmetry: z =  2.2114, p = 0.0270
Limit Estimate (as sei -> 0):   b = -0.0050 (CI: -0.0097, -0.0002)

The test found that the standard error of an observation has a statistically significant relationship with the residual (p < 0.05) so we reject the null hypothesis of no publication bias.

Quantifying uncertainty in the final estimate

After fitting a meta-regression model, the last modeling decision is what type of uncertainty to report for the effect size. There are generally three options:

Uncertainty from the fixed effects only;
Uncertainty from the fixed effects and between-study variation (using the point estimate of \(\gamma\));
Uncertainty from the fixed effects and between-study variation, also incorporating parameter uncertainty in the estimation of \(\gamma\) (i.e. using draws of \(\gamma\)).

Option 1 corresponds to a confidence interval. In an intercept-only model describing log(IRR), for example, the 95% confidence interval can be calculated as \(\beta_0 \pm 1.96 *SE(\beta_0)\) where \(SE\) means the posterior standard error of a parameter. By considering only uncertainty in the fixed effects (e.g. the estimation of \(\beta_0\)) and ignoring study-level variation, the choice to use a confidence interval implies that study-level factors are irrelevant when deciding whether to reject the null hypothesis of \(log(IRR) = 0\). While the defensibility of this view depends on the purpose of the analysis, most meta-regression analyses in epidemiology are conducted with future action in mind. From that point of view, between-study heterogeneity represents a true lack of knowledge about the location of the effect size. When evidence-based policies and public health actions depend on the results of a meta-analysis being correct in practice, incorporating all sources of unexplained variation into the uncertainty estimate is important.

Option 2 corresponds to a prediction interval, or the range of effect sizes we might expect in hypothetical new studies. In an intercept-only model describing log(IRR), the 95% prediction interval can be calculated as \(\beta_0 \pm 1.96 * \sqrt{SE(\beta_0)^2 + \gamma}\). Incorporating \(\gamma\), the parameter descrbing the variance of between-study heterogeneity, into the uncertainty interval addresses the limitations highlighted for Option 1, namely that variation between studies represents real lack of knowledge in where the true effect lies, so it should be incorporated into the reported uncertainty interval.

Option 3 is similar to Option 2 in the sense that it corresponds to a prediction interval, but it additionally accounts for uncertainty in the estimation of \(\gamma\) itself. Uncertainty in \(\gamma\) is rarely documented or used in published meta-analyses (see Ioannidis et al., https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2048840/), but incorporating it into uncertainty intervals and statistics like \(I^2\) can provide important information about the reliability of results. We discuss the mechanics of how to do this with MR-BRT in the next chapter.