This chapter provides code examples for common meta-regression modeling tasks using MR-BRT. MR-BRT stands for Meta-Regression with Bayesian priors, Regularization and Trimming. It is the mixed-effects modeling tool implemented by the mrtool Python package, which we access here through the reticulate R package. See the Health Metrics Toolbox chapter Meta-regression concepts for a primer on the subject of mixed-effects meta-regression modeling.
The main functions of interest in this chapter are MRData(), MRBRT(), LinearCovModel(), and CovFinder().
Splines, including ensemble splines, will be covered in the next chapter.
Modeling tasks covered here include fitting a standard mixed-effects meta-regression, making predictions with and without random effects, quantifying uncertainty around predictions, specifying Bayesian priors, trimming outliers, and selecting covariates with CovFinder().
# Load the mrbrt package through reticulate
Sys.setenv("RETICULATE_PYTHON" = "/ihme/code/mscm/miniconda3/envs/mrtool_0.0.2/bin/python")
library(reticulate)
reticulate::use_python("/ihme/code/mscm/miniconda3/envs/mrtool_0.0.2/bin/python")
mr <- reticulate::import("mrtool")
# Load R library to create synthetic data
library(dplyr)
First, we generate some simulated data. The object df_sim1 is a data set that includes 10 studies with 5 observations per study. The effect of covariate x1 on the dependent variable y1 is linear with a slope of 0.9. Between-study variation (non-sampling error) has a standard deviation of 7 and within-study variation (sampling error) has a standard deviation of 1. Observation-specific standard errors in the input data are set to be 1. We thus have:
\(y_1 = 0.9 x_1 + N \left( 0 , 7^2 \right) + N \left( 0 , 1^2 \right)\)
With real data, the within-study variation would be known and provided by the user. The objective of the modeler is to estimate the coefficient associated with the covariate x1 and the between-study variation.
set.seed(2)
k_studies <- 10 # Number of studies
n_per_study <- 5 # Number of observations per study
tau_1 <- 7 # Between-study standard deviation
sigma_1 <- 1 # Within-study standard deviation
# Compute the study effect
df_sim_study <- data.frame(study_id = as.factor(1:k_studies)) %>%
mutate(study_effect1 = rnorm(n = k_studies, mean = 0, sd = tau_1))
# Compute the observations with one covariate x1 and observation-specific standard errors = 1
df_sim1 <- do.call("rbind", lapply(1:nrow(df_sim_study), function(i) {
df_sim_study[rep(i, n_per_study), ] })) %>%
mutate(
x1 = runif(n = nrow(.), min = 0, max = 10), # covariate
y1 = 0.9*x1 + study_effect1 + rnorm(nrow(.), mean = 0, sd = sigma_1), # variable to be predicted
y1_se = sigma_1, # known within-study variation
is_outlier = FALSE # we will add outliers when we do trimming later
) %>%
arrange(x1)
Here is a plot showing the data in df_sim1.
sim_plot <- function(dat, point_size = 0.5, line_width = 0.5) {
with(dat, plot(x1, y1, pch = 16, cex = point_size))
grid()
# Loop on the studies to see the linear effect of the covariate x1
# Each line in the plot corresponds to one study
for (id in unique(dat$study_id)) {
df_tmp <- filter(arrange(dat, x1), study_id == id)
with(df_tmp, lines(x1, y1, lty = 1, lwd = line_width))
}
}
sim_plot(df_sim1)
In this section, we show how to fit a basic model. Following the way we constructed the synthetic data, the model is defined by:
\(y_{ij} = \beta_0 + \beta_1 x_{i1} + u_j + \epsilon_i\)
with unknown coefficients \(\beta_0\) and \(\beta_1\). The between-study effects follow a normal distribution with unknown variance \(\gamma\):
\(u_j \sim N(0, \gamma)\)
The observation specific effects follow a normal distribution with known standard deviation \(\sigma_i\):
\(\epsilon_i \sim N(0, \sigma^2_i)\)
We would like to see if we can retrieve the initial parameters that we have chosen for the model: \(\beta_0 = 0 , \beta_1 = 0.9\) and \(\gamma = 7^2\).
The data are formatted with the MRData() function, specifying which columns in the data set represent the mean of the dependent variable (col_obs), standard error of the dependent variable (col_obs_se), variables to be potentially used as covariates (col_covs), and the variable identifying the random effect units (col_study_id).
Documentation for function inputs can also be viewed with the reticulate::py_help() function, like py_help(mr$MRData).
# Create the data
dat1 <- mr$MRData()
# Assign data set and columns
dat1$load_df(
data = df_sim1, col_obs = "y1", col_obs_se = "y1_se",
col_covs = list("x1"), col_study_id = "study_id")
Next we use the MRBRT() function to specify the covariates to be used in the model, giving them as a list of LinearCovModel function calls. For any term, we can specify that random effects should be estimated by setting use_re = TRUE. The default is use_re = FALSE. Mixed-effects meta-regressions typically have a random effect on the intercept term but not the other terms. LinearCovModel() can take many other arguments, some of which we explore later in the section dealing with Bayesian priors. You can see a full list with py_help(mr$LinearCovModel).
Then running fit_model() begins the process of optimizing the parameter values.
# Set the model parameters
mod1 <- mr$MRBRT(
data = dat1,
cov_models = list(
mr$LinearCovModel("intercept", use_re = TRUE), # Specify that random effects should be estimated
mr$LinearCovModel("x1")
)
)
# Fit the model
mod1$fit_model()
We have now successfully fit the following model:
\(y_{ij} = \beta_0 + \beta_1 x_{i1} + u_j + \epsilon_i\)
\(u_j \sim N(0, \gamma)\)
\(\epsilon_i \sim N(0, \sigma^2_i)\)
You can see the coefficient estimates with:
mod1$summary()
## [[1]]
## intercept x1
## 0 0.8307955 0.969775
##
## [[2]]
## intercept
## 0 31.26816
The first element includes the \(\beta\) coefficients, and the second element includes the \(\gamma\) coefficient.
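The point estimates can also be accessed directly as attributes of the fitted model object. mod1$gamma_soln is used later in this chapter; mod1$beta_soln is assumed here to follow the same naming pattern:
# Point estimates of the fixed effects (beta) and the between-study variance (gamma)
mod1$beta_soln   # fixed-effect (beta) estimates
mod1$gamma_soln  # between-study variance (gamma) estimate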
A data frame with study-specific random effect values can be obtained by:
df_re <- data.frame(level = names(mod1$re_soln), re = unlist(mod1$re_soln))
head(df_re, n = 5)
## level re
## 1 1 -8.19543941
## 10 10 -2.96534262
## 2 2 0.03955203
## 3 3 9.93241499
## 4 4 -8.86715464
To save a model object to disk as a pkl file, use the py_save_object() function. To load a saved model object, use py_load_object().
py_save_object(object = mod1, filename = file.path("~/msca_code_examples/R_examples/MRBRT/mod1.pkl"), pickle = "dill")
mod1 <- py_load_object(filename = file.path("~/msca_code_examples/R_examples/MRBRT/mod1.pkl"), pickle = "dill")
This approach can be used for saving and loading any object created with reticulate.
To obtain predictions from a fitted model object, we first create a prediction frame and pass it to MRData(). The prediction frame should contain all columns that were specified as covariates in the model (e.g. only x1 in this example). Each row should contain a set of covariate values for which we want a prediction.
df_pred1 <- data.frame(x1 = seq(0, 10, by = 0.1))
dat_pred1 <- mr$MRData()
dat_pred1$load_df(
data = df_pred1,
col_covs=list('x1')
)
The linear predictor is the part of a regression equation that represents a linear combination of variables, like: \(\beta_0 + \beta_1 x_{i1}\). From a linear algebra point of view, the “variable” in the case of \(\beta_0\) is a column made entirely of 1 values; it is often omitted, as it is here, in order to simplify the notation. In this example, we use only the linear predictor to make predictions.
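Written out for this example, the linear predictor for observation \(i\) is:
\(\hat{y}_i = \beta_0 \cdot 1 + \beta_1 x_{i1}\)
or, in matrix form, \(X \hat{\boldsymbol{\beta}}\), where the design matrix \(X\) has a column of 1s (for the intercept) and a column holding the values of x1.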
We pass the formatted prediction frame from the previous step to the predict() function to obtain point predictions.
df_pred1$pred1 <- mod1$predict(data = dat_pred1)
head(df_pred1, n = 5)
## x1 pred1
## 1 0.0 0.8307955
## 2 0.1 0.9277730
## 3 0.2 1.0247505
## 4 0.3 1.1217280
## 5 0.4 1.2187055
This example produced “out-of-sample” predictions, or predictions for sets of covariate values that do not necessarily align with what exists in the input data. An “extrapolation” is an out-of-sample prediction beyond the scope of the covariate in the original input data (e.g. predicting forward into the future when year is a covariate). An “interpolation” is an out-of-sample prediction that fills in gaps between observed values of a continuous covariate. If in-sample predictions, or predictions corresponding to each observation we used as input data, are desired instead, we can use the input data object (e.g. dat1 in this example) as the prediction frame.
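For instance, a minimal sketch of in-sample prediction using the objects defined above (assuming the sort_by_data_id argument behaves as described in the next example):
# In-sample predictions: one prediction per row of the original input data;
# sort_by_data_id = TRUE keeps the predictions aligned with the input rows
df_sim1$pred_insample <- mod1$predict(data = dat1, sort_by_data_id = TRUE)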
Sometimes the estimated random effects are also used when making a point prediction. Random effects are study-specific values, as in the random intercepts \(u_j\) in this formula: \(\beta_0 + \beta_1 x_{i1} + u_j\). In the fitting process, recall that the distribution of \(u_j\) values is assumed to be Normal with mean equal to zero and variance equal to \(\gamma\): \(u_j \sim N(0, \gamma)\).
To incorporate the estimated random effects into point predictions, three modifications are needed: 1) include col_study_id when loading the prediction data with load_df(); 2) set predict_for_study = TRUE in the predict() call; and 3) set sort_by_data_id = TRUE.
Please note that if sort_by_data_id = TRUE is not specified, the predictions might not line up with the rows of the prediction data frame.
dat_pred2 <- mr$MRData()
dat_pred2$load_df(data = df_sim1, col_covs = list("x1"), col_study_id = "study_id")
df_sim1$pred_re <- mod1$predict(
data = dat_pred2,
predict_for_study = TRUE,
sort_by_data_id = TRUE
)
Here we plot the data…
with(df_sim1, plot(x1, y1))
for (id in unique(df_sim1$study_id)) {
with(filter(df_sim1, study_id == id), lines(x1, y1, lty = 2))
}
…and study-specific predictions:
with(df_sim1, plot(x1, y1))
for (id in unique(df_sim1$study_id)) {
df_tmp <- filter(arrange(df_sim1, x1), study_id == id)
with(df_tmp, lines(x1, pred_re, lty = 1))
}
Note that when \(\gamma\) is estimated to be exactly zero (i.e. when there is no excess variation beyond what we expect from the covariates and sampling error), predictions using the random effects will be the same as predictions using only the linear predictor.
To calculate uncertainty intervals around point predictions, we first need to decide which type of uncertainty is needed. Briefly, the three options are: 1) using uncertainty from the fixed effects only; 2) additionally incorporating between-study heterogeneity as a type of uncertainty; and 3) additionally incorporating uncertainty in the estimation of \(\gamma\), the parameter describing the variance of between-study effects. Please see the previous chapter for a general discussion of the options.
To represent uncertainty from the fixed effects only (e.g. uncertainty in the estimation of \(\beta_0\) and \(\beta_1\) in this example), we take samples (also called “draws”) from the variance-covariance matrix using the sample_soln() function. As the name suggests, the variance-covariance matrix describes not only the variance of the posterior distribution for each individual parameter but also the covariance between parameters. Covariance refers to correlations between the effects of variables passed to the model. In a model with many covariates, for example, the model may have found that when the effect of covariate x1 is relatively high, the effect of covariate x2 tends to be relatively low. This type of information is taken into account when taking draws from the variance-covariance matrix with the sample_soln() function. The output of sample_soln() is a two-element list, where the first element contains draws of the \(\beta\) coefficient(s) and the second element contains draws of the \(\gamma\) coefficient(s).
In the example below, the defining detail for obtaining uncertainty from only the fixed effects is setting random_study = FALSE. The input to the gamma_samples parameter is ignored when random_study = FALSE. Note that the sample_soln() function assumes that the posterior distributions of the parameters are Gaussian in mrtool==0.0.3.
n_samples1 <- as.integer(1000)
samples1 <- mod1$sample_soln(
sample_size = n_samples1
)
draws1 <- mod1$create_draws(
data = dat_pred1,
beta_samples = samples1[[1]],
gamma_samples = matrix(rep(0, n_samples1), ncol = 1),
random_study = FALSE )
It can be useful to think of draws as a set of parallel universes. In each universe, we assume that the set of sampled coefficient values (i.e. the values for \(\beta_0\) and \(\beta_1\) in a given draw) is the truth, and we use the values to obtain a universe-specific point prediction. With the resulting set of universe-specific point predictions (i.e. the result of create_draws()), we obtain the bounds of the 95 percent uncertainty interval by calculating the 2.5th and 97.5th quantiles.
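In this example, each row of draws1 corresponds to one row of the prediction frame and each column to one draw; this orientation is assumed by the quantile calculations below.
# Rows correspond to prediction points, columns to draws ("universes")
dim(draws1)  # expected: nrow(df_pred1) rows and n_samples1 columns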
Here we define a function add_ui() for visualizing uncertainty intervals.
add_ui <- function(dat, x_var, lo_var, hi_var, color = "darkblue", opacity = 0.2) {
polygon(
x = c(dat[, x_var], rev(dat[, x_var])),
y = c(dat[, lo_var], rev(dat[, hi_var])),
col = adjustcolor(col = color, alpha.f = opacity), border = FALSE
)
}
Predictions using the draws from \(\beta_0\) and \(\beta_1\):
df_pred1$pred1_pt <- mod1$predict(data = dat_pred1)
df_pred1$pred1_lo <- apply(draws1, 1, function(x) quantile(x, 0.025))
df_pred1$pred1_up <- apply(draws1, 1, function(x) quantile(x, 0.975))
Here is a plot of predictions using only uncertainty from the fixed effects:
with(df_sim1, plot(x1, y1))
with(df_pred1, lines(x1, pred1_pt))
add_ui(df_pred1, "x1", "pred1_lo", "pred1_up")
To include between-study heterogeneity in the uncertainty for a particular prediction, two inputs to the create_draws() function are needed: 1) a gamma_samples matrix in which every row contains the point estimate of \(\gamma\); and 2) the argument random_study = TRUE.
Because we only want to use the point estimates of \(\gamma\) for this uncertainty type, we create a new matrix in which the draw-specific rows for \(\gamma\) are identical:
n_samples2 <- 1000L
gamma_vals2 <- matrix(
data = mod1$gamma_soln,
nrow = n_samples2,
ncol = length(mod1$gamma_soln),
byrow = TRUE # 'byrow = TRUE' is important to include
)
And we pass in the new matrix gamma_vals2 to the create_draws() function like this:
samples2 <- mod1$sample_soln(sample_size = n_samples2)
draws2 <- mod1$create_draws(
data = dat_pred1,
beta_samples = samples2[[1]],
gamma_samples = gamma_vals2,
random_study = TRUE )
Here is a plot of the result:
df_pred1$pred2_pt <- mod1$predict(dat_pred1)
df_pred1$pred2_lo <- apply(draws2, 1, function(x) quantile(x, 0.025))
df_pred1$pred2_hi <- apply(draws2, 1, function(x) quantile(x, 0.975))
with(df_sim1, plot(x1, y1))
with(df_pred1, lines(x1, pred2_pt))
add_ui(df_pred1, "x1", "pred2_lo", "pred2_hi")
Finally, if draws of \(\gamma\) are desired, pass in the default output of sample_soln() to create_draws() and set random_study to TRUE.
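A minimal sketch of this third option, reusing the prediction data from above; the second element of the sample_soln() output contains the \(\gamma\) draws:
# Option 3: propagate uncertainty in gamma by using its sampled values directly
samples3 <- mod1$sample_soln(sample_size = n_samples2)
draws3 <- mod1$create_draws(
data = dat_pred1,
beta_samples = samples3[[1]],
gamma_samples = samples3[[2]], # draws of gamma rather than its point estimate
random_study = TRUE )
df_pred1$pred3_lo <- apply(draws3, 1, function(x) quantile(x, 0.025))
df_pred1$pred3_hi <- apply(draws3, 1, function(x) quantile(x, 0.975))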
MR-BRT is a Bayesian modeling framework. This means that priors can be specified for any estimated coefficient in the model. Priors provide an additional source of information (apart from the data) for what a final estimated coefficient value (i.e. the posterior) should be. A prior is specified as a probability distribution in which all of the parameters that define the distribution are known. While in principle any distribution may be used as a prior, MR-BRT implements only Uniform, Gaussian and Laplace priors.
This section gives examples for setting priors on \(\beta\) and \(\gamma\), or the parameters defining the fixed and random effects, in models without a spline. For priors that inform the non-linear shape of a spline, please see the next chapter.
In the first example, we tell the model that the slope corresponding to covariate x1 must be between -3.1 and -2.9. This is implemented with a Uniform prior by passing an array (array(c(-3.1, -2.9))) to the prior_beta_uniform argument of LinearCovModel, specifically the LinearCovModel call corresponding to covariate x1.
mod2 <- mr$MRBRT(
data = dat1,
cov_models = list(
mr$LinearCovModel("intercept", use_re = TRUE),
mr$LinearCovModel("x1", prior_beta_uniform = array(c(-3.1, -2.9)))))
mod2$fit_model()
df_pred1$pred2_pt <- mod2$predict(data = dat_pred1)
with(df_sim1, plot(x1, y1))
with(df_pred1, lines(x1, pred2_pt))
The negative range given by the Uniform prior stands in contrast to the true slope from which the data were generated (0.9), so the prediction line does not follow the trend of the data at all.
Recall that \(\gamma\) is the parameter representing the variance of between-study heterogeneity, specifically the variance of the estimated random effects. In this model, we have a random intercept specified by setting use_re = TRUE in the LinearCovModel function call for the “intercept”. A value of \(\gamma = 0\) means that there is no between-study heterogeneity, so biasing \(\gamma\) toward zero effectively reduces the amount of between-study heterogeneity estimated by the model. We implement this bias by setting a Gaussian prior on \(\gamma\) with mean equal to zero, passing an array to the prior_gamma_gaussian argument of the LinearCovModel function for the intercept. The first element of the array defines the mean and the second element defines the standard deviation.
mod3 <- mr$MRBRT(
data = dat1,
cov_models = list(
mr$LinearCovModel("intercept",
use_re = TRUE,
prior_gamma_gaussian = array(c(0, 0.02))),
mr$LinearCovModel("x1") ) )
mod3$fit_model()
dat_sim1_tmp <- mr$MRData()
dat_sim1_tmp$load_df(data = df_sim1, col_covs = list("x1"), col_study_id = "study_id")
df_sim1$pred_re_prior <- mod3$predict(
data = dat_sim1_tmp,
predict_for_study = TRUE,
sort_by_data_id = TRUE
)
Plotting the result, we see that the study-specific predictions have been pulled closer to the overall mean trend as compared to the plot earlier in the chapter showing study-specific predictions.
with(df_sim1, plot(x1, y1))
for (id in unique(df_sim1$study_id)) {
df_tmp <- filter(arrange(df_sim1, x1), study_id == id)
with(df_tmp, lines(x1, pred_re_prior, lty = 1))
}
In total, there are four ways to set (non-Laplace) priors on \(\beta\) and \(\gamma\). They include the following arguments to LinearCovModel(): prior_beta_uniform, prior_beta_gaussian, prior_gamma_uniform and prior_gamma_gaussian.
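For reference, here is a hypothetical LinearCovModel() call illustrating the argument format for two of the prior types not shown above. The values are arbitrary and for illustration only; each array is assumed to follow the same pattern as in the earlier examples, i.e. c(lower, upper) for Uniform priors and c(mean, sd) for Gaussian priors.
# Illustrative only: Gaussian prior on beta and Uniform prior on gamma
mr$LinearCovModel("x1",
use_re = TRUE,
prior_beta_gaussian = array(c(0, 1)), # mean 0, sd 1
prior_gamma_uniform = array(c(0, 10)) # gamma constrained to [0, 10]
)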
Trimming is a method for identifying and removing the influence of outliers in the process of fitting a model. In contrast to other methods that first fit a model with all data and then remove observations with large residuals, trimming is integrated into the likelihood. The likelihood defines what metric the model minimizes to find the optimal values of the coefficients, so the optimal coefficient values returned by a model with trimming have already accounted for and removed the influence of outliers.
In a meta-regression model, the size of the residual is not the only factor that matters for identifying outliers; we expect larger residuals for observations with larger standard errors. For that reason, MR-BRT trims based on the size of standardized residuals, or the residual divided by its standard error. That means for example that if two observations are equally far away from their in-sample prediction (i.e. the residual size is the same), the observation with the smaller standard error is more likely to be trimmed.
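Expressed as a formula, the standardized residual for observation \(i\) is:
\(r_i = \frac{y_i - \hat{y}_i}{\sigma_i}\)
where \(\hat{y}_i\) is the model's in-sample prediction and \(\sigma_i\) is the observation's reported standard error.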
To use trimming in MR-BRT, the user tells the model what proportion of observations should be trimmed and the model decides which ones to trim. Users can extract information about which observations were in fact trimmed by querying a fitted model object.
In the following example, we create a simulated dataset with four outliers and see if MR-BRT can identify them. We also add in some variation to the standard errors, so they are not all equal. First we format the input data and create the prediction frame as usual.
Simulated dataset with four outliers.
set.seed(0)
df_sim2 <- df_sim1 %>%
rbind(., .[(nrow(.)-7):(nrow(.)-4), ] %>% mutate(y1=y1-35, is_outlier = TRUE)) %>%
arrange(x1) %>%
mutate(
variance_multiplier = runif(n = nrow(.), min = 0.6, max = 1.5),
y1_se2 = sqrt(y1_se^2 * variance_multiplier)
)
dat2 <- mr$MRData()
dat2$load_df(
data = df_sim2, col_obs = "y1", col_obs_se = "y1_se2",
col_covs = list("x1"), col_study_id = "study_id" )
df_pred2 <- data.frame(x1 = seq(0, 10, by = 2))
dat_pred2 <- mr$MRData()
dat_pred2$load_df(
data = df_pred2,
col_covs=list('x1')
)
Next we specify a model that uses trimming by using the inlier_pct argument of the MRBRT() function. Using a value of 0.9 means that the estimated coefficients will be informed by 90 percent of the data.
mod2 <- mr$MRBRT(
data = dat2,
cov_models = list(
mr$LinearCovModel("intercept", use_re = TRUE),
mr$LinearCovModel("x1") ),
inlier_pct = 0.9)
mod2$fit_model()
df_pred2$pred4_pt <- mod2$predict(data = dat_pred2)
A data frame identifying which observations have been trimmed can be obtained from the fitted model object mod2.
df_mod2 <- cbind(mod2$data$to_df(), data.frame(w = mod2$w_soln))
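Before plotting, we can count how many observations were flagged, assuming (consistent with the plotting code below) that trimmed observations receive a weight below 1:
# Number of observations identified as outliers by trimming
sum(df_mod2$w < 1)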
Visualizing the result, we see that the model discovered the four outliers in addition to two other observations. Points are sized proportional to each observation’s inverse variance because that is how they are weighted in the meta-regression model.
with(df_mod2, plot(
x = x1, y = obs,
col = ifelse(w == 1, "black", "red"),
pch = ifelse(w == 1, 1, 16),
cex = 1/obs_se^2
))
with(df_pred2, lines(x1, pred4_pt))
The mrtool package comes equipped with the CovFinder() function for identifying informative covariates. The algorithm is a form of forward selection in the sense that it starts with the simplest available model and selects each additional covariate iteratively. Covariate selection is important when the data set is not rich enough to support the estimation of coefficients corresponding to all prospective covariates. This might happen if the sample size of the data set is too low and/or in the presence of multi-collinearity.
Briefly, the selection algorithm involves first using Lasso regularization to rank order the covariates by informativeness. In general, Lasso works by penalizing the sum of absolute values of the coefficients with a strength proportional to a hyperparameter \(\lambda\). Higher values of \(\lambda\) encourage the coefficients to be exactly zero (i.e. not contributing to the model, or not selected). In CovFinder(), the rank ordering process involves running many models and adjusting the \(\lambda\) parameter to become progressively more permissive; covariates enter the model one-by-one from most to least informative in this way. After the rank order is established, another round of forward selection is conducted on the basis of p-values. Models are run iteratively starting with the simplest model (intercept-only, or including only the pre-selected covariates); the process of adding covariates one-by-one stops when the additional covariate is not statistically significant.
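Schematically, and ignoring random effects, a Lasso-penalized fit minimizes a criterion of the form:
\(\sum_i \frac{\left(y_i - \mathbf{x}_i^\top \boldsymbol{\beta}\right)^2}{\sigma_i^2} + \lambda \sum_k \left|\beta_k\right|\)
so that larger values of \(\lambda\) force more coefficients to exactly zero. This is a simplified illustration of the penalty, not the exact objective used internally by CovFinder().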
The CovFinder() function has two required arguments: data, an MRData object containing the observations and candidate covariates; and covs, a list of candidate covariates to consider for selection.
Optional arguments include pre_selected_covs (covariates that are always retained in the model), normalized_covs, num_samples, power_range, power_step_size, and laplace_threshold, all of which appear in the example below.
Trimming and the specification of random effects are also available. For a full list of CovFinder() options, please see py_help(mr$CovFinder).
In this simulated example, we create a dataset in which only two covariates truly have non-zero effect sizes. We use CovFinder() to identify these covariates in addition to one pre-selected covariate.
set.seed(123)
beta <- c(0, 0, 0, 1.2, 0, 1.5, 0)
gamma <- rep(0, 7)
num_obs <- 100
obs_se = 0.1
studies <- c("A", "B", "C")
covs <- cbind(
rep(1, num_obs),
do.call("cbind", lapply(1:(length(beta)-1), function(i) rnorm(num_obs)))
)
obs <- covs %*% beta + rnorm(n = num_obs, mean = 0, sd = obs_se)
df_sim3 <- as.data.frame(do.call("cbind", list(obs, rep(obs_se, num_obs), covs))) %>%
select(-V3)
cov_names <- paste0("cov", 1:(length(beta)-1))
names(df_sim3) <- c("obs", "obs_se", cov_names)
df_sim3$study_id <- sample(studies, num_obs, replace = TRUE)
head(df_sim3, n = 5)
## obs obs_se cov1 cov2 cov3 cov4 cov5
## 1 2.6356396 0.1 -0.56047565 -0.7104066 2.1988103 -0.7152422 -0.07355602
## 2 -0.1808163 0.1 -0.23017749 0.2568837 1.3124130 -0.7526890 -1.16865142
## 3 -1.2736295 0.1 1.55870831 -0.2466919 -0.2651451 -0.9385387 -0.63474826
## 4 0.4569638 0.1 0.07050839 -0.3475426 0.5431941 -1.0525133 -0.02884155
## 5 0.5878745 0.1 0.12928774 -0.9516186 -0.4143399 -0.4371595 0.67069597
## cov6 study_id
## 1 -0.6018928 B
## 2 -0.9936986 A
## 3 1.0267851 B
## 4 0.7510613 C
## 5 -1.5091665 A
Format the data and run the covariate selection algorithm.
mrdata <- mr$MRData()
mrdata$load_df(
data = df_sim3,
col_obs = 'obs',
col_obs_se = 'obs_se',
col_study_id = 'study_id',
col_covs = as.list(cov_names)
)
candidate_covs <- cov_names[!cov_names == "cov1"]
covfinder <- mr$CovFinder(
data = mrdata,
covs = as.list(candidate_covs),
pre_selected_covs = list("cov1"),
normalized_covs = FALSE,
num_samples = 1000L,
power_range = list(-4, 4),
power_step_size = 1,
laplace_threshold = 1e-5
)
covfinder$select_covs(verbose = TRUE)
## Laplace std: 0.0001
## potential additional covariates []
## Laplace std: 0.001
## potential additional covariates ['cov3', 'cov5']
## Gaussian model fe_mean: [-0.00638271 1.19819778 1.49663731]
## Gaussiam model fe_std: [0.01122484 0.01032789 0.01026959]
## Gaussian model re_var: [0. 0. 0.]
## significance: [False True True]
## selected covariates: ['cov1', 'cov3', 'cov5']
covfinder$selected_covs
## [1] "cov1" "cov3" "cov5"