Splines in MR-BRT

In meta-regression models (e.g. the mrtool package) and generalized linear models (e.g. the regmod package), using a spline term is a way to incorporate a non-linear relationship between a continuous covariate and the dependent variable. Here we focus on splines as implemented in mrtool, the Python package that implements the MR-BRT mixed-effects modeling framework. We access mrtool using the reticulate R package as usual.

A spline term might be preferred when a linear effect, which can be estimated with a single parameter describing the slope, fails to capture important information about non-linear patterns in a covariate’s effect size. Splines are not the only method for estimating non-linear effects. Alternatives include Gaussian process regression, LOESS regression, or using a polynomial term.

# Load the mrbrt package through reticulate
Sys.setenv("RETICULATE_PYTHON" = "/ihme/code/mscm/miniconda3/envs/mrtool_0.0.2/bin/python")
library(reticulate)
reticulate::use_python("/ihme/code/mscm/miniconda3/envs/mrtool_0.0.2/bin/python")
mr <- reticulate::import("mrtool")
# Load R library to create synthetic data
library(dplyr)

1. B-splines and flexibility

The mrtool package implements a particular type of spline called a B-spline, or basis spline. This approach fits well into a linear regression framework like MR-BRT because B-spline coefficients can be expressed as an extension of the linear predictor: \(\beta_0 + \beta_1 x_1 + \cdot \cdot \cdot + \beta_n x_n\). Combined with the Bayesian prior options available in mrtool, the B-spline method allows for a high degree of control over the shape and flexibility of an estimated curve.

A key decision when using splines is how much flexibility to allow in the shape of the curve. As a general rule, greater flexibility involves estimating a greater number of model parameters, which in turn requires more data. If there is too little data for the number parameters, it can lead to overfitting; the non-linear curve might track the random noise in the data rather than the general trend. For that reason, we recommend starting with a minimal amount of flexibility and changing the specifications only when there is a specific reason. This approach helps to avoid overfitting and encourages modelers to track, document, and justify their modeling choices.

To implement a minimal degree of flexibility, the following specifications can be used as default inputs to the LinearCovModel() function call for the spline:

  • Three internal knots, for example specified as spline_knots = array(seq(0, 1, length.out = 5))
  • Spacing knots according to data density, specified as spline_knots_type = ‘frequency’
  • Quadratic splines, specified as spline_degree = 2L
  • Constraining the left and right tails to be linear, specified like spline_l_linear = TRUE and spline_r_linear = TRUE, respectively.

In the rest of the tutorial, we explain what these terms mean and demonstrate additional spline functionality available in the mrtool package. Additional functionality covered in this chapter includes:

  • Monotonicity, for example forcing a portion (or all) of the curve to be always non-decreasing;
  • Convexity and concavity;
  • Using Uniform and Gaussian priors; and
  • Ensemble splines.

2. Creating simulated data

First we make a function for creating simulated datasets with a non-linear effect. This will be used for multiple examples in the chapter.

create_simulated_data_spline <- function(k_studies, n_per_study, dgp, tau, sigma_bounds, seed = 1) {
  
  if (0) {
    seed = 1
    k_studies = 30
    n_per_study = 1
    dgp = "2 * sin(0.43*x1-2.9)"
    tau = 0.6
    sigma_bounds = c(0.6, 0.8)
  }
  
  require(dplyr)
  set.seed(seed)
  
  df_sim_study <- data.frame(study_id = as.factor(1:k_studies)) %>%
    mutate(study_effect = rnorm(n = k_studies, mean = 0, sd = tau) )
  
  df_sim_out <- do.call("rbind", lapply(1:nrow(df_sim_study), function(i) {
    df_sim_study[rep(i, n_per_study), ] })) %>%
    mutate(
      x1 = runif(n = nrow(.), min = 0, max = 100),
      y_true = eval(parse(text = dgp)),
      y_se = runif(n = nrow(.), min = sigma_bounds[1], max = sigma_bounds[2]),
      y_mean = y_true - min(y_true) + study_effect + rnorm(nrow(.), mean = 0, sd = y_se),
      is_outlier = FALSE) %>%
    arrange(x1)
  
  return(df_sim_out)
}

In the first simulated example, the relationship between covariate x1 and dependent variable y is J-shaped. This shape is common for biologically-based risk factors in epidemiology for which the optimal value is somewhere in the middle of the population distribution. For example, it is possible for systolic blood pressure to be either too low or too high with respect to health risk, making the risk curve J-shaped or U-shaped.

df_sim_spline1 <- create_simulated_data_spline(
  seed = 20,
  k_studies = 15,
  n_per_study = 1,
  dgp = "2 * sin(0.043*x1-2.9)",
  tau = 0.5,
  sigma_bounds = c(0.1, 0.3)
)

We plot the data with size proportional to the inverse of the standard error (SE). Observations with lower SE have less uncertainty and are therefore more heavily weighted in the model, so we want them to be bigger on the plot.

make_plot <- function(dat, size_multiplier = 1) {
  if (0) {
    dat <- df_sim_spline1
    size_multiplier = 0.2
  }
  with(dat, plot(
    x = x1, 
    y = y_mean,
    cex = 1/y_se * size_multiplier
  ))
  grid()
}

make_plot(dat = df_sim_spline1, size_multiplier = 0.5)

3. Loading the data

Here we format the data for passing it to the MRBRT() function. See the “MR-BRT examples” chapter for details.

dat4 <- mr$MRData()
dat4$load_df(
  data = df_sim_spline1, col_obs = "y_mean", col_obs_se = "y_se",
  col_covs = list("x1"), col_study_id = "study_id"
)

4. Understanding knot placement

The shape of the estimated curve is influenced by the quantity and placement of knots. Choosing a greater number of knots gives the curve more flexibility in general, and clustering knots closely together provides relatively greater flexibility in that portion of the domain. As a starting point, we recommend placing three internal knots such that the segments between the knots have a roughly equal amount of data.

Here is how we specify this with MR-BRT; the relevant code is spline_knots_type = ‘frequency’.

mod_4a <- mr$MRBRT(
  data = dat4,
  cov_models = list(
    mr$LinearCovModel("intercept", use_re = TRUE),
    mr$LinearCovModel(
      alt_cov = "x1",
      use_spline = TRUE,
      spline_degree = 2L, # 2L is quadratic
      spline_knots = array(seq(0, 1, length.out = 5)), # length.out=5 means 3 internal knots
      spline_knots_type = 'frequency', # 'frequency' means spaced according to data density
      spline_r_linear = TRUE,
      spline_l_linear = TRUE
    )
  )
)
mod_4a$fit_model()

The code below is just defining a function for plotting spline predictions, which we will use throughout the tutorial.

plot_spline_results <- function(mod_tmp, show_knots = TRUE) {
  df_pred_tmp <- data.frame(x1 = seq(0, 100, by = 0.1))
  dat_pred_tmp <- mr$MRData()
  dat_pred_tmp$load_df(
    data = df_pred_tmp, 
    col_covs=list('x1')
  )
  df_pred_tmp$pred_tmp <- mod_tmp$predict(data = dat_pred_tmp)
  with(df_pred_tmp, lines(x1, pred_tmp))
  if (show_knots) {
    for (k in mod_tmp$cov_models[[2]]$spline_knots) abline(v = k, col = "gray")
  }
}

Plotting the result, we notice that the knots (vertical lines) are not spaced evenly across x1. This happens because, by random chance, some portions of the x1 domain have greater data density than others. We prefer to place the knots according to data density so that no individual segment is subject to data sparsity; note that all segments contain approximately 4 data points.

make_plot(dat = df_sim_spline1, size_multiplier = 0.5)
plot_spline_results(mod_tmp = mod_4a)

4.1. Domain knots versus frequency knots

If spacing the knots equally over the domain is desired, change spline_knots_type = ‘frequency’ to spline_knots_type = ‘domain’ as below:

mod_4b <- mr$MRBRT(
  data = dat4,
  cov_models = list(
    mr$LinearCovModel("intercept", use_re = TRUE),
    mr$LinearCovModel(
      alt_cov = "x1",
      use_spline = TRUE,
      spline_degree = 2L, # 2L is quadratic
      spline_knots = array(seq(0, 1, length.out = 5)), # length.out=5 means 3 internal knots
      spline_knots_type = 'domain',
      spline_r_linear = TRUE,
      spline_l_linear = TRUE
    )
  )
)
mod_4b$fit_model()

The difference between choosing “frequency” and choosing “domain” is small in this example, but in the applied setting it can make a big difference. The key point is that if you are using “domain”, you will want to make sure that all segments have sufficient data to inform the shape of the curve. In the extreme, with 1 data point or no data in a segment, this can lead the model not to converge. With less extreme forms of data sparsity, fitting a flexible curve based on insufficient data can still lead to overfitting and instability in the predictions. For that reason, we typically only recommend using “domain” for special cases.

make_plot(dat = df_sim_spline1, size_multiplier = 0.5)
plot_spline_results(mod_tmp = mod_4b)

In the rest of the tutorial, spline_knots_type will be set to “frequency”.

4.2. Ensemble knots, briefly

There is another method for selecting knots which we describe in the section below regarding ensemble splines. Briefly, the approach is to randomly generate knot sets according to user-defined rules, fit a submodel with each individual knot set, score each knot set according to a performance metric, and make predictions as a performence-weighted average of submodel-specific predictions. Users might prefer this approach in general or simply as a data-driven way to identify knot locations.

4.3. External knots

So far, we have been discussing “internal” knots but not “external” knots. External knots are by definition placed at the minimum and maximum of the observed covariate values. This detail is ignorable in most cases but can be important for models without intercepts. That is, for no-intercept models in general, we expect the prediction to pass through the origin when all covariates are set to zero (i.e. \(E[y|x=0]=0\)). With mrtool, however, please note that the prediction would only be zero by definition when all non-spline covariates are set to zero and spline covariates are set to their minimum observed values.

5. Increasing the flexibility of the curve

The main methods for increasing the flexibility of a spline curve in mrtool are: 1) increase the number of knots; 2) increase the degree of the spline, converting it from quadratic to cubic; and 3) remove linearity in the tails, if applicable.

5.1. Increasing the number of knots

The following example increases the number of internal knots from 3 to 10, which is intentionally excessive. A typical number of internal knots is 3 to 5, but more may be desired for especially complex curve shapes. The relevant code is spline_knots = array(seq(0, 1, length.out = 12)). Note that specifying length.out = 12 gives 10 internal knots because two of them are the external knots.

mod_4c1 <- mr$MRBRT(
  data = dat4,
  cov_models = list(
    mr$LinearCovModel("intercept", use_re = TRUE),
    mr$LinearCovModel(
      alt_cov = "x1",
      use_spline = TRUE,
      spline_degree = 2L, # 2L is quadratic
      spline_knots = array(seq(0, 1, length.out = 8)), # length.out=8 means 6 internal knots
      spline_knots_type = 'frequency',
      spline_r_linear = TRUE,
      spline_l_linear = TRUE
    )
  )
)
mod_4c1$fit_model()
make_plot(dat = df_sim_spline1, size_multiplier = 0.5)
plot_spline_results(mod_tmp = mod_4c1)

5.2. Increasing the degree (2=quadratic, 3=cubic)

This section has the usual number of knots but increases the degree from 2L to 3L. In mathematical language, this changes it from a quadratic spline to a cubic spline. The use of higher degrees is technically possible but not recommended.

mod_4c2 <- mr$MRBRT(
  data = dat4,
  cov_models = list(
    mr$LinearCovModel("intercept", use_re = TRUE),
    mr$LinearCovModel(
      alt_cov = "x1",
      use_spline = TRUE,
      spline_degree = 3L, # 3L is cubic
      spline_knots = array(seq(0, 1, length.out = 5)), 
      spline_knots_type = 'frequency',
      spline_r_linear = TRUE,
      spline_l_linear = TRUE
    )
  )
)
mod_4c2$fit_model()
make_plot(dat = df_sim_spline1, size_multiplier = 0.5)
plot_spline_results(mod_tmp = mod_4c2)

5.3. Removing linearity in the tail segments

The following example removes linearity in the tail segments. This is typically not recommended except where there is strong evidence that the non-linear relationship has important variation at the lowest and/or highest end of the spline covariate. One example is that age-specific mortality risk is indeed usually high for infants compared to young children, but then mortality risk starts increasing consistently with increasing age.

mod_4c3 <- mr$MRBRT(
  data = dat4,
  cov_models = list(
    mr$LinearCovModel("intercept", use_re = TRUE),
    mr$LinearCovModel(
      alt_cov = "x1",
      use_spline = TRUE,
      spline_degree = 3L, # 3L is cubic
      spline_knots = array(seq(0, 1, length.out = 5)), 
      spline_knots_type = 'frequency',
      spline_r_linear = FALSE,
      spline_l_linear = FALSE
    )
  )
)
mod_4c3$fit_model()

We observe that without linearity in the tails (and using a cubic spline just for illustration), the model is overfitting the data at the high end of x1. This happens for the same reason that flexible models in general need a lot of data; data sparsity is leading to overfitting. It tracks too closely to the few observations that exist in that part of the domain. We expect data sparsity at the extremes by definition, and for that reason removing flexibility by using linear tails is the default.

make_plot(dat = df_sim_spline1, size_multiplier = 0.5)
plot_spline_results(mod_tmp = mod_4c3)

6. Monotonicity

Monotonicity constraints are for requiring that the effect of a spline covariate be increasing or decreasing with respect to the covariate. Technically, a spline with a decreasing monotonicity constraint is actually non-increasing, but we use the term “decreasing” for convenience.

The key argument for monotonicity is prior_spline_monotonicity. The allowed values are “increasing” and “decreasing”.

Here is an example with “increasing”.

mod_4d <- mr$MRBRT(
  data = dat4,
  cov_models = list(
    mr$LinearCovModel("intercept", use_re = TRUE),
    mr$LinearCovModel(
      alt_cov = "x1",
      use_spline = TRUE,
      spline_degree = 2L, # 2L is quadratic
      spline_knots = array(seq(0, 1, length.out = 5)), # length.out=5 means 3 internal knots
      spline_knots_type = 'frequency',
      spline_r_linear = TRUE,
      spline_l_linear = TRUE,
      prior_spline_monotonicity = "increasing"
    )
  )
)
mod_4d$fit_model()

A modeler might choose an increasing monotonicity constraint like this if, for example, a non-increasing effect is biologically implausible or otherwise not defensible from a scientific standpoint.

make_plot(dat = df_sim_spline1, size_multiplier = 0.5)
plot_spline_results(mod_tmp = mod_4d)

7. Convexity and concavity

A curve is convex if the second derivative (rate of change in the slope) is always >=0; this looks like the letter “U”. A curve is concave is the second derivative is always <=0; this looks like a dome. These specifications are not commonly used in health research, but can make the model more efficient if the particular shape can be assumed. In general, as we provide more information to the model a priori in the form of priors and constraints, the information content of the data can be used to more effectively learn about other aspects of the curve.

Here is an example of specifying a convexity constraint. The key argument is prior_spline_convexity. The allowed values are “convex” and “concave”.

mod_4e <- mr$MRBRT(
  data = dat4,
  cov_models = list(
    mr$LinearCovModel("intercept", use_re = TRUE),
    mr$LinearCovModel(
      alt_cov = "x1",
      use_spline = TRUE,
      spline_degree = 2L, # 2L is quadratic
      spline_knots = array(seq(0, 1, length.out = 5)), # length.out=5 means 3 internal knots
      spline_knots_type = 'frequency',
      spline_r_linear = TRUE,
      spline_l_linear = TRUE,
      prior_spline_convexity = "convex"
    )
  )
)
mod_4e$fit_model()

The J shape that underlies in the simulated data is already somewhat convex, so the addition of the convexity constraint does not make much of a difference.

make_plot(dat = df_sim_spline1, size_multiplier = 0.5)
plot_spline_results(mod_tmp = mod_4e)

Just for fun, let’s see what a concavity constraint would look like (code not shown; it includes prior_spline_convexity = “concave”).

mod_4e2 <- mr$MRBRT(
  data = dat4,
  cov_models = list(
    mr$LinearCovModel("intercept", use_re = TRUE),
    mr$LinearCovModel(
      alt_cov = "x1",
      use_spline = TRUE,
      spline_degree = 2L, # 2L is quadratic
      spline_knots = array(seq(0, 1, length.out = 5)), # length.out=5 means 3 internal knots
      spline_knots_type = 'frequency',
      spline_r_linear = TRUE,
      spline_l_linear = TRUE,
      prior_spline_convexity = "concave"
    )
  )
)
mod_4e2$fit_model()

make_plot(dat = df_sim_spline1, size_multiplier = 0.5)
plot_spline_results(mod_tmp = mod_4e2)

Because the data are clearly not dome-shaped, the best the model can do with a concavity constraint is a straight line.

8. Uniform and Gaussian priors

We can also alter the shape of the curve by applying Bayesian priors directly to the coefficients of the spline. Setting priors directly on the spline coefficients requires a fairly in-depth understanding of how B-splines are constructed and what the coefficients represent. Because this is outside the scope of the tutorial, we refer readers to the IHME blog post on splines which provides a more mathematical treatment of the topic. We give two examples here. For a full list of the prior options available, see py_help(mr$LinearCovModel); this assumes that you have run mr <- import(“mrtool”) as above.

As a side note, this section lays the groundwork for cascading splines, the topic of Chapter 5. Briefly, the method involves using spline coefficients estimated from a full-data model as priors on the spline coefficients for models fit to subsets of the data. In this way, information about the shape of the curve is passed hierarchically to different portions of the feature space.

8.1. Specifying a prior on the maximum order derivative

In both examples, we set priors on the maximum order derivative of a spline segment. We know how many spline segments there are based on how many internal knots there are. For example, using three internal knots splits the domain into four chunks, so the object we give to the model needs four priors (or one prior that for convenience will be used for all segments). Note that Uniform priors and Gaussian priors both have two parameters that define the prior. Uniform priors have a lower bound and an upper bound; Gaussian priors have a mean and standard deviation. Correspondingly, the structure of the object we pass to the model is a 2-by-n matrix, where n is the number of segments. The first row is either the lower bound (Uniform prior) or a mean (Gaussian prior), and the second row is either an upper bound (Uniform prior) or a standard deviation (Gaussian prior).

Without getting into too much math, there are two concepts that together give a good intuition for what the maximum order derivative represents. First, when the maximum order derivative for a cubic spline segment is set to exactly zero, it becomes a quadratic spline segment. Recall that quadratic splines have less flexibility than cubic splines. So essentially the strength of a prior with mean 0 acts as a tuning knob for reducing the flexibility of a spline. We call this a dampening prior, and it is the topic of the second example in this section. Second, for linear segments, as in piecewise linear splines (i.e. spline_degree = 1L) or linear tails, the maximum order derivative is nothing other than the slope of the line. So setting a prior on the maximum order derivative for linear segments is giving information to the model about what the slope should be. This is a topic of the first example.

8.2. Uniform prior on a linear tail

For the first example, we set a strict uniform prior on the slope of a linear tail segment. Specifically, we force the right linear tail to have a slope of −1, which is contrary to the prevailing trend in the data. We do this with the prior_spline_maxder_uniform argument to the mr$LinearCovModel() function. We use 4 internal knots for this example, yielding 5 spline segments. Correspondingly, we pass a 2-by-5 matrix to the model with information about priors. Note that setting Uniform priors with a lower bound of -Inf and an upper bound of Inf effectively provide no prior information to the model.

mod_4g <- mr$MRBRT(
  data = dat4,
  cov_models = list(
    mr$LinearCovModel("intercept", use_re = TRUE),
    mr$LinearCovModel(
      alt_cov = "x1",
      use_spline = TRUE,
      spline_degree = 2L, # 2L is quadratic
      spline_knots = array(seq(0, 1, length.out = 6)), # length.out=5 means 3 internal knots
      spline_knots_type = 'frequency', # 'frequency' means spaced according to data density
      spline_r_linear = TRUE,
      spline_l_linear = TRUE,
      prior_spline_maxder_uniform = rbind(c(-Inf,-Inf,-Inf,-Inf,-1), c(Inf,Inf,Inf,Inf,-1))
    )
  )
)
mod_4g$fit_model()

In the plot below, we see the effects of setting the slope of the right linear tail to exactly −1. Not only is this segment going against the trend in the data; it also has indirect effects on the fit in the adjacent segment. This happens due to the requirement in spline models generally that, exactly at a given knot location, the slope for adjacent spline segments must be equal to each other.

make_plot(dat = df_sim_spline1, size_multiplier = 0.5)
plot_spline_results(mod_tmp = mod_4g)

For the second example, we reduce the flexibility of a quadratic spline by using dampening priors. By using Gaussian priors with mean zero on the highest order derivative, we effectively create a situation where the spline is somewhere between quadratic (i.e. spline_degree = 2L) and piecewise linear (i.e. spline_degree = 1L). We remove the linear tail constraint in this example because otherwise, as we saw in the previous example, the mean-zero prior would bias the slope toward flatness (i.e. toward first derivative equals zero).

Recall the instance from above where we used too many knots, leading to too much wiggliness.

make_plot(dat = df_sim_spline1, size_multiplier = 0.5)
plot_spline_results(mod_tmp = mod_4c1)

Ten internal knots means that there are 11 spline segments, which is a lot. Dampening priors reduce the wiggles. Note that the choice of 0.1 for the prior’s standard deviation is arbitrary; lower values make the spline more like piecewise linear, and higher values make the spline more like quadratic.

mod_4c1_dampened <- mr$MRBRT(
  data = dat4,
  cov_models = list(
    mr$LinearCovModel("intercept", use_re = TRUE),
    mr$LinearCovModel(
      alt_cov = "x1",
      use_spline = TRUE,
      spline_degree = 2L, # 2L is quadratic
      spline_knots = array(seq(0, 1, length.out = 12)), # length.out=12 means 10 internal knots
      spline_knots_type = 'frequency',
      spline_r_linear = FALSE,
      spline_l_linear = FALSE,
      prior_spline_maxder_gaussian = array(c(0, 0.1))
    )
  )
)
mod_4c1_dampened$fit_model()

That worked well. We get the dual benefit of reasonable extrapolation (like piecewise linear splines) and differentiability at the knots (like quadratic splines).

make_plot(dat = df_sim_spline1, size_multiplier = 0.5)
plot_spline_results(mod_tmp = mod_4c1_dampened)