This section contains the setup used throughout: the libraries loaded and the Stan models compiled up front.
Libraries used:
library(data.table)
library(ggplot2)
library(rstan)
library(bridgesampling)
Compiled code (the models are shown later):
rstan::rstan_options(auto_write = TRUE)
Sys.setenv(LOCAL_CPPFLAGS = '-march=native')
options(mc.cores = 1)
tg_stan <- new.env()
tg_stan$model1 <- rstan::stan_model(
  file = "../stan/logit_bridgesamp1.stan", auto_write = TRUE)
tg_stan$model2 <- rstan::stan_model(
  file = "../stan/logit_bridgesamp2.stan", auto_write = TRUE)
tg_stan$model3 <- rstan::stan_model(
  file = "../stan/logit_bridgesamp3.stan", auto_write = TRUE)
Consider the following data, arising from a clinical trial that compared the probability of moderate/severe adverse events when a COVID-19 and a flu vaccination are coadministered versus when a COVID-19 vaccination is coadministered with placebo.
dd1 <- data.table(expand.grid(trt = 1:2, brand = 1:2, dose = 1:2))
setkey(dd1, trt, brand, dose)
dd1[, pae := c(0.60, 0.50,   # FLU: AZ dose 1, AZ dose 2
               0.25, 0.40,   # FLU: PF dose 1, PF dose 2
               0.40, 0.35,   # PBO: AZ dose 1, AZ dose 2
               0.20, 0.30)]  # PBO: PF dose 1, PF dose 2
dd1[, n := 120 ]
dd1[, y := as.integer(n * pae)]
dd1
## trt brand dose pae n y
## 1: 1 1 1 0.60 120 72
## 2: 1 1 2 0.50 120 60
## 3: 1 2 1 0.25 120 30
## 4: 1 2 2 0.40 120 48
## 5: 2 1 1 0.40 120 48
## 6: 2 1 2 0.35 120 42
## 7: 2 2 1 0.20 120 24
## 8: 2 2 2 0.30 120 36
As presented, the data have two brands of COVID-19 vaccine (AZ and PF). Participants are enrolled into the RCT at either (and only) their first or second COVID-19 vaccination.
The probability of adverse events might vary substantially across brand and cohort, but we do not know whether that is the case. For example, the AZ brand might show a decrease in the adverse event rate between dose 1 and dose 2, whereas the PF brand might show an increase at the second dose. Similarly, coadministration of the flu vaccination with the AZ brand might lead to a 10% increase in the AE rate, but only a 5% increase with the PF brand.
I have set up the data above so that there are differences across all of trt (CVD+FLU vs CVD+PBO), brand and dose. This gives a change in the probability of adverse events (\(\delta\)) for each of the groups, as follows:
dtmp <- cbind(dd1[trt == 1, ], dd1[trt == 2, ])
names(dtmp) <- c(paste0(names(dd1), 1), paste0(names(dd1), 2))
dtmp[, delta := pae1 - pae2]
dtmp
## trt1 brand1 dose1 pae1 n1 y1 trt2 brand2 dose2 pae2 n2 y2 delta
## 1: 1 1 1 0.60 120 72 2 1 1 0.40 120 48 0.20
## 2: 1 1 2 0.50 120 60 2 1 2 0.35 120 42 0.15
## 3: 1 2 1 0.25 120 30 2 2 1 0.20 120 24 0.05
## 4: 1 2 2 0.40 120 48 2 2 2 0.30 120 36 0.10
So, we see a consistent rise in the probability of an AE when the COVID-19 vaccine is coadministered with the flu vaccine, but this rise varies from 0.05 to 0.2 depending on the COVID-19 brand used and the dose at which it is received.
For the purposes of this post, I will also look at a more homogeneous dataset.
dd2 <- data.table(expand.grid(trt = 1:2, brand = 1:2, dose = 1:2))
setkey(dd2, trt, brand, dose)
dd2[, pae := c(0.60, 0.65,   # FLU: AZ dose 1, AZ dose 2
               0.40, 0.38,   # FLU: PF dose 1, PF dose 2
               0.40, 0.45,   # PBO: AZ dose 1, AZ dose 2
               0.30, 0.28)]  # PBO: PF dose 1, PF dose 2
dd2[, n := 120 ]
dd2[, y := as.integer(n * pae)]
dd2
## trt brand dose pae n y
## 1: 1 1 1 0.60 120 72
## 2: 1 1 2 0.65 120 78
## 3: 1 2 1 0.40 120 48
## 4: 1 2 2 0.38 120 45
## 5: 2 1 1 0.40 120 48
## 6: 2 1 2 0.45 120 54
## 7: 2 2 1 0.30 120 36
## 8: 2 2 2 0.28 120 33
dtmp <- cbind(dd2[trt == 1, ], dd2[trt == 2, ])
names(dtmp) <- c(paste0(names(dd2), 1), paste0(names(dd2), 2))
dtmp[, delta := pae1 - pae2]
dtmp
## trt1 brand1 dose1 pae1 n1 y1 trt2 brand2 dose2 pae2 n2 y2 delta
## 1: 1 1 1 0.60 120 72 2 1 1 0.40 120 48 0.2
## 2: 1 1 2 0.65 120 78 2 1 2 0.45 120 54 0.2
## 3: 1 2 1 0.40 120 48 2 2 1 0.30 120 36 0.1
## 4: 1 2 2 0.38 120 45 2 2 2 0.28 120 33 0.1
In the above data, the probability of having an AE is again higher when the COVID-19 vaccination is administered with flu, but there is no variation across dose. For example, coadministration of brand 1 with flu leads to a 0.2 increase in the probability of an AE regardless of whether you were enrolled at your first or second COVID-19 vaccination.
We might specify several possible models for these data; here I use logistic regression. The first iteration models the probability of an adverse event by treatment group (CVD+FLU vs CVD+PBO) alone.
\[ \begin{aligned} y_i &\sim \mathsf{Binomial}(n_i, \pi_{trt[i]}) \\ \mathsf{logit}(\pi_{trt[i]}) &= \theta_{trt[i]} \\ \theta_{trt[i]} &\sim \mathsf{Normal}(0, 5) \end{aligned} \]
A Stan version of this model is shown below.
data {
  int N;
  int y[N];
  int n[N];
  int trt[N];
}
parameters {
  real mu[2];
}
model {
  target += normal_lpdf(mu | 0, 5);
  for (i in 1:N) {
    target += binomial_logit_lpmf(y[i] | n[i], mu[trt[i]]);
  }
}
generated quantities {
  vector[2] pae;
  for (i in 1:2) {
    pae[i] = inv_logit(mu[i]);
  }
}
However, modeling the probability of adverse events by treatment group alone gives a coarse representation of the data. Other options are to model (1) the variation by treatment and brand and (2) the variation by treatment, brand and dose. The specifications for these models are just variations on the theme of model 1 shown above.
A Stan implementation of model 2 (modeling each treatment by brand combination, averaging over dose) is shown below:
data {
  int N;
  int y[N];
  int n[N];
  int trt[N];
  int brand[N];
}
parameters {
  real b[2, 2];
}
model {
  // apply the prior once per parameter (not once per data row,
  // which would count each prior twice here)
  for (i in 1:2) {
    target += normal_lpdf(b[i] | 0, 5);
  }
  for (i in 1:N) {
    target += binomial_logit_lpmf(y[i] | n[i], b[trt[i], brand[i]]);
  }
}
generated quantities {
  real pae[4];
  int k = 1;
  for (i in 1:2) {
    for (j in 1:2) {
      pae[k] = inv_logit(b[i, j]);
      k = k + 1;
    }
  }
}
A Stan implementation of model 3 (modeling each treatment by brand by dose group) is shown below:
data {
  int N;
  int y[N];
  int n[N];
}
parameters {
  real mu[N];
}
model {
  target += normal_lpdf(mu | 0, 5);
  for (i in 1:N) {
    target += binomial_logit_lpmf(y[i] | n[i], mu[i]);
  }
}
generated quantities {
  vector[N] pae;
  for (i in 1:N) {
    pae[i] = inv_logit(mu[i]);
  }
}
Fit each model to the first dataset:
ld <- list(
N = 8, y = dd1$y, n = dd1$n, trt = dd1$trt, brand = dd1$brand
)
l1a <- rstan::sampling(tg_stan$model1, data = ld,
chains = 1, thin = 1,
iter = 40000, warmup = 2000, refresh = 0)
l2a <- rstan::sampling(tg_stan$model2, data = ld,
chains = 1, thin = 1,
iter = 40000, warmup = 2000, refresh = 0)
l3a <- rstan::sampling(tg_stan$model3, data = ld,
chains = 1, thin = 1,
iter = 40000, warmup = 2000, refresh = 0)
As a sanity check, we can summarise the posterior samples and compare them against the group means.
For models 1a, 2a and 3a:
list(
colMeans(data.table(as.matrix(l1a, pars = "pae"))),
dd1[, .(mu = sum(y)/sum(n)), by = trt],
colMeans(data.table(as.matrix(l2a, pars = "pae"))),
dd1[, .(mu = sum(y)/sum(n)), by = .(trt, brand)],
colMeans(data.table(as.matrix(l3a, pars = "pae"))),
dd1[, .(mu = sum(y)/sum(n)), by = .(trt, brand, dose)]
)
## [[1]]
## pae[1] pae[2]
## 0.4373594 0.3127658
##
## [[2]]
## trt mu
## 1: 1 0.4375
## 2: 2 0.3125
##
## [[3]]
## pae[1] pae[2] pae[3] pae[4]
## 0.5497981 0.3251805 0.3752058 0.2500954
##
## [[4]]
## trt brand mu
## 1: 1 1 0.550
## 2: 1 2 0.325
## 3: 2 1 0.375
## 4: 2 2 0.250
##
## [[5]]
## pae[1] pae[2] pae[3] pae[4] pae[5] pae[6] pae[7] pae[8]
## 0.5997638 0.5001544 0.2501667 0.4002705 0.3999781 0.3501605 0.2005129 0.3000433
##
## [[6]]
## trt brand dose mu
## 1: 1 1 1 0.60
## 2: 1 1 2 0.50
## 3: 1 2 1 0.25
## 4: 1 2 2 0.40
## 5: 2 1 1 0.40
## 6: 2 1 2 0.35
## 7: 2 2 1 0.20
## 8: 2 2 2 0.30
Fit the same models to the second dataset:
ld <- list(
N = 8, y = dd2$y, n = dd2$n, trt = dd2$trt, brand = dd2$brand
)
l1b <- rstan::sampling(tg_stan$model1, data = ld,
chains = 1, thin = 1,
iter = 40000, warmup = 2000, refresh = 0)
l2b <- rstan::sampling(tg_stan$model2, data = ld,
chains = 1, thin = 1,
iter = 40000, warmup = 2000, refresh = 0)
l3b <- rstan::sampling(tg_stan$model3, data = ld,
chains = 1, thin = 1,
iter = 40000, warmup = 2000, refresh = 0)
For models 1b, 2b and 3b:
list(
colMeans(data.table(as.matrix(l1b, pars = "pae"))),
dd2[, .(mu = sum(y)/sum(n)), by = trt],
colMeans(data.table(as.matrix(l2b, pars = "pae"))),
dd2[, .(mu = sum(y)/sum(n)), by = .(trt, brand)],
colMeans(data.table(as.matrix(l3b, pars = "pae"))),
dd2[, .(mu = sum(y)/sum(n)), by = .(trt, brand, dose)]
)
## [[1]]
## pae[1] pae[2]
## 0.5061495 0.3563947
##
## [[2]]
## trt mu
## 1: 1 0.50625
## 2: 2 0.35625
##
## [[3]]
## pae[1] pae[2] pae[3] pae[4]
## 0.6247432 0.3874775 0.4250868 0.2875771
##
## [[4]]
## trt brand mu
## 1: 1 1 0.6250
## 2: 1 2 0.3875
## 3: 2 1 0.4250
## 4: 2 2 0.2875
##
## [[5]]
## pae[1] pae[2] pae[3] pae[4] pae[5] pae[6] pae[7] pae[8]
## 0.5997095 0.6494264 0.3999430 0.3752748 0.4004264 0.4499800 0.3002395 0.2754406
##
## [[6]]
## trt brand dose mu
## 1: 1 1 1 0.600
## 2: 1 1 2 0.650
## 3: 1 2 1 0.400
## 4: 1 2 2 0.375
## 5: 2 1 1 0.400
## 6: 2 1 2 0.450
## 7: 2 2 1 0.300
## 8: 2 2 2 0.275
NOTE: For some reason I could not get this part to work unless I included the generated quantities block for computing pae in all the models; I do not know why this is.
Anyway, we want to consider the plausibility of each model, which can be achieved by reference to the posterior model probability:
\[ \begin{aligned} p(\mathcal{M}_j | y) = \frac{p(y | \mathcal{M}_j)p(\mathcal{M}_j)}{\sum_{k=1}^{J} p(y | \mathcal{M}_k)p(\mathcal{M}_k)} \end{aligned} \]
for each model \(\mathcal{M}_j\), \(j = 1, \dots, J\). The denominator sums, over the \(J = 3\) models considered here, each model's marginal likelihood multiplied by its prior probability; in this simple case we can just assume each model has prior probability \(1/3\). We then use ratios of the posterior model probabilities to evaluate the relative plausibility of each model. For example:
\[ \begin{aligned} \frac{p(\mathcal{M}_3 | y)}{p(\mathcal{M}_1 | y)} = \frac{p(\mathcal{M}_3)}{p(\mathcal{M}_1)} \frac{p(y | \mathcal{M}_3)}{p(y | \mathcal{M}_1)} \end{aligned} \]
where the final term, the ratio of the marginal likelihoods, is the Bayes factor.
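To make the arithmetic concrete, here is a minimal sketch of how the posterior model probabilities follow from the log marginal likelihoods under equal priors; the logml values below are made-up placeholders, not estimates from this post.
# Sketch: posterior model probabilities from log marginal likelihoods,
# assuming equal prior model probabilities. The logml values are
# placeholders for illustration only.
logml <- c(m1 = -550, m2 = -548, m3 = -545)
prior <- rep(1/3, length(logml))
# subtract the max before exponentiating for numerical stability
w <- exp(logml - max(logml)) * prior
w / sum(w)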
Bridge sampling, via the bridgesampling package, provides a simple way to estimate the marginal likelihood for each model.
h1a <- bridge_sampler(l1a, silent = TRUE)
# Approximate percentage error of the estimates (which
# is low in this instance):
# error_measures(h1a)$percentage
h2a <- bridge_sampler(l2a, silent = TRUE)
h3a <- bridge_sampler(l3a, silent = TRUE)
h1b <- bridge_sampler(l1b, silent = TRUE)
h2b <- bridge_sampler(l2b, silent = TRUE)
h3b <- bridge_sampler(l3b, silent = TRUE)
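Each object returned by bridge_sampler stores the estimated log marginal likelihood in its logml element; the bf() calls below are, in effect, exponentiated differences of these values.
# Inspect the estimated log marginal likelihoods directly
sapply(list(h1a = h1a, h2a = h2a, h3a = h3a,
            h1b = h1b, h2b = h2b, h3b = h3b),
       function(x) x$logml)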
The Bayes factors arising from modeling the first dataset under model assumptions 1, 2 and 3 are:
list(
bridgesampling::bf(h2a, h1a),
bridgesampling::bf(h3a, h1a),
bridgesampling::bf(h3a, h2a)
)
## [[1]]
## Estimated Bayes factor in favor of h2a over h1a: 1.22844
##
## [[2]]
## Estimated Bayes factor in favor of h3a over h1a: 151.42409
##
## [[3]]
## Estimated Bayes factor in favor of h3a over h2a: 123.26537
These estimates suggest that the data are about 150 times more likely under model 3 than under model 1, and about 120 times more likely under model 3 than under model 2. So, the odds are pointing towards model 3, which seems reasonable given that the contrived dataset had varying treatment, brand and dose differences in the rate of AEs.
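As a quick consistency check, Bayes factors are transitive: the BF of model 3 over model 2 should equal the BF of model 3 over model 1 divided by the BF of model 2 over model 1.
# BF(3 vs 2) = BF(3 vs 1) / BF(2 vs 1), using the estimates shown above
151.42409 / 1.22844  # approximately 123.27, matching the direct estimate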
For the second dataset:
list(
bridgesampling::bf(h2b, h1b),
bridgesampling::bf(h3b, h1b),
bridgesampling::bf(h3b, h2b)
)
## [[1]]
## Estimated Bayes factor in favor of h2b over h1b: 7.27824
##
## [[2]]
## Estimated Bayes factor in favor of h3b over h1b: 3.41497
##
## [[3]]
## Estimated Bayes factor in favor of h3b over h2b: 0.46920
and now the data are more likely under model 2 than under model 3 (by a factor of about 1/0.47, or roughly 2.1) or model 1, which again seems reasonable given how the data were generated.
The actual posterior probabilities of each model can also be obtained. Note that each call below compares only the two named models, assigning them equal prior probability:
list(
bridgesampling::post_prob(h2a, h1a),
bridgesampling::post_prob(h3a, h1a),
bridgesampling::post_prob(h3a, h2a),
bridgesampling::post_prob(h2b, h1b),
bridgesampling::post_prob(h3b, h1b),
bridgesampling::post_prob(h3b, h2b)
)
## [[1]]
## h2a h1a
## 0.5512556 0.4487444
##
## [[2]]
## h3a h1a
## 0.993439357 0.006560643
##
## [[3]]
## h3a h2a
## 0.991952706 0.008047294
##
## [[4]]
## h2b h1b
## 0.8792013 0.1207987
##
## [[5]]
## h3b h1b
## 0.773498 0.226502
##
## [[6]]
## h3b h2b
## 0.319359 0.680641
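All three models can also be compared in a single call, which gives the posterior probabilities under the \(1/3\) priors discussed earlier (output not shown):
# Compare all three models at once, with equal (1/3) prior probabilities
bridgesampling::post_prob(h1a, h2a, h3a)
bridgesampling::post_prob(h1b, h2b, h3b)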
A toy example was set up to look at the application of Bayes factors to Bayesian model selection, and it seemed to lead to sensible inference. In the heterogeneous data, the Bayes factor identified model 3 as the most plausible, whereas when the more homogeneous dataset was used, the Bayes factor sensibly identified model 2 as the most plausible.