The CONDA-YAPA Study was a community cluster-randomised trial done in Blantyre, Malawi. We randomly allocated 16,660 adults living in 14 communities to either:
The primary trial outcome compared between groups the cumulative incidence of antiretroviral therapy (ART) initiation among all adult cluter residents in the first 6-months of HIV self-testing availability.
Final analysis showed that three-times as many adults initiated ART after self-testing where home treatment was available compared to where only clinic treatment was available.
Results are reported in MacPherson et al, JAMA 2014 link
Here, for exploratory purposes, we conduct a Bayesian re-analysis of the primary trial outcome.
First import the cluster-level summary dataset.
clustid is community cluster numbergroup is allocation group, with 0: clinic ART initiation, and 1: home initiationartstarts is the nunmber of adult cluster residents who initiated ART during the first 6-months of HIV self-testing availabilitypop is the cluster denominator (number of adult residents)library(tidyverse)
library(knitr)
library(hrbrthemes)
library(brms)
library(lme4)
library(broom)
library(scales)
df <- read_csv("20171103_CYP_ARTinitiations.csv")
Estimate the proportion of adults in each cluster who initiated ART.
df <- df %>%
mutate(propinitiated = artstarts/pop,
se = sqrt(propinitiated*(1-propinitiated)/pop),
lci = propinitiated - qnorm(0.975)*se,
uci = propinitiated + qnorm(0.975)*se)
kable(df)
| clustid | group | artstarts | pop | propinitiated | se | lci | uci |
|---|---|---|---|---|---|---|---|
| 2 | 0 | 11 | 1075 | 0.0102326 | 0.0030694 | 0.0042166 | 0.0162485 |
| 3 | 0 | 10 | 1209 | 0.0082713 | 0.0026048 | 0.0031660 | 0.0133766 |
| 8 | 0 | 12 | 1274 | 0.0094192 | 0.0027062 | 0.0041150 | 0.0147233 |
| 9 | 1 | 21 | 1346 | 0.0156018 | 0.0033779 | 0.0089812 | 0.0222224 |
| 10 | 1 | 40 | 1416 | 0.0282486 | 0.0044030 | 0.0196190 | 0.0368782 |
| 14 | 0 | 7 | 1181 | 0.0059272 | 0.0022336 | 0.0015494 | 0.0103050 |
| 15 | 1 | 24 | 1259 | 0.0190627 | 0.0038539 | 0.0115092 | 0.0266163 |
| 18 | 0 | 3 | 1276 | 0.0023511 | 0.0013558 | -0.0003062 | 0.0050084 |
| 19 | 0 | 9 | 1270 | 0.0070866 | 0.0023538 | 0.0024732 | 0.0117000 |
| 21 | 1 | 25 | 1098 | 0.0227687 | 0.0045016 | 0.0139457 | 0.0315916 |
| 22 | 1 | 24 | 949 | 0.0252898 | 0.0050966 | 0.0153007 | 0.0352789 |
| 25 | 1 | 18 | 923 | 0.0195016 | 0.0045515 | 0.0105808 | 0.0284225 |
| 26 | 0 | 11 | 1181 | 0.0093141 | 0.0027952 | 0.0038356 | 0.0147927 |
| 28 | 1 | 29 | 1203 | 0.0241064 | 0.0044222 | 0.0154391 | 0.0327737 |
Plot distribution of percentage of ART initiations by cluster.
ggplot(df, aes(x=as.factor(clustid), y=propinitiated,
fill = factor(group, labels=c("Facility initiation", "Home initiation")))) +
geom_bar(stat="identity") +
geom_errorbar(aes(ymin=lci, ymax=uci), width = 0.1) +
xlab("Cluster ID") +
ylab("Percentage of adult population who initiated ART (95% CI)") +
labs(title="Cluster-level summary of primary trial outcome") +
theme_ipsum() +
labs(fill="Allocation group",
caption="Primary trial outcome: Comparison between groups of the percentage of adult residents who initiated ART")
First we replicate the primary outcome analysis of the CONDA-YAPA trial using a generalised linear mixed model with a logit link function, and incorporating random effects for cluster of residence, and the fixed effects of trial group.
(Note in this re-analysis, we use generalised linear regression model with fixed effect term for trial group, and a random effects term for cluster. This differs slightly from the analysis reported in MacPherson et al, JAMA 2014, where we estimated the ratio of cluster-averaged means of ART initiations using methods described in Hayes and Moulton 2009).
test<- cbpp
g1 <- glmer(cbind(artstarts, pop) ~ group + (1 | clustid), data = df, family = binomial)
summary(g1)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: cbind(artstarts, pop) ~ group + (1 | clustid)
## Data: df
##
## AIC BIC logLik deviance df.resid
## 84.3 86.2 -39.1 78.3 11
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.0907 -0.5743 0.2399 0.7129 1.4855
##
## Random effects:
## Groups Name Variance Std.Dev.
## clustid (Intercept) 0.001074 0.03277
## Number of obs: 14, groups: clustid, 14
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.9011 0.1273 -38.50 < 2e-16 ***
## group 1.0879 0.1482 7.34 2.1e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## group -0.855
Now, we exponentiate the co-efficients to give the relative risk and 95% confidence interval of ART initiation, comparing the home group to the clinic group.
cc <- confint(g1,parm="beta_")
## Warning in if (parm == "theta_") {: the condition has length > 1 and only
## the first element will be used
## Warning in if (parm == "beta_") {: the condition has length > 1 and only
## the first element will be used
ctab <- cbind(relative_risk=fixef(g1),cc)
rtab <- exp(ctab)
kable(rtab,digits=3)
| relative_risk | 2.5 % | 97.5 % | |
|---|---|---|---|
| (Intercept) | 0.007 | 0.006 | 0.009 |
| group | 2.968 | 2.206 | 4.030 |
Results are essentially identitical to the unadjusted risk ratio reported in MacPherson et al, JAMA 2014.
Now we will conduct a Bayesian analysis of the primary trial outcome.
We will combine likelihood estimates from the generalised linear mixed model (with a Beta family with logit link function) with a prior distribution of estimated effect size, and sample from the posterior distribution to obtain estimates of effect size and credible intervals.
The brms package is used to consutruct the regression model. brms link implements multilevel Bayesian models using Stan.
We did not define prior distributions before the trial was conducted, so here for this exploratory analysis, have selected a prior based on our estimated sample size calculations.
We specify the prior on a log-odds scale and to be normally distributed, with a mean at 0.18 (i.e. approximately equivalent to a relative risk of 1.5, i.e. we expected a 50% relative increase in the proportion of residents initiating ART where the home initiation intervention was available), and with a standard deviation of 0.25.
First, we define the prior distribution.
df$group <- as.factor(df$group)
Prior <- set_prior("normal (0.18,0.25)", class = "b")
Now we fit the Bayesian model using the brms package.
#fit the model with prior
fit1 <- brm(formula = propinitiated ~ group + (1 | clustid),
data = df,
family = Beta(link="logit"),
prior = Prior,
sample_prior = TRUE,
warmup = 1000, iter = 2000, chains = 4,
control = list(adapt_delta = 0.95))
##
## SAMPLING FOR MODEL 'beta(logit) brms-model' NOW (CHAIN 1).
##
## Gradient evaluation took 5.6e-05 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.56 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 2000 [ 0%] (Warmup)
## Iteration: 200 / 2000 [ 10%] (Warmup)
## Iteration: 400 / 2000 [ 20%] (Warmup)
## Iteration: 600 / 2000 [ 30%] (Warmup)
## Iteration: 800 / 2000 [ 40%] (Warmup)
## Iteration: 1000 / 2000 [ 50%] (Warmup)
## Iteration: 1001 / 2000 [ 50%] (Sampling)
## Iteration: 1200 / 2000 [ 60%] (Sampling)
## Iteration: 1400 / 2000 [ 70%] (Sampling)
## Iteration: 1600 / 2000 [ 80%] (Sampling)
## Iteration: 1800 / 2000 [ 90%] (Sampling)
## Iteration: 2000 / 2000 [100%] (Sampling)
##
## Elapsed Time: 4.16275 seconds (Warm-up)
## 3.20716 seconds (Sampling)
## 7.36991 seconds (Total)
##
##
## SAMPLING FOR MODEL 'beta(logit) brms-model' NOW (CHAIN 2).
##
## Gradient evaluation took 2.8e-05 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.28 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 2000 [ 0%] (Warmup)
## Iteration: 200 / 2000 [ 10%] (Warmup)
## Iteration: 400 / 2000 [ 20%] (Warmup)
## Iteration: 600 / 2000 [ 30%] (Warmup)
## Iteration: 800 / 2000 [ 40%] (Warmup)
## Iteration: 1000 / 2000 [ 50%] (Warmup)
## Iteration: 1001 / 2000 [ 50%] (Sampling)
## Iteration: 1200 / 2000 [ 60%] (Sampling)
## Iteration: 1400 / 2000 [ 70%] (Sampling)
## Iteration: 1600 / 2000 [ 80%] (Sampling)
## Iteration: 1800 / 2000 [ 90%] (Sampling)
## Iteration: 2000 / 2000 [100%] (Sampling)
##
## Elapsed Time: 5.15016 seconds (Warm-up)
## 7.78631 seconds (Sampling)
## 12.9365 seconds (Total)
##
##
## SAMPLING FOR MODEL 'beta(logit) brms-model' NOW (CHAIN 3).
##
## Gradient evaluation took 2.5e-05 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.25 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 2000 [ 0%] (Warmup)
## Iteration: 200 / 2000 [ 10%] (Warmup)
## Iteration: 400 / 2000 [ 20%] (Warmup)
## Iteration: 600 / 2000 [ 30%] (Warmup)
## Iteration: 800 / 2000 [ 40%] (Warmup)
## Iteration: 1000 / 2000 [ 50%] (Warmup)
## Iteration: 1001 / 2000 [ 50%] (Sampling)
## Iteration: 1200 / 2000 [ 60%] (Sampling)
## Iteration: 1400 / 2000 [ 70%] (Sampling)
## Iteration: 1600 / 2000 [ 80%] (Sampling)
## Iteration: 1800 / 2000 [ 90%] (Sampling)
## Iteration: 2000 / 2000 [100%] (Sampling)
##
## Elapsed Time: 5.34778 seconds (Warm-up)
## 3.33996 seconds (Sampling)
## 8.68774 seconds (Total)
##
##
## SAMPLING FOR MODEL 'beta(logit) brms-model' NOW (CHAIN 4).
##
## Gradient evaluation took 3e-05 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.3 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 2000 [ 0%] (Warmup)
## Iteration: 200 / 2000 [ 10%] (Warmup)
## Iteration: 400 / 2000 [ 20%] (Warmup)
## Iteration: 600 / 2000 [ 30%] (Warmup)
## Iteration: 800 / 2000 [ 40%] (Warmup)
## Iteration: 1000 / 2000 [ 50%] (Warmup)
## Iteration: 1001 / 2000 [ 50%] (Sampling)
## Iteration: 1200 / 2000 [ 60%] (Sampling)
## Iteration: 1400 / 2000 [ 70%] (Sampling)
## Iteration: 1600 / 2000 [ 80%] (Sampling)
## Iteration: 1800 / 2000 [ 90%] (Sampling)
## Iteration: 2000 / 2000 [100%] (Sampling)
##
## Elapsed Time: 3.92644 seconds (Warm-up)
## 4.42356 seconds (Sampling)
## 8.35 seconds (Total)
summary(fit1)
## Family: beta(logit)
## Formula: propinitiated ~ group + (1 | clustid)
## Data: df (Number of observations: 14)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
## ICs: LOO = NA; WAIC = NA; R2 = NA
##
## Group-Level Effects:
## ~clustid (Number of levels: 14)
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept) 0.18 0.14 0.01 0.52 1894 1.00
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept -4.47 0.20 -4.86 -4.03 2002 1.00
## group1 0.53 0.21 0.09 0.93 2477 1.00
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## phi 267.79 123.76 84.42 561.08 2612 1.00
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Review the compiled Stan code used to fit the model.
stancode(fit1)
## // generated with brms 1.10.2
## functions {
## }
## data {
## int<lower=1> N; // total number of observations
## vector[N] Y; // response variable
## int<lower=1> K; // number of population-level effects
## matrix[N, K] X; // population-level design matrix
## // data for group-level effects of ID 1
## int<lower=1> J_1[N];
## int<lower=1> N_1;
## int<lower=1> M_1;
## vector[N] Z_1_1;
## int prior_only; // should the likelihood be ignored?
## }
## transformed data {
## int Kc = K - 1;
## matrix[N, K - 1] Xc; // centered version of X
## vector[K - 1] means_X; // column means of X before centering
## for (i in 2:K) {
## means_X[i - 1] = mean(X[, i]);
## Xc[, i - 1] = X[, i] - means_X[i - 1];
## }
## }
## parameters {
## vector[Kc] b; // population-level effects
## real temp_Intercept; // temporary intercept
## real<lower=0> phi; // precision parameter
## vector<lower=0>[M_1] sd_1; // group-level standard deviations
## vector[N_1] z_1[M_1]; // unscaled group-level effects
## // parameters to store prior samples
## real<lower=0> prior_phi;
## real<lower=0> prior_sd_1;
## }
## transformed parameters {
## // group-level effects
## vector[N_1] r_1_1 = sd_1[1] * (z_1[1]);
## }
## model {
## vector[N] mu = Xc * b + temp_Intercept;
## for (n in 1:N) {
## mu[n] = mu[n] + (r_1_1[J_1[n]]) * Z_1_1[n];
## mu[n] = inv_logit(mu[n]);
## }
## // priors including all constants
## target += normal_lpdf(b | 0.18,0.25);
## target += gamma_lpdf(phi | 0.01, 0.01);
## target += student_t_lpdf(sd_1 | 3, 0, 10)
## - 1 * student_t_lccdf(0 | 3, 0, 10);
## target += normal_lpdf(z_1[1] | 0, 1);
## // likelihood including all constants
## if (!prior_only) {
## target += beta_lpdf(Y | mu * phi, (1 - mu) * phi);
## }
## // additionally draw samples from priors
## target += gamma_lpdf(prior_phi | 0.01,0.01);
## target += student_t_lpdf(prior_sd_1 | 3,0,10);
## }
## generated quantities {
## // actual population-level intercept
## real b_Intercept = temp_Intercept - dot_product(means_X, b);
## // additionally draw samples from priors
## real prior_b = normal_rng(0.18,0.25);
## }
Plot the model diagnostics
plot(fit1, ask = FALSE)
Extract exponeniated coefficients (i.e. Relative risk and 95% credible intervals) from model
tidy(exp(fixef(fit1))) %>%
kable()
| .rownames | Estimate | Est.Error | X2.5.ile | X97.5.ile |
|---|---|---|---|---|
| Intercept | 0.0114234 | 1.226633 | 0.0077482 | 0.0177216 |
| group1 | 1.7062389 | 1.233784 | 1.0988421 | 2.5421734 |
plot the model marginal effects
plot(marginal_effects(fit1))
Plot the prior and posterior distributions sampled from the model
exp_samples <- exp(posterior_samples(fit1, "b"))
exp_plot <- gather(exp_samples, Type, value) %>%
filter(Type !="b_Intercept")
exp_plot$Type <- factor(exp_plot$Type, levels=c("prior_b", "b_group1"), labels=c("Prior probability", "Posterior probability"))
numticks <- 6
exp_plot %>%
ggplot(aes(value, col=Type, fill=Type)) +
geom_density(alpha = 0.5) +
geom_vline(aes(xintercept =1), linetype="dashed") +
geom_vline(aes(xintercept =1.706), linetype="dashed", colour = "dodgerblue") +
annotate("text", x = 1.05, y = 5, label = paste("Favours home\ninitiation",sprintf('\u2192')), size=3.5, hjust=0) +
annotate("text", x = 0.93, y = 5, label = paste("Favours clinic\n", sprintf('\u2190'),"initiation"), size=3.5, hjust=1) +
labs(x = "Relative risk of ART initiation (Home vs. Facility)", y = "Probability density",
title = "Posterior probability of ART initiation in CONDA-YAPA Trial") +
ylim(0,5.5) +
theme_ipsum() +
scale_x_log10(breaks = trans_breaks(identity, identity, n = numticks)) +
theme(plot.title = element_text(size = 14)) +
theme(legend.title = element_blank())
What is the posterior probability that the covariate for Group 2 (vs. Group 1) is > 0
ps <- posterior_samples(fit1, "^b_")
colMeans(ps > 0)
## b_Intercept b_group1
## 0.00000 0.98975
After taking into account prior beliefs for the effectiveness of the intervention, this analysis is consistent 99% probability that the availability of home ART initiation after HIV self-testing resulted in an increase in the proportion of adult residents who initiated ART.