Fits a collection of treatment and response models using the Bayesian Additive Regresssion Trees (BART) algorithm, producing estimates of treatment effects.
bartc(response, treatment, confounders,
data, subset, weights,
method.rsp = c("bart", "p.weight", "tmle"),
method.trt = c("none", "glm", "bart", "bart.xval"),
estimand = c("ate", "att", "atc"),
group.by = NULL,
commonSup.rule = c("none", "sd", "chisq"),
commonSup.cut = c(NA_real_, 1, 0.05),
args.rsp = list(), args.trt = list(),
p.scoreAsCovariate = TRUE, use.rbart = FALSE,
keepCall = TRUE, verbose = TRUE,
...)
response: A vector of the continuous outcome variable, or a reference to such in the data argument.
treatment: A vector of the binary treatment variable, or a reference to data.
confounders: A matrix or data frame of covariates to be used in estimating the treatment and response model. Can also the right-hand-side of a formula (e.g. x1 + x2 + …). The data argument will be searched when supplied.
data: An optional data frame or named list containing the response, treatment, and confounders.
subset: An optional vector using to subset the data. Can refer to data if provided.
weights: An optional vector of population weights used in model fitting and estimating the treatment effect. Can refer to data if provided.
method.rsp: A character string specifying which method to use when fitting the response surface and estimating the treatment effect. Options are: "bart" - fit the response surface with BART and take the average of the individual treatment effect estimates, "p.weight" - fit the resposne surface with BART but compute the treatment effect estimate by using a propensity score weighted sum of individual effects, and "tmle" - as above, but further adjust the individual estimates using the Targetted Minimum Loss based Estimation (TMLE) adjustment.
method.trt: A character string specifying which method to use when fitting the treatment assingment mechanism, or a vector/matrix of propensity scores. Character string options are: "none" - do no propensity score estimation, "glm" - fit a generalized linear model with a binomial response and all confounders added linearly, "bart" - fit BART directly to the treatment variable, and "bart.xval" - crossvalidate the treatment fit with different values of the end-node prior variance. Cannot be “none” if the response model requires propensity scores. When supplied as a matrix, it should be of dimensions equal to the number of observations times the number of samples used in any response model.
estimand: A character string specifying which causal effect to target. Options are "ate" - average treatment effect, "att" - average treatment effect on the treated, and "atc" - average treatment effect on the controls.
group.by: An optional factor that, when present, causes the treatment effect estimate to be calculated within each group.
commonSup.rule: Rule for exclusion of observations lacking in common support. Options are "none" - no suppression, "sd" - exlude units whose predicted counterfactual standard deviation is extreme compared to the maximum standard deviation under those units’ observed treatment condition, where extreme refers to the distribution of all standard deviations of observed treatment conditions, "chisq" - exclude observations according to ratio of the variance of posterior predicted counterfactual to the posterior variance of the observed condition, having a Chi Squared distribution with one degree of freedom under the null hypothesis of have equal distributions.
commonSup.cut: Cuttoffs for commonSup.rule. Ignored for "none", when commonSup.rule is "sd", refers to how many standard deviations of the distribution of posterior variance for counterfactuals an observation can be above the maximum of posterior variances for that treatment condition. When commonSup.rule is "chisq", is the p value used for rejection of the hypothesis of equal variances.
p.scoreAsCovariate: A logical such that when TRUE, the propensity score is added to the response model as a covariate.
use.rbart: Logical specifying the use of rbart for when group.by is supplied. When TRUE, the grouping variable is added as a random intercept to the response model.
keepCall: A logical such that when FALSE, the call to bartc is not kept. This can reduce the amount of information printed by summary when passing in data as literals.
verbose: A logical that when TRUE prints information as the model is fit.
args.rsp,args.trt,…
Further arguments to the treatment and response model fitting algorithms. Arguments passed to the main function as … will be used in both models. args.rsp and args.trt can be used to set parameters in a single fit, and will override other values. See glm and bart2 for reference.
bartc represents a collection of methods that primarily use the Bayesian Additive Regression Trees(BART) algorithm to estimate causal treatment effects with binary treatment variables and continuous outcomes. This requires models to be fit to the response surface (distribution of the response as a function of treatment and confounders, p(Y(1), Y(0) | X) and optionally for treatment assignment mechanism (probability of receiving treatment, i.e. propensity score, Pr(Z = 1 | X)). The response surface model is used to impute counterfactuals, which may then be adjusted together with the propensity score to produce estimates of effects.
Similar to lm, models can be specified symbolically. When the data term is present, it will be added to the search path for the response, treatment, and confounders variables. The confounders must be specified devoid of any “left hand side”, as they appear in both of the models.
Response Surface
The response surface methods included are:
"bart" - use BART to fit the response surface and produce individual estimates Y(1)^hat_i and Y(0)^hat_i. Treatment effect estimates are obtained by averaging the difference of these across the population of interest.
"p.weight" - individual effects are esimated as in “bart”, but treatment effect estimates are obtained by using a propensity score weighted average. For the average treatment effect on the treated, these weights are \(p(z_i | x_i) / (∑ z / n)\). For ATC, replace z with 1 - z. For ATE, “p.weight” is equal to “bart”.
"tmle" - individual effects are esimated as in “bart” and a weighted average is taken as in “p.weight”, however the response surface estimates and propensity scores are corrected by using the Targetted Minimum Loss based Estimation method.
Treatment Assignment
The treatment assignment models are:
"none" - no modeling is doing. Only applies when using response method “bart” and p.scoreAsCovariate is FALSE.
"glm" - fit a binomial generalized linear model with logistic link and confounders included as linear terms.
"bart" - fit a binary BART directly to the treatment using all the confounders.
"bart.xval" - use the xbart function to perform cross validation on the node prior sensitivity (for that method, k) before fitting a final model.
Finally, a vector or matrix of propensity scores can be supplied. Propensity score matrices should have a number of rows equal to the number of observations in the data and a number of columns equal to the number of posterior samples.
Common Support Rules
Common support, or that the probability of receiving all treatment conditions is non-zero within every area of the covariate space (P(Z = 1 | X = x) > 0 for all x in the inferential sample), can be enforced by excluding observations with high posterior uncertainty. bartc supports two common support rules through commonSup.rule argument:
"sd" - observations are cut from the inferential sample if: s_i^f(1-z) > m_z + a * sd(s_j^f(z)), where s_i^f(1-z) is the posterior standard deviation of the predicted counterfactual for observation i, s_j^f(z) is the posterior standard deviation of the prediction for the observed treatment condition of objservation j, sd(s_j^f(z)) is the empirical standard deviation of those quantities, and m_z = max_j s_j^f(z) for all j in the same treatment group, i.e. Z_j = z. a is a constant to be passed in using commonSup.cut and its default is 1.
"chisq" - observations are cut from the inferential sample if: s_i^f(1-z) / s_if(z))2 > q_α, where s_i are as above and q_α, is the upper α percentile of a χ^2 distribution with one degree of freedom, corresponding to a null hypothesis of equal variance. The default for α is 0.05, and it is specified using the commonSup.cut parameter.
Special Arguments
Some default arguments are unconvential or are passed in a unique fashion.
If n.chains is missing, unlike in bart2 a default of 10 is used.
For method.trt == “bart.xval”, any of the parameters accepted by both xbart and bart2 can be passed as a list in args.trt and the first value will be used in the crossvalidation step and the second for the fit itself.
For method.rsp == “tmle”, a special arg.trt of posteriorOfTMLE determines if the TMLE correction should be applied to each posterior sample (TRUE), or just the posterior mean (FALSE).
Missing Data
Missingness is allowed only in the response. If some response values are NA, the BART models will be trained just for where data are available and those values will be used to make predictions for the missing observations. Missing observations are not used when calculating statistics for assessing common support, although they may still be excluded on those grounds. Further, missing observations may not be compatible with response method “tmle”.
bartc returns an object of class bartcFit. Information about the object can be derived by using methods summary, plot_sigma, plot_est, plot_indiv, and plot_support. Numerical quantities are recovered with the fitted and extract generics.
Objects of class bartcFit are lists containing items:
method.rsp
character string specifying the method used to fit the response surface
method.trt
character string specifying the method used to fit the treatment assignment mechanism
estimand
character string specifying the targetted causal effect
fit.rsp
object containing the fitted response model
data.rsp
dbartsData object used when fitting the response model
fit.trt
object containing the fitted treatment model
group.by
optional factor vector containing the groups in which treatment effects are estimated
samples.est
matrix or array of posterior samples of the treatment effect estimate
samples.indiv.diff
matrix or array of posterior samples of the individual treatment effects
p.score
the vector of propensity scores used as a covariate in the response model, when applicable
samples.p.score
matrix or array of posterior samples of the propensity score, when applicable
name.trt
character string giving the name of the treatment variable in the data of fit.rsp
trt
vector of treatment assignments
call
how bartc was called
n.chains
number of independent posterior sampler chains in response model
commonSup.rule
common support rule used for suppressing observations
commonSup.cut
common support parameter used to set cutoff when suppression observations
sd.obs
vector of standard deviations of individual posterior predictors for observed treatment conditions
sd.cf
vector of standard deviations of individual posterior predictors for counterfactuals
commonSup.sub
logical vector expressing which observations are used when estimating treatment effects
Author(s) Vincent Dorie: vdorie@gmail.com.
Chipman, H., George, E. and McCulloch R. (2006) Bayesian Ensemble Learning. Proceedings of the 19th International Conference on Neural Information Processing Systems, 265–272. Cambridge, MA: MIT Press. http://papers.nips.cc/paper/3084-bayesian-ensemble-learning.pdf.
Hill, J. L. (2011) Bayesian Nonparametric Modeling for Causal Inference. Journal of Computational and Graphical Statistics 20(1), 217–240. Taylor & Francis. https://doi.org/10.1198/jcgs.2010.08162.
Hill, J. L. and Su Y. S. (2013) Assessing Lack of Common Support in Causal Inference Using Bayesian Nonparametrics: Implications for Evaluating the Effect of Breastfeeding on Children’s Cognitive Outcomes The Annals of Applied Statistics 7(3), 1386–1420. https://www.jstor.org/stable/23566478.
See Also bart2
## fit a simple linear model
n <- 1000L
beta.z <- c(.75, -0.5, 0.25)
beta.y <- c(.5, 1.0, -1.5)
sigma <- 2
set.seed(725)
x <- matrix(rnorm(3 * n), n, 3)
tau <- rgamma(1L, 0.25 * 16 * rgamma(1L, 1 * 32, 32), 16)
p.score <- pnorm(x %*% beta.z)
z <- rbinom(n, 1, p.score)
mu.0 <- x %*% beta.y
mu.1 <- x %*% beta.y + tau
y <- mu.0 * (1 - z) + mu.1 * z + rnorm(n, 0, sigma)
y %>% length()
## [1] 1000
fit <- bartc(y, z, x, n.samples = 1000L)
## fitting treatment model via method 'bart'
## fitting response model via method 'bart'
summary(fit)
## Call: bartc(response = y, treatment = z, confounders = x, n.samples = 1000L)
##
## Causal inference model fit by:
## model.rsp: bart
## model.trt: bart
##
## Treatment effect:
## estimate sd ci.lower ci.upper
## ate 0.09884 0.1688 -0.232 0.4296
## Estimates fit from 1000 total observations
## 95% credible interval calculated by: normal approximation
## Result based on 10000 posterior samples across 10 chains
fitted(fit,value = "y",sample = "all") %>% hist()
fitted(fit,value = "y1",sample = "all") %>% hist()
fitted(fit,value = "y0",sample = "all") %>% hist()
tibble(X = fitted(fit,value = "indiv.diff",sample = "all")) %>%
ggplot(aes(X)) +
geom_histogram(binwidth = 0.05,col = "blue")
plot_indiv(fit)
## example to show refitting under the common support rule
fit2 <- refit(fit, commonSup.rule = "sd")
fit2
## Call:
## bartc(response = y, treatment = z, confounders = x, n.samples = 1000L)
##
## Treatment effect (ate): 0.09973
fit3 <- bartc(y, z, x, subset = fit2$commonSup.sub, n.samples = 1000L)
## fitting treatment model via method 'bart'
## fitting response model via method 'bart'
fit3
## Call:
## bartc(response = y, treatment = z, confounders = x, subset = fit2$commonSup.sub,
## n.samples = 1000L)
##
## Treatment effect (ate): 0.08368
Visual exploratory data analysis and model fitting diagnostics for causal inference models fit using the bartc function.
## S3 method for class 'bartcFit'
fitted(object,
value = c("est", "y", "y0", "y1", "indiv.diff", "p.score", "p.weights"),
sample = c("inferential", "all"),
...)
extract(object, ...)
## S3 method for class 'bartcFit'
extract(object,
value = c("est", "y", "y0", "y1", "indiv.diff", "p.score", "p.weights"),
sample = c("inferential", "all"),
combineChains = TRUE,
...)
refit(object, newresp, ...)
## S3 method for class 'bartcFit'
refit(object,
newresp = NULL,
commonSup.rule = c("none", "sd", "chisq"),
commonSup.cut = c(NA_real_, 1, 0.05),
...)
predict(object, newdata, ...)
## S3 method for class 'bartcFit'
predict(object,
newdata,
value = c("y1", "y0", "indiv.diff", "p.score"),
combineChains = TRUE,
...)
object: Object of class bartcFit.
value: Which quantity to return. See details for a description of possible values.
sample: Return information for either the “inferential” (e.g. treated observations when the estimand is att) or “all” observations.
combineChains: If the models were fit with more than one chain, results retain the chain structure unless combineChains is TRUE.
newresp: Not presently used, but provided for compatibility with other definitions of the refit generic.
newdata: Data corresponding to the confounders in a bartc fit.
commonSup.rule, commonSup.cut
As in bartc
Additional parameters passed up the generic method chain.
fitted returns the values that would serve as predictions for an object returned by the bartc function, while extract instead returns the full matrix or array of posterior samples. The possible options are:
“est” - the estimate itself, e.g. ATE
“y” - predictions under the observed treatment condition, i.e. \(\hat{y}_i(1) * z_i + \hat{y}_i(0) * (1 - z_i)\).
“y0” - predictions for all observations under the control
“y1” - predictions for all observations under the treatment
“indiv.diff” - for all observations the individual treatment effect estimates, i.e.$ _i(1) _i(0)$.
“p.score” - probability that each observation is assigned to the treatment group
“p.weights” - weights assigned to each individual difference if the response method is “p.weight”
refit exists to allow the same regressions to be used to calculate estimates under different common support rules. To refit those models on a subset, see the examples in bartc.
predict allows the fitted model to be used to make predictions on an out-of-sample set. Requires model to be fit with keepTrees equal to TRUE.
For fitted, extract, and predict, a matrix, array, or vector depending on the dimensions of the result and the number of chains. For the following, when n.chains is one the dimension is dropped.
“est” - when fitted, a scalar; when extract, n.samples x n.chains
“y”, “y0”, “y1”, “indiv.diff”, “p.weights” - when fitted, a vector of length equal to the number of observations (n.obs); when extract or predict, a matrix or array of dimensions n.obs x n.samples x n.chains.
“p.score” - depending on the fitting method, samples may or not be present. When samples are absent, a vector is returned for both functions. When present, the same as “y”.
For refit, an object of class bartcFit.
Author(s) Vincent Dorie: vdorie@gmail.com.
See Also bartc
Visual exploratory data analysis and model fitting diagnostics for causal inference models fit using the bartc function.
plot_sigma(x, main = "Traceplot sigma",
xlab = "iteration", ylab = "sigma",
lty = 1:x$n.chains,
...)
plot_est(x, main = paste("Traceplot", x$estimand),
xlab = "iteration", ylab = x$estimand,
lty = 1:x$n.chains, col = NULL,
...)
plot_indiv(x, main = "Histogram Individual Effects",
xlab = "treatment effect",
breaks = 20,
...)
plot_support(x, main = "Common Support Scatterplot",
xvar = "tree.1", yvar = "tree.2",
xlab = NULL, ylab = NULL,
pch.trt = 21, bg.trt = "black",
pch.ctl = pch.trt, bg.ctl = NA,
pch.sup = pch.trt, bg.sup = NA, col.sup = "red", cex.sup = 1.5,
legend.x = "topleft", legend.y = NULL,
...)
x
Object of class bartcFit.
main
Character title of plot.
xlab
Character label of x axis. For plot_support, if NULL a default will be used.
ylab
Character label of y axis. For plot_support, if NULL a default will be used.
lty
For line plots (plot.sigma, plot.est), models use the values of lty to visually distinguish each chain.
lty
For plot.est, use the values of col to visually distinguish each group, if there are any.
breaks
Argument to codehist.
xvar
Variable for use on x axis. Can be one of “tree.XX”, “pca.XX”, “css”, “p.score”, “y”, “y0”, “y1”, “indiv.diff”, “p.score”, “p.weights”, the number of name of a column used to fit the response model, or a given vector. See below for details.
yvar
Variable for use on the y axis, of the same form as xvar.
pch.trt
pch point value used when plotting treatment observations.
bg.trt
bg background value used when plotting treatment observations.
pch.ctl
pch point value used when plotting control observations.
bg.ctl
bg background value used when plotting treatment observations.
pch.sup
pch point value used when plotting suppressed observations.
bg.sup
bg background value used when plotting suppressed observations.
col.sup
col color value used when plotting suppressed observations.
cex.sup
cex size value used when plotting suppressed observations.
legend.x
x value passed to legend. If NULL, legend plotting is skipped.
legend.y
Optional y value passed to legend
Optional graphical parameters.
Produces various plots using objects fit by bartc. plot_sigma and plot_est are traditional parameter trace plots that can be used to diagnose the convergence of the posterior sampler. If the bartc model is fit with n.chains greater than one, by default each chain will be plotted with its own line type.
plot_indiv produces a simple histogram of the distribution of the estimates of the individual effects, taken as the average of their posterior samples.
plot_support is used to visualize the common support diagnostic in the form of a scatterplot. Points that the diagnostic excludes are outlined in red. The contents of the x and y axes are controlled by the xvar and yvar arguments respectively and can be of the form:
“tree.XX” - Variable number “XX” as selected by variable importance in predicting individual treatment effects using a tree fit by rpart. “XX” must be a number not exceeding the number of continuous variables used to fit the response model.
“pca.XX” - “XX”th principal component axis.
“css” - The common support statistic.
“y” - Observed response variable.
“y0” - Predicted values for the response under the control as obtained by fitted.
“y1” - Predicted values for the response under the treatment fitted.
“indiv.diff” - Individual treatment effect estimates, or \(\hat{y}(1) - \hat{y}(0)\).
“p.score” - Propensity score used to fit the response model (if applicable).
“p.weights” - Weights used when taking average of individual differences for response method “p.weight”.
predictor columns - Given by name or number.
supplied vector - Any vector of length equal to the number of observations.
plot_sigma(fit)
plot_support(fit2, xvar = "tree.1", yvar = "css", legend.x = NULL)
plot_est(fit)
plot_indiv(fit)