The following tutorial illustrates how to use the CEAforests package to estimate heterogeneous effects and costs in cost-effectiveness analyses alongside clinical trials or observational studies. The package uses the causal_forest function from the grf package to fit causal forests according to the procedures presented in Bonander & Svensson (2021, Health Economics). Its main purpose is to provide a set of helpful functions that can be used to implement and analyze the results of causal forests in cost-effectiveness analyses.
Generate example data
We begin by generating some data for the tutorial. The dataset contains three correlated \(X\) variables, a treatment variable \(W_i\), quality-adjusted life years (QALYs, \(Y_i\)) and healthcare costs in Euros (\(C_i\)).
require(MASS)
#> Loading required package: MASS
require(GGally)
#> Loading required package: GGally
#> Loading required package: ggplot2
#> Registered S3 method overwritten by 'GGally':
#> method from
#> +.gg ggplot2
# Set sample size
n = 500
# Set seed for replication
RNGversion("3.6.0")
set.seed(11313413)
# Generate three X variables with some correlation structure
mu = rep(0,3)
x1cors = c(1, -0.5, -0.1)
x2cors = c(-0.5, 1, 0)
x3cors = c(-0.1, 0, 1)
Sigma = matrix(c(x1cors, x2cors, x3cors), nrow=3, ncol=3)
Xvars = mvrnorm(n=n, mu=mu, Sigma=Sigma)
X1 = Xvars[,1]
X2 = Xvars[,2]
X3 = Xvars[,3]
#Generate binary treatment variable W
Z = rbinom(n, 1, 0.5)
W = rbinom(n, 1, 1/(1+exp(-X1*-1.5+Z*1.4)))
#Generate costs
mui = 0.5*W + W*X2*0.6 + W*X3*0.3 + rnorm(n,0,1)
mui_s = pnorm(mui)
C = (qgamma(mui_s, shape=1, rate=2)*100000)
#Generate QALYs
lni = 0.4 + 0.2*W + W*X1*0.3 + W*X3*0.5
sd = 0.4
Y = rlnorm(n, lni, sd)
# Convert X to look more like real data while keeping dependencies
xnames <- c("Severe_disease", "HRQoL", "Risk_score")
pvars = pnorm(Xvars)
povars <- round(qnorm(pvars, 50, 8), 1)
betavars <- qbeta(pvars, 1, 2)
binomvars <- qbinom(pvars, 1, .50)
X1 = binomvars[,1]
X2 = betavars[,2]
X3 = povars[,3]
Xvars = cbind(X1,X2,X3)
colnames(Xvars) = xnames
options(scipen=999) #Remove scientific notation
ggpairs(data.frame(Y,C,W,Xvars),
title="Distributions and correlation structure in the generated dataset",
progress=FALSE, diag=list(continuous="barDiag")) + theme_bw(base_size=8)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.Basic functionality
Training a CEA forest
Next, we train a CEA forest using the cea_forest command, which has four required inputs: three vectors containing the health outcome \(Y_i\) (e.g., QALYs), healthcare costs \(C_i\), and a treatment indicator \(W_i\), respectively, and a matrix where each column represents the \(X\) variables. cea_forest estimates heterogeneity in outcomes, costs and in the net monetary benefit as a function of the \(X\) variables. By default, it also adjusts for confounding by \(X\) using propensity scores. The latter feature can be circumvented by supplying known propensities, or propensities estimated using other methods, with the W.hat option. To calculate the net monetary benefit, we also need to specify a willingness to pay per QALY threshold (\(\lambda\)). In the example, we set \(\lambda\) to 50000. The function can also pass any other call to the causal_forest command in the grf package.
Internally, the function fits three causal forests (one for outcomes, one for costs and one for net monetary benefits) using the grf package, and then stores them in a single object that other functions in the CEAforests package will understand.
All other functions that we will use to explore heterogeneity and estimate effects below rely on the estimates contained in objects produced by the cea_forest function (cea_example in the present example).
By default, the command will train 5000 causal trees on different random subsets of the data with the default settings from the grf package, with one exception; tune.parameters is set to "all" by default instead "none". Thus, the cea_forest command will make use of the automated tuning features in grf by default. In real applications, it may be advisable to increase the number of trees (via the num.trees option) to, e.g., 50000-100000, to minimize the influence of Monte Carlo errors on the results.
cea_example
#> $outcome.forest
#> GRF forest object of type causal_forest
#> Number of trees: 5000
#> Number of training samples: 500
#> Variable importance:
#> 1 2 3
#> 0.096 0.187 0.717
#>
#> $cost.forest
#> GRF forest object of type causal_forest
#> Number of trees: 5000
#> Number of training samples: 500
#> Variable importance:
#> 1 2 3
#> 0.097 0.560 0.343
#>
#> $nmb.forest
#> GRF forest object of type causal_forest
#> Number of trees: 5000
#> Number of training samples: 500
#> Variable importance:
#> 1 2 3
#> 0.238 0.418 0.343
#>
#> $WTP
#> [1] 50000
#>
#> attr(,"class")
#> [1] "cea_forest" "CEAforests"Predicting using CEA forests
To get estimates of the conditional average effects on outcomes ( \(\Delta Y_i\)), costs (\(\Delta C_i\)) and net monetary benefits (\(NMB_i\)) for each individual in the data given their \(X\) values, we use the predict command:
By default, the estimates reflect out-of-bag estimates for each individual, which means that they are taken only from the subset of causal trees in the forest in which the individual \(i\) was not used for training. The output contains point estimates and their variances from each of the three forests.
head(pred.cea,5)[1:2] #View first five individuals, incremental outcomes
#> predicted.delta_y variance.delta_y
#> 1 0.01114385 0.01079556
#> 2 -0.61976092 0.01336175
#> 3 -0.59704233 0.01236641
#> 4 0.45049866 0.07081276
#> 5 1.13970868 0.04797749
head(pred.cea,5)[3:4] #View first five individuals, incremental costs
#> predicted.delta_cost variance.delta_cost
#> 1 -12280.40 123457469
#> 2 56343.84 96031148
#> 3 69272.02 96037661
#> 4 67909.39 199377516
#> 5 66955.79 160255464
head(pred.cea,5)[5:6] #View first five individuals, net monetary benefits
#> predicted.nmb variance.nmb
#> 1 22045.963 132621941
#> 2 -77287.881 283745989
#> 3 -83657.677 243783768
#> 4 -47258.468 185316464
#> 5 -3606.199 225632749It also contains some additional information that may be useful for model building. The columns named excess error reflect jackknife estimates of the Monte Carlo errors, which can be used to assess the instability of forests grown with the same amount of trees on the same data. The general recommendation provided by the authors of the grf package is to increase the number of trees until the excess error is determined to be negligible.
head(pred.cea,5)[c(8,10,12)] #Excess errors for the first five individuals
#> delta_y.excess.error delta_cost.excess.error nmb.excess.error
#> 1 0.00009115689 613216.1 807946.3
#> 2 0.00002124729 1188245.3 833419.5
#> 3 0.00001955035 1280005.2 731348.2
#> 4 0.00008747626 1258291.3 877045.6
#> 5 0.00003461465 300013.5 287303.0The predict command can also be used to predictions at new values of \(X\), which can, for instance, be useful for estimating the change in effect size if we modify the value of one variable while keeping others constant.
predict(cea_example, newdata=t(matrix(c(Severe_disease=1, HRQoL=0.3, Risk_score=50))))
#> predicted.delta_y variance.delta_y predicted.delta_cost variance.delta_cost
#> 1 0.264024 0.02176748 22990.92 48144359
#> predicted.nmb variance.nmb
#> 1 9761.927 77986673
predict(cea_example, newdata=t(matrix(c(Severe_disease=0, HRQoL=0.3, Risk_score=50))))
#> predicted.delta_y variance.delta_y predicted.delta_cost variance.delta_cost
#> 1 0.03920837 0.03203222 27349.06 31868729
#> predicted.nmb variance.nmb
#> 1 -17378.99 35024393Plotting heterogeneity in outcomes, costs and net monetary benefits
The plot function can be used to generate histograms of the out-of-bag estimates for outcomes, costs and net monetary benefits. The which.y can be used to specify which effects to plot ("outcome", "cost", "nmb", or "all"). The function calls the ggplot2 package (in the latter instance, it also uses the cowplot package to merge the three plots).
Histograms
Different graphs can be combined directly within the function.
Graphical parameters can easily be changed using calls to ggplot2.
Scatter plots
Specifying an \(X\) variable with the option which.x produces a scatter plot of the estimates along the focal \(X\) variable.
plot(cea_example, which.y="nmb", which.x="HRQoL")
#> Warning: Removed 1 rows containing missing values (geom_point).Conditional effect plots
Setting conditional=TRUE, the function instead uses predict to vary the focal \(X\) variable while keeping the others constant at their mean. The plot includes point estimates and 95% confidence bands, and allows us to better assess how a change in \(X\) affects the cost-effectiveness of the treatment.
Individual-level cost-effectivness plane
Beyond the standard plots, the CEAforests package also includes a set of specialized functions for cost-effectiveness analyses. The ce_plane function can be used to produce individual-level cost-effectiveness planes using the out-of-bag estimates from the predict function.
ce_plane(cea_example) + ggplot2::ylim(c(-50000,120000)) + ggplot2::xlim(c(-1,2.5))
#> New treatment is more effective and more costly (NE Quadrant) (%): 51.6
#> New treatment is more effective and less costly (SE Quadrant) (%): 6
#> New treatment is less effective and more costly (NW Quadrant) (%): 26.8
#> New treatment is less effective and less costly (SW Quadrant) (%): 15.6While the relevance of statistical inference is debatable in the context of cost-effectiveness analyses, it may be useful to probe how much random noise influences results of the heterogeneity analysis. Setting certainty_groups=TRUE, the function will use the point estimates and variance of the out-of-bag estimates from the net monetary benefit forest to divide the data into certainty groups with respect to the WTP threshold defined in the cea_forest object. The option alpha can be used to modify the confidence level (default = 0.05, i.e., 95% CIs).
ce_plane(cea_example, certainty_groups = TRUE) + ggplot2::ylim(c(-50000,120000)) + ggplot2::xlim(c(-1,2.5))
#> New treatment is more effective and more costly (NE Quadrant) (%): 51.6
#> New treatment is more effective and less costly (SE Quadrant) (%): 6
#> New treatment is less effective and more costly (NW Quadrant) (%): 26.8
#> New treatment is less effective and less costly (SW Quadrant) (%): 15.6The plotted point estimates are taken from the outcome and cost forests. These estimates may differ from those produced by the net monetary benefit forest due to Monte Carlo errors. The consequence of this can be seen in the plot above, where some individuals who are above the WTP threshold are classified as being significantly below the threshold, and vice versa. This issue can be avoided by increasing the number of trees in the cea_forest object (using the num.trees option).
Exploring heterogeneity with tables and regression
Differences in characteristics between individuals above and below the WTP threshold can be investigated using the describe_forest function.
describe_forest(cea_example)
#> Stratified by Cost_Effective
#> No Yes p test
#> n 272 228
#> Severe_disease (mean (SD)) 0.36 (0.48) 0.73 (0.44) <0.001
#> HRQoL (mean (SD)) 0.45 (0.20) 0.18 (0.15) <0.001
#> Risk_score (mean (SD)) 47.24 (6.50) 53.89 (8.39) <0.001
describe_forest(cea_example, certainty_groups=TRUE)
#> Stratified by Certainty_Group
#> Not cost-effective Undetermined Cost-effective
#> n 155 236 109
#> Severe_disease (mean (SD)) 0.14 (0.35) 0.58 (0.49) 0.96 (0.19)
#> HRQoL (mean (SD)) 0.54 (0.18) 0.28 (0.18) 0.11 (0.08)
#> Risk_score (mean (SD)) 47.14 (6.10) 50.83 (8.27) 53.51 (8.81)
#> Stratified by Certainty_Group
#> p test
#> n
#> Severe_disease (mean (SD)) <0.001
#> HRQoL (mean (SD)) <0.001
#> Risk_score (mean (SD)) <0.001The function uses the tableone package, and can pass calls directly to CreateTableOne to modify the table.
dtable = describe_forest(cea_example, factorVars=c("Severe_disease"))
print(dtable, nonnormal="HRQoL")
#> Stratified by Cost_Effective
#> No Yes p test
#> n 272 228
#> Severe_disease = 1 (%) 97 (35.7) 167 (73.2) <0.001
#> HRQoL (median [IQR]) 0.42 [0.30, 0.58] 0.14 [0.06, 0.23] <0.001 nonnorm
#> Risk_score (mean (SD)) 47.24 (6.50) 53.89 (8.39) <0.001The explain_forest function fits linear regression models of the form \(E[\Delta Y|X] = a+\beta X\) to estimate the marginal effects of covariates on the effect size, which can be useful for reducing the complexity of the results and to assess how specific variable appear to influence the effect size after adjustment for other variables in the data. By default, it will run intercepts only models to estimate the average \(\Delta Y\), \(\Delta C\) and \(NMB\) in the sample.
explain_forest(cea_example)
#> $`Outcome forest`
#> Estimates StdError LowerCI UpperCI P.val
#> (Intercept) 0.5311196 0.116308 0.3026058 0.7596334 0
#>
#> $`Cost forest`
#> Estimates StdError LowerCI UpperCI P.val
#> (Intercept) 26137.33 6171.639 14011.73 38262.93 0
#>
#> $`Net monetary benefit forest, WTP = 50000`
#> Estimates StdError LowerCI UpperCI P.val
#> (Intercept) 418.6498 8737.73 -16748.62 17585.92 0.9426Covariates can be input in the same manner as in the cea_forest function.
explain_forest(cea_example, X=Xvars)
#> $`Outcome forest`
#> Estimates StdError LowerCI UpperCI P.val
#> (Intercept) -6.7491049 0.85834604 -8.4355474 -5.0626624 0.0000
#> Severe_disease 0.7734686 0.22186920 0.3375493 1.2093880 0.0005
#> HRQoL -0.4806281 0.46821527 -1.4005579 0.4393017 0.2990
#> Risk_score 0.1397965 0.01683369 0.1067224 0.1728706 0.0000
#>
#> $`Cost forest`
#> Estimates StdError LowerCI UpperCI P.val
#> (Intercept) -187944.621 37675.4008 -261967.676 -113921.565 0.0000
#> Severe_disease -14799.048 12770.3576 -39889.714 10291.618 0.2421
#> HRQoL 162165.476 33071.5902 97187.795 227143.156 0.0000
#> Risk_score 3366.989 669.8776 2050.841 4683.137 0.0000
#>
#> $`Net monetary benefit forest, WTP = 50000`
#> Estimates StdError LowerCI UpperCI P.val
#> (Intercept) -149510.625 60075.674 -267544.803 -31476.447 0.0129
#> Severe_disease 53472.480 17436.749 19213.482 87731.477 0.0022
#> HRQoL -186196.881 41791.330 -268306.743 -104087.019 0.0000
#> Risk_score 3622.837 1133.239 1396.296 5849.377 0.0014The \(X\) matrix can also be changed as desired and does not have to include variables that were used to train the forest. For instance, we may want to explore how having a non-severe version of the disease affects \(\Delta Y\), \(\Delta C\) and \(NMB\) compared to a severe variant.
notsevere_dummy = 1-Xvars[,c("Severe_disease")]
explain_forest(cea_example, X=data.frame(notsevere_dummy))
#> $`Outcome forest`
#> Estimates StdError LowerCI UpperCI P.val
#> (Intercept) 0.8669144 0.1713934 0.5301712 1.2036577 0.000
#> notsevere_dummy -0.7114298 0.2289857 -1.1613269 -0.2615326 0.002
#>
#> $`Cost forest`
#> Estimates StdError LowerCI UpperCI P.val
#> (Intercept) 4271.015 8462.58 -12355.75 20897.78 0.6017
#> notsevere_dummy 46326.939 12198.89 22359.30 70294.58 0.0002
#>
#> $`Net monetary benefit forest, WTP = 50000`
#> Estimates StdError LowerCI UpperCI P.val
#> (Intercept) 39074.71 12697.55 14127.33 64022.09 0.0022
#> notsevere_dummy -81898.43 16982.27 -115264.16 -48532.69 0.0000By default, the function uses the willingness to pay threshold specified in the cea_forest object. The threshold can easily be changed without having to re-train the forest using the WTP option.
explain_forest(cea_example, X=data.frame(notsevere_dummy), WTP=20000)
#> $`Outcome forest`
#> Estimates StdError LowerCI UpperCI P.val
#> (Intercept) 0.8669144 0.1713934 0.5301712 1.2036577 0.000
#> notsevere_dummy -0.7114298 0.2289857 -1.1613269 -0.2615326 0.002
#>
#> $`Cost forest`
#> Estimates StdError LowerCI UpperCI P.val
#> (Intercept) 4271.015 8462.58 -12355.75 20897.78 0.6017
#> notsevere_dummy 46326.939 12198.89 22359.30 70294.58 0.0002
#>
#> $`Net monetary benefit forest, WTP = 20000`
#> Estimates StdError LowerCI UpperCI P.val
#> (Intercept) 13067.27 9478.127 -5554.772 31689.32 0.1652
#> notsevere_dummy -60555.53 13160.020 -86411.538 -34699.53 0.0000Using CEAforests to estimate average effects
The main purpose of the CEAforests package is to simplify heterogeneity estimation. However, the package would not be complete without classic functions to produce sample average cost-effectiveness results from the forest object. The avg_effects function produces a small table with incremental effects, costs, net monetary benefits and incremental cost-effectiveness ratio (ICER) for the entire sample or, optionally, for a specified subgroup.
avg_effects(cea_example)
#> estimate std.err lower
#> Average effect on outcome 0.5311 0.1162 0.3028
#> Average effect on costs 26137.3304 6165.4639 14023.8621
#> Average net monetary benefit (WTP: 50000) 418.6498 8728.9876 -16731.4486
#> Incremental cost-effectiveness ratio 49211.7598 NA 20131.6572
#> upper accept.prob.normal
#> Average effect on outcome 0.7594 NA
#> Average effect on costs 38250.7986 NA
#> Average net monetary benefit (WTP: 50000) 17568.7482 0.5191
#> Incremental cost-effectiveness ratio 89188.5131 NAThe willingness to pay threshold can easily be changed without having to re-train the forest.
avg_effects(cea_example, WTP=20000)
#> estimate std.err lower
#> Average effect on outcome 0.5311 0.1162 0.3028
#> Average effect on costs 26137.3304 6165.4639 14023.8621
#> Average net monetary benefit (WTP: 20000) -15514.9383 6722.0201 -28721.8888
#> Incremental cost-effectiveness ratio 49211.7598 NA 20131.6572
#> upper accept.prob.normal
#> Average effect on outcome 0.7594 NA
#> Average effect on costs 38250.7986 NA
#> Average net monetary benefit (WTP: 20000) -2307.9878 0.0105
#> Incremental cost-effectiveness ratio 89188.5131 NABy default, the function uses Fieller’s method to calculate CIs for the ICER. The confidence intervals in the table can also be obtained via the bootstrap as described in Bonander & Svensson (2021).
avg_effects(cea_example, boot.ci=TRUE)
#> estimate std.err lower
#> Average effect on outcome 0.5311 0.1173 0.3276
#> Average effect on costs 26137.3304 6096.0914 14832.6291
#> Average net monetary benefit (WTP: 50000) 418.6498 8752.2117 -16605.6967
#> Incremental cost-effectiveness ratio 49211.7598 NA 23652.3910
#> upper accept.prob.normal
#> Average effect on outcome 0.7819 NA
#> Average effect on costs 40097.2335 NA
#> Average net monetary benefit (WTP: 50000) 17720.0794 0.5191
#> Incremental cost-effectiveness ratio 101865.0771 NAThe subset option can be used to obtain the same table for a specified subgroup defined by logical conditions.
avg_effects(cea_example, subset=Xvars[,"Risk_score"]>50)
#> estimate std.err lower
#> Average effect on outcome 1.4163 0.1899 1.0424
#> Average effect on costs 41932.3743 8990.4487 24227.3886
#> Average net monetary benefit (WTP: 50000) 28880.7097 14309.4028 701.0516
#> Incremental cost-effectiveness ratio 29607.7871 NA 14621.2219
#> upper accept.prob.normal
#> Average effect on outcome 1.7901 NA
#> Average effect on costs 59637.3600 NA
#> Average net monetary benefit (WTP: 50000) 57060.3678 0.9782
#> Incremental cost-effectiveness ratio 47455.2634 NACost-effectiveness planes
Standard cost-effectiveness planes with confidence ellipses can also be obtained via the bootstrap by setting type="average" in the ce_plane function.
ce_plane(cea_example, type="average") + ggplot2::ylim(c(-1000,70000)) + ggplot2::xlim(c(0, 2.2))
#> New treatment is more effective and more costly (NE Quadrant) (%): 100
#> New treatment is more effective and less costly (SE Quadrant) (%): 0
#> New treatment is less effective and more costly (NW Quadrant) (%): 0
#> New treatment is less effective and less costly (SW Quadrant) (%): 0The subset option can be used here as well to obtain a corresponding plot for a specified subgroup.
ce_plane(cea_example, type="average", subset=Xvars[,"Risk_score"]>50) + ggplot2::ylim(c(-1000,70000)) + ggplot2::xlim(c(0, 2.2))
#> New treatment is more effective and more costly (NE Quadrant) (%): 100
#> New treatment is more effective and less costly (SE Quadrant) (%): 0
#> New treatment is less effective and more costly (NW Quadrant) (%): 0
#> New treatment is less effective and less costly (SW Quadrant) (%): 0
#> Warning: Removed 2 rows containing non-finite values (stat_ellipse).
#> Warning: Removed 2 rows containing missing values (geom_point).Cost-effectivness acceptability curves
Cost-effectiveness acceptability curves (CEAC), based on probabilities estimated via a cumulative normal distribution function, can be obtained using the accept_curve function.
The function can also be used to compare the subgroups based on logical conditions by setting compare=TRUE.
accept_curve(cea_example, subset=Xvars[,"Risk_score"]>50, from=1000, to=100000, length.out=50,
compare=TRUE,
labels=c("Risk > 50", "Full sample", "Risk < 50"))Automated policy learning using CEAforests
The CEAforests package can also call the policytree package to perform an exhaustive search for the optimal reimbursement policy given a set of input variables and a pre-specified complexity (tree depth). Here, the optimality criteria is defined as the policy that maximizes the average net monetary benefit.
First, we train a policy tree with a single split (depth=1).
ptree1 <- cea_policy_tree(cea_example, X=Xvars, depth=1)
#> Warning in cea_policy_tree(cea_example, X = Xvars, depth = 1): WTP not
#> specified, assuming WTP = 50000, as supplied to the CEAforests object.
plot(ptree1)The infer_policy function can be used to estimate the value of implementing the suggested policy versus both a new-treatment-for-all and a old-treatment-for-all scenario.
infer_policy(cea_example,ptree1)
#> Warning in infer_policy(cea_example, ptree1): WTP not specified, assuming WTP =
#> 50000, as supplied to the CEAforests object.
#> Estimate Std.Err Lower.CI
#> Average NMB, new-for-all vs control-for-all 418.6498 8728.988 -16731.45
#> Average NMB, suggested policy vs control-for-all 24563.4565 5818.881 13130.93
#> Difference in NMB, suggested vs. new-for-all 24144.8067 6321.295 11725.17
#> Prop. who gets new treatment, suggested policy 0.4380 NA NA
#> Upper.CI
#> Average NMB, new-for-all vs control-for-all 17568.75
#> Average NMB, suggested policy vs control-for-all 35995.98
#> Difference in NMB, suggested vs. new-for-all 36564.44
#> Prop. who gets new treatment, suggested policy NAThe same function can also be used with custom policies.
infer_policy(cea_example, treat.policy=Xvars[,"Risk_score"]>50 & Xvars[,"Severe_disease"]==0)
#> Warning in infer_policy(cea_example, treat.policy = Xvars[, "Risk_score"] > :
#> WTP not specified, assuming WTP = 50000, as supplied to the CEAforests object.
#> Estimate Std.Err Lower.CI
#> Average NMB, new-for-all vs control-for-all 418.6498 8728.988 -16731.45
#> Average NMB, suggested policy vs control-for-all -5426.3576 4288.996 -13853.07
#> Difference in NMB, suggested vs. new-for-all -5845.0075 7594.249 -20765.65
#> Prop. who gets new treatment, suggested policy 0.2440 NA NA
#> Upper.CI
#> Average NMB, new-for-all vs control-for-all 17568.748
#> Average NMB, suggested policy vs control-for-all 3000.359
#> Difference in NMB, suggested vs. new-for-all 9075.636
#> Prop. who gets new treatment, suggested policy NAThe function can also suggest more complex policies by increasing the depth of the tree.
plot(cea_policy_tree(cea_example, X=Xvars, depth=2))
#> Warning in cea_policy_tree(cea_example, X = Xvars, depth = 2): WTP not
#> specified, assuming WTP = 50000, as supplied to the CEAforests object.The function uses the willingness to pay threshold from the cea_forest object by default. The threshold can easily be changed, without re-training the forest, using the WTP option. In our example, increasing the threshold leads to a slightly more inclusive reimbursement policy.
Using instruments with CEAforests
The CEAforests package can also make use of instruments via the instrumental_forest function in grf. To use an instrument \(Z_i\) (e.g., random treatment allocation in cases where \(W_i\) reflects actual exposure to treatment), we supply the instrument to the cea_forest function via the Z option.
iv_example = cea_forest(Y=Y, C=C, X=Xvars, W=W, Z=Z, WTP=50000)
iv_example
#> $outcome.forest
#> GRF forest object of type instrumental_forest
#> Number of trees: 5000
#> Number of training samples: 500
#> Variable importance:
#> 1 2 3
#> 0.052 0.318 0.630
#>
#> $cost.forest
#> GRF forest object of type instrumental_forest
#> Number of trees: 5000
#> Number of training samples: 500
#> Variable importance:
#> 1 2 3
#> 0.224 0.402 0.374
#>
#> $nmb.forest
#> GRF forest object of type instrumental_forest
#> Number of trees: 5000
#> Number of training samples: 500
#> Variable importance:
#> 1 2 3
#> 0.166 0.439 0.394
#>
#> $WTP
#> [1] 50000
#>
#> attr(,"class")
#> [1] "cea_forest_instrumental" "CEAforests"The instrumental version of the forest can then be analyzed as a standard cea_forest object.
avg_effects(iv_example)
#> Fieller's method CIs appear to be unbounded and may seem strange, but this behavior is to be expected if the incremental outcome does not differ significantly from zero. You may want to try setting boot.ci=TRUE to obtain bootstrapped (BCa) confidence intervals, but some authors recommend against this (see documentaion for details).
#> estimate std.err lower
#> Average effect on outcome 0.2795 0.2736 -0.2582
#> Average effect on costs 32158.9262 18264.2783 -3725.4383
#> Average net monetary benefit (WTP: 50000) -18186.2824 23388.4141 -64138.1871
#> Incremental cost-effectiveness ratio 115078.1724 NA 81461.3336
#> upper accept.prob.normal
#> Average effect on outcome 0.8171 NA
#> Average effect on costs 68043.2907 NA
#> Average net monetary benefit (WTP: 50000) 27765.6222 0.2184
#> Incremental cost-effectiveness ratio -356785.0773 NA
describe_forest(iv_example, factorVars="Severe_disease")
#> Stratified by Cost_Effective
#> No Yes p test
#> n 325 175
#> Severe_disease = 1 (%) 117 (36.0) 147 (84.0) <0.001
#> HRQoL (mean (SD)) 0.42 (0.21) 0.14 (0.10) <0.001
#> Risk_score (mean (SD)) 49.18 (6.97) 52.31 (9.61) <0.001
ce_plane(iv_example, type="average")
#> New treatment is more effective and more costly (NE Quadrant) (%): 80.4
#> New treatment is more effective and less costly (SE Quadrant) (%): 3.1
#> New treatment is less effective and more costly (NW Quadrant) (%): 16
#> New treatment is less effective and less costly (SW Quadrant) (%): 0.5