Introduction to the CEAforests package

library(CEAforests)

The following tutorial illustrates how to use the CEAforests package to estimate heterogeneous effects and costs in cost-effectiveness analyses alongside clinical trials or observational studies. The package uses the causal_forest function from the grf package to fit causal forests according to the procedures presented in Bonander & Svensson (2021, Health Economics). Its main purpose is to provide a set of helpful functions that can be used to implement and analyze the results of causal forests in cost-effectiveness analyses.

Generate example data

We begin by generating some data for the tutorial. The dataset contains three correlated \(X\) variables, a treatment variable \(W_i\), quality-adjusted life years (QALYs, \(Y_i\)) and healthcare costs in Euros (\(C_i\)).

require(MASS)
#> Loading required package: MASS
require(GGally)
#> Loading required package: GGally
#> Loading required package: ggplot2
#> Registered S3 method overwritten by 'GGally':
#>   method from   
#>   +.gg   ggplot2
# Set sample size
n = 500
# Set seed for replication
RNGversion("3.6.0")
set.seed(11313413)
# Generate three X variables with some correlation structure
mu = rep(0,3)
x1cors = c(1, -0.5, -0.1)
x2cors = c(-0.5, 1, 0)
x3cors = c(-0.1, 0, 1)
Sigma = matrix(c(x1cors, x2cors, x3cors), nrow=3, ncol=3)
Xvars = mvrnorm(n=n, mu=mu, Sigma=Sigma)
X1 = Xvars[,1]
X2 = Xvars[,2]
X3 = Xvars[,3]
#Generate binary treatment variable W
Z = rbinom(n, 1, 0.5)
W = rbinom(n, 1, 1/(1+exp(-X1*-1.5+Z*1.4)))
#Generate costs
mui = 0.5*W +  W*X2*0.6 + W*X3*0.3 + rnorm(n,0,1)
mui_s = pnorm(mui)
C = (qgamma(mui_s, shape=1, rate=2)*100000)
#Generate QALYs
lni = 0.4 + 0.2*W + W*X1*0.3 + W*X3*0.5
sd = 0.4
Y = rlnorm(n, lni, sd)
# Convert X to look more like real data while keeping dependencies
xnames <- c("Severe_disease", "HRQoL", "Risk_score")
pvars = pnorm(Xvars)
povars <- round(qnorm(pvars, 50, 8), 1)
betavars <- qbeta(pvars, 1, 2)
binomvars <-  qbinom(pvars, 1, .50)
X1 = binomvars[,1]
X2 = betavars[,2]
X3 = povars[,3]
Xvars = cbind(X1,X2,X3)
colnames(Xvars) = xnames
options(scipen=999) #Remove scientific notation
ggpairs(data.frame(Y,C,W,Xvars),
        title="Distributions and correlation structure in the generated dataset", 
        progress=FALSE, diag=list(continuous="barDiag")) + theme_bw(base_size=8)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Basic functionality

Training a CEA forest

Next, we train a CEA forest using the cea_forest command, which has four required inputs: three vectors containing the health outcome \(Y_i\) (e.g., QALYs), healthcare costs \(C_i\), and a treatment indicator \(W_i\), respectively, and a matrix where each column represents the \(X\) variables. cea_forest estimates heterogeneity in outcomes, costs and in the net monetary benefit as a function of the \(X\) variables. By default, it also adjusts for confounding by \(X\) using propensity scores. The latter feature can be circumvented by supplying known propensities, or propensities estimated using other methods, with the W.hat option. To calculate the net monetary benefit, we also need to specify a willingness to pay per QALY threshold (\(\lambda\)). In the example, we set \(\lambda\) to 50000. The function can also pass any other call to the causal_forest command in the grf package.

cea_example = cea_forest(Y=Y, C=C, X=Xvars, W=W, WTP=50000)

Internally, the function fits three causal forests (one for outcomes, one for costs and one for net monetary benefits) using the grf package, and then stores them in a single object that other functions in the CEAforests package will understand.

All other functions that we will use to explore heterogeneity and estimate effects below rely on the estimates contained in objects produced by the cea_forest function (cea_example in the present example).

By default, the command will train 5000 causal trees on different random subsets of the data with the default settings from the grf package, with one exception; tune.parameters is set to "all" by default instead "none". Thus, the cea_forest command will make use of the automated tuning features in grf by default. In real applications, it may be advisable to increase the number of trees (via the num.trees option) to, e.g., 50000-100000, to minimize the influence of Monte Carlo errors on the results.

cea_example
#> $outcome.forest
#> GRF forest object of type causal_forest 
#> Number of trees: 5000 
#> Number of training samples: 500 
#> Variable importance: 
#>     1     2     3 
#> 0.096 0.187 0.717 
#> 
#> $cost.forest
#> GRF forest object of type causal_forest 
#> Number of trees: 5000 
#> Number of training samples: 500 
#> Variable importance: 
#>     1     2     3 
#> 0.097 0.560 0.343 
#> 
#> $nmb.forest
#> GRF forest object of type causal_forest 
#> Number of trees: 5000 
#> Number of training samples: 500 
#> Variable importance: 
#>     1     2     3 
#> 0.238 0.418 0.343 
#> 
#> $WTP
#> [1] 50000
#> 
#> attr(,"class")
#> [1] "cea_forest" "CEAforests"

Predicting using CEA forests

To get estimates of the conditional average effects on outcomes ( \(\Delta Y_i\)), costs (\(\Delta C_i\)) and net monetary benefits (\(NMB_i\)) for each individual in the data given their \(X\) values, we use the predict command:

pred.cea = predict(cea_example)

By default, the estimates reflect out-of-bag estimates for each individual, which means that they are taken only from the subset of causal trees in the forest in which the individual \(i\) was not used for training. The output contains point estimates and their variances from each of the three forests.

head(pred.cea,5)[1:2] #View first five individuals, incremental outcomes
#>   predicted.delta_y variance.delta_y
#> 1        0.01114385       0.01079556
#> 2       -0.61976092       0.01336175
#> 3       -0.59704233       0.01236641
#> 4        0.45049866       0.07081276
#> 5        1.13970868       0.04797749
head(pred.cea,5)[3:4] #View first five individuals, incremental costs
#>   predicted.delta_cost variance.delta_cost
#> 1            -12280.40           123457469
#> 2             56343.84            96031148
#> 3             69272.02            96037661
#> 4             67909.39           199377516
#> 5             66955.79           160255464
head(pred.cea,5)[5:6] #View first five individuals, net monetary benefits
#>   predicted.nmb variance.nmb
#> 1     22045.963    132621941
#> 2    -77287.881    283745989
#> 3    -83657.677    243783768
#> 4    -47258.468    185316464
#> 5     -3606.199    225632749

It also contains some additional information that may be useful for model building. The columns named excess error reflect jackknife estimates of the Monte Carlo errors, which can be used to assess the instability of forests grown with the same amount of trees on the same data. The general recommendation provided by the authors of the grf package is to increase the number of trees until the excess error is determined to be negligible.

head(pred.cea,5)[c(8,10,12)] #Excess errors for the first five individuals
#>   delta_y.excess.error delta_cost.excess.error nmb.excess.error
#> 1        0.00009115689                613216.1         807946.3
#> 2        0.00002124729               1188245.3         833419.5
#> 3        0.00001955035               1280005.2         731348.2
#> 4        0.00008747626               1258291.3         877045.6
#> 5        0.00003461465                300013.5         287303.0

The predict command can also be used to predictions at new values of \(X\), which can, for instance, be useful for estimating the change in effect size if we modify the value of one variable while keeping others constant.

predict(cea_example, newdata=t(matrix(c(Severe_disease=1, HRQoL=0.3, Risk_score=50))))
#>   predicted.delta_y variance.delta_y predicted.delta_cost variance.delta_cost
#> 1          0.264024       0.02176748             22990.92            48144359
#>   predicted.nmb variance.nmb
#> 1      9761.927     77986673
predict(cea_example, newdata=t(matrix(c(Severe_disease=0, HRQoL=0.3, Risk_score=50))))
#>   predicted.delta_y variance.delta_y predicted.delta_cost variance.delta_cost
#> 1        0.03920837       0.03203222             27349.06            31868729
#>   predicted.nmb variance.nmb
#> 1     -17378.99     35024393

Plotting heterogeneity in outcomes, costs and net monetary benefits

The plot function can be used to generate histograms of the out-of-bag estimates for outcomes, costs and net monetary benefits. The which.y can be used to specify which effects to plot ("outcome", "cost", "nmb", or "all"). The function calls the ggplot2 package (in the latter instance, it also uses the cowplot package to merge the three plots).

Histograms

plot(cea_example, which.y="nmb")

Different graphs can be combined directly within the function.

plot(cea_example, which.y=c("outcome","cost"))

Graphical parameters can easily be changed using calls to ggplot2.

plot(cea_example, which.y="nmb") + theme_classic(base_size=10) + xlab("NMB, WTP = 50000")

Scatter plots

Specifying an \(X\) variable with the option which.x produces a scatter plot of the estimates along the focal \(X\) variable.

plot(cea_example, which.y="nmb", which.x="HRQoL")
#> Warning: Removed 1 rows containing missing values (geom_point).

Conditional effect plots

Setting conditional=TRUE, the function instead uses predict to vary the focal \(X\) variable while keeping the others constant at their mean. The plot includes point estimates and 95% confidence bands, and allows us to better assess how a change in \(X\) affects the cost-effectiveness of the treatment.

plot(cea_example, which.y="nmb", which.x="HRQoL", conditional=TRUE)

Individual-level cost-effectivness plane

Beyond the standard plots, the CEAforests package also includes a set of specialized functions for cost-effectiveness analyses. The ce_plane function can be used to produce individual-level cost-effectiveness planes using the out-of-bag estimates from the predict function.

ce_plane(cea_example) + ggplot2::ylim(c(-50000,120000)) + ggplot2::xlim(c(-1,2.5))
#>  New treatment is more effective and more costly (NE Quadrant) (%):  51.6 
#>  New treatment is more effective and less costly (SE Quadrant) (%):  6 
#>  New treatment is less effective and more costly (NW Quadrant) (%):  26.8 
#>  New treatment is less effective and less costly (SW Quadrant) (%):  15.6

While the relevance of statistical inference is debatable in the context of cost-effectiveness analyses, it may be useful to probe how much random noise influences results of the heterogeneity analysis. Setting certainty_groups=TRUE, the function will use the point estimates and variance of the out-of-bag estimates from the net monetary benefit forest to divide the data into certainty groups with respect to the WTP threshold defined in the cea_forest object. The option alpha can be used to modify the confidence level (default = 0.05, i.e., 95% CIs).

ce_plane(cea_example, certainty_groups = TRUE) + ggplot2::ylim(c(-50000,120000)) + ggplot2::xlim(c(-1,2.5))
#>  New treatment is more effective and more costly (NE Quadrant) (%):  51.6 
#>  New treatment is more effective and less costly (SE Quadrant) (%):  6 
#>  New treatment is less effective and more costly (NW Quadrant) (%):  26.8 
#>  New treatment is less effective and less costly (SW Quadrant) (%):  15.6

The plotted point estimates are taken from the outcome and cost forests. These estimates may differ from those produced by the net monetary benefit forest due to Monte Carlo errors. The consequence of this can be seen in the plot above, where some individuals who are above the WTP threshold are classified as being significantly below the threshold, and vice versa. This issue can be avoided by increasing the number of trees in the cea_forest object (using the num.trees option).

Exploring heterogeneity with tables and regression

Differences in characteristics between individuals above and below the WTP threshold can be investigated using the describe_forest function.

describe_forest(cea_example)
#>                             Stratified by Cost_Effective
#>                              No           Yes          p      test
#>   n                            272          228                   
#>   Severe_disease (mean (SD))  0.36 (0.48)  0.73 (0.44) <0.001     
#>   HRQoL (mean (SD))           0.45 (0.20)  0.18 (0.15) <0.001     
#>   Risk_score (mean (SD))     47.24 (6.50) 53.89 (8.39) <0.001
describe_forest(cea_example, certainty_groups=TRUE)
#>                             Stratified by Certainty_Group
#>                              Not cost-effective Undetermined Cost-effective
#>   n                            155                236          109         
#>   Severe_disease (mean (SD))  0.14 (0.35)        0.58 (0.49)  0.96 (0.19)  
#>   HRQoL (mean (SD))           0.54 (0.18)        0.28 (0.18)  0.11 (0.08)  
#>   Risk_score (mean (SD))     47.14 (6.10)       50.83 (8.27) 53.51 (8.81)  
#>                             Stratified by Certainty_Group
#>                              p      test
#>   n                                     
#>   Severe_disease (mean (SD)) <0.001     
#>   HRQoL (mean (SD))          <0.001     
#>   Risk_score (mean (SD))     <0.001

The function uses the tableone package, and can pass calls directly to CreateTableOne to modify the table.

dtable = describe_forest(cea_example, factorVars=c("Severe_disease"))
print(dtable, nonnormal="HRQoL")
#>                         Stratified by Cost_Effective
#>                          No                 Yes                p      test   
#>   n                        272                228                            
#>   Severe_disease = 1 (%)    97 (35.7)         167 (73.2)       <0.001        
#>   HRQoL (median [IQR])    0.42 [0.30, 0.58]  0.14 [0.06, 0.23] <0.001 nonnorm
#>   Risk_score (mean (SD)) 47.24 (6.50)       53.89 (8.39)       <0.001

The explain_forest function fits linear regression models of the form \(E[\Delta Y|X] = a+\beta X\) to estimate the marginal effects of covariates on the effect size, which can be useful for reducing the complexity of the results and to assess how specific variable appear to influence the effect size after adjustment for other variables in the data. By default, it will run intercepts only models to estimate the average \(\Delta Y\), \(\Delta C\) and \(NMB\) in the sample.

explain_forest(cea_example)
#> $`Outcome forest`
#>             Estimates StdError   LowerCI   UpperCI P.val
#> (Intercept) 0.5311196 0.116308 0.3026058 0.7596334     0
#> 
#> $`Cost forest`
#>             Estimates StdError  LowerCI  UpperCI P.val
#> (Intercept)  26137.33 6171.639 14011.73 38262.93     0
#> 
#> $`Net monetary benefit forest, WTP = 50000`
#>             Estimates StdError   LowerCI  UpperCI  P.val
#> (Intercept)  418.6498  8737.73 -16748.62 17585.92 0.9426

Covariates can be input in the same manner as in the cea_forest function.

explain_forest(cea_example, X=Xvars)
#> $`Outcome forest`
#>                 Estimates   StdError    LowerCI    UpperCI  P.val
#> (Intercept)    -6.7491049 0.85834604 -8.4355474 -5.0626624 0.0000
#> Severe_disease  0.7734686 0.22186920  0.3375493  1.2093880 0.0005
#> HRQoL          -0.4806281 0.46821527 -1.4005579  0.4393017 0.2990
#> Risk_score      0.1397965 0.01683369  0.1067224  0.1728706 0.0000
#> 
#> $`Cost forest`
#>                  Estimates   StdError     LowerCI     UpperCI  P.val
#> (Intercept)    -187944.621 37675.4008 -261967.676 -113921.565 0.0000
#> Severe_disease  -14799.048 12770.3576  -39889.714   10291.618 0.2421
#> HRQoL           162165.476 33071.5902   97187.795  227143.156 0.0000
#> Risk_score        3366.989   669.8776    2050.841    4683.137 0.0000
#> 
#> $`Net monetary benefit forest, WTP = 50000`
#>                  Estimates  StdError     LowerCI     UpperCI  P.val
#> (Intercept)    -149510.625 60075.674 -267544.803  -31476.447 0.0129
#> Severe_disease   53472.480 17436.749   19213.482   87731.477 0.0022
#> HRQoL          -186196.881 41791.330 -268306.743 -104087.019 0.0000
#> Risk_score        3622.837  1133.239    1396.296    5849.377 0.0014

The \(X\) matrix can also be changed as desired and does not have to include variables that were used to train the forest. For instance, we may want to explore how having a non-severe version of the disease affects \(\Delta Y\), \(\Delta C\) and \(NMB\) compared to a severe variant.

notsevere_dummy = 1-Xvars[,c("Severe_disease")]
explain_forest(cea_example, X=data.frame(notsevere_dummy))
#> $`Outcome forest`
#>                  Estimates  StdError    LowerCI    UpperCI P.val
#> (Intercept)      0.8669144 0.1713934  0.5301712  1.2036577 0.000
#> notsevere_dummy -0.7114298 0.2289857 -1.1613269 -0.2615326 0.002
#> 
#> $`Cost forest`
#>                 Estimates StdError   LowerCI  UpperCI  P.val
#> (Intercept)      4271.015  8462.58 -12355.75 20897.78 0.6017
#> notsevere_dummy 46326.939 12198.89  22359.30 70294.58 0.0002
#> 
#> $`Net monetary benefit forest, WTP = 50000`
#>                 Estimates StdError    LowerCI   UpperCI  P.val
#> (Intercept)      39074.71 12697.55   14127.33  64022.09 0.0022
#> notsevere_dummy -81898.43 16982.27 -115264.16 -48532.69 0.0000

By default, the function uses the willingness to pay threshold specified in the cea_forest object. The threshold can easily be changed without having to re-train the forest using the WTP option.

explain_forest(cea_example, X=data.frame(notsevere_dummy), WTP=20000)
#> $`Outcome forest`
#>                  Estimates  StdError    LowerCI    UpperCI P.val
#> (Intercept)      0.8669144 0.1713934  0.5301712  1.2036577 0.000
#> notsevere_dummy -0.7114298 0.2289857 -1.1613269 -0.2615326 0.002
#> 
#> $`Cost forest`
#>                 Estimates StdError   LowerCI  UpperCI  P.val
#> (Intercept)      4271.015  8462.58 -12355.75 20897.78 0.6017
#> notsevere_dummy 46326.939 12198.89  22359.30 70294.58 0.0002
#> 
#> $`Net monetary benefit forest, WTP = 20000`
#>                 Estimates  StdError    LowerCI   UpperCI  P.val
#> (Intercept)      13067.27  9478.127  -5554.772  31689.32 0.1652
#> notsevere_dummy -60555.53 13160.020 -86411.538 -34699.53 0.0000

Using CEAforests to estimate average effects

The main purpose of the CEAforests package is to simplify heterogeneity estimation. However, the package would not be complete without classic functions to produce sample average cost-effectiveness results from the forest object. The avg_effects function produces a small table with incremental effects, costs, net monetary benefits and incremental cost-effectiveness ratio (ICER) for the entire sample or, optionally, for a specified subgroup.

avg_effects(cea_example)
#>                                             estimate   std.err       lower
#> Average effect on outcome                     0.5311    0.1162      0.3028
#> Average effect on costs                   26137.3304 6165.4639  14023.8621
#> Average net monetary benefit (WTP: 50000)   418.6498 8728.9876 -16731.4486
#> Incremental cost-effectiveness ratio      49211.7598        NA  20131.6572
#>                                                upper accept.prob.normal
#> Average effect on outcome                     0.7594                 NA
#> Average effect on costs                   38250.7986                 NA
#> Average net monetary benefit (WTP: 50000) 17568.7482             0.5191
#> Incremental cost-effectiveness ratio      89188.5131                 NA

The willingness to pay threshold can easily be changed without having to re-train the forest.

avg_effects(cea_example, WTP=20000)
#>                                              estimate   std.err       lower
#> Average effect on outcome                      0.5311    0.1162      0.3028
#> Average effect on costs                    26137.3304 6165.4639  14023.8621
#> Average net monetary benefit (WTP: 20000) -15514.9383 6722.0201 -28721.8888
#> Incremental cost-effectiveness ratio       49211.7598        NA  20131.6572
#>                                                upper accept.prob.normal
#> Average effect on outcome                     0.7594                 NA
#> Average effect on costs                   38250.7986                 NA
#> Average net monetary benefit (WTP: 20000) -2307.9878             0.0105
#> Incremental cost-effectiveness ratio      89188.5131                 NA

By default, the function uses Fieller’s method to calculate CIs for the ICER. The confidence intervals in the table can also be obtained via the bootstrap as described in Bonander & Svensson (2021).

avg_effects(cea_example, boot.ci=TRUE)
#>                                             estimate   std.err       lower
#> Average effect on outcome                     0.5311    0.1173      0.3276
#> Average effect on costs                   26137.3304 6096.0914  14832.6291
#> Average net monetary benefit (WTP: 50000)   418.6498 8752.2117 -16605.6967
#> Incremental cost-effectiveness ratio      49211.7598        NA  23652.3910
#>                                                 upper accept.prob.normal
#> Average effect on outcome                      0.7819                 NA
#> Average effect on costs                    40097.2335                 NA
#> Average net monetary benefit (WTP: 50000)  17720.0794             0.5191
#> Incremental cost-effectiveness ratio      101865.0771                 NA

The subset option can be used to obtain the same table for a specified subgroup defined by logical conditions.

avg_effects(cea_example, subset=Xvars[,"Risk_score"]>50)
#>                                             estimate    std.err      lower
#> Average effect on outcome                     1.4163     0.1899     1.0424
#> Average effect on costs                   41932.3743  8990.4487 24227.3886
#> Average net monetary benefit (WTP: 50000) 28880.7097 14309.4028   701.0516
#> Incremental cost-effectiveness ratio      29607.7871         NA 14621.2219
#>                                                upper accept.prob.normal
#> Average effect on outcome                     1.7901                 NA
#> Average effect on costs                   59637.3600                 NA
#> Average net monetary benefit (WTP: 50000) 57060.3678             0.9782
#> Incremental cost-effectiveness ratio      47455.2634                 NA

Cost-effectiveness planes

Standard cost-effectiveness planes with confidence ellipses can also be obtained via the bootstrap by setting type="average" in the ce_plane function.

ce_plane(cea_example, type="average") + ggplot2::ylim(c(-1000,70000)) + ggplot2::xlim(c(0, 2.2))
#>  New treatment is more effective and more costly (NE Quadrant) (%):  100 
#>  New treatment is more effective and less costly (SE Quadrant) (%):  0 
#>  New treatment is less effective and more costly (NW Quadrant) (%):  0 
#>  New treatment is less effective and less costly (SW Quadrant) (%):  0

The subset option can be used here as well to obtain a corresponding plot for a specified subgroup.

ce_plane(cea_example, type="average", subset=Xvars[,"Risk_score"]>50) + ggplot2::ylim(c(-1000,70000)) + ggplot2::xlim(c(0, 2.2))
#>  New treatment is more effective and more costly (NE Quadrant) (%):  100 
#>  New treatment is more effective and less costly (SE Quadrant) (%):  0 
#>  New treatment is less effective and more costly (NW Quadrant) (%):  0 
#>  New treatment is less effective and less costly (SW Quadrant) (%):  0
#> Warning: Removed 2 rows containing non-finite values (stat_ellipse).
#> Warning: Removed 2 rows containing missing values (geom_point).

Cost-effectivness acceptability curves

Cost-effectiveness acceptability curves (CEAC), based on probabilities estimated via a cumulative normal distribution function, can be obtained using the accept_curve function.

accept_curve(cea_example,from=1000,to=100000,length.out=50)

The function can also be used to compare the subgroups based on logical conditions by setting compare=TRUE.

accept_curve(cea_example, subset=Xvars[,"Risk_score"]>50, from=1000, to=100000, length.out=50, 
             compare=TRUE, 
             labels=c("Risk > 50", "Full sample", "Risk < 50"))

Automated policy learning using CEAforests

The CEAforests package can also call the policytree package to perform an exhaustive search for the optimal reimbursement policy given a set of input variables and a pre-specified complexity (tree depth). Here, the optimality criteria is defined as the policy that maximizes the average net monetary benefit.

First, we train a policy tree with a single split (depth=1).

ptree1 <- cea_policy_tree(cea_example, X=Xvars, depth=1)
#> Warning in cea_policy_tree(cea_example, X = Xvars, depth = 1): WTP not
#> specified, assuming WTP = 50000, as supplied to the CEAforests object.
plot(ptree1)

The infer_policy function can be used to estimate the value of implementing the suggested policy versus both a new-treatment-for-all and a old-treatment-for-all scenario.

infer_policy(cea_example,ptree1)
#> Warning in infer_policy(cea_example, ptree1): WTP not specified, assuming WTP =
#> 50000, as supplied to the CEAforests object.
#>                                                    Estimate  Std.Err  Lower.CI
#> Average NMB, new-for-all vs control-for-all        418.6498 8728.988 -16731.45
#> Average NMB, suggested policy vs control-for-all 24563.4565 5818.881  13130.93
#> Difference in NMB, suggested vs. new-for-all     24144.8067 6321.295  11725.17
#> Prop. who gets new treatment, suggested policy       0.4380       NA        NA
#>                                                  Upper.CI
#> Average NMB, new-for-all vs control-for-all      17568.75
#> Average NMB, suggested policy vs control-for-all 35995.98
#> Difference in NMB, suggested vs. new-for-all     36564.44
#> Prop. who gets new treatment, suggested policy         NA

The same function can also be used with custom policies.

infer_policy(cea_example, treat.policy=Xvars[,"Risk_score"]>50 & Xvars[,"Severe_disease"]==0)
#> Warning in infer_policy(cea_example, treat.policy = Xvars[, "Risk_score"] > :
#> WTP not specified, assuming WTP = 50000, as supplied to the CEAforests object.
#>                                                    Estimate  Std.Err  Lower.CI
#> Average NMB, new-for-all vs control-for-all        418.6498 8728.988 -16731.45
#> Average NMB, suggested policy vs control-for-all -5426.3576 4288.996 -13853.07
#> Difference in NMB, suggested vs. new-for-all     -5845.0075 7594.249 -20765.65
#> Prop. who gets new treatment, suggested policy       0.2440       NA        NA
#>                                                   Upper.CI
#> Average NMB, new-for-all vs control-for-all      17568.748
#> Average NMB, suggested policy vs control-for-all  3000.359
#> Difference in NMB, suggested vs. new-for-all      9075.636
#> Prop. who gets new treatment, suggested policy          NA

The function can also suggest more complex policies by increasing the depth of the tree.

plot(cea_policy_tree(cea_example, X=Xvars, depth=2))
#> Warning in cea_policy_tree(cea_example, X = Xvars, depth = 2): WTP not
#> specified, assuming WTP = 50000, as supplied to the CEAforests object.

The function uses the willingness to pay threshold from the cea_forest object by default. The threshold can easily be changed, without re-training the forest, using the WTP option. In our example, increasing the threshold leads to a slightly more inclusive reimbursement policy.

plot(cea_policy_tree(cea_example, X=Xvars, depth=2, WTP=70000))

Using instruments with CEAforests

The CEAforests package can also make use of instruments via the instrumental_forest function in grf. To use an instrument \(Z_i\) (e.g., random treatment allocation in cases where \(W_i\) reflects actual exposure to treatment), we supply the instrument to the cea_forest function via the Z option.

iv_example = cea_forest(Y=Y, C=C, X=Xvars, W=W, Z=Z, WTP=50000)
iv_example
#> $outcome.forest
#> GRF forest object of type instrumental_forest 
#> Number of trees: 5000 
#> Number of training samples: 500 
#> Variable importance: 
#>     1     2     3 
#> 0.052 0.318 0.630 
#> 
#> $cost.forest
#> GRF forest object of type instrumental_forest 
#> Number of trees: 5000 
#> Number of training samples: 500 
#> Variable importance: 
#>     1     2     3 
#> 0.224 0.402 0.374 
#> 
#> $nmb.forest
#> GRF forest object of type instrumental_forest 
#> Number of trees: 5000 
#> Number of training samples: 500 
#> Variable importance: 
#>     1     2     3 
#> 0.166 0.439 0.394 
#> 
#> $WTP
#> [1] 50000
#> 
#> attr(,"class")
#> [1] "cea_forest_instrumental" "CEAforests"

The instrumental version of the forest can then be analyzed as a standard cea_forest object.

avg_effects(iv_example)
#> Fieller's method CIs appear to be unbounded and may seem strange, but this behavior is to be expected if the incremental outcome does not differ significantly from zero. You may want to try setting boot.ci=TRUE to obtain bootstrapped (BCa) confidence intervals, but some authors recommend against this (see documentaion for details).
#>                                              estimate    std.err       lower
#> Average effect on outcome                      0.2795     0.2736     -0.2582
#> Average effect on costs                    32158.9262 18264.2783  -3725.4383
#> Average net monetary benefit (WTP: 50000) -18186.2824 23388.4141 -64138.1871
#> Incremental cost-effectiveness ratio      115078.1724         NA  81461.3336
#>                                                  upper accept.prob.normal
#> Average effect on outcome                       0.8171                 NA
#> Average effect on costs                     68043.2907                 NA
#> Average net monetary benefit (WTP: 50000)   27765.6222             0.2184
#> Incremental cost-effectiveness ratio      -356785.0773                 NA

describe_forest(iv_example, factorVars="Severe_disease")
#>                         Stratified by Cost_Effective
#>                          No            Yes           p      test
#>   n                        325           175                    
#>   Severe_disease = 1 (%)   117 (36.0)    147 (84.0)  <0.001     
#>   HRQoL (mean (SD))       0.42 (0.21)   0.14 (0.10)  <0.001     
#>   Risk_score (mean (SD)) 49.18 (6.97)  52.31 (9.61)  <0.001

ce_plane(iv_example, type="average")
#>  New treatment is more effective and more costly (NE Quadrant) (%):  80.4 
#>  New treatment is more effective and less costly (SE Quadrant) (%):  3.1 
#>  New treatment is less effective and more costly (NW Quadrant) (%):  16 
#>  New treatment is less effective and less costly (SW Quadrant) (%):  0.5