class: center, middle, inverse, title-slide

.title[
# Causal inference with the R package stagedtrees
]

.author[
### Gherardo Varando (Dept. of Statistics and Operational Research, UV)
joint work with Manuele Leonelli (IE University)
]

---
class: chapter-slide

## Event Trees

.pull-left[
<img src="data:image/png;base64,#IVusuariosRvalencia_files/figure-html/unnamed-chunk-2-1.png" width="100%" />
]

.pull-right[
* `\(X = (X_1, \ldots, X_n)\)` categorical
* `\(P(X) = \prod_{i=1}^n P(X_i| X_{1:(i-1)})\)`
]

---

## Staged Event Trees

.pull-left[
<img src="data:image/png;base64,#IVusuariosRvalencia_files/figure-html/unnamed-chunk-3-1.png" width="100%" />
]

.pull-right[
* `\(X = (X_1, \ldots, X_n)\)` categorical
* `\(P(X) = \prod_{i=1}^n P(X_i| X_{1:(i-1)})\)`
* such an event tree is called `\(X\)`-compatible
* if two nodes **at the same depth** have the same color, then the associated conditional probabilities are equal:
  `\(P(\texttt{Survived}|\texttt{Crew, Adult})\)` `\(= P(\texttt{Survived}|\texttt{3rd, Adult})\)`
* chain event graphs are an equivalent, (sometimes) more compact, representation of the same models
]

.normal[
<small>
Collazo R. A., Görgen C. and Smith J. Q. Chain event graphs. _CRC Press_, 2018.

Barclay L. M., Hutton J. L. and Smith J. Q. Refining a Bayesian network using a chain event graph. _International Journal of Approximate Reasoning_, vol. 54, pp. 1300-1309, 2013.
</small>
]

---

## The `stagedtrees` package

The `stagedtrees` package is available on [CRAN](https://cran.r-project.org/package=stagedtrees) and [GitHub](https://github.com/stagedtrees/stagedtrees) under the MIT license.

<small>
F Carli, M Leonelli, E Riccomagno, G Varando, The R Package stagedtrees for Structural Learning of Stratified Staged Trees, 2020, https://arxiv.org/abs/2004.06459.
</small>

For this talk we use the development version from the GitHub repository:

``` r
remotes::install_github("stagedtrees/stagedtrees")
```

---

## In action

.pull-left[
``` r
library(stagedtrees)

# Start from the full (saturated) tree and merge stages
# with backward hill-climbing.
model <- Titanic |>
  full(order = c("Sex", "Age", "Class", "Survived")) |>
  stages_bhc()
model
```

```
## Staged event tree (fitted)
## Sex[2]->Age[2]->Class[4]->Survived[2]
## 'log Lik.' -5157.759 (df=19)
```

``` r
prob(model, c(Survived = "Yes"), conditional_on = c(Class = "2nd"))
```

```
## [1] 0.4468562
```
]

.pull-right[
``` r
plot(model, cex_nodes = 3, cex_label_edges = 2)
```

<img src="data:image/png;base64,#IVusuariosRvalencia_files/figure-html/unnamed-chunk-7-1.png" width="100%" />
]

---

## What is Causal Inference?

Causal inference seeks to estimate causal effects from data by combining statistical models, assumptions, and observed information.

* The primary target of interest is the causal treatment effect, also called the average treatment effect (ATE): `\(\text{ATE} = E[Y(1) - Y(0)]\)`
* The Fundamental Problem of Causal Inference: for any given unit, we can observe only one potential outcome, either the outcome under treatment or the outcome under control, but never both (the unobserved one is the counterfactual).
* Randomized controlled trials (RCTs) are considered the gold standard for estimating ATEs because randomization balances both observed and unobserved confounders.
* Assuming consistency, positivity, and conditional exchangeability, the average treatment effect can be estimated from observational data alone.
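
The last point can be spelled out in one line. As a sketch, write `\(A\)` for a binary treatment and `\(Z\)` for the adjustment covariates (both symbols are introduced here for illustration and are not used elsewhere in these slides). The three assumptions then give the standardization formula:

`\(E[Y(a)] = \sum_z E[Y(a) \mid Z = z]\, P(Z = z)\)`

`\(\phantom{E[Y(a)]} = \sum_z E[Y(a) \mid A = a, Z = z]\, P(Z = z)\)` (conditional exchangeability, with positivity ensuring the conditioning event has nonzero probability)

`\(\phantom{E[Y(a)]} = \sum_z E[Y \mid A = a, Z = z]\, P(Z = z)\)` (consistency)

so that

`\(\text{ATE} = \sum_z \big(E[Y \mid A = 1, Z = z] - E[Y \mid A = 0, Z = z]\big)\, P(Z = z)\)`

Every term on the right-hand side is a feature of the observational distribution. In the Titanic model shown earlier, one possible reading is `\(Z = (\texttt{Sex}, \texttt{Age})\)`, `\(A = \texttt{Class}\)` and `\(Y = \texttt{Survived}\)`.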
---

### Causal Inference in the `stagedtrees` package

We implemented the `potential_outcomes()` function, which estimates the distribution of the outcome under each treatment level:

``` r
potential_outcomes(model, "Survived", "Class")
```

```
##       Survived
## Class         No       Yes
##   1st  0.5252169 0.4747831
##   2nd  0.6781598 0.3218402
##   3rd  0.7858233 0.2141767
##   Crew 0.6153188 0.3351582
```

These differ from the conditional probabilities:

``` r
matrix(c(prob(model, c(Survived = "No"), as.data.frame(model$tree["Class"])),
         prob(model, c(Survived = "Yes"), as.data.frame(model$tree["Class"]))),
       ncol = 2, byrow = FALSE,
       dimnames = model$tree[c("Class", "Survived")])
```

```
##       Survived
## Class         No       Yes
##   1st  0.3800478 0.6199522
##   2nd  0.5531438 0.4468562
##   3rd  0.7584015 0.2415985
##   Crew 0.7606468 0.2393532
```

---

### Average Treatment Effect

From the potential outcomes, we can easily compute average treatment effects:

``` r
po <- potential_outcomes(model, "Survived", "Class")
po["1st",] - po["3rd",]
```

```
##         No        Yes
## -0.2606065  0.2606065
```

Thus the model estimates that 1st class passengers had a survival probability 26 percentage points higher than 3rd class passengers (assuming `Sex` and `Age` form a sufficient adjustment set).

---

### Standardization

<img src="data:image/png;base64,#IVusuariosRvalencia_files/figure-html/unnamed-chunk-11-1.png" width="50%" /><img src="data:image/png;base64,#IVusuariosRvalencia_files/figure-html/unnamed-chunk-11-2.png" width="50%" />

---

### Propensity-score stratification

<img src="data:image/png;base64,#IVusuariosRvalencia_files/figure-html/unnamed-chunk-12-1.png" width="50%" /><img src="data:image/png;base64,#IVusuariosRvalencia_files/figure-html/unnamed-chunk-12-2.png" width="50%" />

---

### Positivity violations

Staged event trees are useful graphical tools for checking violations of the positivity assumption.
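
As a complement to the graphical check, a minimal base-R sketch (not a `stagedtrees` feature) is to cross-tabulate treatment and covariates and look for empty cells. It assumes, as in the Titanic model above, that `Class` is the treatment and `Sex` and `Age` are the covariates; the object names (`cells`, `empty`, `crew_share`, ...) are ours.

``` r
# Titanic ships with base R (datasets package) as a 4-way contingency table.
df <- as.data.frame(Titanic)

# Count passengers in every (Sex, Age, Class) cell, summing over Survived.
cells <- aggregate(Freq ~ Sex + Age + Class, data = df, FUN = sum)

# Zero-count cells are covariate/treatment combinations that never occur,
# i.e. candidate positivity violations.
empty <- cells[cells$Freq == 0, ]
print(empty)

# Share of passengers whose (Sex, Age) stratum contains at least one
# crew member, i.e. where the treatment level "Crew" is supported.
strata <- aggregate(Freq ~ Sex + Age, data = df, FUN = sum)
crew <- cells[cells$Class == "Crew", c("Sex", "Age", "Freq")]
names(crew)[names(crew) == "Freq"] <- "CrewFreq"
supported <- merge(strata, crew, by = c("Sex", "Age"))
crew_share <- sum(supported$Freq[supported$CrewFreq > 0]) / sum(supported$Freq)
print(crew_share)
```

On the Titanic data the empty cells are the two `Child` strata of `Crew`, and the supported share is 2092/2201, about 0.95. This would be consistent with the `Crew` row of the potential-outcomes table above summing to about 0.95 rather than 1.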
<img src="data:image/png;base64,#IVusuariosRvalencia_files/figure-html/unnamed-chunk-13-1.png" width="75%" style="display: block; margin: auto;" />

---

Example with depression data from https://www.kaggle.com/datasets/shahzadahmad0402/depression-and-anxiety-data

.pull-left[
``` r
order <- c("gender", "bmi_cat", "anxiety_diagnosis",
           "anxiety_treatment", "anxiousness")
model <- full(data, order = order) |>
  stages_bhc()
```
]

.pull-right[
<img src="data:image/png;base64,#IVusuariosRvalencia_files/figure-html/unnamed-chunk-16-1.png" width="95%" style="display: block; margin: auto;" />
]

``` r
# Restrict the model to the subtree for gender = "female".
model_f <- subtree(model, c(gender = "female"))
potential_outcomes(model_f, "anxiousness", "anxiety_treatment")
```

```
##                  anxiousness
## anxiety_treatment   anxiety notanxiety
##         treated   0.6888049  0.3111951
##         untreated 0.8034879  0.1965121
```

---

### Uncertainty quantification

Bootstrap on the right heart catheterization (RHC) dataset from the `ATbounds` package.

.pull-left[
``` r
library(pbapply)  # provides pbreplicate()

order <- c("sex", "race", "income", "cat", "RHC", "survival")
boot <- pbreplicate(50, {
  # Resample rows with replacement.
  data <- RHC[sample(nrow(RHC), replace = TRUE),]
  model <- full(data, order = order, lambda = 1e-5,
                join_unobserved = FALSE) |>
    stages_hclust(k = 2)
  # Difference between the two RHC rows, second survival column.
  diff(potential_outcomes(model, "survival", "RHC"))[,2]
}, cl = 5)
```
]

.pull-right[
``` r
hist(boot)
```
]

---

### Thank you for your attention

* The `stagedtrees` package is available on CRAN and GitHub (https://github.com/stagedtrees/stagedtrees).

#### References

<small>

* Varando, G., Leonelli, M., Cerdà-Bautista, J., Sitokonstantinou, V., & Camps-Valls, G. (2025). Staged Event Trees for Transparent Treatment Effect Estimation. arXiv preprint arXiv:2509.26265.
* Carli, F., Leonelli, M., Riccomagno, E., & Varando, G. (2022). The R package stagedtrees for structural learning of stratified staged trees. Journal of Statistical Software, 102, 1-30.
* Leonelli, M., & Varando, G. (2023). Context-specific causal discovery for categorical data using staged trees. International Conference on Artificial Intelligence and Statistics. PMLR.

</small>