DAG-based causal inference vs AIC model selection

Author

Published

January 19, 2026

Purpose

Demonstrates why AIC fails causally

Statistical model selection procedures such as Akaike’s Information Criterion (AIC) are widely used to identify well-fitting models and are often implicitly treated as tools for causal inference. However, model selection optimizes predictive performance rather than causal validity, and may therefore favor models that yield biased estimates of causal effects. In contrast, causal directed acyclic graphs (DAGs) make assumptions about the data-generating process explicit and provide principled rules for covariate adjustment that are independent of model fit criteria. Here, we use simulation to demonstrate that DAG-based adjustment can outperform model selection for causal effect estimation, even when the model preferred by AIC has substantially better predictive fit.

Specifically, we simulate data under two common causal structures: 1. a scenario involving a collider 2 a scenario involving a confounder

For each scenario, we generate data 100 times from a known data-generating process with a fixed true causal effect. For each simulated dataset, we estimate the effect of an exposure on an outcome using both a DAG-identified adjustment set and models selected using AIC. We then compare the distribution and mean of estimated effects across replications to the known true effect. This repeated-simulation approach allows to assess bias and consistency, and to illustrate how model selection can systematically favor causally invalid models despite superior predictive performance.

0.1 Packages

Code

# Load packages
library(ggdag)


Attaching package: 'ggdag'

The following object is masked from 'package:stats':

    filter

Code

library(dagitty)
library(ggplot2)
library(viridis)

Loading required package: viridisLite

0.2 Functions

Code

adjustment_set_formulas <- function(dag, exposure, required_variable, outcome, effect = "total",
    type = "minimal", formula_parts = c(outcome), latent = NULL, remove = NULL, plot = TRUE,
    ...) {
    if (plot)
        gg <- ggdag_adjustment_set(.tdy_dag = tidy_dagitty(dag), exposure = exposure,
            outcome = outcome, ...) + theme_dag()

    temp_set <- adjustmentSets(x = dag, exposure = exposure, outcome = outcome, effect = effect,
        type = type)

    form_set <- lapply(temp_set, function(x) {
        if (!is.null(remove))
            x <- x[!x %in% remove]
        if (length(x) > 0){
        form <- paste(formula_parts[1], " ~ ", exposure, " + ", paste(x, collapse = " + "),
            if (length(formula_parts) == 2)
                formula_parts[2] else NULL)
        } else {
            form <- paste(formula_parts[1], " ~ ", exposure, if (length(formula_parts) == 2)
                formula_parts[2] else NULL)
        }

        return(form)
    })

    form_set <- form_set[!duplicated(form_set)]

    if (!is.null(latent))
        for (i in latent) form_set <- form_set[!grepl(paste0(" ", i, " "), form_set)]

    # form_set <- sapply(form_set, as.formula)

    names(form_set) <- seq_along(form_set)

    # add formula as attribute
    attributes(form_set)$exposure.formula <- paste(formula_parts[1], " ~ ", exposure,
        if (length(formula_parts) == 2)
            formula_parts[2] else NULL)

    if (plot)
        return(gg) else return(form_set)
}

1 Scenario 1: colliders

A collider is a variable that is causally influenced by two other variables, such as when both an exposure and an outcome affect a third variable. In DAG terminology, a collider appears in the structure X→C←Y. A key property of colliders is that, although they block association between their causes by default, conditioning on a collider (or one of its descendants) opens a non-causal path between those causes. As a result, controlling for a collider induces a spurious association that did not exist prior to conditioning. This phenomenon, known as collider bias, means that including a collider as a covariate can bias causal effect estimates, even if the collider strongly predicts the outcome. Because model selection criteria reward improved predictive fit, they often favor models that condition on colliders, making collider bias a systematic failure mode of model selection for causal inference.

1.1 Simulated world

Code

# DAG structure:
# U -> X -> Y
# U -> Y
# X -> C <- Y   (C is a collider)

dag <- dagify(
    Y ~ X,
    C ~ X + Y,
    exposure = "X",
    outcome  = "Y"
)


tidy_dag <- tidy_dagitty(dag)
tidy_dag$data$type <- ""
tidy_dag$data$label <- tidy_dag$data$name
tidy_dag$data$label[tidy_dag$data$name == "X"] <- "X\n(predictor)" 
tidy_dag$data$label[tidy_dag$data$name == "Y"] <- "Y\n(response)" 
tidy_dag$data$label[tidy_dag$data$name == "C"] <- "Z\n(collider)" 

ggplot(tidy_dag, aes(x = x, y = y, xend = xend, yend = yend)) + scale_color_viridis_d(begin = 0.2,
    end = 0.8, alpha = 0.5) + geom_dag_edges_fan(edge_color = viridis(10, alpha = 0.4)[2],
    arrow = grid::arrow(length = grid::unit(10, "pt"), type = "closed")) + geom_dag_point(size = 24) + geom_dag_text(color = "white", aes(label = label))  + theme_dag(base_size = 14) + theme(legend.position = "bottom")

1.2 Simulation

Code

# sample size
N <- 100

# True causal effect of X -> Y 
eff_size <- 10

out <- pbapply::pblapply(1:100, function(i) {

    # X is the exposure
    X <- rnorm(N)
    
    # Y is the outcome
    Y <- eff_size * X + rnorm(N)
    
    # C is a COLLIDER caused by both X and Y
    C <- X + Y + rnorm(N)
    
    dat <- data.frame(X, Y, C)
    
    m_dag <- lm(Y ~ X, data = dat)
    
    ## model selection approach
    
    # Candidate models a researcher might try
    m1 <- lm(Y ~ X , data = dat)
    m2 <- lm(Y ~ X + C, data = dat)
    
    # define adjustment set
    aictab <- AIC(m1, m2)
    
    # get object from name
    aic_mod <- get(rownames(aictab)[which.min(aictab$AIC)])
    
    # DAG approach
    # 
    d <- as.dagitty(dag)
    
    # Find valid adjustment sets
    
    # All minimal sufficient adjustment sets
    a <- adjustment_set_formulas(
        d,
        exposure = "X",
        outcome = "Y",
        type = "all",
        plot = FALSE
    )
    
    dag_mod <- lm(a[[1]], data = dat)
    
    
    # return estimates
    return(data.frame(
        method = c("DAG causal\ninference", "Model selection"),
        estimate = c(coef(dag_mod)["X"], coef(aic_mod)["X"])
    ))
    
})

results <- do.call(rbind, out)

1.3 Results

Blue line represents true effect size:

Code

agg_results <- aggregate(estimate ~ method, data = results, mean)

agg_results$sd <- aggregate(estimate ~ method, data = results, sd)$estimate

ggplot(agg_results, aes(x = method, y = estimate)) +
    geom_pointrange(aes(ymin = estimate - sd, ymax = estimate + sd)) +
    geom_hline(yintercept = eff_size,
               linetype = "dashed",
               color = "blue") +
    ylab("Estimated effect of X on Y") +
    xlab("Method")

2 Scenario 2: confounders

A confounder is a variable that causally influences both the exposure and the outcome, creating a spurious association between them if left unadjusted. In DAGs, a confounder appears in the structure X←Z→Y. Unlike colliders, confounders transmit non-causal association by default, and conditioning on them blocks this backdoor path. Proper adjustment for confounders is therefore necessary to recover unbiased estimates of causal effects. However, model selection procedures do not distinguish confounders from other predictive variables, such as mediators or post-treatment variables. As a result, even in the presence of a true confounder, model selection may favor over-controlled models that include variables downstream of the outcome, leading to biased estimates of the total causal effect. DAG-based reasoning is thus required not only to identify confounders that should be adjusted for, but also to identify variables that should not be conditioned on, regardless of their predictive utility.

2.1 Simulated world

Code

# DAG structure:
#
# Z -> X -> Y -> M
# Z -> Y
#
# Z is a confounder (should adjust)
# M is post-treatment (should NOT adjust for total effect)

dag <- dagify(
  X ~ Z,
  Y ~ X + Z,
  M ~ Y,
  exposure = "X",
  outcome  = "Y"
)

tidy_dag <- tidy_dagitty(dag)
tidy_dag$data$type <- ""
tidy_dag$data$label <- tidy_dag$data$name
tidy_dag$data$label[tidy_dag$data$name == "M"] <- "M\n(mediator)" 
tidy_dag$data$label[tidy_dag$data$name == "X"] <- "X\n(predictor)" 
tidy_dag$data$label[tidy_dag$data$name == "Y"] <- "Y\n(response)" 
tidy_dag$data$label[tidy_dag$data$name == "Z"] <- "Z\n(counfounder)" 

ggplot(tidy_dag, aes(x = x, y = y, xend = xend, yend = yend)) + scale_color_viridis_d(begin = 0.2,
    end = 0.8, alpha = 0.5) + geom_dag_edges_fan(edge_color = viridis(10, alpha = 0.4)[2],
    arrow = grid::arrow(length = grid::unit(10, "pt"), type = "closed")) + geom_dag_point(size = 24) + geom_dag_text(color = "white", aes(label = label))  + theme_dag(base_size = 14) + theme(legend.position = "bottom")

2.2 Simulation

Code

# sample size
N <- 100

# True causal effect of X -> Y 
eff_size <- 1

out <- pbapply::pblapply(1:100, function(i) {

    
    # Z is a TRUE CONFOUNDER (common cause of X and Y)
    Z <- rnorm(N)
    
    # X is the exposure
    X <- 0.8 * Z + rnorm(N)
    
    # Y is the outcome
    # True causal effect of X -> Y is EXACTLY 1.0
    Y <- 1.0 * X + 0.8 * Z + rnorm(N)

    # M is a POST-TREATMENT VARIABLE (mediator / outcome proxy)
    # Caused by Y, and therefore downstream of X
    M <- 0.9 * Y + rnorm(N)

dat <- data.frame(X, Y, Z, M)
    
    m_dag <- lm(Y ~ X, data = dat)
    
    ## model selection approach
    
    # Candidate models a researcher might try
    # m1: No adjustment (confounding bias)
m1 <- lm(Y ~ X, data = dat)

# m2: Correct DAG-based model (adjust for confounder Z)
m2 <- lm(Y ~ X + Z, data = dat)

# m3: Overcontrolled model (adjusts for post-treatment variable M)
m3 <- lm(Y ~ X + Z + M, data = dat)
    
# define adjustment set
    aictab <- AIC(m1, m2, m3)
    
    # get object from name
    aic_mod <- get(rownames(aictab)[which.min(aictab$AIC)])
    
    # DAG approach
    # 
    d <- as.dagitty(dag)
    
    # Find valid adjustment sets
    
    # All minimal sufficient adjustment sets
    a <- adjustment_set_formulas(
        d,
        exposure = "X",
        outcome = "Y",
        type = "all",
        plot = FALSE
    )
    
    dag_mod <- lm(a[[1]], data = dat)
    
    
    # return estimates
    return(data.frame(
        method = c("DAG causal\ninference", "Model selection"),
        estimate = c(coef(dag_mod)["X"], coef(aic_mod)["X"])
    ))
    
})

results2 <- do.call(rbind, out)

2.3 Results

Blue line represents true effect size:

Code

agg_results2 <- aggregate(estimate ~ method, data = results2, mean)

agg_results2$sd <- aggregate(estimate ~ method, data = results2, sd)$estimate

ggplot(agg_results2, aes(x = method, y = estimate)) +
    geom_pointrange(aes(ymin = estimate - sd, ymax = estimate + sd)) +
    geom_hline(yintercept = eff_size,
               linetype = "dashed",
               color = "blue") +
    ylab("Estimated effect of X on Y") +
    xlab("Method")

Session information

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31)
 os       Ubuntu 22.04.2 LTS
 system   x86_64, linux-gnu
 ui       X11
 language es_ES:en
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Costa_Rica
 date     2026-01-19
 pandoc   3.8 @ /home/marce/anaconda3/bin/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package      * version date (UTC) lib source
 boot           1.3-28  2021-05-03 [4] CRAN (R 4.1.1)
 cachem         1.1.0   2024-05-16 [1] CRAN (R 4.3.2)
 cli            3.6.5   2025-04-23 [1] CRAN (R 4.3.2)
 curl           7.0.0   2025-08-19 [1] CRAN (R 4.3.2)
 dagitty      * 0.3-4   2023-12-07 [1] CRAN (R 4.3.2)
 devtools       2.4.5   2022-10-11 [1] CRAN (R 4.3.2)
 digest         0.6.39  2025-11-19 [1] CRAN (R 4.3.2)
 dplyr          1.1.4   2023-11-17 [1] CRAN (R 4.3.2)
 ellipsis       0.3.2   2021-04-29 [3] CRAN (R 4.1.1)
 evaluate       1.0.5   2025-08-27 [1] CRAN (R 4.3.2)
 farver         2.1.2   2024-05-13 [1] CRAN (R 4.3.2)
 fastmap        1.2.0   2024-05-15 [1] CRAN (R 4.3.2)
 fs             1.6.6   2025-04-12 [1] CRAN (R 4.3.2)
 generics       0.1.4   2025-05-09 [1] CRAN (R 4.3.2)
 ggdag        * 0.2.12  2024-03-08 [1] CRAN (R 4.3.2)
 ggforce        0.4.2   2024-02-19 [1] CRAN (R 4.3.2)
 ggplot2      * 4.0.1   2025-11-14 [1] CRAN (R 4.3.2)
 ggraph         2.2.1   2024-03-07 [1] CRAN (R 4.3.2)
 ggrepel        0.9.5   2024-01-10 [1] CRAN (R 4.3.2)
 glue           1.8.0   2024-09-30 [1] CRAN (R 4.3.2)
 graphlayouts   1.1.1   2024-03-09 [1] CRAN (R 4.3.2)
 gridExtra      2.3     2017-09-09 [3] CRAN (R 4.0.1)
 gtable         0.3.6   2024-10-25 [1] CRAN (R 4.3.2)
 htmltools      0.5.9   2025-12-04 [1] CRAN (R 4.3.2)
 htmlwidgets    1.6.4   2023-12-06 [1] CRAN (R 4.3.2)
 httpuv         1.6.13  2023-12-06 [1] CRAN (R 4.3.2)
 igraph         2.1.4   2025-01-23 [1] CRAN (R 4.3.2)
 jsonlite       2.0.0   2025-03-27 [1] CRAN (R 4.3.2)
 knitr          1.50    2025-03-16 [1] CRAN (R 4.3.2)
 labeling       0.4.3   2023-08-29 [1] CRAN (R 4.3.2)
 later          1.3.2   2023-12-06 [1] CRAN (R 4.3.2)
 lifecycle      1.0.4   2023-11-07 [1] CRAN (R 4.3.2)
 magrittr       2.0.4   2025-09-12 [1] CRAN (R 4.3.2)
 MASS           7.3-55  2022-01-13 [4] CRAN (R 4.1.2)
 memoise        2.0.1   2021-11-26 [3] CRAN (R 4.1.2)
 mime           0.13    2025-03-17 [1] CRAN (R 4.3.2)
 miniUI         0.1.1.1 2018-05-18 [3] CRAN (R 4.0.1)
 pbapply        1.7-4   2025-07-20 [1] CRAN (R 4.3.2)
 pillar         1.11.1  2025-09-17 [1] CRAN (R 4.3.2)
 pkgbuild       1.4.8   2025-05-26 [1] CRAN (R 4.3.2)
 pkgconfig      2.0.3   2019-09-22 [3] CRAN (R 4.0.1)
 pkgload        1.4.0   2024-06-28 [1] CRAN (R 4.3.2)
 polyclip       1.10-7  2024-07-23 [1] CRAN (R 4.3.2)
 profvis        0.3.8   2023-05-02 [1] CRAN (R 4.3.2)
 promises       1.2.1   2023-08-10 [1] CRAN (R 4.3.2)
 purrr          1.2.0   2025-11-04 [1] CRAN (R 4.3.2)
 R6             2.6.1   2025-02-15 [1] CRAN (R 4.3.2)
 RColorBrewer   1.1-3   2022-04-03 [1] CRAN (R 4.3.2)
 Rcpp           1.1.0   2025-07-02 [1] CRAN (R 4.3.2)
 remotes        2.5.0   2024-03-17 [1] CRAN (R 4.3.2)
 rlang          1.1.6   2025-04-11 [1] CRAN (R 4.3.2)
 rmarkdown      2.30    2025-09-28 [1] CRAN (R 4.3.2)
 rstudioapi     0.17.1  2024-10-22 [1] CRAN (R 4.3.2)
 S7             0.2.1   2025-11-14 [1] CRAN (R 4.3.2)
 scales         1.4.0   2025-04-24 [1] CRAN (R 4.3.2)
 sessioninfo    1.2.2   2021-12-06 [1] CRAN (R 4.3.2)
 shiny          1.8.0   2023-11-17 [1] CRAN (R 4.3.2)
 stringi        1.8.7   2025-03-27 [1] CRAN (R 4.3.2)
 stringr        1.6.0   2025-11-04 [1] CRAN (R 4.3.2)
 tibble         3.3.0   2025-06-08 [1] CRAN (R 4.3.2)
 tidygraph      1.3.1   2024-01-30 [1] CRAN (R 4.3.2)
 tidyr          1.3.1   2024-01-24 [1] CRAN (R 4.3.2)
 tidyselect     1.2.1   2024-03-11 [1] CRAN (R 4.3.2)
 tweenr         2.0.2   2022-09-06 [1] CRAN (R 4.3.2)
 urlchecker     1.0.1   2021-11-30 [1] CRAN (R 4.3.2)
 usethis        3.1.0   2024-11-26 [1] CRAN (R 4.3.2)
 V8             4.4.2   2024-02-15 [1] CRAN (R 4.3.2)
 vctrs          0.6.5   2023-12-01 [1] CRAN (R 4.3.2)
 viridis      * 0.6.5   2024-01-29 [1] CRAN (R 4.3.2)
 viridisLite  * 0.4.2   2023-05-02 [1] CRAN (R 4.3.2)
 withr          3.0.2   2024-10-28 [1] CRAN (R 4.3.2)
 xfun           0.55    2025-12-16 [1] CRAN (R 4.3.2)
 xtable         1.8-4   2019-04-21 [3] CRAN (R 4.0.1)
 yaml           2.3.12  2025-12-10 [1] CRAN (R 4.3.2)

 [1] /home/marce/R/x86_64-pc-linux-gnu-library/4.3
 [2] /usr/local/lib/R/site-library
 [3] /usr/lib/R/site-library
 [4] /usr/lib/R/library

──────────────────────────────────────────────────────────────────────────────