Supplementary Material
In psychology, one of the core questions is how psychopathology comes about, with the network theory positing that mental disorder is produced by a system of direct and mutual causal interactions between symptoms that reinforce each other via feedback loops (Borsboom, 2017). In practice, empirical researchers often aim to gain insights into these causal relations by fitting statistical network models to observational (cross-sectional) data (Robinaugh et al., 2020). Although, statistical network models do not reflect causal relations, it is described such that researchers can use network models to generate causal hypotheses (Borsboom & Cramer, 2013). Regarding the utility of a network model as a causal discovery tool, there has been some research done, for example comparing the statistical network models to the directed acyclic graph models (DAG) (Dablander & Hinne, 2019). However, as Borsboom (2017) suggests, in reality, the true causal dynamics of psychopathology are likely to contain cycles, raising questions about the suitability of using DAGs in this context. This motivates our research into directed cyclic graph models (DCG).
We aim to investigate the utility of statistical network models as tools for causal discovery in cyclic settings compared to the directed cyclic graph (DCG) models estimated by a causal discovery method. We try to answer the question: how do the statistical network models behave in comparison to the DCGs when the true causal system contains cycles?
To compare the performance of statistical network models (GGM) in comparison to the directed cyclic graph (DCG) models as causal discovery tools, we perform a simulation study. In this section, we discuss the causal discovery method we decide to use, the data generating process, the specifics of simulation study design, and the evaluation metrics.
In this simulation study, we decide to use cyclic causal discovery algorithm (CCD) (Richardson, 1996b). There are other cyclic causal discovery methods exist (Strobl, 2019), but CCD is a relatively simple one to use and can estimate the cyclic structure with the asymptotic correctness (Richardson, 1996a). CCD outputs a partial ancestral graph (PAG), which represents the Markov equivalence class (i.e., statistically indistinguishable) of DCGs. See Figure 2.1 for an example PAG and the corresponding Markov equivalence class of DCGs.
Figure 2.1: Example partial ancestral graph (PAG) estimated by CCD algorithm
We simulate data from different causal models, all of which are characterized by linear causal relationships and independent Gaussian error terms, which are commonly assumed in psychological research. To generate data, we first choose a weight matrix \(B\), and a covariance matrix \(\boldsymbol{\varepsilon}\) for the error terms \(\epsilon\). Next, we sample the error terms from a joint normal distribution and generate data by solving the following systems of equations: \[ \boldsymbol{X} = (\boldsymbol{I} − \boldsymbol{B})^{-1} \boldsymbol{\varepsilon}. \] Note that the equations may not have unique solutions for some causal models. The necessary condition to have a unique solution is that \((I − B)\) is invertible and correspondingly the cycles converge to an equilibrium state. In cyclic models, this condition is only satisfied when the absolute values of eigenvalues of \(B\) are all smaller than one, \(|\lambda| < 1\) (Eberhardt et al., 2010). Hence, we ensure that the specified \(B\) matrix satisfies the aforementioned condition in our simulation study.
We generate data from different types of cyclic models by varying the number of variables (p = 4, 5, 6) as well as density (sparse/dense), which results in 3×2 design (see Figure 2.2). The sample size is fixed to \(10^6\) across all simulated models. We specify such a large sample size in order to minimize the variability due to sampling error and to enhance the estimation stability.
Figure 2.2: Simulation design
,where \(E\) represents the total number of existing edges in the model and \(p\) represents the number of nodes.
\[ Degree (i) = \sum_{j=1}^{n}a_{ij} \] ,where \(a_{ij}\) represents the element at row \(i\) and column \(j\) of the adjacency matrix \(A\).
Below, we show the results from comparing the GGM with the PAG (i.e., partial ancestral graph: representation of the equivalence class of DCGs) in all 6 different simulated cases in terms of the overall density and degree centrality, as we previously described in 2.3.
## Specify B matrix
# set the number of nodes (p)
p = 4
B4 = matrix(c(0, 0, 0, 0,
1, 0, 0.5, 0,
0, 0.5, 0, 0.9,
0, 0, 0, 0), p, p, byrow = T)
colnames(B4) <- c("X1", "X2", "X3", "X4")
## Generate data
# first, equilibrium check (necessary condition for cyclic models to converge)
equilibrium_check(B4)
# generated data with N = 10^6, seed = 1
data4p <- gen_dat(B4, N =1e6, seed = 1)
## Specify layout
layout4 = matrix(c(-1,1,
-1,0,
1,0,
1,1),4,2,byrow = T)
layout(t(1:3))
par(oma=c(0, 0, 6, 0))
## True cyclic graph
true4p <- qgraph(t(B4), layout=layout4, labels = colnames(B4),
theme="colorblind", vsize = 20, asize = 10)
title("True cyclic graph", font.main = 1, cex.main = 1.2, line = 4, outer=TRUE, adj = 0.11)
## Estimate GGM
ggm4p <- qgraph(cor(data4p), layout=layout4, theme="colorblind", vsize = 20, asize = 10)
title(main = "GGM", font.main = 1, cex.main = 1.2, line = 4, outer=TRUE, adj = 0.5)
## Run CCD algorithm
ccd_4p <- ccdKP(df=data4p, dataType = "continuous", alpha = 0.05)
mat4p <- CreateAdjMat(ccd_4p, 4)
## Estimate PAG
pag4p <- plotPAG(ccd_4p, mat4p)
title(main = "PAG", font.main = 1, cex.main = 1.2, line = 4, outer=TRUE, adj = 0.86)Figure 3.1: four nodes sparse case
## Compute equivalence class of all DCGs given the PAG
# (this takes relatively a long time, so we save the object)
# equiv4p <- semiequiv_dcg(ccd_4p, mat4p)
# save(equiv4p, file="data/equiv4p.RData")
load("../data/equiv4p.RData")NOTE. In the PAG representation, there exists two types of underlining that can be used in a triple of nodes: solid underlining (A - B - C) and dotted underlining (A - B - C). The colored nodes (in blue) in PAGs refer to the presence of the solid underlinings and the dashed nodes refer to the presence of the dotted underlinings on the corresponding nodes. These underlinings are used to further orient the edges in a PAG. For more information on this, see (Richardson, 1996b).
## Specify B matrix
# set the number of nodes (p)
p = 4
B4_high = matrix(c(0, 0, 0, 0,
0.9, 0, 0.4, 0,
0, 0.5, 0, .5,
-0.8, 0, 0, 0), p, p, byrow = T)
colnames(B4_high) <- c("X1", "X2", "X3", "X4")
## Generate data
# first, equilibrium check (necessary condition for cyclic models to converge)
equilibrium_check(B4_high)
# generated data with N = 10^6, seed = 1
data4p_high <- gen_dat(B4_high, N =1e6, seed = 1)
## Specify layout
layout4 = matrix(c(-1,1,
-1,0,
1,0,
1,1),4,2,byrow = T)
layout(t(1:3))
par(oma=c(0, 0, 6, 0))
## True cyclic graph
true4p_high <- qgraph(t(B4_high), layout=layout4, labels = colnames(B4_high),
theme="colorblind", vsize = 20, asize = 10)
title("True cyclic graph", font.main = 1, cex.main = 1.2, line = 4, outer=TRUE, adj = 0.11)
## Estimate GGM
ggm4p_high <- qgraph(t(cor(data4p_high)), layout=layout4, theme="colorblind", vsize = 20, asize = 10)
title(main = "GGM", font.main = 1, cex.main = 1.2, line = 4, outer=TRUE, adj = 0.5)
## Run CCD algorithm
ccd_4p_high <- ccdKP(df=data4p_high, dataType = "continuous", alpha = 0.05)
mat4p_high <- CreateAdjMat(ccd_4p_high, 4)
## Estimate PAG
pag4p <- plotPAG(ccd_4p_high, mat4p_high)
title(main = "PAG", font.main = 1, cex.main = 1.2, line = 4, outer=TRUE, adj = 0.86)Figure 3.2: four nodes dense case
## Compute equivalence class of all DCGs given the PAG
# (this takes relatively a long time, so we save the object)
# equiv4p_high <- semiequiv_dcg(ccd_4p_high, mat4p_high)
# save(equiv4p_high, file="data/equiv4p_high.RData")
load("../data/equiv4p_high.RData")## Specify B matrix
# set the number of nodes (p)
p = 5
B5 = matrix(c(0, 1, 0, 0, 0,
0, 0, 0, 0.7, 0,
0, 0.4, 0, 0, 0,
0, 0, .5, 0, 0,
0, 0, 0, -1.5, 0), p, p, byrow = T)
colnames(B5) <- c("X1", "X2", "X3", "X4", "X5")
## Generate data
# first, equilibrium check (necessary condition for cyclic models to converge)
equilibrium_check(B5)
# generated data with N = 10^6, seed = 123
data5p <- gen_dat(B5, N =1e6, seed = 123)
## Specify layout
layout5 = matrix(c(0,1,
0,0,
1,-1,
2,0,
2,1),5,2,byrow = T)
layout(t(1:3))
par(oma=c(0, 0, 6, 0))
## True cyclic graph
true5p <- qgraph(t(B5), layout=layout5, labels = colnames(B5),
theme="colorblind", vsize = 20, asize = 10)
title("True cyclic graph", font.main = 1, cex.main = 1.2, line = 4, outer=TRUE, adj = 0.11)
## Estimate GGM
ggm5p <- qgraph(cor(data5p), layout = layout5, theme="colorblind", vsize = 20, asize = 10)
title(main = "GGM", font.main = 1, cex.main = 1.2, line = 4, outer=TRUE, adj = 0.5)
## Run CCD algorithm
ccd_5p <- ccdKP(df=data5p, dataType = "continuous", alpha = 0.05)
mat5p <- CreateAdjMat(ccd_5p, 5)
## Estimate PAG
pag5p <- plotPAG(ccd_5p, mat5p)
title(main = "PAG", font.main = 1, cex.main = 1.2, line = 4, outer=TRUE, adj = 0.86)Figure 3.3: five nodes sparse case
## Compute equivalence class of all DCGs given the PAG
# (this takes relatively a long time, so we save the object)
# equiv5p <- semiequiv_dcg(ccd_5p, mat5p)
# save(equiv5p, file="data/equiv5p.RData")
load("../data/equiv5p.RData")## Specify B matrix
# set the number of nodes (p)
p = 5
B5_high = matrix(c(0, 0.9, 0, 0, 0.6,
0, 0, 0, 0.7, 0,
0, 0.9, 0, 0, 0,
0, 0, 0.5, 0, 0,
0, 0, 0, 1, 0), p, p, byrow = T)
colnames(B5_high) <- c("X1", "X2", "X3", "X4", "X5")
## Generate data
# first, equilibrium check (necessary condition for cyclic models to converge)
equilibrium_check(B5_high)
# generated data with N = 10^6, seed = 1
data5p_high <- gen_dat(B5_high, N =1e6, seed = 1)
## Specify layout
layout5 = matrix(c(0,1,
0,0,
1,-1,
2,0,
2,1),5,2,byrow = T)
layout(t(1:3))
par(oma=c(0, 0, 6, 0))
## True cyclic graph
true5p_high <- qgraph(t(B5_high), layout=layout5, labels = colnames(B5_high),
theme="colorblind", vsize = 20, asize = 10)
title("True cyclic graph", font.main = 1, cex.main = 1.2, line = 4, outer=TRUE, adj = 0.11)
## Estimate GGM
ggm5p_high <- qgraph(cor(data5p_high), layout = layout5, theme="colorblind", vsize = 20, asize = 10)
title(main = "GGM", font.main = 1, cex.main = 1.2, line = 4, outer=TRUE, adj = 0.5)
## Run CCD algorithm
ccd_5p_high <- ccdKP(df=data5p_high, dataType = "continuous", alpha = 0.05)
mat5p_high <- CreateAdjMat(ccd_5p_high, 5)
## Estimate PAG
pag5p_high <- plotPAG(ccd_5p_high, mat5p_high)
title(main = "PAG", font.main = 1, cex.main = 1.2, line = 4, outer=TRUE, adj = 0.86)Figure 3.4: five nodes dense case
## Compute equivalence class of all DCGs given the PAG
# (this takes relatively a long time, so we save the object)
# equiv5p_high <- semiequiv_dcg(ccd_5p_high, mat5p_high)
# save(equiv5p_high, file="data/equiv5p_high.RData")
load("../data/equiv5p_high.RData")## Specify B matrix
# set the number of nodes (p)
p = 6
B6 = matrix(c(0, 0, 0, 0, 0, 0,
0.3, 0, 0.4, 0, 0, 0,
0, 0, 0, 0.9, 0, 0,
0, 0, 0, 0, 0.4, 0,
0, 0, 1, 0, 0, 0,
1, 0, 0, 0, 0.5, 0), p, p, byrow = T)
colnames(B6) <- c("X1", "X2", "X3", "X4", "X5", "X6")
## Generate data
# first, equilibrium check (necessary condition for cyclic models to converge)
equilibrium_check(B6)
# generated data with N = 10^6, seed = 123
data6p <- gen_dat(B6, N =1e6, seed = 123)
## Specify layout
layout6 = matrix(c(1, 2,
0,1,
0,0,
1,-1,
2,0,
2,1),6,2,byrow = T)
layout(t(1:3))
par(oma=c(0, 0, 6, 0))
## True cyclic graph
true6p <- qgraph(t(B6), layout=layout6, labels = colnames(B6), theme="colorblind", vsize = 20, asize = 10)
title("True cyclic graph", font.main = 1, cex.main = 1.2, line = 4, outer=TRUE, adj = 0.11)
## Estimate GGM
ggm6p <- qgraph(cor(data6p), layout = layout6, theme="colorblind", vsize = 20, asize = 10)
title(main = "GGM", font.main = 1, cex.main = 1.2, line = 4, outer=TRUE, adj = 0.5)
## Run CCD algorithm
ccd_6p <- ccdKP(df=data6p, dataType = "continuous", alpha = 0.05)
mat6p <- CreateAdjMat(ccd_6p, 6)
## Estimate PAG
pag6p <- plotPAG(ccd_6p, mat6p)
title(main = "PAG", font.main = 1, cex.main = 1.2, line = 4, outer=TRUE, adj = 0.86)Figure 3.5: six nodes sparse case
## Compute equivalence class of all DCGs given the PAG
# (this takes relatively a long time, so we save the object)
# equiv6p <- semiequiv_dcg(ccd_6p, mat6p)
# save(equiv6p, file="data/equiv6p.RData")
load("../data/equiv6p.RData")## Specify B matrix
# set the number of nodes (p)
p = 6
B6_high = matrix(c(0, 0, 0, 0, 0, 0,
0.7, 0, 0.4, 0, 0, 0.9,
0, 0, 0, 0.9, 0, 0,
0, 0, 0, 0, 0.4, 0,
0, 0, 1, 0, 0, 0,
1, 0, 0, 0, 0.5, 0), p, p, byrow = T)
# colnames for B matrix is necessary for running CCD
colnames(B6_high) <- c("X1", "X2", "X3", "X4", "X5", "X6")
## Generate data
# first, equilibrium check (necessary condition for cyclic models to converge)
equilibrium_check(B6_high)
# generated data with N = 10^6, seed = 123
data6p_high<- gen_dat(B6_high, N =1e6, seed = 123)
layout(t(1:3))
par(oma=c(0, 0, 6, 0))
## True cyclic graph
# we use the layout specified earlier in the 6p-sparse model.
true6p_high <- qgraph(t(B6_high), layout=layout6, labels = colnames(B6), theme="colorblind", vsize = 20, asize = 10)
title("True cyclic graph", font.main = 1, cex.main = 1.2, line = 4, outer=TRUE, adj = 0.11)
## Estimate GGM
ggm6p_high <- qgraph(cor(data6p_high), layout = layout6, theme="colorblind", vsize = 20, asize = 10)
title(main = "GGM", font.main = 1, cex.main = 1.2, line = 4, outer=TRUE, adj = 0.5)
## Run CCD algorithm
ccd_6p_high <- ccdKP(df=data6p_high, dataType = "continuous", alpha = 0.05)
mat6p_high <- CreateAdjMat(ccd_6p_high, 6)
## Estimate PAG
pag6p_high <- plotPAG(ccd_6p_high, mat6p_high)
title(main = "PAG", font.main = 1, cex.main = 1.2, line = 4, outer=TRUE, adj = 0.86)Figure 3.6: six nodes dense case
## Compute equivalence class of all DCGs given the PAG
# (this takes relatively a long time, so we save the object)
# equiv6p_high <- semiequiv_dcg(ccd_6p_high, mat6p_high)
# save(equiv6p_high, file="data/equiv6p_high.RData")
load("../data/equiv6p_high.RData")Below, we can see the overall density for each of the simulated models. Figure 3.7 (a) shows the density of the sparse models, and Figure 3.7 (b) shows the density of the dense models. Across conditions, the DCGs approximate the true density more closely than the GGMs (as the red line follows the yellow line more closely). GGMs (green line) almost always overestimate the density. In addition, it can be seen that DCGs more clearly outperform GGMs when the true causal model is sparse. This could be due to the fact that the causal discovery algorithm (CCD) also struggles to estimate, when the true model is dense.
## Compute densities
# density for true models
trueden <- list(B4, B4_high, B5, B5_high, B6, B6_high) %>%
map( ~ truemoddensity(.)) %>% unlist() %>% as.data.frame() %>% rename("TRUE"=".")
# density for GGM
ggmden <- list(ggm4p, ggm4p_high, ggm5p, ggm5p_high, ggm6p, ggm6p_high) %>%
map( ~ GGMdensity(.)) %>% unlist() %>% as.data.frame() %>% rename("GGM"=".")
# density for true DCG
dcgden <- list(equiv4p, equiv4p_high, equiv5p, equiv5p_high, equiv6p, equiv6p_high) %>%
map( ~ DCGdensity(.)) %>% unlist() %>% as.data.frame() %>% rename("DCG"=".")
# bind them together
modeldensities <- bind_cols(trueden, ggmden, dcgden) %>%
magrittr::set_rownames(c("4p-sparse", "4p-dense", "5p-sparse", "5p-dense", "6p-sparse", "6p-dense")) %>%
mutate(model = rownames(.)) %>% tidyr::pivot_longer(!model, names_to = "type", values_to="density")
## Specify my custom theme
MyTheme <- theme(plot.title = element_blank(),
plot.subtitle = element_text( face = "italic"),
axis.text=element_text(face = "bold"),
legend.text = element_text(face = "bold"))
## Create density plots
# Low-density (sparse) conditions
low_densities <- modeldensities %>%
filter(model %in% c("4p-sparse", "5p-sparse", "6p-sparse")) %>%
ggplot(aes(x=model, y=density, group = type, colour = type)) +
scale_colour_manual(values = c("#FF0000", "#00A08A", "#F2AD00"), name= "") +
geom_line(aes(group = type)) +
geom_point() +
labs(x="", y="", title = "", subtitle = "(a) Sparse Condition") +
theme_classic() + MyTheme
# High-density (sparse) conditions
high_densities <- modeldensities %>%
filter(model %in% c("4p-dense", "5p-dense", "6p-dense")) %>%
ggplot(aes(x=model, y=density, group = type, colour = type)) +
scale_colour_manual(values = c("#FF0000", "#00A08A", "#F2AD00"), name="") +
geom_line(aes(group = type)) +
geom_point() +
labs(x="", y="", title = "", subtitle = "(b) Dense Condition") +
theme_classic() + MyTheme
# combine plots
ggarrange(low_densities, high_densities, ncol = 1, nrow = 2, common.legend = TRUE, legend = "bottom") %>%
annotate_figure(top = text_grob("Comparing the overall density of different models", face = "bold", family = "Palatino"))Figure 3.7: Overall density of all six different cases
Figure 3.8 below shows the degree centrality for each of the simulated models. Across conditions, the DCGs again approximate the true degree for all nodes more closely than the GGMs, except for the 4node-dense case (see the top right graph in Figure 3.8). As previously mentioned, the GGMs always overestimate the density, which naturally leads to the higher degree for almost all the nodes in the models. The exceptional 4node-dense case seems to be a difficult model for CCD to estimate, as it is a rather small model with a high density including a cycle. As you can see in the corresponding PAG in 3.2, CCD was not able to infer any directions, which subsequently resulted in a lot of possible DCGs. Probably that is why the degrees are shown to deviate a lot, as there were numerous DCGs that were considered to compute the average degrees in this case.
## Compute degrees
trueobj <- list(true4p, true4p_high, true5p, true5p_high, true6p, true6p_high)
ggmobj <- list(ggm4p, ggm4p_high, ggm5p, ggm5p_high, ggm6p, ggm6p_high)
dcgobj <- list(equiv4p, equiv4p_high, equiv5p, equiv5p_high, equiv6p, equiv6p_high)
modelnames <- c("4node-sparse","4node-dense","5node-sparse","5node-dense","6node-sparse","6node-dense")
deglist <- list()
for(i in seq_along(trueobj)){
deglist[[i]] <- bind_cols(GGMdegree(trueobj[[i]]),GGMdegree(ggmobj[[i]]), DCGdegree(dcgobj[[i]])) %>%
select(contains(c("node...1", "degree"))) %>%
rename(node = node...1, truedegree = degree...2, ggmdegree = degree...4 , dcg_avgdegree = average_degree) %>%
tidyr::pivot_longer(!node, names_to = "model", values_to = "degree") %>%
mutate(name = modelnames[i]) %>%
suppressMessages() # suppress messages for renaming columns
}
## Create degree centrality plots
degplots <- deglist %>%
map(~
ggplot(data = ., aes(x = degree, y = node, group = model, colour = model)) +
geom_point() + geom_path(aes(group = model)) +
labs(x="", y="", subtitle=.$name[1]) +
scale_colour_manual(values = c("#FF0000", "#00A08A", "#F2AD00"), name = "",
labels = c("DCG", "GGM", "TRUE")) + theme_bw() + MyTheme
)
# combine plots
ggarrange(plotlist = degplots,
ncol = 2, nrow = 3, common.legend = TRUE, legend = "bottom") %>%
annotate_figure(top = text_grob("Comparing the degree of different models",
face = "bold", family = "Palatino"))Figure 3.8: Degree centrality plots for all six different cases
The conclusion based on these results is that statistical network models perform poorly as causal discovery tools in cyclic settings and hence, it shall be preferred to use the purpose-built cyclic causal discovery methods such as CCD, when one is interested in the underlying causal mechanism of the mental disorder dynamics.