This tutorial is demonstrates how to use networks (specifically Gaussian graphical models or GGMs) to
In psychology, network models are almost never used to generate hypotheses. This is puzzling because one of the original reasons researchers began using them was exactly this. Networks are also thought of highly exploratory tools, despite their potential for confirmatory tests. However, there is plenty of room for networks in this respect. These ideas are expanded upon in our recent paper,“On Formalizing Theoretical Expectations: Bayesian Testing of Central Structures in Pyschological Networks”, where we merge exploratory and confirmatory hypotheses into a cohesive framework based on Bayesian hypothesis testing. You find a preprint here.
In what follows, I will describe how you can use GGMs in R to perform confirmatory hypothesis tests bases on an initial, exploratoy hypotheses. For clarity, some code chunks have been omitted, but the full code to reproduce this document is available on the OSF or GitHub.
To begin we need several packages:
BGGM: to conduct exploratory and confirmatory analyses with GGMsMASS: to generate data from covariance matricesnetworktools: to calculate bridge centrality statisticsggnetworks: used for plottingI also load a function make_adj_plot stored in 03-plot-adj.R. This function is used to produce plots of the conditional (in)dependence matrices (Panels A & B in Figures 3 and 4 of the paper).
# Uncomment and run if missing packages
# for latest version of BGGM
# remotes::install_github('donaldrwilliams/BGGM')
# packages <- c("BGGM", "MASS", "networktools")
# if (!packages %in% installed.packages()) install.packages(packages)
source("01-plot-adj.R")
library(MASS)
library(networktools)
library(BGGM)
The first dataset contains measurements on 16 PTSD symptoms in 3 communities, “Re-experiencing”, “Avoidance”, and “Arousal”. Only the covariance matrices are available so we have to generate the data using MASS. The data,ptsd_cor4, is loaded when we load BGGM.
set.seed(1812)
# data for exploratory analyses
explore_ptsd <- mvrnorm(n = 965, mu = rep(0, 16), Sigma = ptsd_cor4, empirical = TRUE)
colnames(explore_ptsd) <- node_names
head(explore_ptsd)
## B1 B2 B3 B4 C1 C2
## [1,] 1.8185234 1.28075149 0.9164065 -1.3056992 -1.19542346 -0.6232488
## [2,] -0.2767064 -0.52201830 -0.3306419 -0.5563122 -1.47678987 -0.5883824
## [3,] 1.2774957 1.90522822 1.8019264 -1.4164023 2.14139220 1.7681356
## [4,] 0.8043783 0.84727056 0.2442545 -0.6187949 0.24483924 0.1196831
## [5,] -1.0790033 -1.51475008 -0.8794434 -1.5693234 -0.02626149 1.2488179
## [6,] -0.6226078 0.02695523 0.2391587 0.1796465 1.91942409 1.9599440
## C3 C4 C5 C6 C7 D1
## [1,] -1.0599306 0.5786074 0.7531740 0.7698083 0.756293787 0.8887558
## [2,] 0.1022892 -0.6554164 -0.8095467 0.3519246 -0.438868679 0.3497864
## [3,] 0.3968195 -1.1117893 -1.5369231 0.5061597 -0.026363691 0.5172878
## [4,] -1.3502039 1.2872828 0.4037868 -0.2491910 0.083697359 0.1484805
## [5,] -1.1881348 -0.5438269 1.3043925 1.2176694 -1.137796837 -0.4760302
## [6,] 1.7871930 0.5880438 0.4392643 0.3064342 0.003660419 0.4467992
## D2 D3 D4 D5
## [1,] 0.07958799 2.3973209 -0.8417986 2.64646948
## [2,] 1.26852337 -0.2419411 -0.8409757 0.02310026
## [3,] -0.02076598 1.1509737 0.2544366 -0.18049763
## [4,] -0.53254994 -0.4588925 0.3632865 -0.59815200
## [5,] 1.66327280 -0.3150559 -0.5044286 -0.68593216
## [6,] 1.17169908 0.3436593 -1.5970968 0.97243812
Our approach begins by estimating an exploratory network. With BGGM, this requires calling the explore function to obtain and sample the posterior distribution. These results are saved in explore_network. The select function takes the results from an explore call, and is used to determine the edge set for \(\mathbf{A}^{CD}\) and \(\mathbf{A}^{CI}\) — the conditional (in)dependence structures.
Note that we formally incorporate a one-sided hypothesis test. \(\mathcal{H}_0: \rho_{ij} = 0\) versus \(\mathcal{H}_1: \rho_{ij} > 0\) by setting alternative = "greater" to determine the edge set. This formally includes the expectation of a “positive manifold”, or that the edges should all be positive.
# sample posterior distribution
explore_network <- explore(explore_ptsd, chains = 4, cores = 4)
# determine edge set
selected_network <- select(explore_network, alternative = "greater", bf_cut = 3)
We can visualize the conditional (in)dependence structures by simply calling plot. By default, this returns three plots, but we focus on the first two.
plots <-
plot(selected_network,
node_labels = node_names, # vector storing node names
communities = comms, # vector of community name for each node
node_size = 12)
plots$plt_alt +
ggtitle("Conditional Dependence Structure")
plots$plt_null +
ggtitle("Conditional Independence Structure")
We then proceed to calculate bridge strength using the networktools package. This is similar to node strength, in that, for a given node, it is the sum of the absolute values of its edges. However, bridge strength only takes into account edges that connect a node to different communities, or clusters. Thus, it is a measure of inter-community connectivity, and we use it here to identify central structures in a network.
# extract partial correlations from selected network
partial_cors <- selected_network$pcor_mat_zero
# rename columns with node names
colnames(partial_cors) <- node_names
# calculate bridge strength. comms is a vector specifying
# the community for each node
bridge_strengths <- bridge(partial_cors, communities = comms)$`Bridge Strength`
# we use the top 10% in bridge strength as bridge nodes
bridge_strength_cutoff <- quantile(bridge_strengths, 0.9)
bridge_strengths[bridge_strengths > bridge_strength_cutoff]
## B4 D1
## 0.532 0.657
Calculating bridge strength indicates that nodes B4 and D1 are the top bridge nodes.
A key idea in our paper was that highlighting and “zooming” in on central structures allows researchers to easily formulate hypotheses.
We can highlight the the bridge nodes by using make_adj_plot.
plot_cd <-
make_adj_plot(selected_network$Adj_20, # conditional dependence structure
node_labels = node_names, # vector storing node names
communities = comms,
bridges = c("D1", "B4"), # name of nodes to highlights
fade = c("D2", "D5", "B3", # name of nodes to fade
"C3", "C4", "C5"),
node_size = 12,
type = "cd") + # type = "cd" for conditional dependence. "ci" for conditional
ggthemes::scale_fill_few() +
theme_facet() +
theme(legend.position = "top",
legend.text = element_text(size = 14),
legend.title = element_blank())
plot_cd
We can use the same function to plot the conditional independence structures in panel B of both Figures 3 and 4.
plot_ci <-
make_adj_plot(selected_network$Adj_02,
node_labels = node_names,
communities = comms,
bridges = paste0("D", 1),
fade = NULL,
type = "ci", # type = "ci" for conditional independence
node_size = 12) +
ggthemes::scale_fill_few() +
guides(fill = guide_legend(override.aes = list(size=10))) +
theme_facet() +
theme(legend.position = "top",
legend.box.background = element_blank(),
legend.background = element_blank(),
legend.title = element_blank())
plot_ci
We have not found an easy way to “zoom in” on the neighborhood of bridge edges for the bridge nodes (bottom panels in Figures 3 and 4), but as an example, this is code we used to the “bridge neighborhood” for node D1.
coord_d1 <-
data.frame(
x = c(1, 0, 1, 2, 3, 4, 3),
y = c(1, 2, 3, 2, 1, 2, 3),
comm = c("3", "3", "3", "1", "2", "2", "2"),
node = c("B1", "B4", "B2", "D1", "C6", "C1", "C2")
)
plot_d1 <-
ggplot(coord_d1, aes(x, y)) +
# bridge cors
annotate(geom = "segment",
x = rep(2, 4),
xend = c(1, 1, 3, 3),
y = rep(2, 4),
yend = c(1, 3, 1, 3),
linetype = "dashed",
size = 1) +
# independent
annotate(geom = "segment",
x = c(2, 2),
xend = c(0, 4),
y = c(2, 2),
yend = c(2, 2),
linetype = "dotted",
col = "gray70") +
# B nodes
annotate(geom = "segment",
x = c(1, 1, 0),
xend = c(0, 1, 1),
y = c(1, 1, 2),
yend = c(2, 3, 3),
linetype = c(3, 1, 1),
col = "gray70") +
# C nodes
annotate(geom = "segment",
x = c(3, 3, 4),
xend = c(4, 3, 3),
y = c(1, 1, 2),
yend = c(2, 3, 3),
linetype = c(1, 3, 3),
col = "gray70") +
# bridge cor labels
annotate(geom = "label",
label = c(0.11, 0.31, 0.08, 0.15),
x = c(1.5, 1.5, 2.5, 2.5),
y = c(1.5, 2.5, 1.5, 2.5)) +
geom_point(size = 21) +
geom_point(aes(fill = comm),
size = 20,
shape = 21,
show.legend = FALSE) +
geom_text(aes(label = node)) +
lims(x = c(-0.1, 4.1), y = c(0.9, 3.1)) +
ggthemes::scale_fill_few() +
theme_facet()
plot_d1
For completeness, here I have included the neighborhood for node B4.
With the central structures identified and plotted, we can move on to formulating and testing hypotheses.
Note that we generate another dataset here so that we test our hypotheses on different data than our exploratory analysis.
set.seed(1)
# data for confirmatory analyses
confirm_ptsd <- mvrnorm(n = 926, mu = rep(0, 16), Sigma = ptsd_cor3, empirical = TRUE)
colnames(confirm_ptsd) <- node_names
We first focus on node B4 and test the following hypotheses \[\begin{align} \mathcal{H}_1&: (\rho_{B4-C1}, \rho_{B4-C7}, \rho_{B4-D3}, \rho_{B4-D4}) > 0 \\ \nonumber \mathcal{H}_2&: \rho_{B4-C1} > (\rho_{B4-C7}, \rho_{B4-D3}, \rho_{B4-D4}) > 0 \\ \nonumber \mathcal{H}_3 &: ``\text{not}\; \mathcal{H}_1 \; \text{or}\; \mathcal{H}_2 \text{.''} \end{align}\]
Above, \(\mathcal{H}_1\) is testing for replication of all edges but is otherwise agnostic towards the interplay among bridge relations. \(\mathcal{H}_2\) then provides a refined view into the bridge neighborhood by testing an additional constraint that the strongest edge replicated. That is, all of the bridge relations the strongest edge re-emerged in an independent dataset. Furthermore, \(\mathcal{H}_1\) and \(\mathcal{H}_2\) both reflect a positive manifold. We also included \(\mathcal{H}_3\) which accounts for structures that are not \(\mathcal{H}_1\) or \(\mathcal{H}_2\).
To test these hypotheses, we can formulate them in a single string and use the confirm function. Note that hypotheses are separated by a semicolon, and that partial correlations are denoted as node1 -- node2. The output can be obtained by simply printing out the resuls of confirm.
hyp_var_rep <- c("(B4--C1, B4--C7, B4--D3, B4--D4) > 0;
B4--C1 > (B4--C7, B4--D3, B4--D4) > 0")
confirm_var_rep <- confirm(confirm_ptsd,
hyp_var_rep,
iter = 50000)
confirm_var_rep
## BGGM: Bayesian Gaussian Graphical Models
## Type: continuous
## ---
## Posterior Samples: 50000
## Observations (n): 926
## Variables (p): 16
## Delta: 15
## ---
## Call:
## confirm(Y = confirm_ptsd, hypothesis = hyp_var_rep, iter = 50000)
## ---
## Hypotheses:
##
## H1: (B4--C1,B4--C7,B4--D3,B4--D4)>0
## H2:
## B4--C1>(B4--C7,B4--D3,B4--D4)>0
## H3: complement
## ---
## Posterior prob:
##
## p(H1|data) = 0.197
## p(H2|data) = 0.797
## p(H3|data) = 0.006
## ---
## Bayes factor matrix:
## H1 H2 H3
## H1 1.000 0.247 33.401
## H2 4.054 1.000 135.404
## H3 0.030 0.007 1.000
## ---
## note: equal hypothesis prior probabilities
The output includes both the posterior probabilities and all the Bayes factors. The Bayes factors are in reference to the rows relative to the columns. For example the element in the 2nd row and 1st column would be interpreted as BF\(_{21} = 4.05\)
In this case, \(\mathcal{H}_2\) is the preferred hypothesis, that is, all of the bridge edges and the strongest edge replicated. This gets at an important notion. It is possible to test varying degrees of replication.
We also examined a comorbity network containing 16 symptoms of anxiety and depression.
# libraries
set.seed(27)
cov_anxdep <- read.csv("../05-data/00-cov-anxdep.csv")[, -1]
sim_anxdep <- MASS::mvrnorm(n = 1029, mu = rep(0, 16), Sigma = cov_anxdep, empirical = TRUE)
col_names <-
sub(pattern = "[A-Z].",
replacement = "",
x = colnames(cov_anxdep))
col_names <- tolower(col_names)
colnames(sim_anxdep) <- col_names
node_names <-
c(
paste0("D", 1:9),
paste0("A", 1:7)
)
comms <- c(
rep("depression", 9),
rep("anxiety", 7)
)
pal <- ggthemes::colorblind_pal()(8)[c(4, 6)]
split <- sample(1:1029, size = floor(1029 * .5))
explore_anxdep <- sim_anxdep[split, ]
explore_network <- explore(explore_anxdep, chains = 4, cores = 4)
selected_network <- BGGM::select(explore_network, alternative = "greater", BF_cut = 3)
pcors_exp <- round(selected_network$pcor_mat_zero, 4)
dimnames(pcors_exp) <- list(node_names, node_names)
bridge_strengths <- sort(bridge(pcors_exp, communities = comms)$`Bridge Strength`, decreasing = TRUE)
bridge_strength_cutoff <- quantile(bridge_strengths, 0.9)
bridge_strengths[bridge_strengths > bridge_strength_cutoff]
## D8 D6
## 0.404 0.270
| D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | D9 | A1 | A2 | A3 | A4 | A5 | A6 | A7 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| D1 | 0 | 0.412 | 0 | 0.184 | 0.121 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| D2 | 0.412 | 0 | 0 | 0.188 | 0 | 0.325 | 0 | 0 | 0.146 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| D3 | 0 | 0 | 0 | 0.207 | 0 | 0 | 0.102 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| D4 | 0.184 | 0.188 | 0.207 | 0 | 0.241 | 0 | 0.165 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| D5 | 0.121 | 0 | 0 | 0.241 | 0 | 0 | 0.108 | 0.107 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| D6 | 0 | 0.325 | 0 | 0 | 0 | 0 | 0 | 0 | 0.189 | 0 | 0 | 0.17 | 0 | 0 | 0.1 | 0 |
| D7 | 0 | 0 | 0.102 | 0.165 | 0.108 | 0 | 0 | 0.243 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| D8 | 0 | 0 | 0 | 0 | 0.107 | 0 | 0.243 | 0 | 0 | 0 | 0 | 0 | 0 | 0.204 | 0 | 0.2 |
| D9 | 0 | 0.146 | 0 | 0 | 0 | 0.189 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| A1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.208 | 0.183 | 0.212 | 0 | 0 | 0 |
| A2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.208 | 0 | 0.578 | 0.132 | 0 | 0 | 0.108 |
| A3 | 0 | 0 | 0 | 0 | 0 | 0.17 | 0 | 0 | 0 | 0.183 | 0.578 | 0 | 0.181 | 0 | 0 | 0 |
| A4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.212 | 0.132 | 0.181 | 0 | 0.262 | 0 | 0.108 |
| A5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.204 | 0 | 0 | 0 | 0 | 0.262 | 0 | 0.164 | 0 |
| A6 | 0 | 0 | 0 | 0 | 0 | 0.1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.164 | 0 | 0.206 |
| A7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.2 | 0 | 0 | 0.108 | 0 | 0.108 | 0 | 0.206 | 0 |
confirm_anxdep <- sim_anxdep[-split, ]
colnames(confirm_anxdep) <- node_names
\[\begin{align} \label{eq:intra-inter} \mathcal{H}_1 &: \rho_{D8-A5} = \rho_{D8-A7} > (\rho_{D6-A3}, \rho_{D6-A6}) > 0 \\ \nonumber \mathcal{H}_2 &: \rho_{D8-A5} > \rho_{D8-A7} > \rho_{D6-A3} > \rho_{D6-A6} > 0 \\ \nonumber \mathcal{H}_3 &: ``\text{not}\; \mathcal{H}_1 \; \text{or}\; \mathcal{H}_2 \text{.''} \end{align}\]
intra_inter_hyp <- c("D8--A5 = D8--A7 > (D6--A3, D6--A6) > 0;
D8--A5 > D8--A7 > D6--A3 > D6--A6 > 0")
confirm_intra_inter <- confirm(Y = confirm_anxdep,
hypothesis = intra_inter_hyp,
iter = 50000)
confirm_intra_inter
## BGGM: Bayesian Gaussian Graphical Models
## Type: continuous
## ---
## Posterior Samples: 50000
## Observations (n): 515
## Variables (p): 16
## Delta: 15
## ---
## Call:
## confirm(Y = confirm_anxdep, hypothesis = intra_inter_hyp, iter = 50000)
## ---
## Hypotheses:
##
## H1: D8--A5=D8--A7>(D6--A3,D6--A6)>0
## H2:
## D8--A5>D8--A7>D6--A3>D6--A6>0
## H3: complement
## ---
## Posterior prob:
##
## p(H1|data) = 0.037
## p(H2|data) = 0.954
## p(H3|data) = 0.009
## ---
## Bayes factor matrix:
## H1 H2 H3
## H1 1.000 0.039 4.251
## H2 25.905 1.000 110.115
## H3 0.235 0.009 1.000
## ---
## note: equal hypothesis prior probabilities
\[\begin{align} \mathcal{H}_1 &: (\rho_{D8-A3}, \rho_{D8-A6}) = 0 \\ \nonumber \mathcal{H}_2 &: (\rho_{D8-A3}, \rho_{D8-A6}) > 0 \\ \nonumber \mathcal{H}_3 &: ``\text{not}\; \mathcal{H}_1 \; \text{or} \; \mathcal{H}_2 \text{.''} \end{align}\]