Introduction

This tutorial is demonstrates how to use networks (specifically Gaussian graphical models or GGMs) to

  1. Generate hypotheses
  2. Perform confirmatory tests on the generated hypotheses

In psychology, network models are almost never used to generate hypotheses. This is puzzling because one of the original reasons researchers began using them was exactly this. Networks are also thought of highly exploratory tools, despite their potential for confirmatory tests. However, there is plenty of room for networks in this respect. These ideas are expanded upon in our recent paper,“On Formalizing Theoretical Expectations: Bayesian Testing of Central Structures in Pyschological Networks”, where we merge exploratory and confirmatory hypotheses into a cohesive framework based on Bayesian hypothesis testing. You find a preprint here.

In what follows, I will describe how you can use GGMs in R to perform confirmatory hypothesis tests bases on an initial, exploratoy hypotheses. For clarity, some code chunks have been omitted, but the full code to reproduce this document is available on the OSF or GitHub.

Examples

PTSD Network

To begin we need several packages:

  • BGGM: to conduct exploratory and confirmatory analyses with GGMs
  • MASS: to generate data from covariance matrices
  • networktools: to calculate bridge centrality statistics
  • ggnetworks: used for plotting

I also load a function make_adj_plot stored in 03-plot-adj.R. This function is used to produce plots of the conditional (in)dependence matrices (Panels A & B in Figures 3 and 4 of the paper).

# Uncomment and run  if missing packages

# for latest version of BGGM
# remotes::install_github('donaldrwilliams/BGGM') 

# packages <- c("BGGM", "MASS", "networktools")
# if (!packages %in% installed.packages()) install.packages(packages)
source("01-plot-adj.R")
library(MASS)
library(networktools)
library(BGGM)

The first dataset contains measurements on 16 PTSD symptoms in 3 communities, “Re-experiencing”, “Avoidance”, and “Arousal”. Only the covariance matrices are available so we have to generate the data using MASS. The data,ptsd_cor4, is loaded when we load BGGM.

set.seed(1812)
# data for exploratory analyses
explore_ptsd <- mvrnorm(n = 965, mu = rep(0, 16), Sigma = ptsd_cor4, empirical = TRUE)
colnames(explore_ptsd) <- node_names
head(explore_ptsd)
##              B1          B2         B3         B4          C1         C2
## [1,]  1.8185234  1.28075149  0.9164065 -1.3056992 -1.19542346 -0.6232488
## [2,] -0.2767064 -0.52201830 -0.3306419 -0.5563122 -1.47678987 -0.5883824
## [3,]  1.2774957  1.90522822  1.8019264 -1.4164023  2.14139220  1.7681356
## [4,]  0.8043783  0.84727056  0.2442545 -0.6187949  0.24483924  0.1196831
## [5,] -1.0790033 -1.51475008 -0.8794434 -1.5693234 -0.02626149  1.2488179
## [6,] -0.6226078  0.02695523  0.2391587  0.1796465  1.91942409  1.9599440
##              C3         C4         C5         C6           C7         D1
## [1,] -1.0599306  0.5786074  0.7531740  0.7698083  0.756293787  0.8887558
## [2,]  0.1022892 -0.6554164 -0.8095467  0.3519246 -0.438868679  0.3497864
## [3,]  0.3968195 -1.1117893 -1.5369231  0.5061597 -0.026363691  0.5172878
## [4,] -1.3502039  1.2872828  0.4037868 -0.2491910  0.083697359  0.1484805
## [5,] -1.1881348 -0.5438269  1.3043925  1.2176694 -1.137796837 -0.4760302
## [6,]  1.7871930  0.5880438  0.4392643  0.3064342  0.003660419  0.4467992
##               D2         D3         D4          D5
## [1,]  0.07958799  2.3973209 -0.8417986  2.64646948
## [2,]  1.26852337 -0.2419411 -0.8409757  0.02310026
## [3,] -0.02076598  1.1509737  0.2544366 -0.18049763
## [4,] -0.53254994 -0.4588925  0.3632865 -0.59815200
## [5,]  1.66327280 -0.3150559 -0.5044286 -0.68593216
## [6,]  1.17169908  0.3436593 -1.5970968  0.97243812

Exploratory analysis

Estimate graph

Our approach begins by estimating an exploratory network. With BGGM, this requires calling the explore function to obtain and sample the posterior distribution. These results are saved in explore_network. The select function takes the results from an explore call, and is used to determine the edge set for \(\mathbf{A}^{CD}\) and \(\mathbf{A}^{CI}\) — the conditional (in)dependence structures.

Note that we formally incorporate a one-sided hypothesis test. \(\mathcal{H}_0: \rho_{ij} = 0\) versus \(\mathcal{H}_1: \rho_{ij} > 0\) by setting alternative = "greater" to determine the edge set. This formally includes the expectation of a “positive manifold”, or that the edges should all be positive.

# sample posterior distribution
explore_network <- explore(explore_ptsd, chains = 4, cores = 4)

# determine edge set
selected_network <- select(explore_network, alternative = "greater", bf_cut = 3)

We can visualize the conditional (in)dependence structures by simply calling plot. By default, this returns three plots, but we focus on the first two.

plots <- 
  plot(selected_network,
     node_labels = node_names, # vector storing node names
     communities = comms, # vector of community name for each node
     node_size = 12)

plots$plt_alt +
  ggtitle("Conditional Dependence Structure")

plots$plt_null +
  ggtitle("Conditional Independence Structure")

Bridge Centrality

We then proceed to calculate bridge strength using the networktools package. This is similar to node strength, in that, for a given node, it is the sum of the absolute values of its edges. However, bridge strength only takes into account edges that connect a node to different communities, or clusters. Thus, it is a measure of inter-community connectivity, and we use it here to identify central structures in a network.

# extract partial correlations from selected network
partial_cors <- selected_network$pcor_mat_zero

# rename columns with node names
colnames(partial_cors) <- node_names


# calculate bridge strength. comms is a vector specifying 
# the community for each node
bridge_strengths <- bridge(partial_cors, communities = comms)$`Bridge Strength`

# we use the top 10% in bridge strength as bridge nodes
bridge_strength_cutoff <- quantile(bridge_strengths, 0.9)

bridge_strengths[bridge_strengths > bridge_strength_cutoff]
##    B4    D1 
## 0.532 0.657

Calculating bridge strength indicates that nodes B4 and D1 are the top bridge nodes.

Plot bridges

A key idea in our paper was that highlighting and “zooming” in on central structures allows researchers to easily formulate hypotheses.

We can highlight the the bridge nodes by using make_adj_plot.

plot_cd <- 
  make_adj_plot(selected_network$Adj_20, # conditional dependence structure
              node_labels = node_names, # vector storing node names
              communities = comms,
              bridges = c("D1", "B4"), # name of nodes to highlights
              fade = c("D2", "D5", "B3", # name of nodes to fade
                       "C3", "C4", "C5"),
              node_size = 12,
              type = "cd")  + # type = "cd" for conditional dependence. "ci" for conditional
  ggthemes::scale_fill_few() +
  theme_facet() +
  theme(legend.position = "top",
        legend.text = element_text(size = 14),
        legend.title = element_blank())

plot_cd

We can use the same function to plot the conditional independence structures in panel B of both Figures 3 and 4.

plot_ci <-
  make_adj_plot(selected_network$Adj_02,
              node_labels = node_names,
              communities = comms,
              bridges = paste0("D", 1),
              fade = NULL,
              type = "ci", # type = "ci" for conditional independence
              node_size = 12) +
  ggthemes::scale_fill_few() +
  guides(fill = guide_legend(override.aes = list(size=10))) +
  theme_facet() +
  theme(legend.position = "top",
        legend.box.background = element_blank(),
        legend.background = element_blank(),
        legend.title = element_blank())
plot_ci

We have not found an easy way to “zoom in” on the neighborhood of bridge edges for the bridge nodes (bottom panels in Figures 3 and 4), but as an example, this is code we used to the “bridge neighborhood” for node D1.

coord_d1 <- 
  data.frame(
    x = c(1, 0, 1, 2, 3, 4, 3),
    y = c(1, 2, 3, 2, 1, 2, 3),
    comm = c("3",  "3",  "3",  "1",  "2",  "2",  "2"),
    node = c("B1", "B4", "B2", "D1", "C6", "C1", "C2")
    )
    

plot_d1 <-
  ggplot(coord_d1, aes(x, y)) +
  # bridge cors
  annotate(geom = "segment",
           x = rep(2, 4),
           xend = c(1, 1, 3, 3),
           y = rep(2, 4),
           yend = c(1, 3, 1, 3),
           linetype = "dashed",
           size = 1) +
  # independent
  annotate(geom = "segment",
           x = c(2, 2),
           xend = c(0, 4),
           y = c(2, 2),
           yend = c(2, 2),
           linetype = "dotted",
           col = "gray70") +
  # B nodes
  annotate(geom = "segment",
           x = c(1, 1, 0),
           xend = c(0, 1, 1),
           y = c(1, 1, 2),
           yend = c(2, 3, 3),
           linetype = c(3, 1, 1),
           col = "gray70") +
  # C nodes
  annotate(geom = "segment",
           x = c(3, 3, 4),
           xend = c(4, 3, 3),
           y = c(1, 1, 2),
           yend = c(2, 3, 3),
           linetype = c(1, 3, 3),
           col = "gray70") +
  # bridge cor labels
  annotate(geom = "label",
           label = c(0.11, 0.31, 0.08, 0.15),
           x = c(1.5, 1.5, 2.5, 2.5),
           y = c(1.5, 2.5, 1.5, 2.5)) +
  geom_point(size = 21) +
  geom_point(aes(fill = comm), 
             size = 20, 
             shape = 21,
             show.legend = FALSE) +
  geom_text(aes(label = node)) +
  lims(x = c(-0.1, 4.1), y = c(0.9, 3.1))  +
  ggthemes::scale_fill_few() +
  theme_facet()

plot_d1

For completeness, here I have included the neighborhood for node B4.

Confirmatory analysis

With the central structures identified and plotted, we can move on to formulating and testing hypotheses.

Note that we generate another dataset here so that we test our hypotheses on different data than our exploratory analysis.

set.seed(1)
# data for confirmatory analyses
confirm_ptsd <- mvrnorm(n = 926, mu = rep(0, 16), Sigma = ptsd_cor3, empirical = TRUE)
colnames(confirm_ptsd) <- node_names

Varying degrees of replication

We first focus on node B4 and test the following hypotheses \[\begin{align} \mathcal{H}_1&: (\rho_{B4-C1}, \rho_{B4-C7}, \rho_{B4-D3}, \rho_{B4-D4}) > 0 \\ \nonumber \mathcal{H}_2&: \rho_{B4-C1} > (\rho_{B4-C7}, \rho_{B4-D3}, \rho_{B4-D4}) > 0 \\ \nonumber \mathcal{H}_3 &: ``\text{not}\; \mathcal{H}_1 \; \text{or}\; \mathcal{H}_2 \text{.''} \end{align}\]

Above, \(\mathcal{H}_1\) is testing for replication of all edges but is otherwise agnostic towards the interplay among bridge relations. \(\mathcal{H}_2\) then provides a refined view into the bridge neighborhood by testing an additional constraint that the strongest edge replicated. That is, all of the bridge relations the strongest edge re-emerged in an independent dataset. Furthermore, \(\mathcal{H}_1\) and \(\mathcal{H}_2\) both reflect a positive manifold. We also included \(\mathcal{H}_3\) which accounts for structures that are not \(\mathcal{H}_1\) or \(\mathcal{H}_2\).

To test these hypotheses, we can formulate them in a single string and use the confirm function. Note that hypotheses are separated by a semicolon, and that partial correlations are denoted as node1 -- node2. The output can be obtained by simply printing out the resuls of confirm.

hyp_var_rep <- c("(B4--C1, B4--C7, B4--D3, B4--D4) > 0;
                   B4--C1 > (B4--C7, B4--D3, B4--D4) > 0")

confirm_var_rep <- confirm(confirm_ptsd,
                           hyp_var_rep,
                           iter = 50000)
confirm_var_rep
## BGGM: Bayesian Gaussian Graphical Models 
## Type: continuous 
## --- 
## Posterior Samples: 50000 
## Observations (n): 926 
## Variables (p): 16 
## Delta: 15 
## --- 
## Call:
## confirm(Y = confirm_ptsd, hypothesis = hyp_var_rep, iter = 50000)
## --- 
## Hypotheses: 
## 
## H1: (B4--C1,B4--C7,B4--D3,B4--D4)>0
## H2: 
## B4--C1>(B4--C7,B4--D3,B4--D4)>0
## H3: complement
## --- 
## Posterior prob: 
## 
## p(H1|data) = 0.197
## p(H2|data) = 0.797
## p(H3|data) = 0.006
## --- 
## Bayes factor matrix: 
##       H1    H2      H3
## H1 1.000 0.247  33.401
## H2 4.054 1.000 135.404
## H3 0.030 0.007   1.000
## --- 
## note: equal hypothesis prior probabilities

The output includes both the posterior probabilities and all the Bayes factors. The Bayes factors are in reference to the rows relative to the columns. For example the element in the 2nd row and 1st column would be interpreted as BF\(_{21} = 4.05\)

In this case, \(\mathcal{H}_2\) is the preferred hypothesis, that is, all of the bridge edges and the strongest edge replicated. This gets at an important notion. It is possible to test varying degrees of replication.

Comorbidity Network

We also examined a comorbity network containing 16 symptoms of anxiety and depression.

Exploratory analysis

# libraries
set.seed(27)

cov_anxdep <- read.csv("../05-data/00-cov-anxdep.csv")[, -1]

sim_anxdep <- MASS::mvrnorm(n = 1029, mu = rep(0, 16), Sigma = cov_anxdep, empirical = TRUE)

col_names <- 
  sub(pattern = "[A-Z].",
      replacement = "",
      x = colnames(cov_anxdep)) 

col_names <- tolower(col_names)

colnames(sim_anxdep) <- col_names
node_names <- 
  c(
    paste0("D", 1:9),
    paste0("A", 1:7)
  )

comms <- c(
  rep("depression", 9),
  rep("anxiety", 7)
)


pal <-  ggthemes::colorblind_pal()(8)[c(4, 6)]

split <- sample(1:1029, size = floor(1029 * .5))

explore_anxdep <- sim_anxdep[split, ]
explore_network <- explore(explore_anxdep, chains = 4, cores = 4)
selected_network <- BGGM::select(explore_network, alternative = "greater", BF_cut = 3)
pcors_exp <- round(selected_network$pcor_mat_zero, 4)
dimnames(pcors_exp) <- list(node_names, node_names)

Bridge Strength

bridge_strengths <- sort(bridge(pcors_exp, communities = comms)$`Bridge Strength`, decreasing = TRUE) 
bridge_strength_cutoff <- quantile(bridge_strengths, 0.9)
bridge_strengths[bridge_strengths > bridge_strength_cutoff]
##    D8    D6 
## 0.404 0.270
D1 D2 D3 D4 D5 D6 D7 D8 D9 A1 A2 A3 A4 A5 A6 A7
D1 0 0.412 0 0.184 0.121 0 0 0 0 0 0 0 0 0 0 0
D2 0.412 0 0 0.188 0 0.325 0 0 0.146 0 0 0 0 0 0 0
D3 0 0 0 0.207 0 0 0.102 0 0 0 0 0 0 0 0 0
D4 0.184 0.188 0.207 0 0.241 0 0.165 0 0 0 0 0 0 0 0 0
D5 0.121 0 0 0.241 0 0 0.108 0.107 0 0 0 0 0 0 0 0
D6 0 0.325 0 0 0 0 0 0 0.189 0 0 0.17 0 0 0.1 0
D7 0 0 0.102 0.165 0.108 0 0 0.243 0 0 0 0 0 0 0 0
D8 0 0 0 0 0.107 0 0.243 0 0 0 0 0 0 0.204 0 0.2
D9 0 0.146 0 0 0 0.189 0 0 0 0 0 0 0 0 0 0
A1 0 0 0 0 0 0 0 0 0 0 0.208 0.183 0.212 0 0 0
A2 0 0 0 0 0 0 0 0 0 0.208 0 0.578 0.132 0 0 0.108
A3 0 0 0 0 0 0.17 0 0 0 0.183 0.578 0 0.181 0 0 0
A4 0 0 0 0 0 0 0 0 0 0.212 0.132 0.181 0 0.262 0 0.108
A5 0 0 0 0 0 0 0 0.204 0 0 0 0 0.262 0 0.164 0
A6 0 0 0 0 0 0.1 0 0 0 0 0 0 0 0.164 0 0.206
A7 0 0 0 0 0 0 0 0.2 0 0 0.108 0 0.108 0 0.206 0

Confirmatory analysis

confirm_anxdep <- sim_anxdep[-split, ]
colnames(confirm_anxdep) <-  node_names

Intra- and Inter-Bridge Sets

\[\begin{align} \label{eq:intra-inter} \mathcal{H}_1 &: \rho_{D8-A5} = \rho_{D8-A7} > (\rho_{D6-A3}, \rho_{D6-A6}) > 0 \\ \nonumber \mathcal{H}_2 &: \rho_{D8-A5} > \rho_{D8-A7} > \rho_{D6-A3} > \rho_{D6-A6} > 0 \\ \nonumber \mathcal{H}_3 &: ``\text{not}\; \mathcal{H}_1 \; \text{or}\; \mathcal{H}_2 \text{.''} \end{align}\]

intra_inter_hyp <- c("D8--A5 = D8--A7 > (D6--A3, D6--A6) > 0;
                      D8--A5 > D8--A7 > D6--A3 > D6--A6 > 0")

confirm_intra_inter <- confirm(Y = confirm_anxdep,
                      hypothesis = intra_inter_hyp,
                      iter = 50000)
confirm_intra_inter 
## BGGM: Bayesian Gaussian Graphical Models 
## Type: continuous 
## --- 
## Posterior Samples: 50000 
## Observations (n): 515 
## Variables (p): 16 
## Delta: 15 
## --- 
## Call:
## confirm(Y = confirm_anxdep, hypothesis = intra_inter_hyp, iter = 50000)
## --- 
## Hypotheses: 
## 
## H1: D8--A5=D8--A7>(D6--A3,D6--A6)>0
## H2: 
## D8--A5>D8--A7>D6--A3>D6--A6>0
## H3: complement
## --- 
## Posterior prob: 
## 
## p(H1|data) = 0.037
## p(H2|data) = 0.954
## p(H3|data) = 0.009
## --- 
## Bayes factor matrix: 
##        H1    H2      H3
## H1  1.000 0.039   4.251
## H2 25.905 1.000 110.115
## H3  0.235 0.009   1.000
## --- 
## note: equal hypothesis prior probabilities

Bridge Set Separation

\[\begin{align} \mathcal{H}_1 &: (\rho_{D8-A3}, \rho_{D8-A6}) = 0 \\ \nonumber \mathcal{H}_2 &: (\rho_{D8-A3}, \rho_{D8-A6}) > 0 \\ \nonumber \mathcal{H}_3 &: ``\text{not}\; \mathcal{H}_1 \; \text{or} \; \mathcal{H}_2 \text{.''} \end{align}\]