Data Preparation

First, we load the necessary packages and the data.

library(qgraph)
library(psych)
library(GPArotation)
library(lavaan)
library(psychTools)

data(bfi)             # load the bfi data: 2800 observations on 28 variables

bfi <- bfi[, 1:25]    # keep only the 25 personality items (drop gender, education, age)

bfi <- na.omit(bfi)   # drop rows with missing responses, leaving 2436 complete cases

Confirmatory Factor Analysis

Next, we run a confirmatory factor analysis on the bfi (Big Five) personality data using lavaan.

model <- 'N =~ N1 + N2 + N3 + N4 + N5
          E =~ E1 + E2 + E3 + E4 + E5
          O =~ O1 + O2 + O3 + O4 + O5
          A =~ A1 + A2 + A3 + A4 + A5
          C =~ C1 + C2 + C3 + C4 + C5'

cfa_fit <- cfa(model, data = bfi)

# Print the results
summary(cfa_fit)
## lavaan 0.6.15 ended normally after 55 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        60
## 
##   Number of observations                          2436
## 
## Model Test User Model:
##                                                       
##   Test statistic                              4165.467
##   Degrees of freedom                               265
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   N =~                                                
##     N1                1.000                           
##     N2                0.947    0.024   39.899    0.000
##     N3                0.884    0.025   35.919    0.000
##     N4                0.692    0.025   27.753    0.000
##     N5                0.628    0.026   24.027    0.000
##   E =~                                                
##     E1                1.000                           
##     E2                1.226    0.051   23.899    0.000
##     E3               -0.921    0.041  -22.431    0.000
##     E4               -1.121    0.047  -23.977    0.000
##     E5               -0.808    0.039  -20.648    0.000
##   O =~                                                
##     O1                1.000                           
##     O2               -1.020    0.068  -14.962    0.000
##     O3                1.373    0.072   18.942    0.000
##     O4                0.437    0.048    9.160    0.000
##     O5               -0.960    0.060  -16.056    0.000
##   A =~                                                
##     A1                1.000                           
##     A2               -1.579    0.108  -14.650    0.000
##     A3               -2.030    0.134  -15.093    0.000
##     A4               -1.564    0.115  -13.616    0.000
##     A5               -1.804    0.121  -14.852    0.000
##   C =~                                                
##     C1                1.000                           
##     C2                1.148    0.057   20.152    0.000
##     C3                1.036    0.054   19.172    0.000
##     C4               -1.421    0.065  -21.924    0.000
##     C5               -1.489    0.072  -20.694    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   N ~~                                                
##     E                 0.292    0.032    9.131    0.000
##     O                -0.093    0.022   -4.138    0.000
##     A                 0.141    0.018    7.712    0.000
##     C                -0.250    0.025  -10.117    0.000
##   E ~~                                                
##     O                -0.265    0.021  -12.347    0.000
##     A                 0.304    0.025   12.293    0.000
##     C                -0.224    0.020  -11.121    0.000
##   O ~~                                                
##     A                -0.093    0.011   -8.446    0.000
##     C                 0.130    0.014    9.190    0.000
##   A ~~                                                
##     C                -0.110    0.012   -9.254    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .N1                0.793    0.037   21.575    0.000
##    .N2                0.836    0.036   23.458    0.000
##    .N3                1.222    0.043   28.271    0.000
##    .N4                1.654    0.052   31.977    0.000
##    .N5                1.969    0.060   32.889    0.000
##    .E1                1.814    0.058   31.047    0.000
##    .E2                1.332    0.049   26.928    0.000
##    .E3                1.108    0.038   29.522    0.000
##    .E4                1.088    0.041   26.732    0.000
##    .E5                1.251    0.040   31.258    0.000
##    .O1                0.865    0.032   27.216    0.000
##    .O2                1.990    0.063   31.618    0.000
##    .O3                0.691    0.039   17.717    0.000
##    .O4                1.346    0.040   34.036    0.000
##    .O5                1.380    0.045   30.662    0.000
##    .A1                1.745    0.052   33.725    0.000
##    .A2                0.807    0.028   28.396    0.000
##    .A3                0.754    0.032   23.339    0.000
##    .A4                1.632    0.051   31.796    0.000
##    .A5                0.852    0.032   26.800    0.000
##    .C1                1.063    0.035   30.073    0.000
##    .C2                1.130    0.039   28.890    0.000
##    .C3                1.170    0.039   30.194    0.000
##    .C4                0.960    0.040   24.016    0.000
##    .C5                1.640    0.059   27.907    0.000
##     N                 1.689    0.073   23.034    0.000
##     E                 0.846    0.062   13.693    0.000
##     O                 0.404    0.033   12.156    0.000
##     A                 0.234    0.030    7.839    0.000
##     C                 0.463    0.036   12.810    0.000
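With a sample this large, the chi-square test will almost always be significant, so it can help to look at additional fit indices as well. lavaan's fitMeasures() reports them; this snippet is an addition to the original code and its output is not shown here:

# Request a few common global fit indices for the fitted CFA model
fitMeasures(cfa_fit, c("cfi", "tli", "rmsea", "srmr"))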

Now that we have confirmed that a five-factor structure works for this data, we can start using it to explore network psychometrics! In network psychometrics, it is typical to analyze the individual items that make up a factor rather than summary variables. That said, it is still possible to analyze summary variables (e.g., a depression score) as nodes in a network.

To analyze the data, we will be using the qgraph package in R. The main function is, surprise, qgraph(), and the input it takes is an edge list or a weights matrix. What are those, you ask?

Edge lists

An edge list looks like this:

From    To      Weight
Var1    Var2    0.6
Var1    Var3    0.4
Var2    Var3    0.5
Var2    Var4    0.7
Var3    Var4    0.3

An edge list represents the similarity (or dissimilarity) between variables as pairs of nodes with an associated weight, one edge per row. The same information can equivalently be stored in matrix form as a similarity (or adjacency) matrix.

The easy way to build one is with the igraph package:

# Use the item correlations as edge weights
similarity_matrix <- cor(bfi)

# Build an undirected, weighted graph from the correlation matrix (no self-loops)
graph <- igraph::graph_from_adjacency_matrix(similarity_matrix, mode = "undirected", weighted = TRUE, diag = FALSE)

# Convert the graph to an edge list (one row per edge)
edge_list <- igraph::get.data.frame(graph)

knitr::kable(head(edge_list), format = "simple", align = 'c')
from    to    weight
A1      A2    -0.3509055
A1      A3    -0.2736361
A1      A4    -0.1567536
A1      A5    -0.1926976
A1      C1     0.0146977
A1      C2     0.0129184

Looking at these results, it should be clear that when correlations are used as the edge weights, an edge list is nothing more than a reformatted correlation matrix. See:

        A1           A2           A3
A1   1.0000000   -0.3509055   -0.2736361
A2  -0.3509055    1.0000000    0.5030411
A3  -0.2736361    0.5030411    1.0000000
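For comparison, the excerpt above can be reproduced from the correlation matrix we already computed (this line is an addition, not part of the original code):

# The same three variables taken directly from the correlation matrix
similarity_matrix[1:3, 1:3]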

Weights Tables

A correlation table mirrors the general format of a weights table (also called an association table), which is an n x n matrix where the rows are the origin nodes and the columns are the destination nodes.

The values in a weights table can be any statistic that indicates the strength of the connection between nodes, as long as 0 indicates no connection and negative values indicate connections whose strength is given by their absolute value (i.e., a weight of -0.4 is as strong as a weight of 0.4).

If the matrix is symmetric, it will generate an undirected network plot; if the matrix is asymmetric, the network plot will be directed. Here is an example symmetric weights matrix, with its plot and a directed counterpart below, illustrating the difference between the two:

0.0 0.2 -0.3 0.1 -0.4
0.2 0.0 0.5 -0.2 0.0
-0.3 0.5 0.0 0.4 -0.1
0.1 -0.2 0.4 0.0 0.2
-0.4 0.0 -0.1 0.2 0.0
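If you want to reproduce a plot like this yourself, a weights matrix can be passed straight to qgraph(). The following is a minimal sketch using the example values above; it was not part of the original code:

# Symmetric example weights matrix -> undirected, weighted graph
W <- matrix(c( 0.0,  0.2, -0.3,  0.1, -0.4,
               0.2,  0.0,  0.5, -0.2,  0.0,
              -0.3,  0.5,  0.0,  0.4, -0.1,
               0.1, -0.2,  0.4,  0.0,  0.2,
              -0.4,  0.0, -0.1,  0.2,  0.0), nrow = 5, byrow = TRUE)
qgraph(W, layout = "circle")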

Directed and Undirected Graphs

This is an undirected graph; think of the edges in these graphs as representing “mutual relationships”. If this were a social network, two nodes joined by a red edge would both agree they didn’t like each other and agree on the strength of that dislike, whereas two nodes joined by a blue edge would both agree they like each other and how much they like each other. The opacity and thickness of a line indicate the strength of the relationship; because these edge properties vary across the network, we say that the graph is weighted.

Directed graphs, in contrast, show which nodes are considered a source and which a destination, hence the “from” and “to” wording in the edge list (even though that wording appears in edge lists for both directed and undirected graphs). Below is an example asymmetric weights matrix and its corresponding directed graph:

0.0 0.5 0.4 0.0 -0.2 -0.1
0.0 0.0 0.3 -0.3 0.0 0.0
0.4 0.3 0.0 0.0 0.0 0.0
0.0 -0.3 0.0 0.0 0.4 0.3
0.0 0.0 0.0 0.4 0.0 0.2
-0.1 0.0 0.0 0.3 0.2 0.0

Notice that in this example most connected pairs of nodes point at each other. However, node 1 has more outgoing edges (to nodes 2, 3, 5, and 6) than incoming edges (from nodes 3 and 6), and depending on the network we are analyzing, this could indicate that this node is relatively isolated from the rest.
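As with the undirected example, the asymmetric matrix above can be handed directly to qgraph(), which detects the asymmetry and draws arrows from the row (source) node to the column (destination) node. Again, this sketch is an addition rather than part of the original code:

# Asymmetric example weights matrix -> directed graph
D <- matrix(c( 0.0,  0.5,  0.4,  0.0, -0.2, -0.1,
               0.0,  0.0,  0.3, -0.3,  0.0,  0.0,
               0.4,  0.3,  0.0,  0.0,  0.0,  0.0,
               0.0, -0.3,  0.0,  0.0,  0.4,  0.3,
               0.0,  0.0,  0.0,  0.4,  0.0,  0.2,
              -0.1,  0.0,  0.0,  0.3,  0.2,  0.0), nrow = 6, byrow = TRUE)
qgraph(D, layout = "circle")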

What is important to note is that while edge lists and weights matrices are the inputs to qgraph and are used to create the network graph, they only represent the network structure at a basic level; more detailed analyses can still be done.

Network Visualization

Back to our data. Let’s graph it as a network:

# Correlation matrix of the 25 items
bfi_cor <- cor(bfi)

# Plot the correlations directly as a network
network <- qgraph(bfi_cor, layout = "spring", theme = "colorblind")

Already we can see the beginnings of a very interesting network structure. However, because we are using only correlations, we aren’t gaining much more insight than we would have with factor analysis alone. Also, remember how I said a main feature and strength of network psychometrics is its ability to estimate model parameters; looking at raw correlations hardly does this. Let’s instead estimate a network model from our data. To do this, we will use a pairwise Markov random field (PMRF) model.

However, before we get into network estimation, we need to understand a few core concepts.

Pairwise Markov Random Field (PMRF) Models

First, remember that a PMRF “is a class of undirected network models in which variables are represented by nodes connected by edges that indicate the (strength of) conditional association between two variables after controlling for all other variables in the network” (Network Psychometrics with R, p. 162).

Which brings us to a very cool feature of network estimation!

Conditional Independence and Dependence

Because network psychometrics focuses so much on accurately representing and estimating the relationships between variables, it’s important to account for the fact that some variables may influence others while other variables may have no influence at all.

When two variables show no remaining association after controlling for all other variables in the network, they are said to be conditionally independent; when an association does remain after controlling for the other variables, they are conditionally dependent.

In general, most networks are assumed to contain some amount of dependency; otherwise you wouldn’t have much of a network! Analyzing how the variables interact is also one of the main reasons for conducting network analysis in the first place. This is why PMRF models are so popular: they work well with data that is assumed to have some level of dependency.

Additionally, PMRF models have sub-classes that are used for specific types of data, such as the Ising model for binary data or the Gaussian graphical model for continuous data. For our purposes, we will stick to a basic partial correlation network.
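To make the idea of a partial correlation network concrete, the partial correlations can be computed by hand from the inverse of the correlation matrix (the precision matrix). This short sketch is an added illustration rather than part of the original analysis:

# Partial correlations from the precision matrix (inverse of the correlation matrix)
precision <- solve(cor(bfi))            # precision matrix
partial_cors <- -cov2cor(precision)     # standardize and flip the sign of the off-diagonal entries
diag(partial_cors) <- 0                 # the diagonal is not an edge, so set it to 0
partial_cors[1:3, 1:3]                  # compare these to the raw correlations shown earlier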

To conduct our analysis, let’s use the bootnet package (which is really just a wrapper around other estimation functions) to fit a PMRF and estimate the network structure of the bfi items.

library(bootnet)

# Group the items by factor for plotting (bfi columns are ordered A1-A5, C1-C5, E1-E5, N1-N5, O1-O5)
groupings <- list(Agreeableness = 1:5, Conscientiousness = 6:10, Extraversion = 11:15,
                  Neuroticism = 16:20, Openness = 21:25)

# Estimate a partial correlation network
network <- estimateNetwork(bfi, default = "pcor")

plot(network, layout = "spring", groups = groupings, theme = "colorblind")

Now we can start to notice interesting visual features of our network! According to the CFA, variables should cluster together by factor. In this graph we can see that this is generally the case, with variables from the same factor linked more strongly to each other than to variables from other factors. However, there appears to be an interesting clustering around the neuroticism items and the third agreeableness item.

Pruning and Thresholding

This generated network looks a bit messy because it shows all the edges. In network analytics this is termed a saturated graph.

In the following code we prune the network so that it only keeps edges that are significant at alpha = .05. With the pruning method, all non-significant values are set to 0 in the network itself, which therefore affects any subsequent analysis of the network data.

Thresholding, in contrast, would simply hide the non-significant values in the plot but keep them in the network, where they would still be used in analysis.

network <- estimateNetwork(bfi, default = "pcor", threshold = "sig", alpha = 0.05)

# Further showing that we "pruned" rather than thresholded: non-significant edges are now 0
knitr::kable(network$results[1:3, 1:3], align = 'c')
        A1           A2           A3
A1   0.0000000   -0.2406618   -0.1324402
A2  -0.2406618    0.0000000    0.2674442
A3  -0.1324402    0.2674442    0.0000000
plot(network, groups = groupings, layout = "spring", theme = "colorblind", minimum = 0.1)

Here I also set the minimum to .1, so only edges with an absolute value greater than .1 are plotted. The minimum value is a somewhat subjective choice; in this case I think .1 balances interpretability while maintaining enough information.

Network Metrics (Centrality)

However, reviewers won’t be happy if you just do a visual examination of your network. They want to see the numbers! So let’s get some. Network scientists have developed a few key metrics that we can use here to examine our network. We can obtain some of these metrics using qgraph’s centrality plots:

centralityPlot(network, include = "all", scale = "raw0")
## Warning: Removed 1 rows containing missing values (`geom_point()`).

Note: the scale used in the plot is the raw partial correlation values with the scales anchored at 0.

Strength - the sum of the absolute edge weights of all edges connected to a node. In this graph, the partial correlation values are what is being summed. Strength can be considered a proxy measure of a node’s influence or importance.

Closeness - the inverse of the average distance from a node to all other nodes in the network, where the distance between two nodes is the length of the shortest path connecting them. In this case, all the nodes are relatively far from each other and none of them stands out as a central node in the network.

Betweenness - a measure of how often a node lies on the shortest path between two other nodes. A node with high betweenness is considered a bridge between other parts of the network. In psychometrics, a variable with high betweenness might indicate that the variable plays a role in connecting clusters, or possibly represents a unique aspect of the construct being measured.

Expected Influence - similar to strength in that it is the sum of the edge weights connected to a node, but it does not take the absolute value of those weights when summing. Negative edges therefore count against the total, and expected influence can itself be negative. Higher expected influence indicates how much a node can impact the other nodes in a network, which is useful in network psychometrics because we can identify the symptoms that are having the greatest impact on the other nodes in the network.
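If you prefer the numbers behind the plot, qgraph’s centralityTable() returns the same indices in long format. This snippet is an added illustration; the standardized = FALSE argument keeps the raw (rather than z-scored) values:

# Centrality indices as a table instead of a plot
cent <- centralityTable(network, standardized = FALSE)
head(subset(cent, measure == "Strength"))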

Caveat: Because our data is being treated as continuous, qgraph’s clusteringPlot() function doesn’t work well here, as it is designed to work with categorical and binary data. Clustering metrics and plots are similar to centrality metrics and plots, but with slightly different interpretations; unfortunately we don’t have time today to review those as well.

Bootstrapping

It is common to use bootstrapping when estimating network structures from data. Doing so gives better estimates of the “true” edge weights by taking into account variability between different samples. The primary way this is done is by estimating confidence intervals around the edge weights.

We can do this with bootnet:

# Warning! This step is fairly computationally expensive! Be sure to adjust the number of cores to what is appropriate for your device.
b1 <- bootnet(network, nBoots = 1000, default = "pcor", nCores = 14, threshold = "sig", alpha = 0.05)

You’ll notice that the bootstrapped network is very similar to the one we estimated previously; however, this is not always the case.

Normally, because parameter estimation (e.g., of edge weights) is sensitive to sampling variation, the estimated parameters come with a level of uncertainty that is larger in smaller samples, which can result in bootstrapped networks looking very different from their estimated networks.

The important thing to remember is that estimated networks may not represent true network structures, and that networks will vary based on sampling.

But we’re here to look at the CIs for the bootstrapped edge weights.

plot(b1, "edge", labels = FALSE, order = "sample")

To interpret this plot, we want to look for edges with narrow confidence intervals that do not cross the zero line. These represent stable and significant relationships in the network.

In general, most of the edges are 0 (see the long vertical red line and the grey areas that cross 0). And, in my opinion, the edges that sit distinctly away from 0 match the ones that stand out in the network graph.
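If you want the confidence intervals as numbers rather than reading them off the plot, bootnet’s summary method tabulates the bootstrapped edge statistics. This is an added illustration; the exact columns returned may vary by bootnet version:

# Tabulate the bootstrapped edge-weight statistics (sample value, bootstrap mean, CI bounds)
edge_summary <- summary(b1, statistics = "edge")
head(edge_summary)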

It should be noted that bootstrapping generally shouldn’t be used in this way to estimate other network parameters (e.g., betweenness). Many of the centrality metrics use absolute values in their calculations, which means 0 is the lower bound of the estimated parameter range, and 0s cause a lot of problems when bootstrapping. And we should expect a lot of 0 edges in our network models, especially if there are distinct and distant clusters of variables.

Conclusion

That’s pretty much a brief introduction to network psychometrics!

By now you hopefully have some ideas on how psychological constructs can be represented, estimated, and analyzed using networks. Network psychometrics is a powerful tool to assess the relationships between variables, especially when working in conjunction with other tools like FA and CFA.

References

Isvoranu, Adela-Maria, et al., editors. Network Psychometrics with R: A Guide for Behavioral and Social Scientists. New York, NY: Routledge, 2022.