## set seed
## for results replication purposes
set.seed(12345)
## set CRAN mirror for downloading packages (if required)
r <- getOption("repos")
r["CRAN"] <-"https://cloud.r-project.org/"
options(repos=r)
## install packages from CRAN
## to install the packages delete the # in front of the following lines
# install.packages("tidyverse") # for data manipulation
# install.packages("DT") # for nice tables that are searchable
# install.packages("bootnet") # for network estimation and tests
# install.packages("qgraph") # for network visualization
# install.packages("NetworkComparisonTest") # for network comparison
# install.packages("psych") # for standard psychological research needs
## activate packages
library(tidyverse)
library(DT)
library(bootnet)
library(qgraph)
library(NetworkComparisonTest)
library(psych)
### round up function --- used later ## do not modify
round_df <- function(df, digits=2) {
# round all numeric variables
# x: data frame
# digits: number of digits to round
numeric_columns <- sapply(df, mode) == 'numeric'
df[numeric_columns] <- round(df[numeric_columns], digits)
df
} #closes func
## load provided data
## make sure you have provided the correct path on your local machine
## note that the data is stored in an object simply called df
load("data/MPI-54-data.Rdata")MPI-54 Psychological network systems
Supporting code for the seminar
Note before you start
The practical session of the seminar aims to teach less data analysis itself but more how to interpret results and how to apply certain criteria to derive answers for your research question.
My suggestion is to first think of the theory and only then do the practical part. Coding is meaningless if the theory is not understood well enough.
An ultra-brief introduction to r and RStudio can be read here.
Prepare the working environment
Install the required packages and load the provided data.
Data description
This is a subset of the data reported in a paper currently under review (Herdoiza-Arroyo, Stanciu, de la Rosa-Gomez, Martinez-Arriaga, & Dominguez-Rodriguez. (2024). A psychological network approach to depression symptoms and sleep difficulties among adults going through grief during COVID-19: A cross-sectional study).
Data were collected via the online platform Duelocovid used to deliver self-applied interventions. The platform was operational in multiple countries in Latin America. For the study, and therefore for this seminar, only data from Mexican participants were used.
Depression symptoms were measured with the Center for Epidemiologic Studies Depression Scale Revised (CESD-R) instrument (Eaton et al., 2004). Access the CESD-R instrument here.
The answer options for the CESD-R instrument are:
0 = _not at all or less than one a day_1 = _1-2 days_2 = _3-4 days_3 = _5-7 days_4 = _nearly every day for two weeks_
Sleep quality was measured with the Pittsburgh Sleep Quality Index (PSQI) instrument (Faulkner & Sidey-Gibbons, 2019). Access the PSQI instrument here.
The answer options for the PSQI instrument are:
0 = _not during the past month_1 = _less than once a week_2 = _once or twice a week_3 = _three or more times a week_
For this seminar, a subset of n = 500 were randomly chosen from the original data. The participant ID, gender and age group were simulated at random. Only the data on the CESD-R and PSQI are real.
For this seminar:
age_groupcontains three values:1 = young,2 = middle agedand3 = oldgendercontains two values:1 = menand2 = women
Thus, results of group comparisons or any analyses that go beyond examinations of CESD-R and PSQI disorders are most likely going to be nonsensical!
Your research question
Now would be a good time to think of a research question that you’d like to examine.
Of course, one that draws on psychological network systems approach to psychological phenomena.
Inspect data
Here is an overview of the measurement items available in the data.
Inspect the first 5 rows in the data.
## Use function head() and the arguments df and number of rows
head(df,5)# A tibble: 5 × 34
ID gender age_group CESDR1 CESDR2 CESDR3 CESDR4 CESDR5 CESDR6 CESDR7 CESDR8
<int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2 2 1 3 2 2 2 3 3 2
2 2 2 2 1 4 2 1 1 1 0 0
3 3 1 1 4 4 4 4 4 4 3 4
4 4 2 3 0 1 1 2 2 3 0 1
5 5 2 1 1 2 2 2 0 2 1 2
# ℹ 23 more variables: CESDR9 <dbl>, CESDR10 <dbl>, CESDR11 <dbl>,
# CESDR12 <dbl>, CESDR13 <dbl>, CESDR14 <dbl>, CESDR15 <dbl>, CESDR16 <dbl>,
# CESDR17 <dbl>, CESDR18 <dbl>, CESDR19 <dbl>, CESDR20 <dbl>, PSQI5a <dbl>,
# PSQI5b <dbl>, PSQI5c <dbl>, PSQI5d <dbl>, PSQI5e <dbl>, PSQI5f <dbl>,
# PSQI5g <dbl>, PSQI5h <dbl>, PSQI5i <dbl>, PSQI7 <dbl>, PSQI8 <dbl>
You can note the first five rows on the variables ID through CESDR8 and then a message that there are 23 more variables not listed here. To view the entire dataset you can use the function view() or View() applied to the dataset df. Type this in your console if you’d like.
Check the sample size number.
## Display the number of rows the dataset has
nrow(df)[1] 500
Get a quick summary of the CESDR items1.
summary(df %>% select(CESDR1:CESDR20)) CESDR1 CESDR2 CESDR3 CESDR4 CESDR5
Min. :0.000 Min. :0.000 Min. :0.00 Min. :0.00 Min. :0.000
1st Qu.:0.000 1st Qu.:1.000 1st Qu.:1.00 1st Qu.:1.00 1st Qu.:1.000
Median :1.000 Median :2.000 Median :2.00 Median :2.00 Median :2.000
Mean :1.406 Mean :2.374 Mean :2.35 Mean :2.31 Mean :2.404
3rd Qu.:2.000 3rd Qu.:4.000 3rd Qu.:4.00 3rd Qu.:4.00 3rd Qu.:4.000
Max. :4.000 Max. :4.000 Max. :4.00 Max. :4.00 Max. :4.000
CESDR6 CESDR7 CESDR8 CESDR9
Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:2.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
Median :3.000 Median :1.000 Median :1.000 Median :1.000
Mean :2.684 Mean :1.486 Mean :1.578 Mean :1.416
3rd Qu.:4.000 3rd Qu.:2.250 3rd Qu.:3.000 3rd Qu.:3.000
Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
CESDR10 CESDR11 CESDR12 CESDR13 CESDR14
Min. :0.000 Min. :0.000 Min. :0.00 Min. :0.000 Min. :0.000
1st Qu.:1.000 1st Qu.:0.000 1st Qu.:0.00 1st Qu.:0.000 1st Qu.:0.000
Median :2.000 Median :0.500 Median :1.00 Median :1.000 Median :0.000
Mean :2.026 Mean :1.222 Mean :1.38 Mean :1.508 Mean :0.922
3rd Qu.:3.000 3rd Qu.:2.000 3rd Qu.:2.00 3rd Qu.:2.250 3rd Qu.:1.000
Max. :4.000 Max. :4.000 Max. :4.00 Max. :4.000 Max. :4.000
CESDR15 CESDR16 CESDR17 CESDR18 CESDR19
Min. :0.000 Min. :0.0 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:0.000 1st Qu.:1.0 1st Qu.:1.000 1st Qu.:0.000 1st Qu.:1.000
Median :0.000 Median :2.0 Median :2.000 Median :0.000 Median :2.000
Mean :0.392 Mean :2.2 Mean :1.818 Mean :0.934 Mean :2.042
3rd Qu.:0.000 3rd Qu.:3.0 3rd Qu.:3.000 3rd Qu.:2.000 3rd Qu.:3.000
Max. :4.000 Max. :4.0 Max. :4.000 Max. :4.000 Max. :4.000
CESDR20
Min. :0.000
1st Qu.:1.000
Median :2.000
Mean :2.128
3rd Qu.:3.000
Max. :4.000
Get a quick summary of the PSQI items.
summary(df %>% select(PSQI5a:PSQI8)) PSQI5a PSQI5b PSQI5c PSQI5d
Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:2.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:0.000
Median :3.000 Median :2.000 Median :2.000 Median :0.000
Mean :2.198 Mean :2.174 Mean :1.616 Mean :0.844
3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:2.000
Max. :3.000 Max. :3.000 Max. :3.000 Max. :3.000
PSQI5e PSQI5f PSQI5g PSQI5h
Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:0.000 1st Qu.:1.000 1st Qu.:0.000 1st Qu.:0.000
Median :0.000 Median :2.000 Median :1.000 Median :1.000
Mean :0.936 Mean :1.798 Mean :1.102 Mean :1.366
3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:2.000 3rd Qu.:2.000
Max. :3.000 Max. :3.000 Max. :3.000 Max. :3.000
PSQI5i PSQI7 PSQI8
Min. :0.000 Min. :0.00 Min. :0.000
1st Qu.:0.000 1st Qu.:0.00 1st Qu.:1.000
Median :1.000 Median :0.00 Median :2.000
Mean :1.232 Mean :0.62 Mean :1.684
3rd Qu.:2.000 3rd Qu.:1.00 3rd Qu.:2.000
Max. :3.000 Max. :3.00 Max. :3.000
One other way to get a fuller understanding of the data is to desscribe it using the package psych psych::describe()
## Note that means,sd, etc for ID, gender and age_group are nonsensical because these are categorial
psych::describe(df) vars n mean sd median trimmed mad min max range skew
ID 1 500 250.50 144.48 250.5 250.50 185.32 1 500 499 0.00
gender 2 500 1.50 0.50 1.0 1.50 0.00 1 2 1 0.02
age_group 3 500 2.02 0.81 2.0 2.02 1.48 1 3 2 -0.04
CESDR1 4 500 1.41 1.29 1.0 1.26 1.48 0 4 4 0.68
CESDR2 5 500 2.37 1.32 2.0 2.44 1.48 0 4 4 -0.11
CESDR3 6 500 2.35 1.27 2.0 2.39 1.48 0 4 4 -0.08
CESDR4 7 500 2.31 1.36 2.0 2.38 1.48 0 4 4 -0.11
CESDR5 8 500 2.40 1.33 2.0 2.49 1.48 0 4 4 -0.22
CESDR6 9 500 2.68 1.21 3.0 2.76 1.48 0 4 4 -0.35
CESDR7 10 500 1.49 1.35 1.0 1.36 1.48 0 4 4 0.54
CESDR8 11 500 1.58 1.36 1.0 1.47 1.48 0 4 4 0.45
CESDR9 12 500 1.42 1.43 1.0 1.27 1.48 0 4 4 0.61
CESDR10 13 500 2.03 1.38 2.0 2.03 1.48 0 4 4 0.16
CESDR11 14 500 1.22 1.48 0.5 1.03 0.74 0 4 4 0.81
CESDR12 15 500 1.38 1.38 1.0 1.23 1.48 0 4 4 0.63
CESDR13 16 500 1.51 1.39 1.0 1.39 1.48 0 4 4 0.53
CESDR14 17 500 0.92 1.29 0.0 0.67 0.00 0 4 4 1.27
CESDR15 18 500 0.39 0.92 0.0 0.13 0.00 0 4 4 2.64
CESDR16 19 500 2.20 1.35 2.0 2.25 1.48 0 4 4 -0.06
CESDR17 20 500 1.82 1.44 2.0 1.77 1.48 0 4 4 0.28
CESDR18 21 500 0.93 1.37 0.0 0.67 0.00 0 4 4 1.23
CESDR19 22 500 2.04 1.44 2.0 2.05 1.48 0 4 4 0.04
CESDR20 23 500 2.13 1.31 2.0 2.16 1.48 0 4 4 0.08
PSQI5a 24 500 2.20 0.97 3.0 2.36 0.00 0 3 3 -0.97
PSQI5b 25 500 2.17 0.95 2.0 2.32 1.48 0 3 3 -0.93
PSQI5c 26 500 1.62 1.12 2.0 1.65 1.48 0 3 3 -0.15
PSQI5d 27 500 0.84 1.00 0.0 0.70 0.00 0 3 3 0.83
PSQI5e 28 500 0.94 1.13 0.0 0.80 0.00 0 3 3 0.77
PSQI5f 29 500 1.80 1.02 2.0 1.87 1.48 0 3 3 -0.39
PSQI5g 30 500 1.10 1.05 1.0 1.00 1.48 0 3 3 0.45
PSQI5h 31 500 1.37 1.06 1.0 1.33 1.48 0 3 3 0.12
PSQI5i 32 500 1.23 1.15 1.0 1.17 1.48 0 3 3 0.32
PSQI7 33 500 0.62 1.08 0.0 0.40 0.00 0 3 3 1.42
PSQI8 34 500 1.68 1.01 2.0 1.73 1.48 0 3 3 -0.27
kurtosis se
ID -1.21 6.46
gender -2.00 0.02
age_group -1.48 0.04
CESDR1 -0.58 0.06
CESDR2 -1.25 0.06
CESDR3 -1.18 0.06
CESDR4 -1.30 0.06
CESDR5 -1.18 0.06
CESDR6 -1.17 0.05
CESDR7 -0.93 0.06
CESDR8 -1.00 0.06
CESDR9 -1.01 0.06
CESDR10 -1.30 0.06
CESDR11 -0.84 0.07
CESDR12 -0.87 0.06
CESDR13 -0.99 0.06
CESDR14 0.35 0.06
CESDR15 6.37 0.04
CESDR16 -1.23 0.06
CESDR17 -1.28 0.06
CESDR18 0.06 0.06
CESDR19 -1.35 0.06
CESDR20 -1.17 0.06
PSQI5a -0.19 0.04
PSQI5b -0.13 0.04
PSQI5c -1.36 0.05
PSQI5d -0.59 0.04
PSQI5e -0.93 0.05
PSQI5f -0.98 0.05
PSQI5g -1.07 0.05
PSQI5h -1.23 0.05
PSQI5i -1.36 0.05
PSQI7 0.38 0.05
PSQI8 -1.00 0.04
- Are there missing data?
- What are the min and max values on each instrument?
- What are the means on these items?
- What would low and what would high mean values signal?
Scale reliability
Remember that a scale needs to be valid and reliable. We can assume validity because we use an established scale. We can check for the reliability in the data.
Scale reliability can be quantified by the Cronbach alpha coefficient.
The rule of thumb is that Cronbach alpha above 0.80 signals a reliable scale.
A Cronbach alpha below 0.80 signals that the use of the scale is questionable.
A Cronbach alpha below 0.60 is problematic and one should avoid using the scale.
Check the CESDR and PSQI scale reliability in the data.
#-cesdr
cesdr_alpha<- df %>% dplyr::select(starts_with("CESDR")) %>%
ltm::cronbach.alpha(.,na.rm=TRUE, CI=TRUE, B=100)
cesdr_alpha
Cronbach's alpha for the '.' data-set
Items: 20
Sample units: 500
alpha: 0.933
Bootstrap 95% CI based on 100 samples
2.5% 97.5%
0.925 0.941
#-psqi
psqi_alpha<-df %>% dplyr::select(PSQI5a:PSQI5i,PSQI7:PSQI8) %>%
ltm::cronbach.alpha(.,na.rm=TRUE, CI=TRUE, B=100)
psqi_alpha
Cronbach's alpha for the '.' data-set
Items: 11
Sample units: 500
alpha: 0.741
Bootstrap 95% CI based on 100 samples
2.5% 97.5%
0.694 0.771
- What are the Cronbach alphas for the two instruments?
- Anything to keep on the back of your mind when interpreting results?
Partial correlation network
Network estimation
Note that the network is estimated only from non-categorical scales and from items that should in theory belong to a psychological system. That is why, we first create a separate data object that contains only the CESDR and PSQI items.
If you choose to focus on only one instrument, you can specify this now.
# select CESDR and PSQI items and assign the new dataset into object short_df
short_df <- df %>%
dplyr::select(CESDR1:PSQI8) Use the newly created dataset short_df and estimate partial correlation network with GLASSO regularization.
net_ebic <- bootnet::estimateNetwork(short_df, # what data to use
tuning = 0.5, # EBIC tuning parameter that specifies how liberal or conservative the estimations should be in dealing with meaningful and non-meaningful edges ## see network parsimony argument
default="EBICglasso") # the network estimation techniqueVisualize the overall network.
plot(net_ebic, # what network estimation model to use
layout="spring", # what layout to use for the network visualization
minimum=0.01, # specify which edge weight to display
groups=list(CESDR=1:20, PSQI=21:31)) # specifies which nodes belong to what instruments ## theoryNote the clutted plot with multiple edges (lines). This plot displays all edges with a minimum r = 0.01. You can modify this in the plot function above by adjusting the argument minimum= ....
Inspect the node centrality indicators plot.
centralityPlot(net_ebic, # what network estimation model to use
include = c("Strength","Betweenness","Closeness"), # what node centrality indicators to include
scale="z-scores", # z-standardize all indicators for comparison purposes
orderBy = "Strength") # by which node centrality indicator should the plot be orderedNetwork stability and accuracy
Estimate edge-weight accuracy and associated 95% confidence interval using a bootstrapping procedure.2
# - edge-weight accuracy
boot1 <- bootnet::bootnet(net_ebic, # what network estimation model to use
nBoots = 100, # number of bootstrapped ## the higher (usually 1,000) the more accurate the results are
nCores=8, # number of processors to use
type="nonparametric") # for 95% confidence intervalsCreate a nice table of the results.
a<-summary(boot1) %>%
as.data.frame() %>%
dplyr::select(type,id,sample,mean, CIlower,CIupper) %>%
arrange(.,mean) %>%
round_df(3) %>% # custom function
filter(sample>=0.10) %>%
filter(type %in% "edge") %>%
dplyr::select(-type) %>%
rename(Edge=id,
"Sample weight"=sample,
"Bootstrapped weight"=mean,
"CI lower bound"=CIlower,
"CI upper bound"=CIupper) %>%
mutate(.,Nr.=row_number(),.before=Edge)
a %>%
knitr::kable(caption = "Bootstrapped 95% CI for edge-weights >= 0.10",format="pipe") | Nr. | Edge | Sample weight | Bootstrapped weight | CI lower bound | CI upper bound |
|---|---|---|---|---|---|
| 1 | CESDR1–CESDR2 | 0.110 | 0.100 | 0.039 | 0.182 |
| 2 | CESDR5–PSQI8 | 0.117 | 0.104 | 0.032 | 0.201 |
| 3 | CESDR11–PSQI5h | 0.111 | 0.106 | 0.021 | 0.201 |
| 4 | CESDR9–CESDR10 | 0.121 | 0.109 | 0.048 | 0.195 |
| 5 | PSQI5c–PSQI5e | 0.118 | 0.109 | 0.036 | 0.199 |
| 6 | PSQI5e–PSQI5i | 0.121 | 0.110 | 0.034 | 0.208 |
| 7 | CESDR3–CESDR6 | 0.111 | 0.111 | 0.026 | 0.196 |
| 8 | PSQI5i–PSQI8 | 0.117 | 0.112 | 0.042 | 0.192 |
| 9 | CESDR16–CESDR20 | 0.117 | 0.112 | 0.040 | 0.193 |
| 10 | PSQI5e–PSQI5g | 0.117 | 0.112 | 0.023 | 0.212 |
| 11 | CESDR8–CESDR20 | 0.121 | 0.113 | 0.054 | 0.188 |
| 12 | CESDR2–CESDR3 | 0.121 | 0.118 | 0.046 | 0.196 |
| 13 | CESDR12–CESDR13 | 0.131 | 0.118 | 0.047 | 0.214 |
| 14 | CESDR7–CESDR10 | 0.121 | 0.119 | 0.046 | 0.196 |
| 15 | PSQI5a–PSQI5b | 0.126 | 0.123 | 0.039 | 0.212 |
| 16 | CESDR16–PSQI8 | 0.126 | 0.125 | 0.053 | 0.198 |
| 17 | PSQI5h–PSQI5i | 0.121 | 0.125 | 0.040 | 0.202 |
| 18 | CESDR14–CESDR17 | 0.134 | 0.125 | 0.066 | 0.202 |
| 19 | PSQI5f–PSQI5i | 0.141 | 0.129 | 0.059 | 0.223 |
| 20 | PSQI5d–PSQI5e | 0.142 | 0.129 | 0.056 | 0.229 |
| 21 | PSQI5f–PSQI8 | 0.140 | 0.129 | 0.044 | 0.237 |
| 22 | CESDR7–CESDR14 | 0.142 | 0.135 | 0.065 | 0.219 |
| 23 | CESDR10–CESDR11 | 0.152 | 0.138 | 0.072 | 0.231 |
| 24 | CESDR16–CESDR17 | 0.150 | 0.145 | 0.075 | 0.225 |
| 25 | CESDR8–CESDR10 | 0.169 | 0.162 | 0.096 | 0.242 |
| 26 | CESDR5–CESDR16 | 0.168 | 0.162 | 0.088 | 0.248 |
| 27 | PSQI5d–PSQI5i | 0.167 | 0.166 | 0.082 | 0.252 |
| 28 | CESDR12–CESDR16 | 0.172 | 0.168 | 0.097 | 0.247 |
| 29 | CESDR4–CESDR7 | 0.199 | 0.184 | 0.125 | 0.272 |
| 30 | CESDR10–CESDR20 | 0.190 | 0.191 | 0.121 | 0.259 |
| 31 | CESDR13–PSQI5d | 0.237 | 0.223 | 0.153 | 0.321 |
| 32 | CESDR5–CESDR19 | 0.237 | 0.230 | 0.153 | 0.321 |
| 33 | PSQI5b–PSQI5c | 0.269 | 0.249 | 0.197 | 0.341 |
| 34 | CESDR2–CESDR4 | 0.268 | 0.256 | 0.182 | 0.354 |
| 35 | CESDR4–CESDR6 | 0.281 | 0.276 | 0.195 | 0.366 |
| 36 | CESDR1–CESDR18 | 0.310 | 0.292 | 0.225 | 0.394 |
| 37 | CESDR9–CESDR17 | 0.341 | 0.326 | 0.253 | 0.429 |
| 38 | CESDR2–CESDR6 | 0.341 | 0.334 | 0.258 | 0.425 |
| 39 | CESDR19–PSQI5a | 0.351 | 0.340 | 0.268 | 0.435 |
| 40 | CESDR7–CESDR8 | 0.366 | 0.353 | 0.280 | 0.452 |
| 41 | CESDR3–CESDR20 | 0.374 | 0.354 | 0.303 | 0.446 |
| 42 | CESDR14–CESDR15 | 0.450 | 0.443 | 0.380 | 0.520 |
Note that “weight” is terminology used in mathematics but is nothing else than the “thickness” of lines in the network plot. Thicker lines equate higher values on the weight coefficient.
- How would you explain differences between “sample weight” and “Bootstrapped weight”?
- Are there edges with an associated 95% CI that contains 0 (zero)? Remember that 95% CI that contain 0 (zero) signal non-significant results
Estimate node centrality.
## node centrality ---
boot2 <- bootnet::bootnet(net_ebic, # what network estimation model to use,
nBoots = 100, # number of bootstrapped ## the higher (usually 1,000) the more accurate the results are
nCores=8, # number of processors to use
type="case", # specifies that the results should account for node centrality
statistics = c('strength','betweenness','closeness')) # for which node centrality indicatorsInspect the Correlation Stability (CS) coefficient before interpreting node centrality indicators (see Epskamp et al., 2018). Remember that CS values of at least 0.50 are required.
CS<-bootnet::corStability(boot2,
verbose = F)
CSbetweenness closeness strength
0.438 0.516 0.672
- Which node centrality indicators can be interpreted?
Plot the correlation stability coefficients at varying percentages of data omission.
plot(boot2, "all")- How do you interpret this plot?
Relative importance network
The difference from partial correlation network is that this type of network can specify the direction of association.
Network estimation
Estimate the relative importance network by simply modifying the code above.
Note: The estimation will take considerably more time than with the partial correlation network. Why? Because there is much more to estimate due to the bi-directional nature of this network type. If you decide to estimate relative importance networks on the entire dataset be aware that this might take even hours!
To this end, and for the purpose of the seminar, we estimate the relative importance network only for the first ten CESDR items.
## select only first 10 CESDR items
short2_df <- short_df %>% select(CESDR1:CESDR10)
## estimate relative importance network
net_relimp<-bootnet::estimateNetwork(short2_df, # what data to use
default="relimp", # the network estimation technique
verbose=FALSE)Visualize the relative importance network.
plot(net_relimp, # what network estimation model to use
layout="spring", # what layout to use for the network visualization
minimum=0.01 ) # specify which edge weight to displayNote: For the purpose of this seminar, we will focus on partial correlations network. Cross-sectional data are most commonly available and the existent tools in handling this sort of data are easily accessible given tutorial material.
On a different note, the relative importance networks as well as the Bayesian directed acylic graph networks require additional methodological input which goes beyond the purpose of the seminar.
For the relative importance network, it would suffice for this seminar to inspect the network plot. It should be relatively easy to determine which edge direction is stronger (see arrow thickness). The thickness of an arrow signals the relative importance which is quantified into the lmg coefficient ranging from 0 to 1.
Network comparison
If your research question involves group comparison, note that with the current tools, only comparisons between two groups are possible. Also that missing data needs to be handled beforehand, either imputed or list-wise deleted. For this seminar, we use list-wise deletion as data imputation requires additional methodological input that goes beyond the purpose of the seminar.
We create separate datasets based on gender of participants and drop all missing infromation.
## note that we use the original data because gender was dropped in short_df and short2_df
## create one dataset for gender 1=men
men_df <- df %>% filter(gender==1) %>% dplyr::select(CESDR1:PSQI8) %>% drop_na()
## create one dataset for gender 2=women
fem_df <- df %>% filter(gender==2) %>% dplyr::select(CESDR1:PSQI8) %>% drop_na()We compare the networks estimated in these data. Note that usually we’d need to apply a post-hoc correction to correct to multiple testing. In explorative research, this is not obligatory.
# note: here we do not use the post-correction Bonferroni
nct_gndr<-NetworkComparisonTest::NCT(men_df,fem_df,
gamma=0,
it=100, # number of iterations
#p.adjust.methods=bonferroni,
test.edges=TRUE,
edges="all", # either all or specific edges to be tested
centrality=c("strength","betweenness","closeness"), # node centrality to be compared
progressbar = FALSE)We also estimate separate networks for men and women to visualize the networks and plot node centrality indicators.
net_men<-bootnet::estimateNetwork(men_df, default="EBICglasso")
net_fem<-bootnet::estimateNetwork(fem_df, default="EBICglasso")We create an average layout template that will force the plot generator to display results in all groups on the same template. Useful for interpretation purposes.3
## make sure you include also the overall network!
L_gndr<-qgraph::averageLayout(net_ebic,net_men,net_fem) Visualize the networks for men and women as well as display the node centrality indicators for men and women.
## network men
plot(net_men, # what network estimation model to use
layout=L_gndr, # we use the L_gndr template created above to ensure comparability with the overall sample
minimum=0.01, # specify which edge weight to display
groups=list(CESDR=1:20, PSQI=21:31)) # specifies which nodes belong to what instruments ## theory
## network women
plot(net_fem, # what network estimation model to use
layout=L_gndr, # we use the L_gndr template created above to ensure comparability with the overall sample
minimum=0.01, # specify which edge weight to display
groups=list(CESDR=1:20, PSQI=21:31)) # specifies which nodes belong to what instruments ## theoryNetwork of depressive symptoms and sleep deprivation
Create the plots for node centrality indicators for men and women.
## Note that in the code three node centrality indicators are estimated. But based on the CS coefficient which can be interpred?
## for men
qgraph::centralityPlot(net_men,include=c("Strength","Betweenness","Closeness"),scale="z-scores",orderBy="Strength")
## for women
qgraph::centralityPlot(net_fem,include=c("Strength","Betweenness","Closeness"),scale="z-scores",orderBy="Strength")Node centrality indicators for networks of depressive symptoms and sleep deprivation
- Simply by inspecting these plots, is there anything that stands out to you?
Results of network comparison
Global tests
We might be interested in global tests of edge strength (S-test) and of network structure invariance (M-test).
The S-test provides evidence at the global level (overall over all edges in the network) whether there is a significant difference in the edge strength between the compared groups.
The M-test provides evidence at the global level (overall over all nodes in the network) whether there is a significant difference in the network structure between the compared groups.
The network structure first.
The value of the M-test is:
Mtest=nct_gndr$nwinv.real %>% round_df()
Mtest[1] 0.21
and the p-value associated with the test is:
Mtest_pval=nct_gndr$nwinv.pval %>% round_df()
Mtest_pval[1] 0.35
- How do you interpret the results of the M-test?
The edge strength next.
The value of the S-test is:
Stest=nct_gndr$glstrinv.real %>% round_df()
Stest[1] 0.08
for global strength in the compared networks with values of:
glbl_strngth=nct_gndr$glstrinv.sep %>% round_df()
glbl_strngth[1] 13.58 13.50
and the p-value associated with the test is:
Stest_pval<-nct_gndr$glstrinv.pval %>% round_df()
Stest_pval[1] 0.91
- How do you interpret the results of the S-test?
Edge comparison
Show in a table all invariant edges, edges that are significantly different between the groups at a p-value smaller than 0.05. The E-test is constructed in a permutation approach4.
The E-test compares all edges between the two groups.
## run this line for all significant invariant edges
# nct_gndr$einv.pvals
## table with edges that differ between genders at p < 0.05
tbl_gndr<-as_tibble(nct_gndr$einv.pvals) %>%
mutate_if(is.numeric,
round,
digits=2) %>%
filter(`p-value`<0.05)
DT::datatable(tbl_gndr)- How many edges differ between the two groups?
- Which edge differences stand out to you, and how do you interpret them?
- Are there significant edges across depressive symptoms and sleep deprivation, which are they and how do you intepret these?
Your research question
- Were you able to find adequate answers to your research question?
- What information could you identify in the plots, tables, results?
- Any limitations that stand out?
- Was the partial correlation network adequate to your needs?
General tips
Remember that in this seminar we only covered partial correlation networks that have their limitations and advantages. There are other types of psychological networks that can be examined, and it depends on your research question and available data. These other types of networks are relatively more complex to estimate and intepret.
For further reading, I recommend starting with McNally (2016),Robinaugh et al. (2020), Borsboom et al. (2021)
For tutorials in R: Costantini et al. (2017), Epskamp et al. (2018)
References
Footnotes
Note that the symbol
%>%is the pipe operator from therpackagetidyverse. The pipe operator%>%facilitates chaining code onto outcome of previous code.↩︎Read the Wiki page on what Bootstrapping is https://en.wikipedia.org/wiki/Bootstrapping_(statistics)↩︎
You probably noticed by now that the plots are not quite aligned just yet. This is because we’ve created the ‘master layout template’ only after we’ve estimated the networks for men and women separately. What would you do to have the overall network and networks for each gender aligned?↩︎
Read a mathematical explanation of permutation here https://stats.libretexts.org/Bookshelves/Probability_Theory/Introductory_Probability_(Grinstead_and_Snell)/03%3A_Combinatorics/3.01%3A_Permutations↩︎