Introduction to the PRONA R Package

This vignette gives an overview of how to use the PRONA (Patient Reported Outcomes Network Analysis) R package. PRONA allows users to input their own Patient Reported Outcomes (PRO) data on symptom occurrence/frequency/severity and use Network Analysis (NA) to identify complex symptom concordance patterns and symptom clusters. For an example of how our group has used NA to characterize symptom patterns in cancer patients, please see Bergsneider et al., Neuro-Oncology Advances, 2022 and Bergsneider et al., Cancer Medicine, 2024. The steps outlined in this vignette follow the analysis pipeline described in those papers, which also give more background on NA and its applications to PRO data.

1. Load PRONA and data to analyze

First, we need to load the PRONA package and data to analyze. Instructions on how to download the PRONA package are at https://github.com/bbergsneider/PRONA. Once you have downloaded the PRONA package, you can load it by running:

library(PRONA)

Next, let’s load a dataset to analyze. In order to be analyzed with PRONA, data must be in a dataframe formatted with the following specifications:

Each row should represent a single patient.
The first column must be named “ID” and contain unique identifications (numbers or strings) for each patient.
The remaining columns should represent symptoms, be named after the symptom, and must contain numerical values.
There should not be any NA values in the dataframe.
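The formatting rules above can be checked before any analysis is run. The following is a minimal sketch in base R; `check_prona_format` and the toy dataframe are hypothetical names used for illustration and are not part of PRONA.

```r
# Hypothetical helper (not part of PRONA): checks a dataframe against the
# formatting rules listed above and stops with a message on the first violation.
check_prona_format <- function(df) {
  stopifnot(is.data.frame(df), ncol(df) >= 2)
  if (names(df)[1] != "ID") stop("The first column must be named 'ID'.")
  if (anyDuplicated(df$ID) > 0) stop("IDs must be unique.")
  symptom_cols <- df[, -1, drop = FALSE]
  if (!all(vapply(symptom_cols, is.numeric, logical(1)))) {
    stop("All symptom columns must be numeric.")
  }
  if (any(is.na(df))) stop("The dataframe must not contain NA values.")
  invisible(TRUE)
}

# Toy data for illustration only
toy_df <- data.frame(
  ID = c("P1", "P2", "P3"),
  FATIGUE = c(0, 2, 3),
  NAUSEA = c(1, 0, 2)
)
check_prona_format(toy_df)
```

Running a check like this first gives a clear error message instead of a failure later in the pipeline.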

For this example, we will analyze publicly available data on the frequency of post-traumatic stress disorder (PTSD) symptoms among female drug users. To access this data and run through this tutorial yourself, please go to the NIH National Institute on Drug Abuse Data Share Website, click on “CTN-0015 Data Files”, fill out the required data sharing agreement, and navigate to the file titled “qs.csv”. To convert the data to the format PRONA expects, use the function format_ptsd_data(). The idea for using this dataset as an example and the code for formatting it are adapted from Epskamp, Borsboom & Fried, Behavior Research Methods, 2018.

# Replace this filepath with where you downloaded the qs.csv file
df <- read.csv('../../PRONA_additional_files/original_data/qs.csv', stringsAsFactors = FALSE)
df <- format_ptsd_data(df)

PRONA requires that dataframes have no NA values in them, so we will check if this dataframe has any NA values and then replace any NA values with 0.

print(any(is.na(df)))

## [1] TRUE

df <- replace(df, is.na(df), 0)
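Replacing NA values with 0 treats every missing answer as an absent symptom, so it can be worth looking at how much data is missing before doing so. The sketch below uses only base R and a made-up toy dataframe (not the CTN-0015 data).

```r
# Toy data for illustration only (not the CTN-0015 data)
toy_df <- data.frame(
  ID = 1:4,
  SYMPTOM.A = c(0, NA, 2, 3),
  SYMPTOM.B = c(1, 1, NA, NA)
)

# Number of missing values per column
na_per_column <- colSums(is.na(toy_df))
print(na_per_column)

# Number of patients (rows) with at least one missing value
n_incomplete <- sum(!complete.cases(toy_df))
print(n_incomplete)

# Same replacement as used in this vignette
toy_df <- replace(toy_df, is.na(toy_df), 0)
```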

2. Plot the frequency and occurrence of all symptoms

Now that we have the data loaded, let’s look at the frequency and occurrence of each symptom across all patients:

plot_frequency(df)

plot_occurrence(df)
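For readers who want the numbers behind plots like these, the sketch below computes two simple summaries in base R. It assumes that occurrence means the proportion of patients with a non-zero score and that frequency is summarized by the column mean; the PRONA plotting functions may summarize the data differently. The toy dataframe is made up for illustration.

```r
# Toy data for illustration only
toy_df <- data.frame(
  ID = 1:5,
  SYMPTOM.A = c(0, 1, 3, 0, 2),
  SYMPTOM.B = c(0, 0, 0, 1, 0)
)

# Drop the ID column so only symptom columns remain
symptoms <- toy_df[, -1, drop = FALSE]

# Assumed definition: proportion of patients reporting the symptom at all
occurrence <- colMeans(symptoms > 0)

# Assumed summary: mean reported score per symptom
mean_frequency <- colMeans(symptoms)

summary_table <- data.frame(occurrence, mean_frequency)
print(summary_table[order(-summary_table$occurrence), ])
```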

3. Unique Variable Analysis

When working with PRO data, some symptom measurements often overlap with one another and actually capture the same underlying latent variable. Overlapping symptoms can skew network structure, network centrality measurements, and symptom clustering. To avoid this, we can use Unique Variable Analysis (UVA), a method that identifies redundant variables in a network by assessing their topological overlap and consolidates them into a single underlying variable using the maximum likelihood with robust standard errors (MLR) estimator. For more details on UVA, see Christensen, Garrido & Golino, Multivariate Behavioral Research, 2023.

We have implemented UVA in PRONA through the calculate_wTO, plot_wTO, and perform_uva_consolidation functions.

First, we will calculate and plot the weighted topological overlap (wTO) for the top 20 variable pairs:

wTO_results <- calculate_wTO(df)

plot_wTO(wTO_results)
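To give an intuition for what a topological overlap value measures, here is a minimal base R sketch of one common signed formulation: for nodes i and j, wTO = (sum over u of a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - |a_ij|), where a is the network weight matrix and k_i is the sum of absolute weights of node i. This is an assumption for illustration; calculate_wTO may differ in its details (for example, in how the underlying network is estimated). The function name and toy matrix are hypothetical.

```r
# Hypothetical helper: weighted topological overlap for a symmetric weight matrix
weighted_topological_overlap <- function(A) {
  stopifnot(is.matrix(A), nrow(A) == ncol(A))
  diag(A) <- 0
  k <- rowSums(abs(A))                      # connectivity of each node
  shared <- A %*% A                         # sum over u of a_iu * a_uj
  denom <- outer(k, k, pmin) + 1 - abs(A)   # min(k_i, k_j) + 1 - |a_ij|
  wto <- (shared + A) / denom
  diag(wto) <- 0
  wto
}

# Toy network weights for illustration only
nodes <- c("X1", "X2", "X3")
A <- matrix(c(0.0, 0.5, 0.4,
              0.5, 0.0, 0.3,
              0.4, 0.3, 0.0),
            nrow = 3, byrow = TRUE, dimnames = list(nodes, nodes))

wto <- weighted_topological_overlap(A)
print(round(wto, 3))
```

Pairs of nodes that are strongly connected to each other and share the same neighbors get high values, which is why high wTO flags potentially redundant variables.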

Next, we will consolidate redundant variables. The default wTO threshold for considering variable pairs as potentially redundant is 0.25, but other factors should also be considered when choosing this threshold, such as (1) whether the variables are at the extreme high or low ends of severity/frequency/occurrence (variables at the extremes can exhibit false correlations with other variables due to their limited variance) and (2) whether the variables have considerable conceptual overlap. In this example, the two variable pairs with the highest wTO (BEING.JUMPY.OR.EASILY.STARTLED-BEING.OVER.ALERT and UPSET.WHEN.REMINDED.OF.TRAUMA-UPSETTING.THOUGHTS.OR.IMAGES) seem to have considerable conceptual overlap, but the third variable pair (DISTANT.OR.CUT.OFF.FROM.PEOPLE-LESS.INTEREST.IN.ACTIVITIES) has less conceptual overlap, so we will set a cutoff threshold of 0.275 to consolidate only the top two variable pairs.

The parameters for the perform_uva_consolidation function are:

df: A dataframe representing symptom severity/frequency
scale: The scale on which symptom severity/frequency is measured. In our case, PTSD frequency is measured on a 0-3 scale, so scale = 3. To give another example, if your data is measured on a 0-10 scale, then scale = 10.
cut.off: The cut-off used to determine when pairwise wTO values are considered redundant. Must be between 0 and 1. (Default: 0.25)
reduce.method: Method used to reduce redundancies. Available options include “latent”, “mean”, “remove”, and “sum.” See Christensen et al. for more details. We recommend using “latent” and have set it as the default.
new.names: Vector of new names to give to consolidated variables. Variable pairs will be renamed in descending order of wTO. If this vector is not given, new variable pairs will be renamed “CV1”, “CV2”, etc… Moreover, if reduce.method = “remove”, this vector will not be used. (Default: NULL)
output_dir: The directory in which the .RDS and .csv outputs of perform_uva_consolidation should be saved.
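As a conceptual illustration of the simpler reduce.method options, the sketch below consolidates one redundant pair by taking the row-wise mean or sum in base R. It is not the package's implementation, and it does not cover the recommended “latent” option, which estimates a latent variable score instead. The helper `consolidate_pair` and the toy data are hypothetical.

```r
# Toy data for illustration only
toy_df <- data.frame(
  ID = 1:4,
  JUMPY = c(0, 1, 3, 2),
  OVER.ALERT = c(1, 1, 2, 3),
  NUMB = c(0, 0, 1, 2)
)

# Hypothetical helper: replace a redundant pair with a single combined column
consolidate_pair <- function(df, pair, new_name, method = c("mean", "sum")) {
  method <- match.arg(method)
  combined <- switch(method,
    mean = rowMeans(df[, pair]),
    sum = rowSums(df[, pair])
  )
  df[pair] <- NULL            # drop the two original columns
  df[[new_name]] <- combined  # append the consolidated variable
  df
}

reduced_mean <- consolidate_pair(toy_df, c("JUMPY", "OVER.ALERT"),
                                 "JUMPY.OVER.ALERT", method = "mean")
reduced_sum <- consolidate_pair(toy_df, c("JUMPY", "OVER.ALERT"),
                                "JUMPY.OVER.ALERT", method = "sum")
print(reduced_mean)
```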

uva_results <- perform_uva_consolidation(df, scale = 3, cut.off = 0.275, reduce.method = "latent", new.names = c('JUMPY.STARTLED.OVER.ALERT', 'UPSETTING.REMINDERS'), output_dir = '../../PRONA_additional_files/test_scripts/output')

## Estimating latent variables...done.

perform_uva_consolidation saves the reduced data in a .csv file titled “reduced_data.csv” in the output directory. Let’s read in this data and look at the new frequency and occurrence distributions:

reduced_df <- read.csv('../../PRONA_additional_files/test_scripts/output/reduced_data.csv', stringsAsFactors = FALSE)

plot_frequency(reduced_df)

plot_occurrence(reduced_df)

4. Check variable normality

Before constructing a network, all variables should be checked for normality. Variables that are not normally distributed should be passed through a nonparanormal transformation before network construction. To check whether symptoms are normally distributed, we use the Shapiro-Wilk normality test. A p-value below 0.05 suggests that the variable is not normally distributed. In our dataset, every variable has a p-value far below 0.05, so all variables are non-normally distributed:

plot_density(reduced_df)

## Picking joint bandwidth of 0.284

check_normality(reduced_df)

## $AVOID.REMINDERS.OF.THE.TRAUMA
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.80508, p-value < 2.2e-16
## 
## 
## $BAD.DREAMS.ABOUT.THE.TRAUMA
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.79406, p-value < 2.2e-16
## 
## 
## $DISTANT.OR.CUT.OFF.FROM.PEOPLE
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.83498, p-value < 2.2e-16
## 
## 
## $FEELING.EMOTIONALLY.NUMB
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.82247, p-value < 2.2e-16
## 
## 
## $FEELING.IRRITABLE
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.86122, p-value < 2.2e-16
## 
## 
## $FEELING.PLANS.WONT.COME.TRUE
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.79143, p-value < 2.2e-16
## 
## 
## $HAVING.TROUBLE.CONCENTRATING
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.85827, p-value < 2.2e-16
## 
## 
## $HAVING.TROUBLE.SLEEPING
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.80125, p-value < 2.2e-16
## 
## 
## $LESS.INTEREST.IN.ACTIVITIES
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.79099, p-value < 2.2e-16
## 
## 
## $NOT.ABLE.TO.REMEMBER
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.67634, p-value < 2.2e-16
## 
## 
## $NOT.THINKING.ABOUT.TRAUMA
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.83094, p-value < 2.2e-16
## 
## 
## $PHYSICAL.REACTIONS
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.84814, p-value < 2.2e-16
## 
## 
## $RELIVING.THE.TRAUMA
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.74084, p-value < 2.2e-16
## 
## 
## $JUMPY.STARTLED.OVER.ALERT
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.91583, p-value = 2.756e-13
## 
## 
## $UPSETTING.REMINDERS
## 
##  Shapiro-Wilk normality test
## 
## data:  newX[, i]
## W = 0.94735, p-value = 5.265e-10
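When there are many symptoms, the per-variable test output above can be long. As a sketch, the same test can be run with base R's shapiro.test and condensed into a vector of p-values; the simulated 0-3 scores below are made up for illustration and are not the PTSD data.

```r
set.seed(1)

# Simulated 0-3 symptom scores for illustration only
toy_df <- data.frame(
  ID = 1:300,
  SYMPTOM.A = sample(0:3, 300, replace = TRUE, prob = c(0.5, 0.3, 0.15, 0.05)),
  SYMPTOM.B = sample(0:3, 300, replace = TRUE)
)

# Shapiro-Wilk p-value for every symptom column (ID column excluded)
p_values <- vapply(
  toy_df[, -1, drop = FALSE],
  function(x) shapiro.test(x)$p.value,
  numeric(1)
)

# Symptoms for which normality is rejected at the 0.05 level
non_normal <- names(p_values)[p_values < 0.05]
print(signif(p_values, 3))
print(non_normal)
```

Scores on a short ordinal scale such as 0-3 will almost always fail this test, which is consistent with the results shown above.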

5. Construct a Gaussian Graphical Model Network

Now that we are done with all the pre-processing steps, we can construct a Gaussian Graphical Model (GGM) network for our data. Because our variables are non-normally distributed, we set normal = FALSE so that a nonparanormal transformation is applied. Symptom clusters (communities) are identified with the walktrap algorithm:

ptsd_network <- construct_ggm(reduced_df, normal = FALSE)

## Loading required namespace: huge

## Conducting nonparanormal (npn) transformation via skeptic....done.
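The console message above mentions the “skeptic” nonparanormal transformation. As a rough sketch of the idea, the skeptic estimator for Spearman's rho replaces the ordinary correlation matrix with 2 * sin(pi / 6 * rho), which depends only on ranks and is therefore unaffected by monotone distortions of each variable. The base R code below illustrates this on simulated skewed data; it is not PRONA's internal code, and the variable names are made up.

```r
set.seed(42)

# Simulated skewed (log-normal) scores for illustration only
latent <- rnorm(200)
toy_scores <- data.frame(
  SYMPTOM.A = exp(latent + rnorm(200, sd = 0.5)),
  SYMPTOM.B = exp(latent + rnorm(200, sd = 0.5)),
  SYMPTOM.C = exp(rnorm(200))
)

# Rank-based correlation, then the skeptic transformation
rho <- cor(toy_scores, method = "spearman")
skeptic_cor <- 2 * sin(pi / 6 * rho)

print(round(skeptic_cor, 2))
```

The resulting matrix can then be used in place of the Pearson correlation matrix when estimating the network.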

plot_ggm(ptsd_network, label.size = 2.5)

head(get_ggm_weights(ptsd_network))

##                              row                            col     weight
## 1  AVOID.REMINDERS.OF.THE.TRAUMA    BAD.DREAMS.ABOUT.THE.TRAUMA 0.04382343
## 2  AVOID.REMINDERS.OF.THE.TRAUMA DISTANT.OR.CUT.OFF.FROM.PEOPLE 0.00000000
## 3    BAD.DREAMS.ABOUT.THE.TRAUMA DISTANT.OR.CUT.OFF.FROM.PEOPLE 0.00000000
## 4  AVOID.REMINDERS.OF.THE.TRAUMA       FEELING.EMOTIONALLY.NUMB 0.05065381
## 5    BAD.DREAMS.ABOUT.THE.TRAUMA       FEELING.EMOTIONALLY.NUMB 0.00000000
## 6 DISTANT.OR.CUT.OFF.FROM.PEOPLE       FEELING.EMOTIONALLY.NUMB 0.21954089

The construct_ggm function uses the EBIC-GLASSO method for calculating a regularized GGM, and it uses the same method as the EGAnet package for defining gamma. Specifically, gamma is initially set to 0.5 and a network is constructed. If there are any unconnected nodes in the network, gamma is decreased to 0.25. If there are still any unconnected nodes, gamma is decreased to 0. To check what gamma was ultimately used, check the summary of the network (in this case gamma = 0.5):
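The gamma fallback described above can be sketched as a simple loop. In the code below, `select_gamma` and `toy_fit` are hypothetical: `toy_fit` is a stand-in for a real EBIC-GLASSO estimator and simply drops weak edges more aggressively for larger gamma, so the example runs without any extra packages.

```r
# Hypothetical sketch: `fit_network` is any function that takes gamma and
# returns a symmetric weight matrix
select_gamma <- function(fit_network, gammas = c(0.5, 0.25, 0)) {
  for (gamma in gammas) {
    weights <- fit_network(gamma)
    diag(weights) <- 0
    # A node is unconnected if all of its edge weights are zero
    has_unconnected_node <- any(rowSums(weights != 0) == 0)
    if (!has_unconnected_node) break
  }
  list(gamma = gamma, weights = weights)
}

# Toy stand-in estimator for illustration: larger gamma removes weaker edges
toy_fit <- function(gamma) {
  w <- matrix(c(0.00, 0.30, 0.08,
                0.30, 0.00, 0.00,
                0.08, 0.00, 0.00),
              nrow = 3, byrow = TRUE)
  w[abs(w) < gamma / 5] <- 0
  w
}

result <- select_gamma(toy_fit)
print(result$gamma)
```

If no gamma yields a fully connected network, the loop ends with the last value tried, which matches the final fallback to gamma = 0 described above.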

summary(ptsd_network)

## Model: GLASSO (EBIC with gamma = 0.5)
## Correlations: auto
## Lambda: 0.0582056231472171 (n = 100, ratio = 0.1)
## 
## Number of nodes: 15
## Number of edges: 69
## Edge density: 0.657
## 
## Non-zero edge weights: 
##      M    SD    Min   Max
##  0.088 0.065 -0.019 0.331
## 
## ----
## 
## Algorithm:  Walktrap
## 
## Number of communities:  2
## 
##  AVOID.REMINDERS.OF.THE.TRAUMA    BAD.DREAMS.ABOUT.THE.TRAUMA 
##                              1                              1 
## DISTANT.OR.CUT.OFF.FROM.PEOPLE       FEELING.EMOTIONALLY.NUMB 
##                              2                              2 
##              FEELING.IRRITABLE   FEELING.PLANS.WONT.COME.TRUE 
##                              2                              2 
##   HAVING.TROUBLE.CONCENTRATING        HAVING.TROUBLE.SLEEPING 
##                              2                              2 
##    LESS.INTEREST.IN.ACTIVITIES           NOT.ABLE.TO.REMEMBER 
##                              2                              1 
##      NOT.THINKING.ABOUT.TRAUMA             PHYSICAL.REACTIONS 
##                              1                              1 
##            RELIVING.THE.TRAUMA      JUMPY.STARTLED.OVER.ALERT 
##                              1                              1 
##            UPSETTING.REMINDERS 
##                              1 
## 
## ----
## 
## Unidimensional Method: Louvain
## Unidimensional: No
## 
## ----
## 
## TEFI: -8.563

6. Calculate network centrality measurements

Finally, we can calculate centrality and bridge centrality measurements for the network such as strength, closeness, betweenness, and expected influence.

calculate_centralities(ptsd_network)

##                                Betweenness   Closeness  Strength
## AVOID.REMINDERS.OF.THE.TRAUMA            2 0.005394104 0.7209867
## BAD.DREAMS.ABOUT.THE.TRAUMA              3 0.004688610 0.7233970
## DISTANT.OR.CUT.OFF.FROM.PEOPLE           7 0.005048337 1.0370721
## FEELING.EMOTIONALLY.NUMB                 4 0.004714444 0.9010318
## FEELING.IRRITABLE                        1 0.004740006 0.6806787
## FEELING.PLANS.WONT.COME.TRUE             1 0.004379697 0.7307798
## HAVING.TROUBLE.CONCENTRATING            10 0.005583428 0.8197532
## HAVING.TROUBLE.SLEEPING                  0 0.003831261 0.4504405
## LESS.INTEREST.IN.ACTIVITIES             11 0.005102076 0.9545680
## NOT.ABLE.TO.REMEMBER                     0 0.004089033 0.4980620
## NOT.THINKING.ABOUT.TRAUMA                3 0.004985803 0.8663101
## PHYSICAL.REACTIONS                       4 0.004998368 0.9527918
## RELIVING.THE.TRAUMA                      0 0.004678243 0.6797603
## JUMPY.STARTLED.OVER.ALERT                9 0.005677806 0.9525108
## UPSETTING.REMINDERS                     19 0.005513275 1.2226384
##                                ExpectedInfluence
## AVOID.REMINDERS.OF.THE.TRAUMA          0.7209867
## BAD.DREAMS.ABOUT.THE.TRAUMA            0.7233970
## DISTANT.OR.CUT.OFF.FROM.PEOPLE         1.0370721
## FEELING.EMOTIONALLY.NUMB               0.9010318
## FEELING.IRRITABLE                      0.6806787
## FEELING.PLANS.WONT.COME.TRUE           0.7307798
## HAVING.TROUBLE.CONCENTRATING           0.8197532
## HAVING.TROUBLE.SLEEPING                0.4128431
## LESS.INTEREST.IN.ACTIVITIES            0.9545680
## NOT.ABLE.TO.REMEMBER                   0.4604646
## NOT.THINKING.ABOUT.TRAUMA              0.8663101
## PHYSICAL.REACTIONS                     0.9527918
## RELIVING.THE.TRAUMA                    0.6797603
## JUMPY.STARTLED.OVER.ALERT              0.9525108
## UPSETTING.REMINDERS                    1.2226384

plot_centralities(ptsd_network)
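In the table above, Strength and ExpectedInfluence are identical for every symptom except HAVING.TROUBLE.SLEEPING and NOT.ABLE.TO.REMEMBER. This is what one would expect if the network contains a small negative edge involving those two nodes (the network summary reports a minimum edge weight of -0.019): in their usual definitions, strength sums the absolute edge weights of a node, while one-step expected influence sums the signed weights. The base R sketch below shows the difference on a made-up weight matrix; it is not the package's implementation.

```r
# Toy signed weight matrix for illustration only
nodes <- c("A", "B", "C")
W <- matrix(c( 0.0, 0.3, -0.1,
               0.3, 0.0,  0.2,
              -0.1, 0.2,  0.0),
            nrow = 3, byrow = TRUE, dimnames = list(nodes, nodes))

# Strength: sum of absolute edge weights attached to each node
strength <- rowSums(abs(W))

# One-step expected influence: sum of signed edge weights
expected_influence <- rowSums(W)

print(data.frame(strength, expected_influence))
```

Nodes A and C share the negative edge, so their expected influence is lower than their strength, while node B has only positive edges and the two measures coincide.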

calculate_bridge_centralities(ptsd_network)

##                                Bridge.Strength Bridge.Betweenness
## AVOID.REMINDERS.OF.THE.TRAUMA       0.14699512                  2
## BAD.DREAMS.ABOUT.THE.TRAUMA         0.08865256                  3
## DISTANT.OR.CUT.OFF.FROM.PEOPLE      0.17723582                  4
## FEELING.EMOTIONALLY.NUMB            0.22434356                  4
## FEELING.IRRITABLE                   0.10516660                  0
## FEELING.PLANS.WONT.COME.TRUE        0.07706336                  0
## HAVING.TROUBLE.CONCENTRATING        0.28731230                  8
## HAVING.TROUBLE.SLEEPING             0.21391505                  0
## LESS.INTEREST.IN.ACTIVITIES         0.34964458                 10
## NOT.ABLE.TO.REMEMBER                0.16221293                  0
## NOT.THINKING.ABOUT.TRAUMA           0.13155978                  0
## PHYSICAL.REACTIONS                  0.10712227                  0
## RELIVING.THE.TRAUMA                 0.05268426                  0
## JUMPY.STARTLED.OVER.ALERT           0.40945364                  9
## UPSETTING.REMINDERS                 0.33600071                 13
##                                Bridge.Closeness
## AVOID.REMINDERS.OF.THE.TRAUMA        0.06404771
## BAD.DREAMS.ABOUT.THE.TRAUMA          0.04902967
## DISTANT.OR.CUT.OFF.FROM.PEOPLE       0.05565030
## FEELING.EMOTIONALLY.NUMB             0.05286795
## FEELING.IRRITABLE                    0.05068850
## FEELING.PLANS.WONT.COME.TRUE         0.04724254
## HAVING.TROUBLE.CONCENTRATING         0.07012627
## HAVING.TROUBLE.SLEEPING              0.04989286
## LESS.INTEREST.IN.ACTIVITIES          0.06008296
## NOT.ABLE.TO.REMEMBER                 0.04773757
## NOT.THINKING.ABOUT.TRAUMA            0.05134075
## PHYSICAL.REACTIONS                   0.05145955
## RELIVING.THE.TRAUMA                  0.04847066
## JUMPY.STARTLED.OVER.ALERT            0.07585530
## UPSETTING.REMINDERS                  0.05735682

plot_bridge_centralities(ptsd_network)
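Bridge centralities restrict the usual centrality measures to edges that cross community boundaries. As a minimal sketch, bridge strength in its usual definition is the sum of a node's absolute edge weights to nodes in other communities. The weight matrix and community assignments below are made up for illustration, and the code is not the package's implementation.

```r
# Toy weight matrix and community assignments for illustration only
nodes <- c("A", "B", "C", "D")
W <- matrix(c(0.0, 0.4, 0.1, 0.0,
              0.4, 0.0, 0.2, 0.0,
              0.1, 0.2, 0.0, 0.3,
              0.0, 0.0, 0.3, 0.0),
            nrow = 4, byrow = TRUE, dimnames = list(nodes, nodes))
communities <- c(A = 1, B = 1, C = 2, D = 2)

# TRUE where the row node and column node belong to different communities
other_community <- outer(communities, communities, "!=")

# Bridge strength: absolute weights summed over cross-community edges only
bridge_strength <- rowSums(abs(W) * other_community)

print(bridge_strength)
```

Node C has the highest bridge strength here because it is the only node in its community with edges to both nodes of the other community, whereas node D connects only within its own community.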