Social Network Analysis using Stochastic Actor-Oriented Models (SAOM)

# Load required packages
library(RSiena)

## Warning: package 'RSiena' was built under R version 4.4.3

library(statnet)

## Warning: package 'statnet' was built under R version 4.4.3

## Loading required package: tergm

## Warning: package 'tergm' was built under R version 4.4.3

## Loading required package: ergm

## Warning: package 'ergm' was built under R version 4.4.3

## Loading required package: network

## Warning: package 'network' was built under R version 4.4.3

## 
## 'network' 1.19.0 (2024-12-08), part of the Statnet Project
## * 'news(package="network")' for changes since last version
## * 'citation("network")' for citation information
## * 'https://statnet.org' for help, support, and other information

## 
## 'ergm' 4.8.1 (2025-01-20), part of the Statnet Project
## * 'news(package="ergm")' for changes since last version
## * 'citation("ergm")' for citation information
## * 'https://statnet.org' for help, support, and other information

## 'ergm' 4 is a major update that introduces some backwards-incompatible
## changes. Please type 'news(package="ergm")' for a list of major
## changes.

## Loading required package: networkDynamic

## Warning: package 'networkDynamic' was built under R version 4.4.3

## 
## 'networkDynamic' 0.11.5 (2024-11-21), part of the Statnet Project
## * 'news(package="networkDynamic")' for changes since last version
## * 'citation("networkDynamic")' for citation information
## * 'https://statnet.org' for help, support, and other information

## Registered S3 method overwritten by 'tergm':
##   method                   from
##   simulate_formula.network ergm

## 
## 'tergm' 4.2.1 (2024-10-08), part of the Statnet Project
## * 'news(package="tergm")' for changes since last version
## * 'citation("tergm")' for citation information
## * 'https://statnet.org' for help, support, and other information

## 
## Attaching package: 'tergm'

## The following object is masked from 'package:ergm':
## 
##     snctrl

## Loading required package: ergm.count

## Warning: package 'ergm.count' was built under R version 4.4.3

## 
## 'ergm.count' 4.1.2 (2024-06-15), part of the Statnet Project
## * 'news(package="ergm.count")' for changes since last version
## * 'citation("ergm.count")' for citation information
## * 'https://statnet.org' for help, support, and other information

## Loading required package: sna

## Warning: package 'sna' was built under R version 4.4.3

## Loading required package: statnet.common

## Warning: package 'statnet.common' was built under R version 4.4.3

## 
## Attaching package: 'statnet.common'

## The following object is masked from 'package:ergm':
## 
##     snctrl

## The following objects are masked from 'package:base':
## 
##     attr, order

## sna: Tools for Social Network Analysis
## Version 2.8 created on 2024-09-07.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
##  For citation information, type citation("sna").
##  Type help(package="sna") to get started.

## Loading required package: tsna

## 
## 'statnet' 2019.6 (2019-06-13), part of the Statnet Project
## * 'news(package="statnet")' for changes since last version
## * 'citation("statnet")' for citation information
## * 'https://statnet.org' for help, support, and other information

## unable to reach CRAN

library(igraph)

## Warning: package 'igraph' was built under R version 4.4.3

## 
## Attaching package: 'igraph'

## The following objects are masked from 'package:sna':
## 
##     betweenness, bonpow, closeness, components, degree, dyad.census,
##     evcent, hierarchy, is.connected, neighborhood, triad.census

## The following objects are masked from 'package:network':
## 
##     %c%, %s%, add.edges, add.vertices, delete.edges, delete.vertices,
##     get.edge.attribute, get.edges, get.vertex.attribute, is.bipartite,
##     is.directed, list.edge.attributes, list.vertex.attributes,
##     set.edge.attribute, set.vertex.attribute

## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum

## The following object is masked from 'package:base':
## 
##     union

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:igraph':
## 
##     as_data_frame, groups, union

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

# Load FTA matrix data which record global FTA ties between 1998 and 2019
FTA <- readRDS(url("https://www.dropbox.com/scl/fi/6luyfnrur7ictisun8zxh/FTA19982019.rds?rlkey=h6af5x79e318vm6q7rj8b8loc&st=bpi38ulh&dl=1"))

# Read in country name data, you can see a list of country name, abbreviation, and 3-digit iso code. We need them to index countries in our data 
cid <- read.csv(url("https://www.dropbox.com/scl/fi/7isucibr6woxr5gnlthu8/countryid.csv?rlkey=cezf0zg1vbkcvxrrvazzu4pjf&st=uxw2yxdw&dl=1")) 

# Convert to matrix format
for(i in 1:length(FTA)){
  FTA[[i]] <- as.matrix(FTA[[i]]) 
  # Set country name and transform the adjacency FTA matrix into binary network form 
  colnames(FTA[[i]]) <- cid$comtrade_id
  rownames(FTA[[i]]) <- cid$comtrade_id
}

length(FTA)

## [1] 22

## Stacking FTA outcome network into a multidimensional array. We need FTA matrix from the last five years (2015 to 2019, 22 years!)
FTA2019 <- FTA[[22]]
FTA2018 <- FTA[[21]]
FTA2017 <- FTA[[20]]
FTA2016 <- FTA[[19]]
FTA2015 <- FTA[[18]]

# Quick question. Is global free trade on a secular rise? Let Jaccard index be our guide. Jaccard index is a statistic used for gauging the similarity between two count data. It is defined as: N11 / (N01 + N10 + N11), using FTA_change1819 for illustration, N11 is the number of FTAs existed in both periods (4375), N01 is the number of FTAs that were not in period 1 but were in period 2 (68) and N10 is the number of FTAs that were in period 1 but not in period 2 (0).

FTA_change1819 <- table(FTA[[18]], FTA[[19]])
FTA_change1819  # Similarity in global FTA networks between 2018 and 2019

##    
##         0     1
##   0 21546    68
##   1     0  4307

4307/(4307+68)

## [1] 0.9844571

FTA_change1920 <- table(FTA[[19]], FTA[[20]])
FTA_change1920  # Similarity in global FTA networks between 2019 and 2020

##    
##         0     1
##   0 21008   538
##   1     0  4375

4375/(4375+538)

## [1] 0.8904946

FTA_change2021 <- table(FTA[[20]], FTA[[21]])
FTA_change2021  # Similarity in global FTA networks between 2020 and 2021

##    
##         0     1
##   0 20994    14
##   1     2  4911

4911/(4911+14+2)

## [1] 0.9967526

FTA_change2122 <- table(FTA[[21]], FTA[[22]])
FTA_change2122  # Similarity in global FTA networks between 2021 and 2022

##    
##         0     1
##   0 20866   130
##   1    12  4913

4913/(4913+130+12)

## [1] 0.971909

# Well, it seems that the evolution of global free trade is relatively stable.


# Check row and column dimension: is it a square matrix?
nrow(FTA[[2]]) == ncol(FTA[[22]])   # Yes!

## [1] TRUE

net_size = nrow(FTA[[2]])

## Create a FTA array: five years, 161 by 161 observations in each year
FTA_array <- array(c(FTA2019 = FTA[[22]], FTA2019 = FTA[[21]], FTA2019 = FTA[[20]], FTA2019 = FTA[[19]], FTA2019 = FTA[[18]]), 
                   dim = c(net_size, net_size, 5))

dim(FTA_array)

## [1] 161 161   5

## Create SIENA dependent variable network: one-mode network (all edge values are the same, either 0 or 1)
FTA_networks <- sienaDependent(netarray = FTA_array, 
                               type = "oneMode", 
                               allowOnly = FALSE)   # Allow the dependent variable to both increase AND decrease

## Load country attributes data. You can see a list of country-level covariates, they will be used as regressors.
country_attr <- readRDS(url("https://www.dropbox.com/scl/fi/ddhf2z5tl9bo8l9jo29vb/country_attr.rds?rlkey=5upfqxcljtuphtuvingnptptm&st=9bc9dddy&dl=1"))

# Take a look at the data, just your average-looking data.frame object
names(country_attr)

##  [1] "vertex.names"  "cname"         "year"          "population"   
##  [5] "industrial_va" "service_va"    "trade_open"    "polity"       
##  [9] "execrlc"       "liec"          "bound_rate"    "applied_rate" 
## [13] "binding"       "kaopen"        "ipr"           "binding_over" 
## [17] "lGDP"

head(country_attr)

##   vertex.names cname year population industrial_va service_va trade_open polity
## 1            4   AFG 2015   17.34187      22.12404   53.23529    55.3462     10
## 2            8   ALB 2015   14.88022      21.76366   46.28435    71.8010     20
## 3           24   AGO 2015   17.14772      41.93309   48.65989    62.8885      9
## 4           28   ATG 2015   11.44642      16.45798   69.88088    93.6295     21
## 5           32   ARG 2015   17.57671      23.15306   55.81490    22.4862     20
## 6           51   ARM 2015   14.88022      25.70870   48.21267    71.5933     16
##   execrlc liec bound_rate applied_rate binding kaopen          ipr binding_over
## 1       0    7       0.00         5.47     0.0     -2 0.0001958417        -5.47
## 2      -1    7       6.79         1.11   100.0      0 0.0040150162         5.68
## 3      -1    6      59.08         9.38   100.0     -2 0.0078045620        49.70
## 4      -1    6      58.75        12.03    97.5      1 0.0036434520        46.72
## 5      -1    7      31.74         7.35   100.0     -2 0.0265864835        24.39
## 6       0    7       8.68         2.93   100.0      2 0.0000000000         5.75
##         lGDP
## 1 0.00000000
## 2 0.00000000
## 3 0.00000000
## 4 0.00000000
## 5 0.04520129
## 6 0.04498265

## Create time-varying variables: need to reshape the original data.frame to wide format AND in matrix form
# If you want to use a constant (time-invariant or fixed) variable, then use coCovar()
## Notations ##
#cc = coCovar
#vc = varCovar
#cdc = coDyadCovar
#vdc = varDyadCovar


library(reshape)

## Warning: package 'reshape' was built under R version 4.4.3

## 
## Attaching package: 'reshape'

## The following object is masked from 'package:dplyr':
## 
##     rename

# Population
pop <- reshape(country_attr[, c(1, 4, 3)], idvar = "vertex.names", timevar = "year", direction = "wide")
pop_vc <- varCovar(as.matrix(pop[,-1]))  # Make it into time-varying covariate, -1 means remove country id column

# Trade openness
trade_openness <- reshape(country_attr[, c(1, 7, 3)], idvar = "vertex.names", timevar = "year", direction = "wide")
to_vc <- varCovar(as.matrix(trade_openness[,-1])) # Also make it into time-varying covariate

## Create "dyadic" time-varying variables
# First, we need dyadic trade volume data between 2015 and 2019, recorded as log of million USD

trade2015 <- readRDS(url("https://www.dropbox.com/scl/fi/17i8e42xzhgo9dqdmzohn/trade2015.rds?rlkey=5ldvbtn4550vwrbobsu5lfuhx&st=ipp306c7&dl=1"))
trade2016 <- readRDS(url("https://www.dropbox.com/scl/fi/bevq047cg7q6p96qk8rk3/trade2016.rds?rlkey=x19pybshmqf45aqclc8audnqj&st=qn68qrog&dl=1"))
trade2017 <- readRDS(url("https://www.dropbox.com/scl/fi/1a2wvu3o28ur5ce462blr/trade2017.rds?rlkey=izxqr82e8iirr9h73b48i3yfx&st=gavsra6g&dl=1"))
trade2018 <- readRDS(url("https://www.dropbox.com/scl/fi/tvqyzcgqvkfo7kxpkn9j8/trade2018.rds?rlkey=sibcdornmgwzznb5myjfaeqju&st=bnssd0n7&dl=1"))
trade2019 <- readRDS(url("https://www.dropbox.com/scl/fi/gxustrrcx6kvw9137uttd/trade2019.rds?rlkey=jjvgc0ju17465bv6e9jjnebu3&st=cpgy2cry&dl=1"))

# Take a step back and select global FTA network in 2015 for visualization

g2015 <- graph_from_adjacency_matrix(FTA[[18]])  # Plot the FTA network in year 2015
g2015_edgelist <- as_edgelist(g2015)   # Convert adjacent matrix to edgelist format

# Subset country attributes in year 2015
year2015_attributes <- country_attr[country_attr$year == 2015, ]

# Calculate dyadic trade volume using the trade matrix loaded earlier

id <- list()
for(i in as.numeric(unique(rownames(trade2015)))){
  id[[i]] = rep(i, 161)
  exporter_id = unlist(id)  # Convert list to dataframe format
}

# Create a 3 column dataframe (exporter, importer, trade_vol)
trade2015_df <-  data.frame(exporter = exporter_id, importer = rep(as.numeric(unique(rownames(trade2015))), 161)) 
trade2015_df$trade_vol <- NA

trade2015_df <- as.data.frame(as.table(trade2015))
trade2015_df <- trade2015_df[, c(2, 1, 3)]  # Reorganize columns
names(trade2015_df) <- c("exporter", "importer", "trade_vol")

# Create an igraph object for plotting
year2015_net <- graph_from_data_frame(d = g2015_edgelist, directed = TRUE,
                                      vertices = year2015_attributes)
# Setting layout of the graph
layout <- layout.fruchterman.reingold(year2015_net)

## Warning: `layout.fruchterman.reingold()` was deprecated in igraph 2.1.0.
## ℹ Please use `layout_with_fr()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

# Plotting
plot(year2015_net, 
     vertex.label = country_attr[,2],   # Label with country abbreviation
     vertex.size = 10, 
     vertex.label.cex = 0.5, # Node label (country abbreviation) text size
     layout = layout, 
     edge.arrow.size = .2, 
     edge.arrow.width = trade2015_df[,3],  # Edge width proportional to dyadic trade volume
     edge.color="light gray",
     vertex.frame.color = NA)

# Back to RSiena pack and continue to create dyadic variables
# varDyadCovar() means time-varying dyadic variables

# For a data array of n by n by m periods, only m-1 periods of data will be used. The last array will be discarded (you cannot use data from the m-th period to predict the network formed in the same period)


trade_Dyadvar <- varDyadCovar(array(c(trade2015, trade2016, trade2017, trade2018), dim = c(161, 161, 4)))

FTA_covar <- sienaDataCreate(FTA_networks, pop_vc, to_vc, trade_Dyadvar)  # The first argument is your network, the rest are your node and edge variables

# After building the model. We first get the base terms using getEffects() function, they are the covariates whose effects are to be estimated.
siena_effects <- getEffects(FTA_covar)

# We can use includeEffects() to add additional variables to the base model. For example, do countries tend to extend FTA ties to other countries that are already tied to its immediate neighbors? This term is called "transTrip" in RSiena. We can add that to the model via includeEffects().

siena_effects <- includeEffects(siena_effects,  #  siena object of effects
                                transTrip,   # Name of the term to be included
                                name = "FTA_networks")  # Name of the dependent variable for which effects are being included

##   effectName          include fix   test  initialValue parm
## 1 transitive triplets TRUE    FALSE FALSE          0   0

# Set the effect of "population" as alter, altX, meaning that do alters (other countries) tend to extend FTAs to more populous focal countries (ego)?

siena_effects <- includeEffects(siena_effects, altX, 
                                interaction1 = "pop_vc")

##   effectName   include fix   test  initialValue parm
## 1 pop_vc alter TRUE    FALSE FALSE          0   0

# Also, do countries (ego) having higher level of trade openness tend to extend FTAs to other countries (alters)?

siena_effects <- includeEffects(siena_effects, egoX, 
                                interaction1 = "to_vc")

##   effectName include fix   test  initialValue parm
## 1 to_vc ego  TRUE    FALSE FALSE          0   0

# includeEffects() function is also necessary to set up the effects of covariates from an actor-oriented perspective, that is, from the perspective of the ego to alters or the other way around.

siena_effects <- includeEffects(siena_effects, X,  # X for dyadic effect, check effectsDocumentation()
                                interaction1 = "trade_Dyadvar")

##   effectName    include fix   test  initialValue parm
## 1 trade_Dyadvar TRUE    FALSE FALSE          0   0

# Model setting
# Before we estimate our model, we need to create an object of input specifications using the sienaAlgorithmCreate() function. For pedagogical purpose, I'll set most things at their default values. We'll also set seed = 123 for replication purpose.

input_options <- sienaAlgorithmCreate(projname = "FTA_model",   # Model name (you can name it whatever you like)
                                      # maxlike = TRUE,  # Whether to use maximum likelihood method or Method of Moments estimation
                                      # nsub = 5, # Number of iterations in phase 2
                                      #   n3 = 200,  # Number of iterations in phase 3
                                      seed = 123)

## If you use this algorithm object, siena07 will create/use an output file FTA_model.txt .

# We can now estimate our model using siena07() function

FTA.mod <- siena07(x = input_options, 
                   data = FTA_covar,   # Variable list
                   effects = siena_effects,   # Effects to be estimated (from whose perspective)
                   batch= TRUE, 
                   verbose = FALSE, 
                   returnDeps = TRUE,  # Whether to return a list of simulated dependent variables (set to TRUE if you will be running sienaGOF() later)
                   useCluster = TRUE,  # Use multiple clusters
                   nbrNodes = 3)  # Number of processes to use if useCluster = TRUE

## 
## Start phase 0 
## theta: -0.11  0.00  0.00  0.00  0.00  0.00 
## 
## Start phase 1 
## Phase 1 Iteration 1 Progress: 0%
## Phase 1 Iteration 4 Progress: 0%
## Phase 1 Iteration 10 Progress: 0%
## Phase 1 Iteration 25 Progress: 0%
## Phase 1 Iteration 40 Progress: 1%
## theta: -0.88389 -0.07301  0.01653  0.00913  0.02597 -0.00187 
## 
## Start phase 2.1
## Phase 2 Subphase 1 Iteration 1 Progress: 14%
## Phase 2 Subphase 1 Iteration 2 Progress: 14%
## theta -1.08375 -0.09578  0.01783  0.01205  0.02774 -0.00202 
## ac 1.191 1.195 1.224 1.682 0.934 0.977 
## Phase 2 Subphase 1 Iteration 3 Progress: 14%
## Phase 2 Subphase 1 Iteration 4 Progress: 14%
## theta -1.72026 -0.17148  0.02239  0.02074  0.03027 -0.00244 
## ac 1.253 1.262 1.295 0.948 1.002 1.102 
## Phase 2 Subphase 1 Iteration 5 Progress: 14%
## Phase 2 Subphase 1 Iteration 6 Progress: 14%
## theta -2.41235 -0.25480  0.02832  0.02848  0.02129 -0.00312 
## ac 1.263 1.273 1.302 0.994 1.074 0.951 
## Phase 2 Subphase 1 Iteration 7 Progress: 15%
## Phase 2 Subphase 1 Iteration 8 Progress: 15%
## theta -2.85688 -0.28992  0.03095  0.03132  0.00743 -0.00384 
## ac 1.264 1.273 1.301 1.224 1.097 0.915 
## Phase 2 Subphase 1 Iteration 9 Progress: 15%
## Phase 2 Subphase 1 Iteration 10 Progress: 15%
## theta -3.16447 -0.28105  0.03221  0.03270 -0.00153 -0.00538 
## ac 1.264 1.273 1.301 0.872 1.097 0.740 
## Phase 2 Subphase 1 Iteration 200 Progress: 23%
## theta -8.52867 -0.25979  0.03401  0.03357 -0.00562 -0.11360 
## ac 1.25589 1.21618 1.25405 0.00663 0.93627 0.71015 
## theta -6.07499 -0.24566  0.03204  0.03265 -0.00384 -0.06836 
## ac  1.2558  1.2142  1.2498 -0.0174  0.9169  0.7007 
## theta: -6.07499 -0.24566  0.03204  0.03265 -0.00384 -0.06836 
## 
## Start phase 2.2
## Phase 2 Subphase 2 Iteration 1 Progress: 23%
## Phase 2 Subphase 2 Iteration 2 Progress: 23%
## Phase 2 Subphase 2 Iteration 3 Progress: 23%
## Phase 2 Subphase 2 Iteration 4 Progress: 23%
## Phase 2 Subphase 2 Iteration 5 Progress: 23%
## Phase 2 Subphase 2 Iteration 6 Progress: 23%
## Phase 2 Subphase 2 Iteration 7 Progress: 23%
## Phase 2 Subphase 2 Iteration 8 Progress: 23%
## Phase 2 Subphase 2 Iteration 9 Progress: 23%
## Phase 2 Subphase 2 Iteration 10 Progress: 23%
## Phase 2 Subphase 2 Iteration 200 Progress: 31%
## theta -8.01813 -0.25968  0.03161  0.03531 -0.00399 -0.10484 
## ac  0.8557  0.5867  0.6490 -0.1304  0.0155  0.5001 
## theta -7.14941 -0.25226  0.03211  0.03499 -0.00477 -0.08945 
## ac  0.853  0.603  0.629 -0.108  0.030  0.502 
## theta: -7.14941 -0.25226  0.03211  0.03499 -0.00477 -0.08945 
## 
## Start phase 2.3
## Phase 2 Subphase 3 Iteration 1 Progress: 32%
## Phase 2 Subphase 3 Iteration 2 Progress: 32%
## Phase 2 Subphase 3 Iteration 3 Progress: 32%
## Phase 2 Subphase 3 Iteration 4 Progress: 32%
## Phase 2 Subphase 3 Iteration 5 Progress: 32%
## Phase 2 Subphase 3 Iteration 6 Progress: 32%
## Phase 2 Subphase 3 Iteration 7 Progress: 33%
## Phase 2 Subphase 3 Iteration 8 Progress: 33%
## Phase 2 Subphase 3 Iteration 9 Progress: 33%
## Phase 2 Subphase 3 Iteration 10 Progress: 33%
## Phase 2 Subphase 3 Iteration 200 Progress: 41%
## theta -8.14842 -0.24740  0.03297  0.03598 -0.00547 -0.10653 
## ac 0.9588 0.6807 0.7299 0.1104 0.0436 0.3567 
## theta -7.83950 -0.25418  0.03291  0.03512 -0.00458 -0.10128 
## ac 0.9460 0.6786 0.6680 0.1003 0.0592 0.4088 
## theta: -7.83950 -0.25418  0.03291  0.03512 -0.00458 -0.10128 
## 
## Start phase 2.4
## Phase 2 Subphase 4 Iteration 1 Progress: 43%
## Phase 2 Subphase 4 Iteration 2 Progress: 44%
## Phase 2 Subphase 4 Iteration 3 Progress: 44%
## Phase 2 Subphase 4 Iteration 4 Progress: 44%
## Phase 2 Subphase 4 Iteration 5 Progress: 44%
## Phase 2 Subphase 4 Iteration 6 Progress: 44%
## Phase 2 Subphase 4 Iteration 7 Progress: 44%
## Phase 2 Subphase 4 Iteration 8 Progress: 44%
## Phase 2 Subphase 4 Iteration 9 Progress: 44%
## Phase 2 Subphase 4 Iteration 10 Progress: 44%
## Phase 2 Subphase 4 Iteration 200 Progress: 52%
## theta -8.31718 -0.25217  0.03233  0.03451 -0.00405 -0.10981 
## ac 1.0194 0.7391 0.7676 0.0564 0.1617 0.4991 
## theta -8.31899 -0.25168  0.03263  0.03533 -0.00451 -0.10963 
## ac 0.9752 0.6915 0.7124 0.0302 0.1586 0.5069 
## theta: -8.31899 -0.25168  0.03263  0.03533 -0.00451 -0.10963 
## 
## Start phase 3 
## Phase 3 Iteration 1 Progress 59%
## Phase 3 Iteration 301 Progress 72%
## Phase 3 Iteration 601 Progress 84%
## Phase 3 Iteration 901 Progress 96%

# Check out the results. The estimated FTA.mod object include a column of estimated coefficients and simulated standard errors 
summary(FTA.mod)

## Estimates, standard errors and convergence t-ratios
## 
##                                    Estimate   Standard   Convergence 
##                                                 Error      t-ratio   
## 
## Rate parameters: 
##   0.1      Rate parameter period 1  0.8846  (  0.0751  )             
##   0.2      Rate parameter period 2  0.0995  (  0.0258  )             
##   0.3      Rate parameter period 3  3.3593  (  0.1444  )             
##   0.4      Rate parameter period 4  0.4222  (  0.0526  )             
## 
## Other parameters: 
##   1.  eval outdegree (density)     -8.3190  ( 37.5958  )    1.4478   
##   2.  eval reciprocity             -0.2517  (  0.1700  )    0.3059   
##   3.  eval transitive triplets      0.0326  (  0.0085  )    0.6765   
##   4.  eval trade_Dyadvar            0.0353  (  0.0419  )   -0.0119   
##   5.  eval pop_vc alter            -0.0045  (  0.0283  )   -0.0087   
##   6.  eval to_vc ego               -0.1096  (  0.6371  )    0.4898   
## 
## Overall maximum convergence ratio:    1.6278 
## 
## 
## Total of 2159 iteration steps.
## 
## Covariance matrix of estimates (correlations below diagonal)
## 
##     1413.446        4.833        0.005       -1.528        0.728       23.952
##        0.756        0.029        0.000       -0.005        0.002        0.082
##        0.017       -0.130        0.000        0.000        0.000        0.000
##       -0.969       -0.728       -0.048        0.002       -0.001       -0.026
##        0.684        0.515       -0.006       -0.684        0.001        0.012
##        1.000        0.756        0.019       -0.969        0.684        0.406
## 
## Derivative matrix of expected statistics X by parameters:
## 
##       43.031        8.673      475.559      -97.790        1.863    -2564.529
##        4.737      170.057      668.043       62.551        8.053       69.487
##      354.378      937.051    66191.595     4420.123     -188.983   -20171.928
##     -107.219       79.664     5960.394    20779.714      828.706     -438.250
##       -1.226        9.990       19.606     1016.502     2454.588    -2646.250
##    -2544.360     -539.067   -27643.006     7055.313     -136.653   153571.571
## 
## Covariance matrix of X (correlations below diagonal):
## 
##       75.656       15.212     1091.634     -118.628       12.191    -1895.675
##        0.093      351.888     3178.457      396.599        6.382     2123.378
##        0.234        0.316   288291.922    22827.959    -1991.753   120409.456
##       -0.065        0.101        0.202    44097.661     1403.566    -3050.761
##        0.028        0.007       -0.073        0.132     2565.054    -6922.330
##       -0.146        0.076        0.150       -0.010       -0.092  2227533.506

# It looks like transitive triplets significantly increase the probability of i-j FTA relative to with other countries, while the three dyadic and nodal attributes are not significant. In fact, their estimated coefficients are in the opposite directions.
# Estimated coefficients are reported in odds-ratio format, so they should be interpreted as exponentiated odds. For example, we see a coefficient for trade_Dyadvar of 0.0353. 
# This would suggest that the odds of an FTA from i being sent to j is exp(0.0353) - 1 times higher than being sent to k, assuming that i-j FTA exists but i-k FTA does not.
# You can also see how rate parameters vary across the four periods. The rate parameter is the lowest in period 2, meaning that FTA network changes less during this period, but the rate of change kind of accelerated in period 3.


# We're now ready to plot the goodness-of-fit statistics (gof), which compares deviations between the observed and simulated networks. It will calculate gof statistics for each period
# Low p-values would indicate a potential problem. Our result looks good.

gof1 <- sienaGOF(sienaFitObject = FTA.mod, 
                 auxiliaryFunction = IndegreeDistribution,  # Returns a named vector, the distribution of the observed or simulated in-degrees for the values in levels.
                 verbose = TRUE,   # Whether to print immediate results
                 join = TRUE,   # Boolean: should sienaGOF test all periods individually (FALSE), or sum across periods (TRUE)
                 varName = "FTA_networks")

## Detected 1002 iterations and 1 group.

## Calculating auxiliary statistics for periods  1 2 3 4 .

##   Period 1

##   > Completed  100  calculations  > Completed  200  calculations  > Completed  300  calculations  > Completed  400  calculations  > Completed  500  calculations  > Completed  600  calculations  > Completed  700  calculations  > Completed  800  calculations  > Completed  900  calculations  > Completed  1000  calculations  > Completed  1002  calculations

##   Period 2

##   > Completed  100  calculations  > Completed  200  calculations  > Completed  300  calculations  > Completed  400  calculations  > Completed  500  calculations  > Completed  600  calculations  > Completed  700  calculations  > Completed  800  calculations  > Completed  900  calculations  > Completed  1000  calculations  > Completed  1002  calculations

##   Period 3

##   > Completed  100  calculations  > Completed  200  calculations  > Completed  300  calculations  > Completed  400  calculations  > Completed  500  calculations  > Completed  600  calculations  > Completed  700  calculations  > Completed  800  calculations  > Completed  900  calculations  > Completed  1000  calculations  > Completed  1002  calculations

##   Period 4

##   > Completed  100  calculations  > Completed  200  calculations  > Completed  300  calculations  > Completed  400  calculations  > Completed  500  calculations  > Completed  600  calculations  > Completed  700  calculations  > Completed  800  calculations  > Completed  900  calculations  > Completed  1000  calculations  > Completed  1002  calculations

plot(gof1)

## Calculate prediction accuracy
# Although RSiena primarily focuses on estimating parameters for network evolution models rather than direct prediction accuracy measures like classification accuracy or AUC. However, we can still assess prediction accuracy using methods based on the goodness-of-fit (GOF), which we just did, or by comparing observed versus simulated networks.

# Extract first simulated network in period 2, which is stored in $sims 
# The structure of sims is a nested list: sims[[run]][[group]][[dependent variable]][[period]]. If x$maxlike=TRUE and there is only one group (like our data here) and one period, then the structure is [[run]][[dependent variable]].

sim_net <- FTA.mod$sims[[1]][1]$Data1$FTA_networks$`2`  # This means first simulated network from data 1 (we only have one here) in period 2. The output is a 3-column edgelist (i, j, tie) = [node i, node j, i-j tie]

# Examine the first 10 observations
head(sim_net, 10)

##       [,1] [,2] [,3]
##  [1,]    1   10    1
##  [2,]    1   66    1
##  [3,]    1   89    1
##  [4,]    1  102    1
##  [5,]    1  110    1
##  [6,]    2    8    1
##  [7,]    2   12    1
##  [8,]    2   36    1
##  [9,]    2   37    1
## [10,]    2   39    1

# We now fill in edgelist values into a 161 by 161 matrix
simulated_network <- matrix(0, nrow = 161, ncol = 161)

for(i in 1:nrow(sim_net)){
  simulated_network[sim_net[i,1], sim_net[i,2]] = sim_net[i,3]
}

# Extract actual FTA network observed in period 2
observed_network <- FTA_covar$depvars[[1]][, , 2]  

# Compute accuracy
correct_predictions <- sum(simulated_network == observed_network) / length(observed_network)
cat("New FTAs prediction accuracy:", correct_predictions)

## New FTAs prediction accuracy: 0.9993442

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

summary(cars)

##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Including Plots

You can also embed plots, for example:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

Social Network Analysis using Stochastic Actor-Oriented Models (SAOM)

Mokaya Derrick Jesse

2025-05-19

R Markdown

Including Plots