TERGM Assignment 3

Simulated Network

I created a simulated directed and weighted network in R. Similar to the Knockout project, there are ten nodes. For my network, I created several vertex attributes: (1) gender, (2) GPA, (3) leadership views, (4) race, and (5) grade. The demographic variables were created by checking demographic characteristics of the AFA. The attributes were not randomly assigned to the nodes, however, the edges are randomly assigned. Here is the adjacency matrix and plot:

net2 <- network.initialize(num_nodes)

net2 <- network.initialize(num_nodes)             
gender <- c(rep("Female", 2),rep("Male", 8))
GPA <- runif(num_nodes, min = 2, max = 10)
leadership.views <- sample(1:5, replace=TRUE, 10)
race <- c(rep("White"), 8,rep("Not White"), 2)
grade <- sample(1:4, replace=TRUE, 10)

set.vertex.attribute(net2, "Gender", gender)
set.vertex.attribute(net2, "GPA", GPA)
set.vertex.attribute(net2, "Leadership Views", leadership.views)
set.vertex.attribute(net2, "Race", race)
set.vertex.attribute(net2, "Grade", grade)

network.vertex.names(net2) <- node_names
net2[as.matrix(edgelist)] <- 1
as.matrix(net2)

##           person_1 person_2 person_3 person_4 person_5 person_6 person_7
## person_1         0        0        0        1        1        0        0
## person_2         0        0        0        0        0        0        0
## person_3         0        1        0        1        0        0        0
## person_4         0        0        0        0        1        0        1
## person_5         0        1        1        0        0        1        0
## person_6         0        0        0        0        0        0        0
## person_7         1        1        0        1        0        0        0
## person_8         1        0        0        1        1        0        1
## person_9         0        0        0        1        0        1        1
## person_10        0        1        0        0        0        1        0
##           person_8 person_9 person_10
## person_1         0        0         1
## person_2         0        0         0
## person_3         0        1         0
## person_4         0        0         0
## person_5         1        0         0
## person_6         1        1         0
## person_7         0        0         0
## person_8         0        1         0
## person_9         0        0         0
## person_10        0        0         0

plot(net2, vertex.cex = 4)

The initial simulated network is very dense. This is because each participant will have connections to at least three players. I expect that as the rounds increase, players will form more efficient communications. In the next five graph simulations, I will randomly remove five edges in the network until it arrives at an equilibrium.

set.seed(1)
library(igraph)

## 
## Attaching package: 'igraph'

## The following objects are masked from 'package:sna':
## 
##     betweenness, bonpow, closeness, components, degree,
##     dyad.census, evcent, hierarchy, is.connected, neighborhood,
##     triad.census

## The following objects are masked from 'package:network':
## 
##     %c%, %s%, add.edges, add.vertices, delete.edges,
##     delete.vertices, get.edge.attribute, get.edges,
##     get.vertex.attribute, is.bipartite, is.directed,
##     list.edge.attributes, list.vertex.attributes,
##     set.edge.attribute, set.vertex.attribute

## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum

## The following object is masked from 'package:base':
## 
##     union

adjacency_matrix_2 <- net2[,]
g1 <- graph.adjacency(adjacency_matrix_2)

g2 <- g1 %>%
  delete_edges(sample(1:30, 5)) #Round 2

set.seed(2)
g3 <- g2 %>%
  delete_edges(sample(1:25, 5)) #Roud 3

set.seed(13)
g4 <- g3 %>%
  delete_edges(sample(1:20, 5)) #Roud 4

set.seed(5)
par(mfrow = c(2,2))
plot(g1,edge.arrow.size=.05, vertex.size=30, vertex.label=NA, edge.curved=.3, edge.color="black", vertex.color="red", vertex.frame.color="#ffffff", main="Round 1")
plot(g2,edge.arrow.size=.05, vertex.size=30, vertex.label=NA, edge.curved=.3, edge.color="black", vertex.color="red", vertex.frame.color="#ffffff", main="Round 2")
plot(g3,edge.arrow.size=.05, vertex.size=30, vertex.label=NA, edge.curved=.3, edge.color="black", vertex.color="red", vertex.frame.color="#ffffff", main="Round 3")
plot(g4,edge.arrow.size=.05, vertex.size=30, vertex.label=NA, edge.curved=.3, edge.color="black", vertex.color="red", vertex.frame.color="#ffffff", main="Round 4")

Before writing some TERGM models, I wanted to demonstrate that I am comfortable writing ERGM graphs. Below are some ERGMs for the Round 1 network.

ergm.1 <- ergm(net2 ~ edges)

## Evaluating log-likelihood at the estimate.

summary(ergm.1)

## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula:   net2 ~ edges
## 
## Iterations:  4 out of 20 
## 
## Monte Carlo MLE Results:
##       Estimate Std. Error MCMC %  p-value    
## edges  -0.8473     0.2300      0 0.000394 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 124.8  on 90  degrees of freedom
##  Residual Deviance: 110.0  on 89  degrees of freedom
##  
## AIC: 112    BIC: 114.5    (Smaller is better.)

gof.1 <- gof(ergm.1)
par(mfrow=c(2,2)); plot(gof.1)

The initial model fits very well, but there could be a slight improvement for in degree and out degree. I am going to test three other models, adding in endogenous effects and exogenous covariates. Afterward, I will pick which model is the best fit.

#Model 2
ergm.2 <- ergm(net2 ~ edges + triangles)

## Starting maximum likelihood estimation via MCMLE:
## Iteration 1 of at most 20: 
## The log-likelihood improved by 0.003065 
## Step length converged once. Increasing MCMC sample size.
## Iteration 2 of at most 20: 
## The log-likelihood improved by 0.000905 
## Step length converged twice. Stopping.
## Evaluating log-likelihood at the estimate. Using 20 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 .
## 
## This model was fit using MCMC.  To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.

summary(ergm.2)

## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula:   net2 ~ edges + triangles
## 
## Iterations:  2 out of 20 
## 
## Monte Carlo MLE Results:
##          Estimate Std. Error MCMC % p-value
## edges     -0.4437     0.5958      0   0.458
## triangle  -0.1477     0.2165      0   0.497
## 
##      Null Deviance: 124.8  on 90  degrees of freedom
##  Residual Deviance: 109.4  on 88  degrees of freedom
##  
## AIC: 113.4    BIC: 118.4    (Smaller is better.)

gof.2 <- gof(ergm.2)
par(mfrow=c(2,2)); plot(gof.2)

mcmc.diagnostics(ergm.2)

## Sample statistics summary:
## 
## Iterations = 16384:4209664
## Thinning interval = 1024 
## Number of chains = 1 
## Sample size per chain = 4096 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##             Mean    SD Naive SE Time-series SE
## edges    -0.1213 3.471  0.05423        0.05423
## triangle -0.4043 9.552  0.14925        0.14925
## 
## 2. Quantiles for each variable:
## 
##          2.5% 25% 50% 75% 97.5%
## edges      -7  -2   0   2     6
## triangle  -16  -7  -1   5    20
## 
## 
## Sample statistics cross-correlations:
##              edges  triangle
## edges    1.0000000 0.8752801
## triangle 0.8752801 1.0000000
## 
## Sample statistics auto-correlation:
## Chain 1 
##                edges     triangle
## Lag 0     1.00000000  1.000000000
## Lag 1024  0.01227304  0.009321315
## Lag 2048 -0.02039179 -0.013572297
## Lag 3072 -0.02117775 -0.022037795
## Lag 4096 -0.01518019 -0.016421793
## Lag 5120 -0.01097202 -0.003354707
## 
## Sample statistics burn-in diagnostic (Geweke):
## Chain 1 
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##    edges triangle 
##  -0.2574   0.3890 
## 
## Individual P-values (lower = worse):
##     edges  triangle 
## 0.7968994 0.6972828 
## Joint P-value (lower = worse):  0.4405729 .

## Loading required namespace: latticeExtra

## 
## MCMC diagnostics shown here are from the last round of simulation, prior to computation of final parameter estimates. Because the final estimates are refinements of those used for this simulation run, these diagnostics may understate model performance. To directly assess the performance of the final model on in-model statistics, please use the GOF command: gof(ergmFitObject, GOF=~model).

#Model 3
#ergm.3 <- ergm(net2 ~ edges + triangles + 
 #                mutual + nodefactor("Sex")
  #               absdiff("GPA"), 
   #              control=control.ergm(
    #             MCMC.samplesize=500,
     #            MCMC.burnin=1000,
      #           MCMLE.maxit=10), verbose=TRUE)
#summary(ergm.3)
#mcmc.diagnostics(ergm.3)
#gof.3 <- gof(ergm.3)
#par(mfrow=c(2,2)); plot(gof.3)

#Model 4
ergm.4 <- ergm(net2 ~ edges + triangles + idegreepopularity +
                 cycle(4) + absdiff("Grade") + 
                 absdiff("GPA"), 
               control=control.ergm(
                 MCMC.samplesize=500,
                 MCMC.burnin=1000,
                 MCMLE.maxit=10), verbose=TRUE)

## Evaluating network in model
## Initializing Metropolis-Hastings proposal(s): ergm:MH_TNT
## Initializing model.
## Using initial method 'MPLE'.
## Fitting initial model.
## MPLE covariate matrix has 88 rows.
## Fitting ERGM.
## Starting maximum likelihood estimation via MCMLE:
## Density guard set to 10000 from an initial count of 27  edges.
## Iteration 1 of at most 10 with parameter: 
##             edges          triangle idegreepopularity            cycle4 
##        0.91251992       -0.12218828       -0.38392768       -0.01358603 
##     absdiff.Grade       absdiff.GPA 
##       -0.07779674       -0.14474593 
## Sampler accepted  67.939% of 512000 proposed steps.
## Sample size = 500 by 500 
## Back from unconstrained MCMC. Average statistics:
##             edges          triangle idegreepopularity            cycle4 
##         -0.394000         -1.388000         -1.071423         -0.084000 
##     absdiff.Grade       absdiff.GPA 
##          0.438000         -0.982667 
## Average estimating equation values:
##             edges          triangle idegreepopularity            cycle4 
##         -0.394000         -1.388000         -1.071423         -0.084000 
##     absdiff.Grade       absdiff.GPA 
##          0.438000         -0.982667 
## is.inCH: iter= 1, inside hull.
## iter= 1, est=1.000000, low=1.000000, high=1.000000, test=1.
## Calling MCMLE Optimization...
## Using Newton-Raphson Step with step length  1  ...
## Using lognormal metric (see control.ergm function).
## Using log-normal approx (no optim)
## The log-likelihood improved by 0.05551 
## Step length converged once. Increasing MCMC sample size.
## Iteration 2 of at most 10 with parameter: 
##             edges          triangle idegreepopularity            cycle4 
##        1.03174317       -0.09970756       -0.37224990       -0.05830474 
##     absdiff.Grade       absdiff.GPA 
##       -0.15154365       -0.15310124 
## Sampler accepted  67.837% of 2048000 proposed steps.
## Sample size = 2000 by 2000 
## Back from unconstrained MCMC. Average statistics:
##             edges          triangle idegreepopularity            cycle4 
##         0.4050000         1.1000000         0.8868318         0.2540000 
##     absdiff.Grade       absdiff.GPA 
##         0.4135000         0.7000610 
## Average estimating equation values:
##             edges          triangle idegreepopularity            cycle4 
##         0.4050000         1.1000000         0.8868318         0.2540000 
##     absdiff.Grade       absdiff.GPA 
##         0.4135000         0.7000610 
## is.inCH: iter= 1, inside hull.
## iter= 1, est=1.000000, low=1.000000, high=1.000000, test=1.
## Calling MCMLE Optimization...
## Using Newton-Raphson Step with step length  1  ...
## Using lognormal metric (see control.ergm function).
## Using log-normal approx (no optim)
## Starting MCMC s.e. computation.
## The log-likelihood improved by 0.0198 
## Step length converged twice. Stopping.
## Evaluating log-likelihood at the estimate. Using 20 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 .
## 
## This model was fit using MCMC.  To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.

summary(ergm.4)

## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula:   net2 ~ edges + triangles + idegreepopularity + cycle(4) + absdiff("Grade") + 
##     absdiff("GPA")
## 
## Iterations:  2 out of 10 
## 
## Monte Carlo MLE Results:
##                   Estimate Std. Error MCMC % p-value
## edges              0.74559    1.99573      0   0.710
## triangle          -0.11412    0.23271      0   0.625
## idegreepopularity -0.28555    0.78538      0   0.717
## cycle4            -0.02614    0.29689      0   0.930
## absdiff.Grade     -0.14604    0.26992      0   0.590
## absdiff.GPA       -0.14624    0.14286      0   0.309
## 
##      Null Deviance: 124.8  on 90  degrees of freedom
##  Residual Deviance: 108.0  on 84  degrees of freedom
##  
## AIC: 120    BIC: 135    (Smaller is better.)

mcmc.diagnostics(ergm.4)

## Sample statistics summary:
## 
## Iterations = 1000:2047976
## Thinning interval = 1024 
## Number of chains = 1 
## Sample size per chain = 2000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                     Mean     SD Naive SE Time-series SE
## edges             0.4050  3.134  0.07008        0.07284
## triangle          1.1000  9.599  0.21464        0.21464
## idegreepopularity 0.8868  7.901  0.17667        0.18346
## cycle4            0.2540  5.556  0.12424        0.12424
## absdiff.Grade     0.4135  5.235  0.11705        0.11705
## absdiff.GPA       0.7001 10.724  0.23980        0.23980
## 
## 2. Quantiles for each variable:
## 
##                     2.5%    25%     50%   75% 97.5%
## edges              -5.00 -2.000  0.0000 2.000  7.00
## triangle          -14.00 -6.000  0.0000 7.000 23.00
## idegreepopularity -13.40 -4.695  0.8126 6.084 17.43
## cycle4             -8.00 -4.000 -1.0000 3.000 13.00
## absdiff.Grade     -10.00 -3.000  0.0000 4.000 11.00
## absdiff.GPA       -19.51 -6.892  0.3519 7.772 21.79
## 
## 
## Sample statistics cross-correlations:
##                       edges  triangle idegreepopularity    cycle4
## edges             1.0000000 0.8708816         0.9849808 0.7704989
## triangle          0.8708816 1.0000000         0.8836388 0.7471780
## idegreepopularity 0.9849808 0.8836388         1.0000000 0.7471779
## cycle4            0.7704989 0.7471780         0.7471779 1.0000000
## absdiff.Grade     0.7018665 0.6194329         0.6942518 0.5401957
## absdiff.GPA       0.7539240 0.6539379         0.7453795 0.5785358
##                   absdiff.Grade absdiff.GPA
## edges                 0.7018665   0.7539240
## triangle              0.6194329   0.6539379
## idegreepopularity     0.6942518   0.7453795
## cycle4                0.5401957   0.5785358
## absdiff.Grade         1.0000000   0.4791398
## absdiff.GPA           0.4791398   1.0000000
## 
## Sample statistics auto-correlation:
## Chain 1 
##                 edges     triangle idegreepopularity       cycle4
## Lag 0     1.000000000  1.000000000       1.000000000  1.000000000
## Lag 1024  0.038377108  0.020769486       0.037451935  0.018232958
## Lag 2048 -0.003340457  0.003351358      -0.006599757 -0.028360875
## Lag 3072 -0.015194176  0.014035496      -0.019663972 -0.018302736
## Lag 4096  0.022290691  0.012965883       0.027873135  0.007656207
## Lag 5120 -0.016636838 -0.006452375      -0.012165698 -0.019641575
##          absdiff.Grade   absdiff.GPA
## Lag 0      1.000000000  1.0000000000
## Lag 1024  -0.001756894 -0.0009535622
## Lag 2048  -0.005395257 -0.0010214207
## Lag 3072  -0.031523975 -0.0212666898
## Lag 4096   0.038871651  0.0100042238
## Lag 5120  -0.012729330 -0.0238349620
## 
## Sample statistics burn-in diagnostic (Geweke):
## Chain 1 
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##             edges          triangle idegreepopularity            cycle4 
##           0.10651           0.15499           0.04993          -0.91995 
##     absdiff.Grade       absdiff.GPA 
##          -0.16806          -0.98068 
## 
## Individual P-values (lower = worse):
##             edges          triangle idegreepopularity            cycle4 
##         0.9151785         0.8768273         0.9601801         0.3576001 
##     absdiff.Grade       absdiff.GPA 
##         0.8665356         0.3267506 
## Joint P-value (lower = worse):  0.55399 .

## 
## MCMC diagnostics shown here are from the last round of simulation, prior to computation of final parameter estimates. Because the final estimates are refinements of those used for this simulation run, these diagnostics may understate model performance. To directly assess the performance of the final model on in-model statistics, please use the GOF command: gof(ergmFitObject, GOF=~model).

gof.4 <- gof(ergm.4)
par(mfrow=c(2,2)); plot(gof.4)

In my analysis, model three was degenerative so I commented out of the code. From my analysis, I think that model 4 is the best fit. This is verified by the high p-values in the summary statistics. Additionally, the diagnostic check shows a “fuzzy” output, which means there is good variation.

I decided to simulate networks for rounds 5, 6, 7, 8, 9, and 10 for my network. I am assuming that later rounds will have little change in the communication networks (these models do not have a knockout).

set.seed(17)
g5 <- g4 %>%
  delete_edges(sample(1:10, 1)) #Roud 5
g5 <- g5 %>% add_edges(c(sample(1:9, 1),sample(1:9, 1), sample(1:9, 1),sample(1:9, 1)))
g6 <- g5 
g7 <- g6 %>%
  delete_edges(sample(1:10, 2)) #Roud 5
g8 <- g7 %>% add_edges(c(sample(1:10, 1),sample(1:10, 1), sample(1:10, 1),sample(1:10, 1)))
g9 <- g8 %>% add_edges(c(sample(1:10, 1),sample(1:10, 1), sample(1:10, 1),sample(1:10, 1)))
g10 <- g9 %>%
  delete_edges(sample(1:10, 1)) #Roud 5
g10 <- g10 %>% add_edges(c(sample(1:9, 1),sample(1:9, 1), sample(1:9, 1),sample(1:9, 1)))
par(mfrow = c(2,3))
plot(g5,edge.arrow.size=.05, vertex.size=30, vertex.label=NA, edge.curved=.3, edge.color="black", vertex.color="red", vertex.frame.color="#ffffff", main="Round 5")
plot(g6,edge.arrow.size=.05, vertex.size=30, vertex.label=NA, edge.curved=.3, edge.color="black", vertex.color="red", vertex.frame.color="#ffffff", main="Round 6")
plot(g7,edge.arrow.size=.05, vertex.size=30, vertex.label=NA, edge.curved=.3, edge.color="black", vertex.color="red", vertex.frame.color="#ffffff", main="Round 7")
plot(g8,edge.arrow.size=.05, vertex.size=30, vertex.label=NA, edge.curved=.3, edge.color="black", vertex.color="red", vertex.frame.color="#ffffff", main="Round 8")
plot(g9,edge.arrow.size=.05, vertex.size=30, vertex.label=NA, edge.curved=.3, edge.color="black", vertex.color="red", vertex.frame.color="#ffffff", main="Round 9")
plot(g10,edge.arrow.size=.05, vertex.size=30, vertex.label=NA, edge.curved=.3, edge.color="black", vertex.color="red", vertex.frame.color="#ffffff", main="Round 10")

I am now going to run some TERGM models to test this. In my simulated model, closer GPAs and similar grades lead to a greater likelihood to have a connection. I suspect that mutuality will be a good predictor of ties for the later networks. (I know this will work because I simulated the model to follow this pattern.)

TERGM Assignment 3

Jared Edgerton

Simulated Network