Class ERGM

Data

To access your data, go to the network data link on the website and look under Practicum 10 for ClassData.zip. This is a zip file with all six of the networks you provided. It is late in the semester, so have preloaded the networks with attributes to save time and effort on your part.

Before you load the data, start statnet.

library(statnet)

Load Data

load("know.rda")
load("buddy.rda")
load("friend.rda")
load("groupWork.rda")
load("haveClass.rda")
load("WorkWith.rda")

The networks above were pre-loaded with attributes for your convenience. The attributes are:

credits – the number of credits you are taking this term
discuss – class discussion frequency
attend – whether you attend MIIS events
live – whether you live in Monterey
major – major (recorded as colors - for a few reasons)
gender – gender is coded just like major, for the same reasons

To look at any of these attributes, use the get.vertex.attribute() command.

get.vertex.attribute(friend, "gender")

##  [1] "orange" "green"  "green"  "green"  "orange" "orange" "orange"
##  [8] "green"  "orange" "green"  "orange" "orange" "green"  "orange"
## [15] "orange" "orange" "orange" "green"  "orange" "green"  "orange"
## [22] "orange"

Loading attributes

If you want to learn how to load attributes onto the networks yourself, use the following procedure.

Atts <- read.csv("Attributes.csv", header=TRUE)

attrs <- as.matrix(read.csv('Attributes.csv', strip.white = T, header = T))
# # set attributes
network::set.vertex.attribute(WorkWith, "credits", attrs[,1])
network::set.vertex.attribute(WorkWith, "discuss", attrs[,2]) # Attributes
network::set.vertex.attribute(WorkWith, "attend", attrs[,3])
network::set.vertex.attribute(WorkWith, "live", attrs[,4])
network::set.vertex.attribute(WorkWith, "major", attrs[,5])
network::set.vertex.attribute(WorkWith, "gender", attrs[,6])

Useful searches in “Help” window

ergm.terms
?traid.classify

Plotting the Networks

Plot the networks in order to get an idea of what you will be using in your analyses. You can use this to help conceptualize the models for your analysis. It helps to color by one or more attributes.

Two attributes (major, and gender) have colors listed instead of the original attributes. This was originally intended to make it more difficult for you to figure out who is who. But we are also able to use the color names to directly color the network. We use the get.vertex.attribute() command as we did above to assign colors.

## Plotting:
par(mfrow=c(2,3))
gplot(know, displaylabels=F, 
      arrowhead.cex = 0.5,
      edge.lwd = 0.005, 
      vertex.col = get.vertex.attribute(know, "major"),
      vertex.cex = 2, 
      edge.col=300, main="Know")
gplot(buddy, displaylabels=F, 
      arrowhead.cex = 0.5,
      edge.lwd = 0.005, 
      vertex.col = get.vertex.attribute(buddy, "major"),
      vertex.cex = 2, 
      edge.col=300, main="Buddy")
gplot(friend, displaylabels=F,
      arrowhead.cex = 0.5,
      edge.lwd = 0.005, 
      vertex.col = get.vertex.attribute(friend, "major"),
      vertex.cex = 2, 
      edge.col=300, main="Friend")
gplot(groupWork, displaylabels=F, 
      usearrows = FALSE,
      edge.lwd = 0.005, 
      vertex.col = get.vertex.attribute(groupWork, "major"),
      vertex.cex = 2, 
      arrowhead.cex = 0.5,
      edge.col=300, main="Group Work")
gplot(haveClass, displaylabels=F,
      arrowhead.cex = 0.5,
      edge.lwd = 0.005, 
      vertex.col = get.vertex.attribute(haveClass, "major"),
      vertex.cex = 2, 
      usearrows = FALSE,
      edge.col=300, main="Have Other Classes With")
gplot(WorkWith, displaylabels=F, 
      usearrows = FALSE,
      edge.lwd = 0.005, 
      vertex.col = get.vertex.attribute(WorkWith, "major"),
      vertex.cex = 2, 
      arrowhead.cex = 0.5,
      edge.col=300, main="Work With")
par(mfrow=c(1,1))

Take a look at the networks that you provided. As expected, ties on the level of who knows who (the most superficial level that we recorded) are the most dense. Casual friendships (buddy) are the second most densely connected network. Because these two networks are so well integrated, it is not likely that they will be very interesting or easy to estimate.

The network of close friends, however, is more interesting. The friends network is sparse with a few high degree nodes, and the colors seem to suggest that friendship has a lot to do with degree programs. Some of the attributes that may explain friends in this class could include program, the degree of one’s alter. We can also consider some other factors to be tested. But it is a good idea to start out with the structural (endogenous) variables first, and then fit the remainder of the model once the structural variables are selected.

Setting Up a Model

Structural Elements

# Set your computer's random 
#   seed so that you can compare
#   your results to those below.
set.seed(8675301)

# Start with a structural model
Model1 <- ergm(friend ~ edges + gwdsp +  gwesp,
               control=control.ergm(MCMC.burnin=1024*75, MCMLE.maxit=40))

## Starting maximum likelihood estimation via MCMLE:

## Iteration 1 of at most 40:

## Optimizing with step length 1.

## The log-likelihood improved by 1.389.

## Step length converged once. Increasing MCMC sample size.

## Iteration 2 of at most 40:

## Optimizing with step length 1.

## The log-likelihood improved by 0.07968.

## Step length converged twice. Stopping.

## Evaluating log-likelihood at the estimate.

## Using 20 bridges:

## 1

## 2

## 3

## 4

## 5

## 6

## 7

## 8

## 9

## 10

## 11

## 12

## 13

## 14

## 15

## 16

## 17

## 18

## 19

## 20

## .

## This model was fit using MCMC.  To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.

summary(Model1)

## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula:   friend ~ edges + gwdsp + gwesp
## 
## Iterations:  2 out of 40 
## 
## Monte Carlo MLE Results:
##             Estimate Std. Error MCMC % p-value    
## edges        -2.5722     0.3958      0  <1e-04 ***
## gwdsp        -0.0643     0.2480      0   0.796    
## gwdsp.decay   1.2897     1.6854      0   0.445    
## gwesp         1.0480     0.2628      0  <1e-04 ***
## gwesp.decay   0.1589     0.3274      0   0.628    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 640.5  on 462  degrees of freedom
##  Residual Deviance: 377.9  on 457  degrees of freedom
##  
## AIC: 387.9    BIC: 408.6    (Smaller is better.)

Next, you can check the model’s convergence.

# save(Model1, file="Model1.Rdata")  # Save this model if you like.

# Then run diagnostics
mcmc.diagnostics(Model1)

## Sample statistics summary:
## 
## Iterations = 76800:4270080
## Thinning interval = 1024 
## Number of chains = 1 
## Sample size per chain = 4096 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                  Mean     SD Naive SE Time-series SE
## edges       -1.310547 13.787   0.2154         0.2758
## gwdsp       -5.013140 76.489   1.1951         1.5210
## gwdsp.decay -1.668570 10.147   0.1586         0.2023
## gwesp       -1.046642 17.690   0.2764         0.3513
## gwesp.decay -0.004224  8.118   0.1268         0.1466
## 
## 2. Quantiles for each variable:
## 
##                2.5%     25%    50%    75%  97.5%
## edges        -29.00 -11.000 -1.000  8.000  25.00
## gwdsp       -143.39 -58.620 -8.578 42.922 155.06
## gwdsp.decay  -21.16  -8.623 -1.736  5.003  18.48
## gwesp        -33.91 -13.566 -1.534 10.506  35.24
## gwesp.decay  -10.61  -6.244 -1.482  4.292  19.20
## 
## 
## Sample statistics cross-correlations:
##                 edges     gwdsp gwdsp.decay     gwesp gwesp.decay
## edges       1.0000000 0.9786006   0.9788381 0.9637377   0.7829294
## gwdsp       0.9786006 1.0000000   0.9956223 0.9696361   0.8169494
## gwdsp.decay 0.9788381 0.9956223   1.0000000 0.9590967   0.7729040
## gwesp       0.9637377 0.9696361   0.9590967 1.0000000   0.8369867
## gwesp.decay 0.7829294 0.8169494   0.7729040 0.8369867   1.0000000
## 
## Sample statistics auto-correlation:
## Chain 1 
##                 edges        gwdsp  gwdsp.decay        gwesp  gwesp.decay
## Lag 0     1.000000000  1.000000000  1.000000000  1.000000000  1.000000000
## Lag 1024  0.241900067  0.236445950  0.238780800  0.235163920  0.143527079
## Lag 2048  0.070782795  0.072810161  0.073161926  0.067457854  0.042212987
## Lag 3072  0.014747552  0.015544235  0.013220382  0.019351211  0.016679060
## Lag 4096 -0.014803403 -0.014308649 -0.012798136 -0.007244404 -0.015310145
## Lag 5120 -0.007812919 -0.004840942 -0.005023693 -0.007018254  0.002952869
## 
## Sample statistics burn-in diagnostic (Geweke):
## Chain 1 
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##       edges       gwdsp gwdsp.decay       gwesp gwesp.decay 
##      -1.391      -1.495      -1.372      -1.704      -2.021 
## 
## Individual P-values (lower = worse):
##       edges       gwdsp gwdsp.decay       gwesp gwesp.decay 
##  0.16408769  0.13488256  0.17001246  0.08842646  0.04331635 
## Joint P-value (lower = worse):  0.5078524 .

## Warning in formals(fun): argument is not a function

## 
## MCMC diagnostics shown here are from the last round of simulation, prior to computation of final parameter estimates. Because the final estimates are refinements of those used for this simulation run, these diagnostics may understate model performance. To directly assess the performance of the final model on in-model statistics, please use the GOF command: gof(ergmFitObject, GOF=~model).

summary(Model1)

## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula:   friend ~ edges + gwdsp + gwesp
## 
## Iterations:  2 out of 40 
## 
## Monte Carlo MLE Results:
##             Estimate Std. Error MCMC % p-value    
## edges        -2.5722     0.3958      0  <1e-04 ***
## gwdsp        -0.0643     0.2480      0   0.796    
## gwdsp.decay   1.2897     1.6854      0   0.445    
## gwesp         1.0480     0.2628      0  <1e-04 ***
## gwesp.decay   0.1589     0.3274      0   0.628    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 640.5  on 462  degrees of freedom
##  Residual Deviance: 377.9  on 457  degrees of freedom
##  
## AIC: 387.9    BIC: 408.6    (Smaller is better.)

Next, you can check the model’s “Goodness of Fit.”

mod.fit <- gof(Model1, GOF = ~ distance + espartners + idegree + odegree)
summary(mod.fit)
par(mfrow=c(2,2))
plot(mod.fit, main="GOF - Model 1")
par(mfrow=c(1,1))

Adding Attributes to the Model

Model2 <- ergm(friend ~ edges + gwdsp +  gwesp
                 + dyadcov(groupWork)
                 + dyadcov(haveClass) 
                 # + dyadcov(WorkWith)
                 # + absdiff("credits")
                 # + nodematch("discuss")
                 # + nodematch("attend") 
                 # + nodematch("live") 
                 # + nodematch("major") 
                 # + nodematch("gender")
                 # + nodefactor("credits")
                 # + nodefactor("discuss")
                 # + nodefactor("attend")
                 # + nodefactor("live") 
                 # + nodefactor("major")
                 # + nodefactor("gender")
                 )
summary(Model2)
# save(Model2, file="Model2.Rdata")
# Then run diagnostics
mcmc.diagnostics(Model2)
summary(Model2) 
mod.fit <- gof(Model2, GOF = ~ distance + espartners + degree + triadcensus, nsim=10000) # , control=control.ergm(force.main=TRUE))
# mod.fit <- gof(Model2, GOF = ~ distance + dspartners + degree + triadcensus, nsim=10000) # Test re. dyads reveals dyad probs.
summary(mod.fit)
par(mfrow=c(2,2))
plot(mod.fit, main="GOF - Model 1")
par(mfrow=c(1,1))
summary(Model2)

For more terms that you may use in an ERGM, check out any one of the terms to bring up the help viewer. ?absdiff