Authors: James D. Wilson and Matthew J. Denny

Political Networks Conference; June 18th, 2015

I. Installing the GERGM R package

Note: Before we begin, make sure that you have the most version of R downloaded and installed. That is, R version 3.2 or higher. If not, please download that now at the CRAN website.

First, install the code and data files:

  1. Code

  2. Data

Next, place the GERGM_Package and Data folders in your favorite directory.

For me, I’ll place them in Users/jameswilson/Dropbox/GERGM_Tutorial.

Set your current directory your chosen location

setwd("~/Dropbox/GERGM_Tutorial")

Our Metropolis Hastings procedure is run using C++, and our main GERGM code calls C++ using the nice RCpp package.

Let’s first install and load the two needed packages to enable Rcpp.

install.packages("RcppArmadillo")
install.packages("BH")
library(RcppArmadillo)
library(BH)

Now the moment of truth. Source the two required GERGM files.

Rcpp::sourceCpp("GERGM_Package/MH_Sampler.cpp")
source("GERGM_Package/GERGM.R", echo = F)
Report <- function(x) {
    print(x)
}


II: Description of GERGM Functions

There are two primary functions in the GERGM library:

  1. gergm (object, directed = c(TRUE, FALSE), MPLE.only = c(FALSE, TRUE), transform.data = NULL, method = c(“Gibbs”, “Metropolis”), max.num.iterations = 10, mc.num.iterations = 100, nsim = 500, thin = 1, shape.parameter = 1, weights = NULL, together = 0, MCMC.burnin = 100, seed = 123, tolerance = 0.01, gain.factor = 0)

Description: fit a GERGM model to a weighted network object


Arguments:

















Output: a GERGM object that contains the fitted model and diagnostics.


  1. simulate.gergm (object, nsim, seed = 123, method = c(“Gibbs”, “Metropolis”), MCMC.burnin = 100,
    weights = object@weights, coef = object@theta.coef, formula = object@formula, thin = 1, shape.parameter = 1, together = 0)

Description: simulate weighted networks from a fitted GERGM object.


Arguments:












Output: a list containing the following three elements:




III: Application: U.S. Migration Data

First, load the Migration data. This is a list with two files: one that is a formula.object, the other is transform.data (used for transformation).

load("~/Dropbox/GERGM_Tutorial/Data/Migration_Data.Rdata")

Create the GERGM model

net <- data$Graph

formula.obj = net ~ in2star + out2star + ctriads + ttriads + recip

Plot the Edge weight distribution

hist(as.numeric(net), n = 100, main = "Edge weights of U.S. Migration", xlab = "Edge weights")

Fit the GERGM in three different ways. First, we try the MCMC MLE methods which require simulation in each iteration.

MCMC Estimation

In cases where Gibbs sampling works, Metropolis-Hastings will give similar results. Gibbs typically works faster in these cases.

seed = 123
CUSTOM = F

## 1) Gibbs Fit
Gibbs.fit <- gergm(formula.obj, directed = TRUE, seed = seed, transform.data = transform.data, 
    method = "Gibbs", max.num.iterations = 10, mc.num.iterations = 100, nsim = 1000, 
    MCMC.burnin = 500, tolerance = 1e-04, shape.parameter = 1, together = 1, 
    gain.factor = 0)

## 2) Metropolis-Hastings fit
MH.fit <- gergm(formula.obj, directed = TRUE, seed = seed, transform.data = transform.data, 
    method = "Metropolis", max.num.iterations = 10, mc.num.iterations = 100, 
    nsim = 1e+06, MCMC.burnin = 10000, tolerance = 1e-04, shape.parameter = 0.01, 
    together = 1, thin = 0.001, gain.factor = 0)

Maximum Pseudo Likelihood Estimation

Very fast as it requires no simulation, but estimates are biased. We shall run this one for demonstration.

MPLE.fit <- gergm(formula.obj, directed = TRUE, MPLE.only = TRUE, seed = seed, 
    max.num.iterations = 10, transform.data = transform.data, mc.num.iterations = 100, 
    tolerance = 1e-04, shape.parameter = 0.01, together = 1, thin = 0.001, gain.factor = 0)

## Look at the results##
MPLE.fit

Network Simulation

For goodness of fit evaluations, we may be interested in simulating networks with an estimated GERGM fit and test whether or not they match the network statistics in the observed network.

You can use either Gibbs or Metropolis-Hastings for simulations. Again, Gibbs is typically faster.

Gibbs.sims <- simulate.gergm(MPLE.fit, 100, seed = seed, method = "Gibbs", MCMC.burnin = 100)

# We can also use M-H to simulate networks (not run)
MH.sims <- simulate.gergm(MH.fit, 1000, seed = seed, method = "Metropolis", 
    MCMC.burnin = 50000, thin = 0.01, shape.parameter = 0.01)

# View Gibbs.sims
names(Gibbs.sims)
Gibbs.sims

Goodness of Fit

Now, we will check the goodness of fit from the Gibbs.sims

# The true statistics of the network we hope to simulate
True.stats <- MPLE.fit@stats[2, ]

# The statistics of the networks we just simulated
sim.stats <- Gibbs.sims$Statistics

## Plot the Goodness of Fit plots for 6 network statistics
par(oma = c(1, 1, 1, 1))
par(mar = c(4, 4.5, 2, 1))
spacing = seq(1, 401, by = 15)
par(mfrow = c(2, 3))

d11 <- density(sim.stats[, 1])
plot(d11, lwd = 3, xlab = "", ylab = "Density", main = "out2star", cex.axis = 1.5, 
    cex.lab = 2, cex.main = 2)
abline(v = True.stats[1], lwd = 3, col = "red")

d11 <- density(sim.stats[, 2])
plot(d11, lwd = 3, xlab = "", ylab = "Density", main = "in2star", cex.axis = 1.5, 
    cex.lab = 2, cex.main = 2)
abline(v = True.stats[2], lwd = 3, col = "red")

d11 <- density(sim.stats[, 3])
plot(d11, lwd = 3, xlab = "", ylab = "Density", main = "ctriads", cex.axis = 1.5, 
    cex.lab = 2, cex.main = 2)
abline(v = True.stats[3], lwd = 3, col = "red")

d11 <- density(sim.stats[, 4])
plot(d11, lwd = 3, xlab = "", ylab = "Density", main = "recip", cex.axis = 1.5, 
    cex.lab = 2, cex.main = 2)
abline(v = True.stats[4], lwd = 3, col = "red")

d11 <- density(sim.stats[, 5])
plot(d11, lwd = 3, xlab = "", ylab = "Density", main = "ttriads", cex.axis = 1.5, 
    cex.lab = 2, cex.main = 2)
abline(v = True.stats[5], lwd = 3, col = "red")

d11 <- density(sim.stats[, 6])
plot(d11, lwd = 3, xlab = "", ylab = "Density", main = "edgeweight", cex.axis = 1.5, 
    cex.lab = 2, cex.main = 2)
abline(v = True.stats[6], lwd = 3, col = "red")