library(devtools)
## Loading required package: usethis
install_github("zalmquist/networkdata")
## Skipping install of 'networkdata' from a github remote, the SHA1 (63566221) has not changed since last install.
##   Use `force = TRUE` to force installation
library(networkdata)
## Loading required package: network
## network: Classes for Relational Data
## Version 1.16.1 created on 2020-10-06.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
##                     Mark S. Handcock, University of California -- Los Angeles
##                     David R. Hunter, Penn State University
##                     Martina Morris, University of Washington
##                     Skye Bender-deMoll, University of Washington
##  For citation information, type citation("network").
##  Type help("network-package") to get started.
## Loading required package: sna
## Loading required package: statnet.common
## 
## Attaching package: 'statnet.common'
## The following object is masked from 'package:base':
## 
##     order
## sna: Tools for Social Network Analysis
## Version 2.6 created on 2020-10-5.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
##  For citation information, type citation("sna").
##  Type help(package="sna") to get started.
## networkdata: Lin Freeman's Network Data Collection
## Version 0.01 created on 2014-01-31.
## copyright (c) 2014, Zack W. Almquist, University of Minnesota
##  For citation information, type citation("networkdata").
##  Type help(package="networkdata") to get started.
load("/Users/rheemh/Dropbox (ASU)/Coding/R/bott.RData")
# The data are from a 1928 paper by H. Bott that used ethological observations of children in a nursery school (Bott 1928). The Bott data contain five sets of relations:

# 1. talked to another child
# 2. interfered with another child
# 3. watched another child
# 4. imitated another child
# 5. cooperated with another child

# These are organized into a "graph stack," which is essentially a specialized `list` used by the `network` package for storing multiple networks.
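# Because the stack behaves like an ordinary named list of network objects, individual relations can be pulled out by name or by position; a quick check:
names(bott)
identical(bott[["imitate"]], bott[[4]])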
library(statnet)
## Loading required package: tergm
## Loading required package: ergm
## 
## ergm: version 3.11.0, created on 2020-10-14
## Copyright (c) 2020, Mark S. Handcock, University of California -- Los Angeles
##                     David R. Hunter, Penn State University
##                     Carter T. Butts, University of California -- Irvine
##                     Steven M. Goodreau, University of Washington
##                     Pavel N. Krivitsky, UNSW Sydney
##                     Martina Morris, University of Washington
##                     with contributions from
##                     Li Wang
##                     Kirk Li, University of Washington
##                     Skye Bender-deMoll, University of Washington
##                     Chad Klumb
##                     Michał Bojanowski, Kozminski University
##                     Ben Bolker
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("ergm").
## NOTE: Versions before 3.6.1 had a bug in the implementation of the bd()
## constraint which distorted the sampled distribution somewhat. In
## addition, Sampson's Monks datasets had mislabeled vertices. See the
## NEWS and the documentation for more details.
## NOTE: Some common term arguments pertaining to vertex attribute and
## level selection have changed in 3.10.0. See terms help for more
## details. Use 'options(ergm.term=list(version="3.9.4"))' to use old
## behavior.
## Loading required package: networkDynamic
## 
## networkDynamic: version 0.10.1, created on 2020-01-16
## Copyright (c) 2020, Carter T. Butts, University of California -- Irvine
##                     Ayn Leslie-Cook, University of Washington
##                     Pavel N. Krivitsky, University of Wollongong
##                     Skye Bender-deMoll, University of Washington
##                     with contributions from
##                     Zack Almquist, University of California -- Irvine
##                     David R. Hunter, Penn State University
##                     Li Wang
##                     Kirk Li, University of Washington
##                     Steven M. Goodreau, University of Washington
##                     Jeffrey Horner
##                     Martina Morris, University of Washington
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("networkDynamic").
## 
## tergm: version 3.7.0, created on 2020-10-15
## Copyright (c) 2020, Pavel N. Krivitsky, UNSW Sydney
##                     Mark S. Handcock, University of California -- Los Angeles
##                     with contributions from
##                     David R. Hunter, Penn State University
##                     Steven M. Goodreau, University of Washington
##                     Martina Morris, University of Washington
##                     Nicole Bohme Carnegie, New York University
##                     Carter T. Butts, University of California -- Irvine
##                     Ayn Leslie-Cook, University of Washington
##                     Skye Bender-deMoll
##                     Li Wang
##                     Kirk Li, University of Washington
##                     Chad Klumb
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("tergm").
## Loading required package: ergm.count
## 
## ergm.count: version 3.4.0, created on 2019-05-15
## Copyright (c) 2019, Pavel N. Krivitsky, University of Wollongong
##                     with contributions from
##                     Mark S. Handcock, University of California -- Los Angeles
##                     David R. Hunter, Penn State University
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("ergm.count").
## NOTE: The form of the term 'CMP' has been changed in version 3.2 of
## 'ergm.count'. See the news or help('CMP') for more information.
## Loading required package: tsna
## 
## statnet: version 2019.6, created on 2019-06-13
## Copyright (c) 2019, Mark S. Handcock, University of California -- Los Angeles
##                     David R. Hunter, Penn State University
##                     Carter T. Butts, University of California -- Irvine
##                     Steven M. Goodreau, University of Washington
##                     Pavel N. Krivitsky, University of Wollongong
##                     Skye Bender-deMoll
##                     Martina Morris, University of Washington
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("statnet").
## unable to reach CRAN
summary(bott)
##           Length Class   Mode
## talk      5      network list
## interfere 5      network list
## watch     5      network list
## imitate   5      network list
## cooperate 5      network list
bott
## $talk
##  Network attributes:
##   vertices = 11 
##   directed = TRUE 
##   hyper = FALSE 
##   loops = FALSE 
##   multiple = FALSE 
##   bipartite = FALSE 
##   total edges= 75 
##     missing edges= 0 
##     non-missing edges= 75 
## 
##  Vertex attribute names: 
##     age.month vertex.names 
## 
##  Edge attribute names: 
##     edgevalue 
## 
## $interfere
##  Network attributes:
##   vertices = 11 
##   directed = TRUE 
##   hyper = FALSE 
##   loops = FALSE 
##   multiple = FALSE 
##   bipartite = FALSE 
##   total edges= 72 
##     missing edges= 0 
##     non-missing edges= 72 
## 
##  Vertex attribute names: 
##     age.month vertex.names 
## 
##  Edge attribute names: 
##     edgevalue 
## 
## $watch
##  Network attributes:
##   vertices = 11 
##   directed = TRUE 
##   hyper = FALSE 
##   loops = FALSE 
##   multiple = FALSE 
##   bipartite = FALSE 
##   total edges= 51 
##     missing edges= 0 
##     non-missing edges= 51 
## 
##  Vertex attribute names: 
##     age.month vertex.names 
## 
##  Edge attribute names: 
##     edgevalue 
## 
## $imitate
##  Network attributes:
##   vertices = 11 
##   directed = TRUE 
##   hyper = FALSE 
##   loops = FALSE 
##   multiple = FALSE 
##   bipartite = FALSE 
##   total edges= 35 
##     missing edges= 0 
##     non-missing edges= 35 
## 
##  Vertex attribute names: 
##     age.month vertex.names 
## 
##  Edge attribute names: 
##     edgevalue 
## 
## $cooperate
##  Network attributes:
##   vertices = 11 
##   directed = TRUE 
##   hyper = FALSE 
##   loops = FALSE 
##   multiple = FALSE 
##   bipartite = FALSE 
##   total edges= 23 
##     missing edges= 0 
##     non-missing edges= 23 
## 
##  Vertex attribute names: 
##     age.month vertex.names 
## 
##  Edge attribute names: 
##     edgevalue
plot(bott[[1]])

plot(bott[[2]])

plot(bott[[3]])

plot(bott[[4]])

plot(bott[[5]])

# We can see that the first three relations yield quite dense networks (an ergm would not tell us much in those cases)! We will work first with the imitation network.

# The first thing we'll do is fit a model that just has edges in it. This is kind of like fitting a regression model with only an intercept.

summary(bott[[4]]~edges) # how many edges?
## edges 
##    35
bottmodel.01 <- ergm(bott[[4]]~edges) # Estimate minimal model
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Stopping at the initial estimate.
## Evaluating log-likelihood at the estimate.
summary(bottmodel.01)
## Call:
## ergm(formula = bott[[4]] ~ edges)
## 
## Iterations:  4 out of 20 
## 
## Monte Carlo MLE Results:
##       Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges  -0.7621     0.2047      0  -3.723 0.000197 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 152.5  on 110  degrees of freedom
##  Residual Deviance: 137.6  on 109  degrees of freedom
##  
## AIC: 139.6    BIC: 142.3    (Smaller is better.)
# This model fits very quickly, and yields the same result every time you run it. This is because this model contains no 'dependency' terms, meaning terms that model how the probability of a tie between two nodes is affected by existing ties in the network. 

# In models without dependency terms, the ERGM can be fit without simulation using maximum pseudolikelihood estimation (MPLE), which in the dyad-independent case coincides with the maximum likelihood estimate. This amounts to fitting a logistic regression to our network ties.
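
# As a quick sanity check (a sketch; `m` and `y` are throwaway helper objects), the edges-only coefficient is just the log-odds of the observed density, which an intercept-only logistic regression on the dyads recovers:
m <- as.matrix(bott[[4]])             # binary adjacency matrix of the imitation network
y <- m[row(m) != col(m)]              # the 110 ordered dyads, excluding self-loops
coef(glm(y ~ 1, family = binomial))   # about -0.762, matching the 'edges' estimate above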

# We now consider a model with a dependency term that helps to deal with the "problem" of transitivity.
# THE ALLURE OF TRIANGLES
# Triads are really important for understanding social relations. Some important roles for triads:

# - hierarchy
# - network closure
# - brokerage
# - exclusion
# - generalized exchange
# - bystander effects
# - forbidden relationships
# - filial loyalties

# Starting with the work of Davis, Holland, and Leinhardt, we know that one of the principal differences between empirical social networks and random graphs is the pattern of triads in human social networks. 
# Put simply, there are far more triangles, and particularly transitive triads, in social networks than would be expected from random graphs of similar density.

# Empirically, we know a number of things about transitive triads in particular:

# - Friendships tend to be overwhelmingly transitive (Holland and Leinhardt 1971)
# - Children show increasing tendencies for transitivity as they get older (Leinhardt 1972)
# - Higher agreement between pairs within transitive triads (Krackhardt and Kilduff 2002)
# - Adolescents in intransitive friendship triads have more suicidal ideations (Bearman and Moody 2004)

# Furthermore, the presence of intransitive triads is itself important. For instance, Yeon Jung Yu (2016) showed that sex workers on Hainan Island, China, who overwhelmingly tend to be migrants from rural areas, have a large number of intransitive friendship triads, which she interpreted as arising from the need to manage information about these women's activities away from home.

# Triangles measure transitivity and clustering in networks. We can add a triangle term to our ergm. Unlike the edges-only model, this model will need to be estimated via MCMC.

summary(bott[[4]]~edges+triangle)
##    edges triangle 
##       35       40
bottmodel.02 <- ergm(bott[[4]]~edges+triangle)
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 20:
## Optimizing with step length 1.
## The log-likelihood improved by 0.02672.
## Step length converged once. Increasing MCMC sample size.
## Iteration 2 of at most 20:
## Optimizing with step length 1.
## The log-likelihood improved by 0.009663.
## Step length converged twice. Stopping.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Using 20 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 .
## This model was fit using MCMC.  To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
summary(bottmodel.02)
## Call:
## ergm(formula = bott[[4]] ~ edges + triangle)
## 
## Iterations:  2 out of 20 
## 
## Monte Carlo MLE Results:
##          Estimate Std. Error MCMC % z value Pr(>|z|)
## edges    -0.54524    0.58031      0  -0.940    0.347
## triangle -0.06079    0.15446      0  -0.394    0.694
## 
##      Null Deviance: 152.5  on 110  degrees of freedom
##  Residual Deviance: 137.5  on 108  degrees of freedom
##  
## AIC: 141.5    BIC: 146.9    (Smaller is better.)
# You'll notice that as we move from the edges-only model to the edges-plus-triangles model, the estimation method changes. Triangles are a dyad-dependent term, so the ergm must be fit using MCMC simulation. 
# Depending on the size of your network, this might take a while. Whenever ergm fits a model via MCMC, you'll get a note telling you to check the model diagnostics for degeneracy. If the model shows clear signs of degeneracy, you'll also receive a warning, but don't assume that the absence of a warning means the model is fine; always check the MCMC diagnostics and the model goodness-of-fit.

# In general, triangles cause problems in ergms. They often lead to a phenomenon known as degeneracy. While our intuition tells us that triangles should matter for networks -- because triangles result from transitivity -- it turns out that there are better ways to represent transitivity in network models.

# A series of papers by Katie Faust (2007, 2008, 2010) shows that triadic structures are highly constrained by lower-level structures, namely, edge density. In the limited-choice paradigm employed by Add Health (i.e., "name your five best female/male friends"), Faust (2010) found that the triad census in 128 networks was almost perfectly explained by one dimension of a multivariate analysis (conceptually similar to the loading on a principal component) and that edge density accounted for 96% of the variance in locations along this dimension!

# This result was extreme -- and determined in large part by the analysis of restricted-choice questions for a single relation -- but the point remains: the number (and pattern) of triads in a network will be highly constrained by the number of edges and the size of the network, which jointly determine the network density. 

# Mark Handcock (2003) showed that this situation of extreme constraint causes the MCMC algorithm by which ergms are estimated to behave badly, leading to the condition of degeneracy discussed above. A network model is degenerate when the space that an MCMC sampler can explore is so constrained that the only way to get the observed g(y) is essentially to flicker between full and empty graphs in the right proportion. Not what you want out of an MCMC estimator. 

# A good indication that you have a degenerate model is that you have NA values for standard errors on your ergm parameter estimates. You can't calculate a variance -- and, therefore, a standard error -- if you simply flicker between full and empty graphs.

# The upshot of this is that, despite our strong intuitions about the importance of triadic interactions in determining social structure, WE DO NOT RECOMMEND USING TRIANGLE TERMS IN ERGM. There are better alternatives, which we will discuss later.
# NODAL COVARIATES
# We often have a situation where we think that the attributes of the individuals who make up our graph vertices may affect their propensity to form (or receive) ties. To test this hypothesis, we can employ nodal covariates using the nodecov() term.

age <- bott[[4]] %v% "age.month"
summary(age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   26.00   33.50   38.00   40.09   48.50   54.00
bott[[4]]
##  Network attributes:
##   vertices = 11 
##   directed = TRUE 
##   hyper = FALSE 
##   loops = FALSE 
##   multiple = FALSE 
##   bipartite = FALSE 
##   total edges= 35 
##     missing edges= 0 
##     non-missing edges= 35 
## 
##  Vertex attribute names: 
##     age.month vertex.names 
## 
##  Edge attribute names: 
##     edgevalue
# 'age.month' is a nodal attribute

plot(bott[[4]], vertex.cex=age/24)

summary(bott[[4]]~edges+nodecov('age.month'))
##             edges nodecov.age.month 
##                35              2842
bottmodel.03 <- ergm(bott[[4]]~edges+nodecov('age.month'))
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Stopping at the initial estimate.
## Evaluating log-likelihood at the estimate.
summary(bottmodel.03)
## Call:
## ergm(formula = bott[[4]] ~ edges + nodecov("age.month"))
## 
## Iterations:  4 out of 20 
## 
## Monte Carlo MLE Results:
##                    Estimate Std. Error MCMC % z value Pr(>|z|)
## edges             -1.526483   1.335799      0  -1.143    0.253
## nodecov.age.month  0.009501   0.016352      0   0.581    0.561
## 
##      Null Deviance: 152.5  on 110  degrees of freedom
##  Residual Deviance: 137.3  on 108  degrees of freedom
##  
## AIC: 141.3    BIC: 146.7    (Smaller is better.)
# Our result suggests that, in this dataset, the combined age of the children involved in a dyad has no detectable effect on the probability of a tie between them, in either direction.

# However, imitation is a directed relationship. We might expect that older children are more likely to be role models to others. To test this hypothesis, we can use the directed variants of nodecov(): nodeocov() (effect of an attribute on out-degree) and nodeicov() (effect of an attribute on in-degree). We note that the nodecov() family of terms is for numeric attributes; nodefactor() terms are available for categorical attributes.
bottmodel.03b <- ergm(bott[[4]]~edges+nodeicov('age.month'))
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Stopping at the initial estimate.
## Evaluating log-likelihood at the estimate.
summary(bottmodel.03b)
## Call:
## ergm(formula = bott[[4]] ~ edges + nodeicov("age.month"))
## 
## Iterations:  4 out of 20 
## 
## Monte Carlo MLE Results:
##                    Estimate Std. Error MCMC % z value Pr(>|z|)   
## edges              -2.89853    0.96939      0   -2.99  0.00279 **
## nodeicov.age.month  0.05225    0.02272      0    2.30  0.02147 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 152.5  on 110  degrees of freedom
##  Residual Deviance: 132.1  on 108  degrees of freedom
##  
## AIC: 136.1    BIC: 141.5    (Smaller is better.)
# The number of other children who imitated a child increases with the child's age!
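# The out-degree analogue -- testing whether older children are more likely to imitate others -- would be a parallel model using nodeocov(); a sketch (not run here):
# bottmodel.03o <- ergm(bott[[4]] ~ edges + nodeocov('age.month'))
# summary(bottmodel.03o)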
# An extremely important class of nodal covariates is the homophily terms. Homophily refers to the tendency for people to sort on sociodemographic attributes. In evolutionary biology, we are probably more familiar with the term assortative mating, which refers to the tendency for mating pairs to form differentially based on some observable feature of their phenotype. Indeed, the classic demographic studies of homophily focus on marriage. Humans show a tendency to assort on all manner of attributes: age, educational status, race, ethnicity, religion, etc. 

# Attribute-based mixing is extremely important for diffusion processes on networks (e.g., infectious disease or cultural transmission). Strong homophily in a network can slow down overall diffusion compared to a well-mixed model and can create pockets of high prevalence. There are two broad flavors of homophily: uniform and differential. 

# Consider the case of assortative mating based on ethnicity. Uniform homophily refers to the tendency for people to form sexual unions with others of the same ethnicity, where that tendency is assumed to be the same regardless of which ethnicity is considered. 

# Differential homophily accounts for the fact that the assortative tendencies of sexual partnerships are different depending on the ethnicities of the individuals involved. In the United States, homophilous (or "ethnically-concordant") partnerships are much more likely among African Americans than among white couples.
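
# In ergm syntax, uniform homophily on a categorical attribute is a nodematch() term, and differential homophily is nodematch() with diff=TRUE. A toy sketch (the network and the 'grp' attribute below are invented for illustration; they are not part of the Bott data):
toy <- network(rgraph(20, tprob = 0.15))              # random 20-node directed graph via sna::rgraph()
toy %v% "grp" <- sample(c("A", "B"), 20, replace = TRUE)
summary(toy ~ edges + nodematch("grp"))               # a single matched-tie count (uniform)
summary(toy ~ edges + nodematch("grp", diff = TRUE))  # separate matched-tie counts per group (differential)
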
# Unfortunately, the bott data set does not have any categorical nodal covariates that lend themselves to either nodematch() or nodemix(), but we can run a model with absdiff(), which is a term for homophily on a numerical attribute, such as age. 

# It models the effect of the absolute difference in an attribute on the probability of a tie between two nodes.

bottmodel.03c <- ergm(bott[[4]]~edges+absdiff('age.month'))
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Stopping at the initial estimate.
## Evaluating log-likelihood at the estimate.
summary(bottmodel.03c)
## Call:
## ergm(formula = bott[[4]] ~ edges + absdiff("age.month"))
## 
## Iterations:  3 out of 20 
## 
## Monte Carlo MLE Results:
##                   Estimate Std. Error MCMC % z value Pr(>|z|)  
## edges             -0.14442    0.37055      0  -0.390   0.6967  
## absdiff.age.month -0.05593    0.02933      0  -1.907   0.0565 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 152.5  on 110  degrees of freedom
##  Residual Deviance: 133.7  on 108  degrees of freedom
##  
## AIC: 137.7    BIC: 143.1    (Smaller is better.)
# The greater the difference in age between children, the less likely it is that one imitates the other. 
# RECIPROCITY
# Reciprocity is a relation we commonly wish to investigate in anthropological research. ergm includes an elegant term for testing the hypothesis of reciprocity in directed networks: `mutual`, defined as the number of pairs in the network for which both the i-->j and j-->i ties exist.

bottmodel.04 <- ergm(bott[[4]]~edges+mutual)
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 20:
## Optimizing with step length 1.
## The log-likelihood improved by 0.0004165.
## Step length converged once. Increasing MCMC sample size.
## Iteration 2 of at most 20:
## Optimizing with step length 1.
## The log-likelihood improved by 0.0004016.
## Step length converged twice. Stopping.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Using 20 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 .
## This model was fit using MCMC.  To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
summary(bottmodel.04)
## Call:
## ergm(formula = bott[[4]] ~ edges + mutual)
## 
## Iterations:  2 out of 20 
## 
## Monte Carlo MLE Results:
##        Estimate Std. Error MCMC % z value Pr(>|z|)   
## edges   -0.8125     0.2900      0  -2.802  0.00508 **
## mutual   0.1623     0.6170      0   0.263  0.79257   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 152.5  on 110  degrees of freedom
##  Residual Deviance: 137.5  on 108  degrees of freedom
##  
## AIC: 141.5    BIC: 146.9    (Smaller is better.)
# EDGE COVARIATES
# The idea of a nodal covariate is pretty straightforward. These are often what we would call (socio-demographic) attributes (e.g., age, sex, status, location) in more standard regression models. In the network framework, it is entirely possible that the relation itself might be modified by a covariate. 

# For example, in a forthcoming paper, Brian Wood and I show that Hadza food-sharing is predicted by reciprocal sharing, relatedness, and gender homophily. 

# Reciprocity is modeled, as above, using a `mutual` term and gender homophily by a `nodematch` term. **Relatedness is an edge-level covariate and is measured using a pairwise matrix of the coefficient of relatedness.**

# In the Bott imitation network, we can test the hypothesis that children are more likely to imitate those with whom they have conversations.

# Model ties in the imitation network using the edges of the talking network as a covariate
bottmodel.05 <- ergm(bott[[4]]~edges+edgecov(bott[[1]]))
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Stopping at the initial estimate.
## Evaluating log-likelihood at the estimate.
summary(bottmodel.05)
## Call:
## ergm(formula = bott[[4]] ~ edges + edgecov(bott[[1]]))
## 
## Iterations:  5 out of 20 
## 
## Monte Carlo MLE Results:
##                   Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges              -1.5755     0.4485      0  -3.513 0.000443 ***
## edgecov.bott[[1]]   1.1142     0.5073      0   2.196 0.028075 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 152.5  on 110  degrees of freedom
##  Residual Deviance: 132.2  on 108  degrees of freedom
##  
## AIC: 136.2    BIC: 141.6    (Smaller is better.)
# bott[[4]]: imitation network
# bott[[1]]: talk network


# One possibility is that the difference in age between two children determines the likelihood of one imitating the other. We can test that hypothesis by including as an edge-level covariate the absolute difference in age between two vertices.

# To calculate this (and preserve the structure of the matrix so that we can use it as an edge covariate), we employ the trick of using the `R` function `outer()`. This function is a generalization of the outer product in linear algebra, in which two vectors x and y, each of length k, are multiplied such that the (i,j)th element of the resulting k x k matrix is the product x[i]*y[j]. 

# `outer()` takes as its first two arguments the vectors (or matrices) to which your operation will be applied and the (optional) third argument is the function to be applied. If not specified, this is assumed to be multiplication. In our case, we will use subtraction to calculate our absolute differences.

# note that this creates a vector of all the kids' ages
bott[[4]]%v%"age.month"
##  [1] 26 29 31 36 37 38 40 45 52 53 54
# create matrix of pairwise absolute age differences
agediff <- abs(outer(bott[[4]]%v%"age.month", bott[[4]]%v%"age.month","-"))

bottmodel.06 <- ergm(bott[[4]]~edges+edgecov(bott[[1]])+edgecov(agediff))
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Stopping at the initial estimate.
## Evaluating log-likelihood at the estimate.
# imitation depends on whether the subjects have talked to each other and on the size of the age difference between them.

summary(bottmodel.06)
## Call:
## ergm(formula = bott[[4]] ~ edges + edgecov(bott[[1]]) + edgecov(agediff))
## 
## Iterations:  4 out of 20 
## 
## Monte Carlo MLE Results:
##                   Estimate Std. Error MCMC % z value Pr(>|z|)  
## edges             -0.94818    0.52825      0  -1.795   0.0727 .
## edgecov.bott[[1]]  1.21715    0.51946      0   2.343   0.0191 *
## edgecov.agediff   -0.06346    0.03047      0  -2.083   0.0373 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 152.5  on 110  degrees of freedom
##  Residual Deviance: 127.5  on 107  degrees of freedom
##  
## AIC: 133.5    BIC: 141.6    (Smaller is better.)
# The edge covariate matrix of age differences that we calculated here is the same as what is produced by the `absdiff()` term used above. The ergm package provides the `absdiff()` term for convenience; we show the calculation of an edge covariate matrix to demonstrate the logic of edge covariates and to show how more complex edge covariate matrices might be calculated (for instance, if you wished to take the log of the absolute difference).
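# For instance, a log-transformed version of the age-difference covariate can be built the same way (a sketch; adding 1 avoids log(0) for same-age pairs and the diagonal):
log.agediff <- log(agediff + 1)
# ergm(bott[[4]] ~ edges + edgecov(bott[[1]]) + edgecov(log.agediff))  # not run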
# DETERMINING MODEL GOODNESS-OF-FIT
# It is relatively straightforward to fit a model. Determining whether or not the model makes any sense is another matter altogether. 

# When we use ordinary least-squares regression, for example, we are probably used to calculating residuals, which are the difference between the observed and the predicted values for a specific value of the independent variable.

# While there is no simple ERGM analog to the residual in a linear model, we can ask whether our observed network is consistent with the family of networks implied by our estimated model parameters.

# We test the goodness-of-fit of an ergm using the function `gof()` (Hunter, Goodreau, and Handcock 2008). This function simulates networks using the fitted parameters of your model and calculates a variety of structural measures from these graphs. You are then able to compare the counts of graph statistics from these simulated networks with the counts in your observed network.

# `gof()` will calculate approximate p-values for the differences between your observed graph statistics and those from the simulated networks. A low p-value suggests that there may be a problem with the fit for that graph statistic.

# A particularly useful feature of `gof()` is the ability to generate box plots of the simulated counts and overlay your observed graph statistics. This can provide a quick sanity check of the quality of your model and can help you formulate hypotheses about why your model might be failing.

# What `gof()` calculates depends on the type of network you have (e.g., directed vs. undirected). To use `gof()`, you pass it a formula with the fitted model object (e.g., `bottmodel.06`) on the left-hand side and the goodness-of-fit statistics on the right-hand side.

# The default gof formula for undirected graphs is `~degree + espartners + distance + model`, while the default for directed graphs is `~idegree + odegree + espartners + distance + model`. In these formulae, `model` refers to the statistics of the model that you fit in the model object.

# Not all ergm terms are supported by `gof()`. See the help file for supported terms. 

# Let's look at the gof of `bottmodel.06`. We will use our model statistics, edgewise shared partners (esp), and geodesic distance (distance) as our goodness-of-fit criteria.

bottmodel.06.gof <- gof(bottmodel.06 ~ model + esp + distance)
bottmodel.06.gof
## 
## Goodness-of-fit for model statistics 
## 
##                   obs min   mean max MC p-value
## edges              35  25  35.67  44       0.98
## edgecov.bott[[1]]  29  20  29.33  38       1.00
## edgecov.agediff   336 192 342.12 457       0.86
## 
## Goodness-of-fit for edgewise shared partner 
## 
##      obs min  mean max MC p-value
## esp0  13   6 12.69  20       1.00
## esp1  15   7 14.45  25       0.90
## esp2   5   0  6.46  14       0.80
## esp3   1   0  1.77   8       1.00
## esp4   1   0  0.24   3       0.36
## esp5   0   0  0.05   2       1.00
## esp6   0   0  0.01   1       1.00
## 
## Goodness-of-fit for minimum geodesic distance 
## 
##     obs min  mean max MC p-value
## 1    35  25 35.67  44       0.98
## 2    41  28 46.67  57       0.46
## 3    19   7 17.84  29       0.88
## 4     8   0  3.07  13       0.18
## 5     5   0  0.56   8       0.06
## 6     2   0  0.10   3       0.04
## Inf   0   0  6.09  34       1.00
plot(bottmodel.06.gof)

# The model fit looks pretty solid. Our observed statistics fall within the range of the simulated values, there are no small p-values, and our plots show no major red flags.
# ASSESSING MCMC DIAGNOSTICS
# Even if your model fit is good, for models with dependence terms the quality of the MCMC simulation that generated the model estimates also needs to be carefully evaluated to ensure that your model is reliable.

# Fortunately, the functions we need to assess the MCMC simulation are conveniently pre-packaged for us in ergm in the function `mcmc.diagnostics()`. To use this function, we'll first install the package `latticeExtra`. This package is not required but is optionally called by `mcmc.diagnostics()` to make nicer plots than the default. 

# install.packages("latticeExtra")
# Now let's look at the MCMC diagnostics for the model with reciprocity we ran earlier.

mcmc.diagnostics(bottmodel.04)
## Sample statistics summary:
## 
## Iterations = 16384:4209664
## Thinning interval = 1024 
## Number of chains = 1 
## Sample size per chain = 4096 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##            Mean    SD Naive SE Time-series SE
## edges  -0.08057 4.914  0.07678        0.07900
## mutual -0.06445 2.310  0.03609        0.03765
## 
## 2. Quantiles for each variable:
## 
##        2.5% 25% 50% 75% 97.5%
## edges    -9  -3   0   3    10
## mutual   -4  -2   0   1     5
## 
## 
## Sample statistics cross-correlations:
##            edges    mutual
## edges  1.0000000 0.7123795
## mutual 0.7123795 1.0000000
## 
## Sample statistics auto-correlation:
## Chain 1 
##                 edges       mutual
## Lag 0     1.000000000  1.000000000
## Lag 1024  0.028429754  0.042296212
## Lag 2048  0.016617180  0.017029376
## Lag 3072  0.006573766  0.023159806
## Lag 4096  0.008490164  0.014136452
## Lag 5120 -0.022826560 -0.002665345
## 
## Sample statistics burn-in diagnostic (Geweke):
## Chain 1 
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##   edges  mutual 
## -0.7570 -0.1797 
## 
## Individual P-values (lower = worse):
##     edges    mutual 
## 0.4490585 0.8573854 
## Joint P-value (lower = worse):  0.6709349 .

## 
## MCMC diagnostics shown here are from the last round of simulation, prior to computation of final parameter estimates. Because the final estimates are refinements of those used for this simulation run, these diagnostics may understate model performance. To directly assess the performance of the final model on in-model statistics, please use the GOF command: gof(ergmFitObject, GOF=~model).
# This function gives a lot of output. Here's what you need to look for.

# Sample statistic auto-correlation: This measures the correlation between sample statistics at different points in the MCMC chain. A good chain that is randomly mixing and not headed off in a bad direction should have low auto-correlation values (close to zero) at all sampling points after Lag 0.

# Sample statistic burn-in diagnostic (Geweke): This diagnostic gives a measure of convergence by comparing the means of the sample statistics at different places in the Markov chain. If the chains are stationary (randomly mixing), then the means at different locations of the chains should be equal. Consequently, for this test we are looking for p-values close to one (and far from zero) for all the individual sample statistics.

# MCMC trace plots: These are plots of the difference between the sample statistics and your observed network for every step of the simulation. They should show evidence of "mixing" (random variation at each step), centered around zero.

# MCMC density plots: The values of the sample statistics should have a bell-shaped distribution, centered at zero (i.e., no difference from your observed network). When examining these plots you may sometimes see a sawtooth pattern rather than a smooth distribution. This is a result of variables that are discrete and have a small range; it is not a problem so long as the overall shape is roughly normal and the statistics are centered at zero.

# In this case, our MCMC diagnostics look good. Unfortunately, models often fail to converge, and troubleshooting them can be difficult. For now we note that problems with MCMC diagnostics can often be resolved by adjusting the MCMC control parameters, particularly increasing the MCMC sample size, burn-in, and/or interval. Increasing burn-in time can often help with the Geweke diagnostics, and increasing the MCMC interval can often help with problems with sample auto-correlation. These settings can be adjusted by adding the argument `control=control.ergm(MCMC.burnin=X, MCMC.interval=Y, MCMC.samplesize=Z)` to your ergm call.
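
# For example, a re-fit of the reciprocity model with a longer burn-in, a larger interval, and a bigger sample might look like this (a sketch, not run; the specific values are illustrative, not recommendations):
# bottmodel.04b <- ergm(bott[[4]] ~ edges + mutual,
#                       control = control.ergm(MCMC.burnin = 50000,
#                                              MCMC.interval = 5000,
#                                              MCMC.samplesize = 10000))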

# A few suggestions and warnings for running models with MCMLE:
# - It can take a very long time (hours, overnight) for a model with large MCMC control parameters to compute, so plan accordingly. Add the argument `verbose=TRUE` to the ergm call to print more output as the model runs so that you can see that it is still running.
# - Run your models with multiple chains or multiple times to ensure you're getting consistent results.
# - Set the seed in R (using `set.seed()`) for any model you plan to publish so you (and others) can replicate your results.
# - Always expect to encounter difficulties.
# EDGEWISE SHARED PARTNERSHIPS, OR TRIANGLES REVISITED
# Throughout our material we have emphasized the importance of triangles and transitivity in the kinds of social networks that are typically studied by anthropologists. But, as we noted earlier, the `triangle` term, which in theory should be the simplest way to model triad closure in an ERGM, frequently leads to model degeneracy.

# A different, more recent approach to modeling triad closure involves counting 'edgewise shared partnerships' (Hunter 2007). Two nodes i and j have an edgewise shared partner (ESP) if they are connected to each other and each is also connected to a third node k. In other words, two nodes have an ESP if a tie between them closes a triangle.

# There are lots of good reasons to expect that ties that close triangles might have a higher probability than ties that don't; yet, it is also fair to say that not all triangles are equally probable. Your two closest friends may know each other, but as you consider ties with more and more distant acquaintances, it becomes less likely that they also know your other friends. To model this sort of process, the odds of closing a triangle (or, equivalently, adding another edgewise shared partnership to the network) can be weighted based on existing ties in the network.

# This idea is implemented with the `gwesp()`, or "Geometrically-Weighted Edgewise Shared Partnerships," term. This term models triad closure, but it has a decay parameter such that the probability of a tie that would add an ESP between two nodes is weighted downwards as a function of the number of ESPs the pair of nodes already shares.

# Let's try estimating a model with GWESP.

bottmodel.07 <- ergm(bott[[4]] ~ edges + nodeicov('age.month') + edgecov(bott[[1]]) + edgecov(agediff) + gwesp(1, fixed=FALSE))
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 20:
## Optimizing with step length 1.
## The log-likelihood improved by 0.05919.
## Step length converged once. Increasing MCMC sample size.
## Iteration 2 of at most 20:
## Optimizing with step length 1.
## The log-likelihood improved by 0.002489.
## Step length converged twice. Stopping.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Using 20 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 .
## This model was fit using MCMC.  To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
# Here, by setting `fixed=FALSE` in the `gwesp()` argument, we tell ergm to estimate the decay parameter for us, with a starting value of 1. By setting `fixed=TRUE`, ergm would instead simply estimate the model with the value provided as the decay parameter.

summary(bottmodel.07)
## Call:
## ergm(formula = bott[[4]] ~ edges + nodeicov("age.month") + edgecov(bott[[1]]) + 
##     edgecov(agediff) + gwesp(1, fixed = FALSE))
## 
## Iterations:  2 out of 20 
## 
## Monte Carlo MLE Results:
##                    Estimate Std. Error MCMC % z value Pr(>|z|)   
## edges              -3.26429    1.24230      0  -2.628   0.0086 **
## nodeicov.age.month  0.07172    0.03178      0   2.257   0.0240 * 
## edgecov.bott[[1]]   1.17861    0.54012      0   2.182   0.0291 * 
## edgecov.agediff    -0.07650    0.03378      0  -2.265   0.0235 * 
## gwesp              -0.18128    0.34923      0  -0.519   0.6037   
## gwesp.decay         1.67491    9.11269      0   0.184   0.8542   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 152.5  on 110  degrees of freedom
##  Residual Deviance: 120.6  on 104  degrees of freedom
##  
## AIC: 132.6    BIC: 148.9    (Smaller is better.)
# A superficial interpretation of GWESP is simple: a significant, positive coefficient indicates more triangles than expected in a random graph fitting the other model constraints, while a significant, negative coefficient indicates fewer triangles than expected (although in our data we have never seen a model with a significant negative GWESP coefficient). However, GWESP is a complex statistic that, through the way that edgewise shared partnerships are weighted, takes into account not just the number of triangles but how they are distributed among nodes. We explain how GWESP works more fully in a separate tutorial. 

# There are other geometrically-weighted terms implemented in ergm in addition to GWESP, including `gwdegree()` (geometrically-weighted degree; also implemented for in-degree and out-degree) and `gwdsp()` (geometrically-weighted dyadwise shared partnerships), as well as `dgwesp()` terms that count only certain types of triangles in directed networks. Anthropological interpretation of the former two terms can be very challenging, although GWDSP, used in conjunction with GWESP, can be useful for investigating intransitive relationships.

# Getting models with geometrically-weighted terms to converge can be difficult, especially if you are trying to estimate the decay parameter as well as the coefficient for the geometrically-weighted term. A common work-around for this issue involves running models with different values of the decay parameter and choosing the decay parameter that gives the best model fit.
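
# A minimal sketch of that work-around (not run; the candidate decay values are illustrative):
# decays <- c(0.1, 0.5, 1.0, 1.5)
# fits <- lapply(decays, function(d) ergm(bott[[4]] ~ edges + edgecov(bott[[1]]) + gwesp(d, fixed = TRUE)))
# sapply(fits, AIC)  # keep the decay value that gives the lowest AIC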
# STRATEGIES FOR MODEL-BUILDING
# ERGMs are powerful tools, but like other statistical methods they are not without flaws, so we conclude with a few words of caution. 

# First, ERGMs can be difficult to fit: real social networks are often not much like "random" networks. Close attention to both MCMC diagnostics and model goodness-of-fit is essential, obviously, but getting a model with a lot of complex terms (especially multiple dependency terms) to converge is not always feasible. Compromises may have to be made.

# Second, there are many, many ways to include nodal covariates in an ERGM (e.g., `nodefactor`, `nodematch`, `nodemix`) and these different terms assume that these attributes are important for different reasons. Does a variable increase an individual's overall propensity for ties? Is it a source of homophily? Or potentially both? These questions don't typically arise in standard statistical analyses, but they must be carefully considered in constructing an ERGM. 

# Like other social network analysis methods, ERG modeling of human social networks inherently involves placing artificial boundaries on people's social connections, even when studying well-defined groups. A school may provide a clear sampling frame, for instance, but for a variety of reasons, some children will have more friends outside of school than others. Even if only ties within the school are important for the research question, students' outside social connections will likely affect the distribution of friendship ties within the school.

# Finally, no ERGM is a perfect model of the microlevel processes that lead to global network structures. Indeed, there are considerable limitations on what ERGMs can tell us about these processes. For instance, as we've shown, ERGMs are generally constrained to consider only networks with similar density, even though the processes that lead to a given network density are ideally part of what we should be trying to explain.

# For these reasons, it is essential for your modeling strategy to be strongly motivated by hypotheses that can inform your choice of which covariates, including dependency terms, to include and how to include them. Good hypotheses also help you evaluate when a model is adequate to answer your research question.