This shows how to parameterize two forms of age-mixing in an ERGM using Statnet. One will be with a continous version of age and the other with a binned version.

library(ergm)

Set up the network and make two forms of age, one continuous and the other splitting age into three categories.

nw <- network.initialize(1000, directed = FALSE)

age <- sample(20:49, 1000, TRUE)

agecat <- rep(1, 1000)
agecat[age >= 30] <- 2
agecat[age >= 40] <- 3
table(agecat)
## agecat
##   1   2   3 
## 351 328 321

Set both attributes on the network.

nw <- set.vertex.attribute(nw, "age", age)
nw <- set.vertex.attribute(nw, "agecat", agecat)
nw
##  Network attributes:
##   vertices = 1000 
##   directed = FALSE 
##   hyper = FALSE 
##   loops = FALSE 
##   multiple = FALSE 
##   bipartite = FALSE 
##   total edges= 0 
##     missing edges= 0 
##     non-missing edges= 0 
## 
##  Vertex attribute names: 
##     age agecat vertex.names 
## 
## No edge attributes

First let’s look at a null model that only includes an edges term controlling the mean degree, which we will set to 0.75.

mean.deg <- 0.75
edges <- mean.deg * (1000/2)

fit.null <- ergm(nw ~ edges, target.stats = edges)

We can add statistics to monitor after the model fit to see what the expected values of the nodematch statistic for the age category and absdiff term for the continuous age variable would be. The nodematch term is the count in each cell on the diagonal of the mixing matrix, whereas the absdiff term is the sum of the absolute differences between the age of one partner and the age of the other partner in an edgelist.

sim.null <- simulate(fit.null, nsim = 1000, statsonly = TRUE, 
                     monitor = ~nodematch("agecat", diff = TRUE) +
                                absdiff("age"))
colMeans(sim.null)
##              edges nodematch.agecat.1 nodematch.agecat.2 
##            374.271             45.909             39.801 
## nodematch.agecat.3        absdiff.age 
##             38.415           3681.874

More intuitively, heres what the absdiff term is measuring. We are plotting the age of one versus the other partner in one cross-sectional simulation. The absdiff term is the sum of the distances between each dot and the identity line.

el <- as.edgelist(simulate(fit.null))
age.el <- matrix(age[el], ncol = 2)
plot(jitter(age.el[, 1]), jitter(age.el[, 2]))
abline(a = 0, b = 1)

The numerical average for each pair, which is what we would actually be able to measure in survey data, is this number.

mean(sim.null[, "absdiff.age"])/mean(sim.null[, "edges"])
## [1] 9.837455

Here’s what the binned mixing matrix looks like in tabular form. Each of the nine squares is roughly the same size (or at least will be if we simulate this lots of times).

agecat.el <- matrix(agecat[el], ncol = 2)
tab.agecat <- table(agecat.el[, 1], agecat.el[, 2])
tab.agecat
##    
##      1  2  3
##   1 29 48 48
##   2 39 36 36
##   3 36 45 31
mosaicplot(tab.agecat)

Ok, let’s fit a model with an absdiff term where the average age difference is not 10 years (as it is in the null model), but 4 years. Here’s how we do that, and the results of simulations from the model.

absdiff.target <- edges * 4
fit.absdiff <- ergm(nw ~ edges + absdiff("age"),
                    target.stats = c(edges, absdiff.target)) 
## Starting maximum likelihood estimation via MCMLE:
## Iteration 1 of at most 20: 
## The log-likelihood improved by 0.0005483 
## Step length converged once. Increasing MCMC sample size.
## Iteration 2 of at most 20: 
## The log-likelihood improved by 0.0009036 
## Step length converged twice. Stopping.
## 
## This model was fit using MCMC.  To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.
sim.absdiff <- simulate(fit.absdiff, nsim = 1e4, statsonly = TRUE)
colMeans(sim.absdiff)
##       edges absdiff.age 
##    373.9507   1494.0088

Here’s the continuous age mixture plot. Much more clustering now.

el <- as.edgelist(simulate(fit.absdiff))
age.el <- matrix(age[el], ncol = 2)
plot(jitter(age.el[, 1]), jitter(age.el[, 2]))
abline(a = 0, b = 1)

Here’s the binned version. Lot’s more mass on the diagonal squares now.

agecat.el <- matrix(agecat[el], ncol = 2)
tab.agecat <- table(agecat.el[, 1], agecat.el[, 2])
tab.agecat
##    
##      1  2  3
##   1 78 39  5
##   2 26 73 25
##   3  4 25 84
mosaicplot(tab.agecat)

The benefit of the absdiff method is that it uses a nice parameteric form to control the mixing, but a drawback is that it may not allow for fine-tuned control over differentials in assortivity by age (although there are ways to partially handle this with a transform on age). The nodematch allows us to do that, as in this example where the age clustering is very strong for twenty-somethings, then gets weaker at older ages.

nodematch.targets <- c(120, 100, 50)
fit.agecat <- ergm(nw ~ edges + nodematch("agecat", diff = TRUE),
                   target.stats = c(edges, nodematch.targets))
sim.agecat <- simulate(fit.agecat, nsim = 1e4, statsonly = TRUE)
colMeans(sim.agecat)
##              edges nodematch.agecat.1 nodematch.agecat.2 
##           375.4567           120.0907           100.1251 
## nodematch.agecat.3 
##            50.1795

Here’s the continous age mixture plot. Slightly more clustering on the bottom left than the upper right.

el <- as.edgelist(simulate(fit.absdiff))
age.el <- matrix(age[el], ncol = 2)
plot(jitter(age.el[, 1]), jitter(age.el[, 2]))

Here is the mixing matrix version.

agecat.el <- matrix(agecat[el], ncol = 2)
tab.agecat <- table(agecat.el[, 1], agecat.el[, 2])
tab.agecat
##    
##      1  2  3
##   1 93 37  3
##   2 29 84 19
##   3  5 27 80
mosaicplot(tab.agecat)