Introduction to Egocentric Network Data Analysis with ERGMs using Statnet

Last updated 2021-06-27

install.packages('ergm.ego')
library('ergm.ego')
Loading required package: ergm
Loading required package: network

'network' 1.17.1 (2021-06-12), part of the Statnet Project
* 'news(package="network")' for changes since last version
* 'citation("network")' for citation information
* 'https://statnet.org' for help, support, and other information

'ergm' 4.0.1 (2021-06-20), part of the Statnet Project
* 'news(package="ergm")' for changes since last version
* 'citation("ergm")' for citation information
* 'https://statnet.org' for help, support, and other information
'ergm' 4 is a major update that introduces some backwards-incompatible
changes. Please type 'news(package="ergm")' for a list of major
changes.
Loading required package: egor
Loading required package: dplyr

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
Loading required package: tibble

'ergm.ego' 1.0-654 (2021-06-22), part of the Statnet Project
* 'news(package="ergm.ego")' for changes since last version
* 'citation("ergm.ego")' for citation information
* 'https://statnet.org' for help, support, and other information

Attaching package: 'ergm.ego'
The following objects are masked from 'package:ergm':

    COLLAPSE_SMALLEST, snctrl
The following object is masked from 'package:base':

    sample
packageVersion('ergm.ego')
[1] '1.0.654'
example(sample.egor)
library(help='ergm.ego')
?as.egor
help('ergm.ego-terms')
library(ergm.ego)
sessionInfo()
set.seed(1)
data(faux.mesa.high)
mesa <- faux.mesa.high
plot(mesa, vertex.col="Grade")
legend('bottomleft',fill=7:12,legend=paste('Grade',7:12),cex=0.75)
mesa.ego <- as.egor(mesa) 
names(mesa.ego) # what are the components of this object?
[1] "ego"   "alter" "aatie"
mesa.ego # prints a few lines for each component
# EGO data (active): 205 x 4
  .egoID Grade Race  Sex  
   <int> <dbl> <chr> <chr>
1      1     7 Hisp  F    
2      2     7 Hisp  F    
3      3    11 NatAm M    
4      4     8 Hisp  M    
5      5    10 White F    
# ALTER data: 406 x 5
  .altID .egoID Grade Race  Sex  
   <int>  <int> <dbl> <chr> <chr>
1    174      1     7 Hisp  F    
2    161      1     7 Hisp  F    
3    151      1     7 Hisp  F    
# AATIE data: 372 x 3
  .egoID .srcID .tgtID
   <int>  <int>  <int>
1      1    151    127
2      1    127     52
3      1    127     87
#View(mesa.ego) # opens the component in the RStudio source window
class(mesa.ego) # what type of "object" is this?
[1] "egor" "list"
class(mesa.ego$ego) # and what type of objects are the components?
[1] "tbl_df"     "tbl"        "data.frame"
class(mesa.ego$alter)
[1] "tbl_df"     "tbl"        "data.frame"
class(mesa.ego$aatie)
[1] "tbl_df"     "tbl"        "data.frame"
mesa.ego$ego # first few rows of the ego table
# A tibble: 205 x 4
   .egoID Grade Race  Sex  
    <int> <dbl> <chr> <chr>
 1      1     7 Hisp  F    
 2      2     7 Hisp  F    
 3      3    11 NatAm M    
 4      4     8 Hisp  M    
 5      5    10 White F    
 6      6    10 Hisp  F    
 7      7     8 NatAm M    
 8      8    11 NatAm M    
 9      9     9 White M    
10     10     9 NatAm F    
# ... with 195 more rows
mesa.ego$alter # first few rows of the alter table
# A tibble: 406 x 5
   .altID .egoID Grade Race  Sex  
    <int>  <int> <dbl> <chr> <chr>
 1    174      1     7 Hisp  F    
 2    161      1     7 Hisp  F    
 3    151      1     7 Hisp  F    
 4    127      1     7 Hisp  F    
 5    110      1     7 Hisp  F    
 6    100      1     7 Hisp  F    
 7     96      1     7 NatAm F    
 8     92      1     7 NatAm F    
 9     87      1     7 White F    
10     70      1     7 NatAm F    
# ... with 396 more rows
# ties show up twice, but alter info is linked to .altID
mesa.ego$alter %>% filter((.altID==1 & .egoID==25) | (.egoID==1 & .altID==25))
# A tibble: 2 x 5
  .altID .egoID Grade Race  Sex  
   <int>  <int> <dbl> <chr> <chr>
1     25      1     7 White F    
2      1     25     7 Hisp  F    
mesa.ego$aatie # first few rows of the alter-alter tie table
# A tibble: 372 x 3
   .egoID .srcID .tgtID
    <int>  <int>  <int>
 1      1    151    127
 2      1    127     52
 3      1    127     87
 4      1    127    151
 5      1    110     87
 6      1    110     92
 7      1    110     96
 8      1    100     96
 9      1     96     87
10      1     96    110
# ... with 362 more rows
# egos
write.csv(mesa.ego$ego, file="mesa.ego.table.csv", row.names = FALSE)

# alters (the first column, .altID, is dropped before writing)
write.csv(mesa.ego$alter[,-1], file="mesa.alter.table.csv", row.names = FALSE)
mesa.egos <- read.csv("mesa.ego.table.csv")
head(mesa.egos)
  .egoID Grade  Race Sex
1      1     7  Hisp   F
2      2     7  Hisp   F
3      3    11 NatAm   M
4      4     8  Hisp   M
5      5    10 White   F
6      6    10  Hisp   F
mesa.alts <- read.csv("mesa.alter.table.csv")
head(mesa.alts)
  .egoID Grade Race Sex
1      1     7 Hisp   F
2      1     7 Hisp   F
3      1     7 Hisp   F
4      1     7 Hisp   F
5      1     7 Hisp   F
6      1     7 Hisp   F
my.egodata <- egor(egos = mesa.egos, 
                   alters = mesa.alts, 
                   ID.vars = list(ego = ".egoID"))
my.egodata
# EGO data (active): 205 x 4
  .egoID Grade Race  Sex  
  <chr>  <int> <chr> <chr>
1 1          7 Hisp  F    
2 2          7 Hisp  F    
3 3         11 NatAm M    
4 4          8 Hisp  M    
5 5         10 White F    
# ALTER data: 406 x 4
  .egoID Grade Race  Sex  
  <chr>  <int> <chr> <chr>
1 1          7 Hisp  F    
2 1          7 Hisp  F    
3 1          7 Hisp  F    
# AATIE data: 0 x 3
example("egor")
# to reduce typing, we'll pull the ego and alter data frames
egos <- mesa.ego$ego
alters <- mesa.ego$alter

table(egos$Sex, exclude=NULL)

  F   M 
 99 106 
table(egos$Race, exclude=NULL)

Black  Hisp NatAm Other White 
    6   109    68     4    18 
barplot(table(egos$Grade), ylab="frequency")
# compare egos and alters...

par(mfrow=c(1,2))
barplot(table(egos$Race)/nrow(egos),
        main="Ego Race Distn", ylab="proportion",
        ylim = c(0,0.5))
barplot(table(alters$Race)/nrow(alters),
        main="Alter Race Distn", ylab="proportion",
        ylim = c(0,0.5))
# to get the crosstabulated counts of ties:
mixingmatrix(mesa.ego,"Grade")
     7   8   9  10  11  12
7  150   0   0   1   1   1
8    0  66   2   4   2   1
9    0   2  46   7   6   4
10   1   4   7  18   1   5
11   1   2   6   1  34   5
12   1   1   4   5   5  12
# contrast with the original network crosstab:
mixingmatrix(mesa, "Grade")
    7  8  9 10 11 12
7  75  0  0  1  1  1
8   0 33  2  4  2  1
9   0  2 23  7  6  4
10  1  4  7  9  1  5
11  1  2  6  1 17  5
12  1  1  4  5  5  6
# to get the row conditional probabilities:

round(mixingmatrix(mesa.ego, "Grade", rowprob=TRUE), 2)
      7    8    9   10   11   12
7  0.98 0.00 0.00 0.01 0.01 0.01
8  0.00 0.88 0.03 0.05 0.03 0.01
9  0.00 0.03 0.71 0.11 0.09 0.06
10 0.03 0.11 0.19 0.50 0.03 0.14
11 0.02 0.04 0.12 0.02 0.69 0.10
12 0.04 0.04 0.14 0.18 0.18 0.43
round(mixingmatrix(mesa.ego, "Race", rowprob=TRUE), 2)
      Black Hisp NatAm Other White
Black  0.00 0.31  0.50  0.00  0.19
Hisp   0.04 0.60  0.23  0.01  0.12
NatAm  0.08 0.26  0.59  0.00  0.06
Other  0.00 1.00  0.00  0.00  0.00
White  0.11 0.49  0.22  0.00  0.18
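Each row of the conditional-probability tables above is the corresponding row of tie counts divided by its row total. The following is a minimal base-R sketch of that step, assuming the egocentric Grade counts printed earlier are typed in by hand; the object names `grade_counts` and `grade_rowprob` are illustrative and not part of `ergm.ego`.

```r
# Egocentric Grade mixing counts, copied from the
# mixingmatrix(mesa.ego, "Grade") output (rows = ego, columns = alter).
grades <- as.character(7:12)
grade_counts <- matrix(
  c(150,  0,  0,  1,  1,  1,
      0, 66,  2,  4,  2,  1,
      0,  2, 46,  7,  6,  4,
      1,  4,  7, 18,  1,  5,
      1,  2,  6,  1, 34,  5,
      1,  1,  4,  5,  5, 12),
  nrow = 6, byrow = TRUE,
  dimnames = list(ego = grades, alter = grades)
)

# Divide each row by its total so that every row sums to 1.
grade_rowprob <- prop.table(grade_counts, margin = 1)
round(grade_rowprob, 2)
```

Reading across a row then gives the distribution of alter grades for egos in that grade, which is how the `rowprob` output is interpreted here.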
# first, using the original network
network.edgecount(faux.mesa.high)
[1] 203
# compare to the egodata
# note that the ties are double counted, so we need to divide by 2.
nrow(mesa.ego$alter)/2
[1] 203
# mean degree -- here we want to count each "stub", so we don't divide by 2
nrow(mesa.ego$alter)/nrow(mesa.ego$ego)
[1] 1.980488
# overall degree distribution
summary(mesa.ego ~ degree(0:20))
         scaled mean     SE
degree0           57 6.4306
degree1           51 6.2048
degree2           30 5.0730
degree3           28 4.9289
degree4           18 4.0620
degree5           10 3.0917
degree6            2 1.4107
degree7            4 1.9852
degree8            1 1.0000
degree9            2 1.4107
degree10           1 1.0000
degree11           0 0.0000
degree12           0 0.0000
degree13           1 1.0000
degree14           0 0.0000
degree15           0 0.0000
degree16           0 0.0000
degree17           0 0.0000
degree18           0 0.0000
degree19           0 0.0000
degree20           0 0.0000
# and stratified by sex
summary(mesa.ego ~ degree(0:13, by="Sex"))
           scaled mean     SE
deg0.SexF           23 4.5299
deg1.SexF           23 4.5299
deg2.SexF           10 3.0917
deg3.SexF           17 3.9581
deg4.SexF           12 3.3694
deg5.SexF            7 2.6066
deg6.SexF            1 1.0000
deg7.SexF            3 1.7235
deg8.SexF            1 1.0000
deg9.SexF            0 0.0000
deg10.SexF           1 1.0000
deg11.SexF           0 0.0000
deg12.SexF           0 0.0000
deg13.SexF           1 1.0000
deg0.SexM           34 5.3385
deg1.SexM           28 4.9289
deg2.SexM           20 4.2588
deg3.SexM           11 3.2343
deg4.SexM            6 2.4193
deg5.SexM            3 1.7235
deg6.SexM            1 1.0000
deg7.SexM            1 1.0000
deg8.SexM            0 0.0000
deg9.SexM            2 1.4107
deg10.SexM           0 0.0000
deg11.SexM           0 0.0000
deg12.SexM           0 0.0000
deg13.SexM           0 0.0000
summary(mesa.ego ~ degree(0:10), scaleto=100000)
         scaled mean      SE
degree0     27804.88 3136.89
degree1     24878.05 3026.75
degree2     14634.15 2474.63
degree3     13658.54 2404.34
degree4      8780.49 1981.47
degree5      4878.05 1508.16
degree6       975.61  688.17
degree7      1951.22  968.41
degree8       487.80  487.80
degree9       975.61  688.17
degree10      487.80  487.80
summary(mesa.ego ~ degree(0:10), scaleto=nrow(mesa.ego$ego)*100)
         scaled mean     SE
degree0         5700 643.06
degree1         5100 620.48
degree2         3000 507.30
degree3         2800 492.89
degree4         1800 406.20
degree5         1000 309.17
degree6          200 141.07
degree7          400 198.52
degree8          100 100.00
degree9          200 141.07
degree10         100 100.00
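The `scaleto` argument rescales the same per-ego quantities to a different population size. The sketch below reproduces the `degree0` row by hand. It assumes, as an inference from the printed numbers rather than a statement about the package internals, that the statistic is the mean of a 0/1 indicator over the 205 egos and that the SE is the usual simple-random-sample standard error of that mean. The helper `scaled_summary` is hypothetical.

```r
# 57 of the 205 egos have degree 0, per the summary output above.
n_egos  <- 205
is_deg0 <- c(rep(1, 57), rep(0, n_egos - 57))

# Hypothetical helper: scale the per-ego mean and its standard error
# to a population of size `scaleto` (default: the sample size).
scaled_summary <- function(x, scaleto = length(x)) {
  c(scaled_mean = mean(x) * scaleto,
    SE          = sd(x) / sqrt(length(x)) * scaleto)
}

round(scaled_summary(is_deg0), 4)                    # scaled to the 205 egos
round(scaled_summary(is_deg0, scaleto = 100000), 2)  # scaled to 100,000 actors
```

Under these assumptions the two calls can be compared directly against the `degree0` rows of the default and `scaleto=100000` summaries.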
# to get the frequency counts

degreedist(mesa.ego, plot=TRUE)
degreedist(mesa.ego, by="Sex", plot=TRUE)

# to get the proportion at each degree level

degreedist(mesa.ego, by="Sex", plot=TRUE, prob=TRUE)
degreedist(mesa.ego, brg=TRUE)
degreedist(mesa.ego, by="Sex", prob=TRUE, brg=TRUE)
?control.ergm.ego
fit.edges <- ergm.ego(mesa.ego ~ edges)
summary(fit.edges)
Call:
ergm(formula = ergm.formula, constraints = constraints, offset.coef = ergm.offset.coef, 
    target.stats = m, eval.loglik = FALSE, control = control$ergm)

Monte Carlo Maximum Likelihood Results:

            Estimate Std. Error MCMC % z value Pr(>|z|)    
netsize.adj -5.32301    0.00000      0    -Inf   <1e-04 ***
edges        0.69930    0.08257      1   8.469   <1e-04 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 The following terms are fixed by offset and are not estimated:
  netsize.adj 
names(fit.edges)
 [1] "coefficients"     "sample"           "sample.obs"       "iterations"      
 [5] "MCMCtheta"        "loglikelihood"    "gradient"         "hessian"         
 [9] "covar"            "failure"          "network"          "newnetworks"     
[13] "newnetwork"       "coef.init"        "est.cov"          "coef.hist"       
[17] "stats.hist"       "steplen.hist"     "control"          "etamap"          
[21] "call"             "ergm_version"     "MPLE_is_MLE"      "formula"         
[25] "target.stats"     "nw.stats"         "target.esteq"     "constrained"     
[29] "constraints"      "obs.constraints"  "reference"        "estimate"        
[33] "estimate.desc"    "offset"           "drop"             "estimable"       
[37] "v"                "m"                "ergm.formula"     "ergm.offset.coef"
[41] "egor"             "ppopsize"         "popsize"          "netsize.adj"     
[45] "ergm.covar"       "DtDe"            
fit.edges$ppopsize
[1] 205
fit.edges$popsize
[1] 1
summary(ergm.ego(mesa.ego ~ edges, 
                 control = control.ergm.ego(ppopsize=1000)))
Call:
ergm(formula = ergm.formula, constraints = constraints, offset.coef = ergm.offset.coef, 
    target.stats = m, eval.loglik = FALSE, control = control$ergm)

Monte Carlo Maximum Likelihood Results:

            Estimate Std. Error MCMC % z value Pr(>|z|)    
netsize.adj -6.93245    0.00000      0    -Inf   <1e-04 ***
edges        0.68124    0.08055      0   8.457   <1e-04 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 The following terms are fixed by offset and are not estimated:
  netsize.adj 
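In both edges-only fits above, the fixed `netsize.adj` offset matches the negative log of a pseudo-population size. The arithmetic check below is a sketch, not a description of the package internals: the first offset corresponds to 205, while the second corresponds to 1025 rather than the requested 1000. One plausible reading, stated here as an assumption, is that the pseudo-population is built from whole copies of the 205 egos, so that 1000 becomes 5 x 205 = 1025.

```r
# Offsets printed in the two model summaries above.
offset_default <- -5.32301   # default fit (fit.edges$ppopsize is 205)
offset_1000    <- -6.93245   # fit requested with ppopsize = 1000

# Invert offset = -log(size) to recover the implied pseudo-population sizes.
implied_sizes <- round(exp(-c(offset_default, offset_1000)))
implied_sizes

# Compare with -log() of the candidate sizes.
round(-log(c(205, 1000, 1025)), 5)
```

If this reading is right, the small change in the `edges` estimate between the two fits reflects the different pseudo-population sizes rather than any change in the data.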
mcmc.diagnostics(fit.edges, which ="plots")

MCMC diagnostics shown here are from the last round of simulation, prior to computation of final parameter estimates. Because the final estimates are refinements of those used for this simulation run, these diagnostics may understate model performance. To directly assess the performance of the final model on in-model statistics, please use the GOF command: gof(ergmFitObject, GOF=~model).
plot(gof(fit.edges, GOF="model"))
plot(gof(fit.edges, GOF="degree"))
set.seed(1)
fit.deg0 <- ergm.ego(mesa.ego ~ edges + degree(0), control=control.ergm.ego(ppopsize=1000))
summary(fit.deg0)
Call:
ergm(formula = ergm.formula, constraints = constraints, offset.coef = ergm.offset.coef, 
    target.stats = m, eval.loglik = FALSE, control = control$ergm)

Monte Carlo Maximum Likelihood Results:

            Estimate Std. Error MCMC % z value Pr(>|z|)    
netsize.adj  -6.9324     0.0000      0    -Inf   <1e-04 ***
edges         1.1699     0.1055      0  11.087   <1e-04 ***
degree0       1.4833     0.2711      0   5.472   <1e-04 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 The following terms are fixed by offset and are not estimated:
  netsize.adj 
mcmc.diagnostics(fit.deg0, which = "plots")
plot(gof(fit.deg0, GOF="model"))
plot(gof(fit.deg0, GOF="degree"))
fit.full <- ergm.ego(mesa.ego ~ edges + degree(0:1) 
                     + nodefactor("Sex")
                     + nodefactor("Race", levels = -LARGEST)
                     + nodefactor("Grade")
                     + nodematch("Sex") 
                     + nodematch("Race") 
                     + nodematch("Grade"))
summary(fit.full)
Call:
ergm(formula = ergm.formula, constraints = constraints, offset.coef = ergm.offset.coef, 
    target.stats = m, eval.loglik = FALSE, control = control$ergm)

Monte Carlo Maximum Likelihood Results:

                      Estimate Std. Error MCMC % z value Pr(>|z|)    
netsize.adj           -5.32301    0.00000      0    -Inf  < 1e-04 ***
edges                 -1.39738    0.21432      0  -6.520  < 1e-04 ***
degree0                2.10700    0.36592      0   5.758  < 1e-04 ***
degree1                1.00960    0.29695      0   3.400 0.000674 ***
nodefactor.Sex.M      -0.17765    0.06240      0  -2.847 0.004411 ** 
nodefactor.Race.Black  1.21477    0.20316      0   5.979  < 1e-04 ***
nodefactor.Race.NatAm  0.30945    0.06253      0   4.949  < 1e-04 ***
nodefactor.Race.Other -0.91183    0.69912      0  -1.304 0.192149    
nodefactor.Race.White  0.57907    0.13165      0   4.399  < 1e-04 ***
nodefactor.Grade.8     0.13722    0.05514      0   2.489 0.012820 *  
nodefactor.Grade.9     0.13959    0.05024      0   2.778 0.005467 ** 
nodefactor.Grade.10    0.31255    0.07529      0   4.151  < 1e-04 ***
nodefactor.Grade.11    0.40604    0.06412      0   6.333  < 1e-04 ***
nodefactor.Grade.12    0.77650    0.07451      0  10.422  < 1e-04 ***
nodematch.Sex          0.64757    0.12803      0   5.058  < 1e-04 ***
nodematch.Race         0.85331    0.14206      0   6.007  < 1e-04 ***
nodematch.Grade        3.06429    0.16358      0  18.733  < 1e-04 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 The following terms are fixed by offset and are not estimated:
  netsize.adj 
mcmc.diagnostics(fit.full, which = "plots")
plot(gof(fit.full, GOF="model"))
plot(gof(fit.full, GOF="degree"))
sim.full <- simulate(fit.full)
summary(mesa.ego ~ edges + degree(0:1)
                      + nodefactor("Sex")
                      + nodefactor("Race", levels = -LARGEST)
                      + nodefactor("Grade")
                      + nodematch("Sex") + nodematch("Race") + nodematch("Grade"))
                      scaled mean      SE
edges                         203 15.2022
degree0                        57  6.4306
degree1                        51  6.2048
nodefactor.Sex.M              171 17.1990
nodefactor.Race.Black          26  6.5507
nodefactor.Race.NatAm         156 19.7787
nodefactor.Race.Other           1  0.7054
nodefactor.Race.White          45  9.1943
nodefactor.Grade.8             75 17.3212
nodefactor.Grade.9             65 11.2475
nodefactor.Grade.10            36  8.0931
nodefactor.Grade.11            49 11.4861
nodefactor.Grade.12            28  7.2756
nodematch.Sex                 132 12.1128
nodematch.Race                103 10.0369
nodematch.Grade               163 13.6309
summary(sim.full ~ edges + degree(0:1)
                      + nodefactor("Sex")
                      + nodefactor("Race", levels = -LARGEST)
                      + nodefactor("Grade")
                      + nodematch("Sex") + nodematch("Race") + nodematch("Grade"))
                edges               degree0               degree1 
                  245                    43                    46 
     nodefactor.Sex.M nodefactor.Race.Black nodefactor.Race.NatAm 
                  173                    30                   192 
nodefactor.Race.Other nodefactor.Race.White    nodefactor.Grade.8 
                    1                    57                    79 
   nodefactor.Grade.9   nodefactor.Grade.10   nodefactor.Grade.11 
                   93                    34                    50 
  nodefactor.Grade.12         nodematch.Sex        nodematch.Race 
                   52                   168                   121 
      nodematch.Grade 
                  195 
plot(sim.full, vertex.col="Grade")
legend('bottomleft',fill=7:12,legend=paste('Grade',7:12),cex=0.75)
sim.full2 <- simulate(fit.full, popsize=network.size(mesa)*2)
summary(mesa~edges + degree(0:1)
                      + nodefactor("Sex")
                      + nodefactor("Race", levels = -LARGEST)
                      + nodefactor("Grade")
                      + nodematch("Sex") + nodematch("Race") + nodematch("Grade"))*2
                edges               degree0               degree1 
                  406                   114                   102 
     nodefactor.Sex.M nodefactor.Race.Black nodefactor.Race.NatAm 
                  342                    52                   312 
nodefactor.Race.Other nodefactor.Race.White    nodefactor.Grade.8 
                    2                    90                   150 
   nodefactor.Grade.9   nodefactor.Grade.10   nodefactor.Grade.11 
                  130                    72                    98 
  nodefactor.Grade.12         nodematch.Sex        nodematch.Race 
                   56                   264                   206 
      nodematch.Grade 
                  326 
summary(sim.full2~edges + degree(0:1)
                      + nodefactor("Sex")
                      + nodefactor("Race", levels = -LARGEST)
                      + nodefactor("Grade")
                      + nodematch("Sex") + nodematch("Race") + nodematch("Grade"))
                edges               degree0               degree1 
                  475                    98                   104 
     nodefactor.Sex.M nodefactor.Race.Black nodefactor.Race.NatAm 
                  380                    68                   381 
nodefactor.Race.Other nodefactor.Race.White    nodefactor.Grade.8 
                    2                    91                   149 
   nodefactor.Grade.9   nodefactor.Grade.10   nodefactor.Grade.11 
                  190                    64                    90 
  nodefactor.Grade.12         nodematch.Sex        nodematch.Race 
                   78                   325                   254 
      nodematch.Grade 
                  380 
data(faux.magnolia.high)
fmh <- faux.magnolia.high
N <- network.size(fmh)
fit.ergm <- ergm(fmh ~ degree(0:3) 
                 + nodefactor("Race", levels=TRUE) + nodematch("Race")
                 + nodefactor("Sex") + nodematch("Sex") 
                 + absdiff("Grade"))
round(coef(fit.ergm), 3)
              degree0               degree1               degree2 
                0.935                 0.257                 0.029 
              degree3 nodefactor.Race.Asian nodefactor.Race.Black 
               -0.245                -2.479                -3.045 
 nodefactor.Race.Hisp nodefactor.Race.NatAm nodefactor.Race.Other 
               -2.701                -2.279                -2.623 
nodefactor.Race.White        nodematch.Race      nodefactor.Sex.M 
               -3.387                 1.678                -0.089 
        nodematch.Sex         absdiff.Grade 
                0.857                -2.112 
fmh.ego <- as.egor(fmh)
head(fmh.ego)
# EGO data (active): 3 x 5
  .egoID Grade Race  Sex   vertex.names
   <int> <dbl> <chr> <chr> <chr>       
1      1     9 Black F     1           
2      2    10 Black M     2           
3      3    12 Black F     3           
# ALTER data: 6 x 6
  .altID .egoID Grade Race  Sex   vertex.names
   <int>  <int> <dbl> <chr> <chr> <chr>       
1    669      1     9 Black F     669         
2    963      2    10 White F     963         
3    912      2    10 White M     912         
# AATIE data: 0 x 3
egofit <- ergm.ego(fmh.ego ~ degree(0:3) 
                   + nodefactor("Race", levels=TRUE) + nodematch("Race")
                   + nodefactor("Sex") + nodematch("Sex") 
                   + absdiff("Grade"), popsize=N,
                  control=control.ergm.ego(ppopsize=N))

# A convenience function.
model.se <- function(fit) sqrt(diag(vcov(fit)))

# Parameters recovered:
coef.compare <- data.frame(
  "NW est" = coef(fit.ergm), 
  "Ego Cen est" = coef(egofit)[-1],
  "diff Z" = (coef(fit.ergm)-coef(egofit)[-1])/model.se(egofit)[-1])

round(coef.compare, 3)
                      NW.est Ego.Cen.est diff.Z
degree0                0.935       0.941 -0.013
degree1                0.257       0.262 -0.013
degree2                0.029       0.033 -0.015
degree3               -0.245      -0.243 -0.011
nodefactor.Race.Asian -2.479      -2.481  0.019
nodefactor.Race.Black -3.045      -3.047  0.022
nodefactor.Race.Hisp  -2.701      -2.703  0.016
nodefactor.Race.NatAm -2.279      -2.295  0.114
nodefactor.Race.Other -2.623      -2.670  0.169
nodefactor.Race.White -3.387      -3.384 -0.034
nodematch.Race         1.678       1.677  0.021
nodefactor.Sex.M      -0.089      -0.089  0.020
nodematch.Sex          0.857       0.856  0.013
absdiff.Grade         -2.112      -2.113  0.013
# MCMC diagnostics. 
mcmc.diagnostics(egofit, which="plots")
plot(gof(egofit, GOF="model"))
plot(gof(egofit, GOF="degree"))
set.seed(1)
fmh.egosampN <- sample(fmh.ego, N, replace=TRUE)
Warning in `[.egor`(x, is, , unit = "ego"): Some ego indices have been selected
multiple times. They will be duplicated, and '.egoID's renumbered to preserve
uniqueness.
egofitN <- ergm.ego(fmh.egosampN ~ degree(0:3) 
                    + nodefactor("Race", levels=TRUE) + nodematch("Race") 
                    + nodefactor("Sex") + nodematch("Sex")
                    + absdiff("Grade"),
                    popsize=N)

# compare the coef
coef.compare <- data.frame(
  "NW est" = coef(fit.ergm), 
  "Ego SampN est" = coef(egofitN)[-1],
  "diff Z" = (coef(fit.ergm)-coef(egofitN)[-1])/model.se(egofitN)[-1])

round(coef.compare, 3)
                      NW.est Ego.SampN.est diff.Z
degree0                0.935         1.397 -0.957
degree1                0.257         0.524 -0.693
degree2                0.029         0.368 -1.155
degree3               -0.245        -0.021 -1.076
nodefactor.Race.Asian -2.479        -2.405 -0.476
nodefactor.Race.Black -3.045        -2.911 -1.168
nodefactor.Race.Hisp  -2.701        -2.529 -1.349
nodefactor.Race.NatAm -2.279        -2.100 -1.310
nodefactor.Race.Other -2.623        -2.620 -0.009
nodefactor.Race.White -3.387        -3.271 -1.098
nodematch.Race         1.678         1.609  0.919
nodefactor.Sex.M      -0.089        -0.139  1.770
nodematch.Sex          0.857         0.880 -0.423
absdiff.Grade         -2.112        -2.021 -1.415
# compare the s.e.'s
se.compare <- data.frame(
  "NW SE" = model.se(fit.ergm), 
  "Ego census SE" =model.se(egofit)[-1], 
  "Ego SampN SE" = model.se(egofitN)[-1])

round(se.compare, 3)
                      NW.SE Ego.census.SE Ego.SampN.SE
degree0               0.458         0.455        0.483
degree1               0.366         0.366        0.385
degree2               0.274         0.273        0.294
degree3               0.202         0.207        0.208
nodefactor.Race.Asian 0.149         0.135        0.155
nodefactor.Race.Black 0.116         0.111        0.115
nodefactor.Race.Hisp  0.146         0.127        0.127
nodefactor.Race.NatAm 0.159         0.142        0.136
nodefactor.Race.Other 0.401         0.283        0.329
nodefactor.Race.White 0.110         0.105        0.106
nodematch.Race        0.102         0.078        0.075
nodefactor.Sex.M      0.033         0.030        0.029
nodematch.Sex         0.070         0.052        0.056
absdiff.Grade         0.071         0.072        0.064
set.seed(0) # Some samples have different sets of alter levels from ego levels.

fmh.egosampN4 <- sample(fmh.ego, round(N/4), replace=TRUE)
Warning in `[.egor`(x, is, , unit = "ego"): Some ego indices have been selected
multiple times. They will be duplicated, and '.egoID's renumbered to preserve
uniqueness.
egofitN4 <- ergm.ego(fmh.egosampN4 ~ degree(0:3) 
                    + nodefactor("Race", levels=TRUE) + nodematch("Race") 
                    + nodefactor("Sex") + nodematch("Sex")
                    + absdiff("Grade"),
                    popsize=N)

# compare the coef
coef.compare <- data.frame(
  "NW est" = coef(fit.ergm), 
  "Ego SampN4 est" = coef(egofitN4)[-1],
  "diff Z" = (coef(fit.ergm)-coef(egofitN4)[-1])/model.se(egofitN4)[-1])

round(coef.compare, 3)
                      NW.est Ego.SampN4.est diff.Z
degree0                0.935          0.510  0.443
degree1                0.257         -0.259  0.675
degree2                0.029         -0.053  0.145
degree3               -0.245         -0.373  0.319
nodefactor.Race.Asian -2.479         -2.204 -1.081
nodefactor.Race.Black -3.045         -2.990 -0.230
nodefactor.Race.Hisp  -2.701         -2.813  0.453
nodefactor.Race.NatAm -2.279         -2.498  0.847
nodefactor.Race.Other -2.623         -2.432 -0.429
nodefactor.Race.White -3.387         -3.435  0.206
nodematch.Race         1.678          1.581  0.662
nodefactor.Sex.M      -0.089         -0.191  1.658
nodematch.Sex          0.857          0.887 -0.296
absdiff.Grade         -2.112         -2.079 -0.243
# compare the s.e.'s
se.compare <- data.frame(
  "NW SE" = model.se(fit.ergm), 
  "Ego census SE" =model.se(egofit)[-1], 
  "Ego SampN SE" = model.se(egofitN)[-1],
  "Ego Samp4 SE" = model.se(egofitN4)[-1])

round(se.compare, 3)
                      NW.SE Ego.census.SE Ego.SampN.SE Ego.Samp4.SE
degree0               0.458         0.455        0.483        0.961
degree1               0.366         0.366        0.385        0.766
degree2               0.274         0.273        0.294        0.564
degree3               0.202         0.207        0.208        0.401
nodefactor.Race.Asian 0.149         0.135        0.155        0.254
nodefactor.Race.Black 0.116         0.111        0.115        0.237
nodefactor.Race.Hisp  0.146         0.127        0.127        0.246
nodefactor.Race.NatAm 0.159         0.142        0.136        0.259
nodefactor.Race.Other 0.401         0.283        0.329        0.445
nodefactor.Race.White 0.110         0.105        0.106        0.231
nodematch.Race        0.102         0.078        0.075        0.148
nodefactor.Sex.M      0.033         0.030        0.029        0.062
nodematch.Sex         0.070         0.052        0.056        0.104
absdiff.Grade         0.071         0.072        0.064        0.136
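The last two columns of the table above can be compared directly: a sample one quarter the size would be expected to roughly double the standard errors if they scale as 1/sqrt(n). The sketch below checks this using the printed values only; the vectors are copied by hand from the output, and the variable names are illustrative.

```r
# Standard errors copied from the se.compare output above
# (rows in the printed order, degree0 through absdiff.Grade).
se_sampN <- c(0.483, 0.385, 0.294, 0.208, 0.155, 0.115, 0.127,
              0.136, 0.329, 0.106, 0.075, 0.029, 0.056, 0.064)
se_samp4 <- c(0.961, 0.766, 0.564, 0.401, 0.254, 0.237, 0.246,
              0.259, 0.445, 0.231, 0.148, 0.062, 0.104, 0.136)

# Ratio of SEs for the N/4 sample relative to the size-N sample.
se_ratio <- se_samp4 / se_sampN
round(se_ratio, 2)

# Under 1/sqrt(n) scaling, a quarter-size sample gives sqrt(4) times the SE.
expected_ratio <- sqrt(4)
c(median_ratio = median(se_ratio), expected = expected_ratio)
```

Individual terms deviate from the expected ratio, which is unsurprising for a single pair of random samples, so the median is used as a rough summary.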

| Function | Purpose |
| --- | --- |
| `summary` | takes an `ergm` formula with an egocentric data (`egor`) object on the LHS and `ergm` terms on the RHS, and returns the observed values of the statistics on the RHS. When used on an `ergm.ego` fit object, it returns a summary of the model fit. |
| `ergm.ego` | the main function used to fit an ERGM to the egocentric data object. In addition to the arguments that are specific to fitting from egocentrically sampled data, you can pass the usual arguments to `ergm` for controlling the fitting algorithm. |
| `control.ergm.ego` | a set of control parameters specifically needed for fitting ERGMs to egocentrically sampled data, as well as a method for passing other control parameters to `ergm`. |
| `vcov` | returns the variance–covariance matrix of the estimate for \(\theta\). |

| Function | Purpose |
| --- | --- |
| `simulate` | Simulates complete networks (of any size) from the fitted model. You can either pass the `ergm.ego` object (which contains the results of the model fit) to `simulate`, or pass a formula as the object, along with a vector of coefficients. |
| `gof` | Uses the simulation function above to compare observed statistics to statistics from networks simulated from the model. The data limit the available comparisons to the model terms, the degree distribution, and the ESP distribution (the last only if the alter–alter ties are observed). |
| `control.simulate.ergm.ego` | Allows you to control how fractional numbers of vertices are resolved when the sample weights and the selected population size produce them. Also provides a way to pass controls to SAN and to the `ergm` `simulate` method. |

| Symbol | Meaning |
| --- | --- |
| \(N\) | the population being studied: a very large, but finite, set of actors whose relations are of interest |
| \(x _ i\) | attribute (e.g., age, sex, race) vector of actor \(i \in N\) |
| \(x_N\) | (or just \(x\), when there is no ambiguity) the attributes of actors in \(N\) |
| \(\mathbb{Y}(N)\) | the set of dyads (potential ties) in an undirected network of actors in \(N\) |
| \(y\subseteq \mathbb{Y}(N)\) | the population network: a fixed but unknown network (a set of relationships) of interest. In particular, |
| \(y_{ij}\) | an indicator function of whether a tie between \(i\) and \(j\) is present in \(y\) |
| \(y _ i=\{j\in N: y _ {ij}=1\}\) | the set of \(i\)’s network neighbors. |

| Symbol | Meaning |
| --- | --- |
| \(e_{N}\) | the egocentric census, the information retained by the minimal egocentric sampling design when all nodes are sampled |
| \(S\subseteq N\) | the set of egos in a sample |
| \(e_{S}\) | the data contained in an egocentric sample |
| \(e_i\) | the “egocentric” view of network \(y\) from the point of view of actor \(i\) (“ego”), with the following parts: |
| \(e^e_i \equiv x_i\) | \(i\)’s own attributes |
| \(e^a_i \equiv (x_{j})_{j\in y_i}\) | an unordered list of attribute vectors of \(i\)’s immediate neighbors (“alters”), but not their identities (indices in \(N\)) |
| \(e^e_{i,k}\equiv x_{i,k}\) | the \(k\)th attribute/covariate observed on ego \(i\) |
| \(e^a_{i,k}\equiv( x_{j,k})_{j\in y_i}\) | the \(k\)th attribute/covariate observed on the alters of ego \(i\) |

| Statistic | Network statistic \(g_{k}( y,x)\) | Egocentric statistic \(h _ {k}(e_i)\) |
| --- | --- | --- |
| General sum over ties | \(\sum _ {(i,j)\in y} f _ k(x _ i,x _ j)\) | \(\frac{1}{2}\sum _ {j'\in e^\text{a} _ i} f _ k\big(e^\text{e}_i,e^\text{a}_{i,j'}\big)\) |
| Number of ties in the network | \(\lvert y \rvert\equiv \sum _ {(i,j) \in y} 1\) | \(\frac{1}{2}\lvert e^\text{a}_{i}\rvert\) |
| Ties weighted by actor covariate \(x _ {i,k}\) | \(\sum _ {(i,j) \in y} (x _ {i,k}+x _ {j,k})\) | \(\frac{1}{2} \big(e^\text{e}_{i,k} \lvert e^\text{a}_{i}\rvert + \sum _ {j'\in e^\text{a} _ i} e^\text{a}_{i,j',k} \big)\) |
| Ties weighted by difference in \(x _ {i,k}\) | \(\sum _ {(i,j) \in y} \lvert x _ {i,k}-x _ {j,k}\rvert\) | \(\frac{1}{2}\sum _ {j'\in e^\text{a} _ i} \lvert e^\text{e}_{i,k}-e^\text{a}_{i,j',k}\rvert\) |
| Ties within groups identified by \(x _ {i,k}\) | \(\sum _ {(i,j) \in y} 1_{x _ {i,k}=x _ {j,k}}\) | \(\frac{1}{2}\sum _ {j'\in e^\text{a} _ i} 1_{ e^\text{e}_{i,k}= e^\text{a}_{i,j',k}}\) |
| General sum over actors | \(\sum _ {i\in N} f _ k\big\{x _ {i},(x _ j) _ {j\in y_{i}}\big\}\) | \(f _ k\big(e^\text{e}_i,e^\text{a}_{i}\big)\) |
| Number of actors with \(d\) neighbors | \(\sum _ {i\in N} 1_{\lvert y_{i}\rvert=d}\) | \(1_{\lvert e^\text{a}_{i}\rvert=d}\) |
| Actors with \(d\) neighbors, weighted by actor covariate \(x _ {i,k}\) | \(\sum _ {i\in N} x _ {i,k} 1_{\lvert y_{i}\rvert=d}\) | \(e^\text{e}_{i,k}1_{\lvert e^\text{a}_{i}\rvert=d}\) |
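The factor of one half in the tie-level entries reflects that, in an egocentric census, every tie is reported once by each of its two endpoints. The base-R sketch below illustrates this on a made-up four-actor network: it computes three tie-level statistics from the full network and again from ego and alter attributes only, then one actor-level statistic. All data and object names are illustrative assumptions, and none of the code uses `ergm.ego`.

```r
# Toy population: 4 actors, undirected ties 1-2, 1-3, 2-3, 3-4 (made-up data).
actors <- data.frame(id = 1:4,
                     grade = c(9, 9, 10, 12),
                     sex = c("F", "M", "F", "F"))
ties <- data.frame(i = c(1, 1, 2, 3), j = c(2, 3, 3, 4))

# Network statistics g_k(y, x): sums over ties, each tie counted once.
xi <- actors[match(ties$i, actors$id), ]
xj <- actors[match(ties$j, actors$id), ]
g <- c(edges         = nrow(ties),
       nodematch.sex = sum(xi$sex == xj$sex),
       absdiff.grade = sum(abs(xi$grade - xj$grade)))

# Egocentric census: each tie appears twice in the alter table, once per
# reporting ego. Only attributes are used below, never alter identities.
alters <- data.frame(egoID = c(ties$i, ties$j),
                     altID = c(ties$j, ties$i))
alters$ego_sex   <- actors$sex[match(alters$egoID, actors$id)]
alters$alt_sex   <- actors$sex[match(alters$altID, actors$id)]
alters$ego_grade <- actors$grade[match(alters$egoID, actors$id)]
alters$alt_grade <- actors$grade[match(alters$altID, actors$id)]

# Sum of h_k(e_i) over all egos: each alter row contributes one half.
h_total <- c(edges         = 0.5 * nrow(alters),
             nodematch.sex = 0.5 * sum(alters$ego_sex == alters$alt_sex),
             absdiff.grade = 0.5 * sum(abs(alters$ego_grade - alters$alt_grade)))

rbind(network = g, egocentric = h_total)

# Actor-level statistic: number of actors with exactly 2 neighbours,
# using each ego's count of reported alters as its degree.
degree <- tabulate(alters$egoID, nbins = nrow(actors))
sum(degree == 2)
```

The `altID` column is used here only to look up alter attributes when building the toy alter table; a real egocentric survey would record those attributes directly.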

Statnet Development Team

The statnet Project

Introduction

Prerequisites

Software Installation

Overview of ergm.ego

Key concepts

ERGMs

Network Sampling

Link-Trace Designs

Egocentric Designs

Methods for sampled network data

Model-based

Design-based

Theoretical Framework

Estimation

Practical issues

Network Size

Observable discrepancies

Egocentric target statistics

Statistical Inference

Survey design effects

The package ergm.ego

Data structure and input

Model terms

Model-related functions

Simulation related functions

Example Analysis

Data construction

From a network object

From external data

Exploratory analysis

Model Fitting

Preliminaries

Fit a simple model

Convergence assessment

GOF assessment

Improve the fit

Parameter recovery and sampling

Egocentric census

Sample same size

Smaller sample

Package Development

References

Appendices

1: Real world example

2: Formal definitions of egocentric statistics

Notation

Population network

Egocentric sample

Egocentric Statistics
