Social Network Analysis: Monte Carlo maximum likelihood estimation with RSiena

  • rm(list=ls())
  • getwd()

1 Load the the given data file

library(igraph)
library(network)
library(RSiena)
library(DT)
library(magrittr)
library(reshape)
library(treemap)
##  [1] "alc.att" "alc.beh" "fem"     "fri"     "hob"     "res.act" "res.pas"
##  [8] "sch.att" "sch.beh" "spo.att" "spo.beh" "str.abs" "tob.att" "tob.beh"

2 Notation in the given datasets

  • Notes
    • V = August/September 2003
    • W = November/December 2003
    • X = February/March 2004
    • Y = May/June 2004
    • NA = non-response
  • fri = adjacency matrix of friendship

    • 0 = no friendship reported
    • 1 = friendship reported
    • NA = person did not respond
    • 10 = structurally absent student
  • “fem” : gender
    • male(0) = yellow in color
    • female(1) = green in color

3 Data Cleansing: Friendship Network at period V

fri_mat = matrix(fri)
fri_V = fri_mat[,1][[1]]
omitted_V_nodes = c(-23, -24,-25) 
fri_V_final  = fri_V[omitted_V_nodes , omitted_V_nodes ] 

4 Friendship Network at period V : igraph network

igraph_fV_directed = graph.adjacency(as.matrix(fri_V_final),mode = "directed",weighted = NULL)
mynet_V = igraph_fV_directed
mynet_V$gender = fem[omitted_V_nodes]
plot(mynet_V , vertex.color=c( "yellow", "green")[ 1+ (mynet_V$gender==1)], vertex.size=11, edge.arrow.size=0.5,  vertex.label.dist=0.01)
title("Yellow (Male)  at period V")

5 Basic Ideas

In the paper of Goodreau, Kitts, and Morris (2009), the scholars believe that friendship is interdependent to generate complicated and complexed social network structure. They claim that friendship’s network is not originated from ethnicity but rather from gender. According to Balance Theory, a friend of a friend is a “mutual” and “reciprocal” relation existed in human behaviors; but “friends of your friends are (or are not) your friends” may be unconditionally transitive or may be potentially transitive or may be intransitive in a friendship triad closure. Therefore, gender can also be classified as homophilic relation between human, of which the network can build up either by assortative mixing or selective mixing. Our ERGM Models: mutuality(= reciporality )

    + Model_01 :  friendship's Network = mutuality + density  + Gender 
    
    + Model_02 :  friendship's Network = mutuality + density + transitive 
    
    + Model_03 :  friendship's Network = mutuality  + density + Gender + gwesp(triad closure)

6 ERGM Model_01 : friendship’s Network = mutuality + density + Gender

6.1 Monte Carlo maximum likelihood estimation (MCMLE): model_01

model_01 = formula( fri_V_network~ mutual + density + nodematch("sexFem"))
results_01 = ergm(model_01)
summary(results_01)
## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula:   fri_V_network ~ mutual + density + nodematch("sexFem")
## 
## Iterations:  2 out of 20 
## 
## Monte Carlo MLE Results:
##                    Estimate Std. Error MCMC % z value Pr(>|z|)    
## mutual               2.3915     0.5090      0   4.698   <1e-04 ***
## density          -1911.3584   220.4253      0  -8.671   <1e-04 ***
## nodematch.sexFem     2.1151     0.5138      0   4.117   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 640.5  on 462  degrees of freedom
##  Residual Deviance: 268.2  on 459  degrees of freedom
##  
## AIC: 274.2    BIC: 286.6    (Smaller is better.)

6.1.1 Results: ERGM Mode_01

  • There is no reason to accept the null hypothesis at 1% significant level.

  • Therefore, all coefficients are statistically signifcant different from zero at 1% level.

7 ERGM Model_02 : friendship’s Network = mutuality + density + transitive

model_02 = formula( fri_V_network~ reciporality + density + transitive )
results_02 = ergm(model_02)
summary(results_02)

7.1 Monte Carlo maximum likelihood estimation (MCMLE): model_02

Due to the problem of “Unconstrained MCMC sampling” in model_02, Computation process is crashed and stoped. The reason is mainly due to the interdepedency of transitive triad with other main effects(mutuality and density)

8 ERGM Model_03 : friendship’s Network = mutuality + density + Gender + gwesp(triad Closure)

8.1 Monte Carlo maximum likelihood estimation (MCMLE): model_03

model_03 = formula( fri_V_network~ mutual + density + nodematch("sexFem") + gwdsp(.5,fixed=T) )
results_03 = ergm(model_03)
summary(results_03)
## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula:   fri_V_network ~ mutual + density + nodematch("sexFem") + gwdsp(0.5, 
##     fixed = T)
## 
## Iterations:  2 out of 20 
## 
## Monte Carlo MLE Results:
##                    Estimate Std. Error MCMC %  z value Pr(>|z|)    
## mutual            2.422e+00  4.850e-01      0    4.994  < 1e-04 ***
## density          -1.347e+03  5.442e+00    100 -247.538  < 1e-04 ***
## nodematch.sexFem  1.944e+00  3.548e-01      0    5.478  < 1e-04 ***
## gwdsp.fixed.0.5  -2.877e-01  8.902e-02      0   -3.232  0.00123 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 640.5  on 462  degrees of freedom
##  Residual Deviance: 263.4  on 458  degrees of freedom
##  
## AIC: 271.4    BIC: 287.9    (Smaller is better.)

8.1.1 Results: ERGM Mode_03

  • There is no reason to accept the null hypothesis at 1% significant level.

  • Therefore, all coefficients are statistically signifcant different from zero at 1% level.

library(RSiena)

#### Note  from RSiena user manaual, it mentions that the adj matrix structure  can handle 
#### NA in sparse matrix structure , therefore,  it is userfriendly to convert "10" into NA and keep 
#### the adj matrix dimension fixed over timespan, otherwise, it cannot compute with RSiena .

fri_mat = matrix(fri)

fri_V = fri_mat[,1][[1]]
fri_V[which(fri_V== 10)]=NA
fri_VNA = fri_V

fri_W = fri_mat[,1][[2]]
fri_W[which(fri_W== 10)]=NA
fri_WNA = fri_W

fri_X = fri_mat[,1][[3]]
fri_X[which(fri_X== 10)]=NA
fri_XNA = fri_X


fri_Y = fri_mat[,1][[4]]
fri_Y[which(fri_Y== 10)]=NA
fri_YNA = fri_Y

# Note : Since there is no fem dataset provides for Time at V,W,V,X, 
# therefore I assume fem keep fixed over timespan !
femVWXY = data.frame(fem, fem, fem,fem) 
colnames(femVWXY) = c("femV", "femW","femX","femY" ) 
femVWXYmat = as.matrix(femVWXY)

9 Model_04 : outdegree(density) + reciprocity + Gender homophily effect

9.1 Model_04 (Steglich, Snijders, and West (2006))

  • Testing Gender homophily effect: Actors tend to prefer same gender friendships
  • covariate-related similarity (simX) in RSiena
mynet_VWXYNA = sienaDependent(array(c(fri_VNA,fri_WNA ,fri_XNA ,fri_YNA   ), dim=c(dim(fri_YNA) , 4)))
Fem_Homophily = sienaDependent(femVWXYmat, type="behavior")
Comov_Fri_FemHom_data = sienaDataCreate(mynet_VWXYNA , Fem_Homophily)
## For some variables, in some periods, there are only increases, or only decreases.
## This will be respected in the simulations.
## If this is not desired, use allowOnly=FALSE when creating the dependent variables.
Comov_Fri_FemHom_data
## Dependent variables:  mynet_VWXYNA, Fem_Homophily 
## Number of observations: 4 
## 
## Nodeset                  Actors 
## Number of nodes              25 
## 
## Dependent variable mynet_VWXYNA       
## Type               oneMode            
## Observations       4                  
## Nodeset            Actors             
## Densities          0.12 0.15 0.16 0.15
## 
## Dependent variable Fem_Homophily
## Type               behavior     
## Observations       4            
## Nodeset            Actors       
## Range              0 - 1        
## 
## "uponly":  Fem_Homophily:    all periods
## "downonly":  Fem_Homophily:    all periods
Comov_Fri_FemHom_eff = getEffects(Comov_Fri_FemHom_data)
Comov_Fri_FemHom_eff_final = includeEffects(Comov_Fri_FemHom_eff ,simX,  interaction1="Fem_Homophily", name="mynet_VWXYNA")
##   effectName               include fix   test  initialValue parm
## 1 Fem_Homophily similarity TRUE    FALSE FALSE          0   0
Comov_algm = sienaAlgorithmCreate(nsub=3, n3=300 , seed = 12345, projname='Comov-results')
Comov_FemHom_RM03 = siena07(Comov_algm ,data=Comov_Fri_FemHom_data, effects=Comov_Fri_FemHom_eff_final ,batch=TRUE,verbose=TRUE,returnDeps= TRUE )
Comov_FemHom_RM03_gofod  = sienaGOF(Comov_FemHom_RM03,verbose=TRUE, varName="mynet_VWXYNA", OutdegreeDistribution, cumulative=TRUE, levls= 1:10)
Comov_FemHom_RM03_gofTc  = sienaGOF(Comov_FemHom_RM03, TriadCensus, verbose=TRUE, join=TRUE, varName="mynet_VWXYNA")
coCovarExtraction = function (i, obsData, sims, period, groupName, varName){return(obsData[[groupName]]$cCovars[[varName]]) }

demoClust = function(i, data, sims, wave, groupName, varName, levls=1){
    #unloadNamespace("igraph") # to avoid package clashes
    require(sna)
    require(network)
    x <- as.sociomatrix(networkExtraction(i, data, sims, wave, groupName, varName[1]))
    v <- coCovarExtraction(i,data,sims,wave,groupName,varName[2])

    # handle missing data [note that distances may get larger due to this!]:
    x[is.na(x)] <- 0

    # make "same covariate" matrix:
    samev <- outer(v,v,'==')*1
    diag(samev) <- NA

    # calculate distances (replace "unreachable"  by "in n steps"):
    dista <- geodist(x,inf.replace=length(v))$gdist # distance matrix
    in.dist <- mean(dista[samev==1],na.rm=TRUE) # within group mean
    out.dist <- mean(dista[samev==0],na.rm=TRUE) # between groups mean

    # calculate number of in-group clusters:
    xs <- x*samev # reduces network to those of same v group
    diag(xs) <- 0 # was NA due to samev settings above
    xs2 <- xs%*%xs # calculates two-paths inside same v groups
    diag(xs2) <- 0 # omits reciprocal ties (returning two-paths)
    in.clusters <- sum(xs*xs2,na.rm=TRUE) # counts closed two-paths

    # for Moran also impute missing covariate information by covariate mean:
    v[is.na(v)] <- mean(v,na.rm=TRUE)
    moran <- nacf(x,v,lag.max=1,typ="moran")[2]

    # combine everything in vector of three outcomes:
    res <- c(moran,in.clusters,out.dist,in.dist)
    names(res) <- c("Moran's I","in.clusters","mean.out.dist","mean.in.dist")
    return(res)
}

Moran123 = function(i, data, sims, wave, groupName, varName, levls=1){
    #unloadNamespace("igraph") # to avoid package clashes
    require(sna)
    require(network)
    x <- as.sociomatrix(networkExtraction(i, data, sims, wave, groupName, varName[1]))
    z <- behaviorExtraction(i,data,sims,wave,groupName,varName[2])
    # handle missing data [not checked if this makes sense]:
    x[is.na(x)] <- 0
    z[is.na(z)] <- mean(z,na.rm=TRUE)
    res <- nacf(x,z,lag.max=3,typ="moran")[2:4]
    names(res) <- c("d=1","d=2","d=3")
    return(res)
}

MoranGeary = function(i, data, sims, wave, groupName, varName, levls=1:2){
    #unloadNamespace("igraph") # to avoid package clashes
    require(sna)
    require(network)
    x <- as.sociomatrix(networkExtraction(i, data, sims, wave, groupName, varName[1]))
    z <- behaviorExtraction(i,data,sims,wave,groupName,varName[2])
    n <- length(z)
    z.ave <- mean(z,na.rm=TRUE)
    numerator <- n*sum(x*outer(z-z.ave,z-z.ave),na.rm=TRUE)
    denominator <- sum(x,na.rm=TRUE)*sum((z-z.ave)^2,na.rm=TRUE)
    res <- numerator/denominator
    numerator <- (n-1)*sum(x*(outer(z,z,FUN='-')^2),na.rm=TRUE)
    denominator <- 2*sum(x,na.rm=TRUE)*sum((z-z.ave)^2,na.rm=TRUE)
    res[2] <- numerator/denominator
    names(res) <- c("Moran","Geary")
    return(res)
}
Comov_FemHom_RM03_gofMoran = sienaGOF(Comov_FemHom_RM03,Moran123,verbose=TRUE,join=TRUE,varName=c("mynet_VWXYNA" , "Fem_Homophily"))
Comov_FemHom_RM03_gofMoranGeary  = sienaGOF(Comov_FemHom_RM03,MoranGeary,verbose=TRUE,join=TRUE,varName=c("mynet_VWXYNA" , "Fem_Homophily"))
summary(Comov_FemHom_RM03)
## Estimates, standard errors and convergence t-ratios
## 
##                                                 Estimate   Standard   Convergence 
##                                                              Error      t-ratio   
## Network Dynamics 
##   1. rate constant mynet_VWXYNA rate (period 1)  5.5053  ( 1.5197   )    0.0218   
##   2. rate constant mynet_VWXYNA rate (period 2)  3.7978  ( 0.8390   )   -0.0413   
##   3. rate constant mynet_VWXYNA rate (period 3)  3.2565  ( 1.0067   )   -0.1302   
##   4. eval outdegree (density)                   -2.0573  ( 0.2614   )    0.0417   
##   5. eval reciprocity                            1.2240  ( 0.2215   )    0.0289   
##   6. eval Fem_Homophily similarity               2.3521  ( 0.4757   )    0.0502   
## 
## Behavior Dynamics
##   7. rate rate Fem_Homophily (period 1)          0.1000  (     NA   )    0.0000   
##   8. rate rate Fem_Homophily (period 2)          0.1000  (     NA   )    0.0000   
##   9. rate rate Fem_Homophily (period 3)          0.1000  (     NA   )    0.0000   
## 
## Overall maximum convergence ratio:    0.1572 
## 
## 
## Total of 1882 iteration steps.
## 
## Covariance matrix of estimates (correlations below diagonal)
## 
##        2.310       -0.003        0.086        0.083       -0.020       -0.144           NA           NA           NA
##       -0.003        0.704        0.034        0.039       -0.018       -0.094           NA           NA           NA
##        0.056        0.040        1.013        0.078       -0.029       -0.120           NA           NA           NA
##        0.210        0.177        0.296        0.068       -0.018       -0.107           NA           NA           NA
##       -0.061       -0.094       -0.132       -0.310        0.049       -0.004           NA           NA           NA
##       -0.200       -0.237       -0.251       -0.863       -0.037        0.226           NA           NA           NA
##           NA           NA           NA           NA           NA           NA           NA           NA           NA
##           NA           NA           NA           NA           NA           NA           NA           NA           NA
##           NA           NA           NA           NA           NA           NA           NA           NA           NA
## 
## Derivative matrix of expected statistics X by parameters:
## 
##        3.701        0.000        0.000        0.189        0.255        0.149        0.000        0.000        0.000
##        0.000        6.550        0.000        1.575        1.883        1.023        0.000        0.000        0.000
##        0.000        0.000        4.425       -0.095       -0.020        0.334        0.000        0.000        0.000
##        6.075        4.696        3.374       96.794       73.456       44.747        0.000        0.000        0.000
##       -0.091        1.137        2.809       39.694       58.930       19.204        0.000        0.000        0.000
##        3.927        3.938        1.922       46.106       36.258       24.962        0.000        0.000        0.000
##        3.251        0.000        0.000        9.384        7.572        4.311        0.000        0.000        0.000
##        0.000       -7.265        0.000       -4.747       -3.979       -1.748        0.000        0.000        0.000
##        0.000        0.000       -3.109       -2.197       -2.067       -0.089        0.000        0.000        0.000
## 
## Covariance matrix of X (correlations below diagonal):
## 
##       32.108       -0.929        1.309        8.925        1.378        3.469        0.000        0.000        0.000
##       -0.031       28.339       -0.094        2.473        0.014        1.461        0.000        0.000        0.000
##        0.052       -0.004       19.623        6.488        2.887        2.946        0.000        0.000        0.000
##        0.170        0.050        0.158       86.111       65.138       39.123        0.000        0.000        0.000
##        0.026        0.000        0.069        0.743       89.303       30.687        0.000        0.000        0.000
##        0.136        0.061        0.147        0.934        0.719       20.390        0.000        0.000        0.000
##          NaN          NaN          NaN          NaN          NaN          NaN        0.000        0.000        0.000
##          NaN          NaN          NaN          NaN          NaN          NaN          NaN        0.000        0.000
##          NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN        0.000
plot(Comov_FemHom_RM03_gofod )
## Note: some statistics are not plotted because their variance is 0.
## This holds for the statistic: 10.

plot(Comov_FemHom_RM03_gofTc , center=TRUE, scale=TRUE)

plot(Comov_FemHom_RM03_gofMoran)

plot(Comov_FemHom_RM03_gofMoranGeary)

9.1.1 Results: Model_04

  • result : The stochastic simulation approximation reusult is acceptable :

    + all coeffs of t-ratios for convergence  are < 0.1 in absolute value
    + except rate Fem_Homophily (period 1,2,3) because "fem" is fixed over timespan.
    + Overall maximum convergence is < 0.25

Goodreau, Steven M., James A. Kitts, and Martina Morris. 2009. “Birds of a Feather, or Friend of a Friend? Using Exponential Random Graph Models to Investigate Adolescent Social Networks*.” Demography 46 (1): 103–25. https://doi.org/10.1353/dem.0.0045.

Steglich, Christian, Tom AB Snijders, and Patrick West. 2006. “Applying Siena.” Methodology 2 (1): 48–56. https://doi.org/10.1027/1614-2241.2.1.48.

Written by DK WC

May 31, 2019