Social Network Analysis: Monte Carlo maximum likelihood estimation with RSiena
Social Network Analysis: Monte Carlo maximum likelihood estimation with RSiena
- 1 Load the the given data file
- 2 Notation in the given datasets
- 3 Data Cleansing: Friendship Network at period V
- 4 Friendship Network at period V : igraph network
- 5 Basic Ideas
- 6 ERGM Model_01 : friendship’s Network = mutuality + density + Gender
- 7 ERGM Model_02 : friendship’s Network = mutuality + density + transitive
- 8 ERGM Model_03 : friendship’s Network = mutuality + density + Gender + gwesp(triad Closure)
- 9 Model_04 : outdegree(density) + reciprocity + Gender homophily effect
- rm(list=ls())
- getwd()
1 Load the the given data file
library(igraph)
library(network)
library(RSiena)
library(DT)
library(magrittr)
library(reshape)
library(treemap)
## [1] "alc.att" "alc.beh" "fem" "fri" "hob" "res.act" "res.pas"
## [8] "sch.att" "sch.beh" "spo.att" "spo.beh" "str.abs" "tob.att" "tob.beh"
2 Notation in the given datasets
- Notes
- V = August/September 2003
- W = November/December 2003
- X = February/March 2004
- Y = May/June 2004
- NA = non-response
fri = adjacency matrix of friendship
- 0 = no friendship reported
- 1 = friendship reported
- NA = person did not respond
- 10 = structurally absent student
- “fem” : gender
- male(0) = yellow in color
- female(1) = green in color
3 Data Cleansing: Friendship Network at period V
fri_mat = matrix(fri)
fri_V = fri_mat[,1][[1]]
omitted_V_nodes = c(-23, -24,-25)
fri_V_final = fri_V[omitted_V_nodes , omitted_V_nodes ]
4 Friendship Network at period V : igraph network
igraph_fV_directed = graph.adjacency(as.matrix(fri_V_final),mode = "directed",weighted = NULL)
mynet_V = igraph_fV_directed
mynet_V$gender = fem[omitted_V_nodes]
plot(mynet_V , vertex.color=c( "yellow", "green")[ 1+ (mynet_V$gender==1)], vertex.size=11, edge.arrow.size=0.5, vertex.label.dist=0.01)
title("Yellow (Male) at period V")
5 Basic Ideas
In the paper of Goodreau, Kitts, and Morris (2009), the scholars believe that friendship is interdependent to generate complicated and complexed social network structure. They claim that friendship’s network is not originated from ethnicity but rather from gender. According to Balance Theory, a friend of a friend is a “mutual” and “reciprocal” relation existed in human behaviors; but “friends of your friends are (or are not) your friends” may be unconditionally transitive or may be potentially transitive or may be intransitive in a friendship triad closure. Therefore, gender can also be classified as homophilic relation between human, of which the network can build up either by assortative mixing or selective mixing. Our ERGM Models: mutuality(= reciporality )
+ Model_01 : friendship's Network = mutuality + density + Gender
+ Model_02 : friendship's Network = mutuality + density + transitive
+ Model_03 : friendship's Network = mutuality + density + Gender + gwesp(triad closure)
6 ERGM Model_01 : friendship’s Network = mutuality + density + Gender
6.1 Monte Carlo maximum likelihood estimation (MCMLE): model_01
model_01 = formula( fri_V_network~ mutual + density + nodematch("sexFem"))
results_01 = ergm(model_01)
summary(results_01)
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: fri_V_network ~ mutual + density + nodematch("sexFem")
##
## Iterations: 2 out of 20
##
## Monte Carlo MLE Results:
## Estimate Std. Error MCMC % z value Pr(>|z|)
## mutual 2.3915 0.5090 0 4.698 <1e-04 ***
## density -1911.3584 220.4253 0 -8.671 <1e-04 ***
## nodematch.sexFem 2.1151 0.5138 0 4.117 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 640.5 on 462 degrees of freedom
## Residual Deviance: 268.2 on 459 degrees of freedom
##
## AIC: 274.2 BIC: 286.6 (Smaller is better.)
6.1.1 Results: ERGM Mode_01
There is no reason to accept the null hypothesis at 1% significant level.
Therefore, all coefficients are statistically signifcant different from zero at 1% level.
7 ERGM Model_02 : friendship’s Network = mutuality + density + transitive
model_02 = formula( fri_V_network~ reciporality + density + transitive )
results_02 = ergm(model_02)
summary(results_02)
7.1 Monte Carlo maximum likelihood estimation (MCMLE): model_02
Due to the problem of “Unconstrained MCMC sampling” in model_02, Computation process is crashed and stoped. The reason is mainly due to the interdepedency of transitive triad with other main effects(mutuality and density)
8 ERGM Model_03 : friendship’s Network = mutuality + density + Gender + gwesp(triad Closure)
8.1 Monte Carlo maximum likelihood estimation (MCMLE): model_03
model_03 = formula( fri_V_network~ mutual + density + nodematch("sexFem") + gwdsp(.5,fixed=T) )
results_03 = ergm(model_03)
summary(results_03)
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: fri_V_network ~ mutual + density + nodematch("sexFem") + gwdsp(0.5,
## fixed = T)
##
## Iterations: 2 out of 20
##
## Monte Carlo MLE Results:
## Estimate Std. Error MCMC % z value Pr(>|z|)
## mutual 2.422e+00 4.850e-01 0 4.994 < 1e-04 ***
## density -1.347e+03 5.442e+00 100 -247.538 < 1e-04 ***
## nodematch.sexFem 1.944e+00 3.548e-01 0 5.478 < 1e-04 ***
## gwdsp.fixed.0.5 -2.877e-01 8.902e-02 0 -3.232 0.00123 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 640.5 on 462 degrees of freedom
## Residual Deviance: 263.4 on 458 degrees of freedom
##
## AIC: 271.4 BIC: 287.9 (Smaller is better.)
8.1.1 Results: ERGM Mode_03
There is no reason to accept the null hypothesis at 1% significant level.
Therefore, all coefficients are statistically signifcant different from zero at 1% level.
library(RSiena)
#### Note from RSiena user manaual, it mentions that the adj matrix structure can handle
#### NA in sparse matrix structure , therefore, it is userfriendly to convert "10" into NA and keep
#### the adj matrix dimension fixed over timespan, otherwise, it cannot compute with RSiena .
fri_mat = matrix(fri)
fri_V = fri_mat[,1][[1]]
fri_V[which(fri_V== 10)]=NA
fri_VNA = fri_V
fri_W = fri_mat[,1][[2]]
fri_W[which(fri_W== 10)]=NA
fri_WNA = fri_W
fri_X = fri_mat[,1][[3]]
fri_X[which(fri_X== 10)]=NA
fri_XNA = fri_X
fri_Y = fri_mat[,1][[4]]
fri_Y[which(fri_Y== 10)]=NA
fri_YNA = fri_Y
# Note : Since there is no fem dataset provides for Time at V,W,V,X,
# therefore I assume fem keep fixed over timespan !
femVWXY = data.frame(fem, fem, fem,fem)
colnames(femVWXY) = c("femV", "femW","femX","femY" )
femVWXYmat = as.matrix(femVWXY)
9 Model_04 : outdegree(density) + reciprocity + Gender homophily effect
9.1 Model_04 (Steglich, Snijders, and West (2006))
- Testing Gender homophily effect: Actors tend to prefer same gender friendships
- covariate-related similarity (simX) in RSiena
mynet_VWXYNA = sienaDependent(array(c(fri_VNA,fri_WNA ,fri_XNA ,fri_YNA ), dim=c(dim(fri_YNA) , 4)))
Fem_Homophily = sienaDependent(femVWXYmat, type="behavior")
Comov_Fri_FemHom_data = sienaDataCreate(mynet_VWXYNA , Fem_Homophily)
## For some variables, in some periods, there are only increases, or only decreases.
## This will be respected in the simulations.
## If this is not desired, use allowOnly=FALSE when creating the dependent variables.
Comov_Fri_FemHom_data
## Dependent variables: mynet_VWXYNA, Fem_Homophily
## Number of observations: 4
##
## Nodeset Actors
## Number of nodes 25
##
## Dependent variable mynet_VWXYNA
## Type oneMode
## Observations 4
## Nodeset Actors
## Densities 0.12 0.15 0.16 0.15
##
## Dependent variable Fem_Homophily
## Type behavior
## Observations 4
## Nodeset Actors
## Range 0 - 1
##
## "uponly": Fem_Homophily: all periods
## "downonly": Fem_Homophily: all periods
Comov_Fri_FemHom_eff = getEffects(Comov_Fri_FemHom_data)
Comov_Fri_FemHom_eff_final = includeEffects(Comov_Fri_FemHom_eff ,simX, interaction1="Fem_Homophily", name="mynet_VWXYNA")
## effectName include fix test initialValue parm
## 1 Fem_Homophily similarity TRUE FALSE FALSE 0 0
Comov_algm = sienaAlgorithmCreate(nsub=3, n3=300 , seed = 12345, projname='Comov-results')
Comov_FemHom_RM03 = siena07(Comov_algm ,data=Comov_Fri_FemHom_data, effects=Comov_Fri_FemHom_eff_final ,batch=TRUE,verbose=TRUE,returnDeps= TRUE )
Comov_FemHom_RM03_gofod = sienaGOF(Comov_FemHom_RM03,verbose=TRUE, varName="mynet_VWXYNA", OutdegreeDistribution, cumulative=TRUE, levls= 1:10)
Comov_FemHom_RM03_gofTc = sienaGOF(Comov_FemHom_RM03, TriadCensus, verbose=TRUE, join=TRUE, varName="mynet_VWXYNA")
coCovarExtraction = function (i, obsData, sims, period, groupName, varName){return(obsData[[groupName]]$cCovars[[varName]]) }
demoClust = function(i, data, sims, wave, groupName, varName, levls=1){
#unloadNamespace("igraph") # to avoid package clashes
require(sna)
require(network)
x <- as.sociomatrix(networkExtraction(i, data, sims, wave, groupName, varName[1]))
v <- coCovarExtraction(i,data,sims,wave,groupName,varName[2])
# handle missing data [note that distances may get larger due to this!]:
x[is.na(x)] <- 0
# make "same covariate" matrix:
samev <- outer(v,v,'==')*1
diag(samev) <- NA
# calculate distances (replace "unreachable" by "in n steps"):
dista <- geodist(x,inf.replace=length(v))$gdist # distance matrix
in.dist <- mean(dista[samev==1],na.rm=TRUE) # within group mean
out.dist <- mean(dista[samev==0],na.rm=TRUE) # between groups mean
# calculate number of in-group clusters:
xs <- x*samev # reduces network to those of same v group
diag(xs) <- 0 # was NA due to samev settings above
xs2 <- xs%*%xs # calculates two-paths inside same v groups
diag(xs2) <- 0 # omits reciprocal ties (returning two-paths)
in.clusters <- sum(xs*xs2,na.rm=TRUE) # counts closed two-paths
# for Moran also impute missing covariate information by covariate mean:
v[is.na(v)] <- mean(v,na.rm=TRUE)
moran <- nacf(x,v,lag.max=1,typ="moran")[2]
# combine everything in vector of three outcomes:
res <- c(moran,in.clusters,out.dist,in.dist)
names(res) <- c("Moran's I","in.clusters","mean.out.dist","mean.in.dist")
return(res)
}
Moran123 = function(i, data, sims, wave, groupName, varName, levls=1){
#unloadNamespace("igraph") # to avoid package clashes
require(sna)
require(network)
x <- as.sociomatrix(networkExtraction(i, data, sims, wave, groupName, varName[1]))
z <- behaviorExtraction(i,data,sims,wave,groupName,varName[2])
# handle missing data [not checked if this makes sense]:
x[is.na(x)] <- 0
z[is.na(z)] <- mean(z,na.rm=TRUE)
res <- nacf(x,z,lag.max=3,typ="moran")[2:4]
names(res) <- c("d=1","d=2","d=3")
return(res)
}
MoranGeary = function(i, data, sims, wave, groupName, varName, levls=1:2){
#unloadNamespace("igraph") # to avoid package clashes
require(sna)
require(network)
x <- as.sociomatrix(networkExtraction(i, data, sims, wave, groupName, varName[1]))
z <- behaviorExtraction(i,data,sims,wave,groupName,varName[2])
n <- length(z)
z.ave <- mean(z,na.rm=TRUE)
numerator <- n*sum(x*outer(z-z.ave,z-z.ave),na.rm=TRUE)
denominator <- sum(x,na.rm=TRUE)*sum((z-z.ave)^2,na.rm=TRUE)
res <- numerator/denominator
numerator <- (n-1)*sum(x*(outer(z,z,FUN='-')^2),na.rm=TRUE)
denominator <- 2*sum(x,na.rm=TRUE)*sum((z-z.ave)^2,na.rm=TRUE)
res[2] <- numerator/denominator
names(res) <- c("Moran","Geary")
return(res)
}
Comov_FemHom_RM03_gofMoran = sienaGOF(Comov_FemHom_RM03,Moran123,verbose=TRUE,join=TRUE,varName=c("mynet_VWXYNA" , "Fem_Homophily"))
Comov_FemHom_RM03_gofMoranGeary = sienaGOF(Comov_FemHom_RM03,MoranGeary,verbose=TRUE,join=TRUE,varName=c("mynet_VWXYNA" , "Fem_Homophily"))
summary(Comov_FemHom_RM03)
## Estimates, standard errors and convergence t-ratios
##
## Estimate Standard Convergence
## Error t-ratio
## Network Dynamics
## 1. rate constant mynet_VWXYNA rate (period 1) 5.5053 ( 1.5197 ) 0.0218
## 2. rate constant mynet_VWXYNA rate (period 2) 3.7978 ( 0.8390 ) -0.0413
## 3. rate constant mynet_VWXYNA rate (period 3) 3.2565 ( 1.0067 ) -0.1302
## 4. eval outdegree (density) -2.0573 ( 0.2614 ) 0.0417
## 5. eval reciprocity 1.2240 ( 0.2215 ) 0.0289
## 6. eval Fem_Homophily similarity 2.3521 ( 0.4757 ) 0.0502
##
## Behavior Dynamics
## 7. rate rate Fem_Homophily (period 1) 0.1000 ( NA ) 0.0000
## 8. rate rate Fem_Homophily (period 2) 0.1000 ( NA ) 0.0000
## 9. rate rate Fem_Homophily (period 3) 0.1000 ( NA ) 0.0000
##
## Overall maximum convergence ratio: 0.1572
##
##
## Total of 1882 iteration steps.
##
## Covariance matrix of estimates (correlations below diagonal)
##
## 2.310 -0.003 0.086 0.083 -0.020 -0.144 NA NA NA
## -0.003 0.704 0.034 0.039 -0.018 -0.094 NA NA NA
## 0.056 0.040 1.013 0.078 -0.029 -0.120 NA NA NA
## 0.210 0.177 0.296 0.068 -0.018 -0.107 NA NA NA
## -0.061 -0.094 -0.132 -0.310 0.049 -0.004 NA NA NA
## -0.200 -0.237 -0.251 -0.863 -0.037 0.226 NA NA NA
## NA NA NA NA NA NA NA NA NA
## NA NA NA NA NA NA NA NA NA
## NA NA NA NA NA NA NA NA NA
##
## Derivative matrix of expected statistics X by parameters:
##
## 3.701 0.000 0.000 0.189 0.255 0.149 0.000 0.000 0.000
## 0.000 6.550 0.000 1.575 1.883 1.023 0.000 0.000 0.000
## 0.000 0.000 4.425 -0.095 -0.020 0.334 0.000 0.000 0.000
## 6.075 4.696 3.374 96.794 73.456 44.747 0.000 0.000 0.000
## -0.091 1.137 2.809 39.694 58.930 19.204 0.000 0.000 0.000
## 3.927 3.938 1.922 46.106 36.258 24.962 0.000 0.000 0.000
## 3.251 0.000 0.000 9.384 7.572 4.311 0.000 0.000 0.000
## 0.000 -7.265 0.000 -4.747 -3.979 -1.748 0.000 0.000 0.000
## 0.000 0.000 -3.109 -2.197 -2.067 -0.089 0.000 0.000 0.000
##
## Covariance matrix of X (correlations below diagonal):
##
## 32.108 -0.929 1.309 8.925 1.378 3.469 0.000 0.000 0.000
## -0.031 28.339 -0.094 2.473 0.014 1.461 0.000 0.000 0.000
## 0.052 -0.004 19.623 6.488 2.887 2.946 0.000 0.000 0.000
## 0.170 0.050 0.158 86.111 65.138 39.123 0.000 0.000 0.000
## 0.026 0.000 0.069 0.743 89.303 30.687 0.000 0.000 0.000
## 0.136 0.061 0.147 0.934 0.719 20.390 0.000 0.000 0.000
## NaN NaN NaN NaN NaN NaN 0.000 0.000 0.000
## NaN NaN NaN NaN NaN NaN NaN 0.000 0.000
## NaN NaN NaN NaN NaN NaN NaN NaN 0.000
plot(Comov_FemHom_RM03_gofod )
## Note: some statistics are not plotted because their variance is 0.
## This holds for the statistic: 10.
plot(Comov_FemHom_RM03_gofTc , center=TRUE, scale=TRUE)
plot(Comov_FemHom_RM03_gofMoran)
plot(Comov_FemHom_RM03_gofMoranGeary)
9.1.1 Results: Model_04
result : The stochastic simulation approximation reusult is acceptable :
+ all coeffs of t-ratios for convergence are < 0.1 in absolute value + except rate Fem_Homophily (period 1,2,3) because "fem" is fixed over timespan. + Overall maximum convergence is < 0.25
Goodreau, Steven M., James A. Kitts, and Martina Morris. 2009. “Birds of a Feather, or Friend of a Friend? Using Exponential Random Graph Models to Investigate Adolescent Social Networks*.” Demography 46 (1): 103–25. https://doi.org/10.1353/dem.0.0045.
Steglich, Christian, Tom AB Snijders, and Patrick West. 2006. “Applying Siena.” Methodology 2 (1): 48–56. https://doi.org/10.1027/1614-2241.2.1.48.
May 31, 2019