12/03/2018

Binomial Random Graph

  • Recall the random graph model (Erdos and Renyi model) \[p(G)=\frac{N_e}{N} \, \mbox{ where } N = \left(\begin{array}{c} N_{v}\\ 2 \end{array}\right),\] \(N\) is the total number of distinct node pairs.

  • \(p\) measures the proportion of observed ties \(N_e\) to the maximum number of possible ties \(N_v (N_v -1)\). So \(p \propto N_e / N_v^2\).

Binomial Random Graph

  • This is a Bernoulli graph with \(n\) nodes and probability of an edge \(p\). When \(N_v\) gets larger, the proportion of vertices with degree \(d\) in \(G\) would follow a Poisson distribution with mean \(p(N_v - 1)\) (Poisson approximation to a binomial distribution).

Binomial Random Graph

gen.berG = function(n,p){
    id = seq(1,n,by=1) # Create vertex set
    E = rbinom(choose(n,2),1,p) # Create edge set
    A = matrix(0, nrow = n, ncol = n) # Adjacency matrix
    # Distribute the edges for all pairs
    count = 1
    for(j in 1:(n-1)){
        for(k in (j+1):n){
            A[j,k] = E[count]
            A[k,j] = A[j,k]
            count = count + 1}
    }
    G = list(id=id,E=E,A=A); return(G)
}

Binomial Random Graph

par(mfrow = c(1,2)); g.er = erdos.renyi.game(80,0.05); g=gen.berG(80,0.05);
g.er = asNetwork(g.er); g = as.network(g$A, directed=FALSE);
plot(g.er); plot(g)

detach("package:igraph", unload=TRUE)

ERGM Null Model

  • The previous ERGM model \[ p(\mathbf{Y} |\eta)=\frac{1}{\kappa}\exp\left( \eta N_e \right)\] where \(\sum_{k \in \mathcal{C}} \eta_{k}\phi_k (\mathbf{Y})=\sum_{i,j} \eta_{ij} y_{ij} = \eta \sum_{i,j} y_{ij} = \eta N_e\). In particular \[p(y_{ij}=1 |\eta)=\frac{\exp( \eta y_{ij})}{exp(\eta \times 0)+\exp( \eta y_{ij})}=\frac{\exp( \eta)}{1+\exp( \eta)}\] which is the logistic link function.

  • As GLM, when \(Y\) is a binary variable, the link function is the logistic function.

ERGM Null Model

  • So the odds-ratio is \[\frac{p(y_{ij}=1 |\eta)}{p(y_{ij}=0 |\eta)}=\exp( \eta).\] So if the Binomial random graph is generated by \(p=0.05\), then the odds-ratio \(\frac{0.05}{0.95}=0.0526\).
library(ergm); eg = ergm(g~edges); exp(coef(eg))
##     edges 
## 0.0446281
eger = ergm(g.er~edges); exp(coef(eger))
##      edges 
## 0.05368456

Network Data

  • Network data also store characteristic of nodes and edges.

  • Nodes (people): basic information on these peoples such as gender or income.

  • Edges (interactions): numerical values about the positive and negative strengths.

  • Subgroups: Many networks are made up of relatively densely connected subgroups.

Network Data

g2 = rbind(c(0,1,1,0), 
           c(0,0,1,1), 
           c(0,1,0,0), 
           c(0,0,1,0)); 
rownames(g2) = c("A","B","C","D"); 
colnames(g2) = rownames(g2); g2 = as.network(g2);
set.vertex.attribute(g2, "gender", c("F", "F", "M", "F"))
set.edge.attribute(g2,"Random.Val", runif(network.size(g2),0,1))
get.vertex.attribute(g2,"gender")
## [1] "F" "F" "M" "F"
get.edge.attribute(g2,"Random.Val")
## [1] 0.94292907 0.02547872 0.86776161 0.97411587 0.94292907 0.02547872

Network Data

vernames = get.vertex.attribute(g2,"gender")
subg2 = get.inducedSubgraph(g2, which(vernames== "F"))
level = get.edge.attribute(g2, "Random.Val") 
par(mfrow = c(1,2)); plot(subg2);
plot(g2, vertex.cex=degree(g2), label=vernames, edge.lwd=15*level)

Markov Dependence and ERGM

  • The previous model \[ p(\mathbf{Y} |\eta)=\frac{1}{\kappa}\exp\left( \eta N_e \right)\] where \(\sum_{k \in \mathcal{C}} \eta_{k}\phi_k (\mathbf{Y})=\sum_{i,j} \eta_{ij} y_{ij} = \eta \sum_{i,j} y_{ij} = \eta N_e\).

  • \(N_e\): counts of the number of two nodes ties in \(G\).

  • Configuration \(\mathcal{C}\) in \(\sum_{k \in \mathcal{C}} \eta_{k}\phi_k (\mathbf{Y})\) can be broken into different types of nodes connections by Markovian condition of dependence.

  • Two possible edges are dependent whenever they share a vertex, conditional on all other possible edges.

Markov Dependence and ERGM

  • Links between nodes rarely appear at random: agents have specific reasons to connect with one agent rather than another. For example, Homophily - friends of my friends are my friends.

  • The odds of a link depend on the neighborhood of the node and not on the entire rest of the graph.

  • A regression model for \(y_{ij}\) is specified by a form: \[\Pr(y_{ij}=1|Y_{(-ij)}=y_{(-ij)} )\] where \(Y_{(-ij)}\) and \(Y_{ij}\) are dependent conditional on that \(y_{(-ij)}\) is on the neighborhood of \(y_{ij}\). This is an auto-regressive model.

Markov Dependence and ERGM

  • Edges represent mutual ties (two nodes). An incident of two edges constructs a triangle tie (three nodes).

  • Two edge indicators \(\{i,j\}\) and \(\{i',k\}\) are conditionally dependent if \(\{i,j\} \cap \{i',k\} \neq \textrm{Ø}\). An incident of \(k\)-edges constructs \(k\)-level star structure

Markov Dependence and ERGM

  • Triangle structure induces dependent clusters: two edge indicators \(\{i,j\}\) and \(\{l,k\}\) are conditionally dependent if \(\{i,j\}, \{l,k\} \in \mbox{cluster}\).

Markov Dependence and ERGM

  • Consider adding the triangular configuration \(\mathcal{C}_2\) to the original configuration \(\mathcal{C}_1\): \[p(\mathbf{Y} |\eta)=\frac{1}{\kappa}\exp\left( \eta_{1}\phi_1 (\mathbf{Y}) + \eta_{2}\phi_2 (\mathbf{Y}) \right)\] \[\phi_k(\mathbf{Y}) = \prod_{\{i,j\}\in \mathcal{C}_k} y_{ij} = \begin{cases} 1, & \{y_{ij}\}_{\{i,j\}\in\mathcal{C}_k } \mbox{ conditional on the rest, }\\ 0 & \mbox{otherwise}. \end{cases}\] In particular, \(\phi_1(\cdot)\) denotes the number of mutual nodes, namely \(N_e\), \(\phi_2(\cdot)\) denotes the number of triangle nodes.

Markov Dependence and ERGM

  • The odds-ratio now conditional depends on: \[\log \frac{p(\mathbf{Y}=y_{ij}=1| y_{(-ij)}, \eta)}{p(\mathbf{Y}=y_{ij}=1|y_{(-ij)} , \eta)} = \eta_{1}\phi_1 (y_{ij}, y_{(-ij)} ) + \eta_{2}\phi_2 (y_{ij}, y_{(-ij)})\] is the difference in counts of two structural types, tie and triangle.

  • The parameter \(\eta_1\) and \(\eta_2\) measure the relative importance of the ties and triangles shape.

Florentine Data

  • This is a statnet package build-in data set consisting of weddings among leading Florentine families.
data(florentine); plot(flomarriage,displaylabels=T,boxed.labels=F)

Florentine Data

flo1 = ergm(flomarriage~edges)
or = exp(coef(flo1)); or # odds-ratio
## edges 
##   0.2
p = or/(1+or); p
##     edges 
## 0.1666667
plogis(coef(flo1))
##     edges 
## 0.1666667

Florentine Data

flo2 = ergm(flomarriage~edges + kstar(2) + triangle); summary(flo2)
## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula:   flomarriage ~ edges + kstar(2) + triangle
## 
## Iterations:  2 out of 20 
## 
## Monte Carlo MLE Results:
##          Estimate Std. Error MCMC % p-value  
## edges    -1.54537    0.89609      0  0.0872 .
## kstar2   -0.03479    0.21998      0  0.8746  
## triangle  0.23590    0.78830      0  0.7653  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 166.4  on 120  degrees of freedom
##  Residual Deviance: 108.1  on 117  degrees of freedom
##  
## AIC: 114.1    BIC: 122.4    (Smaller is better.)

Florentine Data

plot(flo2$sample)

Simulation

  • Given a specific model we can simulate potential outcomes under that model.

  • Estimation: to match observed statistics and expected statistics and to check that we have “the solution”

  • Goodness of fit (GOF):to check whether the model can replicated features of the data that we not explicitly modelled

  • Standard goodness of fit procedures are not valid for ERGMs – no F-tests or Chi-square tests available.

Florentine Data

flo2.sim=simulate(flo2, n=4); par(mfrow = c(1,4));
for(i in 1:4){plot(flo2.sim[[i]])}

Florentine Data

plot(gof(flo2))

ERGM with Node Attributes

  • An edge joining two vertices depends not only on edges between other vertex pairs, but also on attributes of the vertices themselves. For example: social ties (\(\mathbf{Y}\)) and social factors (\(\mathbf{X}\))

  • Homophily: the tendency for entities to associate with others like them, such as age, gender, social class. Homophily implies that "friends of my friends are my friends".

  • AR model for \(y_{ij}\) becomes \[\Pr(y_{ij}=1|\mathbf{Y}_{(-ij)}=y_{(-ij)}, \mathbf{X}_i=x_i, \mathbf{X}_j=x_j )\] where \(\mathbf{X}_i\) contains measurements of attributes on \(i\)-th vertex.

ERGM with Node Attributes

  • ERGM becomes \[p(\mathbf{Y} |\eta, \mathbf{X})=\frac{1}{\kappa}\exp\left( \eta_{1}\phi_1 (\mathbf{Y}) + \eta_{2}\phi_2 (\mathbf{Y}) + \eta_3 \phi_3(\mathbf{Y}| \mathbf{X}) \right).\] One choice for \(\phi_3(\mathbf{Y} | \mathbf{X})\) is \(\sum_{i,j}y_{ij}h(x_i,x_j)\) where \(h(x_i,x_j)=h(x_j,x_i)\). For example \(h(x_i,x_j)=x_i + x_j\) is called the main effect. \(h(x_i,x_j)=\mathbf{1}\{x_i = x_j \}\) is called the match effect or homophily.

ERGM with Node Attributes

  • The edge statistics consists of \(y_{ij}\): a 1st order information.

  • The \(k\)-stars statistics: \(y_{ij}y_{jl}\dots y_{qn} y_{nk}\), \(k\) cross products. For example, \(2\)-stars, \(y_{ij}y_{jk}\) is the interaction product.

  • The triangle statistics: \(y_{ij}y_{jk}y_{ki}\), a circuit.

  • The main effect: \(y_{ij}(x_i + x_j)=(x_i + x_j)\mathbf{1}\{y_{ij}=1\}\), an interaction between \(y_{ij}\) and \((x_i,x_j)\).

  • Homophily: \(y_{ij} \mathbf{1}\{x_i = x_j\}=\mathbf{1}\{y_{ij}=1,\, x_i=x_j \}\).

  • Many other statistics used in ERGM. Check help("ergm-terms").

Florentine Data

list.vertex.attributes(flomarriage)
## [1] "na"           "priorates"    "totalties"    "vertex.names"
## [5] "wealth"

Florentine Data

flo3 = ergm(flomarriage~edges+nodemain("wealth")); summary(flo3)
## Evaluating log-likelihood at the estimate.
## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula:   flomarriage ~ edges + nodemain("wealth")
## 
## Iterations:  4 out of 20 
## 
## Monte Carlo MLE Results:
##                 Estimate Std. Error MCMC % p-value    
## edges          -2.594929   0.536056      0  <1e-04 ***
## nodecov.wealth  0.010546   0.004674      0  0.0259 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 166.4  on 120  degrees of freedom
##  Residual Deviance: 103.1  on 118  degrees of freedom
##  
## AIC: 107.1    BIC: 112.7    (Smaller is better.)

Florentine Data

flo3.sim=simulate(flo3, n=4); par(mfrow = c(1,4));
for(i in 1:4){plot(flo3.sim[[i]])}

Florentine Data

plot(gof(flo3))

Network Dynamics

  • Social networks, by their very nature, are dynamic.

  • Network ties are formed, maintained, and sometimes dissolved over time.

  • We assume that there is an underlying process of social selection that determines how new ties are formed or dissolved.

  • The changes are driven by the similarity (and dissimilarity) among network members on some abstract node characteristic (chr) such as a behavior, an attitude or opinion.

  • Consider the relevenece of a given characteristic on the social dynamics of a population of interest through the underyling social netowrk.

Network Dynamics

N=25; gs = erdos.renyi.game(N,0.05); chr.val = runif(N,0,1);
chr = cut(chr.val,breaks=2,labels = FALSE); 
V(gs)$chr = chr; V(gs)$chr.val=chr.val;

Network Dynamics

selection.fr = function(g,v){
  v.adj = neighbors(g,v) 
  if(length(v.adj)==0){return(g)}
  ch.diff = max(V(g)[v.adj]$chr.val - V(g)[v]$chr.val) 
  sel.fr = sample(v.adj, 1, ch.diff) 
  g[v,sel.fr] = FALSE # dissolved
  g}

Network Dynamics

select.sim = function(g,upd){
  g.new = lapply(1:(upd+1), function(i) i) 
  g.new[[1]] = g
  for (i in 1:upd) {
    g.tem = g.new[[i]]
    node = sample(1:vcount(g),1) 
    g.upd = selection.fr(g.tem, node) 
    g.new[[i+1]] = g.upd
    }
  g.new}

Network Dynamics

set.seed(2018); i=21; gnew = select.sim(gs, i)

Network Flows

  • The edge attributes characterize information about network flows.

  • For example, transportation networks support flows of commodities and people. Communication networks allow for the flow of data, and networks of trade relations reflect the flow of capital.

  • Flows have direction: \(G\) is a directed graph.

  • Origin-destination matrix or traffic matrix: \(\mathbf{Z} = \{ z_{ij} \}\), \(z_{ij}\) is the total volume of flows from the origin vertex \(i\) to the destination vertex \(j\).

Network Flows and Gravity

  • (Demographic) Gravity model: \[\mathbb{E}[Z_{ij}] = c \pi_{O,i} \pi_{D,j} \mbox{dist}_{ij}^{-2}\] where \(\pi_{O,i}\) and \(\pi_{D,j}\) measure the population sizes for the origin region \(i\) and destination \(j\), \(d_{ij}\) is a measure of distance. This formulation is completely analogous to Newton's gravity law.

  • International trade: \(Z_{ij}\) is the trade between country \(i\) and country \(j\). GDP of the origin/destination region serves as a proxy for economic indicator. It is the analogy to population size in the demographic model.

Network Flows and Gravity

  • A parametric form: \[\mathbb{E}[Z_{ij}] = c (\pi_{O,i})^{\beta_1} (\pi_{D,j})^{\beta_2} (\mbox{dist}_{ij})^{\beta_3}\] where implies a linear log relationship between log of trade and log of GDP of origin \(O\) and destination \(D\): \[ \log (\mbox{Trade}_{ij}) = \beta_{1} \log O_i + \beta_{2} \log D_j + \beta_3 \times \mbox{dist}_{ij}.\] Thus we can use GLM and Poisson link function to model the trades.

  • Data (2006) from A Practical Guide to Trade Policy Analysis (WTO and UN)

Gravity Model and Trade

Gravity Model and Trade

V(g.trade)
## + 69/69 vertices, named:
##  [1] ARG AUS AUT BEL BGR BOL BRA CAN CHE CHL CHN CMR COL CRI CYP DEU DNK
## [18] ECU EGY ESP FIN FRA GBR GRC HKG HUN IDN IND IRL IRN ISL ISR ITA JOR
## [35] JPN KEN KOR KWT LKA MAC MAR MEX MLT MMR MUS MWI MYS NER NGA NLD NOR
## [52] NPL PAN PHL POL PRT QAT ROM SEN SGP SWE THA TTO TUN TUR TZA URY USA
## [69] ZAF
ecount(g.trade)
## [1] 4623
E(g.trade)$weight[1:10]
##  [1] 32313.36933   107.80198    25.19563   189.26305    17.03026
##  [6]   391.91907  6197.99762   338.80366   278.34775  2455.56245

Gravity Model and Trade

library(car); scatterplotMatrix( ~ ln.trade+ln.O+ln.D+ln.dist, data=trade2006)

Gravity Model and Trade

trade.glm=glm(trade ~ ln.O + ln.D + ln.dist, data=trade2006, family="poisson")
summary(trade.glm)
## 
## Call:
## glm(formula = trade ~ ln.O + ln.D + ln.dist, family = "poisson", 
##     data = trade2006)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1045.94    -10.00     -1.87      2.09   1552.59  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.140e+01  3.203e-03   -3558   <2e-16 ***
## ln.O         2.490e+00  4.671e-04    5332   <2e-16 ***
## ln.D         2.510e+00  4.740e-04    5294   <2e-16 ***
## ln.dist     -2.394e+00  2.996e-04   -7993   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 211359412  on 4622  degrees of freedom
## Residual deviance:  34157842  on 4619  degrees of freedom
## AIC: Inf
## 
## Number of Fisher Scoring iterations: 6

Gravity Model and Trade

trade.glm.c=glm(trade ~ O + D + DIST, data=trade2006, family="poisson")
summary(trade.glm.c)
## 
## Call:
## glm(formula = trade ~ O + D + DIST, family = "poisson", data = trade2006)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1073.37    -60.46    -14.88      1.77   1853.38  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  9.739e+00  3.662e-04   26594   <2e-16 ***
## O            7.230e-07  1.350e-10    5355   <2e-16 ***
## D            6.254e-07  1.225e-10    5106   <2e-16 ***
## DIST        -5.987e-04  1.222e-07   -4897   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 211359412  on 4622  degrees of freedom
## Residual deviance:  59276368  on 4619  degrees of freedom
## AIC: Inf
## 
## Number of Fisher Scoring iterations: 7

Gravity Model and Trade