Topic in Econometrics (2018)

12/03/2018

Binomial Random Graph

Recall the random graph model (Erdos and Renyi model) \[p(G)=\frac{N_e}{N} \, \mbox{ where } N = \left(\begin{array}{c} N_{v}\\ 2 \end{array}\right),\] \(N\) is the total number of distinct node pairs.
\(p\) measures the proportion of observed ties \(N_e\) to the maximum number of possible ties \(N_v (N_v -1)\). So \(p \propto N_e / N_v^2\).

Binomial Random Graph

This is a Bernoulli graph with \(n\) nodes and probability of an edge \(p\). When \(N_v\) gets larger, the proportion of vertices with degree \(d\) in \(G\) would follow a Poisson distribution with mean \(p(N_v - 1)\) (Poisson approximation to a binomial distribution).

Binomial Random Graph

gen.berG = function(n,p){
    id = seq(1,n,by=1) # Create vertex set
    E = rbinom(choose(n,2),1,p) # Create edge set
    A = matrix(0, nrow = n, ncol = n) # Adjacency matrix
    # Distribute the edges for all pairs
    count = 1
    for(j in 1:(n-1)){
        for(k in (j+1):n){
            A[j,k] = E[count]
            A[k,j] = A[j,k]
            count = count + 1}
    }
    G = list(id=id,E=E,A=A); return(G)
}

Binomial Random Graph

par(mfrow = c(1,2)); g.er = erdos.renyi.game(80,0.05); g=gen.berG(80,0.05);
g.er = asNetwork(g.er); g = as.network(g$A, directed=FALSE);
plot(g.er); plot(g)

detach("package:igraph", unload=TRUE)

ERGM Null Model

The previous ERGM model \[ p(\mathbf{Y} |\eta)=\frac{1}{\kappa}\exp\left( \eta N_e \right)\] where \(\sum_{k \in \mathcal{C}} \eta_{k}\phi_k (\mathbf{Y})=\sum_{i,j} \eta_{ij} y_{ij} = \eta \sum_{i,j} y_{ij} = \eta N_e\). In particular \[p(y_{ij}=1 |\eta)=\frac{\exp( \eta y_{ij})}{exp(\eta \times 0)+\exp( \eta y_{ij})}=\frac{\exp( \eta)}{1+\exp( \eta)}\] which is the logistic link function.
As GLM, when \(Y\) is a binary variable, the link function is the logistic function.

ERGM Null Model

So the odds-ratio is \[\frac{p(y_{ij}=1 |\eta)}{p(y_{ij}=0 |\eta)}=\exp( \eta).\] So if the Binomial random graph is generated by \(p=0.05\), then the odds-ratio \(\frac{0.05}{0.95}=0.0526\).

library(ergm); eg = ergm(g~edges); exp(coef(eg))

##     edges 
## 0.0446281

eger = ergm(g.er~edges); exp(coef(eger))

##      edges 
## 0.05368456

Network Data

Network data also store characteristic of nodes and edges.
Nodes (people): basic information on these peoples such as gender or income.
Edges (interactions): numerical values about the positive and negative strengths.
Subgroups: Many networks are made up of relatively densely connected subgroups.

Network Data

g2 = rbind(c(0,1,1,0), 
           c(0,0,1,1), 
           c(0,1,0,0), 
           c(0,0,1,0)); 
rownames(g2) = c("A","B","C","D"); 
colnames(g2) = rownames(g2); g2 = as.network(g2);
set.vertex.attribute(g2, "gender", c("F", "F", "M", "F"))
set.edge.attribute(g2,"Random.Val", runif(network.size(g2),0,1))
get.vertex.attribute(g2,"gender")

## [1] "F" "F" "M" "F"

get.edge.attribute(g2,"Random.Val")

## [1] 0.94292907 0.02547872 0.86776161 0.97411587 0.94292907 0.02547872

Network Data

vernames = get.vertex.attribute(g2,"gender")
subg2 = get.inducedSubgraph(g2, which(vernames== "F"))
level = get.edge.attribute(g2, "Random.Val") 
par(mfrow = c(1,2)); plot(subg2);
plot(g2, vertex.cex=degree(g2), label=vernames, edge.lwd=15*level)

Markov Dependence and ERGM

The previous model \[ p(\mathbf{Y} |\eta)=\frac{1}{\kappa}\exp\left( \eta N_e \right)\] where \(\sum_{k \in \mathcal{C}} \eta_{k}\phi_k (\mathbf{Y})=\sum_{i,j} \eta_{ij} y_{ij} = \eta \sum_{i,j} y_{ij} = \eta N_e\).
\(N_e\): counts of the number of two nodes ties in \(G\).
Configuration \(\mathcal{C}\) in \(\sum_{k \in \mathcal{C}} \eta_{k}\phi_k (\mathbf{Y})\) can be broken into different types of nodes connections by Markovian condition of dependence.
Two possible edges are dependent whenever they share a vertex, conditional on all other possible edges.

Markov Dependence and ERGM

Links between nodes rarely appear at random: agents have specific reasons to connect with one agent rather than another. For example, Homophily - friends of my friends are my friends.
The odds of a link depend on the neighborhood of the node and not on the entire rest of the graph.
A regression model for \(y_{ij}\) is specified by a form: \[\Pr(y_{ij}=1|Y_{(-ij)}=y_{(-ij)} )\] where \(Y_{(-ij)}\) and \(Y_{ij}\) are dependent conditional on that \(y_{(-ij)}\) is on the neighborhood of \(y_{ij}\). This is an auto-regressive model.

Markov Dependence and ERGM

Edges represent mutual ties (two nodes). An incident of two edges constructs a triangle tie (three nodes).
Two edge indicators \(\{i,j\}\) and \(\{i',k\}\) are conditionally dependent if \(\{i,j\} \cap \{i',k\} \neq \textrm{Ø}\). An incident of \(k\)-edges constructs \(k\)-level star structure

Markov Dependence and ERGM

Triangle structure induces dependent clusters: two edge indicators \(\{i,j\}\) and \(\{l,k\}\) are conditionally dependent if \(\{i,j\}, \{l,k\} \in \mbox{cluster}\).

Markov Dependence and ERGM

Consider adding the triangular configuration \(\mathcal{C}_2\) to the original configuration \(\mathcal{C}_1\): \[p(\mathbf{Y} |\eta)=\frac{1}{\kappa}\exp\left( \eta_{1}\phi_1 (\mathbf{Y}) + \eta_{2}\phi_2 (\mathbf{Y}) \right)\] \[\phi_k(\mathbf{Y}) = \prod_{\{i,j\}\in \mathcal{C}_k} y_{ij} = \begin{cases} 1, & \{y_{ij}\}_{\{i,j\}\in\mathcal{C}_k } \mbox{ conditional on the rest, }\\ 0 & \mbox{otherwise}. \end{cases}\] In particular, \(\phi_1(\cdot)\) denotes the number of mutual nodes, namely \(N_e\), \(\phi_2(\cdot)\) denotes the number of triangle nodes.

Markov Dependence and ERGM

The odds-ratio now conditional depends on: \[\log \frac{p(\mathbf{Y}=y_{ij}=1| y_{(-ij)}, \eta)}{p(\mathbf{Y}=y_{ij}=1|y_{(-ij)} , \eta)} = \eta_{1}\phi_1 (y_{ij}, y_{(-ij)} ) + \eta_{2}\phi_2 (y_{ij}, y_{(-ij)})\] is the difference in counts of two structural types, tie and triangle.
The parameter \(\eta_1\) and \(\eta_2\) measure the relative importance of the ties and triangles shape.

Florentine Data

This is a statnet package build-in data set consisting of weddings among leading Florentine families.

data(florentine); plot(flomarriage,displaylabels=T,boxed.labels=F)

Florentine Data

flo1 = ergm(flomarriage~edges)
or = exp(coef(flo1)); or # odds-ratio

## edges 
##   0.2

p = or/(1+or); p

##     edges 
## 0.1666667

plogis(coef(flo1))

##     edges 
## 0.1666667

Florentine Data

flo2 = ergm(flomarriage~edges + kstar(2) + triangle); summary(flo2)

## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula:   flomarriage ~ edges + kstar(2) + triangle
## 
## Iterations:  2 out of 20 
## 
## Monte Carlo MLE Results:
##          Estimate Std. Error MCMC % p-value  
## edges    -1.54537    0.89609      0  0.0872 .
## kstar2   -0.03479    0.21998      0  0.8746  
## triangle  0.23590    0.78830      0  0.7653  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 166.4  on 120  degrees of freedom
##  Residual Deviance: 108.1  on 117  degrees of freedom
##  
## AIC: 114.1    BIC: 122.4    (Smaller is better.)

Florentine Data

plot(flo2$sample)

Simulation

Given a specific model we can simulate potential outcomes under that model.
Estimation: to match observed statistics and expected statistics and to check that we have “the solution”
Goodness of fit (GOF):to check whether the model can replicated features of the data that we not explicitly modelled
Standard goodness of fit procedures are not valid for ERGMs – no F-tests or Chi-square tests available.

Florentine Data

flo2.sim=simulate(flo2, n=4); par(mfrow = c(1,4));
for(i in 1:4){plot(flo2.sim[[i]])}

Florentine Data

plot(gof(flo2))

ERGM with Node Attributes

An edge joining two vertices depends not only on edges between other vertex pairs, but also on attributes of the vertices themselves. For example: social ties (\(\mathbf{Y}\)) and social factors (\(\mathbf{X}\))
Homophily: the tendency for entities to associate with others like them, such as age, gender, social class. Homophily implies that "friends of my friends are my friends".
AR model for \(y_{ij}\) becomes \[\Pr(y_{ij}=1|\mathbf{Y}_{(-ij)}=y_{(-ij)}, \mathbf{X}_i=x_i, \mathbf{X}_j=x_j )\] where \(\mathbf{X}_i\) contains measurements of attributes on \(i\)-th vertex.

ERGM with Node Attributes

ERGM becomes \[p(\mathbf{Y} |\eta, \mathbf{X})=\frac{1}{\kappa}\exp\left( \eta_{1}\phi_1 (\mathbf{Y}) + \eta_{2}\phi_2 (\mathbf{Y}) + \eta_3 \phi_3(\mathbf{Y}| \mathbf{X}) \right).\] One choice for \(\phi_3(\mathbf{Y} | \mathbf{X})\) is \(\sum_{i,j}y_{ij}h(x_i,x_j)\) where \(h(x_i,x_j)=h(x_j,x_i)\). For example \(h(x_i,x_j)=x_i + x_j\) is called the main effect. \(h(x_i,x_j)=\mathbf{1}\{x_i = x_j \}\) is called the match effect or homophily.

ERGM with Node Attributes

The edge statistics consists of \(y_{ij}\): a 1st order information.
The \(k\)-stars statistics: \(y_{ij}y_{jl}\dots y_{qn} y_{nk}\), \(k\) cross products. For example, \(2\)-stars, \(y_{ij}y_{jk}\) is the interaction product.
The triangle statistics: \(y_{ij}y_{jk}y_{ki}\), a circuit.
The main effect: \(y_{ij}(x_i + x_j)=(x_i + x_j)\mathbf{1}\{y_{ij}=1\}\), an interaction between \(y_{ij}\) and \((x_i,x_j)\).
Homophily: \(y_{ij} \mathbf{1}\{x_i = x_j\}=\mathbf{1}\{y_{ij}=1,\, x_i=x_j \}\).
Many other statistics used in ERGM. Check help("ergm-terms").

Florentine Data

list.vertex.attributes(flomarriage)

## [1] "na"           "priorates"    "totalties"    "vertex.names"
## [5] "wealth"

Florentine Data

flo3 = ergm(flomarriage~edges+nodemain("wealth")); summary(flo3)

## Evaluating log-likelihood at the estimate.

## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula:   flomarriage ~ edges + nodemain("wealth")
## 
## Iterations:  4 out of 20 
## 
## Monte Carlo MLE Results:
##                 Estimate Std. Error MCMC % p-value    
## edges          -2.594929   0.536056      0  <1e-04 ***
## nodecov.wealth  0.010546   0.004674      0  0.0259 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 166.4  on 120  degrees of freedom
##  Residual Deviance: 103.1  on 118  degrees of freedom
##  
## AIC: 107.1    BIC: 112.7    (Smaller is better.)

Florentine Data

flo3.sim=simulate(flo3, n=4); par(mfrow = c(1,4));
for(i in 1:4){plot(flo3.sim[[i]])}

Florentine Data

plot(gof(flo3))

Network Dynamics

Social networks, by their very nature, are dynamic.
Network ties are formed, maintained, and sometimes dissolved over time.
We assume that there is an underlying process of social selection that determines how new ties are formed or dissolved.
The changes are driven by the similarity (and dissimilarity) among network members on some abstract node characteristic (chr) such as a behavior, an attitude or opinion.
Consider the relevenece of a given characteristic on the social dynamics of a population of interest through the underyling social netowrk.

Network Dynamics

N=25; gs = erdos.renyi.game(N,0.05); chr.val = runif(N,0,1);
chr = cut(chr.val,breaks=2,labels = FALSE); 
V(gs)$chr = chr; V(gs)$chr.val=chr.val;

Network Dynamics

selection.fr = function(g,v){
  v.adj = neighbors(g,v) 
  if(length(v.adj)==0){return(g)}
  ch.diff = max(V(g)[v.adj]$chr.val - V(g)[v]$chr.val) 
  sel.fr = sample(v.adj, 1, ch.diff) 
  g[v,sel.fr] = FALSE # dissolved
  g}

Network Dynamics

select.sim = function(g,upd){
  g.new = lapply(1:(upd+1), function(i) i) 
  g.new[[1]] = g
  for (i in 1:upd) {
    g.tem = g.new[[i]]
    node = sample(1:vcount(g),1) 
    g.upd = selection.fr(g.tem, node) 
    g.new[[i+1]] = g.upd
    }
  g.new}

Network Dynamics

set.seed(2018); i=21; gnew = select.sim(gs, i)

Network Flows

The edge attributes characterize information about network flows.
For example, transportation networks support flows of commodities and people. Communication networks allow for the flow of data, and networks of trade relations reflect the flow of capital.
Flows have direction: \(G\) is a directed graph.
Origin-destination matrix or traffic matrix: \(\mathbf{Z} = \{ z_{ij} \}\), \(z_{ij}\) is the total volume of flows from the origin vertex \(i\) to the destination vertex \(j\).

Network Flows and Gravity

(Demographic) Gravity model: \[\mathbb{E}[Z_{ij}] = c \pi_{O,i} \pi_{D,j} \mbox{dist}_{ij}^{-2}\] where \(\pi_{O,i}\) and \(\pi_{D,j}\) measure the population sizes for the origin region \(i\) and destination \(j\), \(d_{ij}\) is a measure of distance. This formulation is completely analogous to Newton's gravity law.
International trade: \(Z_{ij}\) is the trade between country \(i\) and country \(j\). GDP of the origin/destination region serves as a proxy for economic indicator. It is the analogy to population size in the demographic model.

Network Flows and Gravity

A parametric form: \[\mathbb{E}[Z_{ij}] = c (\pi_{O,i})^{\beta_1} (\pi_{D,j})^{\beta_2} (\mbox{dist}_{ij})^{\beta_3}\] where implies a linear log relationship between log of trade and log of GDP of origin \(O\) and destination \(D\): \[ \log (\mbox{Trade}_{ij}) = \beta_{1} \log O_i + \beta_{2} \log D_j + \beta_3 \times \mbox{dist}_{ij}.\] Thus we can use GLM and Poisson link function to model the trades.
Data (2006) from A Practical Guide to Trade Policy Analysis (WTO and UN)

Gravity Model and Trade

V(g.trade)

## + 69/69 vertices, named:
##  [1] ARG AUS AUT BEL BGR BOL BRA CAN CHE CHL CHN CMR COL CRI CYP DEU DNK
## [18] ECU EGY ESP FIN FRA GBR GRC HKG HUN IDN IND IRL IRN ISL ISR ITA JOR
## [35] JPN KEN KOR KWT LKA MAC MAR MEX MLT MMR MUS MWI MYS NER NGA NLD NOR
## [52] NPL PAN PHL POL PRT QAT ROM SEN SGP SWE THA TTO TUN TUR TZA URY USA
## [69] ZAF

ecount(g.trade)

## [1] 4623

E(g.trade)$weight[1:10]

##  [1] 32313.36933   107.80198    25.19563   189.26305    17.03026
##  [6]   391.91907  6197.99762   338.80366   278.34775  2455.56245

Gravity Model and Trade

library(car); scatterplotMatrix( ~ ln.trade+ln.O+ln.D+ln.dist, data=trade2006)

Gravity Model and Trade

trade.glm=glm(trade ~ ln.O + ln.D + ln.dist, data=trade2006, family="poisson")
summary(trade.glm)

## 
## Call:
## glm(formula = trade ~ ln.O + ln.D + ln.dist, family = "poisson", 
##     data = trade2006)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1045.94    -10.00     -1.87      2.09   1552.59  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.140e+01  3.203e-03   -3558   <2e-16 ***
## ln.O         2.490e+00  4.671e-04    5332   <2e-16 ***
## ln.D         2.510e+00  4.740e-04    5294   <2e-16 ***
## ln.dist     -2.394e+00  2.996e-04   -7993   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 211359412  on 4622  degrees of freedom
## Residual deviance:  34157842  on 4619  degrees of freedom
## AIC: Inf
## 
## Number of Fisher Scoring iterations: 6

Gravity Model and Trade

trade.glm.c=glm(trade ~ O + D + DIST, data=trade2006, family="poisson")
summary(trade.glm.c)

## 
## Call:
## glm(formula = trade ~ O + D + DIST, family = "poisson", data = trade2006)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1073.37    -60.46    -14.88      1.77   1853.38  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  9.739e+00  3.662e-04   26594   <2e-16 ***
## O            7.230e-07  1.350e-10    5355   <2e-16 ***
## D            6.254e-07  1.225e-10    5106   <2e-16 ***
## DIST        -5.987e-04  1.222e-07   -4897   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 211359412  on 4622  degrees of freedom
## Residual deviance:  59276368  on 4619  degrees of freedom
## AIC: Inf
## 
## Number of Fisher Scoring iterations: 7