Data from Network or Spatial systems are high-dimensional, complex, and stochastic.
Probabilistic structure provides a better characterization.
Parametric structure reduces the difficulty of processing the network and spatial data.
05/03/2018
To characterize network information, we need a two-dimensional data structure.
A network \(G\).
Two sets of information: \(G = (\mathcal{V}, \mathcal{E})\).
\(\mathcal{V}\): a set of vertices (nodes).
\(\mathcal{E}\): a set of edges. If \(u,v\in \mathcal{V}\) and they are connected, the pair \(\{u,v\} \in \mathcal{E}\).
library(igraph); g = graph.formula(1-2, 1-3)  # 3 nodes, 2 edges
plot(g)
V(g)
## + 3/3 vertices, named:
## [1] 1 2 3
vcount(g)
## [1] 3
E(g)
## + 2/2 edges (vertex names):
## [1] 1--2 1--3
ecount(g)
## [1] 2
A graph is called a directed graph if the connections between nodes have a direction.
If \(u,v\in \mathcal{V}\) and they are connected from \(u\) to \(v\), the ordered pair \((u,v) \in \mathcal{E}\) but \((v,u) \notin \mathcal{E}\): an order exists within the pair.
g = graph.formula(1-+2, 1++3)  # 3 nodes, directed edges (1++3 is a mutual pair)
plot(g)
The network/graph can be represented in matrix form.
\(N_v\): number of the nodes
\(\mathbf{Y}\): an \(N_v \times N_v\) adjacency matrix for a graph \(G\): \[ Y_{ij}=\begin{cases} 1, & \mbox{ if }\{i,j\}\in\mathcal{E}\\ 0 & \mbox{ otherwise} \end{cases}.\]
get.adjacency(g)
## 3 x 3 sparse Matrix of class "dgCMatrix"
##   1 2 3
## 1 . 1 1
## 2 . . .
## 3 1 . .
A node \(v\in\mathcal{V}\) is incident on an edge \(e\in\mathcal{E}\) if \(v\) is an endpoint of \(e\).
The degree of a node/vertex, \(\mbox{deg}_v\), is the number of edges incident on \(v\).
degree(g)
## 1 2 3
## 3 1 2
degree(g, mode="in")
## 1 2 3
## 1 1 1
degree(g, mode="out")
## 1 2 3
## 2 0 1
par(mfrow = c(1,3))
g = graph.lattice(c(3,3)); plot(g)
g = graph.lattice(c(3,3,3)); plot(g)
plot(degree(g))
par(mfrow = c(1,2)); igraph.options(vertex.label=NA, vertex.size=3)
plot(g)  # NULL layout gives the default
plot(g, layout=layout.circle)
The degrees of the \(3\times 3\times 3\) lattice are unchanged: the layout only changes how the graph is drawn, not its structure.
Two graphs are mathematically equivalent if they differ only in the labels of their vertices and edges, not in structure. (They are said to be isomorphic.)
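A minimal sketch using igraph's isomorphism test; the graphs g1 and g2 below are illustrative.
# Two graphs with different vertex labels but identical structure
g1 = graph.formula(1-2, 1-3)
g2 = graph.formula(a-b, a-c)
isomorphic(g1, g2)   # TRUE: they differ only in labels, not in structure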
Randomness is generated by considering a collection of graphs/networks that differ in their nodes (\(N_v\)), edges (\(N_e\)), and sizes.
The size of a graph is described by the number of nodes or by the degrees of those nodes; a collection of graphs induces a degree sequence, as sketched below.
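For example, the degree sequence of a single graph can be read off with degree(); a small sketch reusing the \(3\times 3\) lattice from above (the object name g.lat is illustrative).
g.lat = graph.lattice(c(3,3))
sort(degree(g.lat), decreasing = TRUE)   # the degree sequence of this graph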
A network graph model is a parametric family of distributions \[\mathcal{P} = \{p_\theta (G): G\in\mathbb{G}, \theta\in\Theta\}.\]
\(\mathbb{G}\): a collection of possible graphs.
\(p_\theta(G)\): the probability of \(G\in\mathbb{G}\).
\(\theta\): parameter of interest.
Random Graph: \(p\) is uniform over \(\mathbb{G}\).
par(mfrow = c(1,2))
g.er1 = erdos.renyi.game(100, 0.01); g.er2 = erdos.renyi.game(100, 0.05)  # N_v = 100
plot(g.er1); plot(g.er2)
mean(degree(g.er1))
## [1] 1.08
mean(degree(g.er2))
## [1] 4.32
par(mfrow = c(1,2)); hist(degree(g.er1),breaks=10); hist(degree(g.er2),breaks=10)
A graph with \(N_v\) nodes: each node has a corresponding degree \(\mbox{deg}_v\), and the sequence \(\{\mbox{deg}_1,\dots, \mbox{deg}_{N_v} \}\) characterizes the graph.
Two graphs receive the same probability under this model if they have the same number of nodes and the same degree sequence.
par(mfrow = c(1,2))
degs = c(1:20); g.degs = degree.sequence.game(degs)
plot(g.degs, layout=layout_as_tree)
hist(degree(g.degs), breaks=20)
nv = vcount(g.degs); ne = ecount(g.degs)
g.er3 = erdos.renyi.game(nv, ne, "gnm", directed = TRUE)
mean(degree(g.er3))
## [1] 10.5
plot(g.er3,layout=layout_as_tree)
g.er4 = erdos.renyi.game(nv,0.25,"gnp", directed = TRUE); mean(degree(g.er4))
## [1] 9.8
plot(g.er4,layout=layout_as_tree)
Recall the canonical exponential family (with \(h(y)=1\)):\[ p(y|\eta)=\exp(\eta^{T}\phi(y)-A(\eta))= \frac{ \exp(\eta^{T}\phi(y))}{\int \exp\left\lbrace \eta^{T}\phi(y)\right\rbrace dy}.\] where \(A(\eta)=\log (\int \exp\left\lbrace \eta^{T}\phi(y)\right\rbrace dy).\)
\(y\) could be a binary observation taking values \(0\) or \(1\).
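For example, with binary \(y\in\{0,1\}\) and \(\phi(y)=y\), the integral in the normalizer becomes a sum over the two outcomes, and the canonical exponential family reduces to the Bernoulli distribution: \[ p(y\mid\eta)=\frac{\exp(\eta y)}{1+\exp(\eta)}, \qquad A(\eta)=\log\left(1+e^{\eta}\right), \qquad \Pr(y=1)=\frac{e^{\eta}}{1+e^{\eta}}. \]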
All the information in \(G\) is captured by the adjacency matrix \(\mathbf{Y}\).
For a random graph, the presence of an edge between one pair of vertices is independent of the status of possible edges between any other pairs.
The configuration considers only the connections between pairs of vertices: \[\sum_{k \in \mathcal{C}} \eta_{k}\phi_k (\mathbf{Y}) = \sum_{i, j} \eta_{ij} y_{ij}.\] In addition, the coefficients \(\eta_{ij}\) are equal for any two edges, so \(\eta_{ij} = \eta\).
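A minimal sketch of what the edges-only model implies: each \(y_{ij}\) is then an independent Bernoulli variable with edge probability equal to the inverse logit of \(\eta\). The value of \(\eta\) below is hypothetical.
# Edges-only ERGM: Pr(Y_ij = 1) = exp(eta) / (1 + exp(eta))
eta = -3.0                   # hypothetical edges coefficient
exp(eta) / (1 + exp(eta))    # implied edge probability
plogis(eta)                  # the same value via the inverse logit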
library(statnet); Y = as.matrix(get.adjacency(g.er2));
image(Y, col=c("white","black"), xlab="column", ylab="row")
ergm.er = ergm(Y~edges); coef(ergm.er)
##     edges 
## -3.087247
summary.statistics(ergm.er)
## edges 
##   432
exp(-2.921)
## [1] 0.05387978
Recall the success count model. Now we consider \(Y_{i,t}\) where the count depends on time \(t\) and site \(i\).
The spatiotemporal structure \[\begin{align} Y_{i,t} &\sim \mbox{Poi}(\theta_{i,t}) \\ \log (\theta_{i,t}) &= \eta_{i,t} = \beta_{i}^T\mathbf{x}_t + \varepsilon_t = \beta_{0,i} + \sum_{k=1}^{3}\beta_k x^{k}_{t} + \varepsilon_t .\end{align}\]
\(\beta_{0,i}\): the spatial (site-specific) effect.
\(\{\beta_{1},\beta_{2},\beta_{3}\}\): do not vary over time or across sites.
\(\varepsilon_t\): the temporal effects.
Random effects may consist of two or more effects or parameters that are integrated together in some way.
\(\beta_{0,i}\sim \mathcal{N}(\mu,\sigma^2)\) has (hyper)parameters that express the fixed effect \(\mu\) and the magnitude of the variability among sites, \(\sigma^{2}\).
\(\varepsilon_t \sim \mathcal{N}(0,\sigma_{\varepsilon}^2)\) has a (hyper)parameter for the magnitude of the variability among years, \(\sigma_{\varepsilon}^2\).
Hierarchical models: a hierarchy of effects or a nested relationship among the parameters.
Hyperparameters: They are parameters for parameters, e.g. \(\sigma\) is the parameter of the parameter \(\beta_{0,i}\).
Hyperparameters exert their influence during estimation: through the scale \(\sigma\), the varying effects \(\beta_{0,i}\) are shrunk towards the common mean \(\mu\).
We simultaneously estimate both the intercept for each site (\(\beta_{0,i}\)) and the variation among sites \(\sigma\).
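A minimal sketch of the shrinkage idea for a single site, under a normal-normal approximation; all numbers below are hypothetical.
# Hypothetical normal-normal shrinkage for one site-level intercept
mu.hat  = 4.2    # estimated common mean
sigma2  = 0.25   # estimated between-site variance sigma^2
b.raw   = 5.0    # raw (unpooled) estimate for the site
s2      = 0.50   # sampling variance of that raw estimate
w       = sigma2 / (sigma2 + s2)   # shrinkage weight
w * b.raw + (1 - w) * mu.hat       # partially pooled estimate, pulled towards mu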
nyears = 36; nsite = 50; mu = 4.2; beta1 = 2;       # true parameter values
beta2 = 0.1; beta3 = -1.7; sd.site = 0.5; sd.year = 0.2;
year = 1:nyears; x = (year-round(nyears/2)) / (nyears / 2)   # standardized year covariate
site = 1:nsite; y = eta = array(NA, dim=c(nyears, nsite))
beta0 = rnorm(nsite, mu, sd.site)   # site-specific intercepts
eps = rnorm(nyears, 0, sd.year)     # year (temporal) effects
for (i in 1:nsite){
eta[,i] = beta0[i] + beta1 * x + beta2 * x^2 + beta3 * x^3 + eps
y.mean = exp(eta[,i])
y[,i] = rpois(nyears, y.mean)
}
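As an optional sanity check, the simulated counts can be plotted against the standardized year covariate, one line per site, using the x and y objects created above.
# One line per site: simulated counts over the standardized year covariate
matplot(x, y, type = "l", lty = 1, xlab = "standardized year", ylab = "count")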
d.spt = list(y=y, nsite=nsite, nyear=nyears, x=x)
cat("
model { # Priors
for (i in 1:nsite){
beta0[i] ~ dnorm(mu, tau.b0)
}
mu ~ dnorm(0, 0.01)
tau.b0 = 1 / (sd.b0*sd.b0)
sd.b0 ~ dunif(0, 2)
for (k in 1:3){
beta[k] ~ dnorm(0, 0.01)
}
tau.year = 1 / (sd.year*sd.year)
sd.year ~ dunif(0, 1)
# Likelihood
for (t in 1:nyear){
eps[t] ~ dnorm(0, tau.year)
for (i in 1:nsite){
y[t,i] ~ dpois(lambda[t,i])
lambda[t,i] = exp(log.lambda[t,i])
log.lambda[t,i] = beta0[i] + beta[1] * x[t] +
beta[2] * pow(x[t],2) + beta[3] * pow(x[t],3) + eps[t]
} #i
} #t
}",file="model_spt.bug")
params = c("mu","beta0","beta","sd.b0","sd.year")
library(R2jags)
spt.model = jags(data=d.spt, model.file="model_spt.bug",
                 parameters.to.save=params, n.iter=2000,
                 n.burnin=1000, n.chains=3)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 1800
##    Unobserved stochastic nodes: 92
##    Total graph size: 16583
## 
## Initializing model
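A sketch of how the fit could be inspected, assuming the R2jags object created above.
# Posterior summaries of the saved parameters
print(spt.model)
# Compare the estimated between-site and between-year SDs with the true values (0.5 and 0.2)
spt.model$BUGSoutput$summary[c("sd.b0", "sd.year"), ]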
The random effects capture the spatial variation.
The effects so far, whether they are intercepts or slopes, have been defined over discrete, unordered categories, e.g. from location \(1\) to location \(50\).
Now consider continuous measures such as the distance between any two points.
Two nearby locations share many common features, such as culture, environment, and transportation, and these similarities decrease as the distance grows.
We need a characterization of intercepts and slopes in which locations close to each other have more similar intercepts and slopes.
Recall the culture example, where cooperation is explained by the "contact" variable.
Close geographic neighbors should have more similar technologies, because nearby locations may share unmeasured geographic features, such as resources, that lead to similar technological industries.
We define a distance matrix among the cultures.
A symmetric matrix with zeros on the diagonal.
Larger populations will both develop and sustain more complex technologies. (Observations come from plays of a single video game.)
culture = c("N","R","I","B","J","F","G","C","U","A")
pop = c(7, 7.3, 8.2, 8.5, 8.9, 9, 9.1, 9.4, 9.8, 12.5)
contact = c(0, 0, 0, 1, 1, 1, 1, 0, 1, 0)
cvi = factor(c(1,1,1,1,2,2,2,3,3,3))
tech = c(13, 22, 24, 43, 33, 19, 40, 28, 55, 71)
d = data.frame(culture,pop,contact,tech,cvi)
##     N   R   I   B   J   F   G   C   U   A
## N 0.0 0.5 0.6 4.4 1.2 2.0 3.2 2.8 1.9 5.7
## R 0.5 0.0 0.3 4.2 1.2 2.0 2.9 2.7 2.0 5.3
## I 0.6 0.3 0.0 3.9 1.6 1.7 2.6 2.4 2.3 5.4
## B 4.4 4.2 3.9 0.0 5.4 2.5 1.6 1.6 6.1 7.2
## J 1.2 1.2 1.6 5.4 0.0 3.2 4.0 3.9 0.8 4.9
## F 2.0 2.0 1.7 2.5 3.2 0.0 1.8 0.8 3.9 6.7
## G 3.2 2.9 2.6 1.6 4.0 1.8 0.0 1.2 4.8 5.8
## C 2.8 2.7 2.4 1.6 3.9 0.8 1.2 0.0 4.6 6.7
## U 1.9 2.0 2.3 6.1 0.8 3.9 4.8 4.6 0.0 5.0
## A 5.7 5.3 5.4 7.2 4.9 6.7 5.8 6.7 5.0 0.0
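A minimal sketch of how such a matrix could be computed, assuming hypothetical two-dimensional coordinates for the ten cultures (the matrix above is given, not reproduced by this code).
# Hypothetical coordinates, for illustration only
set.seed(1)
coords = matrix(runif(2 * length(culture)), ncol = 2,
                dimnames = list(culture, c("x", "y")))
Dmat.example = as.matrix(dist(coords))   # symmetric, zeros on the diagonal
round(Dmat.example, 2)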
We consider the Poisson likelihood with a varying intercept \[\begin{align} y_i &\sim \mbox{Poi}(\theta_i), \\ \log(\theta_i) = \eta_i &= \mu + \beta_{0,i} + \beta_{1} x_{i},\\ \beta_{0} &\sim \mbox{MVN}([0,\dots,0], \mathbf{K}),\\ k_{ij} &= \gamma^{2} \exp(-\rho^2 d^{2}_{ij}) + \delta_{ij}\sigma^{2}. \end{align}\]
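A minimal sketch of the covariance construction, assuming the distance matrix above is stored as a numeric matrix Dmat and using hypothetical values for \(\gamma^2\), \(\rho^2\), and \(\sigma^2\).
# Squared-exponential covariance from squared distances (hypothetical parameter values)
gamma2 = 1; rho2 = 1; sigma2 = 0.01
K = gamma2 * exp(-rho2 * Dmat^2) + diag(sigma2, nrow(Dmat))
round(K[1:5, 1:5], 2)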
The varying intercept \(\beta_{0}\) is structured by distance, not by discrete categories.
The prior for \(\beta_0\) is a \(10\)-dimensional multivariate Gaussian prior. The mean is a zero vector because there is a mean \(\mu\) in the model.
The covariance matrix is constructed from the distance matrix, so these zero-mean intercepts act as deviations from the overall mean \(\mu\).
The implied correlation matrix: the cultures clustered in the upper-left are highly correlated.
     N    R    I    B    J    F    G    C    U A
N 1.00 0.87 0.81 0.00 0.52 0.18 0.02 0.04 0.24 0
R 0.87 1.00 0.92 0.00 0.52 0.19 0.03 0.05 0.20 0
I 0.81 0.92 1.00 0.00 0.36 0.30 0.07 0.10 0.12 0
B 0.00 0.00 0.00 1.00 0.00 0.08 0.36 0.33 0.00 0
J 0.52 0.52 0.36 0.00 1.00 0.01 0.00 0.00 0.75 0
F 0.18 0.19 0.30 0.08 0.01 1.00 0.26 0.71 0.00 0
G 0.02 0.03 0.07 0.36 0.00 0.26 1.00 0.53 0.00 0
C 0.04 0.05 0.10 0.33 0.00 0.71 0.53 1.00 0.00 0
U 0.24 0.20 0.12 0.00 0.75 0.00 0.00 0.00 1.00 0
A 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1
data{
int<lower=1> N;
int<lower=1> N_society;
int total_tools[N];
real logpop[N];
int society[N];
matrix[N_society,N_society] Dmat;
}
parameters{
vector[N_society] g;
real a;
real bp;
real<lower=0> etasq;
real<lower=0> rhosq;
}
model{
matrix[N_society,N_society] SIGMA_Dmat;
vector[N] lambda;
rhosq ~ cauchy( 0 , 1 );
etasq ~ cauchy( 0 , 1 );
bp ~ normal( 0 , 1 );
a ~ normal( 0 , 10 );
for ( i in 1:(N_society-1) )
for ( j in (i+1):N_society ) {
SIGMA_Dmat[i,j] = etasq*exp(-rhosq*pow(Dmat[i,j],2));
SIGMA_Dmat[j,i] = SIGMA_Dmat[i,j];
}
for ( k in 1:N_society )
SIGMA_Dmat[k,k] = etasq + 0.01;
g ~ multi_normal( rep_vector(0,N_society) , SIGMA_Dmat );
for ( i in 1:N ) {
lambda[i] = a + g[society[i]] + bp * logpop[i];
}
total_tools ~ poisson_log( lambda );
}
generated quantities{
matrix[N_society,N_society] SIGMA_Dmat;
vector[N] lambda;
real dev;
dev = 0;
for ( i in 1:N ) {
lambda[i] = a + g[society[i]] + bp * logpop[i];
}
dev = dev + (-2)*poisson_log_lpmf( total_tools | lambda );
}"