05/03/2018

Overview of Network and Spatial Models

  • Data from Network or Spatial systems are high-dimensional, complex, and stochastic.

  • Probabilistic structure provides a better characterization.

  • Parametric structure reduces the difficulty of processing the network and spatial data.

Network Data

  • To characterize network information, we need a two-dimensional data structure.

  • A network \(G\).

  • Two sets of information: \(G = (\mathcal{V}, \mathcal{E})\).

  • \(\mathcal{V}\): a set of vertices (nodes).

  • \(\mathcal{E}\): a set of edges. If \(u,v\in \mathcal{V}\) and they are connected, the pair \(\{u,v\} \in \mathcal{E}\).

Network Data

library(igraph); g = graph.formula(1-2, 1-3); # 3 nodes, 2 edges
plot(g)

Network Data

V(g)
## + 3/3 vertices, named:
## [1] 1 2 3
vcount(g)
## [1] 3
E(g)
## + 2/2 edges (vertex names):
## [1] 1--2 1--3
ecount(g)
## [1] 2

Network Data

  • A graph is called a directed graph if the connections between nodes have a direction.

  • If \(u,v\in \mathcal{V}\) and they are connected from \(u\) to \(v\), the ordered pair \((u,v) \in \mathcal{E}\) while \((v,u)\) need not be in \(\mathcal{E}\); an order exists in the pair.

Network Data

g = graph.formula(1-+2, 1++3); # 3 nodes: one directed edge (1->2), one mutual pair (1<->3)
plot(g)

Network Representation

  • The network/graph can be represented in matrix form.

  • \(N_v\): the number of nodes

  • \(\mathbf{Y}\): an \(N_v \times N_v\) adjacency matrix for a graph \(G\): \[ Y_{ij}=\begin{cases} 1, & \mbox{ if }\{i,j\}\in\mathcal{E}\\ 0 & \mbox{ otherwise} \end{cases}.\]

get.adjacency(g)
## 3 x 3 sparse Matrix of class "dgCMatrix"
##   1 2 3
## 1 . 1 1
## 2 . . .
## 3 1 . .

Network Representation

  • A node \(v\in\mathcal{V}\) is incident on an edge \(e\in\mathcal{E}\) if \(v\) is an endpoint of \(e\).

  • The degree of a node/vertex, \(\mbox{deg}_v\), is the number of edges incident on \(v\).

degree(g)
## 1 2 3 
## 3 1 2
  • Two edges are adjacent if they share a common endpoint.

Network Representation

degree(g, mode="in")
## 1 2 3 
## 1 1 1
degree(g, mode="out")
## 1 2 3 
## 2 0 1

Network Representation

par(mfrow = c(1,3)); g=graph.lattice(c(3,3)); plot(g)
g=graph.lattice(c(3,3,3)); plot(g)
plot(degree(g))

Network Representation

par(mfrow = c(1,2)); igraph.options(vertex.label=NA,vertex.size=3); 
plot(g) # layout=NULL gives the default layout
plot(g, layout=layout.circle)

Random Graph

  • Across the different layouts, the degree plot of the \(3\times 3\times 3\) lattice is unchanged.

  • Two graphs are mathematically equivalent if they differ only in the labels of their vertices and edges but not in structure. (They are said to be isomorphic; see the sketch after this list.)

  • Randomness is generated by considering a collection of graphs/networks that differ in their nodes (\(N_v\)), edges (\(N_e\)), and sizes.

  • Graphs differ in size in terms of the number of nodes or the degrees of those nodes; each graph induces a degree sequence.
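
As a quick check of isomorphism, a minimal sketch using igraph's graph.isomorphic() on two toy graphs (labels are arbitrary):

g1 = graph.formula(1-2, 2-3, 3-1)  # a triangle labeled 1, 2, 3
g2 = graph.formula(A-B, B-C, C-A)  # the same structure, relabeled
graph.isomorphic(g1, g2)           # TRUE: equivalent up to labels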

Random Graph

  • A network graph model is a parametric family of distributions \[\mathcal{P} = \{p_\theta (G): G\in\mathbb{G}, \theta\in\Theta\}.\]

  • \(\mathbb{G}\): a collection of possible graphs.

  • \(p_\theta(G)\): the probability of \(G\in\mathbb{G}\).

  • \(\theta\): parameter of interest.

  • Random Graph: \(p\) is uniform over \(\mathbb{G}\).

Random Graph

  • Erdős–Rényi model: graphs of the same size receive equal probability. Matching the edge probability to the observed edge count gives \[p=\frac{N_e}{N}\, \mbox{ where } N = \left(\begin{array}{c} N_{v}\\ 2 \end{array}\right),\] \(N\) being the total number of distinct node pairs. Thus \(p \propto N_e / N_v^2\).

Random Graph

par(mfrow = c(1,2)); g.er1 = erdos.renyi.game(100,0.01); 
g.er2 = erdos.renyi.game(100,0.05); # N_v=100 
plot(g.er1); plot(g.er2)

Random Graph

  • The degree of a given vertex is distributed as a binomial random variable with parameters \(N_v - 1\) and \(p\) (mean: \((N_v - 1) p\)); a check follows the output below.
mean(degree(g.er1))
## [1] 1.08
mean(degree(g.er2))
## [1] 4.32
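
The simulated means can be checked against the theoretical mean \((N_v - 1)p\):

(100 - 1) * 0.01  # 0.99, close to the simulated 1.08
(100 - 1) * 0.05  # 4.95, close to the simulated 4.32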

Random Graph

par(mfrow = c(1,2)); hist(degree(g.er1),breaks=10);
hist(degree(g.er2),breaks=10)

Random Graph

  • A graph with \(N_v\) nodes. Each node has a corresponding degree, and the sequence \(\{\mbox{deg}_1,\dots, \mbox{deg}_{N_v} \}\) characterizes the graph.

  • Two graphs receive the same probability if they have the same number of nodes and the same degree sequence.

Random Graph

par(mfrow = c(1,2)); degs = c(1:20); g.degs = degree.sequence.game(degs)
plot(g.degs, layout=layout_as_tree); hist(degree(g.degs), breaks=20)

Random Graph

nv = vcount(g.degs); ne = ecount(g.degs); 
g.er3 = erdos.renyi.game(nv,ne,"gnm", directed = TRUE); 
mean(degree(g.er3))
## [1] 10.5

Random Graph

plot(g.er3,layout=layout_as_tree)

Random Graph

g.er4 = erdos.renyi.game(nv,0.25,"gnp", directed = TRUE); 
mean(degree(g.er4))
## [1] 9.8

Random Graph

plot(g.er4,layout=layout_as_tree)

Exponential Random Graph Model (ERGM)

  • Recall the canonical exponential family (with \(h(y)=1\)):\[ p(y|\eta)=\exp(\eta^{T}\phi(y)-A(\eta))= \frac{ \exp(\eta^{T}\phi(y))}{\int \exp\left\lbrace \eta^{T}\phi(y)\right\rbrace dy}.\] where \(A(\eta)=\log (\int \exp\left\lbrace \eta^{T}\phi(y)\right\rbrace dy).\)

  • \(y\) could be a binary (\(0/1\)) observation; an example follows this list.

  • All information of \(G\) is captured by the adjacency matrix \(\mathbf{Y}\).
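
For a binary observation, take \(\phi(y)=y\) and natural parameter \(\eta = \mbox{logit}(\pi)\); the Bernoulli likelihood then has the canonical form \[ p(y|\pi)=\pi^{y}(1-\pi)^{1-y}=\exp\left( y\,\eta - \log(1+e^{\eta})\right),\quad A(\eta)=\log(1+e^{\eta}). \]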

ERGM

  • The ERGM for graph \(G\) is \[ p(\mathbf{Y} |\eta)=\frac{1}{\kappa}\exp\left( \sum_{k \in \mathcal{C}} \eta_{k}\phi_k (\mathbf{Y})\right)\] where \(\mathbf{Y}\) is the adjacency matrix for \(G\), the set \(\mathcal{C}\) refers to a set of possible edges satisfying a certain configuration, and the statistic \[\phi(\mathbf{Y}) = \prod_{\{i,j\}\in \mathcal{C}} y_{ij} = \begin{cases} 1, & \mbox{ if } \{y_{ij}\}_{\{i,j\}\in\mathcal{C} } \mbox{ satisfy the configuration } \mathcal{C}\\ 0, & \mbox{ otherwise}. \end{cases}\] \(\eta\) captures the dependence between \(y_{ij}\) and the configuration.

ERGM

  • For a random graph, the presence of an edge between one pair is independent of the status of possible edges between any other pairs.

  • The configuration only considers the connection of each vertex pair: \[\sum_{k \in \mathcal{C}} \eta_{k}\phi_k (\mathbf{Y}) = \sum_{i, j} \eta_{ij} y_{ij}.\] In addition, the coefficients \(\eta_{ij}\) are equal for any two edges, so \(\eta_{ij} = \eta\).

ERGM

  • The model is reduced to \[ p(\mathbf{Y} |\eta)=\frac{1}{\kappa}\exp\left( \eta N_e \right)\] as \(\sum_{i,j} \eta_{ij} y_{ij} = \eta \sum_{i,j} y_{ij} = \eta N_e\).
library(statnet); Y = as.matrix(get.adjacency(g.er2)); 

ERGM

image(Y, col=c("white","black"), xlab="column", ylab="row")

ERGM

ergm.er = ergm(Y~edges); coef(ergm.er)
##     edges 
## -3.087247
summary.statistics(ergm.er) 
## edges 
##   432
exp(-3.087247)
## [1] 0.04562741
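
The edges-only coefficient maps back to an edge probability through the inverse logit. A quick check (statnet treats the symmetric \(\mathbf{Y}\) as a directed network here, so there are \(100\times 99\) possible edges and the count \(432\) is twice the \(216\) undirected edges):

plogis(-3.087247)  # inverse logit of the coefficient: ~0.0436
432 / (100 * 99)   # observed edge density: ~0.0436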

Hierarchical Structure for Panels

  • Recall the success count model. Now we consider \(Y_{i,t}\) where the count depends on time \(t\) and site \(i\).

  • The spatiotemporal structure \[\begin{align} Y_{i,t} &\sim \mbox{Poi}(\theta_{i,t}) \\ \log (\theta_{i,t}) &= \eta_{i,t} = \beta_{i}^T\mathbf{x}_t = \beta_{0,i} + \sum_{k=1}^{3}\beta_k x^{k}_{t} + \varepsilon_t .\end{align}\]

  • \(\beta_{0,i}\): the spatial (site) effect.

  • \(\{\beta_{1},\beta_{2},\beta_{3}\}\): do not vary over time or across sites.

  • \(\varepsilon_t\): the temporal effect.

Hierarchical Structure

  • Random effects may consist of two or more effects or parameters that are integrated together in some way.

  • \(\beta_{0,i}\sim \mathcal{N}(\mu,\sigma^2)\) has (hyper)parameter that expresses the fixed effect \(\mu\) and the magnitude of the variability among sites \(\sigma^{2}\).

  • \(\varepsilon_t \sim \mathcal{N}(0,\sigma_{\varepsilon}^2)\) has the magnitude of the variability among years \(\sigma_{\varepsilon}^2\).

  • Hierarchical models: a hierarchy of effects or a nested relationship among the parameters.

Hierarchical Structure

  • Hyperparameters: parameters for parameters, e.g. \(\mu\) and \(\sigma\) are the parameters of the parameter \(\beta_{0,i}\).

  • Hyperparameters have their effect during estimation: the varying effects \(\beta_{0,i}\) are shrunk towards the common mean \(\mu\) by an amount governed by \(\sigma\).

  • We simultaneously estimate both the intercept for each site (\(\beta_{0,i}\)) and the variation among sites (\(\sigma\)); the sketch below illustrates the shrinkage.
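
A minimal sketch of the shrinkage idea for a Gaussian likelihood (the Poisson model behaves analogously on the log scale; the function and inputs are illustrative, not part of the model code):

# conditional posterior mean of a site intercept: a precision-weighted
# average of the site's own sample mean and the common mean mu
shrink = function(beta.hat, mu, sd.site, sd.obs, n.obs) {
  w = (n.obs / sd.obs^2) / (n.obs / sd.obs^2 + 1 / sd.site^2)
  w * beta.hat + (1 - w) * mu  # w -> 1 with more data: less shrinkage
}
shrink(beta.hat = 5, mu = 4.2, sd.site = 0.5, sd.obs = 0.2, n.obs = 36)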

First Spatiotemporal Model

nyears = 36; nsite = 50; mu= 4.2; beta1 = 2; 
beta2 = 0.1; beta3 = -1.7; sd.site = 0.5; sd.year = 0.2;
year = 1:nyears;  x = (year-round(nyears/2)) / (nyears / 2) # centered, scaled time
site = 1:nsite; y = eta = array(NA, dim=c(nyears, nsite))
beta0 = rnorm(nsite, mu, sd.site) # site-specific intercepts
eps = rnorm(nyears, 0, sd.year)   # year effects shared across sites
for (i in 1:nsite){
  eta[,i] = beta0[i] + beta1 * x + beta2 * x^2 + beta3 * x^3 + eps 
  y.mean = exp(eta[,i])
  y[,i] = rpois(nyears, y.mean)   # Poisson counts for each site-year
}

First Spatiotemporal Model
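
A quick look at the simulated counts (a minimal plotting sketch; the site subset is arbitrary):

matplot(year, y[, 1:5], type = "l", lty = 1, xlab = "year", ylab = "count")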

First Spatiotemporal Model

d.spt = list(y=y, nsite=nsite, nyear=nyears, x=x)
cat("
model { # Priors
  for (i in 1:nsite){
     beta0[i] ~ dnorm(mu, tau.b0)   # site-specific intercepts
  }
  mu ~ dnorm(0, 0.01)               # hyperprior: common mean across sites
  tau.b0 = 1 / (sd.b0*sd.b0)        # precision from the sd hyperparameter
  sd.b0 ~ dunif(0, 2)
  for (k in 1:3){
     beta[k] ~ dnorm(0, 0.01)
  }
  tau.year = 1 / (sd.year*sd.year)
  sd.year ~ dunif(0, 1) 

  # Likelihood
  for (t in 1:nyear){
     eps[t] ~ dnorm(0, tau.year)           
     for (i in 1:nsite){
        y[t,i] ~ dpois(lambda[t,i])        
        lambda[t,i] = exp(log.lambda[t,i]) 
        log.lambda[t,i] = beta0[i] + beta[1] * x[t] + 
        beta[2] *   pow(x[t],2) + beta[3] * pow(x[t],3) + eps[t] 
      }  #i
   }  #t
}",file="model_spt.bug")
params = c("mu","beta0","beta","sd.b0","sd.year")
library(R2jags) # provides the jags() wrapper
spt.model = jags(data=d.spt, model.file="model_spt.bug", 
                    parameters.to.save=params, n.iter=2000, 
                    n.burnin=1000, n.chains=3)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 1800
##    Unobserved stochastic nodes: 92
##    Total graph size: 16583
## 
## Initializing model

First Spatiotemporal Model
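
The fit can then be inspected with the standard R2jags accessors; for example:

print(spt.model)      # posterior summaries, Rhat and n.eff diagnostics
traceplot(spt.model)  # visual convergence check of the saved parameters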

A Simple Spatial Model

  • Random effects capture the spatial variation.

  • The effects so far, whether intercepts or slopes, have been defined over discrete, unordered categories, e.g. from location \(1\) to location \(50\).

  • Consider continuous categories like the distance between any two points.

  • Two nearby locations share many common features, like culture, environment, transportation, etc., and the similarities decrease as the distance grows.

  • We need a characterization of intercepts and slopes in which locations close to each other have more similar intercepts and slopes.

Gaussian Process

  • Recall the culture example. Cooperation is explained by the "contact" variable.

  • Close geographic neighbors should have more similar technologies, because nearby locations may share unmeasured geographic features, like resources, that lead to similar technological industries.

  • We define a distance matrix among the cultures.

  • A symmetric matrix with zero diagonal.

A Simple Spatial Model

Larger populations will both develop and sustain more complex technologies. (Observations from plays in one video game.)

culture = c("N","R","I","B","J","F","G","C","U","A")
pop = c(7, 7.3, 8.2, 8.5, 8.9, 9, 9.1, 9.4, 9.8, 12.5)
contact = c(0, 0, 0, 1, 1, 1, 1, 0, 1, 0)
cvi = factor(c(1,1,1,1,2,2,2,3,3,3))
tech = c(13, 22, 24, 43, 33, 19, 40, 28, 55, 71)
d = data.frame(culture,pop,contact,tech,cvi)
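
The distance matrix on the next slide is taken as given. A minimal sketch of how such a matrix could be built with dist(), assuming hypothetical site coordinates (the actual coordinates are not shown):

coords = cbind(x = rnorm(10), y = rnorm(10))  # hypothetical locations
rownames(coords) = culture
Dmat = round(as.matrix(dist(coords)), 1)      # symmetric, zero diagonal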

Gaussian Process

##     N   R   I   B   J   F   G   C   U   A
## N 0.0 0.5 0.6 4.4 1.2 2.0 3.2 2.8 1.9 5.7
## R 0.5 0.0 0.3 4.2 1.2 2.0 2.9 2.7 2.0 5.3
## I 0.6 0.3 0.0 3.9 1.6 1.7 2.6 2.4 2.3 5.4
## B 4.4 4.2 3.9 0.0 5.4 2.5 1.6 1.6 6.1 7.2
## J 1.2 1.2 1.6 5.4 0.0 3.2 4.0 3.9 0.8 4.9
## F 2.0 2.0 1.7 2.5 3.2 0.0 1.8 0.8 3.9 6.7
## G 3.2 2.9 2.6 1.6 4.0 1.8 0.0 1.2 4.8 5.8
## C 2.8 2.7 2.4 1.6 3.9 0.8 1.2 0.0 4.6 6.7
## U 1.9 2.0 2.3 6.1 0.8 3.9 4.8 4.6 0.0 5.0
## A 5.7 5.3 5.4 7.2 4.9 6.7 5.8 6.7 5.0 0.0

Gaussian Process

  • We consider the Poisson likelihood with a varying intercept \[\begin{align} y_i &\sim \mbox{Poi}(\theta_i), \\ \log(\theta_i) = \eta_i &= \mu + \beta_{0,i} + \beta_{1} x_{i},\\ \beta_{0} &\sim \mbox{MVN}([0,\dots,0], \mathbf{K}),\\ k_{ij} &= \gamma^{2} \exp(-\rho^2 d^{2}_{ij}) + \delta_{ij}\sigma^{2}. \end{align}\]

  • The parameter \(\beta_0\) is indexed by distance, not by discrete category.

  • The prior for \(\beta_0\) is a \(10\)-dimensional multivariate Gaussian. Its mean is the zero vector because the model already contains the overall mean \(\mu\).

  • The distance-based matrix \(\mathbf{K}\) is the covariance matrix for these zero-mean intercepts, which makes the intercepts deviations from the overall mean; a sketch of assembling \(\mathbf{K}\) follows.
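
Given a distance matrix, \(\mathbf{K}\) can be assembled directly. A minimal sketch with illustrative parameter values (the Stan program below estimates them; Dmat is the hypothetical distance matrix from the earlier sketch):

gamma2 = 1; rho2 = 1; sigma2 = 0.01
K = gamma2 * exp(-rho2 * Dmat^2) + diag(sigma2, nrow(Dmat))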

Gaussian Process

  • The kernel \(\exp\{-\rho^2 d^{2}_{ij}\}\) gives the covariance model its shape: correlation decays with squared distance (solid curve \(\exp(-x)\) versus dotted curve \(\exp(-x^2)\)).

Gaussian Process
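
A minimal sketch of the two decay curves compared in the previous bullet:

curve(exp(-x), from = 0, to = 4, ylab = "correlation shape") # linear decay
curve(exp(-x^2), lty = 2, add = TRUE)                        # squared-distance decay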

Gaussian Process

The covariance matrix: clusters in the upper-left are highly correlated.

     N    R    I    B    J    F    G    C    U A
N 1.00 0.87 0.81 0.00 0.52 0.18 0.02 0.04 0.24 0
R 0.87 1.00 0.92 0.00 0.52 0.19 0.03 0.05 0.20 0
I 0.81 0.92 1.00 0.00 0.36 0.30 0.07 0.10 0.12 0
B 0.00 0.00 0.00 1.00 0.00 0.08 0.36 0.33 0.00 0
J 0.52 0.52 0.36 0.00 1.00 0.01 0.00 0.00 0.75 0
F 0.18 0.19 0.30 0.08 0.01 1.00 0.26 0.71 0.00 0
G 0.02 0.03 0.07 0.36 0.00 0.26 1.00 0.53 0.00 0
C 0.04 0.05 0.10 0.33 0.00 0.71 0.53 1.00 0.00 0
U 0.24 0.20 0.12 0.00 0.75 0.00 0.00 0.00 1.00 0
A 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1

Gaussian Process

  • Dark lines: stronger correlation
  • Pure white: zero correlation
The Stan program for the Gaussian-process model:

data{
  int<lower=1> N;
  int<lower=1> N_society;
  int total_tools[N];
  real logpop[N];
  int society[N];
  matrix[N_society,N_society] Dmat;
}
parameters{
  vector[N_society] g;
  real a;
  real bp;
  real<lower=0> etasq;
  real<lower=0> rhosq;
}
model{
  matrix[N_society,N_society] SIGMA_Dmat;
  vector[N] lambda;
  rhosq ~ cauchy( 0 , 1 );
  etasq ~ cauchy( 0 , 1 );
  bp ~ normal( 0 , 1 );
  a ~ normal( 0 , 10 );
  // squared-exponential kernel: k_ij = etasq * exp(-rhosq * d_ij^2)
  for ( i in 1:(N_society-1) )
    for ( j in (i+1):N_society ) {
      SIGMA_Dmat[i,j] = etasq*exp(-rhosq*pow(Dmat[i,j],2));
      SIGMA_Dmat[j,i] = SIGMA_Dmat[i,j];
    }
  for ( k in 1:N_society )
    SIGMA_Dmat[k,k] = etasq + 0.01; // diagonal: etasq plus a small nugget
  g ~ multi_normal( rep_vector(0,N_society) , SIGMA_Dmat );
  for ( i in 1:N ) {
    lambda[i] = a + g[society[i]] + bp * logpop[i];
  }
  total_tools ~ poisson_log( lambda );
}
generated quantities{
  matrix[N_society,N_society] SIGMA_Dmat;
  vector[N] lambda;
  real dev;
  dev = 0;
  for ( i in 1:N ) {
    lambda[i] = a + g[society[i]] + bp * logpop[i];
  }
  dev = dev + (-2)*poisson_log_lpmf( total_tools | lambda );
}