Data from Network or Spatial systems are high-dimensional, complex, and stochastic.
Probabilistic structure provides a better characterization.
Parametric structure reduces the difficulty of processing the network and spatial data.
05/03/2018
To characterize network information, we need a two-dimensional data structure.
A network \(G\).
Two sets of information: \(G = (\mathcal{V}, \mathcal{E})\).
\(\mathcal{V}\): a set of vertices (nodes).
\(\mathcal{E}\): a set of edges. If \(u,v\in \mathcal{V}\) and they are connected, the pair \(\{u,v\} \in \mathcal{E}\).
library(igraph); g = graph.formula(1-2, 1-3)  # 3 nodes, 2 edges
plot(g)
V(g)
## + 3/3 vertices, named:
## [1] 1 2 3
vcount(g)
## [1] 3
E(g)
## + 2/2 edges (vertex names):
## [1] 1--2 1--3
ecount(g)
## [1] 2
A graph is called a directed graph if the connections between nodes have a direction.
If \(u,v\in \mathcal{V}\) and they are connected from \(u\) to \(v\), the ordered pair \((u,v) \in \mathcal{E}\) but \((v,u) \notin \mathcal{E}\): an order exists within the pair.
g = graph.formula(1-+2, 1++3)  # 3 nodes, directed edges (1++3 is a mutual pair)
plot(g)
The network/graph can be represented in matrix form.
\(N_v\): number of the nodes
\(\mathbf{Y}\): an \(N_v \times N_v\) adjacency matrix for a graph \(G\): \[ Y_{ij}=\begin{cases} 1, & \mbox{ if }\{i,j\}\in\mathcal{E}\\ 0 & \mbox{ otherwise} \end{cases}.\]
get.adjacency(g)
## 3 x 3 sparse Matrix of class "dgCMatrix"
##   1 2 3
## 1 . 1 1
## 2 . . .
## 3 1 . .
A node \(v\in\mathcal{V}\) is incident on an edge \(e\in\mathcal{E}\) if \(v\) is an endpoint of \(e\).
The degree of a node/vertex, \(\mbox{deg}_v\), is the number of edges incident on \(v\).
degree(g)
## 1 2 3
## 3 1 2
degree(g, mode="in")
## 1 2 3
## 1 1 1
degree(g, mode="out")
## 1 2 3
## 2 0 1
par(mfrow = c(1,3))
g = graph.lattice(c(3,3)); plot(g)
g = graph.lattice(c(3,3,3)); plot(g)
plot(degree(g))
par(mfrow = c(1,2)); igraph.options(vertex.label=NA, vertex.size=3)
plot(g)  # NULL layout gives the default
plot(g, layout=layout.circle)
The degrees of the \(3\times 3\times 3\) lattice are unchanged: the layout only changes how the graph is drawn, not its structure.
Two graphs are mathematically equivalent if they differ only in the labels of their vertices and edges, not in structure. (They are said to be isomorphic.)
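A minimal sketch using igraph's isomorphism test; the graphs g1 and g2 below are illustrative.
# Two graphs with different vertex labels but identical structure
g1 = graph.formula(1-2, 1-3)
g2 = graph.formula(a-b, a-c)
isomorphic(g1, g2)   # TRUE: they differ only in labels, not in structure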
Randomness is generated by considering a collection of graphs/networks that differ in their nodes (\(N_v\)), edges (\(N_e\)), and sizes.
The size of a graph is described by the number of nodes or by the degrees of those nodes; a collection of graphs induces a degree sequence, as sketched below.
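For example, the degree sequence of a single graph can be read off with degree(); a small sketch reusing the \(3\times 3\) lattice from above (the object name g.lat is illustrative).
g.lat = graph.lattice(c(3,3))
sort(degree(g.lat), decreasing = TRUE)   # the degree sequence of this graph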
A network graph model is a parametric family of distributions \[\mathcal{P} = \{p_\theta (G): G\in\mathbb{G}, \theta\in\Theta\}.\]
\(\mathbb{G}\): a collection of possible graphs.
\(p_\theta(G)\): the probability of \(G\in\mathbb{G}\).
\(\theta\): parameter of interest.
Random Graph: \(p\) is uniform over \(\mathbb{G}\).
par(mfrow = c(1,2))
g.er1 = erdos.renyi.game(100, 0.01); g.er2 = erdos.renyi.game(100, 0.05)  # N_v = 100
plot(g.er1); plot(g.er2)
mean(degree(g.er1))
## [1] 1.08
mean(degree(g.er2))
## [1] 4.32
par(mfrow = c(1,2)); hist(degree(g.er1),breaks=10); hist(degree(g.er2),breaks=10)
A graph with \(N_v\) nodes: each node has a corresponding degree \(\mbox{deg}_v\), and the sequence \(\{\mbox{deg}_1,\dots, \mbox{deg}_{N_v} \}\) characterizes the graph.
Two graphs receive the same probability under this model if they have the same number of nodes and the same degree sequence.
par(mfrow = c(1,2))
degs = c(1:20); g.degs = degree.sequence.game(degs)
plot(g.degs, layout=layout_as_tree)
hist(degree(g.degs), breaks=20)
nv = vcount(g.degs); ne = ecount(g.degs)
g.er3 = erdos.renyi.game(nv, ne, "gnm", directed = TRUE)
mean(degree(g.er3))
## [1] 10.5
plot(g.er3,layout=layout_as_tree)
g.er4 = erdos.renyi.game(nv,0.25,"gnp", directed = TRUE); mean(degree(g.er4))
## [1] 9.8
plot(g.er4,layout=layout_as_tree)
Recall the canonical exponential family (with \(h(y)=1\)):\[ p(y|\eta)=\exp(\eta^{T}\phi(y)-A(\eta))= \frac{ \exp(\eta^{T}\phi(y))}{\int \exp\left\lbrace \eta^{T}\phi(y)\right\rbrace dy}.\] where \(A(\eta)=\log (\int \exp\left\lbrace \eta^{T}\phi(y)\right\rbrace dy).\)
\(y\) could be a binary observation taking values \(0\) or \(1\).
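For example, with binary \(y\in\{0,1\}\) and \(\phi(y)=y\), the integral in the normalizer becomes a sum over the two outcomes, and the canonical exponential family reduces to the Bernoulli distribution: \[ p(y\mid\eta)=\frac{\exp(\eta y)}{1+\exp(\eta)}, \qquad A(\eta)=\log\left(1+e^{\eta}\right), \qquad \Pr(y=1)=\frac{e^{\eta}}{1+e^{\eta}}. \]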
All the information in \(G\) is captured by the adjacency matrix \(\mathbf{Y}\).
For a random graph, the presence of an edge between one pair of vertices is independent of the status of possible edges between any other pairs.
The configuration considers only the connections between pairs of vertices: \[\sum_{k \in \mathcal{C}} \eta_{k}\phi_k (\mathbf{Y}) = \sum_{i, j} \eta_{ij} y_{ij}.\] In addition, the coefficients \(\eta_{ij}\) are equal for any two edges, so \(\eta_{ij} = \eta\).
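A minimal sketch of what the edges-only model implies: each \(y_{ij}\) is then an independent Bernoulli variable with edge probability equal to the inverse logit of \(\eta\). The value of \(\eta\) below is hypothetical.
# Edges-only ERGM: Pr(Y_ij = 1) = exp(eta) / (1 + exp(eta))
eta = -3.0                   # hypothetical edges coefficient
exp(eta) / (1 + exp(eta))    # implied edge probability
plogis(eta)                  # the same value via the inverse logit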
library(statnet); Y = as.matrix(get.adjacency(g.er2));
image(Y, col=c("white","black"), xlab="column", ylab="row")
ergm.er = ergm(Y~edges); coef(ergm.er)
##     edges 
## -3.087247
summary.statistics(ergm.er)
## edges 
##   432
exp(-2.921)
## [1] 0.05387978
Recall the success count model. Now we consider \(Y_{i,t}\) where the count depends on time \(t\) and site \(i\).
The spatiotemporal structure \[\begin{align} Y_{i,t} &\sim \mbox{Poi}(\theta_{i,t}) \\ \log (\theta_{i,t}) &= \eta_{i,t} = \beta_{i}^T\mathbf{x}_t + \varepsilon_t = \beta_{0,i} + \sum_{k=1}^{3}\beta_k x^{k}_{t} + \varepsilon_t .\end{align}\]
\(\beta_{0,i}\): the spatial (site-specific) effect.
\(\{\beta_{1},\beta_{2},\beta_{3}\}\): do not vary over time or across sites.
\(\varepsilon_t\): the temporal effects.
Random effects may consist of two or more effects or parameters that are integrated together in some way.
\(\beta_{0,i}\sim \mathcal{N}(\mu,\sigma^2)\) has (hyper)parameters that express the fixed effect \(\mu\) and the magnitude of the variability among sites, \(\sigma^{2}\).
\(\varepsilon_t \sim \mathcal{N}(0,\sigma_{\varepsilon}^2)\) has a (hyper)parameter for the magnitude of the variability among years, \(\sigma_{\varepsilon}^2\).
Hierarchical models: a hierarchy of effects or a nested relationship among the parameters.
Hyperparameters: They are parameters for parameters, e.g. \(\sigma\) is the parameter of the parameter \(\beta_{0,i}\).
Hyperparameters exert their influence during estimation: through the scale \(\sigma\), the varying effects \(\beta_{0,i}\) are shrunk towards the common mean \(\mu\).
We simultaneously estimate both the intercept for each site (\(\beta_{0,i}\)) and the variation among sites \(\sigma\).
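A minimal sketch of the shrinkage idea for a single site, under a normal-normal approximation; all numbers below are hypothetical.
# Hypothetical normal-normal shrinkage for one site-level intercept
mu.hat  = 4.2    # estimated common mean
sigma2  = 0.25   # estimated between-site variance sigma^2
b.raw   = 5.0    # raw (unpooled) estimate for the site
s2      = 0.50   # sampling variance of that raw estimate
w       = sigma2 / (sigma2 + s2)   # shrinkage weight
w * b.raw + (1 - w) * mu.hat       # partially pooled estimate, pulled towards mu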
nyears = 36; nsite = 50; mu = 4.2; beta1 = 2;       # true parameter values
beta2 = 0.1; beta3 = -1.7; sd.site = 0.5; sd.year = 0.2;
year = 1:nyears; x = (year-round(nyears/2)) / (nyears / 2)   # standardized year covariate
site = 1:nsite; y = eta = array(NA, dim=c(nyears, nsite))
beta0 = rnorm(nsite, mu, sd.site)   # site-specific intercepts
eps = rnorm(nyears, 0, sd.year)     # year (temporal) effects
for (i in 1:nsite){
eta[,i] = beta0[i] + beta1 * x + beta2 * x^2 + beta3 * x^3 + eps
y.mean = exp(eta[,i])
y[,i] = rpois(nyears, y.mean)
}
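As an optional sanity check, the simulated counts can be plotted against the standardized year covariate, one line per site, using the x and y objects created above.
# One line per site: simulated counts over the standardized year covariate
matplot(x, y, type = "l", lty = 1, xlab = "standardized year", ylab = "count")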
d.spt = list(y=y, nsite=nsite, nyear=nyears, x=x)
cat("
model { # Priors
for (i in 1:nsite){
beta0[i] ~ dnorm(mu, tau.b0)
}
mu ~ dnorm(0, 0.01)
tau.b0 = 1 / (sd.b0*sd.b0)
sd.b0 ~ dunif(0, 2)
for (k in 1:3){
beta[k] ~ dnorm(0, 0.01)
}
tau.year = 1 / (sd.year*sd.year)
sd.year ~ dunif(0, 1)
# Likelihood
for (t in 1:nyear){
eps[t] ~ dnorm(0, tau.year)
for (i in 1:nsite){
y[t,i] ~ dpois(lambda[t,i])
lambda[t,i] = exp(log.lambda[t,i])
log.lambda[t,i] = beta0[i] + beta[1] * x[t] +
beta[2] * pow(x[t],2) + beta[3] * pow(x[t],3) + eps[t]
} #i
} #t
}",file="model_spt.bug")
params = c("mu","beta0","beta","sd.b0","sd.year")
library(R2jags)
spt.model = jags(data=d.spt, model.file="model_spt.bug",
                 parameters.to.save=params, n.iter=2000,
                 n.burnin=1000, n.chains=3)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 1800
##    Unobserved stochastic nodes: 92
##    Total graph size: 16583
## 
## Initializing model
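A sketch of how the fit could be inspected, assuming the R2jags object created above.
# Posterior summaries of the saved parameters
print(spt.model)
# Compare the estimated between-site and between-year SDs with the true values (0.5 and 0.2)
spt.model$BUGSoutput$summary[c("sd.b0", "sd.year"), ]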
The random effects capture the spatial variation.
The effects so far, whether they are intercepts or slopes, have been defined over discrete, unordered categories, e.g. from location \(1\) to location \(50\).
Now consider continuous measures such as the distance between any two points.
Two nearby locations share many common features, such as culture, environment, and transportation, and these similarities decrease as the distance grows.
We need a characterization of intercepts and slopes in which locations close to each other have more similar intercepts and slopes.
Recall the culture example, where cooperation is explained by the "contact" variable.
Close geographic neighbors should have more similar technologies, because nearby locations may share unmeasured geographic features, such as resources, that lead to similar technological industries.
We define a distance matrix among the cultures.
A symmetric matrix with zeros on the diagonal.
Larger populations will both develop and sustain more complex technologies. (Observations come from plays of a single video game.)
culture = c("N","R","I","B","J","F","G","C","U","A")
pop = c(7, 7.3, 8.2, 8.5, 8.9, 9, 9.1, 9.4, 9.8, 12.5)
contact = c(0, 0, 0, 1, 1, 1, 1, 0, 1, 0)
cvi = factor(c(1,1,1,1,2,2,2,3,3,3))
tech = c(13, 22, 24, 43, 33, 19, 40, 28, 55, 71)
d = data.frame(culture,pop,contact,tech,cvi)
##     N   R   I   B   J   F   G   C   U   A
## N 0.0 0.5 0.6 4.4 1.2 2.0 3.2 2.8 1.9 5.7
## R 0.5 0.0 0.3 4.2 1.2 2.0 2.9 2.7 2.0 5.3
## I 0.6 0.3 0.0 3.9 1.6 1.7 2.6 2.4 2.3 5.4
## B 4.4 4.2 3.9 0.0 5.4 2.5 1.6 1.6 6.1 7.2
## J 1.2 1.2 1.6 5.4 0.0 3.2 4.0 3.9 0.8 4.9
## F 2.0 2.0 1.7 2.5 3.2 0.0 1.8 0.8 3.9 6.7
## G 3.2 2.9 2.6 1.6 4.0 1.8 0.0 1.2 4.8 5.8
## C 2.8 2.7 2.4 1.6 3.9 0.8 1.2 0.0 4.6 6.7
## U 1.9 2.0 2.3 6.1 0.8 3.9 4.8 4.6 0.0 5.0
## A 5.7 5.3 5.4 7.2 4.9 6.7 5.8 6.7 5.0 0.0
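A minimal sketch of how such a matrix could be computed, assuming hypothetical two-dimensional coordinates for the ten cultures (the matrix above is given, not reproduced by this code).
# Hypothetical coordinates, for illustration only
set.seed(1)
coords = matrix(runif(2 * length(culture)), ncol = 2,
                dimnames = list(culture, c("x", "y")))
Dmat.example = as.matrix(dist(coords))   # symmetric, zeros on the diagonal
round(Dmat.example, 2)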
We consider the Poisson likelihood with a varying intercept \[\begin{align} y_i &\sim \mbox{Poi}(\theta_i), \\ \log(\theta_i) = \eta_i &= \mu + \beta_{0,i} + \beta_{1} x_{i},\\ \beta_{0} &\sim \mbox{MVN}([0,\dots,0], \mathbf{K}),\\ k_{ij} &= \gamma^{2} \exp(-\rho^2 d^{2}_{ij}) + \delta_{ij}\sigma^{2}. \end{align}\]
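A minimal sketch of the covariance construction, assuming the distance matrix above is stored as a numeric matrix Dmat and using hypothetical values for \(\gamma^2\), \(\rho^2\), and \(\sigma^2\).
# Squared-exponential covariance from squared distances (hypothetical parameter values)
gamma2 = 1; rho2 = 1; sigma2 = 0.01
K = gamma2 * exp(-rho2 * Dmat^2) + diag(sigma2, nrow(Dmat))
round(K[1:5, 1:5], 2)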
The varying intercept \(\beta_{0}\) is structured by distance, not by discrete categories.
The prior for \(\beta_0\) is a \(10\)-dimensional multivariate Gaussian prior. The mean is a zero vector because there is a mean \(\mu\) in the model.
The covariance matrix is constructed from the distance matrix, so these zero-mean intercepts act as deviations from the overall mean \(\mu\).
The implied correlation matrix: the cultures clustered in the upper-left are highly correlated.
     N    R    I    B    J    F    G    C    U A
N 1.00 0.87 0.81 0.00 0.52 0.18 0.02 0.04 0.24 0
R 0.87 1.00 0.92 0.00 0.52 0.19 0.03 0.05 0.20 0
I 0.81 0.92 1.00 0.00 0.36 0.30 0.07 0.10 0.12 0
B 0.00 0.00 0.00 1.00 0.00 0.08 0.36 0.33 0.00 0
J 0.52 0.52 0.36 0.00 1.00 0.01 0.00 0.00 0.75 0
F 0.18 0.19 0.30 0.08 0.01 1.00 0.26 0.71 0.00 0
G 0.02 0.03 0.07 0.36 0.00 0.26 1.00 0.53 0.00 0
C 0.04 0.05 0.10 0.33 0.00 0.71 0.53 1.00 0.00 0
U 0.24 0.20 0.12 0.00 0.75 0.00 0.00 0.00 1.00 0
A 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1
data{
int<lower=1> N;
int<lower=1> N_society;
int total_tools[N];
real logpop[N];
int society[N];
matrix[N_society,N_society] Dmat;
}
parameters{
vector[N_society] g;
real a;
real bp;
real<lower=0> etasq;
real<lower=0> rhosq;
}
model{
matrix[N_society,N_society] SIGMA_Dmat;
vector[N] lambda;
rhosq ~ cauchy( 0 , 1 );
etasq ~ cauchy( 0 , 1 );
bp ~ normal( 0 , 1 );
a ~ normal( 0 , 10 );
for ( i in 1:(N_society-1) )
for ( j in (i+1):N_society ) {
SIGMA_Dmat[i,j] = etasq*exp(-rhosq*pow(Dmat[i,j],2));
SIGMA_Dmat[j,i] = SIGMA_Dmat[i,j];
}
for ( k in 1:N_society )
SIGMA_Dmat[k,k] = etasq + 0.01;
g ~ multi_normal( rep_vector(0,N_society) , SIGMA_Dmat );
for ( i in 1:N ) {
lambda[i] = a + g[society[i]] + bp * logpop[i];
}
total_tools ~ poisson_log( lambda );
}
generated quantities{
matrix[N_society,N_society] SIGMA_Dmat;
vector[N] lambda;
real dev;
dev = 0;
for ( i in 1:N ) {
lambda[i] = a + g[society[i]] + bp * logpop[i];
}
dev = dev + (-2)*poisson_log_lpmf( total_tools | lambda );
}"