Spatial Modeling

Spatial modeling is important in many contexts where there are regional dependencies. As an example, consider the rate of natural catastrophies. Those that live on coastal cities are more likely to experience damage from hurricanes, homes that are near rivers or lakes are more likely to experience flooding.

The problem with this, however, is that we don’t know the location of the next fire or flood. Their occurence is modeled as a random variable.

Finally, it is computationally difficult to model each exact location point. Therefore, we call our entire space of possibilities, regions. So in our case, Earth is our region. We can then partition Earth into small grids; each representing a piece of a region. These grids are equally spaced and will have the same area. An easy way to segment it is to select cutoffs for latitude and longitude.

We can store regional structure into an $n\times n$ matrix $R\in \mathbb{R}^{n\times n}$ where $n$ are all of the districts in the region. If $r_{ij}=1$, then $i$ and $j$ border each other, otherwise $r_{ij}=0$.

Spatial here is defined as border to but it can also be distance to measured by Euclidean distance or some other metric. So, in our matrix $R$, we have that $N(i)$ is the set of neighbors that border observation $i$. Then we get the following:

The matrix is :

\[R\in \overbrace{ \underbrace{n}_\text{start location i} \times \underbrace{n}_\text{neighbors of i}}^\text{the neighborhood matrix}\]

Some other Latex Notes:

\[\textrm{Experimenting \textit{with \textbf{LaTeX}}}\]

\[r_{ij}=\left\{ \begin{array}{lr} 1 &:j\in N(i)\\ 0 &:j\not\in N(i) \end{array} \right. \]

There are also a couple of statistics that are equivalent in time series: the Moran’s I and Geary’s C with equivalent forms as the autocorrelation coefficient and Durban-Watson statistic.

The author notes something interesting; after fitting a cubic GAM model to remove the population density effect (that rural regions experience higher average claims than denser regions), he tests the residuals of the fit using Moran’s I and Geary’s C. That is, he fits the GAM and produces residuals $\vec{r}=\vec{y}-\vec{\hat{y}}$ so that the GAM removes the density effects.

Spatial Models

Here we will talk about two spatial models that are:

Conditionally Autoregressive Models (CAR)
Simultaneously Autoregressive Models (SAR)

The CAR model uses the spatial matrix $R$ with the neighborhood metric $N(i)$. So that now it becomes convenient to use conditional distributions of $Y_i, i=1,2,...,n$ given all other variables,

\[p(y_i|\vec{y}_{i-1}) = \frac{p(\vec{y})}{p(\vec{y_{-1}})}\]

so that for the neighborhood model, the above is equal to: \[=p(y_i|y_j,j\in N(j)), 1=1,2,...,n.\]

To see this, notice that $y_i$ is only influenced by its neighbors which implies that: \[ \begin{align*} p(y_i|j\in N(i),j\not \in N(i))&=p(y_i|j\in N(i))\\ &=p(y_i|j\in N(i))p(j\in N(i)) \end{align*} \]

This distribution is not compatible for all cases where compatible is defined as:

\[\sum_{i=1}^np(y_i|j\in N(i))p(j\in N(i))=p(y_1,y_2,...,y_n).\]

If the conditional distribution is normal: \[y_i|{y_j \in N(i)} \sim N(\mu_i+\sum_{j\in N(i)}c_{ij}(y_j-\mu_j),\sigma^2)\]

where:

$\mu_j \in \mathbb{R}$.
$c_{ij}\in \mathbb{R}$ specifies the degree of dependence between regions $i$ and $j$. For the neighborhood model, $c_{ij}=1 \iff j\in N(i)$ and for a distance model, $c_{ij}$ is $1/dist(i,j)$ so that further distances have a smaller weight.

We can think of the CAR model as an AR process; meaning that $y_i$ is a weighted function of the distances of its neighbors instead of a weighted function of the previous values.

To finish the discussion on compatibility, let $M_\sigma$ be a diagonal matrix with entries $(M_\sigma)_{ii}=\sigma_i^2, i=1,2,...,n,$ and $C=(c_{ij})\in \mathbb{R}^{n\times n}$ be a proximity or neighborhood matrix with $c_{ij}=1 \iff j\in N(i)$. Then, the CAR model holds: \[\vec{y}\sim N_n(\vec{\mu},(I-C)^{-1}M_\sigma).\]

where:

$\vec{\mu}=(\mu_1,\mu_2,...,\mu_n)^T$
$c_{ij}\sigma_j^2=c_{ij}\sigma_i^2$ ensures that $(I_C)^{-1}M_\sigma$ is symmetric.
$M_\sigma=\sigma^2M$ where $M=(m_{ij})\in \mathbb{R}^{n\times n}$ and M is diagonal.

On page 266, the book states that the intrinsically autoregressive model (IAR) is the most common CAR specification. The IAR model has entries $m_{ii}=n_i^{-1}, i=1,2,...,n,$ and where $n_i:=|N(i)|$ are the number of neighbors and $C=MW$, where $W$ is the adjacency matrix and $C$ is the normalized adjacency matrix. This means that \[C= \left[ \begin{array}{cccc} \frac{1}{n_1} & 0 & ... & 0\\ 0 & \frac{1}{n_2} & 0 & 0\\ 0 & 0 & \ddots & 0\\ 0 & 0 & ... & \frac{1}{n_n} \end{array} W \]

where $W$ is a symmetric matrix with zero on the diagonal.

For IAR models, the joint distribution on $\vec{y}$ is not proper, however, there is a convenient way to remedy this by adding a parmater $\phi\in [0,1]$ to the specification of the spatial dependence matrix $C$. That is, let $C'=\phi MW$. Additionally, the full conditionals are available and thus useful for Markov Chain Monte Carlo methods.

Simultaneously Autoregressive Models

The CAR model implies that \[(I-C)\vec{y} \sim N_n((I-C)\vec{\mu},M_\sigma(I-C)')\] \[\implies \epsilon_{CAR}:=(I-C)(\vec{y}-\vec{\mu}) \sim N_n(\vec{0},M_\sigma(I-C)').\]

so that the variance $\epsilon_{CAR}$ are not independent. The SAR model is defined as:

\[\epsilon_{SAR}:=(I-B)(\vec{y}-\vec{\mu}) \sim N_n(\vec{0},D_T)\]

where:

$t=(t_1,t_2,...,t_n)^T$
$D_T$ is diaglon with $(D_T)_{ii}=t_i^2, i=1,2,...,n$.
$B=(b_{ij})\in \mathbb{R}^{n\times n}$ and $b_{ij}=0$ if $j\not \in N(i)$.

The SAR model can be thought of as a regression of $y_i$ on $y_j$, $j\in N(i)$ and can be written as:

\[\vec{y}=\vec{\mu}+B(\vec{y}-\vec{\mu})+\vec{\epsilon_{SAR}}\] \[\iff \vec{y}=B\vec{y}+(I-B)\vec{\mu}+\vec{\epsilon_{SAR}}\].

This implies we can write the conditional distribution as: \[E[y_i|y_j,j\in N(i)]=\mu_i+\sum_{j\in N(i)}b_{ij}(y_j-\mu_j), i=1,2,...,n.\]

The CAR and SAR models illustrate the autoregressive models for spatial data. In the SAR model, the joint distribution is:

\[\vec{y}\sim N_n(\vec{\mu},(I-B)^{-1}D_T(I-B')^{-1}).\]

and then we can add covariate information by the usual way; that is, set $\mu_i=x_i^Tb$.

Application

The book example estimates average claim size using the CAR and SAR models where both models have the same estimations as discussed previously.

The conditional means $\mu_i$ are specified with an intercept model $\beta_0$, covariate models with intercept $\beta_0$ using logged population density up to order 3.

Traditional insurance loss modeling is of the following form:

\[\frac{\$Loss}{Year}=\frac{#Accidents}{Year}\cdot\frac{\$Loss}{#Accident}.\]

Additionally, although regression can explain most of the variance in the observed claims, there are still unobserved spatial dependence that is not modelled. This is where spatial modeling could be useful; to explain more of the variance after using the covariate information. Bayesian methods are used to estimate the parameters.

For the notation, let $N_1,N_2,...,N_m$ denote the $m$ policyholders in the year for $R$ regions where $r(i)$ is the neighborhood of region $r$.

The number of claims for individual $i$ is: \[N_i \sim Poisson(\mu_i^N), i=1,2,...,m.\]

That is, the conditional expectation of the number of claims for individual $i$ is $E[N_i]=\mu_i$. For the log-linear model, we can specify the mean as:

\[mu_i^N=E_i\exp{x_i^T\beta+\gamma_{r(i)}}.\]

where:

$E_i$ is the exposure.
$x_i=(1,x_{i1},x_{i2},...,x_{ik})^T$.
$\gamma_1,\gamma_2,...,\gamma_R$ are random effects corresponding to the $R$ regions and induce spatial dependence among $N_1,N_2,...,N_m.$

Claim Severity

To model severity, we can model one of two ways which are:

Model each individual claim size $S_{ij}, j=1,2,...,N_i$ for each policyholder $i$ as follows:

\[S_i:=\frac{1}{N_i}\sum_{j=1}^{N_i}S_{ij}, i=1,2,...,m.\]

To make sense of the above function, we have individual severity amounts for each claim ($frequency_j$). This implies that $S_{ij}=S_i, \forall j$ if we assume that the $jth$ claim and severty are independent. We use a Gamma regression model for this.

For the individual claim sizes $S_{ij}, j=1,2,...,N_i$, we assume that they are independently Gamma distributed given the number of claims.

\[S_{ij}|N_i \sim Gamma(\mu_i^S,v), j=1,2,...,N_i, i=1,2,...,m.\]

Parameter Estimation

Parameters are estimated in a Bayesian approach using MCMC techniques.

library(ggmap)

## Warning: package 'ggmap' was built under R version 3.1.2

## Loading required package: ggplot2

## Warning: package 'ggplot2' was built under R version 3.1.2

library(RgoogleMaps)

## Warning: package 'RgoogleMaps' was built under R version 3.1.2

london.center <- getGeoCode("London")
london <- get_map("London", zoom=12)

## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=London&zoom=12&size=%20640x640&scale=%202&maptype=terrain&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=London&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms

x    <- seq(-2,2,0.1)
df   <- expand.grid(x=x,y=x)
f = function(a,b){
   dnorm(a^2+b^2)
}
df$z <- with(df,f(x,y))
df$x <- london.center[2]+df$x/20
df$y <- london.center[1]+df$y/20

ggp <- ggmap(london)+
  geom_tile(data=df,aes(x=x,y=y,fill=z), alpha=0.2)+
  scale_fill_gradientn(guide="none",colours=rev(heat.colors(10)))+
  stat_contour(data=df, aes(x=x, y=y, z=z, color=..level..), geom="path", size=1)+
  scale_color_gradientn(colours=rev(heat.colors(10)))
plot(ggp)

## Warning: Removed 574 rows containing non-finite values (stat_contour).