This article highlights some issues with spatial lag models identified in this paper by Melanie Wall - involving some highly counterintuitive properties of the correlations between the dependent variables in different zones that occur in some situations. A key approach to investigating this issue here is visual exploration, and working through the R code presented here will demonstrate a number of (I think) useful techniques. You will need to have the packages spdep and maptools installed to run through the exercise.
First, make sure the packages have been loaded:
library(maptools)
library(spdep)
For this exercise you will look at the columbus data - as seen, for example, in this text. Typing in the following will load the shapefile of neighbourhoods in Columbus, Ohio:
columbus <- readShapePoly(system.file("etc/shapes/columbus.shp",
package="spdep")[1])
First, take a basic look at the zones from the Columbus, Ohio neighbourhood data.
# Create a plot of columbus
plot(columbus,col='wheat')
# Add labels for each of the zones
text(coordinates(columbus),as.character(1:49),cex=0.8)
# The box just makes things look neater
box(which='outer',lwd=2)
As stated above, this data set has been used in a number of studies. For each neighbourhood, a number of attributes are provided, including Average House Price, Burglary Rate and Average Income. However, these will not be considered here, as the focus will be on the correlation structure implied by the adjacency matrix.
Next, we need to consider the adjacency matrix for these zones. The adjacency matrix plays an important role in a spatially lagged autoregressive model (SAR). This is a matrix \(W\) whose element \(w_{ij}\) is zero if zones \(i\) and \(j\) are not adjacent and some positive value if they are. I have been deliberately vague about the value of \(w_{ij}\) for adjacent zones, as there are several options. There are also several options in terms of specifying the definition of zonal adjacency. Two possibilities that are commonly used are the Rook's case and Queen's case adjacency - the former requiring a shared boundary arc between two zones, the latter additionally allowing zones meeting at a single point to be deemed adjacent. Both of these can be computed from the columbus SpatialPolygonsDataFrame object.
# Extract a 'queen's case' adjacency object and print it out
col.queen.nb <- poly2nb(columbus,queen=TRUE)
col.queen.nb
## Neighbour list object:
## Number of regions: 49
## Number of nonzero links: 236
## Percentage nonzero weights: 9.829238
## Average number of links: 4.816327
# Extract a 'rook's case' adjacency object and print it out
col.rook.nb <- poly2nb(columbus,queen=FALSE)
col.rook.nb
## Neighbour list object:
## Number of regions: 49
## Number of nonzero links: 200
## Percentage nonzero weights: 8.329863
## Average number of links: 4.081633
The two variables col.queen.nb and col.rook.nb contain the adjacency information for the Queen's and Rook's case adjacency respectively. It can be seen that the Queen's case has 36 more adjacencies than the Rook's case. In addition, the original data used in the references above actually used yet another definition of adjacency (the rationale behind this choice is less clear, but it appears to be the Queen's case minus 6 adjacencies).
# Read in the original adjacencies and print out the object
col.gal.nb <- read.gal(system.file("etc/weights/columbus.gal",
package="spdep")[1])
col.gal.nb
## Neighbour list object:
## Number of regions: 49
## Number of nonzero links: 230
## Percentage nonzero weights: 9.579342
## Average number of links: 4.693878
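As a quick numerical check of the counts quoted above (a minimal sketch, not part of the original analysis), spdep's card function gives the number of neighbours of each zone, and a simple comparison of the neighbour lists confirms how the three definitions nest:
# Compare total link counts: expect 36 extra Queen's case links relative to
# the Rook's case, and 6 extra relative to the GAL file
sum(card(col.queen.nb)) - sum(card(col.rook.nb))
sum(card(col.queen.nb)) - sum(card(col.gal.nb))
# TRUE if every Rook's case neighbour also appears in the GAL list, and
# every GAL neighbour appears in the Queen's case list
all(mapply(function(a,b) all(a %in% b), col.rook.nb, col.gal.nb))
all(mapply(function(a,b) all(a %in% b), col.gal.nb, col.queen.nb))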
The differences between these three adjacency specifications can be seen if they are drawn on the map of Columbus. It turns out that the adjacencies in col.rook.nb are a subset of those in col.gal.nb, and these in turn are a subset of those in col.queen.nb. For this reason, a way of visualising them is to draw the largest set first, then the next largest, and then the smallest, so that the edges shared between definitions are overplotted by the smaller sets. Here, a connecting edge is drawn between the centroids of two zones if they are adjacent.
plot(columbus,col='lightgrey')
# Queen's case adjacencies in blue (drawn first)
plot(col.queen.nb,coords=coordinates(columbus),
add=T,col='blue',lwd=3)
# Original (GAL file) adjacencies in yellow
plot(col.gal.nb,coords=coordinates(columbus),
add=T,col='yellow',lwd=3)
# Rook's case adjacencies in red (drawn last)
plot(col.rook.nb,coords=coordinates(columbus),
add=T,col='red',lwd=3)
box(which='outer',lwd=2)
Here, the red edges correspond to the Rook's case adjacencies, which are present in all three definitions. The yellow edges are those in the original (GAL) adjacency definition but not in the Rook's case, and the blue edges are those present only in the Queen's case.
You may notice that there appear to be only half as many edges in the plots as the numbers suggested in the earlier print-outs - for example, there are only three blue edges, when the output above states that the Queen's case has six more links than the original adjacency. This is because some definitions of adjacency are asymmetric - for example, another often-used definition states that zone \(B\) is adjacent to zone \(A\) if zone \(B\) is one of the \(k\) nearest neighbours of zone \(A\) - however, this does not imply that zone \(A\) is one of the \(k\) nearest neighbours of zone \(B\), and if it is not, the adjacency relation is asymmetric. Thus, adjacency objects record both whether \(A\) is adjacent to \(B\) and whether \(B\) is adjacent to \(A\). Since the definitions used here are all symmetric, each edge gets counted `both ways', so the print-outs of the adjacency objects show twice the number of visible edges.
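As an aside, this kind of asymmetry is easy to demonstrate with spdep's k-nearest-neighbour functions. The following is a minimal sketch (not used elsewhere in this exercise, and using an arbitrary choice of k = 4):
# A k-nearest-neighbour adjacency need not be symmetric, unlike the
# contiguity-based definitions used above
col.knn.nb <- knn2nb(knearneigh(coordinates(columbus),k=4))
is.symmetric.nb(col.knn.nb)   # generally FALSE for k-nearest-neighbour adjacency
is.symmetric.nb(col.rook.nb)  # TRUE - contiguity is a symmetric relation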
Earlier it was stated that although non-adjacent zone pairs \(i\) and \(j\) always correspond to a \(w_{ij}\) value of zero in the adjacency matrix, there are several alternative definitions for \(w_{ij}\) when zones \(i\) and \(j\) are adjacent. Possibly the simplest is to set \(w_{ij} = 1\) - so that \(W\) is an indicator matrix. However, another common approach is to set \(w_{ij} = d_i^{-1}\), where \(d_i\) is the number of zones adjacent to zone \(i\) - then \(\sum_{j} w_{ij} = 1\), that is, each row of \(W\) sums to 1. This approach will be used here.
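As a concrete check of this row-standardisation (a minimal sketch, using the nb2listw and listw2mat functions that also appear later), the following builds \(W\) for the Rook's case adjacency and confirms that each row sums to one:
# style="W" in nb2listw gives the row-standardised weights w_ij = 1/d_i
W.rook <- listw2mat(nb2listw(col.rook.nb,style="W"))
range(rowSums(W.rook))   # both values should be exactly 1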
One reason for using this row-standardised form is the specification of a SAR (Spatial Autoregressive Model) - where the dependent variable \(y_i\) is associated with zone \(i\) and is modelled as a function of explanatory variables \(x_{ij}\) and the adjacency matrix by
\[ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_m x_{im} + \lambda \sum_j{w_{ij}y_j} + \varepsilon_i \]
where the \(\varepsilon_i\) are independent Gaussian random variables with mean zero and variance \(\sigma^2\),
or, in vector notation
\[ Y = X\beta + \lambda WY + \varepsilon \]
where \(\varepsilon\) is a Gaussian random vector with mean zero and variance-covariance matrix \(\sigma^2 I\). Effectively, this is a standard regression model, but with one extra term (\(\lambda WY\)) which allows the values of the dependent variable in zones adjacent to a given zone to have some impact on the value in that zone. For example, if we were modelling house prices, this assumes that, as well as the characteristics of a zone affecting the average house price in that zone, prices would also be influenced by house prices in nearby areas. This also justifies the choice of \(d_{i}^{-1}\) for the non-zero values in the adjacency matrix - with this choice, element \(i\) of \(WY\) is the average of the dependent variable over the zones adjacent to zone \(i\).
Normally, the way such models are dealt with is to calibrate the values of \(\beta\), \(\lambda\) and \(\sigma^2\) using maximum likelihood approaches. This also provides the theory needed to test hypotheses such as \(H_0 : \lambda = 0\) against \(H_1 : \lambda \ne 0\). That is useful since the null hypothesis corresponds to the standard linear regression model - that is, we are testing the null hypothesis of no spatial pattern against an alternative where a particular form of spatial pattern is present. However, Melanie Wall's paper suggests we should think quite carefully about this particular alternative.
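As an aside (not pursued further here), such a model can be fitted by maximum likelihood using the lagsarlm function - originally in spdep, and found in the spatialreg package in more recent versions. The following is a hedged sketch, assuming the columbus attribute table contains the CRIME, INC and HOVAL columns usually supplied with this data set:
# Fit the spatially lagged model by maximum likelihood; the summary output
# includes a likelihood-based test of the spatial lag coefficient
col.lw <- nb2listw(col.rook.nb,style="W")
sar.mod <- lagsarlm(CRIME ~ INC + HOVAL,data=columbus@data,listw=col.lw)
summary(sar.mod)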
We can re-arrange the model equation above to
\[ Y = (I - \lambda W)^{-1} X\beta + (I - \lambda W)^{-1} \varepsilon \]
(provided \(I - \lambda W\) is invertible). Since the only random part of the right hand side is the term in \(\varepsilon\), it can be seen that
\[ \textrm{Var}(Y) = (I - \lambda W)^{-1}\left[ (I - \lambda W)^{-1} \right]^T \sigma^2 \]
Thus, the spatial autoregressive model is essentially a regression model with non-independent error terms, unless \(\lambda=0\) in which case it is equivalent to an OLS regression model.
The variance-covariance matrix is therefore a function of the variables \(W\), \(\sigma^2\) and \(\lambda\). Without loss of generality, we can assume that \(Y\) is scaled so that \(\sigma^2 = 1\). Then, for any given definition of adjacency for the study area, it is possible to investigate the correlation structure for various values of \(\lambda\). In R, the following defines a function to compute a variance-covariance matrix from \(\lambda\) and \(W\). Here, the adjacency object is used (rather than supplying a \(W\) matrix), but this contains the same information.
covmat <- function(lambda,adj) {
# W is the row-standardised weights matrix derived from the adjacency object;
# the result is the inverse of (I - lambda W)(I - lambda W)^T, taking sigma^2 = 1
solve(tcrossprod(diag(length(adj)) - lambda*listw2mat(nb2listw(adj))))
}
This can also be used as the basis for finding the correlation matrix (rather than the variance-covariance matrix).
cormat <- function(lambda,adj) {
# Rescale the variance-covariance matrix to a correlation matrix
cov2cor(covmat(lambda,adj))
}
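As a quick sanity check (a minimal sketch), when \(\lambda = 0\) the correlation matrix should reduce to the identity - the ordinary regression case noted above:
# With lambda = 0 there is no spatial dependence, so the correlation matrix
# should equal the identity matrix
all.equal(cormat(0,col.rook.nb),diag(length(col.rook.nb)),check.attributes=FALSE)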
… and this is where it gets interesting. We can now examine the relationship between \(\lambda\) and, say, the correlation between zones 41 and 47.
# Create a range of valid lambda values
lambda.range <- seq(-1.3,0.99,l=101)
# Create an array to store the corresponding correlations
cor.41.47 <- lambda.range*0
# ... store them
for (i in 1:101) cor.41.47[i] <- cormat(lambda.range[i],col.rook.nb)[41,47]
# ... plot the relationship
plot(lambda.range,cor.41.47,type='l')
This seems reasonable - basically, larger values of \(\lambda\) lead to higher correlation between the zones, \(\lambda=0\) implies no correlation, and the sign of \(\lambda\) implies the sign of the correlation. However, now consider the same curve, but between zones 40 and 41.
# First, plot the correlation between zones 41 and 47 as a 'backdrop'
plot(lambda.range,cor.41.47,type='l',xlab=expression(lambda),ylab='Correlation')
# Now compute the correlation between zones 40 and 41.
cor.40.41 <- lambda.range*0
for (i in 1:101) cor.40.41[i] <- cormat(lambda.range[i],col.rook.nb)[40,41]
# ... and add these to the plot
lines(lambda.range,cor.40.41,lty=2)
Here, something strange is happening. When \(\lambda\) drops below around -0.5 the correlation between zones 40 and 41 begins to increase, and at around -0.7 it becomes positive again. This is somewhat counter-intuitive, particularly as \(\lambda\) is often referred to as an indicator of spatial association. For example, this publication by Keith Ord states that \(w_{ij}\) 'represents the degree of possible interaction of location j on location i', and Cressie's 1993 book^1 refers to \(\lambda W\) as a `spatial dependence matrix'. Also, although initially for positive \(\lambda\) the correlation between zones 40 and 41 is less than that for zones 41 and 47, when \(\lambda\) exceeds around 0.5 the situation is reversed (although this is a less pronounced effect than the sign change noted earlier). A useful diagnostic plot is a parametric curve of the two correlations, with parameter \(\lambda\):
# First, plot the empty canvas (type='n')
plot(c(-1,1),c(-1,1),type='n',xlim=c(-1,1),ylim=c(-1,1),xlab='Corr1',ylab='Corr2')
# Then the quadrants
rect(-1.2,-1.2,1.2,1.2,col='pink',border=NA)
rect(-1.2,-1.2,0,0,col='lightyellow',border=NA)
rect(0,0,1.2,1.2,col='lightyellow',border=NA)
# Then the x=y reference line
abline(a=0,b=1,lty=3)
# Then the curve
lines(cor.40.41,cor.41.47)
I call this a battenburg plot. Tracing along this line from the top right shows the relationship between the two correlations as \(\lambda\) decreases from its maximum limiting value of 1. The dotted line is the \(x=y\) reference line - whenever the curve crosses this, the values of the two correlations change order. Perhaps the key feature is that the curve 'doubles back' on itself - so that for some ranges of \(\lambda\) one of the correlations increases while the other decreases. The quadrants are also important - if a curve enters one of the pink quadrants this indicates that one of the correlations is positive while the other is negative. Again, this is perhaps counter-intuitive, given the interpretation of \(\lambda\) as a measure of spatial association. Note that in this case, after the 'doubling back' of the curve, it does enter the pink quadrant.
A selection of 100 random pairs of correlations (chosen so that each pair has one zone in common) can be drawn (see below). This seems to suggest that 'doubling back' and curves entering the pink quadrants are not uncommon problems. In addition, for positive \(\lambda\) values there is a fair degree of variation in the correlations for any given value of \(\lambda\). This variation is not consistent across pairs, so the ordering of the correlations changes frequently as \(\lambda\) varies.
# First, plot the empty canvas (type='n')
plot(c(-1,1),c(-1,1),type='n',xlim=c(-1,1),ylim=c(-1,1),xlab='Corr1',ylab='Corr2')
# Then the quadrants
rect(-1.2,-1.2,1.2,1.2,col='pink',border=NA)
rect(-1.2,-1.2,0,0,col='lightyellow',border=NA)
rect(0,0,1.2,1.2,col='lightyellow',border=NA)
# Then the x=y reference line
abline(a=0,b=1,lty=3)
# Then the curves
# First, set a seed for reproducibility
set.seed(310712)
for (i in 1:100) {
r1 <- sample(1:length(col.rook.nb),1)
r2 <- sample(col.rook.nb[[r1]],2)
cor.ij1 <- lambda.range*0
cor.ij2 <- lambda.range*0
for (k in 1:101) cor.ij1[k] <- cormat(lambda.range[k],col.rook.nb)[r1,r2[1]]
for (k in 1:101) cor.ij2[k] <- cormat(lambda.range[k],col.rook.nb)[r1,r2[2]]
lines(cor.ij1,cor.ij2)
}
This can also be seen when looking directly at the relationship between \(\lambda\) and individual correlations:
# Create another blank canvas
plot(lambda.range,cor.41.47,type='n',ylim=c(-1,1),xlab=expression(lambda),ylab='Correlation')
# Now plot lambda against correlation - reset RNG seed to get the same correlations as before
set.seed(310712)
for (i in 1:100) {
r1 <- sample(1:length(col.rook.nb),1)
r2 <- sample(col.rook.nb[[r1]],2)
cor.ij1 <- lambda.range*0
cor.ij2 <- lambda.range*0
for (k in 1:101) cor.ij1[k] <- cormat(lambda.range[k],col.rook.nb)[r1,r2[1]]
for (k in 1:101) cor.ij2[k] <- cormat(lambda.range[k],col.rook.nb)[r1,r2[2]]
lines(lambda.range,cor.ij1)
lines(lambda.range,cor.ij2)
}
This shows a pattern very similar to those seen in Wall's paper. Essentially, for negative \(\lambda\) values some correlations become positive while others remain negative. The ordering can also change as \(\lambda\) changes - as noted earlier, some adjacent zone pairs are more strongly correlated than others for certain \(\lambda\) values, but this ordering can reverse elsewhere in the range. Finally, some adjacent zone pairs experience a sign change for negative values of \(\lambda\), whilst others do not.
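One way to quantify this last point (a minimal sketch, reusing the cormat function with an arbitrary choice of \(\lambda = -0.9\)) is to tabulate the signs of the correlations across all adjacent Rook's case zone pairs:
# Build a two-column matrix of (i,j) indices for every adjacent zone pair
adj.pairs <- cbind(rep(1:length(col.rook.nb),card(col.rook.nb)),
unlist(col.rook.nb))
# Count how many adjacent pairs have positive correlation at lambda = -0.9
table(cormat(-0.9,col.rook.nb)[adj.pairs] > 0)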
As well as the counterintuitive behaviour of the relationship between the correlations and \(\lambda\), it is also useful to investigate the effects of using different definitions of adjacency. Until now we have been looking at the Rook's case, but one could equally look at the Queen's case, or the adjacencies supplied as data relating to Columbus with the spdep package. For a quick look at this, here are the battenburg plots for the correlations between zones 40 and 41, and between zones 41 and 47, for the three types of adjacency.
# First, plot the empty canvas (type='n')
plot(c(-1,1),c(-1,1),type='n',xlim=c(-1,1),ylim=c(-1,1),xlab='Corr1',ylab='Corr2')
# Then the quadrants
rect(-1.2,-1.2,1.2,1.2,col='pink',border=NA)
rect(-1.2,-1.2,0,0,col='lightyellow',border=NA)
rect(0,0,1.2,1.2,col='lightyellow',border=NA)
# Then the x=y reference line
abline(a=0,b=1)
# Then the curves -- There are six sets of vectors to produce
cor.rook.41.47 <- lambda.range * 0
cor.queen.41.47 <- lambda.range * 0
cor.gal.41.47 <- lambda.range * 0
cor.rook.40.41 <- lambda.range * 0
cor.queen.40.41 <- lambda.range * 0
cor.gal.40.41 <- lambda.range * 0
# This loop calculates the values for the curves
for (k in 1:101) {
cor.rook.40.41[k] <- cormat(lambda.range[k],col.rook.nb)[40,41]
cor.queen.40.41[k] <- cormat(lambda.range[k],col.queen.nb)[40,41]
cor.gal.40.41[k] <- cormat(lambda.range[k],col.gal.nb)[40,41]
cor.rook.41.47[k] <- cormat(lambda.range[k],col.rook.nb)[41,47]
cor.queen.41.47[k] <- cormat(lambda.range[k],col.queen.nb)[41,47]
cor.gal.41.47[k] <- cormat(lambda.range[k],col.gal.nb)[41,47]
}
# This draws them
lines(cor.gal.40.41,cor.gal.41.47,lty=2)
lines(cor.rook.40.41,cor.rook.41.47,lty=3)
lines(cor.queen.40.41,cor.queen.41.47,lty=4)
legend(0.77,0.73,c('GAL','Rook','Queen'),lty=2:4)
In this case, there is not a great deal of sensitivity to the definition of contiguity - although unfortunately this implies that the behaviour of the correlations with \(\lambda\) is counter-intuitive for all of the definitions of adjacency used here.
There are a number of other aspects that could be explored:

* Sensitivity to changes in the definition of the non-zero \(w_{ij}\) elements in the contiguity matrix - here \(d_i^{-1}\) was used, but an alternative might be to use a 1/0 indicator for adjacency (see the sketch after this list).
* Variation in the variance of the \(y_i\)'s - noting that the leading diagonal of the variance-covariance matrix does not consist of equal values.
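For the first of these, a hedged sketch of how the covariance calculation above might be adapted - the only change is the style argument passed to nb2listw (noting that with binary weights the admissible range of \(\lambda\) also differs from the row-standardised case). The second point can then be inspected directly from the leading diagonal of either version of the matrix:
# A variant of covmat using binary (1/0) weights rather than row-standardised ones
covmat.bin <- function(lambda,adj) {
solve(tcrossprod(diag(length(adj)) - lambda*listw2mat(nb2listw(adj,style="B"))))
}
# The zone-by-zone variances are not constant, e.g. for the row-standardised case
range(diag(covmat(0.7,col.rook.nb)))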
The aim of this article has been to highlight the issues in Melanie Wall’s paper, but also to suggest some visual techniques in R that could be used to explore these - and identify situations in which the counter-intuitive behaviour seen here may be occurring. As a general rule, I have found this not to happen a great deal when working with zones based on a regular grid, but that the problems seen here occur quite often for irregular lattices. This provides empirical back-up to the more theoretical arguments of Besag and Kooperberg for another form of spatial regression model (Conditional Autoregressive) set out here. As James LeSage observes on page 3 of this document “All connectivity results come into play through the matrix inversion process” - this is true, but careful attention needs to be paid to the nature of these results.