Interpreting Spatial Lag Models

One of the challenges of spatial lag models is that the coefficients reported by R are difficult to interpret. The structure of a spatial lag models implies that a unit change in one areal unit has impacts on other areal units. This brief script illustrates two methods for interpreting “equilibrium effects” (following Ward and Gelditsch).

############ LOAD LIBS
library(classInt)
library(spdep)
library(RColorBrewer)
library(gstat)

############ LOAD DATA
load("/Users/Seth/Dropbox/GEOG 5023/GEOG 5023 - Spring 2013/Data/soco.rda")  #spatial data frame

First we have to build a weights matrix, in this case we'll use a row standardized, Queen's contiguity matrix.

soco_nbq <- poly2nb(soco)  #Queen’s neighborhood
soco_nbq_w <- nb2listw(soco_nbq)

Our goal will be to model the percent of children living in poverty in the southeastern USA. This variable shows significant spatial auto-correlation, but the nugget effect is also substantial.

plot(variogram(soco$PPOV ~ 1, locations = coordinates(soco), data = soco, cloud = F), 
    type = "b", pch = 16, main = "Variogram of PPOV")

plot of chunk unnamed-chunk-3

The Moran's I also indicates significant spatial autocorrelation in poverty:

moran.mc(soco$PPOV, soco_nbq_w, nsim = 999)

## 
##  Monte-Carlo simulation of Moran's I
## 
## data:  soco$PPOV 
## weights: soco_nbq_w  
## number of simulations + 1: 1000 
##  
## statistic = 0.5893, observed rank = 1000, p-value = 0.001
## alternative hypothesis: greater

This variogram and Moran's I indicate that there is spatial autocorrelation in childhood poverty (PPOV) but this spatial pattern could be cause by the spatial patterning of the covariates of poverty. A full analysis would involve fitting an OLS model (weighted by population) and comparing that model to spatial models. Here, we'll assume that spatial lag model is appropriate (I'll fit an unweighted model for simplicity).

soco_LAG <- lagsarlm(PPOV ~ PFHH + PUNEM + PBLK + P65UP, data = soco, soco_nbq_w)
summary(soco_LAG)

## 
## Call:lagsarlm(formula = PPOV ~ PFHH + PUNEM + PBLK + P65UP, data = soco, 
##     listw = soco_nbq_w)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.2457653 -0.0284360 -0.0028762  0.0262169  0.2374894 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##              Estimate Std. Error  z value  Pr(>|z|)
## (Intercept) -0.100260   0.007375 -13.5946 < 2.2e-16
## PFHH         0.429404   0.040246  10.6695 < 2.2e-16
## PUNEM        1.354637   0.065959  20.5374 < 2.2e-16
## PBLK        -0.069046   0.015335  -4.5025 6.716e-06
## P65UP        0.291192   0.035210   8.2701 2.220e-16
## 
## Rho: 0.5172, LR test value: 491.5, p-value: < 2.22e-16
## Asymptotic standard error: 0.02118
##     z-value: 24.42, p-value: < 2.22e-16
## Wald statistic: 596.3, p-value: < 2.22e-16
## 
## Log likelihood: 2239 for lag model
## ML residual variance (sigma squared): 0.002193, (sigma: 0.04682)
## Number of observations: 1387 
## Number of parameters estimated: 7 
## AIC: -4464, (AIC for lm: -3974)
## LM test for residual autocorrelation
## test value: 4.85, p-value: 0.027651

The results of the model indicate that each of the predictors and the spatial lag are significant. We can evaluate the significance of the spatial lag a variety of ways. The output includes a z-test (“z-value” in the output) based on the standard error of Moran's I 0.51719/.02118=24.41879. It also includes a Likelihood Ratio Test (“LR test”, discussed in class) which is a test of the model with and without the spatial lag. The reported “LR test” suggests that the addition of the lag is an improvement, it is formally equivalent to:

lm1 <- lm(PPOV ~ PFHH + PUNEM + PBLK + P65UP, data = soco)
anova(soco_LAG, lm1)

##          Model df   AIC logLik Test L.Ratio p-value
## soco_LAG     1  7 -4464   2239    1                
## lm1          2  6 -3974   1993    2     491       0

Tests suggest that the lag is a useful addition but it makes reading the rest of the model output difficult. In a spatial lag model a change in one place cascades through the entire map. This can be easily seen by changing the values in one location and examining the changes elsewhere. We'll do this by altering one independent (predictor) variable, in one county and then using the fitted spatial model to predict the values (PPOV) elsewhere on the map. For example, what would happen to Poverty in the Southeast if the unemployment rate rose from 6% to 75% in Jefferson County, Alabama.

soco.new <- soco  #copy the data frame (so we don't mess up the original)

# Change the unemployment rate
soco.new@data[soco.new@data$CNTY_ST == "Jefferson County  AL", "PUNEM"] <- 0.75

# The original predicted values
orig.pred <- as.data.frame(predict(soco_LAG))

# The predicted values with the new unemployment rate in Alabama
new.pred <- as.data.frame(predict(soco_LAG, newdata = soco.new, listw = soco_nbq_w))

# the difference between the predicted values
jCoAL_effect <- new.pred$fit - orig.pred$fit
el <- data.frame(name = soco$CNTY_ST, diff_in_pred_PPOV = jCoAL_effect)
soco.new$jee <- el$diff_in_pred_PPOV

# sort counties by absolute value of the change in predicted PPOV
el <- el[rev(order(abs(el$diff_in_pred_PPOV))), ]
el[1:10, ]  #show the top 10 counties

##                       name diff_in_pred_PPOV
## 37    Jefferson County  AL           0.99079
## 4          Bibb County  AL           0.11637
## 58    St. Clair County  AL           0.11165
## 5        Blount County  AL           0.10881
## 64       Walker County  AL           0.10738
## 59       Shelby County  AL           0.10516
## 63   Tuscaloosa County  AL           0.09641
## 1050    El Paso County  TX          -0.08629
## 1197     Sutton County  TX          -0.08108
## 437         Lee County  KY          -0.07285

We see that the biggest change was in the count we messed with (no surprise), this is the direct effect of our change. However, when we look at the 10 counties that experiences the biggest change el[1:10,] we also see that counties as far away as Kentucky experienced substantial change in their predicted poverty rate. It's also interesting to note that these changes in predicted PPOV are both positive and negative.

Mapping these changes is also instructive. We can do this several ways:

breaks <- c(min(soco.new$jee), -0.03, 0.03, max(soco.new$jee))
labels <- c("Negative effect (greater than .03)", "No Effect (effect -.03 to .03)", 
    "Positive effect (greater than .03)")


np <- findInterval(soco.new$jee, breaks)
colors <- c("red", "yellow", "blue")

# Draw Map
plot(soco.new, col = colors[np])
mtext("Effects of a change in Jefferson County, AL (set PUNEM = .75)\n on predicted values in a spatial lag model", 
    side = 3, line = 1)
legend("topleft", legend = labels, fill = colors, bty = "n")

plot of chunk unnamed-chunk-8

We could also map the magnitude of the changes caused by messing with Jefferson County, AL.

pal5 <- brewer.pal(6, "Spectral")
cats5 <- classIntervals(soco.new$jee, n = 5, style = "jenks")
colors5 <- findColours(cats5, pal5)
plot(soco.new, col = colors5)
legend("topleft", legend = round(cats5$brks, 2), fill = pal5, bty = "n")
mtext("Effects of a change in Jefferson County, AL (set PUNEM = .75)\n on predicted values in a spatial lag model", 
    side = 3, line = 1)

plot of chunk unnamed-chunk-9

While these maps and the effects for individual counties may be useful they only tell us about a change in one place. The impacts() function gives us something like OLS regression coefficients for a spatial lag model. The logic of the impacts() function is similar to the code above, it tells you the direct (local), indirect (spill-over), and total effect of a unit change in each of the predictor variables. The changes reported by impacts are the global average impact:

impacts(soco_LAG, listw = soco_nbq_w)  #takes a minute or two

## Impact measures (lag, exact):
##         Direct Indirect   Total
## PFHH   0.45695  0.43243  0.8894
## PUNEM  1.44154  1.36419  2.8057
## PBLK  -0.07348 -0.06953 -0.1430
## P65UP  0.30987  0.29325  0.6031

Our model deals with rates, so a unit change is big! The output from impacts tells us that a 100% increase in unemployment leads to a 280% (on average) increase in the childhood poverty rate.