03.18 Exercise - Interpreting Spatial Lag Models

From Seth's Rpubs: http://rpubs.com/Seth/LagEquibEff

One of the challenges of spatial lag models is that the coefficients reported by R are difficult to interpret. The structure of a spatial lag models implies that a unit change in one areal unit has impacts on other areal units. This brief script illustrates two methods for interpreting “equilibrium effects” (following Ward and Gelditsch).

# (assuming packages already installed)
library(classInt)
## Loading required package: class
## Loading required package: e1071
library(spdep)
## Loading required package: sp
## Loading required package: boot
## Loading required package: Matrix
## Loading required package: lattice
## Attaching package: 'lattice'
## The following object(s) are masked from 'package:boot':
## 
## melanoma
## Loading required package: MASS
## Loading required package: nlme
## Loading required package: maptools
## Loading required package: foreign
## Loading required package: grid
## Checking rgeos availability: FALSE Note: when rgeos is not available,
## polygon geometry computations in maptools depend on gpclib, which has a
## restricted licence. It is disabled by default; to enable gpclib, type
## gpclibPermit()
## Loading required package: deldir
## deldir 0.0-21
## Loading required package: coda
## Loading required package: splines
library(RColorBrewer)
library(gstat)
load("/Users/caitlin/Dropbox/CAITLINS DOCUMENTS/CU Boulder/Courses/GEOG 5023 Quant methods - Spielman/Data/soco.rda")  # .rda is a spatial data frame (data frame = R's language for a data set)

First we have to build a weights matrix, in this case we'll use a row standardized, Queen's contiguity matrix.

soco_nbq <- poly2nb(soco)  #Queen’s neighborhood (nbq) is the default
soco_nbq_w <- nb2listw(soco_nbq)  #Row standardizing a weights matrix

Our goal will be to model the percent of children living in poverty in the southeastern USA.

plot(variogram(soco$PPOV ~ 1, locations = coordinates(soco), data = soco, cloud = F), 
    type = "b", pch = 16, main = "Variogram of PPOV")  # Meters on the x axis??

plot of chunk variogram

X-axis indicates distances between different pairs of counties (in meters?). The point at which the curve starts to flatten out, around 400,000 meter distance between pairs, is where places are completely independent from each other. Counties that are 200,000m apart are fairly similar to one another. [Question: how is semivariance different from Moran's I?]

The basic takeaway of this semivariogram is that we DO see a significant pattern of spatial autocorrelation. In addition, the nugget effect is also substantial. (The size of the nugget effect is an indicator of microscale variability. When counties are 0m apart, the variograph SHOULD go through 0 because they should be exactly the SAME (100% autocorrelation), but when the starting point of the variogram is not at 0, it indicates that counties very close to each other are very different from one another. Like finding a gold nugget in a field of dirt.)

We can also test for spatial autocorrelation by looking at the Moran's I:

moran.mc(soco$PPOV, soco_nbq_w, nsim = 999)
## 
##  Monte-Carlo simulation of Moran's I
## 
## data:  soco$PPOV 
## weights: soco_nbq_w  
## number of simulations + 1: 1000 
##  
## statistic = 0.5893, observed rank = 1000, p-value = 0.001
## alternative hypothesis: greater

The Moran's I is 0.5893, with a p-value indicating that it is significant. In other words, there IS spatial autocorrelation in the data on childhood poverty (PPOV). HOWEVER, we don't yet know whether this spatial pattern could be caused by the spatial patterning of the covariates of poverty. A full analysis would involve fitting an OLS model (weighted by population) and comparing that model to spatial models.

But to move forward: we'll assume that a spatial lag model (for simplicity, we'll fit an unweighted model) is appropriate.

PFHH = % HH headed by female PUNEM = proportion unemployed PBLK = proportion black P65UP = proportion 65 and older

soco_LAG <- lagsarlm(PPOV ~ PFHH + PUNEM + PBLK + P65UP, data = soco, soco_nbq_w)
summary(soco_LAG)
## 
## Call:lagsarlm(formula = PPOV ~ PFHH + PUNEM + PBLK + P65UP, data = soco, 
##     listw = soco_nbq_w)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.2457653 -0.0284360 -0.0028762  0.0262169  0.2374894 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##              Estimate Std. Error  z value  Pr(>|z|)
## (Intercept) -0.100260   0.007375 -13.5946 < 2.2e-16
## PFHH         0.429404   0.040246  10.6695 < 2.2e-16
## PUNEM        1.354637   0.065959  20.5374 < 2.2e-16
## PBLK        -0.069046   0.015335  -4.5025 6.716e-06
## P65UP        0.291192   0.035210   8.2701 2.220e-16
## 
## Rho: 0.5172, LR test value: 491.5, p-value: < 2.22e-16
## Asymptotic standard error: 0.02118
##     z-value: 24.42, p-value: < 2.22e-16
## Wald statistic: 596.3, p-value: < 2.22e-16
## 
## Log likelihood: 2239 for lag model
## ML residual variance (sigma squared): 0.002193, (sigma: 0.04682)
## Number of observations: 1387 
## Number of parameters estimated: 7 
## AIC: -4464, (AIC for lm: -3974)
## LM test for residual autocorrelation
## test value: 4.85, p-value: 0.027651

The results of the model indicate that each of the predictors and the spatial lag are significant. We can evaluate the significance of the spatial lag a variety of ways. The output includes a z-test (“z-value” in the output) based on the standard error of Moran's I 0.51719/.02118=24.41879. It also includes a Likelihood Ratio Test (“LR test”, discussed in class) which is a test of the model with and without the spatial lag. The reported “LR test” suggests that the addition of the lag is an improvement, it is formally equivalent to:

lm1 <- lm(PPOV ~ PFHH + PUNEM + PBLK + P65UP, data = soco)
anova(soco_LAG, lm1)
##          Model df   AIC logLik Test L.Ratio p-value
## soco_LAG     1  7 -4464   2239    1                
## lm1          2  6 -3974   1993    2     491       0

Tests suggest that the lag is a useful addition but it makes reading the rest of the model output difficult. In a spatial lag model a change in one place cascades through the entire map. This can be easily seen by changing the values in one location and examining the changes elsewhere. We'll do this by altering one independent (predictor) variable, in one county and then using the fitted spatial model to predict the values (PPOV) elsewhere on the map. For example, what would happen to Poverty in the Southeast if the unemployment rate rose from 6% to 75% in Jefferson County, Alabama.

soco.new <- soco  #copy the data frame (so we don't mess up the original)

# Change the unemployment rate
soco.new@data[soco.new@data$CNTY_ST == "Jefferson County  AL", "PUNEM"] <- 0.75

# The original predicted values
orig.pred <- as.data.frame(predict(soco_LAG))

# The predicted values with the new unemployment rate in Alabama
new.pred <- as.data.frame(predict(soco_LAG, newdata = soco.new, listw = soco_nbq_w))

# the difference between the predicted values
jCoAL_effect <- new.pred$fit - orig.pred$fit
el <- data.frame(name = soco$CNTY_ST, diff_in_pred_PPOV = jCoAL_effect)
soco.new$jee <- el$diff_in_pred_PPOV

# sort counties by absolute value of the change in predicted PPOV
el <- el[rev(order(abs(el$diff_in_pred_PPOV))), ]
el[1:10, ]  #show the top 10 counties
##                       name diff_in_pred_PPOV
## 37    Jefferson County  AL           0.99079
## 4          Bibb County  AL           0.11637
## 58    St. Clair County  AL           0.11165
## 5        Blount County  AL           0.10881
## 64       Walker County  AL           0.10738
## 59       Shelby County  AL           0.10516
## 63   Tuscaloosa County  AL           0.09641
## 1050    El Paso County  TX          -0.08629
## 1197     Sutton County  TX          -0.08108
## 437         Lee County  KY          -0.07285

We see that the biggest change was in the count we messed with (no surprise), this is the direct effect of our change. However, when we look at the 10 counties that experiences the biggest change el[1:10,] we also see that counties as far away as Kentucky experienced substantial change in their predicted poverty rate. It's also interesting to note that these changes in predicted PPOV are both positive and negative.

Mapping these changes is also instructive. We can do this several ways:

breaks <- c(min(soco.new$jee), -0.03, 0.03, max(soco.new$jee))
labels <- c("Negative effect (greater than .03)", "No Effect (effect -.03 to .03)", 
    "Positive effect (greater than .03)")

np <- findInterval(soco.new$jee, breaks)
colors <- c("red", "yellow", "blue")

# Draw Map
plot(soco.new, col = colors[np])
mtext("Effects of a change in Jefferson County, AL (set PUNEM = .75)\n on predicted values in a spatial lag model", 
    side = 3, line = 1)
legend("topleft", legend = labels, fill = colors, bty = "n")

plot of chunk unnamed-chunk-3

We could also map the magnitude of the changes caused by messing with Jefferson County, AL.

pal5 <- brewer.pal(6, "Spectral")
cats5 <- classIntervals(soco.new$jee, n = 5, style = "jenks")
colors5 <- findColours(cats5, pal5)
plot(soco.new, col = colors5)
legend("topleft", legend = round(cats5$brks, 2), fill = pal5, bty = "n")
mtext("Effects of a change in Jefferson County, AL (set PUNEM = .75)\n on predicted values in a spatial lag model", 
    side = 3, line = 1)

plot of chunk unnamed-chunk-4

While these maps and the effects for individual counties may be useful they only tell us about a change in one place. The impacts() function gives us something like OLS regression coefficients for a spatial lag model. The logic of the impacts() function is similar to the code above, it tells you the direct (local), indirect (spill-over), and total effect of a unit change in each of the predictor variables. The changes reported by impacts are the global average impact:

impacts(soco_LAG, listw = soco_nbq_w)  #takes a minute or two
## Impact measures (lag, exact):
##         Direct Indirect   Total
## PFHH   0.45695  0.43243  0.8894
## PUNEM  1.44154  1.36419  2.8057
## PBLK  -0.07348 -0.06953 -0.1430
## P65UP  0.30987  0.29325  0.6031

Our model deals with rates, so a unit change is big! The output from impacts tells us that a 100% increase in unemployment leads to a 280% (on average) increase in the childhood poverty rate.