Spatial Autocorrelation

Disclaimer: The contents of this document come from Chapter 13 Spatial Autocorrelation and the Appendix of that chapter of Intro to GIS and Spatial Analysis(Gimond, 2019). This document is prepared for CP6521 Advanced GIS, a graduate-level city planning elective course at Georgia Tech in Spring 2019. For any question, contact the instructor, Yongsung Lee, Ph.D. via yongsung.lee(at)gatech.edu.

This document is also published on RPubs.

“The first law of geography: Everything is related to everything else, but near things are more related than distant things.” Waldo R. Tobler

Motivation

Now, we switch gears and change our topic from the location of events to the attributes of events.
In fact, a typical use case would be polygons (e.g., counties, tracts) that cover the entire study area.
We want to determine whether similar values of a certain attribute (i.e., per capita income of individual counties) are clustered, randomly distributed, or dispersed across space.

Learning Objectives

Learn how to quantify the degree of spatial clustering (i.e., spatial autocorrelation)
Learn how to test if a computed degree of spatial clustering is statistically significant

13.1 Global Moran’s I

13.1.1 Computing the Moran’s I

Moran’s I measures the correlation between the values of individual polygons and those of their neighbors
If the correlation coefficient is high & positive, we interpret that simlar values are clustered across space
What do we mean by a neighbor? We have a few ways to define the neighbors of a polygon
- spatial contiguity: other polyongs sharing any vertex (point) or edge (boundary line)
- distance: other polygons whose centroids are within a certain distance
- N-closest: the N nearest polygons

After identifying neighbors, calculate the (weighted) average value (e.g., per capita income) of those neighbors for each polygon
Compute the correlation coefficient between two sets of values (i.e., the averages and own values)
Visually examine the slope of their scatter plot

13.1.2 Monte Carlo approach to estimating significance

How can we know a computed correlation coefficient is statistically different from zero (i.e., no spatial autocorrelation)
Two ways of testing
- Analytical approach: with some restrictive assumptions about the data, not always reliable
- Monte Carlo test: simulation with no assumptions
How does the Monte Carlo test work?
1. Randomly redistribute the values of an attribute across space
2. Compute the correlation coefficient
3. Repeat the 1 & 2 steps N times
4. Plot these N correlation coefficients
5. Check where the actual correlation coefficient lies in the distribution of simulated correlation coefficients

Manual calculation of the p-value
If it is small enough, we reject the null of no spatial autocorrelation

13.2 Moran’s I at different lags

Examine the ranges of autocorrelation values as a function of distance
1. Compute lag values for a defined set of neighbors
2. Calculate the Moran’s I value for this set of neighbors
3. Repeat steps 1 and 2 for a different set of neighbors (at a greater distance for example)

13.3 Local Moran’s I

Decompose the global Moran;s I down to its components
A localized measure of autocorrelation (i.e., hot/cold spots)
A Monte Carlo simulation for statistical significance chcek is available

In-Class Exercises

Step 1: Load packages & data

library(sp)
library(spdep)
library(tmap)

load(url("http://github.com/mgimond/Spatial/raw/master/Data/moransI.RData"))

Step 2: Examine the data

class(s1)
str(s1, max.level = 2)
s1@data

tmap_mode("view")
tm_basemap("Stamen.TonerLite") + 
  tm_shape(s1) +
  tm_polygons(style = "quantile", col = "Income", alpha = 0.5) +
  tm_legend(outside = TRUE, text.size = .8)

Step 3: Define neighboring polygons

We can define neighboring polygons by a few different approaches
- contiguous polygons
- polygons within a certain distance band
- non-spatial in nature and defined by social, political, or cultural “neighbors”
When defining neighbors as contiguous polygons, we have two options:
- any polygon that shares at least one vertex by queen=TRUE
- any polygon that shares at least one edge by queen=FALSE

?poly2nb

nb <- poly2nb(s1, queen=TRUE)

nb lists contiguous neighbors for each polygon.
nb itself is a list, meaning that [ will return a sublist, and [[ will return a vector of neigboring polygon IDs.

class(nb) 
str(nb)
summary(nb)

nb[1]
class(nb[1])

nb[[1]]
class(nb[[1]])

Check the links between neigbors

xy <- sp::coordinates(s1) # s1 is an sp object, so we use sp::coordinates
plot(s1, border='grey', lwd=1)
plot(nb, xy, col='red', lwd=1, add=TRUE)

s1 is an sp object, and s1$NAME returns a vector of County names.
To identify the list of neighboring counties for the first county in s1, subset s1$NAME by the vector of its neigboring counties.

s1$NAME
s1$NAME[1]
class(s1$NAME[1])

s1$NAME[nb[[1]]]

## [1] Somerset    Piscataquis Penobscot   Washington 
## 16 Levels: Androscoggin Aroostook Cumberland Franklin Hancock ... York

Step 4: For each polygon, compute the (weighted) average values of its neighbors

nb2listw returns a list of weights for individual counties
style="W" returns weights, which are row-standardized (i.e., for each county, the weights of its neighboring counties add up to one.)
zero.policy=TRUE allows counties with no neighbor (i.e., isolated polygon)

?nb2listw

lw <- nb2listw(nb, style="W", zero.policy=TRUE)

Examine the class, structure, and elements of lw

class(lw)
str(lw)

lw$weights
class(lw$weights)
str(lw$weights)

lw$weights[1]
lw$weights[[1]]

Compute the average neighbor s1$Income value for each polygon (i.e., County)
These values are often referred to as spatially lagged values

?lag.listw

Inc.lag <- lag.listw(lw, s1$Income)

Step 5: Compute the Moran’s I statistic: the hard way

plot lagged income vs. income and fit a linear regression model

# Create a regression model
M <- lm(Inc.lag ~ s1$Income)

# Plot the data
plot(Inc.lag ~ s1$Income, pch=20, asp=1, las=1)
abline(M, col = "red")

The slop of the regression line is the Moran’s I coefficient

coef(M)[2]

To determine if the slope is statistically different from zero (i.e., no spatial autocorrelation):
- randomly permute the income values across all counties
- fit a regression model to each permuted set of values
- the slope values from individual regression models form a distribution of Moran’s I values under no spatial autocorrelation
- compare the observed Moran’s I value to this distribution

n <- 599L   # Define the number of simulations
I.r <- vector(length=n)  # Create an empty vector

for (i in seq_along(I.r)){
  # Randomly shuffle income values
  x <- sample(s1$Income, replace=FALSE)
  # Compute new set of lagged values
  x.lag <- lag.listw(lw, x)
  # Compute the regression slope and store its value
  M.r    <- lm(x.lag ~ x)
  I.r[i] <- coef(M.r)[2]
}

Our observed Moran’s I value is not consistent with a Moran’s I value under the assumption of no spatial autocorrelation

# Plot the histogram of simulated Moran's I values
# then add our observed Moran's I value to the plot
hist(I.r, main=NULL, xlab="Moran's I", las=1)
abline(v=coef(M)[2], col="red")

Computing a pseudo p-value from an MC simulation
First, count the number of simulated Moran’s I values greater than the observed Moran’s I value

N.greater <- sum(coef(M)[2] > I.r)

Find the end of the distribution (i.e., either left or right) closest to the observed Moran’s I value
Divide that count by the total count (so-called one-sided P-value)

p <- min(N.greater + 1, n + 1 - N.greater) / (n + 1)
p # your value might be slightly different from the printed value below b/c of the random simulation

## [1] 0.02666667

Step 6: Computing the Moran’s I statistic: the easy way

With the values s1$Income and the weights list lw, use spdep::moran.test

moran.test(s1$Income,lw)

## 
##  Moran I test under randomisation
## 
## data:  s1$Income  
## weights: lw    
## 
## Moran I statistic standard deviate = 2.2472, p-value = 0.01231
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##        0.28281108       -0.06666667        0.02418480

The p-value computed from the moran.test function is not computed from an MC simulation but analytically instead.
This may not always prove to be the most accurate measure of significance.
To test for significance using the MC simulation method instead, use the moran.mc function.

MC<- moran.mc(s1$Income, lw, nsim=599)
# View results (including p-value)
MC

## 
##  Monte-Carlo simulation of Moran I
## 
## data:  s1$Income 
## weights: lw  
## number of simulations + 1: 600 
## 
## statistic = 0.28281, observed rank = 589, p-value = 0.01833
## alternative hypothesis: greater

# Plot the distribution (note that this is a density plot instead of a histogram)
plot(MC, main="", las=1)

Step 7: Moran’s I as a function of a distance band

Define neighbors based on distance to polygon centers
First, extract the center of each polygon
Note that spdep works on sp objects, whose syntax differs from those of sf objects.
That is, we use sp::coordinates instead of sf::st_centroid

coo <- sp::coordinates(s1)

Next, define a search radius for neighbors
dnearneigh takes on three parameters: the coordinate values coo, the radius for the inner radius of the annulus band, and the radius for the outer annulus band

S.dist <- dnearneigh(coo, 0, 50000) # find neighbors within 50 km 
str(S.dist)

Check the links between neighbors

plot(s1, border='grey', lwd=1)
plot(S.dist, coo, col='red', lwd=1, add=TRUE)

Now, create a new weights list based on the new neighbor list

lw <- nb2listw(S.dist, style="W", zero.policy=T)

Run the Monte Carlo simulation

MI  <-  moran.mc(s1$Income, lw, nsim=599, zero.policy=T)

Plot the results

plot(MI, main = "", las = 1)

Check the p-value and summary statistics

MI

## 
##  Monte-Carlo simulation of Moran I
## 
## data:  s1$Income 
## weights: lw  
## number of simulations + 1: 600 
## 
## statistic = 0.31361, observed rank = 598, p-value = 0.003333
## alternative hypothesis: greater