Disclaimer: The contents of this document come from Chapter 13 Spatial Autocorrelation and the Appendix of that chapter of Intro to GIS and Spatial Analysis(Gimond, 2019). This document is prepared for CP6521 Advanced GIS, a graduate-level city planning elective course at Georgia Tech in Spring 2019. For any question, contact the instructor, Yongsung Lee, Ph.D. via yongsung.lee(at)gatech.edu.
This document is also published on RPubs.

“The first law of geography: Everything is related to everything else, but near things are more related than distant things.” Waldo R. Tobler

Motivation

Learning Objectives

13.1 Global Moran’s I

13.1.1 Computing the Moran’s I

  • Moran’s I measures the correlation between the values of individual polygons and those of their neighbors
  • If the correlation coefficient is high & positive, we interpret that simlar values are clustered across space
  • What do we mean by a neighbor? We have a few ways to define the neighbors of a polygon
    • spatial contiguity: other polyongs sharing any vertex (point) or edge (boundary line)
    • distance: other polygons whose centroids are within a certain distance
    • N-closest: the N nearest polygons

  • After identifying neighbors, calculate the (weighted) average value (e.g., per capita income) of those neighbors for each polygon
  • Compute the correlation coefficient between two sets of values (i.e., the averages and own values)
  • Visually examine the slope of their scatter plot

13.1.2 Monte Carlo approach to estimating significance

  • How can we know a computed correlation coefficient is statistically different from zero (i.e., no spatial autocorrelation)
  • Two ways of testing
    • Analytical approach: with some restrictive assumptions about the data, not always reliable
    • Monte Carlo test: simulation with no assumptions
  • How does the Monte Carlo test work?
    1. Randomly redistribute the values of an attribute across space
    2. Compute the correlation coefficient
    3. Repeat the 1 & 2 steps N times
    4. Plot these N correlation coefficients
    5. Check where the actual correlation coefficient lies in the distribution of simulated correlation coefficients
  • Manual calculation of the p-value
  • If it is small enough, we reject the null of no spatial autocorrelation

13.2 Moran’s I at different lags

13.3 Local Moran’s I

In-Class Exercises

Step 1: Load packages & data

library(sp)
library(spdep)
library(tmap)
load(url("http://github.com/mgimond/Spatial/raw/master/Data/moransI.RData"))

Step 2: Examine the data

class(s1)
str(s1, max.level = 2)
s1@data
tmap_mode("view")
tm_basemap("Stamen.TonerLite") + 
  tm_shape(s1) +
  tm_polygons(style = "quantile", col = "Income", alpha = 0.5) +
  tm_legend(outside = TRUE, text.size = .8)

Step 3: Define neighboring polygons

?poly2nb
nb <- poly2nb(s1, queen=TRUE)
class(nb) 
str(nb)
summary(nb)

nb[1]
class(nb[1])

nb[[1]]
class(nb[[1]])
xy <- sp::coordinates(s1) # s1 is an sp object, so we use sp::coordinates
plot(s1, border='grey', lwd=1)
plot(nb, xy, col='red', lwd=1, add=TRUE)

s1$NAME
s1$NAME[1]
class(s1$NAME[1])
s1$NAME[nb[[1]]]
## [1] Somerset    Piscataquis Penobscot   Washington 
## 16 Levels: Androscoggin Aroostook Cumberland Franklin Hancock ... York

Step 4: For each polygon, compute the (weighted) average values of its neighbors

?nb2listw
lw <- nb2listw(nb, style="W", zero.policy=TRUE)
class(lw)
str(lw)

lw$weights
class(lw$weights)
str(lw$weights)

lw$weights[1]
lw$weights[[1]]
?lag.listw
Inc.lag <- lag.listw(lw, s1$Income)

Step 5: Compute the Moran’s I statistic: the hard way

# Create a regression model
M <- lm(Inc.lag ~ s1$Income)

# Plot the data
plot(Inc.lag ~ s1$Income, pch=20, asp=1, las=1)
abline(M, col = "red")

coef(M)[2]
n <- 599L   # Define the number of simulations
I.r <- vector(length=n)  # Create an empty vector

for (i in seq_along(I.r)){
  # Randomly shuffle income values
  x <- sample(s1$Income, replace=FALSE)
  # Compute new set of lagged values
  x.lag <- lag.listw(lw, x)
  # Compute the regression slope and store its value
  M.r    <- lm(x.lag ~ x)
  I.r[i] <- coef(M.r)[2]
}
# Plot the histogram of simulated Moran's I values
# then add our observed Moran's I value to the plot
hist(I.r, main=NULL, xlab="Moran's I", las=1)
abline(v=coef(M)[2], col="red")

N.greater <- sum(coef(M)[2] > I.r)
p <- min(N.greater + 1, n + 1 - N.greater) / (n + 1)
p # your value might be slightly different from the printed value below b/c of the random simulation 
## [1] 0.02666667

Step 6: Computing the Moran’s I statistic: the easy way

moran.test(s1$Income,lw)
## 
##  Moran I test under randomisation
## 
## data:  s1$Income  
## weights: lw    
## 
## Moran I statistic standard deviate = 2.2472, p-value = 0.01231
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##        0.28281108       -0.06666667        0.02418480
MC<- moran.mc(s1$Income, lw, nsim=599)
# View results (including p-value)
MC
## 
##  Monte-Carlo simulation of Moran I
## 
## data:  s1$Income 
## weights: lw  
## number of simulations + 1: 600 
## 
## statistic = 0.28281, observed rank = 589, p-value = 0.01833
## alternative hypothesis: greater
# Plot the distribution (note that this is a density plot instead of a histogram)
plot(MC, main="", las=1)

Step 7: Moran’s I as a function of a distance band

coo <- sp::coordinates(s1)
S.dist <- dnearneigh(coo, 0, 50000) # find neighbors within 50 km 
str(S.dist)
plot(s1, border='grey', lwd=1)
plot(S.dist, coo, col='red', lwd=1, add=TRUE)

lw <- nb2listw(S.dist, style="W", zero.policy=T) 
MI  <-  moran.mc(s1$Income, lw, nsim=599, zero.policy=T) 
plot(MI, main = "", las = 1)

MI 
## 
##  Monte-Carlo simulation of Moran I
## 
## data:  s1$Income 
## weights: lw  
## number of simulations + 1: 600 
## 
## statistic = 0.31361, observed rank = 598, p-value = 0.003333
## alternative hypothesis: greater