Lecture Content

Note: This document was prepared for Spatial Socioeconometric Modeling; the materials build on Gimond (2017), Tobler (1970), and Anselin (2020).

1 Spatial autocorrelation

Once again, we will be studying the first law of geography:

“Everything is related to everything else, but near things are more related than distant things” (Tobler (1970))

Spatial vs random features, source Gimond (2017)

Loan Interest Distribution, do you see any patterns?

In terms of slopes (Gimond (2017))
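The slope reading can be checked on a toy example. When the weights are row-standardised and every region has at least one neighbour, Moran's I equals the slope of the regression of the spatial lag on the variable itself. The sketch below uses base R only; the five-region chain and the values in `x` are hypothetical and are not taken from the lecture data.

```r
# Toy example: 5 regions on a line, each adjacent to its immediate neighbours.
x <- c(2, 3, 5, 8, 9)                 # hypothetical attribute values
n <- length(x)

# Binary contiguity matrix (1 = the two regions share a border).
A <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  A[i, i + 1] <- 1
  A[i + 1, i] <- 1
}

# Row-standardise so that each row sums to 1.
W <- A / rowSums(A)

# Spatial lag: the weighted average of each region's neighbours.
lag_x <- as.vector(W %*% x)

# Slope of the regression of the lag on the original variable.
slope <- unname(coef(lm(lag_x ~ x))[2])

# Moran's I computed directly from deviations from the mean.
z <- x - mean(x)
moran_i <- sum(z * as.vector(W %*% z)) / sum(z^2)

print(c(slope = slope, moran_i = moran_i))
```

The two printed numbers agree up to floating-point error, which is why a scatterplot of the lag against the variable is a natural way to visualise Moran's I.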

In terms of spatial weights we have
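For reference, the standard textbook form of the global Moran's I written with spatial weights is given below. The symbols \(w_{ij}\) (weight between regions \(i\) and \(j\)), \(\bar{x}\) (sample mean) and \(S_0\) (sum of all weights) are notation introduced here, not taken from the original figure.

\[
I = \frac{n}{S_0} \cdot \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad S_0 = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}
\]

With row-standardised weights and no region without neighbours, each row of weights sums to one, so \(S_0 = n\) and the leading factor drops out.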

1.0.1 Application

We will rely on the following data

# Data 2014-2018 ACS estimates at the tract level in New York state
a<-read.csv("G:\\My Drive\\Phudcfily\\Syllabus PHUDCFILY\\GitHub PHUDCFILY\\SSEM\\Spatially constrained multivariate clustering\\Materials\\concentrated_disadvantage_tract.csv")

Then we join data as before

require(plyr)
## Loading required package: plyr
library(spdep)
## Loading required package: sp
## Loading required package: spData
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
## Loading required package: sf
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(rgdal)
## rgdal: version: 1.5-16, (SVN revision 1050)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.0.4, released 2020/01/28
## Path to GDAL shared files: C:/Users/msgc/Documents/R/win-library/4.0/rgdal/gdal
## GDAL binary built with GEOS: TRUE 
## Loaded PROJ runtime: Rel. 6.3.1, February 10th, 2020, [PJ_VERSION: 631]
## Path to PROJ shared files: C:/Users/msgc/Documents/R/win-library/4.0/rgdal/proj
## Linking to sp version:1.4-2
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading rgdal.
library(maptools)
## Checking rgeos availability: TRUE
library(stringr)
library(tigris)
## To enable 
## caching of data, set `options(tigris_use_cache = TRUE)` in your R script or .Rprofile.
options(tigris_use_cache = TRUE)

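# Download the New York census tract polygons as an sp object
trc<-tracts("NY", class="sp")
## Warning in proj4string(obj): CRS object has comment, which is lost in output
# Left-pad tract codes with zeros to six characters so they match TRACTCE
a$tract<-str_pad(a$tract, 6, pad = "0")
# Left-join the ACS attributes onto the tract polygons
trc<-geo_join(trc, a, by_sp="TRACTCE", by_df="tract", how = "left")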
head(trc)
## class       : SpatialPolygonsDataFrame 
## features    : 6 
## extent      : -73.87468, -73.78314, 40.70717, 40.74916  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=NAD83 +no_defs 
## variables   : 21
## names       : STATEFP, COUNTYFP, TRACTCE,       GEOID, NAME,         NAMELSAD, MTFCC, FUNCSTAT,  ALAND, AWATER,    INTPTLAT,     INTPTLON,  tract,   pct_single_mother, pct_african_american, ... 
## min values  :      36,      081,  044800, 36081044800,  448, Census Tract 448, G5020,        S, 129758,      0, +40.7098547, -073.7869958, 044800, 0.00770712909441233,                    0, ... 
## max values  :      36,      081,  046500, 36081046500,  465, Census Tract 465, G5020,        S, 249611,      0, +40.7469665, -073.8710900, 046500,   0.260406582768635,    0.128636622932116, ...
# Keep only tracts with non-missing values on the variables of interest
trc<-trc[!is.na(trc$pct10kbelow)&!is.na(trc$pct_single_mother),]
trc<-trc[!is.na(trc$pct_unemployed_looking),]
summary(trc)
## Object of class SpatialPolygonsDataFrame
## Coordinates:
##         min       max
## x -79.76214 -71.84771
## y  40.49117  45.01586
## Is projected: FALSE 
## proj4string : [+proj=longlat +datum=NAD83 +no_defs]
## Data attributes:
##    STATEFP            COUNTYFP           TRACTCE             GEOID          
##  Length:3928        Length:3928        Length:3928        Length:3928       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##      NAME             NAMELSAD            MTFCC             FUNCSTAT        
##  Length:3928        Length:3928        Length:3928        Length:3928       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##     ALAND              AWATER            INTPTLAT           INTPTLON        
##  Length:3928        Length:3928        Length:3928        Length:3928       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##     tract           pct_single_mother pct_african_american
##  Length:3928        Min.   :0.00000   Min.   :0.00000     
##  Class :character   1st Qu.:0.08003   1st Qu.:0.01241     
##  Mode  :character   Median :0.12036   Median :0.04430     
##                     Mean   :0.14630   Mean   :0.17320     
##                     3rd Qu.:0.18057   3rd Qu.:0.20586     
##                     Max.   :0.61111   Max.   :0.98843     
##                                                           
##  pct_unemployed_looking pct_family5children      zip           pct_eitc      
##  Min.   :0.00000        Min.   :0.00000     Min.   : 1002   Min.   :0.00000  
##  1st Qu.:0.03468        1st Qu.:0.03796     1st Qu.: 3108   1st Qu.:0.07895  
##  Median :0.05360        Median :0.06421     Median :10010   Median :0.12601  
##  Mean   :0.06110        Mean   :0.07715     Mean   : 7992   Mean   :0.15607  
##  3rd Qu.:0.07893        3rd Qu.:0.09838     3rd Qu.:11373   3rd Qu.:0.22530  
##  Max.   :0.42857        Max.   :0.71383     Max.   :55308   Max.   :0.46271  
##                         NA's   :3                                            
##  dependent_density  pct10kbelow    
##  Min.   :0.02027   Min.   :0.1081  
##  1st Qu.:0.44670   1st Qu.:0.2578  
##  Median :0.52261   Median :0.2995  
##  Mean   :0.53420   Mean   :0.3195  
##  3rd Qu.:0.61470   3rd Qu.:0.3746  
##  Max.   :2.00316   Max.   :0.6452  
## 
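# Build the contiguity-based neighbour list from the tract polygons
ny.nb <- poly2nb(trc)
# Convert to spatial weights; zero.policy=TRUE tolerates tracts without neighbours
ny.listw<-nb2listw(ny.nb, zero.policy=TRUE)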
summary(ny.nb)
## Neighbour list object:
## Number of regions: 3928 
## Number of nonzero links: 20874 
## Percentage nonzero weights: 0.1352891 
## Average number of links: 5.314155 
## 34 regions with no links:
## 299 301 526 606 642 706 793 853 869 1283 1287 1444 1450 1457 1479 1497 1723 1796 1956 2037 3119 3404 3870 3905 3909 4063 4101 4182 4432 4449 4477 4796 4873 4878
## Link number distribution:
## 
##   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  26  29 
##  34  92 201 384 609 795 750 586 276 107  55  23   6   4   1   1   1   1   1   1 
## 92 least connected regions:
## 22 50 99 121 335 338 350 355 482 506 537 560 598 611 612 659 661 686 754 791 827 896 977 980 981 1009 1016 1087 1126 1194 1228 1365 1464 1481 1641 1671 1691 1692 1719 1788 1839 1873 2245 2285 2287 2290 2334 2385 2450 2453 2500 2507 2533 2536 2621 2783 2788 2891 3051 3092 3159 3176 3267 3466 3480 3504 3515 3555 3604 3663 3669 3691 3826 3885 3904 3910 3939 3989 4013 4099 4108 4164 4183 4187 4258 4307 4318 4368 4472 4510 4551 4787 with 1 link
## 1 most connected region:
## 1555 with 29 links
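# Spatial lag of pct10kbelow: the weighted average over each tract's neighbours
trc@data$lag.pct10kbelow <- lag.listw(ny.listw, trc@data$pct10kbelow)
## Warning in lag.listw(ny.listw, trc@data$pct10kbelow): NAs in lagged values
# Regress the lag on the variable; lm() drops the tracts whose lag is NA
M1 <- lm(trc@data$lag.pct10kbelow ~ trc@data$pct10kbelow)
# Scatterplot of lag against variable, with the fitted line in blue
plot( trc@data$lag.pct10kbelow ~ trc@data$pct10kbelow, pch=20, asp=1, las=1)
abline(M1, col="blue")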

1.1 Local Moran’s I

  • We can decompose the global Moran’s I into its components, thus constructing a localized measure of autocorrelation, i.e. a map of “hot spots” and “cold spots”.
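The decomposition can be sketched in base R. The five-region chain and the values in `x` are hypothetical. The scaling by `m2` (sum of squared deviations over n) is one common convention and an assumption of this sketch. For real data, spdep provides `localmoran()`.

```r
# Toy example: 5 regions on a line with row-standardised contiguity weights.
x <- c(2, 3, 5, 8, 9)                 # hypothetical attribute values
n <- length(x)
A <- matrix(0, n, n)
A[cbind(1:(n - 1), 2:n)] <- 1         # link each region to the next one
A <- A + t(A)                         # make the adjacency symmetric
W <- A / rowSums(A)                   # each row sums to 1

z <- x - mean(x)                      # deviations from the mean
lag_z <- as.vector(W %*% z)           # neighbours' average deviation
m2 <- sum(z^2) / n                    # scaling term (assumed convention)

# Local Moran's I: one value per region.
local_i <- z * lag_z / m2

# Global Moran's I for the same weights.
global_i <- sum(z * lag_z) / sum(z^2)

print(round(local_i, 3))
print(c(mean_local = mean(local_i), global = global_i))
```

With this scaling the local values average to the global statistic. Regions whose own deviation and neighbours' deviation share a sign (high-high or low-low) get a positive local value.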
moran.plot(trc$pct10kbelow, ny.listw, zero.policy = TRUE)

moran.test(trc$pct10kbelow, ny.listw, zero.policy = TRUE)
## 
##  Moran I test under randomisation
## 
## data:  trc$pct10kbelow  
## weights: ny.listw  n reduced by no-neighbour observations
##   
## 
## Moran I statistic standard deviate = 32.164, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      0.3360493119     -0.0002568713      0.0001093270
  • The expectation of Moran’s \(I\), i.e. the value of \(I\) under a spatially random process, is \(E[I] = \frac{-1}{n-1}\), where \(n\) is the number of regions used in the test.

  • The standard deviate is \(\frac{I - E[I]}{\sqrt{\mathrm{Var}(I)}}\)
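As a worked check, the two formulas can be applied to the numbers printed above. The sketch assumes that the \(n\) used by the test is the 3928 regions minus the 34 regions with no links, which is how the "n reduced by no-neighbour observations" note is read here. The variable names are illustrative.

```r
# Values copied from the neighbour summary and the moran.test output.
n_total   <- 3928                # regions in the neighbour list
n_islands <- 34                  # regions with no links
n <- n_total - n_islands         # assumed effective sample size

moran_i <- 0.3360493119          # reported Moran I statistic
var_i   <- 0.0001093270          # reported variance

# Expectation under spatial randomness: -1 / (n - 1).
expected_i <- -1 / (n - 1)

# Standard deviate: (I - E[I]) / sqrt(Var(I)).
z_score <- (moran_i - expected_i) / sqrt(var_i)

# One-sided p-value for the alternative "greater".
p_value <- pnorm(z_score, lower.tail = FALSE)

print(c(expected_i = expected_i, z_score = z_score, p_value = p_value))
```

Up to rounding of the printed inputs, the results reproduce the reported expectation and standard deviate.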

  • How bad is the dependence problem?

# Examine Moran's I between observations and their lags over several orders.
ny.correlogram <- sp.correlogram(ny.listw$neighbours, trc$pct10kbelow, order = 20, method = "I", zero.policy = TRUE)
# plot() dispatches to the plotting method for the resulting correlogram object
plot(ny.correlogram, xlab = "Spatial lags", main = "Spatial correlogram: Autocorrelation CIs")
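To see what a correlogram measures, the base R sketch below recomputes Moran's I at lag orders 1 to 3. It uses a hypothetical six-region chain with a smooth trend in `x`. A region's order-k neighbours are taken to be the regions exactly k steps away along the chain. The helper `moran_at_lag()` is illustrative and is not an spdep function.

```r
# Toy example: 6 regions on a line with a smooth trend in the attribute.
x <- 1:6                                       # hypothetical attribute values
n <- length(x)
z <- x - mean(x)

# Number of steps between every pair of regions along the chain.
d <- abs(outer(seq_len(n), seq_len(n), "-"))

# Moran's I using only neighbours that are exactly k steps away.
moran_at_lag <- function(k) {
  A <- (d == k) * 1                            # order-k adjacency matrix
  W <- A / rowSums(A)                          # row-standardise
  sum(z * as.vector(W %*% z)) / sum(z^2)
}

correlogram <- sapply(1:3, moran_at_lag)
print(round(correlogram, 3))
```

In this toy case the autocorrelation falls as the lag order grows, the pattern Tobler's law leads us to expect: nearby regions resemble each other more than distant ones.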

References

Anselin, Luc. 2020. Spatial Data Science. University of Chicago Center for Spatial Data Science. https://spatialanalysis.github.io/tutorials/.

Gimond, Manuel. 2017. “Intro to GIS and Spatial Analysis.” https://mgimond.github.io/Spatial/index.html.

Tobler, Waldo R. 1970. “A Computer Movie Simulating Urban Growth in the Detroit Region.” Economic Geography 46 (sup1): 234–40. http://www.geog.ucsb.edu/~tobler/publications/pdf_docs/A-Computer-Movie.pdf.