Suggested citation:

Mendez C. (2020). Spatial autocorrelation analysis in R. R Studio/RPubs. Available at https://rpubs.com/quarcs-lab/spatial-autocorrelation

This work is licensed under the Creative Commons Attribution-Share Alike 4.0 International License.

Acknowledgment:

Material adapted from multiple sources, in particular the course materials of Guy Lansley & James Cheshire (2016).

1 Libraries

knitr::opts_chunk$set(echo = TRUE)

library(tidyverse)  # Modern data science workflow
library(sf)
library(sp)
library(spdep)
library(rgdal)
library(rgeos)
library(tmap)
library(tmaptools)
library(spgwr)
library(grid)
library(gridExtra)


# Change the presentation of decimal numbers to 4 and avoid scientific notation
options(prompt="R> ", digits=4, scipen=999)

2 Motivation

Waldo Tober’s first law of geography is that “Everything is related to everything else, but near things are more related than distant things.” So we would expect most geographic phenomena to exert a spatial autocorrelation of some kind. In population data this is often the case as persons with similar characteristics tend to reside in similar neighbourhoods due to a range of reasons including house prices, proximity to workplaces and cultural factors.

3 Tutorial objectives

  • Run various measures of spatial correlation
  • Evaluate both global and local measures of spatial autocorrelation
  • Identify spatial clusters

4 Replication files

6 Import datasets

6.1 Non-spatial data

Census.Data <-read.csv("practicaldata.csv")
glimpse(Census.Data)
## Rows: 749
## Columns: 5
## $ OA            <chr> "E00004120", "E00004121", "E00004122", "E00004123", "E00…
## $ White_British <dbl> 42.36, 47.20, 40.68, 49.66, 51.14, 41.42, 48.54, 48.68, …
## $ Low_Occupancy <dbl> 6.2937, 5.9322, 2.9126, 0.9259, 2.0000, 3.9326, 5.5556, …
## $ Unemployed    <dbl> 1.8939, 2.6882, 1.2121, 2.8037, 3.8168, 3.8462, 4.5455, …
## $ Qualification <dbl> 73.63, 69.90, 67.58, 60.78, 65.99, 74.21, 62.45, 60.35, …

6.2 Spatial data

Output.Areas <- readOGR(".", "Camden_oa11")
## Warning in OGRSpatialRef(dsn, layer, morphFromESRI = morphFromESRI, dumpSRS =
## dumpSRS, : Discarded datum OSGB_1936 in Proj4 definition: +proj=tmerc +lat_0=49
## +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/carlos/GitHub/tutorial-spatial-autocorrelation", layer: "Camden_oa11"
## with 749 features
## It has 1 fields
# Run it in console (long output)
glimpse(Output.Areas)

Using the sf package

Output.Areas2 <- read_sf("Camden_oa11.shp")
glimpse(Output.Areas2)
## Rows: 749
## Columns: 2
## $ OA11CD   <chr> "E00004527", "E00004525", "E00004522", "E00004287", "E0000420…
## $ geometry <POLYGON [m]> POLYGON ((530648 181230, 53..., POLYGON ((530511 1815…

7 Merge datasets

OA.Census <- merge(Output.Areas, Census.Data, by.x="OA11CD", by.y="OA")

Transform it into a sf object

OA.Census_sf <- st_as_sf(OA.Census)

8 Spatial distribution

tm_shape(OA.Census_sf) + 
  tm_fill("Qualification",
          palette = "Reds", 
          style = "quantile", 
          title = "% with a Qualification") +
  tm_borders(alpha=.4)  

9 Neightbour structure

9.1 Find queen neighbours

neighbours <- poly2nb(OA.Census)
neighbours
## Neighbour list object:
## Number of regions: 749 
## Number of nonzero links: 4342 
## Percentage nonzero weights: 0.774 
## Average number of links: 5.797
neighbours_sf <- poly2nb(OA.Census_sf)
neighbours_sf
## Neighbour list object:
## Number of regions: 749 
## Number of nonzero links: 4342 
## Percentage nonzero weights: 0.774 
## Average number of links: 5.797

9.3 Find rook neighbours

neighbours2 <- poly2nb(OA.Census, queen = FALSE)
neighbours2
## Neighbour list object:
## Number of regions: 749 
## Number of nonzero links: 4176 
## Percentage nonzero weights: 0.7444 
## Average number of links: 5.575

9.4 Plot rook neighbours

plot(OA.Census, border = 'lightgrey')
plot(neighbours2, coordinates(OA.Census), add=TRUE, col='blue')

9.5 Plot queen vs rook

plot(OA.Census, border = 'lightgrey')
plot(neighbours, coordinates(OA.Census), add=TRUE, col='red')
plot(neighbours2, coordinates(OA.Census), add=TRUE, col='blue')

10 Global spatial autocorrelation

10.1 Define weights matrix

listw <- nb2listw(neighbours2)
listw
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 749 
## Number of nonzero links: 4176 
## Percentage nonzero weights: 0.7444 
## Average number of links: 5.575 
## 
## Weights style: W 
## Weights constants summary:
##     n     nn  S0    S1   S2
## W 749 561001 749 285.4 3114

10.2 Global Moran test

The correlation score is between -1 and 1. Much like a correlation coefficient, 1 determines perfect positive spatial autocorrelation (so our data is clustered), 0 identifies the data is randomly distributed and -1 represents negative spatial autocorrelation (so dissimilar values are next to each other).

globalMoran <- moran.test(OA.Census$Qualification, listw)
globalMoran
## 
##  Moran I test under randomisation
## 
## data:  OA.Census$Qualification  
## weights: listw    
## 
## Moran I statistic standard deviate = 24, p-value <0.0000000000000002
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##         0.5448699        -0.0013369         0.0005056
globalMoran[["estimate"]][["Moran I statistic"]]
## [1] 0.5449
globalMoran[["p.value"]]
## [1] 0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001188

The Moran I statistic is 0.54, we can, therefore, determine that there our qualification variable is positively autocorrelated in Camden. In other words, the data does spatially cluster. We can also consider the p-value as a measure of the statistical significance of the model.

11 Local spatial autocorrelation

11.1 Moran scatterplot

moran <- moran.plot(OA.Census$Qualification, listw = nb2listw(neighbours2, style = "W"))

11.2 Compute local Moran

local <- localmoran(x = OA.Census$Qualification, listw = nb2listw(neighbours2, style = "W"))

By considering the help page for the localmoran function (run ?localmoran in R console) we can observe the arguments and outputs. We get a number of useful statistics from the model which are as defined:

  • Ii: local moran statistic

  • E.Ii: expectation of local moran statistic

  • Var.Ii: variance of local moran statistic

  • Z.Ii: standard deviate of local moran statistic

  • Pr(): p-value of local moran statistic

11.3 Plot local Moran

  • A map the local moran statistic (Ii).

A positive value for Ii indicates that the unit is surrounded by units with similar values.

# binds results to our polygon shapefile
moran.map <- cbind(OA.Census, local)
tm_shape(moran.map) +
  tm_fill(col = "Ii",
          style = "quantile",
          title = "local moran statistic") 
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Variable(s) "Ii" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

11.4 Plot LISA clusters

quadrant <- vector(mode="numeric",length=nrow(local))

# centers the variable of interest around its mean
m.qualification <- OA.Census$Qualification - mean(OA.Census$Qualification)     

# centers the local Moran's around the mean
m.local <- local[,1] - mean(local[,1])    

# significance threshold
signif <- 0.1 

# builds a data quadrant
quadrant[m.qualification >0 & m.local>0] <- 4  
quadrant[m.qualification <0 & m.local<0] <- 1      
quadrant[m.qualification <0 & m.local>0] <- 2
quadrant[m.qualification >0 & m.local<0] <- 3
quadrant[local[,5]>signif] <- 0   

# plot in r
brks <- c(0,1,2,3,4)
colors <- c("white","blue",rgb(0,0,1,alpha=0.4),rgb(1,0,0,alpha=0.4),"red")
plot(OA.Census,border="lightgray",col=colors[findInterval(quadrant,brks,all.inside=FALSE)])
box()
legend("bottomleft", legend = c("insignificant","low-low","low-high","high-low","high-high"),
       fill=colors,bty="n")

12 Getis-Ord approach

The Getis-Ord Gi Statistic looks at neighbours within a defined proximity to identify where either high or low values cluster spatially. Here statistically significant hot-spots are recognised as areas of high values where other areas within a neighbourhood range also share high values too.

First, we need to define a new set of neighbours. Whilst the spatial autocorrection considered units which shared borders, for Getis-Ord we are defining neighbours based on proximity. The example below shows the results where we have a search radius of 250 metres.

However, here a search radius of just 250 metres fails to define nearest neighbours for some areas so we will need to set the radius as 800 metres or more for our model in Camden.

  • creates centroid and joins neighbours within 0 and 800 units
nb <- dnearneigh(coordinates(OA.Census), 0, 800)
  • creates listw
nb_lw <- nb2listw(nb, style = 'B')

12.1 Plot data and neighbours

plot(OA.Census, border = 'lightgrey')
plot(nb, coordinates(OA.Census), add=TRUE, col = 'red')

12.2 Getis-Ord Gi statistic

NOTE: On some machines the cbind may not work with a spatial data file, in this case, you will need to change OA.Census to so that R knows which part of the spatial data file to join. If you take this approach the subsequent column ordering may be different to what is shown in the example below.

local_g <- localG(OA.Census$Qualification, nb_lw)
local_g <- cbind(OA.Census, as.matrix(local_g))
names(local_g)[6] <- "gstat"

12.3 Cluster map

tm_shape(local_g) + 
  tm_fill("gstat", 
          palette = "RdBu",
          style = "pretty") +
  tm_borders(alpha=.4)
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Variable(s) "gstat" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

The Gi Statistic is represented as a Z-score. Greater values represent a greater intensity of clustering and the direction (positive or negative) indicates high or low clusters. The final map should indicate the location of hot-spots across Camden. Repeat this for another variable.

---
title: "Spatial autocorrelation analysis in R"
author: "Carlos Mendez"
output:
  html_document:
    code_download: true
    df_print: paged
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
    toc_depth: 4
    number_sections: true
    code_folding: "show"
    theme: "cosmo"
    highlight: "monochrome"
  html_notebook:
    code_folding: show
    highlight: monochrome
    number_sections: yes
    theme: cosmo
    toc: yes
    toc_depth: 4
    toc_float:
      collapsed: no
      smooth_scroll: no
  pdf_document: default
  word_document: default
bibliography: biblio.bib
---

<style>
h1.title {font-size: 18pt; color: DarkBlue;} 
body, h1, h2, h3, h4 {font-family: "Palatino", serif;}
body {font-size: 12pt;}
/* Headers */
h1,h2,h3,h4,h5,h6{font-size: 14pt; color: #00008B;}
body {color: #333333;}
a, a:hover {color: #8B3A62;}
pre {font-size: 12px;}
</style>

Suggested citation: 

> Mendez C. (2020).  Spatial autocorrelation analysis in R. R Studio/RPubs. Available at <https://rpubs.com/quarcs-lab/spatial-autocorrelation>

This work is licensed under the Creative Commons Attribution-Share Alike 4.0 International License. 
![](License.png)

Acknowledgment:

Material adapted from multiple sources, in particular the course materials of [Guy Lansley & James Cheshire (2016)](https://data.cdrc.ac.uk/tutorial/an-introduction-to-spatial-data-analysis-and-visualisation-in-r).

# Libraries

```{r message=FALSE, warning=FALSE}
knitr::opts_chunk$set(echo = TRUE)

library(tidyverse)  # Modern data science workflow
library(sf)
library(sp)
library(spdep)
library(rgdal)
library(rgeos)
library(tmap)
library(tmaptools)
library(spgwr)
library(grid)
library(gridExtra)


# Change the presentation of decimal numbers to 4 and avoid scientific notation
options(prompt="R> ", digits=4, scipen=999)
```

# Motivation

Waldo Tober’s first law of geography is that “Everything is related to everything else, but near things are more related than distant things.” So we would expect most geographic phenomena to exert a spatial autocorrelation of some kind. In population data this is often the case as persons with similar characteristics tend to reside in similar neighbourhoods due to a range of reasons including house prices, proximity to workplaces and cultural factors.


# Tutorial objectives

- Run various measures of spatial correlation
- Evaluate both global and local measures of spatial autocorrelation
- Identify spatial clusters

# Replication files

- All necessary files can be downloaded from  [Guy Lansley & James Cheshire (2016)](https://data.cdrc.ac.uk/tutorial/an-introduction-to-spatial-data-analysis-and-visualisation-in-r).

- If you are a member of the [QuaRCS lab](https://quarcs-lab.org), you can run this tutorial in [R Studio Cloud](https://rstudio.cloud/project/961530) and access the files in the following [Github Repository](https://github.com/quarcs-lab/tutorial-spatial-autocorrelation)


# Preliminary material

- [Moran's I](https://youtu.be/qWQ1Pa5cbtw)

- [Interpretations of Moran's I](https://youtu.be/_J_bmWmOF3I)

- [Moran Scatter Plot](https://youtu.be/ZfOc2Hn-GFc)

- [Geary's C](https://youtu.be/EKIKEeAw0W8)

- [Local Spatial Autocorrelation Clusters](https://youtu.be/gXovtSGNz2w)

- [LISA Principle](https://youtu.be/GvkgJWrNMyI)

- [Local Moran](https://youtu.be/_SmNHt4r79k)

- [Local G Statistics](https://youtu.be/urfsjGo-XXc)

- [Issues and Interpretation](https://youtu.be/zjYEIhOtDTs)


# Import datasets


## Non-spatial data

```{r}
Census.Data <-read.csv("practicaldata.csv")
```

```{r}
glimpse(Census.Data)
```


## Spatial data

```{r}
Output.Areas <- readOGR(".", "Camden_oa11")
```

```{r eval=FALSE}
# Run it in console (long output)
glimpse(Output.Areas)
```

Using the `sf` package

```{r}
Output.Areas2 <- read_sf("Camden_oa11.shp")
```

```{r}
glimpse(Output.Areas2)
```

# Merge datasets

```{r}
OA.Census <- merge(Output.Areas, Census.Data, by.x="OA11CD", by.y="OA")
```

Transform it into a `sf` object

```{r}
OA.Census_sf <- st_as_sf(OA.Census)
```

# Spatial distribution

```{r}
tm_shape(OA.Census_sf) + 
  tm_fill("Qualification",
          palette = "Reds", 
          style = "quantile", 
          title = "% with a Qualification") +
  tm_borders(alpha=.4)  
```

# Neightbour structure

## Find queen neighbours


```{r}
neighbours <- poly2nb(OA.Census)
neighbours
```

```{r}
neighbours_sf <- poly2nb(OA.Census_sf)
neighbours_sf
```


## Plot queen neighbours links 

```{r}
plot(OA.Census, border = 'lightgrey')
plot(neighbours, coordinates(OA.Census), add=TRUE, col='red')
```

## Find rook neighbours

```{r}
neighbours2 <- poly2nb(OA.Census, queen = FALSE)
neighbours2
```

## Plot rook neighbours

```{r}
plot(OA.Census, border = 'lightgrey')
plot(neighbours2, coordinates(OA.Census), add=TRUE, col='blue')
```


## Plot queen vs rook

```{r}
plot(OA.Census, border = 'lightgrey')
plot(neighbours, coordinates(OA.Census), add=TRUE, col='red')
plot(neighbours2, coordinates(OA.Census), add=TRUE, col='blue')
```


# Global spatial autocorrelation

## Define weights matrix

```{r}
listw <- nb2listw(neighbours2)
listw
```

## Global Moran test

The correlation score is between -1 and 1. Much like a correlation coefficient, 1 determines perfect positive spatial autocorrelation (so our data is clustered), 0 identifies the data is randomly distributed and -1 represents negative spatial autocorrelation (so dissimilar values are next to each other).


```{r}
globalMoran <- moran.test(OA.Census$Qualification, listw)
globalMoran
```

```{r}
globalMoran[["estimate"]][["Moran I statistic"]]
```

```{r}
globalMoran[["p.value"]]
```


The Moran I statistic is 0.54, we can, therefore, determine that there our qualification variable is positively autocorrelated in Camden. In other words, the data does spatially cluster. We can also consider the p-value as a measure of the statistical significance of the model.


# Local spatial autocorrelation

## Moran scatterplot 

```{r}
moran <- moran.plot(OA.Census$Qualification, listw = nb2listw(neighbours2, style = "W"))
```

## Compute local Moran

```{r}
local <- localmoran(x = OA.Census$Qualification, listw = nb2listw(neighbours2, style = "W"))
```

By considering the help page for the localmoran function (run `?localmoran` in R console) we can observe the arguments and outputs. We get a number of useful statistics from the model which are as defined:

- Ii:	local moran statistic

- E.Ii: expectation of local moran statistic

- Var.Ii:	variance of local moran statistic

- Z.Ii:	standard deviate of local moran statistic

- Pr():	p-value of local moran statistic

## Plot local Moran

- A map the local moran statistic (Ii). 

A positive value for Ii indicates that the unit is surrounded by units with similar values.

```{r}
# binds results to our polygon shapefile
moran.map <- cbind(OA.Census, local)
```

```{r}
tm_shape(moran.map) +
  tm_fill(col = "Ii",
          style = "quantile",
          title = "local moran statistic") 
```

## Plot LISA clusters

```{r}
quadrant <- vector(mode="numeric",length=nrow(local))

# centers the variable of interest around its mean
m.qualification <- OA.Census$Qualification - mean(OA.Census$Qualification)     

# centers the local Moran's around the mean
m.local <- local[,1] - mean(local[,1])    

# significance threshold
signif <- 0.1 

# builds a data quadrant
quadrant[m.qualification >0 & m.local>0] <- 4  
quadrant[m.qualification <0 & m.local<0] <- 1      
quadrant[m.qualification <0 & m.local>0] <- 2
quadrant[m.qualification >0 & m.local<0] <- 3
quadrant[local[,5]>signif] <- 0   

# plot in r
brks <- c(0,1,2,3,4)
colors <- c("white","blue",rgb(0,0,1,alpha=0.4),rgb(1,0,0,alpha=0.4),"red")
plot(OA.Census,border="lightgray",col=colors[findInterval(quadrant,brks,all.inside=FALSE)])
box()
legend("bottomleft", legend = c("insignificant","low-low","low-high","high-low","high-high"),
       fill=colors,bty="n")
```

# Getis-Ord approach

The Getis-Ord Gi Statistic looks at neighbours within a defined proximity to identify where either high or low values cluster spatially. Here statistically significant hot-spots are recognised as areas of high values where other areas within a neighbourhood range also share high values too.

First, we need to define a new set of neighbours. Whilst the spatial autocorrection considered units which shared borders, for Getis-Ord we are defining neighbours based on proximity. The example below shows the results where we have a search radius of 250 metres.

However, here a search radius of just 250 metres fails to define nearest neighbours for some areas so we will need to set the radius as 800 metres or more for our model in Camden.

- creates centroid and joins neighbours within 0 and 800 units

```{r}
nb <- dnearneigh(coordinates(OA.Census), 0, 800)
```


- creates listw

```{r}
nb_lw <- nb2listw(nb, style = 'B')
```

## Plot data and neighbours

```{r}
plot(OA.Census, border = 'lightgrey')
plot(nb, coordinates(OA.Census), add=TRUE, col = 'red')
```

## Getis-Ord Gi statistic

NOTE: On some machines the `cbind` may not work with a spatial data file, in this case, you will need to change OA.Census to OA.Census@data so that R knows which part of the spatial data file to join. If you take this approach the subsequent column ordering may be different to what is shown in the example below.


```{r}
local_g <- localG(OA.Census$Qualification, nb_lw)
local_g <- cbind(OA.Census, as.matrix(local_g))
names(local_g)[6] <- "gstat"
```

## Cluster map

```{r}
tm_shape(local_g) + 
  tm_fill("gstat", 
          palette = "RdBu",
          style = "pretty") +
  tm_borders(alpha=.4)
```

The Gi Statistic is represented as a Z-score. Greater values represent a greater intensity of clustering and the direction (positive or negative) indicates high or low clusters. The final map should indicate the location of hot-spots across Camden. Repeat this for another variable.


# References

- [Bivand, R. S., & Wong, D. W. (2018). Comparing implementations of global and local indicators of spatial association. Test, 27(3), 716-748.](https://link.springer.com/article/10.1007/s11749-018-0599-x)



# Other tutorials

- [GeoDa Workbook](https://geodacenter.github.io/documentation.html)

- [ECS530: (VI) Spatial autocorrelation](https://rsbivand.github.io/ECS530_h19/ECS530_VI.html)
    - [Code and data](https://github.com/rsbivand/ECS530_h19/blob/master/ECS530_VI.Rmd)
    - [Video](https://youtu.be/vySVctD__Lc)

END
