IS415 Geospatial Analytics and Application

Instructor: Dr. Kam Tin Seong.

Assoc. Professor of Information Systems (Practice)

1. Overview

In this hands-on exercise, we will learn how to use appropriate localised geospatial statistics analysis and thematic mapping of R to perform spatial clustering on greographically referenced attribute. The 2 major analysis methods are:

  • cluster & outlier analysis (using Local Indicator of Spatial Association (LISA) of spdep package)

  • hot spot & cold spot areas analysis (using Getis-Ord’s Gi-statistics od spdep package)

Specifically, we are going to analyse the spatial pattern of GDP per capital of Hunan Province counties,

  • Whether development based on GDP per capita is distributed within a planning area?

  • Are there signs of spatial clustering and where are these clusters?

1.1 Dataset

  • Hunan county shapefile containing the county’s boundary layer, spatial data in polygon objects

  • Hunan_GDPPC.csv containing the GPD per capital values at the county level

2. Load/Install necessary R packages

We will be using these R packages:

  • spdep - for spatial statistical analysis

  • rgdal - for handling spatial data

  • tmap - for choropleth mapping

  • tidyverse - for tidying data (readr, ggplot2 & dplyr)

packages = c('spdep','rgdal','tmap','tidyverse')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

3. Importing Data into R

We import the shapefile using readOGR() function of rgdal package.

Then, we import the csv file using read_csv() function of readr package.

3.1 Importing geospatial data (shapefile)

  • Imported as SpatialPolygonDataFrame
hunan <- readOGR(dsn = "data/shapefile/Hunan.shp", layer = "Hunan")
## OGR data source with driver: ESRI Shapefile 
## Source: "D:\SMU Year 3 Sem 1\IS415 Geospatial Analytics and Application\Week 7\Hands-on_Ex07\data\shapefile\Hunan.shp", layer: "Hunan"
## with 88 features
## It has 7 fields

3.2 Importing aspatial data (csv file)

  • Imported as tibble data.frame
gdppc <- read_csv("data/attributeTable/Hunan_GDPPC.csv")
## Parsed with column specification:
## cols(
##   County = col_character(),
##   City = col_character(),
##   GDPPC2005 = col_double(),
##   GDPPC2006 = col_double(),
##   GDPPC2007 = col_double(),
##   GDPPC2008 = col_double(),
##   GDPPC2009 = col_double(),
##   GDPPC2010 = col_double(),
##   GDPPC2011 = col_double(),
##   GDPPC2012 = col_double()
## )

3.3 Combining geospatial and aspatial data

We will use left_join() function of dplyr package to combine hunan (SpatialPolygonsDataFrame) and gdppc (tbl dataframe)

hunan@data <- left_join(hunan@data, gdppc)
## Joining, by = "County"

Note: no new output data has been created, gdppc data frame has all the same variables as hunan, so the gdppc data frame are updated into the data frame of hunan SpatialPolygonDataFrame.

3.4 Choropleth map

We use qtm() function from tmap package, it stands for quick thematic map and is an easy way to plot maps.

qtm(hunan, "GDPPC2012")

The choropleth map above shows clear sign of spatial autocorrelation. (Global Moran I value)

4. Cluster & Outlier Analysis

What is Local Indicators of Spatial Association (LISA)?

  • Statistics that evaluate the existence of clusters in a spatial arrangement

Analysis steps:

  • Deriving spatial weights matrix

  • Computing local Moran’s I

  • Plotting Moran scatterplot

  • Mapping local Moran’s I

  • Plotting LISA map

4.1 Deriving spatial weight matrix

Creating Queen Contiguity based neighbours

We use poly2nb() function of spdep package, this function builds neighbours list based on regions with contiguous boundaries (diagonals too).

wm_q <- poly2nb(hunan, queen = TRUE)

summary(wm_q)
## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 448 
## Percentage nonzero weights: 5.785124 
## Average number of links: 5.090909 
## Link number distribution:
## 
##  1  2  3  4  5  6  7  8  9 11 
##  2  2 12 16 24 14 11  4  2  1 
## 2 least connected regions:
## 29 64 with 1 link
## 1 most connected region:
## 84 with 11 links

From the summary, we can find out that:

  • There are 88 area units/counties in Hunan

  • Most connected area unit has 11 neighbours which is shown by the first row 11 with the next row showing 1 meaning only 1 area unit has 11 neighbours

  • There are 2 area units with only 1 neighbour

plot(hunan, border = 'lightgrey')
plot(wm_q, coordinates(hunan), pch = 19, cex = 0.6, add = TRUE, col = "red")

4.1.1 Row-standardised weights matrix

rswm_q <- nb2listw(wm_q, zero.policy = TRUE)

summary(rswm_q)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 448 
## Percentage nonzero weights: 5.785124 
## Average number of links: 5.090909 
## Link number distribution:
## 
##  1  2  3  4  5  6  7  8  9 11 
##  2  2 12 16 24 14 11  4  2  1 
## 2 least connected regions:
## 29 64 with 1 link
## 1 most connected region:
## 84 with 11 links
## 
## Weights style: W 
## Weights constants summary:
##    n   nn S0       S1       S2
## W 88 7744 88 37.86334 365.9147

4.1.2 (Class exercise) Adaptive weight matrix of 8 neighbours

To control the number of neighbours directly, we can use k-nearest neighbours knearneigh() function, either accepting asymmetric neighbours or imposing symmetry as shown below

longlat argument is TRUE because the crs of hunan is already in WGS84. However, if it were to be false, it will use the great circle method to process it.

knn8 <- knn2nb(knearneigh(coordinates(hunan), k =8, longlat = TRUE), row.names = row.names(hunan$GDPPC2012))

knn_lw <- nb2listw(knn8, style = "B")
summary(knn8)
## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 704 
## Percentage nonzero weights: 9.090909 
## Average number of links: 8 
## Non-symmetric neighbours list
## Link number distribution:
## 
##  8 
## 88 
## 88 least connected regions:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 with 8 links
## 88 most connected regions:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 with 8 links
plot(hunan, border = 'lightgrey')
plot(knn_lw, coordinates(hunan), pch = 19, cex = 0.6, add = TRUE, col = 'red')

4.2 Computing Local Moran’s I (for Queen Contiguity neighbours)

We use localmoran() function of spdep package, it will compute li values, given a set of zi values and listw object providing neighbour weighting information for the polygon associated with the zi values.

fips <- order(hunan$County)
localMI <- localmoran(hunan$GDPPC2012, rswm_q)
head(localMI)
##             Ii        E.Ii    Var.Ii         Z.Ii Pr(z > 0)
## 0 -0.001468468 -0.01149425 0.1768267  0.023842100 0.4904893
## 1  0.025878173 -0.01149425 0.1768267  0.088874545 0.4645908
## 2 -0.011987646 -0.01149425 0.2234962 -0.001043657 0.5004164
## 3  0.001022468 -0.01149425 0.2234962  0.026476192 0.4894388
## 4  0.014814881 -0.01149425 0.2234962  0.055650812 0.4778100
## 5 -0.038793829 -0.01149425 0.1768267 -0.064920523 0.5258814

4.2.1 Computing Local Moran’s I (for adaptive weight matrix 8 neighbours)

Since the localmoran() function doesn’t give index to the results, we use order()function so that we can perform joins and it will return the correct result.

fips1 <- order(hunan$County)
localMI1 <- localmoran(hunan$GDPPC2012, knn_lw)
head(localMI1)
##             Ii        E.Ii   Var.Ii        Z.Ii Pr(z > 0)
## 1 -0.037251807 -0.09195402 6.836637  0.02092106 0.4916543
## 2 -0.192769199 -0.09195402 6.836637 -0.03855713 0.5153783
## 3  0.013637073 -0.09195402 6.836637  0.04038369 0.4838936
## 4  0.002769951 -0.09195402 6.836637  0.03622752 0.4855505
## 5 -0.007674909 -0.09195402 6.836637  0.03223285 0.4871432
## 6 -0.259921222 -0.09195402 6.836637 -0.06423966 0.5256103

Using the printCoefmat() function, we can see also see the content of the local Moran matrix.

printCoefmat(data.frame(localMI1[fips1,], row.names = hunan$County[fips1]), check.names = FALSE)
##                        Ii        E.Ii      Var.Ii        Z.Ii Pr.z...0.
## Anhua         -9.6865e-02 -9.1954e-02  6.8366e+00 -1.8782e-03    0.5007
## Anren         -3.1946e+00 -9.1954e-02  6.8366e+00 -1.1866e+00    0.8823
## Anxiang       -3.7252e-02 -9.1954e-02  6.8366e+00  2.0921e-02    0.4917
## Baojing        3.0039e+00 -9.1954e-02  6.8366e+00  1.1840e+00    0.1182
## Chaling       -3.1217e-01 -9.1954e-02  6.8366e+00 -8.4223e-02    0.5336
## Changning     -2.5354e-04 -9.1954e-02  6.8366e+00  3.5071e-02    0.4860
## Changsha       4.5362e+01 -9.1954e-02  6.8366e+00  1.7384e+01    0.0000
## Chengbu        4.7981e+00 -9.1954e-02  6.8366e+00  1.8702e+00    0.0307
## Chenxi         1.0778e+00 -9.1954e-02  6.8366e+00  4.4739e-01    0.3273
## Cili           3.1224e-01 -9.1954e-02  6.8366e+00  1.5458e-01    0.4386
## Dao            1.1530e+00 -9.1954e-02  6.8366e+00  4.7615e-01    0.3170
## Dongan         1.0037e+00 -9.1954e-02  6.8366e+00  4.1905e-01    0.3376
## Dongkou        3.7264e+00 -9.1954e-02  6.8366e+00  1.4604e+00    0.0721
## Fenghuang      1.4478e+00 -9.1954e-02  6.8366e+00  5.8890e-01    0.2780
## Guidong       -2.3061e+00 -9.1954e-02  6.8366e+00 -8.4682e-01    0.8015
## Guiyang        1.7848e+00 -9.1954e-02  6.8366e+00  7.1777e-01    0.2364
## Guzhang        2.7198e+00 -9.1954e-02  6.8366e+00  1.0753e+00    0.1411
## Hanshou       -1.9277e-01 -9.1954e-02  6.8366e+00 -3.8557e-02    0.5154
## Hengdong      -6.9586e-02 -9.1954e-02  6.8366e+00  8.5548e-03    0.4966
## Hengnan        4.7834e-02 -9.1954e-02  6.8366e+00  5.3462e-02    0.4787
## Hengshan       1.0657e-01 -9.1954e-02  6.8366e+00  7.5925e-02    0.4697
## Hengyang       7.3384e-02 -9.1954e-02  6.8366e+00  6.3234e-02    0.4748
## Hongjiang      1.7214e+00 -9.1954e-02  6.8366e+00  6.9353e-01    0.2440
## Huarong        1.2252e+00 -9.1954e-02  6.8366e+00  5.0375e-01    0.3072
## Huayuan        1.3473e+00 -9.1954e-02  6.8366e+00  5.5043e-01    0.2910
## Huitong        2.1880e+00 -9.1954e-02  6.8366e+00  8.7198e-01    0.1916
## Jiahe         -9.3416e-01 -9.1954e-02  6.8366e+00 -3.2210e-01    0.6263
## Jianghua       1.3463e+00 -9.1954e-02  6.8366e+00  5.5008e-01    0.2911
## Jiangyong      1.1769e+00 -9.1954e-02  6.8366e+00  4.8529e-01    0.3137
## Jingzhou       1.3692e+00 -9.1954e-02  6.8366e+00  5.5883e-01    0.2881
## Jinshi         1.3637e-02 -9.1954e-02  6.8366e+00  4.0384e-02    0.4839
## Jishou        -2.4882e+00 -9.1954e-02  6.8366e+00 -9.1646e-01    0.8203
## Lanshan        5.3431e-01 -9.1954e-02  6.8366e+00  2.3952e-01    0.4054
## Leiyang        3.6429e-01 -9.1954e-02  6.8366e+00  1.7449e-01    0.4307
## Lengshuijiang -1.3481e+01 -9.1954e-02  6.8366e+00 -5.1206e+00    1.0000
## Li             2.7700e-03 -9.1954e-02  6.8366e+00  3.6228e-02    0.4856
## Lianyuan      -2.2915e+00 -9.1954e-02  6.8366e+00 -8.4121e-01    0.7999
## Liling         1.3127e+01 -9.1954e-02  6.8366e+00  5.0557e+00    0.0000
## Linli         -7.6749e-03 -9.1954e-02  6.8366e+00  3.2233e-02    0.4871
## Linwu          5.9182e-02 -9.1954e-02  6.8366e+00  5.7802e-02    0.4770
## Linxiang       8.9333e-01 -9.1954e-02  6.8366e+00  3.7682e-01    0.3532
## Liuyang        2.6814e+01 -9.1954e-02  6.8366e+00  1.0290e+01    0.0000
## Longhui        2.1325e+00 -9.1954e-02  6.8366e+00  8.5074e-01    0.1975
## Longshan       3.3841e+00 -9.1954e-02  6.8366e+00  1.3294e+00    0.0919
## Luxi           1.3170e+00 -9.1954e-02  6.8366e+00  5.3885e-01    0.2950
## Mayang         1.2656e+00 -9.1954e-02  6.8366e+00  5.1921e-01    0.3018
## Miluo          1.2402e+01 -9.1954e-02  6.8366e+00  4.7784e+00    0.0000
## Nan           -6.0565e-01 -9.1954e-02  6.8366e+00 -1.9646e-01    0.5779
## Ningxiang      1.3096e+01 -9.1954e-02  6.8366e+00  5.0437e+00    0.0000
## Ningyuan       1.4858e+00 -9.1954e-02  6.8366e+00  6.0341e-01    0.2731
## Pingjiang     -6.0562e+00 -9.1954e-02  6.8366e+00 -2.2811e+00    0.9887
## Qidong         1.0180e+00 -9.1954e-02  6.8366e+00  4.2451e-01    0.3356
## Qiyang         1.0586e+00 -9.1954e-02  6.8366e+00  4.4005e-01    0.3300
## Rucheng       -1.4785e+00 -9.1954e-02  6.8366e+00 -5.3029e-01    0.7020
## Sangzhi        2.5006e+00 -9.1954e-02  6.8366e+00  9.9154e-01    0.1607
## Shaodong      -2.6376e-02 -9.1954e-02  6.8366e+00  2.5081e-02    0.4900
## Shaoshan       1.0831e+01 -9.1954e-02  6.8366e+00  4.1776e+00    0.0000
## Shaoyang       1.5201e+00 -9.1954e-02  6.8366e+00  6.1652e-01    0.2688
## Shimen        -2.5992e-01 -9.1954e-02  6.8366e+00 -6.4240e-02    0.5256
## Shuangfeng    -4.9102e-01 -9.1954e-02  6.8366e+00 -1.5262e-01    0.5607
## Shuangpai      3.5368e-01 -9.1954e-02  6.8366e+00  1.7044e-01    0.4323
## Suining        3.1437e+00 -9.1954e-02  6.8366e+00  1.2375e+00    0.1080
## Taojiang      -2.6068e+00 -9.1954e-02  6.8366e+00 -9.6180e-01    0.8319
## Taoyuan        9.0485e-02 -9.1954e-02  6.8366e+00  6.9774e-02    0.4722
## Tongdao        4.0982e+00 -9.1954e-02  6.8366e+00  1.6025e+00    0.0545
## Wangcheng      3.4050e+01 -9.1954e-02  6.8366e+00  1.3058e+01    0.0000
## Wugang         4.8230e+00 -9.1954e-02  6.8366e+00  1.8797e+00    0.0301
## Xiangtan       1.6075e+00 -9.1954e-02  6.8366e+00  6.4996e-01    0.2579
## Xiangxiang     2.2502e+00 -9.1954e-02  6.8366e+00  8.9575e-01    0.1852
## Xiangyin       5.2141e+00 -9.1954e-02  6.8366e+00  2.0293e+00    0.0212
## Xinhua         1.8045e+00 -9.1954e-02  6.8366e+00  7.2531e-01    0.2341
## Xinhuang       1.9527e+00 -9.1954e-02  6.8366e+00  7.8199e-01    0.2171
## Xinning        5.1008e+00 -9.1954e-02  6.8366e+00  1.9860e+00    0.0235
## Xinshao        1.1025e+00 -9.1954e-02  6.8366e+00  4.5683e-01    0.3239
## Xintian        2.9296e-01 -9.1954e-02  6.8366e+00  1.4721e-01    0.4415
## Xupu           2.4821e-01 -9.1954e-02  6.8366e+00  1.3010e-01    0.4482
## Yanling       -3.5181e-01 -9.1954e-02  6.8366e+00 -9.9383e-02    0.5396
## Yizhang       -5.3061e-01 -9.1954e-02  6.8366e+00 -1.6776e-01    0.5666
## Yongshun       3.4110e+00 -9.1954e-02  6.8366e+00  1.3397e+00    0.0902
## Yongxing       9.6546e-01 -9.1954e-02  6.8366e+00  4.0441e-01    0.3430
## You            3.9581e-01 -9.1954e-02  6.8366e+00  1.8655e-01    0.4260
## Yuanjiang      1.9578e-01 -9.1954e-02  6.8366e+00  1.1005e-01    0.4562
## Yuanling       5.3798e-02 -9.1954e-02  6.8366e+00  5.5743e-02    0.4778
## Yueyang        2.8183e-01 -9.1954e-02  6.8366e+00  1.4296e-01    0.4432
## Zhijiang       9.6683e-01 -9.1954e-02  6.8366e+00  4.0493e-01    0.3428
## Zhongfang     -1.9505e+00 -9.1954e-02  6.8366e+00 -7.1081e-01    0.7614
## Zhuzhou        2.5374e+00 -9.1954e-02  6.8366e+00  1.0056e+00    0.1573
## Zixing        -5.0635e+00 -9.1954e-02  6.8366e+00 -1.9014e+00    0.9714

The results:

  • Ii: the local Moran’s I statistics

  • E.Ii: the expectation of local moran statistic under the randomisation hypothesis

  • Var.Ii: the variance of local moran statistic under the randomisation hypothesis

  • Z.Ii:the standard deviate of local moran statistic

  • Pr(): the p-value of local moran statistic

Moran’s I is a measure of the degree to which the value at a target site is similar to values at adjacent sites.
Moran’s I is large and positive when the value for a given target (or for all locations in the global case) is similar to adjacent values and negative when the value at a target is dissimilar to adjacent values.

4.3 Mapping the Local Moran’s I

Note = We always to have plot both the local moran’s I values with the p-values side by side.

Before mapping the map, it is wise to append the localMI (Queen Contiguity neighbours) into hunan SpatialPolygonDataFrame.

hunan.localMI <- cbind(hunan, localMI)

Using tmap package functions,

tm_shape(hunan.localMI) +
  tm_fill(col = "Ii",
          style = "pretty",
          palette = "RdBu",
          title = "local moran statistics")+
  tm_borders(alpha = 0.5)
## Variable(s) "Ii" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

4.4 Mapping Local Moran’s I p-values

tm_shape(hunan.localMI) +
  tm_fill(col = "Pr.z...0.",
          breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
          palette = "-Blues",
          title = "local Moran's I p-values")+
  tm_borders(alpha = 0.5)

4.5 Mapping both local Moran’s I values & p-values

localMI.map <- tm_shape(hunan.localMI) +
  tm_fill(col = "Ii",
          style = "pretty",
          title = "local moran statistics")+
  tm_borders(alpha = 0.5)

pvalue.map <- tm_shape(hunan.localMI) +
  tm_fill(col = "Pr.z...0.",
          breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
          palette = "-Blues",
          title = "local Moran's I p-values")+
  tm_borders(alpha = 0.5)

tmap_arrange(localMI.map, pvalue.map, asp = 1, ncol = 2)
## Variable(s) "Ii" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

We can see that the 3 green subzone/county areas are high in local Moran’s I values meaning positive autocorrelation. These 3 areas are also highly adjacent in values (GDPPC) with their neighbours and they are statistically significant as clusters because their the p-values are smaller than the confidence interval of 95%. We can also see that although there are negative local Moran’s I values indicating outliers and negative autocorrelation, they are not significant enough from the p-values. However, we do not know what kind of cluster is this? low GDPPC or high GDPPC? so we have to conduct the LISA cluster map.

5. Creating LISA Cluster map

LISA cluster map shows the significant locations color coded by type of spatial autocorrelation.

Steps:

  1. plot the Moran scatterplot

  2. generate LISA cluster map

5.1 Plotting Moran scatterplot

We plot the Moran scatterplot of GDPPC 2012 using moran.plot() function of spdep package.

nci <- moran.plot(hunan$GDPPC2012, rswm_q, labels = as.character(hunan$County), xlab = "GDPPC 2012", ylab = "Spatially Lag GDPPC 2012")

We can see that the plot is split into 4 quadrants, the top right corner belong to areas that have high GDPPC & surrounded by other areas that have average level of GDPPC (high-high locations)

5.2 Plotting Moran scatterplot with standardised variable

We will use scale() function to centers & scales the variable.

  • Centering - subtracting the mean (omitting NAs) from the corresponding columns

  • Scaling - dividing the centered variable by their standard deviations

hunan$Z.GDPPC2012 <- scale(hunan$GDPPC2012) %>% as.vector

knitr::kable(head(hunan, n=5))
NAME_2 ID_3 NAME_3 ENGTYPE_3 Shape_Leng Shape_Area County City GDPPC2005 GDPPC2006 GDPPC2007 GDPPC2008 GDPPC2009 GDPPC2010 GDPPC2011 GDPPC2012 Z.GDPPC2012
Changde 21098 Anxiang County 1.869074 0.1005619 Anxiang Changde 8184 10995 12670 14128 16763 19817 22840 23667 -0.0492059
Changde 21100 Hanshou County 2.360691 0.1997875 Hanshou Changde 6560 7422 8512 11478 12140 15906 18561 20981 -0.2283412
Changde 21101 Jinshi County City 1.425620 0.0530241 Jinshi Changde 9956 11223 14258 16969 20528 24824 30286 34592 0.6794062
Changde 21102 Li County 3.474324 0.1890812 Li Changde 8394 10304 11048 12723 15786 18426 21847 24473 0.0045480
Changde 21103 Linli County 2.289506 0.1145036 Linli Changde 8850 9855 11373 14564 16160 19273 23172 25554 0.0766422

as.vector() function make sures that the resulting data type is a vector, map neatly into dataframe.

nci2 <- moran.plot(hunan$Z.GDPPC2012, rswm_q, labels = as.character(hunan$County), xlab = "GDPPC 2012", ylab = "Spatially Lag GDPPC 2012")

5.3 Preparing the LISA cluster map

Create a numeric vector data type that has the length of the rows of localMI which is

quadrant <- vector(mode = "numeric", length = nrow(localMI))

Next, we centers the variable of interest (GDPPC 2012) around it’s mean

hunan$GDPP2012 is a numeric vector and mean(local) is a single number so the number is subtracted from every element of the vector (vector operation).

localMI[,1] calls the Moran I statistic value.

DV <- hunan$GDPPC2012 - mean(hunan$GDPPC2012)

C_mI <- localMI[,1] - mean(localMI[,1])

Next, we will set a statistical significance level (confidence interval) for the local Moran.

signif <- 0.05
Quadrants (Interpreting scatter plot & the local Moran values)

From the scatterplot, we get the dashed line, outlining the different quadrants but we’re not able to get the different observations in each quadrant.

That’s why we’re creating a new vector names quadrant to get the different quadrant values for each observation so it can be plotted with tmap.

The four commands below define:

  • high-high Quadrant 4 - Cluster/Positive autocorrelation

  • low-low Quadrant 1 - Cluster/Positive autocorrelation

  • low-high Quadrant 2 - Outlier/Negative autocorrelation

  • high-low Quadrant 3 - Outlier/Negative autocorrelation

  • non-significant Moran (p-value of localMI is compared to the significance level)

quadrant[DV >0 & C_mI>0] <- 4      
quadrant[DV <0 & C_mI<0] <- 1      
quadrant[DV <0 & C_mI>0] <- 2
quadrant[DV >0 & C_mI<0] <- 3
quadrant[localMI[,5]>signif] <- 0

5.4 Plotting the LISA cluster map

Now, we can see the different quadrant each observation belongs to.

hunan.localMI$quadrant <- quadrant 
colors <- c("#ffffff", "#2c7bb6", "#abd9e9", "#fdae61", "#d7191c")
clusters <- c("insignificant", "low-low", "low-high", "high-low", "high-high")

knitr::kable(head(hunan.localMI, n=5))
NAME_2 ID_3 NAME_3 ENGTYPE_3 Shape_Leng Shape_Area County City GDPPC2005 GDPPC2006 GDPPC2007 GDPPC2008 GDPPC2009 GDPPC2010 GDPPC2011 GDPPC2012 Ii E.Ii Var.Ii Z.Ii Pr.z…0. quadrant
0 Changde 21098 Anxiang County 1.869074 0.1005619 Anxiang Changde 8184 10995 12670 14128 16763 19817 22840 23667 -0.0014685 -0.0114943 0.1768267 0.0238421 0.4904893 0
1 Changde 21100 Hanshou County 2.360691 0.1997875 Hanshou Changde 6560 7422 8512 11478 12140 15906 18561 20981 0.0258782 -0.0114943 0.1768267 0.0888745 0.4645908 0
2 Changde 21101 Jinshi County City 1.425620 0.0530241 Jinshi Changde 9956 11223 14258 16969 20528 24824 30286 34592 -0.0119876 -0.0114943 0.2234962 -0.0010437 0.5004164 0
3 Changde 21102 Li County 3.474324 0.1890812 Li Changde 8394 10304 11048 12723 15786 18426 21847 24473 0.0010225 -0.0114943 0.2234962 0.0264762 0.4894388 0
4 Changde 21103 Linli County 2.289506 0.1145036 Linli Changde 8850 9855 11373 14564 16160 19273 23172 25554 0.0148149 -0.0114943 0.2234962 0.0556508 0.4778100 0
tm_shape(hunan.localMI) +
  tm_fill(col = "quadrant", style = "cat", palette = colors[c(sort(unique(quadrant)))+1], labels = clusters[c(sort(unique(quadrant)))+1], popup.vars = c("Postal.Code")) +
  tm_view(set.zoom.limits = c(11,17)) +
  tm_borders(alpha=0.5)

We can see that from the map, there’s a high-high cluster indicating positive autocorrelation for the subzone/county areas having adjacent high GDPPC values. There is also a low-high cluster indicating negative autocorrelation/outliers whereby these areas with low GDPPC values are surrounded by high GDPPC areas.

6. Hot Spot & Cold Spot Analysis

Another form of LISA/Spatial statistics to detect spatial anomalies is the Getis and Ord’s G-statistics.

The statistics will look at neighbours within a defined proximity to identify where either high or low values cluster spatially/in space.

A significant positive value of G ( G> 0) indicates that there is a " cluster" of high values of the variable analyzed with reference to their average. Also, a significant but negative index value of G (G < 0) is shown as a group of low values relative to the average of the variable analyzed.

  • Hot spot area = G>0, it’s also High-high in Local Moran’s I values

  • Cold spot area = G<0, it’s also Low-low in Local Moran’s I values

Steps:

  1. Deriving spatial weight matrix

  2. Computing Gi statistics

  3. Mapping Gi statistics

6.1 Deriving distance-based weight matrix

For Getis-Ord, we define neighbours based on distance/proximity. (while spatial autocorrelation considered units which shared borders)

The two types of distance-based weight matrix:

  • Fixed distance weight matrix

  • Adaptive distance weight matrix

6.2.1 Fixed distance weight matrix plot

We use dnearneigh() function of spdep package to calculate the fixed distance weight matrix. The function identifies neighbours of region points by Euclidean distance between lower and upper bounds in km.

dnb <- dnearneigh(coordinates(hunan), 0, 85, longlat = TRUE)
dnb
## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 642 
## Percentage nonzero weights: 8.290289 
## Average number of links: 7.295455
Fixed distance weight matrix plot
plot(hunan, border = "lightgrey")
plot(dnb, coordinates(hunan), add = TRUE, col = "red")

The nb2listw() supplements a neighbours list with spatial weights for the chosen coding scheme. The input of nb2listw() must be object of class nb. The syntax of the function has two major arguments, namely style and zero.poly.

  • style can take values “W”, “B”, “C”, “U”, “minmax” and “S”. B is the basic binary coding, W is row standardised (sums over all links to n), C is globally standardised (sums over all links to n), U is equal to C divided by the number of neighbours (sums over all links to unity), while S is the variance-stabilizing coding scheme proposed by Tiefelsdorf et al. 1999, p. 167-168 (sums over all links to n).

  • If zero policy is set to TRUE, weights vectors of zero length are inserted for regions without neighbour in the neighbours list. These will in turn generate lag values of zero, equivalent to the sum of products of the zero row t(rep(0, length=length(neighbours))) %*% x, for arbitraty numerical vector x of length length(neighbours). The spatially lagged value of x for the zero-neighbour region will then be zero, which may (or may not) be a sensible choice.

dnb_lw <- nb2listw(dnb, style = "B")
summary(dnb_lw)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 642 
## Percentage nonzero weights: 8.290289 
## Average number of links: 7.295455 
## Link number distribution:
## 
##  2  3  4  5  6  7  8  9 10 11 
##  1  4  7  6 12 12 13 23  7  3 
## 1 least connected region:
## 88 with 2 links
## 3 most connected regions:
## 35 47 86 with 11 links
## 
## Weights style: B 
## Weights constants summary:
##    n   nn  S0   S1    S2
## B 88 7744 642 1284 20320

The output of nb2listw() function is a spatial weights object.

Note: It is this weights list object that will be used in the computation of Gi statistics and not the Neighbour list object.

6.2.2 Adaptive distance weight matrix

Instead of using a fixed distance to derive the weight matrix, a fixed number of neighbours can also be used to derive the weight matrix, the adaptive distance method.

coords <- coordinates(hunan)
knb <- knn2nb(knearneigh(coords, k = 8, longlat = TRUE), row.names = row.names(hunan$gdppc))

knb_lw <- nb2listw(knb, style = "B")
summary(knb_lw)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 704 
## Percentage nonzero weights: 9.090909 
## Average number of links: 8 
## Non-symmetric neighbours list
## Link number distribution:
## 
##  8 
## 88 
## 88 least connected regions:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 with 8 links
## 88 most connected regions:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 with 8 links
## 
## Weights style: B 
## Weights constants summary:
##    n   nn  S0   S1    S2
## B 88 7744 704 1286 23044
Adaptive distance weight matrix plot
plot(hunan, border = "lightgrey")
plot(knb, coordinates(hunan), pch = 19, cex = 0.6, add = TRUE, col = "red")

6.2 Computing Gi statistics

Gi statistics of fixed distance weight matrix.
fips <- order(hunan$County)
gi.fixed <- localG(hunan$GDPPC2012, dnb_lw)
#gi.fixed

The output of localG() function of spdep package is a vector of G/Gstar values with attributes “gstari” set to TRUE/FALSE, “call” set to the function call and class “localG”

The Gi statistics is represented as a Z-score. Greater values represent a greater intensity of clustering and the direction (positive or negative) indicates high or low clusters.

Next, we will join the fixed distance weight matrix Gi statistics value to the hunan SpatialPolygonDataFrame.

hunan.gi <- cbind(hunan, as.matrix(gi.fixed))
names(hunan.gi)[17] <- "gstat"
head(hunan.gi@data)
##    NAME_2  ID_3  NAME_3   ENGTYPE_3 Shape_Leng Shape_Area  County    City
## 1 Changde 21098 Anxiang      County   1.869074 0.10056190 Anxiang Changde
## 2 Changde 21100 Hanshou      County   2.360691 0.19978745 Hanshou Changde
## 3 Changde 21101  Jinshi County City   1.425620 0.05302413  Jinshi Changde
## 4 Changde 21102      Li      County   3.474325 0.18908121      Li Changde
## 5 Changde 21103   Linli      County   2.289506 0.11450357   Linli Changde
## 6 Changde 21104  Shimen      County   4.171918 0.37194707  Shimen Changde
##   GDPPC2005 GDPPC2006 GDPPC2007 GDPPC2008 GDPPC2009 GDPPC2010 GDPPC2011
## 1      8184     10995     12670     14128     16763     19817     22840
## 2      6560      7422      8512     11478     12140     15906     18561
## 3      9956     11223     14258     16969     20528     24824     30286
## 4      8394     10304     11048     12723     15786     18426     21847
## 5      8850      9855     11373     14564     16160     19273     23172
## 6      9244     10556     11760     12902     17355     20501     24409
##   GDPPC2012        gstat
## 1     23667 -0.049205949
## 2     20981 -0.228341158
## 3     34592  0.679406172
## 4     24473  0.004547952
## 5     25554  0.076642204
## 6     27137  0.182215933
##   structure.c.0.331637226690737..0.300225037067981...0.0192872220698243..
## 1                                                              0.33163723
## 2                                                              0.30022504
## 3                                                             -0.01928722
## 4                                                              0.23347826
## 5                                                             -0.03392157
## 6                                                             -0.47966539
Gi statistics of adaptive distance weight matrix.
fips <- order(hunan$County)
gi.adaptive <- localG(hunan$GDPPC2012, knb_lw)
hunan.gi <- cbind(hunan.gi, as.matrix(gi.adaptive))
names(hunan.gi)[19] <- "gstat_adaptive"
head(hunan.gi@data)
##    NAME_2  ID_3  NAME_3   ENGTYPE_3 Shape_Leng Shape_Area  County    City
## 1 Changde 21098 Anxiang      County   1.869074 0.10056190 Anxiang Changde
## 2 Changde 21100 Hanshou      County   2.360691 0.19978745 Hanshou Changde
## 3 Changde 21101  Jinshi County City   1.425620 0.05302413  Jinshi Changde
## 4 Changde 21102      Li      County   3.474325 0.18908121      Li Changde
## 5 Changde 21103   Linli      County   2.289506 0.11450357   Linli Changde
## 6 Changde 21104  Shimen      County   4.171918 0.37194707  Shimen Changde
##   GDPPC2005 GDPPC2006 GDPPC2007 GDPPC2008 GDPPC2009 GDPPC2010 GDPPC2011
## 1      8184     10995     12670     14128     16763     19817     22840
## 2      6560      7422      8512     11478     12140     15906     18561
## 3      9956     11223     14258     16969     20528     24824     30286
## 4      8394     10304     11048     12723     15786     18426     21847
## 5      8850      9855     11373     14564     16160     19273     23172
## 6      9244     10556     11760     12902     17355     20501     24409
##   GDPPC2012        gstat
## 1     23667 -0.049205949
## 2     20981 -0.228341158
## 3     34592  0.679406172
## 4     24473  0.004547952
## 5     25554  0.076642204
## 6     27137  0.182215933
##   structure.c.0.331637226690737..0.300225037067981...0.0192872220698243..
## 1                                                              0.33163723
## 2                                                              0.30022504
## 3                                                             -0.01928722
## 4                                                              0.23347826
## 5                                                             -0.03392157
## 6                                                             -0.47966539
##   gstat_adaptive
## 1     0.27442880
## 2     0.30022504
## 3     0.03044770
## 4     0.22227212
## 5    -0.03392157
## 6    -0.51413315

6.3 Mapping local Gi statistics

Mapping local Gi values with fixed distance weights
tm_shape(hunan.gi)+
  tm_fill(col = "gstat",
          style = "pretty",
          palette = "-RdBu",
          title = "local Gi value")+
  tm_borders(alpha = 0.5)
## Variable(s) "gstat" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

In the choropleth map above, counties shaded in red are the hot spot areas and counties shaded in blue are the cold spot areas. The darkness of the colours representing the intensity of the Gi values.

In conclusion, the choropleth above shows clear sign of east-west divide in the GDP per capita by county of Hunan province in 2012. The hot spot areas were centre around Changsha city at the east of the provice. The cold spot areas, on the other hand, mainly comprise of counties located on the western part of the province centre around the forested Shaoyang prefecture city.

Mapping local Gi values with adaptive distance weights
tm_shape(hunan.gi)+
  tm_fill(col = "gstat_adaptive",
          style = "pretty",
          palette = "-RdBu",
          title = "local Gi value")+
  tm_borders(alpha = 0.5)
## Variable(s) "gstat_adaptive" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.