IS415 Geospatial Analytics and Application

Instructor: Dr. Kam Tin Seong.

Assoc. Professor of Information Systems (Practice)

1. Overview

In this hands-on exercise, we will learn how to use appropriate localised geospatial statistics analysis and thematic mapping of R to perform spatial clustering on greographically referenced attribute. The 2 major analysis methods are:

cluster & outlier analysis (using Local Indicator of Spatial Association (LISA) of spdep package)
hot spot & cold spot areas analysis (using Getis-Ord’s Gi-statistics od spdep package)

Specifically, we are going to analyse the spatial pattern of GDP per capital of Hunan Province counties,

Whether development based on GDP per capita is distributed within a planning area?
Are there signs of spatial clustering and where are these clusters?

1.1 Dataset

Hunan county shapefile containing the county’s boundary layer, spatial data in polygon objects
Hunan_GDPPC.csv containing the GPD per capital values at the county level

2. Load/Install necessary R packages

We will be using these R packages:

spdep - for spatial statistical analysis
rgdal - for handling spatial data
tmap - for choropleth mapping
tidyverse - for tidying data (readr, ggplot2 & dplyr)

packages = c('spdep','rgdal','tmap','tidyverse')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

3. Importing Data into R

We import the shapefile using readOGR() function of rgdal package.

Then, we import the csv file using read_csv() function of readr package.

3.1 Importing geospatial data (shapefile)

Imported as SpatialPolygonDataFrame

hunan <- readOGR(dsn = "data/shapefile/Hunan.shp", layer = "Hunan")

## OGR data source with driver: ESRI Shapefile 
## Source: "D:\SMU Year 3 Sem 1\IS415 Geospatial Analytics and Application\Week 7\Hands-on_Ex07\data\shapefile\Hunan.shp", layer: "Hunan"
## with 88 features
## It has 7 fields

3.2 Importing aspatial data (csv file)

Imported as tibble data.frame

gdppc <- read_csv("data/attributeTable/Hunan_GDPPC.csv")

## Parsed with column specification:
## cols(
##   County = col_character(),
##   City = col_character(),
##   GDPPC2005 = col_double(),
##   GDPPC2006 = col_double(),
##   GDPPC2007 = col_double(),
##   GDPPC2008 = col_double(),
##   GDPPC2009 = col_double(),
##   GDPPC2010 = col_double(),
##   GDPPC2011 = col_double(),
##   GDPPC2012 = col_double()
## )

3.3 Combining geospatial and aspatial data

We will use left_join() function of dplyr package to combine hunan (SpatialPolygonsDataFrame) and gdppc (tbl dataframe)

hunan@data <- left_join(hunan@data, gdppc)

## Joining, by = "County"

Note: no new output data has been created, gdppc data frame has all the same variables as hunan, so the gdppc data frame are updated into the data frame of hunan SpatialPolygonDataFrame.

3.4 Choropleth map

We use qtm() function from tmap package, it stands for quick thematic map and is an easy way to plot maps.

qtm(hunan, "GDPPC2012")

The choropleth map above shows clear sign of spatial autocorrelation. (Global Moran I value)

4. Cluster & Outlier Analysis

What is Local Indicators of Spatial Association (LISA)?

Statistics that evaluate the existence of clusters in a spatial arrangement

Analysis steps:

Deriving spatial weights matrix
Computing local Moran’s I
Plotting Moran scatterplot
Mapping local Moran’s I
Plotting LISA map

4.1 Deriving spatial weight matrix

Creating Queen Contiguity based neighbours

We use poly2nb() function of spdep package, this function builds neighbours list based on regions with contiguous boundaries (diagonals too).

wm_q <- poly2nb(hunan, queen = TRUE)

summary(wm_q)

## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 448 
## Percentage nonzero weights: 5.785124 
## Average number of links: 5.090909 
## Link number distribution:
## 
##  1  2  3  4  5  6  7  8  9 11 
##  2  2 12 16 24 14 11  4  2  1 
## 2 least connected regions:
## 29 64 with 1 link
## 1 most connected region:
## 84 with 11 links

From the summary, we can find out that:

There are 88 area units/counties in Hunan
Most connected area unit has 11 neighbours which is shown by the first row 11 with the next row showing 1 meaning only 1 area unit has 11 neighbours
There are 2 area units with only 1 neighbour

plot(hunan, border = 'lightgrey')
plot(wm_q, coordinates(hunan), pch = 19, cex = 0.6, add = TRUE, col = "red")

4.1.1 Row-standardised weights matrix

rswm_q <- nb2listw(wm_q, zero.policy = TRUE)

summary(rswm_q)

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 448 
## Percentage nonzero weights: 5.785124 
## Average number of links: 5.090909 
## Link number distribution:
## 
##  1  2  3  4  5  6  7  8  9 11 
##  2  2 12 16 24 14 11  4  2  1 
## 2 least connected regions:
## 29 64 with 1 link
## 1 most connected region:
## 84 with 11 links
## 
## Weights style: W 
## Weights constants summary:
##    n   nn S0       S1       S2
## W 88 7744 88 37.86334 365.9147

4.1.2 (Class exercise) Adaptive weight matrix of 8 neighbours

To control the number of neighbours directly, we can use k-nearest neighbours knearneigh() function, either accepting asymmetric neighbours or imposing symmetry as shown below

longlat argument is TRUE because the crs of hunan is already in WGS84. However, if it were to be false, it will use the great circle method to process it.

knn8 <- knn2nb(knearneigh(coordinates(hunan), k =8, longlat = TRUE), row.names = row.names(hunan$GDPPC2012))

knn_lw <- nb2listw(knn8, style = "B")

summary(knn8)

## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 704 
## Percentage nonzero weights: 9.090909 
## Average number of links: 8 
## Non-symmetric neighbours list
## Link number distribution:
## 
##  8 
## 88 
## 88 least connected regions:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 with 8 links
## 88 most connected regions:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 with 8 links

plot(hunan, border = 'lightgrey')
plot(knn_lw, coordinates(hunan), pch = 19, cex = 0.6, add = TRUE, col = 'red')

4.2 Computing Local Moran’s I (for Queen Contiguity neighbours)

We use localmoran() function of spdep package, it will compute li values, given a set of zi values and listw object providing neighbour weighting information for the polygon associated with the zi values.

fips <- order(hunan$County)
localMI <- localmoran(hunan$GDPPC2012, rswm_q)
head(localMI)

##             Ii        E.Ii    Var.Ii         Z.Ii Pr(z > 0)
## 0 -0.001468468 -0.01149425 0.1768267  0.023842100 0.4904893
## 1  0.025878173 -0.01149425 0.1768267  0.088874545 0.4645908
## 2 -0.011987646 -0.01149425 0.2234962 -0.001043657 0.5004164
## 3  0.001022468 -0.01149425 0.2234962  0.026476192 0.4894388
## 4  0.014814881 -0.01149425 0.2234962  0.055650812 0.4778100
## 5 -0.038793829 -0.01149425 0.1768267 -0.064920523 0.5258814

4.2.1 Computing Local Moran’s I (for adaptive weight matrix 8 neighbours)

Since the localmoran() function doesn’t give index to the results, we use order()function so that we can perform joins and it will return the correct result.

fips1 <- order(hunan$County)
localMI1 <- localmoran(hunan$GDPPC2012, knn_lw)
head(localMI1)

##             Ii        E.Ii   Var.Ii        Z.Ii Pr(z > 0)
## 1 -0.037251807 -0.09195402 6.836637  0.02092106 0.4916543
## 2 -0.192769199 -0.09195402 6.836637 -0.03855713 0.5153783
## 3  0.013637073 -0.09195402 6.836637  0.04038369 0.4838936
## 4  0.002769951 -0.09195402 6.836637  0.03622752 0.4855505
## 5 -0.007674909 -0.09195402 6.836637  0.03223285 0.4871432
## 6 -0.259921222 -0.09195402 6.836637 -0.06423966 0.5256103

Using the printCoefmat() function, we can see also see the content of the local Moran matrix.

printCoefmat(data.frame(localMI1[fips1,], row.names = hunan$County[fips1]), check.names = FALSE)

##                        Ii        E.Ii      Var.Ii        Z.Ii Pr.z...0.
## Anhua         -9.6865e-02 -9.1954e-02  6.8366e+00 -1.8782e-03    0.5007
## Anren         -3.1946e+00 -9.1954e-02  6.8366e+00 -1.1866e+00    0.8823
## Anxiang       -3.7252e-02 -9.1954e-02  6.8366e+00  2.0921e-02    0.4917
## Baojing        3.0039e+00 -9.1954e-02  6.8366e+00  1.1840e+00    0.1182
## Chaling       -3.1217e-01 -9.1954e-02  6.8366e+00 -8.4223e-02    0.5336
## Changning     -2.5354e-04 -9.1954e-02  6.8366e+00  3.5071e-02    0.4860
## Changsha       4.5362e+01 -9.1954e-02  6.8366e+00  1.7384e+01    0.0000
## Chengbu        4.7981e+00 -9.1954e-02  6.8366e+00  1.8702e+00    0.0307
## Chenxi         1.0778e+00 -9.1954e-02  6.8366e+00  4.4739e-01    0.3273
## Cili           3.1224e-01 -9.1954e-02  6.8366e+00  1.5458e-01    0.4386
## Dao            1.1530e+00 -9.1954e-02  6.8366e+00  4.7615e-01    0.3170
## Dongan         1.0037e+00 -9.1954e-02  6.8366e+00  4.1905e-01    0.3376
## Dongkou        3.7264e+00 -9.1954e-02  6.8366e+00  1.4604e+00    0.0721
## Fenghuang      1.4478e+00 -9.1954e-02  6.8366e+00  5.8890e-01    0.2780
## Guidong       -2.3061e+00 -9.1954e-02  6.8366e+00 -8.4682e-01    0.8015
## Guiyang        1.7848e+00 -9.1954e-02  6.8366e+00  7.1777e-01    0.2364
## Guzhang        2.7198e+00 -9.1954e-02  6.8366e+00  1.0753e+00    0.1411
## Hanshou       -1.9277e-01 -9.1954e-02  6.8366e+00 -3.8557e-02    0.5154
## Hengdong      -6.9586e-02 -9.1954e-02  6.8366e+00  8.5548e-03    0.4966
## Hengnan        4.7834e-02 -9.1954e-02  6.8366e+00  5.3462e-02    0.4787
## Hengshan       1.0657e-01 -9.1954e-02  6.8366e+00  7.5925e-02    0.4697
## Hengyang       7.3384e-02 -9.1954e-02  6.8366e+00  6.3234e-02    0.4748
## Hongjiang      1.7214e+00 -9.1954e-02  6.8366e+00  6.9353e-01    0.2440
## Huarong        1.2252e+00 -9.1954e-02  6.8366e+00  5.0375e-01    0.3072
## Huayuan        1.3473e+00 -9.1954e-02  6.8366e+00  5.5043e-01    0.2910
## Huitong        2.1880e+00 -9.1954e-02  6.8366e+00  8.7198e-01    0.1916
## Jiahe         -9.3416e-01 -9.1954e-02  6.8366e+00 -3.2210e-01    0.6263
## Jianghua       1.3463e+00 -9.1954e-02  6.8366e+00  5.5008e-01    0.2911
## Jiangyong      1.1769e+00 -9.1954e-02  6.8366e+00  4.8529e-01    0.3137
## Jingzhou       1.3692e+00 -9.1954e-02  6.8366e+00  5.5883e-01    0.2881
## Jinshi         1.3637e-02 -9.1954e-02  6.8366e+00  4.0384e-02    0.4839
## Jishou        -2.4882e+00 -9.1954e-02  6.8366e+00 -9.1646e-01    0.8203
## Lanshan        5.3431e-01 -9.1954e-02  6.8366e+00  2.3952e-01    0.4054
## Leiyang        3.6429e-01 -9.1954e-02  6.8366e+00  1.7449e-01    0.4307
## Lengshuijiang -1.3481e+01 -9.1954e-02  6.8366e+00 -5.1206e+00    1.0000
## Li             2.7700e-03 -9.1954e-02  6.8366e+00  3.6228e-02    0.4856
## Lianyuan      -2.2915e+00 -9.1954e-02  6.8366e+00 -8.4121e-01    0.7999
## Liling         1.3127e+01 -9.1954e-02  6.8366e+00  5.0557e+00    0.0000
## Linli         -7.6749e-03 -9.1954e-02  6.8366e+00  3.2233e-02    0.4871
## Linwu          5.9182e-02 -9.1954e-02  6.8366e+00  5.7802e-02    0.4770
## Linxiang       8.9333e-01 -9.1954e-02  6.8366e+00  3.7682e-01    0.3532
## Liuyang        2.6814e+01 -9.1954e-02  6.8366e+00  1.0290e+01    0.0000
## Longhui        2.1325e+00 -9.1954e-02  6.8366e+00  8.5074e-01    0.1975
## Longshan       3.3841e+00 -9.1954e-02  6.8366e+00  1.3294e+00    0.0919
## Luxi           1.3170e+00 -9.1954e-02  6.8366e+00  5.3885e-01    0.2950
## Mayang         1.2656e+00 -9.1954e-02  6.8366e+00  5.1921e-01    0.3018
## Miluo          1.2402e+01 -9.1954e-02  6.8366e+00  4.7784e+00    0.0000
## Nan           -6.0565e-01 -9.1954e-02  6.8366e+00 -1.9646e-01    0.5779
## Ningxiang      1.3096e+01 -9.1954e-02  6.8366e+00  5.0437e+00    0.0000
## Ningyuan       1.4858e+00 -9.1954e-02  6.8366e+00  6.0341e-01    0.2731
## Pingjiang     -6.0562e+00 -9.1954e-02  6.8366e+00 -2.2811e+00    0.9887
## Qidong         1.0180e+00 -9.1954e-02  6.8366e+00  4.2451e-01    0.3356
## Qiyang         1.0586e+00 -9.1954e-02  6.8366e+00  4.4005e-01    0.3300
## Rucheng       -1.4785e+00 -9.1954e-02  6.8366e+00 -5.3029e-01    0.7020
## Sangzhi        2.5006e+00 -9.1954e-02  6.8366e+00  9.9154e-01    0.1607
## Shaodong      -2.6376e-02 -9.1954e-02  6.8366e+00  2.5081e-02    0.4900
## Shaoshan       1.0831e+01 -9.1954e-02  6.8366e+00  4.1776e+00    0.0000
## Shaoyang       1.5201e+00 -9.1954e-02  6.8366e+00  6.1652e-01    0.2688
## Shimen        -2.5992e-01 -9.1954e-02  6.8366e+00 -6.4240e-02    0.5256
## Shuangfeng    -4.9102e-01 -9.1954e-02  6.8366e+00 -1.5262e-01    0.5607
## Shuangpai      3.5368e-01 -9.1954e-02  6.8366e+00  1.7044e-01    0.4323
## Suining        3.1437e+00 -9.1954e-02  6.8366e+00  1.2375e+00    0.1080
## Taojiang      -2.6068e+00 -9.1954e-02  6.8366e+00 -9.6180e-01    0.8319
## Taoyuan        9.0485e-02 -9.1954e-02  6.8366e+00  6.9774e-02    0.4722
## Tongdao        4.0982e+00 -9.1954e-02  6.8366e+00  1.6025e+00    0.0545
## Wangcheng      3.4050e+01 -9.1954e-02  6.8366e+00  1.3058e+01    0.0000
## Wugang         4.8230e+00 -9.1954e-02  6.8366e+00  1.8797e+00    0.0301
## Xiangtan       1.6075e+00 -9.1954e-02  6.8366e+00  6.4996e-01    0.2579
## Xiangxiang     2.2502e+00 -9.1954e-02  6.8366e+00  8.9575e-01    0.1852
## Xiangyin       5.2141e+00 -9.1954e-02  6.8366e+00  2.0293e+00    0.0212
## Xinhua         1.8045e+00 -9.1954e-02  6.8366e+00  7.2531e-01    0.2341
## Xinhuang       1.9527e+00 -9.1954e-02  6.8366e+00  7.8199e-01    0.2171
## Xinning        5.1008e+00 -9.1954e-02  6.8366e+00  1.9860e+00    0.0235
## Xinshao        1.1025e+00 -9.1954e-02  6.8366e+00  4.5683e-01    0.3239
## Xintian        2.9296e-01 -9.1954e-02  6.8366e+00  1.4721e-01    0.4415
## Xupu           2.4821e-01 -9.1954e-02  6.8366e+00  1.3010e-01    0.4482
## Yanling       -3.5181e-01 -9.1954e-02  6.8366e+00 -9.9383e-02    0.5396
## Yizhang       -5.3061e-01 -9.1954e-02  6.8366e+00 -1.6776e-01    0.5666
## Yongshun       3.4110e+00 -9.1954e-02  6.8366e+00  1.3397e+00    0.0902
## Yongxing       9.6546e-01 -9.1954e-02  6.8366e+00  4.0441e-01    0.3430
## You            3.9581e-01 -9.1954e-02  6.8366e+00  1.8655e-01    0.4260
## Yuanjiang      1.9578e-01 -9.1954e-02  6.8366e+00  1.1005e-01    0.4562
## Yuanling       5.3798e-02 -9.1954e-02  6.8366e+00  5.5743e-02    0.4778
## Yueyang        2.8183e-01 -9.1954e-02  6.8366e+00  1.4296e-01    0.4432
## Zhijiang       9.6683e-01 -9.1954e-02  6.8366e+00  4.0493e-01    0.3428
## Zhongfang     -1.9505e+00 -9.1954e-02  6.8366e+00 -7.1081e-01    0.7614
## Zhuzhou        2.5374e+00 -9.1954e-02  6.8366e+00  1.0056e+00    0.1573
## Zixing        -5.0635e+00 -9.1954e-02  6.8366e+00 -1.9014e+00    0.9714

The results:

Ii: the local Moran’s I statistics
E.Ii: the expectation of local moran statistic under the randomisation hypothesis
Var.Ii: the variance of local moran statistic under the randomisation hypothesis
Z.Ii:the standard deviate of local moran statistic
Pr(): the p-value of local moran statistic

Moran’s I is a measure of the degree to which the value at a target site is similar to values at adjacent sites.

Moran’s I is large and positive when the value for a given target (or for all locations in the global case) is similar to adjacent values and negative when the value at a target is dissimilar to adjacent values.

4.3 Mapping the Local Moran’s I

Note = We always to have plot both the local moran’s I values with the p-values side by side.

Before mapping the map, it is wise to append the localMI (Queen Contiguity neighbours) into hunan SpatialPolygonDataFrame.

hunan.localMI <- cbind(hunan, localMI)

Using tmap package functions,

tm_shape(hunan.localMI) +
  tm_fill(col = "Ii",
          style = "pretty",
          palette = "RdBu",
          title = "local moran statistics")+
  tm_borders(alpha = 0.5)

## Variable(s) "Ii" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

4.4 Mapping Local Moran’s I p-values

tm_shape(hunan.localMI) +
  tm_fill(col = "Pr.z...0.",
          breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
          palette = "-Blues",
          title = "local Moran's I p-values")+
  tm_borders(alpha = 0.5)

4.5 Mapping both local Moran’s I values & p-values

localMI.map <- tm_shape(hunan.localMI) +
  tm_fill(col = "Ii",
          style = "pretty",
          title = "local moran statistics")+
  tm_borders(alpha = 0.5)

pvalue.map <- tm_shape(hunan.localMI) +
  tm_fill(col = "Pr.z...0.",
          breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
          palette = "-Blues",
          title = "local Moran's I p-values")+
  tm_borders(alpha = 0.5)

tmap_arrange(localMI.map, pvalue.map, asp = 1, ncol = 2)

## Variable(s) "Ii" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

We can see that the 3 green subzone/county areas are high in local Moran’s I values meaning positive autocorrelation. These 3 areas are also highly adjacent in values (GDPPC) with their neighbours and they are statistically significant as clusters because their the p-values are smaller than the confidence interval of 95%. We can also see that although there are negative local Moran’s I values indicating outliers and negative autocorrelation, they are not significant enough from the p-values. However, we do not know what kind of cluster is this? low GDPPC or high GDPPC? so we have to conduct the LISA cluster map.

5. Creating LISA Cluster map

LISA cluster map shows the significant locations color coded by type of spatial autocorrelation.

Steps:

plot the Moran scatterplot
generate LISA cluster map

5.1 Plotting Moran scatterplot

We plot the Moran scatterplot of GDPPC 2012 using moran.plot() function of spdep package.

nci <- moran.plot(hunan$GDPPC2012, rswm_q, labels = as.character(hunan$County), xlab = "GDPPC 2012", ylab = "Spatially Lag GDPPC 2012")

We can see that the plot is split into 4 quadrants, the top right corner belong to areas that have high GDPPC & surrounded by other areas that have average level of GDPPC (high-high locations)

5.2 Plotting Moran scatterplot with standardised variable

We will use scale() function to centers & scales the variable.

Centering - subtracting the mean (omitting NAs) from the corresponding columns
Scaling - dividing the centered variable by their standard deviations

hunan$Z.GDPPC2012 <- scale(hunan$GDPPC2012) %>% as.vector

knitr::kable(head(hunan, n=5))

NAME_2	ID_3	NAME_3	ENGTYPE_3	Shape_Leng	Shape_Area	County	City	GDPPC2005	GDPPC2006	GDPPC2007	GDPPC2008	GDPPC2009	GDPPC2010	GDPPC2011	GDPPC2012	Z.GDPPC2012
Changde	21098	Anxiang	County	1.869074	0.1005619	Anxiang	Changde	8184	10995	12670	14128	16763	19817	22840	23667	-0.0492059
Changde	21100	Hanshou	County	2.360691	0.1997875	Hanshou	Changde	6560	7422	8512	11478	12140	15906	18561	20981	-0.2283412
Changde	21101	Jinshi	County City	1.425620	0.0530241	Jinshi	Changde	9956	11223	14258	16969	20528	24824	30286	34592	0.6794062
Changde	21102	Li	County	3.474324	0.1890812	Li	Changde	8394	10304	11048	12723	15786	18426	21847	24473	0.0045480
Changde	21103	Linli	County	2.289506	0.1145036	Linli	Changde	8850	9855	11373	14564	16160	19273	23172	25554	0.0766422

as.vector() function make sures that the resulting data type is a vector, map neatly into dataframe.

nci2 <- moran.plot(hunan$Z.GDPPC2012, rswm_q, labels = as.character(hunan$County), xlab = "GDPPC 2012", ylab = "Spatially Lag GDPPC 2012")

5.3 Preparing the LISA cluster map

Create a numeric vector data type that has the length of the rows of localMI which is

quadrant <- vector(mode = "numeric", length = nrow(localMI))

Next, we centers the variable of interest (GDPPC 2012) around it’s mean

hunan$GDPP2012 is a numeric vector and mean(local) is a single number so the number is subtracted from every element of the vector (vector operation).

localMI[,1] calls the Moran I statistic value.

DV <- hunan$GDPPC2012 - mean(hunan$GDPPC2012)

C_mI <- localMI[,1] - mean(localMI[,1])

Next, we will set a statistical significance level (confidence interval) for the local Moran.

signif <- 0.05

Quadrants (Interpreting scatter plot & the local Moran values)

From the scatterplot, we get the dashed line, outlining the different quadrants but we’re not able to get the different observations in each quadrant.

That’s why we’re creating a new vector names quadrant to get the different quadrant values for each observation so it can be plotted with tmap.

The four commands below define:

high-high Quadrant 4 - Cluster/Positive autocorrelation
low-low Quadrant 1 - Cluster/Positive autocorrelation
low-high Quadrant 2 - Outlier/Negative autocorrelation
high-low Quadrant 3 - Outlier/Negative autocorrelation
non-significant Moran (p-value of localMI is compared to the significance level)

quadrant[DV >0 & C_mI>0] <- 4      
quadrant[DV <0 & C_mI<0] <- 1      
quadrant[DV <0 & C_mI>0] <- 2
quadrant[DV >0 & C_mI<0] <- 3
quadrant[localMI[,5]>signif] <- 0

5.4 Plotting the LISA cluster map

Now, we can see the different quadrant each observation belongs to.

hunan.localMI$quadrant <- quadrant 
colors <- c("#ffffff", "#2c7bb6", "#abd9e9", "#fdae61", "#d7191c")
clusters <- c("insignificant", "low-low", "low-high", "high-low", "high-high")

knitr::kable(head(hunan.localMI, n=5))

	NAME_2	ID_3	NAME_3	ENGTYPE_3	Shape_Leng	Shape_Area	County	City	GDPPC2005	GDPPC2006	GDPPC2007	GDPPC2008	GDPPC2009	GDPPC2010	GDPPC2011	GDPPC2012	Ii	E.Ii	Var.Ii	Z.Ii	Pr.z…0.
0	Changde	21098	Anxiang	County	1.869074	0.1005619	Anxiang	Changde	8184	10995	12670	14128	16763	19817	22840	23667	-0.0014685	-0.0114943	0.1768267	0.0238421	0.4904893
1	Changde	21100	Hanshou	County	2.360691	0.1997875	Hanshou	Changde	6560	7422	8512	11478	12140	15906	18561	20981	0.0258782	-0.0114943	0.1768267	0.0888745	0.4645908
2	Changde	21101	Jinshi	County City	1.425620	0.0530241	Jinshi	Changde	9956	11223	14258	16969	20528	24824	30286	34592	-0.0119876	-0.0114943	0.2234962	-0.0010437	0.5004164
3	Changde	21102	Li	County	3.474324	0.1890812	Li	Changde	8394	10304	11048	12723	15786	18426	21847	24473	0.0010225	-0.0114943	0.2234962	0.0264762	0.4894388
4	Changde	21103	Linli	County	2.289506	0.1145036	Linli	Changde	8850	9855	11373	14564	16160	19273	23172	25554	0.0148149	-0.0114943	0.2234962	0.0556508	0.4778100

tm_shape(hunan.localMI) +
  tm_fill(col = "quadrant", style = "cat", palette = colors[c(sort(unique(quadrant)))+1], labels = clusters[c(sort(unique(quadrant)))+1], popup.vars = c("Postal.Code")) +
  tm_view(set.zoom.limits = c(11,17)) +
  tm_borders(alpha=0.5)

We can see that from the map, there’s a high-high cluster indicating positive autocorrelation for the subzone/county areas having adjacent high GDPPC values. There is also a low-high cluster indicating negative autocorrelation/outliers whereby these areas with low GDPPC values are surrounded by high GDPPC areas.

6. Hot Spot & Cold Spot Analysis

Another form of LISA/Spatial statistics to detect spatial anomalies is the Getis and Ord’s G-statistics.

The statistics will look at neighbours within a defined proximity to identify where either high or low values cluster spatially/in space.

A significant positive value of G ( G> 0) indicates that there is a " cluster" of high values of the variable analyzed with reference to their average. Also, a significant but negative index value of G (G < 0) is shown as a group of low values relative to the average of the variable analyzed.

Hot spot area = G>0, it’s also High-high in Local Moran’s I values
Cold spot area = G<0, it’s also Low-low in Local Moran’s I values

Steps:

Deriving spatial weight matrix
Computing Gi statistics
Mapping Gi statistics

6.1 Deriving distance-based weight matrix

For Getis-Ord, we define neighbours based on distance/proximity. (while spatial autocorrelation considered units which shared borders)

The two types of distance-based weight matrix:

Fixed distance weight matrix
Adaptive distance weight matrix

6.2.1 Fixed distance weight matrix plot

We use dnearneigh() function of spdep package to calculate the fixed distance weight matrix. The function identifies neighbours of region points by Euclidean distance between lower and upper bounds in km.

dnb <- dnearneigh(coordinates(hunan), 0, 85, longlat = TRUE)
dnb

## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 642 
## Percentage nonzero weights: 8.290289 
## Average number of links: 7.295455

Fixed distance weight matrix plot

plot(hunan, border = "lightgrey")
plot(dnb, coordinates(hunan), add = TRUE, col = "red")

The nb2listw() supplements a neighbours list with spatial weights for the chosen coding scheme. The input of nb2listw() must be object of class nb. The syntax of the function has two major arguments, namely style and zero.poly.

style can take values “W”, “B”, “C”, “U”, “minmax” and “S”. B is the basic binary coding, W is row standardised (sums over all links to n), C is globally standardised (sums over all links to n), U is equal to C divided by the number of neighbours (sums over all links to unity), while S is the variance-stabilizing coding scheme proposed by Tiefelsdorf et al. 1999, p. 167-168 (sums over all links to n).
If zero policy is set to TRUE, weights vectors of zero length are inserted for regions without neighbour in the neighbours list. These will in turn generate lag values of zero, equivalent to the sum of products of the zero row t(rep(0, length=length(neighbours))) %*% x, for arbitraty numerical vector x of length length(neighbours). The spatially lagged value of x for the zero-neighbour region will then be zero, which may (or may not) be a sensible choice.

dnb_lw <- nb2listw(dnb, style = "B")
summary(dnb_lw)

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 642 
## Percentage nonzero weights: 8.290289 
## Average number of links: 7.295455 
## Link number distribution:
## 
##  2  3  4  5  6  7  8  9 10 11 
##  1  4  7  6 12 12 13 23  7  3 
## 1 least connected region:
## 88 with 2 links
## 3 most connected regions:
## 35 47 86 with 11 links
## 
## Weights style: B 
## Weights constants summary:
##    n   nn  S0   S1    S2
## B 88 7744 642 1284 20320

The output of nb2listw() function is a spatial weights object.

Note: It is this weights list object that will be used in the computation of Gi statistics and not the Neighbour list object.

6.2.2 Adaptive distance weight matrix

Instead of using a fixed distance to derive the weight matrix, a fixed number of neighbours can also be used to derive the weight matrix, the adaptive distance method.

coords <- coordinates(hunan)
knb <- knn2nb(knearneigh(coords, k = 8, longlat = TRUE), row.names = row.names(hunan$gdppc))

knb_lw <- nb2listw(knb, style = "B")
summary(knb_lw)

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 704 
## Percentage nonzero weights: 9.090909 
## Average number of links: 8 
## Non-symmetric neighbours list
## Link number distribution:
## 
##  8 
## 88 
## 88 least connected regions:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 with 8 links
## 88 most connected regions:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 with 8 links
## 
## Weights style: B 
## Weights constants summary:
##    n   nn  S0   S1    S2
## B 88 7744 704 1286 23044

Adaptive distance weight matrix plot

plot(hunan, border = "lightgrey")
plot(knb, coordinates(hunan), pch = 19, cex = 0.6, add = TRUE, col = "red")

6.2 Computing Gi statistics

Gi statistics of fixed distance weight matrix.

fips <- order(hunan$County)
gi.fixed <- localG(hunan$GDPPC2012, dnb_lw)
#gi.fixed

The output of localG() function of spdep package is a vector of G/Gstar values with attributes “gstari” set to TRUE/FALSE, “call” set to the function call and class “localG”

The Gi statistics is represented as a Z-score. Greater values represent a greater intensity of clustering and the direction (positive or negative) indicates high or low clusters.

Next, we will join the fixed distance weight matrix Gi statistics value to the hunan SpatialPolygonDataFrame.

hunan.gi <- cbind(hunan, as.matrix(gi.fixed))
names(hunan.gi)[17] <- "gstat"
head(hunan.gi@data)

##    NAME_2  ID_3  NAME_3   ENGTYPE_3 Shape_Leng Shape_Area  County    City
## 1 Changde 21098 Anxiang      County   1.869074 0.10056190 Anxiang Changde
## 2 Changde 21100 Hanshou      County   2.360691 0.19978745 Hanshou Changde
## 3 Changde 21101  Jinshi County City   1.425620 0.05302413  Jinshi Changde
## 4 Changde 21102      Li      County   3.474325 0.18908121      Li Changde
## 5 Changde 21103   Linli      County   2.289506 0.11450357   Linli Changde
## 6 Changde 21104  Shimen      County   4.171918 0.37194707  Shimen Changde
##   GDPPC2005 GDPPC2006 GDPPC2007 GDPPC2008 GDPPC2009 GDPPC2010 GDPPC2011
## 1      8184     10995     12670     14128     16763     19817     22840
## 2      6560      7422      8512     11478     12140     15906     18561
## 3      9956     11223     14258     16969     20528     24824     30286
## 4      8394     10304     11048     12723     15786     18426     21847
## 5      8850      9855     11373     14564     16160     19273     23172
## 6      9244     10556     11760     12902     17355     20501     24409
##   GDPPC2012        gstat
## 1     23667 -0.049205949
## 2     20981 -0.228341158
## 3     34592  0.679406172
## 4     24473  0.004547952
## 5     25554  0.076642204
## 6     27137  0.182215933
##   structure.c.0.331637226690737..0.300225037067981...0.0192872220698243..
## 1                                                              0.33163723
## 2                                                              0.30022504
## 3                                                             -0.01928722
## 4                                                              0.23347826
## 5                                                             -0.03392157
## 6                                                             -0.47966539

Gi statistics of adaptive distance weight matrix.

fips <- order(hunan$County)
gi.adaptive <- localG(hunan$GDPPC2012, knb_lw)
hunan.gi <- cbind(hunan.gi, as.matrix(gi.adaptive))
names(hunan.gi)[19] <- "gstat_adaptive"
head(hunan.gi@data)

##    NAME_2  ID_3  NAME_3   ENGTYPE_3 Shape_Leng Shape_Area  County    City
## 1 Changde 21098 Anxiang      County   1.869074 0.10056190 Anxiang Changde
## 2 Changde 21100 Hanshou      County   2.360691 0.19978745 Hanshou Changde
## 3 Changde 21101  Jinshi County City   1.425620 0.05302413  Jinshi Changde
## 4 Changde 21102      Li      County   3.474325 0.18908121      Li Changde
## 5 Changde 21103   Linli      County   2.289506 0.11450357   Linli Changde
## 6 Changde 21104  Shimen      County   4.171918 0.37194707  Shimen Changde
##   GDPPC2005 GDPPC2006 GDPPC2007 GDPPC2008 GDPPC2009 GDPPC2010 GDPPC2011
## 1      8184     10995     12670     14128     16763     19817     22840
## 2      6560      7422      8512     11478     12140     15906     18561
## 3      9956     11223     14258     16969     20528     24824     30286
## 4      8394     10304     11048     12723     15786     18426     21847
## 5      8850      9855     11373     14564     16160     19273     23172
## 6      9244     10556     11760     12902     17355     20501     24409
##   GDPPC2012        gstat
## 1     23667 -0.049205949
## 2     20981 -0.228341158
## 3     34592  0.679406172
## 4     24473  0.004547952
## 5     25554  0.076642204
## 6     27137  0.182215933
##   structure.c.0.331637226690737..0.300225037067981...0.0192872220698243..
## 1                                                              0.33163723
## 2                                                              0.30022504
## 3                                                             -0.01928722
## 4                                                              0.23347826
## 5                                                             -0.03392157
## 6                                                             -0.47966539
##   gstat_adaptive
## 1     0.27442880
## 2     0.30022504
## 3     0.03044770
## 4     0.22227212
## 5    -0.03392157
## 6    -0.51413315

6.3 Mapping local Gi statistics

Mapping local Gi values with fixed distance weights

tm_shape(hunan.gi)+
  tm_fill(col = "gstat",
          style = "pretty",
          palette = "-RdBu",
          title = "local Gi value")+
  tm_borders(alpha = 0.5)

## Variable(s) "gstat" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

In the choropleth map above, counties shaded in red are the hot spot areas and counties shaded in blue are the cold spot areas. The darkness of the colours representing the intensity of the Gi values.

In conclusion, the choropleth above shows clear sign of east-west divide in the GDP per capita by county of Hunan province in 2012. The hot spot areas were centre around Changsha city at the east of the provice. The cold spot areas, on the other hand, mainly comprise of counties located on the western part of the province centre around the forested Shaoyang prefecture city.

Mapping local Gi values with adaptive distance weights

tm_shape(hunan.gi)+
  tm_fill(col = "gstat_adaptive",
          style = "pretty",
          palette = "-RdBu",
          title = "local Gi value")+
  tm_borders(alpha = 0.5)

## Variable(s) "gstat_adaptive" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

In_class_exercise07

Erika Aldisa

9/28/2020 (updated: 2020-10-04)