Introduction

In this hands-on exercise, you will gain hands-on experience on using appropriate localised geospatial statistics analysis and thematic mapping functions of R to perform spatial clustering on geographically referenced attribute. There are two major analysis, namely:

  • cluster and outlier analysis; and
  • hot spot and cold spot areas analysis

Learning Outcome

By the end of this hands-on exercise, you will able:

  • to convert GIS polygon data into R’s SpatialPolygonDataFrame object by using appropriate functions of rgdal package of R.
  • to computer spatial weight matrix by using appropriate functions from spdep package;
  • to detect clusters and/or outliers by using Local Indicator of Spatial Association (LISA) of spdep package;
  • to detect hot spot or/and cold spot area by using Getis-Ord’s Gi-statistics of spdep package; and
  • to visualise the analysis output by using ggplot2 and tmap package.

Getting Started

The analytical question

In spatial policy, one of the main development objective of the local govenment and planners is to ensure equal distribution of development in the provice. Our task in this study, hence, is to apply appropriate spatial statistical methods to discover if development are even distributed within a planning area. If the answer is No. Then, our next question will be “is there sign of spatial clustering?”. And, if the answer for this question is yes, then our next question will be “where are these clusters?”

In this case study, we are interested to exame the spatial pattern of a selected development indicator (i.e. GDP per capita) of Hunan Provice, People Republic of China.(https://en.wikipedia.org/wiki/Hunan)

The data

Two data sets will be used in this study. They are:

  • Hanan: This is a GIS data in ESRI shapefile format. It consists of county bondary information of Hunan provice of People Republic of China. The spatial data are casptured in polygon objects.
  • Hunan_GDPPC.csv: This data file consists of GDP per capita values (one of the popular used development indicator) at the county level.

Installing and loading R packages

Before we get started, it is important for us to install the necessary R packages into R and launch these R packages into R environment.

The spatial statistical analysis package that will be used is this study is spdep (https://cran.r-project.org/web/packages/spdep/index.html).

The other R packages needed for this study and their functions are as follows:

  • Spatial data handling
    • rgdal
  • Attribute data handling
    • tidyverse, especially readr, ggplot2 and dplyr
  • Choropleth mapping
    • tmap

The code chunks below installs and launches these R packages into R environment.

packages = c('rgdal', 'spdep',  'tmap', 'tidyverse')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}

Note: With tidyverse, we do not have to install readr, ggplot2 and dplyr packages seperately. In fact, tidyverse also installes other very useful R packages such as tidyr.

Data Import and Prepatation

Importing geospatial data into R environment

In this section, you willimport Yunan county boundary GIS data and its associated attrbiute table into R environment.

The Yunan county boundary GIS data is in ESRI shapefile format. It will be imported into R environment by using the readOGR() function of rgdal.

The code chunks used are shown below:

hunan <- readOGR(dsn = "data/shapefile", layer = "Hunan")
## OGR data source with driver: ESRI Shapefile 
## Source: "D:\IS415-AY2020-21T1\03-Hands-on Exercises\Hands-on_Ex07\data\shapefile", layer: "Hunan"
## with 88 features
## It has 7 fields

The imported Hunan’s county boundary is called hunan. It is saved in SpatialPolygonDataFrame format.

Importing aspatial data into R environment

The csv file will be import using read_csv function of readr package.

The code chunks used are shown below:

gdppc <- read_csv ("data/attributeTable/Hunan_GDPPC.csv")
## Parsed with column specification:
## cols(
##   County = col_character(),
##   City = col_character(),
##   GDPPC2005 = col_double(),
##   GDPPC2006 = col_double(),
##   GDPPC2007 = col_double(),
##   GDPPC2008 = col_double(),
##   GDPPC2009 = col_double(),
##   GDPPC2010 = col_double(),
##   GDPPC2011 = col_double(),
##   GDPPC2012 = col_double()
## )

The imported Hunan GDP per capita attribute data set is called gdppc. It is saved in R’s * tibble data.frame* format.

Joining geospatial data with aspatial data

Next, we combines both data sets into one. This is performed using the left_join function of dplyr package. The hunan SpatialPolygonDataFrame will be used as the base data object and the gdppc data.frame will be used as the join table.

The code chunks below is used to perform the task.

hunan@data <- left_join(hunan@data,gdppc)
## Joining, by = "County"

The message above shows that County field is the common field used to perform the left-join.

It is important to note that there is no new output data been created. Instead, the data fields from gdppc data frame are now updated into the data frame of hunan SpatialPolygonDataFrame.

Preparing a choropleth map

To have a quick look at the distribution of GDPPC 2012 of Yunan provice by county, a choropleth map will be prepared.

The code chunks below are used to prepare the choroplethby using the qtm() function of tmap package.

qtm(hunan, "GDPPC2012")

The choropleth above shows clear sign of spatial autocorrelation.

Cluster and Outlier Analysis

In this section, you will learn how to apply appropriate Local Indicators for Spatial Association (LISA), especially local Moran’I to detect cluster and/or outlier from GDP per capita 2012 of Hunan Province, PRC.

Local Indicators of Spatial Association or LISA are statistics that evaluate the existence of clusters in the spatial arrangement of a given variable. For instance if we are studying cancer rates among census tracts in a given city local clusters in the rates mean that there are areas that have higher or lower rates than is to be expected by chance alone; that is, the values occurring are above or below those of a random distribution in space.

The analysis consists of five steps:

  • Deriving spatial weights matrix
  • Computing local Moran’s I
  • Plotting Moran scatterplot
  • Mapping local Moran’s I
  • Plotting LISA map

Deriving spatial weight matrix

Creating (QUEEN) contiguity based neighbours

To construct a contiguity based neighbours, the poly2nb() function of spdep will be used. The function builds a neighbours list based on regions with contiguous boundaries, that is sharing one or more boundary point.

poly2nb() can be used to construct either Queen or Rook contiguity neighbours. The default is Queen. If queen=TRUE, a single shared boundary point meets the contiguity condition, if FALSE, more than one shared point is required; note that more than one shared boundary point does not necessarily mean a shared boundary line

The codes below construct a Queen contiguity matrix.

wm_q <- poly2nb(hunan, queen=TRUE)

We can examine the content of wm_q by using the summary()

summary(wm_q)
## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 448 
## Percentage nonzero weights: 5.785124 
## Average number of links: 5.090909 
## Link number distribution:
## 
##  1  2  3  4  5  6  7  8  9 11 
##  2  2 12 16 24 14 11  4  2  1 
## 2 least connected regions:
## 29 64 with 1 link
## 1 most connected region:
## 84 with 11 links

You can also display the contiguity map by using plot() of BASE R.

plot(hunan, border="lightgrey")
plot(wm_q, coordinates(hunan), pch = 19, cex = 0.6, add = TRUE, col= "red")

DIY: Create other forms of spatial weight matrix by using appropriate functions provided by spdep package.

Row-standardised weights matrix

Spatial weights are often row standardized, particularly with binary weighting strategies. Row standardization is used to create proportional weights in cases where features have an unequal number of neighbors. Row standardization involves dividing each neighbor weight for a feature by the sum of all neighbor weights for that feature, and is recommended whenever the distribution of your features is potentially biased due to sampling design or an imposed aggregation scheme.

The code chunks below construct construct a row-standardised weights matrix by using wm_q as the input matrix.

rswm_q <- nb2listw(wm_q, zero.policy = TRUE)
summary(rswm_q)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 448 
## Percentage nonzero weights: 5.785124 
## Average number of links: 5.090909 
## Link number distribution:
## 
##  1  2  3  4  5  6  7  8  9 11 
##  2  2 12 16 24 14 11  4  2  1 
## 2 least connected regions:
## 29 64 with 1 link
## 1 most connected region:
## 84 with 11 links
## 
## Weights style: W 
## Weights constants summary:
##    n   nn S0       S1       S2
## W 88 7744 88 37.86334 365.9147

Computing local Moran’s I

To compute local Moran’s I, the localmoran() function of spdep will be used. It computes Ii values, given a set of zi values and a listw object providing neighbour weighting information for the polygon associated with the zi values.

The code chunks below are used to compute local Moran’s I of GDPPC2012 at the county level.

fips <- order(hunan$County)
localMI <- localmoran(hunan$GDPPC2012, rswm_q)
head(localMI)
##             Ii        E.Ii    Var.Ii         Z.Ii Pr(z > 0)
## 0 -0.001468468 -0.01149425 0.1768267  0.023842100 0.4904893
## 1  0.025878173 -0.01149425 0.1768267  0.088874545 0.4645908
## 2 -0.011987646 -0.01149425 0.2234962 -0.001043657 0.5004164
## 3  0.001022468 -0.01149425 0.2234962  0.026476192 0.4894388
## 4  0.014814881 -0.01149425 0.2234962  0.055650812 0.4778100
## 5 -0.038793829 -0.01149425 0.1768267 -0.064920523 0.5258814

localmoran() function returns a matrix of values whose columns are:

  • Ii: the local Moran’s I statistics
  • E.Ii: the expectation of local moran statistic under the randomisation hypothesis
  • Var.Ii: the variance of local moran statistic under the randomisation hypothesis
  • Z.Ii:the standard deviate of local moran statistic
  • Pr(): the p-value of local moran statistic

The code chunk below list the content of the local Moran matrix derived by using printCoefmat().

printCoefmat(data.frame(localMI[fips,], row.names=hunan$County[fips]), check.names=FALSE)
##                        Ii        E.Ii      Var.Ii        Z.Ii Pr.z...0.
## Anhua         -2.2493e-02 -1.1494e-02  1.2349e-01 -3.1298e-02    0.5125
## Anren         -3.9932e-01 -1.1494e-02  1.0682e-01 -1.1866e+00    0.8823
## Anxiang       -1.4685e-03 -1.1494e-02  1.7683e-01  2.3842e-02    0.4905
## Baojing        3.4737e-01 -1.1494e-02  1.7683e-01  8.5341e-01    0.1967
## Chaling        2.0559e-02 -1.1494e-02  3.0128e-01  5.8397e-02    0.4767
## Changning     -2.9868e-05 -1.1494e-02  1.7683e-01  2.7263e-02    0.4891
## Changsha       4.9022e+00 -1.1494e-02  1.4571e-01  1.2872e+01    0.0000
## Chengbu        7.3725e-01 -1.1494e-02  2.2350e-01  1.5838e+00    0.0566
## Chenxi         1.4544e-01 -1.1494e-02  1.7683e-01  3.7319e-01    0.3545
## Cili           7.3176e-02 -1.1494e-02  3.0128e-01  1.5426e-01    0.4387
## Dao            2.1420e-01 -1.1494e-02  2.2350e-01  4.7741e-01    0.3165
## Dongan         1.5210e-01 -1.1494e-02  2.2350e-01  3.4604e-01    0.3647
## Dongkou        5.2918e-01 -1.1494e-02  1.7683e-01  1.2858e+00    0.0993
## Fenghuang      1.8013e-01 -1.1494e-02  2.2350e-01  4.0534e-01    0.3426
## Guidong       -5.9160e-01 -1.1494e-02  3.0128e-01 -1.0569e+00    0.8547
## Guiyang        1.8240e-01 -1.1494e-02  9.3859e-02  6.3290e-01    0.2634
## Guzhang        2.8466e-01 -1.1494e-02  1.7683e-01  7.0428e-01    0.2406
## Hanshou        2.5878e-02 -1.1494e-02  1.7683e-01  8.8875e-02    0.4646
## Hengdong       9.9964e-03 -1.1494e-02  1.4571e-01  5.6299e-02    0.4776
## Hengnan        2.8064e-02 -1.1494e-02  1.2349e-01  1.1257e-01    0.4552
## Hengshan      -5.8201e-03 -1.1494e-02  1.7683e-01  1.3494e-02    0.4946
## Hengyang       6.2997e-02 -1.1494e-02  1.7683e-01  1.7715e-01    0.4297
## Hongjiang      1.8790e-01 -1.1494e-02  1.4571e-01  5.2234e-01    0.3007
## Huarong       -1.5389e-02 -1.1494e-02  4.5684e-01 -5.7619e-03    0.5023
## Huayuan        8.3772e-02 -1.1494e-02  3.0128e-01  1.7356e-01    0.4311
## Huitong        2.5997e-01 -1.1494e-02  2.2350e-01  5.7422e-01    0.2829
## Jiahe         -1.2431e-01 -1.1494e-02  1.7683e-01 -2.6828e-01    0.6058
## Jianghua       2.8651e-01 -1.1494e-02  2.2350e-01  6.3036e-01    0.2642
## Jiangyong      2.4337e-01 -1.1494e-02  4.5684e-01  3.7707e-01    0.3531
## Jingzhou       1.8270e-01 -1.1494e-02  3.0128e-01  3.5379e-01    0.3617
## Jinshi        -1.1988e-02 -1.1494e-02  2.2350e-01 -1.0437e-03    0.5004
## Jishou        -2.8680e-01 -1.1494e-02  1.7683e-01 -6.5470e-01    0.7437
## Lanshan        6.3334e-02 -1.1494e-02  2.2350e-01  1.5828e-01    0.4371
## Leiyang        1.1581e-02 -1.1494e-02  1.7683e-01  5.4874e-02    0.4781
## Lengshuijiang -1.7903e+00 -1.1494e-02  3.0128e-01 -3.2408e+00    0.9994
## Li             1.0225e-03 -1.1494e-02  2.2350e-01  2.6476e-02    0.4894
## Lianyuan      -1.4672e-01 -1.1494e-02  1.0682e-01 -4.1375e-01    0.6605
## Liling         1.3774e+00 -1.1494e-02  3.0128e-01  2.5304e+00    0.0057
## Linli          1.4815e-02 -1.1494e-02  2.2350e-01  5.5651e-02    0.4778
## Linwu         -2.4621e-03 -1.1494e-02  2.2350e-01  1.9105e-02    0.4924
## Linxiang       6.5904e-02 -1.1494e-02  9.2354e-01  8.0539e-02    0.4679
## Liuyang        3.3688e+00 -1.1494e-02  2.2350e-01  7.1503e+00    0.0000
## Longhui        8.0801e-01 -1.1494e-02  1.4571e-01  2.1468e+00    0.0159
## Longshan       7.5663e-01 -1.1494e-02  3.0128e-01  1.3994e+00    0.0808
## Luxi           1.8177e-01 -1.1494e-02  1.4571e-01  5.0628e-01    0.3063
## Mayang         2.1852e-01 -1.1494e-02  1.7683e-01  5.4699e-01    0.2922
## Miluo          1.8704e+00 -1.1494e-02  1.7683e-01  4.4752e+00    0.0000
## Nan           -9.5789e-03 -1.1494e-02  1.4571e-01  5.0176e-03    0.4980
## Ningxiang      1.5607e+00 -1.1494e-02  1.2349e-01  4.4739e+00    0.0000
## Ningyuan       2.0910e-01 -1.1494e-02  1.2349e-01  6.2774e-01    0.2651
## Pingjiang     -9.8964e-01 -1.1494e-02  2.2350e-01 -2.0690e+00    0.9807
## Qidong         1.1806e-01 -1.1494e-02  1.2349e-01  3.6867e-01    0.3562
## Qiyang         6.1966e-02 -1.1494e-02  1.2349e-01  2.0904e-01    0.4172
## Rucheng       -3.6992e-01 -1.1494e-02  3.0128e-01 -6.5300e-01    0.7431
## Sangzhi        2.5053e-01 -1.1494e-02  1.4571e-01  6.8643e-01    0.2462
## Shaodong      -3.2659e-02 -1.1494e-02  1.4571e-01 -5.5445e-02    0.5221
## Shaoshan       2.1223e+00 -1.1494e-02  3.0128e-01  3.8874e+00    0.0001
## Shaoyang       5.9499e-01 -1.1494e-02  1.2349e-01  1.7259e+00    0.0422
## Shimen        -3.8794e-02 -1.1494e-02  1.7683e-01 -6.4921e-02    0.5259
## Shuangfeng     9.2835e-03 -1.1494e-02  1.4571e-01  5.4431e-02    0.4783
## Shuangpai      8.0591e-02 -1.1494e-02  3.0128e-01  1.6777e-01    0.4334
## Suining        3.7585e-01 -1.1494e-02  1.2349e-01  1.1022e+00    0.1352
## Taojiang      -2.5394e-01 -1.1494e-02  1.2349e-01 -6.8991e-01    0.7549
## Taoyuan        1.4729e-02 -1.1494e-02  7.5002e-02  9.5754e-02    0.4619
## Tongdao        4.6482e-01 -1.1494e-02  3.0128e-01  8.6779e-01    0.1928
## Wangcheng      4.4220e+00 -1.1494e-02  1.4571e-01  1.1614e+01    0.0000
## Wugang         7.1003e-01 -1.1494e-02  1.4571e-01  1.8902e+00    0.0294
## Xiangtan       2.4530e-01 -1.1494e-02  9.3859e-02  8.3820e-01    0.2010
## Xiangxiang     2.6271e-01 -1.1494e-02  1.7683e-01  6.5207e-01    0.2572
## Xiangyin       5.4525e-01 -1.1494e-02  1.7683e-01  1.3240e+00    0.0928
## Xinhua         1.1810e-01 -1.1494e-02  1.4571e-01  3.3950e-01    0.3671
## Xinhuang       1.5725e-01 -1.1494e-02  9.2354e-01  1.7559e-01    0.4303
## Xinning        6.8928e-01 -1.1494e-02  2.2350e-01  1.4823e+00    0.0691
## Xinshao        5.7578e-02 -1.1494e-02  1.4571e-01  1.8095e-01    0.4282
## Xintian       -7.4050e-03 -1.1494e-02  2.2350e-01  8.6498e-03    0.4965
## Xupu           3.2406e-01 -1.1494e-02  1.0682e-01  1.0267e+00    0.1523
## Yanling       -6.9021e-02 -1.1494e-02  1.7683e-01 -1.3680e-01    0.5544
## Yizhang       -2.6844e-01 -1.1494e-02  2.2350e-01 -5.4350e-01    0.7066
## Yongshun       6.3064e-01 -1.1494e-02  1.7683e-01  1.5271e+00    0.0634
## Yongxing       4.3411e-01 -1.1494e-02  1.7683e-01  1.0597e+00    0.1446
## You            7.8750e-02 -1.1494e-02  1.7683e-01  2.1461e-01    0.4150
## Yuanjiang      2.0004e-04 -1.1494e-02  1.7683e-01  2.7810e-02    0.4889
## Yuanling       8.7298e-03 -1.1494e-02  1.0682e-01  6.1878e-02    0.4753
## Yueyang        4.1189e-02 -1.1494e-02  1.2349e-01  1.4992e-01    0.4404
## Zhijiang       1.0476e-01 -1.1494e-02  1.7683e-01  2.7647e-01    0.3911
## Zhongfang     -2.2685e-01 -1.1494e-02  1.7683e-01 -5.1212e-01    0.6957
## Zhuzhou        3.2864e-01 -1.1494e-02  1.4571e-01  8.9105e-01    0.1865
## Zixing        -7.6849e-01 -1.1494e-02  1.2349e-01 -2.1541e+00    0.9844

Mapping the local Moran’s I

Before mapping the local Moran’s I map, it is wise to append the local Moran’s I dataframe (i.e. localMI) onto hunan SpatialPolygonDataFrame. The code chunks below can be used to perform the task. The out SpatialPolygonDataFrame is called hunan.localMI.

hunan.localMI <- cbind(hunan,localMI)

Mapping local Moran’s I values

Using choropleth mapping functions of tmap package, we can plot the local Moran’s I values by using the code chinks below.

tm_shape(hunan.localMI) +
  tm_fill(col = "Ii", 
          style = "pretty",
          palette = "RdBu",
          title = "local moran statistics") +
  tm_borders(alpha = 0.5)
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Variable(s) "Ii" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

Mapping local Moran’s I p-values

The choropleth shows there is evidence for both positive and negative Ii values. However, it is useful to consider the p-values for each of these values, as consider above.

The code chunks below produce a choropleth map of Moran’s I p-values by using functions of tmap package.

tm_shape(hunan.localMI) +
  tm_fill(col = "Pr.z...0.", 
          breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
          palette="-Blues", 
          title = "local Moran's I p-values") +
  tm_borders(alpha = 0.5)
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

Mapping both local Moran’s I values and p-values

For effective interpretation, it is better to plot both the local Moran’s I values map and its correpence p-values map next to each other.

The code chunk below will be used to create such visualisation.

localMI.map <- tm_shape(hunan.localMI) +
  tm_fill(col = "Ii", 
          style = "pretty", 
          title = "local moran statistics") +
  tm_borders(alpha = 0.5)

pvalue.map <- tm_shape(hunan.localMI) +
  tm_fill(col = "Pr.z...0.", 
          breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
          palette="-Blues", 
          title = "local Moran's I p-values") +
  tm_borders(alpha = 0.5)

tmap_arrange(localMI.map, pvalue.map, asp=1, ncol=2)
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Variable(s) "Ii" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

Creating a LISA Cluster Map

The LISA Cluster Map shows the significant locations color coded by type of spatial autocorrelation. The first step before we can generate the LISA cluster map is to plot the Moran scatterplot.

Plotting Moran scatterplot

The Moran scatterplot is an illustration of the relationship between the values of the chosen attribute at each location and the average value of the same attribute at neighboring locations.

The code chunk below plots the Moran scatterplot of GDPPC 2012 by using moran.plot() of spdep.

nci <- moran.plot(hunan$GDPPC2012, rswm_q, labels=as.character(hunan$County), xlab="GDPPC 2012", ylab="Spatially Lag GDPPC 2012")

Notice that the plot is split in 4 quadrants. The top right corner belongs to areas that have high GDPPC and are surrounded by other areas that have the average level of GDPPC. This are the high-high locations in the lesson slide.

Plotting Moran scatterplot with standardised variable

Scaling the variable of interest

First we will use scale() to centers and scales the variable. here centering is done by subtracting the mean (omitting NAs) the corresponding columns, and scaling is done by dividing the (centered) variable by their standard deviations.

hunan$Z.GDPPC2012 <- scale(hunan$GDPPC2012) %>% as.vector 

The as.vector() added to the end is to make sure that the data type we get out of this is a vector, that map neatly into out dataframe.

Now, we are ready to plot the Moran scatterplot again by using the code chunk below.

nci2 <- moran.plot(hunan$Z.GDPPC2012, rswm_q, labels=as.character(hunan$County), xlab="z-GDPPC 2012", ylab="Spatially Lag z-GDPPC 2012")

The code chunks below show the steps to prepare a LISA cluster map.

quadrant <- vector(mode="numeric",length=nrow(localMI))

Next, we centers the variable of interest around its mean.

DV <- hunan$GDPPC2012 - mean(hunan$GDPPC2012)     

This is follow by centering the local Moran’s around the mean.

C_mI <- localMI[,1] - mean(localMI[,1])    

Next, we will set a statistical significance level for the local Moran.

signif <- 0.05       

These four command lines define the high-high, low-low, low-high and high-low categories.

quadrant[DV >0 & C_mI>0] <- 4      
quadrant[DV <0 & C_mI<0] <- 1      
quadrant[DV <0 & C_mI>0] <- 2
quadrant[DV >0 & C_mI<0] <- 3

Lastly, places non-significant Moran in the category 0.

quadrant[localMI[,5]>signif] <- 0

In fact, we can combined all the steps into one single code chunk as shown below:

quadrant <- vector(mode="numeric",length=nrow(localMI))
DV <- hunan$GDPPC2012 - mean(hunan$GDPPC2012)     
C_mI <- localMI[,1] - mean(localMI[,1])    
signif <- 0.05       
quadrant[DV >0 & C_mI>0] <- 4      
quadrant[DV <0 & C_mI<0] <- 1      
quadrant[DV <0 & C_mI>0] <- 2
quadrant[DV >0 & C_mI<0] <- 3
quadrant[localMI[,5]>signif] <- 0

Now, we can build the LISA map by using the code chunks below.

hunan.localMI$quadrant <- quadrant
colors <- c("#ffffff", "#2c7bb6", "#abd9e9", "#fdae61", "#d7191c")
clusters <- c("insignificant", "low-low", "low-high", "high-low", "high-high")

tm_shape(hunan.localMI) +
  tm_fill(col = "quadrant", style = "cat", palette = colors[c(sort(unique(quadrant)))+1], labels = clusters[c(sort(unique(quadrant)))+1], popup.vars = c("Postal.Code")) +
  tm_view(set.zoom.limits = c(11,17)) +
  tm_borders(alpha=0.5)
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

Hot Spot and Cold Spot Area Analysis

Beside detecting cluster and outliers, localised spatial statistics can be also used to detect hot spot and/or cold spot areas.

The term ‘hot spot’ has been used generically across disciplines to describe a region or value that is higher relative to its surroundings (Lepers et al 2005, Aben et al 2012, Isobe et al 2015).

Getis and Ord’s G-Statistics

An alternative spatial statistics to detect spatial anomalies is the Getis and Ord’s G-statistics (Getis and Ord, 1972; Ord and Getis, 1995). It looks at neighbours within a defined proximity to identify where either high or low values clutser spatially. Here, statistically significant hot-spots are recognised as areas of high values where other areas within a neighbourhood range also share high values too.

The analysis consists of three steps:

  • Deriving spatial weight matrix
  • Computing Gi statistics
  • Mapping Gi statistics

Deriving distance-based weight matrix

First, we need to define a new set of neighbours. Whist the spatial autocorrelation considered units which shared borders, for Getis-Ord we are defining neighbours based on distance or also known as proximity.

There are two type of distance-based proximity matrix, they are:

  • fixed diatnce weight matrix; and
  • adaptive distance weight matrix.

Creating fixed distance proximity matrix

The dnearneigh() of spdep package will be used to calculate the fixed distance proximity matrix. The function identifies neighbours of region points by Euclidean distance between lower (greater than) and upper (less than or equal to) bounds, or with longlat = TRUE, by Great Circle distance in kilometers.

dnb <- dnearneigh(coordinates(hunan), 0, 85, longlat = TRUE)
dnb
## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 642 
## Percentage nonzero weights: 8.290289 
## Average number of links: 7.295455

Next, we can display the the data and neighbours by using the code chunk below.

plot(hunan, border = 'lightgrey')
plot(dnb, coordinates(hunan), add=TRUE, col='red')

The nb2listw() supplements a neighbours list with spatial weights for the chosen coding scheme. The input of nb2listw() must be an object of class nb. The syntax of the function has two major arguments, namely style and zero.poly.

  • style can take values “W”, “B”, “C”, “U”, “minmax” and “S”. B is the basic binary coding, W is row standardised (sums over all links to n), C is globally standardised (sums over all links to n), U is equal to C divided by the number of neighbours (sums over all links to unity), while S is the variance-stabilizing coding scheme proposed by Tiefelsdorf et al. 1999, p. 167-168 (sums over all links to n).

  • If zero policy is set to TRUE, weights vectors of zero length are inserted for regions without neighbour in the neighbours list. These will in turn generate lag values of zero, equivalent to the sum of products of the zero row t(rep(0, length=length(neighbours))) %*% x, for arbitraty numerical vector x of length length(neighbours). The spatially lagged value of x for the zero-neighbour region will then be zero, which may (or may not) be a sensible choice.

dnb_lw <- nb2listw(dnb, style = 'B')
summary(dnb_lw)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 642 
## Percentage nonzero weights: 8.290289 
## Average number of links: 7.295455 
## Link number distribution:
## 
##  2  3  4  5  6  7  8  9 10 11 
##  1  4  7  6 12 12 13 23  7  3 
## 1 least connected region:
## 88 with 2 links
## 3 most connected regions:
## 35 47 86 with 11 links
## 
## Weights style: B 
## Weights constants summary:
##    n   nn  S0   S1    S2
## B 88 7744 642 1284 20320

The output of nb2listw() is a spatial weights object

Note: It is this weights list object that will be used in the computation of Gi statistics and not the Neighbour list object.

Creating adaptive proximity matrix

Instead of using a fixed distance to deirved the proximity matrix, a fixed number of neighbours can be also used to derived the proximity matrix. This is called adaptive distance method.

The code chunk below will be used to derive an adaptive distance weight matrix.

coords <- coordinates(hunan)
knb <- knn2nb(knearneigh(coords, k=8, longlat = TRUE), row.names=row.names(hunan$gdppc))
knb_lw <- nb2listw(knb, style = 'B')
summary(knb_lw)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 704 
## Percentage nonzero weights: 9.090909 
## Average number of links: 8 
## Non-symmetric neighbours list
## Link number distribution:
## 
##  8 
## 88 
## 88 least connected regions:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 with 8 links
## 88 most connected regions:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 with 8 links
## 
## Weights style: B 
## Weights constants summary:
##    n   nn  S0   S1    S2
## B 88 7744 704 1286 23044

Next, we will plot the centroids of the counties and neighbours by using the code chunk below.

plot(hunan, border="lightgrey")
plot(knb, coordinates(hunan), pch = 19, cex = 0.6, add = TRUE, col = "red")

Computing Gi statistics

Gi statistics using fixed distance

fips <- order(hunan$County)
gi.fixed <- localG(hunan$GDPPC2012, dnb_lw)
gi.fixed
##  [1]  0.33163723  0.30022504 -0.01928722  0.23347826 -0.03392157 -0.47966539
##  [7]  3.06337634  3.52132953  4.11133055  1.47912938  1.27491171 -0.76133000
## [13] -0.46258874  0.71813500 -0.06417930  0.35354999 -0.47451050  0.07974191
## [19]  0.31476657 -0.11054769  0.99939906  1.17558026 -1.14239533 -1.07658961
## [25] -1.81487603 -1.48866180 -1.51077556 -0.85413670 -1.26953503 -0.97326334
## [31] -0.15304038 -1.74908457 -1.15759460 -1.98833828  1.85919377 -0.42408526
## [37] -1.72147258 -1.50399992 -2.01128392 -0.89344375 -0.16956076 -2.08488236
## [43] -2.18178008 -1.83990103 -0.49900063  3.33561294  2.15476757 -1.51897356
## [49] -1.01067443 -1.47959479 -1.55031179 -1.89408587 -1.63621036 -1.20000485
## [55] -1.19430447  0.43525680  1.03532181  1.63264419 -1.02035154 -0.71045405
## [61] -0.49125537 -0.83991124 -0.32538867  0.50743726  0.04170660  3.03419107
## [67]  3.97888204  3.73879186 -0.23490311  0.40866791  3.53139674  0.68667429
## [73]  0.25243121  4.25912956 -1.11862241  0.86084777 -1.28368880  2.90232729
## [79] -0.88607519  1.71967284 -1.42763295 -0.08768341  0.71242694  5.14611891
## [85] -0.23532015  4.21922996 -0.97145381 -0.71800062
## attr(,"gstari")
## [1] FALSE
## attr(,"call")
## localG(x = hunan$GDPPC2012, listw = dnb_lw)
## attr(,"class")
## [1] "localG"

The output of localG() is a vector of G or Gstar values, with attributes “gstari” set to TRUE or FALSE, “call” set to the function call, and class “localG”.

The Gi statistics is represented as a Z-score. Greater values represent a greater intensity of clustering and the direction (positive or negative) indicates high or low clusters.

Next, we will join the Gi values to their correponding hunan SpatialPolygonDataFrame using the code chunk below.

hunan.gi <- cbind(hunan, as.matrix(gi.fixed))
names(hunan.gi)[17] <- "gstat"

In fact, the code chunk above performs three tasks. First, it convert the output vector (i.e. gi.fixed) into r matrix object by using as.matrix(). Next, cbind() is used to join and gi.fixed matrix to produce a new SpatialPolygonDataFrame called hunan.gi. Lastly, the field name of the gi values is renamed to gstat_fixed by using names().

Gi statistics using adaptive distance

The code chunk below are used to compute the Gi values for GDPPC2012 by using an adaptive distance weight matrix (i.e knb_lw).

fips <- order(hunan$County)
gi.adaptive <- localG(hunan$GDPPC2012, knb_lw)
hunan.gi <- cbind(hunan, as.matrix(gi.adaptive))
names(hunan.gi)[17] <- "gstat_adaptive"

Visualising local Gi

It is time for us to visualise the locations of hot spot and cold spot areas. The choropleth mapping functions of tmap package will be used to map the Gi values.

Mapping Gi values with fixed distance weights

The code chunk below shows the functions used to map the Gi values derived using fixed distance weight matrix.

tm_shape(hunan.gi) +
  tm_fill(col = "gstat", 
          style = "pretty",
          palette="-RdBu",
          title = "local Gi") +
  tm_borders(alpha = 0.5)
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Variable(s) "gstat" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

In the choropleth map above, counties shaded in red are the hot spot areas and counties shaded in blue are the cold spot areas. The darkness of the colours representing the intensity of the Gi values.

In conclusion, the choropleth above shows clear sign of east-west divide in the GDP per capita by county of Hunan province in 2012. The hot spot areas were centre around Changsha city at the east of the provice. The cold spot areas, on the other hand, mainly comprise of counties located on the western part of the province centre around the forested Shaoyang prefecture city.

Mapping Gi values with adaptive distance weights

The code chunk below shows the functions used to map the Gi values derived using adaptive distance weight matrix.

tm_shape(hunan.gi) +
  tm_fill(col = "gstat",
          style = "pretty",
          palette = "-RdBu",
          title = "local Gi") +
  tm_borders(alpha = 0.5)
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Variable(s) "gstat" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.