In this hands-on exercise, you will gain hands-on experience on using appropriate localised geospatial statistics analysis and thematic mapping functions of R to perform spatial clustering on geographically referenced attribute. There are two major analysis, namely:
By the end of this hands-on exercise, you will able:
In spatial policy, one of the main development objective of the local govenment and planners is to ensure equal distribution of development in the provice. Our task in this study, hence, is to apply appropriate spatial statistical methods to discover if development are even distributed within a planning area. If the answer is No. Then, our next question will be “is there sign of spatial clustering?”. And, if the answer for this question is yes, then our next question will be “where are these clusters?”
In this case study, we are interested to exame the spatial pattern of a selected development indicator (i.e. GDP per capita) of Hunan Provice, People Republic of China.(https://en.wikipedia.org/wiki/Hunan)
Two data sets will be used in this study. They are:
Before we get started, it is important for us to install the necessary R packages into R and launch these R packages into R environment.
The spatial statistical analysis package that will be used is this study is spdep (https://cran.r-project.org/web/packages/spdep/index.html).
The other R packages needed for this study and their functions are as follows:
The code chunks below installs and launches these R packages into R environment.
packages = c('rgdal', 'spdep', 'tmap', 'tidyverse')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
Note: With tidyverse, we do not have to install readr, ggplot2 and dplyr packages seperately. In fact, tidyverse also installes other very useful R packages such as tidyr.
In this section, you willimport Yunan county boundary GIS data and its associated attrbiute table into R environment.
The Yunan county boundary GIS data is in ESRI shapefile format. It will be imported into R environment by using the readOGR() function of rgdal.
The code chunks used are shown below:
hunan <- readOGR(dsn = "data/shapefile", layer = "Hunan")
## OGR data source with driver: ESRI Shapefile
## Source: "D:\IS415-AY2020-21T1\03-Hands-on Exercises\Hands-on_Ex07\data\shapefile", layer: "Hunan"
## with 88 features
## It has 7 fields
The imported Hunan’s county boundary is called hunan. It is saved in SpatialPolygonDataFrame format.
The csv file will be import using read_csv function of readr package.
The code chunks used are shown below:
gdppc <- read_csv ("data/attributeTable/Hunan_GDPPC.csv")
## Parsed with column specification:
## cols(
## County = col_character(),
## City = col_character(),
## GDPPC2005 = col_double(),
## GDPPC2006 = col_double(),
## GDPPC2007 = col_double(),
## GDPPC2008 = col_double(),
## GDPPC2009 = col_double(),
## GDPPC2010 = col_double(),
## GDPPC2011 = col_double(),
## GDPPC2012 = col_double()
## )
The imported Hunan GDP per capita attribute data set is called gdppc. It is saved in R’s * tibble data.frame* format.
Next, we combines both data sets into one. This is performed using the left_join function of dplyr package. The hunan SpatialPolygonDataFrame will be used as the base data object and the gdppc data.frame will be used as the join table.
The code chunks below is used to perform the task.
hunan@data <- left_join(hunan@data,gdppc)
## Joining, by = "County"
The message above shows that County field is the common field used to perform the left-join.
It is important to note that there is no new output data been created. Instead, the data fields from gdppc data frame are now updated into the data frame of hunan SpatialPolygonDataFrame.
To have a quick look at the distribution of GDPPC 2012 of Yunan provice by county, a choropleth map will be prepared.
The code chunks below are used to prepare the choroplethby using the qtm() function of tmap package.
qtm(hunan, "GDPPC2012")
The choropleth above shows clear sign of spatial autocorrelation.
In this section, you will learn how to apply appropriate Local Indicators for Spatial Association (LISA), especially local Moran’I to detect cluster and/or outlier from GDP per capita 2012 of Hunan Province, PRC.
Local Indicators of Spatial Association or LISA are statistics that evaluate the existence of clusters in the spatial arrangement of a given variable. For instance if we are studying cancer rates among census tracts in a given city local clusters in the rates mean that there are areas that have higher or lower rates than is to be expected by chance alone; that is, the values occurring are above or below those of a random distribution in space.
The analysis consists of five steps:
To construct a contiguity based neighbours, the poly2nb() function of spdep will be used. The function builds a neighbours list based on regions with contiguous boundaries, that is sharing one or more boundary point.
poly2nb() can be used to construct either Queen or Rook contiguity neighbours. The default is Queen. If queen=TRUE, a single shared boundary point meets the contiguity condition, if FALSE, more than one shared point is required; note that more than one shared boundary point does not necessarily mean a shared boundary line
The codes below construct a Queen contiguity matrix.
wm_q <- poly2nb(hunan, queen=TRUE)
We can examine the content of wm_q by using the summary()
summary(wm_q)
## Neighbour list object:
## Number of regions: 88
## Number of nonzero links: 448
## Percentage nonzero weights: 5.785124
## Average number of links: 5.090909
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 9 11
## 2 2 12 16 24 14 11 4 2 1
## 2 least connected regions:
## 29 64 with 1 link
## 1 most connected region:
## 84 with 11 links
You can also display the contiguity map by using plot() of BASE R.
plot(hunan, border="lightgrey")
plot(wm_q, coordinates(hunan), pch = 19, cex = 0.6, add = TRUE, col= "red")
DIY: Create other forms of spatial weight matrix by using appropriate functions provided by spdep package.
Spatial weights are often row standardized, particularly with binary weighting strategies. Row standardization is used to create proportional weights in cases where features have an unequal number of neighbors. Row standardization involves dividing each neighbor weight for a feature by the sum of all neighbor weights for that feature, and is recommended whenever the distribution of your features is potentially biased due to sampling design or an imposed aggregation scheme.
The code chunks below construct construct a row-standardised weights matrix by using wm_q as the input matrix.
rswm_q <- nb2listw(wm_q, zero.policy = TRUE)
summary(rswm_q)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 88
## Number of nonzero links: 448
## Percentage nonzero weights: 5.785124
## Average number of links: 5.090909
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 9 11
## 2 2 12 16 24 14 11 4 2 1
## 2 least connected regions:
## 29 64 with 1 link
## 1 most connected region:
## 84 with 11 links
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 88 7744 88 37.86334 365.9147
To compute local Moran’s I, the localmoran() function of spdep will be used. It computes Ii values, given a set of zi values and a listw object providing neighbour weighting information for the polygon associated with the zi values.
The code chunks below are used to compute local Moran’s I of GDPPC2012 at the county level.
fips <- order(hunan$County)
localMI <- localmoran(hunan$GDPPC2012, rswm_q)
head(localMI)
## Ii E.Ii Var.Ii Z.Ii Pr(z > 0)
## 0 -0.001468468 -0.01149425 0.1768267 0.023842100 0.4904893
## 1 0.025878173 -0.01149425 0.1768267 0.088874545 0.4645908
## 2 -0.011987646 -0.01149425 0.2234962 -0.001043657 0.5004164
## 3 0.001022468 -0.01149425 0.2234962 0.026476192 0.4894388
## 4 0.014814881 -0.01149425 0.2234962 0.055650812 0.4778100
## 5 -0.038793829 -0.01149425 0.1768267 -0.064920523 0.5258814
localmoran() function returns a matrix of values whose columns are:
The code chunk below list the content of the local Moran matrix derived by using printCoefmat().
printCoefmat(data.frame(localMI[fips,], row.names=hunan$County[fips]), check.names=FALSE)
## Ii E.Ii Var.Ii Z.Ii Pr.z...0.
## Anhua -2.2493e-02 -1.1494e-02 1.2349e-01 -3.1298e-02 0.5125
## Anren -3.9932e-01 -1.1494e-02 1.0682e-01 -1.1866e+00 0.8823
## Anxiang -1.4685e-03 -1.1494e-02 1.7683e-01 2.3842e-02 0.4905
## Baojing 3.4737e-01 -1.1494e-02 1.7683e-01 8.5341e-01 0.1967
## Chaling 2.0559e-02 -1.1494e-02 3.0128e-01 5.8397e-02 0.4767
## Changning -2.9868e-05 -1.1494e-02 1.7683e-01 2.7263e-02 0.4891
## Changsha 4.9022e+00 -1.1494e-02 1.4571e-01 1.2872e+01 0.0000
## Chengbu 7.3725e-01 -1.1494e-02 2.2350e-01 1.5838e+00 0.0566
## Chenxi 1.4544e-01 -1.1494e-02 1.7683e-01 3.7319e-01 0.3545
## Cili 7.3176e-02 -1.1494e-02 3.0128e-01 1.5426e-01 0.4387
## Dao 2.1420e-01 -1.1494e-02 2.2350e-01 4.7741e-01 0.3165
## Dongan 1.5210e-01 -1.1494e-02 2.2350e-01 3.4604e-01 0.3647
## Dongkou 5.2918e-01 -1.1494e-02 1.7683e-01 1.2858e+00 0.0993
## Fenghuang 1.8013e-01 -1.1494e-02 2.2350e-01 4.0534e-01 0.3426
## Guidong -5.9160e-01 -1.1494e-02 3.0128e-01 -1.0569e+00 0.8547
## Guiyang 1.8240e-01 -1.1494e-02 9.3859e-02 6.3290e-01 0.2634
## Guzhang 2.8466e-01 -1.1494e-02 1.7683e-01 7.0428e-01 0.2406
## Hanshou 2.5878e-02 -1.1494e-02 1.7683e-01 8.8875e-02 0.4646
## Hengdong 9.9964e-03 -1.1494e-02 1.4571e-01 5.6299e-02 0.4776
## Hengnan 2.8064e-02 -1.1494e-02 1.2349e-01 1.1257e-01 0.4552
## Hengshan -5.8201e-03 -1.1494e-02 1.7683e-01 1.3494e-02 0.4946
## Hengyang 6.2997e-02 -1.1494e-02 1.7683e-01 1.7715e-01 0.4297
## Hongjiang 1.8790e-01 -1.1494e-02 1.4571e-01 5.2234e-01 0.3007
## Huarong -1.5389e-02 -1.1494e-02 4.5684e-01 -5.7619e-03 0.5023
## Huayuan 8.3772e-02 -1.1494e-02 3.0128e-01 1.7356e-01 0.4311
## Huitong 2.5997e-01 -1.1494e-02 2.2350e-01 5.7422e-01 0.2829
## Jiahe -1.2431e-01 -1.1494e-02 1.7683e-01 -2.6828e-01 0.6058
## Jianghua 2.8651e-01 -1.1494e-02 2.2350e-01 6.3036e-01 0.2642
## Jiangyong 2.4337e-01 -1.1494e-02 4.5684e-01 3.7707e-01 0.3531
## Jingzhou 1.8270e-01 -1.1494e-02 3.0128e-01 3.5379e-01 0.3617
## Jinshi -1.1988e-02 -1.1494e-02 2.2350e-01 -1.0437e-03 0.5004
## Jishou -2.8680e-01 -1.1494e-02 1.7683e-01 -6.5470e-01 0.7437
## Lanshan 6.3334e-02 -1.1494e-02 2.2350e-01 1.5828e-01 0.4371
## Leiyang 1.1581e-02 -1.1494e-02 1.7683e-01 5.4874e-02 0.4781
## Lengshuijiang -1.7903e+00 -1.1494e-02 3.0128e-01 -3.2408e+00 0.9994
## Li 1.0225e-03 -1.1494e-02 2.2350e-01 2.6476e-02 0.4894
## Lianyuan -1.4672e-01 -1.1494e-02 1.0682e-01 -4.1375e-01 0.6605
## Liling 1.3774e+00 -1.1494e-02 3.0128e-01 2.5304e+00 0.0057
## Linli 1.4815e-02 -1.1494e-02 2.2350e-01 5.5651e-02 0.4778
## Linwu -2.4621e-03 -1.1494e-02 2.2350e-01 1.9105e-02 0.4924
## Linxiang 6.5904e-02 -1.1494e-02 9.2354e-01 8.0539e-02 0.4679
## Liuyang 3.3688e+00 -1.1494e-02 2.2350e-01 7.1503e+00 0.0000
## Longhui 8.0801e-01 -1.1494e-02 1.4571e-01 2.1468e+00 0.0159
## Longshan 7.5663e-01 -1.1494e-02 3.0128e-01 1.3994e+00 0.0808
## Luxi 1.8177e-01 -1.1494e-02 1.4571e-01 5.0628e-01 0.3063
## Mayang 2.1852e-01 -1.1494e-02 1.7683e-01 5.4699e-01 0.2922
## Miluo 1.8704e+00 -1.1494e-02 1.7683e-01 4.4752e+00 0.0000
## Nan -9.5789e-03 -1.1494e-02 1.4571e-01 5.0176e-03 0.4980
## Ningxiang 1.5607e+00 -1.1494e-02 1.2349e-01 4.4739e+00 0.0000
## Ningyuan 2.0910e-01 -1.1494e-02 1.2349e-01 6.2774e-01 0.2651
## Pingjiang -9.8964e-01 -1.1494e-02 2.2350e-01 -2.0690e+00 0.9807
## Qidong 1.1806e-01 -1.1494e-02 1.2349e-01 3.6867e-01 0.3562
## Qiyang 6.1966e-02 -1.1494e-02 1.2349e-01 2.0904e-01 0.4172
## Rucheng -3.6992e-01 -1.1494e-02 3.0128e-01 -6.5300e-01 0.7431
## Sangzhi 2.5053e-01 -1.1494e-02 1.4571e-01 6.8643e-01 0.2462
## Shaodong -3.2659e-02 -1.1494e-02 1.4571e-01 -5.5445e-02 0.5221
## Shaoshan 2.1223e+00 -1.1494e-02 3.0128e-01 3.8874e+00 0.0001
## Shaoyang 5.9499e-01 -1.1494e-02 1.2349e-01 1.7259e+00 0.0422
## Shimen -3.8794e-02 -1.1494e-02 1.7683e-01 -6.4921e-02 0.5259
## Shuangfeng 9.2835e-03 -1.1494e-02 1.4571e-01 5.4431e-02 0.4783
## Shuangpai 8.0591e-02 -1.1494e-02 3.0128e-01 1.6777e-01 0.4334
## Suining 3.7585e-01 -1.1494e-02 1.2349e-01 1.1022e+00 0.1352
## Taojiang -2.5394e-01 -1.1494e-02 1.2349e-01 -6.8991e-01 0.7549
## Taoyuan 1.4729e-02 -1.1494e-02 7.5002e-02 9.5754e-02 0.4619
## Tongdao 4.6482e-01 -1.1494e-02 3.0128e-01 8.6779e-01 0.1928
## Wangcheng 4.4220e+00 -1.1494e-02 1.4571e-01 1.1614e+01 0.0000
## Wugang 7.1003e-01 -1.1494e-02 1.4571e-01 1.8902e+00 0.0294
## Xiangtan 2.4530e-01 -1.1494e-02 9.3859e-02 8.3820e-01 0.2010
## Xiangxiang 2.6271e-01 -1.1494e-02 1.7683e-01 6.5207e-01 0.2572
## Xiangyin 5.4525e-01 -1.1494e-02 1.7683e-01 1.3240e+00 0.0928
## Xinhua 1.1810e-01 -1.1494e-02 1.4571e-01 3.3950e-01 0.3671
## Xinhuang 1.5725e-01 -1.1494e-02 9.2354e-01 1.7559e-01 0.4303
## Xinning 6.8928e-01 -1.1494e-02 2.2350e-01 1.4823e+00 0.0691
## Xinshao 5.7578e-02 -1.1494e-02 1.4571e-01 1.8095e-01 0.4282
## Xintian -7.4050e-03 -1.1494e-02 2.2350e-01 8.6498e-03 0.4965
## Xupu 3.2406e-01 -1.1494e-02 1.0682e-01 1.0267e+00 0.1523
## Yanling -6.9021e-02 -1.1494e-02 1.7683e-01 -1.3680e-01 0.5544
## Yizhang -2.6844e-01 -1.1494e-02 2.2350e-01 -5.4350e-01 0.7066
## Yongshun 6.3064e-01 -1.1494e-02 1.7683e-01 1.5271e+00 0.0634
## Yongxing 4.3411e-01 -1.1494e-02 1.7683e-01 1.0597e+00 0.1446
## You 7.8750e-02 -1.1494e-02 1.7683e-01 2.1461e-01 0.4150
## Yuanjiang 2.0004e-04 -1.1494e-02 1.7683e-01 2.7810e-02 0.4889
## Yuanling 8.7298e-03 -1.1494e-02 1.0682e-01 6.1878e-02 0.4753
## Yueyang 4.1189e-02 -1.1494e-02 1.2349e-01 1.4992e-01 0.4404
## Zhijiang 1.0476e-01 -1.1494e-02 1.7683e-01 2.7647e-01 0.3911
## Zhongfang -2.2685e-01 -1.1494e-02 1.7683e-01 -5.1212e-01 0.6957
## Zhuzhou 3.2864e-01 -1.1494e-02 1.4571e-01 8.9105e-01 0.1865
## Zixing -7.6849e-01 -1.1494e-02 1.2349e-01 -2.1541e+00 0.9844
Before mapping the local Moran’s I map, it is wise to append the local Moran’s I dataframe (i.e. localMI) onto hunan SpatialPolygonDataFrame. The code chunks below can be used to perform the task. The out SpatialPolygonDataFrame is called hunan.localMI.
hunan.localMI <- cbind(hunan,localMI)
Using choropleth mapping functions of tmap package, we can plot the local Moran’s I values by using the code chinks below.
tm_shape(hunan.localMI) +
tm_fill(col = "Ii",
style = "pretty",
palette = "RdBu",
title = "local moran statistics") +
tm_borders(alpha = 0.5)
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Variable(s) "Ii" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
The choropleth shows there is evidence for both positive and negative Ii values. However, it is useful to consider the p-values for each of these values, as consider above.
The code chunks below produce a choropleth map of Moran’s I p-values by using functions of tmap package.
tm_shape(hunan.localMI) +
tm_fill(col = "Pr.z...0.",
breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
palette="-Blues",
title = "local Moran's I p-values") +
tm_borders(alpha = 0.5)
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
For effective interpretation, it is better to plot both the local Moran’s I values map and its correpence p-values map next to each other.
The code chunk below will be used to create such visualisation.
localMI.map <- tm_shape(hunan.localMI) +
tm_fill(col = "Ii",
style = "pretty",
title = "local moran statistics") +
tm_borders(alpha = 0.5)
pvalue.map <- tm_shape(hunan.localMI) +
tm_fill(col = "Pr.z...0.",
breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
palette="-Blues",
title = "local Moran's I p-values") +
tm_borders(alpha = 0.5)
tmap_arrange(localMI.map, pvalue.map, asp=1, ncol=2)
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Variable(s) "Ii" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
The LISA Cluster Map shows the significant locations color coded by type of spatial autocorrelation. The first step before we can generate the LISA cluster map is to plot the Moran scatterplot.
The Moran scatterplot is an illustration of the relationship between the values of the chosen attribute at each location and the average value of the same attribute at neighboring locations.
The code chunk below plots the Moran scatterplot of GDPPC 2012 by using moran.plot() of spdep.
nci <- moran.plot(hunan$GDPPC2012, rswm_q, labels=as.character(hunan$County), xlab="GDPPC 2012", ylab="Spatially Lag GDPPC 2012")
Notice that the plot is split in 4 quadrants. The top right corner belongs to areas that have high GDPPC and are surrounded by other areas that have the average level of GDPPC. This are the high-high locations in the lesson slide.
First we will use scale() to centers and scales the variable. here centering is done by subtracting the mean (omitting NAs) the corresponding columns, and scaling is done by dividing the (centered) variable by their standard deviations.
hunan$Z.GDPPC2012 <- scale(hunan$GDPPC2012) %>% as.vector
The as.vector() added to the end is to make sure that the data type we get out of this is a vector, that map neatly into out dataframe.
Now, we are ready to plot the Moran scatterplot again by using the code chunk below.
nci2 <- moran.plot(hunan$Z.GDPPC2012, rswm_q, labels=as.character(hunan$County), xlab="z-GDPPC 2012", ylab="Spatially Lag z-GDPPC 2012")
The code chunks below show the steps to prepare a LISA cluster map.
quadrant <- vector(mode="numeric",length=nrow(localMI))
Next, we centers the variable of interest around its mean.
DV <- hunan$GDPPC2012 - mean(hunan$GDPPC2012)
This is follow by centering the local Moran’s around the mean.
C_mI <- localMI[,1] - mean(localMI[,1])
Next, we will set a statistical significance level for the local Moran.
signif <- 0.05
These four command lines define the high-high, low-low, low-high and high-low categories.
quadrant[DV >0 & C_mI>0] <- 4
quadrant[DV <0 & C_mI<0] <- 1
quadrant[DV <0 & C_mI>0] <- 2
quadrant[DV >0 & C_mI<0] <- 3
Lastly, places non-significant Moran in the category 0.
quadrant[localMI[,5]>signif] <- 0
In fact, we can combined all the steps into one single code chunk as shown below:
quadrant <- vector(mode="numeric",length=nrow(localMI))
DV <- hunan$GDPPC2012 - mean(hunan$GDPPC2012)
C_mI <- localMI[,1] - mean(localMI[,1])
signif <- 0.05
quadrant[DV >0 & C_mI>0] <- 4
quadrant[DV <0 & C_mI<0] <- 1
quadrant[DV <0 & C_mI>0] <- 2
quadrant[DV >0 & C_mI<0] <- 3
quadrant[localMI[,5]>signif] <- 0
Now, we can build the LISA map by using the code chunks below.
hunan.localMI$quadrant <- quadrant
colors <- c("#ffffff", "#2c7bb6", "#abd9e9", "#fdae61", "#d7191c")
clusters <- c("insignificant", "low-low", "low-high", "high-low", "high-high")
tm_shape(hunan.localMI) +
tm_fill(col = "quadrant", style = "cat", palette = colors[c(sort(unique(quadrant)))+1], labels = clusters[c(sort(unique(quadrant)))+1], popup.vars = c("Postal.Code")) +
tm_view(set.zoom.limits = c(11,17)) +
tm_borders(alpha=0.5)
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
Beside detecting cluster and outliers, localised spatial statistics can be also used to detect hot spot and/or cold spot areas.
The term ‘hot spot’ has been used generically across disciplines to describe a region or value that is higher relative to its surroundings (Lepers et al 2005, Aben et al 2012, Isobe et al 2015).
An alternative spatial statistics to detect spatial anomalies is the Getis and Ord’s G-statistics (Getis and Ord, 1972; Ord and Getis, 1995). It looks at neighbours within a defined proximity to identify where either high or low values clutser spatially. Here, statistically significant hot-spots are recognised as areas of high values where other areas within a neighbourhood range also share high values too.
The analysis consists of three steps:
First, we need to define a new set of neighbours. Whist the spatial autocorrelation considered units which shared borders, for Getis-Ord we are defining neighbours based on distance or also known as proximity.
There are two type of distance-based proximity matrix, they are:
The dnearneigh() of spdep package will be used to calculate the fixed distance proximity matrix. The function identifies neighbours of region points by Euclidean distance between lower (greater than) and upper (less than or equal to) bounds, or with longlat = TRUE, by Great Circle distance in kilometers.
dnb <- dnearneigh(coordinates(hunan), 0, 85, longlat = TRUE)
dnb
## Neighbour list object:
## Number of regions: 88
## Number of nonzero links: 642
## Percentage nonzero weights: 8.290289
## Average number of links: 7.295455
Next, we can display the the data and neighbours by using the code chunk below.
plot(hunan, border = 'lightgrey')
plot(dnb, coordinates(hunan), add=TRUE, col='red')
The nb2listw() supplements a neighbours list with spatial weights for the chosen coding scheme. The input of nb2listw() must be an object of class nb. The syntax of the function has two major arguments, namely style and zero.poly.
style can take values “W”, “B”, “C”, “U”, “minmax” and “S”. B is the basic binary coding, W is row standardised (sums over all links to n), C is globally standardised (sums over all links to n), U is equal to C divided by the number of neighbours (sums over all links to unity), while S is the variance-stabilizing coding scheme proposed by Tiefelsdorf et al. 1999, p. 167-168 (sums over all links to n).
If zero policy is set to TRUE, weights vectors of zero length are inserted for regions without neighbour in the neighbours list. These will in turn generate lag values of zero, equivalent to the sum of products of the zero row t(rep(0, length=length(neighbours))) %*% x, for arbitraty numerical vector x of length length(neighbours). The spatially lagged value of x for the zero-neighbour region will then be zero, which may (or may not) be a sensible choice.
dnb_lw <- nb2listw(dnb, style = 'B')
summary(dnb_lw)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 88
## Number of nonzero links: 642
## Percentage nonzero weights: 8.290289
## Average number of links: 7.295455
## Link number distribution:
##
## 2 3 4 5 6 7 8 9 10 11
## 1 4 7 6 12 12 13 23 7 3
## 1 least connected region:
## 88 with 2 links
## 3 most connected regions:
## 35 47 86 with 11 links
##
## Weights style: B
## Weights constants summary:
## n nn S0 S1 S2
## B 88 7744 642 1284 20320
The output of nb2listw() is a spatial weights object
Note: It is this weights list object that will be used in the computation of Gi statistics and not the Neighbour list object.
Instead of using a fixed distance to deirved the proximity matrix, a fixed number of neighbours can be also used to derived the proximity matrix. This is called adaptive distance method.
The code chunk below will be used to derive an adaptive distance weight matrix.
coords <- coordinates(hunan)
knb <- knn2nb(knearneigh(coords, k=8, longlat = TRUE), row.names=row.names(hunan$gdppc))
knb_lw <- nb2listw(knb, style = 'B')
summary(knb_lw)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 88
## Number of nonzero links: 704
## Percentage nonzero weights: 9.090909
## Average number of links: 8
## Non-symmetric neighbours list
## Link number distribution:
##
## 8
## 88
## 88 least connected regions:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 with 8 links
## 88 most connected regions:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 with 8 links
##
## Weights style: B
## Weights constants summary:
## n nn S0 S1 S2
## B 88 7744 704 1286 23044
Next, we will plot the centroids of the counties and neighbours by using the code chunk below.
plot(hunan, border="lightgrey")
plot(knb, coordinates(hunan), pch = 19, cex = 0.6, add = TRUE, col = "red")
fips <- order(hunan$County)
gi.fixed <- localG(hunan$GDPPC2012, dnb_lw)
gi.fixed
## [1] 0.33163723 0.30022504 -0.01928722 0.23347826 -0.03392157 -0.47966539
## [7] 3.06337634 3.52132953 4.11133055 1.47912938 1.27491171 -0.76133000
## [13] -0.46258874 0.71813500 -0.06417930 0.35354999 -0.47451050 0.07974191
## [19] 0.31476657 -0.11054769 0.99939906 1.17558026 -1.14239533 -1.07658961
## [25] -1.81487603 -1.48866180 -1.51077556 -0.85413670 -1.26953503 -0.97326334
## [31] -0.15304038 -1.74908457 -1.15759460 -1.98833828 1.85919377 -0.42408526
## [37] -1.72147258 -1.50399992 -2.01128392 -0.89344375 -0.16956076 -2.08488236
## [43] -2.18178008 -1.83990103 -0.49900063 3.33561294 2.15476757 -1.51897356
## [49] -1.01067443 -1.47959479 -1.55031179 -1.89408587 -1.63621036 -1.20000485
## [55] -1.19430447 0.43525680 1.03532181 1.63264419 -1.02035154 -0.71045405
## [61] -0.49125537 -0.83991124 -0.32538867 0.50743726 0.04170660 3.03419107
## [67] 3.97888204 3.73879186 -0.23490311 0.40866791 3.53139674 0.68667429
## [73] 0.25243121 4.25912956 -1.11862241 0.86084777 -1.28368880 2.90232729
## [79] -0.88607519 1.71967284 -1.42763295 -0.08768341 0.71242694 5.14611891
## [85] -0.23532015 4.21922996 -0.97145381 -0.71800062
## attr(,"gstari")
## [1] FALSE
## attr(,"call")
## localG(x = hunan$GDPPC2012, listw = dnb_lw)
## attr(,"class")
## [1] "localG"
The output of localG() is a vector of G or Gstar values, with attributes “gstari” set to TRUE or FALSE, “call” set to the function call, and class “localG”.
The Gi statistics is represented as a Z-score. Greater values represent a greater intensity of clustering and the direction (positive or negative) indicates high or low clusters.
Next, we will join the Gi values to their correponding hunan SpatialPolygonDataFrame using the code chunk below.
hunan.gi <- cbind(hunan, as.matrix(gi.fixed))
names(hunan.gi)[17] <- "gstat"
In fact, the code chunk above performs three tasks. First, it convert the output vector (i.e. gi.fixed) into r matrix object by using as.matrix(). Next, cbind() is used to join hun@data and gi.fixed matrix to produce a new SpatialPolygonDataFrame called hunan.gi. Lastly, the field name of the gi values is renamed to gstat_fixed by using names().
The code chunk below are used to compute the Gi values for GDPPC2012 by using an adaptive distance weight matrix (i.e knb_lw).
fips <- order(hunan$County)
gi.adaptive <- localG(hunan$GDPPC2012, knb_lw)
hunan.gi <- cbind(hunan, as.matrix(gi.adaptive))
names(hunan.gi)[17] <- "gstat_adaptive"
It is time for us to visualise the locations of hot spot and cold spot areas. The choropleth mapping functions of tmap package will be used to map the Gi values.
The code chunk below shows the functions used to map the Gi values derived using fixed distance weight matrix.
tm_shape(hunan.gi) +
tm_fill(col = "gstat",
style = "pretty",
palette="-RdBu",
title = "local Gi") +
tm_borders(alpha = 0.5)
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Variable(s) "gstat" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
In the choropleth map above, counties shaded in red are the hot spot areas and counties shaded in blue are the cold spot areas. The darkness of the colours representing the intensity of the Gi values.
In conclusion, the choropleth above shows clear sign of east-west divide in the GDP per capita by county of Hunan province in 2012. The hot spot areas were centre around Changsha city at the east of the provice. The cold spot areas, on the other hand, mainly comprise of counties located on the western part of the province centre around the forested Shaoyang prefecture city.
The code chunk below shows the functions used to map the Gi values derived using adaptive distance weight matrix.
tm_shape(hunan.gi) +
tm_fill(col = "gstat",
style = "pretty",
palette = "-RdBu",
title = "local Gi") +
tm_borders(alpha = 0.5)
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Variable(s) "gstat" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.