In this hands-on exercise, we will learn how to use appropriate localised geospatial statistics analysis and thematic mapping of R to perform spatial clustering on greographically referenced attribute. The 2 major analysis methods are:
cluster & outlier analysis (using Local Indicator of Spatial Association (LISA) of spdep package)
hot spot & cold spot areas analysis (using Getis-Ord’s Gi-statistics od spdep package)
Specifically, we are going to analyse the spatial pattern of GDP per capital of Hunan Province counties,
Whether development based on GDP per capita is distributed within a planning area?
Are there signs of spatial clustering and where are these clusters?
Hunan county shapefile containing the county’s boundary layer, spatial data in polygon objects
Hunan_GDPPC.csv containing the GPD per capital values at the county level
We will be using these R packages:
spdep - for spatial statistical analysis
rgdal - for handling spatial data
tmap - for choropleth mapping
tidyverse - for tidying data (readr, ggplot2 & dplyr)
packages = c('spdep','rgdal','tmap','tidyverse')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
We import the shapefile using readOGR() function of rgdal package.
Then, we import the csv file using read_csv() function of readr package.
hunan <- readOGR(dsn = "data/shapefile/Hunan.shp", layer = "Hunan")
## OGR data source with driver: ESRI Shapefile
## Source: "D:\SMU Year 3 Sem 1\IS415 Geospatial Analytics and Application\Week 7\Hands-on_Ex07\data\shapefile\Hunan.shp", layer: "Hunan"
## with 88 features
## It has 7 fields
gdppc <- read_csv("data/attributeTable/Hunan_GDPPC.csv")
## Parsed with column specification:
## cols(
## County = col_character(),
## City = col_character(),
## GDPPC2005 = col_double(),
## GDPPC2006 = col_double(),
## GDPPC2007 = col_double(),
## GDPPC2008 = col_double(),
## GDPPC2009 = col_double(),
## GDPPC2010 = col_double(),
## GDPPC2011 = col_double(),
## GDPPC2012 = col_double()
## )
We will use left_join() function of dplyr package to combine hunan (SpatialPolygonsDataFrame) and gdppc (tbl dataframe)
hunan@data <- left_join(hunan@data, gdppc)
## Joining, by = "County"
Note: no new output data has been created, gdppc data frame has all the same variables as hunan, so the gdppc data frame are updated into the data frame of hunan SpatialPolygonDataFrame.
We use qtm() function from tmap package, it stands for quick thematic map and is an easy way to plot maps.
qtm(hunan, "GDPPC2012")
The choropleth map above shows clear sign of spatial autocorrelation. (Global Moran I value)
What is Local Indicators of Spatial Association (LISA)?
Analysis steps:
Deriving spatial weights matrix
Computing local Moran’s I
Plotting Moran scatterplot
Mapping local Moran’s I
Plotting LISA map
We use poly2nb() function of spdep package, this function builds neighbours list based on regions with contiguous boundaries (diagonals too).
wm_q <- poly2nb(hunan, queen = TRUE)
summary(wm_q)
## Neighbour list object:
## Number of regions: 88
## Number of nonzero links: 448
## Percentage nonzero weights: 5.785124
## Average number of links: 5.090909
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 9 11
## 2 2 12 16 24 14 11 4 2 1
## 2 least connected regions:
## 29 64 with 1 link
## 1 most connected region:
## 84 with 11 links
From the summary, we can find out that:
There are 88 area units/counties in Hunan
Most connected area unit has 11 neighbours which is shown by the first row 11 with the next row showing 1 meaning only 1 area unit has 11 neighbours
There are 2 area units with only 1 neighbour
plot(hunan, border = 'lightgrey')
plot(wm_q, coordinates(hunan), pch = 19, cex = 0.6, add = TRUE, col = "red")
rswm_q <- nb2listw(wm_q, zero.policy = TRUE)
summary(rswm_q)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 88
## Number of nonzero links: 448
## Percentage nonzero weights: 5.785124
## Average number of links: 5.090909
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 9 11
## 2 2 12 16 24 14 11 4 2 1
## 2 least connected regions:
## 29 64 with 1 link
## 1 most connected region:
## 84 with 11 links
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 88 7744 88 37.86334 365.9147
To control the number of neighbours directly, we can use k-nearest neighbours knearneigh() function, either accepting asymmetric neighbours or imposing symmetry as shown below
longlat argument is TRUE because the crs of hunan is already in WGS84. However, if it were to be false, it will use the great circle method to process it.
knn8 <- knn2nb(knearneigh(coordinates(hunan), k =8, longlat = TRUE), row.names = row.names(hunan$GDPPC2012))
knn_lw <- nb2listw(knn8, style = "B")
summary(knn8)
## Neighbour list object:
## Number of regions: 88
## Number of nonzero links: 704
## Percentage nonzero weights: 9.090909
## Average number of links: 8
## Non-symmetric neighbours list
## Link number distribution:
##
## 8
## 88
## 88 least connected regions:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 with 8 links
## 88 most connected regions:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 with 8 links
plot(hunan, border = 'lightgrey')
plot(knn_lw, coordinates(hunan), pch = 19, cex = 0.6, add = TRUE, col = 'red')
We use localmoran() function of spdep package, it will compute li values, given a set of zi values and listw object providing neighbour weighting information for the polygon associated with the zi values.
fips <- order(hunan$County)
localMI <- localmoran(hunan$GDPPC2012, rswm_q)
head(localMI)
## Ii E.Ii Var.Ii Z.Ii Pr(z > 0)
## 0 -0.001468468 -0.01149425 0.1768267 0.023842100 0.4904893
## 1 0.025878173 -0.01149425 0.1768267 0.088874545 0.4645908
## 2 -0.011987646 -0.01149425 0.2234962 -0.001043657 0.5004164
## 3 0.001022468 -0.01149425 0.2234962 0.026476192 0.4894388
## 4 0.014814881 -0.01149425 0.2234962 0.055650812 0.4778100
## 5 -0.038793829 -0.01149425 0.1768267 -0.064920523 0.5258814
Since the localmoran() function doesn’t give index to the results, we use order()function so that we can perform joins and it will return the correct result.
fips1 <- order(hunan$County)
localMI1 <- localmoran(hunan$GDPPC2012, knn_lw)
head(localMI1)
## Ii E.Ii Var.Ii Z.Ii Pr(z > 0)
## 1 -0.037251807 -0.09195402 6.836637 0.02092106 0.4916543
## 2 -0.192769199 -0.09195402 6.836637 -0.03855713 0.5153783
## 3 0.013637073 -0.09195402 6.836637 0.04038369 0.4838936
## 4 0.002769951 -0.09195402 6.836637 0.03622752 0.4855505
## 5 -0.007674909 -0.09195402 6.836637 0.03223285 0.4871432
## 6 -0.259921222 -0.09195402 6.836637 -0.06423966 0.5256103
Using the printCoefmat() function, we can see also see the content of the local Moran matrix.
printCoefmat(data.frame(localMI1[fips1,], row.names = hunan$County[fips1]), check.names = FALSE)
## Ii E.Ii Var.Ii Z.Ii Pr.z...0.
## Anhua -9.6865e-02 -9.1954e-02 6.8366e+00 -1.8782e-03 0.5007
## Anren -3.1946e+00 -9.1954e-02 6.8366e+00 -1.1866e+00 0.8823
## Anxiang -3.7252e-02 -9.1954e-02 6.8366e+00 2.0921e-02 0.4917
## Baojing 3.0039e+00 -9.1954e-02 6.8366e+00 1.1840e+00 0.1182
## Chaling -3.1217e-01 -9.1954e-02 6.8366e+00 -8.4223e-02 0.5336
## Changning -2.5354e-04 -9.1954e-02 6.8366e+00 3.5071e-02 0.4860
## Changsha 4.5362e+01 -9.1954e-02 6.8366e+00 1.7384e+01 0.0000
## Chengbu 4.7981e+00 -9.1954e-02 6.8366e+00 1.8702e+00 0.0307
## Chenxi 1.0778e+00 -9.1954e-02 6.8366e+00 4.4739e-01 0.3273
## Cili 3.1224e-01 -9.1954e-02 6.8366e+00 1.5458e-01 0.4386
## Dao 1.1530e+00 -9.1954e-02 6.8366e+00 4.7615e-01 0.3170
## Dongan 1.0037e+00 -9.1954e-02 6.8366e+00 4.1905e-01 0.3376
## Dongkou 3.7264e+00 -9.1954e-02 6.8366e+00 1.4604e+00 0.0721
## Fenghuang 1.4478e+00 -9.1954e-02 6.8366e+00 5.8890e-01 0.2780
## Guidong -2.3061e+00 -9.1954e-02 6.8366e+00 -8.4682e-01 0.8015
## Guiyang 1.7848e+00 -9.1954e-02 6.8366e+00 7.1777e-01 0.2364
## Guzhang 2.7198e+00 -9.1954e-02 6.8366e+00 1.0753e+00 0.1411
## Hanshou -1.9277e-01 -9.1954e-02 6.8366e+00 -3.8557e-02 0.5154
## Hengdong -6.9586e-02 -9.1954e-02 6.8366e+00 8.5548e-03 0.4966
## Hengnan 4.7834e-02 -9.1954e-02 6.8366e+00 5.3462e-02 0.4787
## Hengshan 1.0657e-01 -9.1954e-02 6.8366e+00 7.5925e-02 0.4697
## Hengyang 7.3384e-02 -9.1954e-02 6.8366e+00 6.3234e-02 0.4748
## Hongjiang 1.7214e+00 -9.1954e-02 6.8366e+00 6.9353e-01 0.2440
## Huarong 1.2252e+00 -9.1954e-02 6.8366e+00 5.0375e-01 0.3072
## Huayuan 1.3473e+00 -9.1954e-02 6.8366e+00 5.5043e-01 0.2910
## Huitong 2.1880e+00 -9.1954e-02 6.8366e+00 8.7198e-01 0.1916
## Jiahe -9.3416e-01 -9.1954e-02 6.8366e+00 -3.2210e-01 0.6263
## Jianghua 1.3463e+00 -9.1954e-02 6.8366e+00 5.5008e-01 0.2911
## Jiangyong 1.1769e+00 -9.1954e-02 6.8366e+00 4.8529e-01 0.3137
## Jingzhou 1.3692e+00 -9.1954e-02 6.8366e+00 5.5883e-01 0.2881
## Jinshi 1.3637e-02 -9.1954e-02 6.8366e+00 4.0384e-02 0.4839
## Jishou -2.4882e+00 -9.1954e-02 6.8366e+00 -9.1646e-01 0.8203
## Lanshan 5.3431e-01 -9.1954e-02 6.8366e+00 2.3952e-01 0.4054
## Leiyang 3.6429e-01 -9.1954e-02 6.8366e+00 1.7449e-01 0.4307
## Lengshuijiang -1.3481e+01 -9.1954e-02 6.8366e+00 -5.1206e+00 1.0000
## Li 2.7700e-03 -9.1954e-02 6.8366e+00 3.6228e-02 0.4856
## Lianyuan -2.2915e+00 -9.1954e-02 6.8366e+00 -8.4121e-01 0.7999
## Liling 1.3127e+01 -9.1954e-02 6.8366e+00 5.0557e+00 0.0000
## Linli -7.6749e-03 -9.1954e-02 6.8366e+00 3.2233e-02 0.4871
## Linwu 5.9182e-02 -9.1954e-02 6.8366e+00 5.7802e-02 0.4770
## Linxiang 8.9333e-01 -9.1954e-02 6.8366e+00 3.7682e-01 0.3532
## Liuyang 2.6814e+01 -9.1954e-02 6.8366e+00 1.0290e+01 0.0000
## Longhui 2.1325e+00 -9.1954e-02 6.8366e+00 8.5074e-01 0.1975
## Longshan 3.3841e+00 -9.1954e-02 6.8366e+00 1.3294e+00 0.0919
## Luxi 1.3170e+00 -9.1954e-02 6.8366e+00 5.3885e-01 0.2950
## Mayang 1.2656e+00 -9.1954e-02 6.8366e+00 5.1921e-01 0.3018
## Miluo 1.2402e+01 -9.1954e-02 6.8366e+00 4.7784e+00 0.0000
## Nan -6.0565e-01 -9.1954e-02 6.8366e+00 -1.9646e-01 0.5779
## Ningxiang 1.3096e+01 -9.1954e-02 6.8366e+00 5.0437e+00 0.0000
## Ningyuan 1.4858e+00 -9.1954e-02 6.8366e+00 6.0341e-01 0.2731
## Pingjiang -6.0562e+00 -9.1954e-02 6.8366e+00 -2.2811e+00 0.9887
## Qidong 1.0180e+00 -9.1954e-02 6.8366e+00 4.2451e-01 0.3356
## Qiyang 1.0586e+00 -9.1954e-02 6.8366e+00 4.4005e-01 0.3300
## Rucheng -1.4785e+00 -9.1954e-02 6.8366e+00 -5.3029e-01 0.7020
## Sangzhi 2.5006e+00 -9.1954e-02 6.8366e+00 9.9154e-01 0.1607
## Shaodong -2.6376e-02 -9.1954e-02 6.8366e+00 2.5081e-02 0.4900
## Shaoshan 1.0831e+01 -9.1954e-02 6.8366e+00 4.1776e+00 0.0000
## Shaoyang 1.5201e+00 -9.1954e-02 6.8366e+00 6.1652e-01 0.2688
## Shimen -2.5992e-01 -9.1954e-02 6.8366e+00 -6.4240e-02 0.5256
## Shuangfeng -4.9102e-01 -9.1954e-02 6.8366e+00 -1.5262e-01 0.5607
## Shuangpai 3.5368e-01 -9.1954e-02 6.8366e+00 1.7044e-01 0.4323
## Suining 3.1437e+00 -9.1954e-02 6.8366e+00 1.2375e+00 0.1080
## Taojiang -2.6068e+00 -9.1954e-02 6.8366e+00 -9.6180e-01 0.8319
## Taoyuan 9.0485e-02 -9.1954e-02 6.8366e+00 6.9774e-02 0.4722
## Tongdao 4.0982e+00 -9.1954e-02 6.8366e+00 1.6025e+00 0.0545
## Wangcheng 3.4050e+01 -9.1954e-02 6.8366e+00 1.3058e+01 0.0000
## Wugang 4.8230e+00 -9.1954e-02 6.8366e+00 1.8797e+00 0.0301
## Xiangtan 1.6075e+00 -9.1954e-02 6.8366e+00 6.4996e-01 0.2579
## Xiangxiang 2.2502e+00 -9.1954e-02 6.8366e+00 8.9575e-01 0.1852
## Xiangyin 5.2141e+00 -9.1954e-02 6.8366e+00 2.0293e+00 0.0212
## Xinhua 1.8045e+00 -9.1954e-02 6.8366e+00 7.2531e-01 0.2341
## Xinhuang 1.9527e+00 -9.1954e-02 6.8366e+00 7.8199e-01 0.2171
## Xinning 5.1008e+00 -9.1954e-02 6.8366e+00 1.9860e+00 0.0235
## Xinshao 1.1025e+00 -9.1954e-02 6.8366e+00 4.5683e-01 0.3239
## Xintian 2.9296e-01 -9.1954e-02 6.8366e+00 1.4721e-01 0.4415
## Xupu 2.4821e-01 -9.1954e-02 6.8366e+00 1.3010e-01 0.4482
## Yanling -3.5181e-01 -9.1954e-02 6.8366e+00 -9.9383e-02 0.5396
## Yizhang -5.3061e-01 -9.1954e-02 6.8366e+00 -1.6776e-01 0.5666
## Yongshun 3.4110e+00 -9.1954e-02 6.8366e+00 1.3397e+00 0.0902
## Yongxing 9.6546e-01 -9.1954e-02 6.8366e+00 4.0441e-01 0.3430
## You 3.9581e-01 -9.1954e-02 6.8366e+00 1.8655e-01 0.4260
## Yuanjiang 1.9578e-01 -9.1954e-02 6.8366e+00 1.1005e-01 0.4562
## Yuanling 5.3798e-02 -9.1954e-02 6.8366e+00 5.5743e-02 0.4778
## Yueyang 2.8183e-01 -9.1954e-02 6.8366e+00 1.4296e-01 0.4432
## Zhijiang 9.6683e-01 -9.1954e-02 6.8366e+00 4.0493e-01 0.3428
## Zhongfang -1.9505e+00 -9.1954e-02 6.8366e+00 -7.1081e-01 0.7614
## Zhuzhou 2.5374e+00 -9.1954e-02 6.8366e+00 1.0056e+00 0.1573
## Zixing -5.0635e+00 -9.1954e-02 6.8366e+00 -1.9014e+00 0.9714
The results:
Ii: the local Moran’s I statistics
E.Ii: the expectation of local moran statistic under the randomisation hypothesis
Var.Ii: the variance of local moran statistic under the randomisation hypothesis
Z.Ii:the standard deviate of local moran statistic
Pr(): the p-value of local moran statistic
Before mapping the map, it is wise to append the localMI (Queen Contiguity neighbours) into hunan SpatialPolygonDataFrame.
hunan.localMI <- cbind(hunan, localMI)
Using tmap package functions,
tm_shape(hunan.localMI) +
tm_fill(col = "Ii",
style = "pretty",
palette = "RdBu",
title = "local moran statistics")+
tm_borders(alpha = 0.5)
## Variable(s) "Ii" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
tm_shape(hunan.localMI) +
tm_fill(col = "Pr.z...0.",
breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
palette = "-Blues",
title = "local Moran's I p-values")+
tm_borders(alpha = 0.5)
localMI.map <- tm_shape(hunan.localMI) +
tm_fill(col = "Ii",
style = "pretty",
title = "local moran statistics")+
tm_borders(alpha = 0.5)
pvalue.map <- tm_shape(hunan.localMI) +
tm_fill(col = "Pr.z...0.",
breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
palette = "-Blues",
title = "local Moran's I p-values")+
tm_borders(alpha = 0.5)
tmap_arrange(localMI.map, pvalue.map, asp = 1, ncol = 2)
## Variable(s) "Ii" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
We can see that the 3 green subzone/county areas are high in local Moran’s I values meaning positive autocorrelation. These 3 areas are also highly adjacent in values (GDPPC) with their neighbours and they are statistically significant as clusters because their the p-values are smaller than the confidence interval of 95%. We can also see that although there are negative local Moran’s I values indicating outliers and negative autocorrelation, they are not significant enough from the p-values. However, we do not know what kind of cluster is this? low GDPPC or high GDPPC? so we have to conduct the LISA cluster map.
LISA cluster map shows the significant locations color coded by type of spatial autocorrelation.
Steps:
plot the Moran scatterplot
generate LISA cluster map
We plot the Moran scatterplot of GDPPC 2012 using moran.plot() function of spdep package.
nci <- moran.plot(hunan$GDPPC2012, rswm_q, labels = as.character(hunan$County), xlab = "GDPPC 2012", ylab = "Spatially Lag GDPPC 2012")
We can see that the plot is split into 4 quadrants, the top right corner belong to areas that have high GDPPC & surrounded by other areas that have average level of GDPPC (high-high locations)
We will use scale() function to centers & scales the variable.
Centering - subtracting the mean (omitting NAs) from the corresponding columns
Scaling - dividing the centered variable by their standard deviations
hunan$Z.GDPPC2012 <- scale(hunan$GDPPC2012) %>% as.vector
knitr::kable(head(hunan, n=5))
| NAME_2 | ID_3 | NAME_3 | ENGTYPE_3 | Shape_Leng | Shape_Area | County | City | GDPPC2005 | GDPPC2006 | GDPPC2007 | GDPPC2008 | GDPPC2009 | GDPPC2010 | GDPPC2011 | GDPPC2012 | Z.GDPPC2012 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Changde | 21098 | Anxiang | County | 1.869074 | 0.1005619 | Anxiang | Changde | 8184 | 10995 | 12670 | 14128 | 16763 | 19817 | 22840 | 23667 | -0.0492059 |
| Changde | 21100 | Hanshou | County | 2.360691 | 0.1997875 | Hanshou | Changde | 6560 | 7422 | 8512 | 11478 | 12140 | 15906 | 18561 | 20981 | -0.2283412 |
| Changde | 21101 | Jinshi | County City | 1.425620 | 0.0530241 | Jinshi | Changde | 9956 | 11223 | 14258 | 16969 | 20528 | 24824 | 30286 | 34592 | 0.6794062 |
| Changde | 21102 | Li | County | 3.474324 | 0.1890812 | Li | Changde | 8394 | 10304 | 11048 | 12723 | 15786 | 18426 | 21847 | 24473 | 0.0045480 |
| Changde | 21103 | Linli | County | 2.289506 | 0.1145036 | Linli | Changde | 8850 | 9855 | 11373 | 14564 | 16160 | 19273 | 23172 | 25554 | 0.0766422 |
as.vector() function make sures that the resulting data type is a vector, map neatly into dataframe.
nci2 <- moran.plot(hunan$Z.GDPPC2012, rswm_q, labels = as.character(hunan$County), xlab = "GDPPC 2012", ylab = "Spatially Lag GDPPC 2012")
Create a numeric vector data type that has the length of the rows of localMI which is
quadrant <- vector(mode = "numeric", length = nrow(localMI))
Next, we centers the variable of interest (GDPPC 2012) around it’s mean
hunan$GDPP2012 is a numeric vector and mean(local) is a single number so the number is subtracted from every element of the vector (vector operation).
localMI[,1] calls the Moran I statistic value.
DV <- hunan$GDPPC2012 - mean(hunan$GDPPC2012)
C_mI <- localMI[,1] - mean(localMI[,1])
Next, we will set a statistical significance level (confidence interval) for the local Moran.
signif <- 0.05
From the scatterplot, we get the dashed line, outlining the different quadrants but we’re not able to get the different observations in each quadrant.
That’s why we’re creating a new vector names quadrant to get the different quadrant values for each observation so it can be plotted with tmap.
The four commands below define:
high-high Quadrant 4 - Cluster/Positive autocorrelation
low-low Quadrant 1 - Cluster/Positive autocorrelation
low-high Quadrant 2 - Outlier/Negative autocorrelation
high-low Quadrant 3 - Outlier/Negative autocorrelation
non-significant Moran (p-value of localMI is compared to the significance level)
quadrant[DV >0 & C_mI>0] <- 4
quadrant[DV <0 & C_mI<0] <- 1
quadrant[DV <0 & C_mI>0] <- 2
quadrant[DV >0 & C_mI<0] <- 3
quadrant[localMI[,5]>signif] <- 0
Now, we can see the different quadrant each observation belongs to.
hunan.localMI$quadrant <- quadrant
colors <- c("#ffffff", "#2c7bb6", "#abd9e9", "#fdae61", "#d7191c")
clusters <- c("insignificant", "low-low", "low-high", "high-low", "high-high")
knitr::kable(head(hunan.localMI, n=5))
| NAME_2 | ID_3 | NAME_3 | ENGTYPE_3 | Shape_Leng | Shape_Area | County | City | GDPPC2005 | GDPPC2006 | GDPPC2007 | GDPPC2008 | GDPPC2009 | GDPPC2010 | GDPPC2011 | GDPPC2012 | Ii | E.Ii | Var.Ii | Z.Ii | Pr.z…0. | quadrant | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Changde | 21098 | Anxiang | County | 1.869074 | 0.1005619 | Anxiang | Changde | 8184 | 10995 | 12670 | 14128 | 16763 | 19817 | 22840 | 23667 | -0.0014685 | -0.0114943 | 0.1768267 | 0.0238421 | 0.4904893 | 0 |
| 1 | Changde | 21100 | Hanshou | County | 2.360691 | 0.1997875 | Hanshou | Changde | 6560 | 7422 | 8512 | 11478 | 12140 | 15906 | 18561 | 20981 | 0.0258782 | -0.0114943 | 0.1768267 | 0.0888745 | 0.4645908 | 0 |
| 2 | Changde | 21101 | Jinshi | County City | 1.425620 | 0.0530241 | Jinshi | Changde | 9956 | 11223 | 14258 | 16969 | 20528 | 24824 | 30286 | 34592 | -0.0119876 | -0.0114943 | 0.2234962 | -0.0010437 | 0.5004164 | 0 |
| 3 | Changde | 21102 | Li | County | 3.474324 | 0.1890812 | Li | Changde | 8394 | 10304 | 11048 | 12723 | 15786 | 18426 | 21847 | 24473 | 0.0010225 | -0.0114943 | 0.2234962 | 0.0264762 | 0.4894388 | 0 |
| 4 | Changde | 21103 | Linli | County | 2.289506 | 0.1145036 | Linli | Changde | 8850 | 9855 | 11373 | 14564 | 16160 | 19273 | 23172 | 25554 | 0.0148149 | -0.0114943 | 0.2234962 | 0.0556508 | 0.4778100 | 0 |
tm_shape(hunan.localMI) +
tm_fill(col = "quadrant", style = "cat", palette = colors[c(sort(unique(quadrant)))+1], labels = clusters[c(sort(unique(quadrant)))+1], popup.vars = c("Postal.Code")) +
tm_view(set.zoom.limits = c(11,17)) +
tm_borders(alpha=0.5)
We can see that from the map, there’s a high-high cluster indicating positive autocorrelation for the subzone/county areas having adjacent high GDPPC values. There is also a low-high cluster indicating negative autocorrelation/outliers whereby these areas with low GDPPC values are surrounded by high GDPPC areas.
Another form of LISA/Spatial statistics to detect spatial anomalies is the Getis and Ord’s G-statistics.
The statistics will look at neighbours within a defined proximity to identify where either high or low values cluster spatially/in space.
A significant positive value of G ( G> 0) indicates that there is a " cluster" of high values of the variable analyzed with reference to their average. Also, a significant but negative index value of G (G < 0) is shown as a group of low values relative to the average of the variable analyzed.
Hot spot area = G>0, it’s also High-high in Local Moran’s I values
Cold spot area = G<0, it’s also Low-low in Local Moran’s I values
Steps:
Deriving spatial weight matrix
Computing Gi statistics
Mapping Gi statistics
For Getis-Ord, we define neighbours based on distance/proximity. (while spatial autocorrelation considered units which shared borders)
The two types of distance-based weight matrix:
Fixed distance weight matrix
Adaptive distance weight matrix
We use dnearneigh() function of spdep package to calculate the fixed distance weight matrix. The function identifies neighbours of region points by Euclidean distance between lower and upper bounds in km.
dnb <- dnearneigh(coordinates(hunan), 0, 85, longlat = TRUE)
dnb
## Neighbour list object:
## Number of regions: 88
## Number of nonzero links: 642
## Percentage nonzero weights: 8.290289
## Average number of links: 7.295455
plot(hunan, border = "lightgrey")
plot(dnb, coordinates(hunan), add = TRUE, col = "red")
The nb2listw() supplements a neighbours list with spatial weights for the chosen coding scheme. The input of nb2listw() must be object of class nb. The syntax of the function has two major arguments, namely style and zero.poly.
style can take values “W”, “B”, “C”, “U”, “minmax” and “S”. B is the basic binary coding, W is row standardised (sums over all links to n), C is globally standardised (sums over all links to n), U is equal to C divided by the number of neighbours (sums over all links to unity), while S is the variance-stabilizing coding scheme proposed by Tiefelsdorf et al. 1999, p. 167-168 (sums over all links to n).
If zero policy is set to TRUE, weights vectors of zero length are inserted for regions without neighbour in the neighbours list. These will in turn generate lag values of zero, equivalent to the sum of products of the zero row t(rep(0, length=length(neighbours))) %*% x, for arbitraty numerical vector x of length length(neighbours). The spatially lagged value of x for the zero-neighbour region will then be zero, which may (or may not) be a sensible choice.
dnb_lw <- nb2listw(dnb, style = "B")
summary(dnb_lw)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 88
## Number of nonzero links: 642
## Percentage nonzero weights: 8.290289
## Average number of links: 7.295455
## Link number distribution:
##
## 2 3 4 5 6 7 8 9 10 11
## 1 4 7 6 12 12 13 23 7 3
## 1 least connected region:
## 88 with 2 links
## 3 most connected regions:
## 35 47 86 with 11 links
##
## Weights style: B
## Weights constants summary:
## n nn S0 S1 S2
## B 88 7744 642 1284 20320
The output of nb2listw() function is a spatial weights object.
Note: It is this weights list object that will be used in the computation of Gi statistics and not the Neighbour list object.
Instead of using a fixed distance to derive the weight matrix, a fixed number of neighbours can also be used to derive the weight matrix, the adaptive distance method.
coords <- coordinates(hunan)
knb <- knn2nb(knearneigh(coords, k = 8, longlat = TRUE), row.names = row.names(hunan$gdppc))
knb_lw <- nb2listw(knb, style = "B")
summary(knb_lw)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 88
## Number of nonzero links: 704
## Percentage nonzero weights: 9.090909
## Average number of links: 8
## Non-symmetric neighbours list
## Link number distribution:
##
## 8
## 88
## 88 least connected regions:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 with 8 links
## 88 most connected regions:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 with 8 links
##
## Weights style: B
## Weights constants summary:
## n nn S0 S1 S2
## B 88 7744 704 1286 23044
plot(hunan, border = "lightgrey")
plot(knb, coordinates(hunan), pch = 19, cex = 0.6, add = TRUE, col = "red")
fips <- order(hunan$County)
gi.fixed <- localG(hunan$GDPPC2012, dnb_lw)
#gi.fixed
The output of localG() function of spdep package is a vector of G/Gstar values with attributes “gstari” set to TRUE/FALSE, “call” set to the function call and class “localG”
The Gi statistics is represented as a Z-score. Greater values represent a greater intensity of clustering and the direction (positive or negative) indicates high or low clusters.
Next, we will join the fixed distance weight matrix Gi statistics value to the hunan SpatialPolygonDataFrame.
hunan.gi <- cbind(hunan, as.matrix(gi.fixed))
names(hunan.gi)[17] <- "gstat"
head(hunan.gi@data)
## NAME_2 ID_3 NAME_3 ENGTYPE_3 Shape_Leng Shape_Area County City
## 1 Changde 21098 Anxiang County 1.869074 0.10056190 Anxiang Changde
## 2 Changde 21100 Hanshou County 2.360691 0.19978745 Hanshou Changde
## 3 Changde 21101 Jinshi County City 1.425620 0.05302413 Jinshi Changde
## 4 Changde 21102 Li County 3.474325 0.18908121 Li Changde
## 5 Changde 21103 Linli County 2.289506 0.11450357 Linli Changde
## 6 Changde 21104 Shimen County 4.171918 0.37194707 Shimen Changde
## GDPPC2005 GDPPC2006 GDPPC2007 GDPPC2008 GDPPC2009 GDPPC2010 GDPPC2011
## 1 8184 10995 12670 14128 16763 19817 22840
## 2 6560 7422 8512 11478 12140 15906 18561
## 3 9956 11223 14258 16969 20528 24824 30286
## 4 8394 10304 11048 12723 15786 18426 21847
## 5 8850 9855 11373 14564 16160 19273 23172
## 6 9244 10556 11760 12902 17355 20501 24409
## GDPPC2012 gstat
## 1 23667 -0.049205949
## 2 20981 -0.228341158
## 3 34592 0.679406172
## 4 24473 0.004547952
## 5 25554 0.076642204
## 6 27137 0.182215933
## structure.c.0.331637226690737..0.300225037067981...0.0192872220698243..
## 1 0.33163723
## 2 0.30022504
## 3 -0.01928722
## 4 0.23347826
## 5 -0.03392157
## 6 -0.47966539
fips <- order(hunan$County)
gi.adaptive <- localG(hunan$GDPPC2012, knb_lw)
hunan.gi <- cbind(hunan.gi, as.matrix(gi.adaptive))
names(hunan.gi)[19] <- "gstat_adaptive"
head(hunan.gi@data)
## NAME_2 ID_3 NAME_3 ENGTYPE_3 Shape_Leng Shape_Area County City
## 1 Changde 21098 Anxiang County 1.869074 0.10056190 Anxiang Changde
## 2 Changde 21100 Hanshou County 2.360691 0.19978745 Hanshou Changde
## 3 Changde 21101 Jinshi County City 1.425620 0.05302413 Jinshi Changde
## 4 Changde 21102 Li County 3.474325 0.18908121 Li Changde
## 5 Changde 21103 Linli County 2.289506 0.11450357 Linli Changde
## 6 Changde 21104 Shimen County 4.171918 0.37194707 Shimen Changde
## GDPPC2005 GDPPC2006 GDPPC2007 GDPPC2008 GDPPC2009 GDPPC2010 GDPPC2011
## 1 8184 10995 12670 14128 16763 19817 22840
## 2 6560 7422 8512 11478 12140 15906 18561
## 3 9956 11223 14258 16969 20528 24824 30286
## 4 8394 10304 11048 12723 15786 18426 21847
## 5 8850 9855 11373 14564 16160 19273 23172
## 6 9244 10556 11760 12902 17355 20501 24409
## GDPPC2012 gstat
## 1 23667 -0.049205949
## 2 20981 -0.228341158
## 3 34592 0.679406172
## 4 24473 0.004547952
## 5 25554 0.076642204
## 6 27137 0.182215933
## structure.c.0.331637226690737..0.300225037067981...0.0192872220698243..
## 1 0.33163723
## 2 0.30022504
## 3 -0.01928722
## 4 0.23347826
## 5 -0.03392157
## 6 -0.47966539
## gstat_adaptive
## 1 0.27442880
## 2 0.30022504
## 3 0.03044770
## 4 0.22227212
## 5 -0.03392157
## 6 -0.51413315
tm_shape(hunan.gi)+
tm_fill(col = "gstat",
style = "pretty",
palette = "-RdBu",
title = "local Gi value")+
tm_borders(alpha = 0.5)
## Variable(s) "gstat" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
In the choropleth map above, counties shaded in red are the hot spot areas and counties shaded in blue are the cold spot areas. The darkness of the colours representing the intensity of the Gi values.
In conclusion, the choropleth above shows clear sign of east-west divide in the GDP per capita by county of Hunan province in 2012. The hot spot areas were centre around Changsha city at the east of the provice. The cold spot areas, on the other hand, mainly comprise of counties located on the western part of the province centre around the forested Shaoyang prefecture city.
tm_shape(hunan.gi)+
tm_fill(col = "gstat_adaptive",
style = "pretty",
palette = "-RdBu",
title = "local Gi value")+
tm_borders(alpha = 0.5)
## Variable(s) "gstat_adaptive" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.