In this hands-on exercise, we will learn how to compute Spatial Autocorrelation statistics using the spdep package.
Hunan county shapefile containing the county’s boundary layer
Hunan_2012.csv containing selected Hunan’s local development indicators
We will be using these R packages:
sf
tmap
spdep
rgdal
tidyverse
packages = c('sf','tmap','spdep','rgdal', 'tidyverse')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
We import the shapefile using readOGR() function of rgdal package.
Then, we import the csv file using read_csv() function of readr package.
hunan <- readOGR(dsn = "data/shapefile", layer = "Hunan")
## OGR data source with driver: ESRI Shapefile
## Source: "D:\SMU Year 3 Sem 1\IS415 Geospatial Analytics and Application\Week 6\Hands-on_Ex06\data\shapefile", layer: "Hunan"
## with 88 features
## It has 7 fields
It has 88 features meaning 88 polygons (counties).
hunan2012 <- read_csv("data/attribute/Hunan_2012.csv")
## Parsed with column specification:
## cols(
## .default = col_double(),
## County = col_character(),
## City = col_character()
## )
## See spec(...) for full column specifications.
We will use left_join() function of dplyr package to combine hunan (SpatialPolygonsDataFrame) and hunan2012 (tbl dataframe)
hunan@data <- left_join(hunan@data, hunan2012)
## Joining, by = "County"
knitr::kable(head(hunan, n=5))
| NAME_2 | ID_3 | NAME_3 | ENGTYPE_3 | Shape_Leng | Shape_Area | County | City | avg_wage | deposite | FAI | Gov_Rev | Gov_Exp | GDP | GDPPC | GIO | Loan | NIPCR | Bed | Emp | EmpR | EmpRT | Pri_Stu | Sec_Stu | Household | Household_R | NOIP | Pop_R | RSCG | Pop_T | Agri | Service | Disp_Inc | RORP | ROREmp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Changde | 21098 | Anxiang | County | 1.869074 | 0.1005619 | Anxiang | Changde | 31935 | 5517.2 | 3541.0 | 243.64 | 1779.5 | 12482.0 | 23667 | 5108.9 | 2806.9 | 7693.7 | 1931 | 336.39 | 270.5 | 205.9 | 19.584 | 17.819 | 148.1 | 135.4 | 53 | 346.0 | 3957.9 | 528.3 | 4524.41 | 14100 | 16610 | 0.6549309 | 0.8041262 |
| Changde | 21100 | Hanshou | County | 2.360691 | 0.1997875 | Hanshou | Changde | 32265 | 7979.0 | 8665.0 | 386.13 | 2062.4 | 15788.0 | 20981 | 13491.0 | 4550.0 | 8269.9 | 2560 | 456.78 | 388.8 | 246.7 | 42.097 | 33.029 | 240.2 | 208.7 | 95 | 553.2 | 4460.5 | 804.6 | 6545.35 | 17727 | 18925 | 0.6875466 | 0.8511756 |
| Changde | 21101 | Jinshi | County City | 1.425620 | 0.0530241 | Jinshi | Changde | 28692 | 4581.7 | 4777.0 | 373.31 | 1148.4 | 8706.9 | 34592 | 10935.0 | 2242.0 | 8169.9 | 848 | 122.78 | 82.1 | 61.7 | 8.723 | 7.592 | 81.9 | 43.7 | 77 | 92.4 | 3683.0 | 251.8 | 2562.46 | 7525 | 19498 | 0.3669579 | 0.6686757 |
| Changde | 21102 | Li | County | 3.474324 | 0.1890812 | Li | Changde | 32541 | 13487.0 | 16066.0 | 709.61 | 2459.5 | 20322.0 | 24473 | 18402.0 | 6748.0 | 8377.0 | 2038 | 513.44 | 426.8 | 227.1 | 38.975 | 33.938 | 268.5 | 256.0 | 96 | 539.7 | 7110.2 | 832.5 | 7562.34 | 53160 | 18985 | 0.6482883 | 0.8312558 |
| Changde | 21103 | Linli | County | 2.289506 | 0.1145036 | Linli | Changde | 32667 | 564.1 | 7781.2 | 336.86 | 1538.7 | 10355.0 | 25554 | 8214.0 | 358.0 | 8143.1 | 1440 | 307.36 | 272.2 | 100.8 | 23.286 | 18.943 | 129.1 | 157.2 | 99 | 246.6 | 3604.9 | 409.3 | 3583.91 | 7031 | 18604 | 0.6024921 | 0.8856065 |
We use qtm() function from tmap package, it stands for quick thematic map and is an easy way to plot maps.
In this map, we plot the “GPDPC” variable which is the GPD per capita of each county because it relates to the economy and economy relates to the development of the county.
qtm(hunan, "GDPPC")
H0: There distribution of GDPPC per county are randomly distributed
H1: The distribution of GDPPC per county are not randomly distributed, there are clusters (positive autocorrelation) or outliers (negative autocorrelation)
Significance/confidence interval: 95% or 0.05
In class, we have learnt about contiguity (common boundary) based neighbours and how to create spatial weights for either distance-based neighbours or adjacency-based neighbours.
In this section, we will use poly2nb() function of spded package to compute contiguity weight matrices for the study area.
This function builds a neighbours list based on regions with contiguous boundaries.
Default ‘queen’ argument is TRUE so if we do not specify, it will return a list of first order neighbours using the Queen criteria
wm_q <- poly2nb(hunan, queen = TRUE)
summary(wm_q)
## Neighbour list object:
## Number of regions: 88
## Number of nonzero links: 448
## Percentage nonzero weights: 5.785124
## Average number of links: 5.090909
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 9 11
## 2 2 12 16 24 14 11 4 2 1
## 2 least connected regions:
## 29 64 with 1 link
## 1 most connected region:
## 84 with 11 links
From the summary, we can find out that:
There are 88 area units/counties in Hunan
Most connected area unit has 11 neighbours which is shown by the first row 11 with the next row showing 1 meaning only 1 area unit has 11 neighbours
There are 2 area units with only 1 neighbour
wm_q[[1]]
## [1] 2 3 4 57 85
wm_q list all the neighbouring polygons of the ID we call. So for polygon ID 1: 2,3,4,57,85 are all its neighbours.
So, what is polygon ID 1 and which area units are the 5 neighbours?
hunan$NAME_3[1]
## [1] "Anxiang"
hunan$NAME_3[c(2,3,4,57,85)]
## [1] "Hanshou" "Jinshi" "Li" "Nan" "Taoyuan"
Thus, Polygon ID 1 is Anxiang and Anxiang has 5 neighbours, namely Hanshou, Jinshi, Li, Nan, Taoyuan.
str(wm_q)
## List of 88
## $ : int [1:5] 2 3 4 57 85
## $ : int [1:5] 1 57 58 78 85
## $ : int [1:4] 1 4 5 85
## $ : int [1:4] 1 3 5 6
## $ : int [1:4] 3 4 6 85
## $ : int [1:5] 4 5 69 75 85
## $ : int [1:4] 67 71 74 84
## $ : int [1:7] 9 46 47 56 78 80 86
## $ : int [1:6] 8 66 68 78 84 86
## $ : int [1:8] 16 17 19 20 22 70 72 73
## $ : int [1:3] 14 17 72
## $ : int [1:5] 13 60 61 63 83
## $ : int [1:4] 12 15 60 83
## $ : int [1:3] 11 15 17
## $ : int [1:4] 13 14 17 83
## $ : int [1:5] 10 17 22 72 83
## $ : int [1:7] 10 11 14 15 16 72 83
## $ : int [1:5] 20 22 23 77 83
## $ : int [1:6] 10 20 21 73 74 86
## $ : int [1:7] 10 18 19 21 22 23 82
## $ : int [1:5] 19 20 35 82 86
## $ : int [1:5] 10 16 18 20 83
## $ : int [1:7] 18 20 38 41 77 79 82
## $ : int [1:5] 25 28 31 32 54
## $ : int [1:5] 24 28 31 33 81
## $ : int [1:4] 27 33 42 81
## $ : int [1:3] 26 29 42
## $ : int [1:5] 24 25 33 49 54
## $ : int [1:3] 27 37 42
## $ : int 33
## $ : int [1:8] 24 25 32 36 39 40 56 81
## $ : int [1:8] 24 31 50 54 55 56 75 85
## $ : int [1:5] 25 26 28 30 81
## $ : int [1:3] 36 45 80
## $ : int [1:6] 21 41 47 80 82 86
## $ : int [1:6] 31 34 40 45 56 80
## $ : int [1:4] 29 42 43 44
## $ : int [1:4] 23 44 77 79
## $ : int [1:5] 31 40 42 43 81
## $ : int [1:6] 31 36 39 43 45 79
## $ : int [1:6] 23 35 45 79 80 82
## $ : int [1:7] 26 27 29 37 39 43 81
## $ : int [1:6] 37 39 40 42 44 79
## $ : int [1:4] 37 38 43 79
## $ : int [1:6] 34 36 40 41 79 80
## $ : int [1:3] 8 47 86
## $ : int [1:5] 8 35 46 80 86
## $ : int [1:5] 50 51 52 53 55
## $ : int [1:4] 28 51 52 54
## $ : int [1:5] 32 48 52 54 55
## $ : int [1:3] 48 49 52
## $ : int [1:5] 48 49 50 51 54
## $ : int [1:3] 48 55 75
## $ : int [1:6] 24 28 32 49 50 52
## $ : int [1:5] 32 48 50 53 75
## $ : int [1:7] 8 31 32 36 78 80 85
## $ : int [1:6] 1 2 58 64 76 85
## $ : int [1:5] 2 57 68 76 78
## $ : int [1:4] 60 61 87 88
## $ : int [1:4] 12 13 59 61
## $ : int [1:7] 12 59 60 62 63 77 87
## $ : int [1:3] 61 77 87
## $ : int [1:4] 12 61 77 83
## $ : int [1:2] 57 76
## $ : int 76
## $ : int [1:5] 9 67 68 76 84
## $ : int [1:4] 7 66 76 84
## $ : int [1:5] 9 58 66 76 78
## $ : int [1:3] 6 75 85
## $ : int [1:3] 10 72 73
## $ : int [1:3] 7 73 74
## $ : int [1:5] 10 11 16 17 70
## $ : int [1:5] 10 19 70 71 74
## $ : int [1:6] 7 19 71 73 84 86
## $ : int [1:6] 6 32 53 55 69 85
## $ : int [1:7] 57 58 64 65 66 67 68
## $ : int [1:7] 18 23 38 61 62 63 83
## $ : int [1:7] 2 8 9 56 58 68 85
## $ : int [1:7] 23 38 40 41 43 44 45
## $ : int [1:8] 8 34 35 36 41 45 47 56
## $ : int [1:6] 25 26 31 33 39 42
## $ : int [1:5] 20 21 23 35 41
## $ : int [1:9] 12 13 15 16 17 18 22 63 77
## $ : int [1:6] 7 9 66 67 74 86
## $ : int [1:11] 1 2 3 5 6 32 56 57 69 75 ...
## $ : int [1:9] 8 9 19 21 35 46 47 74 84
## $ : int [1:4] 59 61 62 88
## $ : int [1:2] 59 87
## - attr(*, "class")= chr "nb"
## - attr(*, "region.id")= chr [1:88] "0" "1" "2" "3" ...
## - attr(*, "call")= language poly2nb(pl = hunan, queen = TRUE)
## - attr(*, "type")= chr "queen"
## - attr(*, "sym")= logi TRUE
As you can see from above, the ID is not printed so you need to know that the last row which is “int [1:2] 59 87” is the last observation and it has 2 neighbours, 59 and 87.
There are some observations that has only 1 neighbour as shown by “int 76”.
wm_r <- poly2nb(hunan, queen = FALSE)
summary(wm_r)
## Neighbour list object:
## Number of regions: 88
## Number of nonzero links: 440
## Percentage nonzero weights: 5.681818
## Average number of links: 5
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 9 10
## 2 2 12 20 21 14 11 3 2 1
## 2 least connected regions:
## 29 64 with 1 link
## 1 most connected region:
## 84 with 10 links
From the summary, we can find out that:
There are 88 area units/counties in Hunan
Most connected area unit has 10 neighbours which is shown by the first row 10 with the next row showing 1 meaning only 1 area unit has 10 neighbours
There are 2 area units with only 1 neighbour
wm_r[[2]]
## [1] 57 58 78 85
So for polygon ID 2: 57,58, 78, 85 are all its neighbours.
So, what is polygon ID 2 and which area units are the 4 neighbours?
hunan$NAME_3[2]
## [1] "Hanshou"
hunan$NAME_3[c(57,58,78,85)]
## [1] "Nan" "Yuanjiang" "Taojiang" "Taoyuan"
Thus, Polygon ID 2 is Hanshou and Hanshou has 4 neighbours, namely Nan, Yuanjiang, Taojiang, Taoyuan.
plot(hunan, borders = 'lightgrey')
plot(wm_q, coordinates(hunan), pch = 19, cex = 0.6, add = TRUE, col = "red")
plot(hunan, borders = 'lightgrey')
plot(wm_r, coordinates(hunan), pch = 19, cex = 0.6, add = TRUE, col = "red")
#par(mfrow = c(1,2))
plot(hunan, borders = 'lightgrey')
plot(wm_q, coordinates(hunan), pch = 19, cex = 0.6,add = TRUE, col = "red")
plot(wm_r, coordinates(hunan), pch = 19, cex = 0.6, add = TRUE, col = "blue")
As we know that ROOK contiguity neighbours only take the direct shared boundary neighbours so they will have less neighbours edges seen on the map plot. This is shown by the red edges, that are the QUEEN contiguity neighbours, that are not covered in blue meaning that those edges are present in QUEEN but not in ROOK.
We will derive distance-based weight matrices by using dnearneigh() function of spdep package.
The function identifies neighbours of area points by Euclidean distance with a distance range with lower d1 and upper d2 controlled by the bounds= argument.
If unprojected coordinates are used and either specified in the coordinates object x or with x as a two column matrix & longlat = TRUE, great circle distances in km will be calculated assuming the WGS84 reference ellipsoid.
To determine the upper limit for distance band:
Return a matrix with the indices of points belonging to the set of the k nearest neighbours of each other by using knearneigh() function of spdep package
Convert the knn object returned by knearneigh() function into a neighbours list of class nb with a list of integer vector containing neighbour region number ids by using knn2nb() function
Return the length of neighbour relationship edges by using nbdlists() function to return the units of the coorinates if the coordinates are projected, in km otherwise
Remove the list structure of the returned object by using unlist() function
coords <- coordinates(hunan)
head(coords)
## [,1] [,2]
## 0 112.1531 29.44362
## 1 112.0372 28.86489
## 2 111.8917 29.47107
## 3 111.7031 29.74499
## 4 111.6138 29.49258
## 5 111.0341 29.79863
k1 <- knn2nb(knearneigh(coords))
k1dists <- unlist(nbdists(k1, coords, longlat = TRUE))
summary(k1dists)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 24.79 32.57 38.01 39.07 44.52 61.79
The summary shows that the largest first nearest neighbour distance is 61.79 km, so using this as the upper band/threshold gives certainty that all units will have at least 1 neighbour because it’s the max distance.
We will derive distance-based weight matrices by using dnearneigh() function of spdep package.
wm_d62 <- dnearneigh(coords, 0 ,62, longlat = TRUE)
wm_d62
## Neighbour list object:
## Number of regions: 88
## Number of nonzero links: 324
## Percentage nonzero weights: 4.183884
## Average number of links: 3.681818
str(wm_d62)
## List of 88
## $ : int [1:5] 3 4 5 57 64
## $ : int [1:4] 57 58 78 85
## $ : int [1:4] 1 4 5 57
## $ : int [1:3] 1 3 5
## $ : int [1:4] 1 3 4 85
## $ : int 69
## $ : int [1:2] 67 84
## $ : int [1:4] 9 46 47 78
## $ : int [1:4] 8 46 68 84
## $ : int [1:4] 16 22 70 72
## $ : int [1:3] 14 17 72
## $ : int [1:5] 13 60 61 63 83
## $ : int [1:4] 12 15 60 83
## $ : int [1:2] 11 17
## $ : int 13
## $ : int [1:4] 10 17 22 83
## $ : int [1:3] 11 14 16
## $ : int [1:3] 20 22 63
## $ : int [1:5] 20 21 73 74 82
## $ : int [1:5] 18 19 21 22 82
## $ : int [1:6] 19 20 35 74 82 86
## $ : int [1:4] 10 16 18 20
## $ : int [1:3] 41 77 82
## $ : int [1:4] 25 28 31 54
## $ : int [1:4] 24 28 33 81
## $ : int [1:4] 27 33 42 81
## $ : int [1:2] 26 29
## $ : int [1:6] 24 25 33 49 52 54
## $ : int [1:2] 27 37
## $ : int 33
## $ : int [1:2] 24 36
## $ : int 50
## $ : int [1:5] 25 26 28 30 81
## $ : int [1:3] 36 45 80
## $ : int [1:6] 21 41 46 47 80 82
## $ : int [1:5] 31 34 45 56 80
## $ : int [1:2] 29 42
## $ : int [1:3] 44 77 79
## $ : int [1:4] 40 42 43 81
## $ : int [1:3] 39 45 79
## $ : int [1:5] 23 35 45 79 82
## $ : int [1:5] 26 37 39 43 81
## $ : int [1:3] 39 42 44
## $ : int [1:2] 38 43
## $ : int [1:6] 34 36 40 41 79 80
## $ : int [1:5] 8 9 35 47 86
## $ : int [1:5] 8 35 46 80 86
## $ : int [1:5] 50 51 52 53 55
## $ : int [1:4] 28 51 52 54
## $ : int [1:6] 32 48 51 52 54 55
## $ : int [1:4] 48 49 50 52
## $ : int [1:6] 28 48 49 50 51 54
## $ : int [1:2] 48 55
## $ : int [1:5] 24 28 49 50 52
## $ : int [1:4] 48 50 53 75
## $ : int 36
## $ : int [1:5] 1 2 3 58 64
## $ : int [1:5] 2 57 64 66 68
## $ : int [1:3] 60 87 88
## $ : int [1:4] 12 13 59 61
## $ : int [1:5] 12 60 62 63 87
## $ : int [1:4] 61 63 77 87
## $ : int [1:5] 12 18 61 62 83
## $ : int [1:4] 1 57 58 76
## $ : int 76
## $ : int [1:5] 58 67 68 76 84
## $ : int [1:2] 7 66
## $ : int [1:4] 9 58 66 84
## $ : int [1:2] 6 75
## $ : int [1:3] 10 72 73
## $ : int [1:2] 73 74
## $ : int [1:3] 10 11 70
## $ : int [1:4] 19 70 71 74
## $ : int [1:5] 19 21 71 73 86
## $ : int [1:2] 55 69
## $ : int [1:3] 64 65 66
## $ : int [1:3] 23 38 62
## $ : int [1:2] 2 8
## $ : int [1:4] 38 40 41 45
## $ : int [1:5] 34 35 36 45 47
## $ : int [1:5] 25 26 33 39 42
## $ : int [1:6] 19 20 21 23 35 41
## $ : int [1:4] 12 13 16 63
## $ : int [1:4] 7 9 66 68
## $ : int [1:2] 2 5
## $ : int [1:4] 21 46 47 74
## $ : int [1:4] 59 61 62 88
## $ : int [1:2] 59 87
## - attr(*, "class")= chr "nb"
## - attr(*, "nbtype")= chr "distance"
## - attr(*, "distances")= num [1:2] 0 62
## - attr(*, "region.id")= chr [1:88] "1" "2" "3" "4" ...
## - attr(*, "call")= language dnearneigh(x = coords, d1 = 0, d2 = 62, longlat = TRUE)
## - attr(*, "dnn")= num [1:2] 0 62
## - attr(*, "bounds")= chr [1:2] "GT" "LE"
## - attr(*, "sym")= logi TRUE
We used str() function to display the weight matrix but another way to display the structure of the distance weight matrix is to combine table() and card() of spdep.
table(hunan$County, card(wm_d62))
##
## 1 2 3 4 5 6
## Anhua 1 0 0 0 0 0
## Anren 0 0 0 1 0 0
## Anxiang 0 0 0 0 1 0
## Baojing 0 0 0 0 1 0
## Chaling 0 0 1 0 0 0
## Changning 0 0 1 0 0 0
## Changsha 0 0 0 1 0 0
## Chengbu 0 1 0 0 0 0
## Chenxi 0 0 0 1 0 0
## Cili 0 1 0 0 0 0
## Dao 0 0 0 1 0 0
## Dongan 0 0 1 0 0 0
## Dongkou 0 0 0 1 0 0
## Fenghuang 0 0 0 1 0 0
## Guidong 0 0 1 0 0 0
## Guiyang 0 0 0 1 0 0
## Guzhang 0 0 0 0 0 1
## Hanshou 0 0 0 1 0 0
## Hengdong 0 0 0 0 1 0
## Hengnan 0 0 0 0 1 0
## Hengshan 0 0 0 0 0 1
## Hengyang 0 0 0 0 0 1
## Hongjiang 0 0 0 0 1 0
## Huarong 0 0 0 1 0 0
## Huayuan 0 0 0 1 0 0
## Huitong 0 0 0 1 0 0
## Jiahe 0 0 0 0 1 0
## Jianghua 0 0 1 0 0 0
## Jiangyong 0 1 0 0 0 0
## Jingzhou 0 1 0 0 0 0
## Jinshi 0 0 0 1 0 0
## Jishou 0 0 0 0 0 1
## Lanshan 0 0 0 1 0 0
## Leiyang 0 0 0 1 0 0
## Lengshuijiang 0 0 1 0 0 0
## Li 0 0 1 0 0 0
## Lianyuan 0 0 0 0 1 0
## Liling 0 1 0 0 0 0
## Linli 0 0 0 1 0 0
## Linwu 0 0 0 1 0 0
## Linxiang 1 0 0 0 0 0
## Liuyang 0 1 0 0 0 0
## Longhui 0 0 1 0 0 0
## Longshan 0 1 0 0 0 0
## Luxi 0 0 0 0 1 0
## Mayang 0 0 0 0 0 1
## Miluo 0 0 0 0 1 0
## Nan 0 0 0 0 1 0
## Ningxiang 0 0 0 1 0 0
## Ningyuan 0 0 0 0 1 0
## Pingjiang 0 1 0 0 0 0
## Qidong 0 0 1 0 0 0
## Qiyang 0 0 1 0 0 0
## Rucheng 0 1 0 0 0 0
## Sangzhi 0 1 0 0 0 0
## Shaodong 0 0 0 0 1 0
## Shaoshan 0 0 0 0 1 0
## Shaoyang 0 0 0 1 0 0
## Shimen 1 0 0 0 0 0
## Shuangfeng 0 0 0 0 0 1
## Shuangpai 0 0 0 1 0 0
## Suining 0 0 0 0 1 0
## Taojiang 0 1 0 0 0 0
## Taoyuan 0 1 0 0 0 0
## Tongdao 0 1 0 0 0 0
## Wangcheng 0 0 0 1 0 0
## Wugang 0 0 1 0 0 0
## Xiangtan 0 0 0 1 0 0
## Xiangxiang 0 0 0 0 1 0
## Xiangyin 0 0 0 1 0 0
## Xinhua 0 0 0 0 1 0
## Xinhuang 1 0 0 0 0 0
## Xinning 0 1 0 0 0 0
## Xinshao 0 0 0 0 0 1
## Xintian 0 0 0 0 1 0
## Xupu 0 1 0 0 0 0
## Yanling 0 0 1 0 0 0
## Yizhang 1 0 0 0 0 0
## Yongshun 0 0 0 1 0 0
## Yongxing 0 0 0 1 0 0
## You 0 0 0 1 0 0
## Yuanjiang 0 0 0 0 1 0
## Yuanling 1 0 0 0 0 0
## Yueyang 0 0 1 0 0 0
## Zhijiang 0 0 0 0 1 0
## Zhongfang 0 0 0 1 0 0
## Zhuzhou 0 0 0 0 1 0
## Zixing 0 0 1 0 0 0
We use n.comp.nb() function of spdep package to find the number of disjoint connected subgraphs in the wm_d62 weight matrix. (whether all the points are connected in one network or is there a another network in the same map)
nc value - number of disjoint connected networks of neighbours
comp.id - vector with indices of the disjoint connected networks of neighbours
n_comp <- n.comp.nb(wm_d62)
n_comp$nc
## [1] 1
table(n_comp$comp.id)
##
## 1
## 88
We can see that all the counties return 1 for their nc value, meaning there’s only 1 connected network.
Plotting the base map of hunan, k1 (computed by knearneigh() function which return the links of 1st nearest neighbour) and wm_d62 (the links of neighbours within 62 km) separately to see the difference.
par(mfrow = c(1,2))
plot(hunan, border = "lightgrey")
plot(k1, coords, add = TRUE, col = "red", length = 0.08)
title(main = "Links of 1st nearest neighbour", cex.main = 0.9)
plot(hunan, border = "lightgrey")
plot(wm_d62, coords, add = TRUE, pch = 19, cex = 0.6)
title(main = "Neighbours within 62 km", cex.main = 0.9)
Plotting them together.
plot(hunan, border = 'lightgrey')
plot(wm_d62, coords, add = TRUE)
plot(k1, coords, add = TRUE, col = 'red', length = 0.08)
Fixed distance weight matrix’s characteristic is that more densely settled areas (urban areas) tend to have more neighbours and the less densely settled areas (rural counties) tend to have lesser neighbours.
It’s possible to control the number of neighbours directly by using k-nearest neighbours knearneigh() function, either accepting asymmetric neighbours or imposing symmetry as shown below.
knn6 <- knn2nb(knearneigh(coords, k =6))
knn6
## Neighbour list object:
## Number of regions: 88
## Number of nonzero links: 528
## Percentage nonzero weights: 6.818182
## Average number of links: 6
## Non-symmetric neighbours list
Notice that all of the counties have 6 neighbours, not more than 6 or less than 6.
str(knn6)
## List of 88
## $ : int [1:6] 2 3 4 5 57 64
## $ : int [1:6] 1 3 57 58 78 85
## $ : int [1:6] 1 2 4 5 57 85
## $ : int [1:6] 1 3 5 6 69 85
## $ : int [1:6] 1 3 4 6 69 85
## $ : int [1:6] 3 4 5 69 75 85
## $ : int [1:6] 9 66 67 71 74 84
## $ : int [1:6] 9 46 47 78 80 86
## $ : int [1:6] 8 46 66 68 84 86
## $ : int [1:6] 16 19 22 70 72 73
## $ : int [1:6] 10 14 16 17 70 72
## $ : int [1:6] 13 15 60 61 63 83
## $ : int [1:6] 12 15 60 61 63 83
## $ : int [1:6] 11 15 16 17 72 83
## $ : int [1:6] 12 13 14 17 60 83
## $ : int [1:6] 10 11 17 22 72 83
## $ : int [1:6] 10 11 14 16 72 83
## $ : int [1:6] 20 22 23 63 77 83
## $ : int [1:6] 10 20 21 73 74 82
## $ : int [1:6] 18 19 21 22 23 82
## $ : int [1:6] 19 20 35 74 82 86
## $ : int [1:6] 10 16 18 19 20 83
## $ : int [1:6] 18 20 41 77 79 82
## $ : int [1:6] 25 28 31 52 54 81
## $ : int [1:6] 24 28 31 33 54 81
## $ : int [1:6] 25 27 29 33 42 81
## $ : int [1:6] 26 29 30 37 42 81
## $ : int [1:6] 24 25 33 49 52 54
## $ : int [1:6] 26 27 37 42 43 81
## $ : int [1:6] 26 27 28 33 49 81
## $ : int [1:6] 24 25 36 39 40 54
## $ : int [1:6] 24 31 50 54 55 56
## $ : int [1:6] 25 26 28 30 49 81
## $ : int [1:6] 36 40 41 45 56 80
## $ : int [1:6] 21 41 46 47 80 82
## $ : int [1:6] 31 34 40 45 56 80
## $ : int [1:6] 26 27 29 42 43 44
## $ : int [1:6] 23 43 44 62 77 79
## $ : int [1:6] 25 40 42 43 44 81
## $ : int [1:6] 31 36 39 43 45 79
## $ : int [1:6] 23 35 45 79 80 82
## $ : int [1:6] 26 27 37 39 43 81
## $ : int [1:6] 37 39 40 42 44 79
## $ : int [1:6] 37 38 39 42 43 79
## $ : int [1:6] 34 36 40 41 79 80
## $ : int [1:6] 8 9 35 47 78 86
## $ : int [1:6] 8 21 35 46 80 86
## $ : int [1:6] 49 50 51 52 53 55
## $ : int [1:6] 28 33 48 51 52 54
## $ : int [1:6] 32 48 51 52 54 55
## $ : int [1:6] 28 48 49 50 52 54
## $ : int [1:6] 28 48 49 50 51 54
## $ : int [1:6] 48 50 51 52 55 75
## $ : int [1:6] 24 28 49 50 51 52
## $ : int [1:6] 32 48 50 52 53 75
## $ : int [1:6] 32 34 36 78 80 85
## $ : int [1:6] 1 2 3 58 64 68
## $ : int [1:6] 2 57 64 66 68 78
## $ : int [1:6] 12 13 60 61 87 88
## $ : int [1:6] 12 13 59 61 63 87
## $ : int [1:6] 12 13 60 62 63 87
## $ : int [1:6] 12 38 61 63 77 87
## $ : int [1:6] 12 18 60 61 62 83
## $ : int [1:6] 1 3 57 58 68 76
## $ : int [1:6] 58 64 66 67 68 76
## $ : int [1:6] 9 58 67 68 76 84
## $ : int [1:6] 7 65 66 68 76 84
## $ : int [1:6] 9 57 58 66 78 84
## $ : int [1:6] 4 5 6 32 75 85
## $ : int [1:6] 10 16 19 22 72 73
## $ : int [1:6] 7 19 73 74 84 86
## $ : int [1:6] 10 11 14 16 17 70
## $ : int [1:6] 10 19 21 70 71 74
## $ : int [1:6] 19 21 71 73 84 86
## $ : int [1:6] 6 32 50 53 55 69
## $ : int [1:6] 58 64 65 66 67 68
## $ : int [1:6] 18 23 38 61 62 63
## $ : int [1:6] 2 8 9 46 58 68
## $ : int [1:6] 38 40 41 43 44 45
## $ : int [1:6] 34 35 36 41 45 47
## $ : int [1:6] 25 26 28 33 39 42
## $ : int [1:6] 19 20 21 23 35 41
## $ : int [1:6] 12 13 15 16 22 63
## $ : int [1:6] 7 9 66 68 71 74
## $ : int [1:6] 2 3 4 5 56 69
## $ : int [1:6] 8 9 21 46 47 74
## $ : int [1:6] 59 60 61 62 63 88
## $ : int [1:6] 59 60 61 62 63 87
## - attr(*, "region.id")= chr [1:88] "1" "2" "3" "4" ...
## - attr(*, "call")= language knearneigh(x = coords, k = 6)
## - attr(*, "sym")= logi FALSE
## - attr(*, "type")= chr "knn"
## - attr(*, "knn-k")= num 6
## - attr(*, "class")= chr "nb"
plot(hunan, border = 'lightgrey')
plot(knn6, coords, pch = 19, cex = 0.6, add = TRUE, col = 'red')
We will derive spatial weight matrices based on Inverse Distance method by using nbdists() function of spdep package.
dist <- nbdists(wm_q, coords, longlat = TRUE)
ids <- lapply(dist, function(x) 1/(x))
ids
## [[1]]
## [1] 0.01535405 0.03916350 0.01820896 0.02807922 0.01145113
##
## [[2]]
## [1] 0.01535405 0.01764308 0.01925924 0.02323898 0.01719350
##
## [[3]]
## [1] 0.03916350 0.02822040 0.03695795 0.01395765
##
## [[4]]
## [1] 0.01820896 0.02822040 0.03414741 0.01539065
##
## [[5]]
## [1] 0.03695795 0.03414741 0.01524598 0.01618354
##
## [[6]]
## [1] 0.015390649 0.015245977 0.021748129 0.011883901 0.009810297
##
## [[7]]
## [1] 0.01708612 0.01473997 0.01150924 0.01872915
##
## [[8]]
## [1] 0.02022144 0.03453056 0.02529256 0.01036340 0.02284457 0.01500600 0.01515314
##
## [[9]]
## [1] 0.02022144 0.01574888 0.02109502 0.01508028 0.02902705 0.01502980
##
## [[10]]
## [1] 0.02281552 0.01387777 0.01538326 0.01346650 0.02100510 0.02631658 0.01874863
## [8] 0.01500046
##
## [[11]]
## [1] 0.01882869 0.02243492 0.02247473
##
## [[12]]
## [1] 0.02779227 0.02419652 0.02333385 0.02986130 0.02335429
##
## [[13]]
## [1] 0.02779227 0.02650020 0.02670323 0.01714243
##
## [[14]]
## [1] 0.01882869 0.01233868 0.02098555
##
## [[15]]
## [1] 0.02650020 0.01233868 0.01096284 0.01562226
##
## [[16]]
## [1] 0.02281552 0.02466962 0.02765018 0.01476814 0.01671430
##
## [[17]]
## [1] 0.01387777 0.02243492 0.02098555 0.01096284 0.02466962 0.01593341 0.01437996
##
## [[18]]
## [1] 0.02039779 0.02032767 0.01481665 0.01473691 0.01459380
##
## [[19]]
## [1] 0.01538326 0.01926323 0.02668415 0.02140253 0.01613589 0.01412874
##
## [[20]]
## [1] 0.01346650 0.02039779 0.01926323 0.01723025 0.02153130 0.01469240 0.02327034
##
## [[21]]
## [1] 0.02668415 0.01723025 0.01766299 0.02644986 0.02163800
##
## [[22]]
## [1] 0.02100510 0.02765018 0.02032767 0.02153130 0.01489296
##
## [[23]]
## [1] 0.01481665 0.01469240 0.01401432 0.02246233 0.01880425 0.01530458 0.01849605
##
## [[24]]
## [1] 0.02354598 0.01837201 0.02607264 0.01220154 0.02514180
##
## [[25]]
## [1] 0.02354598 0.02188032 0.01577283 0.01949232 0.02947957
##
## [[26]]
## [1] 0.02166102 0.01743220 0.02207751 0.02209114
##
## [[27]]
## [1] 0.02166102 0.02490625 0.01562326
##
## [[28]]
## [1] 0.01837201 0.02188032 0.02229549 0.03076171 0.02039506
##
## [[29]]
## [1] 0.02490625 0.01686587 0.01395022
##
## [[30]]
## [1] 0.02090587
##
## [[31]]
## [1] 0.02607264 0.01577283 0.01219005 0.01724850 0.01229012 0.01609781 0.01139438
## [8] 0.01150130
##
## [[32]]
## [1] 0.01220154 0.01219005 0.01712515 0.01340413 0.01280928 0.01198216 0.01053374
## [8] 0.01065655
##
## [[33]]
## [1] 0.01949232 0.01743220 0.02229549 0.02090587 0.01979045
##
## [[34]]
## [1] 0.03113041 0.03589551 0.02882915
##
## [[35]]
## [1] 0.01766299 0.02185795 0.02616766 0.02111721 0.02108253 0.01509020
##
## [[36]]
## [1] 0.01724850 0.03113041 0.01571707 0.01860991 0.02073549 0.01680129
##
## [[37]]
## [1] 0.01686587 0.02234793 0.01510990 0.01550676
##
## [[38]]
## [1] 0.01401432 0.02407426 0.02276151 0.01719415
##
## [[39]]
## [1] 0.01229012 0.02172543 0.01711924 0.02629732 0.01896385
##
## [[40]]
## [1] 0.01609781 0.01571707 0.02172543 0.01506473 0.01987922 0.01894207
##
## [[41]]
## [1] 0.02246233 0.02185795 0.02205991 0.01912542 0.01601083 0.01742892
##
## [[42]]
## [1] 0.02207751 0.01562326 0.01395022 0.02234793 0.01711924 0.01836831 0.01683518
##
## [[43]]
## [1] 0.01510990 0.02629732 0.01506473 0.01836831 0.03112027 0.01530782
##
## [[44]]
## [1] 0.01550676 0.02407426 0.03112027 0.01486508
##
## [[45]]
## [1] 0.03589551 0.01860991 0.01987922 0.02205991 0.02107101 0.01982700
##
## [[46]]
## [1] 0.03453056 0.04033752 0.02689769
##
## [[47]]
## [1] 0.02529256 0.02616766 0.04033752 0.01949145 0.02181458
##
## [[48]]
## [1] 0.02313819 0.03370576 0.02289485 0.01630057 0.01818085
##
## [[49]]
## [1] 0.03076171 0.02138091 0.02394529 0.01990000
##
## [[50]]
## [1] 0.01712515 0.02313819 0.02551427 0.02051530 0.02187179
##
## [[51]]
## [1] 0.03370576 0.02138091 0.02873854
##
## [[52]]
## [1] 0.02289485 0.02394529 0.02551427 0.02873854 0.03516672
##
## [[53]]
## [1] 0.01630057 0.01979945 0.01253977
##
## [[54]]
## [1] 0.02514180 0.02039506 0.01340413 0.01990000 0.02051530 0.03516672
##
## [[55]]
## [1] 0.01280928 0.01818085 0.02187179 0.01979945 0.01882298
##
## [[56]]
## [1] 0.01036340 0.01139438 0.01198216 0.02073549 0.01214479 0.01362855 0.01341697
##
## [[57]]
## [1] 0.028079221 0.017643082 0.031423501 0.029114131 0.013520292 0.009903702
##
## [[58]]
## [1] 0.01925924 0.03142350 0.02722997 0.01434859 0.01567192
##
## [[59]]
## [1] 0.01696711 0.01265572 0.01667105 0.01785036
##
## [[60]]
## [1] 0.02419652 0.02670323 0.01696711 0.02343040
##
## [[61]]
## [1] 0.02333385 0.01265572 0.02343040 0.02514093 0.02790764 0.01219751 0.02362452
##
## [[62]]
## [1] 0.02514093 0.02002219 0.02110260
##
## [[63]]
## [1] 0.02986130 0.02790764 0.01407043 0.01805987
##
## [[64]]
## [1] 0.02911413 0.01689892
##
## [[65]]
## [1] 0.02471705
##
## [[66]]
## [1] 0.01574888 0.01726461 0.03068853 0.01954805 0.01810569
##
## [[67]]
## [1] 0.01708612 0.01726461 0.01349843 0.01361172
##
## [[68]]
## [1] 0.02109502 0.02722997 0.03068853 0.01406357 0.01546511
##
## [[69]]
## [1] 0.02174813 0.01645838 0.01419926
##
## [[70]]
## [1] 0.02631658 0.01963168 0.02278487
##
## [[71]]
## [1] 0.01473997 0.01838483 0.03197403
##
## [[72]]
## [1] 0.01874863 0.02247473 0.01476814 0.01593341 0.01963168
##
## [[73]]
## [1] 0.01500046 0.02140253 0.02278487 0.01838483 0.01652709
##
## [[74]]
## [1] 0.01150924 0.01613589 0.03197403 0.01652709 0.01342099 0.02864567
##
## [[75]]
## [1] 0.011883901 0.010533736 0.012539774 0.018822977 0.016458383 0.008217581
##
## [[76]]
## [1] 0.01352029 0.01434859 0.01689892 0.02471705 0.01954805 0.01349843 0.01406357
##
## [[77]]
## [1] 0.014736909 0.018804247 0.022761507 0.012197506 0.020022195 0.014070428
## [7] 0.008440896
##
## [[78]]
## [1] 0.02323898 0.02284457 0.01508028 0.01214479 0.01567192 0.01546511 0.01140779
##
## [[79]]
## [1] 0.01530458 0.01719415 0.01894207 0.01912542 0.01530782 0.01486508 0.02107101
##
## [[80]]
## [1] 0.01500600 0.02882915 0.02111721 0.01680129 0.01601083 0.01982700 0.01949145
## [8] 0.01362855
##
## [[81]]
## [1] 0.02947957 0.02209114 0.01150130 0.01979045 0.01896385 0.01683518
##
## [[82]]
## [1] 0.02327034 0.02644986 0.01849605 0.02108253 0.01742892
##
## [[83]]
## [1] 0.023354289 0.017142433 0.015622258 0.016714303 0.014379961 0.014593799
## [7] 0.014892965 0.018059871 0.008440896
##
## [[84]]
## [1] 0.01872915 0.02902705 0.01810569 0.01361172 0.01342099 0.01297994
##
## [[85]]
## [1] 0.011451133 0.017193502 0.013957649 0.016183544 0.009810297 0.010656545
## [7] 0.013416965 0.009903702 0.014199260 0.008217581 0.011407794
##
## [[86]]
## [1] 0.01515314 0.01502980 0.01412874 0.02163800 0.01509020 0.02689769 0.02181458
## [8] 0.02864567 0.01297994
##
## [[87]]
## [1] 0.01667105 0.02362452 0.02110260 0.02058034
##
## [[88]]
## [1] 0.01785036 0.02058034
For IDW, we can see that by inversing the distance, the larger distances get scaled down and the smaller distances get scaled up. All the values are also in decimals because we divide 1 by the value.
The most intuitive way to summarize the neighbours’ values:
assigning the fraction 1/(no. of neighbours) to each neighbouring county
summing the weighted income values
However, the polygons along the edges of the study area will base their lagged values on fewer polygona thus potentially over/under-estimating the true nature of the spatial autocorrelation.
So, we will use the nb2listw() function of spdep package which will supplement a neighbours list with spatial weights for the chosen coding scheme (style argument).
rswm_q <- nb2listw(wm_q, style = "W", zero.policy = TRUE)
rswm_q
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 88
## Number of nonzero links: 448
## Percentage nonzero weights: 5.785124
## Average number of links: 5.090909
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 88 7744 88 37.86334 365.9147
The ‘zero.policy=TRUE’ argument allows for lists of non-neighbours.
Style=“W” means neighbouring polygon will be assigned equal weight, it’s row standardised (sums over all links to n)
Style=“B” means basic binary coding which is a more robust option.
To see the weight of the first polygon (polygon ID 1)’s four neighbours:
rswm_q$weights[1]
## [[1]]
## [1] 0.2 0.2 0.2 0.2 0.2
Each neighbour is assigned a 0.2 of the total weight. This means that when R computes the average neighbouring income values, each neighbour’s income will be multiplied by 0.2 before being tallied.
We will try to derive the row standardised distance weight matrix using the style B, basic binary coding.
rswm_ids <- nb2listw(wm_q, glist = ids, style = "B", zero.policy = TRUE)
rswm_ids
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 88
## Number of nonzero links: 448
## Percentage nonzero weights: 5.785124
## Average number of links: 5.090909
##
## Weights style: B
## Weights constants summary:
## n nn S0 S1 S2
## B 88 7744 8.786712 0.3776401 3.813528
To see the row standardised + inversed distance weight of the first polygon (polygon ID 1)’s four neighbours:
rswm_ids$weights[1]
## [[1]]
## [1] 0.01535405 0.03916350 0.01820896 0.02807922 0.01145113
summary(unlist(rswm_ids$weights))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.008218 0.015088 0.018739 0.019613 0.022823 0.040338
We will create 4 different spatial lagged variables:
spatialy lagged variables with row-standardized weights and sum of neighbouring values
spatial window average, and spatial window sum
With a neighbor structure defined by the non-zero elements of the spatial weights matrix W, a spatially lagged variable is a weighted sum or a weighted average of the neighbouring values for that variable.
We will use lag.listw() or lag() function of spdep package to compute the lag vector var (numeric vector of the same length as the listw object rswm_q)
GDPPC.lag <- lag.listw(rswm_q, hunan$GDPPC)
rswm_q$neighbours[1]
## [[1]]
## [1] 2 3 4 57 85
rswm_q$weights[1]
## [[1]]
## [1] 0.2 0.2 0.2 0.2 0.2
GDPPC.lag[1]
## [1] 24847.2
This means that 24847.20 is the value of tallying the first polygon’s (Anxiang) weighted GDPPC values so it would be (0.2xGDPPC of polygon 2 + 0.2xGDPPC of polygon 3 + 0.2xGDPPC of polygon 4 + 0.2xGDPPC of polygon 57 + 0.2xGDPPC of polygon 85)
We will append the spatially lag GDPPC values into hunan SpatialPolygonDataFrame by subsetting the list and turning it into data frame before finally using left_join() function of dplyr package to add the data to hunan.
lag.list <- list(hunan$NAME_3, lag.listw(rswm_q, hunan$GDPPC))
lag.res <- as.data.frame(lag.list)
colnames(lag.res) <- c("NAME_3", "lag GDPPC (queen)")
head(lag.res)
## NAME_3 lag GDPPC (queen)
## 1 Anxiang 24847.20
## 2 Hanshou 22724.80
## 3 Jinshi 24143.25
## 4 Li 27737.50
## 5 Linli 27270.25
## 6 Shimen 21248.80
hunan@data <- left_join(hunan@data, lag.res)
## Joining, by = "NAME_3"
head(hunan@data)
## NAME_2 ID_3 NAME_3 ENGTYPE_3 Shape_Leng Shape_Area County City
## 1 Changde 21098 Anxiang County 1.869074 0.10056190 Anxiang Changde
## 2 Changde 21100 Hanshou County 2.360691 0.19978745 Hanshou Changde
## 3 Changde 21101 Jinshi County City 1.425620 0.05302413 Jinshi Changde
## 4 Changde 21102 Li County 3.474325 0.18908121 Li Changde
## 5 Changde 21103 Linli County 2.289506 0.11450357 Linli Changde
## 6 Changde 21104 Shimen County 4.171918 0.37194707 Shimen Changde
## avg_wage deposite FAI Gov_Rev Gov_Exp GDP GDPPC GIO Loan NIPCR
## 1 31935 5517.2 3541.0 243.64 1779.5 12482.0 23667 5108.9 2806.9 7693.7
## 2 32265 7979.0 8665.0 386.13 2062.4 15788.0 20981 13491.0 4550.0 8269.9
## 3 28692 4581.7 4777.0 373.31 1148.4 8706.9 34592 10935.0 2242.0 8169.9
## 4 32541 13487.0 16066.0 709.61 2459.5 20322.0 24473 18402.0 6748.0 8377.0
## 5 32667 564.1 7781.2 336.86 1538.7 10355.0 25554 8214.0 358.0 8143.1
## 6 33261 8334.4 10531.0 548.33 2178.8 16293.0 27137 17795.0 6026.5 6156.0
## Bed Emp EmpR EmpRT Pri_Stu Sec_Stu Household Household_R NOIP Pop_R
## 1 1931 336.39 270.5 205.9 19.584 17.819 148.1 135.4 53 346.0
## 2 2560 456.78 388.8 246.7 42.097 33.029 240.2 208.7 95 553.2
## 3 848 122.78 82.1 61.7 8.723 7.592 81.9 43.7 77 92.4
## 4 2038 513.44 426.8 227.1 38.975 33.938 268.5 256.0 96 539.7
## 5 1440 307.36 272.2 100.8 23.286 18.943 129.1 157.2 99 246.6
## 6 2502 392.05 329.6 193.8 29.245 26.104 190.6 184.7 122 399.2
## RSCG Pop_T Agri Service Disp_Inc RORP ROREmp lag GDPPC (queen)
## 1 3957.9 528.3 4524.41 14100 16610 0.6549309 0.8041262 24847.20
## 2 4460.5 804.6 6545.35 17727 18925 0.6875466 0.8511756 22724.80
## 3 3683.0 251.8 2562.46 7525 19498 0.3669579 0.6686757 24143.25
## 4 7110.2 832.5 7562.34 53160 18985 0.6482883 0.8312558 27737.50
## 5 3604.9 409.3 3583.91 7031 18604 0.6024921 0.8856065 27270.25
## 6 6490.7 600.5 5266.51 6981 19275 0.6647794 0.8407091 21248.80
Next, we will plot both the original GDPPC value and the spatially lagged GDPPC values for comparison.
gdppc <- qtm(hunan, "GDPPC")
lag_gdppc <- qtm(hunan, "lag GDPPC (queen)")
tmap_arrange(gdppc, lag_gdppc, asp = 1, ncol =2)
The spatial window sum uses and includes the diagonal element. We will assign knn6 to a new variable to alter its structure to add the diagonal elements by using the include.self() function of spdep package which
knn6a <- knn6
include.self(knn6a)
## Neighbour list object:
## Number of regions: 88
## Number of nonzero links: 616
## Percentage nonzero weights: 7.954545
## Average number of links: 7
## Non-symmetric neighbours list
Let’s see the difference.
plot(hunan, border = 'lightgrey')
plot(include.self(knn6a), coords, pch = 19, cex = 0.6, col = 'blue', add = TRUE)
plot(knn6, coords, pch = 19, cex = 0.6, col = 'red', add = TRUE)
The spatial window sum is simply the sum of one area’s GDPPC value and its neighbours’ GDPPC value. When dealing with a binary variable, the spatial window sum corresponds to the number of events (observations with a value of 1) within the window centered on a location (including the value at that location). Thus, we will assign binary weights to the neighbour structure.
binary.knn6 <- lapply(knn6a, function(x) 0*x+1)
binary.knn6[1]
## [[1]]
## [1] 1 1 1 1 1 1
Next, we will use nb2listw() function
wm_knn6 <- nb2listw(knn6a, glist = binary.knn6, style = "B")
Then, we will compute the spatially lagged GDPPC values for knn6 (adaptive weight matrix of maximum 6 neighbours).
lag_knn6.list <- list(hunan$NAME_3, lag.listw(wm_knn6, hunan$GDPPC))
lag_knn6.res <- as.data.frame(lag_knn6.list)
colnames(lag_knn6.res) <- c("NAME_3", "lag GDPPC (adaptive distance)")
head(lag_knn6.res)
## NAME_3 lag GDPPC (adaptive distance)
## 1 Anxiang 157324
## 2 Hanshou 148216
## 3 Jinshi 138865
## 4 Li 152543
## 5 Linli 151462
## 6 Shimen 140836
hunan@data <- left_join(hunan@data, lag_knn6.res)
## Joining, by = "NAME_3"
head(hunan@data)
## NAME_2 ID_3 NAME_3 ENGTYPE_3 Shape_Leng Shape_Area County City
## 1 Changde 21098 Anxiang County 1.869074 0.10056190 Anxiang Changde
## 2 Changde 21100 Hanshou County 2.360691 0.19978745 Hanshou Changde
## 3 Changde 21101 Jinshi County City 1.425620 0.05302413 Jinshi Changde
## 4 Changde 21102 Li County 3.474325 0.18908121 Li Changde
## 5 Changde 21103 Linli County 2.289506 0.11450357 Linli Changde
## 6 Changde 21104 Shimen County 4.171918 0.37194707 Shimen Changde
## avg_wage deposite FAI Gov_Rev Gov_Exp GDP GDPPC GIO Loan NIPCR
## 1 31935 5517.2 3541.0 243.64 1779.5 12482.0 23667 5108.9 2806.9 7693.7
## 2 32265 7979.0 8665.0 386.13 2062.4 15788.0 20981 13491.0 4550.0 8269.9
## 3 28692 4581.7 4777.0 373.31 1148.4 8706.9 34592 10935.0 2242.0 8169.9
## 4 32541 13487.0 16066.0 709.61 2459.5 20322.0 24473 18402.0 6748.0 8377.0
## 5 32667 564.1 7781.2 336.86 1538.7 10355.0 25554 8214.0 358.0 8143.1
## 6 33261 8334.4 10531.0 548.33 2178.8 16293.0 27137 17795.0 6026.5 6156.0
## Bed Emp EmpR EmpRT Pri_Stu Sec_Stu Household Household_R NOIP Pop_R
## 1 1931 336.39 270.5 205.9 19.584 17.819 148.1 135.4 53 346.0
## 2 2560 456.78 388.8 246.7 42.097 33.029 240.2 208.7 95 553.2
## 3 848 122.78 82.1 61.7 8.723 7.592 81.9 43.7 77 92.4
## 4 2038 513.44 426.8 227.1 38.975 33.938 268.5 256.0 96 539.7
## 5 1440 307.36 272.2 100.8 23.286 18.943 129.1 157.2 99 246.6
## 6 2502 392.05 329.6 193.8 29.245 26.104 190.6 184.7 122 399.2
## RSCG Pop_T Agri Service Disp_Inc RORP ROREmp lag GDPPC (queen)
## 1 3957.9 528.3 4524.41 14100 16610 0.6549309 0.8041262 24847.20
## 2 4460.5 804.6 6545.35 17727 18925 0.6875466 0.8511756 22724.80
## 3 3683.0 251.8 2562.46 7525 19498 0.3669579 0.6686757 24143.25
## 4 7110.2 832.5 7562.34 53160 18985 0.6482883 0.8312558 27737.50
## 5 3604.9 409.3 3583.91 7031 18604 0.6024921 0.8856065 27270.25
## 6 6490.7 600.5 5266.51 6981 19275 0.6647794 0.8407091 21248.80
## lag GDPPC (adaptive distance)
## 1 157324
## 2 148216
## 3 138865
## 4 152543
## 5 151462
## 6 140836
gdppc1 <- qtm(hunan, "GDPPC")
lag_gdppc1 <- qtm(hunan, "lag GDPPC (adaptive distance)")
tmap_arrange(gdppc1, lag_gdppc1, asp = 1, ncol =2)
We will perform Moran’s I statistics testing using moran.test() function of spdep package.
moran.test(hunan$GDPPC, listw = rswm_q, zero.policy = TRUE, na.action = na.omit)
##
## Moran I test under randomisation
##
## data: hunan$GDPPC
## weights: rswm_q
##
## Moran I statistic standard deviate = 4.7351, p-value = 1.095e-06
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.300749970 -0.011494253 0.004348351
Instead of using moran.test() function, we will perform permutation test for Moran’s I statistic by using moran.mc() function of spdep package.
set.seed(1234)
bperm = moran.mc(hunan$GDPPC, listw = rswm_q, nsim = 999, zero.policy = TRUE, na.action = na.omit)
bperm
##
## Monte-Carlo simulation of Moran I
##
## data: hunan$GDPPC
## weights: rswm_q
## number of simulations + 1: 1000
##
## statistic = 0.30075, observed rank = 1000, p-value = 0.001
## alternative hypothesis: greater
For both monte carlo simulation and normal Moran’s I statistics, both of the p-value are 0.001 and lower and this means that if confidence interval is 98%, it means that distribution of GDPPC values in the counties are not randomly distributed. Thus, since the Moran I statistics (0.3) is bigger than 0, the spatial pattern is “clustered”.
summary(bperm$res[1:999])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.18339 -0.06168 -0.02125 -0.01505 0.02611 0.27593
hist(bperm$res, freq = TRUE, breaks = 20, xlab = "Simulated Moran's I")
abline(v=0, col="red")
We will perform Geary’s C test for spatial autocorrelation analysis using geary.test() function of spdep package.
geary.test(hunan$GDPPC, listw = rswm_q)
##
## Geary C test under randomisation
##
## data: hunan$GDPPC
## weights: rswm_q
##
## Geary C statistic standard deviate = 3.6108, p-value = 0.0001526
## alternative hypothesis: Expectation greater than statistic
## sample estimates:
## Geary C statistic Expectation Variance
## 0.6907223 1.0000000 0.0073364
We will perform permutation test for Geary’s C statistic by using geary.mc() function of spdep package.
set.seed(1234)
bperm = geary.mc(hunan$GDPPC, listw = rswm_q, nsim = 999)
bperm
##
## Monte-Carlo simulation of Geary C
##
## data: hunan$GDPPC
## weights: rswm_q
## number of simulations + 1: 1000
##
## statistic = 0.69072, observed rank = 1, p-value = 0.001
## alternative hypothesis: greater
For both monte carlo simulation and normal Geary’s C statistics, both of the p-value are 0.001 and lower and this means that if confidence interval is 98%, it means that distribution of GDPPC values in the counties are not randomly distributed. Thus, since the c statistics value is smaller than 1, the spatial pattern is “clustered”.
summary(bperm$res[1:999])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.7142 0.9502 1.0052 1.0044 1.0595 1.2722
hist(bperm$res, freq = TRUE, breaks = 20, xlab = "Simulated Geary's C")
abline(v=1, col="red")
Spatial correlograms are great to examine patterns of spatial autocorrelation in your data. They show how correlated pairs of spatial observations are when you increase the distance (lag) between them.
MI_corr <- sp.correlogram(wm_q, hunan$GDPPC, order = 6, method = "I", style = "B")
plot(MI_corr)
GC_corr <- sp.correlogram(wm_q, hunan$GDPPC, order = 6, method = "C", style = "W")
plot(GC_corr)
Let’s get the mean distance of the lags order 1-6 to see the distances. We use nblag() function which creates higher order neighbour lists, where higher order neighbours are only lags links from each other on the graph described by the input neighbours list. Credit
nb6 <- nblag(wm_q, 6)
correlogram_bins <- sapply(nb6, function(x) mean(unlist(nbdists(x,coords, longlat = TRUE))))
correlogram_bins
## [1] 55.87101 106.63547 162.68559 219.06365 271.94224 323.42132
summary(correlogram_bins)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 55.87 120.65 190.87 189.94 258.72 323.42
From the correlograms, we can see that Moran’s I values and Geary’s c values are inversely related. We can also see that the spatial pattern of GDPPC variable is clustered from lag 1 to 4 (from 55.87 km to 219.06 km) and dispersed from lag 4 onwards (from 219.06 km to maximum distance of 323.42 km).