IS415 Geospatial Analytics and Application

Instructor: Dr. Kam Tin Seong.

Assoc. Professor of Information Systems (Practice)

1. Overview

In this hands-on exercise, we will learn how to compute Spatial Autocorrelation statistics using the spdep package.

1.1 Dataset

Hunan county shapefile containing the county’s boundary layer
Hunan_2012.csv containing selected Hunan’s local development indicators

2. Load/Install necessary R packages

We will be using these R packages:

sf
tmap
spdep
rgdal
tidyverse

packages = c('sf','tmap','spdep','rgdal', 'tidyverse')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

3. Importing Data into R

We import the shapefile using readOGR() function of rgdal package.

Then, we import the csv file using read_csv() function of readr package.

3.1 Importing geospatial data (shapefile)

hunan <- readOGR(dsn = "data/shapefile", layer = "Hunan")

## OGR data source with driver: ESRI Shapefile 
## Source: "D:\SMU Year 3 Sem 1\IS415 Geospatial Analytics and Application\Week 6\Hands-on_Ex06\data\shapefile", layer: "Hunan"
## with 88 features
## It has 7 fields

It has 88 features meaning 88 polygons (counties).

3.2 Importing aspatial data (csv file)

hunan2012 <- read_csv("data/attribute/Hunan_2012.csv")

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   County = col_character(),
##   City = col_character()
## )

## See spec(...) for full column specifications.

3.3 Combining geospatial and aspatial data

We will use left_join() function of dplyr package to combine hunan (SpatialPolygonsDataFrame) and hunan2012 (tbl dataframe)

hunan@data <- left_join(hunan@data, hunan2012)

## Joining, by = "County"

knitr::kable(head(hunan, n=5))

NAME_2	ID_3	NAME_3	ENGTYPE_3	Shape_Leng	Shape_Area	County	City	avg_wage	deposite	FAI	Gov_Rev	Gov_Exp	GDP	GDPPC	GIO	Loan	NIPCR	Bed	Emp	EmpR	EmpRT	Pri_Stu	Sec_Stu	Household	Household_R	NOIP	Pop_R	RSCG	Pop_T	Agri	Service	Disp_Inc	RORP	ROREmp
Changde	21098	Anxiang	County	1.869074	0.1005619	Anxiang	Changde	31935	5517.2	3541.0	243.64	1779.5	12482.0	23667	5108.9	2806.9	7693.7	1931	336.39	270.5	205.9	19.584	17.819	148.1	135.4	53	346.0	3957.9	528.3	4524.41	14100	16610	0.6549309	0.8041262
Changde	21100	Hanshou	County	2.360691	0.1997875	Hanshou	Changde	32265	7979.0	8665.0	386.13	2062.4	15788.0	20981	13491.0	4550.0	8269.9	2560	456.78	388.8	246.7	42.097	33.029	240.2	208.7	95	553.2	4460.5	804.6	6545.35	17727	18925	0.6875466	0.8511756
Changde	21101	Jinshi	County City	1.425620	0.0530241	Jinshi	Changde	28692	4581.7	4777.0	373.31	1148.4	8706.9	34592	10935.0	2242.0	8169.9	848	122.78	82.1	61.7	8.723	7.592	81.9	43.7	77	92.4	3683.0	251.8	2562.46	7525	19498	0.3669579	0.6686757
Changde	21102	Li	County	3.474324	0.1890812	Li	Changde	32541	13487.0	16066.0	709.61	2459.5	20322.0	24473	18402.0	6748.0	8377.0	2038	513.44	426.8	227.1	38.975	33.938	268.5	256.0	96	539.7	7110.2	832.5	7562.34	53160	18985	0.6482883	0.8312558
Changde	21103	Linli	County	2.289506	0.1145036	Linli	Changde	32667	564.1	7781.2	336.86	1538.7	10355.0	25554	8214.0	358.0	8143.1	1440	307.36	272.2	100.8	23.286	18.943	129.1	157.2	99	246.6	3604.9	409.3	3583.91	7031	18604	0.6024921	0.8856065

3.4 Visualizing Regional Development Indicator

We use qtm() function from tmap package, it stands for quick thematic map and is an easy way to plot maps.

In this map, we plot the “GPDPC” variable which is the GPD per capita of each county because it relates to the economy and economy relates to the development of the county.

qtm(hunan, "GDPPC")

DIY: With reference to the figure above, define the null (H0) and alternative (H1) hypotheses. Select the confident interval for the statistical test.

H0: There distribution of GDPPC per county are randomly distributed

H1: The distribution of GDPPC per county are not randomly distributed, there are clusters (positive autocorrelation) or outliers (negative autocorrelation)

Significance/confidence interval: 95% or 0.05

4. Modelling Spatial Neighbours

In class, we have learnt about contiguity (common boundary) based neighbours and how to create spatial weights for either distance-based neighbours or adjacency-based neighbours.

In this section, we will use poly2nb() function of spded package to compute contiguity weight matrices for the study area.

This function builds a neighbours list based on regions with contiguous boundaries.

Default ‘queen’ argument is TRUE so if we do not specify, it will return a list of first order neighbours using the Queen criteria

4.1 Creating Queen Contiguity based neighbours

wm_q <- poly2nb(hunan, queen = TRUE)

summary(wm_q)

## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 448 
## Percentage nonzero weights: 5.785124 
## Average number of links: 5.090909 
## Link number distribution:
## 
##  1  2  3  4  5  6  7  8  9 11 
##  2  2 12 16 24 14 11  4  2  1 
## 2 least connected regions:
## 29 64 with 1 link
## 1 most connected region:
## 84 with 11 links

From the summary, we can find out that:

There are 88 area units/counties in Hunan
Most connected area unit has 11 neighbours which is shown by the first row 11 with the next row showing 1 meaning only 1 area unit has 11 neighbours
There are 2 area units with only 1 neighbour

wm_q[[1]]

## [1]  2  3  4 57 85

wm_q list all the neighbouring polygons of the ID we call. So for polygon ID 1: 2,3,4,57,85 are all its neighbours.

So, what is polygon ID 1 and which area units are the 5 neighbours?

We can find out by calling the NAME_3 variable using $ and subsetting it by calling the ID 1

hunan$NAME_3[1]

## [1] "Anxiang"

hunan$NAME_3[c(2,3,4,57,85)]

## [1] "Hanshou" "Jinshi"  "Li"      "Nan"     "Taoyuan"

Thus, Polygon ID 1 is Anxiang and Anxiang has 5 neighbours, namely Hanshou, Jinshi, Li, Nan, Taoyuan.

str(wm_q)

## List of 88
##  $ : int [1:5] 2 3 4 57 85
##  $ : int [1:5] 1 57 58 78 85
##  $ : int [1:4] 1 4 5 85
##  $ : int [1:4] 1 3 5 6
##  $ : int [1:4] 3 4 6 85
##  $ : int [1:5] 4 5 69 75 85
##  $ : int [1:4] 67 71 74 84
##  $ : int [1:7] 9 46 47 56 78 80 86
##  $ : int [1:6] 8 66 68 78 84 86
##  $ : int [1:8] 16 17 19 20 22 70 72 73
##  $ : int [1:3] 14 17 72
##  $ : int [1:5] 13 60 61 63 83
##  $ : int [1:4] 12 15 60 83
##  $ : int [1:3] 11 15 17
##  $ : int [1:4] 13 14 17 83
##  $ : int [1:5] 10 17 22 72 83
##  $ : int [1:7] 10 11 14 15 16 72 83
##  $ : int [1:5] 20 22 23 77 83
##  $ : int [1:6] 10 20 21 73 74 86
##  $ : int [1:7] 10 18 19 21 22 23 82
##  $ : int [1:5] 19 20 35 82 86
##  $ : int [1:5] 10 16 18 20 83
##  $ : int [1:7] 18 20 38 41 77 79 82
##  $ : int [1:5] 25 28 31 32 54
##  $ : int [1:5] 24 28 31 33 81
##  $ : int [1:4] 27 33 42 81
##  $ : int [1:3] 26 29 42
##  $ : int [1:5] 24 25 33 49 54
##  $ : int [1:3] 27 37 42
##  $ : int 33
##  $ : int [1:8] 24 25 32 36 39 40 56 81
##  $ : int [1:8] 24 31 50 54 55 56 75 85
##  $ : int [1:5] 25 26 28 30 81
##  $ : int [1:3] 36 45 80
##  $ : int [1:6] 21 41 47 80 82 86
##  $ : int [1:6] 31 34 40 45 56 80
##  $ : int [1:4] 29 42 43 44
##  $ : int [1:4] 23 44 77 79
##  $ : int [1:5] 31 40 42 43 81
##  $ : int [1:6] 31 36 39 43 45 79
##  $ : int [1:6] 23 35 45 79 80 82
##  $ : int [1:7] 26 27 29 37 39 43 81
##  $ : int [1:6] 37 39 40 42 44 79
##  $ : int [1:4] 37 38 43 79
##  $ : int [1:6] 34 36 40 41 79 80
##  $ : int [1:3] 8 47 86
##  $ : int [1:5] 8 35 46 80 86
##  $ : int [1:5] 50 51 52 53 55
##  $ : int [1:4] 28 51 52 54
##  $ : int [1:5] 32 48 52 54 55
##  $ : int [1:3] 48 49 52
##  $ : int [1:5] 48 49 50 51 54
##  $ : int [1:3] 48 55 75
##  $ : int [1:6] 24 28 32 49 50 52
##  $ : int [1:5] 32 48 50 53 75
##  $ : int [1:7] 8 31 32 36 78 80 85
##  $ : int [1:6] 1 2 58 64 76 85
##  $ : int [1:5] 2 57 68 76 78
##  $ : int [1:4] 60 61 87 88
##  $ : int [1:4] 12 13 59 61
##  $ : int [1:7] 12 59 60 62 63 77 87
##  $ : int [1:3] 61 77 87
##  $ : int [1:4] 12 61 77 83
##  $ : int [1:2] 57 76
##  $ : int 76
##  $ : int [1:5] 9 67 68 76 84
##  $ : int [1:4] 7 66 76 84
##  $ : int [1:5] 9 58 66 76 78
##  $ : int [1:3] 6 75 85
##  $ : int [1:3] 10 72 73
##  $ : int [1:3] 7 73 74
##  $ : int [1:5] 10 11 16 17 70
##  $ : int [1:5] 10 19 70 71 74
##  $ : int [1:6] 7 19 71 73 84 86
##  $ : int [1:6] 6 32 53 55 69 85
##  $ : int [1:7] 57 58 64 65 66 67 68
##  $ : int [1:7] 18 23 38 61 62 63 83
##  $ : int [1:7] 2 8 9 56 58 68 85
##  $ : int [1:7] 23 38 40 41 43 44 45
##  $ : int [1:8] 8 34 35 36 41 45 47 56
##  $ : int [1:6] 25 26 31 33 39 42
##  $ : int [1:5] 20 21 23 35 41
##  $ : int [1:9] 12 13 15 16 17 18 22 63 77
##  $ : int [1:6] 7 9 66 67 74 86
##  $ : int [1:11] 1 2 3 5 6 32 56 57 69 75 ...
##  $ : int [1:9] 8 9 19 21 35 46 47 74 84
##  $ : int [1:4] 59 61 62 88
##  $ : int [1:2] 59 87
##  - attr(*, "class")= chr "nb"
##  - attr(*, "region.id")= chr [1:88] "0" "1" "2" "3" ...
##  - attr(*, "call")= language poly2nb(pl = hunan, queen = TRUE)
##  - attr(*, "type")= chr "queen"
##  - attr(*, "sym")= logi TRUE

As you can see from above, the ID is not printed so you need to know that the last row which is “int [1:2] 59 87” is the last observation and it has 2 neighbours, 59 and 87.

There are some observations that has only 1 neighbour as shown by “int 76”.

4.2 Creating Rook contiguity based neighbours

wm_r <- poly2nb(hunan, queen = FALSE)

summary(wm_r)

## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 440 
## Percentage nonzero weights: 5.681818 
## Average number of links: 5 
## Link number distribution:
## 
##  1  2  3  4  5  6  7  8  9 10 
##  2  2 12 20 21 14 11  3  2  1 
## 2 least connected regions:
## 29 64 with 1 link
## 1 most connected region:
## 84 with 10 links

From the summary, we can find out that:

There are 88 area units/counties in Hunan
Most connected area unit has 10 neighbours which is shown by the first row 10 with the next row showing 1 meaning only 1 area unit has 10 neighbours
There are 2 area units with only 1 neighbour

wm_r[[2]]

## [1] 57 58 78 85

So for polygon ID 2: 57,58, 78, 85 are all its neighbours.

So, what is polygon ID 2 and which area units are the 4 neighbours?

hunan$NAME_3[2]

## [1] "Hanshou"

hunan$NAME_3[c(57,58,78,85)]

## [1] "Nan"       "Yuanjiang" "Taojiang"  "Taoyuan"

Thus, Polygon ID 2 is Hanshou and Hanshou has 4 neighbours, namely Nan, Yuanjiang, Taojiang, Taoyuan.

4.3 Plotting Queen contiguity based neighbours maps

plot(hunan, borders = 'lightgrey') 
plot(wm_q, coordinates(hunan), pch = 19, cex = 0.6, add = TRUE, col = "red")

4.4 Plotting Rook contiguity based neighbours maps

plot(hunan, borders = 'lightgrey') 
plot(wm_r, coordinates(hunan), pch = 19, cex = 0.6, add = TRUE, col = "red")

Plotting the 2 maps next to each other

#par(mfrow = c(1,2))

plot(hunan, borders = 'lightgrey') 
plot(wm_q, coordinates(hunan), pch = 19, cex = 0.6,add = TRUE, col = "red")
plot(wm_r, coordinates(hunan), pch = 19, cex = 0.6, add = TRUE, col = "blue")

As we know that ROOK contiguity neighbours only take the direct shared boundary neighbours so they will have less neighbours edges seen on the map plot. This is shown by the red edges, that are the QUEEN contiguity neighbours, that are not covered in blue meaning that those edges are present in QUEEN but not in ROOK.

4.5 Computing distance based neighbours

We will derive distance-based weight matrices by using dnearneigh() function of spdep package.

The function identifies neighbours of area points by Euclidean distance with a distance range with lower d1 and upper d2 controlled by the bounds= argument.

If unprojected coordinates are used and either specified in the coordinates object x or with x as a two column matrix & longlat = TRUE, great circle distances in km will be calculated assuming the WGS84 reference ellipsoid.

4.5.1 Determine the cut-off distance

To determine the upper limit for distance band:

Return a matrix with the indices of points belonging to the set of the k nearest neighbours of each other by using knearneigh() function of spdep package
Convert the knn object returned by knearneigh() function into a neighbours list of class nb with a list of integer vector containing neighbour region number ids by using knn2nb() function
Return the length of neighbour relationship edges by using nbdlists() function to return the units of the coorinates if the coordinates are projected, in km otherwise
Remove the list structure of the returned object by using unlist() function

coords <- coordinates(hunan)
head(coords)

##       [,1]     [,2]
## 0 112.1531 29.44362
## 1 112.0372 28.86489
## 2 111.8917 29.47107
## 3 111.7031 29.74499
## 4 111.6138 29.49258
## 5 111.0341 29.79863

k1 <- knn2nb(knearneigh(coords))
k1dists <- unlist(nbdists(k1, coords, longlat = TRUE))
summary(k1dists)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   24.79   32.57   38.01   39.07   44.52   61.79

The summary shows that the largest first nearest neighbour distance is 61.79 km, so using this as the upper band/threshold gives certainty that all units will have at least 1 neighbour because it’s the max distance.

4.5.2 Computing fixed distance weight matrix

We will derive distance-based weight matrices by using dnearneigh() function of spdep package.

wm_d62 <- dnearneigh(coords, 0 ,62, longlat = TRUE)
wm_d62

## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 324 
## Percentage nonzero weights: 4.183884 
## Average number of links: 3.681818

Quiz: What is the meaning of “Average number of links: 3.681818” shown above?

It means that on average, each region point areas are nearest in distance to another 3.68 point areas?

str(wm_d62)

## List of 88
##  $ : int [1:5] 3 4 5 57 64
##  $ : int [1:4] 57 58 78 85
##  $ : int [1:4] 1 4 5 57
##  $ : int [1:3] 1 3 5
##  $ : int [1:4] 1 3 4 85
##  $ : int 69
##  $ : int [1:2] 67 84
##  $ : int [1:4] 9 46 47 78
##  $ : int [1:4] 8 46 68 84
##  $ : int [1:4] 16 22 70 72
##  $ : int [1:3] 14 17 72
##  $ : int [1:5] 13 60 61 63 83
##  $ : int [1:4] 12 15 60 83
##  $ : int [1:2] 11 17
##  $ : int 13
##  $ : int [1:4] 10 17 22 83
##  $ : int [1:3] 11 14 16
##  $ : int [1:3] 20 22 63
##  $ : int [1:5] 20 21 73 74 82
##  $ : int [1:5] 18 19 21 22 82
##  $ : int [1:6] 19 20 35 74 82 86
##  $ : int [1:4] 10 16 18 20
##  $ : int [1:3] 41 77 82
##  $ : int [1:4] 25 28 31 54
##  $ : int [1:4] 24 28 33 81
##  $ : int [1:4] 27 33 42 81
##  $ : int [1:2] 26 29
##  $ : int [1:6] 24 25 33 49 52 54
##  $ : int [1:2] 27 37
##  $ : int 33
##  $ : int [1:2] 24 36
##  $ : int 50
##  $ : int [1:5] 25 26 28 30 81
##  $ : int [1:3] 36 45 80
##  $ : int [1:6] 21 41 46 47 80 82
##  $ : int [1:5] 31 34 45 56 80
##  $ : int [1:2] 29 42
##  $ : int [1:3] 44 77 79
##  $ : int [1:4] 40 42 43 81
##  $ : int [1:3] 39 45 79
##  $ : int [1:5] 23 35 45 79 82
##  $ : int [1:5] 26 37 39 43 81
##  $ : int [1:3] 39 42 44
##  $ : int [1:2] 38 43
##  $ : int [1:6] 34 36 40 41 79 80
##  $ : int [1:5] 8 9 35 47 86
##  $ : int [1:5] 8 35 46 80 86
##  $ : int [1:5] 50 51 52 53 55
##  $ : int [1:4] 28 51 52 54
##  $ : int [1:6] 32 48 51 52 54 55
##  $ : int [1:4] 48 49 50 52
##  $ : int [1:6] 28 48 49 50 51 54
##  $ : int [1:2] 48 55
##  $ : int [1:5] 24 28 49 50 52
##  $ : int [1:4] 48 50 53 75
##  $ : int 36
##  $ : int [1:5] 1 2 3 58 64
##  $ : int [1:5] 2 57 64 66 68
##  $ : int [1:3] 60 87 88
##  $ : int [1:4] 12 13 59 61
##  $ : int [1:5] 12 60 62 63 87
##  $ : int [1:4] 61 63 77 87
##  $ : int [1:5] 12 18 61 62 83
##  $ : int [1:4] 1 57 58 76
##  $ : int 76
##  $ : int [1:5] 58 67 68 76 84
##  $ : int [1:2] 7 66
##  $ : int [1:4] 9 58 66 84
##  $ : int [1:2] 6 75
##  $ : int [1:3] 10 72 73
##  $ : int [1:2] 73 74
##  $ : int [1:3] 10 11 70
##  $ : int [1:4] 19 70 71 74
##  $ : int [1:5] 19 21 71 73 86
##  $ : int [1:2] 55 69
##  $ : int [1:3] 64 65 66
##  $ : int [1:3] 23 38 62
##  $ : int [1:2] 2 8
##  $ : int [1:4] 38 40 41 45
##  $ : int [1:5] 34 35 36 45 47
##  $ : int [1:5] 25 26 33 39 42
##  $ : int [1:6] 19 20 21 23 35 41
##  $ : int [1:4] 12 13 16 63
##  $ : int [1:4] 7 9 66 68
##  $ : int [1:2] 2 5
##  $ : int [1:4] 21 46 47 74
##  $ : int [1:4] 59 61 62 88
##  $ : int [1:2] 59 87
##  - attr(*, "class")= chr "nb"
##  - attr(*, "nbtype")= chr "distance"
##  - attr(*, "distances")= num [1:2] 0 62
##  - attr(*, "region.id")= chr [1:88] "1" "2" "3" "4" ...
##  - attr(*, "call")= language dnearneigh(x = coords, d1 = 0, d2 = 62, longlat = TRUE)
##  - attr(*, "dnn")= num [1:2] 0 62
##  - attr(*, "bounds")= chr [1:2] "GT" "LE"
##  - attr(*, "sym")= logi TRUE

We used str() function to display the weight matrix but another way to display the structure of the distance weight matrix is to combine table() and card() of spdep.

table(hunan$County, card(wm_d62))

##                
##                 1 2 3 4 5 6
##   Anhua         1 0 0 0 0 0
##   Anren         0 0 0 1 0 0
##   Anxiang       0 0 0 0 1 0
##   Baojing       0 0 0 0 1 0
##   Chaling       0 0 1 0 0 0
##   Changning     0 0 1 0 0 0
##   Changsha      0 0 0 1 0 0
##   Chengbu       0 1 0 0 0 0
##   Chenxi        0 0 0 1 0 0
##   Cili          0 1 0 0 0 0
##   Dao           0 0 0 1 0 0
##   Dongan        0 0 1 0 0 0
##   Dongkou       0 0 0 1 0 0
##   Fenghuang     0 0 0 1 0 0
##   Guidong       0 0 1 0 0 0
##   Guiyang       0 0 0 1 0 0
##   Guzhang       0 0 0 0 0 1
##   Hanshou       0 0 0 1 0 0
##   Hengdong      0 0 0 0 1 0
##   Hengnan       0 0 0 0 1 0
##   Hengshan      0 0 0 0 0 1
##   Hengyang      0 0 0 0 0 1
##   Hongjiang     0 0 0 0 1 0
##   Huarong       0 0 0 1 0 0
##   Huayuan       0 0 0 1 0 0
##   Huitong       0 0 0 1 0 0
##   Jiahe         0 0 0 0 1 0
##   Jianghua      0 0 1 0 0 0
##   Jiangyong     0 1 0 0 0 0
##   Jingzhou      0 1 0 0 0 0
##   Jinshi        0 0 0 1 0 0
##   Jishou        0 0 0 0 0 1
##   Lanshan       0 0 0 1 0 0
##   Leiyang       0 0 0 1 0 0
##   Lengshuijiang 0 0 1 0 0 0
##   Li            0 0 1 0 0 0
##   Lianyuan      0 0 0 0 1 0
##   Liling        0 1 0 0 0 0
##   Linli         0 0 0 1 0 0
##   Linwu         0 0 0 1 0 0
##   Linxiang      1 0 0 0 0 0
##   Liuyang       0 1 0 0 0 0
##   Longhui       0 0 1 0 0 0
##   Longshan      0 1 0 0 0 0
##   Luxi          0 0 0 0 1 0
##   Mayang        0 0 0 0 0 1
##   Miluo         0 0 0 0 1 0
##   Nan           0 0 0 0 1 0
##   Ningxiang     0 0 0 1 0 0
##   Ningyuan      0 0 0 0 1 0
##   Pingjiang     0 1 0 0 0 0
##   Qidong        0 0 1 0 0 0
##   Qiyang        0 0 1 0 0 0
##   Rucheng       0 1 0 0 0 0
##   Sangzhi       0 1 0 0 0 0
##   Shaodong      0 0 0 0 1 0
##   Shaoshan      0 0 0 0 1 0
##   Shaoyang      0 0 0 1 0 0
##   Shimen        1 0 0 0 0 0
##   Shuangfeng    0 0 0 0 0 1
##   Shuangpai     0 0 0 1 0 0
##   Suining       0 0 0 0 1 0
##   Taojiang      0 1 0 0 0 0
##   Taoyuan       0 1 0 0 0 0
##   Tongdao       0 1 0 0 0 0
##   Wangcheng     0 0 0 1 0 0
##   Wugang        0 0 1 0 0 0
##   Xiangtan      0 0 0 1 0 0
##   Xiangxiang    0 0 0 0 1 0
##   Xiangyin      0 0 0 1 0 0
##   Xinhua        0 0 0 0 1 0
##   Xinhuang      1 0 0 0 0 0
##   Xinning       0 1 0 0 0 0
##   Xinshao       0 0 0 0 0 1
##   Xintian       0 0 0 0 1 0
##   Xupu          0 1 0 0 0 0
##   Yanling       0 0 1 0 0 0
##   Yizhang       1 0 0 0 0 0
##   Yongshun      0 0 0 1 0 0
##   Yongxing      0 0 0 1 0 0
##   You           0 0 0 1 0 0
##   Yuanjiang     0 0 0 0 1 0
##   Yuanling      1 0 0 0 0 0
##   Yueyang       0 0 1 0 0 0
##   Zhijiang      0 0 0 0 1 0
##   Zhongfang     0 0 0 1 0 0
##   Zhuzhou       0 0 0 0 1 0
##   Zixing        0 0 1 0 0 0

We use n.comp.nb() function of spdep package to find the number of disjoint connected subgraphs in the wm_d62 weight matrix. (whether all the points are connected in one network or is there a another network in the same map)

nc value - number of disjoint connected networks of neighbours

comp.id - vector with indices of the disjoint connected networks of neighbours

n_comp <- n.comp.nb(wm_d62)
n_comp$nc

## [1] 1

table(n_comp$comp.id)

## 
##  1 
## 88

We can see that all the counties return 1 for their nc value, meaning there’s only 1 connected network.

4.5.3 Plotting fixed distance weight matrix

Plotting the base map of hunan, k1 (computed by knearneigh() function which return the links of 1st nearest neighbour) and wm_d62 (the links of neighbours within 62 km) separately to see the difference.

par(mfrow = c(1,2))

plot(hunan, border = "lightgrey")
plot(k1, coords, add = TRUE, col = "red", length = 0.08)
title(main = "Links of 1st nearest neighbour", cex.main = 0.9)
plot(hunan, border = "lightgrey")
plot(wm_d62, coords, add = TRUE, pch = 19, cex = 0.6)
title(main = "Neighbours within 62 km", cex.main = 0.9)

Plotting them together.

plot(hunan, border = 'lightgrey')
plot(wm_d62, coords, add = TRUE)
plot(k1, coords, add = TRUE, col = 'red', length = 0.08)

The red lines above show the links of the 1st nearest neighbours and the black lines show the link of neighbours within the cut-off distance of 62 km.

4.6 Computing adaptive distance weight matrix

Fixed distance weight matrix’s characteristic is that more densely settled areas (urban areas) tend to have more neighbours and the less densely settled areas (rural counties) tend to have lesser neighbours.

It’s possible to control the number of neighbours directly by using k-nearest neighbours knearneigh() function, either accepting asymmetric neighbours or imposing symmetry as shown below.

knn6 <- knn2nb(knearneigh(coords, k =6))
knn6

## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 528 
## Percentage nonzero weights: 6.818182 
## Average number of links: 6 
## Non-symmetric neighbours list

Notice that all of the counties have 6 neighbours, not more than 6 or less than 6.

str(knn6)

## List of 88
##  $ : int [1:6] 2 3 4 5 57 64
##  $ : int [1:6] 1 3 57 58 78 85
##  $ : int [1:6] 1 2 4 5 57 85
##  $ : int [1:6] 1 3 5 6 69 85
##  $ : int [1:6] 1 3 4 6 69 85
##  $ : int [1:6] 3 4 5 69 75 85
##  $ : int [1:6] 9 66 67 71 74 84
##  $ : int [1:6] 9 46 47 78 80 86
##  $ : int [1:6] 8 46 66 68 84 86
##  $ : int [1:6] 16 19 22 70 72 73
##  $ : int [1:6] 10 14 16 17 70 72
##  $ : int [1:6] 13 15 60 61 63 83
##  $ : int [1:6] 12 15 60 61 63 83
##  $ : int [1:6] 11 15 16 17 72 83
##  $ : int [1:6] 12 13 14 17 60 83
##  $ : int [1:6] 10 11 17 22 72 83
##  $ : int [1:6] 10 11 14 16 72 83
##  $ : int [1:6] 20 22 23 63 77 83
##  $ : int [1:6] 10 20 21 73 74 82
##  $ : int [1:6] 18 19 21 22 23 82
##  $ : int [1:6] 19 20 35 74 82 86
##  $ : int [1:6] 10 16 18 19 20 83
##  $ : int [1:6] 18 20 41 77 79 82
##  $ : int [1:6] 25 28 31 52 54 81
##  $ : int [1:6] 24 28 31 33 54 81
##  $ : int [1:6] 25 27 29 33 42 81
##  $ : int [1:6] 26 29 30 37 42 81
##  $ : int [1:6] 24 25 33 49 52 54
##  $ : int [1:6] 26 27 37 42 43 81
##  $ : int [1:6] 26 27 28 33 49 81
##  $ : int [1:6] 24 25 36 39 40 54
##  $ : int [1:6] 24 31 50 54 55 56
##  $ : int [1:6] 25 26 28 30 49 81
##  $ : int [1:6] 36 40 41 45 56 80
##  $ : int [1:6] 21 41 46 47 80 82
##  $ : int [1:6] 31 34 40 45 56 80
##  $ : int [1:6] 26 27 29 42 43 44
##  $ : int [1:6] 23 43 44 62 77 79
##  $ : int [1:6] 25 40 42 43 44 81
##  $ : int [1:6] 31 36 39 43 45 79
##  $ : int [1:6] 23 35 45 79 80 82
##  $ : int [1:6] 26 27 37 39 43 81
##  $ : int [1:6] 37 39 40 42 44 79
##  $ : int [1:6] 37 38 39 42 43 79
##  $ : int [1:6] 34 36 40 41 79 80
##  $ : int [1:6] 8 9 35 47 78 86
##  $ : int [1:6] 8 21 35 46 80 86
##  $ : int [1:6] 49 50 51 52 53 55
##  $ : int [1:6] 28 33 48 51 52 54
##  $ : int [1:6] 32 48 51 52 54 55
##  $ : int [1:6] 28 48 49 50 52 54
##  $ : int [1:6] 28 48 49 50 51 54
##  $ : int [1:6] 48 50 51 52 55 75
##  $ : int [1:6] 24 28 49 50 51 52
##  $ : int [1:6] 32 48 50 52 53 75
##  $ : int [1:6] 32 34 36 78 80 85
##  $ : int [1:6] 1 2 3 58 64 68
##  $ : int [1:6] 2 57 64 66 68 78
##  $ : int [1:6] 12 13 60 61 87 88
##  $ : int [1:6] 12 13 59 61 63 87
##  $ : int [1:6] 12 13 60 62 63 87
##  $ : int [1:6] 12 38 61 63 77 87
##  $ : int [1:6] 12 18 60 61 62 83
##  $ : int [1:6] 1 3 57 58 68 76
##  $ : int [1:6] 58 64 66 67 68 76
##  $ : int [1:6] 9 58 67 68 76 84
##  $ : int [1:6] 7 65 66 68 76 84
##  $ : int [1:6] 9 57 58 66 78 84
##  $ : int [1:6] 4 5 6 32 75 85
##  $ : int [1:6] 10 16 19 22 72 73
##  $ : int [1:6] 7 19 73 74 84 86
##  $ : int [1:6] 10 11 14 16 17 70
##  $ : int [1:6] 10 19 21 70 71 74
##  $ : int [1:6] 19 21 71 73 84 86
##  $ : int [1:6] 6 32 50 53 55 69
##  $ : int [1:6] 58 64 65 66 67 68
##  $ : int [1:6] 18 23 38 61 62 63
##  $ : int [1:6] 2 8 9 46 58 68
##  $ : int [1:6] 38 40 41 43 44 45
##  $ : int [1:6] 34 35 36 41 45 47
##  $ : int [1:6] 25 26 28 33 39 42
##  $ : int [1:6] 19 20 21 23 35 41
##  $ : int [1:6] 12 13 15 16 22 63
##  $ : int [1:6] 7 9 66 68 71 74
##  $ : int [1:6] 2 3 4 5 56 69
##  $ : int [1:6] 8 9 21 46 47 74
##  $ : int [1:6] 59 60 61 62 63 88
##  $ : int [1:6] 59 60 61 62 63 87
##  - attr(*, "region.id")= chr [1:88] "1" "2" "3" "4" ...
##  - attr(*, "call")= language knearneigh(x = coords, k = 6)
##  - attr(*, "sym")= logi FALSE
##  - attr(*, "type")= chr "knn"
##  - attr(*, "knn-k")= num 6
##  - attr(*, "class")= chr "nb"

4.6.1 Plotting distance based neighbours (adaptive)

plot(hunan, border = 'lightgrey')
plot(knn6, coords, pch = 19, cex = 0.6, add = TRUE, col = 'red')

4.7 Computing weights based on IDW (Inverse Distance Weights)

We will derive spatial weight matrices based on Inverse Distance method by using nbdists() function of spdep package.

dist <- nbdists(wm_q, coords, longlat = TRUE)
ids <- lapply(dist, function(x) 1/(x))
ids

## [[1]]
## [1] 0.01535405 0.03916350 0.01820896 0.02807922 0.01145113
## 
## [[2]]
## [1] 0.01535405 0.01764308 0.01925924 0.02323898 0.01719350
## 
## [[3]]
## [1] 0.03916350 0.02822040 0.03695795 0.01395765
## 
## [[4]]
## [1] 0.01820896 0.02822040 0.03414741 0.01539065
## 
## [[5]]
## [1] 0.03695795 0.03414741 0.01524598 0.01618354
## 
## [[6]]
## [1] 0.015390649 0.015245977 0.021748129 0.011883901 0.009810297
## 
## [[7]]
## [1] 0.01708612 0.01473997 0.01150924 0.01872915
## 
## [[8]]
## [1] 0.02022144 0.03453056 0.02529256 0.01036340 0.02284457 0.01500600 0.01515314
## 
## [[9]]
## [1] 0.02022144 0.01574888 0.02109502 0.01508028 0.02902705 0.01502980
## 
## [[10]]
## [1] 0.02281552 0.01387777 0.01538326 0.01346650 0.02100510 0.02631658 0.01874863
## [8] 0.01500046
## 
## [[11]]
## [1] 0.01882869 0.02243492 0.02247473
## 
## [[12]]
## [1] 0.02779227 0.02419652 0.02333385 0.02986130 0.02335429
## 
## [[13]]
## [1] 0.02779227 0.02650020 0.02670323 0.01714243
## 
## [[14]]
## [1] 0.01882869 0.01233868 0.02098555
## 
## [[15]]
## [1] 0.02650020 0.01233868 0.01096284 0.01562226
## 
## [[16]]
## [1] 0.02281552 0.02466962 0.02765018 0.01476814 0.01671430
## 
## [[17]]
## [1] 0.01387777 0.02243492 0.02098555 0.01096284 0.02466962 0.01593341 0.01437996
## 
## [[18]]
## [1] 0.02039779 0.02032767 0.01481665 0.01473691 0.01459380
## 
## [[19]]
## [1] 0.01538326 0.01926323 0.02668415 0.02140253 0.01613589 0.01412874
## 
## [[20]]
## [1] 0.01346650 0.02039779 0.01926323 0.01723025 0.02153130 0.01469240 0.02327034
## 
## [[21]]
## [1] 0.02668415 0.01723025 0.01766299 0.02644986 0.02163800
## 
## [[22]]
## [1] 0.02100510 0.02765018 0.02032767 0.02153130 0.01489296
## 
## [[23]]
## [1] 0.01481665 0.01469240 0.01401432 0.02246233 0.01880425 0.01530458 0.01849605
## 
## [[24]]
## [1] 0.02354598 0.01837201 0.02607264 0.01220154 0.02514180
## 
## [[25]]
## [1] 0.02354598 0.02188032 0.01577283 0.01949232 0.02947957
## 
## [[26]]
## [1] 0.02166102 0.01743220 0.02207751 0.02209114
## 
## [[27]]
## [1] 0.02166102 0.02490625 0.01562326
## 
## [[28]]
## [1] 0.01837201 0.02188032 0.02229549 0.03076171 0.02039506
## 
## [[29]]
## [1] 0.02490625 0.01686587 0.01395022
## 
## [[30]]
## [1] 0.02090587
## 
## [[31]]
## [1] 0.02607264 0.01577283 0.01219005 0.01724850 0.01229012 0.01609781 0.01139438
## [8] 0.01150130
## 
## [[32]]
## [1] 0.01220154 0.01219005 0.01712515 0.01340413 0.01280928 0.01198216 0.01053374
## [8] 0.01065655
## 
## [[33]]
## [1] 0.01949232 0.01743220 0.02229549 0.02090587 0.01979045
## 
## [[34]]
## [1] 0.03113041 0.03589551 0.02882915
## 
## [[35]]
## [1] 0.01766299 0.02185795 0.02616766 0.02111721 0.02108253 0.01509020
## 
## [[36]]
## [1] 0.01724850 0.03113041 0.01571707 0.01860991 0.02073549 0.01680129
## 
## [[37]]
## [1] 0.01686587 0.02234793 0.01510990 0.01550676
## 
## [[38]]
## [1] 0.01401432 0.02407426 0.02276151 0.01719415
## 
## [[39]]
## [1] 0.01229012 0.02172543 0.01711924 0.02629732 0.01896385
## 
## [[40]]
## [1] 0.01609781 0.01571707 0.02172543 0.01506473 0.01987922 0.01894207
## 
## [[41]]
## [1] 0.02246233 0.02185795 0.02205991 0.01912542 0.01601083 0.01742892
## 
## [[42]]
## [1] 0.02207751 0.01562326 0.01395022 0.02234793 0.01711924 0.01836831 0.01683518
## 
## [[43]]
## [1] 0.01510990 0.02629732 0.01506473 0.01836831 0.03112027 0.01530782
## 
## [[44]]
## [1] 0.01550676 0.02407426 0.03112027 0.01486508
## 
## [[45]]
## [1] 0.03589551 0.01860991 0.01987922 0.02205991 0.02107101 0.01982700
## 
## [[46]]
## [1] 0.03453056 0.04033752 0.02689769
## 
## [[47]]
## [1] 0.02529256 0.02616766 0.04033752 0.01949145 0.02181458
## 
## [[48]]
## [1] 0.02313819 0.03370576 0.02289485 0.01630057 0.01818085
## 
## [[49]]
## [1] 0.03076171 0.02138091 0.02394529 0.01990000
## 
## [[50]]
## [1] 0.01712515 0.02313819 0.02551427 0.02051530 0.02187179
## 
## [[51]]
## [1] 0.03370576 0.02138091 0.02873854
## 
## [[52]]
## [1] 0.02289485 0.02394529 0.02551427 0.02873854 0.03516672
## 
## [[53]]
## [1] 0.01630057 0.01979945 0.01253977
## 
## [[54]]
## [1] 0.02514180 0.02039506 0.01340413 0.01990000 0.02051530 0.03516672
## 
## [[55]]
## [1] 0.01280928 0.01818085 0.02187179 0.01979945 0.01882298
## 
## [[56]]
## [1] 0.01036340 0.01139438 0.01198216 0.02073549 0.01214479 0.01362855 0.01341697
## 
## [[57]]
## [1] 0.028079221 0.017643082 0.031423501 0.029114131 0.013520292 0.009903702
## 
## [[58]]
## [1] 0.01925924 0.03142350 0.02722997 0.01434859 0.01567192
## 
## [[59]]
## [1] 0.01696711 0.01265572 0.01667105 0.01785036
## 
## [[60]]
## [1] 0.02419652 0.02670323 0.01696711 0.02343040
## 
## [[61]]
## [1] 0.02333385 0.01265572 0.02343040 0.02514093 0.02790764 0.01219751 0.02362452
## 
## [[62]]
## [1] 0.02514093 0.02002219 0.02110260
## 
## [[63]]
## [1] 0.02986130 0.02790764 0.01407043 0.01805987
## 
## [[64]]
## [1] 0.02911413 0.01689892
## 
## [[65]]
## [1] 0.02471705
## 
## [[66]]
## [1] 0.01574888 0.01726461 0.03068853 0.01954805 0.01810569
## 
## [[67]]
## [1] 0.01708612 0.01726461 0.01349843 0.01361172
## 
## [[68]]
## [1] 0.02109502 0.02722997 0.03068853 0.01406357 0.01546511
## 
## [[69]]
## [1] 0.02174813 0.01645838 0.01419926
## 
## [[70]]
## [1] 0.02631658 0.01963168 0.02278487
## 
## [[71]]
## [1] 0.01473997 0.01838483 0.03197403
## 
## [[72]]
## [1] 0.01874863 0.02247473 0.01476814 0.01593341 0.01963168
## 
## [[73]]
## [1] 0.01500046 0.02140253 0.02278487 0.01838483 0.01652709
## 
## [[74]]
## [1] 0.01150924 0.01613589 0.03197403 0.01652709 0.01342099 0.02864567
## 
## [[75]]
## [1] 0.011883901 0.010533736 0.012539774 0.018822977 0.016458383 0.008217581
## 
## [[76]]
## [1] 0.01352029 0.01434859 0.01689892 0.02471705 0.01954805 0.01349843 0.01406357
## 
## [[77]]
## [1] 0.014736909 0.018804247 0.022761507 0.012197506 0.020022195 0.014070428
## [7] 0.008440896
## 
## [[78]]
## [1] 0.02323898 0.02284457 0.01508028 0.01214479 0.01567192 0.01546511 0.01140779
## 
## [[79]]
## [1] 0.01530458 0.01719415 0.01894207 0.01912542 0.01530782 0.01486508 0.02107101
## 
## [[80]]
## [1] 0.01500600 0.02882915 0.02111721 0.01680129 0.01601083 0.01982700 0.01949145
## [8] 0.01362855
## 
## [[81]]
## [1] 0.02947957 0.02209114 0.01150130 0.01979045 0.01896385 0.01683518
## 
## [[82]]
## [1] 0.02327034 0.02644986 0.01849605 0.02108253 0.01742892
## 
## [[83]]
## [1] 0.023354289 0.017142433 0.015622258 0.016714303 0.014379961 0.014593799
## [7] 0.014892965 0.018059871 0.008440896
## 
## [[84]]
## [1] 0.01872915 0.02902705 0.01810569 0.01361172 0.01342099 0.01297994
## 
## [[85]]
##  [1] 0.011451133 0.017193502 0.013957649 0.016183544 0.009810297 0.010656545
##  [7] 0.013416965 0.009903702 0.014199260 0.008217581 0.011407794
## 
## [[86]]
## [1] 0.01515314 0.01502980 0.01412874 0.02163800 0.01509020 0.02689769 0.02181458
## [8] 0.02864567 0.01297994
## 
## [[87]]
## [1] 0.01667105 0.02362452 0.02110260 0.02058034
## 
## [[88]]
## [1] 0.01785036 0.02058034

For IDW, we can see that by inversing the distance, the larger distances get scaled down and the smaller distances get scaled up. All the values are also in decimals because we divide 1 by the value.

4.8 Row-standardised weights matrix

The most intuitive way to summarize the neighbours’ values:

assigning the fraction 1/(no. of neighbours) to each neighbouring county
summing the weighted income values

However, the polygons along the edges of the study area will base their lagged values on fewer polygona thus potentially over/under-estimating the true nature of the spatial autocorrelation.

So, we will use the nb2listw() function of spdep package which will supplement a neighbours list with spatial weights for the chosen coding scheme (style argument).

rswm_q <- nb2listw(wm_q, style = "W", zero.policy = TRUE)
rswm_q

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 448 
## Percentage nonzero weights: 5.785124 
## Average number of links: 5.090909 
## 
## Weights style: W 
## Weights constants summary:
##    n   nn S0       S1       S2
## W 88 7744 88 37.86334 365.9147

The ‘zero.policy=TRUE’ argument allows for lists of non-neighbours.

Style=“W” means neighbouring polygon will be assigned equal weight, it’s row standardised (sums over all links to n)

Style=“B” means basic binary coding which is a more robust option.

To see the weight of the first polygon (polygon ID 1)’s four neighbours:

rswm_q$weights[1]

## [[1]]
## [1] 0.2 0.2 0.2 0.2 0.2

Each neighbour is assigned a 0.2 of the total weight. This means that when R computes the average neighbouring income values, each neighbour’s income will be multiplied by 0.2 before being tallied.

We will try to derive the row standardised distance weight matrix using the style B, basic binary coding.

rswm_ids <- nb2listw(wm_q, glist = ids, style = "B", zero.policy = TRUE)
rswm_ids

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 448 
## Percentage nonzero weights: 5.785124 
## Average number of links: 5.090909 
## 
## Weights style: B 
## Weights constants summary:
##    n   nn       S0        S1       S2
## B 88 7744 8.786712 0.3776401 3.813528

To see the row standardised + inversed distance weight of the first polygon (polygon ID 1)’s four neighbours:

rswm_ids$weights[1]

## [[1]]
## [1] 0.01535405 0.03916350 0.01820896 0.02807922 0.01145113

summary(unlist(rswm_ids$weights))

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.008218 0.015088 0.018739 0.019613 0.022823 0.040338

5. Application of Spatial Weight Matrix

We will create 4 different spatial lagged variables:

spatialy lagged variables with row-standardized weights and sum of neighbouring values
spatial window average, and spatial window sum

What is a spatially lagged variable?

With a neighbor structure defined by the non-zero elements of the spatial weights matrix W, a spatially lagged variable is a weighted sum or a weighted average of the neighbouring values for that variable.

5.2 Spatial lag w/ row-standardized weights

We will use lag.listw() or lag() function of spdep package to compute the lag vector var (numeric vector of the same length as the listw object rswm_q)

GDPPC.lag <- lag.listw(rswm_q, hunan$GDPPC)

rswm_q$neighbours[1]

## [[1]]
## [1]  2  3  4 57 85

rswm_q$weights[1]

## [[1]]
## [1] 0.2 0.2 0.2 0.2 0.2

GDPPC.lag[1]

## [1] 24847.2

This means that 24847.20 is the value of tallying the first polygon’s (Anxiang) weighted GDPPC values so it would be (0.2xGDPPC of polygon 2 + 0.2xGDPPC of polygon 3 + 0.2xGDPPC of polygon 4 + 0.2xGDPPC of polygon 57 + 0.2xGDPPC of polygon 85)

We will append the spatially lag GDPPC values into hunan SpatialPolygonDataFrame by subsetting the list and turning it into data frame before finally using left_join() function of dplyr package to add the data to hunan.

lag.list <- list(hunan$NAME_3, lag.listw(rswm_q, hunan$GDPPC))
lag.res <- as.data.frame(lag.list)
colnames(lag.res) <- c("NAME_3", "lag GDPPC (queen)")

head(lag.res)

##    NAME_3 lag GDPPC (queen)
## 1 Anxiang          24847.20
## 2 Hanshou          22724.80
## 3  Jinshi          24143.25
## 4      Li          27737.50
## 5   Linli          27270.25
## 6  Shimen          21248.80

hunan@data <- left_join(hunan@data, lag.res)

## Joining, by = "NAME_3"

head(hunan@data)

##    NAME_2  ID_3  NAME_3   ENGTYPE_3 Shape_Leng Shape_Area  County    City
## 1 Changde 21098 Anxiang      County   1.869074 0.10056190 Anxiang Changde
## 2 Changde 21100 Hanshou      County   2.360691 0.19978745 Hanshou Changde
## 3 Changde 21101  Jinshi County City   1.425620 0.05302413  Jinshi Changde
## 4 Changde 21102      Li      County   3.474325 0.18908121      Li Changde
## 5 Changde 21103   Linli      County   2.289506 0.11450357   Linli Changde
## 6 Changde 21104  Shimen      County   4.171918 0.37194707  Shimen Changde
##   avg_wage deposite     FAI Gov_Rev Gov_Exp     GDP GDPPC     GIO   Loan  NIPCR
## 1    31935   5517.2  3541.0  243.64  1779.5 12482.0 23667  5108.9 2806.9 7693.7
## 2    32265   7979.0  8665.0  386.13  2062.4 15788.0 20981 13491.0 4550.0 8269.9
## 3    28692   4581.7  4777.0  373.31  1148.4  8706.9 34592 10935.0 2242.0 8169.9
## 4    32541  13487.0 16066.0  709.61  2459.5 20322.0 24473 18402.0 6748.0 8377.0
## 5    32667    564.1  7781.2  336.86  1538.7 10355.0 25554  8214.0  358.0 8143.1
## 6    33261   8334.4 10531.0  548.33  2178.8 16293.0 27137 17795.0 6026.5 6156.0
##    Bed    Emp  EmpR EmpRT Pri_Stu Sec_Stu Household Household_R NOIP Pop_R
## 1 1931 336.39 270.5 205.9  19.584  17.819     148.1       135.4   53 346.0
## 2 2560 456.78 388.8 246.7  42.097  33.029     240.2       208.7   95 553.2
## 3  848 122.78  82.1  61.7   8.723   7.592      81.9        43.7   77  92.4
## 4 2038 513.44 426.8 227.1  38.975  33.938     268.5       256.0   96 539.7
## 5 1440 307.36 272.2 100.8  23.286  18.943     129.1       157.2   99 246.6
## 6 2502 392.05 329.6 193.8  29.245  26.104     190.6       184.7  122 399.2
##     RSCG Pop_T    Agri Service Disp_Inc      RORP    ROREmp lag GDPPC (queen)
## 1 3957.9 528.3 4524.41   14100    16610 0.6549309 0.8041262          24847.20
## 2 4460.5 804.6 6545.35   17727    18925 0.6875466 0.8511756          22724.80
## 3 3683.0 251.8 2562.46    7525    19498 0.3669579 0.6686757          24143.25
## 4 7110.2 832.5 7562.34   53160    18985 0.6482883 0.8312558          27737.50
## 5 3604.9 409.3 3583.91    7031    18604 0.6024921 0.8856065          27270.25
## 6 6490.7 600.5 5266.51    6981    19275 0.6647794 0.8407091          21248.80

Next, we will plot both the original GDPPC value and the spatially lagged GDPPC values for comparison.

gdppc <- qtm(hunan, "GDPPC")
lag_gdppc <- qtm(hunan, "lag GDPPC (queen)")
tmap_arrange(gdppc, lag_gdppc, asp = 1, ncol =2)

5.2 Spatial window sum

The spatial window sum uses and includes the diagonal element. We will assign knn6 to a new variable to alter its structure to add the diagonal elements by using the include.self() function of spdep package which

knn6a <- knn6

include.self(knn6a)

## Neighbour list object:
## Number of regions: 88 
## Number of nonzero links: 616 
## Percentage nonzero weights: 7.954545 
## Average number of links: 7 
## Non-symmetric neighbours list

Let’s see the difference.

plot(hunan, border = 'lightgrey')

plot(include.self(knn6a), coords, pch = 19, cex = 0.6, col = 'blue', add = TRUE)
plot(knn6, coords, pch = 19, cex = 0.6, col = 'red', add = TRUE)

The spatial window sum is simply the sum of one area’s GDPPC value and its neighbours’ GDPPC value. When dealing with a binary variable, the spatial window sum corresponds to the number of events (observations with a value of 1) within the window centered on a location (including the value at that location). Thus, we will assign binary weights to the neighbour structure.

binary.knn6 <- lapply(knn6a, function(x) 0*x+1)
binary.knn6[1]

## [[1]]
## [1] 1 1 1 1 1 1

Next, we will use nb2listw() function

wm_knn6 <- nb2listw(knn6a, glist = binary.knn6, style = "B")

Then, we will compute the spatially lagged GDPPC values for knn6 (adaptive weight matrix of maximum 6 neighbours).

lag_knn6.list <- list(hunan$NAME_3, lag.listw(wm_knn6, hunan$GDPPC))
lag_knn6.res <- as.data.frame(lag_knn6.list)
colnames(lag_knn6.res) <- c("NAME_3", "lag GDPPC (adaptive distance)")

head(lag_knn6.res)

##    NAME_3 lag GDPPC (adaptive distance)
## 1 Anxiang                        157324
## 2 Hanshou                        148216
## 3  Jinshi                        138865
## 4      Li                        152543
## 5   Linli                        151462
## 6  Shimen                        140836

hunan@data <- left_join(hunan@data, lag_knn6.res)

## Joining, by = "NAME_3"

head(hunan@data)

##    NAME_2  ID_3  NAME_3   ENGTYPE_3 Shape_Leng Shape_Area  County    City
## 1 Changde 21098 Anxiang      County   1.869074 0.10056190 Anxiang Changde
## 2 Changde 21100 Hanshou      County   2.360691 0.19978745 Hanshou Changde
## 3 Changde 21101  Jinshi County City   1.425620 0.05302413  Jinshi Changde
## 4 Changde 21102      Li      County   3.474325 0.18908121      Li Changde
## 5 Changde 21103   Linli      County   2.289506 0.11450357   Linli Changde
## 6 Changde 21104  Shimen      County   4.171918 0.37194707  Shimen Changde
##   avg_wage deposite     FAI Gov_Rev Gov_Exp     GDP GDPPC     GIO   Loan  NIPCR
## 1    31935   5517.2  3541.0  243.64  1779.5 12482.0 23667  5108.9 2806.9 7693.7
## 2    32265   7979.0  8665.0  386.13  2062.4 15788.0 20981 13491.0 4550.0 8269.9
## 3    28692   4581.7  4777.0  373.31  1148.4  8706.9 34592 10935.0 2242.0 8169.9
## 4    32541  13487.0 16066.0  709.61  2459.5 20322.0 24473 18402.0 6748.0 8377.0
## 5    32667    564.1  7781.2  336.86  1538.7 10355.0 25554  8214.0  358.0 8143.1
## 6    33261   8334.4 10531.0  548.33  2178.8 16293.0 27137 17795.0 6026.5 6156.0
##    Bed    Emp  EmpR EmpRT Pri_Stu Sec_Stu Household Household_R NOIP Pop_R
## 1 1931 336.39 270.5 205.9  19.584  17.819     148.1       135.4   53 346.0
## 2 2560 456.78 388.8 246.7  42.097  33.029     240.2       208.7   95 553.2
## 3  848 122.78  82.1  61.7   8.723   7.592      81.9        43.7   77  92.4
## 4 2038 513.44 426.8 227.1  38.975  33.938     268.5       256.0   96 539.7
## 5 1440 307.36 272.2 100.8  23.286  18.943     129.1       157.2   99 246.6
## 6 2502 392.05 329.6 193.8  29.245  26.104     190.6       184.7  122 399.2
##     RSCG Pop_T    Agri Service Disp_Inc      RORP    ROREmp lag GDPPC (queen)
## 1 3957.9 528.3 4524.41   14100    16610 0.6549309 0.8041262          24847.20
## 2 4460.5 804.6 6545.35   17727    18925 0.6875466 0.8511756          22724.80
## 3 3683.0 251.8 2562.46    7525    19498 0.3669579 0.6686757          24143.25
## 4 7110.2 832.5 7562.34   53160    18985 0.6482883 0.8312558          27737.50
## 5 3604.9 409.3 3583.91    7031    18604 0.6024921 0.8856065          27270.25
## 6 6490.7 600.5 5266.51    6981    19275 0.6647794 0.8407091          21248.80
##   lag GDPPC (adaptive distance)
## 1                        157324
## 2                        148216
## 3                        138865
## 4                        152543
## 5                        151462
## 6                        140836

gdppc1 <- qtm(hunan, "GDPPC")
lag_gdppc1 <- qtm(hunan, "lag GDPPC (adaptive distance)")
tmap_arrange(gdppc1, lag_gdppc1, asp = 1, ncol =2)

6. Measure of Global Spatial Autocorrelation

We will perform Moran’s I statistics testing using moran.test() function of spdep package.

6.1 Moran’s I test

moran.test(hunan$GDPPC, listw = rswm_q, zero.policy = TRUE, na.action = na.omit)

## 
##  Moran I test under randomisation
## 
## data:  hunan$GDPPC  
## weights: rswm_q    
## 
## Moran I statistic standard deviate = 4.7351, p-value = 1.095e-06
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.300749970      -0.011494253       0.004348351

6.1.2 Computing Monte Carlo Moran’s I

Instead of using moran.test() function, we will perform permutation test for Moran’s I statistic by using moran.mc() function of spdep package.

set.seed(1234)
bperm = moran.mc(hunan$GDPPC, listw = rswm_q, nsim = 999, zero.policy = TRUE, na.action = na.omit)
bperm

## 
##  Monte-Carlo simulation of Moran I
## 
## data:  hunan$GDPPC 
## weights: rswm_q  
## number of simulations + 1: 1000 
## 
## statistic = 0.30075, observed rank = 1000, p-value = 0.001
## alternative hypothesis: greater

For both monte carlo simulation and normal Moran’s I statistics, both of the p-value are 0.001 and lower and this means that if confidence interval is 98%, it means that distribution of GDPPC values in the counties are not randomly distributed. Thus, since the Moran I statistics (0.3) is bigger than 0, the spatial pattern is “clustered”.

6.1.3 Visualizing Monte Carlo Moran’s I

summary(bperm$res[1:999])

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.18339 -0.06168 -0.02125 -0.01505  0.02611  0.27593

hist(bperm$res, freq = TRUE, breaks = 20, xlab = "Simulated Moran's I")

abline(v=0, col="red")

6.2 Geary’s C test

We will perform Geary’s C test for spatial autocorrelation analysis using geary.test() function of spdep package.

geary.test(hunan$GDPPC, listw = rswm_q)

## 
##  Geary C test under randomisation
## 
## data:  hunan$GDPPC 
## weights: rswm_q 
## 
## Geary C statistic standard deviate = 3.6108, p-value = 0.0001526
## alternative hypothesis: Expectation greater than statistic
## sample estimates:
## Geary C statistic       Expectation          Variance 
##         0.6907223         1.0000000         0.0073364

6.2.1 Computing Monte Carlo Geary’s C

We will perform permutation test for Geary’s C statistic by using geary.mc() function of spdep package.

set.seed(1234)
bperm = geary.mc(hunan$GDPPC, listw = rswm_q, nsim = 999)
bperm

## 
##  Monte-Carlo simulation of Geary C
## 
## data:  hunan$GDPPC 
## weights: rswm_q 
## number of simulations + 1: 1000 
## 
## statistic = 0.69072, observed rank = 1, p-value = 0.001
## alternative hypothesis: greater

For both monte carlo simulation and normal Geary’s C statistics, both of the p-value are 0.001 and lower and this means that if confidence interval is 98%, it means that distribution of GDPPC values in the counties are not randomly distributed. Thus, since the c statistics value is smaller than 1, the spatial pattern is “clustered”.

6.2.2 Visualizing Monte Carlo Geary’s C

summary(bperm$res[1:999])

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.7142  0.9502  1.0052  1.0044  1.0595  1.2722

hist(bperm$res, freq = TRUE, breaks = 20, xlab = "Simulated Geary's C")

abline(v=1, col="red")

7. Spatial Correlogram

Spatial correlograms are great to examine patterns of spatial autocorrelation in your data. They show how correlated pairs of spatial observations are when you increase the distance (lag) between them.

7.1 Computing Moran’s I correlogram

MI_corr <- sp.correlogram(wm_q, hunan$GDPPC, order = 6, method = "I", style = "B")
plot(MI_corr)

7.2 Computing Geary’s C correlogram

GC_corr <- sp.correlogram(wm_q, hunan$GDPPC, order = 6, method = "C", style = "W")
plot(GC_corr)

Let’s get the mean distance of the lags order 1-6 to see the distances. We use nblag() function which creates higher order neighbour lists, where higher order neighbours are only lags links from each other on the graph described by the input neighbours list. Credit

nb6 <- nblag(wm_q, 6)
correlogram_bins <- sapply(nb6, function(x) mean(unlist(nbdists(x,coords, longlat = TRUE))))
correlogram_bins

## [1]  55.87101 106.63547 162.68559 219.06365 271.94224 323.42132

summary(correlogram_bins)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   55.87  120.65  190.87  189.94  258.72  323.42

Conclusion:

From the correlograms, we can see that Moran’s I values and Geary’s c values are inversely related. We can also see that the spatial pattern of GDPPC variable is clustered from lag 1 to 4 (from 55.87 km to 219.06 km) and dispersed from lag 4 onwards (from 219.06 km to maximum distance of 323.42 km).

Hands-on_Ex06

Erika Aldisa

9/21/2020