Packages & libraries

Spatial

install.packages(“sf”, dependencies = T)
install.packages(“sp”, dependencies = T)
install.packages(“spData”, dependencies = T)
install.packages(“spdep”, dependencies = T)
install.packages(“spatialreg”, dependencies = T) install.packages(“tmap”, dependencies = T)

Non-spatial

install.packages(“tidyverse”, dependencies = T)
install.packages(“kableExtra”, dependencies = T)

Initialising libraries

## Spatial
library("sf")
library("sp")
library("spData")
library("spdep")
library("spatstat")
library("spatialreg")
library("tmap")

## Non-spatial
library("tidyverse")
library("dplyr")
library("kableExtra")

Lecture 2b

Key points:

When describing UTM projections, two pieces of information must be provided: (1) the “zone” and, (2) the ellipsoid model
The UTMZone is derived by the following formula: $UTM Zone = 31 + \frac{LONG(E)}{6}$, if LONG(W), the formula will be $UTM Zone = 31 - \frac{LONG(W)}{6}$

Lecture 6a

1 Working with GAL files

Key points:

A GAL file must have the following format:
- Header (Line 1): Start with the mandatory 0 and the number of areal units (and optional column name)
- A region identifier will be assigned to each arreal unit from 1 to n
- Line 2: The region identifier of region 1 followed by the number of neighbouring regions of region 1
- Line 3: Region identifiers of the neighbours of region 1
- Do the same of the rest of the region
- Leave an empty line at the end!!
R class “nb” is a neighbour list object which is how neighbour information is handled in the R ecosystem
A shapefile will need to be loaded to anchor the external “nb” object
Ensure that the arreal units are in the same order as the “nb” object
When plotting the nb object on the map, in the function plot(), ensure that the nb object comes before the coords
THe nb list object created contains information o how areal units are related to one another

1.1 Create a GAL file from scratch

The figure shoes the boundaries of the 8 Romanian NUTS2 regions. Create a GAL file and read neighbours from the file.

North-west
Centre
North-east
South-east
South
Bucharest
South-west
West

GAL file created:

Read neighbours from GAL file:

(regions_of_romania.nb = read.gal("econ6027 cheatsheet/6a-1.1.gal"))

## Neighbour list object:
## Number of regions: 8 
## Number of nonzero links: 26 
## Percentage nonzero weights: 40.625 
## Average number of links: 3.25

1.2 Create a GAL file from nb and plot the nb object

Create a GAL file from nb. Plot the nb object.

Read neighbours from an existing GAL file:

(uk.nb = read.gal(file = "econ6027 cheatsheet/6a-1.2.gal")); class(uk.nb)

## Neighbour list object:
## Number of regions: 12 
## Number of nonzero links: 42 
## Percentage nonzero weights: 29.16667 
## Average number of links: 3.5

## [1] "nb"

Plot the nb object:

## Load the shapefile data that the nb object will be anchored on
(uk.boundaries = st_read("econ6027 cheatsheet/6a-1.2-shp/NUTS_Level_1__January_2018__Boundaries.shp"))

## Reading layer `NUTS_Level_1__January_2018__Boundaries' from data source 
##   `/Users/gertrude/Documents/01 university/02 smu mse/04 classes/econ6027 | spatial econometrics and data analysis/econ6027 cheatsheet/6a-1.2-shp/NUTS_Level_1__January_2018__Boundaries.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 12 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -70.2116 ymin: 7054.2 xmax: 655604.7 ymax: 1220302
## Projected CRS: OSGB36 / British National Grid

## Simple feature collection with 12 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -70.2116 ymin: 7054.2 xmax: 655604.7 ymax: 1220302
## Projected CRS: OSGB36 / British National Grid
## First 10 features:
##    objectid nuts118cd                nuts118nm  bng_e  bng_n      long      lat
## 1         1       UKC     North East (England) 417313 600358 -1.728900 55.29703
## 2         2       UKD     North West (England) 350015 506280 -2.772370 54.44945
## 3         3       UKE Yorkshire and The Humber 446903 448736 -1.287120 53.93264
## 4         4       UKF  East Midlands (England) 477660 322635 -0.849670 52.79572
## 5         5       UKG  West Midlands (England) 386294 295477 -2.203580 52.55697
## 6         6       UKH          East of England 571074 263229  0.504146 52.24067
## 7         7       UKI                   London 517516 178392 -0.308640 51.49227
## 8         8       UKJ     South East (England) 470062 172924 -0.993110 51.45097
## 9         9       UKK     South West (England) 285015 102567 -3.633430 50.81119
## 10       10       UKL                    Wales 263406 242881 -3.994160 52.06741
##     st_areasha st_lengths                       geometry
## 1   8609938893   657578.2 MULTIPOLYGON (((397942.9 65...
## 2  14182609113  1063052.7 MULTIPOLYGON (((357136.2 58...
## 3  15432322860   863264.1 MULTIPOLYGON (((480000 5176...
## 4  15658181358   889656.3 MULTIPOLYGON (((516022.7 41...
## 5  13002149549   774627.1 MULTIPOLYGON (((409403 3657...
## 6  19155125591  1083693.3 MULTIPOLYGON (((599964.1 34...
## 7   1585475935   270855.1 MULTIPOLYGON (((537543.8 19...
## 8  19115903194  1454554.9 MULTIPOLYGON (((493941.1 25...
## 9  23979440241  1647103.7 MULTIPOLYGON (((416007.6 24...
## 10 20820253175  1552749.7 MULTIPOLYGON (((314837 3813...

## Return the points that represents a centroid of a geometry
uk.coords = st_centroid(st_geometry(uk.boundaries))

## Plot the (i) boundary; (ii) nb file; and (iii) centroid
plot(st_geometry(uk.boundaries),
     main = "Plot nb object from GAL file, anchored onto shapefile",
     border = "grey60",
     axes = T)
  plot(uk.nb, uk.coords,
       pch = 20,
       cex = 0.6,
       add = T)

1.3 Create GAL file from nb

Create GAL file from nb:

write.nb.gal(nb = uk.nb,
             file = "uk_new.gal",
             oldstyle = F)

2 Create neighbours

Key points:

Key difference between the function st_point_on_surface() and st_centroid() is a point computed by st_centroid() might lie outside the polygon but a point computed by st_point_on_surface is guaranteed to lie wihin the polygon
2.1 Create contiguity neighbours
- The function poly2nb() constructs neighbours list (nb objects) from polygons
- By default, queen = T which implies that the neighbour list is constructed using the queen contiguity
- If queen = F, then the neighbour list is constructed using the rook contiguity
2.2 “k” nearest neighbour
- The function knearneigh() returns a matrix with the indices of points belonging to the set of k nearest neighbours of each other
- k = x, where x represents the number of nearest neighbours to be returned
- The function knn2nb() converts a the knn object (knearneigh) into a neighbour list of class nb with a list of integer vectors containing neighbour region number ids
2.3 Neighbours with a certain metric distance
- Since it involves metric distance, check that the unit is recorded in “m” using the function st_crs()$unit
- Note that distance-based neighbours can leave some spatial units neighbour-less and the degree of connectedness can exponentially increase with distance
- The function dnearneigh() identifies neighbours of the region points by Euclidean distance in the metric of the points between d1: lower (greater than or equal to) and d2: upper (less than or equal to) bounds

We will use the Singapore and UK planning areas in Sections 2 and 3 of this chapter. Load the datasets and assign a point that would lie within each polygon.

## Singapore
(sg_pl = st_read("econ6027 cheatsheet/6a-1.3-shp/MySingapura.shp"))

## Reading layer `MySingapura' from data source 
##   `/Users/gertrude/Documents/01 university/02 smu mse/04 classes/econ6027 | spatial econometrics and data analysis/econ6027 cheatsheet/6a-1.3-shp/MySingapura.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 55 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## Projected CRS: SVY21

## Simple feature collection with 55 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## Projected CRS: SVY21
## First 10 features:
##    OBJECTID              PLN_AREA_N PLN_AREA_C CA_IND       REGION_N REGION_C
## 1         1                  BISHAN         BS      N CENTRAL REGION       CR
## 2         2             BUKIT BATOK         BK      N    WEST REGION       WR
## 3         3             BUKIT MERAH         BM      N CENTRAL REGION       CR
## 4         4           BUKIT PANJANG         BP      N    WEST REGION       WR
## 5         5             BUKIT TIMAH         BT      N CENTRAL REGION       CR
## 6         6 CENTRAL WATER CATCHMENT         CC      N   NORTH REGION       NR
## 7         7                  CHANGI         CH      N    EAST REGION       ER
## 8         8              CHANGI BAY         CB      N    EAST REGION       ER
## 9         9           CHOA CHU KANG         CK      N    WEST REGION       WR
## 10       10                CLEMENTI         CL      N    WEST REGION       WR
##             INC_CRC FMEL_UPD_D   X_ADDR   Y_ADDR SHAPE_Leng SHAPE_Area
## 1  BA616285F402846F 2014-12-05 28789.76 37450.89   13517.12    7618921
## 2  FB44C870B04B7F57 2014-12-05 19255.42 37527.65   15234.22   11133256
## 3  738B479882E4EE28 2014-12-05 26865.78 28662.87   29156.29   14462472
## 4  4A9C6E6BAF7BE998 2014-12-05 21287.04 38761.84   15891.85    9019940
## 5  C893AEAD20F42559 2014-12-05 23256.76 34689.00   22492.84   17526654
## 6  52D0068508B0348A 2014-12-05 24424.42 39849.05   30538.25   37147854
## 7  70EAFFC7C76EA5AD 2014-12-05 46307.48 36991.43   31342.12   40940490
## 8  ACFD43D88C72381C 2014-12-05 49502.49 34316.54   18731.56    1829822
## 9  F361929FAF5E9611 2014-12-05 18415.05 40833.42   11518.81    6117294
## 10 8A5BEDB748EAA0AA 2014-12-05 19923.36 33319.17   15024.87    9516228
##                          geometry
## 1  MULTIPOLYGON (((29772.19 38...
## 2  MULTIPOLYGON (((20294.46 39...
## 3  MULTIPOLYGON (((26228.63 30...
## 4  MULTIPOLYGON (((21448.72 41...
## 5  MULTIPOLYGON (((24031.39 36...
## 6  MULTIPOLYGON (((25073.29 43...
## 7  MULTIPOLYGON (((44586.3 417...
## 8  MULTIPOLYGON (((48860.11 34...
## 9  MULTIPOLYGON (((18349.06 43...
## 10 MULTIPOLYGON (((19680.06 31...

sg.coords = st_point_on_surface(st_geometry(sg_pl))

## UK
(uk_pl = st_read("econ6027 cheatsheet/6a-1.2-shp/NUTS_Level_1__January_2018__Boundaries.shp"))

## Reading layer `NUTS_Level_1__January_2018__Boundaries' from data source 
##   `/Users/gertrude/Documents/01 university/02 smu mse/04 classes/econ6027 | spatial econometrics and data analysis/econ6027 cheatsheet/6a-1.2-shp/NUTS_Level_1__January_2018__Boundaries.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 12 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -70.2116 ymin: 7054.2 xmax: 655604.7 ymax: 1220302
## Projected CRS: OSGB36 / British National Grid

## Simple feature collection with 12 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -70.2116 ymin: 7054.2 xmax: 655604.7 ymax: 1220302
## Projected CRS: OSGB36 / British National Grid
## First 10 features:
##    objectid nuts118cd                nuts118nm  bng_e  bng_n      long      lat
## 1         1       UKC     North East (England) 417313 600358 -1.728900 55.29703
## 2         2       UKD     North West (England) 350015 506280 -2.772370 54.44945
## 3         3       UKE Yorkshire and The Humber 446903 448736 -1.287120 53.93264
## 4         4       UKF  East Midlands (England) 477660 322635 -0.849670 52.79572
## 5         5       UKG  West Midlands (England) 386294 295477 -2.203580 52.55697
## 6         6       UKH          East of England 571074 263229  0.504146 52.24067
## 7         7       UKI                   London 517516 178392 -0.308640 51.49227
## 8         8       UKJ     South East (England) 470062 172924 -0.993110 51.45097
## 9         9       UKK     South West (England) 285015 102567 -3.633430 50.81119
## 10       10       UKL                    Wales 263406 242881 -3.994160 52.06741
##     st_areasha st_lengths                       geometry
## 1   8609938893   657578.2 MULTIPOLYGON (((397942.9 65...
## 2  14182609113  1063052.7 MULTIPOLYGON (((357136.2 58...
## 3  15432322860   863264.1 MULTIPOLYGON (((480000 5176...
## 4  15658181358   889656.3 MULTIPOLYGON (((516022.7 41...
## 5  13002149549   774627.1 MULTIPOLYGON (((409403 3657...
## 6  19155125591  1083693.3 MULTIPOLYGON (((599964.1 34...
## 7   1585475935   270855.1 MULTIPOLYGON (((537543.8 19...
## 8  19115903194  1454554.9 MULTIPOLYGON (((493941.1 25...
## 9  23979440241  1647103.7 MULTIPOLYGON (((416007.6 24...
## 10 20820253175  1552749.7 MULTIPOLYGON (((314837 3813...

uk.coords = st_point_on_surface(st_geometry(uk_pl))

2.1 Create contiguity neighbours

Singapore

Create a neighbours list using queen contiguity:

(sg.nb.qn = poly2nb(sg_pl,
                    queen = T))

## Neighbour list object:
## Number of regions: 55 
## Number of nonzero links: 258 
## Percentage nonzero weights: 8.528926 
## Average number of links: 4.690909 
## 1 region with no links:
## 38

Create a neighbours list using rook contiguity:

(sg.nb.rk = poly2nb(sg_pl,
                    queen = F))

## Neighbour list object:
## Number of regions: 55 
## Number of nonzero links: 258 
## Percentage nonzero weights: 8.528926 
## Average number of links: 4.690909 
## 1 region with no links:
## 38

Plot:

par(mfrow = c(1, 2))
plot(st_geometry(sg_pl),
     main = "Queen Contiguity (SG)",
     border = "grey60",
     axes = T)
  plot(sg.nb.qn, sg.coords,
       pch = 19, 
       cex = 0.6,
       add = T)
plot(st_geometry(sg_pl),
     main = "Rook Contiguity (SG)",
     border = "grey60",
     axes = T)
  plot(sg.nb.rk, sg.coords,
       pch = 19, 
       cex = 0.6,
       add = T)

Check if the neighbours are the same under the queen and rook contiguity:

isTRUE(all.equal(sg.nb.qn, sg.nb.rk,
                 check.attributes = F))

## [1] TRUE

Conclusion: For Singapore’s planning areas, queen contiguity and rook contiguity produces the same results.

United Kingdom

Create a neigbours list using the queen contiguity:

uk.nb.qn = poly2nb(uk_pl,
                   queen = T)

Create a neighbours list using the rook contiguity: Ø

uk.nb.rk = poly2nb(uk_pl,
                   queen = F)

Plot:

par(mfrow = c(1, 2))
plot(st_geometry(uk_pl),
     main = "Queen Contiguity (UK)",
     border = "grey60",
     axes = T)
  plot(uk.nb.qn, uk.coords,
       pch = 19, 
       cex = 0.6, 
       add = T)
plot(st_geometry(uk_pl),
     main = "Rook Contiguity (UK)",
     border = "grey60",
     axes = T)
  plot(uk.nb.rk, uk.coords,
       pch = 19, 
       cex = 0.6, 
       add = T)

Check if the neighbours are the same under the queen and rook contiguity:

isTRUE(all.equal(uk.nb.qn, uk.nb.rk,
                 check.attributes = F))

## [1] TRUE

Conclusion: For the United Kingdom’s planning areas, queen contiguity and rook contiguity produces the same results.

2.2 “k” nearest neighbours

Singapore

Create neighbours by identifying the number of nearest neighbours to be returned, represented by k = x:

IDs = row.names(sg_pl)

sg.nb.k.1 = knn2nb(knearneigh(sg.coords,
                              k = 1),
                   row.names = IDs)

sg.nb.k.2 = knn2nb(knearneigh(sg.coords,
                              k = 2),
                   row.names = IDs)

sg.nb.k.3 = knn2nb(knearneigh(sg.coords,
                              k = 3),
                   row.names = IDs)

Plot:

par(mfrow = c(1, 3))
plot(st_geometry(sg_pl),
     main = "k = 1 (SG)",
     border = "grey60",
     axes = T)
  plot(sg.nb.k.1, sg.coords,
       pch = 19,
       cex = 0.6,
       add = T)
plot(st_geometry(sg_pl),
     main = "k = 2 (SG)",
     border = "grey60",
     axes = T)
  plot(sg.nb.k.2, sg.coords,
       pch = 19,
       cex = 0.6,
       add = T)
plot(st_geometry(sg_pl),
     main = "k = 3 (SG)",
     border = "grey60",
     axes = T)
  plot(sg.nb.k.3, sg.coords,
       pch = 19,
       cex = 0.6,
       add = T)

United Kingdom

Create neighbours by identifying the number of nearest neighbours to be returned, represented by k = x:

IDs = row.names(uk_pl)

uk.nb.k.1 = knn2nb(knearneigh(uk.coords,
                              k = 1),
                   row.names = IDs)

uk.nb.k.2 = knn2nb(knearneigh(uk.coords,
                              k = 2),
                   row.names = IDs)

uk.nb.k.3 = knn2nb(knearneigh(uk.coords,
                              k = 3),
                   row.names = IDs)

Plot:

par(mfrow = c(1, 3))
plot(st_geometry(uk_pl),
     main = "k = 1 (UK)",
     border = "grey60",
     axes = T)
  plot(uk.nb.k.1, uk.coords,
       pch = 19,
       cex = 0.6,
       add = T)
plot(st_geometry(uk_pl),
     main = "k = 2 (UK)",
     border = "grey60",
     axes = T)
  plot(uk.nb.k.2, uk.coords,
       pch = 19,
       cex = 0.6,
       add = T)
plot(st_geometry(uk_pl),
     main = "k = 3 (UK)",
     border = "grey60",
     axes = T)
  plot(uk.nb.k.3, uk.coords,
       pch = 19,
       cex = 0.6,
       add = T)

2.3 Neighbours within a certain metric distance

Singapore

Create neighbours by identifying the number of neighbours to be returned based on the distance, represented by d = x:

IDs = row.names(sg_pl)
st_crs(sg_pl)$units

## [1] "m"

sg.nb.d.1 = dnearneigh(sg.coords,
                       d1 = 0,
                       d2 = 1000,
                       row.names = IDs)

sg.nb.d.2 = dnearneigh(sg.coords,
                       d1 = 0,
                       d2 = 5000,
                       row.names = IDs)

sg.nb.d.3 = dnearneigh(sg.coords,
                       d1 = 0,
                       d2 = 10000,
                       row.names = IDs)

Plot:

par(mfrow = c(1, 3))
plot(st_geometry(sg_pl),
     main = "Within 1km (SG)",
     border = "grey60",
     axes = T)
  plot(sg.nb.d.1, sg.coords,
       pch = 19,
       cex = 0.6,
       add = T)
plot(st_geometry(sg_pl),
     main = "Within 5km (SG)",
     border = "grey60",
     axes = T)
  plot(sg.nb.d.2, sg.coords,
       pch = 19,
       cex = 0.6,
       add = T)
plot(st_geometry(sg_pl),
     main = "Within 10km (SG)",
     border = "grey60",
     axes = T)
  plot(sg.nb.d.3, sg.coords,
       pch = 19,
       cex = 0.6,
       add = T)

United Kingdom

Create neighbours by identifying the number of neighbours to be returned based on the distance, represented by d = x:

IDs = row.names(uk_pl)
st_crs(uk_pl)$units

## [1] "m"

uk.nb.d.1 = dnearneigh(uk.coords,
                       d1 = 0,
                       d2 = 100000,
                       row.names = IDs)

uk.nb.d.2 = dnearneigh(uk.coords,
                       d1 = 0,
                       d2 = 300000,
                       row.names = IDs)

uk.nb.d.3 = dnearneigh(uk.coords,
                       d1 = 0,
                       d2 = 500000,
                       row.names = IDs)

Plot:

par(mfrow = c(1, 3))
plot(st_geometry(uk_pl),
     main = "Within 100km (UK)",
     border = "grey60",
     axes = T)
  plot(uk.nb.d.1, uk.coords,
       pch = 19,
       cex = 0.6,
       add = T)
plot(st_geometry(uk_pl),
     main = "Within 300km (UK)",
     border = "grey60",
     axes = T)
  plot(uk.nb.d.2, uk.coords,
       pch = 19,
       cex = 0.6,
       add = T)
plot(st_geometry(uk_pl),
     main = "Within 500km (UK)",
     border = "grey60",
     axes = T)
  plot(uk.nb.d.3, uk.coords,
       pch = 19,
       cex = 0.6,
       add = T)

3 Create a weights list object

Key points:

The nb list object has to be converted into a weights matrix before it can be used in statistical analysis
The function nb2listw() can be used to convert the nb list into a weight matrix
Row normalised weights matrix (“W”) can turn a symmetric weights matrix into an asymmetric one, there is also a chance of boosting the weightage of the units near the boundary Why?
By default, the function nb2listw() sets zero.policy = F. which means that there cannot be ‘loners’, changing it to zero.policy = T permit the weights list to be formed with zero-length weights vectors, that is weight vectors of zero length are inserted for regions without a neighbour in the neighbours list
Weights style:

Style options	Description
B	Basic binary coding
W	Row normalised (sums over all row links to n)
C	Globally standardised (sums all links to n)
U	Divided by the number of neighbours (sums over all links to unity)

United Kingdom

Convert the uk.nb object into a weights list object using the function nb2listw():

(uk.lw = nb2listw(neighbours = uk.nb,
                  style = "W",
                  zero.policy = F))

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 12 
## Number of nonzero links: 42 
## Percentage nonzero weights: 29.16667 
## Average number of links: 3.5 
## 
## Weights style: W 
## Weights constants summary:
##    n  nn S0       S1       S2
## W 12 144 12 7.751111 50.16889

Inspect uk.lw weights list:

names(uk.lw)

## [1] "style"      "neighbours" "weights"

names(attributes(uk.lw))

## [1] "names"     "class"     "region.id" "call"      "GeoDa"

summary(unlist(uk.lw$weights))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1667  0.2000  0.2667  0.2857  0.3333  1.0000

Singapore

Convert the sg.nb.qn object into a weights list object using the function nb2listw():

An error is returned when zero.policy was set to ‘F’, as there are empty neighbour sets found under queen contiguity.

# (sg.qn.lw = nb2listw(neighbours = sg.nb.qn,
#                      style = "W",
#                      zero.policy = T))
print(sg.nb.qn,
      zero.policy = T)

## Neighbour list object:
## Number of regions: 55 
## Number of nonzero links: 258 
## Percentage nonzero weights: 8.528926 
## Average number of links: 4.690909 
## 1 region with no links:
## 38

Lecture 6b

Preparation of data and creating nb object

We will use the ‘eire’ dataset in the package ‘spData’ in this chapter. Carry out the usual inspections: (1) Load the dataset, and check if all entries are valid and projected; (2) create the nb object using queen contiguity; (3) identify the coordinates of a point that lies in the polygon; (4) plot the base shape; and (5) create the weights list based on binary style and row normalised style.

The ‘eire’ dataset gives the counties of the Irish Republic. Description of ‘eire’ dataset:

Variable	Description
A	Percentage of sample with blood gorup A
towns	Towns/unit area
pale	Beyond the Pale 0, within the Pale 1
size	Number of blood type samples
ROADACC	Arterial road network accessibility in 1961
OWNCONS	Percentage in value terms of gross agricultural output of each country consumed by itself
POPCHG	1961 population as percentage of 1926
RETSALE	Value of retail sales British Pound000
INCOME	Total personal income British Pound000
names	County names

Load the dataset, and check if all entries are valid and projected

Since there is no CRS attached to the dataset, we will need to attach a CRS and ensure that the data is projected. We will used “+proj=UTM”, where we derive the “+zone” using coordinates: latitude – 53.7798N; longitude – 7.3055W. $UTM Zone = 31 -\frac{7.3055}{6} = 29.78242N$ $\Rightarrow$ “+zone=29”.

## Load data
(eire = st_read(system.file("shapes/eire.shp",
                           package = "spData")))

## Reading layer `eire' from data source 
##   `/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/spData/shapes/eire.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 26 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -4.12 ymin: 5768 xmax: 300.82 ymax: 6119.25
## CRS:           NA

## Simple feature collection with 26 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -4.12 ymin: 5768 xmax: 300.82 ymax: 6119.25
## CRS:           NA
## First 10 features:
##        A towns pale  size ROADACC OWNCONS POPCHG RETSALE INCOME    names
## 1  34.20  0.12    1  1087    3664     8.6     97    2962   7185   Carlow
## 2  29.68  0.01    0  2133    5000    15.0     69    4452   9459    Cavan
## 3  26.54  0.01    0   535    4321    19.0     78    3460  12435    Clare
## 4  23.92  0.03    0  1476    4118     9.0     90   28402  65901     Cork
## 5  27.91  0.03    0   989    7500    27.0     75    7478  17626  Donegal
## 6  32.79  0.61    1 18105    3078     9.4    142   89424 164631   Dublin
## 7  26.53  0.02    0  2254    4537    21.9     88    8972  26950   Galway
## 8  25.14  0.01    0   740    5140    17.0     78    6341  20510    Kerry
## 9  30.69  0.39    1  1437    3200     9.0    111    4803  14703  Kildare
## 10 31.41  0.18    1  1885    3708     8.0     87    4321  13585 Kilkenny
##                          geometry
## 1  MULTIPOLYGON (((240.62 5885...
## 2  MULTIPOLYGON (((192.16 5986...
## 3  MULTIPOLYGON (((145.23 5901...
## 4  MULTIPOLYGON (((83.48 5845....
## 5  MULTIPOLYGON (((150.41 6040...
## 6  MULTIPOLYGON (((270.77 5929...
## 7  MULTIPOLYGON (((51.39 5970....
## 8  MULTIPOLYGON (((74.06 5871....
## 9  MULTIPOLYGON (((229.67 5932...
## 10 MULTIPOLYGON (((189.96 5878...

st_make_valid(eire); table(st_is_valid(eire))

## Simple feature collection with 26 features and 10 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: -4.12 ymin: 5768 xmax: 300.82 ymax: 6119.25
## CRS:           NA
## First 10 features:
##        A towns pale  size ROADACC OWNCONS POPCHG RETSALE INCOME    names
## 1  34.20  0.12    1  1087    3664     8.6     97    2962   7185   Carlow
## 2  29.68  0.01    0  2133    5000    15.0     69    4452   9459    Cavan
## 3  26.54  0.01    0   535    4321    19.0     78    3460  12435    Clare
## 4  23.92  0.03    0  1476    4118     9.0     90   28402  65901     Cork
## 5  27.91  0.03    0   989    7500    27.0     75    7478  17626  Donegal
## 6  32.79  0.61    1 18105    3078     9.4    142   89424 164631   Dublin
## 7  26.53  0.02    0  2254    4537    21.9     88    8972  26950   Galway
## 8  25.14  0.01    0   740    5140    17.0     78    6341  20510    Kerry
## 9  30.69  0.39    1  1437    3200     9.0    111    4803  14703  Kildare
## 10 31.41  0.18    1  1885    3708     8.0     87    4321  13585 Kilkenny
##                          geometry
## 1  POLYGON ((240.62 5885.61, 2...
## 2  POLYGON ((192.16 5986.94, 1...
## 3  POLYGON ((145.23 5901.12, 1...
## 4  POLYGON ((83.48 5845.19, 90...
## 5  POLYGON ((150.41 6040.45, 1...
## 6  POLYGON ((270.77 5929.79, 2...
## 7  POLYGON ((51.39 5970.01, 62...
## 8  POLYGON ((74.06 5871.1, 76....
## 9  POLYGON ((229.67 5932.56, 2...
## 10 POLYGON ((189.96 5878.34, 1...

## 
## TRUE 
##   26

st_crs(eire)

## Coordinate Reference System: NA

eire = st_set_crs(x = eire,
                  "+proj=utm +zone=29 +ellps=airy +units=km")
st_crs(eire); st_crs(eire)$proj; st_crs(eire)$proj4string

## Coordinate Reference System:
##   User input: +proj=utm +zone=29 +ellps=airy +units=km 
##   wkt:
## PROJCRS["unknown",
##     BASEGEOGCRS["unknown",
##         DATUM["Unknown based on Airy 1830 ellipsoid",
##             ELLIPSOID["Airy 1830",6377563.396,299.3249646,
##                 LENGTHUNIT["metre",1,
##                     ID["EPSG",9001]]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8901]]],
##     CONVERSION["UTM zone 29N",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",-9,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9996,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",500000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]],
##         ID["EPSG",16029]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["kilometre",1000,
##                 ID["EPSG",9036]]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["kilometre",1000,
##                 ID["EPSG",9036]]]]

## [1] "utm"

## [1] "+proj=utm +zone=29 +ellps=airy +units=km +no_defs"

## Plot to check if all continuous polygons are valid and projected
plot(st_geometry(eire))

Create the nb object using queen contiguity

(eire.nb.qn = poly2nb(eire,
                      queen = T))

## Neighbour list object:
## Number of regions: 26 
## Number of nonzero links: 114 
## Percentage nonzero weights: 16.86391 
## Average number of links: 4.384615

Identify the coordinates of a point that lies in the polygon

eire.coords = st_point_on_surface(st_geometry(eire))

Plot the base shape

plot(st_geometry(eire),
     main = "eire dataset - Neighbours created from Queen Contiguity",
     border = "grey60",
     axes = T)
  plot(eire.nb.qn, eire.coords,
       pch = 19,
       cex = 0.6,
       add = T)

Create the weights list based on binary style and row normalised style

## List weights based on binary style ("B")
(eire.lw.b = nb2listw(neighbours = eire.nb,
                      style = "B"))

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 26 
## Number of nonzero links: 114 
## Percentage nonzero weights: 16.86391 
## Average number of links: 4.384615 
## 
## Weights style: B 
## Weights constants summary:
##    n  nn  S0  S1   S2
## B 26 676 114 228 2288

## List weights based on row normalised style ("B")
(eire.lw.w = nb2listw(neighbours = eire.nb,
                      style = "W"))

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 26 
## Number of nonzero links: 114 
## Percentage nonzero weights: 16.86391 
## Average number of links: 4.384615 
## 
## Weights style: W 
## Weights constants summary:
##    n  nn S0      S1       S2
## W 26 676 26 13.2227 108.8578

1 Join Count Analysis

Check if the variable is binary

eire$pale

##  [1] 1 0 0 0 0 1 0 0 1 1 0 0 0 1 1 0 1 0 0 0 0 1 1 1 1 1

(eire$pale_factor = as.factor(x = eire$pale == 1))

##  [1] TRUE  FALSE FALSE FALSE FALSE TRUE  FALSE FALSE TRUE  TRUE  FALSE FALSE
## [13] FALSE TRUE  TRUE  FALSE TRUE  FALSE FALSE FALSE FALSE TRUE  TRUE  TRUE 
## [25] TRUE  TRUE 
## Levels: FALSE TRUE

summary(eire$pale_factor)

## FALSE  TRUE 
##    14    12

Plot the results

tm_shape(eire) +
  tm_polygons(col = "pale_factor")

joincount.multi(fx = eire$pale_factor,
                listw = eire.lw.b)

##             Joincount Expected Variance z-value
## FALSE:FALSE   18.0000  15.9600   8.4175  0.7031
## TRUE:TRUE     18.0000  11.5754   6.8634  2.4523
## TRUE:FALSE    21.0000  29.4646  11.9130 -2.4524
## Jtot          21.0000  29.4646  11.9130 -2.4524

Total number of neighbourhood relationships, $\sum_{i} \sum_{j} w_{ij} = 26$
Number of non-zero links, $2(18 + 18 + 21) = 114$
Expected number of BW, $\frac{12}{26} \times \frac{14}{25} \times 2 \times \frac{1}{2} \times 114 = 29.4646$
Expected number of WW, $\frac{14}{26} \times \frac{13}{25} \times \frac{1}{2} \times 114 = 15.9600$
Expected number of BB, $\frac{12}{26} \times \frac{11}{25} \times \frac{1}{2} \times 114 = 11.5754$

Conclusion: The observed BW count (21.000) is significantly less than the expected BW count (29.4646). Sufficient evidence to suggest significant clustering at 5% level of significance. Reject $H_{0}$ at 5% level of significance.

Note: - At 10% significance level, a Z-score of <-1.65 or >+1.65 is considered significant - At 5% significance level, a Z-score of <-1.96 or >+1.96 is considered significant - At 1% significance level, a Z-score of <-2.58 or >+2.58 is considered significant

2 Moran’s I

Key points:

$H_{0}: E[I] = -\frac{1}{n-1}$<> $\Rightarrow$ No spatial autocorrelation (CSR) in favour of positive spatial autocorrelation
Plot the scatterplot before before concluding using statistical analysis

Plot the attribute on the map to determine the ordinal data:

tm_shape(eire) + 
  tm_polygons(col = "A",
              palette = "YlGnBu",
              alpha = 0.5) +
  tm_layout(legend.outside = T)

Plot the Moran’s scatterplot:

moran.plot(eire$A,
           listw = eire.lw.w,
           main = "Moran's Scatterplot for variable A",
           labels = F,
           xlab = "A",
           ylab = "Spatially Lagged A")

From the plot, we can visually confirm that variable A exhibits positive spatial autocorrelation.

Run Moran’s test:

moran.test(x = eire$A,
           listw = eire.lw.w)

## 
##  Moran I test under randomisation
## 
## data:  eire$A  
## weights: eire.lw.w    
## 
## Moran I statistic standard deviate = 4.6851, p-value = 1.399e-06
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##        0.55412382       -0.04000000        0.01608138

g.moran.p = moran.test(x = eire$A,
                       listw = eire.lw.w)$p.value
g.moran.i = moran.test(x = eire$A,
                       listw = eire.lw.w)$estimate[1]
g.moran.e.i = moran.test(x = eire$A,
                         listw = eire.lw.w)$estimate[2]

Conclusion: Since 1.3993826^{-6} < 0.05 and 0.5541238 > -0.04. Reject the null hypothesis of no spatial autocorrelation in favour of positive spatial autocorrelation 5% significance level. There seem to be spatial patterning in the incidence of group A type blood at the county level.

Local Moran’s I

Run Local Moran’s test:

l.moran = localmoran(x = eire$A,
                     listw = eire.lw.w)
head(l.moran)

##                   Ii          E.Ii       Var.Ii       Z.Ii Pr(z != E(Ii))
## Carlow   1.636481921 -0.1078854679 0.4170668396  2.7010656    0.006911771
## Cavan   -0.005625461 -0.0001168927 0.0005064758 -0.2447707    0.806633993
## Clare    0.438557030 -0.0440391047 0.3344584256  0.8344739    0.404013977
## Cork     0.596073603 -0.1552187377 0.7457784490  0.8699695    0.384317051
## Donegal  0.136917303 -0.0128996974 0.3310656763  0.2603778    0.794572336
## Dublin   1.109745739 -0.0526106276 0.3959729542  1.8471686    0.064722709

eire$l.moran.z = l.moran[, 4]

Plot:

tm_shape(eire) +
  tm_polygons(col = "l.moran.z",
              palette = "YlGnBu",
              alpha = 0.5) +
  tm_layout(main.title = "Local Moran's I - Z-test",
            legend.outside = T)

Show that the sum of Local Moran’s I statistics is proportional to the Global Moran’s I statistic:

sum.l.moran = sum(l.moran[ , 1]) / sum(unlist(eire.lw.w$weights))
sum(l.moran[ , 1]) / nrow(eire)

## [1] 0.5541238

Show that Moran’s I statistic is highly sensitive to the choice of spatial weights matrix by comparing the results of a row normalised weights list and the binary weights list.

moran.test(x = eire$A,
           listw = eire.lw.w)

## 
##  Moran I test under randomisation
## 
## data:  eire$A  
## weights: eire.lw.w    
## 
## Moran I statistic standard deviate = 4.6851, p-value = 1.399e-06
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##        0.55412382       -0.04000000        0.01608138

moran.test(x = eire$A,
           listw = eire.lw.b)

## 
##  Moran I test under randomisation
## 
## data:  eire$A  
## weights: eire.lw.b    
## 
## Moran I statistic standard deviate = 4.4684, p-value = 3.94e-06
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##        0.47944757       -0.04000000        0.01351367

The computed Moran’s I changes when the weight matrix is changed from row standardised to binary.

3 Geary’s C statistic

Key points:

$H_{0}: CSR = 1$<> $\Rightarrow$ No spatial autocorrelation (CSR)

Plot the attribute on the map to determine the ordinal data:

tm_shape(eire) + 
  tm_polygons(col = "A",
              palette = "YlGnBu",
              alpha = 0.5) +
  tm_layout(legend.outside = T,)

Run the Geary’s C test:

geary.test(eire$A,
           listw = eire.lw.w)

## 
##  Geary C test under randomisation
## 
## data:  eire$A 
## weights: eire.lw.w 
## 
## Geary C statistic standard deviate = 4.5146, p-value = 3.172e-06
## alternative hypothesis: Expectation greater than statistic
## sample estimates:
## Geary C statistic       Expectation          Variance 
##        0.38011971        1.00000000        0.01885309

g.geary.p = geary.test(eire$A,
                       listw = eire.lw.w)$p.value
g.geary.c = geary.test(eire$A,
                       listw = eire.lw.w)$estimate[1]

Conclusion: Since 3.1722482^{-6} < 0.05, reject the null hypothesis of no spatial autocorrelation ($C = 1)$ at 5% significance level. There seem to be spatial patterning in the incidence of group A type blood at the county level.

4 Getis-Ord’s G statistic

Key points:

$H_{0}: CSR$ $\Rightarrow$ No spatial autocorrelation (CSR)

Plot the attribute on the map to determine the ordinal data:

tm_shape(eire) + 
  tm_polygons(col = "A",
              palette = "YlGnBu",
              alpha = 0.5) +
  tm_layout(legend.outside = T,)

Run the Getis-Ord’s G test:

globalG.test(x = eire$A,
             listw = eire.lw.b)

## 
##  Getis-Ord global G statistic
## 
## data:  eire$A 
## weights: eire.lw.b 
## 
## standard deviate = 1.3199, p-value = 0.09344
## alternative hypothesis: greater
## sample estimates:
## Global G statistic        Expectation           Variance 
##       1.787800e-01       1.753846e-01       6.617796e-06

g.getisord.p = globalG.test(x = eire$A,
                            listw = eire.lw.b)$p.value

Conclusion: Since 0.0934365 > 0.05, do not reject the null hypothesis at 5% significance level.

Local Getis-Ord’s G

Run the Local Getis-Ord’s G test:

l.getisord = localG(x = eire$A,
                    listw = eire.lw.b)
eire$l.getisord.z = l.getisord

Plot:

tm_shape(eire) +
  tm_polygons(col = "l.getisord.z",
              alpha = 0.5) +
  tm_layout(main.title = "Local Getis-Ord's G - Z-test",
            legend.outside = T)

## Variable(s) "l.getisord.z" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

5 Correlogram

correlogram.corr = sp.correlogram(neighbours = eire.nb.qn,
                                  var = eire$A,
                                  order = 3,
                                  method = "corr",
                                  style = "W")
correlogram.i = sp.correlogram(neighbours = eire.nb.qn,
                                  var = eire$A,
                                  order = 3,
                                  method = "I",
                                  style = "W")
correlogram.c = sp.correlogram(neighbours = eire.nb.qn,
                                  var = eire$A,
                                  order = 3,
                                  method = "C",
                                  style = "W")
par(mfrow = c(1, 3))
plot(correlogram.corr,
     main = "Contiguity Lag Orders\nCorrelation")
plot(correlogram.i,
     main = "Contiguity Lag Orders\nMoran's I")
plot(correlogram.c,
     main = "Contiguity Lag Orders\nGeary's C")

Conclusion: From the correlogram using “Correlation” method, a lag order of 2 might be more suitable. From the correlogram using “Moran’s I” method, a lag order of 1 might be more suitable. From the correlogram using “Geary’s C” method, a lag order of 2 might be more suitable.

Lecture 7a

Preparation of data

We will use the ‘uk’ dataset from Lecture 6. Carry out the usual inspections: (1) Load the dataset, and check if all entries are valid and projected; (2) load the gal file and convert it into weights list; (3) identify the coordinates of a point that lies in the polygon; (4) plot the base shape; and (5) create the weights list based on binary style and row normalised style.

Load the dataset, and check if all entries are valid and projected:

uk.data = st_read("econ6027 cheatsheet/uk_data/uk_data.shp")

## Reading layer `uk_data' from data source 
##   `/Users/gertrude/Documents/01 university/02 smu mse/04 classes/econ6027 | spatial econometrics and data analysis/econ6027 cheatsheet/uk_data/uk_data.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 12 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -70.2116 ymin: 7054.2 xmax: 655604.7 ymax: 1220302
## Projected CRS: OSGB36 / British National Grid

st_make_valid(uk.data); table(st_is_valid(uk.data))

## Simple feature collection with 12 features and 12 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: -70.2116 ymin: 7054.2 xmax: 655604.7 ymax: 1220302
## Projected CRS: OSGB36 / British National Grid
## First 10 features:
##    objectd nts118c                  nts118n  bng_e  bng_n      long      lat
## 1        1     UKC     North East (England) 417313 600358 -1.728900 55.29703
## 2        2     UKD     North West (England) 350015 506280 -2.772370 54.44945
## 3        3     UKE Yorkshire and The Humber 446903 448736 -1.287120 53.93264
## 4        4     UKF  East Midlands (England) 477660 322635 -0.849670 52.79572
## 5        5     UKG  West Midlands (England) 386294 295477 -2.203580 52.55697
## 6        6     UKH          East of England 571074 263229  0.504146 52.24067
## 7        7     UKI                   London 517516 178392 -0.308640 51.49227
## 8        8     UKJ     South East (England) 470062 172924 -0.993110 51.45097
## 9        9     UKK     South West (England) 285015 102567 -3.633430 50.81119
## 10      10     UKL                    Wales 263406 242881 -3.994160 52.06741
##        st_arsh   st_lngt gross_v L_prod pct_bsb                       geometry
## 1   8609938893  657578.2     3.2   86.2    11.2 MULTIPOLYGON (((4e+05 65416...
## 2  14182609113 1063052.7     9.5   88.6    11.1 MULTIPOLYGON (((356937.8 58...
## 3  15432322860  863264.1     6.9   84.7    10.5 POLYGON ((480000 517670.2, ...
## 4  15658181358  889656.3     6.2   89.2    10.3 POLYGON ((516022.7 412210.9...
## 5  13002149549  774627.1     7.3   89.1    10.5 POLYGON ((409403 365710.8, ...
## 6  19155125591 1083693.3     8.7   96.8    10.5 MULTIPOLYGON (((595955.6 19...
## 7   1585475935  270855.1    21.6  139.7    14.6 POLYGON ((537543.8 199883.1...
## 8  19115903194 1454554.9    14.7  108.3    10.8 MULTIPOLYGON (((451299.1 96...
## 9  23979440241 1647103.7     7.7   89.8     9.6 MULTIPOLYGON (((89276.33 82...
## 10 20820253175 1552749.7     3.6   81.5     9.3 MULTIPOLYGON (((212996.6 19...

## 
## TRUE 
##   12

## To remove the "e" notation from the summary outputs
options(scipen = 999)

Load the gal file and convert it into weights list:

(uk.nb = read.gal(file = "econ6027 cheatsheet/6a-1.2.gal")); class(uk.nb)

## Neighbour list object:
## Number of regions: 12 
## Number of nonzero links: 42 
## Percentage nonzero weights: 29.16667 
## Average number of links: 3.5

## [1] "nb"

uk.lw = nb2listw(neighbours = uk.nb)

1 Choosing the best model

1.1 Setting up the models

Variable	Column	Description
GVA	gross_v	Gross Value Added
LP	L_prod	Labour Productivity
PBB	pct_bsb	Percentage of new business births

Model 1: $GVA_i = \beta_0 + \beta_1 LP_i +\beta_2 PBB_i + \epsilon_i$

mdl1 = lm(formula = gross_v ~ L_prod + pct_bsb,
          data = uk.data)
summary(mdl1)

## 
## Call:
## lm(formula = gross_v ~ L_prod + pct_bsb, data = uk.data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1398 -0.9172 -0.4388  1.0958  2.5365 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -22.31118    3.38594  -6.589 0.000100 ***
## L_prod        0.27750    0.05346   5.191 0.000571 ***
## pct_bsb       0.42239    0.47243   0.894 0.394567    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.791 on 9 degrees of freedom
## Multiple R-squared:  0.9072, Adjusted R-squared:  0.8866 
## F-statistic: 43.99 on 2 and 9 DF,  p-value: 0.00002259

mdl1.p = pf(q = summary(mdl1)$fstatistic[1],
            df1 = summary(mdl1)$fstatistic[2],
            df2 = summary(mdl1)$fstatistic[3],
            lower.tail = F)
mdl1.r2 = summary(mdl1)$r.squared
mdl1.r2a = summary(mdl1)$adj.r.squared

mdl1.lp.p = summary(mdl1)$coefficients[2, 4]
mdl1.pbb.p = summary(mdl1)$coefficients[3, 4]

Conclusion: A p-value of 0.0000226 < 0.05, signifies that at the 5% confidence level, the results from Model 1 is significant. From the p-values of the independent variables, it suggest that ‘Labour Productivity’ is statistically significant at the 5% significance level (p-value = 0.0005708 < 0.05) but ‘Percentage of new business births’ is not statistically significant significance level (p-value = 0.3945672 > 0.05).

Model 2: $GVA_i = \beta_0 + \beta_1 LP_i + \epsilon_i$

mdl2 = lm(formula = gross_v ~ L_prod,
          data = uk.data)
summary(mdl2)

## 
## Call:
## lm(formula = gross_v ~ L_prod, data = uk.data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.5300 -0.8375 -0.4193  1.0386  3.0149 
## 
## Coefficients:
##              Estimate Std. Error t value   Pr(>|t|)    
## (Intercept) -21.38866    3.19237  -6.700 0.00005365 ***
## L_prod        0.31460    0.03335   9.432 0.00000271 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.773 on 10 degrees of freedom
## Multiple R-squared:  0.899,  Adjusted R-squared:  0.8889 
## F-statistic: 88.97 on 1 and 10 DF,  p-value: 0.000002708

mdl2.p = pf(summary(mdl2)$fstatistic[1],
            summary(mdl2)$fstatistic[2],
            summary(mdl2)$fstatistic[3],
            lower.tail = F) 

mdl2.r2 = summary(mdl2)$r.squared
mdl2.r2a = summary(mdl2)$adj.r.squared

mdl2.lp.p = summary(mdl2)$coefficients[2, 4]

Conclusion: A p-value of 0.0000027 < 0.05, signifies that at the 5% confidence level, the results from Model 2 is significant. From the p-values of the independent variables, it suggest that ‘Labour Productivity’ is statistically significant at the 5% significance level (p-value = 0.0000027 < 0.05).

Model 3: $GVA_i = \beta_0 + \beta_1 PBB_i + \epsilon_i$

mdl3 = lm(formula = gross_v ~ pct_bsb,
          data = uk.data)
summary(mdl3)

## 
## Call:
## lm(formula = gross_v ~ pct_bsb, data = uk.data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.8006 -1.5308 -0.6353  1.8746  5.6300 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -16.0547     5.9992  -2.676  0.02325 * 
## pct_bsb       2.3264     0.5646   4.121  0.00208 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.397 on 10 degrees of freedom
## Multiple R-squared:  0.6293, Adjusted R-squared:  0.5923 
## F-statistic: 16.98 on 1 and 10 DF,  p-value: 0.002075

mdl3.p = pf(summary(mdl3)$fstatistic[1],
            summary(mdl3)$fstatistic[2],
            summary(mdl3)$fstatistic[3],
            lower.tail = F)

mdl3.r2 = summary(mdl3)$r.squared
mdl3.r2a = summary(mdl3)$adj.r.squared

mdl3.pbb.p = summary(mdl3)$coefficient[2, 4]

Conclusion: A p-value of 0.002075 < 0.05, signifies that at the 5% confidence level, the results from Model 3 is significant. From the p-values of the independent variables, it suggest that ‘Percentage of new business births’ is statistically significant at the 5% significance level (p-value = 0.002075 < 0.05). However, given the ‘Percentage of new business births’ was not statistically significant, it’s significance in this model could be a result of omitted variable bias.

1.2 Test for Goodness-of-Fit

Akaike Information Criteria (AIC):

mdl1.aic = AIC(mdl1)
mdl2.aic = AIC(mdl2)
mdl3.aic = AIC(mdl3)

Schwartz (or Bayesian) Information Criteria (BIC):

mdl1.bic = BIC(mdl1)
mdl2.bic = BIC(mdl2)
mdl3.bic = BIC(mdl3)

Summary table of $R^2$, Adj. $R^2$, AIC and BIC:

Model	R-squared	Adj.R-squared	AIC	BIC
Model 1	0.9072016	0.8865798	52.59553	54.53516
Model 2	0.8989596	0.8888555	51.61663	53.07135
Model 3	0.6293493	0.5922842	67.21350	68.66822

Conclusion: From the summary table, Model 2 is a better model for the dataset. It has the highest Adjusted $R^2$, lowest AIC and BIC compared to the other 2 models.

1.3 Testing for Endogeneity between independent variables

mdl4 = lm(formula = L_prod ~ pct_bsb,
          data = uk.data)
summary(mdl4)

## 
## Call:
## lm(formula = L_prod ~ pct_bsb, data = uk.data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.192  -6.589  -2.225   4.571  16.980 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   22.546     18.718   1.205  0.25613   
## pct_bsb        6.861      1.761   3.895  0.00298 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.6 on 10 degrees of freedom
## Multiple R-squared:  0.6027, Adjusted R-squared:  0.563 
## F-statistic: 15.17 on 1 and 10 DF,  p-value: 0.002984

mdl4.p = pf(q = summary(mdl4)$fstatistic[1],
            df1 = summary(mdl4)$fstatistic[2],
            df2 = summary(mdl4)$fstatistic[3],
            lower.tail = F)
mdl4.pbb.p = summary(mdl4)$estimate[2, 4]

Conclusion: Given the significance of Model 4 (p-value = 0.0029845 < 0.05), it suggests that there may be endogeneity in Model 1.

1.4 Test for Sphericity of Disturbance (Homoskedasticity)

Breusch-Pagan Test

Model 1

lmtest::bptest(mdl1)

## 
##  studentized Breusch-Pagan test
## 
## data:  mdl1
## BP = 1.5183, df = 2, p-value = 0.4681

mdl1.bp.p = lmtest::bptest(mdl1)$p.value

Conclusion: Since p-value = 0.4680731 > 0.05, we do not reject the null of homoskedasticity at the 5% significance level, which suggests that there is no evidence of heteroskedasticity.

Model 2

lmtest::bptest(mdl2)

## 
##  studentized Breusch-Pagan test
## 
## data:  mdl2
## BP = 0.52669, df = 1, p-value = 0.468

mdl2.bp.p = lmtest::bptest(mdl2)$p.value

Conclusion: Since p-value = 0.4680039 > 0.05, we do not reject the null of homoskedasticity at the 5% significance level, which suggests that there is no evidence of heteroskedasticity.

Model 3

lmtest::bptest(mdl3)

## 
##  studentized Breusch-Pagan test
## 
## data:  mdl3
## BP = 0.36682, df = 1, p-value = 0.5447

mdl3.bp.p = lmtest::bptest(mdl3)$p.value

Conclusion: Since p-value = 0.5447424 > 0.05, we do not reject the null of homoskedasticity at the 5% significance level, which suggests that that there is no evidence of heteroskedasticity.

1.5 Test for Normality of Disturbances

Jarque-Bera Test

Model 2

tseries::jarque.bera.test(mdl1$residuals)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

## 
##  Jarque Bera Test
## 
## data:  mdl1$residuals
## X-squared = 0.091982, df = 2, p-value = 0.9551

mdl1.jb.p = tseries::jarque.bera.test(mdl1$residuals)$p.value
mdl1.jb.x = tseries::jarque.bera.test(mdl1$residuals)$statistic

Conclusion: Since p-value = 0.9550507 > 0.05 and the Jarque-Bera test statistic is close to 0 ($JB$ =0.09), there is insufficient evidence to reject the null of normality at the 5% significance level, which suggests that the data follows a normal distribution.

Model 2

tseries::jarque.bera.test(mdl2$residuals)

## 
##  Jarque Bera Test
## 
## data:  mdl2$residuals
## X-squared = 0.37587, df = 2, p-value = 0.8287

mdl2.jb.p = tseries::jarque.bera.test(mdl2$residuals)$p.value
mdl2.jb.x = tseries::jarque.bera.test(mdl2$residuals)$statistic

Conclusion: Since p-value = 0.828668 > 0.05 and the Jarque-Bera test statistic is close to 0 ($JB$ = 0.38), there is insufficient evidence to reject the null of normality at the 5% significance level, which suggests that the data follows a normal distribution.

Model 3

tseries::jarque.bera.test(mdl3$residuals)

## 
##  Jarque Bera Test
## 
## data:  mdl3$residuals
## X-squared = 0.079398, df = 2, p-value = 0.9611

mdl3.jb.p = tseries::jarque.bera.test(mdl3$residuals)$p.value
mdl3.jb.x = tseries::jarque.bera.test(mdl3$residuals)$statistic

Conclusion: Since p-value = 0.9610785 > 0.05 and the Jarque-Bera test statistic is small ($JB$ = 0.08), there is insufficient evidence to reject the null of normality at the 5% significance level, which suggests that the data follows a normal distribution.

2 Test for lagged variables

2.1 Test for lagged dependent variables

(lag.gva = as.matrix(lag.listw(x = uk.lw,
                               var = uk.data$gross_v)))

##            [,1]
##  [1,]  8.233333
##  [2,]  5.916667
##  [3,]  6.300000
##  [4,]  9.420000
##  [5,]  8.340000
##  [6,] 14.166667
##  [7,] 11.700000
##  [8,] 10.300000
##  [9,]  8.533333
## [10,]  8.166667
## [11,]  5.000000
## [12,]  8.300000

moran.plot(x = uk.data$gross_v,
           listw = uk.lw,
           main = "Moran's Scatter Plot - GVA",
           labels = F,
           xlab = "GVA",
           ylab = "Spatially Lagged GVA")

From Moran’s scatter plot, there seems to be positive spatial autocorrelation in the dependent variable ‘Gross Value Added’ (GVA) with most points in the High-High and Low-Low quadrants.

moran_test.gva = moran.test(x = uk.data$gross_v,
                            listw = uk.lw)

moran.test.gva.p = moran_test.gva$p.value

Conclusion: Since p-value = 0.0505258 > 0.05, we do not reject the null hypothesis of spatial randomness in favour of of spatial autocorrelation. There is no evidence of spatial autocorrelation in the dependent variable ‘GVA’ at the 5% significance level.

2.2 Test for lagged residual variables

<span style = “color:steelblue>Model 1<

mdl1.moran.test = lm.morantest(model = mdl1,
                               listw = uk.lw)

mdl1.moran.test.p = mdl1.moran.test$p.value
mdl1.moran.test.i = mdl1.moran.test$estimate[1]
mdl1.moran.test.e = mdl1.moran.test$estimate[2]

Conclusion: Since p-value = 0.5683374 > 0.05, there is insufficient evidence to reject the null of spatial randomness in favour of positive spatial autocorrelation. There is no evidence of spatial autocorrelation in the residuals at the 5% significance level.

Lecture 7b

1 Regression Modelling of Spatial Relationships

Load the data:

Variable	Column	Description
P	price.1960	Taxes and delivery charges for 1955-1959 new cars
T	tax.charges	1960 used car prices

used.cars = spData::used.cars
head(used.cars)

##    tax.charges price.1960
## AL         129       1461
## AZ         218       1601
## AR         176       1469
## CA         252       1611
## CO         186       1606
## CT         154       1491

used.cars.lw.w = nb2listw(neighbours = spData::usa48.nb,
                        style = "W")

1.1 Determine the OLS fit

Model: $P_i = \beta_0 + \beta_1 Ti + \epsilon_i$

mdl.uc = lm(formula = price.1960 ~ tax.charges,
            data = used.cars)
summary(mdl.uc)

## 
## Call:
## lm(formula = price.1960 ~ tax.charges, data = used.cars)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -116.701  -45.053   -1.461   43.400  107.807 
## 
## Coefficients:
##              Estimate Std. Error t value             Pr(>|t|)    
## (Intercept) 1435.7506    27.5796  52.058 < 0.0000000000000002 ***
## tax.charges    0.6872     0.1754   3.918             0.000294 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 57.01 on 46 degrees of freedom
## Multiple R-squared:  0.2503, Adjusted R-squared:  0.234 
## F-statistic: 15.35 on 1 and 46 DF,  p-value: 0.0002939

mdl.uc.p = pf(summary(mdl.uc)$fstatistic[1],
              summary(mdl.uc)$fstatistic[2],
              summary(mdl.uc)$fstatistic[3],
              lower.tail = F)
mdl.uc.t.int = summary(mdl.uc)$coefficients[1, 4]
mdl.uc.t.p = summary(mdl.uc)$coefficients[2, 4]

Are the model parameters significant?

Conclusion: The model parameters – intercept and tax variable are significant.

Does the linear regression model suffer from non-normality?

mdl.uc.jb.test = tseries::jarque.bera.test(mdl.uc$residuals)

mdl.uc.jb.p = mdl.uc.jb.test$p.value
mdl.uc.jb.x = mdl.uc.jb.test$statistic

Conclusion: Since the p-value = 0.3885549 > 0.05 and the Jarque-Bera test statistic is small (1.89), there is insufficient evidence to reject the null of normality, this suggests that the data follows a normal distribution.

Does the linear regression model suffer from heteroskedasticity in the errors?

mdl.uc.bp.test = lmtest::bptest(mdl.uc)

mdl.uc.bp.p = mdl.uc.bp.test$p.value

Conclusion: Since the p-value = 0.9709966 > 0.05, there is insufficient evidence to reject the null of homoskedasticity, which suggests that there is no evidence of heteroskedasticity.

1.2 Run Moran’s test on OLS residuals and the dependent variable

Run Moran’s test on the OLS residuals:

mdl.uc.morans.test = lm.morantest(model = mdl.uc,
                                  listw = used.cars.lw.w)

mdl.uc.morans.p = mdl.uc.morans.test$p.value
mdl.uc.morans.i = mdl.uc.morans.test$statistic[1]
mdl.uc.morans.e = mdl.uc.morans.test$statistic[2]

Conclusion: Since the p-value = 0 < 0.05 and $I$ = 6.3868735 > NA = $E[I]$, there is evidence of significant spatial autocorrelation in the OLS residuals based on Moran’s test for residuals. Recommend to consider SEM.

1.3 Run Moran’s test on the dependent variable ‘P’

(lag.uc.price = as.matrix(lag.listw(x = used.cars.lw.w,
                                    var = used.cars$price.1960)))

##           [,1]
##  [1,] 1480.250
##  [2,] 1625.200
##  [3,] 1479.833
##  [4,] 1621.333
##  [5,] 1593.571
##  [6,] 1557.000
##  [7,] 1510.667
##  [8,] 1471.000
##  [9,] 1476.600
## [10,] 1631.333
## [11,] 1482.200
## [12,] 1477.000
## [13,] 1537.833
## [14,] 1531.500
## [15,] 1476.714
## [16,] 1480.000
## [17,] 1539.000
## [18,] 1500.750
## [19,] 1537.600
## [20,] 1466.333
## [21,] 1568.750
## [22,] 1475.500
## [23,] 1510.375
## [24,] 1631.000
## [25,] 1582.667
## [26,] 1629.000
## [27,] 1559.333
## [28,] 1527.333
## [29,] 1566.600
## [30,] 1528.800
## [31,] 1476.000
## [32,] 1606.000
## [33,] 1471.800
## [34,] 1542.333
## [35,] 1624.500
## [36,] 1507.667
## [37,] 1531.500
## [38,] 1479.000
## [39,] 1609.500
## [40,] 1466.125
## [41,] 1518.000
## [42,] 1630.500
## [43,] 1551.667
## [44,] 1467.600
## [45,] 1643.000
## [46,] 1478.200
## [47,] 1550.500
## [48,] 1621.333

moran.plot(x = used.cars$price.1960,
           listw = used.cars.lw.w,
           main = "Moran's Scatter Plot - Price of Used Cars",
           labels = F,
           xlab = "Price",
           ylab = "Spatially lagged Price")

From Moran’s scatter plot, there seems to be positive spatial autocorrelation in the dependent variable ‘Price of Used Cars’ with most points in the High-High and Low-Low quadrants.

mdl.uc.moran.price = moran.test(x = used.cars$price.1960,
                                listw = used.cars.lw.w)

mdl.uc.moran.price.p = mdl.uc.moran.price$p.value
mdl.uc.moran.price.i = mdl.uc.moran.price$estimate[1]
mdl.uc.moran.price.e = mdl.uc.moran.price$estimate[2]

Conclusion: Since p-value = 0 < 0.05, there is evidence of significant spatial autocorrelation in the dependent variable ‘Price of Used Cars’ based on Moran’s test for residuals. Recommend to consider SLM or SAR.

1.4 Run Spatial Regression Models

Pure spatial autogressive model (SAR)

uc.sar = spautolm(formula = price.1960 ~ 1,
                  data = used.cars,
                  listw = used.cars.lw.w)
summary(uc.sar)

## 
## Call: 
## spautolm(formula = price.1960 ~ 1, data = used.cars, listw = used.cars.lw.w)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -76.152 -18.221   5.249  21.631  63.498 
## 
## Coefficients: 
##             Estimate Std. Error z value              Pr(>|z|)
## (Intercept) 1542.607     27.321  56.462 < 0.00000000000000022
## 
## Lambda: 0.82919 LR test value: 54.202 p-value: 0.00000000000018086 
## Numerical Hessian standard error of lambda: 0.065233 
## 
## Log likelihood: -240.977 
## ML residual variance (sigma squared): 1045.4, (sigma: 32.333)
## Number of observations: 48 
## Number of parameters estimated: 3 
## AIC: NA (not available for weighted model)

Conclusion: The pure spatial lag parameter is highly significant.

Spatial lag model (SLM)

uc.slm = lagsarlm(formula = price.1960 ~ tax.charges,
                  data = used.cars,
                  listw = used.cars.lw.w)
summary(uc.slm)

## 
## Call:lagsarlm(formula = price.1960 ~ tax.charges, data = used.cars, 
##     listw = used.cars.lw.w)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -77.6781 -16.9504   4.2497  19.5484  58.9811 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) 309.42546  123.03167  2.5150   0.0119
## tax.charges   0.16711    0.10212  1.6364   0.1018
## 
## Rho: 0.78302, LR test value: 42.681, p-value: 0.000000000064426
## Asymptotic standard error: 0.081636
##     z-value: 9.5916, p-value: < 0.000000000000000222
## Wald statistic: 91.998, p-value: < 0.000000000000000222
## 
## Log likelihood: -239.8252 for lag model
## ML residual variance (sigma squared): 1036.7, (sigma: 32.197)
## Number of observations: 48 
## Number of parameters estimated: 4 
## AIC: 487.65, (AIC for lm: 528.33)
## LM test for residual autocorrelation
## test value: 2.1141, p-value: 0.14594

Conclusion: The spatial lag parameter is highly significant but not the tax parameter as opposed to the simple linear model.

Spatial error model (SEM)

uc.sem = errorsarlm(formula = price.1960 ~ tax.charges,
                    data = used.cars,
                    listw = used.cars.lw.w)
summary(uc.sem)

## 
## Call:errorsarlm(formula = price.1960 ~ tax.charges, data = used.cars, 
##     listw = used.cars.lw.w)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -74.824 -17.459   2.406  21.278  64.597 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##                Estimate  Std. Error z value            Pr(>|z|)
## (Intercept) 1528.345377   31.962601 47.8167 <0.0000000000000002
## tax.charges    0.088309    0.119233  0.7406              0.4589
## 
## Lambda: 0.819, LR test value: 40.899, p-value: 0.0000000001603
## Asymptotic standard error: 0.074051
##     z-value: 11.06, p-value: < 0.000000000000000222
## Wald statistic: 122.32, p-value: < 0.000000000000000222
## 
## Log likelihood: -240.7163 for error model
## ML residual variance (sigma squared): 1043.9, (sigma: 32.309)
## Number of observations: 48 
## Number of parameters estimated: 4 
## AIC: NA (not available for weighted model), (AIC for lm: 528.33)

Conclusion: The spatial lag parameter is highly significant but not the tax paramenter as opposed to the simple linear model.

SARAR(1,1)

uc.sarar = sacsarlm(formula = price.1960 ~ tax.charges,
                    data = used.cars,
                    listw = used.cars.lw.w)
summary(uc.sarar)

## 
## Call:sacsarlm(formula = price.1960 ~ tax.charges, data = used.cars, 
##     listw = used.cars.lw.w)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -74.0183 -16.7023   6.2859  19.5354  48.9003 
## 
## Type: sac 
## Coefficients: (asymptotic standard errors) 
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) 139.123986  90.854598  1.5313   0.1257
## tax.charges   0.120064   0.079744  1.5056   0.1322
## 
## Rho: 0.89818
## Asymptotic standard error: 0.061727
##     z-value: 14.551, p-value: < 0.000000000000000222
## Lambda: -0.49184
## Asymptotic standard error: 0.27454
##     z-value: -1.7915, p-value: 0.073214
## 
## LR test value: 44.889, p-value: 0.00000000017884
## 
## Log likelihood: -238.7213 for sac model
## ML residual variance (sigma squared): 833.94, (sigma: 28.878)
## Number of observations: 48 
## Number of parameters estimated: 5 
## AIC: 487.44, (AIC for lm: 528.33)

Conclusion: Only the spatial lag parameter is highly significant. The spatial error parameter is not significant suggesting that SARAR(1,1) is not a suitable model. This outcome largely confirms the outcome of the LM test for residual autocorrelation done in conjunction with the estimation of the SLM parameters.

2 Decide on the best model

2.1 Run LM and RLM to test alternative hypotheses

Run the LM and RLM test to determine whether the SLM model or the SEM model is a better fit for the data.

lm.LMtests(model = mdl.uc,
           listw = used.cars.lw.w,
           test = "all")

## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = price.1960 ~ tax.charges, data = used.cars)
## weights: used.cars.lw.w
## 
## LMerr = 31.793, df = 1, p-value = 0.00000001715
## 
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = price.1960 ~ tax.charges, data = used.cars)
## weights: used.cars.lw.w
## 
## LMlag = 40.664, df = 1, p-value = 0.0000000001808
## 
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = price.1960 ~ tax.charges, data = used.cars)
## weights: used.cars.lw.w
## 
## RLMerr = 0.051751, df = 1, p-value = 0.82
## 
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = price.1960 ~ tax.charges, data = used.cars)
## weights: used.cars.lw.w
## 
## RLMlag = 8.9229, df = 1, p-value = 0.002816
## 
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = price.1960 ~ tax.charges, data = used.cars)
## weights: used.cars.lw.w
## 
## SARMA = 40.716, df = 2, p-value = 0.000000001441

uc.LMerr.p = lm.LMtests(model = mdl.uc,
                     listw = used.cars.lw.w,
                     test = "LMerr")$p.value
uc.LMlag.p = lm.LMtests(model = mdl.uc,
                     listw = used.cars.lw.w,
                     test = "LMlag")$p.value
uc.RLMerr.p = lm.LMtests(model = mdl.uc,
                      listw = used.cars.lw.w,
                      test = "RLMerr")$p.value
uc.RLMlag.p = lm.LMtests(model = mdl.uc,
                      listw = used.cars.lw.w,
                      test = "RLMlag")$p.value

Conclusion: Considering the robust version of the tests, we can conclude that the SLM specification is significant and fits the data well (SLM p-value = ) < 0.05. We have sufficient evidence to reject the SEM specification (SEM p-value = ) > 0.05.

However, the LM test is unable to compare the SLM model against the comprehensive SARAR(1,1) model and the SAR model. We will use the LR test to determine the best model.

2.2 Run LR to test alternative hypotheses

LR Tests ($H_0$: OLS)

$H_0$: OLS ; $H_1$: SLM

Conclusion: Reject $H_0$.

LR1.Sarlm(uc.slm)

## 
##  Likelihood Ratio diagnostics for spatial dependence
## 
## data:  
## Likelihood ratio = 42.681, df = 1, p-value = 0.00000000006443
## sample estimates:
## Log likelihood of spatial lag model         Log likelihood of OLS fit y 
##                           -239.8252                           -261.1658

uc.ols.slm.p = LR1.Sarlm(uc.slm)$p.value

$H_0$: OLS ; $H_1$: SARAR(1,1)

Conclusion: Reject $H_0$.

LR1.Sarlm(uc.sarar)

## 
##  Likelihood Ratio diagnostics for spatial dependence
## 
## data:  
## Likelihood ratio = 44.889, df = 2, p-value = 0.0000000001788
## sample estimates:
## Log likelihood of spatial lag model         Log likelihood of OLS fit y 
##                           -238.7213                           -261.1658

uc.ols.sarar.p = LR1.Sarlm(uc.sarar)$p.value

Comprehensive LR Tests

$H_0$: SEM ; $H_1$: SARAR(1,1)

Conclusion: Reject $H_0$.

LR.Sarlm(uc.sem,
         uc.sarar)

## 
##  Likelihood ratio for spatial linear models
## 
## data:  
## Likelihood ratio = -3.99, df = 1, p-value = 0.04577
## sample estimates:
##   Log likelihood of uc.sem Log likelihood of uc.sarar 
##                  -240.7163                  -238.7213

uc.sem.sarar.p = LR.Sarlm(uc.sem, uc.sarar)$p.value

$H_0$: SLM ; $H_1$: SARAR(1,1)

Conclusion: Do not reject $H_0$.

LR.Sarlm(uc.slm,
         uc.sarar)

## 
##  Likelihood ratio for spatial linear models
## 
## data:  
## Likelihood ratio = -2.2078, df = 1, p-value = 0.1373
## sample estimates:
##   Log likelihood of uc.slm Log likelihood of uc.sarar 
##                  -239.8252                  -238.7213

uc.slm.sarar.p = LR.Sarlm(uc.slm, uc.sarar)$p.value

$H_0$ : SAR ; $H_1$: SLM

Conclusion: Do not reject $H_0$.

LR.Sarlm(uc.sar,
         uc.slm)

## 
##  Likelihood ratio for spatial linear models
## 
## data:  
## Likelihood ratio = -2.3037, df = 1, p-value = 0.1291
## sample estimates:
## Log likelihood of uc.sar Log likelihood of uc.slm 
##                -240.9770                -239.8252

uc.sar.slm.p = LR.Sarlm(uc.sar, uc.slm)$p.value

Results

H0	H1	p-value	Decision at 5% significance level
OLS	SLM	0.0000000	Reject
OLS	SARAR(1,1)	0.0000000	Reject
SEM	SARAR(1,1)	0.0457715	Reject
SLM	SARAR(1,1)	0.1373141	Do not reject
SAR	SLM	0.1290696	Do not reject

Conclusion: Based on the tests, it seems that the pure SAR is the most suitable model for this dataset. Thus, the price of used cars are possibly closly related to the price of used cars in the neighbouring states that the amount of taxes and delivery charges of new cars of each state.

ECON6027 Cheatsheet

Gertrude Cheung

2023-11-09

Packages & libraries

Initialising libraries

Lecture 2b

Lecture 6a

1 Working with GAL files

1.1 Create a GAL file from scratch

1.2 Create a GAL file from nb and plot the nb object

1.3 Create GAL file from nb

2 Create neighbours

2.1 Create contiguity neighbours

2.2 “k” nearest neighbours

2.3 Neighbours within a certain metric distance

3 Create a weights list object

Lecture 6b

Preparation of data and creating nb object

1 Join Count Analysis

2 Moran’s I

3 Geary’s C statistic

4 Getis-Ord’s G statistic

5 Correlogram

Lecture 7a

Preparation of data

1 Choosing the best model

1.1 Setting up the models

1.2 Test for Goodness-of-Fit

1.3 Testing for Endogeneity between independent variables

1.4 Test for Sphericity of Disturbance (Homoskedasticity)

1.5 Test for Normality of Disturbances

2 Test for lagged variables

2.1 Test for lagged dependent variables

2.2 Test for lagged residual variables

Lecture 7b

1 Regression Modelling of Spatial Relationships

1.1 Determine the OLS fit

1.2 Run Moran’s test on OLS residuals and the dependent variable

1.3 Run Moran’s test on the dependent variable ‘P’

1.4 Run Spatial Regression Models

2 Decide on the best model

2.1 Run LM and RLM to test alternative hypotheses

2.2 Run LR to test alternative hypotheses