Spatial
install.packages(“sf”, dependencies = T)
install.packages(“sp”, dependencies = T)
install.packages(“spData”, dependencies = T)
install.packages(“spdep”, dependencies = T)
install.packages(“spatialreg”, dependencies = T)
install.packages(“tmap”, dependencies = T)
Non-spatial
install.packages(“tidyverse”, dependencies = T)
install.packages(“kableExtra”, dependencies = T)
Key points:
Key points:
The figure shoes the boundaries of the 8 Romanian NUTS2 regions. Create a GAL file and read neighbours from the file.
GAL file created:
Read neighbours from GAL file:
## Neighbour list object:
## Number of regions: 8
## Number of nonzero links: 26
## Percentage nonzero weights: 40.625
## Average number of links: 3.25
Create a GAL file from nb. Plot the nb object.
## Neighbour list object:
## Number of regions: 12
## Number of nonzero links: 42
## Percentage nonzero weights: 29.16667
## Average number of links: 3.5
## [1] "nb"
## Load the shapefile data that the nb object will be anchored on
(uk.boundaries = st_read("econ6027 cheatsheet/6a-1.2-shp/NUTS_Level_1__January_2018__Boundaries.shp"))## Reading layer `NUTS_Level_1__January_2018__Boundaries' from data source
## `/Users/gertrude/Documents/01 university/02 smu mse/04 classes/econ6027 | spatial econometrics and data analysis/econ6027 cheatsheet/6a-1.2-shp/NUTS_Level_1__January_2018__Boundaries.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 12 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -70.2116 ymin: 7054.2 xmax: 655604.7 ymax: 1220302
## Projected CRS: OSGB36 / British National Grid
## Simple feature collection with 12 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -70.2116 ymin: 7054.2 xmax: 655604.7 ymax: 1220302
## Projected CRS: OSGB36 / British National Grid
## First 10 features:
## objectid nuts118cd nuts118nm bng_e bng_n long lat
## 1 1 UKC North East (England) 417313 600358 -1.728900 55.29703
## 2 2 UKD North West (England) 350015 506280 -2.772370 54.44945
## 3 3 UKE Yorkshire and The Humber 446903 448736 -1.287120 53.93264
## 4 4 UKF East Midlands (England) 477660 322635 -0.849670 52.79572
## 5 5 UKG West Midlands (England) 386294 295477 -2.203580 52.55697
## 6 6 UKH East of England 571074 263229 0.504146 52.24067
## 7 7 UKI London 517516 178392 -0.308640 51.49227
## 8 8 UKJ South East (England) 470062 172924 -0.993110 51.45097
## 9 9 UKK South West (England) 285015 102567 -3.633430 50.81119
## 10 10 UKL Wales 263406 242881 -3.994160 52.06741
## st_areasha st_lengths geometry
## 1 8609938893 657578.2 MULTIPOLYGON (((397942.9 65...
## 2 14182609113 1063052.7 MULTIPOLYGON (((357136.2 58...
## 3 15432322860 863264.1 MULTIPOLYGON (((480000 5176...
## 4 15658181358 889656.3 MULTIPOLYGON (((516022.7 41...
## 5 13002149549 774627.1 MULTIPOLYGON (((409403 3657...
## 6 19155125591 1083693.3 MULTIPOLYGON (((599964.1 34...
## 7 1585475935 270855.1 MULTIPOLYGON (((537543.8 19...
## 8 19115903194 1454554.9 MULTIPOLYGON (((493941.1 25...
## 9 23979440241 1647103.7 MULTIPOLYGON (((416007.6 24...
## 10 20820253175 1552749.7 MULTIPOLYGON (((314837 3813...
## Return the points that represents a centroid of a geometry
uk.coords = st_centroid(st_geometry(uk.boundaries))
## Plot the (i) boundary; (ii) nb file; and (iii) centroid
plot(st_geometry(uk.boundaries),
main = "Plot nb object from GAL file, anchored onto shapefile",
border = "grey60",
axes = T)
plot(uk.nb, uk.coords,
pch = 20,
cex = 0.6,
add = T)Key points:
We will use the Singapore and UK planning areas in Sections 2 and 3 of this chapter. Load the datasets and assign a point that would lie within each polygon.
## Reading layer `MySingapura' from data source
## `/Users/gertrude/Documents/01 university/02 smu mse/04 classes/econ6027 | spatial econometrics and data analysis/econ6027 cheatsheet/6a-1.3-shp/MySingapura.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 55 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## Projected CRS: SVY21
## Simple feature collection with 55 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## Projected CRS: SVY21
## First 10 features:
## OBJECTID PLN_AREA_N PLN_AREA_C CA_IND REGION_N REGION_C
## 1 1 BISHAN BS N CENTRAL REGION CR
## 2 2 BUKIT BATOK BK N WEST REGION WR
## 3 3 BUKIT MERAH BM N CENTRAL REGION CR
## 4 4 BUKIT PANJANG BP N WEST REGION WR
## 5 5 BUKIT TIMAH BT N CENTRAL REGION CR
## 6 6 CENTRAL WATER CATCHMENT CC N NORTH REGION NR
## 7 7 CHANGI CH N EAST REGION ER
## 8 8 CHANGI BAY CB N EAST REGION ER
## 9 9 CHOA CHU KANG CK N WEST REGION WR
## 10 10 CLEMENTI CL N WEST REGION WR
## INC_CRC FMEL_UPD_D X_ADDR Y_ADDR SHAPE_Leng SHAPE_Area
## 1 BA616285F402846F 2014-12-05 28789.76 37450.89 13517.12 7618921
## 2 FB44C870B04B7F57 2014-12-05 19255.42 37527.65 15234.22 11133256
## 3 738B479882E4EE28 2014-12-05 26865.78 28662.87 29156.29 14462472
## 4 4A9C6E6BAF7BE998 2014-12-05 21287.04 38761.84 15891.85 9019940
## 5 C893AEAD20F42559 2014-12-05 23256.76 34689.00 22492.84 17526654
## 6 52D0068508B0348A 2014-12-05 24424.42 39849.05 30538.25 37147854
## 7 70EAFFC7C76EA5AD 2014-12-05 46307.48 36991.43 31342.12 40940490
## 8 ACFD43D88C72381C 2014-12-05 49502.49 34316.54 18731.56 1829822
## 9 F361929FAF5E9611 2014-12-05 18415.05 40833.42 11518.81 6117294
## 10 8A5BEDB748EAA0AA 2014-12-05 19923.36 33319.17 15024.87 9516228
## geometry
## 1 MULTIPOLYGON (((29772.19 38...
## 2 MULTIPOLYGON (((20294.46 39...
## 3 MULTIPOLYGON (((26228.63 30...
## 4 MULTIPOLYGON (((21448.72 41...
## 5 MULTIPOLYGON (((24031.39 36...
## 6 MULTIPOLYGON (((25073.29 43...
## 7 MULTIPOLYGON (((44586.3 417...
## 8 MULTIPOLYGON (((48860.11 34...
## 9 MULTIPOLYGON (((18349.06 43...
## 10 MULTIPOLYGON (((19680.06 31...
sg.coords = st_point_on_surface(st_geometry(sg_pl))
## UK
(uk_pl = st_read("econ6027 cheatsheet/6a-1.2-shp/NUTS_Level_1__January_2018__Boundaries.shp"))## Reading layer `NUTS_Level_1__January_2018__Boundaries' from data source
## `/Users/gertrude/Documents/01 university/02 smu mse/04 classes/econ6027 | spatial econometrics and data analysis/econ6027 cheatsheet/6a-1.2-shp/NUTS_Level_1__January_2018__Boundaries.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 12 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -70.2116 ymin: 7054.2 xmax: 655604.7 ymax: 1220302
## Projected CRS: OSGB36 / British National Grid
## Simple feature collection with 12 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -70.2116 ymin: 7054.2 xmax: 655604.7 ymax: 1220302
## Projected CRS: OSGB36 / British National Grid
## First 10 features:
## objectid nuts118cd nuts118nm bng_e bng_n long lat
## 1 1 UKC North East (England) 417313 600358 -1.728900 55.29703
## 2 2 UKD North West (England) 350015 506280 -2.772370 54.44945
## 3 3 UKE Yorkshire and The Humber 446903 448736 -1.287120 53.93264
## 4 4 UKF East Midlands (England) 477660 322635 -0.849670 52.79572
## 5 5 UKG West Midlands (England) 386294 295477 -2.203580 52.55697
## 6 6 UKH East of England 571074 263229 0.504146 52.24067
## 7 7 UKI London 517516 178392 -0.308640 51.49227
## 8 8 UKJ South East (England) 470062 172924 -0.993110 51.45097
## 9 9 UKK South West (England) 285015 102567 -3.633430 50.81119
## 10 10 UKL Wales 263406 242881 -3.994160 52.06741
## st_areasha st_lengths geometry
## 1 8609938893 657578.2 MULTIPOLYGON (((397942.9 65...
## 2 14182609113 1063052.7 MULTIPOLYGON (((357136.2 58...
## 3 15432322860 863264.1 MULTIPOLYGON (((480000 5176...
## 4 15658181358 889656.3 MULTIPOLYGON (((516022.7 41...
## 5 13002149549 774627.1 MULTIPOLYGON (((409403 3657...
## 6 19155125591 1083693.3 MULTIPOLYGON (((599964.1 34...
## 7 1585475935 270855.1 MULTIPOLYGON (((537543.8 19...
## 8 19115903194 1454554.9 MULTIPOLYGON (((493941.1 25...
## 9 23979440241 1647103.7 MULTIPOLYGON (((416007.6 24...
## 10 20820253175 1552749.7 MULTIPOLYGON (((314837 3813...
Singapore
## Neighbour list object:
## Number of regions: 55
## Number of nonzero links: 258
## Percentage nonzero weights: 8.528926
## Average number of links: 4.690909
## 1 region with no links:
## 38
## Neighbour list object:
## Number of regions: 55
## Number of nonzero links: 258
## Percentage nonzero weights: 8.528926
## Average number of links: 4.690909
## 1 region with no links:
## 38
par(mfrow = c(1, 2))
plot(st_geometry(sg_pl),
main = "Queen Contiguity (SG)",
border = "grey60",
axes = T)
plot(sg.nb.qn, sg.coords,
pch = 19,
cex = 0.6,
add = T)
plot(st_geometry(sg_pl),
main = "Rook Contiguity (SG)",
border = "grey60",
axes = T)
plot(sg.nb.rk, sg.coords,
pch = 19,
cex = 0.6,
add = T)## [1] TRUE
Conclusion: For Singapore’s planning areas, queen contiguity and rook contiguity produces the same results.
United Kingdom
par(mfrow = c(1, 2))
plot(st_geometry(uk_pl),
main = "Queen Contiguity (UK)",
border = "grey60",
axes = T)
plot(uk.nb.qn, uk.coords,
pch = 19,
cex = 0.6,
add = T)
plot(st_geometry(uk_pl),
main = "Rook Contiguity (UK)",
border = "grey60",
axes = T)
plot(uk.nb.rk, uk.coords,
pch = 19,
cex = 0.6,
add = T)## [1] TRUE
Conclusion: For the United Kingdom’s planning areas, queen contiguity and rook contiguity produces the same results.
Singapore
IDs = row.names(sg_pl)
sg.nb.k.1 = knn2nb(knearneigh(sg.coords,
k = 1),
row.names = IDs)
sg.nb.k.2 = knn2nb(knearneigh(sg.coords,
k = 2),
row.names = IDs)
sg.nb.k.3 = knn2nb(knearneigh(sg.coords,
k = 3),
row.names = IDs)par(mfrow = c(1, 3))
plot(st_geometry(sg_pl),
main = "k = 1 (SG)",
border = "grey60",
axes = T)
plot(sg.nb.k.1, sg.coords,
pch = 19,
cex = 0.6,
add = T)
plot(st_geometry(sg_pl),
main = "k = 2 (SG)",
border = "grey60",
axes = T)
plot(sg.nb.k.2, sg.coords,
pch = 19,
cex = 0.6,
add = T)
plot(st_geometry(sg_pl),
main = "k = 3 (SG)",
border = "grey60",
axes = T)
plot(sg.nb.k.3, sg.coords,
pch = 19,
cex = 0.6,
add = T) United Kingdom
IDs = row.names(uk_pl)
uk.nb.k.1 = knn2nb(knearneigh(uk.coords,
k = 1),
row.names = IDs)
uk.nb.k.2 = knn2nb(knearneigh(uk.coords,
k = 2),
row.names = IDs)
uk.nb.k.3 = knn2nb(knearneigh(uk.coords,
k = 3),
row.names = IDs)par(mfrow = c(1, 3))
plot(st_geometry(uk_pl),
main = "k = 1 (UK)",
border = "grey60",
axes = T)
plot(uk.nb.k.1, uk.coords,
pch = 19,
cex = 0.6,
add = T)
plot(st_geometry(uk_pl),
main = "k = 2 (UK)",
border = "grey60",
axes = T)
plot(uk.nb.k.2, uk.coords,
pch = 19,
cex = 0.6,
add = T)
plot(st_geometry(uk_pl),
main = "k = 3 (UK)",
border = "grey60",
axes = T)
plot(uk.nb.k.3, uk.coords,
pch = 19,
cex = 0.6,
add = T) Singapore
## [1] "m"
sg.nb.d.1 = dnearneigh(sg.coords,
d1 = 0,
d2 = 1000,
row.names = IDs)
sg.nb.d.2 = dnearneigh(sg.coords,
d1 = 0,
d2 = 5000,
row.names = IDs)
sg.nb.d.3 = dnearneigh(sg.coords,
d1 = 0,
d2 = 10000,
row.names = IDs)par(mfrow = c(1, 3))
plot(st_geometry(sg_pl),
main = "Within 1km (SG)",
border = "grey60",
axes = T)
plot(sg.nb.d.1, sg.coords,
pch = 19,
cex = 0.6,
add = T)
plot(st_geometry(sg_pl),
main = "Within 5km (SG)",
border = "grey60",
axes = T)
plot(sg.nb.d.2, sg.coords,
pch = 19,
cex = 0.6,
add = T)
plot(st_geometry(sg_pl),
main = "Within 10km (SG)",
border = "grey60",
axes = T)
plot(sg.nb.d.3, sg.coords,
pch = 19,
cex = 0.6,
add = T)United Kingdom
## [1] "m"
uk.nb.d.1 = dnearneigh(uk.coords,
d1 = 0,
d2 = 100000,
row.names = IDs)
uk.nb.d.2 = dnearneigh(uk.coords,
d1 = 0,
d2 = 300000,
row.names = IDs)
uk.nb.d.3 = dnearneigh(uk.coords,
d1 = 0,
d2 = 500000,
row.names = IDs)par(mfrow = c(1, 3))
plot(st_geometry(uk_pl),
main = "Within 100km (UK)",
border = "grey60",
axes = T)
plot(uk.nb.d.1, uk.coords,
pch = 19,
cex = 0.6,
add = T)
plot(st_geometry(uk_pl),
main = "Within 300km (UK)",
border = "grey60",
axes = T)
plot(uk.nb.d.2, uk.coords,
pch = 19,
cex = 0.6,
add = T)
plot(st_geometry(uk_pl),
main = "Within 500km (UK)",
border = "grey60",
axes = T)
plot(uk.nb.d.3, uk.coords,
pch = 19,
cex = 0.6,
add = T)Key points:
| Style options | Description |
|---|---|
| B | Basic binary coding |
| W | Row normalised (sums over all row links to n) |
| C | Globally standardised (sums all links to n) |
| U | Divided by the number of neighbours (sums over all links to unity) |
United Kingdom
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 12
## Number of nonzero links: 42
## Percentage nonzero weights: 29.16667
## Average number of links: 3.5
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 12 144 12 7.751111 50.16889
## [1] "style" "neighbours" "weights"
## [1] "names" "class" "region.id" "call" "GeoDa"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1667 0.2000 0.2667 0.2857 0.3333 1.0000
Singapore
An error is returned when zero.policy was set to ‘F’, as there are empty neighbour sets found under queen contiguity.
# (sg.qn.lw = nb2listw(neighbours = sg.nb.qn,
# style = "W",
# zero.policy = T))
print(sg.nb.qn,
zero.policy = T)## Neighbour list object:
## Number of regions: 55
## Number of nonzero links: 258
## Percentage nonzero weights: 8.528926
## Average number of links: 4.690909
## 1 region with no links:
## 38
We will use the ‘eire’ dataset in the package ‘spData’ in this chapter. Carry out the usual inspections: (1) Load the dataset, and check if all entries are valid and projected; (2) create the nb object using queen contiguity; (3) identify the coordinates of a point that lies in the polygon; (4) plot the base shape; and (5) create the weights list based on binary style and row normalised style.
The ‘eire’ dataset gives the counties of the Irish Republic. Description of ‘eire’ dataset:
| Variable | Description |
|---|---|
| A | Percentage of sample with blood gorup A |
| towns | Towns/unit area |
| pale | Beyond the Pale 0, within the Pale 1 |
| size | Number of blood type samples |
| ROADACC | Arterial road network accessibility in 1961 |
| OWNCONS | Percentage in value terms of gross agricultural output of each country consumed by itself |
| POPCHG | 1961 population as percentage of 1926 |
| RETSALE | Value of retail sales British Pound000 |
| INCOME | Total personal income British Pound000 |
| names | County names |
Since there is no CRS attached to the dataset, we will need to attach a CRS and ensure that the data is projected. We will used “+proj=UTM”, where we derive the “+zone” using coordinates: latitude – 53.7798N; longitude – 7.3055W. \(UTM Zone = 31 -\frac{7.3055}{6} = 29.78242N\) \(\Rightarrow\) “+zone=29”.
## Reading layer `eire' from data source
## `/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/spData/shapes/eire.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 26 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -4.12 ymin: 5768 xmax: 300.82 ymax: 6119.25
## CRS: NA
## Simple feature collection with 26 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -4.12 ymin: 5768 xmax: 300.82 ymax: 6119.25
## CRS: NA
## First 10 features:
## A towns pale size ROADACC OWNCONS POPCHG RETSALE INCOME names
## 1 34.20 0.12 1 1087 3664 8.6 97 2962 7185 Carlow
## 2 29.68 0.01 0 2133 5000 15.0 69 4452 9459 Cavan
## 3 26.54 0.01 0 535 4321 19.0 78 3460 12435 Clare
## 4 23.92 0.03 0 1476 4118 9.0 90 28402 65901 Cork
## 5 27.91 0.03 0 989 7500 27.0 75 7478 17626 Donegal
## 6 32.79 0.61 1 18105 3078 9.4 142 89424 164631 Dublin
## 7 26.53 0.02 0 2254 4537 21.9 88 8972 26950 Galway
## 8 25.14 0.01 0 740 5140 17.0 78 6341 20510 Kerry
## 9 30.69 0.39 1 1437 3200 9.0 111 4803 14703 Kildare
## 10 31.41 0.18 1 1885 3708 8.0 87 4321 13585 Kilkenny
## geometry
## 1 MULTIPOLYGON (((240.62 5885...
## 2 MULTIPOLYGON (((192.16 5986...
## 3 MULTIPOLYGON (((145.23 5901...
## 4 MULTIPOLYGON (((83.48 5845....
## 5 MULTIPOLYGON (((150.41 6040...
## 6 MULTIPOLYGON (((270.77 5929...
## 7 MULTIPOLYGON (((51.39 5970....
## 8 MULTIPOLYGON (((74.06 5871....
## 9 MULTIPOLYGON (((229.67 5932...
## 10 MULTIPOLYGON (((189.96 5878...
## Simple feature collection with 26 features and 10 fields
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: -4.12 ymin: 5768 xmax: 300.82 ymax: 6119.25
## CRS: NA
## First 10 features:
## A towns pale size ROADACC OWNCONS POPCHG RETSALE INCOME names
## 1 34.20 0.12 1 1087 3664 8.6 97 2962 7185 Carlow
## 2 29.68 0.01 0 2133 5000 15.0 69 4452 9459 Cavan
## 3 26.54 0.01 0 535 4321 19.0 78 3460 12435 Clare
## 4 23.92 0.03 0 1476 4118 9.0 90 28402 65901 Cork
## 5 27.91 0.03 0 989 7500 27.0 75 7478 17626 Donegal
## 6 32.79 0.61 1 18105 3078 9.4 142 89424 164631 Dublin
## 7 26.53 0.02 0 2254 4537 21.9 88 8972 26950 Galway
## 8 25.14 0.01 0 740 5140 17.0 78 6341 20510 Kerry
## 9 30.69 0.39 1 1437 3200 9.0 111 4803 14703 Kildare
## 10 31.41 0.18 1 1885 3708 8.0 87 4321 13585 Kilkenny
## geometry
## 1 POLYGON ((240.62 5885.61, 2...
## 2 POLYGON ((192.16 5986.94, 1...
## 3 POLYGON ((145.23 5901.12, 1...
## 4 POLYGON ((83.48 5845.19, 90...
## 5 POLYGON ((150.41 6040.45, 1...
## 6 POLYGON ((270.77 5929.79, 2...
## 7 POLYGON ((51.39 5970.01, 62...
## 8 POLYGON ((74.06 5871.1, 76....
## 9 POLYGON ((229.67 5932.56, 2...
## 10 POLYGON ((189.96 5878.34, 1...
##
## TRUE
## 26
## Coordinate Reference System: NA
eire = st_set_crs(x = eire,
"+proj=utm +zone=29 +ellps=airy +units=km")
st_crs(eire); st_crs(eire)$proj; st_crs(eire)$proj4string## Coordinate Reference System:
## User input: +proj=utm +zone=29 +ellps=airy +units=km
## wkt:
## PROJCRS["unknown",
## BASEGEOGCRS["unknown",
## DATUM["Unknown based on Airy 1830 ellipsoid",
## ELLIPSOID["Airy 1830",6377563.396,299.3249646,
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8901]]],
## CONVERSION["UTM zone 29N",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",-9,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",0.9996,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",500000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]],
## ID["EPSG",16029]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["kilometre",1000,
## ID["EPSG",9036]]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["kilometre",1000,
## ID["EPSG",9036]]]]
## [1] "utm"
## [1] "+proj=utm +zone=29 +ellps=airy +units=km +no_defs"
## Neighbour list object:
## Number of regions: 26
## Number of nonzero links: 114
## Percentage nonzero weights: 16.86391
## Average number of links: 4.384615
plot(st_geometry(eire),
main = "eire dataset - Neighbours created from Queen Contiguity",
border = "grey60",
axes = T)
plot(eire.nb.qn, eire.coords,
pch = 19,
cex = 0.6,
add = T)## List weights based on binary style ("B")
(eire.lw.b = nb2listw(neighbours = eire.nb,
style = "B"))## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 26
## Number of nonzero links: 114
## Percentage nonzero weights: 16.86391
## Average number of links: 4.384615
##
## Weights style: B
## Weights constants summary:
## n nn S0 S1 S2
## B 26 676 114 228 2288
## List weights based on row normalised style ("B")
(eire.lw.w = nb2listw(neighbours = eire.nb,
style = "W"))## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 26
## Number of nonzero links: 114
## Percentage nonzero weights: 16.86391
## Average number of links: 4.384615
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 26 676 26 13.2227 108.8578
## [1] 1 0 0 0 0 1 0 0 1 1 0 0 0 1 1 0 1 0 0 0 0 1 1 1 1 1
## [1] TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE TRUE FALSE FALSE
## [13] FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
## [25] TRUE TRUE
## Levels: FALSE TRUE
## FALSE TRUE
## 14 12
## Joincount Expected Variance z-value
## FALSE:FALSE 18.0000 15.9600 8.4175 0.7031
## TRUE:TRUE 18.0000 11.5754 6.8634 2.4523
## TRUE:FALSE 21.0000 29.4646 11.9130 -2.4524
## Jtot 21.0000 29.4646 11.9130 -2.4524
Conclusion: The observed BW count (21.000) is significantly less than the expected BW count (29.4646). Sufficient evidence to suggest significant clustering at 5% level of significance. Reject \(H_{0}\) at 5% level of significance.
Note: - At 10% significance level, a Z-score of <-1.65 or >+1.65 is considered significant - At 5% significance level, a Z-score of <-1.96 or >+1.96 is considered significant - At 1% significance level, a Z-score of <-2.58 or >+2.58 is considered significant
Key points:
tm_shape(eire) +
tm_polygons(col = "A",
palette = "YlGnBu",
alpha = 0.5) +
tm_layout(legend.outside = T)moran.plot(eire$A,
listw = eire.lw.w,
main = "Moran's Scatterplot for variable A",
labels = F,
xlab = "A",
ylab = "Spatially Lagged A")From the plot, we can visually confirm that variable A exhibits positive spatial autocorrelation.
##
## Moran I test under randomisation
##
## data: eire$A
## weights: eire.lw.w
##
## Moran I statistic standard deviate = 4.6851, p-value = 1.399e-06
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.55412382 -0.04000000 0.01608138
g.moran.p = moran.test(x = eire$A,
listw = eire.lw.w)$p.value
g.moran.i = moran.test(x = eire$A,
listw = eire.lw.w)$estimate[1]
g.moran.e.i = moran.test(x = eire$A,
listw = eire.lw.w)$estimate[2]Conclusion: Since 1.3993826^{-6} < 0.05 and 0.5541238 > -0.04. Reject the null hypothesis of no spatial autocorrelation in favour of positive spatial autocorrelation 5% significance level. There seem to be spatial patterning in the incidence of group A type blood at the county level.
Local Moran’s I
## Ii E.Ii Var.Ii Z.Ii Pr(z != E(Ii))
## Carlow 1.636481921 -0.1078854679 0.4170668396 2.7010656 0.006911771
## Cavan -0.005625461 -0.0001168927 0.0005064758 -0.2447707 0.806633993
## Clare 0.438557030 -0.0440391047 0.3344584256 0.8344739 0.404013977
## Cork 0.596073603 -0.1552187377 0.7457784490 0.8699695 0.384317051
## Donegal 0.136917303 -0.0128996974 0.3310656763 0.2603778 0.794572336
## Dublin 1.109745739 -0.0526106276 0.3959729542 1.8471686 0.064722709
tm_shape(eire) +
tm_polygons(col = "l.moran.z",
palette = "YlGnBu",
alpha = 0.5) +
tm_layout(main.title = "Local Moran's I - Z-test",
legend.outside = T)## [1] 0.5541238
Show that Moran’s I statistic is highly sensitive to the choice of spatial weights matrix by comparing the results of a row normalised weights list and the binary weights list.
##
## Moran I test under randomisation
##
## data: eire$A
## weights: eire.lw.w
##
## Moran I statistic standard deviate = 4.6851, p-value = 1.399e-06
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.55412382 -0.04000000 0.01608138
##
## Moran I test under randomisation
##
## data: eire$A
## weights: eire.lw.b
##
## Moran I statistic standard deviate = 4.4684, p-value = 3.94e-06
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.47944757 -0.04000000 0.01351367
The computed Moran’s I changes when the weight matrix is changed from row standardised to binary.
Key points:
tm_shape(eire) +
tm_polygons(col = "A",
palette = "YlGnBu",
alpha = 0.5) +
tm_layout(legend.outside = T,)##
## Geary C test under randomisation
##
## data: eire$A
## weights: eire.lw.w
##
## Geary C statistic standard deviate = 4.5146, p-value = 3.172e-06
## alternative hypothesis: Expectation greater than statistic
## sample estimates:
## Geary C statistic Expectation Variance
## 0.38011971 1.00000000 0.01885309
g.geary.p = geary.test(eire$A,
listw = eire.lw.w)$p.value
g.geary.c = geary.test(eire$A,
listw = eire.lw.w)$estimate[1]Conclusion: Since 3.1722482^{-6} < 0.05, reject the null hypothesis of no spatial autocorrelation (\(C = 1)\) at 5% significance level. There seem to be spatial patterning in the incidence of group A type blood at the county level.
Key points:
tm_shape(eire) +
tm_polygons(col = "A",
palette = "YlGnBu",
alpha = 0.5) +
tm_layout(legend.outside = T,)##
## Getis-Ord global G statistic
##
## data: eire$A
## weights: eire.lw.b
##
## standard deviate = 1.3199, p-value = 0.09344
## alternative hypothesis: greater
## sample estimates:
## Global G statistic Expectation Variance
## 1.787800e-01 1.753846e-01 6.617796e-06
Conclusion: Since 0.0934365 > 0.05, do not reject the null hypothesis at 5% significance level.
Local Getis-Ord’s G
tm_shape(eire) +
tm_polygons(col = "l.getisord.z",
alpha = 0.5) +
tm_layout(main.title = "Local Getis-Ord's G - Z-test",
legend.outside = T)## Variable(s) "l.getisord.z" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
correlogram.corr = sp.correlogram(neighbours = eire.nb.qn,
var = eire$A,
order = 3,
method = "corr",
style = "W")
correlogram.i = sp.correlogram(neighbours = eire.nb.qn,
var = eire$A,
order = 3,
method = "I",
style = "W")
correlogram.c = sp.correlogram(neighbours = eire.nb.qn,
var = eire$A,
order = 3,
method = "C",
style = "W")
par(mfrow = c(1, 3))
plot(correlogram.corr,
main = "Contiguity Lag Orders\nCorrelation")
plot(correlogram.i,
main = "Contiguity Lag Orders\nMoran's I")
plot(correlogram.c,
main = "Contiguity Lag Orders\nGeary's C")Conclusion: From the correlogram using “Correlation” method, a lag order of 2 might be more suitable. From the correlogram using “Moran’s I” method, a lag order of 1 might be more suitable. From the correlogram using “Geary’s C” method, a lag order of 2 might be more suitable.
We will use the ‘uk’ dataset from Lecture 6. Carry out the usual inspections: (1) Load the dataset, and check if all entries are valid and projected; (2) load the gal file and convert it into weights list; (3) identify the coordinates of a point that lies in the polygon; (4) plot the base shape; and (5) create the weights list based on binary style and row normalised style.
## Reading layer `uk_data' from data source
## `/Users/gertrude/Documents/01 university/02 smu mse/04 classes/econ6027 | spatial econometrics and data analysis/econ6027 cheatsheet/uk_data/uk_data.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 12 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -70.2116 ymin: 7054.2 xmax: 655604.7 ymax: 1220302
## Projected CRS: OSGB36 / British National Grid
## Simple feature collection with 12 features and 12 fields
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: -70.2116 ymin: 7054.2 xmax: 655604.7 ymax: 1220302
## Projected CRS: OSGB36 / British National Grid
## First 10 features:
## objectd nts118c nts118n bng_e bng_n long lat
## 1 1 UKC North East (England) 417313 600358 -1.728900 55.29703
## 2 2 UKD North West (England) 350015 506280 -2.772370 54.44945
## 3 3 UKE Yorkshire and The Humber 446903 448736 -1.287120 53.93264
## 4 4 UKF East Midlands (England) 477660 322635 -0.849670 52.79572
## 5 5 UKG West Midlands (England) 386294 295477 -2.203580 52.55697
## 6 6 UKH East of England 571074 263229 0.504146 52.24067
## 7 7 UKI London 517516 178392 -0.308640 51.49227
## 8 8 UKJ South East (England) 470062 172924 -0.993110 51.45097
## 9 9 UKK South West (England) 285015 102567 -3.633430 50.81119
## 10 10 UKL Wales 263406 242881 -3.994160 52.06741
## st_arsh st_lngt gross_v L_prod pct_bsb geometry
## 1 8609938893 657578.2 3.2 86.2 11.2 MULTIPOLYGON (((4e+05 65416...
## 2 14182609113 1063052.7 9.5 88.6 11.1 MULTIPOLYGON (((356937.8 58...
## 3 15432322860 863264.1 6.9 84.7 10.5 POLYGON ((480000 517670.2, ...
## 4 15658181358 889656.3 6.2 89.2 10.3 POLYGON ((516022.7 412210.9...
## 5 13002149549 774627.1 7.3 89.1 10.5 POLYGON ((409403 365710.8, ...
## 6 19155125591 1083693.3 8.7 96.8 10.5 MULTIPOLYGON (((595955.6 19...
## 7 1585475935 270855.1 21.6 139.7 14.6 POLYGON ((537543.8 199883.1...
## 8 19115903194 1454554.9 14.7 108.3 10.8 MULTIPOLYGON (((451299.1 96...
## 9 23979440241 1647103.7 7.7 89.8 9.6 MULTIPOLYGON (((89276.33 82...
## 10 20820253175 1552749.7 3.6 81.5 9.3 MULTIPOLYGON (((212996.6 19...
##
## TRUE
## 12
## Neighbour list object:
## Number of regions: 12
## Number of nonzero links: 42
## Percentage nonzero weights: 29.16667
## Average number of links: 3.5
## [1] "nb"
| Variable | Column | Description |
|---|---|---|
| GVA | gross_v | Gross Value Added |
| LP | L_prod | Labour Productivity |
| PBB | pct_bsb | Percentage of new business births |
Model 1: \(GVA_i = \beta_0 + \beta_1 LP_i +\beta_2 PBB_i + \epsilon_i\)
##
## Call:
## lm(formula = gross_v ~ L_prod + pct_bsb, data = uk.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1398 -0.9172 -0.4388 1.0958 2.5365
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -22.31118 3.38594 -6.589 0.000100 ***
## L_prod 0.27750 0.05346 5.191 0.000571 ***
## pct_bsb 0.42239 0.47243 0.894 0.394567
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.791 on 9 degrees of freedom
## Multiple R-squared: 0.9072, Adjusted R-squared: 0.8866
## F-statistic: 43.99 on 2 and 9 DF, p-value: 0.00002259
mdl1.p = pf(q = summary(mdl1)$fstatistic[1],
df1 = summary(mdl1)$fstatistic[2],
df2 = summary(mdl1)$fstatistic[3],
lower.tail = F)
mdl1.r2 = summary(mdl1)$r.squared
mdl1.r2a = summary(mdl1)$adj.r.squared
mdl1.lp.p = summary(mdl1)$coefficients[2, 4]
mdl1.pbb.p = summary(mdl1)$coefficients[3, 4]Conclusion: A p-value of 0.0000226 < 0.05, signifies that at the 5% confidence level, the results from Model 1 is significant. From the p-values of the independent variables, it suggest that ‘Labour Productivity’ is statistically significant at the 5% significance level (p-value = 0.0005708 < 0.05) but ‘Percentage of new business births’ is not statistically significant significance level (p-value = 0.3945672 > 0.05).
Model 2: \(GVA_i = \beta_0 + \beta_1 LP_i + \epsilon_i\)
##
## Call:
## lm(formula = gross_v ~ L_prod, data = uk.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.5300 -0.8375 -0.4193 1.0386 3.0149
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -21.38866 3.19237 -6.700 0.00005365 ***
## L_prod 0.31460 0.03335 9.432 0.00000271 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.773 on 10 degrees of freedom
## Multiple R-squared: 0.899, Adjusted R-squared: 0.8889
## F-statistic: 88.97 on 1 and 10 DF, p-value: 0.000002708
mdl2.p = pf(summary(mdl2)$fstatistic[1],
summary(mdl2)$fstatistic[2],
summary(mdl2)$fstatistic[3],
lower.tail = F)
mdl2.r2 = summary(mdl2)$r.squared
mdl2.r2a = summary(mdl2)$adj.r.squared
mdl2.lp.p = summary(mdl2)$coefficients[2, 4]Conclusion: A p-value of 0.0000027 < 0.05, signifies that at the 5% confidence level, the results from Model 2 is significant. From the p-values of the independent variables, it suggest that ‘Labour Productivity’ is statistically significant at the 5% significance level (p-value = 0.0000027 < 0.05).
Model 3: \(GVA_i = \beta_0 + \beta_1 PBB_i + \epsilon_i\)
##
## Call:
## lm(formula = gross_v ~ pct_bsb, data = uk.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.8006 -1.5308 -0.6353 1.8746 5.6300
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -16.0547 5.9992 -2.676 0.02325 *
## pct_bsb 2.3264 0.5646 4.121 0.00208 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.397 on 10 degrees of freedom
## Multiple R-squared: 0.6293, Adjusted R-squared: 0.5923
## F-statistic: 16.98 on 1 and 10 DF, p-value: 0.002075
mdl3.p = pf(summary(mdl3)$fstatistic[1],
summary(mdl3)$fstatistic[2],
summary(mdl3)$fstatistic[3],
lower.tail = F)
mdl3.r2 = summary(mdl3)$r.squared
mdl3.r2a = summary(mdl3)$adj.r.squared
mdl3.pbb.p = summary(mdl3)$coefficient[2, 4]Conclusion: A p-value of 0.002075 < 0.05, signifies that at the 5% confidence level, the results from Model 3 is significant. From the p-values of the independent variables, it suggest that ‘Percentage of new business births’ is statistically significant at the 5% significance level (p-value = 0.002075 < 0.05). However, given the ‘Percentage of new business births’ was not statistically significant, it’s significance in this model could be a result of omitted variable bias.
Akaike Information Criteria (AIC):
Schwartz (or Bayesian) Information Criteria (BIC):
Summary table of \(R^2\), Adj. \(R^2\), AIC and BIC:
| Model | R-squared | Adj.R-squared | AIC | BIC |
|---|---|---|---|---|
| Model 1 | 0.9072016 | 0.8865798 | 52.59553 | 54.53516 |
| Model 2 | 0.8989596 | 0.8888555 | 51.61663 | 53.07135 |
| Model 3 | 0.6293493 | 0.5922842 | 67.21350 | 68.66822 |
Conclusion: From the summary table, Model 2 is a better model for the dataset. It has the highest Adjusted \(R^2\), lowest AIC and BIC compared to the other 2 models.
##
## Call:
## lm(formula = L_prod ~ pct_bsb, data = uk.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.192 -6.589 -2.225 4.571 16.980
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.546 18.718 1.205 0.25613
## pct_bsb 6.861 1.761 3.895 0.00298 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.6 on 10 degrees of freedom
## Multiple R-squared: 0.6027, Adjusted R-squared: 0.563
## F-statistic: 15.17 on 1 and 10 DF, p-value: 0.002984
mdl4.p = pf(q = summary(mdl4)$fstatistic[1],
df1 = summary(mdl4)$fstatistic[2],
df2 = summary(mdl4)$fstatistic[3],
lower.tail = F)
mdl4.pbb.p = summary(mdl4)$estimate[2, 4]Conclusion: Given the significance of Model 4 (p-value = 0.0029845 < 0.05), it suggests that there may be endogeneity in Model 1.
Breusch-Pagan Test
Model 1
##
## studentized Breusch-Pagan test
##
## data: mdl1
## BP = 1.5183, df = 2, p-value = 0.4681
Conclusion: Since p-value = 0.4680731 > 0.05, we do not reject the null of homoskedasticity at the 5% significance level, which suggests that there is no evidence of heteroskedasticity.
Model 2
##
## studentized Breusch-Pagan test
##
## data: mdl2
## BP = 0.52669, df = 1, p-value = 0.468
Conclusion: Since p-value = 0.4680039 > 0.05, we do not reject the null of homoskedasticity at the 5% significance level, which suggests that there is no evidence of heteroskedasticity.
Model 3
##
## studentized Breusch-Pagan test
##
## data: mdl3
## BP = 0.36682, df = 1, p-value = 0.5447
Conclusion: Since p-value = 0.5447424 > 0.05, we do not reject the null of homoskedasticity at the 5% significance level, which suggests that that there is no evidence of heteroskedasticity.
Jarque-Bera Test
Model 2
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
##
## Jarque Bera Test
##
## data: mdl1$residuals
## X-squared = 0.091982, df = 2, p-value = 0.9551
mdl1.jb.p = tseries::jarque.bera.test(mdl1$residuals)$p.value
mdl1.jb.x = tseries::jarque.bera.test(mdl1$residuals)$statisticConclusion: Since p-value = 0.9550507 > 0.05 and the Jarque-Bera test statistic is close to 0 (\(JB\) =0.09), there is insufficient evidence to reject the null of normality at the 5% significance level, which suggests that the data follows a normal distribution.
Model 2
##
## Jarque Bera Test
##
## data: mdl2$residuals
## X-squared = 0.37587, df = 2, p-value = 0.8287
mdl2.jb.p = tseries::jarque.bera.test(mdl2$residuals)$p.value
mdl2.jb.x = tseries::jarque.bera.test(mdl2$residuals)$statisticConclusion: Since p-value = 0.828668 > 0.05 and the Jarque-Bera test statistic is close to 0 (\(JB\) = 0.38), there is insufficient evidence to reject the null of normality at the 5% significance level, which suggests that the data follows a normal distribution.
Model 3
##
## Jarque Bera Test
##
## data: mdl3$residuals
## X-squared = 0.079398, df = 2, p-value = 0.9611
mdl3.jb.p = tseries::jarque.bera.test(mdl3$residuals)$p.value
mdl3.jb.x = tseries::jarque.bera.test(mdl3$residuals)$statisticConclusion: Since p-value = 0.9610785 > 0.05 and the Jarque-Bera test statistic is small (\(JB\) = 0.08), there is insufficient evidence to reject the null of normality at the 5% significance level, which suggests that the data follows a normal distribution.
## [,1]
## [1,] 8.233333
## [2,] 5.916667
## [3,] 6.300000
## [4,] 9.420000
## [5,] 8.340000
## [6,] 14.166667
## [7,] 11.700000
## [8,] 10.300000
## [9,] 8.533333
## [10,] 8.166667
## [11,] 5.000000
## [12,] 8.300000
moran.plot(x = uk.data$gross_v,
listw = uk.lw,
main = "Moran's Scatter Plot - GVA",
labels = F,
xlab = "GVA",
ylab = "Spatially Lagged GVA")From Moran’s scatter plot, there seems to be positive spatial autocorrelation in the dependent variable ‘Gross Value Added’ (GVA) with most points in the High-High and Low-Low quadrants.
moran_test.gva = moran.test(x = uk.data$gross_v,
listw = uk.lw)
moran.test.gva.p = moran_test.gva$p.valueConclusion: Since p-value = 0.0505258 > 0.05, we do not reject the null hypothesis of spatial randomness in favour of of spatial autocorrelation. There is no evidence of spatial autocorrelation in the dependent variable ‘GVA’ at the 5% significance level.
<span style = “color:steelblue>Model 1<
mdl1.moran.test = lm.morantest(model = mdl1,
listw = uk.lw)
mdl1.moran.test.p = mdl1.moran.test$p.value
mdl1.moran.test.i = mdl1.moran.test$estimate[1]
mdl1.moran.test.e = mdl1.moran.test$estimate[2]Conclusion: Since p-value = 0.5683374 > 0.05, there is insufficient evidence to reject the null of spatial randomness in favour of positive spatial autocorrelation. There is no evidence of spatial autocorrelation in the residuals at the 5% significance level.
Load the data:
| Variable | Column | Description |
|---|---|---|
| P | price.1960 | Taxes and delivery charges for 1955-1959 new cars |
| T | tax.charges | 1960 used car prices |
## tax.charges price.1960
## AL 129 1461
## AZ 218 1601
## AR 176 1469
## CA 252 1611
## CO 186 1606
## CT 154 1491
Model: \(P_i = \beta_0 + \beta_1 Ti + \epsilon_i\)
##
## Call:
## lm(formula = price.1960 ~ tax.charges, data = used.cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -116.701 -45.053 -1.461 43.400 107.807
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1435.7506 27.5796 52.058 < 0.0000000000000002 ***
## tax.charges 0.6872 0.1754 3.918 0.000294 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 57.01 on 46 degrees of freedom
## Multiple R-squared: 0.2503, Adjusted R-squared: 0.234
## F-statistic: 15.35 on 1 and 46 DF, p-value: 0.0002939
mdl.uc.p = pf(summary(mdl.uc)$fstatistic[1],
summary(mdl.uc)$fstatistic[2],
summary(mdl.uc)$fstatistic[3],
lower.tail = F)
mdl.uc.t.int = summary(mdl.uc)$coefficients[1, 4]
mdl.uc.t.p = summary(mdl.uc)$coefficients[2, 4]Are the model parameters significant?
Conclusion: The model parameters – intercept and tax variable are significant.
Does the linear regression model suffer from non-normality?
mdl.uc.jb.test = tseries::jarque.bera.test(mdl.uc$residuals)
mdl.uc.jb.p = mdl.uc.jb.test$p.value
mdl.uc.jb.x = mdl.uc.jb.test$statisticConclusion: Since the p-value = 0.3885549 > 0.05 and the Jarque-Bera test statistic is small (1.89), there is insufficient evidence to reject the null of normality, this suggests that the data follows a normal distribution.
Does the linear regression model suffer from heteroskedasticity in the errors?
Conclusion: Since the p-value = 0.9709966 > 0.05, there is insufficient evidence to reject the null of homoskedasticity, which suggests that there is no evidence of heteroskedasticity.
Run Moran’s test on the OLS residuals:
mdl.uc.morans.test = lm.morantest(model = mdl.uc,
listw = used.cars.lw.w)
mdl.uc.morans.p = mdl.uc.morans.test$p.value
mdl.uc.morans.i = mdl.uc.morans.test$statistic[1]
mdl.uc.morans.e = mdl.uc.morans.test$statistic[2]Conclusion: Since the p-value = 0 < 0.05 and \(I\) = 6.3868735 > NA = \(E[I]\), there is evidence of significant spatial autocorrelation in the OLS residuals based on Moran’s test for residuals. Recommend to consider SEM.
## [,1]
## [1,] 1480.250
## [2,] 1625.200
## [3,] 1479.833
## [4,] 1621.333
## [5,] 1593.571
## [6,] 1557.000
## [7,] 1510.667
## [8,] 1471.000
## [9,] 1476.600
## [10,] 1631.333
## [11,] 1482.200
## [12,] 1477.000
## [13,] 1537.833
## [14,] 1531.500
## [15,] 1476.714
## [16,] 1480.000
## [17,] 1539.000
## [18,] 1500.750
## [19,] 1537.600
## [20,] 1466.333
## [21,] 1568.750
## [22,] 1475.500
## [23,] 1510.375
## [24,] 1631.000
## [25,] 1582.667
## [26,] 1629.000
## [27,] 1559.333
## [28,] 1527.333
## [29,] 1566.600
## [30,] 1528.800
## [31,] 1476.000
## [32,] 1606.000
## [33,] 1471.800
## [34,] 1542.333
## [35,] 1624.500
## [36,] 1507.667
## [37,] 1531.500
## [38,] 1479.000
## [39,] 1609.500
## [40,] 1466.125
## [41,] 1518.000
## [42,] 1630.500
## [43,] 1551.667
## [44,] 1467.600
## [45,] 1643.000
## [46,] 1478.200
## [47,] 1550.500
## [48,] 1621.333
moran.plot(x = used.cars$price.1960,
listw = used.cars.lw.w,
main = "Moran's Scatter Plot - Price of Used Cars",
labels = F,
xlab = "Price",
ylab = "Spatially lagged Price")From Moran’s scatter plot, there seems to be positive spatial autocorrelation in the dependent variable ‘Price of Used Cars’ with most points in the High-High and Low-Low quadrants.
mdl.uc.moran.price = moran.test(x = used.cars$price.1960,
listw = used.cars.lw.w)
mdl.uc.moran.price.p = mdl.uc.moran.price$p.value
mdl.uc.moran.price.i = mdl.uc.moran.price$estimate[1]
mdl.uc.moran.price.e = mdl.uc.moran.price$estimate[2]Conclusion: Since p-value = 0 < 0.05, there is evidence of significant spatial autocorrelation in the dependent variable ‘Price of Used Cars’ based on Moran’s test for residuals. Recommend to consider SLM or SAR.
Pure spatial autogressive model (SAR)
uc.sar = spautolm(formula = price.1960 ~ 1,
data = used.cars,
listw = used.cars.lw.w)
summary(uc.sar)##
## Call:
## spautolm(formula = price.1960 ~ 1, data = used.cars, listw = used.cars.lw.w)
##
## Residuals:
## Min 1Q Median 3Q Max
## -76.152 -18.221 5.249 21.631 63.498
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1542.607 27.321 56.462 < 0.00000000000000022
##
## Lambda: 0.82919 LR test value: 54.202 p-value: 0.00000000000018086
## Numerical Hessian standard error of lambda: 0.065233
##
## Log likelihood: -240.977
## ML residual variance (sigma squared): 1045.4, (sigma: 32.333)
## Number of observations: 48
## Number of parameters estimated: 3
## AIC: NA (not available for weighted model)
Conclusion: The pure spatial lag parameter is highly significant.
Spatial lag model (SLM)
uc.slm = lagsarlm(formula = price.1960 ~ tax.charges,
data = used.cars,
listw = used.cars.lw.w)
summary(uc.slm)##
## Call:lagsarlm(formula = price.1960 ~ tax.charges, data = used.cars,
## listw = used.cars.lw.w)
##
## Residuals:
## Min 1Q Median 3Q Max
## -77.6781 -16.9504 4.2497 19.5484 58.9811
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 309.42546 123.03167 2.5150 0.0119
## tax.charges 0.16711 0.10212 1.6364 0.1018
##
## Rho: 0.78302, LR test value: 42.681, p-value: 0.000000000064426
## Asymptotic standard error: 0.081636
## z-value: 9.5916, p-value: < 0.000000000000000222
## Wald statistic: 91.998, p-value: < 0.000000000000000222
##
## Log likelihood: -239.8252 for lag model
## ML residual variance (sigma squared): 1036.7, (sigma: 32.197)
## Number of observations: 48
## Number of parameters estimated: 4
## AIC: 487.65, (AIC for lm: 528.33)
## LM test for residual autocorrelation
## test value: 2.1141, p-value: 0.14594
Conclusion: The spatial lag parameter is highly significant but not the tax parameter as opposed to the simple linear model.
Spatial error model (SEM)
uc.sem = errorsarlm(formula = price.1960 ~ tax.charges,
data = used.cars,
listw = used.cars.lw.w)
summary(uc.sem)##
## Call:errorsarlm(formula = price.1960 ~ tax.charges, data = used.cars,
## listw = used.cars.lw.w)
##
## Residuals:
## Min 1Q Median 3Q Max
## -74.824 -17.459 2.406 21.278 64.597
##
## Type: error
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1528.345377 31.962601 47.8167 <0.0000000000000002
## tax.charges 0.088309 0.119233 0.7406 0.4589
##
## Lambda: 0.819, LR test value: 40.899, p-value: 0.0000000001603
## Asymptotic standard error: 0.074051
## z-value: 11.06, p-value: < 0.000000000000000222
## Wald statistic: 122.32, p-value: < 0.000000000000000222
##
## Log likelihood: -240.7163 for error model
## ML residual variance (sigma squared): 1043.9, (sigma: 32.309)
## Number of observations: 48
## Number of parameters estimated: 4
## AIC: NA (not available for weighted model), (AIC for lm: 528.33)
Conclusion: The spatial lag parameter is highly significant but not the tax paramenter as opposed to the simple linear model.
SARAR(1,1)
uc.sarar = sacsarlm(formula = price.1960 ~ tax.charges,
data = used.cars,
listw = used.cars.lw.w)
summary(uc.sarar)##
## Call:sacsarlm(formula = price.1960 ~ tax.charges, data = used.cars,
## listw = used.cars.lw.w)
##
## Residuals:
## Min 1Q Median 3Q Max
## -74.0183 -16.7023 6.2859 19.5354 48.9003
##
## Type: sac
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 139.123986 90.854598 1.5313 0.1257
## tax.charges 0.120064 0.079744 1.5056 0.1322
##
## Rho: 0.89818
## Asymptotic standard error: 0.061727
## z-value: 14.551, p-value: < 0.000000000000000222
## Lambda: -0.49184
## Asymptotic standard error: 0.27454
## z-value: -1.7915, p-value: 0.073214
##
## LR test value: 44.889, p-value: 0.00000000017884
##
## Log likelihood: -238.7213 for sac model
## ML residual variance (sigma squared): 833.94, (sigma: 28.878)
## Number of observations: 48
## Number of parameters estimated: 5
## AIC: 487.44, (AIC for lm: 528.33)
Conclusion: Only the spatial lag parameter is highly significant. The spatial error parameter is not significant suggesting that SARAR(1,1) is not a suitable model. This outcome largely confirms the outcome of the LM test for residual autocorrelation done in conjunction with the estimation of the SLM parameters.
Run the LM and RLM test to determine whether the SLM model or the SEM model is a better fit for the data.
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = price.1960 ~ tax.charges, data = used.cars)
## weights: used.cars.lw.w
##
## LMerr = 31.793, df = 1, p-value = 0.00000001715
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = price.1960 ~ tax.charges, data = used.cars)
## weights: used.cars.lw.w
##
## LMlag = 40.664, df = 1, p-value = 0.0000000001808
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = price.1960 ~ tax.charges, data = used.cars)
## weights: used.cars.lw.w
##
## RLMerr = 0.051751, df = 1, p-value = 0.82
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = price.1960 ~ tax.charges, data = used.cars)
## weights: used.cars.lw.w
##
## RLMlag = 8.9229, df = 1, p-value = 0.002816
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = price.1960 ~ tax.charges, data = used.cars)
## weights: used.cars.lw.w
##
## SARMA = 40.716, df = 2, p-value = 0.000000001441
uc.LMerr.p = lm.LMtests(model = mdl.uc,
listw = used.cars.lw.w,
test = "LMerr")$p.value
uc.LMlag.p = lm.LMtests(model = mdl.uc,
listw = used.cars.lw.w,
test = "LMlag")$p.value
uc.RLMerr.p = lm.LMtests(model = mdl.uc,
listw = used.cars.lw.w,
test = "RLMerr")$p.value
uc.RLMlag.p = lm.LMtests(model = mdl.uc,
listw = used.cars.lw.w,
test = "RLMlag")$p.valueConclusion: Considering the robust version of the tests, we can conclude that the SLM specification is significant and fits the data well (SLM p-value = ) < 0.05. We have sufficient evidence to reject the SEM specification (SEM p-value = ) > 0.05.
However, the LM test is unable to compare the SLM model against the comprehensive SARAR(1,1) model and the SAR model. We will use the LR test to determine the best model.
LR Tests (\(H_0\): OLS)
Conclusion: Reject \(H_0\).
##
## Likelihood Ratio diagnostics for spatial dependence
##
## data:
## Likelihood ratio = 42.681, df = 1, p-value = 0.00000000006443
## sample estimates:
## Log likelihood of spatial lag model Log likelihood of OLS fit y
## -239.8252 -261.1658
Conclusion: Reject \(H_0\).
##
## Likelihood Ratio diagnostics for spatial dependence
##
## data:
## Likelihood ratio = 44.889, df = 2, p-value = 0.0000000001788
## sample estimates:
## Log likelihood of spatial lag model Log likelihood of OLS fit y
## -238.7213 -261.1658
Comprehensive LR Tests
Conclusion: Reject \(H_0\).
##
## Likelihood ratio for spatial linear models
##
## data:
## Likelihood ratio = -3.99, df = 1, p-value = 0.04577
## sample estimates:
## Log likelihood of uc.sem Log likelihood of uc.sarar
## -240.7163 -238.7213
Conclusion: Do not reject \(H_0\).
##
## Likelihood ratio for spatial linear models
##
## data:
## Likelihood ratio = -2.2078, df = 1, p-value = 0.1373
## sample estimates:
## Log likelihood of uc.slm Log likelihood of uc.sarar
## -239.8252 -238.7213
Conclusion: Do not reject \(H_0\).
##
## Likelihood ratio for spatial linear models
##
## data:
## Likelihood ratio = -2.3037, df = 1, p-value = 0.1291
## sample estimates:
## Log likelihood of uc.sar Log likelihood of uc.slm
## -240.9770 -239.8252
Results
| H0 | H1 | p-value | Decision at 5% significance level |
|---|---|---|---|
| OLS | SLM | 0.0000000 | Reject |
| OLS | SARAR(1,1) | 0.0000000 | Reject |
| SEM | SARAR(1,1) | 0.0457715 | Reject |
| SLM | SARAR(1,1) | 0.1373141 | Do not reject |
| SAR | SLM | 0.1290696 | Do not reject |
Conclusion: Based on the tests, it seems that the pure SAR is the most suitable model for this dataset. Thus, the price of used cars are possibly closly related to the price of used cars in the neighbouring states that the amount of taxes and delivery charges of new cars of each state.