Suggested citation:
Mendez C. (2020). Spatial regression analysis in R. R Studio/RPubs. Available at https://rpubs.com/quarcs-lab/tutorial-spatial-regression
This work is licensed under the Creative Commons Attribution-Non Commercial-Share Alike 4.0 International License.
Acknowledgment:
Material adapted from multiple sources, in particular BurkeyAcademy’s GIS & Spatial Econometrics Project
Import shapefiles into R
Import neighbor relationship from .gal
files
Create neighbor relationships in R from shape files
Create neighbor relationships in R from shape latitude and longitude
Understand the difference between Great Circle and Euclidean distances
Export neighbor relationships as weight matrices to plain text files
Test for spatial dependence via the Moran’s I test
Evaluate the four simplest models of spatial regression
All necessary files can be downloaded from BurkeyAcademy’s GIS & Spatial Econometrics Project
If you are a member of the QuaRCS lab, you can run this tutorial in R Studio Cloud and access the files in the following Github Repository
Let us use the readOGR
function from the rgdal
library to import the .shp
file
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/carlos/Github/QuaRCS-lab/tutorial-spatial-regression", layer: "NCVACO"
## with 234 features
## It has 49 fields
## Integer64 fields read as strings: FIPS qtystores PCI COUNTMXBV DC GA KY MD SC TN WV VA COUNTBKGR TOTALPOP POP18OV LABFORCE HHOLDS POP25OV POP16OV
Note that the file is imported as a SpatialPolygonsDataFrame
object
.gal
fileLet us use the read.gal
function from the rgdal
library to import the .gal
weights matrix created in GeoDa
## Neighbour list object:
## Number of regions: 234
## Number of nonzero links: 1132
## Percentage nonzero weights: 2.067
## Average number of links: 4.838
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 9 11
## 16 26 22 32 36 49 37 9 6 1
## 16 least connected regions:
## 51515 51530 51595 51660 51690 51720 51820 51131 51520 51540 51580 51600 51678 51790 51840 51001 with 1 link
## 1 most connected region:
## 51041 with 11 links
## [1] TRUE
## Neighbour list object:
## Number of regions: 234
## Number of nonzero links: 1132
## Percentage nonzero weights: 2.067
## Average number of links: 4.838
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 9 11
## 16 26 22 32 36 49 37 9 6 1
## 16 least connected regions:
## 51515 51530 51595 51660 51690 51720 51820 51131 51520 51540 51580 51600 51678 51790 51840 51001 with 1 link
## 1 most connected region:
## 51041 with 11 links
## [1] TRUE
Recognize that latitude and longitude are handled using great circle distances
Fail to recognize that latitude and longitude are handled using great circle distances. Latitude and longitude should not be used to compute Euclidean distances
Plot the right neighbor relationship
Plot the wrong neighbor relationship
Compare the differences
plot(nc.5nn.nb,nc.coords)
plot(diffnb(nc.5nn.nb, nc.5nn.nb.wrong), nc.coords, add=TRUE, col="red", lty=2)
title(main="Differences between Euclidean and Great Circle k=5 neighbors")
## [1] TRUE
## [1] FALSE
There are many ways to store weights matrices and contiguity files:
listw
is used in most spdep commands
nb
means neighbor file
knn
is a k nearest neighbors object
neigh
is another kind of neighbor file, common in ecology (e.g. package ade4)
poly
stores it a “polygons” of a map file
There are many commands to convert one way of storing contiguity information into another:
poly2nb(object)
converts polygon
to nb
nb2listw(object)
converts nb
to listw
Let us use the Moran’s I based on the function moran
, which need the following arguments:
variable
neighbor relationship as a listw
object
number of regions
sum of weights
moranStatistic <- moran(NCVACO$SALESPC, nb2listw(queen.nb), length(NCVACO$SALESPC), Szero(nb2listw(queen.nb)))
moranStatistic
## $I
## [1] -0.01395
##
## $K
## [1] 9.059
Moran statistic
## [1] -0.01395
An alternative way of computing the test and a p value
##
## Moran I test under randomisation
##
## data: NCVACO$SALESPC
## weights: nb2listw(queen.nb)
##
## Moran I statistic standard deviate = -0.21, p-value = 0.6
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## -0.013949 -0.004292 0.002032
Moran Statistic
## [1] -0.01395
P-value
## [1] 0.5848
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/carlos/Github/QuaRCS-lab/tutorial-spatial-regression", layer: "NCVACO"
## with 234 features
## It has 49 fields
## Integer64 fields read as strings: FIPS qtystores PCI COUNTMXBV DC GA KY MD SC TN WV VA COUNTBKGR TOTALPOP POP18OV LABFORCE HHOLDS POP25OV POP16OV
## [1] "GEO_ID" "STATE" "COUNTY" "NAME" "LSAD"
## [6] "CENSUSAREA" "FIPS2" "Lon" "Lat" "FIPS"
## [11] "qtystores" "SALESPC" "PCI" "COMM15OVP" "COLLENRP"
## [16] "SOMECOLLP" "ARMEDP" "NONWHITEP" "UNEMPP" "ENTRECP"
## [21] "PUBASSTP" "POVPOPP" "URBANP" "FOREIGNBP" "BAPTISTSP"
## [26] "ADHERENTSP" "BKGRTOMIX" "COUNTMXBV" "MXBVSQM" "BKGRTOABC"
## [31] "MXBVPPOP18" "DUI1802" "FVPTHH02" "DC" "GA"
## [36] "KY" "MD" "SC" "TN" "WV"
## [41] "VA" "AREALANDSQ" "COUNTBKGR" "TOTALPOP" "POP18OV"
## [46] "LABFORCE" "HHOLDS" "POP25OV" "POP16OV"
This dataset is some data from some studies Mark Burkey did on liquor demand using data from around 2003. In particular, he looks at the states of Virginia and North Carolina. This dataset is related to, but not the same as data used on an NIH grant and published in a paper:
Unit of analysis: counties in Virginia and North Carolina
Variable Descriptions:
Lon Lat
Longitude and Latitude of County Centroid
FIPS
FIPS Code for County (Federal Information Processing Standard)
qtystores
#Liquor Stores in County
SALESPC
Liquor Sales per capita per year, $
PCI
Per capita income
COMM15OVP
% commuting over 15 minutes to work
COLLENRP
% of people currently enrolled in college
SOMECOLLP
% of people with “some college” or higher education level
ARMEDP
% in armed forces
NONWHITEP
% nonwhite
UNEMPP
% unemployed
ENTRECP
% employed i entertainment or recreation fields (proxy for tourism areas)
PUBASSTP
% on public assistance of some sort
POVPOPP
% in poverty
URBANP
% living in urban areas
FOREIGNBP
% foreign born
BAPTISTSP
% southern baptist (historically anti-alcohol)
ADHERENTSP
% adherents of any religion
BKGRTOMIX
wtd. average distance from block group to nearest bar selling liquor
COUNTMXBV
count of bars selling liquor
MXBVSQM
bars per square mile
BKGRTOABC
distance fro block group to nearest retail liquor outlet (“ABC stores”)
MXBVPPOP18OV
Bars per 1,000? people 18 and older
DUI1802
DUI arrests per 1,000 people 18+
FVPTHH02
Offences against families and children (domestic violence) per 1,000 households
DC GA KY MD SC TN WV VA Dummy variables for counties bordering other states
AREALANDSQMI
Area of county
COUNTBKGR
count of block groups in county
TOTALPOP
Population of county
POP18OV
18+ people in county
LABFORCE
number in labor force in county
HHOLDS
# households in county
POP25OV
Pop 25+ in county
POP16OV
Pop 16+ in county
summarize imported data
## Object of class SpatialPolygonsDataFrame
## Coordinates:
## min max
## x -84.32 -75.24
## y 33.84 39.47
## Is projected: FALSE
## proj4string :
## [+proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs]
## Data attributes:
## GEO_ID STATE COUNTY NAME LSAD
## 0500000US37001: 1 37:100 001 : 2 Franklin : 3 city : 39
## 0500000US37003: 1 51:134 003 : 2 Richmond : 3 County:195
## 0500000US37005: 1 005 : 2 Alleghany : 2
## 0500000US37007: 1 007 : 2 Bedford : 2
## 0500000US37009: 1 009 : 2 Brunswick : 2
## 0500000US37011: 1 011 : 2 Cumberland: 2
## (Other) :228 (Other):222 (Other) :220
## CENSUSAREA FIPS2 Lon Lat FIPS
## Min. : 2 37001 : 1 Min. :-83.9 Min. :34.2 37001 : 1
## 1st Qu.:234 37003 : 1 1st Qu.:-80.3 1st Qu.:35.8 37003 : 1
## Median :381 37005 : 1 Median :-78.5 Median :36.8 37005 : 1
## Mean :376 37007 : 1 Mean :-78.9 Mean :36.7 37007 : 1
## 3rd Qu.:514 37009 : 1 3rd Qu.:-77.4 3rd Qu.:37.5 37009 : 1
## Max. :969 37011 : 1 Max. :-75.6 Max. :39.2 37011 : 1
## (Other):228 (Other):228
## qtystores SALESPC PCI COMM15OVP COLLENRP
## 1 :80 Min. : 0.0 14893 : 2 Min. : 5.49 Min. : 0.89
## 2 :49 1st Qu.: 38.0 12808 : 1 1st Qu.:25.89 1st Qu.: 2.66
## 0 :28 Median : 58.8 12830 : 1 Median :30.42 Median : 3.18
## 3 :22 Mean : 65.0 13195 : 1 Mean :30.52 Mean : 4.58
## 5 :13 3rd Qu.: 78.7 13286 : 1 3rd Qu.:34.96 3rd Qu.: 4.18
## 4 :11 Max. :297.7 13293 : 1 Max. :48.79 Max. :41.30
## (Other):31 (Other):227
## SOMECOLLP ARMEDP NONWHITEP UNEMPP
## Min. :16.6 Min. : 0.000 Min. : 0.67 Min. : 1.71
## 1st Qu.:23.4 1st Qu.: 0.020 1st Qu.: 8.93 1st Qu.: 3.63
## Median :27.1 Median : 0.080 Median :22.17 Median : 4.67
## Mean :28.8 Mean : 0.666 Mean :24.76 Mean : 5.27
## 3rd Qu.:32.4 3rd Qu.: 0.240 3rd Qu.:37.80 3rd Qu.: 6.45
## Max. :59.9 Max. :21.530 Max. :81.59 Max. :41.41
##
## ENTRECP PUBASSTP POVPOPP URBANP
## Min. : 2.84 Min. :0.48 Min. : 2.75 Min. : 0.00
## 1st Qu.: 4.90 1st Qu.:2.00 1st Qu.: 9.17 1st Qu.: 5.94
## Median : 5.77 Median :2.76 Median :12.57 Median : 33.84
## Mean : 6.44 Mean :3.10 Mean :13.05 Mean : 41.16
## 3rd Qu.: 7.42 3rd Qu.:3.89 3rd Qu.:16.71 3rd Qu.: 70.97
## Max. :22.54 Max. :8.65 Max. :31.35 Max. :100.00
##
## FOREIGNBP BAPTISTSP ADHERENTSP BKGRTOMIX COUNTMXBV
## Min. : 0.000 Min. : 0.00 Min. : 13.3 Min. : 0.26 0 : 42
## 1st Qu.: 0.500 1st Qu.: 9.19 1st Qu.: 35.2 1st Qu.: 1.96 2 : 18
## Median : 0.995 Median :14.11 Median : 43.1 Median : 3.65 5 : 15
## Mean : 1.854 Mean :15.72 Mean : 45.2 Mean : 5.37 1 : 14
## 3rd Qu.: 2.243 3rd Qu.:21.25 3rd Qu.: 50.7 3rd Qu.: 6.86 4 : 11
## Max. :16.120 Max. :60.13 Max. :164.5 Max. :38.27 7 : 8
## (Other):126
## MXBVSQM BKGRTOABC MXBVPPOP18 DUI1802
## Min. :0.000 Min. : 0.549 Min. : 0.00 Min. : 0.25
## 1st Qu.:0.003 1st Qu.: 2.804 1st Qu.: 1.08 1st Qu.: 4.43
## Median :0.022 Median : 4.433 Median : 3.21 Median : 6.92
## Mean :0.414 Mean : 4.903 Mean : 4.57 Mean : 7.78
## 3rd Qu.:0.129 3rd Qu.: 5.974 3rd Qu.: 6.50 3rd Qu.: 9.92
## Max. :8.697 Max. :21.501 Max. :39.78 Max. :32.78
##
## FVPTHH02 DC GA KY MD SC TN WV
## Min. : 0.000 0:231 0:230 0:231 0:229 0:217 0:220 0:219
## 1st Qu.: 0.074 1: 3 1: 4 1: 3 1: 5 1: 17 1: 14 1: 15
## Median : 0.382
## Mean : 1.644
## 3rd Qu.: 2.146
## Max. :20.458
##
## VA AREALANDSQ COUNTBKGR TOTALPOP POP18OV LABFORCE
## 0:100 Min. : 2 12 : 10 13146 : 2 10022 : 1 10134 : 1
## 1:134 1st Qu.:236 10 : 8 100565 : 1 10042 : 1 10142 : 1
## Median :382 25 : 8 10290 : 1 10106 : 1 102470 : 1
## Mean :377 11 : 7 10377 : 1 10179 : 1 10524 : 1
## 3rd Qu.:519 14 : 7 10381 : 1 10215 : 1 106066 : 1
## Max. :971 17 : 7 10516 : 1 102361 : 1 10756 : 1
## (Other):187 (Other):227 (Other):228 (Other):228
## HHOLDS POP25OV POP16OV
## 2669 : 2 100128 : 1 10002 : 1
## 10029 : 1 10120 : 1 100501 : 1
## 10142 : 1 10403 : 1 10096 : 1
## 10186 : 1 107671 : 1 101238 : 1
## 10301 : 1 10803 : 1 101606 : 1
## 10394 : 1 10841 : 1 10207 : 1
## (Other):227 (Other):228 (Other):228
nb
to listw
type##
## Call:
## lm(formula = reg.eq1, data = spat.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.048 -3.125 -0.727 2.078 25.845
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.67323 1.31807 2.79 0.0058 **
## SALESPC 0.01334 0.00839 1.59 0.1131
## COLLENRP 0.00775 0.06557 0.12 0.9061
## BKGRTOABC 0.03207 0.14760 0.22 0.8282
## BAPTISTSP 0.06723 0.03606 1.86 0.0636 .
## BKGRTOMIX 0.08061 0.07601 1.06 0.2900
## ENTRECP 0.24194 0.14733 1.64 0.1019
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.98 on 227 degrees of freedom
## Multiple R-squared: 0.0584, Adjusted R-squared: 0.0335
## F-statistic: 2.34 on 6 and 227 DF, p-value: 0.0324
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = reg.eq1, data = spat.data)
## weights: listw1
##
## Moran I statistic standard deviate = 4.6, p-value = 0.000002
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 0.197279 -0.010566 0.002049
Source: Anselin (1988), Burkey (2018)
lmLMtests <- lm.LMtests(reg1, listw1, test=c("LMerr", "LMlag", "RLMerr", "RLMlag", "SARMA"))
lmLMtests
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = reg.eq1, data = spat.data)
## weights: listw1
##
## LMerr = 18, df = 1, p-value = 0.00002
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = reg.eq1, data = spat.data)
## weights: listw1
##
## LMlag = 24, df = 1, p-value = 0.000001
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = reg.eq1, data = spat.data)
## weights: listw1
##
## RLMerr = 11, df = 1, p-value = 0.001
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = reg.eq1, data = spat.data)
## weights: listw1
##
## RLMlag = 16, df = 1, p-value = 0.00006
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = reg.eq1, data = spat.data)
## weights: listw1
##
## SARMA = 35, df = 2, p-value = 0.00000003
Based on these results (Lowest is RLMlag), we would choose the spatial lag model
Spatially Lagged X y=Xß+WXT+e p=rho, T=theta, and L=lambda
##
## Call:
## lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))),
## data = as.data.frame(x), weights = weights)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.472 -3.008 -0.536 1.818 25.056
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.5105293 2.5747239 0.59 0.5580
## SALESPC 0.0154612 0.0083614 1.85 0.0658 .
## COLLENRP 0.0567602 0.0654168 0.87 0.3865
## BKGRTOABC 0.2227781 0.1543080 1.44 0.1502
## BAPTISTSP -0.0000135 0.0446868 0.00 0.9998
## BKGRTOMIX -0.2143435 0.1041711 -2.06 0.0408 *
## ENTRECP -0.0448487 0.1588845 -0.28 0.7780
## lag.SALESPC 0.0264937 0.0205916 1.29 0.1996
## lag.COLLENRP -0.2600258 0.2054739 -1.27 0.2070
## lag.BKGRTOABC -0.4691910 0.3047671 -1.54 0.1251
## lag.BAPTISTSP 0.1100568 0.0597374 1.84 0.0668 .
## lag.BKGRTOMIX 0.5278804 0.1670141 3.16 0.0018 **
## lag.ENTRECP 0.4328176 0.2512396 1.72 0.0863 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.78 on 221 degrees of freedom
## Multiple R-squared: 0.154, Adjusted R-squared: 0.108
## F-statistic: 3.35 on 12 and 221 DF, p-value: 0.000168
Even in the case of the SLX model, you need to think about total impact (Direct+Indirect)
## Impact measures (SLX, estimable):
## Direct Indirect Total
## SALESPC 0.01546115 0.02649 0.04195
## COLLENRP 0.05676021 -0.26003 -0.20327
## BKGRTOABC 0.22277806 -0.46919 -0.24641
## BAPTISTSP -0.00001348 0.11006 0.11004
## BKGRTOMIX -0.21434352 0.52788 0.31354
## ENTRECP -0.04484868 0.43282 0.38797
(R=500 not actually needed for SLX)
## Impact measures (SLX, estimable, n-k):
## Direct Indirect Total
## SALESPC 0.01546115 0.02649 0.04195
## COLLENRP 0.05676021 -0.26003 -0.20327
## BKGRTOABC 0.22277806 -0.46919 -0.24641
## BAPTISTSP -0.00001348 0.11006 0.11004
## BKGRTOMIX -0.21434352 0.52788 0.31354
## ENTRECP -0.04484868 0.43282 0.38797
## ========================================================
## Standard errors:
## Direct Indirect Total
## SALESPC 0.008361 0.02059 0.02290
## COLLENRP 0.065417 0.20547 0.20255
## BKGRTOABC 0.154308 0.30477 0.29311
## BAPTISTSP 0.044687 0.05974 0.04766
## BKGRTOMIX 0.104171 0.16701 0.12555
## ENTRECP 0.158885 0.25124 0.24909
## ========================================================
## Z-values:
## Direct Indirect Total
## SALESPC 1.8491075 1.287 1.8324
## COLLENRP 0.8676702 -1.265 -1.0035
## BKGRTOABC 1.4437231 -1.540 -0.8407
## BAPTISTSP -0.0003015 1.842 2.3091
## BKGRTOMIX -2.0576102 3.161 2.4973
## ENTRECP -0.2822722 1.723 1.5576
##
## p-values:
## Direct Indirect Total
## SALESPC 0.064 0.1982 0.067
## COLLENRP 0.386 0.2057 0.316
## BKGRTOABC 0.149 0.1237 0.401
## BAPTISTSP 1.000 0.0654 0.021
## BKGRTOMIX 0.040 0.0016 0.013
## ENTRECP 0.778 0.0849 0.119
y=pWy+XB+e
##
## Call:lagsarlm(formula = reg.eq1, data = spat.data, listw = listw1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.18539 -2.59660 -0.77605 1.61369 25.93228
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.5421139 1.3012975 1.1851 0.23599
## SALESPC 0.0138185 0.0077171 1.7906 0.07335
## COLLENRP 0.0327744 0.0603469 0.5431 0.58706
## BKGRTOABC 0.1164671 0.1358375 0.8574 0.39122
## BAPTISTSP 0.0392792 0.0332872 1.1800 0.23800
## BKGRTOMIX 0.0092150 0.0700728 0.1315 0.89538
## ENTRECP 0.1532086 0.1359122 1.1273 0.25963
##
## Rho: 0.3983, LR test value: 22.92, p-value: 0.0000016858
## Asymptotic standard error: 0.07512
## z-value: 5.302, p-value: 0.00000011459
## Wald statistic: 28.11, p-value: 0.00000011459
##
## Log likelihood: -692.7 for lag model
## ML residual variance (sigma squared): 21.01, (sigma: 4.584)
## Number of observations: 234
## Number of parameters estimated: 9
## AIC: 1403, (AIC for lm: 1424)
## LM test for residual autocorrelation
## test value: 13.05, p-value: 0.0003037
Remember that when you use the spatial lag model, you cannot interpret Betas as marginal effects, you have to use the impacts
function and obtain direct and indirect effects.
## Impact measures (lag, exact):
## Direct Indirect Total
## SALESPC 0.014383 0.008582 0.02297
## COLLENRP 0.034112 0.020356 0.05447
## BKGRTOABC 0.121222 0.072336 0.19356
## BAPTISTSP 0.040883 0.024396 0.06528
## BKGRTOMIX 0.009591 0.005723 0.01531
## ENTRECP 0.159463 0.095155 0.25462
## Impact measures (lag, exact):
## Direct Indirect Total
## SALESPC 0.014383 0.008582 0.02297
## COLLENRP 0.034112 0.020356 0.05447
## BKGRTOABC 0.121222 0.072336 0.19356
## BAPTISTSP 0.040883 0.024396 0.06528
## BKGRTOMIX 0.009591 0.005723 0.01531
## ENTRECP 0.159463 0.095155 0.25462
## ========================================================
## Simulation results (asymptotic variance matrix):
## Direct:
##
## Iterations = 1:500
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 500
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## SALESPC 0.0136 0.00805 0.00036 0.000294
## COLLENRP 0.0363 0.06644 0.00297 0.002971
## BKGRTOABC 0.1169 0.13955 0.00624 0.006241
## BAPTISTSP 0.0422 0.03546 0.00159 0.001586
## BKGRTOMIX 0.0113 0.07407 0.00331 0.003313
## ENTRECP 0.1665 0.13557 0.00606 0.006063
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## SALESPC -0.00201 0.00810 0.0134 0.0191 0.029
## COLLENRP -0.08854 -0.00994 0.0379 0.0811 0.162
## BKGRTOABC -0.14936 0.02840 0.1120 0.2147 0.371
## BAPTISTSP -0.02830 0.01984 0.0440 0.0644 0.111
## BKGRTOMIX -0.14417 -0.03827 0.0118 0.0585 0.162
## ENTRECP -0.09526 0.07636 0.1728 0.2575 0.435
##
## ========================================================
## Indirect:
##
## Iterations = 1:500
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 500
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## SALESPC 0.00844 0.00576 0.000257 0.000257
## COLLENRP 0.02309 0.04507 0.002016 0.002016
## BKGRTOABC 0.07192 0.09308 0.004163 0.004163
## BAPTISTSP 0.02587 0.02481 0.001109 0.001109
## BKGRTOMIX 0.00644 0.04788 0.002141 0.002141
## ENTRECP 0.10322 0.09793 0.004379 0.004379
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## SALESPC -0.00105 0.00443 0.00815 0.0116 0.0206
## COLLENRP -0.05861 -0.00627 0.02116 0.0498 0.1237
## BKGRTOABC -0.09681 0.01656 0.06623 0.1228 0.2501
## BAPTISTSP -0.02104 0.01022 0.02522 0.0404 0.0773
## BKGRTOMIX -0.08810 -0.02436 0.00622 0.0339 0.1009
## ENTRECP -0.05254 0.04310 0.09518 0.1552 0.2944
##
## ========================================================
## Total:
##
## Iterations = 1:500
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 500
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## SALESPC 0.0221 0.0133 0.000596 0.000596
## COLLENRP 0.0594 0.1097 0.004906 0.004906
## BKGRTOABC 0.1888 0.2284 0.010216 0.010216
## BAPTISTSP 0.0681 0.0587 0.002626 0.002626
## BKGRTOMIX 0.0177 0.1208 0.005401 0.005401
## ENTRECP 0.2697 0.2262 0.010115 0.010115
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## SALESPC -0.00319 0.0125 0.0223 0.0303 0.0506
## COLLENRP -0.14134 -0.0172 0.0601 0.1341 0.2862
## BKGRTOABC -0.24389 0.0477 0.1828 0.3412 0.6166
## BAPTISTSP -0.04970 0.0296 0.0703 0.1073 0.1780
## BKGRTOMIX -0.22491 -0.0641 0.0183 0.0933 0.2561
## ENTRECP -0.15026 0.1242 0.2667 0.4154 0.7081
##
## ========================================================
## Simulated standard errors
## Direct Indirect Total
## SALESPC 0.008054 0.005756 0.01332
## COLLENRP 0.066444 0.045072 0.10971
## BKGRTOABC 0.139550 0.093082 0.22843
## BAPTISTSP 0.035461 0.024806 0.05872
## BKGRTOMIX 0.074073 0.047882 0.12077
## ENTRECP 0.135571 0.097926 0.22617
##
## Simulated z-values:
## Direct Indirect Total
## SALESPC 1.6905 1.4662 1.6560
## COLLENRP 0.5464 0.5124 0.5414
## BKGRTOABC 0.8376 0.7726 0.8265
## BAPTISTSP 1.1905 1.0429 1.1595
## BKGRTOMIX 0.1521 0.1345 0.1466
## ENTRECP 1.2281 1.0541 1.1926
##
## Simulated p-values:
## Direct Indirect Total
## SALESPC 0.091 0.14 0.098
## COLLENRP 0.585 0.61 0.588
## BKGRTOABC 0.402 0.44 0.409
## BAPTISTSP 0.234 0.30 0.246
## BKGRTOMIX 0.879 0.89 0.883
## ENTRECP 0.219 0.29 0.233
Caution: These p-values are simulated, and seem to vary a bit from run to run.
y=XB+u, u=LWu+e
##
## Call:errorsarlm(formula = reg.eq1, data = spat.data, listw = listw1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.21199 -2.66820 -0.85841 1.57794 25.85910
##
## Type: error
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.1218025 1.4443139 3.5462 0.0003909
## SALESPC 0.0111587 0.0076284 1.4628 0.1435286
## COLLENRP 0.0533827 0.0617209 0.8649 0.3870915
## BKGRTOABC 0.1533586 0.1454500 1.0544 0.2917121
## BAPTISTSP 0.0326804 0.0394447 0.8285 0.4073797
## BKGRTOMIX -0.0232354 0.0832068 -0.2792 0.7800541
## ENTRECP 0.1067644 0.1449288 0.7367 0.4613242
##
## Lambda: 0.4034, LR test value: 20.07, p-value: 0.0000074467
## Asymptotic standard error: 0.07545
## z-value: 5.347, p-value: 0.000000089438
## Wald statistic: 28.59, p-value: 0.000000089438
##
## Log likelihood: -694.1 for error model
## ML residual variance (sigma squared): 21.25, (sigma: 4.609)
## Number of observations: 234
## Number of parameters estimated: 9
## AIC: 1406, (AIC for lm: 1424)
##
## Spatial Hausman test (asymptotic)
##
## data: NULL
## Hausman test = 14, df = 7, p-value = 0.06
Pace, R.K. and LeSage, J.P., 2008. A spatial Hausman test. Economics Letters, 101(3), pp.282-284.
Re-watch this video: R Spatial Regression 2: All of the models, likelihood Ratio specification tests, and spatial Breusch-Pagan
Source: Burkey (2018)
add lag X to SEM
y=XB+WxT+u, u=LWu+e
##
## Call:errorsarlm(formula = reg.eq1, data = spat.data, listw = listw1,
## etype = "emixed")
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.95062 -2.74043 -0.59773 1.73425 25.27727
##
## Type: error
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.3840410 2.8953907 -0.1326 0.89448
## SALESPC 0.0165780 0.0081385 2.0370 0.04165
## COLLENRP 0.0441074 0.0606147 0.7277 0.46682
## BKGRTOABC 0.1974946 0.1414555 1.3962 0.16267
## BAPTISTSP 0.0158627 0.0395729 0.4008 0.68853
## BKGRTOMIX -0.1598168 0.0911062 -1.7542 0.07940
## ENTRECP 0.0035284 0.1456543 0.0242 0.98067
## lag.SALESPC 0.0387738 0.0203731 1.9032 0.05702
## lag.COLLENRP -0.3217522 0.2056446 -1.5646 0.11768
## lag.BKGRTOABC -0.0956966 0.3052079 -0.3135 0.75387
## lag.BAPTISTSP 0.0669554 0.0615853 1.0872 0.27695
## lag.BKGRTOMIX 0.3129327 0.1663462 1.8812 0.05994
## lag.ENTRECP 0.5558552 0.2641380 2.1044 0.03534
##
## Lambda: 0.3522, LR test value: 14.11, p-value: 0.00017285
## Asymptotic standard error: 0.07858
## z-value: 4.482, p-value: 0.000007405
## Wald statistic: 20.09, p-value: 0.000007405
##
## Log likelihood: -684.6 for error model
## ML residual variance (sigma squared): 19.78, (sigma: 4.447)
## Number of observations: 234
## Number of parameters estimated: 15
## AIC: 1399, (AIC for lm: 1411)
## Impact measures (SDEM, estimable):
## Direct Indirect Total
## SALESPC 0.016578 0.03877 0.05535
## COLLENRP 0.044107 -0.32175 -0.27764
## BKGRTOABC 0.197495 -0.09570 0.10180
## BAPTISTSP 0.015863 0.06696 0.08282
## BKGRTOMIX -0.159817 0.31293 0.15312
## ENTRECP 0.003528 0.55586 0.55938
## Impact measures (SDEM, estimable, n):
## Direct Indirect Total
## SALESPC 0.016578 0.03877 0.05535
## COLLENRP 0.044107 -0.32175 -0.27764
## BKGRTOABC 0.197495 -0.09570 0.10180
## BAPTISTSP 0.015863 0.06696 0.08282
## BKGRTOMIX -0.159817 0.31293 0.15312
## ENTRECP 0.003528 0.55586 0.55938
## ========================================================
## Standard errors:
## Direct Indirect Total
## SALESPC 0.008138 0.02037 0.02456
## COLLENRP 0.060615 0.20564 0.21829
## BKGRTOABC 0.141456 0.30521 0.33425
## BAPTISTSP 0.039573 0.06159 0.06246
## BKGRTOMIX 0.091106 0.16635 0.15194
## ENTRECP 0.145654 0.26414 0.30514
## ========================================================
## Z-values:
## Direct Indirect Total
## SALESPC 2.03700 1.9032 2.2536
## COLLENRP 0.72767 -1.5646 -1.2719
## BKGRTOABC 1.39616 -0.3135 0.3046
## BAPTISTSP 0.40085 1.0872 1.3259
## BKGRTOMIX -1.75418 1.8812 1.0077
## ENTRECP 0.02422 2.1044 1.8332
##
## p-values:
## Direct Indirect Total
## SALESPC 0.042 0.057 0.024
## COLLENRP 0.467 0.118 0.203
## BKGRTOABC 0.163 0.754 0.761
## BAPTISTSP 0.689 0.277 0.185
## BKGRTOMIX 0.079 0.060 0.314
## ENTRECP 0.981 0.035 0.067
add lag X to SAR
y=pWy+XB+WXT+e
##
## Call:lagsarlm(formula = reg.eq1, data = spat.data, listw = listw1,
## type = "mixed")
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.88614 -2.73500 -0.54014 1.56418 25.08356
##
## Type: mixed
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.3019916 2.3986911 -0.1259 0.89981
## SALESPC 0.0142404 0.0077820 1.8299 0.06726
## COLLENRP 0.0713529 0.0609564 1.1706 0.24178
## BKGRTOABC 0.2315929 0.1434207 1.6148 0.10636
## BAPTISTSP 0.0029546 0.0415389 0.0711 0.94330
## BKGRTOMIX -0.2098770 0.0969290 -2.1653 0.03037
## ENTRECP -0.0676565 0.1479659 -0.4572 0.64750
## lag.SALESPC 0.0269862 0.0191874 1.4065 0.15959
## lag.COLLENRP -0.2777127 0.1909852 -1.4541 0.14592
## lag.BKGRTOABC -0.2682605 0.2830702 -0.9477 0.34329
## lag.BAPTISTSP 0.0600813 0.0557165 1.0783 0.28088
## lag.BKGRTOMIX 0.3952195 0.1553486 2.5441 0.01096
## lag.ENTRECP 0.4109060 0.2345201 1.7521 0.07975
##
## Rho: 0.3356, LR test value: 15.12, p-value: 0.000101
## Asymptotic standard error: 0.07813
## z-value: 4.296, p-value: 0.000017396
## Wald statistic: 18.45, p-value: 0.000017396
##
## Log likelihood: -684.1 for mixed model
## ML residual variance (sigma squared): 19.75, (sigma: 4.444)
## Number of observations: 234
## Number of parameters estimated: 15
## AIC: 1398, (AIC for lm: 1411)
## LM test for residual autocorrelation
## test value: 3.126, p-value: 0.077063
## Impact measures (mixed, exact):
## Direct Indirect Total
## SALESPC 0.016866 0.04519 0.06205
## COLLENRP 0.050375 -0.36099 -0.31061
## BKGRTOABC 0.215842 -0.27103 -0.05519
## BAPTISTSP 0.008003 0.08688 0.09488
## BKGRTOMIX -0.183029 0.46200 0.27898
## ENTRECP -0.035565 0.55222 0.51665
All inclusive
Not recommended
y=pWy+XB+WXT+u, u=LWu+e
##
## Call:sacsarlm(formula = reg.eq1, data = spat.data, listw = listw1,
## type = "sacmixed")
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.06691 -2.49565 -0.56282 1.24359 23.46396
##
## Type: sacmixed
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.3461499 1.8190266 -0.1903 0.849079
## SALESPC 0.0142023 0.0075629 1.8779 0.060397
## COLLENRP 0.0749423 0.0611370 1.2258 0.220270
## BKGRTOABC 0.3189465 0.1456353 2.1900 0.028522
## BAPTISTSP -0.0140530 0.0431183 -0.3259 0.744487
## BKGRTOMIX -0.2736495 0.0979738 -2.7931 0.005221
## ENTRECP -0.0616431 0.1509625 -0.4083 0.683029
## lag.SALESPC 0.0100316 0.0174324 0.5755 0.564982
## lag.COLLENRP -0.2004362 0.1647538 -1.2166 0.223764
## lag.BKGRTOABC -0.3727231 0.2579012 -1.4452 0.148397
## lag.BAPTISTSP 0.0440849 0.0526102 0.8380 0.402057
## lag.BKGRTOMIX 0.4125315 0.1423864 2.8973 0.003764
## lag.ENTRECP 0.2425248 0.2098756 1.1556 0.247859
##
## Rho: 0.6649
## Asymptotic standard error: 0.0911
## z-value: 7.299, p-value: 2.9066e-13
## Lambda: -0.5252
## Asymptotic standard error: 0.1555
## z-value: -3.378, p-value: 0.00072879
##
## LR test value: 44.72, p-value: 0.00000041502
##
## Log likelihood: -681.8 for sacmixed model
## ML residual variance (sigma squared): 16.63, (sigma: 4.078)
## Number of observations: 234
## Number of parameters estimated: 16
## AIC: 1396, (AIC for lm: 1424)
## Impact measures (sacmixed, exact):
## Direct Indirect Total
## SALESPC 0.018650 0.05367 0.07232
## COLLENRP 0.040547 -0.41506 -0.37451
## BKGRTOABC 0.282258 -0.44274 -0.16049
## BAPTISTSP -0.006119 0.09574 0.08963
## BKGRTOMIX -0.220991 0.63546 0.41447
## ENTRECP -0.015617 0.55543 0.53981
a.k.a. Kelejian-Prucha, Cliff-Ord, or SAC
If all T=0,y=pWy+XB+u, u=LWu+e
##
## Call:sacsarlm(formula = reg.eq1, data = spat.data, listw = listw1,
## type = "sac")
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.01945 -2.34197 -0.78281 1.41007 23.48056
##
## Type: sac
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.8353870 0.9351383 -0.8933 0.37168
## SALESPC 0.0171848 0.0067612 2.5417 0.01103
## COLLENRP 0.0138969 0.0505268 0.2750 0.78328
## BKGRTOABC 0.1404632 0.1045255 1.3438 0.17901
## BAPTISTSP 0.0211294 0.0225887 0.9354 0.34958
## BKGRTOMIX -0.0214055 0.0491538 -0.4355 0.66321
## ENTRECP 0.1409664 0.1061888 1.3275 0.18434
##
## Rho: 0.7415
## Asymptotic standard error: 0.06907
## z-value: 10.74, p-value: < 2.22e-16
## Lambda: -0.6036
## Asymptotic standard error: 0.1338
## z-value: -4.513, p-value: 0.0000064082
##
## LR test value: 32.47, p-value: 0.000000088832
##
## Log likelihood: -687.9 for sac model
## ML residual variance (sigma squared): 16.53, (sigma: 4.066)
## Number of observations: 234
## Number of parameters estimated: 10
## AIC: 1396, (AIC for lm: 1424)
(listw2 allows for a different weights matrix for the error structure if desired)
## Impact measures (sac, exact):
## Direct Indirect Total
## SALESPC 0.02090 0.04558 0.06648
## COLLENRP 0.01690 0.03686 0.05376
## BKGRTOABC 0.17084 0.37258 0.54343
## BAPTISTSP 0.02570 0.05605 0.08175
## BKGRTOMIX -0.02604 -0.05678 -0.08281
## ENTRECP 0.17146 0.37392 0.54537
SARMA (like SARAR, but more local error structure) y=ρWy+Xβ+u, u=I-λWε or y=ρWy+Xβ+u, u=λWε+ε (2 ways of writing the same thing) Can’t be done easily in R
Test Model Restrictions
Ho: Restricting coefficients = 0 (i.e., the restricting model is OK)
##
## Likelihood ratio for spatial linear models
##
## data:
## Likelihood ratio = 19, df = 6, p-value = 0.004
## sample estimates:
## Log likelihood of reg5 Log likelihood of reg4
## -684.6 -694.1
Since p-value = 0.004, we should reject the null hypothesis. SDM and SDEM are not nested.
##
## Likelihood ratio for spatial linear models
##
## data:
## Likelihood ratio = -19, df = 6, p-value = 0.004
## sample estimates:
## Log likelihood of reg4 Log likelihood of reg5
## -694.1 -684.6
##
## Likelihood ratio for spatial linear models
##
## data:
## Likelihood ratio = 14, df = 1, p-value = 0.0002
## sample estimates:
## Log likelihood of reg5 Log likelihood of reg2
## -684.6 -691.7
Since p-value = 0.0002, we should reject the null hypothesis. SLX and SDEM are not nested.
##
## Likelihood ratio for spatial linear models
##
## data:
## Likelihood ratio = 39, df = 7, p-value = 0.000002
## sample estimates:
## Log likelihood of reg5 Log likelihood of reg1
## -684.6 -704.2
Since p-value = 0.000002, we should reject the null hypothesis. OLS and SDEM are not nested.
##
## studentized Breusch-Pagan test
##
## data:
## BP = 20, df = 12, p-value = 0.07
We do not have evidence of Heteroskedasticity
If we want to get an idea of how accurately our spatial model “Fits” the data, we can via a Pseudo R^2 as follows
## [1] 0.2261
Anselin, Luc. (1988) Spatial Econometrics: Methods and Models. Kluwer Academic Publishers: Dordrecht, Germany.
Burkey, Mark L. A Short Course on Spatial Econometrics and GIS. REGION, 5(3), 2018, R13-R18. https://doi.org/10.18335/region.v5i3.254
Pace, R.K. and LeSage, J.P., 2008. A spatial Hausman test. Economics Letters, 101(3), pp.282-284.
Elhorst, J.P. (2014) Spatial Econometrics: From Cross-Sectional Data to Spatial Panels, Springer.
Lesage, James and R. Kelly Pace (2009) Introduction to Spatial Econometrics, CRC Press/Taylor & Francis Group.
https://ignaciomsarmiento.github.io/2017/02/07/An-Introduction-to-Spatial-Econometrics-in-R.html
https://github.com/rsbivand/ECS530_h19/blob/master/ECS530_VII.Rmd
https://www.r-bloggers.com/spatial-regression-in-r-part-1-spamm-vs-glmmtmb/
END