library(maptools) # read shapefiles into an R session
library(sp)
library(spdep)
library(tmap)
library(RColorBrewer)
chi.poly <- maptools::readShapePoly('foreclosures.shp')
class(chi.poly)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
# check variables for individual census tracts
str(slot(chi.poly,"data"))
## 'data.frame': 897 obs. of 16 variables:
## $ SP_ID : Factor w/ 897 levels "1","10","100",..: 1 112 223 334 445 556 667 778 887 2 ...
## $ fips : Factor w/ 897 levels "17031010100",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ est_fcs : int 43 129 55 21 64 56 107 43 7 51 ...
## $ est_mtgs : int 904 2122 1151 574 1427 1241 1959 830 208 928 ...
## $ est_fcs_rt: num 4.76 6.08 4.78 3.66 4.48 4.51 5.46 5.18 3.37 5.5 ...
## $ res_addr : int 2530 3947 3204 2306 5485 2994 3701 1694 443 1552 ...
## $ est_90d_va: num 12.61 12.36 10.46 5.03 8.44 ...
## $ bls_unemp : num 8.16 8.16 8.16 8.16 8.16 8.16 8.16 8.16 8.16 8.16 ...
## $ county : Factor w/ 1 level "Cook County": 1 1 1 1 1 1 1 1 1 1 ...
## $ fips_num : num 1.7e+10 1.7e+10 1.7e+10 1.7e+10 1.7e+10 ...
## $ totpop : int 5391 10706 6649 5325 10944 7178 10799 5403 1089 3634 ...
## $ tothu : int 2557 3981 3281 2464 5843 3136 3875 1768 453 1555 ...
## $ huage : int 61 53 56 60 54 58 48 57 61 48 ...
## $ oomedval : int 169900 147000 119800 151500 143600 145900 153400 170500 215900 114700 ...
## $ property : num 646 914 478 509 641 612 678 332 147 351 ...
## $ violent : num 433 421 235 159 240 266 272 146 78 84 ...
## - attr(*, "data_types")= chr "C" "C" "N" "N" ...
Here are explanations of selected variables:
- est_fcs: estimated count of foreclosure starts from Jan. 2007 through June 2008
- est_mtgs: estimated number of active mortgages from Jan. 2007 through June 2008
- est_fcs_rt: number of foreclosure starts divided by number of mortgages, times 100
- bls_unemp: June 2008 place or county unemployment rate
- totpop: total population from the 2000 Census
- violent: number of violent crimes reported from Jan. 2007 through December 2008
- property: number of property crimes reported from Jan. 2007 through December 2008
summary(chi.poly@data$violent)
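As a quick sanity check, est_fcs_rt can be recomputed from the raw counts (a sketch; small rounding differences are expected):
# recompute the foreclosure-start rate from est_fcs and est_mtgs and compare
rate_check <- with(chi.poly@data, est_fcs / est_mtgs * 100)
head(cbind(reported = chi.poly@data$est_fcs_rt, recomputed = round(rate_check, 2)))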
plot(chi.poly)
tmap_mode("view")
tm_basemap("Stamen.TonerLite", alpha = 0.25) +
tm_shape(chi.poly) +
tm_fill(col = "violent", alpha = 0.75, style = "quantile", n = 10,
palette = "OrRd") +
tm_borders(col = "white", lwd = 0.5)
Ordinary Least Squares (OLS) models do not take (possible) spatial dependence into account.
chi.ols <- lm(violent ~ est_fcs_rt + bls_unemp, data = chi.poly@data)
summary(chi.ols)
##
## Call:
## lm(formula = violent ~ est_fcs_rt + bls_unemp, data = chi.poly@data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -892.02 -77.02 -23.73 41.90 1238.22
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -18.627 45.366 -0.411 0.681
## est_fcs_rt 28.298 1.435 19.720 <2e-16 ***
## bls_unemp -0.308 5.770 -0.053 0.957
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 157.3 on 894 degrees of freedom
## Multiple R-squared: 0.3141, Adjusted R-squared: 0.3126
## F-statistic: 204.7 on 2 and 894 DF, p-value: < 2.2e-16
We have a few ways of allowing spatial dependence among observations:
1. Rook, or poly2nb(..., queen=FALSE): two units are neighbors if they share a side
2. Queen, or poly2nb(..., queen=TRUE): two units are neighbors if they share a side or a corner
3. Distance-based: first extract the polygon centroids with coordinates, then identify each polygon's neighbors within a given distance threshold with dnearneigh
# create a list of neighbors for each polygon
list.queen <- poly2nb(chi.poly, queen=TRUE)
# next, supplement the neighbor list with spatial weights
W <- nb2listw(list.queen, style="W", zero.policy=TRUE) # style="W" row-standardizes the weights
W
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 897
## Number of nonzero links: 6140
## Percentage nonzero weights: 0.7631036
## Average number of links: 6.845039
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 897 804609 897 274.4893 3640.864
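Because style="W" row-standardizes the weights, the weights attached to any single polygon sum to one. A quick sketch to verify this for the first tract:
# neighbors of the first tract and their row-standardized weights
W$neighbours[[1]]
W$weights[[1]]
sum(W$weights[[1]]) # should equal 1 with style="W"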
# plot the neighborhood links over the polygon centroids
plot(W, coordinates(chi.poly))
# distance-based neighbors: all polygons whose centroids fall within the
# distance band (0, 1) in the units of the projection (longlat = FALSE)
coords <- coordinates(chi.poly)
W_dist <- dnearneigh(coords, 0, 1, longlat = FALSE)
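The list of neighbor definitions above also mentions rook contiguity; for comparison, a rook-based neighbor list could be built the same way (a sketch):
# rook contiguity: units must share a boundary segment, not just a corner
list.rook <- poly2nb(chi.poly, queen=FALSE)
summary(list.rook) # typically fewer links per region than the queen case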
To model spatial dependence, we can choose between two approaches (their compact forms are written out right after this list):
1. Spatial Autoregressive (SAR) Models
- Allow the values of neighboring units to affect that of a given polygon (e.g., home values)
2. Spatial Error Models (SEM)
- Allow the residuals of observations to be correlated (e.g., unobserved neighborhood characteristics affect the values of neighboring houses)
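In compact notation (with W the spatial weights matrix, y the outcome, X the covariates, and ε an i.i.d. error term), the two specifications can be written as:
SAR (lag): y = ρWy + Xβ + ε
SEM (error): y = Xβ + u, with u = λWu + ε
In the model output further below, ρ is reported as Rho (lag model) and λ as Lambda (error model).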
lm.morantest tests for the presence of spatial autocorrelation among the regression residuals, but it does not tell us which alternative specification is more appropriate.
moran.lm<-lm.morantest(chi.ols, W, alternative="two.sided")
print(moran.lm)
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = violent ~ est_fcs_rt + bls_unemp, data =
## chi.poly@data)
## weights: W
##
## Moran I statistic standard deviate = 11.785, p-value < 2.2e-16
## alternative hypothesis: two.sided
## sample estimates:
## Observed Moran I Expectation Variance
## 0.2142252370 -0.0020099108 0.0003366648
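A Moran scatterplot of the OLS residuals against their spatially lagged values gives a visual counterpart to this test; a minimal sketch:
# Moran scatterplot: points in the upper-right and lower-left quadrants
# indicate positive spatial autocorrelation
moran.plot(resid(chi.ols), W, xlab = "OLS residuals", ylab = "Spatially lagged residuals")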
lm.LMtests runs Lagrange multiplier tests against specific alternative models, which helps us decide which alternative is more appropriate:
1. LMerr: simple LM test for error dependence
2. LMlag: simple LM test for a missing spatially lagged dependent variable
3. RLMerr: robust test for error dependence in the possible presence of a missing lagged dependent variable
4. RLMlag: robust test for a missing lagged dependent variable in the possible presence of error dependence
5. SARMA: a portmanteau test for spatial lag and error dependence jointly (in fact LMerr + RLMlag)
LM <- lm.LMtests(chi.ols, W, test="all")
print(LM)
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = violent ~ est_fcs_rt + bls_unemp, data =
## chi.poly@data)
## weights: W
##
## LMerr = 134.52, df = 1, p-value < 2.2e-16
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = violent ~ est_fcs_rt + bls_unemp, data =
## chi.poly@data)
## weights: W
##
## LMlag = 182.18, df = 1, p-value < 2.2e-16
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = violent ~ est_fcs_rt + bls_unemp, data =
## chi.poly@data)
## weights: W
##
## RLMerr = 0.00066762, df = 1, p-value = 0.9794
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = violent ~ est_fcs_rt + bls_unemp, data =
## chi.poly@data)
## weights: W
##
## RLMlag = 47.653, df = 1, p-value = 5.089e-12
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = violent ~ est_fcs_rt + bls_unemp, data =
## chi.poly@data)
## weights: W
##
## SARMA = 182.18, df = 2, p-value < 2.2e-16
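The robust lag test (RLMlag) strongly rejects while the robust error test (RLMerr) does not, which points toward the spatial lag (SAR) specification. If desired, the statistics and p-values can be collected into one table; a sketch (assuming each element of LM is a standard htest object):
# gather the LM test statistics and p-values into a single matrix
t(sapply(LM, function(x) c(statistic = unname(x$statistic), p.value = x$p.value)))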
# the first two arguments are the same as for chi.ols (the model specification and the data);
# additionally, the spatial weights list W enters as the third argument
chi.sar <- lagsarlm(violent ~ est_fcs_rt + bls_unemp, data = chi.poly@data, W)
summary(chi.sar)
##
## Call:
## lagsarlm(formula = violent ~ est_fcs_rt + bls_unemp, data = chi.poly@data,
## listw = W)
##
## Residuals:
## Min 1Q Median 3Q Max
## -519.127 -65.003 -15.226 36.423 1184.193
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -93.7885 41.3162 -2.270 0.02321
## est_fcs_rt 15.6822 1.5600 10.053 < 2e-16
## bls_unemp 8.8949 5.2447 1.696 0.08989
##
## Rho: 0.49037, LR test value: 141.33, p-value: < 2.22e-16
## Asymptotic standard error: 0.039524
## z-value: 12.407, p-value: < 2.22e-16
## Wald statistic: 153.93, p-value: < 2.22e-16
##
## Log likelihood: -5738.047 for lag model
## ML residual variance (sigma squared): 20200, (sigma: 142.13)
## Number of observations: 897
## Number of parameters estimated: 5
## AIC: 11486, (AIC for lm: 11625)
## LM test for residual autocorrelation
## test value: 8.1464, p-value: 0.0043146
Let's check how the two models, OLS and SAR, differ. First, save the residuals from each model as a separate variable. Next, map these residuals to see how much spatial dependence remains (the less, the better, since these are residuals).
chi.poly@data$chi.ols.res <- resid(chi.ols) # OLS residuals
chi.poly@data$chi.sar.res <- resid(chi.sar) # SAR residuals
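The remaining spatial autocorrelation in each set of residuals can also be quantified with Moran's I; a quick sketch:
# Moran's I for the residuals of each model; the SAR residuals should show
# much weaker spatial autocorrelation than the OLS residuals
moran.test(chi.poly@data$chi.ols.res, W, zero.policy = TRUE)
moran.test(chi.poly@data$chi.sar.res, W, zero.policy = TRUE)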
Below, we use sp::spplot to visually examine the distribution of residuals from the two models.
sp::spplot(
chi.poly,
"chi.ols.res",
at = seq(min(chi.poly@data$chi.ols.res,na.rm=TRUE), max(chi.poly@data$chi.ols.res,na.rm=TRUE), length=12),
col.regions = rev(brewer.pal(11,"RdBu"))
)
sp::spplot(
chi.poly,
"chi.sar.res",
at = seq(min(chi.poly@data$chi.sar.res, na.rm=TRUE), max(chi.poly@data$chi.sar.res, na.rm=TRUE), length=12),
col.regions=rev(brewer.pal(11,"RdBu"))
)
# spatial error model: same formula and data, with the spatial weights W as listw
chi.err <- errorsarlm(violent ~ est_fcs_rt + bls_unemp, data = chi.poly@data, W)
summary(chi.err)
##
## Call:
## errorsarlm(formula = violent ~ est_fcs_rt + bls_unemp, data = chi.poly@data,
## listw = W)
##
## Residuals:
## Min 1Q Median 3Q Max
## -650.506 -64.355 -22.646 35.461 1206.346
##
## Type: error
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.2624 43.0509 -0.0293 0.9766
## est_fcs_rt 19.4620 1.9450 10.0062 <2e-16
## bls_unemp 4.0380 5.5134 0.7324 0.4639
##
## Lambda: 0.52056, LR test value: 109.68, p-value: < 2.22e-16
## Asymptotic standard error: 0.042291
## z-value: 12.309, p-value: < 2.22e-16
## Wald statistic: 151.51, p-value: < 2.22e-16
##
## Log likelihood: -5753.875 for error model
## ML residual variance (sigma squared): 20796, (sigma: 144.21)
## Number of observations: 897
## Number of parameters estimated: 5
## AIC: 11518, (AIC for lm: 11625)
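Consistent with the LM diagnostics, the lag model has the lowest AIC of the three fits (11486 vs. 11518 for the error model and 11625 for OLS). A quick sketch to compare them in one call:
# compare model fit across OLS, spatial lag, and spatial error
AIC(chi.ols, chi.sar, chi.err)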