This non-exhaustive walk-through is provided for those wishing to apply spatial regression analysis in their own work. These techniques can help in understanding geographical patterns of crime or public safety problems, and may be of most interest to analysts working in or closely with Community Safety Partnerships, Violence Reduction Units, or in strategic roles where there are opportunities to inform changes in policies and crime prevention plans.
For a comprehensive and accessible guide to the approaches and theory I would highly recommend Chainey, S. (2020), Understanding Crime: Analyzing the Geography of Crime, Redlands, California: Esri Press - in particular Chapter 8, Applying spatial regression analysis to crime data. Further resources are included at the end of this guide.
For this walk-through I created an aggregated data set using Census Output Area (COA) areal units covering the Teesside Urban Area. The large majority of COAs contain between 110 and 139 households. The data set includes:
When creating a data set it’s a good idea to think about sourcing variables (or proxy variables) that are theoretically relevant and hypothesis driven - with a large data set we could find many correlates, but without theory we might find it difficult to plausibly explain how a variable affects the outcome. Also consider whether or not the variable is something that could be modified in practice - the presence of heavy rain might explain lower burglary, but it is not something that could be written into policy or a crime prevention plan.
The libraries used are listed below.
library(tidyverse)
library(sf)
library(sp)
library(tmap)
library(RColorBrewer)
library(GGally)
library(spdep)
library(spatialreg)
library(car)
Data import. This data set has been cleaned and checked for errors.
# import spatial data, geojson file
ts <- st_read("ts_burg_geojson.geojson")
# retrospectively added non-White data column
ts$bame <- ts$ea_mixed + ts$ea_black + ts$ea_asian
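For context, a rate variable like bdrate19 could be derived from raw point data along the following lines. This is a minimal sketch rather than the original data preparation pipeline - the file names and the households column are assumptions.
# sketch only, not the original pipeline: aggregate burglary points to COA polygons
# and convert counts to a rate per 1,000 households
# ("coa_boundaries.geojson", "burglary_points.geojson" and "households" are hypothetical)
coa <- st_read("coa_boundaries.geojson")
pts <- st_read("burglary_points.geojson")
coa$burg_count <- lengths(st_intersects(coa, pts))
coa$burg_rate <- 1000 * coa$burg_count / coa$households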
Data dictionary.
Let’s begin by building an understanding of the data, exploring the trends and relationships among the variables. I used resources from the book Geographical Data Science and Spatial Data Analysis by Lex Comber and Chris Brunsdon to assist in this EDA.
The cor.test() function is a simple way to test the relationship between two variables, providing Pearson's product-moment correlation by default. A few examples are shown below, examining the relationship between the burglary rate in 2019 and a) the proportion of Asian households, b) the proportion of student households and c) the employment deprivation score.
The data outputs show the significance of the correlations (p-value), confidence intervals and the correlation coefficient estimates. These variables were all significantly correlated with burglary rates in 2019, with the effect size being greatest for employment deprivation.
# Burglary rate and % Asian Households
cor.test(~bdrate19 + ea_asian, data = ts)
##
## Pearson's product-moment correlation
##
## data: bdrate19 and ea_asian
## t = 9.9455, df = 1322, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2129952 0.3132621
## sample estimates:
## cor
## 0.2638412
# Burglary rate and % Student Households
cor.test(~bdrate19 + hh_student, data = ts)
##
## Pearson's product-moment correlation
##
## data: bdrate19 and hh_student
## t = 10.319, df = 1322, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2224276 0.3221644
## sample estimates:
## cor
## 0.2730295
# Burglary rate and Employment Deprivation
cor.test(~bdrate19 + imd_emp, data = ts)
##
## Pearson's product-moment correlation
##
## data: bdrate19 and imd_emp
## t = 14.592, df = 1322, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3250993 0.4179373
## sample estimates:
## cor
## 0.3724497
Scatter plots offer a visual alternative for examining linear relationships. The example output below shows a positive correlation, or linear relationship, between burglary rates and employment deprivation.
# call to ggplot
ggplot(data = ts, aes(x = bdrate19, y = imd_emp)) +
# add scatter plot, set opacity aesthetics
geom_point(alpha = 0.1) +
# add trend line method and aesthetics
geom_smooth(method = "lm", colour = "#DE2D26") +
# add labels
labs(title = "Burglary Rate and Employment Deprivation: Teesside Output Areas 2019", x = "Burglary Rate", y = "Employment Deprivation") +
# choose theme
theme_bw()
## `geom_smooth()` using formula 'y ~ x'
Boxplots are useful when you want to view the relationship between a continuous and a categorical variable. Using classification data, we can view the distribution of burglary rates at COA level by consumer vulnerability group and by output area classification.
ts %>%
# call ggplot, order by output area classification
ggplot(aes(x = reorder(oac_gp, bdrate, FUN = median), y = bdrate)) +
# add data to boxplot, setting the constant fill colour outside aes()
geom_boxplot(aes(group = oac_gp), fill = "dodgerblue", outlier.alpha = 0.4, outlier.colour = "grey25") +
# move x axis to vertical
coord_flip() + xlab("") + ylab("Burglary Rate") +
labs(title = "Burglary Rate by Output Area Classification") +
theme_minimal() +
# remove legend
theme(legend.position = "none")
# repeat above and change categorical variable to consumer vulnerability class
ts %>%
ggplot(aes(x = reorder(gd_cv, bdrate, FUN = median), y = bdrate)) +
geom_boxplot(aes(group = gd_cv), fill = "dodgerblue", outlier.alpha = 0.4, outlier.colour = "grey25") +
coord_flip() + xlab("") + ylab("Burglary Rate") +
labs(title = "Burglary Rate by Consumer Vulnerability Classification") +
theme_minimal() +
theme(legend.position = "none")
Seeing that the vulnerable communities group has higher rates of burglary, it might be useful to explore socio-economic and demographic variables to help identify potential hypotheses as to why.
The radar plot below provides an overview of a selection of Census 2011 and IMD 2019 variables. We can see that the vulnerable communities group has much lower levels of owner occupied housing and educational attainment. There are also higher levels of income deprivation and employment deprivation. We might infer from this that communities with lower incomes and higher unemployment tend, on average, to exhibit higher rates of burglary. This might point towards targeted interventions to secure households (assisted target hardening of properties) and to reduce the skills/unemployment gap.
# create tibble from the spatial data ts
ts_tbl <- as_tibble(ts)
# create dataset for radar plot, select numeric variables to include
radar <- ts_tbl %>% select(19:24, 27, 34:37, 44, 49) %>%
# standardise the variables using scale
mutate_if(is.numeric, scale) %>%
# aggregate COA data to consumer vulnerability group
aggregate(by = list(ts$gd_cv), FUN = mean) %>%
# pivot
pivot_longer(-Group.1) %>%
# arrange by variable name for plotting
arrange(name) %>%
filter(name != "gd_cv")
ggplot(data = radar, aes(x=factor(name), y=value, group = Group.1, colour = Group.1, fill = Group.1)) +
geom_point(size = 2) +
geom_polygon(size = 1, alpha = 0.2) +
scale_color_manual(values = brewer.pal(6, "Set2")) +
scale_fill_manual(values = brewer.pal(8, "Set2")) +
facet_wrap(~Group.1, ncol = 3) +
coord_polar() +
theme_light() +
theme(legend.position = "none", axis.title = element_blank(), axis.text = element_text(size = 6))
When working with problems that have many variables supported by theoretical frameworks, viewing them in one plot can provide a helpful overview.
In the case of residential burglary there are many variables that could be considered. Unfortunately, only limited information is available publicly for this walk-through.
The output below plots a selection of variables and their relationship to burglary rates. The results are split in two groups of COAs - higher or lower than the median rate. We can view the correlation coefficients, density plots, scatter plots, histograms and boxplots.
# select variables for pairing in correlation matrix
ts_tbl %>% .[c(12, 19:21,23:24,35, 37, 41, 49)] %>%
# add vector to indicate burglary rates around the median as Higher or Lower
mutate(bdrate_high = ifelse(bdrate19 > median(bdrate19), "H", "L")) %>%
# call ggpairs and set aesthetics for correlation coefficients and plots
ggpairs(aes(colour = bdrate_high, alpha = 0.4),
upper = list(continuous = wrap('cor', size = 2.5)),
lower = list(continuous = wrap('points', alpha = 0.7, size = 0.1))) +
theme(axis.line = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank())
Viewing the variables on maps can be a useful heuristic for visualizing spatial relationships and forming hypotheses about them.
The plot below shows burglary rates across Teesside COAs; this can be viewed as a static plot or as an interactive map. Although somewhat limited, it shows that most geographical areas have lower rates of burglary and that areas with similar values often cluster together geographically.
# set to plot mode
tmap_mode("plot")
# add polygon/shape file with data
tm_shape(ts) +
# set variable attribute, title and style
tm_polygons("bdrate19", title = "Burglary Dwelling Per 1,000 2019", palette = "YlOrRd", style = "jenks", legend.hist = T) +
# customise layout
tm_layout(title = "Teesside Burglary by COA",
frame = F, legend.outside = T,
legend.hist.width = 1,
legend.format = list(digits = 1),
legend.outside.position = c("left", "top"),
legend.text.size = 0.6,
legend.title.size = 1) +
tm_scale_bar(position = c("right", "bottom")) +
tm_borders(col = "grey", lwd= 1)
Viewing and arranging spatial plots side-by-side becomes more useful. From this we can begin to see that areas with high rates of student households and Asian households appear to occupy the same COAs as high burglary rates (around the River Tees in Stockton and the central areas of Middlesbrough), but these variables are unlikely to explain high rates in Redcar and in pockets to the north of Stockton or the south of Middlesbrough, where there are few student or Asian households.
tmap_mode("plot")
map1 <- tm_shape(ts) +
tm_fill("bdrate19", palette = "Reds", style = "jenks", n = 5, title = "BD Rate '19") +
tm_layout(legend.position = c("right", "bottom"), frame = F)
map2 <- tm_shape(ts) +
tm_fill("hh_student", palette = "YlGn", style = "jenks", n = 5, title = "% Student HH") +
tm_layout(legend.position = c("right", "bottom"), frame = F)
map3 <- tm_shape(ts) +
tm_fill("ea_asian", palette = "YlOrRd", style = "jenks", n = 5, title = "% Asian HH") +
tm_layout(legend.position = c("right", "bottom"), frame = F)
tmap_arrange(map2, map3, map1, nrow = 1)
Other pockets of high burglary rates appear to co-locate with areas of high % social housing and a high % of residents with no qualifications - in some instances but not all (e.g., Hemlington, towards the south centre-left of Teesside).
tmap_mode("plot")
map1 <- tm_shape(ts) +
tm_fill("bdrate19", palette = "Reds", style = "jenks", n = 5, title = "BD Rate '19") +
tm_layout(legend.position = c("right", "bottom"), frame = F)
map4 <- tm_shape(ts) +
tm_fill("ten_soc", palette = "YlGn", style = "jenks", n = 5, title = "% Social HH") +
tm_layout(legend.position = c("right", "bottom"), frame = F)
map5 <- tm_shape(ts) +
tm_fill("edu_noqual", palette = "YlOrRd", style = "jenks", n = 5, title = "% No Quals") +
tm_layout(legend.position = c("right", "bottom"), frame = F)
tmap_arrange(map4, map5, map1, nrow = 1)
Let's now attempt to fit a regression model to identify the relationship between multiple explanatory variables and a dependent variable (the burglary rate).
Three OLS models and one Poisson GLM have been produced.
Model 1 uses Employment Deprivation, % Student Households, % Non-White, % No Qualifications, % Private Rented, % Owner Occupied and the Living Environment score.
Model 2 excludes % Student Households (non-significant in Model 1) and the Living Environment score.
Model 3 uses the Model 1 variables, excluding the Living Environment score, and log-transforms the burglary rate (using log1p).
Model 4 uses the Model 1 variables but changes the dependent variable to the burg19 count and fits a Poisson regression.
The vif() function calculates the Variance Inflation Factor, used to check for multi-collinearity (where two or more explanatory variables are correlated). If an explanatory variable has a VIF > 7.5, this suggests the variable is redundant in the model and should be removed.
# run model 1
model1 <- lm(bdrate19 ~ imd_emp + hh_student + bame + edu_noqual + ten_priv + ten_own + imd_livenv, data = ts)
# summary of model outputs
summary(model1)
##
## Call:
## lm(formula = bdrate19 ~ imd_emp + hh_student + bame + edu_noqual +
## ten_priv + ten_own + imd_livenv, data = ts)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.993 -4.324 -1.463 2.785 63.486
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.50666 2.00720 2.245 0.024917 *
## imd_emp 9.99762 4.12453 2.424 0.015487 *
## hh_student 0.09956 0.09066 1.098 0.272343
## bame 0.13315 0.03449 3.861 0.000118 ***
## edu_noqual 0.05264 0.03169 1.661 0.096974 .
## ten_priv 0.05717 0.03100 1.844 0.065366 .
## ten_own -0.05631 0.01670 -3.372 0.000767 ***
## imd_livenv 0.17317 0.03573 4.847 1.4e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.237 on 1316 degrees of freedom
## Multiple R-squared: 0.2675, Adjusted R-squared: 0.2636
## F-statistic: 68.66 on 7 and 1316 DF, p-value: < 2.2e-16
# variation inflation factor, to check for multi-collinearity
vif(model1)
## imd_emp hh_student bame edu_noqual ten_priv ten_own imd_livenv
## 3.351667 1.794739 1.787851 3.176990 3.273476 3.938303 2.190363
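To make the VIF figures less of a black box, they can be reproduced by hand: the VIF for a variable is 1 / (1 - R-squared) from regressing that variable on the other explanatory variables. A quick sketch for imd_emp:
# sketch: reproduce the VIF for imd_emp by regressing it on the other explanatory variables
aux <- lm(imd_emp ~ hh_student + bame + edu_noqual + ten_priv + ten_own + imd_livenv, data = ts)
1 / (1 - summary(aux)$r.squared)
# this should match vif(model1)["imd_emp"] above (~3.35)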
# run model 2
model2 <- lm(bdrate19 ~ imd_emp + bame + edu_noqual + ten_priv + ten_own, data = ts)
summary(model2)
##
## Call:
## lm(formula = bdrate19 ~ imd_emp + bame + edu_noqual + ten_priv +
## ten_own, data = ts)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26.207 -4.582 -1.468 2.843 63.372
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.57289 1.95802 1.314 0.18907
## imd_emp 12.69982 4.05137 3.135 0.00176 **
## bame 0.17206 0.03275 5.253 1.74e-07 ***
## edu_noqual 0.07411 0.03155 2.349 0.01897 *
## ten_priv 0.14937 0.02499 5.976 2.94e-09 ***
## ten_own -0.04189 0.01615 -2.594 0.00959 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.31 on 1318 degrees of freedom
## Multiple R-squared: 0.2534, Adjusted R-squared: 0.2506
## F-statistic: 89.47 on 5 and 1318 DF, p-value: < 2.2e-16
vif(model2)
## imd_emp bame edu_noqual ten_priv ten_own
## 3.177531 1.584625 3.093580 2.090806 3.619233
# run model 3
model3 <- lm(log1p(bdrate19) ~ imd_emp + hh_student + bame + edu_noqual + ten_priv + ten_own, data = ts)
summary(model3)
##
## Call:
## lm(formula = log1p(bdrate19) ~ imd_emp + hh_student + bame +
## edu_noqual + ten_priv + ten_own, data = ts)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2645 -0.9013 0.1585 0.7998 2.5098
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.638376 0.244437 2.612 0.009114 **
## imd_emp 1.742004 0.508046 3.429 0.000625 ***
## hh_student 0.001362 0.011327 0.120 0.904294
## bame 0.013961 0.004268 3.271 0.001099 **
## edu_noqual 0.013683 0.003918 3.492 0.000495 ***
## ten_priv 0.018048 0.003269 5.521 4.05e-08 ***
## ten_own -0.002433 0.002035 -1.195 0.232172
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.031 on 1317 degrees of freedom
## Multiple R-squared: 0.2175, Adjusted R-squared: 0.2139
## F-statistic: 61 on 6 and 1317 DF, p-value: < 2.2e-16
vif(model3)
## imd_emp hh_student bame edu_noqual ten_priv ten_own
## 3.247942 1.789473 1.748927 3.101661 2.324356 3.736125
# run model 4 as Poisson
model4 <- glm(burg19 ~ imd_emp + hh_student + bame + edu_noqual + ten_priv + ten_own + imd_livenv, family = "poisson", data = ts)
summary(model4)
##
## Call:
## glm(formula = burg19 ~ imd_emp + hh_student + bame + edu_noqual +
## ten_priv + ten_own + imd_livenv, family = "poisson", data = ts)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.4654 -1.4321 -0.3452 0.6468 5.9627
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.578046 0.159372 3.627 0.000287 ***
## imd_emp 1.163623 0.308077 3.777 0.000159 ***
## hh_student 0.006258 0.004473 1.399 0.161809
## bame 0.017215 0.001832 9.398 < 2e-16 ***
## edu_noqual 0.003530 0.002561 1.378 0.168128
## ten_priv 0.002715 0.002169 1.252 0.210585
## ten_own -0.009059 0.001323 -6.846 7.58e-12 ***
## imd_livenv 0.017323 0.002306 7.513 5.77e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 4075.8 on 1323 degrees of freedom
## Residual deviance: 2847.2 on 1316 degrees of freedom
## AIC: 5365
##
## Number of Fisher Scoring iterations: 5
vif(model4)
## imd_emp hh_student bame edu_noqual ten_priv ten_own imd_livenv
## 2.856732 2.107486 1.940106 3.032330 5.084854 3.629715 2.760446
The results from the three linear models are weak. The output below shows the results for all four models side-by-side; Model 1 (27%) explains more of the variation in burglary rates than Models 2 and 3. I'm ignoring Model 4 for this walk-through - it is included to show the call for producing a Poisson regression, which may be used when the data contain many units with zero counts, creating a highly skewed distribution.
texreg::screenreg(list(model1, model2, model3, model4))
##
## ===================================================================
## Model 1 Model 2 Model 3 Model 4
## -------------------------------------------------------------------
## (Intercept) 4.51 * 2.57 0.64 ** 0.58 ***
## (2.01) (1.96) (0.24) (0.16)
## imd_emp 10.00 * 12.70 ** 1.74 *** 1.16 ***
## (4.12) (4.05) (0.51) (0.31)
## hh_student 0.10 0.00 0.01
## (0.09) (0.01) (0.00)
## bame 0.13 *** 0.17 *** 0.01 ** 0.02 ***
## (0.03) (0.03) (0.00) (0.00)
## edu_noqual 0.05 0.07 * 0.01 *** 0.00
## (0.03) (0.03) (0.00) (0.00)
## ten_priv 0.06 0.15 *** 0.02 *** 0.00
## (0.03) (0.02) (0.00) (0.00)
## ten_own -0.06 *** -0.04 ** -0.00 -0.01 ***
## (0.02) (0.02) (0.00) (0.00)
## imd_livenv 0.17 *** 0.02 ***
## (0.04) (0.00)
## -------------------------------------------------------------------
## R^2 0.27 0.25 0.22
## Adj. R^2 0.26 0.25 0.21
## Num. obs. 1324 1324 1324 1324
## AIC 5364.99
## BIC 5406.50
## Log Likelihood -2674.50
## Deviance 2847.22
## ===================================================================
## *** p < 0.001; ** p < 0.01; * p < 0.05
# other statistics to consider
# AIC measure, the lower the AIC the better the model
AIC(model1)
## [1] 9351.008
AIC(model2)
## [1] 9372.269
# log likelihood measures goodness of fit: the higher the value (closer to zero here), the better the model
logLik(model1)
## 'log Lik.' -4666.504 (df=9)
logLik(model2)
## 'log Lik.' -4679.135 (df=7)
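As a quick sanity check on how these two measures relate, the AIC can be reproduced from the log likelihood as AIC = 2k - 2 * log-likelihood, where k is the number of estimated parameters (df = 9 for model 1).
# sketch: AIC from the log likelihood, k = 9 (8 coefficients plus the residual variance)
2 * 9 - 2 * as.numeric(logLik(model1))
# should reproduce AIC(model1) above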
Mapping the residuals can help identify areas where the model is under-performing, and potentially point to other variables that could improve it. These are shown in the plot below.
# add residuals to ts data
ts$resids <- model1$residuals
# add standardised residuals
ts$s.resids <- as.vector(scale(ts$resids))
# add poisson resids
ts$poisresides <- as.vector(scale(model4$residuals))
# map resids
tmap_mode("view")
tm_shape(ts) +
tm_fill("s.resids", palette = "seq", breaks = c(-Inf, -2, 0, 2, Inf), title = "Std. Resids", alpha = 0.6) +
tm_layout(aes.palette = list(seq = "-RdBu")) +
tm_shape(ts) +
tm_borders(col = "black",lwd = 1)
A problem here is that we are using an approximated residential burglary data set. It is possible that the COAs with high/low standardised residuals are those where all burglary points falling within Retail Centre Areas were discarded.
The age of the Census data also presents problems. Some COAs have undergone extensive changes in number of households (urban clearance, urban development) which could have altered the socio-economic and population structures since 2011. Examples are shown below for two areas of Teesside.
Urban Clearance of older properties in South Bank
New Urban Development in Ingleby Barwick
The data set contains the COA boundary polygons (imported above as an sf object). We can use these to create spatial weights and neighbour relationships between COAs. A full guide, Spatial regression analysis in R by Carlos Mendez, is available here.
# create queen contiguity
queen.nb <- poly2nb(ts, row.names = ts$oa11cd)
# create rook contiguity
rook.nb <- poly2nb(ts, row.names = ts$oa11cd, queen = FALSE)
# summarise relationships
summary(queen.nb)
## Neighbour list object:
## Number of regions: 1324
## Number of nonzero links: 7736
## Percentage nonzero weights: 0.4413067
## Average number of links: 5.8429
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13 15 16 17 22 31
## 1 23 93 215 299 276 206 98 60 22 13 7 3 3 2 1 1 1
## 1 least connected region:
## 1236 with 1 link
## 1 most connected region:
## 430 with 31 links
# test symmetry
is.symmetric.nb(queen.nb)
## [1] TRUE
# create list
queen.listw <- nb2listw(queen.nb)
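As an optional visual check, the contiguity relationships can be plotted as links between COA centroids (a sketch):
# sketch: plot queen contiguity links between COA centroids
coords <- st_coordinates(st_centroid(st_geometry(ts)))
plot(st_geometry(ts), border = "grey80")
plot(queen.nb, coords, add = TRUE, col = "red", cex = 0.3)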
Check for spatial dependence using the Moran's I test. The Moran's I statistic tells us whether values are clustered (closer to 1), dispersed (closer to -1) or show no pattern (0). When the p-value is significant and the standard deviate is positive, we can conclude that the spatial distribution is more clustered than would be expected if the underlying spatial processes were random.
There is spatial clustering in burglary rates by Teesside COAs.
moranTest <- moran.test(ts$bdrate19, nb2listw(queen.nb))
moranTest
##
## Moran I test under randomisation
##
## data: ts$bdrate19
## weights: nb2listw(queen.nb)
##
## Moran I statistic standard deviate = 19.71, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.3236350605 -0.0007558579 0.0002708625
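A Moran scatterplot offers a visual companion to the test, plotting each COA's burglary rate against its spatially lagged (neighbourhood average) rate; the positive slope corresponds to the positive Moran's I statistic above (a sketch):
# sketch: Moran scatterplot of burglary rates against their spatial lag
moran.plot(ts$bdrate19, queen.listw, xlab = "Burglary rate", ylab = "Spatially lagged burglary rate")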
To identify which type of model to select we can examine Model 1 using a Moran test on its residuals and Lagrange Multiplier diagnostics.
The Moran test for the Model 1 residuals indicates some clustering, with a positive standard deviate and a significant p-value.
The Lagrange Multiplier results suggest proceeding with a spatial lag model. A flow-chart for interpreting the results and selecting a model is provided here, see Figure 2.
# test the spatial dependency of the model
lmMoranTest <- lm.morantest(model1, queen.listw)
lmMoranTest
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = bdrate19 ~ imd_emp + hh_student + bame + edu_noqual
## + ten_priv + ten_own + imd_livenv, data = ts)
## weights: queen.listw
##
## Moran I statistic standard deviate = 6.2654, p-value = 1.86e-10
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 0.0996969173 -0.0032438816 0.0002699493
# calculate lagrange multiplier diagnostics
lmLMtests <- lm.LMtests(model1, queen.listw, test = c("LMerr", "LMlag", "RLMerr", "RLMlag", "SARMA"))
lmLMtests
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = bdrate19 ~ imd_emp + hh_student + bame + edu_noqual
## + ten_priv + ten_own + imd_livenv, data = ts)
## weights: queen.listw
##
## LMerr = 36.259, df = 1, p-value = 1.727e-09
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = bdrate19 ~ imd_emp + hh_student + bame + edu_noqual
## + ten_priv + ten_own + imd_livenv, data = ts)
## weights: queen.listw
##
## LMlag = 47.03, df = 1, p-value = 6.991e-12
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = bdrate19 ~ imd_emp + hh_student + bame + edu_noqual
## + ten_priv + ten_own + imd_livenv, data = ts)
## weights: queen.listw
##
## RLMerr = 3.599, df = 1, p-value = 0.05781
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = bdrate19 ~ imd_emp + hh_student + bame + edu_noqual
## + ten_priv + ten_own + imd_livenv, data = ts)
## weights: queen.listw
##
## RLMlag = 14.37, df = 1, p-value = 0.0001502
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = bdrate19 ~ imd_emp + hh_student + bame + edu_noqual
## + ten_priv + ten_own + imd_livenv, data = ts)
## weights: queen.listw
##
## SARMA = 50.629, df = 2, p-value = 1.014e-11
The call below uses lmSLX(), which fits a spatially lagged X (SLX) model: spatially lagged versions of the explanatory variables are added using the spatial weights list. The SLX model R-squared is only marginally better than the OLS Model 1 (0.2827 versus 0.2675). It is likely that including information on past burglary rates would improve this significantly, as crime hot spots tend not to change quickly over time.
# spatially lagged X (SLX) model, including the spatial weights list
model5 <- lmSLX(bdrate19 ~ imd_emp + hh_student + bame + edu_noqual + ten_priv + ten_own + imd_livenv, data = ts, queen.listw)
summary(model5)
##
## Call:
## lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))),
## data = as.data.frame(x), weights = weights)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.007 -4.341 -1.497 2.740 62.206
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0685065 3.6923719 1.102 0.27072
## imd_emp -2.8426275 5.7596099 -0.494 0.62171
## hh_student -0.1643035 0.1129870 -1.454 0.14614
## bame 0.0263758 0.0541157 0.487 0.62606
## edu_noqual 0.0177939 0.0370836 0.480 0.63143
## ten_priv 0.0670293 0.0347207 1.931 0.05376 .
## ten_own -0.0488920 0.0183500 -2.664 0.00781 **
## imd_livenv 0.1269417 0.0702918 1.806 0.07116 .
## lag.imd_emp 25.0742173 9.2445558 2.712 0.00677 **
## lag.hh_student 0.5358595 0.1729742 3.098 0.00199 **
## lag.bame 0.1078121 0.0733639 1.470 0.14192
## lag.edu_noqual -0.0173323 0.0617539 -0.281 0.77901
## lag.ten_priv -0.0643768 0.0646851 -0.995 0.31981
## lag.ten_own -0.0003472 0.0345250 -0.010 0.99198
## lag.imd_livenv 0.0421586 0.0910299 0.463 0.64335
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.173 on 1309 degrees of freedom
## Multiple R-squared: 0.2827, Adjusted R-squared: 0.275
## F-statistic: 36.84 on 14 and 1309 DF, p-value: < 2.2e-16
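For completeness, the Lagrange Multiplier diagnostics above pointed towards a spatial lag (SAR) model, in which the dependent variable itself is spatially lagged. That model can be fitted with lagsarlm() from spatialreg along the following lines - a sketch only, with the output not shown here.
# sketch (output not shown): spatial lag (SAR) model with a spatially lagged dependent variable
model6 <- lagsarlm(bdrate19 ~ imd_emp + hh_student + bame + edu_noqual + ten_priv +
                     ten_own + imd_livenv, data = ts, listw = queen.listw)
summary(model6)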
I appreciate this walk-through is a whistle-stop tour! The intended audience is analysts working in crime, community safety or violence reduction units who are interested in developing skills in spatial analysis, statistics and working with R.
Other more advanced techniques that you may wish to consider when working with spatial data are GWR (Geographically Weighted Regression), machine learning (e.g., random forests, gradient boosting machines, k-nearest neighbours) and Risk Terrain Modeling.
Amazing resources below.