Forecasting of public safety problems using time series analysis is an area that receives relatively limited attention.
In UK policing, longer-term demand forecasts are found in a Force Management Statement (FMS). Among its aims, this document is expected to explain the demand a police force expects to face over the next four years. Forecasts of demand may be used to inform strategic decisions about organisational resources.
Examples of linear regression used in the Metropolitan Police FMS can be seen here.
Quite often, forecasts are used to assist in decision making and then forgotten about. Until Covid-19. Now we are using forecasts to estimate where levels of crime would usually have been and to quantify the impact lockdown has had on external demands for service. This may prompt a move towards more advanced forecasting techniques, both in FMS and in assessing crime prevention efforts in the future, moving away from the simplistic binary comparisons and percentage changes traditionally relied upon in public safety performance analysis.
A couple of recent papers caught my interest, prompting me to produce this short forecasting walk-through.
The libraries used are below.
library(tidyverse) # data wrangling and ggplot2
library(janitor)   # cleaning column names
library(tsibble)   # tidy time series objects
library(fable)     # tidy forecasting models (ETS)
library(forecast)  # ets(), auto.arima(), checkresiduals()
library(astsa)     # acf2(), sarima(), sarima.for()
I’ve chosen to use the Metropolitan Police ward level data which covers the period 2010 to 2018. This is available at the Greater London Authority website.
# read in data
gla_mps_ward <- read.csv("https://data.london.gov.uk/download/recorded_crime_summary/b03b8f4a-075f-4666-9c1d-8d9b0bfe3e63/MPS_Ward_Level_Crime_Historic_NewWard.csv")
# save data file
save(gla_mps_ward, file = "data_gla_mps_ward.RData")
# clean column headers
gla_mps_ward <- clean_names(gla_mps_ward)
# pivot data from wide (one column per month) to long
gla_mps_ward <- pivot_longer(gla_mps_ward, -c(ward_code, ward_name, borough, major_category, minor_category))
# add date columns; after clean_names() the month columns are named
# like x201004, so characters 2-5 hold the year and 6-7 the month
gla_mps_ward$day <- 1
gla_mps_ward$month <- substr(gla_mps_ward$name, 6, 7)
gla_mps_ward$year <- substr(gla_mps_ward$name, 2, 5)
gla_mps_ward$ym <- paste(gla_mps_ward$year, gla_mps_ward$month, sep = "-")
gla_mps_ward <- gla_mps_ward %>% mutate(yrmn = yearmonth(ym))
gla_mps_ward$date <- as.Date(paste(gla_mps_ward$year, gla_mps_ward$month, gla_mps_ward$day, sep = "-"))
# data sets
# aggregated borough crime data
borough_crime <- gla_mps_ward %>%
group_by(borough, major_category, date, yrmn) %>%
filter(date <= "2018-12-01") %>%
summarise(total = sum(value))
head(borough_crime)
## # A tibble: 6 x 5
## # Groups: borough, major_category, date [6]
## borough major_category date yrmn total
## <chr> <chr> <date> <mth> <int>
## 1 Barking and Dagenham Burglary 2010-04-01 2010 Apr 167
## 2 Barking and Dagenham Burglary 2010-05-01 2010 May 160
## 3 Barking and Dagenham Burglary 2010-06-01 2010 Jun 182
## 4 Barking and Dagenham Burglary 2010-07-01 2010 Jul 183
## 5 Barking and Dagenham Burglary 2010-08-01 2010 Aug 199
## 6 Barking and Dagenham Burglary 2010-09-01 2010 Sep 187
# aggregated met police wide crime data
mps_wide_crime <- gla_mps_ward %>%
group_by(minor_category, date, yrmn) %>%
filter(date <= "2018-12-01") %>%
summarise(total = sum(value))
head(mps_wide_crime)
## # A tibble: 6 x 4
## # Groups: minor_category, date [6]
## minor_category date yrmn total
## <chr> <date> <mth> <int>
## 1 Assault with Injury 2010-04-01 2010 Apr 4810
## 2 Assault with Injury 2010-05-01 2010 May 5370
## 3 Assault with Injury 2010-06-01 2010 Jun 5421
## 4 Assault with Injury 2010-07-01 2010 Jul 5391
## 5 Assault with Injury 2010-08-01 2010 Aug 4735
## 6 Assault with Injury 2010-09-01 2010 Sep 4656
Quite often we might just look at a line chart showing the trend in the data.
The code below shows the trend in major categories of crime occurring in the London Borough of Enfield from 2010 to 2018. It's a bit messy presented in this way; we can only really see that most crime is Theft and Handling, and that Violence Against The Person increased from January 2014 onward.
# Example, Enfield all major category
borough_crime %>%
filter(borough=="Enfield") %>%
ggplot(aes(yrmn, total, colour = major_category)) +
geom_line()
Another way to look at this data is to use a facet wrap to view each major category individually. Either way, we are limited to describing what has already happened.
# Example, Enfield crime types on their own
borough_crime %>%
filter(borough=="Enfield") %>%
ggplot(aes(yrmn, total)) +
geom_line() +
facet_wrap(~ major_category, scales = "free_y")
This next section looks at ways of deconstructing trends and applying forecasts using ETS (Error, Trend and Seasonality, also known as exponential smoothing).
Time series can contain a variety of patterns, which can be observed by splitting a series into its component parts. Time series decomposition is a way of doing this. Using London Borough of Enfield burglary data we can see the trend (falling between 2012 and 2016, and rising thereafter), the seasonal component (peaks and troughs each year), and a random noise component (everything else in the time series).
# prepare Enfield burglary data
ye_burg_past <-
borough_crime %>%
filter(borough == "Enfield" & major_category == "Burglary") %>%
as_tsibble(index = yrmn)
head(ye_burg_past)
## # A tsibble: 6 x 5 [1M]
## # Groups: borough, major_category, date [6]
## borough major_category date yrmn total
## <chr> <chr> <date> <mth> <int>
## 1 Enfield Burglary 2010-04-01 2010 Apr 252
## 2 Enfield Burglary 2010-05-01 2010 May 283
## 3 Enfield Burglary 2010-06-01 2010 Jun 256
## 4 Enfield Burglary 2010-07-01 2010 Jul 252
## 5 Enfield Burglary 2010-08-01 2010 Aug 205
## 6 Enfield Burglary 2010-09-01 2010 Sep 220
# create time series object of ye burglary
ye_burg_ts <- ts(ye_burg_past$total, frequency = 12, start = c(2010, 4))
# decompose time series
ye_burg_dc <- decompose(ye_burg_ts)
# plot decomposition
plot(ye_burg_dc)
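The individual components can also be extracted from the decomposition object, for example:
# view the estimated trend and seasonal components
head(ye_burg_dc$trend)
head(ye_burg_dc$seasonal)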
ETS models are used when there is a trend and/or seasonality in the data, both of which are present for burglary in Enfield.
We use the ETS model to forecast the next three years.
# Forecast future Enfield burglary using the ye_burg_past object
ye_burg_future <- ye_burg_past %>%
model(ETS(total)) %>%
forecast(h = "3 years")
# view data
head(ye_burg_future)
## # A fable: 6 x 4 [1M]
## # Key: .model [1]
## .model yrmn total .mean
## <chr> <mth> <dist> <dbl>
## 1 ETS(total) 2019 Jan N(364, 1135) 364.
## 2 ETS(total) 2019 Feb N(355, 1303) 355.
## 3 ETS(total) 2019 Mar N(345, 1471) 345.
## 4 ETS(total) 2019 Apr N(309, 1640) 309.
## 5 ETS(total) 2019 May N(313, 1808) 313.
## 6 ETS(total) 2019 Jun N(291, 1977) 291.
# plot data
autoplot(ye_burg_future, ye_burg_past) +
ggtitle("Enfield Burglary Forecast")
# export data, use code below if you want to view the forecast outside of R
# write.csv(ye_burg_future, "ye_burg_future.csv")
# MS Excel has its own [FORECAST.ETS function](https://exceljet.net/excel-functions/excel-forecast.ets-function)
The underlying statistics can be used to determine how good the forecast is.
When the model is assigned to an object, further statistics and diagnostics can be viewed.
These provide details from the Ljung-Box test (which the burglary forecast fails: we are looking for a small Q* statistic and a p-value which is not significant) and diagnostic charts. The residuals chart shouldn't show clusters of volatility, the ACF shouldn't show significant correlation between the residuals, and the residual histogram should ideally be symmetrical (a bell curve).
To learn more about how to interpret and test these models, and draw conclusions about how sound they are, see the references included at the end.
# find the best model using ets
ets(ye_burg_past$total)
## ETS(A,N,N)
##
## Call:
## ets(y = ye_burg_past$total)
##
## Smoothing parameters:
## alpha = 0.7995
##
## Initial states:
## l = 257.0014
##
## sigma: 39.4767
##
## AIC AICc BIC
## 1264.546 1264.784 1272.508
# fit ETS model to the enfield burglary past data
fit.ets <- ets(ye_burg_past$total)
# check the residuals
checkresiduals(fit.ets)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,N)
## Q* = 17.566, df = 8, p-value = 0.02473
##
## Model df: 2. Total lags used: 10
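Note that ye_burg_past$total is a plain numeric vector with no frequency attached, so ets() above can only consider non-seasonal models (hence the ETS(A,N,N) result). Passing the monthly ts object created earlier lets it search seasonal candidates too:
# ye_burg_ts carries frequency = 12, so ets() can also consider seasonal models
fit.ets.seasonal <- ets(ye_burg_ts)
checkresiduals(fit.ets.seasonal)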
If we wanted to create forecasts for more than one problem simultaneously, we can do that by assigning a key when creating our tsibble object. When we display the forecasts we can facet by the key (category of crime in this instance). Note that most Fraud and Forgery offence recording moved away from police forces.
# create tsibble object for all crime types
ye_past <-
borough_crime %>%
filter(borough == "Enfield") %>%
as_tsibble(index = yrmn, key = major_category)
# forecast future
ye_future <-
ye_past %>%
model(ETS(total)) %>%
forecast(h = "3 years")
# plot with facet wrap
autoplot(ye_future, ye_past) +
facet_wrap(~ major_category, scales = "free_y") +
ggtitle("Enfield Crime Forecasts")
Keeping with Burglary, this time for the whole of the Metropolitan Police, we will look at another method called ARIMA (Autoregressive Integrated Moving Average). The code for the data set is shown below.
# filter the mps_wide_crime to burglary in a dwelling
mps_burg <- mps_wide_crime %>%
filter(minor_category == "Burglary In A Dwelling") %>%
group_by(date) %>%
summarize(value = as.numeric(sum(total))) %>%
as_tsibble()
# create an extensible time series (xts) object
mps_burgxts <- xts::xts(mps_burg$value, order.by = mps_burg$date)
If we plot the data we can see that there is a seasonal pattern in London burglary, along with a downward and then an upward trend.
plot(mps_burgxts)
Differencing is sometimes applied to make a time series stationary by removing trends and/or seasonality (detrending).
You can learn more about this, and why you may need to do it, in the references at the end.
As an example, the code below shows the Met Police burglary trend when the time series is differenced, and when it is differenced taking account of seasonal cycles (adding lag = 12).
# difference the logged time series
diff1 <- diff(log(mps_burgxts))
# seasonal differencing (lag = 12 for monthly data)
diff2 <- diff(log(mps_burgxts), lag = 12)
# plot; autoplot() returns ggplot objects, so par(mfrow) has no effect
# here and the plots display one at a time
autoplot(diff1)
autoplot(diff2)
The acf2 function is used to observe autocorrelation and partial autocorrelation. These show whether the elements of a time series (or differenced time series) are positively correlated, negatively correlated or independent of one another. The x-axis shows the lags between those elements.
These plots can be used to decide on the type of model needed for an ARIMA forecast.
To learn more about how to interpret ACF plots and how they relate to ARIMA models, see the references at the end.
# acf and pacf plot of burglary data
acf2(mps_burgxts)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.84 0.69 0.49 0.33 0.20 0.16 0.18 0.25 0.37 0.54 0.62 0.65 0.54
## PACF 0.84 -0.06 -0.26 0.00 0.04 0.15 0.16 0.16 0.21 0.32 -0.03 0.00 -0.35
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21]
## ACF 0.37 0.17 0.04 -0.07 -0.13 -0.10 -0.04 0.05
## PACF -0.22 -0.15 0.11 -0.03 -0.15 0.04 -0.12 -0.07
# acf and pacf plot of differenced burglary data
acf2(diff(mps_burgxts))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## ACF -0.06 0.21 -0.09 -0.13 -0.25 -0.21 -0.20 -0.15 -0.13 0.23 0.12 0.53
## PACF -0.06 0.21 -0.07 -0.19 -0.25 -0.20 -0.19 -0.22 -0.30 0.04 0.01 0.40
## [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21]
## ACF 0.13 0.16 -0.20 -0.08 -0.16 -0.27 -0.11 -0.12 -0.10
## PACF 0.21 0.08 -0.21 -0.09 0.12 -0.03 0.08 0.05 -0.04
ARIMA models are classified using the values p, d and q: p is the number of autoregressive terms, d is the number of non-seasonal differences and q is the number of lagged forecast errors (moving average terms). SARIMA (seasonal) models are classified using p,d,q plus P,D,Q and the number of seasonal periods.
The code below shows examples of how we would create an AR, MA, ARIMA and SARIMA model using the burglary data.
Each model outputs its residual diagnostics as four charts. To judge whether a forecast model is sound, we are looking for:
- standardized residuals with no obvious patterns or clusters of volatility;
- an ACF of the residuals with no significant correlations;
- a normal Q-Q plot with points falling close to the line;
- Ljung-Box p-values sitting above the significance line.
We can see problems with these in the AR1, AR2, MA1 and ARIMA models using the Met Police burglary data. The SARIMA model looks to be the best.
# AR1, this would be used if the ACF tails off and PACF cuts off at lag 1
m1 <- sarima(mps_burgxts, p = 1, d = 0, q = 0)
## initial value 6.721561
## iter 2 value 6.085844
## iter 3 value 6.085662
## iter 4 value 6.085614
## iter 5 value 6.085551
## iter 6 value 6.085463
## iter 7 value 6.085443
## iter 8 value 6.085438
## iter 9 value 6.085433
## iter 10 value 6.085425
## iter 11 value 6.085420
## iter 12 value 6.085420
## iter 13 value 6.085420
## iter 13 value 6.085420
## iter 13 value 6.085419
## final value 6.085419
## converged
## initial value 6.087368
## iter 2 value 6.087314
## iter 3 value 6.087234
## iter 4 value 6.087228
## iter 5 value 6.087208
## iter 6 value 6.087207
## iter 6 value 6.087207
## final value 6.087207
## converged
# AR2, similar to above but if the PACF cuts off at lag 2, and we can just use the numbers to shorten our code
m2 <- sarima(mps_burgxts, 2,0,0)
## initial value 6.726277
## iter 2 value 6.572200
## iter 3 value 6.166248
## iter 4 value 6.121905
## iter 5 value 6.090672
## iter 6 value 6.089940
## iter 7 value 6.089918
## iter 8 value 6.089861
## iter 9 value 6.089855
## iter 10 value 6.089846
## iter 11 value 6.089825
## iter 12 value 6.089818
## iter 13 value 6.089811
## iter 14 value 6.089811
## iter 14 value 6.089811
## iter 14 value 6.089811
## final value 6.089811
## converged
## initial value 6.086941
## iter 2 value 6.086887
## iter 3 value 6.086816
## iter 4 value 6.086812
## iter 5 value 6.086795
## iter 6 value 6.086794
## iter 6 value 6.086794
## iter 6 value 6.086794
## final value 6.086794
## converged
# MA1, this would be used if the ACF cuts off at lag 1 and the PACF tails off
m3 <- sarima(mps_burgxts, 0,0,1)
## initial value 6.717088
## iter 2 value 6.571922
## iter 3 value 6.558565
## iter 4 value 6.483067
## iter 5 value 6.436828
## iter 6 value 6.415934
## iter 7 value 6.402668
## iter 8 value 6.398273
## iter 9 value 6.397435
## iter 10 value 6.397402
## iter 11 value 6.397402
## iter 11 value 6.397402
## final value 6.397402
## converged
## initial value 6.399357
## iter 2 value 6.399352
## iter 3 value 6.399350
## iter 3 value 6.399350
## iter 3 value 6.399350
## final value 6.399350
## converged
# ARIMA, this would be used if both the ACF/PACF tails off
m4 <- sarima(mps_burgxts, 1,0,1)
## initial value 6.721561
## iter 2 value 6.456784
## iter 3 value 6.155531
## iter 4 value 6.122235
## iter 5 value 6.095018
## iter 6 value 6.085352
## iter 7 value 6.085316
## iter 8 value 6.085287
## iter 9 value 6.085272
## iter 10 value 6.085255
## iter 11 value 6.085218
## iter 12 value 6.085193
## iter 13 value 6.085191
## iter 14 value 6.085190
## iter 14 value 6.085190
## iter 14 value 6.085190
## final value 6.085190
## converged
## initial value 6.087127
## iter 2 value 6.087075
## iter 3 value 6.087037
## iter 4 value 6.086985
## iter 5 value 6.086981
## iter 6 value 6.086969
## iter 6 value 6.086969
## iter 6 value 6.086969
## final value 6.086969
## converged
# SARIMA, seasonal model with 12 seasons (monthly data)
m5 <- sarima(mps_burgxts, 1,0,1,1,0,1,12)
## initial value 6.715449
## iter 2 value 6.639852
## iter 3 value 5.855734
## iter 4 value 5.808911
## iter 5 value 5.774553
## iter 6 value 5.689642
## iter 7 value 5.679556
## iter 8 value 5.664593
## iter 9 value 5.660144
## iter 10 value 5.656298
## iter 11 value 5.654674
## iter 12 value 5.654601
## iter 13 value 5.654600
## iter 14 value 5.654598
## iter 15 value 5.654598
## iter 16 value 5.654598
## iter 17 value 5.654597
## iter 18 value 5.654585
## iter 19 value 5.654564
## iter 20 value 5.654511
## iter 21 value 5.654425
## iter 22 value 5.654324
## iter 23 value 5.654236
## iter 24 value 5.654233
## iter 25 value 5.654224
## iter 26 value 5.654223
## iter 27 value 5.654223
## iter 27 value 5.654223
## iter 27 value 5.654223
## final value 5.654223
## converged
## initial value 5.749921
## iter 2 value 5.739198
## iter 3 value 5.734207
## iter 4 value 5.730259
## iter 5 value 5.724746
## iter 6 value 5.718790
## iter 7 value 5.718043
## iter 8 value 5.717607
## iter 9 value 5.717319
## iter 10 value 5.716389
## iter 11 value 5.716291
## iter 12 value 5.716208
## iter 13 value 5.716204
## iter 14 value 5.716184
## iter 15 value 5.716101
## iter 16 value 5.716085
## iter 17 value 5.716047
## iter 18 value 5.715988
## iter 19 value 5.715809
## iter 20 value 5.715498
## iter 21 value 5.715014
## iter 22 value 5.714535
## iter 23 value 5.714429
## iter 24 value 5.714428
## iter 25 value 5.714426
## iter 26 value 5.714378
## iter 27 value 5.714306
## iter 28 value 5.714282
## iter 29 value 5.714256
## iter 30 value 5.714246
## iter 31 value 5.714238
## iter 32 value 5.714235
## iter 33 value 5.714231
## iter 34 value 5.714231
## iter 34 value 5.714230
## final value 5.714230
## converged
We can look at the AIC values to see which of these models is best; we are looking for the one with the lowest AIC. We can see that m5 (SARIMA) has the lowest AIC value. Other fit statistics can also be called from each model object: AICc, BIC and ttable (the coefficient table).
# compare AIC values for each model
m1$AIC
## [1] 15.06943
m2$AIC
## [1] 15.08766
m3$AIC
## [1] 15.69372
m4$AIC
## [1] 15.08801
m5$AIC
## [1] 14.38062
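The same syntax pulls those other statistics from a model object, for example from the SARIMA model:
# other fit statistics returned by sarima()
m5$AICc
m5$BIC
m5$ttable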
We can use different methods to identify the best p,d,q,P,D,Q inputs for our data automatically, such as auto.arima(). Because our xts object carries no seasonal frequency, auto.arima() only considers non-seasonal models here.
# auto arima burg data
auto.arima(mps_burgxts)
## Series: mps_burgxts
## ARIMA(0,1,0)
##
## sigma^2 estimated as 207783: log likelihood=-784.27
## AIC=1570.54 AICc=1570.58 BIC=1573.19
# auto arima differenced data
auto.arima(diff(mps_burgxts))
## Series: diff(mps_burgxts)
## ARIMA(0,0,0) with zero mean
##
## sigma^2 estimated as 207782: log likelihood=-784.27
## AIC=1570.54 AICc=1570.58 BIC=1573.19
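A minimal sketch, assuming the monthly series starting April 2010 as before: rebuilding the data as a ts object with a frequency attached lets auto.arima() consider seasonal terms as well.
# a monthly ts lets auto.arima() search seasonal models too
mps_burg_ts <- ts(mps_burg$value, frequency = 12, start = c(2010, 4))
auto.arima(mps_burg_ts)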
We could also use a function to identify the best model inputs. The function below is taken from the book Introductory Time Series with R by Paul S.P. Cowpertwait and Andrew V. Metcalfe.
It checks a range of models by trial and error and selects the one with the best (lowest) AIC.
# get best arima function
get.best.arima <- function(x.ts, maxord = c(1,1,1,1,1,1))
{
  best.aic <- 1e8
  n <- length(x.ts)
  # try every combination of (p,d,q)(P,D,Q) up to the maximum orders
  for (p in 0:maxord[1]) for (d in 0:maxord[2]) for (q in 0:maxord[3])
    for (P in 0:maxord[4]) for (D in 0:maxord[5]) for (Q in 0:maxord[6])
    {
      fit <- arima(x.ts, order = c(p,d,q),
                   seas = list(order = c(P,D,Q),
                               frequency(x.ts)), method = "CSS")
      # consistent AIC, which penalises extra parameters more heavily
      fit.aic <- -2 * fit$loglik + (log(n) + 1) * length(fit$coef)
      if (fit.aic < best.aic)
      {
        best.aic <- fit.aic
        best.fit <- fit
        best.model <- c(p,d,q,P,D,Q)
      }
    }
  list(best.aic, best.fit, best.model)
}
We can use this function on the Met Police burglary data using the code below. This identifies the best pdqPDQ for our data as 1,1,2,0,2,1.
# best burglary arima
best_burg <- get.best.arima(mps_burgxts, maxord = c(2,2,2,2,2,2))
# inputs
best_burg
## [[1]]
## [1] 1551.944
##
## [[2]]
##
## Call:
## arima(x = x.ts, order = c(p, d, q), seasonal = list(order = c(P, D, Q), frequency(x.ts)),
## method = "CSS")
##
## Coefficients:
## ar1 ma1 ma2 sma1
## -0.8182 -0.3172 -0.7706 -1.0277
## s.e. 0.0039 0.0647 0.0707 0.0114
##
## sigma^2 estimated as 190143: part log likelihood = -764.66
##
## [[3]]
## [1] 1 1 2 0 2 1
We can fit the model to our burglary data. We are going to forecast 3 years ahead, or 36 months. As the data ends in December 2018, the forecast will be for 2019-2021.
# forecast burglary
m6 <- sarima.for(mps_burgxts, n.ahead = 36, 1,1,2,0,2,1,12)
# view the predicted values
m6$pred
## Time Series:
## Start = 106
## End = 141
## Frequency = 1
## [1] 6177.727 5315.797 4815.595 4809.506 4827.152 4461.373 4561.089 4825.044
## [9] 4485.725 5457.062 6408.410 5646.836 6303.318 5482.455 4888.450 4920.272
## [17] 4930.721 4525.186 4628.621 4914.533 4512.895 5556.569 6507.265 5715.116
## [25] 6341.353 5561.558 4873.750 4943.482 4946.734 4501.444 4608.599 4916.466
## [33] 4452.509 5568.520 6518.565 5695.841
# view the se
m6$se
## Time Series:
## Start = 106
## End = 141
## Frequency = 1
## [1] 356.5754 381.7694 485.0405 583.9950 669.3328 745.2294 814.1193
## [8] 877.6237 936.8339 992.5179 1045.2392 1095.4234 1316.5205 1401.8137
## [15] 1550.3630 1699.1124 1834.0080 1959.8832 2078.1836 2190.1083 2296.5853
## [22] 2398.3394 2495.9474 2589.8727 2830.0510 2958.5499 3142.6368 3330.4246
## [29] 3503.2451 3668.1591 3826.0050 3977.5966 4123.6195 4264.6448 4401.1514
## [36] 4533.5383
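The standard errors can be combined with the point forecasts to give approximate prediction intervals, for example a 95% interval:
# approximate 95% prediction intervals from the forecast output
upper <- m6$pred + 1.96 * m6$se
lower <- m6$pred - 1.96 * m6$se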
Here we take a straightforward look at what actually happened with burglary offences in London from 2019 onwards, in relation to the forecast.
We can see from the blue dotted line that the forecast performed very well right up until March 2020, but it has become much less accurate since Covid-19. This is of course no surprise: with many households regularly occupied under instructions to stay indoors or work from home, the opportunities for burglary have been reduced.
# create a vector of actual data to date
# these values were taken from looking at the Met Police stats and data tableau dashboard
# Actual data
actual <- c(5617,4890,5399,4582,4596,4494,4510,4417,4859,5468,5431,5286,5242,4706,3821,2521,2945,3318,3807,3953,4013,4385,4179,3774,3230)
x <- mps_burg$value
x <- as.numeric(x)
new <- append(x, actual)
# forecast with actual
par(mfrow =c(1,1))
m6 <- sarima.for(mps_burgxts, n.ahead = 36, 1,1,2,0,2,1,12)
lines(new, lwd = 1, lty = 2, col = "blue")
title("SARIMA Forecast: Met Police Residential Burglary")