Forecasting of public safety problems using time series analysis is an area that receives relatively limited attention.
In UK policing, longer-term demand forecasts are found in a Force Management Statement (FMS). Among its aims, this document is expected to explain the demand a police force expects to face over the next four years. Forecasts of demand may be used to inform strategic decisions about organisational resources.
Examples of linear regression used in the Metropolitan Police FMS can be seen here.
Quite often, forecasts are used to assist in decision making and then forgotten about. Until Covid-19. Now we are using forecasts to estimate where levels of crime would usually have been and to quantify the impact lockdown has had on external demands for service. This may prompt a move towards more advanced forecasting techniques, both in FMS and in assessing crime prevention efforts in the future, moving away from the simplistic binary comparisons and percentage changes traditionally relied upon in public safety performance analysis.
A couple of recent papers caught my interest, prompting me to produce this short forecasting walk-through.
The libraries used are below.
library(tidyverse) # data wrangling and ggplot2
library(janitor)   # cleaning column names
library(tsibble)   # tidy time series objects
library(fable)     # tidy forecasting models (ETS)
library(forecast)  # ets(), auto.arima(), checkresiduals()
library(astsa)     # acf2(), sarima(), sarima.for()
I’ve chosen to use the Metropolitan Police ward level data which covers the period 2010 to 2018. This is available at the Greater London Authority website.
# read in data
gla_mps_ward <- read.csv("https://data.london.gov.uk/download/recorded_crime_summary/b03b8f4a-075f-4666-9c1d-8d9b0bfe3e63/MPS_Ward_Level_Crime_Historic_NewWard.csv")
# save data file
save(gla_mps_ward, file = "data_gla_mps_ward.RData")
# clean column headers
gla_mps_ward <- clean_names(gla_mps_ward)
# pivot data from wide (one column per month) to long
gla_mps_ward <- pivot_longer(gla_mps_ward, -c(ward_code, ward_name, borough, major_category, minor_category))
# add date columns; after clean_names() the month columns are named
# like x201004, so characters 2-5 hold the year and 6-7 the month
gla_mps_ward$day <- 1
gla_mps_ward$month <- substr(gla_mps_ward$name, 6, 7)
gla_mps_ward$year <- substr(gla_mps_ward$name, 2, 5)
gla_mps_ward$ym <- paste(gla_mps_ward$year, gla_mps_ward$month, sep = "-")
gla_mps_ward <- gla_mps_ward %>% mutate(yrmn = yearmonth(ym))
gla_mps_ward$date <- as.Date(paste(gla_mps_ward$year, gla_mps_ward$month, gla_mps_ward$day, sep = "-"))
# data sets
# aggregated borough crime data
borough_crime <- gla_mps_ward %>%
group_by(borough, major_category, date, yrmn) %>%
filter(date <= "2018-12-01") %>%
summarise(total = sum(value))
head(borough_crime)
## # A tibble: 6 x 5
## # Groups: borough, major_category, date [6]
## borough major_category date yrmn total
## <chr> <chr> <date> <mth> <int>
## 1 Barking and Dagenham Burglary 2010-04-01 2010 Apr 167
## 2 Barking and Dagenham Burglary 2010-05-01 2010 May 160
## 3 Barking and Dagenham Burglary 2010-06-01 2010 Jun 182
## 4 Barking and Dagenham Burglary 2010-07-01 2010 Jul 183
## 5 Barking and Dagenham Burglary 2010-08-01 2010 Aug 199
## 6 Barking and Dagenham Burglary 2010-09-01 2010 Sep 187
# aggregated met police wide crime data
mps_wide_crime <- gla_mps_ward %>%
group_by(minor_category, date, yrmn) %>%
filter(date <= "2018-12-01") %>%
summarise(total = sum(value))
head(mps_wide_crime)
## # A tibble: 6 x 4
## # Groups: minor_category, date [6]
## minor_category date yrmn total
## <chr> <date> <mth> <int>
## 1 Assault with Injury 2010-04-01 2010 Apr 4810
## 2 Assault with Injury 2010-05-01 2010 May 5370
## 3 Assault with Injury 2010-06-01 2010 Jun 5421
## 4 Assault with Injury 2010-07-01 2010 Jul 5391
## 5 Assault with Injury 2010-08-01 2010 Aug 4735
## 6 Assault with Injury 2010-09-01 2010 Sep 4656
Quite often we might just look at a line chart showing the trend in the data.
The code below shows the trend in major categories of crime occurring in the London Borough of Enfield from 2010 to 2018. It's a bit messy presented in this way; we can only really see that most crime is Theft and Handling, and that Violence Against The Person increased from January 2014 onward.
# Example, Enfield all major category
borough_crime %>%
filter(borough=="Enfield") %>%
ggplot(aes(yrmn, total, colour = major_category)) +
geom_line()
Another way to look at this data is to use a facet wrap to view each major category individually. Either way, we are limited to describing what has already happened.
# Example, Enfield crime types on their own
borough_crime %>%
filter(borough=="Enfield") %>%
ggplot(aes(yrmn, total)) +
geom_line() +
facet_wrap(~ major_category, scales = "free_y")
This next section looks at ways of deconstructing trends and applying forecasts using ETS (Error, Trend and Seasonality, also known as exponential smoothing).
Time series can contain a variety of patterns, which can be observed by splitting a series into its component parts. Time series decomposition is a way of doing this. Using London Borough of Enfield burglary data we can see the trend (falling between 2012 and 2016, and rising thereafter), the seasonal component (peaks and troughs each year), and a random noise component (everything else in the time series).
# prepare Enfield burglary data
ye_burg_past <-
borough_crime %>%
filter(borough == "Enfield" & major_category == "Burglary") %>%
as_tsibble(index = yrmn)
head(ye_burg_past)
## # A tsibble: 6 x 5 [1M]
## # Groups: borough, major_category, date [6]
## borough major_category date yrmn total
## <chr> <chr> <date> <mth> <int>
## 1 Enfield Burglary 2010-04-01 2010 Apr 252
## 2 Enfield Burglary 2010-05-01 2010 May 283
## 3 Enfield Burglary 2010-06-01 2010 Jun 256
## 4 Enfield Burglary 2010-07-01 2010 Jul 252
## 5 Enfield Burglary 2010-08-01 2010 Aug 205
## 6 Enfield Burglary 2010-09-01 2010 Sep 220
# create time series object of ye burglary
ye_burg_ts <- ts(ye_burg_past$total, frequency = 12, start = c(2010, 4))
# decompose time series
ye_burg_dc <- decompose(ye_burg_ts)
# plot decomposition
plot(ye_burg_dc)
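The individual components can also be extracted from the decomposition object, for example:
# view the estimated trend and seasonal components
head(ye_burg_dc$trend)
head(ye_burg_dc$seasonal)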
ETS models are used when there is a trend and/or seasonality in the data, both of which are present for burglary in Enfield.
We use the ETS model to forecast the next three years.
# Forecast future Enfield burglary using the ye_burg_past object
ye_burg_future <- ye_burg_past %>%
model(ETS(total)) %>%
forecast(h = "3 years")
# view data
head(ye_burg_future)
## # A fable: 6 x 4 [1M]
## # Key: .model [1]
## .model yrmn total .mean
## <chr> <mth> <dist> <dbl>
## 1 ETS(total) 2019 Jan N(364, 1135) 364.
## 2 ETS(total) 2019 Feb N(355, 1303) 355.
## 3 ETS(total) 2019 Mar N(345, 1471) 345.
## 4 ETS(total) 2019 Apr N(309, 1640) 309.
## 5 ETS(total) 2019 May N(313, 1808) 313.
## 6 ETS(total) 2019 Jun N(291, 1977) 291.
# plot data
autoplot(ye_burg_future, ye_burg_past) +
ggtitle("Enfield Burglary Forecast")
# export data, use code below if you want to view the forecast outside of R
# write.csv(ye_burg_future, "ye_burg_future.csv")
# MS Excel has its own [FORECAST.ETS function](https://exceljet.net/excel-functions/excel-forecast.ets-function)
The underlying statistics can be used to determine how good the forecast is.
When the model is assigned to an object, further statistics and diagnostics can be viewed.
These provide details from the Ljung-Box test (which the burglary forecast fails: we are looking for a small Q* statistic and a p-value which is not significant) and diagnostic charts. The residuals chart shouldn't show clusters of volatility, the ACF shouldn't show significant correlation between the residuals, and the residual histogram should ideally be symmetrical (a bell curve).
To learn more about how to interpret and test these models, and draw conclusions about how sound they are, see the references included at the end.
# find the best model using ets
ets(ye_burg_past$total)
## ETS(A,N,N)
##
## Call:
## ets(y = ye_burg_past$total)
##
## Smoothing parameters:
## alpha = 0.7995
##
## Initial states:
## l = 257.0014
##
## sigma: 39.4767
##
## AIC AICc BIC
## 1264.546 1264.784 1272.508
# fit ETS model to the enfield burglary past data
fit.ets <- ets(ye_burg_past$total)
# check the residuals
checkresiduals(fit.ets)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,N)
## Q* = 17.566, df = 8, p-value = 0.02473
##
## Model df: 2. Total lags used: 10
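Note that ye_burg_past$total is a plain numeric vector with no frequency attached, so ets() above can only consider non-seasonal models (hence the ETS(A,N,N) result). Passing the monthly ts object created earlier lets it search seasonal candidates too:
# ye_burg_ts carries frequency = 12, so ets() can also consider seasonal models
fit.ets.seasonal <- ets(ye_burg_ts)
checkresiduals(fit.ets.seasonal)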
If we wanted to create forecasts for more than one problem simultaneously, we can do that by assigning a key when creating our tsibble object. When we display the forecasts we can facet by the key (category of crime in this instance). Note that most Fraud and Forgery offence recording moved away from police forces.
# create tsibble object for all crime types
ye_past <-
borough_crime %>%
filter(borough == "Enfield") %>%
as_tsibble(index = yrmn, key = major_category)
# forecast future
ye_future <-
ye_past %>%
model(ETS(total)) %>%
forecast(h = "3 years")
# plot with facet wrap
autoplot(ye_future, ye_past) +
facet_wrap(~ major_category, scales = "free_y") +
ggtitle("Enfield Crime Forecasts")
Keeping with Burglary, this time for the whole of the Metropolitan Police, we will look at another method called ARIMA (Autoregressive Integrated Moving Average). The code for the data set is shown below.
# filter the mps_wide_crime to burglary in a dwelling
mps_burg <- mps_wide_crime %>%
filter(minor_category == "Burglary In A Dwelling") %>%
group_by(date) %>%
summarize(value = as.numeric(sum(total))) %>%
as_tsibble()
# create an extensible time series (xts) object
mps_burgxts <- xts::xts(mps_burg$value, order.by = mps_burg$date)
If we plot the data we can see that there is a seasonal pattern in London burglary, along with a downward and then an upward trend.
plot(mps_burgxts)
Differencing is sometimes applied to make a time series stationary by removing trends and/or seasonality (detrending).
You can learn more about this, and why you may need to do it, in the references at the end.
As an example, the code below shows the Met Police burglary trend when the time series is differenced, and when it is differenced taking account of seasonal cycles (adding lag = 12).
# difference the logged time series
diff1 <- diff(log(mps_burgxts))
# seasonal differencing (lag = 12 for monthly data)
diff2 <- diff(log(mps_burgxts), lag = 12)
# plot; autoplot() returns ggplot objects, so par(mfrow) has no effect
# here and the plots display one at a time
autoplot(diff1)
autoplot(diff2)
The acf2 function is used to observe autocorrelation and partial autocorrelation. These show whether the elements of a time series (or differenced time series) are positively correlated, negatively correlated or independent of one another. The x-axis shows the lags between those elements.
These plots can be used to decide on the type of model needed for an ARIMA forecast.
To learn more about how to interpret ACF plots and how they relate to ARIMA models, see the references at the end.
# acf and pacf plot of burglary data
acf2(mps_burgxts)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.84 0.69 0.49 0.33 0.20 0.16 0.18 0.25 0.37 0.54 0.62 0.65 0.54
## PACF 0.84 -0.06 -0.26 0.00 0.04 0.15 0.16 0.16 0.21 0.32 -0.03 0.00 -0.35
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21]
## ACF 0.37 0.17 0.04 -0.07 -0.13 -0.10 -0.04 0.05
## PACF -0.22 -0.15 0.11 -0.03 -0.15 0.04 -0.12 -0.07
# acf and pacf plot of differenced burglary data
acf2(diff(mps_burgxts))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## ACF -0.06 0.21 -0.09 -0.13 -0.25 -0.21 -0.20 -0.15 -0.13 0.23 0.12 0.53
## PACF -0.06 0.21 -0.07 -0.19 -0.25 -0.20 -0.19 -0.22 -0.30 0.04 0.01 0.40
## [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21]
## ACF 0.13 0.16 -0.20 -0.08 -0.16 -0.27 -0.11 -0.12 -0.10
## PACF 0.21 0.08 -0.21 -0.09 0.12 -0.03 0.08 0.05 -0.04
ARIMA models are classified using the values p, d and q: p is the number of autoregressive terms, d is the number of non-seasonal differences and q is the number of lagged forecast errors (moving average terms). SARIMA (seasonal) models are classified using p,d,q plus P,D,Q and the number of seasonal periods.
The code below shows examples of how we would create an AR, MA, ARIMA and SARIMA model using the burglary data.
Each model outputs its residual diagnostics as four charts. To judge whether a forecast model is sound, we are looking for:
- standardized residuals with no obvious patterns or clusters of volatility;
- an ACF of the residuals with no significant correlations;
- a normal Q-Q plot with points falling close to the line;
- Ljung-Box p-values sitting above the significance line.
We can see problems with these in the AR1, AR2, MA1 and ARIMA models using the Met Police burglary data. The SARIMA model looks to be the best.
# AR1, this would be used if the ACF tails off and PACF cuts off at lag 1
m1 <- sarima(mps_burgxts, p = 1, d = 0, q = 0)
## initial value 6.721561
## iter 2 value 6.085844
## iter 3 value 6.085662
## iter 4 value 6.085614
## iter 5 value 6.085551
## iter 6 value 6.085463
## iter 7 value 6.085443
## iter 8 value 6.085438
## iter 9 value 6.085433
## iter 10 value 6.085425
## iter 11 value 6.085420
## iter 12 value 6.085420
## iter 13 value 6.085420
## iter 13 value 6.085420
## iter 13 value 6.085419
## final value 6.085419
## converged
## initial value 6.087368
## iter 2 value 6.087314
## iter 3 value 6.087234
## iter 4 value 6.087228
## iter 5 value 6.087208
## iter 6 value 6.087207
## iter 6 value 6.087207
## final value 6.087207
## converged
# AR2, similar to above but if the PACF cuts off at lag 2, and we can just use the numbers to shorten our code
m2 <- sarima(mps_burgxts, 2,0,0)
## initial value 6.726277
## iter 2 value 6.572200
## iter 3 value 6.166248
## iter 4 value 6.121905
## iter 5 value 6.090672
## iter 6 value 6.089940
## iter 7 value 6.089918
## iter 8 value 6.089861
## iter 9 value 6.089855
## iter 10 value 6.089846
## iter 11 value 6.089825
## iter 12 value 6.089818
## iter 13 value 6.089811
## iter 14 value 6.089811
## iter 14 value 6.089811
## iter 14 value 6.089811
## final value 6.089811
## converged
## initial value 6.086941
## iter 2 value 6.086887
## iter 3 value 6.086816
## iter 4 value 6.086812
## iter 5 value 6.086795
## iter 6 value 6.086794
## iter 6 value 6.086794
## iter 6 value 6.086794
## final value 6.086794
## converged
# MA1, this would be used if the ACF cuts off at lag 1 and the PACF tails off
m3 <- sarima(mps_burgxts, 0,0,1)
## initial value 6.717088
## iter 2 value 6.571922
## iter 3 value 6.558565
## iter 4 value 6.483067
## iter 5 value 6.436828
## iter 6 value 6.415934
## iter 7 value 6.402668
## iter 8 value 6.398273
## iter 9 value 6.397435
## iter 10 value 6.397402
## iter 11 value 6.397402
## iter 11 value 6.397402
## final value 6.397402
## converged
## initial value 6.399357
## iter 2 value 6.399352
## iter 3 value 6.399350
## iter 3 value 6.399350
## iter 3 value 6.399350
## final value 6.399350
## converged
# ARIMA, this would be used if both the ACF/PACF tails off
m4 <- sarima(mps_burgxts, 1,0,1)
## initial value 6.721561
## iter 2 value 6.456784
## iter 3 value 6.155531
## iter 4 value 6.122235
## iter 5 value 6.095018
## iter 6 value 6.085352
## iter 7 value 6.085316
## iter 8 value 6.085287
## iter 9 value 6.085272
## iter 10 value 6.085255
## iter 11 value 6.085218
## iter 12 value 6.085193
## iter 13 value 6.085191
## iter 14 value 6.085190
## iter 14 value 6.085190
## iter 14 value 6.085190
## final value 6.085190
## converged
## initial value 6.087127
## iter 2 value 6.087075
## iter 3 value 6.087037
## iter 4 value 6.086985
## iter 5 value 6.086981
## iter 6 value 6.086969
## iter 6 value 6.086969
## iter 6 value 6.086969
## final value 6.086969
## converged
# SARIMA, seasonal model with 12 seasons (monthly data)
m5 <- sarima(mps_burgxts, 1,0,1,1,0,1,12)
## initial value 6.715449
## iter 2 value 6.639852
## iter 3 value 5.855734
## iter 4 value 5.808911
## iter 5 value 5.774553
## iter 6 value 5.689642
## iter 7 value 5.679556
## iter 8 value 5.664593
## iter 9 value 5.660144
## iter 10 value 5.656298
## iter 11 value 5.654674
## iter 12 value 5.654601
## iter 13 value 5.654600
## iter 14 value 5.654598
## iter 15 value 5.654598
## iter 16 value 5.654598
## iter 17 value 5.654597
## iter 18 value 5.654585
## iter 19 value 5.654564
## iter 20 value 5.654511
## iter 21 value 5.654425
## iter 22 value 5.654324
## iter 23 value 5.654236
## iter 24 value 5.654233
## iter 25 value 5.654224
## iter 26 value 5.654223
## iter 27 value 5.654223
## iter 27 value 5.654223
## iter 27 value 5.654223
## final value 5.654223
## converged
## initial value 5.749921
## iter 2 value 5.739198
## iter 3 value 5.734207
## iter 4 value 5.730259
## iter 5 value 5.724746
## iter 6 value 5.718790
## iter 7 value 5.718043
## iter 8 value 5.717607
## iter 9 value 5.717319
## iter 10 value 5.716389
## iter 11 value 5.716291
## iter 12 value 5.716208
## iter 13 value 5.716204
## iter 14 value 5.716184
## iter 15 value 5.716101
## iter 16 value 5.716085
## iter 17 value 5.716047
## iter 18 value 5.715988
## iter 19 value 5.715809
## iter 20 value 5.715498
## iter 21 value 5.715014
## iter 22 value 5.714535
## iter 23 value 5.714429
## iter 24 value 5.714428
## iter 25 value 5.714426
## iter 26 value 5.714378
## iter 27 value 5.714306
## iter 28 value 5.714282
## iter 29 value 5.714256
## iter 30 value 5.714246
## iter 31 value 5.714238
## iter 32 value 5.714235
## iter 33 value 5.714231
## iter 34 value 5.714231
## iter 34 value 5.714230
## final value 5.714230
## converged
We can look at the AIC values to see which of these models is best; we are looking for the one with the lowest AIC. We can see that m5 (SARIMA) has the lowest AIC value. Other fit statistics can also be called from each model object: AICc, BIC and ttable (the coefficient table).
# compare AIC values for each model
m1$AIC
## [1] 15.06943
m2$AIC
## [1] 15.08766
m3$AIC
## [1] 15.69372
m4$AIC
## [1] 15.08801
m5$AIC
## [1] 14.38062
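The same syntax pulls those other statistics from a model object, for example from the SARIMA model:
# other fit statistics returned by sarima()
m5$AICc
m5$BIC
m5$ttable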
We can use different methods to identify the best p,d,q,P,D,Q inputs for our data automatically, such as auto.arima(). Because our xts object carries no seasonal frequency, auto.arima() only considers non-seasonal models here.
# auto arima burg data
auto.arima(mps_burgxts)
## Series: mps_burgxts
## ARIMA(0,1,0)
##
## sigma^2 estimated as 207783: log likelihood=-784.27
## AIC=1570.54 AICc=1570.58 BIC=1573.19
# auto arima differenced data
auto.arima(diff(mps_burgxts))
## Series: diff(mps_burgxts)
## ARIMA(0,0,0) with zero mean
##
## sigma^2 estimated as 207782: log likelihood=-784.27
## AIC=1570.54 AICc=1570.58 BIC=1573.19
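A minimal sketch, assuming the monthly series starting April 2010 as before: rebuilding the data as a ts object with a frequency attached lets auto.arima() consider seasonal terms as well.
# a monthly ts lets auto.arima() search seasonal models too
mps_burg_ts <- ts(mps_burg$value, frequency = 12, start = c(2010, 4))
auto.arima(mps_burg_ts)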
We could also use a function to identify the best model inputs. The function below is taken from the book Introductory Time Series with R by Paul S.P. Cowpertwait and Andrew V. Metcalfe.
It checks a range of models by trial and error and selects the one with the best (lowest) AIC.
# get best arima function
get.best.arima <- function(x.ts, maxord = c(1,1,1,1,1,1))
{
  best.aic <- 1e8
  n <- length(x.ts)
  # try every combination of (p,d,q)(P,D,Q) up to the maximum orders
  for (p in 0:maxord[1]) for (d in 0:maxord[2]) for (q in 0:maxord[3])
    for (P in 0:maxord[4]) for (D in 0:maxord[5]) for (Q in 0:maxord[6])
    {
      fit <- arima(x.ts, order = c(p,d,q),
                   seas = list(order = c(P,D,Q),
                               frequency(x.ts)), method = "CSS")
      # consistent AIC, which penalises extra parameters more heavily
      fit.aic <- -2 * fit$loglik + (log(n) + 1) * length(fit$coef)
      if (fit.aic < best.aic)
      {
        best.aic <- fit.aic
        best.fit <- fit
        best.model <- c(p,d,q,P,D,Q)
      }
    }
  list(best.aic, best.fit, best.model)
}
We can use this function on the Met Police burglary data using the code below. This identifies the best pdqPDQ for our data as 1,1,2,0,2,1.
# best burglary arima
best_burg <- get.best.arima(mps_burgxts, maxord = c(2,2,2,2,2,2))
# inputs
best_burg
## [[1]]
## [1] 1551.944
##
## [[2]]
##
## Call:
## arima(x = x.ts, order = c(p, d, q), seasonal = list(order = c(P, D, Q), frequency(x.ts)),
## method = "CSS")
##
## Coefficients:
## ar1 ma1 ma2 sma1
## -0.8182 -0.3172 -0.7706 -1.0277
## s.e. 0.0039 0.0647 0.0707 0.0114
##
## sigma^2 estimated as 190143: part log likelihood = -764.66
##
## [[3]]
## [1] 1 1 2 0 2 1
We can fit the model to our burglary data. We are going to forecast 3 years ahead, or 36 months. As the data ends in December 2018, the forecast will be for 2019-2021.
# forecast burglary
m6 <- sarima.for(mps_burgxts, n.ahead = 36, 1,1,2,0,2,1,12)
# view the predicted values
m6$pred
## Time Series:
## Start = 106
## End = 141
## Frequency = 1
## [1] 6177.727 5315.797 4815.595 4809.506 4827.152 4461.373 4561.089 4825.044
## [9] 4485.725 5457.062 6408.410 5646.836 6303.318 5482.455 4888.450 4920.272
## [17] 4930.721 4525.186 4628.621 4914.533 4512.895 5556.569 6507.265 5715.116
## [25] 6341.353 5561.558 4873.750 4943.482 4946.734 4501.444 4608.599 4916.466
## [33] 4452.509 5568.520 6518.565 5695.841
# view the se
m6$se
## Time Series:
## Start = 106
## End = 141
## Frequency = 1
## [1] 356.5754 381.7694 485.0405 583.9950 669.3328 745.2294 814.1193
## [8] 877.6237 936.8339 992.5179 1045.2392 1095.4234 1316.5205 1401.8137
## [15] 1550.3630 1699.1124 1834.0080 1959.8832 2078.1836 2190.1083 2296.5853
## [22] 2398.3394 2495.9474 2589.8727 2830.0510 2958.5499 3142.6368 3330.4246
## [29] 3503.2451 3668.1591 3826.0050 3977.5966 4123.6195 4264.6448 4401.1514
## [36] 4533.5383
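The standard errors can be combined with the point forecasts to give approximate prediction intervals, for example a 95% interval:
# approximate 95% prediction intervals from the forecast output
upper <- m6$pred + 1.96 * m6$se
lower <- m6$pred - 1.96 * m6$se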
Here we take a straightforward look at what actually happened with burglary offences in London from 2019 onwards, in relation to the forecast.
We can see from the blue dotted line that the forecast performed very well right up until March 2020, but it has become much less accurate since Covid-19. This is of course no surprise: with many households regularly occupied under instructions to stay indoors or work from home, the opportunities for burglary have been reduced.
# create a vector of actual data to date
# these values were taken from looking at the Met Police stats and data tableau dashboard
# Actual data
actual <- c(5617,4890,5399,4582,4596,4494,4510,4417,4859,5468,5431,5286,5242,4706,3821,2521,2945,3318,3807,3953,4013,4385,4179,3774,3230)
x <- mps_burg$value
x <- as.numeric(x)
new <- append(x, actual)
# forecast with actual
par(mfrow =c(1,1))
m6 <- sarima.for(mps_burgxts, n.ahead = 36, 1,1,2,0,2,1,12)
lines(new, lwd = 1, lty = 2, col = "blue")
title("SARIMA Forecast: Met Police Residential Burglary")