In this discussion, we look at warrant arrests in Boston and forecast them using ETS and ARIMA models.
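library(dplyr)     # count(), %>%
library(xts)       # as.xts(), apply.monthly()
library(forecast)  # ets(), auto.arima(), forecast(), autoplot(), accuracy()
library(urca)      # ur.kpss()
library(ggplot2)   # ggtitle(), xlab(), ylab()
crime <- read.csv("crimeinboston.csv")
# Keep only warrant arrests, then columns 3 and 8 (column 8 is used below as OCCURRED_ON_DATE)
arrests <- crime[crime$OFFENSE_CODE_GROUP == "Warrant Arrests", ]
arrests <- arrests[, c(3, 8)]
arrests$OCCURRED_ON_DATE <- as.Date(arrests$OCCURRED_ON_DATE)
# Count arrests per day, then sum the daily counts into monthly totals
arrests <- arrests %>% count(OCCURRED_ON_DATE)
data <- as.xts(arrests[, 2], order.by = as.Date(arrests[, 1]))
arrests <- apply.monthly(data, sum)
names(arrests)[1] <- "Warrant_Arrests"
# Drop the first and last months (presumably incomplete)
n <- dim(arrests)[1]
arrests <- arrests[2:(n - 1), ]
Warrests <- ts(arrests[, 1], start = c(2015, 7), frequency = 12)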
The time series shows an increase in May 2016 and a decrease from June 2017 to July 2018.
autoplot(Warrests)+
ggtitle("Warrant Arrests in Boston") +
xlab("Dates") +
ylab("Arrests")
The ACF of the warrant arrests series shows autocorrelations lying outside the 95% limits, and the Ljung-Box statistic has a very small p-value (9.379e-13). This suggests that monthly arrests are correlated with those of previous months. The KPSS test statistic (0.2728) is below the 10% critical value (0.347), so the null hypothesis of stationarity is not rejected.
Box.test(Warrests, lag=6, type="Ljung-Box")
##
## Box-Ljung test
##
## data: Warrests
## X-squared = 68.241, df = 6, p-value = 9.379e-13
acf(Warrests,plot=TRUE)
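For reference, the Ljung-Box statistic reported above can be reproduced from the sample autocorrelations as Q = n(n+2) * sum over k = 1..h of r_k^2 / (n-k), compared against a chi-squared distribution with h degrees of freedom. The following is a minimal base-R sketch: `ljung_box` is a hypothetical helper, and `y` is a simulated series standing in for `Warrests`, not the Boston data.

```r
# Hypothetical helper: Ljung-Box Q statistic computed from sample autocorrelations.
ljung_box <- function(x, lag) {
  n <- length(x)
  # acf() returns lags 0..lag; drop lag 0, which is always 1
  r <- acf(x, lag.max = lag, plot = FALSE)$acf[-1]
  q <- n * (n + 2) * sum(r^2 / (n - seq_len(lag)))
  p <- pchisq(q, df = lag, lower.tail = FALSE)
  c(statistic = q, p.value = p)
}

# Simulated autocorrelated series (an assumption for illustration only)
set.seed(1)
y <- as.numeric(arima.sim(list(ar = 0.7), n = 40))

manual  <- ljung_box(y, lag = 6)
builtin <- Box.test(y, lag = 6, type = "Ljung-Box")
print(manual)
print(builtin$statistic)  # should agree with manual["statistic"]
```

Calling `ljung_box(as.numeric(Warrests), lag = 6)` on the real series would be expected to match the `Box.test` output shown above.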
Warrests %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.2728
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
fitETS <- ets(Warrests)
fitETS
## ETS(M,N,N)
##
## Call:
## ets(y = Warrests)
##
## Smoothing parameters:
## alpha = 0.8022
##
## Initial states:
## l = 204.7117
##
## sigma: 0.1354
##
## AIC AICc BIC
## 398.1196 398.8255 403.0324
fitETS %>% forecast(h=6) %>% autoplot()
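Because the selected model has no trend or seasonal component, its point forecasts are flat at the last estimated level. A rough sketch of the level recursion, l_t = l_(t-1) + alpha * (y_t - l_(t-1)), is given here; `ses_level` is a hypothetical helper and `y_demo` is made-up data, while `alpha` and `l0` are taken from the fitted model output above.

```r
# Hypothetical helper: level recursion for an ETS model with no trend or seasonality.
ses_level <- function(y, alpha, l0) {
  level <- l0
  for (obs in y) {
    # Move the level a fraction alpha of the way toward the new observation
    level <- level + alpha * (obs - level)
  }
  level
}

alpha  <- 0.8022                  # smoothing parameter from the fitted model
l0     <- 204.7117                # initial level from the fitted model
y_demo <- c(210, 190, 230, 220)   # made-up monthly counts, not the Boston data

last_level     <- ses_level(y_demo, alpha, l0)
point_forecast <- rep(last_level, 6)  # flat point forecast for h = 6
print(point_forecast)
```

With alpha near 0.8, the level tracks the most recent observation closely, so the flat forecast sits near the last observed value. The multiplicative error in ETS(M,N,N) affects the prediction intervals rather than this point-forecast recursion.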
accuracy(fitETS)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -1.842939 30.69216 23.50572 -2.18962 10.88321 0.3091293 -0.1234126
fitARIMA <- auto.arima(Warrests, seasonal=FALSE)
fitARIMA
## Series: Warrests
## ARIMA(1,1,0)
##
## Coefficients:
## ar1
## -0.3376
## s.e. 0.1538
##
## sigma^2 estimated as 951.7: log likelihood=-178.93
## AIC=361.87 AICc=362.22 BIC=365.09
fitARIMA %>% forecast(h=6) %>% autoplot()
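An ARIMA(1,1,0) model without drift says that each monthly change is phi times the previous change plus noise, so point forecasts follow y_hat(T+h) = y_hat(T+h-1) + phi * (y_hat(T+h-1) - y_hat(T+h-2)). A small sketch of that recursion follows, assuming the estimated `ar1` coefficient from the output above; `arima110_forecast` is a hypothetical helper and `y_demo` is made-up data.

```r
# Hypothetical helper: point forecasts from an ARIMA(1,1,0) model without drift.
arima110_forecast <- function(y, phi, h) {
  n <- length(y)
  path <- c(y[n - 1], y[n])  # the last two observations seed the recursion
  for (i in seq_len(h)) {
    k <- length(path)
    # Next value = last value + phi * (last change)
    path <- c(path, path[k] + phi * (path[k] - path[k - 1]))
  }
  path[-(1:2)]  # drop the two seed observations, keep the h forecasts
}

phi    <- -0.3376        # ar1 estimate from the fitted model
y_demo <- c(200, 220)    # made-up last two monthly counts, not the Boston data

fc <- arima110_forecast(y_demo, phi, h = 6)
print(fc)
```

The negative coefficient means a rise in one month is partly reversed in the next, and the forecasts settle quickly toward a constant value.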
accuracy(fitARIMA)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -2.20282 30.02757 23.42974 -2.396768 10.8837 0.3081301 0.04287017
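The training-set measures above are summaries of the in-sample errors (actual minus fitted). A minimal sketch of how the main ones are computed is shown here; `error_metrics` is a hypothetical helper and the two input vectors are made-up numbers, not the Boston data.

```r
# Hypothetical helper: common forecast error summaries.
error_metrics <- function(actual, fitted) {
  e  <- actual - fitted        # errors
  pe <- 100 * e / actual       # percentage errors
  c(ME   = mean(e),
    RMSE = sqrt(mean(e^2)),
    MAE  = mean(abs(e)),
    MPE  = mean(pe),
    MAPE = mean(abs(pe)))
}

actual_demo <- c(200, 220, 180, 210)  # made-up observations
fitted_demo <- c(190, 225, 200, 205)  # made-up fitted values

m <- error_metrics(actual_demo, fitted_demo)
print(m)
```

Applied to the real series, something like `error_metrics(Warrests, fitted(fitETS))` would be expected to correspond to the matching columns of the `accuracy()` output.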
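The results from the two forecast models are very similar. ARIMA slightly outperforms ETS on the training set in RMSE (30.03 vs. 30.69) and MAE (23.43 vs. 23.51), while MAPE is essentially identical (about 10.88%). The ETS model produced a narrower forecast range than the ARIMA model.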
I think incorporating a seasonal component would help improve accuracy, since both fitted models, ETS(M,N,N) and ARIMA(1,1,0) with seasonal=FALSE, are non-seasonal.