Discussion 4:

In this discussion, we look at Warrant Arrests in Boston and forecast them using ETS and ARIMA models.

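library(dplyr)    # count(), %>%
library(xts)      # as.xts(), apply.monthly()
library(forecast) # ets(), auto.arima(), forecast(), accuracy(), autoplot()
library(ggplot2)  # ggtitle(), xlab(), ylab()
library(urca)     # ur.kpss()

crime <- read.csv("crimeinboston.csv")

# Keep only the Warrant Arrests rows, then columns 3 and 8
arrests <- crime[crime$OFFENSE_CODE_GROUP == "Warrant Arrests", ]
arrests <- arrests[, c(3, 8)]
arrests$OCCURRED_ON_DATE <- as.Date(arrests$OCCURRED_ON_DATE)
# Count arrests per day, then sum the daily counts by month
arrests <- arrests %>% count(OCCURRED_ON_DATE)
data <- as.xts(arrests[, 2], order.by = as.Date(arrests[, 1]))
arrests <- apply.monthly(data, sum)
names(arrests)[1] <- "Warrant_Arrests"
# Drop the first and last months of the sample
n <- dim(arrests)[1]
arrests <- arrests[2:(n - 1), ]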

Warrests <- ts(arrests[,1], start = c(2015,07), frequency = 12)
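
The aggregation step above (daily counts summed by month, first and last months dropped) can be sketched in base R on synthetic data. This is only an illustration under assumptions: the dates are randomly generated, and names such as `month_key` and `monthly_ts` are hypothetical and do not appear in the original analysis.

```r
# Hypothetical incident dates (synthetic, not the Boston data)
set.seed(1)
dates <- sample(seq(as.Date("2015-06-15"), as.Date("2015-10-10"), by = "day"),
                size = 500, replace = TRUE)

# Count incidents per calendar month ("YYYY-MM" keys sort chronologically)
month_key <- format(dates, "%Y-%m")
monthly <- table(month_key)

# Drop the first and last months, mirroring arrests[2:(n-1), ] above
complete <- monthly[2:(length(monthly) - 1)]

# Wrap as a monthly ts starting at the first retained month (July 2015)
monthly_ts <- ts(as.integer(complete), start = c(2015, 7), frequency = 12)
print(monthly_ts)
```

Note that `table()` only lists months that contain at least one incident, so a month with zero incidents would need to be filled in separately.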

The time series shows an increase in May 2016 and a decrease from June 2017 to July 2018.

autoplot(Warrests)+ 
  ggtitle("Warrant Arrests in Boston") +
  xlab("Dates") +
  ylab("Arrests")

The ACF of the Warrant Arrests series in Boston shows autocorrelations lying outside the 95% limits, and the Ljung-Box statistic has a very small p-value. This suggests that monthly arrests are correlated with those of previous months.

Box.test(Warrests, lag=6, type="Ljung-Box")

## 
##  Box-Ljung test
## 
## data:  Warrests
## X-squared = 68.241, df = 6, p-value = 9.379e-13
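
For reference, the Ljung-Box statistic is Q = n(n+2) Σ ρ_k² / (n − k), summed over lags k = 1..h and compared against a chi-squared distribution with h degrees of freedom. The sketch below recomputes it by hand on a simulated AR(1) series, which is an assumption made for illustration and not the arrests data, and checks it against `Box.test()`.

```r
# Synthetic autocorrelated series (AR(1) simulation, not the arrests data)
set.seed(42)
x <- arima.sim(model = list(ar = 0.7), n = 40)

n <- length(x)
h <- 6

# Sample autocorrelations at lags 1..h (the first element is lag 0)
rho <- acf(x, lag.max = h, plot = FALSE)$acf[-1]

# Ljung-Box statistic and its chi-squared p-value with h degrees of freedom
Q <- n * (n + 2) * sum(rho^2 / (n - seq_len(h)))
p_value <- pchisq(Q, df = h, lower.tail = FALSE)

# Compare with the built-in test
bt <- Box.test(x, lag = h, type = "Ljung-Box")
print(c(manual = Q, builtin = unname(bt$statistic)))
```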

acf(Warrests,plot=TRUE)

Warrests %>% ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.2728 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
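
Reading the KPSS output: the null hypothesis is that the series is level-stationary, so it is rejected only when the test statistic exceeds a critical value. A minimal sketch of that comparison, with the numbers copied from the output above (the variable names are illustrative):

```r
# Values copied from the ur.kpss() summary above
kpss_stat <- 0.2728
crit <- c("10pct" = 0.347, "5pct" = 0.463, "2.5pct" = 0.574, "1pct" = 0.739)

# KPSS null hypothesis: the series is level-stationary.
# Reject only if the statistic exceeds the critical value.
reject_null <- kpss_stat > crit
print(reject_null)
```

Since the statistic is below even the 10% critical value, stationarity is not rejected at any of the listed levels.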

fitETS <- ets(Warrests)
fitETS

## ETS(M,N,N) 
## 
## Call:
##  ets(y = Warrests) 
## 
##   Smoothing parameters:
##     alpha = 0.8022 
## 
##   Initial states:
##     l = 204.7117 
## 
##   sigma:  0.1354
## 
##      AIC     AICc      BIC 
## 398.1196 398.8255 403.0324
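
ETS(M,N,N) is simple exponential smoothing with multiplicative errors: there is only a level, with no trend and no seasonal component. The sketch below runs the level update with the fitted `alpha` and initial level from the output above, but on hypothetical observations, so the resulting numbers are illustrative only.

```r
# Parameters copied from the fitted ETS(M,N,N) output above
alpha <- 0.8022
level <- 204.7117

# Hypothetical monthly counts, for illustration only (not the arrests data)
y <- c(210, 195, 230, 250, 240, 225)

for (obs in y) {
  rel_err <- (obs - level) / level          # multiplicative (relative) error
  level   <- level * (1 + alpha * rel_err)  # same as alpha*obs + (1-alpha)*level
}

# With no trend or seasonality, every h-step point forecast equals the last level
point_forecast <- rep(level, 6)
print(round(point_forecast, 2))
```

With alpha near 0.8, the level tracks the most recent observation closely, and the point forecasts are flat.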

fitETS %>% forecast(h=6) %>% autoplot()

accuracy(fitETS)

##                     ME     RMSE      MAE      MPE     MAPE      MASE       ACF1
## Training set -1.842939 30.69216 23.50572 -2.18962 10.88321 0.3091293 -0.1234126

fitARIMA <- auto.arima(Warrests, seasonal=FALSE)
fitARIMA

## Series: Warrests 
## ARIMA(1,1,0) 
## 
## Coefficients:
##           ar1
##       -0.3376
## s.e.   0.1538
## 
## sigma^2 estimated as 951.7:  log likelihood=-178.93
## AIC=361.87   AICc=362.22   BIC=365.09
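
An ARIMA(1,1,0) model without drift says that the month-to-month change follows an AR(1) process, so each point forecast is the previous value plus `ar1` times the previous change. The sketch below applies that recursion using the fitted coefficient from the output above and two hypothetical starting values (these are assumptions, not taken from the data).

```r
phi <- -0.3376  # ar1 coefficient copied from the output above

# Hypothetical last two observations (illustrative, not from the data)
y_prev <- 180
y_last <- 200

h <- 6
path <- c(y_prev, y_last)
for (i in seq_len(h)) {
  k <- length(path)
  # Next value = last value + phi * last change
  path <- c(path, path[k] + phi * (path[k] - path[k - 1]))
}
point_forecast <- path[-(1:2)]
print(round(point_forecast, 2))
```

Because the coefficient is negative, the forecasts partially reverse the last change and then settle quickly toward a constant.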

fitARIMA %>% forecast(h=6) %>% autoplot()

accuracy(fitARIMA)

##                    ME     RMSE      MAE       MPE    MAPE      MASE       ACF1
## Training set -2.20282 30.02757 23.42974 -2.396768 10.8837 0.3081301 0.04287017
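
The training-set measures reported by `accuracy()` are all summaries of the in-sample errors (actual minus fitted). A minimal sketch of the main ones on hypothetical numbers follows; MASE, which additionally scales the MAE by the error of a naive benchmark, is omitted here.

```r
# Hypothetical actual values and one-step fitted values (illustrative only)
actual <- c(200, 220, 180, 240, 210)
fitted_vals <- c(205, 210, 190, 230, 215)

err <- actual - fitted_vals

me   <- mean(err)                       # mean error (bias)
rmse <- sqrt(mean(err^2))               # root mean squared error
mae  <- mean(abs(err))                  # mean absolute error
mpe  <- mean(100 * err / actual)        # mean percentage error
mape <- mean(abs(100 * err / actual))   # mean absolute percentage error

print(round(c(ME = me, RMSE = rmse, MAE = mae, MPE = mpe, MAPE = mape), 3))
```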

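The results from the two forecasting models are very similar. ARIMA performs slightly better on the training set (RMSE 30.03 vs. 30.69, MASE 0.3081 vs. 0.3091), while the MAPE values are essentially identical (about 10.88). The ETS model produced a narrower forecast range than the ARIMA model.

I think incorporating a seasonal component would help improve the accuracy.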
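
One way to check whether a seasonal term helps is to fit the same model with and without it and compare AIC. The sketch below does this with base R's `arima()` on a synthetic monthly series; the data, the choice of a seasonal MA(1) term, and the object names are all assumptions for illustration. In the workflow above, the analogous step would be calling `auto.arima()` without `seasonal=FALSE`.

```r
# Synthetic monthly series with a yearly pattern (not the arrests data)
set.seed(7)
months <- 1:60
x <- ts(200 + 20 * sin(2 * pi * months / 12) + cumsum(rnorm(60, sd = 5)),
        start = c(2015, 7), frequency = 12)

# Non-seasonal ARIMA(1,1,0), the same order selected above
fit_ns <- arima(x, order = c(1, 1, 0), method = "ML")

# Same model plus a seasonal MA(1) term at lag 12
fit_s <- arima(x, order = c(1, 1, 0),
               seasonal = list(order = c(0, 0, 1), period = 12),
               method = "ML")

# Both models use the same differencing, so their AICs are comparable
print(c(nonseasonal = AIC(fit_ns), seasonal = AIC(fit_s)))
```

A lower AIC for the seasonal fit would support adding the component. With only a few years of monthly data, such a comparison should be read cautiously.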

Yu Mu

11/16/2020