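library(dplyr)     # %>% and count()
library(xts)       # as.xts() and apply.monthly()
library(forecast)  # ses(), holt(), hw(), ets(), accuracy(), autoplot() for ts objects
library(ggplot2)   # ggtitle(), xlab(), ylab(), guides()
crime <- read.csv("crimeinboston.csv")
# Keep only motor vehicle accident responses
caraccident <- crime[crime$OFFENSE_CODE_GROUP == "Motor Vehicle Accident Response", ]
# Keep only columns 3 and 8; OCCURRED_ON_DATE is the one used below
caraccident <- caraccident[, c(3, 8)]
caraccident$OCCURRED_ON_DATE <- as.Date(caraccident$OCCURRED_ON_DATE)
# Count incidents per day, then aggregate the daily counts into monthly totals
caraccident <- caraccident %>% count(OCCURRED_ON_DATE)
data <- as.xts(caraccident[, 2], order.by = as.Date(caraccident[, 1]))
caraccident <- apply.monthly(data, sum)
names(caraccident)[1] <- "Car_Accidents"
# Drop the first and last months
n <- nrow(caraccident)
caraccident <- caraccident[2:(n - 1), ]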
#dates <- as.Date(caraccident$OCCURRED_ON_DATE, '%d/%m/%Y')
#monyr <- as.yearmon(dates)
#caraccident$monyr=as.yearmon(dates)
#caraccident=caraccident %>% count(monyr)
#names(caraccident)[2] <- "Car_Accidents"
WCarAccident <- ts(caraccident[,1], start = c(2015,07), frequency = 12)
The first and last months of data were removed because they are significant outliers and would distort the forecast model.
autoplot(WCarAccident)+
ggtitle("Motor Vehicle Accident Response in Boston") +
xlab("Dates") +
ylab("Accidents")
To better understand the data, I used decompose to look at the seasonal and trend components. Both decompositions show an increasing trend from 2016 to 2017 and a slower decreasing trend from 2017 to 2018. The seasonal component is surprising, showing fewer accidents in the winter.
dec1<-decompose(WCarAccident,type="additive") #decompose additive
dec2<-decompose(WCarAccident,type="multiplicative") #decompose multiplicative
autoplot(dec1)
autoplot(dec2)
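For reference, the two `type` options passed to decompose above correspond to the standard additive and multiplicative models. The symbols are notation introduced here for illustration: $y_t$ is the monthly accident count, $T_t$ the trend, $S_t$ the seasonal component, and $R_t$ the remainder.

$$
\text{additive:}\quad y_t = T_t + S_t + R_t
\qquad\qquad
\text{multiplicative:}\quad y_t = T_t \times S_t \times R_t
$$

In the additive case the seasonal swing has a constant size in accidents per month; in the multiplicative case it scales with the level of the series.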
sesfc=ses(WCarAccident, h=12)
#forecast
autoplot(sesfc)+
autolayer(fitted(sesfc), series="ses")+
ggtitle("Motor Vehicle Accident in Boston") + xlab("Date (Month)") +
ylab("Accidents")
First, simple exponential smoothing (ses) produced a flat-line forecast.
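The flat line follows from the form of the method. In the standard formulation of simple exponential smoothing (the symbols $\ell_t$ for the level and $\alpha$ for the smoothing parameter are notation assumed here), the level is updated at each observation and every future forecast equals the last level:

$$
\ell_t = \alpha\, y_t + (1 - \alpha)\, \ell_{t-1},
\qquad
\hat{y}_{T+h \mid T} = \ell_T \quad \text{for all } h \ge 1 .
$$

Because there is no trend or seasonal term, the forecast does not depend on the horizon $h$.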
holtfc = holt(WCarAccident, h=12)
autoplot(holtfc) +
autolayer(fitted(holtfc), series="Holt's method") +
ggtitle("Motor Vehicle Accident in Boston") + xlab("Date (Month)") +
ylab("Accidents") +
guides(colour=guide_legend(title="Forecast"))
holtdampfc <- holt(WCarAccident, damped=TRUE, phi = 0.8, h=12)
autoplot(holtdampfc) +
autolayer(fitted(holtdampfc), series="Damped Holt's method") +
ggtitle("Motor Vehicle Accident in Boston") + xlab("Date (Month)") +
ylab("Accidents") +
guides(colour=guide_legend(title="Forecast"))
For Holt's method, the version without damping produced an upward-trending forecast. With damping, the forecast is very similar to the ses forecast.
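A short calculation suggests why the damped forecast looks so flat. In the usual formulation of the two methods (with $\ell_T$ the final level, $b_T$ the final trend estimate, and $\phi$ the damping parameter; this notation is assumed here), the forecasts are:

$$
\text{Holt:}\quad \hat{y}_{T+h \mid T} = \ell_T + h\, b_T,
\qquad
\text{damped Holt:}\quad \hat{y}_{T+h \mid T} = \ell_T + \left(\phi + \phi^2 + \dots + \phi^h\right) b_T .
$$

With $\phi = 0.8$ and $h = 12$, the damped multiplier is

$$
\sum_{k=1}^{12} 0.8^k = \frac{0.8\,(1 - 0.8^{12})}{1 - 0.8} \approx 3.73,
\qquad
\lim_{h \to \infty} \sum_{k=1}^{h} 0.8^k = \frac{0.8}{1 - 0.8} = 4 .
$$

So after 12 months the damped model adds less than $4\,b_T$ to the level, compared with $12\,b_T$ for the undamped model, and the forecast levels off quickly.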
seasonalholtfc<-hw(WCarAccident, seasonal="additive")
autoplot(seasonalholtfc)+
autolayer(fitted(seasonalholtfc), series="Seasonal Holt's method")+
ggtitle("Motor Vehicle Accident in Boston") + xlab("Date (Month)") +
ylab("Accidents")
The Holt-Winters forecast has a very wide range. Based on the 80% prediction interval, the number of accidents could reach as high as 1900 or as low as 400. The point forecasts stay around 900 to 1200, which is within the range of the observed data.
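The interval bounds can also be read from the forecast object instead of the plot. The sketch below is only an illustration: it uses the built-in AirPassengers series as a stand-in (not the Boston accident data), and the names `y`, `fc`, and `intervals` are made up for this example.

```r
library(forecast)

# Stand-in monthly series from base R, NOT the Boston accident data
y <- window(AirPassengers, start = c(1957, 1))

# Additive Holt-Winters forecast, 12 months ahead (default levels are 80% and 95%)
fc <- hw(y, seasonal = "additive", h = 12)

# fc$lower and fc$upper hold one column per level; with the defaults,
# the first column is the 80% level
intervals <- data.frame(
  point    = as.numeric(fc$mean),
  lower_80 = as.numeric(fc$lower[, 1]),
  upper_80 = as.numeric(fc$upper[, 1])
)
intervals$width_80 <- intervals$upper_80 - intervals$lower_80

print(round(intervals, 1))
```

Applied to `seasonalholtfc`, the same extraction would give the exact bounds behind the 400 and 1900 figures quoted above.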
etsfc <- ets(WCarAccident)
summary(etsfc)
## ETS(M,N,N)
##
## Call:
## ets(y = WCarAccident)
##
## Smoothing parameters:
## alpha = 0.2987
##
## Initial states:
## l = 891.329
##
## sigma: 0.0782
##
## AIC AICc BIC
## 469.8931 470.5990 474.8059
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 8.039102 72.76709 55.63389 0.3396024 5.898817 0.6954236
## ACF1
## Training set -0.09363773
autoplot(etsfc)
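As shown above, ets selected an ETS(M,N,N) model, which is simple exponential smoothing with multiplicative errors. Its training-set ME (8.04) and MAE (55.63) are close to those of the ses model reported below, so it does not offer a clear improvement.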
etsfc %>% forecast(h=24) %>%
autoplot() +
ylab("Motor Vehicle Accident in Boston")
accuracy(sesfc)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 7.32551 72.74439 55.70737 0.2662276 5.909321 0.6963421 -0.1121108
accuracy(holtfc)
## ME RMSE MAE MPE MAPE MASE
## Training set -1.583671 72.62459 54.38314 -0.6800326 5.811711 0.6797893
## ACF1
## Training set -0.08906737
accuracy(holtdampfc)
## ME RMSE MAE MPE MAPE MASE
## Training set 3.24001 72.73885 54.59676 -0.1895338 5.807934 0.6824595
## ACF1
## Training set -0.08938032
accuracy(seasonalholtfc)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.4660516 40.13755 29.75292 -0.1332263 3.13852 0.3719114
## ACF1
## Training set -0.2970688
accuracy(etsfc)
## ME RMSE MAE MPE MAPE MASE
## Training set 8.039102 72.76709 55.63389 0.3396024 5.898817 0.6954236
## ACF1
## Training set -0.09363773
As shown above, the Holt-Winters model is the most accurate on the training set, with the lowest RMSE (40.14) and MAE (29.75) and the ME closest to zero (-0.47). This suggests that seasonality is a strong component of accident counts. That makes sense: nicer weather leads to more driving and a higher chance of accidents. However, as discussed before, the range produced by the Holt-Winters model is very wide. The other models all have a noticeably higher RMSE (about 73) and MAE (about 54 to 56).
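The separate accuracy() printouts can be gathered into one table to make the comparison easier to read. This is a minimal sketch, assuming the built-in AirPassengers series as a stand-in for the accident data; `y`, `fits`, and `comparison` are names introduced for this example only.

```r
library(forecast)

# Stand-in monthly series from base R, NOT the Boston accident data
y <- AirPassengers

# Fit the same families of models used above
fits <- list(
  "SES"          = ses(y, h = 12),
  "Holt"         = holt(y, h = 12),
  "Damped Holt"  = holt(y, damped = TRUE, h = 12),
  "Holt-Winters" = hw(y, seasonal = "additive", h = 12)
)

# accuracy() on a forecast object returns a one-row matrix of training-set
# measures; keep a few columns and stack the models as rows
comparison <- t(sapply(fits, function(f) {
  accuracy(f)[1, c("ME", "RMSE", "MAE", "MAPE")]
}))

# Sort from lowest to highest RMSE
print(round(comparison[order(comparison[, "RMSE"]), ], 2))
```

Replacing the list entries with `sesfc`, `holtfc`, `holtdampfc`, and `seasonalholtfc` would reproduce the figures above in a single table. These are training-set measures, so they describe fit rather than out-of-sample performance.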