crime <- read.csv("crimeinboston.csv")
attach(crime)
vandal=crime[crime$OFFENSE_DESCRIPTION == "VANDALISM",]
vandal=vandal[,c(4,8,9,10)]
vandal$OCCURRED_ON_DATE=(as.Date(vandal$OCCURRED_ON_DATE))
dates <- as.Date(vandal$OCCURRED_ON_DATE, '%d/%m/%Y')
monyr <- as.yearmon(dates)
vandal$monyr=as.yearmon(dates)
vandal=vandal %>% count(monyr)
names(vandal)[2] <- "Vandalism"
MonthlyVandalism <- ts(vandal[,2], start = c(2015,06), frequency = 12)
After cleaning the data to only have vandalism crimes, we will plot it. The data looks very volatile, a 2 month moving average smoothout the data quite a bit. This could potentially be due to recording timing error. For example many the vandalism crimes might be recorded together. The drop at the end of 2018 also looks like a data error.
autoplot(MonthlyVandalism)+
ggtitle("Vandalism in Boston") +
xlab("Year") +
ylab("Vandalism")
No significant seasonal pattern is shown in the season plot. Summer have higher vandalism rate is expected.
Seasonality and multi year cyclical seems to be present. There is also a downward trend.
decadd<-decompose(MonthlyVandalism,type="additive")
autoplot(decadd)
## Warning: attributes are not identical across measure variables;
## they will be dropped
## Warning: Removed 24 row(s) containing missing values (geom_path).
addseas=decadd$seasonal
addtrend=decadd$trend
adderror=decadd$random
Comparing additive decomposition and multiplicative, the 2 model shows similar results. The trend component seems to be more profound in the multiplicative model.
decmul<-decompose(MonthlyVandalism,type="multiplicative")
multseas=decmul$seasonal
multtrend=decmul$trend
multerror=decmul$random
autoplot(decmul)
## Warning: attributes are not identical across measure variables;
## they will be dropped
## Warning: Removed 24 row(s) containing missing values (geom_path).
erroradd<-na.omit(decadd$random) #generate residuals without NA
errormul<-na.omit(decmul$random) #generate residuals without NA
erroradd<-as.vector(erroradd) #set residuals as a vector
errormul<-as.vector(errormul)
abserroradd<-abs(erroradd)
abserrormul<-abs(errormul)
sqerroradd<-erroradd^2
sqerrormul<-errormul^2
#Calculate Statisics
ME<-c(mean(erroradd),mean(errormul))
MAD<-c(mean(abserroradd), mean(abserrormul))
MSE<-c(mean(sqerroradd),mean(sqerrormul))
RMSE<-sqrt(MSE)
all<-cbind(ME, MAD, MSE, RMSE)
all
## ME MAD MSE RMSE
## [1,] 3.372106 15.50765 432.078123 20.78649
## [2,] 1.007210 1.00721 1.017658 1.00879
Top row is Additive model, bottom is multiplicative model Comparing Mean errors, absolute mean deviation, Mean square errors, root mean square error the multiplicative model does a much better job.
Looking at the X11 decomposition, we see the seaonality is allowed to vary. The trend component are also estimated on the end points. This model is able to captures the unusual fall at the end of 2018.
MonthlyVandalism %>% seas(x11="") -> fit
autoplot(fit) +
ggtitle("X11 decomposition of Vandalism in Boston")
SEATS decomposition use ARIMA, like X11, it is also able to capture more information(end points) than the classica decompositions.
MonthlyVandalism %>% seas() %>%
autoplot() +
ggtitle("SEATS decomposition of Vandalism in Boston")
## Model used in SEATS is different: (0 0 0)(0 1 0)
STL decomposition shows a much more smoothed trend component than X11 and SEATS. the remainder error are also very small
MonthlyVandalism %>%
stl(t.window=13, s.window="periodic", robust=TRUE) %>%
autoplot()+ggtitle("STL decomposition of Vandalism in Boston")
Looking at all the model above, the multiplicative model is the best out of the classic model. I would also use STL model as the trend is smoother and the seasonality is different. The dataset might be too limited (too small) to provide a proper forecasting model.
Forecasting
autoplot(model1<-forecast(MonthlyVandalism))
## Warning: `filter_()` is deprecated as of dplyr 0.7.0.
## Please use `filter()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.