ARIMA Analysis of NYC Restaurant Violations by Inspection Date
Introduction
An ARIMA analysis was performed on restaurant violations by inspection date in New York City using the forecast package in R. The analysis is part of a Data Incubator Capstone project and explores what ARIMA modeling can reveal about the data. This document was reproduced from the referenced sources.
In statistics and econometrics, and in particular in time series analysis, an autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model. Both of these models are fitted to time series data either to better understand the data or to predict future points in the series (forecasting). ARIMA models are applied in some cases where data show evidence of non-stationarity, where an initial differencing step (corresponding to the “integrated” part of the model) can be applied one or more times to eliminate the non-stationarity.
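The differencing step described above can be illustrated with a minimal base R sketch. The series here is a simulated random walk, not the NYC inspection data, and the seed and length are arbitrary choices made for the illustration.

```r
# Simulate a random walk: non-stationary, because each value
# accumulates every past shock.
set.seed(42)                     # arbitrary seed, for reproducibility only
shocks <- rnorm(120)             # hypothetical white-noise shocks
walk <- ts(cumsum(shocks))       # an "integrated" series of order d = 1

# One round of differencing recovers the stationary shocks
# (all but the first, which has no predecessor to difference against).
d1 <- diff(walk, differences = 1)
all.equal(as.numeric(d1), shocks[-1])
```

In an ARIMA(p, d, q) model, d is the number of times this differencing is applied before the ARMA part is fitted.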
Time Series
The ts() function will convert a numeric vector into an R time series object. The format is ts(vector, start=, end=, frequency=) where start and end are the times of the first and last observation and frequency is the number of observations per unit time (1=annual, 4=quarterly, 12=monthly, etc.).
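As a short sketch of the ts() call described above, the following builds a monthly series from made-up counts. The values and the names counts, myts, and sub2019 are hypothetical and are not taken from the inspection data.

```r
# 24 made-up monthly counts (hypothetical values, for illustration only)
counts <- c(412, 530, 618, 705, 644, 590, 701, 688, 540, 497, 455, 610,
            430, 551, 640, 720, 660, 601, 715, 699, 560, 510, 470, 625)

# Monthly series starting in January 2018; end is implied by the length
myts <- ts(counts, start = c(2018, 1), frequency = 12)

frequency(myts)   # observations per year
start(myts)       # first observation as (year, month)
end(myts)         # last observation as (year, month)

# window() extracts a sub-period, here the first half of 2019
sub2019 <- window(myts, start = c(2019, 1), end = c(2019, 6))
```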
Seasonal Decomposition
A time series with additive trend, seasonal, and irregular components can be decomposed using the stl() function. Note that a series with multiplicative effects can often be transformed into a series with additive effects through a log transformation (i.e., newts <- log(myts)).
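The log transformation followed by stl() might look like the sketch below. The series is simulated with multiplicative effects; the trend, seasonal amplitude, and noise level are assumptions chosen only to make the example run.

```r
set.seed(1)                                        # arbitrary seed
months <- 1:72
trend    <- 100 + 2 * months                       # hypothetical upward trend
seasonal <- 1 + 0.3 * sin(2 * pi * months / 12)    # hypothetical yearly cycle
noise    <- exp(rnorm(72, sd = 0.05))              # multiplicative noise

# Multiplicative series: the components are multiplied, not added
myts <- ts(trend * seasonal * noise, start = c(2015, 1), frequency = 12)

# Taking logs turns the product into a sum, which stl() can decompose
newts <- log(myts)
fit <- stl(newts, s.window = "periodic")

# The three components add back up to the log series
parts <- fit$time.series          # columns: seasonal, trend, remainder
recon <- rowSums(parts)
all.equal(as.numeric(recon), as.numeric(newts))
```

Calling plot(fit) would display the seasonal, trend, and remainder panels.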
Exponential Models
Both the HoltWinters() function in the base installation, and the ets() function in the forecast package, can be used to fit exponential models.
# simple exponential - models level
fit <- HoltWinters(Date_of_Vio, beta = FALSE, gamma = FALSE)
fit
Holt-Winters exponential smoothing without trend and without seasonal component.
Call:
HoltWinters(x = Date_of_Vio, beta = FALSE, gamma = FALSE)
Smoothing parameters:
alpha: 0.1407241
beta : FALSE
gamma: FALSE
Coefficients:
[,1]
a 635.4118
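To make the role of alpha concrete, here is a sketch of the level recursion behind simple exponential smoothing, checked against HoltWinters() with alpha fixed by hand. The data are simulated, and alpha = 0.14 is an assumption chosen only because it is close to the fitted value in the output above. The starting level is taken as the first observation, which is the default for this case.

```r
set.seed(7)                                   # arbitrary seed
# Hypothetical monthly counts, not the inspection data
y <- ts(round(600 + rnorm(36, sd = 150)), start = c(2017, 1), frequency = 12)

alpha <- 0.14                                 # fixed by hand for illustration

# Manual recursion: new level = alpha * observation + (1 - alpha) * old level
level <- y[1]
for (t in 2:length(y)) {
  level <- alpha * y[t] + (1 - alpha) * level
}

# Same model via HoltWinters(); supplying alpha skips the optimisation
fit <- HoltWinters(y, alpha = alpha, beta = FALSE, gamma = FALSE)

c(manual = level, HoltWinters = fit$coefficients[["a"]])
```

A small alpha, as in the fitted model, means each new observation moves the level only slightly, so the coefficient a behaves like a slowly updated average.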
Double Exponential Model
The double exponential model captures level and trend.
Holt-Winters exponential smoothing with trend and without seasonal component.
Call:
HoltWinters(x = Date_of_Vio, gamma = FALSE)
Smoothing parameters:
alpha: 0.4960217
beta : 0.3238853
gamma: FALSE
Coefficients:
[,1]
a 399.17972
b -72.67498
Triple Exponential Model
The triple exponential model captures level, trend, and seasonal components.
Holt-Winters exponential smoothing with trend and additive seasonal component.
Call:
HoltWinters(x = Date_of_Vio)
Smoothing parameters:
alpha: 0
beta : 0
gamma: 0.320561
Coefficients:
[,1]
a 635.673611
b 1.527098
s1 390.452924
s2 102.956374
s3 178.805097
s4 -165.582414
s5 -19.326198
s6 69.462963
s7 4.290053
s8 297.307655
s9 22.225080
s10 -66.213040
s11 -147.500934
s12 61.784102
Forecast 3 Obs.
Predict the next three values of the violations series from the triple exponential model.
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Jul 2020 1027.6536 491.3142 1563.993 207.393255 1847.914
Aug 2020 741.6842 205.3447 1278.024 -78.576198 1561.945
Sep 2020 819.0600 282.7206 1355.399 -1.200377 1639.320
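The point forecasts above can be reproduced by hand from the triple exponential coefficients, using the additive Holt-Winters forecast a + h*b + s_h for horizon h. The sketch below copies a, b, and s1 to s3 from the printed output; the variable names are chosen for the illustration.

```r
# Coefficients copied from the triple exponential output
a <- 635.673611                                # level
b <- 1.527098                                  # trend per month
s <- c(390.452924, 102.956374, 178.805097)     # seasonal terms s1, s2, s3

# Additive Holt-Winters forecast for horizons h = 1, 2, 3
h <- 1:3
point <- a + h * b + s[h]

# Compare with the Point Forecast column for Jul, Aug, and Sep 2020
round(point, 4)
```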
Forecast Plot
ARIMA Models
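The arima() function can be used to fit an autoregressive integrated moving average model. Other useful functions include acf() and pacf(), which plot the autocorrelation and partial autocorrelation functions; the forecast package has somewhat nicer versions called Acf() and Pacf(), respectively.
Fit an ARIMA model of order (p, d, q). The call below does not set the order argument, so arima() uses its default of (0, 0, 0) and estimates only an intercept.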
Call:
arima(x = Date_of_Vio)
Coefficients:
intercept
617.4556
s.e. 39.1210
sigma^2 estimated as 137741: log likelihood = -660.2, aic = 1324.39
Predictive Accuracy
ME RMSE MAE MPE MAPE MASE
Training set -1.91978e-13 371.1346 320.0657 -897.6897 928.5464 0.7726232
ACF1
Training set 0.07222558
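The first five columns of the accuracy table can be computed by hand, as in this sketch. The actual and fitted values are made up, and the formulas follow the usual definitions (error = actual minus fitted, percentage error = 100 * error / actual). MASE and ACF1 are left out.

```r
# Hypothetical data: a mean-only model fitted to five made-up counts
actual <- c(700, 500, 20, 900, 880)
fitted <- rep(mean(actual), length(actual))    # constant fit at the mean

err <- actual - fitted          # errors
pe  <- 100 * err / actual       # percentage errors

ME   <- mean(err)               # mean error
RMSE <- sqrt(mean(err^2))       # root mean squared error
MAE  <- mean(abs(err))          # mean absolute error
MPE  <- mean(pe)                # mean percentage error
MAPE <- mean(abs(pe))           # mean absolute percentage error

round(c(ME = ME, RMSE = RMSE, MAE = MAE, MPE = MPE, MAPE = MAPE), 2)
```

In this toy example a single small actual value (20) dominates MPE and MAPE. Very small counts in some months would be one plausible explanation for the large MPE and MAPE values reported above, although the source does not confirm this.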
Forecast 5 Obs.
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Jul 2020 617.4556 141.8274 1093.084 -109.955 1344.866
Aug 2020 617.4556 141.8274 1093.084 -109.955 1344.866
Sep 2020 617.4556 141.8274 1093.084 -109.955 1344.866
Oct 2020 617.4556 141.8274 1093.084 -109.955 1344.866
Nov 2020 617.4556 141.8274 1093.084 -109.955 1344.866
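Because the fitted model is ARIMA(0,0,0) with an intercept, every forecast equals the intercept and every interval has the same width. The sketch below reproduces the table from the intercept and sigma^2 printed by arima(), assuming normally distributed errors. The variable names are chosen for the illustration.

```r
# Values copied from the arima() output
mu     <- 617.4556      # intercept, also the point forecast
sigma2 <- 137741        # estimated innovation variance

# Two-sided normal quantiles for 80% and 95% coverage
level <- c(0.80, 0.95)
z <- qnorm(0.5 + level / 2)

lower <- mu - z * sqrt(sigma2)
upper <- mu + z * sqrt(sigma2)

# Rows: lower and upper bounds; columns: 80% and 95%
round(rbind(lower, upper), 3)
```

The results agree with the Lo 80, Hi 80, Lo 95, and Hi 95 columns up to the rounding of the printed inputs. The negative lower bounds show that a normal interval does not respect the fact that counts cannot fall below zero.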
AutoExponential Model
Automated forecasting using an exponential model.
ETS(M,N,N)
Call:
ets(y = Date_of_Vio)
Smoothing parameters:
alpha = 1e-04
Initial states:
l = 617.4709
sigma: 0.608
AIC AICc BIC
1475.975 1476.254 1483.475
Predictive Accuracy
ME RMSE MAE MPE MAPE MASE ACF1
Training set 0.138956 371.1531 320.0882 -897.6451 928.5196 0.7788033 0.07221765
AutoARIMA model
Automated forecasting using an ARIMA model.
Series: Date_of_Vio
ARIMA(0,0,0) with non-zero mean
Coefficients:
mean
617.4556
s.e. 39.1210
sigma^2 estimated as 139289: log likelihood=-660.2
AIC=1324.39 AICc=1324.53 BIC=1329.39
Predictive Accuracy
ME RMSE MAE MPE MAPE MASE
Training set -1.91978e-13 371.1346 320.0657 -897.6897 928.5464 0.7787486
ACF1
Training set 0.07222558
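The information criteria in the output above can be recovered from the log likelihood, as sketched below. The series length n = 90 is an assumption: it is not stated in the source and is inferred here from the ratio of the two sigma^2 estimates (139289 from the automated fit and 137741 from arima()), which is close to n / (n - 1) for n = 90. The parameter count k = 2 covers the mean and the variance.

```r
loglik <- -660.2     # log likelihood copied from the output (rounded)
k <- 2               # parameters counted: mean and sigma^2
n <- 90              # ASSUMED series length, inferred, not given in the source

aic  <- -2 * loglik + 2 * k
aicc <- aic + 2 * k * (k + 1) / (n - k - 1)    # small-sample correction
bic  <- -2 * loglik + k * log(n)

round(c(AIC = aic, AICc = aicc, BIC = bic), 2)

# Check on the assumption: rescaling the arima() variance by n / (n - 1)
sigma2_rescaled <- 137741 * n / (n - 1)
round(sigma2_rescaled)
```

Small differences in the second decimal place from the printed values are expected, because the log likelihood is only shown to one decimal.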
X11 Decomposition
Another popular method for decomposing quarterly and monthly data is the X11 method which originated in the US Census Bureau and Statistics Canada.
This method is based on classical decomposition, but includes many extra steps and features in order to overcome the drawbacks of classical decomposition. In particular, trend-cycle estimates are available for all observations including the end points, and the seasonal component is allowed to vary slowly over time. X11 also has some sophisticated methods for handling trading day variation, holiday effects and the effects of known predictors. It handles both additive and multiplicative decomposition. The process is entirely automatic and tends to be highly robust to outliers and level shifts in the time series.
The details of the X11 method are described in Dagum & Bianconcini (2016). Here we will only demonstrate how to use the automatic procedure in R.
The X11 method is available using the seas() function from the seasonal package for R.
Series: Date_of_Vio
ARIMA(3,1,1)
Coefficients:
ar1 ar2 ar3 ma1
0.0646 -0.1353 -0.0589 -0.9579
s.e. 0.1115 0.1091 0.1126 0.0423
sigma^2 estimated as 144495: log likelihood=-654.33
AIC=1318.65 AICc=1319.37 BIC=1331.09
Residuals
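Check the residuals of the ARIMA(3,1,1) model for remaining autocorrelation using the Ljung-Box test. The p-value of 0.6428 is well above 0.05, so the test gives no evidence that the residuals are autocorrelated.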
Ljung-Box test
data: Residuals from ARIMA(3,1,1)
Q* = 11.545, df = 14, p-value = 0.6428
Model df: 4. Total lags used: 18
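The degrees of freedom and p-value above follow from the lag count and the number of fitted coefficients. The sketch below uses Box.test() from base R on simulated white noise standing in for the residuals, since the actual residuals are not available here. It then recomputes the p-value from the reported Q* statistic.

```r
set.seed(123)                 # arbitrary seed
res <- rnorm(90)              # stand-in residuals (hypothetical, white noise)

# 18 lags, minus 4 fitted ARMA coefficients (ar1, ar2, ar3, ma1)
lb <- Box.test(res, lag = 18, type = "Ljung-Box", fitdf = 4)
lb$parameter                  # degrees of freedom: 18 - 4

# p-value implied by the reported statistic Q* = 11.545 on 14 df;
# compare with the reported p-value of 0.6428
p_reported <- pchisq(11.545, df = 14, lower.tail = FALSE)
p_reported
```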
References
Kabacoff, R. “Time Series and Forecasting”. DataCamp, 2017. https://www.statmethods.net/advstats/timeseries.html
Hyndman, R. and Athanasopoulos, G. “Forecasting: Principles and Practice”. https://otexts.com/fpp2/x11.html
Wikipedia, “Autoregressive integrated moving average”. https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average