Introduction

An ARIMA analysis was performed on New York City restaurant violations by inspection date, using the forecast package in R. The analysis is part of a Data Incubator capstone project and explores what ARIMA modeling can reveal about the data. This document was reproduced from the referenced sources.

In statistics and econometrics, and in particular in time series analysis, an autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model. Both of these models are fitted to time series data either to better understand the data or to predict future points in the series (forecasting). ARIMA models are applied in some cases where data show evidence of non-stationarity, where an initial differencing step (corresponding to the “integrated” part of the model) can be applied one or more times to eliminate the non-stationarity.
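To make the "integrated" part concrete, here is a minimal sketch on simulated data (not the restaurant series). It builds a random walk, which is non-stationary, and shows that one round of differencing with diff() recovers the stationary noise it was built from. All variable names are illustrative.

```r
# Simulated example: a random walk is the cumulative sum of white noise.
set.seed(42)
noise <- rnorm(100)       # stationary white noise
walk  <- cumsum(noise)    # non-stationary: its variance grows over time

# First differencing (d = 1 in ARIMA terms) undoes the cumulative sum.
differenced <- diff(walk)

# The differenced series matches the original noise, minus the first point.
all.equal(differenced, noise[-1])
```

A series that needs one such differencing step before it looks stationary corresponds to d = 1 in an ARIMA(p, d, q) model.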

Time Series

The ts() function converts a numeric vector into an R time series object. The format is ts(vector, start=, end=, frequency=), where start and end are the times of the first and last observations and frequency is the number of observations per unit time (1=annual, 4=quarterly, 12=monthly, etc.).
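As a small illustration of the ts() arguments, the sketch below wraps 24 simulated monthly counts in a time series object. The values and the January 2013 start date are hypothetical and are not taken from the violations data.

```r
# Hypothetical monthly counts for two years (24 simulated values).
set.seed(1)
counts <- rpois(24, lambda = 600)

# Monthly series starting in January 2013: 12 observations per year.
monthly_ts <- ts(counts, start = c(2013, 1), frequency = 12)

start(monthly_ts)       # first observation: year 2013, period 1
end(monthly_ts)         # last observation: year 2014, period 12
frequency(monthly_ts)   # 12

# window() extracts a sub-period, here the second half of 2013.
second_half <- window(monthly_ts, start = c(2013, 7), end = c(2013, 12))
second_half
```

When end is omitted, ts() infers it from start, frequency, and the length of the vector.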

Seasonal Decomposition

A time series with additive trend, seasonal, and irregular components can be decomposed using the stl() function. Note that a series with multiplicative effects can often be transformed into a series with additive effects through a log transformation (i.e., newts <- log(myts)).
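The decomposition can be sketched as follows. Because the violations series is not reproduced in this document, the example uses the AirPassengers dataset that ships with R as a stand-in; the choice of s.window = "periodic" is an assumption made for simplicity.

```r
# AirPassengers ships with R; it stands in for the violations series.
# Its seasonal swings grow with the level, so take logs first.
log_ts <- log(AirPassengers)

# s.window = "periodic" assumes the seasonal pattern is the same every year.
fit <- stl(log_ts, s.window = "periodic")

# The components are the columns of fit$time.series.
components <- fit$time.series
head(components)

# Additive decomposition: seasonal + trend + remainder rebuilds the data.
max(abs(rowSums(components) - as.numeric(log_ts)))
```

Calling plot(fit) draws the data and the three components in stacked panels.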

Monthly plot

Seasonal Plot

Exponential Models

Both the HoltWinters() function in the base installation, and the ets() function in the forecast package, can be used to fit exponential models.
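The three HoltWinters() variants used in this document differ only in which smoothing parameters are switched off. A minimal sketch, again using AirPassengers as a stand-in for Date_of_Vio (which is not reproduced here):

```r
# Stand-in monthly series bundled with R.
x <- AirPassengers

# Simple exponential smoothing: level only.
hw_simple <- HoltWinters(x, beta = FALSE, gamma = FALSE)

# Double exponential smoothing: level and trend.
hw_double <- HoltWinters(x, gamma = FALSE)

# Triple exponential smoothing: level, trend, and additive seasonality.
hw_triple <- HoltWinters(x)

# Each fit stores its final state: a (level), b (trend), s1..s12 (seasonal).
names(hw_simple$coefficients)
names(hw_double$coefficients)
names(hw_triple$coefficients)
```

Setting beta or gamma to FALSE removes that component; leaving them unset lets HoltWinters() estimate the parameter by minimizing the squared one-step prediction error.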

Holt-Winters exponential smoothing without trend and without seasonal component.

Call:
HoltWinters(x = Date_of_Vio, beta = FALSE, gamma = FALSE)

Smoothing parameters:
 alpha: 0.1407241
 beta : FALSE
 gamma: FALSE

Coefficients:
      [,1]
a 635.4118

Exponential Plot

Double Exponential Model

A double exponential model captures both level and trend.

Holt-Winters exponential smoothing with trend and without seasonal component.

Call:
HoltWinters(x = Date_of_Vio, gamma = FALSE)

Smoothing parameters:
 alpha: 0.4960217
 beta : 0.3238853
 gamma: FALSE

Coefficients:
       [,1]
a 399.17972
b -72.67498

Double Exponential Plot

Triple Exponential Model

A triple exponential model includes level, trend, and seasonal components.

Holt-Winters exponential smoothing with trend and additive seasonal component.

Call:
HoltWinters(x = Date_of_Vio)

Smoothing parameters:
 alpha: 0
 beta : 0
 gamma: 0.320561

Coefficients:
           [,1]
a    635.673611
b      1.527098
s1   390.452924
s2   102.956374
s3   178.805097
s4  -165.582414
s5   -19.326198
s6    69.462963
s7     4.290053
s8   297.307655
s9    22.225080
s10  -66.213040
s11 -147.500934
s12   61.784102

Triple Exponential Plot

Forecast 3 Obs.

Predict the next three values of the violations-by-inspection-date series.

         Point Forecast    Lo 80    Hi 80      Lo 95    Hi 95
Jul 2020      1027.6536 491.3142 1563.993 207.393255 1847.914
Aug 2020       741.6842 205.3447 1278.024 -78.576198 1561.945
Sep 2020       819.0600 282.7206 1355.399  -1.200377 1639.320
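The table above has the layout printed by forecast() from the forecast package. A comparable forecast can be sketched in base R with predict(); the example below uses AirPassengers as a stand-in series and requests a single 95% interval, both of which are assumptions of this sketch.

```r
# Fit a triple exponential model to a stand-in series.
fit <- HoltWinters(AirPassengers)

# Forecast three steps ahead with a 95% prediction interval.
fc <- predict(fit, n.ahead = 3, prediction.interval = TRUE, level = 0.95)

# Columns: fit (point forecast), upr and lwr (interval bounds).
fc
```

Prediction intervals of this kind are symmetric around the point forecast, which is why a lower bound can fall below zero for a count series, as it does in the table above.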

Forecast Plot

ARIMA Models

The arima() function can be used to fit an autoregressive integrated moving average model. Note that the forecast package has somewhat nicer versions of acf() and pacf() called Acf() and Pacf() respectively.

Fit an ARIMA model of order (p, d, q). The call below passes no order argument, so arima() uses its default order of (0, 0, 0) and estimates only an intercept.

Call:
arima(x = Date_of_Vio)

Coefficients:
      intercept
       617.4556
s.e.    39.1210

sigma^2 estimated as 137741:  log likelihood = -660.2,  aic = 1324.39
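To show how the order argument changes the fit, here is a minimal sketch using lh, a short series bundled with R, as a stand-in. The AR(1) order is chosen purely for illustration and is not a recommendation for the violations data.

```r
# lh is a small series bundled with R, used here as a stand-in.
# With no order argument, arima() fits order c(0, 0, 0): a constant mean.
fit_mean <- arima(lh)
coef(fit_mean)[["intercept"]]   # essentially mean(lh)

# An explicit order c(p, d, q) sets the AR order, differencing, and MA order.
fit_ar1 <- arima(lh, order = c(1, 0, 0))
fit_ar1

# acf() and pacf() help choose p and q; plot = FALSE returns the values.
acf(lh, plot = FALSE)
pacf(lh, plot = FALSE)
```

This matches the output above, where the only estimated coefficient is the intercept.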

Predictive Accuracy

                       ME     RMSE      MAE       MPE     MAPE      MASE
Training set -1.91978e-13 371.1346 320.0657 -897.6897 928.5464 0.7726232
                   ACF1
Training set 0.07222558
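The first five accuracy measures are simple functions of the in-sample errors. The sketch below computes them by hand on toy numbers chosen so the arithmetic is easy to verify; MASE and ACF1 are left out.

```r
# Toy values chosen so the arithmetic is easy to check by hand.
actual        <- c(10, 20, 30)
fitted_values <- c(12, 18, 33)
errors        <- actual - fitted_values   # -2, 2, -3

measures <- c(
  ME   = mean(errors),                        # mean error
  RMSE = sqrt(mean(errors^2)),                # root mean squared error
  MAE  = mean(abs(errors)),                   # mean absolute error
  MPE  = mean(100 * errors / actual),         # mean percentage error
  MAPE = mean(abs(100 * errors / actual))     # mean absolute percentage error
)
round(measures, 4)
```

Because MPE and MAPE divide by the observed value, they become very large when some observations are close to zero, which is one plausible reason for the values near 900 reported above.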

Forecast 5 Obs.

         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
Jul 2020       617.4556 141.8274 1093.084 -109.955 1344.866
Aug 2020       617.4556 141.8274 1093.084 -109.955 1344.866
Sep 2020       617.4556 141.8274 1093.084 -109.955 1344.866
Oct 2020       617.4556 141.8274 1093.084 -109.955 1344.866
Nov 2020       617.4556 141.8274 1093.084 -109.955 1344.866
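All five rows are identical because a constant-mean model forecasts the same value, with the same uncertainty, at every horizon. As a check, the intervals can be rebuilt from the reported intercept and sigma^2 under a normal approximation; the inputs below are the rounded figures printed above, so agreement is only to about two decimals.

```r
# Values copied from the arima() output and forecast table above.
point_forecast <- 617.4556
sigma <- sqrt(137741)      # residual standard deviation

# Two-sided normal quantiles for 80% and 95% coverage.
z80 <- qnorm(0.90)
z95 <- qnorm(0.975)

intervals <- c(
  lo80 = point_forecast - z80 * sigma,
  hi80 = point_forecast + z80 * sigma,
  lo95 = point_forecast - z95 * sigma,
  hi95 = point_forecast + z95 * sigma
)
round(intervals, 2)
```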

Forecast Plot

AutoExponential Model

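Automated forecasting using an exponential smoothing model selected by ets(). The selected model, ETS(M,N,N), has multiplicative errors, no trend, and no seasonal component.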

ETS(M,N,N) 

Call:
 ets(y = Date_of_Vio) 

  Smoothing parameters:
    alpha = 1e-04 

  Initial states:
    l = 617.4709 

  sigma:  0.608

     AIC     AICc      BIC 
1475.975 1476.254 1483.475 

Predictive Accuracy

                   ME     RMSE      MAE       MPE     MAPE      MASE       ACF1
Training set 0.138956 371.1531 320.0882 -897.6451 928.5196 0.7788033 0.07221765

Plot

AutoARIMA model

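Automated forecasting using an ARIMA model. The selected ARIMA(0,0,0) with non-zero mean has the same mean and log likelihood as the arima() fit shown earlier.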

Series: Date_of_Vio 
ARIMA(0,0,0) with non-zero mean 

Coefficients:
          mean
      617.4556
s.e.   39.1210

sigma^2 estimated as 139289:  log likelihood=-660.2
AIC=1324.39   AICc=1324.53   BIC=1329.39

Predictive Accuracy

                       ME     RMSE      MAE       MPE     MAPE      MASE
Training set -1.91978e-13 371.1346 320.0657 -897.6897 928.5464 0.7787486
                   ACF1
Training set 0.07222558

X11 Decomposition

Another popular method for decomposing quarterly and monthly data is the X11 method which originated in the US Census Bureau and Statistics Canada.

This method is based on classical decomposition, but includes many extra steps and features in order to overcome the drawbacks of classical decomposition. In particular, trend-cycle estimates are available for all observations including the end points, and the seasonal component is allowed to vary slowly over time. X11 also has some sophisticated methods for handling trading day variation, holiday effects, and the effects of known predictors. It handles both additive and multiplicative decomposition. The process is entirely automatic and tends to be highly robust to outliers and level shifts in the time series.

The details of the X11 method are described in Dagum & Bianconcini (2016). Here we will only demonstrate how to use the automatic procedure in R.

The X11 method is available using the seas() function from the seasonal package for R. The output that follows is the summary of an ARIMA(3,1,1) model fitted to the same series.

Series: Date_of_Vio 
ARIMA(3,1,1) 

Coefficients:
         ar1      ar2      ar3      ma1
      0.0646  -0.1353  -0.0589  -0.9579
s.e.  0.1115   0.1091   0.1126   0.0423

sigma^2 estimated as 144495:  log likelihood=-654.33
AIC=1318.65   AICc=1319.37   BIC=1331.09
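The AIC values in these summaries follow from the log likelihood and the number of estimated parameters, AIC = -2 logLik + 2k. The sketch below recomputes them from the rounded figures printed in this document; the parameter counts (coefficients plus the innovation variance) are my reading of the two model summaries.

```r
# AIC from a log likelihood and a parameter count.
aic_from_loglik <- function(loglik, k) -2 * loglik + 2 * k

# ARIMA(0,0,0) with non-zero mean: mean + innovation variance = 2 parameters.
aic_mean <- aic_from_loglik(-660.2, k = 2)

# ARIMA(3,1,1): 3 AR + 1 MA + innovation variance = 5 parameters.
aic_311 <- aic_from_loglik(-654.33, k = 5)

round(c(mean_only = aic_mean, arima_311 = aic_311), 2)
```

The results agree with the reported 1324.39 and 1318.65 up to rounding of the log likelihood. Note that the two models use different orders of differencing, so their likelihoods are computed on different data and the AIC values are not directly comparable.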

Residuals

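Check the residuals of the ARIMA(3,1,1) model with a Ljung-Box test. The large p-value (0.6428) gives no evidence of remaining autocorrelation in the residuals.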


    Ljung-Box test

data:  Residuals from ARIMA(3,1,1)
Q* = 11.545, df = 14, p-value = 0.6428

Model df: 4.   Total lags used: 18
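The reported degrees of freedom and p-value can be reproduced from the test statistic: the degrees of freedom are the number of lags minus the number of model parameters, and the p-value is the upper tail of the chi-squared distribution. The second half of the sketch runs the same test on a fitted model with Box.test(), using the bundled lh series and an AR(1) fit as stand-ins.

```r
# Figures copied from the Ljung-Box output above.
q_stat   <- 11.545
lags     <- 18
model_df <- 4

# Degrees of freedom: lags used minus estimated ARMA coefficients.
df <- lags - model_df

# Upper-tail chi-squared probability; should match the reported 0.6428
# up to rounding of Q*.
p_value <- pchisq(q_stat, df = df, lower.tail = FALSE)
round(p_value, 4)

# The same test on the residuals of a stand-in model (AR(1) on lh).
fit <- arima(lh, order = c(1, 0, 0))
lb  <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)
lb
```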

Residuals Autoplot

Residuals Forecast Autoplot

References