This analysis uses the auto.arima function from the R forecast package, which searches for the best ARIMA model (the model with the smallest AICc) for a time series. I apply auto.arima directly to the Nigerian Deposit Money Banks monthly loan-to-deposit ratio (Jan. 2007 - April 2017) without first differencing the series to remove non-stationarity. A previous analysis of the same dataset showed that the series is not stationary and requires first-order differencing.
Data: Nigerian Deposit Money Banks Monthly Loan to Deposit Ratio (Jan. 2007 - April 2017). Source: Central Bank of Nigeria Statistics Database.
Letting auto.arima choose the model gives ARIMA(0,1,0)(1,0,1)[12]. As noted in the text Forecasting: Principles and Practice, auto.arima uses shortcuts to speed up its search, so the model it returns is not necessarily the one with the smallest AICc. In this case, auto.arima returns a model with both a non-seasonal part (0,1,0) and a seasonal part (1,0,1), with a seasonal period of 12.
#auto.arima model
arimafit <- auto.arima(loan_to_deposit)
summary(arimafit)
Series: loan_to_deposit
ARIMA(0,1,0)(1,0,1)[12]
Coefficients:
NaNs produced
sar1 sma1
-0.4186 0.3838
s.e. NaN NaN
sigma^2 estimated as 19.98: log likelihood=-358.7
AIC=723.41 AICc=723.61 BIC=731.84
Training set error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 0.1117394 4.451392 2.693037 -0.1663461 4.527217 0.2186715 -0.1409733
#plot fitted model and original time series
plot(loan_to_deposit, main= "Time Series with ARIMA(0,1,0)(1,0,1)[12] Model ", xlab="Year",ylab="loan-to-deposit ratio")
lines(fitted(arimafit), col="blue")
legend("bottomleft",lty=1, col=c("black","blue"),
c("Data","ARIMA(0,1,0)(1,0,1)[12]"),cex=0.80)
#fit ARIMA(0,1,0) and plot both fitted models against the original time series
arimafit2 <- Arima(loan_to_deposit,order=c(0,1,0))
plot(loan_to_deposit, main= "Time Series with ARIMA(0,1,0) Model ", xlab="Year", ylab="Loan-to-Deposit Ratio")
lines(fitted(arimafit), col="blue")
lines(fitted(arimafit2), col="red")
legend("bottomleft",lty=1, col=c("black","blue","red"),
c("Data","ARIMA(0,1,0)(1,0,1)[12]","ARIMA(0,1,0)"),cex=0.80)
ARIMA(0,1,0)(1,0,1)[12] indicates that the non-seasonal part of the model is a random walk (0,1,0), while the seasonal part (1,0,1) has one seasonal autoregressive term and one seasonal moving-average term. The [12] is the seasonal period, since the data are monthly.
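Written out with the backshift operator $B$, this model has the form below. The symbols $\Phi_1$, $\Theta_1$ and $\varepsilon_t$ are notation introduced here, and the signs assume the convention used by R's arima (autoregressive factors as $1 - \Phi B^{12}$, moving-average factors as $1 + \Theta B^{12}$):

$$
(1 - \Phi_1 B^{12})(1 - B)\,y_t = (1 + \Theta_1 B^{12})\,\varepsilon_t
$$

Substituting the estimates from the summary output, sar1 = -0.4186 for $\Phi_1$ and sma1 = 0.3838 for $\Theta_1$, gives

$$
(1 + 0.4186\,B^{12})(1 - B)\,y_t = (1 + 0.3838\,B^{12})\,\varepsilon_t
$$

The two seasonal factors are similar in size, so they largely offset each other, leaving a model close to the plain random walk $(1 - B)\,y_t = \varepsilon_t$.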
plot(residuals(arimafit), main= "Residual:ARIMA(0,1,0)(1,0,1)[12] and ARIMA(0,1,0) Model ", ylab="Residual", xlab="Year")
lines(residuals(arimafit2), col="red")
legend("bottomleft",lty=1, col=c("black","red"),
c("ARIMA(0,1,0)(1,0,1)[12]","ARIMA(0,1,0)"),cex=0.80)
The residual plot above shows that ARIMA(0,1,0)(1,0,1)[12] behaves much like the ARIMA(0,1,0) model from the previous analysis: the two sets of residuals are very similar.
tsdisplay(residuals(arimafit))
Box.test(residuals(arimafit), lag=36, fitdf=6, type="Ljung")
Box-Ljung test
data: residuals(arimafit)
X-squared = 28.703, df = 30, p-value = 0.5332
The residual plots show significant spikes at lag 12 in both the ACF and the PACF. However, with a p-value of 0.5332 the Box-Ljung test does not reject the null hypothesis that the residuals are uncorrelated up to lag 36, so on this test the residuals are not distinguishable from white noise. The other ARIMA models considered, ARIMA(0,1,0) and ARIMA(2,1,1) (the latter obtained by disabling fast computation in auto.arima), give the same outcome on the Box-Ljung test.
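The degrees of freedom reported by Box.test are lag minus fitdf, and fitdf is meant to be the number of ARMA coefficients estimated in the model. The fitted ARIMA(0,1,0)(1,0,1)[12] has two (sar1 and sma1), whereas the call above passes fitdf=6, which is why the output shows df = 30. A minimal sketch of that bookkeeping follows. It uses simulated white noise as a hypothetical stand-in for the residuals, since the loan-to-deposit data are not reproduced here:

```r
# Hypothetical stand-in for residuals(arimafit): simulated white noise,
# NOT the loan-to-deposit residuals.
set.seed(1)
stand_in_resid <- rnorm(124)

# fitdf = 6 reproduces the degrees of freedom shown in the output above.
lb_fitdf6 <- Box.test(stand_in_resid, lag = 36, fitdf = 6, type = "Ljung-Box")

# fitdf = 2 matches the two estimated coefficients (sar1, sma1).
lb_fitdf2 <- Box.test(stand_in_resid, lag = 36, fitdf = 2, type = "Ljung-Box")

# The test statistic is identical in both calls; only the reference
# chi-squared distribution (and hence the p-value) changes.
df_fitdf6 <- unname(lb_fitdf6$parameter)  # 36 - 6 = 30
df_fitdf2 <- unname(lb_fitdf2$parameter)  # 36 - 2 = 34
print(c(fitdf6 = df_fitdf6, fitdf2 = df_fitdf2))
```

With the real residuals, one might pass fitdf = length(coef(arimafit)) so that the count follows the fitted model rather than being typed by hand.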
When the fast computation used to arrive at the best model is disabled in auto.arima (stepwise=FALSE), the best model found is ARIMA(2,1,1), which has no seasonal part.
arimafit3 <- auto.arima(loan_to_deposit,stepwise=FALSE)
summary(arimafit3)
Series: loan_to_deposit
ARIMA(2,1,1)
Coefficients:
ar1 ar2 ma1
-0.7574 -0.0351 0.6233
s.e. 0.6346 0.1532 0.6262
sigma^2 estimated as 19.5: log likelihood=-357.22
AIC=722.44 AICc=722.78 BIC=733.69
Training set error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 0.1214725 4.397839 2.687355 -0.1711618 4.541871 0.2182101 -0.002261327
tsdisplay(residuals(arimafit3))
Of the three ARIMA models, ARIMA(0,1,0)(1,0,1)[12], ARIMA(0,1,0) and ARIMA(2,1,1), ARIMA(0,1,0) has the smallest AICc. Since the three models use the same order of differencing (first-order, with no seasonal differencing), their AICc values can be compared directly. By this criterion, ARIMA(0,1,0) is the best of the three.
#AICc values for the different Models
x<- arimafit$aicc
y<- arimafit2$aicc
z<- arimafit3$aicc
df1 <- data.frame( models = c("ARIMA(0,1,0)(1,0,1)[12]","ARIMA(0,1,0)","ARIMA(2,1,1)"),AICc=c(x,y,z))
knitr::kable(df1, caption = "AICc of the Different Models")
| models | AICc |
|---|---|
| ARIMA(0,1,0)(1,0,1)[12] | 723.6083 |
| ARIMA(0,1,0) | 719.6052 |
| ARIMA(2,1,1) | 722.7817 |
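As a check on the AICc column, two of the values can be recovered from the AIC figures printed in the model summaries. This sketch assumes the usual small-sample correction AICc = AIC + 2k(k+1)/(n-k-1), with k counting the estimated coefficients plus one for the error variance and n the number of observations left after differencing. The value n = 123 is an assumption derived from the date range (124 months, less one for the first difference), and the helper name aicc_from_aic is made up for illustration:

```r
# Small-sample corrected AIC; aicc_from_aic is an illustrative helper,
# not a function from the forecast package.
aicc_from_aic <- function(aic, k, n) {
  aic + 2 * k * (k + 1) / (n - k - 1)
}

# Assumption: Jan. 2007 - April 2017 is 124 monthly observations,
# and one is lost to first differencing.
n_eff <- 124 - 1

# ARIMA(0,1,0)(1,0,1)[12]: sar1, sma1 and sigma^2 give k = 3.
aicc_seasonal <- aicc_from_aic(723.41, k = 3, n = n_eff)

# ARIMA(2,1,1): ar1, ar2, ma1 and sigma^2 give k = 4.
aicc_211 <- aicc_from_aic(722.44, k = 4, n = n_eff)

# Rounded to two decimals these agree with the reported 723.61 and 722.78.
print(round(c(aicc_seasonal, aicc_211), 2))
```

The AIC of the ARIMA(0,1,0) fit is not printed in this write-up, so it is left out of the check.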
Compared on training-set RMSE, ARIMA(2,1,1) has the lowest value (4.397839).
# ARIMA(0,1,0)(1,0,1)[12]
accuracy(arimafit)
ME RMSE MAE MPE MAPE MASE ACF1
Training set 0.1117394 4.451392 2.693037 -0.1663461 4.527217 0.2186715 -0.1409733
# ARIMA(0,1,0)
accuracy(arimafit2)
ME RMSE MAE MPE MAPE MASE ACF1
Training set 0.1111027 4.454762 2.674006 -0.1614636 4.494982 0.2171262 -0.1453001
# ARIMA(2,1,1)
accuracy(arimafit3)
ME RMSE MAE MPE MAPE MASE ACF1
Training set 0.1214725 4.397839 2.687355 -0.1711618 4.541871 0.2182101 -0.002261327
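RMSE is the square root of the mean squared residual. The sketch below defines it (the helper name rmse is illustrative) and then ranks the three models using the training-set RMSE values copied from the accuracy() output, rather than refitting them:

```r
# Illustrative helper: root mean squared error of a residual vector.
rmse <- function(res) sqrt(mean(res^2, na.rm = TRUE))

# Sanity check on a toy residual vector (not from the data).
toy_rmse <- rmse(c(3, -4))  # sqrt((9 + 16) / 2)

# Training-set RMSE values copied from the accuracy() output above.
reported_rmse <- c(
  "ARIMA(0,1,0)(1,0,1)[12]" = 4.451392,
  "ARIMA(0,1,0)"            = 4.454762,
  "ARIMA(2,1,1)"            = 4.397839
)

# Name of the model with the smallest training-set RMSE.
best_by_rmse <- names(which.min(reported_rmse))
print(sort(reported_rmse))
print(best_by_rmse)
```

The spread in RMSE across the three models is small (about 0.06), so this ranking is weak evidence on its own.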
plot(residuals(arimafit), main= "Residuals-(0,1,0)(1,0,1)[12], ARIMA(0,1,0) & ARIMA(2,1,1) Models ", xlab="Year", ylab="Residual")
lines(residuals(arimafit2), col="red")
lines(residuals(arimafit3), col="blue")
legend("bottomleft",lty=1, col=c("black","red","blue"),
c("ARIMA(0,1,0)(1,0,1)[12]","ARIMA(0,1,0)","ARIMA(2,1,1)"),cex=0.80)
Judging by the residual plots, the three models look very similar.