Workshop 3, Time Series

Author

Alberto Dorantes, Ph.D.

Published

May 28, 2023

Abstract

The main purpose of this workshop is to illustrate the foundations of Autoregressive Integrated Moving Average Seasonal models (ARIMA/SARIMA). We will learn by doing. We will work with a model for a macro-economic index, the IGAE (Índice General de la Actividad Económica), published by INEGI.

1 General directions for this Workshop

You will work in RStudio. Create an R Notebook document (File -> New File -> R Notebook), where you have to write whatever is asked in this workshop.

At the beginning of the R Notebook write Workshop 3 - Time Series and your name (as we did in previous workshop).

You have to replicate all the steps explained in this workshop, and ALSO you have to do whatever is asked. Any QUESTION or any STEP you need to do will be written in CAPITAL LETTERS. For ANY QUESTION, you have to RESPOND IN CAPITAL LETTERS right after the question. It is STRONGLY RECOMMENDED that you write your OWN NOTES as if this were your notebook. Your own workshop/notebook will be very helpful for your further study.

You have to keep saving your .Rmd file, and ONLY SUBMIT the .html version of your .Rmd file. Pay attention in class to know how to generate an html file from your .Rmd.

2 Introduction

The main purpose of this workshop is to illustrate what is the an Autoregressive Integrated Moving Average Seasonal model (ARIMA/SARIMA model) and how we can estimate a basic model.

This methodology is part of the Box-Jenkins models for time series, which provide a series of tools to understand past behaviors and predict future values of time series variables.

Given a time series Y_{t}, the ARIMA/SARIMA model can be be denoted as ARIMA(p,d,q) SARIMA(P,D,Q,#annual periods). We will see later what are these parameters of the model (p,d,q,P,D,Q,#annual periods).

In an ARIMA/SARIMA model we try to understand current values of a time-series variable using its own past values and past random shocks; the model does not include other explanatory variables (X’s variables) like in a traditional linear regression model. If we add external explanatory variables to an ARIMA/SARIMA model, then this model is usually called ARIMAX.

Then, the Y_t time-series variable in an ARIMA/SARIMA model is explained by:

Its own past values; these values are called LAGvalues or just LAGS. The value of last period (t-1) is called LAG 1; the value of 2 periods ago (t-2) is called LAG 2, and so forth. These LAGS are called AR(p) terms, where p is number of previous LAGS to be included. For example, AR(2) indicates that we include LAG 1 and LAG 2 values as explanatory variables.
Past random shocks; these values are called LAG errors or LAG shocks. The shock of last period (t-1) is called LAG 1 error or LAG 1 shock; the shock of 2 periods ago (t-2) is called LAG 2 error or LAG 2 shock, and so forth. These LAG errors are called MA(q) terms, where q is the # of previous LAG shocks to be included. For example, MA(2) indicates that we include LAG 1 error and LAG 2 error as explanatory variables.
Its own seasonal values; A season refers to 1 year. These values are called seasonal LAG values. If we have monthly data (12 periods in 1 year), then the value of last year (t-12) is called seasonal LAG 1; the value of 2 years ago (t-24) is called seasonal LAG 2, and so forth. If we have quarterly data, then we have to go back 4 periods for last year (t-4), and so forth. These LAGS are called Seasonal AR(P) terms, where P is the number of previous seasonal LAGS to be included.
Past seasonal random shocks; these values are called seasonal LAG errors or seasonal LAG shocks. If we have monthly data (12 periods in 1 year), the shock of last year (t-12) is called seasonal LAG 1 error or seasonal LAG 1 shock; the shock of 2 years ago (t-24) is called seasonal LAG 2 error or seasonal LAG 2 shock, and so forth. These seasonal LAG errors are called Seasonal MA(Q) terms, where Q is the # of previous seasonal LAG shocks to be included.

Then, the following parameters of an ARIMA/SARIMA model need to be estimated:

p: # of AR terms to be included as explanatory variables
d: # of first differences applied to the original variable
q: # of MA terms to be included as explanatory variables
P: # of Seasonal AR terms to be included as explanatory variables
D: # of Seasonal difference applied to the variable
Q: # of Seasonal MA terms to be included as explanatory variables

The d parameter refers to how many first differences we apply to the original time-series variables to make the series stationary.

The D parameter refers to how many seasonal differences we decide to apply to the variable.

Let’s start with models with only AR terms, AR(p) models, that can be named ARIMA(p,0,0), or ARIMA(p,1,0).

3 The AR(p) model - ARIMA(p,d,0)

In general the autoregressive models AR(p) assume that there is some kind of autocorrelation phenomenon in the time series. The concept of autocorrelation refers to the correlation of the variable today at time t with its past values (t-n). Autocorrelation is the building block behind these models.

In time-series Econometrics we can talk about an autocorrelation in the dependent variable (also called, endogeneous variable). This means the current or future values of an economic or financial variable can be explained somehow by its previous values. This is the foundation of what econometricians have called an Autoregressive Model also denoted as AR(p) model or ARIMA(p,d,0) model.

The AR(p) model for the Y_t variable can be expressed as:

Y_{t}=\phi_{0}+\phi_{1}Y_{t-1}+\phi_{2}Y_{t-2}+...+\phi_{p}Y_{t-p}+\epsilon_{t}

Where \phi_{1},...\phi_{p} are the parameters of the model, \epsilon_{t} refers to the error term or random shock. We can express the same equation as follows:

Y_{t}=\phi_{0}+\sum_{k=1}^{p}\phi_{k}Y_{t-k}+\epsilon_{t}

In order to apply this model we have to prove the series Y_{t} is stationary. We can use the Dicky-Fuller test to decide whether the series is stationary. If the original series is not stationary, we can calculate the first difference of the series, and most of the time the resulting series will become stationary.

In linear regression models, the method to estimate the regression coefficients is the Ordinary Least Squares (OLS). For ARIMA models, the best method to estimate the autoregressive coefficients is the Maximum Likelihood (ML) method. The ML method works with iterations, while OLS has an analytic solution (in othe words, a formula). Fortunately, R can estimate this for you. In this model, we assume that the errors are not correlated among them. If that is the case, that means that we can include another autoregressive term to have a better model.

Now I will explain what is a moving average model.

4 The MA(q) model - ARIMA(0,d,q)

The AR(p) model is a process with a long-term memory. Why? this is because a shock today will impact in the future forever. If you read the mathematical equation, this can be interpreted as follows. The value of the series tomorrow will depend on the value of today, after applying the filter \phi_1. Then, the value of tomorrow already has information about the value of today. And the value of tomorrow will impact the value of 2 days from now. Then, the value of today will have an impact in the whole future of the series.

However, the impact of the value of today will be diminishing exponentially according to (\phi_1)^N, where N is the number of periods in the future. Since |\phi1|<1, then the bigger the exponent N -which refers to a further future period- the smaller the effect.

If we want to model a series with a short-term memory, then we have to think in a different equation:

Y_{t}=\theta_{0}+\theta_{1}\epsilon_{t-1}+\theta_{2}\epsilon_{t-2}+...+\theta_{q}\epsilon_{t-q}+\epsilon_{t}

This is called a Moving-Average model with q terms. In this case, the value of today is not directly affected by its previous own value, but the past random shocks. In the case of q=1, this process does not have long memory since the value of today is only affected by the shock of yesterday.

The more q terms included into the model, then the longer the memory of the model. Most of the cases, real-world series only include very few q terms.

In the MA(q) model, \theta_{0}, … \theta_{q} are the parameters of the model, \epsilon_{t-k} refers to past external errors/shocks, and the current shock is defined as \epsilon_{t}, so:

Y_{t}=\theta_{0}+\epsilon_{t}+\sum_{k=1}^{q}\theta_{k}\epsilon_{t-k}

We can interpret a pure MA model as a random process where the current value of the series is influenced by some percentages of previous random q shocks/errors and the current random shock. The percentages are actually the values of the coefficients \theta.

In Finance, when we want to model the behavior of market returns, we know that there are thousands of events that affect how the market returns are moving today. These thousands of events are basically summarized in a “random shock” that nobody knows what will be its magnitude and direction (positive or negative effect).

Now I will put together both AR and MA model into one: the ARIMA(p,d,q) model.

5 ARIMA(p,d,q) model

Now we can combine both AR(p) terms and MA(q) terms into ONE model, that is called ARMA(p,q):

Y_{t}=\phi_{0}+\sum_{k=1}^{p}\phi_{k}Y_{t-k}+ \sum_{k=1}^{q}\theta_{k}\epsilon_{t-k} + \epsilon_{t}

We combine AR and MA terms to model Y in case Y can be modeled including short-term and long-term memory. The long-term memory is modeled with the AR terms, while the short-term memory is modeled with the MA terms. We will practice with an example to illustrate this theoretical model.

6 Hand-on activity

We will run a first version of an ARIMA-SARIMA model for the IGAE index (Índice General de Actividad Económica) developed by INEGI (Instituto Nacional de Estadística y Geografía). Since 1993 INEGI calculates this index based on the Mexican general economy and different industry sectors.

6.1 Data collection and data management

For practical purposes, I already downloaded the IGAE dataset in Excel. Below I show you how to get this igae.xls file from Canvas, and also from the INEGI site (for getting updated data in the future).

6.2 Downloading from Canvas

Go to Canvas/Week 3 and download the inegi.xls file and copy into the folder where you saved this workshop. Then, go to the subsection Importing the Excel file to continue.

6.3 Downloading from the INEGI site.

If you want to download the most recent IGAE data, follow directions to generate the inegi.xls file:

Go to INEGI and download the IGAE series. Go to the following link:

https://www.inegi.org.mx/temas/igae/#Herramientas

Click in “Banco de Información Económica (BIE)” and then select:

Indicador glogal de la actividad económica, base 2013

Expand the option: “Series originales” and, then expand the option “Índice de volumen físico”.

Check-mark the option “Total (Índice base 2013=100)”

Click the button “Consulta”. You will see tabulated data for the IGAE.

Select oldest year (1993 in “Periodo desde”). Finally click the download button and select the Excel format. The file be downloaded to your computer.

Open the Excel file and drop the first rows before the tabluated data, and only leave the period column and the IGAE Total index. RENAME the second column as IGAE. Review the Period column, and edit if there are some dates with notes.

Save the Excel file as igae.xls in the same folder of your Workshop.

6.4 Importing the Excel file

library(readxl)
library(xts)
library(zoo)
library(tseries)
library(forecast)
library(astsa)
library(lmtest)

igae <- read_excel("igae.xlsx")
# I convert the igae data frame into a time series R dataset.
# This is convenient when working with ARIMA-SARIMA models:
igae<-ts(coredata(igae$igae),start=c(1993,1),frequency=12)
# The ts function transform the dataset from a data frame to a ts object
# I can see the content of IGAE in a table format:
igae

           Jan       Feb       Mar       Apr       May       Jun       Jul
1993  60.40769  61.02252  63.94325  61.86598  63.61290  62.88259  62.79795
1994  63.02927  62.73316  65.79027  65.89172  66.85880  66.63579  64.70206
1995  65.10234  60.29419  62.92987  59.23909  61.17677  61.08234  59.77917
1996  64.59736  63.25033  65.00435  63.82202  66.56896  65.57817  65.91223
1997  67.92226  66.01572  67.08871  70.18939  71.25363  70.71757  70.48390
1998  72.28125  71.26764  75.30751  72.95238  75.13802  74.64519  74.61446
1999  74.28381  72.70940  76.99211  74.32923  76.92304  76.55675  76.31262
2000  77.93003  77.36732  80.35795  77.00756  82.34331  81.54154  79.62973
2001  79.42346  76.41926  80.76787  77.53115  81.78409  80.60701  79.32531
2002  77.38629  75.10932  77.12227  80.83024  82.06994  79.72127  79.92718
2003  79.27239  77.17550  80.33976  79.95093  82.28842  81.44337  80.93570
2004  80.93007  79.19382  84.92785  82.93623  85.23905  85.62087  82.94023
2005  82.25837  80.89942  83.62640  86.59373  87.88889  86.35209  83.78420
2006  87.04556  84.08297  89.54008  87.10250  93.13935  91.36864  88.35889
2007  88.77729  85.76828  91.42205  89.32063  94.32237  93.53460  91.41102
2008  90.95435  88.98933  89.05326  94.71533  94.86436  94.45588  94.15931
2009  84.89588  81.84098  86.87355  84.31067  86.08159  88.18426  88.73169
2010  86.86702  85.22716  92.29229  90.58182  92.30005  93.23105  92.32869
2011  90.20546  88.56874  95.82899  91.60005  95.94630  96.48831  95.21792
2012  94.67463  94.07903  99.09222  95.47932 100.31525  99.68684  99.31091
2013  97.96857  94.67101  97.06078  99.91622 102.27528  99.51754 100.80982
2014  98.75516  96.95655 101.49486 100.31857 104.86084 102.89404 104.13870
2015 102.31974  99.87052 104.96609 103.68789 106.19792 107.31706 107.46007
2016 104.41891 104.58013 105.90846 107.29258 109.06242 109.96157 107.52493
2017 108.41992 105.28369 111.90442 106.20187 112.30309 112.79730 108.95506
2018 111.01484 107.81021 111.19756 111.60068 115.82727 114.44395 112.80196
2019 112.52387 108.84469 112.29140 109.81835 115.24003 112.83183 113.37459
2020 112.34858 108.82988 110.00435  88.26705  89.00929  97.95509 102.42161
2021 105.54571 102.80294 111.51864 107.83487 111.17681 110.80350 109.21461
2022 107.22904 105.54008 112.25941 109.61857 114.21407 112.79007 111.28421
2023 112.09953 109.55038 115.29381                                        
           Aug       Sep       Oct       Nov       Dec
1993  62.02789  62.01834  62.83480  63.62177  66.12119
1994  65.79634  65.76477  66.73068  67.84687  68.06744
1995  61.06820  60.84772  60.21824  63.06020  65.41346
1996  65.36457  65.03774  67.82464  68.55904  69.26845
1997  70.02845  70.75158  73.23728  72.75789  73.78044
1998  73.47925  73.76732  74.46418  74.44344  75.67735
1999  75.67001  76.24775  76.45465  77.67013  78.05347
2000  81.19074  80.16487  80.80958  80.91619  79.49830
2001  80.58703  78.26957  79.97355  80.09390  79.09247
2002  80.30576  78.38018  81.60774  80.09518  80.49051
2003  79.41995  79.18878  81.87911  80.63451  83.33037
2004  83.13244  82.19751  84.36268  85.94154  86.23057
2005  86.43120  84.64256  86.85310  88.65732  89.15110
2006  90.24619  87.86910  91.59749  91.14520  90.86138
2007  92.21733  89.47518  95.27999  93.80926  92.45720
2008  91.49309  90.57651  95.50683  92.04270  91.89505
2009  86.55360  86.50378  91.12057  91.24910  91.85630
2010  91.56219  90.55653  93.84529  95.96933  95.59592
2011  96.53903  94.10211  97.43771 101.08275  98.81575
2012  99.09917  95.18924 101.85991 104.38649 100.52816
2013 100.27357  96.80675 103.66720 104.23167 102.80159
2014 101.73624 100.01548 106.81106 106.47984 106.88133
2015 105.48270 105.02288 109.29465 109.09913 109.26299
2016 108.94846 106.06076 110.56435 114.14576 112.50988
2017 111.77032 106.32204 112.84842 116.17877 113.99772
2018 114.32556 108.76835 115.96147 117.72689 113.10769
2019 113.23239 108.85413 115.36387 116.26350 113.35962
2020 102.83827 103.51214 109.45849 111.32584 111.19313
2021 106.90935 104.20553 108.24911 112.53449 112.65367
2022 112.91704 109.50577 113.10130 116.23547 115.65745
2023

7 Visualize the data

Plot the IGAE index to see how it has grown over time

plot(igae)

What do you observe? Compared to previous crisis in Mexico during 1994-1995 and 2008-2009, how does the COVID crisis look like?

We see that the decline of the index for the COVID crisis was much profound compared with previous crisis (2008 mortgage crisis and the internal 1994 crisis)

8 Stationary vs Non-stationary

We first have to find a version of the IGAE series that is a stationary series. For any economic or financial series I strongly recommend to start calculating the logarithm of the variable:

logigae = log(igae)

I can plot the log of the IGAE to have an idea whether it is stationary:

plot(logigae)

It is clear that the log series is not stationary, so I do not need to run a Dicky-Fuller test to check stationarity for the log. I have to do a difference of the series and then do the Dicky-Fuller test.

For financial and economic monthly data I strongly recommend to start calculating the seasonal log difference and check whether this series is stationary. If it is, then use this version for your model.

In case that the seasonal log difference is not stationary, then you can calculate the first difference, and most of the time this first difference will be stationary.

Then, we start checking whether the seasonal log of the IGAE is a stationary series. We start with a plot of this variable, and then run the the corresponding Dicky-Fuller test:

seasonaldiflog = diff(logigae,lag=12)

plot(seasonaldiflog)

The seasonal log difference of the series is actually the annual % change month by month of the IGAE (in continuously compounded percentages).

Let’s see why the seasonal log difference is the annual % change of the series:

diff(logigae,12) = logigae - lag(logigae,12)

lag(logigae,12) is the value of the log IGAE, but 12 months ago!

In this case the diff function gets the difference from a current value of the log minus the log value 12 months ago. This is actually a % annual change since we are taking the difference with 12 months ago.

It is hard to our eyes to see whether the seasonal log series is stationary. We need to run the Dicky-Fuller test:

adf.test(seasonaldiflog,k=1)


    Augmented Dickey-Fuller Test

data:  seasonaldiflog
Dickey-Fuller = -5.7989, Lag order = 1, p-value = 0.01
alternative hypothesis: stationary

Since the p-value of the Dicky-Fuller test is 0.01<0.05, then we can reject the null hypothesis and accept the alternative hypothesis. Then we can conclude that the seasonal log of the IGAE is a stationary series. Then we will use this version of the variable to do our model.

9 Autocorrelations (AC) and Partial-autocorrelations (PAC)

Once we determine which version of the variable is stationary, then we need to examine autocorrelations and partial autocorrelations of this variable. We can examine this with a correlogram.

In time series analysis, a correlogram is an chart of correlation statistics also known as an autocorrelation plot. This is a plot of the sample autocorrelations of the dependent variable Y_t and the time lags. In R, the function to compute the autocorrelations is acf2() from the astsa package, which gives you both, the autocorrelation (ACF) figures and the partial-autocorrelations (PACF), as well.

Let’s examine the autocorrelations of the seasonal log of the IGAE.

You need to install the astsa package to do auto-correlation plots.

library(astsa)
acf2(seasonaldiflog,max.lag = 24)

     [,1] [,2] [,3]  [,4] [,5]  [,6]  [,7] [,8]  [,9] [,10] [,11] [,12] [,13]
ACF  0.79 0.64 0.53  0.41 0.35  0.28  0.18 0.13  0.06 -0.05 -0.13 -0.22 -0.17
PACF 0.79 0.05 0.04 -0.08 0.08 -0.03 -0.11 0.02 -0.07 -0.17 -0.08 -0.10  0.31
     [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
ACF  -0.10 -0.07 -0.03 -0.01 -0.04 -0.05 -0.03 -0.05 -0.05 -0.04 -0.11
PACF  0.08  0.01  0.00  0.03 -0.15 -0.08  0.11 -0.10 -0.15  0.02 -0.20

The acf2() function gives you the autocorrelogram, which is the autocorrelations in a graph where it is easy to identify which lags of the variable have a significant autocorrelation.

The X axis represents the lag numbers, while the Y axis is the level of autocorrelation. The vertical lines are the levels of autocorrelations between the current value of Y (Y_t) and its own lags (Y_{t-1}, Y_{t-2}...).

The dotted, blue line delimits the 95% confidence interval for the autocorrelations. Then, if the autocorrelation goes out of the 95% C.I., then the autocorrelation of the variable Y_t with its corresponding lag Y_{t-1} (AutoCorr(Y_t, Y_{t-1})) is statistically different than zero.

We can see that the LAG 1 (Y_{t-1}) is significantly autocorrelated (positively) with the current value of the variable (Y_t) since this autocorrelation is significantly greater than zero.

What is the meaning of the first significant LAG? In this case, this means that the value of Y at any point in time t has an average autocorrelation about 0.80 with its own value at t-1. We can see this 0.80 as the height of the first vertical line of the ACF plot.

The autocorrelation of Y_t with its previous value Y_{t-1} can be calculated manually as the correlation, which is equal to their covariance divided by the product of their standard deviation.

Remembering the formula for correlation between 2 variables Y and X:

CORR(Y,X)=\frac{COV(Y,X)}{SD(Y)*SD(X)}

In the case of “Autocorrelation”, we only have one variable Y, so we are looking at the correlation of Y_t with its own previous value Y_{t-1}. Then, applying the correlation formula between Y_t and Y_{t-1} (instead of X), we can calculate this autocorrelation as:

AUTOCORR(Y_t,Y_{t-1})=\frac{AUTOCOV(Y_t,Y_{t-1})}{SD(Y_t)*SD(Y_{t-1})} Since Y is supposed to be stationary series, then the standard deviation of Y_t has to be the same as the standard deviation of Y_{t-1}. Then:

AUTOCORR(Y_t,Y_{t-1})=\frac{AUTOCOV(Y_t,Y_{t-1})}{VAR(Y_t)} = \frac{\gamma\left(1\right)}{\sigma^{2}_y}

Fortunately, R does the calculation of all the autocorrelations of Y_t with its own lags, and these autocorrelations are plotted using the acf2 command (as seen above).

Unlike the ACF plot, the partial-autocorrelation (PACF) plot shows the autocorrelation of the lag with the variable after considering the effect of the lower-order autocorrelations.

The autocorrelation and partial autocorrelation between Y_{t} and its LAG 1 (Y_{t-1}) are the same, since there is no lower-order autocorrelations for LAG 1.

The autocorrelation between Y_{t} and its LAG2 (Y_{t-2}) and the partial autocorrelation between Y_{t} and its LAG2 (Y_{t-2}) will be different since the partial autocorrelation is calculated AFTER the autocorrelation between Y_{t} and Y_{t-1} is taken into consideration.

In other words, the PACF plot shows the amount of autocorrelation between Y_{t} and its lag K (Y_{t-k}) that is not explained by all the lower-order autocorrelations (Y_{t} with (Y_{t-k-n}).

The ACF plot shows that only the LAG 1 autocorrelation is positive and significant, and the following LAG autocorrelations are also positive but not significant.

We see a fast decay in the magnitude of the LAG autocorrelations in the PACF, and a low decay in the magnitud of the LAG autocorrelations in the ACF plot.

In the partial autocorrelation (PACF) we can see that ONLY the autocorrelation of LAG 1 is significant and positive, and immediately the rest of the autocorrelation go close to zero suddenly. This is a classical AR(1) signature with 1 AR terms. We do not need to include other terms further than LAG 1.

Then we can see that the partial autocorrelation plot (PACF) helps us to identify the # of AR terms. Then, according to this PACF we see that only LAG 1 is autocorrelated with the current value AFTER considering the effect of the lower-level autocorreations of lags 2,3,4, etc. In other words, the annual %change in the IGAE of the current month seems to be positively correlated with its own annual % changes of the previous month.

10 Calibrating the ARMIA-SARIMA model

The calibration process of an ARIMA-SARIMA is not straight-forward. It is recommended to learn this process, but for this course we will use the automated algorithm auto.arima, which is part of the forecast library.

The objective of the calibration of an ARIMA-SARIMA model is to find the right values for p, q, P and Q. It is recommended to start indicating the values of d and D after we found the stationary version of the variable. In this case, we found that the seasonal difference of the log of IGAE is stationary, so the parameters for d and D are:

d=0; D=1

If D=1 it means that we are doing a seasonal difference (annual % growth), and d=0 indicates that we are NOT doing a first difference (monthly % growth).

The values for p,q, P, Q are rarely greater than 2. We need to set the auto.arima function with maximum values of these parameters. We recommend to set p=2 as maximum, and the rest q, P, Q=1:

You need to install the forecast package to run the auto.arima function:

library(forecast)
model1 <- auto.arima(igae,
              lambda = 0, # this means that the model will transform the variable with its natural logarithm  
              d=0,D=1, # we are modeling the annual % growth month by month (not the monthly %growth)
              max.p=2, max.q=1, # setting the maximum values for p and q
              max.P = 1,max.Q = 1 # setting the max. values for P and Q
              )
coeftest(model1)


z test of coefficients:

         Estimate  Std. Error  z value  Pr(>|z|)    
ar1    1.41808761  0.21241540   6.6760 2.455e-11 ***
ar2   -0.44209276  0.19500407  -2.2671 0.0233844 *  
ma1   -0.67649177  0.18570581  -3.6428 0.0002697 ***
sma1  -0.85152598  0.03661183 -23.2582 < 2.2e-16 ***
drift  0.00169775  0.00023565   7.2046 5.822e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This model ended with 2 AR terms, 1 MA term and 1 seasonal MA term. We can express this model in the following equation as follows:

Y_{t}=\phi_{0}+\phi_{1}Y_{t-1}+\phi_{2}Y_{t-2}+\theta_{1}error_{t-1}+\theta_{2}error_{t-12}+error_{t}

Where Y_t is the annual % growth of the IGAE (the seasonal difference of the log).

The sum of the \phi_{i} coefficients must always be less than 1. If you find that the sum of these coefficients is bigger than 1, then you need to change the calibration. Most of the time the auto.arima algorithm always ends up with a good model.

Then this model is an ARIMA(2,0,1)-SARIMA(0,1,1,12) model.

Remember the parameters of these models:

ARIMA(p,d,q) SARIMA(P,D,Q,#periods in a year)

The option lambda = 0 indicates that we will use the natural logarithm of the series. The D=1 indicates that the seasonal difference has to be made before estimating the model.

According to the result, we can see that the coefficients \phi_1 is positive and significant and \phi_2 is negative and significant (since their p-values are <0.05). Then we can say that the IGAE annual % change is positively and significantly related with its own annual % change of 1 month ago, and it is negatively and significantly related with its own annual % growth 2 months ago.

Also, the \theta_{1} coefficient (the MA coefficient) is negative and significant. This means that the annual growth of the IGAE is negatively related with the error (shock) of 1 month ago. The \theta_{2} coefficient (the seasonal MA coefficient) is negative and very significant. This means that the annual growth of the IGAE is negatively related with the error (shock) of 12 months ago.

Now we can use this model to do predictions.

If I want to do a forecast for 21 months (the end of 2024), I will use the forecast() function:

forecast_igae <- forecast(model1, h = 21)
autoplot(forecast_igae)

11 CHALLENGE 1

Check whether the residuals of model 1 is white-noise series. ALL calibrated models must have the residuals as white noise. Explain what is a white-noise variable and respond whether the residuals look like a white-noise.

12 CHALLENGE 2

Interpret the calibrated ARIMA-SARIMA model. Be sure to interpret the sign and magnitude of the coefficients! Pay attention in class to understand HOW TO INTERPRET the coefficients!

13 Reading

Go to the course site and carefully re-read the Notes a) Introduction to time-series financial models and b) re-read the theory explained in this workshop about the Introduction to ARIMA-SARIMA Models.

14 W3 submission

The grade of this Workshop will be the following:

Complete (100%): If you submit an ORIGINAL and COMPLETE HTML file with all the activities, with your notes, and with your OWN RESPONSES to questions
Incomplete (75%): If you submit an ORIGINAL HTML file with ALL the activities but you did NOT RESPOND to the questions and/or you did not do all activities and respond to some of the questions.
Very Incomplete (10%-70%): If you complete from 10% to 75% of the workshop or you completed more but parts of your work is a copy-paste from other workshops.
Not submitted (0%)

Remember that you have to submit your .html file through Canvas BEFORE NEXT CLASS.