Abstract: The main purpose of this workshop is to illustrate the foundations of Autoregressive Integrated Moving Average Seasonal models (ARIMA/SARIMA). We will learn by doing. We will work with a model for a macro-economic index, the IGAE (Índice General de la Actividad Económica), published by INEGI.

##Introduction

The main purpose of this workshop is to illustrate what is the an Autoregressive Integrated Moving Average Seasonal model (ARIMA/SARIMA model) and how we can estimate a basic model.

This methodology is part of the Box-Jenkins models for time series, which provide a series of tools to understand past behaviors and predict future values of time series variables.

DETERMINAR EL COMPORTAMIENTO PASADO Y PREDECIR VALORES FUTUROS.

Given a time series Yt, the ARIMA/SARIMA model can be be denoted as ARIMA(p,d,q) SARIMA(P,D,Q,#annual periods). We will see later what are these parameters of the model (p,d,q,P,D,Q,#annual periods).

In an ARIMA/SARIMA model we try to understand current values of a time-series variable using its own past values and past random shocks; the model does not include other explanatory variables (X’s variables) like in a traditional linear regression model. If we add external explanatory variables to an ARIMA/SARIMA model, then this model is usually called ARIMAX.

Then, the Yt time-series variable in an ARIMA/SARIMA model is explained by:

· Its own past values; these values are called LAGvalues or just LAGS. The value of last period (t-1) is called LAG 1; the value of 2 periods ago (t-2) is called LAG 2, and so forth. These LAGS are called AR(p) terms, where p is number of previous LAGS to be included. For example, AR(2) indicates that we include LAG 1 and LAG 2 values as explanatory variables.

EL COMPORTAMIENTO DE SUS VALORES PASADOS.

· Past random shocks; these values are called LAG errors or LAG shocks. The shock of last period (t-1) is called LAG 1 error or LAG 1 shock; the shock of 2 periods ago (t-2) is called LAG 2 error or LAG 2 shock, and so forth. These LAG errors are called MA(q) terms, where q is the # of previous LAG shocks to be included. For example, MA(2) indicates that we include LAG 1 error and LAG 2 error as explanatory variables.

EL COMPORTAMIENTO DEL IMPACTO DE FACTORES ALEATORIOS PASADOS.

· Its own seasonal values; A season refers to 1 year. These values are called seasonal LAG values. If we have monthly data (12 periods in 1 year), then the value of last year (t-12) is called seasonal LAG 1; the value of 2 years ago (t-24) is called seasonal LAG 2, and so forth. If we have quarterly data, then we have to go back 4 periods for last year (t-4), and so forth. These LAGS are called Seasonal AR(P) terms, where P is the number of previous seasonal LAGS to be included.

LOS VALORES ESTACIONALES DE LA SERIE MISMA.

· Past seasonal random shocks; these values are called seasonal LAG errors or seasonal LAG shocks. If we have monthly data (12 periods in 1 year), the shock of last year (t-12) is called seasonal LAG 1 error or seasonal LAG 1 shock; the shock of 2 years ago (t-24) is called seasonal LAG 2 error or seasonal LAG 2 shock, and so forth. These seasonal LAG errors are called Seasonal MA(Q) terms, where Q is the # of previous seasonal LAG shocks to be included.

· EL COMPORTAMIENTO PASADO DEL IMPACTO DE FACTORES ALEATORIOS ESTACIONALES.

Then, the following parameters of an ARIMA/SARIMA model need to be estimated:

· p: # of AR terms to be included as explanatory variables

· d: # of first differences applied to the original variable

· q: # of MA terms to be included as explanatory variables

· P: # of Seasonal AR terms to be included as explanatory variables

· D: # of Seasonal difference applied to the variable

· Q: # of Seasonal MA terms to be included as explanatory variables

The d parameter refers to how many first differences we apply to the original time-series variables to make the series stationary.

The D parameter refers to how many seasonal differences we decide to apply to the variable.

Let’s start with models with only AR terms, AR(p) models, that can be named ARIMA(p,0,0), or ARIMA(p,1,0).

##The AR(p) model - ARIMA(p,d,0)

In general the autoregressive models AR(p) assume that there is some kind of autocorrelation phenomenon in the time series. The concept of autocorrelation refers to the correlation of the variable today at time t with its past values (t-n). Autocorrelation is the building block behind these models.

In time-series Econometrics we can talk about an autocorrelation in the dependent variable (also called, endogeneous variable). This means the current or future values of an economic or financial variable can be explained somehow by its previous values. This is the foundation of what econometricians have called an Autoregressive Model also denoted as AR(p) model or ARIMA(p,d,0) model.

The AR(p) model for the Yt variable can be expressed as: \[Y_{1}=\phi _{0}+\sum_{k=1}^{P}\phi _{k}Y_{t-k}+\varepsilon _{t}\]

In order to apply this model we have to prove the series Yt is stationary. We can use the Dicky-Fuller test to decide whether the series is stationary. If the original series is not stationary, we can calculate the first difference of the series, and most of the time the resulting series will become stationary.

In linear regression models, the method to estimate the regression coefficients is the Ordinary Least Squares (OLS). For ARIMA models, the best method to estimate the autoregressive coefficients is the Maximum Likelihood (ML) method. The ML method works with iterations, while OLS has an analytic solution (in other words, a formula). Fortunately, R can estimate this for you. In this model, we assume that the errors are not correlated among them. If that is the case, that means that we can include another autoregressive term to have a better model.

##The MA(q) model - ARIMA(0,d,q)

The AR(p) model is a process with a long-term memory. Why? this is because a shock today will impact in the future forever. If you read the mathematical equation, this can be interpreted as follows. The value of the series tomorrow will depend on the value of today, after applying the filter ϕ1. Then, the value of tomorrow already has information about the value of today. And the value of tomorrow will impact the value of 2 days from now. Then, the value of today will have an impact in the whole future of the series.

However, the impact of the value of today will be diminishing exponentially according to (ϕ1)N, where N is the number of periods in the future. Since |ϕ1|<1, then the bigger the exponent N -which refers to a further future period- the smaller the effect.

If we want to model a series with a short-term memory, then we have to think in a different equation. This is called a Moving-Average model with q terms. In this case, the value of today is not directly affected by its previous own value, but the past random shocks. In the case of q=1, this process does not have long memory since the value of today is only affected by the shock of yesterday.

The more q terms included into the model, then the longer the memory of the model. Most of the cases, real-world series only include very few q terms.

In the MA(q) model, θ0, … θq are the parameters of the model, ϵt−k refers to past external errors/shocks, and the current shock is defined as ϵt.

We can interpret a pure MA model as a random process where the current value of the series is influenced by some percentages of previous random q shocks/errors and the current random shock. The percentages are actually the values of the coefficients θ.

In Finance, when we want to model the behavior of market returns, we know that there are thousands of events that affect how the market returns are moving today. These thousands of events are basically summarized in a “random shock” that nobody knows what will be its magnitude and direction (positive or negative effect).

Now I will put together both AR and MA model into one: the ARIMA(p,d,q) model.

##ARIMA(p,d,q) model

Now we can combine both AR(p) terms and MA(q) terms into ONE model, that is called ARMA(p,q): \[Y_{1}=\phi _{0}+\sum_{k=1}^{P}\phi _{k}Y_{t-k}+\sum_{k=1}^{q}\theta _{k}\epsilon _{t-k}+\epsilon_{t}\] We combine AR and MA terms to model Y in case Y can be modeled including short-term and long-term memory. The long-term memory is modeled with the AR terms, while the short-term memory is modeled with the MA terms. We will practice with an example to illustrate this theoretical model.

##Hand-on activity

We will run a first version of an ARIMA-SARIMA model for the IGAE index (Índice General de Actividad Económica) developed by INEGI (Instituto Nacional de Estadística y Geografía). Since 1993 INEGI calculates this index based on the Mexican general economy and different industry sectors.

library(readxl)
library(xts)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(zoo)
library(tseries)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(forecast)
library(astsa)
## 
## Attaching package: 'astsa'
## The following object is masked from 'package:forecast':
## 
##     gas
library(lmtest)

igae <- read_excel("igae.xls")
# I convert the igae data frame into a time series R dataset.
# This is convenient when working with ARIMA-SARIMA models:
igae<-ts(coredata(igae$IGAE),start=c(1993,1),frequency=12)
# The ts function transform the dataset from a data frame to a ts object
# I can see the content of IGAE in a table format:
igae
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 1993  60.40769  61.02252  63.94325  61.86598  63.61290  62.88259  62.79795
## 1994  63.02927  62.73316  65.79027  65.89172  66.85880  66.63579  64.70206
## 1995  65.10234  60.29419  62.92987  59.23909  61.17677  61.08234  59.77917
## 1996  64.59736  63.25033  65.00435  63.82202  66.56896  65.57817  65.91223
## 1997  67.92226  66.01572  67.08871  70.18939  71.25363  70.71757  70.48390
## 1998  72.28125  71.26764  75.30751  72.95238  75.13802  74.64519  74.61446
## 1999  74.28381  72.70940  76.99211  74.32923  76.92304  76.55675  76.31262
## 2000  77.93003  77.36732  80.35795  77.00756  82.34331  81.54154  79.62973
## 2001  79.42346  76.41926  80.76787  77.53115  81.78409  80.60701  79.32531
## 2002  77.38629  75.10932  77.12227  80.83024  82.06994  79.72127  79.92718
## 2003  79.27239  77.17550  80.33976  79.95093  82.28842  81.44337  80.93570
## 2004  80.93007  79.19382  84.92785  82.93623  85.23905  85.62087  82.94023
## 2005  82.25837  80.89942  83.62640  86.59373  87.88889  86.35209  83.78420
## 2006  87.04556  84.08297  89.54008  87.10250  93.13935  91.36864  88.35889
## 2007  88.77729  85.76828  91.42205  89.32063  94.32237  93.53460  91.41102
## 2008  90.95435  88.98933  89.05326  94.71533  94.86436  94.45588  94.15931
## 2009  84.89588  81.84098  86.87355  84.31067  86.08159  88.18426  88.73169
## 2010  86.86702  85.22716  92.29229  90.58182  92.30005  93.23105  92.32869
## 2011  90.20546  88.56874  95.82899  91.60005  95.94631  96.48831  95.21792
## 2012  94.67463  94.07903  99.09222  95.47932 100.31525  99.68684  99.31091
## 2013  97.96858  94.67101  97.06079  99.91623 102.27529  99.51755 100.80982
## 2014  98.75513  96.95653 101.49481 100.31854 104.86081 102.89402 104.13869
## 2015 102.31977  99.87062 104.96625 103.68802 106.19806 107.31713 107.46008
## 2016 104.41881 104.57988 105.90821 107.29216 109.06156 109.96077 107.52441
## 2017 108.42006 105.28496 111.90569 106.20453 112.30643 112.80000 108.95594
## 2018 111.01173 107.81001 111.19094 111.59738 115.82252 114.44374 112.81183
## 2019 112.71950 109.09518 112.46967 109.98886 115.45875 113.04393 113.54114
## 2020 111.92756 108.40448 109.91800  87.99004  89.47630  98.01972 102.22197
## 2021 105.84460 102.91511                                                  
##            Aug       Sep       Oct       Nov       Dec
## 1993  62.02789  62.01834  62.83480  63.62177  66.12119
## 1994  65.79634  65.76477  66.73068  67.84687  68.06744
## 1995  61.06820  60.84772  60.21824  63.06020  65.41346
## 1996  65.36457  65.03774  67.82464  68.55904  69.26845
## 1997  70.02845  70.75158  73.23728  72.75789  73.78044
## 1998  73.47925  73.76732  74.46418  74.44344  75.67735
## 1999  75.67001  76.24775  76.45465  77.67013  78.05347
## 2000  81.19074  80.16487  80.80958  80.91619  79.49830
## 2001  80.58703  78.26957  79.97355  80.09390  79.09247
## 2002  80.30576  78.38018  81.60774  80.09518  80.49051
## 2003  79.41995  79.18878  81.87911  80.63451  83.33037
## 2004  83.13244  82.19751  84.36268  85.94154  86.23057
## 2005  86.43120  84.64256  86.85310  88.65732  89.15110
## 2006  90.24619  87.86910  91.59749  91.14520  90.86138
## 2007  92.21733  89.47518  95.27999  93.80926  92.45720
## 2008  91.49309  90.57651  95.50683  92.04270  91.89505
## 2009  86.55360  86.50378  91.12057  91.24910  91.85630
## 2010  91.56219  90.55653  93.84529  95.96933  95.59592
## 2011  96.53903  94.10211  97.43771 101.08275  98.81575
## 2012  99.09917  95.18924 101.85991 104.38649 100.52817
## 2013 100.27357  96.80675 103.66719 104.23166 102.80157
## 2014 101.73624 100.01550 106.81109 106.47987 106.88143
## 2015 105.48271 105.02281 109.29460 109.09904 109.26256
## 2016 108.94810 106.06088 110.56483 114.14690 112.51169
## 2017 111.77091 106.32100 112.84684 116.17409 113.99218
## 2018 114.33744 108.78172 115.99083 117.77276 113.17022
## 2019 113.42805 109.01316 115.46429 116.32435 113.47028
## 2020 102.93410 103.13199 109.33333 111.49701 110.43280
## 2021
plot(igae)

What do you observe? Compared to previous crisis in Mexico during 1994-1995 and 2008-2009, how does the COVID crisis look like?

EN EL GRÁFICO PODEMOS OBSERVAR EL COMPORTAMIENTO DEL IGAE (ÍNDICE GLOBAL DE LA ACTIVIDAD ECONÓMICA) DESDE 1993. COMO PODEMOS OBSERVAR, NI LA CRISIS DEL 94’, NI LA DEL 2008; TIENEN PUNTO DE COMPARACIÓN FRENTE LA DEL 2020 ORIGINADA POR LA PANDEMIA (SARS-COV-2). LA CRISIS ORIGINADA POR LA COVID-19 EN EL 2020 DESPLOMÓ POR MÁS DE 25 PUNTOS EL IGAE, A PESAR DE SU PRONTA RECUPERACIÓN, RETROCEDIMOS A NIVELES QUE NO HABÍAMOS VISTO DESDE EL AÑO 2011.

We see that the decline of the index for the COVID crisis was much profound compared with previous crisis (2008 mortgage crisis and the internal 1994 crisis)

##Stationary vs Non-stationary

We first have to find a version of the IGAE series that is a stationary series. For any economic or financial series I strongly recommend to start calculating the logarithm of the variable:

logigae = log(igae)

I can plot the log of the IGAE to have an idea whether it is stationary:

plot(logigae)

It is clear that the log series is not stationary, so I do not need to run a Dicky-Fuller test to check stationarity for the log. I have to do a difference of the series and then do the Dicky-Fuller test.

For financial and economic monthly data I strongly recommend to start calculating the seasonal log difference and check whether this series is stationary. If it is, then use this version for your model.

In case that the seasonal log difference is not stationary, then you can calculate the first difference, and most of the time this first difference will be stationary.

Then, we start checking whether the seasonal log of the IGAE is a stationary series. We start with a plot of this variable, and then run the corresponding Dicky-Fuller test:

seasonaldiflog = diff(logigae,lag=12)
plot(seasonaldiflog)

The seasonal log difference of the series is actually the annual % change month by month of the IGAE (in continuously compounded percentages).

Let’s see why the seasonal log difference is the annual % change of the series:

diff(logigae,12) = logigae - lag(logigae,12)

lag(logigae,12) is the value of the log IGAE, but 12 months ago!

In this case the diff function gets the difference from a current value of the log minus the log value 12 months ago. This is actually a % annual change since we are taking the difference with 12 months ago.

It is hard to our eyes to see whether the seasonal log series is stationary. We need to run the Dicky-Fuller test:

adf.test(seasonaldiflog, k=1)
## Warning in adf.test(seasonaldiflog, k = 1): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  seasonaldiflog
## Dickey-Fuller = -4.5256, Lag order = 1, p-value = 0.01
## alternative hypothesis: stationary

Since the p-value of the Dicky-Fuller test is 0.01<0.05, then we can reject the null hypothesis and accept the alternative hypothesis. Then we can conclude that the seasonal log of the IGAE is a stationary series. Then we will use this version of the variable to do our model.

##Autocorrelations (AC) and Partial-autocorrelations (PAC)

Once we determine which version of the variable is stationary, then we need to examine autocorrelations and partial autocorrelations of this variable. We can examine this with a correlogram.

In time series analysis, a correlogram is an chart of correlation statistics also known as an autocorrelation plot. This is a plot of the sample autocorrelations of the dependent variable Yt and the time lags. In R, the function to compute the autocorrelations is acf2() from the astsa package, which gives you both, the autocorrelation (ACF) figures and the partial-autocorrelations (PACF), as well.

Let’s examine the autocorrelations of the seasonal log of the IGAE.

library(astsa)
acf2(seasonaldiflog)

##      [,1] [,2] [,3]  [,4] [,5]  [,6]  [,7] [,8]  [,9] [,10] [,11] [,12] [,13]
## ACF   0.8 0.72 0.65  0.53 0.47  0.39  0.28 0.22  0.15  0.05  0.01 -0.06 -0.05
## PACF  0.8 0.20 0.09 -0.15 0.03 -0.04 -0.15 0.02 -0.02 -0.12  0.00 -0.07  0.20
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF  -0.05 -0.07 -0.06 -0.03 -0.06 -0.05 -0.02 -0.04 -0.04 -0.01 -0.07 -0.06
## PACF  0.01  0.01  0.00  0.05 -0.10 -0.06  0.12 -0.07 -0.09  0.09 -0.18  0.12
##      [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## ACF  -0.04 -0.08 -0.08 -0.08 -0.09 -0.08 -0.08 -0.10 -0.09 -0.10 -0.07 -0.07
## PACF  0.02 -0.03 -0.06  0.04 -0.01 -0.05  0.05 -0.05 -0.06  0.04  0.05 -0.02
##      [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF  -0.07 -0.06 -0.03 -0.04 -0.03  0.01 -0.01  0.00  0.03  0.01 -0.03
## PACF  0.00 -0.02  0.07 -0.04 -0.02  0.07 -0.01 -0.06  0.04 -0.02 -0.15

The acf2() function gives you the autocorrelogram, which is the autocorrelations in a graph where it is easy to identify which lags of the variable have a significant autocorrelation.

The X axis represents the lag numbers, while the Y axis is the level of autocorrelation. The vertical lines are the levels of autocorrelations between the current value of Y (Yt) and its own lags (Yt−1,Yt−2…).

The dotted, blue line delimits the 95% confidence interval for the autocorrelations. Then, if the autocorrelation goes out of the 95% C.I., then the autocorrelation of the variable Yt with its corresponding lag Yt−1 (AutoCorr(Yt,Yt−1)) is statistically different than zero.

We can see that only the LAG 1 (Yt−1) and LAG 2 (Yt−2) are significantly autocorrelated (positively) with the current value of the variable (Yt) since this autocorrelation is significantly greater than zero.

What is the meaning of the first significant LAG? In this case, this means that the value of Y at any point in time t has an average autocorrelation about 0.80 with its own value at t−1. We can see this 0.80 as the hight of the first vertical line of the ACF plot.

The autocorrelation of Yt with its previous value Yt−1 can be calculated manually as the correlation, which is equal to their covariance divided by the product of their standard deviation.

Fortunately, R does the calculation of all the autocorrelations of Yt with its own lags, and these autocorrelations are plotted using the acf2 command (as seen above).

Unlike the ACF plot, the partial-autocorrelation (PACF) plot shows the autocorrelation of the lag with the variable after considering the effect of the lower-order autocorrelations.

The autocorrelation and partial autocorrelation between Yt and its LAG 1 (Yt−1) are the same, since there is no lower-order autocorrelations for LAG 1.

The autocorrelation between Yt and its LAG2 (Yt−2) and the partial autocorrelation between Yt and its LAG2 (Yt−2) will be different since the partial autocorrelation is calculated AFTER the autocorrelation between Yt and Yt−1 is taken into consideration.

In other words, the PACF plot shows the amount of autocorrelation between Yt and its lag K (Yt−k) that is not explained by all the lower-order autocorrelations (Yt with (Yt−k−n).

The ACF plot shows that the LAG 1 and LAG 2 autocorrelations are positive and significant, and the following LAG autocorrelations are also positive but not significant.

We see a fast decay in the magnitude of the LAG autocorrelations in the PACF, and a low decay in the magnitud of the LAG autocorrelations in the ACF plot.

In the partial autocorrelation (PACF) we can see that ONLY the autocorrelation of LAG 1 and LAG 2 are significant and positive, and immediately the rest of the autocorrelation go close to zero suddenly. This is a classical AR(2) signature with 2 AR terms. We do not need to include other terms further than LAG 2.

Then we can see that the partial autocorrelation plot (PACF) helps us to identify the # of AR terms. Then, according to this PACF we see that only LAG 1 and LAG 2 are autocorrelated with the current value AFTER considering the effect of the lower-level autocorreations of lags 2,3,4, etc. In other words, the annual %change in the IGAE of the current month seems to be positively correlated with its own annual % changes of the previous 2 months.

##Running an AR(2) model

The PACF plot suggested that the AR(2) model for the seasonal log of the IGAE might be a good one. As a first attempt, we will run this AR(2) model for the seasonal log variable. We later will learn how to do a more comprehensive process for the calibration of a ARIMA-SARIMA models.

Since we are using the seasonal difference of the variable, this AR(2) model is also called and ARIMA(2,0,0)-SARIMA(0,1,0,12) model. Remember the parameters of these models:

ARIMA(p,d,q) SARIMA(P,D,Q,#periods in a year)

Then, p=2 and D=1. D refers to how many seasonal difference we did to convert the series as stationary.

This AR(2) model, or ARIMA(2,0,0)SARIMA(0,1,0,12) can be run in R as follows:

You need to install the forecast package to run the Arima function:

# Load package forecast
library(forecast)

# Use function Arima
m1a <- Arima(seasonaldiflog, order = c(2,0,0),
             include.constant = TRUE)
# we can display the results of the coefficients and their p-values
#  with the coeftest function:
coeftest(m1a)
## 
## z test of coefficients:
## 
##            Estimate Std. Error z value Pr(>|z|)    
## ar1       0.6379183  0.0539848 11.8166  < 2e-16 ***
## ar2       0.2124846  0.0542221  3.9188  8.9e-05 ***
## intercept 0.0169578  0.0090552  1.8727  0.06111 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We specify the ARIMA parameters p,d,q in the order option: c(2,0,0). In this case,

We can also run this same using the original variable for IGAE and indicating that we will apply the logarithm, and then use the seasional difference:

# Use function Arima
m1b <- Arima(igae, order = c(2,0,0),
             seasonal = list(order=c(0,1,0), period=12),
             include.constant = TRUE,
             lambda = 0)
# we can display the results of the coefficients and their p-values
#  with the coeftest function:
coeftest(m1b)
## 
## z test of coefficients:
## 
##         Estimate Std. Error z value  Pr(>|z|)    
## ar1   0.63792070 0.05398473 11.8167 < 2.2e-16 ***
## ar2   0.21248695 0.05422197  3.9188 8.898e-05 ***
## drift 0.00141291 0.00075665  1.8673   0.06186 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

USAR ESTE… POR CUESTIONES DE PRACTICIDAD

The option lamda = 0 indicates that we will use the natural logarithm of the series. The seasonal option list(order=c(0,1,0), period=12) indicates that the seasonal difference has to be made before estimating the model.

This second version m1b will give exactly the same result than the m1a!

According to the result, we can see that the coefficients ϕ1 and ϕ2 are positive and significant since their p-values are <0.05. Then we can say that the IGAE annual % change is positively and significantly related with its own annual % change of 1 month ago and 2 months ago.

We can write the equation this model models as follows:

seasonaldiflog(t) = 0.017 + 0.6379 * seasonaldiflog(t-1) + 0.2125 * seasonaldiflog(t-2)+ error(t)

Now we can use this model to do predictions. It is recommended to use the m1b version to forecast the actual igae instead of forecasting the seasonal log of the IGAE.

If I want to do a forecast for 48 months, I will use the forecast() function with the m1b model:

forecast_igae <- forecast(m1b, h = 48)
autoplot(forecast_igae)

##Challenge

# Use function Arima
m2b <- Arima(igae, order = c(2,0,0),
             seasonal = list(order=c(0,1,1), period=12),
             include.constant = TRUE,
             lambda = 0)
# we can display the results of the coefficients and their p-values
#  with the coeftest function:
coeftest(m2b)
## 
## z test of coefficients:
## 
##          Estimate  Std. Error  z value  Pr(>|z|)    
## ar1    0.72898457  0.05588460  13.0445 < 2.2e-16 ***
## ar2    0.16395764  0.05593457   2.9312  0.003376 ** 
## sma1  -0.76939989  0.05460279 -14.0909 < 2.2e-16 ***
## drift  0.00173624  0.00025438   6.8254 8.771e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
forecast_igae <- forecast(m2b, h = 48)
autoplot(forecast_igae)

EL GRÁFICO REALIZADO DEJA EN CLARO CÓMO APESAR DE LAS SECUELAS ECONÓMICAS DE LA PANDEMIA, SE PRONOSTICA EN UN INTERVALO DE CONFIANZA DEL 95% EL CONTINUO CRECIMIENTO DEL ÍNDICE, O BIEN, DE LA ACTIVIDAD ECONÓMICA MEXICANA. ES UN ACIERTO MENCIONAR QUE LA REPRESENTACIÓN DEL PRONÓSTICO PARA LOS PRÓXIMOS 4 AÑOS LO ENCONTRAMOS EN EL ÁREA SOMBREADA DE COLOR AZUL.

ME RESULTA INTERESANTE LA PROYECCIÓN DEL MODELO; EN MI OPINIÓN, ESTAMOS VIVIENDO TANTOS FACTORES NUNCA ANTES VISTOS (IMPRESIÓN MASIVA DE USD, PAQUETES DE ESTÍMULO, ETC), QUE FIARNOS DE LA HISTORIA NO SÉ SI SEA LO MÁS ACERTADO.