Abstract: The main purpose of this workshop is to illustrate the foundations of Autoregressive Integrated Moving Average Seasonal models (ARIMA/SARIMA). We will learn by doing. We will work with a model for a macro-economic index, the IGAE (Índice General de la Actividad Económica), published by INEGI.
##Introduction
The main purpose of this workshop is to illustrate what is the an Autoregressive Integrated Moving Average Seasonal model (ARIMA/SARIMA model) and how we can estimate a basic model.
This methodology is part of the Box-Jenkins models for time series, which provide a series of tools to understand past behaviors and predict future values of time series variables.
DETERMINAR EL COMPORTAMIENTO PASADO Y PREDECIR VALORES FUTUROS.
Given a time series Yt, the ARIMA/SARIMA model can be be denoted as ARIMA(p,d,q) SARIMA(P,D,Q,#annual periods). We will see later what are these parameters of the model (p,d,q,P,D,Q,#annual periods).
In an ARIMA/SARIMA model we try to understand current values of a time-series variable using its own past values and past random shocks; the model does not include other explanatory variables (X’s variables) like in a traditional linear regression model. If we add external explanatory variables to an ARIMA/SARIMA model, then this model is usually called ARIMAX.
Then, the Yt time-series variable in an ARIMA/SARIMA model is explained by:
· Its own past values; these values are called LAGvalues or just LAGS. The value of last period (t-1) is called LAG 1; the value of 2 periods ago (t-2) is called LAG 2, and so forth. These LAGS are called AR(p) terms, where p is number of previous LAGS to be included. For example, AR(2) indicates that we include LAG 1 and LAG 2 values as explanatory variables.
EL COMPORTAMIENTO DE SUS VALORES PASADOS.
· Past random shocks; these values are called LAG errors or LAG shocks. The shock of last period (t-1) is called LAG 1 error or LAG 1 shock; the shock of 2 periods ago (t-2) is called LAG 2 error or LAG 2 shock, and so forth. These LAG errors are called MA(q) terms, where q is the # of previous LAG shocks to be included. For example, MA(2) indicates that we include LAG 1 error and LAG 2 error as explanatory variables.
EL COMPORTAMIENTO DEL IMPACTO DE FACTORES ALEATORIOS PASADOS.
· Its own seasonal values; A season refers to 1 year. These values are called seasonal LAG values. If we have monthly data (12 periods in 1 year), then the value of last year (t-12) is called seasonal LAG 1; the value of 2 years ago (t-24) is called seasonal LAG 2, and so forth. If we have quarterly data, then we have to go back 4 periods for last year (t-4), and so forth. These LAGS are called Seasonal AR(P) terms, where P is the number of previous seasonal LAGS to be included.
LOS VALORES ESTACIONALES DE LA SERIE MISMA.
· Past seasonal random shocks; these values are called seasonal LAG errors or seasonal LAG shocks. If we have monthly data (12 periods in 1 year), the shock of last year (t-12) is called seasonal LAG 1 error or seasonal LAG 1 shock; the shock of 2 years ago (t-24) is called seasonal LAG 2 error or seasonal LAG 2 shock, and so forth. These seasonal LAG errors are called Seasonal MA(Q) terms, where Q is the # of previous seasonal LAG shocks to be included.
· EL COMPORTAMIENTO PASADO DEL IMPACTO DE FACTORES ALEATORIOS ESTACIONALES.
Then, the following parameters of an ARIMA/SARIMA model need to be estimated:
· p: # of AR terms to be included as explanatory variables
· d: # of first differences applied to the original variable
· q: # of MA terms to be included as explanatory variables
· P: # of Seasonal AR terms to be included as explanatory variables
· D: # of Seasonal difference applied to the variable
· Q: # of Seasonal MA terms to be included as explanatory variables
The d parameter refers to how many first differences we apply to the original time-series variables to make the series stationary.
The D parameter refers to how many seasonal differences we decide to apply to the variable.
Let’s start with models with only AR terms, AR(p) models, that can be named ARIMA(p,0,0), or ARIMA(p,1,0).
##The AR(p) model - ARIMA(p,d,0)
In general the autoregressive models AR(p) assume that there is some kind of autocorrelation phenomenon in the time series. The concept of autocorrelation refers to the correlation of the variable today at time t with its past values (t-n). Autocorrelation is the building block behind these models.
In time-series Econometrics we can talk about an autocorrelation in the dependent variable (also called, endogeneous variable). This means the current or future values of an economic or financial variable can be explained somehow by its previous values. This is the foundation of what econometricians have called an Autoregressive Model also denoted as AR(p) model or ARIMA(p,d,0) model.
The AR(p) model for the Yt variable can be expressed as: \[Y_{1}=\phi _{0}+\sum_{k=1}^{P}\phi _{k}Y_{t-k}+\varepsilon _{t}\]
In order to apply this model we have to prove the series Yt is stationary. We can use the Dicky-Fuller test to decide whether the series is stationary. If the original series is not stationary, we can calculate the first difference of the series, and most of the time the resulting series will become stationary.
In linear regression models, the method to estimate the regression coefficients is the Ordinary Least Squares (OLS). For ARIMA models, the best method to estimate the autoregressive coefficients is the Maximum Likelihood (ML) method. The ML method works with iterations, while OLS has an analytic solution (in other words, a formula). Fortunately, R can estimate this for you. In this model, we assume that the errors are not correlated among them. If that is the case, that means that we can include another autoregressive term to have a better model.
##The MA(q) model - ARIMA(0,d,q)
The AR(p) model is a process with a long-term memory. Why? this is because a shock today will impact in the future forever. If you read the mathematical equation, this can be interpreted as follows. The value of the series tomorrow will depend on the value of today, after applying the filter ϕ1. Then, the value of tomorrow already has information about the value of today. And the value of tomorrow will impact the value of 2 days from now. Then, the value of today will have an impact in the whole future of the series.
However, the impact of the value of today will be diminishing exponentially according to (ϕ1)N, where N is the number of periods in the future. Since |ϕ1|<1, then the bigger the exponent N -which refers to a further future period- the smaller the effect.
If we want to model a series with a short-term memory, then we have to think in a different equation. This is called a Moving-Average model with q terms. In this case, the value of today is not directly affected by its previous own value, but the past random shocks. In the case of q=1, this process does not have long memory since the value of today is only affected by the shock of yesterday.
The more q terms included into the model, then the longer the memory of the model. Most of the cases, real-world series only include very few q terms.
In the MA(q) model, θ0, … θq are the parameters of the model, ϵt−k refers to past external errors/shocks, and the current shock is defined as ϵt.
We can interpret a pure MA model as a random process where the current value of the series is influenced by some percentages of previous random q shocks/errors and the current random shock. The percentages are actually the values of the coefficients θ.
In Finance, when we want to model the behavior of market returns, we know that there are thousands of events that affect how the market returns are moving today. These thousands of events are basically summarized in a “random shock” that nobody knows what will be its magnitude and direction (positive or negative effect).
Now I will put together both AR and MA model into one: the ARIMA(p,d,q) model.
##ARIMA(p,d,q) model
Now we can combine both AR(p) terms and MA(q) terms into ONE model, that is called ARMA(p,q): \[Y_{1}=\phi _{0}+\sum_{k=1}^{P}\phi _{k}Y_{t-k}+\sum_{k=1}^{q}\theta _{k}\epsilon _{t-k}+\epsilon_{t}\] We combine AR and MA terms to model Y in case Y can be modeled including short-term and long-term memory. The long-term memory is modeled with the AR terms, while the short-term memory is modeled with the MA terms. We will practice with an example to illustrate this theoretical model.
##Hand-on activity
We will run a first version of an ARIMA-SARIMA model for the IGAE index (Índice General de Actividad Económica) developed by INEGI (Instituto Nacional de Estadística y Geografía). Since 1993 INEGI calculates this index based on the Mexican general economy and different industry sectors.
library(readxl)
library(xts)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(zoo)
library(tseries)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(forecast)
library(astsa)
##
## Attaching package: 'astsa'
## The following object is masked from 'package:forecast':
##
## gas
library(lmtest)
igae <- read_excel("igae.xls")
# I convert the igae data frame into a time series R dataset.
# This is convenient when working with ARIMA-SARIMA models:
igae<-ts(coredata(igae$IGAE),start=c(1993,1),frequency=12)
# The ts function transform the dataset from a data frame to a ts object
# I can see the content of IGAE in a table format:
igae
## Jan Feb Mar Apr May Jun Jul
## 1993 60.40769 61.02252 63.94325 61.86598 63.61290 62.88259 62.79795
## 1994 63.02927 62.73316 65.79027 65.89172 66.85880 66.63579 64.70206
## 1995 65.10234 60.29419 62.92987 59.23909 61.17677 61.08234 59.77917
## 1996 64.59736 63.25033 65.00435 63.82202 66.56896 65.57817 65.91223
## 1997 67.92226 66.01572 67.08871 70.18939 71.25363 70.71757 70.48390
## 1998 72.28125 71.26764 75.30751 72.95238 75.13802 74.64519 74.61446
## 1999 74.28381 72.70940 76.99211 74.32923 76.92304 76.55675 76.31262
## 2000 77.93003 77.36732 80.35795 77.00756 82.34331 81.54154 79.62973
## 2001 79.42346 76.41926 80.76787 77.53115 81.78409 80.60701 79.32531
## 2002 77.38629 75.10932 77.12227 80.83024 82.06994 79.72127 79.92718
## 2003 79.27239 77.17550 80.33976 79.95093 82.28842 81.44337 80.93570
## 2004 80.93007 79.19382 84.92785 82.93623 85.23905 85.62087 82.94023
## 2005 82.25837 80.89942 83.62640 86.59373 87.88889 86.35209 83.78420
## 2006 87.04556 84.08297 89.54008 87.10250 93.13935 91.36864 88.35889
## 2007 88.77729 85.76828 91.42205 89.32063 94.32237 93.53460 91.41102
## 2008 90.95435 88.98933 89.05326 94.71533 94.86436 94.45588 94.15931
## 2009 84.89588 81.84098 86.87355 84.31067 86.08159 88.18426 88.73169
## 2010 86.86702 85.22716 92.29229 90.58182 92.30005 93.23105 92.32869
## 2011 90.20546 88.56874 95.82899 91.60005 95.94631 96.48831 95.21792
## 2012 94.67463 94.07903 99.09222 95.47932 100.31525 99.68684 99.31091
## 2013 97.96858 94.67101 97.06079 99.91623 102.27529 99.51755 100.80982
## 2014 98.75513 96.95653 101.49481 100.31854 104.86081 102.89402 104.13869
## 2015 102.31977 99.87062 104.96625 103.68802 106.19806 107.31713 107.46008
## 2016 104.41881 104.57988 105.90821 107.29216 109.06156 109.96077 107.52441
## 2017 108.42006 105.28496 111.90569 106.20453 112.30643 112.80000 108.95594
## 2018 111.01173 107.81001 111.19094 111.59738 115.82252 114.44374 112.81183
## 2019 112.71950 109.09518 112.46967 109.98886 115.45875 113.04393 113.54114
## 2020 111.92756 108.40448 109.91800 87.99004 89.47630 98.01972 102.22197
## 2021 105.84460 102.91511
## Aug Sep Oct Nov Dec
## 1993 62.02789 62.01834 62.83480 63.62177 66.12119
## 1994 65.79634 65.76477 66.73068 67.84687 68.06744
## 1995 61.06820 60.84772 60.21824 63.06020 65.41346
## 1996 65.36457 65.03774 67.82464 68.55904 69.26845
## 1997 70.02845 70.75158 73.23728 72.75789 73.78044
## 1998 73.47925 73.76732 74.46418 74.44344 75.67735
## 1999 75.67001 76.24775 76.45465 77.67013 78.05347
## 2000 81.19074 80.16487 80.80958 80.91619 79.49830
## 2001 80.58703 78.26957 79.97355 80.09390 79.09247
## 2002 80.30576 78.38018 81.60774 80.09518 80.49051
## 2003 79.41995 79.18878 81.87911 80.63451 83.33037
## 2004 83.13244 82.19751 84.36268 85.94154 86.23057
## 2005 86.43120 84.64256 86.85310 88.65732 89.15110
## 2006 90.24619 87.86910 91.59749 91.14520 90.86138
## 2007 92.21733 89.47518 95.27999 93.80926 92.45720
## 2008 91.49309 90.57651 95.50683 92.04270 91.89505
## 2009 86.55360 86.50378 91.12057 91.24910 91.85630
## 2010 91.56219 90.55653 93.84529 95.96933 95.59592
## 2011 96.53903 94.10211 97.43771 101.08275 98.81575
## 2012 99.09917 95.18924 101.85991 104.38649 100.52817
## 2013 100.27357 96.80675 103.66719 104.23166 102.80157
## 2014 101.73624 100.01550 106.81109 106.47987 106.88143
## 2015 105.48271 105.02281 109.29460 109.09904 109.26256
## 2016 108.94810 106.06088 110.56483 114.14690 112.51169
## 2017 111.77091 106.32100 112.84684 116.17409 113.99218
## 2018 114.33744 108.78172 115.99083 117.77276 113.17022
## 2019 113.42805 109.01316 115.46429 116.32435 113.47028
## 2020 102.93410 103.13199 109.33333 111.49701 110.43280
## 2021
plot(igae)
What do you observe? Compared to previous crisis in Mexico during 1994-1995 and 2008-2009, how does the COVID crisis look like?
EN EL GRÁFICO PODEMOS OBSERVAR EL COMPORTAMIENTO DEL IGAE (ÍNDICE GLOBAL DE LA ACTIVIDAD ECONÓMICA) DESDE 1993. COMO PODEMOS OBSERVAR, NI LA CRISIS DEL 94’, NI LA DEL 2008; TIENEN PUNTO DE COMPARACIÓN FRENTE LA DEL 2020 ORIGINADA POR LA PANDEMIA (SARS-COV-2). LA CRISIS ORIGINADA POR LA COVID-19 EN EL 2020 DESPLOMÓ POR MÁS DE 25 PUNTOS EL IGAE, A PESAR DE SU PRONTA RECUPERACIÓN, RETROCEDIMOS A NIVELES QUE NO HABÍAMOS VISTO DESDE EL AÑO 2011.
We see that the decline of the index for the COVID crisis was much profound compared with previous crisis (2008 mortgage crisis and the internal 1994 crisis)
##Stationary vs Non-stationary
We first have to find a version of the IGAE series that is a stationary series. For any economic or financial series I strongly recommend to start calculating the logarithm of the variable:
logigae = log(igae)
I can plot the log of the IGAE to have an idea whether it is stationary:
plot(logigae)
It is clear that the log series is not stationary, so I do not need to run a Dicky-Fuller test to check stationarity for the log. I have to do a difference of the series and then do the Dicky-Fuller test.
For financial and economic monthly data I strongly recommend to start calculating the seasonal log difference and check whether this series is stationary. If it is, then use this version for your model.
In case that the seasonal log difference is not stationary, then you can calculate the first difference, and most of the time this first difference will be stationary.
Then, we start checking whether the seasonal log of the IGAE is a stationary series. We start with a plot of this variable, and then run the corresponding Dicky-Fuller test:
seasonaldiflog = diff(logigae,lag=12)
plot(seasonaldiflog)
The seasonal log difference of the series is actually the annual % change month by month of the IGAE (in continuously compounded percentages).
Let’s see why the seasonal log difference is the annual % change of the series:
diff(logigae,12) = logigae - lag(logigae,12)
lag(logigae,12) is the value of the log IGAE, but 12 months ago!
In this case the diff function gets the difference from a current value of the log minus the log value 12 months ago. This is actually a % annual change since we are taking the difference with 12 months ago.
It is hard to our eyes to see whether the seasonal log series is stationary. We need to run the Dicky-Fuller test:
adf.test(seasonaldiflog, k=1)
## Warning in adf.test(seasonaldiflog, k = 1): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: seasonaldiflog
## Dickey-Fuller = -4.5256, Lag order = 1, p-value = 0.01
## alternative hypothesis: stationary
Since the p-value of the Dicky-Fuller test is 0.01<0.05, then we can reject the null hypothesis and accept the alternative hypothesis. Then we can conclude that the seasonal log of the IGAE is a stationary series. Then we will use this version of the variable to do our model.
##Autocorrelations (AC) and Partial-autocorrelations (PAC)
Once we determine which version of the variable is stationary, then we need to examine autocorrelations and partial autocorrelations of this variable. We can examine this with a correlogram.
In time series analysis, a correlogram is an chart of correlation statistics also known as an autocorrelation plot. This is a plot of the sample autocorrelations of the dependent variable Yt and the time lags. In R, the function to compute the autocorrelations is acf2() from the astsa package, which gives you both, the autocorrelation (ACF) figures and the partial-autocorrelations (PACF), as well.
Let’s examine the autocorrelations of the seasonal log of the IGAE.
library(astsa)
acf2(seasonaldiflog)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.8 0.72 0.65 0.53 0.47 0.39 0.28 0.22 0.15 0.05 0.01 -0.06 -0.05
## PACF 0.8 0.20 0.09 -0.15 0.03 -0.04 -0.15 0.02 -0.02 -0.12 0.00 -0.07 0.20
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF -0.05 -0.07 -0.06 -0.03 -0.06 -0.05 -0.02 -0.04 -0.04 -0.01 -0.07 -0.06
## PACF 0.01 0.01 0.00 0.05 -0.10 -0.06 0.12 -0.07 -0.09 0.09 -0.18 0.12
## [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## ACF -0.04 -0.08 -0.08 -0.08 -0.09 -0.08 -0.08 -0.10 -0.09 -0.10 -0.07 -0.07
## PACF 0.02 -0.03 -0.06 0.04 -0.01 -0.05 0.05 -0.05 -0.06 0.04 0.05 -0.02
## [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF -0.07 -0.06 -0.03 -0.04 -0.03 0.01 -0.01 0.00 0.03 0.01 -0.03
## PACF 0.00 -0.02 0.07 -0.04 -0.02 0.07 -0.01 -0.06 0.04 -0.02 -0.15
The acf2() function gives you the autocorrelogram, which is the autocorrelations in a graph where it is easy to identify which lags of the variable have a significant autocorrelation.
The X axis represents the lag numbers, while the Y axis is the level of autocorrelation. The vertical lines are the levels of autocorrelations between the current value of Y (Yt) and its own lags (Yt−1,Yt−2…).
The dotted, blue line delimits the 95% confidence interval for the autocorrelations. Then, if the autocorrelation goes out of the 95% C.I., then the autocorrelation of the variable Yt with its corresponding lag Yt−1 (AutoCorr(Yt,Yt−1)) is statistically different than zero.
We can see that only the LAG 1 (Yt−1) and LAG 2 (Yt−2) are significantly autocorrelated (positively) with the current value of the variable (Yt) since this autocorrelation is significantly greater than zero.
What is the meaning of the first significant LAG? In this case, this means that the value of Y at any point in time t has an average autocorrelation about 0.80 with its own value at t−1. We can see this 0.80 as the hight of the first vertical line of the ACF plot.
The autocorrelation of Yt with its previous value Yt−1 can be calculated manually as the correlation, which is equal to their covariance divided by the product of their standard deviation.
Fortunately, R does the calculation of all the autocorrelations of Yt with its own lags, and these autocorrelations are plotted using the acf2 command (as seen above).
Unlike the ACF plot, the partial-autocorrelation (PACF) plot shows the autocorrelation of the lag with the variable after considering the effect of the lower-order autocorrelations.
The autocorrelation and partial autocorrelation between Yt and its LAG 1 (Yt−1) are the same, since there is no lower-order autocorrelations for LAG 1.
The autocorrelation between Yt and its LAG2 (Yt−2) and the partial autocorrelation between Yt and its LAG2 (Yt−2) will be different since the partial autocorrelation is calculated AFTER the autocorrelation between Yt and Yt−1 is taken into consideration.
In other words, the PACF plot shows the amount of autocorrelation between Yt and its lag K (Yt−k) that is not explained by all the lower-order autocorrelations (Yt with (Yt−k−n).
The ACF plot shows that the LAG 1 and LAG 2 autocorrelations are positive and significant, and the following LAG autocorrelations are also positive but not significant.
We see a fast decay in the magnitude of the LAG autocorrelations in the PACF, and a low decay in the magnitud of the LAG autocorrelations in the ACF plot.
In the partial autocorrelation (PACF) we can see that ONLY the autocorrelation of LAG 1 and LAG 2 are significant and positive, and immediately the rest of the autocorrelation go close to zero suddenly. This is a classical AR(2) signature with 2 AR terms. We do not need to include other terms further than LAG 2.
Then we can see that the partial autocorrelation plot (PACF) helps us to identify the # of AR terms. Then, according to this PACF we see that only LAG 1 and LAG 2 are autocorrelated with the current value AFTER considering the effect of the lower-level autocorreations of lags 2,3,4, etc. In other words, the annual %change in the IGAE of the current month seems to be positively correlated with its own annual % changes of the previous 2 months.
##Running an AR(2) model
The PACF plot suggested that the AR(2) model for the seasonal log of the IGAE might be a good one. As a first attempt, we will run this AR(2) model for the seasonal log variable. We later will learn how to do a more comprehensive process for the calibration of a ARIMA-SARIMA models.
Since we are using the seasonal difference of the variable, this AR(2) model is also called and ARIMA(2,0,0)-SARIMA(0,1,0,12) model. Remember the parameters of these models:
ARIMA(p,d,q) SARIMA(P,D,Q,#periods in a year)
Then, p=2 and D=1. D refers to how many seasonal difference we did to convert the series as stationary.
This AR(2) model, or ARIMA(2,0,0)SARIMA(0,1,0,12) can be run in R as follows:
You need to install the forecast package to run the Arima function:
# Load package forecast
library(forecast)
# Use function Arima
m1a <- Arima(seasonaldiflog, order = c(2,0,0),
include.constant = TRUE)
# we can display the results of the coefficients and their p-values
# with the coeftest function:
coeftest(m1a)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.6379183 0.0539848 11.8166 < 2e-16 ***
## ar2 0.2124846 0.0542221 3.9188 8.9e-05 ***
## intercept 0.0169578 0.0090552 1.8727 0.06111 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We specify the ARIMA parameters p,d,q in the order option: c(2,0,0). In this case,
We can also run this same using the original variable for IGAE and indicating that we will apply the logarithm, and then use the seasional difference:
# Use function Arima
m1b <- Arima(igae, order = c(2,0,0),
seasonal = list(order=c(0,1,0), period=12),
include.constant = TRUE,
lambda = 0)
# we can display the results of the coefficients and their p-values
# with the coeftest function:
coeftest(m1b)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.63792070 0.05398473 11.8167 < 2.2e-16 ***
## ar2 0.21248695 0.05422197 3.9188 8.898e-05 ***
## drift 0.00141291 0.00075665 1.8673 0.06186 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
USAR ESTE… POR CUESTIONES DE PRACTICIDAD
The option lamda = 0 indicates that we will use the natural logarithm of the series. The seasonal option list(order=c(0,1,0), period=12) indicates that the seasonal difference has to be made before estimating the model.
This second version m1b will give exactly the same result than the m1a!
According to the result, we can see that the coefficients ϕ1 and ϕ2 are positive and significant since their p-values are <0.05. Then we can say that the IGAE annual % change is positively and significantly related with its own annual % change of 1 month ago and 2 months ago.
We can write the equation this model models as follows:
seasonaldiflog(t) = 0.017 + 0.6379 * seasonaldiflog(t-1) + 0.2125 * seasonaldiflog(t-2)+ error(t)
Now we can use this model to do predictions. It is recommended to use the m1b version to forecast the actual igae instead of forecasting the seasonal log of the IGAE.
If I want to do a forecast for 48 months, I will use the forecast() function with the m1b model:
forecast_igae <- forecast(m1b, h = 48)
autoplot(forecast_igae)
##Challenge
# Use function Arima
m2b <- Arima(igae, order = c(2,0,0),
seasonal = list(order=c(0,1,1), period=12),
include.constant = TRUE,
lambda = 0)
# we can display the results of the coefficients and their p-values
# with the coeftest function:
coeftest(m2b)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.72898457 0.05588460 13.0445 < 2.2e-16 ***
## ar2 0.16395764 0.05593457 2.9312 0.003376 **
## sma1 -0.76939989 0.05460279 -14.0909 < 2.2e-16 ***
## drift 0.00173624 0.00025438 6.8254 8.771e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
forecast_igae <- forecast(m2b, h = 48)
autoplot(forecast_igae)
EL GRÁFICO REALIZADO DEJA EN CLARO CÓMO APESAR DE LAS SECUELAS ECONÓMICAS DE LA PANDEMIA, SE PRONOSTICA EN UN INTERVALO DE CONFIANZA DEL 95% EL CONTINUO CRECIMIENTO DEL ÍNDICE, O BIEN, DE LA ACTIVIDAD ECONÓMICA MEXICANA. ES UN ACIERTO MENCIONAR QUE LA REPRESENTACIÓN DEL PRONÓSTICO PARA LOS PRÓXIMOS 4 AÑOS LO ENCONTRAMOS EN EL ÁREA SOMBREADA DE COLOR AZUL.
ME RESULTA INTERESANTE LA PROYECCIÓN DEL MODELO; EN MI OPINIÓN, ESTAMOS VIVIENDO TANTOS FACTORES NUNCA ANTES VISTOS (IMPRESIÓN MASIVA DE USD, PAQUETES DE ESTÍMULO, ETC), QUE FIARNOS DE LA HISTORIA NO SÉ SI SEA LO MÁS ACERTADO.