Abstract
The main purpose of this workshop is to keep learning about Autoregressive Integrated Moving Average Seasonal models (ARIMA/SARIMA) with another practical application. We will learn by doing. We will focus on the calibration or determination for the model parameters using an automated function. We will calibrate an ARIMA/SARIMA model for the USD-MXPeso exchange rate.You will work in RStudio. Create an R Notebook document (File -> New File -> R Notebook), where you have to write whatever is asked in this workshop.
At the beginning of the R Notebook write Workshop 4 - Time Series and your name (as we did in previous workshop).
You have to replicate all the steps explained in this workshop, and ALSO you have to do whatever is asked. Any QUESTION or any STEP you need to do will be written in CAPITAL LETTERS. For ANY QUESTION, you have to RESPOND IN CAPITAL LETTERS right after the question. It is STRONGLY RECOMMENDED that you write your OWN NOTES as if this were your notebook. Your own workshop/notebook will be very helpful for your further study.
You have to keep saving your .Rmd file, and ONLY SUBMIT the .html version of your .Rmd file. Pay attention in class to know how to generate an html file from your .Rmd.
In last workshop, we introduced the theoretical foundations of an ARIMA-SARIMA model and we run an ARIMA-SARIMA model for the Mexican Economy (IGAE index). Now we will work with another important economic variable, the USD-Peso exchange rate. We will end-up with a forecast of the exchange rate for the next 2 years.
In this workshop we will focus on a) How can we calibrate the specific configuration of values of the ARIMA-SARIMA parameters (p, d, q, P, D, Q) in an automatic way, and b) how can we make sure that a specific configuration is adequate for the variable we are modeling.
Remember that the ARIMA-SARIMA parameters that needs to be defined are the following:
p: # of AR terms to be included as explanatory variables
d: # of first differences applied to the original variable
q: # of MA terms to be included as explanatory variables
P: # of Seasonal AR terms to be included as explanatory variables
D: # of Seasonal difference applied to the variable
Q: # of Seasonal MA terms to be included as explanatory variables
The d parameter refers to how many first differences we apply to the original time-series variables to make the series stationary.
The D parameter refers to how many seasonal differences we decide to apply to the variable.
When you apply an ARIMA-SARIMA model for a real-world time series variable, most of the time, the possible values of these parameters are either 0 or 1, and few times some of the values are 2.
The calibration process of an ARIMA-SARIMA model is the definition for all the parameters p, d, q, P, D, Q.
Each ARIMA-SARIMA model needs to be CALIBRATED. The general steps to calibrate a model are the following:
Usually monthly and quarterly data of many economic and financial series have a seasonal pattern.
I recommend that if you have monthly or quarterly data of any financial or economic variable, start testing whether the seasonal difference of the log of the series (that is the annual % change of the variable) is stationary . If you have daily data, it is hard to model for seasonality. In the case of daily data, start testing whether the first difference of the log is stationary.
Then we start testing whether the seasonal difference of the log is stationary. Then we apply the Dicky-Fuller test to this seasonal difference. If this variable is stationary, then we will keep using this transformation, so D=1 and d=0. If not, then we can apply the first difference of the log of the series, which is the % change for each period. If the first difference of the log of the series is stationary, then d=1 and D=0.
AR signature: when a)the ACF plot shows positive autocorrelations that decay slowly, and b) the PACF plot shows positive autocorrelations, but decay suddenly after lag 1, 2 or 3. For this pattern, the value of p will be the # of significant positive lags in the PACF, and the value for q is set zero (q=0).
MA signature: when the ACF and the PACF plot shows a negative and significant autocorrelation in the same lag #. In this case, the value for q will be the negative lag # that is significant.
For the definition of P and Q we have to check whether there is a significant autocorrelation in the seasonal lags in the ACF and PACF plots. Most of the time P and Q will be either 0 or 1. If we have monthly data, we have to look whether there is a significant autocorrelation in the lag 12. If it is a positive significant autocorrelation, then define P=1 and Q=0; if the autocorrelation in lag 12 is significant and negative, then define P=0 and Q=1. If there is no significant autocorrelation in lag 12, then P=0 and Q=0.
For this course we will use an automatic algorithm to help us do the calibration, but we need to end up doing a fine-tuning of this automatic result. So, we need to know the main patterns to do a good calibration.
At this step we have defined the values for all parameters of the model.
Estimate/Run the ARIMA-SARIMA model with the values we defined previously.
Run the ACF and PACF of the residuals/errors of the model to check whether the errors is a white noise series. In other words, if there is no significant autocorrelations of the errors.
If you have significant autocorrelations, you can include other term(s) in the ARIMA-SARIMA model, and go back to step 3 to check for patters of the ACF and PACF plots of the errors.
If there the errors seem like a white noise (no signficant autocorrelations), then you can continue and finish the calibration process.
If you end up with more than 1 model after your calibration (models that ends up in white noise in the residuals), then you can use the Akaike Information Criteria (AIC) of the model to select the best model. The best model will be the one with the lowest AIC value. The AIC is a measure of fit of the model; in other words, it tells how well the model fits the data.
Interpret your model
Run a forecast using the model
Let’s start with calibrating an ARIMA-SARIMA model for the USD-Mexican Peso exchange rate.
We will download the USD-peso exchange rate from the FRED database. The FRED is a free economic database from the US Federal Reserve Bank of St. Louis. It has thousands of economic and financial data for all the countries around the world. You can go to the site and download data in Excel, or import directly to your R session using the function getSymbols.
It is recommended to check the FRED database to know the details of the series (just google FRED database). Search for Mexico exchange rate, and click in the series. You will see the name Mexico / U.S. Foreign Exchange Rate (DEXMXUS). This means that DEXMXUS is the ticker or identifier we need to download the series.
We need to check a) Units and b) Frequency. The DEXMXUS units is: Mexican New Pesos to One U.S. Dollar, Not Seasonally Adjusted. If we plan to design an ARIMA/SARIMA model it is a recommended to use not-seasonally adjusted series, since we will model the seasonality of the data. The frequency of this series is daily.
We will transform the series from daily frequency to monthly so that we can design an ARIMA/SARIMA model. A Seasonal ARIMA (ARIMA/SARIMA) model allows us to do a forecast for more than one year. If we use daily data, we cannot design a seasonal ARIMA since the number of business days in a year is not the same for each year. Also, with daily data it is very hard to do a long-term forecast. In sum, for daily data such as financial stock prices, we can design an ARIMA (with no seasonality). We can design an ARIMA/SARIMA model only for weekly, monthly or quarterly data.
In this case, the best way to transform the daily exchange rate is to get tha last exchange rate for each month. This can be easily done with the to.monthly() function from the xts package.
library(quantmod)
#library(PerformanceAnalytics)
library(tseries)
library(forecast)
getSymbols(Symbols="DEXMXUS", src="FRED")
## [1] "DEXMXUS"
#Since the dataset is daily, I collapse it to monthly:
exrate.xts = to.monthly(DEXMXUS)
plot(exrate.xts)
# The to.monthly function gets the open, close, highest and lowest value
# for each month!
# We can get only the closing exchange rate for each month using the Cl
# function:
closeexrate.xts = Cl(exrate.xts)
exrate<-ts(coredata(closeexrate.xts),start=c(1993,11),frequency=12)
exrate
## Jan Feb Mar Apr May Jun Jul Aug Sep
## 1993
## 1994 3.1060 3.1950 3.3650 3.2800 3.3320 3.4000 3.4050 3.3890 3.3990
## 1995 5.8200 5.9700 6.8200 6.0500 6.1600 6.2550 6.1150 6.3000 6.3850
## 1996 7.4400 7.6350 7.5375 7.4400 7.4275 7.5850 7.5900 7.5900 7.5500
## 1997 7.8200 7.9650 7.9300 7.9450 7.9170 7.9520 7.8250 7.7850 7.7750
## 1998 8.4550 8.5275 8.5200 8.4920 8.8250 8.9850 8.9200 9.9850 10.1960
## 1999 10.1520 9.9365 9.5210 9.2425 9.7925 9.4430 9.4000 9.3930 9.3520
## 2000 9.6400 9.3645 9.2895 9.4110 9.5110 9.8430 9.3680 9.2000 9.4400
## 2001 9.6790 9.6925 9.4850 9.2610 9.1650 9.0600 9.1510 9.2075 9.5200
## 2002 9.1520 9.1300 9.0005 9.3750 9.6525 9.9800 9.8000 9.9185 10.2120
## 2003 10.9020 11.0285 10.7820 10.3080 10.3400 10.4550 10.5850 11.0600 11.0025
## 2004 11.0120 11.0620 11.1830 11.4020 11.4140 11.5380 11.4100 11.3760 11.3850
## 2005 11.2065 11.0885 11.1770 11.0820 10.9125 10.7720 10.5995 10.7940 10.7910
## 2006 10.4400 10.4542 10.8980 11.0890 11.2880 11.2865 10.9160 10.9120 10.9775
## 2007 11.0381 11.1575 11.0427 10.9295 10.7380 10.7901 10.9311 11.0320 10.9315
## 2008 10.8190 10.7263 10.6300 10.5101 10.3290 10.3035 10.0345 10.2970 10.9726
## 2009 14.3330 15.0880 14.2100 13.8010 13.1820 13.1700 13.2025 13.3361 13.4805
## 2010 13.0285 12.7575 12.3005 12.2281 12.8633 12.8306 12.6421 13.1671 12.6270
## 2011 12.1541 12.1130 11.9170 11.5237 11.5790 11.7191 11.7235 12.3262 13.7701
## 2012 13.0356 12.7941 12.8115 12.9900 14.3045 13.4110 13.2669 13.2401 12.8630
## 2013 12.7344 12.7788 12.3155 12.1323 12.7791 12.9925 12.8570 13.3355 13.1550
## 2014 13.3585 13.2255 13.0560 13.0835 12.8605 12.9695 13.2370 13.0690 13.4270
## 2015 15.0050 14.9390 15.2450 15.3855 15.3688 15.6902 16.0610 16.7315 16.8980
## 2016 18.2110 18.0675 17.2140 17.1900 18.4130 18.4935 18.7610 18.8490 19.3355
## 2017 20.8355 19.9975 18.8290 18.9340 18.6550 18.0760 17.8590 17.7875 18.1480
## 2018 18.6215 18.8405 18.1676 18.7715 20.0005 19.6495 18.5980 19.2135 18.7050
## 2019 19.0525 19.2650 19.3980 18.9945 19.6520 19.2089 18.9930 20.0674 19.7420
## 2020 18.8900 19.7300 23.4480 23.9790 22.1700 23.0821 22.2260 21.8800 22.0910
## 2021 20.2550 20.9350 20.4410 20.1690 19.9400
## Oct Nov Dec
## 1993 3.1055 3.1080
## 1994 3.4380 3.4400 5.0000
## 1995 7.1500 7.5500 7.7400
## 1996 8.0450 7.8950 7.8810
## 1997 8.4100 8.2150 8.0700
## 1998 10.1020 10.0050 9.9010
## 1999 9.6180 9.4240 9.4800
## 2000 9.5640 9.4100 9.6180
## 2001 9.2700 9.2585 9.1560
## 2002 10.1500 10.1540 10.4250
## 2003 11.0550 11.3950 11.2420
## 2004 11.5365 11.2411 11.1540
## 2005 10.7890 10.5765 10.6275
## 2006 10.7670 11.0000 10.7995
## 2007 10.6996 10.8960 10.9169
## 2008 12.7050 13.3900 13.8320
## 2009 13.1555 12.9160 13.0576
## 2010 12.3415 12.4540 12.3825
## 2011 13.1690 13.6201 13.9510
## 2012 13.0877 12.9171 12.9635
## 2013 12.9995 13.1110 13.0980
## 2014 13.4825 13.9210 14.7500
## 2015 16.5300 16.6005 17.1950
## 2016 18.7900 20.4565 20.6170
## 2017 19.1290 18.6335 19.6395
## 2018 20.2550 20.2975 19.6350
## 2019 19.1740 19.5430 18.8600
## 2020 21.2760 20.1500 19.8920
## 2021
For economic and financial monthly non-stationary series, it is recommended to apply the natural logarithm to the variable, and then check whether the seasonal difference or the first difference is stationary.
For monthly economic series it is recommended to start checking whether the seasonal difference is stationary. Remember that the seasonal difference of the log of a monthly variable is actually the annual % growth of the variable month by month.
# We get the seasonal difference of the log of the exchange rate:
seasonaldiflog_er<- diff(log(exrate), lag = 12)
Now we do the Dicky-Fuller test to the seasonal difference of the log, which is actually the annual % change in the exchange rate:
## Warning in adf.test(seasonaldiflog_er, k = 1): p-value smaller than printed p-
## value
##
## Augmented Dickey-Fuller Test
##
## data: seasonaldiflog_er
## Dickey-Fuller = -4.2767, Lag order = 1, p-value = 0.01
## alternative hypothesis: stationary
Since the p-value<0.05, then we can reject the null hypothesis that states that the series is not stationary. In other words, we have enough statistical evidence at the 95% confidence that we can treat the seasonal difference of the log as stationary.
Remember that d is how many first differences we apply to the variable (in this case, the log of the exchange rate), and D of the series, while D refers to the # of seasonal differences applied to the variable.
Once we find which transformation is stationary, then we can define the parameters d and D. Since we will use the seasonal difference of the log since it is stationary, then d=0, and D=1.
We run the autocorrelation (ACF) and the partial autocorrelation (PACF) plots:
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## ACF 0.93 0.84 0.75 0.66 0.59 0.52 0.45 0.39 0.31 0.23 0.14 0.05
## PACF 0.93 -0.18 -0.01 -0.06 0.11 -0.09 -0.03 -0.01 -0.17 -0.02 -0.17 0.02
## [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF 0.03 0.03 0.04 0.04 0.03 0.01 -0.01 -0.03 -0.03 -0.04 -0.04 -0.04
## PACF 0.38 -0.03 0.02 -0.10 0.04 -0.12 0.00 -0.03 -0.01 -0.01 -0.09 -0.07
This is a classical AR signature since the autocorrelations in the ACF are positive and decay slowly, while the autocorrelations in the PACF decline after lag 1. For the AR signature, the definition of p is determined by the PACF plot. The p value will be the # of lag with a positive and significant autocorrelation in the PACF. In this case, p=1 since the autocorrelation of lag 1 is significant and positive, and the autocorrelation of lag 2 drop down to a negative value..
We will use the automatic algorithm auto.arima from the forecast package to help us with the calibration of p, q, P and Q. We run this algorithm as follows:
library(lmtest)
model1 <- auto.arima(exrate,
lambda = 0, # this means that the log will be applied
d=0,D=1, # we are modeling the annual % growth month by month
max.p=2, max.q=1, # setting the maximum values for p and q
max.P = 1,max.Q = 1, # setting the max. values for P and Q
stepwise = FALSE)
coeftest(model1)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.9602980 0.0180497 53.2030 < 2e-16 ***
## ma1 0.1286291 0.0609278 2.1112 0.03476 *
## sar1 -0.5919663 0.0526610 -11.2411 < 2e-16 ***
## drift 0.0063084 0.0035929 1.7558 0.07912 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We run the auto.arima model indicating the values of d and D, and also indicating that we need to apply the logarithm (lambda=0). We could run auto.arima without indication the values for d and D, but I do not recommend to do this for economic series. It is a good idea to determine what transformation of variable we want to model instead of asking to an automatic algorithm.
As expected, the automatic algorithm detected the AR signature with p=1, but it also added other 2 terms: an MA(1) and a Seasonal AR(1) term.
It is recommended to check for the significance for each term. If there are non-significant terms, I recommend to drop the the term with the highest p-value (the less significant). In this case, the MA(1) term and the Seasonal AR(1) are significant (pvalues<0.05), so we keep them in the model.
Now I need to check whether the errors/residuals is a stationary series. If the errors is a stationary series this means that we have all terms that provide explanation power of the series:
We run the ACF and the PACF for the errors series. The errors are already calculated and stored in the model in the residuals attribute:
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0 0.02 0.09 -0.12 0.04 0.04 -0.05 0.06 -0.03 0.13 0.08 -0.08 -0.04
## PACF 0 0.02 0.09 -0.12 0.04 0.03 -0.03 0.04 -0.03 0.14 0.06 -0.07 -0.07
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF -0.12 -0.02 0.00 0.05 -0.02 -0.01 -0.02 0.00 0.02 0.07 -0.33
## PACF -0.10 0.01 -0.03 0.08 -0.04 -0.01 -0.04 -0.01 0.04 0.09 -0.33
We see that the autocorrelation of lag 4 is negative and barely significant in both plots. We could add this to the model, but for now we will keep with the suggestion of the automatic algorithm and consider the errors as white noise, so we would consider this model as a good one (for now…).
The automated model gave us the following terms: an AR(1), and MA(1), and a Seasonal AR(1) term. Then, I can re-run it directly indicating these values:
Showing the model output again:
model2 <- Arima(exrate,
lambda = 0, # apply the natural log to the variable
order = c(1,0,1), #p=1; d=0; q=1
seasonal = list(order=c(1,1,0),period=12), #P=1,D=1,Q=0
include.constant = TRUE,
)
coeftest(model2)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.9602980 0.0180497 53.2030 < 2e-16 ***
## ma1 0.1286291 0.0609278 2.1112 0.03476 *
## sar1 -0.5919663 0.0526610 -11.2411 < 2e-16 ***
## drift 0.0063084 0.0035929 1.7558 0.07912 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Then, model 2 will be exactly the same as the automated model 1.
For a good interpretation, first identify what is the transformation of the variable that we are modeling. In this case, since we apply the log to the exchange rate and d=0, and D=1 then we are modeling the annual % change of the exchange rate. Now we can continue with the interpretation.
The coefficient phi1 (the coefficient of the AR(1) term) is 0.960298 and it is significantly positive. This means that:
The annual % growth of the exchange rate is positively and significantly related with its own annual growth of the previous month.
For each 1% annual growth of the exchange rate in the previous month, the current annual growth growth will be changed in about 0.960298%.
The coefficient theta1 (the coefficient of the MA(1) term) is 0.1286291 and it is significantly positive. This means that:
The annual % growth of the exchange rate is positively and significantly related with the shock (error) of the previous month.
For each 1% change in the shock of the previous month, the current annual growth growth will be changed in about 0.1286291%.
The coefficient of the seasonal AR(1) term is -0.5919663and it is significantly negative This means that:
The annual % growth of the exchange rate is negatively and significantly related with its own annual growth of 12 months ago. If the exchange rate grew 12 months ago, it is likely that this month the exchange rate will go down.
For each +1% annual growth of the exchange rate one year ago, the current annual growth will be changed in about -0.5919663%.
Now we can easily do a forecast for the exchange rate using our calibrated model using the forecast function from the forecast package. We will do a forecast for the next 19 months in the future:
exrate_forecast <- forecast(model1, h=19)
# I stored the information of the forecast in forecast_air
# I plot the forecast
autoplot(exrate_forecast)
## Jan Feb Mar Apr May Jun Jul Aug
## 2021 20.10863 19.78610 20.42701
## 2022 19.58742 20.46246 22.54122 22.81146 21.76132 22.44893 21.88774 22.04655
## Sep Oct Nov Dec
## 2021 20.42127 19.87018 19.75430 19.33482
## 2022 22.24446 21.58319 20.91461 20.63619
The forecast is shown in purple. We can see that the forecast for the mean exchange rate, and also the 80% confidence interval and the 95% confidence interval of this forecast. The 95% confidence interval is in light purple.
We can see that the forecast error can be big since the 95% confidence interval of the forecast for the end of 2022 is between about $15 and $30.
We can display the mean forecast for the following 19 months:
## Jan Feb Mar Apr May Jun Jul Aug
## 2021 20.10863 19.78610 20.42701
## 2022 19.58742 20.46246 22.54122 22.81146 21.76132 22.44893 21.88774 22.04655
## Sep Oct Nov Dec
## 2021 20.42127 19.87018 19.75430 19.33482
## 2022 22.24446 21.58319 20.91461 20.63619
We can see the maximum exchange rate in the forecast period:
## [1] 22.81146
According to our forecast, the maximum exchange rate will be close to $23.00, and this might happen in July 2022.
To improve this forecast, it is a a good idea to incorporate other macro economic variables as explanatory variables of the exchange rate. When we include explanatory variables in an ARIMA-SARIMA model, the model is called ARIMAX model.
Go to the course site and carefully re-read the Notes a) Introduction to time-series financial models and b) re-read the theory explained in this workshop about the Introduction to ARIMA-SARIMA Models.
Go to Canvas and respond Quiz 2. You will have 3 attempts. Questions in this Quiz are related to concepts of the readings related to this Workshop.
The grade of this Workshop will be the following:
Remember that you have to submit your .html file through Canvas BEFORE NEXT CLASS.