Workshop 4, Financial Econometrics II Alberto Dorantes, Ph.D. Mar 15,
2022
Abstract
In this workshop we learn how to design/calibrate a seasonal ARIMA
model (SARIMA) for a time series.
Introduction to ARIMA-SARIMA Models
A seasonal ARIMA model, also called ARIMA/SARIMA model, is usually
applied to monthly or quarterly time series. This model is very
effective for any of the following type of economic/financial
series:
Historical monthly or quarterly sales data
Historical monthly or quarterly price data
Historical monthly or quarterly data related to any variable of any
financial statement. For example: cost of good sold, sales and general
administrative expenses, income tax, EBIT, etc.
Historical monthly or quarterly data related to any macro economic
variable. For example: inflation indexes, GDP indexes, exchange rates,
etc.
These models are not so effective for daily stock prices or financial
indexes. The main reason is that one of the most important conditions
for a financial market to exist is that no body can predict stock prices
since the log of stock prices are supposed to behave as a Random Walk!
However, these models are useful in this context compared to just doing
intuitive guesses.
Calibration Steps for ARIMA-SARIMA
Each ARIMA-SARIMA model needs to be CALIBRATED. The general steps to
calibrate a model are the following:
Check if the series has seasonality. If the series shows a seasonal
pattern, start with the seasonal difference of the log of the series,
which is the % annual growth of the series. Usually monthly and
quarterly data of many economic and financial series have seasonal
pattern. I recommend that if you have monthly or quarterly data of any
financial or economic variable, start testing whether the seasonal
difference of the log of the series is stationary. If you have daily
data, it is hard to model for seasonality. In the case of daily data,
start testing whether the first difference of the log is stationary.
In this step you start with a version of the series.
For next steps we will assume that we decided to use the seasonal
difference of the log of the series.
Test whether the seasonal difference is stationary. If so, continue
to next step. If not, then change the variable to the first difference
of the log of the series, which is the % change for each period. Most of
the economic and financial series become stationary with the first log
difference.
Run the ACF and PACF plots to identify the ARIMA-SARIMA parameters p,
d, q, P, D, Q. The ACF is the auto-correlation plot, and the PACF is the
partial autocorrelation plot. Both plots show auto*correlations between
the variable and its own lags.
An ARIMA/SARIMA model has the following “parameters” that need to be
defined:
arima(p,d,q) sarima(P,D,Q,#Periods)
p: refers to the number of autoregressive (AR) terms. Usually this
parameter is either 0,1 or 2.
d: refers to how many first differences where needed to the series in
order to make the series stationary. Usually this parameter is either 0
or 1.
q: refers to the number of moving average (MA) terms in the model.
Usually this parameter is either 0,1 or 2.
P: refers to the number of SEASONAL autoregressive terms. Usually
this parameter is either 0 or 1.
D: refers to how many SEASONAL differences were needed to the series
to make the series stationary. Usually this parameter is either 0 or
1.
Q: refers to the number of SEASONAL moving average terms. Usually
this parameter is either 0 or 1.
#periods: refers to the number of periods in the year. For example,
if the data is monthly, then #periods=12.
These parameters are usually expressed as: ARIMA(p,d,q) SARIMA(P,D,Q,
#periods).
When you identify the first values of these paremeters, then you will
have your first CALIBRATION of the model.
Estimate/Run the ARIMA-SARIMA model
Run the ACF and PACF of the residuals/errors of the model to check
whether the errors is a white noise series. In other words, if there is
no significant autocorrelations of the errors.
If there is one or more significant autocorrelations, we can include
other term(s) in the ARIMA-SARIMA model, and go back to step 4.
If there the errors seem like a white noise (no signficant
autocorrelations), then we can continue and finish the calibration
process.
Interpret the model
Run a forecast using the model
Let’s start practicing with these models.
Example - model for air passengers
Identifying seasonality in a time-series
We will work with a simple dataset that has the number of air
passengers of an US Airline. This series might have a strong seasonality
since people used to fly in vacation periods, which is the the same each
year.
Download the excel file from a web site:
download.file("http://www.apradie.com/ec2004/air2.xlsx", "air2.xlsx", mode="wb")
trying URL 'http://www.apradie.com/ec2004/air2.xlsx'
Content type 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' length 5924 bytes
==================================================
downloaded 5924 bytes
In order to read Excel files, we need the readxl package:
library(readxl)
Now, we read the data from the Excel file:
dataset <- read_excel("air2.xlsx")
# I familiarize with the data
head(dataset)
In order to perform time-series analysis, it is better to use xts
objects. We can construct this by using the xts() method of the xts
package. Objects of class xts are composed by 2 elements: the core data
and the index (class Date, POSIX,etc).
# Load library xts and zoo. Install the packages if you haven't already.
library(xts)
library(zoo)
library(tseries)
library(forecast)
# I get the date of the dataset:
date <- dataset$month
# I join both objects using xts() and assign it to a new object called air.xts
air.xts<- xts(x=dataset$air, frequency = "monthly", order.by=date)
names(air.xts)<-c("Passengers")
class(air.xts)
[1] "xts" "zoo"
# I see behaviour of air passengers over time
plot(air.xts$Passengers)

This dataset contains air passengers by month from an airline. Do the
following:
Does the series look stationary? Obviously not, it looks with a clear
growing trend, and a seasonality pattern. The mean of the series GROWS
OVER TIME.
Get the natural logarithm of the series:
lnair.xts = log(air.xts)
Testing for stationary
Since we see a clear seasonality pattern, it is strongly recommended
to use the seasonal difference of the log of the series. So, we will
check whether we can treat the seasonal difference of the log as
STATIONARY.
We start by plotting the seasonal difference of the log, which is the
annual % growth (in cc) of the air passengers:
plot(diff(lnair.xts,lag=12))

It is hard to see whether this is stationary or not, so we have to
run the Dicky-Fuller test. Run the Dicky-Fuller test using the adf.test
function of the tseries packages.
adf.test(na.omit(diff(lnair.xts,lag=12)),alternative="stationary",k=0)
Warning in adf.test(na.omit(diff(lnair.xts, lag = 12)), alternative = "stationary", :
p-value smaller than printed p-value
Augmented Dickey-Fuller Test
data: na.omit(diff(lnair.xts, lag = 12))
Dickey-Fuller = -4.988, Lag order = 0, p-value = 0.01
alternative hypothesis: stationary
We got a p-value<0.05, so we can treat the seasonal difference
(the annual % growth) as stationary and continue with this series to
calibrate an ARIMA-SARIMA model.
ACF and PACF plots
Once we confirm that our series is stationary, then the first step is
to do the AC and PAC plots to identify how many AR and MA terms we can
include in our ARIMA/SARIMA model.
Remember that we can compute the autocorrelation plots with the acf2
function from the astsa package. Let’s examine the autocorrelations of
our series. Remember that our series is the seasonal difference of the
log of air passengers, which is the annual % growth of air passengers
month by month.
library(astsa)
Attaching package: ‘astsa’
The following object is masked from ‘package:forecast’:
gas
# I create a ts object from the xts; the acf2 works better with ts objects
lnair =ts(lnair.xts$Passengers)
acf2(diff(lnair,lag=12))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ACF 0.71 0.62 0.48 0.44 0.39 0.32 0.24 0.19 0.15 -0.01
PACF 0.71 0.23 -0.06 0.10 0.05 -0.06 -0.06 0.01 -0.01 -0.29
[,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19]
ACF -0.11 -0.24 -0.14 -0.14 -0.10 -0.15 -0.10 -0.11 -0.14
PACF -0.16 -0.14 0.29 0.06 0.05 -0.04 0.13 -0.06 -0.15
[,20] [,21] [,22]
ACF -0.16 -0.11 -0.08
PACF -0.02 0.12 -0.18

I used the ts function to transform the dataset into a ts dataset
since this type of R object will allow us to easily run and forecast
arima-sarima models.
ACF and PACF interpretation
The ACF shows the autocorrelations (AC) between the series and its
lags, and the PACF shows the partial autocorrelations (PAC) between the
series and its lags.
The VERTICAL lines show the MAGNITUD of the AUTOCORRELATION between a
LAG and the current value of the series.
The blue horizontal dotted lines in the plots cover the 95%
confidence interval for the autocorrelations. So, if the vertical line
croses the blue dotted line, it means that that specific autocorrelation
is SIGNIFICANTLY DIFFERENT THAN ZERO (it could be positive or
negative).
Now what is the general interpretation of the these plots?
First, we always have to remember what we are modelling. In this case
we are modelling the seasonal difference of the log, which is the ANNUAL
% GROWTH of passengers month by month.
In the ACF plot I see that the first 8 ACs are positive and
significant, and they gradually decay. For example, the AC of LAG 1 is
about 70% (almost 0.70). This means that ANNUAL % growth of air
passengers is strongly correlated (with a correlation of 70%) with ITS
OWN ANNUAL % GROWTH of the PREVIOUS month (its LAG 1).
The AC of LAG 2 is about 62%, so the ANNUAL % GROWTH of passengers is
strongly correlated with ITS OWN ANNUAL % GROWTH of 2 MONTHS AGO (its
LAG 2)… and so on.
It is important to mention that the ACF plot shows autocorrelations
independently of each other.
In the case of PACF, we see PARTIAL autocorrelations, which measure
HOW MUCH ELSE the series is correlated with each its own LAGS, AFTER
CONSIDERING the effect of LOWER-ORDER LAG AUTOCORRELATIONS.
I see that the PACF shows only the first 2 autocorrelations to be
positive AND SINGIFICANT, and the magnitude of the following
autocorrelations goes down to zero or negative very quickly.
When the ACF plot shows a SLOW DECLINE of autocorrelations, and the
PACF shows a FAST DECLINE of autocorrelatinos, this is a clear PATTERN
OF AN AR(p) MODEL. In this case, the # of AR terms is determined by the
SIGNIFICANT LAGS in the PACF plot.
We see that also the AC of LAG 10 is statistically significant and
negative, but it is recommended first to start with the AR(2) model, and
then check the residuals of the model to see whether there is another
significant lag.
Then, in this case we start with an ARIMA with p=2 and q=0.
Read Dr. Nau slides to review more PATTERNS of ACF and PACF
plots.
Estimate the ARIMA-SARIMA model
In this case, I can start with the following calibration. Now we can
use the log series, and leave the model to automatically calculate the
SEASONAL difference. Then, I will use the log of the series (log of air
passengers) and define my model as:
arima(p,d,q) sarima(P,D,Q,#periods in the year)
In our example, the values for the parameters are:
arima(2,0,0) sarima(0,1,0,12)
What this model tells us?
You first have to look at the # of periods. In this case, you have 12
periods in a year, so your series should be monthly.
Then you look at d and D (first differences and the seasonal
differences). In this case, d=0 and D=1. What does this mean? this means
that we are modeling the SEASONAL difference (D=1) of the log of the
variable, which is the annual % growth (month by month). Since d=0 this
means that we are NOT using the first difference of the series.
Then you look at the p, q, P, Q. p tells you how many AR terms are
included in your model. In this case, p=2, so your model includes the
first 2 AR terms. Since q, P and Q are zero, then your model does not
include any of these type of terms.
How this model can be expressed mathematically?
Since you are modeling the seasonal difference, you can express this
model as the following equation:
△12lnair=ϕ0+ϕ1(L1△12lnairt)+ϕ2(L2△12lnairt)+εt
Using a simpler notation, this equation can be expressed as:
s12.lnairt=ϕ0+ϕ1L1.s12.lnairt+ϕ2L2.s12.lnairt+εt
Remember that s12.lnair is the seasonal difference of the log of air
passengers, which is the annual %growth of air passengers month by
month. Then we can read this mathematical expression as: “The annual
%growth (month by month) of air passengers at time t can be determined
by its own annual % growth of air passengers at time t-1 and t-2 (the
%growth of the previous 2 months), and by a random shock”
Running the ARIMA/SARIMA
Now we can estimate the ARIMA-SARIMA model using the Arima function
of the forecast package.
In this case, instead of creating a variable for the seasonal
difference of the log (s12.lnair), we will configure the parameters of
the Arima function so that the log transformation and the seasonal
difference will be calculated before estimating the arima
coefficients.
In our first calibration we discussed above, we defined the following
values of the ARIMA-SARIMA model for the log of the number of air
passengers:
arima(2,0,0) sarima(0,1,0,12)
This means that D=1, so R will calculate the seasonal difference, and
then estimate the arima model and include 2 AR terms and no MA
terms.
We can run this model in R as follows:
m1 <- Arima(air.xts$Passengers, # We use the Passengers column of the dataset
order = c(2,0,0), # we indicate that p=0; d=0; q=0
seasonal = list(order=c(0,1,0),period=12), #P=0; D=1; Q=0, #periods=12
include.constant = TRUE, # Here we indicate to include the phi0 coefficient
lambda = 0) # Here we indicate to apply the natural log to the series
m1
Series: air.xts$Passengers
ARIMA(2,0,0)(0,1,0)[12] with drift
Box Cox transformation: lambda= 0
Coefficients:
ar1 ar2 drift
0.5540 0.2378 0.0096
s.e. 0.0845 0.0848 0.0014
sigma^2 = 0.001742: log likelihood = 233.13
AIC=-458.26 AICc=-457.95 BIC=-446.73
If we set lamda = 0 this means that the the model will use the log of
the variable (passengers); this is one of the Box-Jenkins mathematical
transformations, and it is the most common for financial and economic
series.
We can alternatively use the function sarima from the astsa model to
run the same arima model. The sarima function provides more detail in
the output, but the Arima function has advantages when we want to get
the forecast with the model. We run sarima for the same model as:
m1a<-sarima(log(air.xts$Passengers), p=2, d=0, q=0,P=0,D=1,Q=0,S=12)
initial value -2.794390
iter 2 value -2.940651
iter 3 value -3.175869
iter 4 value -3.186870
iter 5 value -3.189461
iter 6 value -3.189605
iter 7 value -3.189606
iter 8 value -3.189607
iter 9 value -3.189607
iter 10 value -3.189608
iter 11 value -3.189609
iter 12 value -3.189609
iter 12 value -3.189609
iter 12 value -3.189609
final value -3.189609
converged
initial value -3.184486
iter 2 value -3.184596
iter 3 value -3.184959
iter 4 value -3.185009
iter 5 value -3.185059
iter 6 value -3.185073
iter 7 value -3.185076
iter 8 value -3.185076
iter 8 value -3.185076
final value -3.185076
converged

m1a
$fit
Call:
arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S),
xreg = constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc,
REPORT = 1, reltol = tol))
Coefficients:
ar1 ar2 constant
0.5540 0.2378 0.0096
s.e. 0.0845 0.0848 0.0014
sigma^2 estimated as 0.001701: log likelihood = 233.13, aic = -458.26
$degrees_of_freedom
[1] 129
$ttable
Estimate SE t.value p.value
ar1 0.5540 0.0845 6.5597 0.0000
ar2 0.2378 0.0848 2.8037 0.0058
constant 0.0096 0.0014 6.8655 0.0000
$AIC
[1] -3.471669
$AICc
[1] -3.470249
$BIC
[1] -3.384312
As we see, we need to take the log first and then specify the model.
The result is exactly the same as the Arima result, but with sarima we
can see the coefficients, their p-values and other important plots of
the model. Also, we can see important plots such as ACF of the
errors.
Interpretation of an ARIMA-SARIMA model
Since the Arima function does not display the p values and t values
of the coefficients, we can use the function coeftest from the lmtest
library.
You have to install the lmtest library and the load it:
library(lmtest)
Now we can see the coefficients and their corresponding standard
error, pvalue and t value. We apply the coeftest function to the m1
model created with Arima:
coeftest(m1)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ar1 0.5540129 0.0844571 6.5597 5.392e-11 ***
ar2 0.2377939 0.0848135 2.8037 0.005052 **
drift 0.0095854 0.0013962 6.8655 6.625e-12 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The drift coefficient is the phi0 coefficient. phi0 and both lags are
positive and SIGNIFICANT. What does this mean? First, I have to remember
which variable I am modelling. I am modeling the seasonal difference of
log passengers, which is the annual % growth of air passengers month by
month. Then, I can interpret this ARIMA-SARIMA model as follows:
The annual % growth of air passengers is positively and significantly
related with its annual % growth of air passengers of last month (lag1).
In a similar way, the annual % growth of air passengers is positively
and significantly related with the annual % growth of two months ago.
More specifically, 55.4% of annual growth of last month is passed to
this month’s annual % growth, and only 23.78% of the annual growth of 2
months ago is passed to the current annual % growth.
In other words, for each percentual point of annual growth of
previous month, it is expected that the current annual growth will grow
about 0.55 percent points
It is interesting to see that the sum of phi1 and phi2 (0.55+0.23)
MUST be < 1 in order for the series to be stationary. If you include
more arima terms, the sum of the phi’s must be less than one. If you
include MA terms, the sum of the theta coefficients also must be less
than one. In addition, the constant or phi0 is positive and significant,
indicating that the annual growth of air passengers has a positive
tendency to grow over time.
The mathematical model is:
E[s12.lnair]=0.0096+0.554∗L1.s12.lnair+0.2378∗L2.s12.lnair or
s12.lnair=0.0096+0.554∗L1.s12.lnair+0.2378∗L2.s12.lnair+error
Autocorrelations of the Model residuals/errors
We need to generate a series for the residuals of the ARIMA-SARIMA
model. To do this, get the residuals as follows:
air_res <- m1$residuals
head(air_res)
Time Series:
Start = 1
End = 432001
Frequency = 1.15740740740741e-05
[1] 0.004708908 0.004751501 0.004854026 0.004821445
[5] 0.004747833 0.004847728
Test whether the residuals is a white noise series
Test whether the residual series (the air_res variable) is a white
noise running the ac and pac plots. A white noise is a series that has
no signicant autocorrelations with any of its own lags.
acf2(air_res)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ACF 0.01 0.01 -0.13 0.02 0.09 0.05 -0.03 0.05 0.20 -0.01
PACF 0.01 0.01 -0.13 0.02 0.10 0.03 -0.03 0.08 0.21 -0.04
[,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19]
ACF -0.09 -0.40 0.04 0.01 0.11 -0.11 0.07 0.04 -0.05
PACF -0.10 -0.37 0.01 -0.04 0.01 -0.09 0.16 0.11 -0.08
[,20] [,21] [,22]
ACF -0.13 -0.03 -0.03
PACF -0.08 0.16 -0.09

INTERPRET these plots THESE PLOTS ARE MEANT TO TEST WHETHER THE
RESIDUAL SERIES HAVE WHITE NOISE RUNNING IN THEIR PLOTS. AS DEFINED
EARLIER, WHITE NOISE IS A SERIES THAT HAS NO SIGNIFICANT
AUTOCORRELATIONS WITH ANY OF THE LAGS. THE PLOTS SUGGEST THESE RESIDUALS
ARENT WHITE NOISE DUE TO THE AMOUNT OF INCIDENCY IN THE 750000 AND
100000 LAG MARKERS.
Can you add another AR or MA term to improve the previous model and
then to make the residual a white noise ?
YES, WE CAN ADD MORE TERMS AND ESTIMATE THE ARIMA-SARIMA MODEL AS
INSTRUCTED EARLIER IN THIS WORKSHOP.
Forecast air passengers
We re-run the Arima model we run above, but now we will transform the
dataset from xts to a ts R object to improve the visualization of the
forecast:
#Convert the xts to a ts dataset:
air.ts <- ts(coredata(air.xts$Passengers),start=c(1949,1),frequency=12)
# We display the content of the ts dataset:
air.ts
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1949 112 118 132 129 121 135 148 148 136 119 104 118
1950 115 126 141 135 125 149 170 170 158 133 114 140
1951 145 150 178 163 172 178 199 199 184 162 146 166
1952 171 180 193 181 183 218 230 242 209 191 172 194
1953 196 196 236 235 229 243 264 272 237 211 180 201
1954 204 188 235 227 234 264 302 293 259 229 203 229
1955 242 233 267 269 270 315 364 347 312 274 237 278
1956 284 277 317 313 318 374 413 405 355 306 271 306
1957 315 301 356 348 355 422 465 467 404 347 305 336
1958 340 318 362 348 363 435 491 505 404 359 310 337
1959 360 342 406 396 420 472 548 559 463 407 362 405
1960 417 391 419 461 472 535 622 606 508 461 390 432
# Run the Arima model:
m1 <- Arima(air.ts, order = c(2,0,0), # p=2:include 2 AR terms; q=0: No MA terms;
seasonal = list(order=c(0,1,0), # D=1: use seasonal difference; No Seasonal terms
period=12), # periods=12 period in the year
include.constant = TRUE, # include the phi0 (drift) coefficient
lambda = 0) # do the log transformation
m1
Series: air.ts
ARIMA(2,0,0)(0,1,0)[12] with drift
Box Cox transformation: lambda= 0
Coefficients:
ar1 ar2 drift
0.5540 0.2378 0.0096
s.e. 0.0845 0.0848 0.0014
sigma^2 = 0.001742: log likelihood = 233.13
AIC=-458.26 AICc=-457.95 BIC=-446.73
# We can see the pvalues of each ARIMA-SARIMA coefficient using the
# coeftest from the lmtest library:
coeftest(m1)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ar1 0.5540129 0.0844571 6.5597 5.392e-11 ***
ar2 0.2377939 0.0848135 2.8037 0.005052 **
drift 0.0095854 0.0013962 6.8655 6.625e-12 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The pvalue of the drift, ϕ0, and the pvalues of the AR coefficients,
ϕ1 and ϕ2, are very significant (<0.05). Then we keep these terms in
the model.
Using the last Arima model, run a forecast for the following 24
months.
We run the forescast using the arima model m1 (the one I run with the
Arima function. We indicate to do a forecast for 24 months, specifying
h=24:
forecast_air <- forecast(m1, h=24)
# I stored the information of the forecast in forecast_air
# I plot the forecast
autoplot(forecast_air)

The forecast_air R object has both, historical data and forecast
data. A very nice feature of the forecast function after using Arima
function, is that we DO NOT HAVE TO do transformations to get the
original scale of the variable, which is in tons. The Arima function
automatically does the log transformation, and the forecast function
automatically does the exponential transofrmation to get the forecast in
tons!
The forecast_air R object is a list type of object that contains
several elements.
attributes(forecast_air)
$names
[1] "method" "model" "level" "mean"
[5] "lower" "upper" "x" "series"
[9] "fitted" "residuals"
$class
[1] "forecast"
The forecast values are in the mean attribute:
forecast_air$mean
Jan Feb Mar Apr May Jun
1961 450.5662 424.4918 457.4920 505.5161 519.5149 590.6793
1962 503.1498 474.3893 511.5911 565.5928 581.5127 661.4150
Jul Aug Sep Oct Nov Dec
1961 688.5216 672.2711 564.5825 513.1307 434.6578 481.9836
1962 771.2137 753.2077 632.6921 575.1384 487.2573 540.3795
forecast_air also has historical data (x), residuals, the lower and
upper limit of the 95% confidence interval of the forecast, etc !
Q CHALLENGE - Modelling consumer sales with an ARIMA/SARIMA
model
FYI BEFORE YOU CONTINUE READING MY WORKSHOP, PLEASE KNOW THAT I
DIDN’T MANAGE TO SOLVE IT. I TRIED MANY TIMES BUT I COULDN’T HACK IT.
THEORETICAL QUESTIONS WERE ANSWERED BUT I COULDN’T MODEL.
Go to the course site (Material) and download Dr. Nau slides about
ARIMA/SARIMA models. Read the slides and focus on his recommendations to
select the number of p, d, q, P, D, Q for any ARIMA/SARIMA model.
Download the following Dataset:
download.file("http://www.apradie.com/ec2004/salesfab2.xlsx", "salesfabs2.xlsx", mode="wb")
trying URL 'http://www.apradie.com/ec2004/salesfab2.xlsx'
Content type 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' length 11096 bytes (10 KB)
==================================================
downloaded 10 KB
Now, we read the data from the Excel file and construct an xts
dataset:
sales_brands <- read_xlsx("salesfabs2.xlsx", sheet = "Sheet1")
#I create an xts object to handle the object and dates in a better way:
# I construct a date variable as a sequence of dates using the month column:
date <- seq(as.Date(sales_brands$month[1]),
by="month",
length.out = nrow(sales_brands) )
# I delete the first column since it has the month, and I will construct
# an xts dataset with the index equal to the month
sales_brands = sales_brands[,-1]
# I create the xts object:
sales_brands<- xts(x=sales_brands, frequency = "monthly", order.by=date)
This dataset contains consumer national monthly sales of a Category
of products. The sales is in tons (qfab#) and prices (pfab#) and include
sales of few products. You have to do the following:
dataset <- read_excel("salesfabs2.xlsx")
# I familiarize with the data
head(dataset)
# I join both objects using xts() and assign it to a new object called price.xts
price.xts<- xts(x=sales_brands$price, frequency = "monthly", order.by=date)
class(price.xts)
[1] "xts" "zoo"
# Load library xts and zoo. Install the packages if you haven't already.
library(xts)
library(zoo)
library(tseries)
library(forecast)
# I get the date of the dataset:
date <- dataset$month
# I join both objects using xts() and assign it to a new object called volume.xts
volume.xts<- xts(x=sales_brands$volume, frequency = "monthly", order.by=date)
class(volume.xts)
[1] "xts" "zoo"
Generate the log variables for price (pfab1) and sales in volume
(qfab1) for product 1.
lnprice.xts = log(price.xts)
lnvolume.xts = log(volume.xts)
Now you have to work in a model for qfab1, sales in tons of product
1:
plot(volume.xts$date)
Error in `$.zoo`(volume.xts, date) :
not possible for univariate zoo series
Graph the natural log of sales in tons of this product. WHAT DO YOU
OBSERVE about the series? does it look stationary or nonstationary? does
it look with seasonality?
plot(diff(lnvolume.xts,lag=1000))
Error in lag.xts(x, k = lag, na.pad = na.pad) :
abs(k) must be less than nrow(x)
If the series is not stationary, try whether the seasonal difference
of the log looks as stationary (s12.lnqfab1). Do a time series plot, run
and INTERPRET the Dicky-Fuller test
adf.test(na.omit(diff(lnvolume.xts,lag=1000)),alternative="stationary",k=0)
Error in lag.xts(x, k = lag, na.pad = na.pad) :
abs(k) must be less than nrow(x)
According to Dr Nau slides and what you learned in this workshop,
calibrate the best ARIMA/SARIMA model.
I’VE TRIED MANY TIMES, CANT DO IT?
library(astsa)
# I create a ts object from the xts; the acf2 works better with ts objects
lnvolume.xts =ts(lnvolume.xts$tons)
Error in `$.zoo`(lnvolume.xts, tons) :
not possible for univariate zoo series
What is a white noise series? RESPOND WITH YOUR WORDS
WHEN WHITE NOISE IS PRESENT, ANY PAIR OF VALUES THAT CAN BE TAKEN AT
ANY GIVEN TIME ARE NOT CORRELATED. IT IS A STATIONARY TIME SERIES WITH
NO AUTOCORRELATION.
Create a variable for the residuals of your model. Run the ACF plot
and the PACF plot for this residual. Does the residual look like a white
noise?
m1 <- Arima(volume.xts$tons, order = c(2,0,0), seasonal = list(order=c(0,1,0),period=12), include.constant = TRUE, lambda = 0)m1
Error: unexpected symbol in "m1 <- Arima(volume.xts$tons, order = c(2,0,0), seasonal = list(order=c(0,1,0),period=12), include.constant = TRUE, lambda = 0)m1"
m1a<-sarima(log(volume.xts$tons), p=2, d=0, q=0,P=0,D=1,Q=0,S=12)
Error in `$.zoo`(volume.xts, tons) :
not possible for univariate zoo series
Why do you think that the residuals of your best model should look
like A White noise? RESPOND WITH YOUR OWN WORDS
I’VE TRIED MANY TIMES, CANT DO IT.
Create 24 new observations in the future I’VE TRIED MANY TIMES, CANT
DO IT. Create a forecast for the future observations using your last
model I’VE TRIED MANY TIMES, CANT DO IT. (optional) Estimate the Akaike
Information Criterion (AIC) of the model. To do this, run the following
command right after running your arima model: I’VE TRIED MANY TIMES,
CANT DO IT.
Do a quick research about what is the AIC of the model and EXPLAIN
WITH YOUR OWN WORDS
AIC STANDS FOR AKIAKE INFORMATION CRITERION. THIS MATHEMATHICAL
METHOD FOR EVALUATION ALLOWS TO KNOW HOW WELL A MODEL FITS THE DATA IT
ORIGINATED FROM. IT’S USE IN STATISTICS IS FOR COMPARING DIFFERENT
MODELS IN ORDER TO CHOOSE THE BEST ONE. IN ORDER TO CALCULATE AIC, ONE
NEEDS THE FOLLOWING.
THE NUMBER OF INDEPENDENT VARIABLES USED TO BUILD SAID MODEL. THE
MAXIMUM LIKELIHOOD ESTIMATE OF THE MODEL
“The best-fit model according to AIC is the one that explains the
greatest amount of variation using the fewest possible independent
variables.” (R. BEVANS, 2021).
Bevans, R. (2021, June 18). An introduction to the akaike information
criterion. Scribbr. Retrieved March 16, 2022, from https://www.scribbr.com/statistics/akaike-information-criterion/#:~:text=The%20Akaike%20information%20criterion%20(AIC,best%20fit%20for%20the%20data.
---
title: "Workshop 4, Financial Econometrics II"
author: Stefan Schweitzer A01209755
output: html_notebook
---
Workshop 4, Financial Econometrics II
Alberto Dorantes, Ph.D.
Mar 15, 2022

# Abstract

In this workshop we learn how to design/calibrate a seasonal ARIMA model (SARIMA) for a time series.

##  Introduction to ARIMA-SARIMA Models
A seasonal ARIMA model, also called ARIMA/SARIMA model, is usually applied to monthly or quarterly time series. This model is very effective for any of the following type of economic/financial series:

Historical monthly or quarterly sales data

Historical monthly or quarterly price data

Historical monthly or quarterly data related to any variable of any financial statement. For example: cost of good sold, sales and general administrative expenses, income tax, EBIT, etc.

Historical monthly or quarterly data related to any macro economic variable. For example: inflation indexes, GDP indexes, exchange rates, etc.

These models are not so effective for daily stock prices or financial indexes. The main reason is that one of the most important conditions for a financial market to exist is that no body can predict stock prices since the log of stock prices are supposed to behave as a Random Walk! However, these models are useful in this context compared to just doing intuitive guesses.

## Calibration Steps for ARIMA-SARIMA
Each ARIMA-SARIMA model needs to be CALIBRATED. The general steps to calibrate a model are the following:

Check if the series has seasonality. If the series shows a seasonal pattern, start with the seasonal difference of the log of the series, which is the % annual growth of the series. Usually monthly and quarterly data of many economic and financial series have seasonal pattern.
I recommend that if you have monthly or quarterly data of any financial or economic variable, start testing whether the seasonal difference of the log of the series is stationary. If you have daily data, it is hard to model for seasonality. In the case of daily data, start testing whether the first difference of the log is stationary.

In this step you start with a version of the series.

For next steps we will assume that we decided to use the seasonal difference of the log of the series.

Test whether the seasonal difference is stationary. If so, continue to next step. If not, then change the variable to the first difference of the log of the series, which is the % change for each period. Most of the economic and financial series become stationary with the first log difference.

Run the ACF and PACF plots to identify the ARIMA-SARIMA parameters p, d, q, P, D, Q. The ACF is the auto-correlation plot, and the PACF is the partial autocorrelation plot. Both plots show auto*correlations between the variable and its own lags.

An ARIMA/SARIMA model has the following “parameters” that need to be defined:

arima(p,d,q) sarima(P,D,Q,#Periods)

p: refers to the number of autoregressive (AR) terms. Usually this parameter is either 0,1 or 2.

d: refers to how many first differences where needed to the series in order to make the series stationary. Usually this parameter is either 0 or 1.

q: refers to the number of moving average (MA) terms in the model. Usually this parameter is either 0,1 or 2.

P: refers to the number of SEASONAL autoregressive terms. Usually this parameter is either 0 or 1.

D: refers to how many SEASONAL differences were needed to the series to make the series stationary. Usually this parameter is either 0 or 1.

Q: refers to the number of SEASONAL moving average terms. Usually this parameter is either 0 or 1.

#periods: refers to the number of periods in the year. For example, if the data is monthly, then #periods=12.

These parameters are usually expressed as: ARIMA(p,d,q) SARIMA(P,D,Q, #periods).

When you identify the first values of these paremeters, then you will have your first CALIBRATION of the model.

Estimate/Run the ARIMA-SARIMA model

Run the ACF and PACF of the residuals/errors of the model to check whether the errors is a white noise series. In other words, if there is no significant autocorrelations of the errors.

If there is one or more significant autocorrelations, we can include other term(s) in the ARIMA-SARIMA model, and go back to step 4.

If there the errors seem like a white noise (no signficant autocorrelations), then we can continue and finish the calibration process.

Interpret the model

Run a forecast using the model

Let’s start practicing with these models.

## Example - model for air passengers

# Identifying seasonality in a time-series
We will work with a simple dataset that has the number of air passengers of an US Airline. This series might have a strong seasonality since people used to fly in vacation periods, which is the the same each year.

Download the excel file from a web site:

```{r}
download.file("http://www.apradie.com/ec2004/air2.xlsx", "air2.xlsx", mode="wb")
```
In order to read Excel files, we need the readxl package:

```{r}
library(readxl)
```

Now, we read the data from the Excel file:

```{r}
dataset <- read_excel("air2.xlsx")
# I familiarize with the data
head(dataset)
```

In order to perform time-series analysis, it is better to use xts objects. We can construct this by using the xts() method of the xts package. Objects of class xts are composed by 2 elements: the core data and the index (class Date, POSIX,etc).

```{r}
# Load library xts and zoo. Install the packages if you haven't already.
library(xts)
library(zoo)
library(tseries)
library(forecast)


# I get the date of the dataset:
date <- dataset$month
# I join both objects using xts() and assign it to a new object called air.xts
air.xts<- xts(x=dataset$air, frequency = "monthly", order.by=date)
names(air.xts)<-c("Passengers")
class(air.xts)
```
```{r}
# I see behaviour of air passengers over time 
plot(air.xts$Passengers)
```
This dataset contains air passengers by month from an airline. Do the following:

Does the series look stationary? Obviously not, it looks with a clear growing trend, and a seasonality pattern. The mean of the series GROWS OVER TIME.

Get the natural logarithm of the series:
```{r}
lnair.xts = log(air.xts)
```

# Testing for stationary

Since we see a clear seasonality pattern, it is strongly recommended to use the seasonal difference of the log of the series. So, we will check whether we can treat the seasonal difference of the log as STATIONARY.

We start by plotting the seasonal difference of the log, which is the annual % growth (in cc) of the air passengers:

```{r}
plot(diff(lnair.xts,lag=12))
```
It is hard to see whether this is stationary or not, so we have to run the Dicky-Fuller test. Run the Dicky-Fuller test using the adf.test function of the tseries packages.

```{r}
adf.test(na.omit(diff(lnair.xts,lag=12)),alternative="stationary",k=0)
```
We got a p-value<0.05, so we can treat the seasonal difference (the annual % growth) as stationary and continue with this series to calibrate an ARIMA-SARIMA model.

# ACF and PACF plots

Once we confirm that our series is stationary, then the first step is to do the AC and PAC plots to identify how many AR and MA terms we can include in our ARIMA/SARIMA model.

Remember that we can compute the autocorrelation plots with the acf2 function from the astsa package. Let’s examine the autocorrelations of our series. Remember that our series is the seasonal difference of the log of air passengers, which is the annual % growth of air passengers month by month.

```{r}
library(astsa)
# I create a ts object from the xts; the acf2 works better with ts objects
lnair =ts(lnair.xts$Passengers)
acf2(diff(lnair,lag=12))
```
I used the ts function to transform the dataset into a ts dataset since this type of R object will allow us to easily run and forecast arima-sarima models.

# ACF and PACF interpretation
The ACF shows the autocorrelations (AC) between the series and its lags, and the PACF shows the partial autocorrelations (PAC) between the series and its lags.

The VERTICAL lines show the MAGNITUD of the AUTOCORRELATION between a LAG and the current value of the series.

The blue horizontal dotted lines in the plots cover the 95% confidence interval for the autocorrelations. So, if the vertical line croses the blue dotted line, it means that that specific autocorrelation is SIGNIFICANTLY DIFFERENT THAN ZERO (it could be positive or negative).

Now what is the general interpretation of the these plots?

First, we always have to remember what we are modelling. In this case we are modelling the seasonal difference of the log, which is the ANNUAL % GROWTH of passengers month by month.

In the ACF plot I see that the first 8 ACs are positive and significant, and they gradually decay. For example, the AC of LAG 1 is about 70% (almost 0.70). This means that ANNUAL % growth of air passengers is strongly correlated (with a correlation of 70%) with ITS OWN ANNUAL % GROWTH of the PREVIOUS month (its LAG 1).

The AC of LAG 2 is about 62%, so the ANNUAL % GROWTH of passengers is strongly correlated with ITS OWN ANNUAL % GROWTH of 2 MONTHS AGO (its LAG 2)… and so on.

It is important to mention that the ACF plot shows autocorrelations independently of each other.

In the case of PACF, we see PARTIAL autocorrelations, which measure HOW MUCH ELSE the series is correlated with each its own LAGS, AFTER CONSIDERING the effect of LOWER-ORDER LAG AUTOCORRELATIONS.

I see that the PACF shows only the first 2 autocorrelations to be positive AND SINGIFICANT, and the magnitude of the following autocorrelations goes down to zero or negative very quickly.

When the ACF plot shows a SLOW DECLINE of autocorrelations, and the PACF shows a FAST DECLINE of autocorrelatinos, this is a clear PATTERN OF AN AR(p) MODEL. In this case, the # of AR terms is determined by the SIGNIFICANT LAGS in the PACF plot.

We see that also the AC of LAG 10 is statistically significant and negative, but it is recommended first to start with the AR(2) model, and then check the residuals of the model to see whether there is another significant lag.

Then, in this case we start with an ARIMA with p=2 and q=0.

Read Dr. Nau slides to review more PATTERNS of ACF and PACF plots.

# Estimate the ARIMA-SARIMA model
In this case, I can start with the following calibration. Now we can use the log series, and leave the model to automatically calculate the SEASONAL difference. Then, I will use the log of the series (log of air passengers) and define my model as:

arima(p,d,q) sarima(P,D,Q,#periods in the year)

In our example, the values for the parameters are:

arima(2,0,0) sarima(0,1,0,12)

What this model tells us?

You first have to look at the # of periods. In this case, you have 12 periods in a year, so your series should be monthly.

Then you look at d and D (first differences and the seasonal differences). In this case, d=0 and D=1. What does this mean? this means that we are modeling the SEASONAL difference (D=1) of the log of the variable, which is the annual % growth (month by month). Since d=0 this means that we are NOT using the first difference of the series.

Then you look at the p, q, P, Q. p tells you how many AR terms are included in your model. In this case, p=2, so your model includes the first 2 AR terms. Since q, P and Q are zero, then your model does not include any of these type of terms.

How this model can be expressed mathematically?

Since you are modeling the seasonal difference, you can express this model as the following equation:

△12lnair=ϕ0+ϕ1(L1△12lnairt)+ϕ2(L2△12lnairt)+εt

Using a simpler notation, this equation can be expressed as:

s12.lnairt=ϕ0+ϕ1L1.s12.lnairt+ϕ2L2.s12.lnairt+εt

Remember that s12.lnair is the seasonal difference of the log of air passengers, which is the annual %growth of air passengers month by month. Then we can read this mathematical expression as: “The annual %growth (month by month) of air passengers at time t can be determined by its own annual % growth of air passengers at time t-1 and t-2 (the %growth of the previous 2 months), and by a random shock”

## Running the ARIMA/SARIMA
Now we can estimate the ARIMA-SARIMA model using the Arima function of the forecast package.

In this case, instead of creating a variable for the seasonal difference of the log (s12.lnair), we will configure the parameters of the Arima function so that the log transformation and the seasonal difference will be calculated before estimating the arima coefficients.

In our first calibration we discussed above, we defined the following values of the ARIMA-SARIMA model for the log of the number of air passengers:

arima(2,0,0) sarima(0,1,0,12)

This means that D=1, so R will calculate the seasonal difference, and then estimate the arima model and include 2 AR terms and no MA terms.

We can run this model in R as follows:

```{r}
m1 <- Arima(air.xts$Passengers, # We use the Passengers column of the dataset
        order = c(2,0,0), # we indicate that p=0; d=0; q=0 
        seasonal = list(order=c(0,1,0),period=12), #P=0; D=1; Q=0, #periods=12
        include.constant = TRUE, # Here we indicate to include the phi0 coefficient
        lambda = 0) # Here we indicate to apply the natural log to the series  
m1
```

If we set lamda = 0 this means that the the model will use the log of the variable (passengers); this is one of the Box-Jenkins mathematical transformations, and it is the most common for financial and economic series.

We can alternatively use the function sarima from the astsa model to run the same arima model. The sarima function provides more detail in the output, but the Arima function has advantages when we want to get the forecast with the model. We run sarima for the same model as:

```{r}
m1a<-sarima(log(air.xts$Passengers), p=2, d=0, q=0,P=0,D=1,Q=0,S=12)
```

```{r}
m1a
```
As we see, we need to take the log first and then specify the model. The result is exactly the same as the Arima result, but with sarima we can see the coefficients, their p-values and other important plots of the model. Also, we can see important plots such as ACF of the errors.

# Interpretation of an ARIMA-SARIMA model

Since the Arima function does not display the p values and t values of the coefficients, we can use the function coeftest from the lmtest library.

You have to install the lmtest library and the load it:

```{r}
library(lmtest)
```

Now we can see the coefficients and their corresponding standard error, pvalue and t value. We apply the coeftest function to the m1 model created with Arima:

```{r}
coeftest(m1)
```

The drift coefficient is the phi0 coefficient. phi0 and both lags are positive and SIGNIFICANT. What does this mean? First, I have to remember which variable I am modelling. I am modeling the seasonal difference of log passengers, which is the annual % growth of air passengers month by month. Then, I can interpret this ARIMA-SARIMA model as follows:

The annual % growth of air passengers is positively and significantly related with its annual % growth of air passengers of last month (lag1). In a similar way, the annual % growth of air passengers is positively and significantly related with the annual % growth of two months ago. More specifically, 55.4% of annual growth of last month is passed to this month’s annual % growth, and only 23.78% of the annual growth of 2 months ago is passed to the current annual % growth.

In other words, for each percentual point of annual growth of previous month, it is expected that the current annual growth will grow about 0.55 percent points

It is interesting to see that the sum of phi1 and phi2 (0.55+0.23) MUST be < 1 in order for the series to be stationary. If you include more arima terms, the sum of the phi’s must be less than one. If you include MA terms, the sum of the theta coefficients also must be less than one. In addition, the constant or phi0 is positive and significant, indicating that the annual growth of air passengers has a positive tendency to grow over time.

The mathematical model is:

E[s12.lnair]=0.0096+0.554∗L1.s12.lnair+0.2378∗L2.s12.lnair
or
s12.lnair=0.0096+0.554∗L1.s12.lnair+0.2378∗L2.s12.lnair+error

# Autocorrelations of the Model residuals/errors

We need to generate a series for the residuals of the ARIMA-SARIMA model. To do this, get the residuals as follows:

```{r}
air_res <- m1$residuals
head(air_res)
```
# Test whether the residuals is a white noise series
Test whether the residual series (the air_res variable) is a white noise running the ac and pac plots. A white noise is a series that has no signicant autocorrelations with any of its own lags.

```{r}
acf2(air_res)
```

INTERPRET these plots
THESE PLOTS ARE MEANT TO TEST WHETHER THE RESIDUAL SERIES HAVE WHITE NOISE RUNNING IN THEIR PLOTS. AS DEFINED EARLIER, WHITE NOISE IS A SERIES THAT HAS NO SIGNIFICANT AUTOCORRELATIONS WITH ANY OF THE LAGS. THE PLOTS SUGGEST THESE RESIDUALS ARENT WHITE NOISE DUE TO THE AMOUNT OF INCIDENCY IN THE 750000 AND 100000 LAG MARKERS. 

Can you add another AR or MA term to improve the previous model and then to make the residual a white noise ?

YES, WE CAN ADD MORE TERMS AND ESTIMATE THE ARIMA-SARIMA MODEL AS INSTRUCTED EARLIER IN THIS WORKSHOP. 

# Forecast air passengers
We re-run the Arima model we run above, but now we will transform the dataset from xts to a ts R object to improve the visualization of the forecast:

```{r}
#Convert the xts to a ts dataset:
air.ts <- ts(coredata(air.xts$Passengers),start=c(1949,1),frequency=12)
# We display the content of the ts dataset:
air.ts
```

```{r}
# Run the Arima model:
m1 <- Arima(air.ts, order = c(2,0,0), # p=2:include 2 AR terms; q=0: No MA terms;
       seasonal = list(order=c(0,1,0), # D=1: use seasonal difference; No Seasonal terms
                       period=12),  # periods=12 period in the year
        include.constant = TRUE, # include the phi0 (drift) coefficient
        lambda = 0) # do the log transformation
m1
```
```{r}
# We can see the pvalues of each ARIMA-SARIMA coefficient using the
#   coeftest from the lmtest library:
coeftest(m1)
```
The pvalue of the drift, ϕ0, and the pvalues of the AR coefficients, ϕ1 and ϕ2, are very significant (<0.05). Then we keep these terms in the model.

Using the last Arima model, run a forecast for the following 24 months.

We run the forescast using the arima model m1 (the one I run with the Arima function. We indicate to do a forecast for 24 months, specifying h=24:

```{r}
forecast_air <- forecast(m1, h=24)
# I stored the information of the forecast in forecast_air

# I plot the forecast
autoplot(forecast_air)
```
The forecast_air R object has both, historical data and forecast data. A very nice feature of the forecast function after using Arima function, is that we DO NOT HAVE TO do transformations to get the original scale of the variable, which is in tons. The Arima function automatically does the log transformation, and the forecast function automatically does the exponential transofrmation to get the forecast in tons!

The forecast_air R object is a list type of object that contains several elements.

```{r}
attributes(forecast_air)
```
The forecast values are in the mean attribute:

```{r}
forecast_air$mean
```

forecast_air also has historical data (x), residuals, the lower and upper limit of the 95% confidence interval of the forecast, etc !

# Q CHALLENGE - Modelling consumer sales with an ARIMA/SARIMA model

FYI
BEFORE YOU CONTINUE READING MY WORKSHOP, PLEASE KNOW THAT I DIDN'T MANAGE TO SOLVE IT. I TRIED MANY TIMES BUT I COULDN'T HACK IT. THEORETICAL QUESTIONS WERE ANSWERED BUT I COULDN'T MODEL. 

Go to the course site (Material) and download Dr. Nau slides about ARIMA/SARIMA models. Read the slides and focus on his recommendations to select the number of p, d, q, P, D, Q for any ARIMA/SARIMA model. Download the following Dataset:

```{r}
download.file("http://www.apradie.com/ec2004/salesfab2.xlsx", "salesfabs2.xlsx", mode="wb")
```
Now, we read the data from the Excel file and construct an xts dataset:

```{r}
sales_brands <- read_xlsx("salesfabs2.xlsx", sheet = "Sheet1")

#I create an xts object to handle the object and dates in a better way:

# I construct a date variable as a sequence of dates using the month column:
date <- seq(as.Date(sales_brands$month[1]),
                      by="month",
                      length.out = nrow(sales_brands) )
# I delete the first column since it has the month, and I will construct
#   an xts dataset with the index equal to the month
sales_brands = sales_brands[,-1]

# I create the xts object:
sales_brands<- xts(x=sales_brands, frequency = "monthly", order.by=date)
```

This dataset contains consumer national monthly sales of a Category of products. The sales is in tons (qfab#) and prices (pfab#) and include sales of few products. You have to do the following:

```{r}
dataset <- read_excel("salesfabs2.xlsx")
# I familiarize with the data
head(dataset)
```

```{r}
# I join both objects using xts() and assign it to a new object called price.xts
price.xts<- xts(x=sales_brands$price, frequency = "monthly", order.by=date)
class(price.xts)
```
```{r}
# Load library xts and zoo. Install the packages if you haven't already.
library(xts)
library(zoo)
library(tseries)
library(forecast)


# I get the date of the dataset:
date <- dataset$month
# I join both objects using xts() and assign it to a new object called volume.xts
volume.xts<- xts(x=sales_brands$volume, frequency = "monthly", order.by=date)
class(volume.xts)
```


Generate the log variables for price (pfab1) and sales in volume (qfab1) for product 1.

```{r}
lnprice.xts = log(price.xts)
```
```{r}
lnvolume.xts = log(volume.xts)
```


Now you have to work in a model for qfab1, sales in tons of product 1:

```{r}
plot(volume.xts$date)
```


Graph the natural log of sales in tons of this product. WHAT DO YOU OBSERVE about the series? does it look stationary or nonstationary? does it look with seasonality?


```{r}
plot(diff(lnvolume.xts,lag=12))
```


If the series is not stationary, try whether the seasonal difference of the log looks as stationary (s12.lnqfab1). Do a time series plot, run and INTERPRET the Dicky-Fuller test


```{r}
adf.test(na.omit(diff(lnvolume.xts,lag=0)),alternative="stationary",k=0)
```

According to Dr Nau slides and what you learned in this workshop, calibrate the best ARIMA/SARIMA model.

I'VE TRIED MANY TIMES, CANT DO IT?
```{r}
library(astsa)
# I create a ts object from the xts; the acf2 works better with ts objects
lnvolume.xts =ts(lnvolume.xts$tons)
acf2(diff(lnair,lag=12))
```

What is a white noise series? RESPOND WITH YOUR WORDS

WHEN WHITE NOISE IS PRESENT, ANY PAIR OF VALUES THAT CAN BE TAKEN AT ANY GIVEN TIME ARE NOT CORRELATED. IT IS A STATIONARY TIME SERIES WITH NO AUTOCORRELATION.

Create a variable for the residuals of your model. Run the ACF plot and the PACF plot for this residual. Does the residual look like a white noise?
```{r}
m1 <- Arima(volume.xts$tons, order = c(2,0,0), seasonal = list(order=c(0,1,0),period=12), include.constant = TRUE, lambda = 0)m1
```
```{r}
m1a<-sarima(log(volume.xts$tons), p=2, d=0, q=0,P=0,D=1,Q=0,S=12)
```

Why do you think that the residuals of your best model should look like A White noise? RESPOND WITH YOUR OWN WORDS

I'VE TRIED MANY TIMES, CANT DO IT.

Create 24 new observations in the future
I'VE TRIED MANY TIMES, CANT DO IT.
Create a forecast for the future observations using your last model
I'VE TRIED MANY TIMES, CANT DO IT.
(optional) Estimate the Akaike Information Criterion (AIC) of the model. To do this, run the following command right after running your arima model:
I'VE TRIED MANY TIMES, CANT DO IT.

Do a quick research about what is the AIC of the model and EXPLAIN WITH YOUR OWN WORDS

AIC STANDS FOR AKIAKE INFORMATION CRITERION. THIS MATHEMATHICAL METHOD FOR EVALUATION ALLOWS TO KNOW HOW WELL A MODEL FITS THE DATA IT ORIGINATED FROM. IT'S USE IN STATISTICS IS FOR COMPARING DIFFERENT MODELS IN ORDER TO CHOOSE THE BEST ONE. IN ORDER TO CALCULATE AIC, ONE NEEDS THE FOLLOWING. 

THE NUMBER OF INDEPENDENT VARIABLES USED TO BUILD SAID MODEL.
THE MAXIMUM LIKELIHOOD ESTIMATE OF THE MODEL

"The best-fit model according to AIC is the one that explains the greatest amount of variation using the fewest possible independent variables." (R. BEVANS, 2021).

Bevans, R. (2021, June 18). An introduction to the akaike information criterion. Scribbr. Retrieved March 16, 2022, from https://www.scribbr.com/statistics/akaike-information-criterion/#:~:text=The%20Akaike%20information%20criterion%20(AIC,best%20fit%20for%20the%20data. 



