1 Abstract

In this workshop we learn how to design/calibrate a seasonal ARIMA model (SARIMA) for a time series.

2 Introduction to ARIMA-SARIMA Models

An ARIMA/SARIMA model is applied to monthly or quarterly time series. It shows:

1 Historical monthly or quarterly sales data 2 Historical monthly or quarterly price data 3 Historical monthly or quarterly data related to any variable of any financial statement. For example: cost of good sold, sales and general administrative expenses, income tax, EBIT, etc. 4 Historical monthly or quarterly data related to any macro economic variable. For example: inflation indexes, GDP indexes, exchange rates, etc.

This model help in the fact that nobody can predict the stock prices, with this you get intuitive guesses

3 Calibrate Steps for ARIMA/SARIMA

Each ARIMA-SARIMA model needs to be CALIBRATED. The general steps to calibrate a model are the following:

1 Check if the series has seasonality. If the series shows a seasonal pattern, start with the seasonal difference of the log of the series, which is the % annual growth of the series. Usually monthly and quarterly data of many economic and financial series have seasonal pattern.

I recommend that if you have monthly or quarterly data of any financial or economic variable, start testing whether the seasonal difference of the log of the series is stationary. If you have daily data, it is hard to model for seasonality. In the case of daily data, start testing whether the first difference of the log is stationary.

In this step you start with a version of the series. Assume that you decide to start with the seasonal difference of the log.

2 Test whether the seasonal difference is stationary. If so, continue to next step. If not, then change the variable to first difference of the log of the series, which is the % change for each period.

3 Run the AC and PAC plots to identify the ARIMA-SARIMA parameters p, d, q, P, D, Q:

An ARIMA/SARIMA model has the following “parameters” that need to be defined:

arima(p,d,q) sarima(P,D,Q,#Periods)

*p: refers to the number of autoregressive (AR) terms

*d: refers to how many differences where needed to the series in order to make the series stationary. Usually this parameter is either 0 or 1.

*q: refers to the number of moving average (MA) terms in the model

*P: refers to the number of SEASONAL autoregressive terms

*D: refers to how many SEASONAL differences were needed to the series to make the series stationary. Usually this parameter is either 0 or 1.

*Q: refers to the number of SEASONAL moving average terms

*#periods: referes to the number of periods in the year. For example, if the data is monthly, then #periods=12.

These parameters are usually expressed as: ARIMA(p,d,q) SARIMA(P,D,Q, #periods).

When you identify the first values of these paremeter, then you will have your first CALIBRATION of the model.

4 Estimate/Run the ARIMA-SARIMA model

5 Run the ACF and PACF of the residuals/errors of the model to check whether the errors is a white noise series. In other words, if there is no significant autocorrelations of the errors.

If you have significant autocorrelations, you can include other term(s) in the ARIMA-SARIMA model, and go back to step 4.

If there the errors seem like a white noise (no signficant autocorrelations), then you can continue and finish the calibration process.

5 Interpret the model

6 Run a forecast using the model

4 Example Model for Air Passengers

4.1 Identifying seasonality in a time-series

Download the excel file from a web site:

download.file("http://www.apradie.com/ec2004/air2.xlsx", "air2.xlsx", mode="wb")

library(readxl)

dataset <- read_excel("air2.xlsx")
# I familiarize with the data
head(dataset)

In order to perform time-series analysis, it is better to use xts objects. We can construct this by using the xts() method of the xts package. Objects of class xts are composed by 2 elements: the core data and the index (class Date, POSIX,etc).

# Load library xts and zoo. Install the packages if you haven't already.
library(xts)
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 4.0.3
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(zoo)
library(tseries)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(forecast)


# I get the date of the dataset:
date <- dataset$month
# I join both objects using xts() and assign it to a new object called air.xts
air.xts<- xts(x=dataset$air, frequency = "monthly", order.by=date)
names(air.xts)<-c("Passengers")
class(air.xts)
## [1] "xts" "zoo"
# I see behaviour of air passengers over time 
plot(air.xts$Passengers)

This dataset contains air passengers by month from an airline. Do the following:

Does the series look stationary? Obviously not, it looks with a clear growing trend, and a seasonality pattern. The mean of the series GROWS OVER TIME.

Get the natural logarithm of the series:

lnair.xts = log(air.xts)

4.2 Testing for stationary

Since we see a clear seasonality pattern, it is strongly recommended to use the seasonal difference of the log of the series. So, we will check whether we can treat the seasonal difference of the log as STATIONARY.

We start by plotting the seasonal difference of the log, which is the annual % growth (in cc) of the air passengers:

plot(diff(lnair.xts,lag=12))

It is hard to see whether this is stationary or not, so we have to run the Dicky-Fuller test. Run the Dicky-Fuller test using the adf.test function of the tseries packages.

adf.test(na.omit(diff(lnair.xts,lag=12)),alternative="stationary",k=0)
## Warning in adf.test(na.omit(diff(lnair.xts, lag = 12)), alternative =
## "stationary", : p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  na.omit(diff(lnair.xts, lag = 12))
## Dickey-Fuller = -4.988, Lag order = 0, p-value = 0.01
## alternative hypothesis: stationary

We got a p-value<0.05, so we can treat the seasonal difference (the annual % growth) as stationary and continue with this series to calibrate an ARIMA-SARIMA model.

4.3 AC and PAC plots

Once we confirm that our series is stationary, then the first step is to do the AC and PAC plots to identify how many AR and MA terms we can include in our ARIMA/SARIMA model.

Remember that our series is the seasonal difference of the log of air passengers, which is the annual % growth of air passengers month by month.

library(astsa)
## Warning: package 'astsa' was built under R version 4.0.3
## 
## Attaching package: 'astsa'
## The following object is masked from 'package:forecast':
## 
##     gas
# I create a ts object from the xts; the acf2 works better with ts objects
lnair =ts(lnair.xts$Passengers)
acf2(diff(lnair,lag=12))

##      [,1] [,2]  [,3] [,4] [,5]  [,6]  [,7] [,8]  [,9] [,10] [,11] [,12] [,13]
## ACF  0.71 0.62  0.48 0.44 0.39  0.32  0.24 0.19  0.15 -0.01 -0.11 -0.24 -0.14
## PACF 0.71 0.23 -0.06 0.10 0.05 -0.06 -0.06 0.01 -0.01 -0.29 -0.16 -0.14  0.29
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22]
## ACF  -0.14 -0.10 -0.15 -0.10 -0.11 -0.14 -0.16 -0.11 -0.08
## PACF  0.06  0.05 -0.04  0.13 -0.06 -0.15 -0.02  0.12 -0.18

We used the ts function to transform the dataset into a ts R dataset since this function works better for time-series (ts) class datasets.

4.3.1 AC and PAC interpretation

The ACF shows the autocorrelations (AC) between the series and its lags, and the PACF shows the partial autocorrelations (PAC) between the series and its lags.

The VERTICAL lines show the MAGNITUD of the AUTOCORRELATION between a LAG and the current value of the series.

The blue horizontal dotted lines in the plots cover the 95% confidence interval for the autocorrelations. So, if the vertical line croses the blue dotted line, it means that that specific autocorrelation is SIGNIFICANTLY DIFFERENT THAN ZERO (it could be positive or negative).

Now what is the general interpretation of the these plots?

First, always remember what are we modelling. In this case is the seasonal difference of the log, which is the ANNUAL % GROWTH of the series.

In the ACF plot I see that the first 8 ACs are positive and significant, and the gradually decay. For example, the AC of LAG 1 is about 70% (almost 0.70). This means that ANNUAL % growth of air passengers is strongly correlated (with a correlation of 70%) with ITS OWN ANNUAL % GROWTH of the PREVIOUS month (its LAG 1).

The AC of LAG 2 is about 62%, so the ANNUAL % GROWTH of passengers is strongly correlated with ITS OWN ANNUAL % GROWTH of 2 MONTHS AGO (its LAG 2)… and so on.

It is important to mention that the ACF plot shows autocorrelations independently of each other.

In the case of PACF, we see PARTIAL autocorrelations, which measure HOW MUCH ELSE the series is correlated with each its own LAGS, AFTER CONSIDERING the effect of LOWER-ORDER LAG AUTOCORRELATIONS.

I see that the PACF shows only the first 2 autocorrelations to be positive AND SINGIFICANT, and the magnitude of the following autocorrelations goes down to zero or negative very quiclky.

When the ACF plot shows a SLOW DECLINE of autocorrelations, and the PACF shows a FAST DECLINE of autocorrelatinos, this is a clear PATTERN OF AN AR(p) MODEL. In this case, the # of AR terms is determined by the SIGNIFICANT LAGS in the PACF plot.

We see that also the AC of LAG 10 is statistically significant and negative, but it is recommended first to start with the AR(2) model, and then check the residuals of the model to see whether there is another significant lag.

Then, in this case we start with an ARIMA with p=2 and q=0.

4.4 Estimate the ARIMA-SARIMA model

We can use the log series, and leave the model to automatically calculate the SEASONAL difference. Then, we will use the log of the series (log of air passengers) and define my model as:

arima(2,0,0) sarima(0,1,0,12)

What this model tells us?

You first have to look at the # of periods. In this case, you have 12 periods in a year, so your series should be monthly.

Then you look at d and D (first differences and the seasonal differences). In this case, d=0 and D=1. What does this mean? this means that we are modeling the SEASONAL difference (D=1) of the log of the variable, which is the annual % growth (month by month). Since d=0 this means that we are NOT using the first difference of the series.

Then you look at the p, q, P, Q. p tells you how many AR terms are included in your model. In this case, p=2, so your model includes the first 2 AR terms. Since q, P and Q are zero, then your model does not include any of these type of terms.

How this model can be expressed mathematically?

Since you are modeling the seasonal difference, you can express this model as the following equation:

△12lnair=ϕ0+ϕ1(L1△12lnairt)+ϕ2(L2△12lnairt)+εt

Using a simpler notation, this equation can be expressed as:

s12.lnairt=ϕ0+ϕ1L1.s12.lnairt+ϕ2L2.s12.lnairt+εt

Remember that s12.lnair is the seasonal difference of the log of air passengers, which is the annual %growth of air passengers month by month. Then we can read this mathematical expression as: “The annual %growth (month by month) of air passengers at time t can be determined by its own annual % growth of air passen

4.5 Running the ARIMA/SARIMA

In this case, instead of creating a variable for the seasonal difference of the log (s12.lnair), we will configure the parameters of the Arima function so that the log transformation and the seasonal difference will be calculated before estimating the arima coefficients.

In our first calibration we discussed above, we defined the following values of the ARIMA-SARIMA model for the log of the number of air passengers:

arima(2,0,0) sarima(0,1,0,12)

This means that D=1, so R will calculate the seasonal difference, and then estimate the arima model and include 2 AR terms and no MA terms.

We can run this model in R as follows:

m1 <- Arima(air.xts$Passengers, order = c(2,0,0), 
            seasonal = list(order=c(0,1,0),period=12),
            include.constant = TRUE,
            lambda = 0) 
m1
## Series: air.xts$Passengers 
## ARIMA(2,0,0)(0,1,0)[12] with drift 
## Box Cox transformation: lambda= 0 
## 
## Coefficients:
##          ar1     ar2   drift
##       0.5540  0.2378  0.0096
## s.e.  0.0845  0.0848  0.0014
## 
## sigma^2 estimated as 0.001742:  log likelihood=233.13
## AIC=-458.26   AICc=-457.95   BIC=-446.73

We can alternatively use the function sarima from the astsa model to do the same. This function provides more detail in the output, but is very quick to do predictions as the Arima function. We run sarima for the same model as:

m1a<-sarima(log(air.xts$Passengers), p=2, d=0, q=0,P=0,D=1,Q=0,S=12)
## initial  value -2.794390 
## iter   2 value -2.940651
## iter   3 value -3.175869
## iter   4 value -3.186870
## iter   5 value -3.189461
## iter   6 value -3.189605
## iter   7 value -3.189606
## iter   8 value -3.189607
## iter   9 value -3.189607
## iter  10 value -3.189608
## iter  11 value -3.189609
## iter  12 value -3.189609
## iter  12 value -3.189609
## iter  12 value -3.189609
## final  value -3.189609 
## converged
## initial  value -3.184486 
## iter   2 value -3.184596
## iter   3 value -3.184959
## iter   4 value -3.185009
## iter   5 value -3.185059
## iter   6 value -3.185073
## iter   7 value -3.185076
## iter   8 value -3.185076
## iter   8 value -3.185076
## final  value -3.185076 
## converged

m1a
## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1     ar2  constant
##       0.5540  0.2378    0.0096
## s.e.  0.0845  0.0848    0.0014
## 
## sigma^2 estimated as 0.001701:  log likelihood = 233.13,  aic = -458.26
## 
## $degrees_of_freedom
## [1] 129
## 
## $ttable
##          Estimate     SE t.value p.value
## ar1        0.5540 0.0845  6.5597  0.0000
## ar2        0.2378 0.0848  2.8037  0.0058
## constant   0.0096 0.0014  6.8655  0.0000
## 
## $AIC
## [1] -3.204618
## 
## $AICc
## [1] -3.203411
## 
## $BIC
## [1] -3.12398

As we see, we need to take the log first and then specify the model. The result is exactly the same as the Arima result, but we can see the coefficients, their p-values and other important plots of the model.

4.6 Interpretation of an ARIMA-SARIMA model

m1a$ttable
##          Estimate     SE t.value p.value
## ar1        0.5540 0.0845  6.5597  0.0000
## ar2        0.2378 0.0848  2.8037  0.0058
## constant   0.0096 0.0014  6.8655  0.0000

I see that the constant (phi0) and both lags are positive and SIGNIFICANT. What does this mean? First, I have to check which is the variable I am modelling. I am modelling the s12.lnair, which is the annual % growth of air passengers month by month. Then, I can interpret this arima model as follow:

The annual % growth of air passengers is positively and significantly related with the annual % growth of air passengers of last month (lag1). In a similar way, the annual % growth of air passengers is positively and significantly related with the annual % growth of two months ago. More specifically, 55.4% of annual growth of last month is passed to this month’s annual growth, andonly 23.78% of the annual growth of 2 months ago is passed to this month. In other words, for each percentual point of annual growth of previous month, it is expected that the current annual growth will grow about 0.55 percentual points

It is interesting to see that the sum of phi1 and phi2 (0.55+0.23) MUST be < 1 in order for the series to be stationary. If you include more arima terms, the sum of the phi’s must be less than one, and the sum of the thetas also must be less than one. In addition, the constant or phi0 is positive and significant, indicating that the annual growth of air passengers has a positive tendency to grow over time.

The mathematical model is: E[s12.lnair]=0.1150+0.5540∗L1.s12.lnair+0.23∗L2.s12.lnair or s12.lnair=0.1150+0.5540∗L1.s12.lnair+0.23∗L2.s12.lnair+e

4.7 Autocorrelations of the Model residuals/errors

We need to generate a series for the residuals of the ARIMA-SARIMA model. To do this, get the residuals as follows:

air_res <- m1a[["fit"]][["residuals"]]
head(air_res)
## Time Series:
## Start = 1 
## End = 432001 
## Frequency = 1.15740740740741e-05 
## [1] 0.004708908 0.004751501 0.004854026 0.004821445 0.004747833 0.004847728

4.8 Test whether the residuals is a white noise series

Test whether the residual series (the air_res variable) is a white noise running the ac and pac plots. A white noise is a series that has no signicant autocorrelations with any of its own lags.

acf2(air_res)

##      [,1] [,2]  [,3] [,4] [,5] [,6]  [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF  0.01 0.01 -0.13 0.02 0.09 0.05 -0.03 0.05 0.20 -0.01 -0.09 -0.40  0.04
## PACF 0.01 0.01 -0.13 0.02 0.10 0.03 -0.03 0.08 0.21 -0.04 -0.10 -0.37  0.01
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22]
## ACF   0.01  0.11 -0.11  0.07  0.04 -0.05 -0.13 -0.03 -0.03
## PACF -0.04  0.01 -0.09  0.16  0.11 -0.08 -0.08  0.16 -0.09

4.8.0.1 INTERPRET these plots

BY LOOKING AT THE PLOT I CAN SAY THAT THERE´S NO EXISTENT AUTOCORRELATION IN THE RESIDUALS.

Can you add another AR or MA term to improve the previous model and then to make the residual a white noise ?

THE RESIDUALS ALREADY ARE WHITE NOISE

5 Q CHALLENGE - Modelling consumer sales with an ARIMA/SARIMA model

Go to the course site (Material) and download Dr. Nau slides about ARIMA/SARIMA models. Read the slides and focus on his recommendations to select the number of p, d, q, P, D, Q for any ARIMA/SARIMA model. Download the following Dataset:

rm(list=ls())
download.file("http://www.apradie.com/ec2004/salesfab2.xlsx", "salesfabs2.xlsx", mode="wb")

library(readxl)
library(xts)
library(zoo)
library(tseries)
library(forecast)
library(astsa)
library(lmtest)

Now, we read the data from the Excel file:

sales_brands.xts <- read_xlsx("salesfabs2.xlsx", sheet = "Sheet1")

# Date variable using the month column:
dates <- seq(as.Date(sales_brands.xts$month[1]),
                      by="month",
                      length.out = nrow(sales_brands.xts) )

sales_brands.xts = sales_brands.xts[,-1]

sales_brands.xts<- xts(x=sales_brands.xts, frequency = "monthly", order.by=dates)

Create the log variables for price (pfab1) and sales in volume (qfab1) for product 1.

For Price

ln.pfab1.xts <- log(sales_brands.xts$pfab1)

For Sales

ln.qfab1.xts <- log(sales_brands.xts$qfab1)

5.0.1 Identifying Seasonality

Sales in tons of product 1:

plot(ln.qfab1.xts)

5.0.1.1 WHAT DO YOU OBSERVE about the series? does it look stationary or nonstationary? does it look with seasonality?

BY LOOKING AT THE GRAPH I CAN SAY THAT IT HAS A GROWING TENDENCY, I WOULD SAY ITS NOT STATIONARY CAUSE ITS MEAN IS CHANGING AND THE SEASONALITY IS MARKABLE

5.0.2 Testing for stationary

If the series is not stationary, try whether the seasonal difference of the log looks as stationary (s12.lnqfab1). Do a time series plot, run and INTERPRET the Dicky-Fuller test

plot(diff(ln.qfab1.xts, lag=12))

5.0.2.1 Dicky Fuller Test

adf.test(na.omit(diff(ln.qfab1.xts,lag=12)), alternative = "stationary", k=0)
## Warning in adf.test(na.omit(diff(ln.qfab1.xts, lag = 12)), alternative =
## "stationary", : p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  na.omit(diff(ln.qfab1.xts, lag = 12))
## Dickey-Fuller = -4.8046, Lag order = 0, p-value = 0.01
## alternative hypothesis: stationary

5.0.2.2 Interpret the Dicky Fuller Test

DUE TO MY P-VALUE OF 0.01, I CAN SAY THAT THE SEASONAL DIFFERENCE IS STATIONARY.

# I create a ts object from the xts; the acf2 works better with ts object
ln.qfab1 <- ts(ln.qfab1.xts$qfab1)
acf2(diff(ln.qfab1,lag = 12))

##      [,1] [,2] [,3]  [,4]  [,5] [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12] [,13]
## ACF  0.41 0.47 0.31  0.17  0.01 0.04 -0.13 -0.28 -0.29 -0.48 -0.24 -0.49 -0.15
## PACF 0.41 0.37 0.05 -0.13 -0.19 0.05 -0.10 -0.29 -0.14 -0.26  0.26 -0.30  0.21
##      [,14] [,15] [,16] [,17] [,18] [,19]
## ACF  -0.23 -0.15 -0.14 -0.03 -0.06  0.17
## PACF -0.04 -0.08 -0.09 -0.18  0.00  0.21

5.0.3 Estimate the ARIMA-SARIMA model

I´m going to use the log of the series(log of sales for product 1), my model would be arima(2,0,0)sarima(1,1,0,12)

What this model is telling us?

I have 12 number of periods due to my monthly series

Then the p and P (number of autorregresive terms and seasonal terms) In this case p=2 and P=1 which means how many AR terms are included un your model. Which are 2 AR terms ans 1 AR seasonal term.

Then the d and D (first differenes and seasonal differences). In this case, d=0 and D=1 which means that we are modeling the SEASONAL differnece (D=1) of the lof of the variable.

Then the q and Q which are in zero, then your model does not include any of these type of terms.

5.0.4 Running the ARIMA-SARIMA model

Calibration of the log of sales for product 1

m_1 <- Arima(sales_brands.xts$qfab1, order = c(2,0,0), seasonal=list(order=c(1,1,0),period=12),include.constant = TRUE, lambda = 0)
m_1
## Series: sales_brands.xts$qfab1 
## ARIMA(2,0,0)(1,1,0)[12] with drift 
## Box Cox transformation: lambda= 0 
## 
## Coefficients:
##          ar1     ar2     sar1   drift
##       0.3356  0.1918  -0.4171  0.0085
## s.e.  0.1310  0.1391   0.1200  0.0013
## 
## sigma^2 estimated as 0.005833:  log likelihood=67.71
## AIC=-125.42   AICc=-124.26   BIC=-115.12
m_1a <- sarima(log(sales_brands.xts$qfab1), p=2, d=0, q=0, P=1, D=1, Q=0, S=12)
## initial  value -2.339814 
## iter   2 value -2.550462
## iter   3 value -2.580508
## iter   4 value -2.585312
## iter   5 value -2.586270
## iter   6 value -2.586331
## iter   7 value -2.586332
## iter   7 value -2.586332
## iter   7 value -2.586332
## final  value -2.586332 
## converged

m_1a
## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1     ar2     sar1  constant
##       0.3356  0.1918  -0.4171    0.0085
## s.e.  0.1310  0.1391   0.1200    0.0013
## 
## sigma^2 estimated as 0.005425:  log likelihood = 67.71,  aic = -125.42
## 
## $degrees_of_freedom
## [1] 54
## 
## $ttable
##          Estimate     SE t.value p.value
## ar1        0.3356 0.1310  2.5620  0.0132
## ar2        0.1918 0.1391  1.3785  0.1737
## sar1      -0.4171 0.1200 -3.4768  0.0010
## constant   0.0085 0.0013  6.7942  0.0000
## 
## $AIC
## [1] -1.548366
## 
## $AICc
## [1] -1.541869
## 
## $BIC
## [1] -1.421179

5.0.5 Interpretation of an ARIMA-SARIMA model

m_1a$ttable
##          Estimate     SE t.value p.value
## ar1        0.3356 0.1310  2.5620  0.0132
## ar2        0.1918 0.1391  1.3785  0.1737
## sar1      -0.4171 0.1200 -3.4768  0.0010
## constant   0.0085 0.0013  6.7942  0.0000

5.0.5.1 What is white noise series?

IN WHITE NOISE THE MEAN IS EQUAL TO 0, THE TIME-SERIES IS STATIONARY AND THE AUTOCORRELATIONS BETWEEN THE VARIABLES AND LAGS IS EQUAL TO 0.

5.0.6 Autocorrelations of the Model residuals/errors

qfab1_RE <- m_1a[["fit"]][["residuals"]]
head(qfab1_RE)
## Time Series:
## Start = 1 
## End = 6 
## Frequency = 1 
## [1] 0.005814540 0.005862934 0.005811224 0.005181171 0.005093136 0.005478768

5.0.7 Test whether the residuals is a white noise series

acf2(qfab1_RE)

##       [,1] [,2] [,3] [,4]  [,5] [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12] [,13]
## ACF  -0.01 0.02 0.13 -0.1 -0.15 0.17 -0.01 -0.16 -0.06 -0.29  0.06 -0.07  0.17
## PACF -0.01 0.02 0.13 -0.1 -0.16 0.17  0.03 -0.16 -0.15 -0.29  0.19 -0.09  0.18
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20]
## ACF   0.02 -0.04 -0.08 -0.04 -0.14  0.20  0.08
## PACF -0.07 -0.07 -0.04 -0.11 -0.17  0.19 -0.05

5.0.7.1 Why do you think that the residuals of your best model should look like A White noise?

BECAUSE IF THE MEAN AND THE AUTOCCORRELATIONS BETWEEN VARIABLE AND LAGS IS EQUAL TO 0, THIS WOULD REPRESENT THAT MY MODEL IS WELL CALIBRATED.

5.0.8 Forecast sales product 1

Create 24 new observations in the future Management data

# Convert the xts to a ts dataset: 
qfab_1.ts <- ts(coredata(sales_brands.xts$qfab1), start=c(2012,1), frequency = 12)
qfab_1.ts
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 2012 338.0100 357.8112 342.6911 184.0687 170.0012 252.1363 179.6723 179.9935
## 2013 391.1488 402.2955 403.4620 204.9068 203.2642 261.1563 173.9211 194.2030
## 2014 352.1699 497.8295 450.2763 267.4923 255.3361 324.7467 230.2683 229.0850
## 2015 477.4290 437.6544 427.0321 262.7887 254.4105 320.5292 245.1620 233.3661
## 2016 480.6635 493.3284 429.3796 300.9901 293.1736 349.5101 284.4155 290.8479
## 2017 531.4313 543.0544 451.3241 313.1662 312.9280 372.2951 307.8225 299.9826
## 2018       NA       NA       NA       NA       NA       NA       NA       NA
##           Sep      Oct      Nov      Dec
## 2012 170.4417 151.3453 156.3783 206.7963
## 2013 175.0611 165.7026 173.4322 216.8979
## 2014 236.9812 218.0010 220.9612 253.1830
## 2015 243.5807 226.4656 225.1428 293.7678
## 2016 273.6415 264.9729 271.2281 306.9521
## 2017 299.2357 276.1388       NA       NA
## 2018       NA       NA
# Run the Arima model:
m_1 <- Arima(qfab_1.ts, order = c(2,0,0), seasonal = list(order=c(1,1,0),period=12),include.constant = TRUE, lambda = 0)
m_1
## Series: qfab_1.ts 
## ARIMA(2,0,0)(1,1,0)[12] with drift 
## Box Cox transformation: lambda= 0 
## 
## Coefficients:
##          ar1     ar2     sar1   drift
##       0.3356  0.1918  -0.4171  0.0085
## s.e.  0.1310  0.1391   0.1200  0.0013
## 
## sigma^2 estimated as 0.005833:  log likelihood=67.71
## AIC=-125.42   AICc=-124.26   BIC=-115.12
coeftest(m_1)
## 
## z test of coefficients:
## 
##         Estimate Std. Error z value  Pr(>|z|)    
## ar1    0.3356360  0.1310070  2.5620 0.0104080 *  
## ar2    0.1917799  0.1391192  1.3785 0.1680398    
## sar1  -0.4170940  0.1199652 -3.4768 0.0005075 ***
## drift  0.0085315  0.0012557  6.7942 1.089e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Create a forecast for the future observations using your last model

forecast_qfab_1<-forecast(m_1, h=24)
autoplot(forecast_qfab_1)
## Warning: Removed 12 row(s) containing missing values (geom_path).

attributes(forecast_qfab_1)
## $names
##  [1] "method"    "model"     "level"     "mean"      "lower"     "upper"    
##  [7] "x"         "series"    "fitted"    "residuals"
## 
## $class
## [1] "forecast"
forecast_qfab_1$mean
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 2017                                                                        
## 2018 649.8982 665.6756 560.0577 389.8095 387.2543 461.0663 379.8071 374.5312
## 2019 719.2199 737.1913 622.4971 433.6641 430.0735 512.1603 421.4283 417.0415
##           Sep      Oct      Nov      Dec
## 2017                   323.4230 379.3843
## 2018 368.3474 343.9315 355.2466 421.1893
## 2019 408.4264 382.6928

5.1 AIC

The AIC (Akaike Information Criterios) is a mathematical method that helps to evaluate how well a model fits the data it was generated from. Its used to compare different possible models and determine which one is the best fit for the data. Calculated from:

~The number of independent variables used to build the model. ~The maximum likelihood estimate of the model (how well the model reproduces the data).