The main purpose of this workshop is to illustrate the foundations of Autoregressive Moving Average (ARMA) models. We will apply the ARMA model to a dataset of sales of real consumer products.
This methodology is part of the Box-Jenkins approach to time series, which provides a set of tools to understand past behaviour and predict future values of a time series. Given a time series Yt, the ARMA model is denoted ARMA(p,q), where p is the number of autoregressive terms and q is the number of moving-average terms.
In general, autoregressive models assume that there is some kind of autocorrelation phenomenon in the time series. Autocorrelation is the correlation of a random variable with its own past and future values. This is the building block behind these models.
In time-series econometrics we can talk about autocorrelation in the dependent variable (also called the endogenous variable). This means that the future values of the economic or financial variable can be explained, at least in part, by its previous values. This is the foundation of what econometricians call an Autoregressive model, also denoted as an AR model. The autoregressive notation of the stochastic model defined by Yt can be described as follows:
$$Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \ldots + \phi_p Y_{t-p} + \epsilon_t$$
where ϕ1, …, ϕp are the parameters of the model and ϵt is the error term or random shock. We can express the same equation more compactly as follows:
$$Y_t = \phi_0 + \sum_{k=1}^{p} \phi_k Y_{t-k} + \epsilon_t$$
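To get some intuition for how an AR process behaves, here is a minimal sketch (not part of the workshop dataset) that simulates an AR(1) series with the built-in arima.sim() function; the coefficient 0.6 is an arbitrary illustrative value:
# Simulate 200 observations of an AR(1): Y_t = 0.6*Y_(t-1) + e_t
set.seed(123)
y_ar1 <- arima.sim(model = list(ar = 0.6), n = 200)
plot(y_ar1, main = "Simulated AR(1) with phi1 = 0.6")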
In order to apply this model we have to show that the series Yt is stationary. We will learn later how to test statistically whether a series can be considered stationary or not. If the series is not stationary, we can calculate its first difference, and most of the time the resulting series will be stationary.
In linear regression models, the method used to estimate the regression coefficients is Ordinary Least Squares (OLS). For ARMA models, the preferred method to estimate the coefficients is Maximum Likelihood (ML). The ML method works by iteration, while OLS has an analytical solution (in other words, a formula).
In this model, we assume that the errors are not correlated with each other. If the errors turn out to be correlated, that means we can include another autoregressive term to obtain a better model.
The AR(p) model is a process with long-term memory. Why? Because a shock today will keep having an impact into the indefinite future. Reading the mathematical equation, this can be interpreted as follows: the value of the series tomorrow depends on the value of today, after applying the filter ϕ1. Then the value of tomorrow already carries information about the value of today, and the value of tomorrow will in turn impact the value of two days from now. Hence the value of today has an impact on the whole future of the series.
However, the impact of today's value diminishes exponentially according to (ϕ1)^N, where N is the number of periods into the future. Since |ϕ1| < 1, the bigger the exponent N (that is, the further into the future we look), the smaller the effect.
If we want to model a series with short-term memory, then we have to think of a different equation:
$$Y_t = \theta_0 + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \ldots + \theta_q \epsilon_{t-q} + \epsilon_t$$
This is called a "Moving-Average" (MA) model with q terms. In this case, what impacts the value of today is not directly its own past values, but the past random shocks. In the case of q=1, the process does not have long memory, since the value of today is affected only by the shock of yesterday.
The more q terms we include in the model, the longer its memory. In most cases, real-world series only need a few q terms.
In the MA(q) model, θ0, … θq are the parameters of the model, ϵt−k refers to past external errors/shocks, and the current external shock is defined as ϵt, so:
$$Y_t = \theta_0 + \epsilon_t + \sum_{k=1}^{q} \theta_k \epsilon_{t-k}$$
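As a quick illustration (again with made-up numbers, not the workshop data), we can simulate an MA(1) process with arima.sim() and see that it has only short-term memory:
# Simulate 200 observations of an MA(1): Y_t = e_t + 0.7*e_(t-1)
set.seed(123)
y_ma1 <- arima.sim(model = list(ma = 0.7), n = 200)
plot(y_ma1, main = "Simulated MA(1) with theta1 = 0.7")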
We can interpret an MA model as a random process where the current value of the series is influenced by the current random shock and by some percentages of the previous q random shocks/errors. The percentages are actually the values of the coefficients θ.
In the world of finance, when we want to model the behaviour of market returns, we know that there are thousands of events that affect how market returns move today. These thousands of events are basically summarized in a "random shock" whose magnitude and direction (positive or negative effect) nobody knows in advance.
Now we can combine both AR(p) terms and MA(q) terms into ONE model, that is called ARMA(p,q):
$$Y_t = \phi_0 + \sum_{k=1}^{p} \phi_k Y_{t-k} + \sum_{k=1}^{q} \theta_k \epsilon_{t-k} + \epsilon_t$$
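A minimal simulated example of this combined model (arbitrary coefficients, for illustration only):
# Simulate an ARMA(1,1): one AR term (long memory) plus one MA term (short memory)
set.seed(123)
y_arma11 <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 200)
plot(y_arma11, main = "Simulated ARMA(1,1)")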
We combine AR and MA terms to model Y when Y exhibits both short-term and long-term memory. The long-term memory is modeled with the AR terms, while the short-term memory is modeled with the MA terms. We will practice with an example to illustrate this theoretical model.
library(readxl)   # read_excel() comes from the readxl package
sales_brands <- read_excel("salesbrands3.xlsx", sheet = "Resultado")
# Always familiarize with the data
head(sales_brands, 5)
# We will start by turning sales_brands into a data frame
# because data frames are easier to manipulate
sales_brands.df <- as.data.frame(sales_brands)
# Divide the index and the core data
fecha <- sales_brands.df$date
sales_brands.df <- sales_brands.df[,-1]
# Join both objects using xts() and assign the resulting object to sales.xts
library(xts)   # xts (which loads zoo) provides the xts() function
sales.xts <- xts(x = sales_brands.df, order.by = fecha)
head(sales.xts, 5)
## qbrand1tons salesbrand1 qbrand2tons salesbrand2 qbrand3tons
## 2015-01-01 106.62 60780.72 195.46 60700.51 293.10
## 2015-02-01 105.82 60411.47 216.09 68289.15 494.46
## 2015-03-01 88.82 53174.89 220.15 62529.54 356.88
## 2015-04-01 86.69 50861.69 170.74 50575.68 291.07
## 2015-05-01 87.23 51014.23 177.17 54327.48 318.09
## salesbrand3 qbrand4tons salesbrand4 qbrand5tons salesbrand5
## 2015-01-01 75196.35 185.91 48228.51 28.94 2295.50
## 2015-02-01 105010.90 149.30 39922.14 24.66 1973.43
## 2015-03-01 90902.84 144.65 39788.22 23.75 1903.24
## 2015-04-01 72853.34 161.43 42692.59 21.00 1684.28
## 2015-05-01 75277.10 141.09 37167.52 23.44 1883.41
## qbrand6tons salesbrand6 tonstotal salestotal
## 2015-01-01 338.38 55561.26 1148.41 302762.8
## 2015-02-01 284.52 51469.74 1274.85 327076.8
## 2015-03-01 251.34 50395.48 1085.59 298694.2
## 2015-04-01 230.29 49124.61 961.22 267792.2
## 2015-05-01 250.45 46906.63 997.47 266576.4
With the xts object, the information is displayed in a way that is easier to read.
This brand has a clear growing trend over time. As we can see, many consumer products have a growing or a declining trend, while other products might look stationary over time, with sales fluctuating around an average level.
A series with a growing or declining trend is non-stationary. Most variables that represent sales of a product can be treated as non-stationary. A rule of thumb when we work with business variables such as volume sales, value sales, cost of goods sold, etc., is therefore to transform the series into a stationary series.
When you work with sales data, I strongly recommend creating the first difference of the logarithm of the variable. This is actually the percentage change of the variable from one period to the next (in continuously compounded terms). If you apply this transformation to real-world sales variables, the result will be stationary most of the time (for roughly 95% of the variables). We can do this transformation as follows:
ln_qbrand1tons <- log(sales.xts$qbrand1tons)
ln_qbrand2tons <- log(sales.xts$qbrand2tons)
ln_qbrand3tons <- log(sales.xts$qbrand3tons)
ln_qbrand4tons <- log(sales.xts$qbrand4tons)
ln_qbrand5tons <- log(sales.xts$qbrand5tons)
ln_qbrand6tons <- log(sales.xts$qbrand6tons)
# Also keep all the logged series together in one xts object; the rest of the
# workshop refers to these columns as ln_sales$qbrand2tons, ln_sales$qbrand1tons, etc.
ln_sales <- log(sales.xts)
The first difference of the log value in R is diff(ln_sales$qbrand2tons). The diff() function generates this first difference on the fly as:
diff(ln_sales$qbrand2tons) = ln_sales$qbrand2tons - lag(ln_sales$qbrand2tons, 1)
Remember that the dollar sign operator ($) is used to access/refer to the columns of a data frame or xts object.
We can see now that the two series look stationary, since their means tend to stay at the same level.
R performs the first-difference calculation temporarily to draw the graph, and the result is not saved in the dataset or the environment. In other words, we do not need to create the first difference as another variable (column), although we can if we need to.
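For example, assuming the ln_sales object created above, we can compute the first difference of the log of brand 2 directly inside plot() without saving it:
# Monthly % change (first difference of the log), computed on the fly for the plot
plot(diff(ln_sales$qbrand2tons), main = "Brand 2: first difference of log tons")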
Another useful transformation for monthly data is the annual percentage change. In this case, this is the difference between the log value of the series and its value 12 periods (months) ago. We can compute this difference using the diff() function with the lag argument set to 12:
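For instance, for brand 2 (the same can be done for any other brand):
# Annual % change: log value minus its value 12 months ago, plotted on the fly
plot(diff(ln_sales$qbrand2tons, lag = 12), main = "Brand 2: annual % change")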
In this case, the series for both brands do not look clearly stationary. Some series might not look stationary to our eyes, so in addition to the visual tools we need a statistical test to decide whether to treat a series as stationary or not. The Dickey-Fuller test is one of the most widely used tests for stationarity; it is called a unit-root test. The Dickey-Fuller test assumes, under the null hypothesis, that the series is non-stationary, in other words, that it follows a random walk with a unit root (ϕ1 = 1). We can run this test in R as follows.
You need to install the package tseries to do the following analysis.
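The original code chunk is not shown, but judging from the data: line of the report below, the test was run with a call along these lines:
library(tseries)
# Augmented Dickey-Fuller test on the annual % change (12-month log difference) of brand 2
adf.test(na.omit(diff(ln_sales$qbrand2tons, lag = 12)), alternative = "stationary")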
##
## Augmented Dickey-Fuller Test
##
## data: na.omit(diff(ln_sales$qbrand2tons, lag = 12))
## Dickey-Fuller = -3.574, Lag order = 3, p-value = 0.04701
## alternative hypothesis: stationary
The adf.test() function from the tseries package performs the simple Dickey-Fuller test if we set the number of lags k equal to 0; more generally, k is the number of lagged differences included in the augmented test regression. Here we use the default value of k. Use ?adf.test to read about this function.
This is another way to do the same Dickey Fuller test:
s12.lnbrand2tons <- na.omit(diff(ln_sales$qbrand2tons, lag = 12))
adf.test(s12.lnbrand2tons, alternative = "stationary")
##
## Augmented Dickey-Fuller Test
##
## data: s12.lnbrand2tons
## Dickey-Fuller = -3.574, Lag order = 3, p-value = 0.04701
## alternative hypothesis: stationary
The Dickey-Fuller test reports a p-value less than 0.05, so we can reject the null hypothesis of a unit root (ϕ1 = 1). In other words, there is statistical evidence to conclude that the differenced series s12.lnbrand2tons is stationary.
If we try the same for brand 6 we will find that its seasonal difference (s12.lnbrand6tons) is also stationary. Try this by yourself.
For monthly time-series variables like sales of a product or sales of a company, I strongly recommend working with the seasonal difference of the log of sales (such as s12.lnbrand2tons), which is the annual percentage change month by month. The reason is that most consumer products have a seasonal pattern. We can do this ONLY if the seasonal difference of the log is STATIONARY.
For these 2 products, then, I will use the seasonal difference of the log of sales.
In time series analysis, a correlogram, also known as an autocorrelation plot, is a chart of correlation statistics: a plot of the sample autocorrelations of the dependent variable Yt against the time lags. In R, the command to compute the autocorrelations is acf2() from the astsa package, which gives you both the autocorrelation (AC) and the partial autocorrelation (PAC) figures. Let's examine the autocorrelations of brand 2.
You need to install the astsa package to do auto-correlation plots.
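The chunk that produced the values and plot below is not shown; a call like the following, using the seasonal difference created earlier, reproduces it:
library(astsa)
# ACF and PACF of the annual % change of brand 2 sales volume
acf2(s12.lnbrand2tons)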
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## ACF 0.59 0.51 0.35 0.21 0.05 -0.04 -0.13 -0.09
## PACF 0.59 0.26 -0.04 -0.11 -0.15 -0.06 -0.05 0.11
The acf2() function gives you the autocorrelogram, which is the autocorrelations in a graph where it is easy to identify which lags of the variable have a significant autocorrelation.
The X axis represents the lag number, while the Y axis is the level of autocorrelation. The vertical lines are the autocorrelations between the current value of Y (Yt) and its own lags. The dotted blue line delimits the 95% confidence interval for the autocorrelations: if an autocorrelation goes outside the 95% C.I., then the autocorrelation of Yt with the corresponding lag (for example AutoCorr(Yt, Yt−1)) is statistically different from zero. We can see that only lag 1 (Yt−1) and lag 2 (Yt−2) are significantly (and positively) autocorrelated with the current value of the variable, since these autocorrelations are significantly greater than zero.
What is the meaning of the first significant lag? In this case, it means that the value of Y at any point in time t has an average autocorrelation equal to ϕ1 (if the model ends up being an AR(1)) with its own value at t−1. The autocorrelation of Yt with its previous value Yt−1 can be calculated manually as a correlation, which is equal to the covariance divided by the product of the standard deviations. Recall the formula for the correlation between two variables Y and X:
$$CORR(Y,X) = \frac{COV(Y,X)}{SD(Y) \cdot SD(X)}$$
In the case of "autocorrelation" we only have one variable Y, so we are looking at the correlation of Yt with its own previous value Yt−1. Then, applying the correlation formula between Yt and Yt−1 (instead of X), we can calculate this autocorrelation as:
$$AUTOCORR(Y_t, Y_{t-1}) = \frac{AUTOCOV(Y_t, Y_{t-1})}{SD(Y_t) \cdot SD(Y_{t-1})}$$
Since Y is supposed to be a stationary series, the standard deviation of Yt has to be the same as the standard deviation of Yt−1. Then:
$$AUTOCORR(Y_t, Y_{t-1}) = \frac{AUTOCOV(Y_t, Y_{t-1})}{SD(Y_t) \cdot SD(Y_{t-1})} = \frac{\gamma(1)}{\sigma_y^2}$$
Fortunately, R calculates all the autocorrelations of Yt with its own lags, and these autocorrelations are plotted by the acf2() command (as seen above).
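To check this by hand (a quick sketch using the s12.lnbrand2tons series defined above), we can compute the ordinary correlation between the series and its own first lag; it should be close to the first ACF value reported by acf2() (about 0.59), with small differences due to how acf() scales the covariances:
# Manual lag-1 autocorrelation: correlation between Y_t and Y_(t-1)
y <- as.numeric(s12.lnbrand2tons)
n <- length(y)
cor(y[2:n], y[1:(n - 1)])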
Unlike the AC plot, the partial autocorrelation (PAC) plot shows the autocorrelation of each lag with the variable after removing the effect of the lower-order autocorrelations. The autocorrelation and the partial autocorrelation between Yt and Yt−1 are the same, since there are no lower-order autocorrelations. The autocorrelation and the partial autocorrelation between Yt and Yt−2, however, will be different, since the partial autocorrelation is calculated AFTER the autocorrelation between Yt and Yt−1 is taken into consideration.
In other words, the PAC plot shows the amount of autocorrelation between Yt and its lag k (Yt−k) that is not explained by the lower-order autocorrelations (the autocorrelations of Yt with Yt−1, Yt−2, and so on up to Yt−(k−1)).
In the AC plot, the lag-1 and lag-2 autocorrelations are positive and significant, the autocorrelations of the following lags are also positive but not significant, and the autocorrelation diminishes with each additional lag. In the partial autocorrelation (PAC) plot, ONLY the lag-1 autocorrelation is significant and positive, and the remaining partial autocorrelations drop close to zero immediately. This is a classic AR(1) signature. We do not need to include lag 2, since the PAC indicates including only lag 1.
The PAC plot therefore helps us identify the number of AR terms. Looking at the partial autocorrelation plot, only lag 1 remains significantly autocorrelated with the current value; once this lag-1 effect is accounted for, the higher lags (2, 3, 4, etc.) add no significant additional autocorrelation. In other words, the annual % change in sales of the current month seems to be positively correlated with the annual % change in sales 1 month ago.
A first-order autoregressive model using only lag 1 can be expressed as follows for a stochastic process Yt:
$$Y_t = \phi_0 + \phi_1 Y_{t-1} + \epsilon_t$$
The process is stationary only if $|\phi_1| < 1$.
We can run an AR model in R for brand 2 to test whether the absolute value of ϕ1 is lower than 1. In R, the same kind of function can estimate AR, MA, ARMA and ARIMA models; the astsa package offers sarima(), and the forecast package offers Arima(), which is the function whose output is shown below. Of course, the arguments have to be adjusted to each model. So, let's run an AR model with lag 1 for brand 2.
You need to install the forecast package.
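The estimation output shown below comes from an AR(1) fitted to s12.lnbrand2tons. The original chunk is not displayed; assuming the model object is called m1 (the name used later with forecast()), a call like this reproduces it:
library(forecast)
# Fit an AR(1) = ARIMA(1,0,0) with a non-zero mean to the annual % change of brand 2
m1 <- Arima(s12.lnbrand2tons, order = c(1, 0, 0), include.mean = TRUE)
summary(m1)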
## Series: s12.lnbrand2tons
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 mean
## 0.6266 0.0515
## s.e. 0.1236 0.0355
##
## sigma^2 estimated as 0.008241: log likelihood=41.95
## AIC=-77.9 AICc=-77.27 BIC=-72.69
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.003930423 0.08859139 0.06879971 -52.73237 144.4614 0.7041704
## ACF1
## Training set -0.2125151
According to the results, the coefficient ϕ1 is positive, its absolute value is less than one, and it is statistically significant: the estimate (0.6266) is more than twice its standard error (0.1236), so its p-value is below 0.05. We can therefore say that the autocorrelation is positive at the 95% confidence level.
We can write the equation of the first-order model for brand 2 as follows. Note that the 0.0515 reported above is the estimated mean of the series, so the intercept of the AR(1) equation is ϕ0 = 0.0515 × (1 − 0.6266) ≈ 0.0192:
s12.lnbrand2tons(t) = 0.0192 + 0.6266 * s12.lnbrand2tons(t-1) + error(t)
Now we can use this model to make predictions. To forecast 12 months ahead, we use the forecast() function.
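Assuming the m1 object fitted above, the call is simply:
# Forecast the next 12 months of the annual % change of brand 2
forecast(m1, h = 12)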
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 3628801 0.02233683 -0.09400135 0.1386750 -0.1555871 0.2002607
## 3715201 0.03321748 -0.10407454 0.1705095 -0.1767526 0.2431875
## 3801601 0.04003561 -0.10465689 0.1847281 -0.1812525 0.2613237
## 3888001 0.04430804 -0.10318887 0.1918049 -0.1812690 0.2698851
## 3974401 0.04698526 -0.10159836 0.1955689 -0.1802538 0.2742243
## 4060801 0.04866289 -0.10034527 0.1976711 -0.1792254 0.2765512
## 4147201 0.04971414 -0.09946040 0.1988887 -0.1784286 0.2778569
## 4233601 0.05037288 -0.09886693 0.1996127 -0.1778697 0.2786155
## 4320001 0.05078567 -0.09847977 0.2000511 -0.1774961 0.2790675
## 4406401 0.05104433 -0.09823117 0.2003198 -0.1772528 0.2793415
## 4492801 0.05120642 -0.09807303 0.2004859 -0.1770968 0.2795096
## 4579201 0.05130799 -0.09797302 0.2005890 -0.1769976 0.2796136
Since the prediction is in natural log and is a seasonal difference, I have to come back to the original scale, so I need to create the forecast variable in tons, not in log of tons:
# Create a Date object with the dates of the forecast periods
date_index <- seq.Date(from=as.Date("2019-07-01"), by="months", length.out = 12)
# Construct an xts object with the prediction and date_index
hat_s12.lnbrand2tons <- xts(forecast(m1, h = 12)$mean, order.by = date_index)
# Merge original ln of tons with seasonal difference prediction
aux <- merge(ln_sales$qbrand2tons, hat_s12.lnbrand2tons)
# Obtain prediction of ln of tons
aux$hat_lnbrand2tons <- lag(aux$qbrand2tons, 12) + aux$hat_s12.lnbrand2tons
# Merge original with predicted ln of tons
lnbrand2tons<- merge(ln_sales$qbrand2tons, aux$hat_lnbrand2tons)
# Return to original measure
hat_brand2tons <- exp(lnbrand2tons)
plot(hat_brand2tons)
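We now repeat the whole process for brand 1. The Dickey-Fuller output below (and the test on the saved variable s11.lnbrand1tons that follows it) was produced by a call of this form, reconstructed from the data: line of the report:
# ADF test on the annual % change (12-month log difference) of brand 1 sales volume
adf.test(na.omit(diff(ln_sales$qbrand1tons, lag = 12)), alternative = "stationary")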
## Augmented Dickey-Fuller Test
##
## data: na.omit(diff(ln_sales$qbrand1tons, lag = 12))
## Dickey-Fuller = -4.0415, Lag order = 3, p-value = 0.01757
## alternative hypothesis: stationary
s11.lnbrand1tons <- na.omit(diff(ln_sales$qbrand1tons, lag = 12))
adf.test(s11.lnbrand1tons, alternative = "stationary")
##
## Augmented Dickey-Fuller Test
##
## data: s11.lnbrand1tons
## Dickey-Fuller = -4.0415, Lag order = 3, p-value = 0.01757
## alternative hypothesis: stationary
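The ACF and PACF values below correspond to the correlogram of s11.lnbrand1tons, presumably obtained with acf2() as we did for brand 2:
# ACF and PACF of the annual % change of brand 1 sales volume
acf2(s11.lnbrand1tons)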
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## ACF 0.63 0.48 0.39 0.20 0.22 0.31 0.38 0.34
## PACF 0.63 0.15 0.06 -0.16 0.16 0.24 0.20 -0.10
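The AR(1) estimation below for brand 1 can be reproduced with a call like the following; m2 is a name introduced here for the brand-1 model (the original chunk does not show it):
# Fit an AR(1) with a non-zero mean to the annual % change of brand 1
m2 <- Arima(s11.lnbrand1tons, order = c(1, 0, 0), include.mean = TRUE)
summary(m2)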
## Series: s11.lnbrand1tons
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 mean
## 0.6330 0.0717
## s.e. 0.1178 0.0244
##
## sigma^2 estimated as 0.003824: log likelihood=58.07
## AIC=-110.14 AICc=-109.51 BIC=-104.92
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.001543299 0.06034798 0.04710515 -113.7927 194.7673 0.5835665
## ACF1
## Training set -0.1170992
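The 12-month forecast below comes from the brand-1 model:
# Forecast the next 12 months of the annual % change of brand 1
forecast(m2, h = 12)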
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 3628801 0.03454252 -0.04470642 0.1137915 -0.08665828 0.1557433
## 3715201 0.04818881 -0.04560119 0.1419788 -0.09525062 0.1916282
## 3801601 0.05682637 -0.04219201 0.1558448 -0.09460918 0.2082619
## 3888001 0.06229360 -0.03874359 0.1633308 -0.09222945 0.2168167
## 3974401 0.06575415 -0.03608063 0.1675889 -0.08998871 0.2214970
## 4060801 0.06794453 -0.03420804 0.1700971 -0.08828435 0.2241734
## 4147201 0.06933096 -0.03294866 0.1716106 -0.08709222 0.2257541
## 4233601 0.07020851 -0.03212196 0.1725390 -0.08629244 0.2267095
## 4320001 0.07076397 -0.03158687 0.1731148 -0.08576813 0.2272961
## 4406401 0.07111555 -0.03124345 0.1734745 -0.08542903 0.2276601
## 4492801 0.07133809 -0.03102418 0.1737004 -0.08521149 0.2278877
## 4579201 0.07147894 -0.03088463 0.1738425 -0.08507264 0.2280305
# Create a Date object with the dates of the forecast periods
date1_index <- seq.Date(from=as.Date("2019-07-01"), by="months", length.out = 12)
# Construct an xts object with the prediction and date1_index
hat_s11.lnbrand1tons <- xts(forecast(m2, h = 12)$mean, order.by = date1_index)
# Merge original ln of tons with seasonal difference prediction
aux <- merge(ln_sales$qbrand1tons, hat_s11.lnbrand1tons)
# Obtain prediction of ln of tons
aux$hat_lnbrand1tons <- lag(aux$qbrand1tons, 12) + aux$hat_s11.lnbrand1tons
# Merge original with predicted ln of tons
lnbrand1tons<- merge(ln_sales$qbrand1tons, aux$hat_lnbrand1tons)
# Return to original measure
hat_brand1tons <- exp(lnbrand1tons)
plot(hat_brand1tons)