1 Forecasting consumer demand with an ARIMAX model

Assume that this industry is an oligopoly where there are few main players in the market that account for most of the market share. In an oligopoly, it is very common that price elasticity of demand is high and sensitive. In other words, when a player decreases the price of a product, it is very likely that its demand will significantly increase. For example, for a price elasticity of -2, if a company decides to reduce the price by 5% in one month, it is expected that on average the demand will increase by about 10% on average.

It is also important to examine not only how the demand of one product changes with respect to its own price, but also how its demand changes with respect to the changes in prices of other products. The concept of cross-elasticity refers to the change of demand of one product with respect to the change in price of another product. If the cross-elasticity coefficient between the demand of product 1 and price of product 2 is about 1.5, this is a sign of competition between both products: when the price of product 2 increases in about 1%, then the demand of product 1 increases in about 1.5%. This price sensitivity might be due to competition between products. If you get a negative cross elasticity coefficient, then this might be a sign of complement between the products.

The concept of price elasticity is simple, but in reality it is not easy to get a good estimate since there are other factors that can influence the demand of a product. For example, it is common that the sales of products in units are seasonal, so the increase in demand might be due to seasonaly more than price elasticity. Another common driver of consumer demand is the organic trend due to either distribution or consumer acceptance. So, we need an econometric model that can separate these drivers and be able to measure price elasticities using a robust method.

Let’s see how we can estimate price elasticities, and how these elasticities along with other main factors that affect demand can help us to do a good demand forecast for products.

For this exercise we will use a sample dataset of main consumer products from main competitors of a category in a country.This dataset has historical sales of the main 7 products of a category, which account for 70% of the share in the category. This dataset was created using real data but changing the name of products. For each product you have sales in value ($) and sales in volume (kgs).

Download the dataset from: http://www.apradie.com/ec2004/salesfabs.xlsx

#I set the working directory:

library(readxl)
library(dplyr)
library(xts)
library(zoo)
library(tseries)
library(forecast)
library(lmtest)
library(astsa)

# Download the excel file from a web site:
dataset <- download.file("http://www.apradie.com/ec2004/salesfabs.xlsx",
              "dataw6.xlsx", mode="wb")
# The first parameter is the link and the second is a name for the
#  local file

# Use the function read_excel()
dataset <- read_excel("dataw6.xlsx")

The columns that start with qfab are related to sales in volume, and vfab refer to sales in value ($). This dataset has historical monthly sales data for 7 consumer products (qfab1, qfab2, etc).

We can create 2 datasets, one for sales in volume and another for sales in value. This will help to easily create another dataset with prices.

salesvol <- dataset %>% select(contains("qfab"))
salesval <- dataset %>% select(contains("vfab"))

Now we can create another dataset for prices. We just divide sales in value by sales in volume to get average historical monthly prices:

# I construct a date variable as a sequence of dates using the month column:
date <- seq(as.Date(dataset$month[1]),
                      by="month",
                      length.out = nrow(dataset) )
# I delete the first column since it has the month, and I will construct
#   an xts dataset with the index equal to the month

# I create the xts object:
dataset.xts<- xts(x=dataset[,-1], frequency = "monthly", order.by=date)

prices <- salesval / salesvol
names(prices)<-c("pfab1","pfab2","pfab3","pfab4","pfab5","pfab6","pfab7")

qfabs <- ts(coredata(salesvol),start=c(2014,12),frequency=12)
pfabs <- ts(coredata(prices),start=c(2014,12),frequency=12)

2 Q Challenge - Calibrate an ARIMAX model

You have to calibrate an ARIMA model for product 4 following the method we learned here. You have to show and explain your steps, and also you have to INTERPRET your final model, and do a prediction for the next 15 months.

We will model the annual % change in demand of product 4 (month by month). We will assume that this product might be competing with another product or might be a complement of other consumer products in the dataset. Then, for this ARIMAX model we have to include the annual % price change of ALL products as independent/explanatory variables.

But before setting up this ARIMAX model, we need to calibrate an ARIMA/SARIMA model for each price of the products in the dataset. In this case, I will provide you with the calibration of the ARIMA/SARIMA models for all prices of the products. I calibrated these models according to what we learned in previous workshops.

We will use the Arima function with a specific calibration for each price:

# Product price 1:
mprice1 <- Arima(pfabs[,"pfab1"], order = c(1,0,0), 
            seasonal = list(order=c(0,1,0),period=12),
            include.constant = TRUE,
            lambda = 0) 

# I check the pvalues of the coefficients:
coeftest(mprice1)

## 
## z test of coefficients:
## 
##         Estimate Std. Error z value Pr(>|z|)    
## ar1   0.59807762 0.10544181  5.6721 1.41e-08 ***
## drift 0.00508341 0.00063234  8.0391 9.05e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# I create and plot the price forecast
forecastp1 <- forecast(mprice1, h=15)
autoplot(forecastp1)

# Product price 2:
mprice2 <- Arima(pfabs[,"pfab2"], order = c(1,0,0), 
            seasonal = list(order=c(0,1,0),period=12),
            include.constant = TRUE,
            lambda = 0) 

# I check the pvalues of the coefficients:
coeftest(mprice2)

## 
## z test of coefficients:
## 
##        Estimate Std. Error z value  Pr(>|z|)    
## ar1   0.6480305  0.0979253  6.6176 3.651e-11 ***
## drift 0.0053835  0.0010509  5.1228 3.010e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# I create and plot the price forecast
forecastp2 <- forecast(mprice2, h=15)
autoplot(forecastp2)

# Product price 3:
mprice3 <- Arima(pfabs[,"pfab3"], order = c(1,0,0), 
            seasonal = list(order=c(0,1,1),period=12),
            include.constant = TRUE,
            lambda = 0) 

# I check the pvalues of the coefficients:
coeftest(mprice3)

## 
## z test of coefficients:
## 
##          Estimate  Std. Error z value  Pr(>|z|)    
## ar1    0.63551916  0.10995083  5.7800 7.469e-09 ***
## sma1  -0.22374081  0.19500882 -1.1473  0.251242    
## drift  0.00217850  0.00072782  2.9932  0.002761 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# I create and plot the price forecast
forecastp3 <- forecast(mprice3, h=15)
autoplot(forecastp3)

# Product price 4:
mprice4 <- Arima(pfabs[,"pfab4"], order = c(1,0,0), 
            seasonal = list(order=c(0,1,1),period=12),
            include.constant = TRUE,
            lambda = 0) 

# I check the pvalues of the coefficients:
coeftest(mprice4)

## 
## z test of coefficients:
## 
##          Estimate  Std. Error z value  Pr(>|z|)    
## ar1    0.66101380  0.09695434  6.8178 9.246e-12 ***
## sma1  -0.56239455  0.16024533 -3.5096 0.0004488 ***
## drift  0.00421852  0.00042551  9.9141 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# I create and plot the price forecast
forecastp4 <- forecast(mprice4, h=15)
autoplot(forecastp4)

# Product price 5:
mprice5 <- Arima(pfabs[,"pfab5"], order = c(1,0,0), 
            seasonal = list(order=c(0,1,0),period=12),
            include.constant = TRUE,
            lambda = 0) 

# I check the pvalues of the coefficients:
coeftest(mprice5)

## 
## z test of coefficients:
## 
##         Estimate Std. Error z value  Pr(>|z|)    
## ar1   0.73931832 0.09285911  7.9617 1.697e-15 ***
## drift 0.00556178 0.00095356  5.8326 5.456e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# I create and plot the price forecast
forecastp5 <- forecast(mprice5, h=15)
autoplot(forecastp5)

# Product price 6:
mprice6 <- Arima(pfabs[,"pfab6"], order = c(1,0,0), 
            seasonal = list(order=c(1,1,0),period=12),
            include.constant = TRUE,
            lambda = 0) 

# I check the pvalues of the coefficients:
coeftest(mprice6)

## 
## z test of coefficients:
## 
##         Estimate Std. Error z value  Pr(>|z|)    
## ar1    0.8984396  0.0640989 14.0165 < 2.2e-16 ***
## sar1  -0.3927082  0.1315205 -2.9859  0.002827 ** 
## drift  0.0044928  0.0017715  2.5361  0.011208 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# I create and plot the price forecast
forecastp6 <- forecast(mprice6, h=15)
autoplot(forecastp6)

# Product price 7:
mprice7 <- Arima(pfabs[,"pfab7"], order = c(1,0,0), 
            seasonal = list(order=c(0,1,0),period=12),
            include.constant = TRUE,
            lambda = 0) 

# I check the pvalues of the coefficients:
coeftest(mprice7)

## 
## z test of coefficients:
## 
##        Estimate Std. Error z value Pr(>|z|)    
## ar1   0.7346684  0.0865368  8.4897   <2e-16 ***
## drift 0.0010131  0.0021806  0.4646   0.6422    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# I create and plot the price forecast
forecastp7 <- forecast(mprice7, h=15)
autoplot(forecastp7)

Now that we have a forecast for prices for each product, we can design an ARIMAX model for the demand of product 4 (qfab4):

The historical data of qfab4 is:

qfabs[, "qfab4"]

##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2014                                                                      
## 2015  602.1692  616.7962  576.9154  552.2004  583.3495  530.5445  544.4641
## 2016  682.1081  707.9934  655.1503  625.7916  617.9704  583.9963  631.9970
## 2017  812.1743  848.8255  817.2468  745.7488  734.3957  654.5958  684.2820
## 2018  801.1178  898.1721  779.3245  759.8208  787.5706  776.4568  773.0577
## 2019  938.2758 1114.3176  883.4799  815.0494  867.1072  849.1447  765.6616
## 2020  986.4335 1071.5877  884.9235  876.3864  872.0506  843.9563  841.7164
##            Aug       Sep       Oct       Nov       Dec
## 2014                                          604.8942
## 2015  555.6245  532.7308  562.9418  611.9219  735.2260
## 2016  627.0950  656.2282  654.0440  673.1533  759.4932
## 2017  659.1869  701.4818  710.1428  769.3757  917.9404
## 2018  759.5814  766.0799  767.7176  892.9676 1009.6267
## 2019  782.4975  778.8295  810.1746  920.9684 1073.6561
## 2020  826.6552  807.3565

I start calibrating an ARIMA/SARIMA model for the demand of product 4 (qfab4). This this is the demand of a consumer product, I start checking whether the annual % growth is stationary:

s12.lnfab4 <- diff(log(qfabs[,"qfab4"]),lag=12)
plot(s12.lnfab4)

Now I run the Dicky-Fuller test to check whether the seasonal difference of the log of product 4 demand is stationary:

adf.test(s12.lnfab4,k=0)

## Warning in adf.test(s12.lnfab4, k = 0): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  s12.lnfab4
## Dickey-Fuller = -4.926, Lag order = 0, p-value = 0.01
## alternative hypothesis: stationary

Since p-value<0.05, then we can treat the series as stationary, so I set the D=1. In other words, I will continue with the seasonal difference of the log of the series (that is the annual cc % change of demand 4) to do the ARIMA-SARIMA model.

Next, I need to do the ACF and the PACF plots to examine the autocorrelations between the annual % change of product 4 demand and its own lags:

acf2(s12.lnfab4,max.lag=12)

##      [,1] [,2]  [,3] [,4]  [,5] [,6] [,7]  [,8]  [,9] [,10] [,11] [,12]
## ACF  0.53 0.35  0.19 0.13  0.06 0.29 0.34  0.19  0.09 -0.14 -0.19 -0.33
## PACF 0.53 0.10 -0.03 0.02 -0.02 0.35 0.13 -0.20 -0.07 -0.29  0.03 -0.29

I see a classical AR signature with p=1 since the ACF shows a slow decay in the autocorrelations, while the PACF shows a fast decay after autocorrelation with lag 2.

Then I start including an AR(1) term to the model:

m1qfab4 <- Arima(qfabs[,"qfab4"], order = c(1,0,0), 
            seasonal = list(order=c(0,1,0),period=12),
            include.constant = TRUE,
            lambda = 0) 
coeftest(m1qfab4)

## 
## z test of coefficients:
## 
##        Estimate Std. Error z value  Pr(>|z|)    
## ar1   0.5471474  0.1116303  4.9014 9.514e-07 ***
## drift 0.0078296  0.0013094  5.9797 2.236e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I check whether the model residuals is a white noise:

acf2(m1qfab4$residuals, max.lag=24)

##       [,1] [,2]  [,3] [,4]  [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF  -0.06 0.09 -0.02 0.05 -0.18 0.21 0.26 0.04 0.13 -0.17  0.03 -0.36  0.01
## PACF -0.06 0.09 -0.01 0.04 -0.17 0.20 0.33 0.04 0.10 -0.25  0.05 -0.34 -0.20
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF   0.13  0.14 -0.06  0.15 -0.09 -0.15  0.04 -0.01  0.15  0.10  0.08
## PACF  0.19  0.07  0.05  0.15  0.04  0.18  0.01 -0.11  0.04 -0.04 -0.11

The autocorrelation with lag 12 is negative and significant in both plots, so I add a seasonal MA term:

m2qfab4 <- Arima(qfabs[,"qfab4"], order = c(1,0,0), 
            seasonal = list(order=c(0,1,1),period=12),
            include.constant = TRUE,
            lambda = 0) 
coeftest(m2qfab4)

## 
## z test of coefficients:
## 
##          Estimate  Std. Error z value  Pr(>|z|)    
## ar1    0.57750589  0.11162316  5.1737 2.295e-07 ***
## sma1  -0.48653698  0.15099391 -3.2222  0.001272 ** 
## drift  0.00783918  0.00080503  9.7377 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I check whether the model 2 residuals is a stationary series:

acf2(m2qfab4$residuals,max.lag = 12)

##       [,1] [,2] [,3] [,4]  [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## ACF  -0.11 0.12 0.05 0.05 -0.09 0.28 0.17 0.05 0.10 -0.12  0.03 -0.03
## PACF -0.11 0.11 0.07 0.05 -0.10 0.25 0.26 0.06 0.05 -0.19  0.00 -0.05

Now the lag 6 autocorrelation is positive and significant in both plots. We can think in including either an AR(6) or an MA(6). In these cases, we can look at the ACF plot. For an AR(6) process with positive autocorrelation, we would see that the lag 6, lag 12, lag 18, lag 24 autocorrelations decay slowly. We can expand the ACF plot to 24 lags:

acf2(m2qfab4$residuals,max.lag = 24)

##       [,1] [,2] [,3] [,4]  [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF  -0.11 0.12 0.05 0.05 -0.09 0.28 0.17 0.05 0.10 -0.12  0.03 -0.03 -0.07
## PACF -0.11 0.11 0.07 0.05 -0.10 0.25 0.26 0.06 0.05 -0.19  0.00 -0.05 -0.22
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF   0.19  0.05 -0.08  0.13 -0.08 -0.11  0.05  0.00  0.03  0.08  0.03
## PACF  0.13  0.04 -0.04  0.19 -0.07 -0.02 -0.01 -0.13  0.11 -0.03  0.05

Looking at the ACF plot, lag 6, lag 12, lag 18 autocorrelations does not show a slow decay of autocorrelations, so this might be a sign of an MA(6) process only the lag 6 autocorrelation is positive and significant, but there is a no slow decay in the following lag 12, lag 18 and lag 24 autocorrelations.

We could also run both models, with an AR(6) and an MA(6) and keep the model with the more negative Akaike Information Criteria.

I add an MA(6) term. For now, I need to add all MA terms from 1 to 6:

m3qfab4 <- Arima(qfabs[,"qfab4"], order = c(1,0,6), 
            seasonal = list(order=c(0,1,1),period=12),
            include.constant = TRUE,
            lambda = 0) 
coeftest(m3qfab4)

## 
## z test of coefficients:
## 
##         Estimate Std. Error z value  Pr(>|z|)    
## ar1    0.7726986  0.1334643  5.7896 7.057e-09 ***
## ma1   -0.3939526  0.1651325 -2.3857 0.0170478 *  
## ma2    0.0580795  0.1304586  0.4452 0.6561787    
## ma3   -0.1134056  0.1219631 -0.9298 0.3524563    
## ma4    0.1960571  0.1360630  1.4409 0.1496050    
## ma5   -0.2320074  0.1477422 -1.5704 0.1163331    
## ma6    0.5463719  0.1521144  3.5918 0.0003283 ***
## sma1  -0.4183691  0.1735053 -2.4113 0.0158968 *  
## drift  0.0078121  0.0013491  5.7905 7.016e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Unfortunately the terms for MA 1, 2, 3, 4 and 5 were also included. Curiously, the MA 1 term was significant, so I can try to keep it in the model. I can indicate to the Arima model that we DO NOT want to include the MA terms that are not significant by using the option fixed. The coefficients of the model that are show above are in the following order:

ar1, ma1, ma2, ma3, ma4, ma5, ma6, sma1, drift

In the fixed option we set a vector with zeros to the terms that we DO NOT want to include:

m3qfab4 <- Arima(qfabs[,"qfab4"], order = c(1,0,6), 
            seasonal = list(order=c(0,1,1),period=12),
            fixed=c(NA,NA,0,0,0,0,NA,NA,NA),
            include.constant = TRUE,
            lambda = 0) 
#In the fixed option I set a vector with the following values:

#fixed =c(NA,NA,0,0,0,0,NA,NA,NA)

#Since I do not want to include the terms ma2, ma3, ma4 and ma5, then I set
#to zero the values 3, 4, 5, and 6, and NA to the rest. 

coeftest(m3qfab4)

## 
## z test of coefficients:
## 
##         Estimate Std. Error z value  Pr(>|z|)    
## ar1    0.7976025  0.1173969  6.7941 1.090e-11 ***
## ma1   -0.4317473  0.1517493 -2.8451  0.004439 ** 
## ma6    0.4233557  0.1549114  2.7329  0.006278 ** 
## sma1  -0.4284858  0.1654629 -2.5896  0.009608 ** 
## drift  0.0078171  0.0014167  5.5178 3.433e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I see that the mr6 term is positive and significant as expected. Also, I see that the sum of the AR and MA terms is less than 1 (0.797 + -0.431 + 0.423), that is an important condition for the calibration of an ARIMA model.

I check whether the model residuals is a stationary series:

acf2(m3qfab4$residuals,max.lag = 12)

##      [,1] [,2]  [,3] [,4]  [,5]  [,6] [,7]  [,8] [,9] [,10] [,11] [,12]
## ACF  0.02 0.03 -0.02    0 -0.09 -0.02 0.12 -0.02 0.07 -0.15 -0.04 -0.06
## PACF 0.02 0.03 -0.02    0 -0.09 -0.01 0.13 -0.03 0.07 -0.16 -0.04 -0.03

Now I see that the residuals look like a stationary series, so I stop here and use this model to start calibrating the ARIMAX model by including the price variables:

# I generate the log of the product prices since the lamda =0 (log 
#   transformation) does not apply to the independent variables 
lnpfabs <- log(pfabs)

m1qfab4x <- Arima(qfabs[,"qfab4"], order = c(1,0,6), 
            seasonal = list(order=c(0,1,1),period=12),
            fixed=c(NA,NA,0,0,0,0,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),
            include.constant = TRUE,
            xreg = lnpfabs,
            lambda = 0) 
coeftest(m1qfab4x)

## 
## z test of coefficients:
## 
##          Estimate  Std. Error z value  Pr(>|z|)    
## ar1    0.70581474  0.15501660  4.5532 5.285e-06 ***
## ma1   -0.42741487  0.15671674 -2.7273  0.006385 ** 
## ma6    0.39044162  0.14697393  2.6565  0.007895 ** 
## sma1  -0.73833927  0.27309231 -2.7036  0.006859 ** 
## drift  0.01274847  0.00061499 20.7295 < 2.2e-16 ***
## pfab1 -0.19275941  0.15904008 -1.2120  0.225506    
## pfab2 -0.14462470  0.14818485 -0.9760  0.329077    
## pfab3  0.03983060  0.12635832  0.3152  0.752595    
## pfab4 -1.14086373  0.21974575 -5.1917 2.083e-07 ***
## pfab5 -0.00981323  0.19476756 -0.0504  0.959816    
## pfab6  0.42601386  0.16178352  2.6332  0.008458 ** 
## pfab7 -0.20312498  0.09615632 -2.1124  0.034648 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now I start dropping independent variables one by one, starting with the variable with the highest p-value (the less significant).

I dropped pfab5. I can drop it using the fixed option and setting to zero the cell that is in the position of pfab5, or just deleting pfab5 from the independent variables. I do the first option:

# I will write the fixed vector and the corresponding variable in the model:
#c(NA, 0 ,  0,  0,  0, NA, NA,    NA,   NA,   NA,   NA,   NA,   NA,   NA, NA)
# ar1,ar2,ar3,ar4,ar5,ar6,sma1,drift,pfab1,pfab2,pfab3,pfab4,pfab5,pfab6,pfab7   
mqfab4x <- Arima(qfabs[,"qfab4"], order = c(1,0,6), 
            seasonal = list(order=c(0,1,1),period=12),
            fixed=c(NA,NA,0,0,0,0,NA,NA,NA,NA,NA,NA,NA,0,NA,NA),
            include.constant = TRUE,
            xreg = lnpfabs,
            lambda = 0) 
coeftest(mqfab4x)

## 
## z test of coefficients:
## 
##         Estimate Std. Error z value  Pr(>|z|)    
## ar1    0.7063496  0.1553820  4.5459 5.470e-06 ***
## ma1   -0.4289063  0.1535095 -2.7940  0.005206 ** 
## ma6    0.3918765  0.1500680  2.6113  0.009019 ** 
## sma1  -0.7375099  0.2735282 -2.6963  0.007012 ** 
## drift  0.0127051  0.0020026  6.3441 2.237e-10 ***
## pfab1 -0.1940457  0.2028249 -0.9567  0.338711    
## pfab2 -0.1452253  0.1594926 -0.9105  0.362535    
## pfab3  0.0411864  0.1288014  0.3198  0.749145    
## pfab4 -1.1434493  0.2375941 -4.8126 1.490e-06 ***
## pfab6  0.4272293  0.1787154  2.3906  0.016823 *  
## pfab7 -0.2041541  0.0959618 -2.1275  0.033383 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now I drop pfab3:

mqfab4x <- Arima(qfabs[,"qfab4"], order = c(1,0,6), 
            seasonal = list(order=c(0,1,1),period=12),
            fixed=c(NA,NA,0,0,0,0,NA,NA,NA,NA,NA,0,NA,0,NA,NA),
            include.constant = TRUE,
            xreg = lnpfabs,
            lambda = 0) 
coeftest(mqfab4x)

## 
## z test of coefficients:
## 
##         Estimate Std. Error z value  Pr(>|z|)    
## ar1    0.7112106  0.1532052  4.6422 3.447e-06 ***
## ma1   -0.4311787  0.1533635 -2.8115  0.004931 ** 
## ma6    0.3885091  0.1510796  2.5716  0.010124 *  
## sma1  -0.7398107  0.2764273 -2.6763  0.007443 ** 
## drift  0.0127165  0.0020064  6.3381 2.326e-10 ***
## pfab1 -0.1863923  0.2027396 -0.9194  0.357903    
## pfab2 -0.1386816  0.1585221 -0.8748  0.381661    
## pfab4 -1.1475783  0.2386049 -4.8095 1.513e-06 ***
## pfab6  0.4349775  0.1779510  2.4444  0.014511 *  
## pfab7 -0.2069631  0.0958833 -2.1585  0.030890 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I drop pfab2:

mqfab4x <- Arima(qfabs[,"qfab4"], order = c(1,0,6), 
            seasonal = list(order=c(0,1,1),period=12),
            fixed=c(NA,NA,0,0,0,0,NA,NA,NA,NA,0,0,NA,0,NA,NA),
            include.constant = TRUE,
            xreg = lnpfabs,
            lambda = 0) 
coeftest(mqfab4x)

## 
## z test of coefficients:
## 
##         Estimate Std. Error z value  Pr(>|z|)    
## ar1    0.7260142  0.1638397  4.4312 9.369e-06 ***
## ma1   -0.4800588  0.1674882 -2.8662  0.004154 ** 
## ma6    0.3814251  0.1439032  2.6506  0.008036 ** 
## sma1  -0.7436167  0.2774953 -2.6797  0.007368 ** 
## drift  0.0120168  0.0018676  6.4343 1.241e-10 ***
## pfab1 -0.1897519  0.2014832 -0.9418  0.346308    
## pfab4 -1.1793780  0.2436953 -4.8396 1.301e-06 ***
## pfab6  0.4525333  0.1788069  2.5308  0.011379 *  
## pfab7 -0.2100093  0.0960790 -2.1858  0.028830 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I drop pfab1:

mqfab4x <- Arima(qfabs[,"qfab4"], order = c(1,0,6), 
            seasonal = list(order=c(0,1,1),period=12),
            fixed=c(NA,NA,0,0,0,0,NA,NA,NA,0,0,0,NA,0,NA,NA),
            include.constant = TRUE,
            xreg = lnpfabs,
            lambda = 0) 
coeftest(mqfab4x)

## 
## z test of coefficients:
## 
##         Estimate Std. Error z value  Pr(>|z|)    
## ar1    0.7220916  0.1574207  4.5870 4.496e-06 ***
## ma1   -0.4678493  0.1519663 -3.0786  0.002079 ** 
## ma6    0.3941961  0.1498912  2.6299  0.008541 ** 
## sma1  -0.7721748  0.3001826 -2.5724  0.010101 *  
## drift  0.0107543  0.0012532  8.5818 < 2.2e-16 ***
## pfab4 -1.1299331  0.2358930 -4.7900 1.668e-06 ***
## pfab6  0.4761994  0.1762298  2.7022  0.006889 ** 
## pfab7 -0.2309810  0.0954122 -2.4209  0.015483 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

All independent variables and the AR and the seasonal MA terms are significant (pvalues<0.05). I stop here the ARIMAX calibration process.

I can re-run the same model including only the variables I need instead of fixing to 0 the value for the independent variables. I will get the same result:

mqfab4x <- Arima(qfabs[,"qfab4"], order = c(1,0,6), 
            seasonal = list(order=c(0,1,1),period=12),
            fixed=c(NA,NA,0,0,0,0,NA,NA,NA,NA,NA,NA),
            include.constant = TRUE,
            xreg = lnpfabs[,c("pfab4","pfab6","pfab7")],
            lambda = 0) 
# I assign a variable to the coefficient table:
coefs=coeftest(mqfab4x)
coefs

## 
## z test of coefficients:
## 
##          Estimate  Std. Error  z value  Pr(>|z|)    
## ar1    0.72209346  0.15566436   4.6388 3.505e-06 ***
## ma1   -0.46786882  0.15084700  -3.1016 0.0019247 ** 
## ma6    0.39419576  0.13903703   2.8352 0.0045799 ** 
## sma1  -0.77218837  0.29019826  -2.6609 0.0077932 ** 
## drift  0.01075430  0.00059463  18.0858 < 2.2e-16 ***
## pfab4 -1.12992075  0.03904113 -28.9418 < 2.2e-16 ***
## pfab6  0.47619743  0.13403670   3.5527 0.0003812 ***
## pfab7 -0.23098294  0.09532131  -2.4232 0.0153843 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since all independent variables are significant, and also the coefficients of the ARIMA/SARIMA model, then I stop here the ARIMAX calibration process.

INTERPRETATION of the model:

CONSIDERING THE EFFECT OF PRICE CHANGES AND PREVIOUS SHOCKS, THE ANNUAL % DEMAND GROWTH OF PRODUCT 4 IS POSITIVE AND SIGNIFICANTLY RELATED WITH ITS OWN ANNUAL % GROWTH OF THE LAST MONTH. FOR EACH +1% ANNUAL GROWTH OF LAST MONTH, IT IS EXPECTED THAT THE CURRENT ANNUAL % GROWTH WILL MOVE IN ABOUT +0.7220935%. IN OTHER WORDS, 72.2093464% OF THE LAST-MONTH ANNUAL % GROWTH WILL IMPACT THE CURRENT ANNUAL % DEMAND GROWTH OF PRODUCT 4.

THE SHOCKS OF PREVIOUS MONTH AND 12 MONTHS AGO SIGNIFICANTLY AND NEGATIVELY IMPACT THE ANNUAL % DEMAND GROWTH OF PRODUCT 4 AFTER CONSIDERING THE EFFECT OF PRICE CHANGES AND LAST MONTH ANNUAL GROWTH.

THE SHOCK OF 6 MONTHS AGO SIGNIFICANTLY AND POSITIVELY IMPACT THE ANNUAL % DEMAND GROWTH OF PRODUCT 4 AFTER CONSIDERING THE EFFECT OF PRICE CHANGES AND LAST MONTH ANNUAL GROWTH.

NOW, RELATED TO THE PRICE CHANGE RELATIONSHIPS, HERE ARE THE INTERPRETATIONS:

1. THE AVERAGE DIRECT ELASTICITY OF PRODUCT 4 IS -1.1299207, AND THIS ELASTICITY IS SIGNIFICANT. IN OTHER WORDS, FOR EACH +1% CHANGE IN PRODUCT 4 PRICE, IT IS EXPECTED THAT ITS OWN DEMAND WILL MOVE IN ABOUT -1.1299207%, AFTER CONSIDERING THE EFFECT OF PRICE CHANGES OF PRODUCTS 6 AND 7, AND THE AUTOCORRELATIONS OF THE ARIMAX MODEL.

2. THE AVERAGE CROSS-ELASTICITY OF PRODCUT 6 IS 0.4761974. THIS MEANS THAT FOR EACH +1% CHANGE IN PRODUCT 6 PRICE, IT IS EXPECTED THAT THE PRODUCT 4 DEMAND WILL MOVE IN ABOUT 0.4761974%. SINCE THIS COEFFICIENT IS POSSITIVE AND SIGNIFICANT, THEN THIS IS AN EVIDENCE OF POSSIBLE COMPETITION BETWEEN PRODUCT 6 AND PRODUCT 4 SINCE THE PRODUCT 4 DEMAND DECREASES WHEN PRODUCT 6 PRICE DECREASES.

3. THE AVERAGE CROSS-ELASTICITY OF PRODUCT 7 IS -0.2309829. THIS MEANS THAT FOR EACH +1% CHANGE IN PRODUCT 7 PRICE, IT IS EXPECTED THAT THE PRODUCT 4 DEMAND WILL MOVE IN ABOUT 0.4761974%. SINCE THIS COEFFICIENT IS NEGATIVE AND SIGNIFICANT, THEN THIS IS AN EVIDENCE OF COMPLEMENTARITY BETWEEN PRODUCT 7 AND PRODUCT 4 SINCE THE PRODUCT 4 DEMAND INCREASES WHEN PRODUCT 7 PRICE DECREASES.

Finally, I do the forecast of product 4 demand with this ARIMAX model:

# Create a dataset for the forecast log prices of the explanatory variables:

pricesff <- cbind(forecastp4$mean,forecastp6$mean,forecastp7$mean)
lnpricesff <- log(pricesff)
colnames(lnpricesff) <- c("pfab4","pfab6","pfab7")
qfab4forecast <- forecast(mqfab4x,xreg=lnpricesff, h=15)
autoplot(qfab4forecast)

The forecast for product 4 demand is the following:

qfab4forecast$mean

##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2020                                                                      
## 2021 1046.9095 1164.6108  987.5000  954.5529  984.1217  928.3148  928.1803
##            Aug       Sep       Oct       Nov       Dec
## 2020                      864.4570  963.8279 1110.1721
## 2021  925.1270  927.7429  966.7506 1083.5664 1238.3819

Workshop 6 SOLUTION, Financial Econometrics II

Alberto Dorantes, Ph.D.

Apr 6, 2022

1 Forecasting consumer demand with an ARIMAX model

2 Q Challenge - Calibrate an ARIMAX model