US Electronics and Appliance Sales: https://research.stlouisfed.org/fred2
1. (5 pts) Plot this demand time series and describe qualitatively, in a few sentences, the effect of the 2008 financial crisis on sales for this segment of the economy. What are the difficulties in directly assessing the impact of the financial crisis on demand?
library(fpp)
library(forecast)
set.seed(2)
data = read.csv("US Electronics and Appliance Sales.csv", header=TRUE, sep=",")
head(data)
## Date RSEASN
## 1 2000-01-01 6591
## 2 2000-02-01 6503
## 3 2000-03-01 6716
## 4 2000-04-01 6071
## 5 2000-05-01 6360
## 6 2000-06-01 6343
data = ts(data, start=2000, frequency=12)[,2]
ts.plot(data, col="green", main="US Electronic & Appliances Demand", ylab="Demand")
For this segment of the economy, the 2008 financial crisis had a clearly negative impact on demand: the plot shows a pronounced drop in sales beginning in late 2008. However, it is difficult to assess the impact of the crisis directly without separating its effect from that of other drivers of demand (e.g., seasonality, cyclicality, and the underlying trend).
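One informal way to address this is to separate the seasonal and trend components before judging the crisis effect. The sketch below is illustrative only (not part of the required answer) and uses an STL decomposition of the monthly series; the s.window="periodic" setting is an assumption.
#illustrative STL decomposition: the trend panel makes the 2008-2009 drop easier to isolate from seasonality
plot(stl(data, s.window="periodic"))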
To examine the benefit of detailed modeling of an “intervention,” we will first develop a simple seasonal ARIMA model and then compare it with models that explicitly capture the effect of the 2008 financial crisis.
For all of the models in this assignment, define the Box-Cox transformation using the value of the scaling parameter (lambda) obtained from the BoxCox.lambda(…) function for the entire time series.
2. (5 pts) Use the auto.arima(…) function to fit a seasonal ARIMA model to the entire time series and report the summary fit statistics and the residual diagnostics. We will use these as a benchmark to assess model improvement.
#find optimal value of lambda
L = BoxCox.lambda(data)
L
## [1] 0.09028898
baseline_model = auto.arima(data, lambda=L)
## Warning in auto.arima(data, lambda = L): Unable to fit final model using
## maximum likelihood. AIC value approximated
summary(baseline_model)
## Series: data
## ARIMA(1,1,2)(0,0,2)[12] with drift
## Box Cox transformation: lambda= 0.09028898
##
## Coefficients:
## ar1 ma1 ma2 sma1 sma2 drift
## 0.0942 -0.5640 -0.4095 1.184 0.6181 0.0034
## s.e. 0.1902 0.1778 0.1604 0.068 0.0498 0.0015
##
## sigma^2 estimated as 0.03102: log likelihood=59.38
## AIC=-103.71 AICc=-103.08 BIC=-81.05
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 56.73874 713.9929 464.792 -0.1931356 5.373248 1.268846
## ACF1
## Training set -0.0126521
tsdiag(baseline_model)
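As a supplementary check (a sketch, not required by the question), a Ljung-Box test on the residuals complements the tsdiag() plots; the choice of 24 lags and fitdf=6 (the number of estimated coefficients) is an assumption.
#formal test for residual autocorrelation in the baseline model
Box.test(residuals(baseline_model), lag=24, type="Ljung-Box", fitdf=6)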
To obtain this model you need to first create a time series variable containing the demand from January 2000 through July 2008. You can accomplish this by using the window(…) function using the directive end=2008.55. In the description that follows we will refer to this pre-crisis demand series as y.PRE.
3. (5 pts.) Examine the ACF and PACF of this time series and difference it enough times until the time series becomes stationary and passes the ADF test. What are the seasonal and non-seasonal components of the model you would suggest based on your examination of the data? If more than one model appears appropriate, also discuss the alternative models.
#create pre-crisis demand ts using window function
y.PRE = window(data, end=2008.55)
#number of differences necessary
ndiffs(y.PRE)
## [1] 1
y.PRE = diff(y.PRE)
ndiffs(y.PRE)
## [1] 0
#the ADF test p-value of 0.01 rejects the unit-root null, so the differenced series is stationary
adf.test(y.PRE)
##
## Augmented Dickey-Fuller Test
##
## data: y.PRE
## Dickey-Fuller = -7.1407, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
#examine seasonal and non-seasonal patterns to suggest candidate model orders
tsdisplay(y.PRE)
Based on the ACF and PACF plots, a couple of combinations of non-seasonal and seasonal orders appear appropriate for this series. For example, ARIMA(1,0,0)(2,0,1)[12] would be a reasonable fit, as would ARIMA(0,0,0)(1,0,0)[12].
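As a rough, optional check, the two candidate specifications can be fitted directly with Arima() and compared by AICc (cand1 and cand2 are just illustrative names); the Box-Cox transform is omitted in this sketch because the differenced series contains negative values.
#fit the two candidate models suggested by the ACF/PACF and compare AICc
cand1 = Arima(y.PRE, order=c(1,0,0), seasonal=c(2,0,1))
cand2 = Arima(y.PRE, order=c(0,0,0), seasonal=c(1,0,0))
c(cand1=cand1$aicc, cand2=cand2$aicc)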
4. (5 pts) Use the auto.arima(…) function to fit a model to the pre-crisis demand. Report the summary fit data and residual diagnostics.
#fit model to pre-crisis ts
pre_crisis = auto.arima(y.PRE, lambda=L)
summary(pre_crisis)
## Series: y.PRE
## ARIMA(1,0,0)(2,0,1)[12] with non-zero mean
## Box Cox transformation: lambda= 0.09028898
##
## Coefficients:
## ar1 sar1 sar2 sma1 intercept
## -0.2941 0.4099 0.3899 0.6125 -8.2114
## s.e. 0.0947 0.0896 0.0811 0.1310 5.5228
##
## sigma^2 estimated as 94.88: log likelihood=-376.92
## AIC=765.9 AICc=766.78 BIC=781.65
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -268.8281 1972.512 905.2503 -34.69358 161.0594 4.697176
## ACF1
## Training set -0.2434513
tsdiag(pre_crisis)
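As with the baseline model, a Ljung-Box test can supplement the tsdiag() output (a sketch; the lag and fitdf values are assumptions).
#formal test for residual autocorrelation in the pre-crisis model
Box.test(residuals(pre_crisis), lag=24, type="Ljung-Box", fitdf=4)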
5. (10 pts) Quantify the impact of the financial crisis on the sales of this segment of the economy. To this effect use the model in Question 4 to prepare a 24-month ahead forecast of demand, and then compare the mean of the forecast with the actual sales from August 2008 through July 2010. Report the monthly impact and annual impact for each of the two years (August 2008-July 2009 and August 2009-July 2010).
#forecasting 24 months ahead with model from Q4
f_impact = forecast(pre_crisis, h=24)
#full forecast
f_impact
## Point Forecast Lo 80 Hi 80 Lo 95
## Aug 2008 212.557530 4.167628e-04 7.326947e+04 -9.826058e-12
## Sep 2008 -172.443820 -7.857031e+04 -6.230074e-05 -7.422421e+05
## Oct 2008 968.665000 1.431944e-02 2.223628e+05 1.824906e-14
## Nov 2008 720.270141 6.216590e-03 1.858942e+05 4.761474e-24
## Dec 2008 1069.706543 1.862270e-02 2.366362e+05 2.838126e-13
## Jan 2009 -1149.243173 -2.473077e+05 -2.256077e-02 -1.922913e+06
## Feb 2009 271.576759 2.906193e-04 1.040438e+05 -5.105660e-10
## Mar 2009 -843.956778 -2.046936e+05 -9.728425e-03 -1.643625e+06
## Apr 2009 -88.816967 -5.489103e+04 -3.324781e-06 -5.586702e+05
## May 2009 251.116903 2.210997e-04 9.939495e+04 -1.205675e-09
## Jun 2009 -9.261917 -1.639764e+04 -7.965017e-16 -2.118824e+05
## Jul 2009 9.893702 4.474430e-15 1.695860e+04 -2.122097e-03
## Aug 2009 47.751262 -8.975507e-08 2.454053e+05 -2.988278e+00
## Sep 2009 -21.625752 -1.947274e+05 1.352037e-05 -3.461241e+06
## Oct 2009 31.051061 -3.606423e-06 2.310793e+05 -8.656733e+00
## Nov 2009 165.933490 -8.398036e-12 5.049551e+05 -9.226930e-01
## Dec 2009 249.090960 -4.036547e-16 6.160414e+05 -4.668022e-01
## Jan 2010 -158.814801 -4.943865e+05 1.650890e-11 -7.181520e+06
## Feb 2010 13.484045 -7.032360e-05 1.605469e+05 -2.067234e+01
## Mar 2010 -22.767249 -2.015642e+05 1.236578e-05 -3.571887e+06
## Apr 2010 -11.679019 -1.509919e+05 1.070444e-04 -2.860282e+06
## May 2010 58.492682 -1.536410e-07 3.083939e+05 -4.078892e+00
## Jun 2010 -2.314434 -7.805220e+04 4.026664e-03 -1.730701e+06
## Jul 2010 6.012371 -5.856474e-04 1.143927e+05 -4.269107e+01
## Hi 95
## Aug 2008 6.450569e+05
## Sep 2008 2.356500e-08
## Oct 2008 1.759368e+06
## Nov 2008 1.517568e+06
## Dec 2008 1.853727e+06
## Jan 2009 -1.548542e-12
## Feb 2009 9.410140e+05
## Mar 2009 -5.277967e-17
## Apr 2009 1.879924e-06
## May 2009 9.064202e+05
## Jun 2009 2.441404e-03
## Jul 2009 2.176174e+05
## Aug 2009 3.932370e+06
## Sep 2009 1.231677e+01
## Oct 2009 3.967921e+06
## Nov 2009 7.301594e+06
## Dec 2009 8.539074e+06
## Jan 2010 9.897122e-01
## Feb 2010 2.998161e+06
## Mar 2010 1.218596e+01
## Apr 2010 2.369663e+01
## May 2010 4.965798e+06
## Jun 2010 8.956342e+01
## Jul 2010 2.313359e+06
#point forecast (a.k.a. "mean of forecast")
f_impact$mean
## Jan Feb Mar Apr May
## 2008
## 2009 -1149.243173 271.576759 -843.956778 -88.816967 251.116903
## 2010 -158.814801 13.484045 -22.767249 -11.679019 58.492682
## Jun Jul Aug Sep Oct
## 2008 212.557530 -172.443820 968.665000
## 2009 -9.261917 9.893702 47.751262 -21.625752 31.051061
## 2010 -2.314434 6.012371
## Nov Dec
## 2008 720.270141 1069.706543
## 2009 165.933490 249.090960
## 2010
#monthly impact: point forecasts minus the actual month-over-month changes, Aug 2008 - Jul 2010
DF = f_impact$mean - diff(window(data,start=2008.45,end=2010.55),1)
#overall accuracy: accuracy() compares the point forecasts with the matching differenced actuals
accuracy(f_impact$mean, diff(data))
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set -98.06994 1634.272 1008.664 101.279 106.2155 -0.2338725 0.8195738
#YEAR 1 IMPACT - AUGUST 2008 - JULY 2009
sum(DF[1:12])
## [1] 2446.064
#YEAR 2 IMPACT - AUGUST 2009 - JULY 2010
sum(DF[13:24])
## [1] -92.38538
The first comparison above (DF) gives the month-by-month difference between the point forecasts (the mean of the forecast) and the actual month-over-month changes in sales from August 2008 through July 2010. Summing these monthly differences quantifies the impact at the annual level. In year 1 (August 2008 through July 2009) the forecasted changes exceed the actual changes by a total of 2,446.06; in other words, demand grew by roughly 2,446 less than the pre-crisis model predicted. In year 2 (August 2009 through July 2010) the total difference is far smaller, about -92.39, so essentially all of the measured impact falls in the first year.
The accuracy() call summarizes the same forecast errors: the RMSE of the monthly changes over the two-year horizon is approximately 1,634.27.
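Since the question also asks for the monthly impact, a minimal addition (reusing DF from above) is to average each year's total over its 12 months:
#average monthly impact implied by the annual totals
c(year1_monthly=sum(DF[1:12])/12, year2_monthly=sum(DF[13:24])/12)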
6. (10 pts) Obtain the step intervention model using the directive xreg=x.D and the complete time series y, by specifying the model you obtained in Question 4 in the Arima(…) function. Report the summary fit statistics and the residual diagnostics and compare them to the ones obtained for the model in Question 2. Is the intervention model better? Why? Why not?
#step dummy variable: 0 before the crisis, 1 from August 2008 (observation 104) onward
x.D <- 1*(seq(data)>=104)
#transform the series manually with BoxCox() using the same lambda, so the model is estimated on the same scale as the Q2 model
#note: the error measures reported below are therefore on the transformed scale
trans = BoxCox(data, L)
intervention = Arima(trans, xreg=x.D, order = c(1L, 0L, 0L), seasonal= list(order = c(2L, 0L, 1L)))
summary(intervention)
## Series: trans
## ARIMA(1,0,0)(2,0,1)[12] with non-zero mean
##
## Coefficients:
## ar1 sar1 sar2 sma1 intercept x.D
## 0.8807 1.1872 -0.1902 -0.5334 13.8737 -0.0868
## s.e. 0.0348 0.1898 0.1885 0.1652 0.9957 0.0518
##
## sigma^2 estimated as 0.0039: log likelihood=231.34
## AIC=-448.68 AICc=-448.06 BIC=-425.99
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set 0.008417847 0.06244695 0.04843014 0.06013632 0.3491989
## MASE ACF1
## Training set 0.467937 -0.1836512
tsdiag(intervention)
The intervention model appears to be considerably better than the baseline model from Question 2: its AICc of -448.06 is far lower than the Q2 model's -103.08, and since both models are estimated on the Box-Cox-transformed data this criterion is broadly comparable. Note, however, that the intervention model's training RMSE of 0.0624 is reported on the transformed scale, whereas the Q2 RMSE of 713.99 is on the original sales scale, so the two RMSE values cannot be compared directly.
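To compare in-sample errors on a common scale, one optional sketch is to back-transform the intervention model's fitted values with InvBoxCox() and recompute the accuracy measures in the original sales units:
#back-transform the fitted values to the original units before comparing with the Q2 error measures
accuracy(InvBoxCox(fitted(intervention), lambda=L), data)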
7. (5 pts) Use the model in Question 2 to obtain a 24-month ahead forecast (October 2015 through September 2017) for the sales of electronics and appliances stores in the US.
baseline_forecast = forecast(baseline_model, h=24)
summary(baseline_forecast)
##
## Forecast method: ARIMA(1,1,2)(0,0,2)[12] with drift
##
## Model Information:
## Series: data
## ARIMA(1,1,2)(0,0,2)[12] with drift
## Box Cox transformation: lambda= 0.09028898
##
## Coefficients:
## ar1 ma1 ma2 sma1 sma2 drift
## 0.0942 -0.5640 -0.4095 1.184 0.6181 0.0034
## s.e. 0.1902 0.1778 0.1604 0.068 0.0498 0.0015
##
## sigma^2 estimated as 0.03102: log likelihood=59.38
## AIC=-103.71 AICc=-103.08 BIC=-81.05
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 56.73874 713.9929 464.792 -0.1931356 5.373248 1.268846
## ACF1
## Training set -0.0126521
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Oct 2015 8228.862 7442.185 9090.479 7054.041 9579.082
## Nov 2015 10402.480 9305.800 11615.508 8768.854 12308.464
## Dec 2015 12157.128 10889.777 13557.276 10268.692 14356.420
## Jan 2016 8616.354 7690.867 9642.077 7238.449 10228.908
## Feb 2016 8539.775 7621.514 9557.610 7172.670 10139.979
## Mar 2016 8308.452 7412.678 9301.650 6974.925 9870.046
## Apr 2016 7759.758 6917.966 8693.732 6506.810 9228.502
## May 2016 8154.187 7273.078 9131.362 6842.574 9690.689
## Jun 2016 8497.272 7582.040 9511.935 7134.741 10092.569
## Jul 2016 8292.586 7397.252 9285.450 6959.767 9853.719
## Aug 2016 8403.497 7496.936 9408.720 7053.934 9984.025
## Sep 2016 8128.909 7249.196 9104.699 6819.429 9663.302
## Oct 2016 8310.018 7028.480 9800.740 6425.470 10684.741
## Nov 2016 9591.864 8029.053 11426.706 7299.094 12522.223
## Dec 2016 10285.228 8615.724 12244.057 7835.525 13413.041
## Jan 2017 9050.440 7564.680 10797.119 6871.439 11841.011
## Feb 2017 9044.937 7559.127 10791.874 6865.925 11836.006
## Mar 2017 8764.287 7319.962 10463.406 6646.417 11479.378
## Apr 2017 8733.868 7293.303 10428.823 6621.594 11442.421
## May 2017 8852.058 7392.782 10568.866 6712.296 11595.461
## Jun 2017 9044.408 7555.220 10796.035 6860.669 11843.285
## Jul 2017 8917.452 7446.591 10648.057 6760.756 11682.975
## Aug 2017 8965.773 7486.751 10706.020 6797.123 11746.722
## Sep 2017 8719.660 7277.076 10417.902 6604.712 11433.866
plot(baseline_forecast, main="Baseline Model Forecast 2015-2017")
8. (10 pts) Use the intervention model in Question 6 to obtain a 24-month ahead forecast of US sales of electronics and appliance stores. Notice that this will require you to create a dummy variable of ones, x.FD <- c(rep(1,24)), to extrapolate the dummy variable used in the estimation of the model parameters.
#step dummy extended over the 24-month forecast horizon (all ones: the post-crisis regime continues)
xFD = c(rep(1,24))
#passing lambda=L back-transforms the point forecasts and intervals to the original sales units
int_forecast = forecast(intervention, xreg=xFD, h=24, lambda=L)
summary(int_forecast)
##
## Forecast method: ARIMA(1,0,0)(2,0,1)[12] with non-zero mean
##
## Model Information:
## Series: trans
## ARIMA(1,0,0)(2,0,1)[12] with non-zero mean
##
## Coefficients:
## ar1 sar1 sar2 sma1 intercept x.D
## 0.8807 1.1872 -0.1902 -0.5334 13.8737 -0.0868
## s.e. 0.0348 0.1898 0.1885 0.1652 0.9957 0.0518
##
## sigma^2 estimated as 0.0039: log likelihood=231.34
## AIC=-448.68 AICc=-448.06 BIC=-425.99
##
## Error measures:
## ME RMSE MAE MPE MAPE
## Training set 0.008417847 0.06244695 0.04843014 0.06013632 0.3491989
## MASE ACF1
## Training set 0.467937 -0.1836512
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Oct 2015 8034.260 7753.321 8324.429 7608.247 8481.864
## Nov 2015 10436.805 9964.162 10929.756 9721.922 11199.196
## Dec 2015 13221.339 12545.081 13930.600 12199.975 14319.910
## Jan 2016 7992.481 7526.168 8484.931 7289.467 8756.645
## Feb 2016 7867.463 7380.261 8383.752 7133.616 8669.361
## Mar 2016 8118.586 7596.375 8673.254 7332.481 8980.630
## Apr 2016 7270.411 6784.153 7788.171 6538.901 8075.629
## May 2016 7760.921 7233.554 8323.037 6967.781 8635.365
## Jun 2016 7928.489 7382.018 8511.518 7106.819 8835.696
## Jul 2016 7994.688 7437.343 8589.775 7156.835 8920.848
## Aug 2016 8260.526 7680.990 8879.572 7389.412 9224.087
## Sep 2016 8039.055 7469.743 8647.573 7183.450 8986.392
## Oct 2016 8013.533 7398.926 8674.240 7091.232 9043.707
## Nov 2016 10403.018 9580.107 11289.742 9168.884 11786.476
## Dec 2016 13179.060 12118.442 14323.466 11588.989 14965.200
## Jan 2017 7984.442 7294.721 8732.990 6951.974 9154.579
## Feb 2017 7879.616 7184.307 8635.616 6839.279 9061.989
## Mar 2017 8135.433 7408.679 8926.477 7048.351 9372.974
## Apr 2017 7282.629 6618.306 8007.074 6289.412 8416.553
## May 2017 7777.461 7065.916 8553.607 6713.718 8992.397
## Jun 2017 7928.084 7199.185 8723.516 6838.523 9173.360
## Jul 2017 8007.501 7268.221 8814.569 6902.532 9271.124
## Aug 2017 8275.426 7510.627 9110.432 7132.341 9582.825
## Sep 2017 8044.807 7297.282 8861.362 6927.685 9323.488
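For a visual check of this forecast (a sketch mirroring the Q7 plot), the point forecasts can be drawn alongside the historical series:
#historical demand (black) with the intervention model's point forecasts appended (red)
ts.plot(data, int_forecast$mean, col=c("black","red"), main="Intervention Model Forecast 2015-2017")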
9. (5 pts) What are the differences between the two forecasts in questions 7 and 8? Are there any advantages to the forecasts obtained in question 8?
The primary difference between the two forecasts is the following: in Question 7 a simple seasonal ARIMA model produces the forecasts, whereas the intervention model used in Question 8 explicitly captures the effect of the 2008 financial crisis. There are clear advantages to the latter approach. First, the intervention term lets us measure the impact of a specific event on the forecasted time series. Second, it adds modeling flexibility: the ARIMA parameters are estimated to fit only the normal demand pattern and its relationship with the explanatory variable, so the model is not distorted by a known, non-recurring event such as the crisis.
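For a visual comparison of the two forecasts (a sketch; plot.forecast typically draws the baseline point forecast in blue), the intervention forecast can be overlaid on the Q7 plot:
#overlay the intervention point forecasts (red) on the baseline forecast plot
plot(baseline_forecast, main="Baseline vs. Intervention Forecasts 2015-2017")
lines(int_forecast$mean, col="red")
legend("topleft", legend=c("Baseline (Q7)", "Intervention (Q8)"), col=c("blue","red"), lty=1)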