US Electronics and Appliance Sales: https://research.stlouisfed.org/fred2
1. (5 pts) Plot this demand time series and describe qualitatively, in a few sentences, the effect of the 2008 financial crisis on sales for this segment of the economy. What are the difficulties in directly assessing the impact of the financial crisis on demand?
library(fpp)
library(forecast)
set.seed(2)
data = read.csv("US Electronics and Appliance Sales.csv", header=TRUE, sep=",")
head(data)
## Date RSEASN
## 1 2000-01-01 6591
## 2 2000-02-01 6503
## 3 2000-03-01 6716
## 4 2000-04-01 6071
## 5 2000-05-01 6360
## 6 2000-06-01 6343
data = ts(data, start=2000, frequency=12)[,2]
ts.plot(data, col="green", main="US Electronic & Appliances Demand", ylab="Demand")
For this segment of the economy, the 2008 financial crisis had a clearly negative impact on demand: the plot shows a pronounced drop in sales beginning in late 2008. However, it is difficult to assess the impact of the crisis directly without separating its effect from that of other drivers of demand (e.g., seasonality, cyclicality, and the underlying trend).
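One informal way to address this is to separate the seasonal and trend components before judging the crisis effect. The sketch below is illustrative only (not part of the required answer) and uses an STL decomposition of the monthly series; the s.window="periodic" setting is an assumption.
#illustrative STL decomposition: the trend panel makes the 2008-2009 drop easier to isolate from seasonality
plot(stl(data, s.window="periodic"))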
To examine the benefit of detailed modeling of an “intervention,” we will first develop a simple seasonal ARIMA model and then compare it with models that explicitly capture the effect of the 2008 financial crisis.
For all of the models in this assignment, define the Box-Cox transformation using the value of the scaling parameter (lambda) obtained from the BoxCox.lambda(…) function for the entire time series.
2. (5 pts) Use the auto.arima(…) function to fit a seasonal ARIMA model to the entire time series and report the summary fit statistics and the residual diagnostics. We will use these as a benchmark to assess model improvement.
#find optimal value of lambda
L = BoxCox.lambda(data)
L
## [1] 0.09028898
baseline_model = auto.arima(data, lambda=L)
## Warning in auto.arima(data, lambda = L): Unable to fit final model using
## maximum likelihood. AIC value approximated
summary(baseline_model)
## Series: data
## ARIMA(1,1,2)(0,0,2)[12] with drift
## Box Cox transformation: lambda= 0.09028898
##
## Coefficients:
## ar1 ma1 ma2 sma1 sma2 drift
## 0.0942 -0.5640 -0.4095 1.184 0.6181 0.0034
## s.e. 0.1902 0.1778 0.1604 0.068 0.0498 0.0015
##
## sigma^2 estimated as 0.03102: log likelihood=59.38
## AIC=-103.71 AICc=-103.08 BIC=-81.05
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 56.73874 713.9929 464.792 -0.1931356 5.373248 1.268846
## ACF1
## Training set -0.0126521
tsdiag(baseline_model)
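As a supplementary check (a sketch, not required by the question), a Ljung-Box test on the residuals complements the tsdiag() plots; the choice of 24 lags and fitdf=6 (the number of estimated coefficients) is an assumption.
#formal test for residual autocorrelation in the baseline model
Box.test(residuals(baseline_model), lag=24, type="Ljung-Box", fitdf=6)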
To obtain this model you need to first create a time series variable containing the demand from January 2000 through July 2008. You can accomplish this by using the window(…) function using the directive end=2008.55. In the description that follows we will refer to this pre-crisis demand series as y.PRE.
3. (5 pts.) Examine the ACF and PACF of this time series and difference it enough times until the time series becomes stationary and passes the ADF test. What are the seasonal and non-seasonal components of the model you would suggest based on your examination of the data? If more than one model appears appropriate, also discuss the alternative models.
#create pre-crisis demand ts using window function
y.PRE = window(data, end=2008.55)
#number of differences necessary
ndiffs(y.PRE)
## [1] 1
y.PRE = diff(y.PRE)
ndiffs(y.PRE)
## [1] 0
#the ADF test p-value of 0.01 rejects the unit-root null, so the differenced series is stationary
adf.test(y.PRE)
##
## Augmented Dickey-Fuller Test
##
## data: y.PRE
## Dickey-Fuller = -7.1407, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
#examine seasonal and non-seasonal patterns to suggest candidate model orders
tsdisplay(y.PRE)
Based on the ACF and PACF plots, a couple of combinations of non-seasonal and seasonal orders appear appropriate for this series. For example, ARIMA(1,0,0)(2,0,1)[12] would be a reasonable fit, as would ARIMA(0,0,0)(1,0,0)[12].
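As a rough, optional check, the two candidate specifications can be fitted directly with Arima() and compared by AICc (cand1 and cand2 are just illustrative names); the Box-Cox transform is omitted in this sketch because the differenced series contains negative values.
#fit the two candidate models suggested by the ACF/PACF and compare AICc
cand1 = Arima(y.PRE, order=c(1,0,0), seasonal=c(2,0,1))
cand2 = Arima(y.PRE, order=c(0,0,0), seasonal=c(1,0,0))
c(cand1=cand1$aicc, cand2=cand2$aicc)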
4. (5 pts) Use the auto.arima(…) function to fit a model to the pre-crisis demand. Report the summary fit data and residual diagnostics.
#fit model to pre-crisis ts
pre_crisis = auto.arima(y.PRE, lambda=L)
summary(pre_crisis)
## Series: y.PRE
## ARIMA(1,0,0)(2,0,1)[12] with non-zero mean
## Box Cox transformation: lambda= 0.09028898
##
## Coefficients:
## ar1 sar1 sar2 sma1 intercept
## -0.2941 0.4099 0.3899 0.6125 -8.2114
## s.e. 0.0947 0.0896 0.0811 0.1310 5.5228
##
## sigma^2 estimated as 94.88: log likelihood=-376.92
## AIC=765.9 AICc=766.78 BIC=781.65
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -268.8281 1972.512 905.2503 -34.69358 161.0594 4.697176
## ACF1
## Training set -0.2434513
tsdiag(pre_crisis)
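As with the baseline model, a Ljung-Box test can supplement the tsdiag() output (a sketch; the lag and fitdf values are assumptions).
#formal test for residual autocorrelation in the pre-crisis model
Box.test(residuals(pre_crisis), lag=24, type="Ljung-Box", fitdf=4)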
5. (10 pts) Quantify the impact of the financial crisis on the sales of this segment of the economy. To this effect use the model in Question 4 to prepare a 24-month ahead forecast of demand, and then compare the mean of the forecast with the actual sales from August 2008 through July 2010. Report the monthly impact and annual impact for each of the two years (August 2008-July 2009 and August 2009-July 2010).
#forecasting 24 months ahead with model from Q4
f_impact = forecast(pre_crisis, h=24)
#full forecast
f_impact
## Point Forecast Lo 80 Hi 80 Lo 95
## Aug 2008 212.557530 4.167628e-04 7.326947e+04 -9.826058e-12
## Sep 2008 -172.443820 -7.857031e+04 -6.230074e-05 -7.422421e+05
## Oct 2008 968.665000 1.431944e-02 2.223628e+05 1.824906e-14
## Nov 2008 720.270141 6.216590e-03 1.858942e+05 4.761474e-24
## Dec 2008 1069.706543 1.862270e-02 2.366362e+05 2.838126e-13
## Jan 2009 -1149.243173 -2.473077e+05 -2.256077e-02 -1.922913e+06
## Feb 2009 271.576759 2.906193e-04 1.040438e+05 -5.105660e-10
## Mar 2009 -843.956778 -2.046936e+05 -9.728425e-03 -1.643625e+06
## Apr 2009 -88.816967 -5.489103e+04 -3.324781e-06 -5.586702e+05
## May 2009 251.116903 2.210997e-04 9.939495e+04 -1.205675e-09
## Jun 2009 -9.261917 -1.639764e+04 -7.965017e-16 -2.118824e+05
## Jul 2009 9.893702 4.474430e-15 1.695860e+04 -2.122097e-03
## Aug 2009 47.751262 -8.975507e-08 2.454053e+05 -2.988278e+00
## Sep 2009 -21.625752 -1.947274e+05 1.352037e-05 -3.461241e+06
## Oct 2009 31.051061 -3.606423e-06 2.310793e+05 -8.656733e+00
## Nov 2009 165.933490 -8.398036e-12 5.049551e+05 -9.226930e-01
## Dec 2009 249.090960 -4.036547e-16 6.160414e+05 -4.668022e-01
## Jan 2010 -158.814801 -4.943865e+05 1.650890e-11 -7.181520e+06
## Feb 2010 13.484045 -7.032360e-05 1.605469e+05 -2.067234e+01
## Mar 2010 -22.767249 -2.015642e+05 1.236578e-05 -3.571887e+06
## Apr 2010 -11.679019 -1.509919e+05 1.070444e-04 -2.860282e+06
## May 2010 58.492682 -1.536410e-07 3.083939e+05 -4.078892e+00
## Jun 2010 -2.314434 -7.805220e+04 4.026664e-03 -1.730701e+06
## Jul 2010 6.012371 -5.856474e-04 1.143927e+05 -4.269107e+01
## Hi 95
## Aug 2008 6.450569e+05
## Sep 2008 2.356500e-08
## Oct 2008 1.759368e+06
## Nov 2008 1.517568e+06
## Dec 2008 1.853727e+06
## Jan 2009 -1.548542e-12
## Feb 2009 9.410140e+05
## Mar 2009 -5.277967e-17
## Apr 2009 1.879924e-06
## May 2009 9.064202e+05
## Jun 2009 2.441404e-03
## Jul 2009 2.176174e+05
## Aug 2009 3.932370e+06
## Sep 2009 1.231677e+01
## Oct 2009 3.967921e+06
## Nov 2009 7.301594e+06
## Dec 2009 8.539074e+06
## Jan 2010 9.897122e-01
## Feb 2010 2.998161e+06
## Mar 2010 1.218596e+01
## Apr 2010 2.369663e+01
## May 2010 4.965798e+06
## Jun 2010 8.956342e+01
## Jul 2010 2.313359e+06
#point forecast (a.k.a. "mean of forecast")
f_impact$mean
## Jan Feb Mar Apr May
## 2008
## 2009 -1149.243173 271.576759 -843.956778 -88.816967 251.116903
## 2010 -158.814801 13.484045 -22.767249 -11.679019 58.492682
## Jun Jul Aug Sep Oct
## 2008 212.557530 -172.443820 968.665000
## 2009 -9.261917 9.893702 47.751262 -21.625752 31.051061
## 2010 -2.314434 6.012371
## Nov Dec
## 2008 720.270141 1069.706543
## 2009 165.933490 249.090960
## 2010
#monthly impact: point forecasts minus the actual month-over-month changes, Aug 2008 - Jul 2010
DF = f_impact$mean - diff(window(data,start=2008.45,end=2010.55),1)
#overall accuracy: accuracy() compares the point forecasts with the matching differenced actuals
accuracy(f_impact$mean, diff(data))
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set -98.06994 1634.272 1008.664 101.279 106.2155 -0.2338725 0.8195738
#YEAR 1 IMPACT - AUGUST 2008 - JULY 2009
sum(DF[1:12])
## [1] 2446.064
#YEAR 2 IMPACT - AUGUST 2009 - JULY 2010
sum(DF[13:24])
## [1] -92.38538
The first comparison above (DF) gives the month-by-month difference between the point forecasts (the mean of the forecast) and the actual month-over-month changes in sales from August 2008 through July 2010. Summing these monthly differences quantifies the impact at the annual level. In year 1 (August 2008 through July 2009) the forecasted changes exceed the actual changes by a total of 2,446.06; in other words, demand grew by roughly 2,446 less than the pre-crisis model predicted. In year 2 (August 2009 through July 2010) the total difference is far smaller, about -92.39, so essentially all of the measured impact falls in the first year.
The accuracy() call summarizes the same forecast errors: the RMSE of the monthly changes over the two-year horizon is approximately 1,634.27.
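Since the question also asks for the monthly impact, a minimal addition (reusing DF from above) is to average each year's total over its 12 months:
#average monthly impact implied by the annual totals
c(year1_monthly=sum(DF[1:12])/12, year2_monthly=sum(DF[13:24])/12)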
6. (10 pts) Obtain the step intervention model using the directive xreg=x.D and the complete time series y, by specifying the model you obtained in Question 4 in the Arima(…) function. Report the summary fit statistics and the residual diagnostics and compare them to the ones obtained for the model in Question 2. Is the intervention model better? Why? Why not?
#step dummy variable: 0 before the crisis, 1 from August 2008 (observation 104) onward
x.D <- 1*(seq(data)>=104)
#transform the series manually with BoxCox() using the same lambda, so the model is estimated on the same scale as the Q2 model
#note: the error measures reported below are therefore on the transformed scale
trans = BoxCox(data, L)
intervention = Arima(trans, xreg=x.D, order = c(1L, 0L, 0L), seasonal= list(order = c(2L, 0L, 1L)))
summary(intervention)
## Series: trans
## ARIMA(1,0,0)(2,0,1)[12] with non-zero mean
##
## Coefficients:
## ar1 sar1 sar2 sma1 intercept x.D
## 0.8807 1.1872 -0.1902 -0.5334 13.8737 -0.0868
## s.e. 0.0348 0.1898 0.1885 0.1652 0.9957 0.0518
##
## sigma^2 estimated as 0.0039: log likelihood=231.34
## AIC=-448.68 AICc=-448.06 BIC=-425.99
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set 0.008417847 0.06244695 0.04843014 0.06013632 0.3491989
## MASE ACF1
## Training set 0.467937 -0.1836512
tsdiag(intervention)
The intervention model appears to be considerably better than the baseline model from Question 2: its AICc of -448.06 is far lower than the Q2 model's -103.08, and since both models are estimated on the Box-Cox-transformed data this criterion is broadly comparable. Note, however, that the intervention model's training RMSE of 0.0624 is reported on the transformed scale, whereas the Q2 RMSE of 713.99 is on the original sales scale, so the two RMSE values cannot be compared directly.
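To compare in-sample errors on a common scale, one optional sketch is to back-transform the intervention model's fitted values with InvBoxCox() and recompute the accuracy measures in the original sales units:
#back-transform the fitted values to the original units before comparing with the Q2 error measures
accuracy(InvBoxCox(fitted(intervention), lambda=L), data)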
7. (5 pts) Use the model in Question 2 to obtain a 24-month ahead forecast (October 2015 through September 2017) for the sales of electronics and appliances stores in the US.
baseline_forecast = forecast(baseline_model, h=24)
summary(baseline_forecast)
##
## Forecast method: ARIMA(1,1,2)(0,0,2)[12] with drift
##
## Model Information:
## Series: data
## ARIMA(1,1,2)(0,0,2)[12] with drift
## Box Cox transformation: lambda= 0.09028898
##
## Coefficients:
## ar1 ma1 ma2 sma1 sma2 drift
## 0.0942 -0.5640 -0.4095 1.184 0.6181 0.0034
## s.e. 0.1902 0.1778 0.1604 0.068 0.0498 0.0015
##
## sigma^2 estimated as 0.03102: log likelihood=59.38
## AIC=-103.71 AICc=-103.08 BIC=-81.05
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 56.73874 713.9929 464.792 -0.1931356 5.373248 1.268846
## ACF1
## Training set -0.0126521
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Oct 2015 8228.862 7442.185 9090.479 7054.041 9579.082
## Nov 2015 10402.480 9305.800 11615.508 8768.854 12308.464
## Dec 2015 12157.128 10889.777 13557.276 10268.692 14356.420
## Jan 2016 8616.354 7690.867 9642.077 7238.449 10228.908
## Feb 2016 8539.775 7621.514 9557.610 7172.670 10139.979
## Mar 2016 8308.452 7412.678 9301.650 6974.925 9870.046
## Apr 2016 7759.758 6917.966 8693.732 6506.810 9228.502
## May 2016 8154.187 7273.078 9131.362 6842.574 9690.689
## Jun 2016 8497.272 7582.040 9511.935 7134.741 10092.569
## Jul 2016 8292.586 7397.252 9285.450 6959.767 9853.719
## Aug 2016 8403.497 7496.936 9408.720 7053.934 9984.025
## Sep 2016 8128.909 7249.196 9104.699 6819.429 9663.302
## Oct 2016 8310.018 7028.480 9800.740 6425.470 10684.741
## Nov 2016 9591.864 8029.053 11426.706 7299.094 12522.223
## Dec 2016 10285.228 8615.724 12244.057 7835.525 13413.041
## Jan 2017 9050.440 7564.680 10797.119 6871.439 11841.011
## Feb 2017 9044.937 7559.127 10791.874 6865.925 11836.006
## Mar 2017 8764.287 7319.962 10463.406 6646.417 11479.378
## Apr 2017 8733.868 7293.303 10428.823 6621.594 11442.421
## May 2017 8852.058 7392.782 10568.866 6712.296 11595.461
## Jun 2017 9044.408 7555.220 10796.035 6860.669 11843.285
## Jul 2017 8917.452 7446.591 10648.057 6760.756 11682.975
## Aug 2017 8965.773 7486.751 10706.020 6797.123 11746.722
## Sep 2017 8719.660 7277.076 10417.902 6604.712 11433.866
plot(baseline_forecast, main="Baseline Model Forecast 2015-2017")
8. (10 pts) Use the intervention model in Question 6 to obtain a 24-month ahead forecast of US sales of electronics and appliance stores. Notice that this will require you to create a dummy variable of ones, x.FD <- c(rep(1,24)), to extrapolate the dummy variable used in the estimation of the model parameters.
#step dummy extended over the 24-month forecast horizon (all ones: the post-crisis regime continues)
xFD = c(rep(1,24))
#passing lambda=L back-transforms the point forecasts and intervals to the original sales units
int_forecast = forecast(intervention, xreg=xFD, h=24, lambda=L)
summary(int_forecast)
##
## Forecast method: ARIMA(1,0,0)(2,0,1)[12] with non-zero mean
##
## Model Information:
## Series: trans
## ARIMA(1,0,0)(2,0,1)[12] with non-zero mean
##
## Coefficients:
## ar1 sar1 sar2 sma1 intercept x.D
## 0.8807 1.1872 -0.1902 -0.5334 13.8737 -0.0868
## s.e. 0.0348 0.1898 0.1885 0.1652 0.9957 0.0518
##
## sigma^2 estimated as 0.0039: log likelihood=231.34
## AIC=-448.68 AICc=-448.06 BIC=-425.99
##
## Error measures:
## ME RMSE MAE MPE MAPE
## Training set 0.008417847 0.06244695 0.04843014 0.06013632 0.3491989
## MASE ACF1
## Training set 0.467937 -0.1836512
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Oct 2015 8034.260 7753.321 8324.429 7608.247 8481.864
## Nov 2015 10436.805 9964.162 10929.756 9721.922 11199.196
## Dec 2015 13221.339 12545.081 13930.600 12199.975 14319.910
## Jan 2016 7992.481 7526.168 8484.931 7289.467 8756.645
## Feb 2016 7867.463 7380.261 8383.752 7133.616 8669.361
## Mar 2016 8118.586 7596.375 8673.254 7332.481 8980.630
## Apr 2016 7270.411 6784.153 7788.171 6538.901 8075.629
## May 2016 7760.921 7233.554 8323.037 6967.781 8635.365
## Jun 2016 7928.489 7382.018 8511.518 7106.819 8835.696
## Jul 2016 7994.688 7437.343 8589.775 7156.835 8920.848
## Aug 2016 8260.526 7680.990 8879.572 7389.412 9224.087
## Sep 2016 8039.055 7469.743 8647.573 7183.450 8986.392
## Oct 2016 8013.533 7398.926 8674.240 7091.232 9043.707
## Nov 2016 10403.018 9580.107 11289.742 9168.884 11786.476
## Dec 2016 13179.060 12118.442 14323.466 11588.989 14965.200
## Jan 2017 7984.442 7294.721 8732.990 6951.974 9154.579
## Feb 2017 7879.616 7184.307 8635.616 6839.279 9061.989
## Mar 2017 8135.433 7408.679 8926.477 7048.351 9372.974
## Apr 2017 7282.629 6618.306 8007.074 6289.412 8416.553
## May 2017 7777.461 7065.916 8553.607 6713.718 8992.397
## Jun 2017 7928.084 7199.185 8723.516 6838.523 9173.360
## Jul 2017 8007.501 7268.221 8814.569 6902.532 9271.124
## Aug 2017 8275.426 7510.627 9110.432 7132.341 9582.825
## Sep 2017 8044.807 7297.282 8861.362 6927.685 9323.488
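For a visual check of this forecast (a sketch mirroring the Q7 plot), the point forecasts can be drawn alongside the historical series:
#historical demand (black) with the intervention model's point forecasts appended (red)
ts.plot(data, int_forecast$mean, col=c("black","red"), main="Intervention Model Forecast 2015-2017")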
9. (5 pts) What are the differences between the two forecasts in questions 7 and 8? Are there any advantages to the forecasts obtained in question 8?
The primary difference between the two forecasts is the following: in Question 7 a simple seasonal ARIMA model produces the forecasts, whereas the intervention model used in Question 8 explicitly captures the effect of the 2008 financial crisis. There are clear advantages to the latter approach. First, the intervention term lets us measure the impact of a specific event on the forecasted time series. Second, it adds modeling flexibility: the ARIMA parameters are estimated to fit only the normal demand pattern and its relationship with the explanatory variable, so the model is not distorted by a known, non-recurring event such as the crisis.
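For a visual comparison of the two forecasts (a sketch; plot.forecast typically draws the baseline point forecast in blue), the intervention forecast can be overlaid on the Q7 plot:
#overlay the intervention point forecasts (red) on the baseline forecast plot
plot(baseline_forecast, main="Baseline vs. Intervention Forecasts 2015-2017")
lines(int_forecast$mean, col="red")
legend("topleft", legend=c("Baseline (Q7)", "Intervention (Q8)"), col=c("blue","red"), lty=1)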