Exercise 1 Read in the data in
RSFHFSN_Furniture.xlsx and store it as
furniture_data. Convert the entire data into a time series
object and name it total_ts. The frequency
should be 12. Visualize total_ts. What is the clear pattern
that you see?
rm(list=ls())
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(readxl)
furniture_data <- read_excel("./RSFHFSN_Furniture.xlsx")
total_ts <- ts(furniture_data$RSFHFSN, start = c(1992,1), freq=12)
show(total_ts)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1992 3699 3759 3997 3981 4109 4188 4237 4273 4161 4324 4508 5086
## 1993 3954 3766 4209 4212 4370 4402 4513 4561 4455 4602 4976 5432
## 1994 3881 4000 4658 4538 4698 4784 4809 5172 5027 5143 5542 6096
## 1995 4501 4270 4922 4664 5032 5033 5028 5334 5249 5332 5893 6210
## 1996 4639 4684 5242 5139 5424 5305 5493 5704 5409 5775 6221 6523
## 1997 5148 4960 5536 5493 5825 5625 5838 6071 5903 6169 6573 7320
## 1998 5646 5411 5971 5851 6034 6115 6265 6301 6135 6487 6968 7850
## 1999 5752 5823 6593 6289 6493 6656 6764 7037 6938 7132 7788 8643
## 2000 6461 6805 7420 6944 7432 7287 7310 7761 7337 7461 8125 8411
## 2001 6866 6619 7372 6753 7351 7291 7259 7757 6888 7399 8320 9178
## 2002 6956 6900 7566 7167 7770 7293 7455 7974 7247 7637 8589 9261
## 2003 6984 6523 7400 7158 7843 7526 7838 8143 7786 8111 8761 9882
## 2004 7437 7400 8259 7765 7886 8209 8649 8639 8211 8479 9336 10625
## 2005 7696 7762 8550 8190 8415 8723 8718 9350 8974 8870 9965 11008
## 2006 8381 8248 9178 8346 8980 9212 9018 9695 9278 8953 9941 10782
## 2007 8588 8220 9183 8336 9030 8932 8952 9601 8643 8812 9731 10192
## 2008 7939 7887 8158 7815 8381 7904 8269 8238 7474 7506 7907 8630
## 2009 6611 6416 6713 6420 6777 6722 6985 6954 6764 6650 7161 8156
## 2010 6145 6285 7062 6535 6739 6703 7036 7124 6898 6600 7418 8249
## 2011 6073 6311 7319 6705 6886 6839 7043 7470 7242 6771 7715 8787
## 2012 6677 6976 7568 6679 7468 7175 7341 7840 7239 7064 8044 8987
## 2013 7117 6920 7695 7146 7658 7390 7805 8285 7686 7897 8668 9013
## 2014 7071 7100 7966 7654 8244 7749 8261 8591 8235 8301 8799 10053
## 2015 7840 7407 8495 8150 8785 8479 9012 9056 8886 9026 9398 10810
## 2016 8117 8323 9198 8545 8943 9056 9114 9503 9461 8978 9866 10756
## 2017 8232 8200 9605 8541 9369 9317 9195 9813 9467 9436 10468 11181
## 2018 8679 8508 9963 9152 9883 9692 9703 10169 9560 9746 10563 10912
## 2019 8641 8463 9825 9394 10105 9565 9986 10402 9914 10027 10789 10926
## 2020 9225 9193 7897 3780 7072 9581 10172 10692 10671 10645 10697 11493
## 2021 10215 9690 12133 11591 11580 11350 11660 11728 11616 11704 12443 12340
## 2022 10107 10422 12363 11967 11977 11654 11503 12294 11852 11770 12368 12227
## 2023 11121 10678 12098 10868 11541 11338 10970 11726 11092 10707 11958 11490
## 2024 9745 10140 10801 10388 11186 10597 11145 11679 11038 11194 12345 12455
## 2025 10414 10444 11549 11219 11766 11053 11840 12018
plot(total_ts)
Exercise 2 Use the window() function to
create a subset of the total_ts that spans from 1992/01 to
2019/12 and name the new time series object furniture_ts.
Visualize it as well. What is the pattern you see and why?
furniture_ts <- ts(furniture_data$RSFHFSN, start = c(1992, 1), end = c(2019, 12), freq=12)
show(furniture_ts)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1992 3699 3759 3997 3981 4109 4188 4237 4273 4161 4324 4508 5086
## 1993 3954 3766 4209 4212 4370 4402 4513 4561 4455 4602 4976 5432
## 1994 3881 4000 4658 4538 4698 4784 4809 5172 5027 5143 5542 6096
## 1995 4501 4270 4922 4664 5032 5033 5028 5334 5249 5332 5893 6210
## 1996 4639 4684 5242 5139 5424 5305 5493 5704 5409 5775 6221 6523
## 1997 5148 4960 5536 5493 5825 5625 5838 6071 5903 6169 6573 7320
## 1998 5646 5411 5971 5851 6034 6115 6265 6301 6135 6487 6968 7850
## 1999 5752 5823 6593 6289 6493 6656 6764 7037 6938 7132 7788 8643
## 2000 6461 6805 7420 6944 7432 7287 7310 7761 7337 7461 8125 8411
## 2001 6866 6619 7372 6753 7351 7291 7259 7757 6888 7399 8320 9178
## 2002 6956 6900 7566 7167 7770 7293 7455 7974 7247 7637 8589 9261
## 2003 6984 6523 7400 7158 7843 7526 7838 8143 7786 8111 8761 9882
## 2004 7437 7400 8259 7765 7886 8209 8649 8639 8211 8479 9336 10625
## 2005 7696 7762 8550 8190 8415 8723 8718 9350 8974 8870 9965 11008
## 2006 8381 8248 9178 8346 8980 9212 9018 9695 9278 8953 9941 10782
## 2007 8588 8220 9183 8336 9030 8932 8952 9601 8643 8812 9731 10192
## 2008 7939 7887 8158 7815 8381 7904 8269 8238 7474 7506 7907 8630
## 2009 6611 6416 6713 6420 6777 6722 6985 6954 6764 6650 7161 8156
## 2010 6145 6285 7062 6535 6739 6703 7036 7124 6898 6600 7418 8249
## 2011 6073 6311 7319 6705 6886 6839 7043 7470 7242 6771 7715 8787
## 2012 6677 6976 7568 6679 7468 7175 7341 7840 7239 7064 8044 8987
## 2013 7117 6920 7695 7146 7658 7390 7805 8285 7686 7897 8668 9013
## 2014 7071 7100 7966 7654 8244 7749 8261 8591 8235 8301 8799 10053
## 2015 7840 7407 8495 8150 8785 8479 9012 9056 8886 9026 9398 10810
## 2016 8117 8323 9198 8545 8943 9056 9114 9503 9461 8978 9866 10756
## 2017 8232 8200 9605 8541 9369 9317 9195 9813 9467 9436 10468 11181
## 2018 8679 8508 9963 9152 9883 9692 9703 10169 9560 9746 10563 10912
## 2019 8641 8463 9825 9394 10105 9565 9986 10402 9914 10027 10789 10926
plot(furniture_ts)
Exercise 3 Based on furniture_ts,
compute the annual total sales for each year from 1992 to 2019 using the
aggregate() function and store it to
annual_ts. Further plot annual_ts, which gives
a clear visualization for trend.
annual_ts <- aggregate(furniture_ts, FUN=sum, nfrequency = 1)
show(annual_ts)
## Time Series:
## Start = 1992
## End = 2019
## Frequency = 1
## [1] 50322 53452 58348 61468 65558 70461 75034 81908 88754 89053
## [11] 91815 93955 100895 106221 110012 108220 96108 82329 82794 85161
## [21] 89058 93280 98024 105344 109860 112824 116530 118037
plot(annual_ts, main = "Annual sales")
Exercise 4 Compute the monthly sales averaged across
1992-2019 using the tapply() and cycle()
function and store it to monthly_sales. Further plot
monthly_sales, which gives a clear visualization for
monthly seasonality.
monthly_sales <- tapply(furniture_ts, INDEX=cycle(furniture_ts), FUN=mean)
show(monthly_sales)
## 1 2 3 4 5 6 7 8
## 6633.250 6569.500 7343.679 6912.929 7354.643 7256.143 7424.857 7743.500
## 9 10 11 12
## 7373.821 7453.143 8144.036 8891.393
plot(monthly_sales, main = "Monthly sales", type = "l")
Note that for time series forecasting, data partition should respect
the time order as we would like to evaluate the performance of
statistical models on the most recent data. Partition
furniture_ts into a validation set of length
3 years. Allocate the rest of the data to training
data.
nTotal <- length(furniture_ts)
nValid <- 3 * 12
nTrain <- nTotal - nValid
train_ts <- window(furniture_ts, start=c(1992,1), end=c(1992, nTrain))
valid_ts <- window(furniture_ts, start = c(1992,nTrain+1))
show(nTotal)
## [1] 336
show(nValid)
## [1] 36
show(nTrain)
## [1] 300
show(start(train_ts))
## [1] 1992 1
show(end(train_ts))
## [1] 2016 12
show(start(valid_ts))
## [1] 2017 1
show(end(valid_ts))
## [1] 2019 12
Exercise 5 Estimate an additive regression model
with linear trend and dummy variable seasonality. Name it
model1. Plot the fitted value of model1 for
the training data, along with the training data.
model1 <- tslm(train_ts~trend+season)
summary(model1)
##
## Call:
## tslm(formula = train_ts ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1672.0 -819.2 -156.9 824.8 2213.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4514.509 213.433 21.152 < 2e-16 ***
## trend 13.053 0.641 20.364 < 2e-16 ***
## season2 -69.213 271.720 -0.255 0.799121
## season3 615.934 271.722 2.267 0.024148 *
## season4 212.682 271.726 0.783 0.434445
## season5 603.549 271.731 2.221 0.027123 *
## season6 511.496 271.738 1.882 0.060805 .
## season7 675.003 271.747 2.484 0.013564 *
## season8 958.831 271.756 3.528 0.000487 ***
## season9 589.458 271.768 2.169 0.030905 *
## season10 654.525 271.781 2.408 0.016658 *
## season11 1310.832 271.795 4.823 2.30e-06 ***
## season12 2086.860 271.811 7.678 2.56e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 960.7 on 287 degrees of freedom
## Multiple R-squared: 0.6489, Adjusted R-squared: 0.6343
## F-statistic: 44.21 on 12 and 287 DF, p-value: < 2.2e-16
plot(train_ts)
lines(fitted(model1), col = "red")
legend("topleft", legend = c("Training", "Model 1"), col = c("black", "red"), lty = 1, bty = "n")
Exercise 6 Which month has the highest sales based
on the estimated coefficients of model1? Does it match the
visualization plot in Exercise 4?
# December. Yes, it matches the visualization plot in exercise 4
We further create an exogenous variable to capture the
post-financial-crisis period using the function time() and
we store it as xreg.
post_crisis_train <- as.numeric(time(train_ts)>=2008)
xreg <- cbind(post_crisis=post_crisis_train)
Exercise 7 Estimate an additive regression model
with linear trend and dummy variable seasonality and the created
exogeneous variable in xreg. Name it model2.
What is the effect of the 2008 financial crisis?
model2 <- tslm(train_ts~trend+season+xreg)
summary(model2)
##
## Call:
## tslm(formula = train_ts ~ trend + season + xreg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -752.95 -240.63 -76.32 128.40 2216.92
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3548.3401 103.8332 34.173 < 2e-16 ***
## trend 27.4732 0.5382 51.049 < 2e-16 ***
## season2 -83.6332 126.5522 -0.661 0.509234
## season3 587.0936 126.5556 4.639 5.33e-06 ***
## season4 169.4204 126.5613 1.339 0.181750
## season5 545.8672 126.5693 4.313 2.22e-05 ***
## season6 439.3940 126.5796 3.471 0.000598 ***
## season7 588.4808 126.5922 4.649 5.11e-06 ***
## season8 857.8876 126.6071 6.776 7.04e-11 ***
## season9 474.0944 126.6243 3.744 0.000219 ***
## season10 524.7413 126.6437 4.143 4.51e-05 ***
## season11 1166.6281 126.6654 9.210 < 2e-16 ***
## season12 1928.2349 126.6894 15.220 < 2e-16 ***
## xreg -3124.4259 97.0198 -32.204 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 447.4 on 286 degrees of freedom
## Multiple R-squared: 0.9241, Adjusted R-squared: 0.9207
## F-statistic: 267.9 on 13 and 286 DF, p-value: < 2.2e-16
Exercise 8 Plot the fitted value of both
model1 and model2 for the training data, along
with the training data.
plot(train_ts)
lines(model1$fitted.values, col = 'red')
lines(model2$fitted.values, col = 'blue')
Exercise 9 Let’s now evaluate the forecasting
performance of the two models model1 and
model2 on the validation data. First, let’s generate the
prediction by each model and store them in model1_pred and
model2_pred respectively. Set the level of confidence for
the prediction interval as 90. We then use the
accuracy() function to obtain the error metrics.
Note that to generate the forecast result for model2, we
need to construct an accompanying data.matrix named
newxreg for the post event indicator variable.
model1_pred <- forecast(model1, h=nValid, level=90)
model1_pred
## Point Forecast Lo 90 Hi 90
## Jan 2017 8443.392 6818.315 10068.47
## Feb 2017 8387.232 6762.155 10012.31
## Mar 2017 9085.432 7460.355 10710.51
## Apr 2017 8695.232 7070.155 10320.31
## May 2017 9099.152 7474.075 10724.23
## Jun 2017 9020.152 7395.075 10645.23
## Jul 2017 9196.712 7571.635 10821.79
## Aug 2017 9493.592 7868.515 11118.67
## Sep 2017 9137.272 7512.195 10762.35
## Oct 2017 9215.392 7590.315 10840.47
## Nov 2017 9884.752 8259.675 11509.83
## Dec 2017 10673.832 9048.755 12298.91
## Jan 2018 8600.025 6973.610 10226.44
## Feb 2018 8543.865 6917.450 10170.28
## Mar 2018 9242.065 7615.650 10868.48
## Apr 2018 8851.865 7225.450 10478.28
## May 2018 9255.785 7629.370 10882.20
## Jun 2018 9176.785 7550.370 10803.20
## Jul 2018 9353.345 7726.930 10979.76
## Aug 2018 9650.225 8023.810 11276.64
## Sep 2018 9293.905 7667.490 10920.32
## Oct 2018 9372.025 7745.610 10998.44
## Nov 2018 10041.385 8414.970 11667.80
## Dec 2018 10830.465 9204.050 12456.88
## Jan 2019 8756.658 7128.808 10384.51
## Feb 2019 8700.498 7072.648 10328.35
## Mar 2019 9398.698 7770.848 11026.55
## Apr 2019 9008.498 7380.648 10636.35
## May 2019 9412.418 7784.568 11040.27
## Jun 2019 9333.418 7705.568 10961.27
## Jul 2019 9509.978 7882.128 11137.83
## Aug 2019 9806.858 8179.008 11434.71
## Sep 2019 9450.538 7822.688 11078.39
## Oct 2019 9528.658 7900.808 11156.51
## Nov 2019 10198.018 8570.168 11825.87
## Dec 2019 10987.098 9359.248 12614.95
post_crisis_valid <- as.numeric(time(valid_ts)>=2008)
newxreg <- cbind(post_crisis=post_crisis_valid)
model2_pred <- forecast(model2, h=nValid, level=90, newdata=newxreg)
## Warning in forecast.lm(model2, h = nValid, level = 90, newdata = newxreg):
## newdata column names not specified, defaulting to first variable required.
model2_pred
## Point Forecast Lo 90 Hi 90
## Jan 2017 8693.346 7936.363 9450.329
## Feb 2017 8637.186 7880.203 9394.169
## Mar 2017 9335.386 8578.403 10092.369
## Apr 2017 8945.186 8188.203 9702.169
## May 2017 9349.106 8592.123 10106.089
## Jun 2017 9270.106 8513.123 10027.089
## Jul 2017 9446.666 8689.683 10203.649
## Aug 2017 9743.546 8986.563 10500.529
## Sep 2017 9387.226 8630.243 10144.209
## Oct 2017 9465.346 8708.363 10222.329
## Nov 2017 10134.706 9377.723 10891.689
## Dec 2017 10923.786 10166.803 11680.769
## Jan 2018 9023.024 8265.216 9780.832
## Feb 2018 8966.864 8209.056 9724.672
## Mar 2018 9665.064 8907.256 10422.872
## Apr 2018 9274.864 8517.056 10032.672
## May 2018 9678.784 8920.976 10436.592
## Jun 2018 9599.784 8841.976 10357.592
## Jul 2018 9776.344 9018.536 10534.152
## Aug 2018 10073.224 9315.416 10831.032
## Sep 2018 9716.904 8959.096 10474.712
## Oct 2018 9795.024 9037.216 10552.832
## Nov 2018 10464.384 9706.576 11222.192
## Dec 2018 11253.464 10495.656 12011.272
## Jan 2019 9352.702 8593.921 10111.484
## Feb 2019 9296.542 8537.761 10055.324
## Mar 2019 9994.742 9235.961 10753.524
## Apr 2019 9604.542 8845.761 10363.324
## May 2019 10008.462 9249.681 10767.244
## Jun 2019 9929.462 9170.681 10688.244
## Jul 2019 10106.022 9347.241 10864.804
## Aug 2019 10402.902 9644.121 11161.684
## Sep 2019 10046.582 9287.801 10805.364
## Oct 2019 10124.702 9365.921 10883.484
## Nov 2019 10794.062 10035.281 11552.844
## Dec 2019 11583.142 10824.361 12341.924
accuracy(model1_pred$mean, valid_ts)
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set 298.7718 404.0957 354.5877 2.952371 3.607991 0.1354622 0.490252
accuracy(model2_pred$mean, valid_ts)
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set -124.2274 308.9465 233.1961 -1.43048 2.508064 0.1508157 0.3612082
Exercise 10 Which model provides better forecasting performance on the validation data? Does the additional event effect help with prediction?
# Model 2 provides better forecasting performance on the validation data and the additional event effect helps significantly with the prediction.
Exercise 11 Does model1 over or under
estimate the furniture sales in the validation data?
# Model 1 underestimated the furniture sales in the validation data.
Exercise 12 Visualize the prediction of
model2, together with the actual time series and the
predicted mean of model1.
plot(model2_pred)
lines(valid_ts)
lines(model1_pred$mean, col='red')
abline(v=2017)
legend("topleft", legend = c("model1", "model2", "actual"), col = c("red", "blue", "black"), lty = 1, bty = "n")
Exercise 13-15 are optional.
We further create a fourier series based seasonality variables with
K=4 and name it fs_train. Furthermore, we
combine post_crisis_train and fs_train
together into the data.matrix named xreg that
stores all the exogenous covariates.
# fs_train <- fourier(train_ts, K=4)
# xreg <- cbind(post_crisis=post_crisis_train, fs=fs_train)
Exercise 13 Estimate an additive regression model
with linear trend and fourier series based seasonality and the post
event indicator stored in xreg. Name it
model3.
Exercise 14 Generate the prediction by
model3 and store it in model3_pred. Note that
to generate the forecast result for model3, we need to
construct an accompanying data.matrix named
newxreg that contains 1. fourier series and 2. the post
event indicator variable for the validation period. Set the level of
confidence for the prediction interval as 90.
Use the accuracy() function to obtain the error metrics
for model3. Among model1,model2,model3, which
one performs the best in the validation data?
# fs_valid <- fourier(train_ts, K=4, h=nValid)
# newxreg <- cbind(post_crisis=post_crisis_valid, fs=fs_valid)
Exercise 15 What is the upper bound of the 90%
Prediction Interval by model3 for 2019-December?