While predicting the performance of the tech industry as a
whole is a task rarely performed well, demand for technological
appliances and products is much easier to forecast. Because of seasonal
sales such as Black Friday and the Christmas holidays, the technological
appliances industry can almost always expect a spike in demand in Q4 of
any given year.
Despite the fairly high levels of seasonality, technology manufacturers
still need to know what the demand for their products is likely to be,
so that they can stock up accurately. This need for demand forecasting
is the focus of this paper.
Using data from the US Census Bureau, I track monthly
sales of technology products from 2015 to 2019. The years 2015-2018
are used to train my models, and 2019 is held out as the evaluation
year. I will evaluate the performance of ETS, ARIMA, and dynamic
regression models. I will also build an ensemble model to try to
capitalize on the best-performing aspects of each. To compare these
models, I will use their AIC, BIC, and RMSE values, as well as their R².
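The paper does not show how RMSE and R² are computed. The sketch below is a minimal version in base R. It assumes RMSE is the square root of the mean squared error and R² is defined as 1 - SSE/SST. Some workflows use the squared correlation between fitted and actual values instead, which can give different numbers. The helper names `rmse()` and `r_squared()` are hypothetical.

```r
# Hypothetical helpers; `actual` and `predicted` are assumed to be numeric
# vectors of equal length (e.g. the 12 observed and forecast months of 2019).

# Root mean squared error: typical size of an error, in the series' units.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Coefficient of determination, defined here as 1 - SSE/SST: the share of
# the variation of `actual` around its mean that the predictions explain.
# It can be negative when the predictions are worse than the mean.
r_squared <- function(actual, predicted) {
  sse <- sum((actual - predicted)^2)
  sst <- sum((actual - mean(actual))^2)
  1 - sse / sst
}

actual    <- c(10, 12, 14)
predicted <- c(11, 12, 13)
rmse(actual, predicted)       # 0.8164966, i.e. sqrt(2/3)
r_squared(actual, predicted)  # 0.75
```

Under this definition, both measures are built from the same sum of squared errors. On the same data, a lower RMSE therefore always means a higher R². The two can only disagree when they are computed on different samples (for example, training data versus 2019) or under different definitions.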
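library(fpp3)  # loads tsibble, fable, feasts and the ggplot2 helpers used below
# Plot the full monthly series, January 2015 to December 2019
Data %>%
  autoplot(TechSales)
# First 48 months (2015-2018) for training, last 12 months (2019) for evaluation
Train <- Data[1:48, ]
Test <- Data[49:60, ]
head(Data)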
## # A tsibble: 6 x 4 [1M]
## Date TechSales DATE YearMonth
## <chr> <dbl> <date> <mth>
## 1 January 2015 7985 2015-01-01 2015 Jan
## 2 February 2015 7701 2015-02-01 2015 Feb
## 3 March 2015 7872 2015-03-01 2015 Mar
## 4 April 2015 7102 2015-04-01 2015 Apr
## 5 May 2015 7616 2015-05-01 2015 May
## 6 June 2015 7915 2015-06-01 2015 Jun
The first model I created is an ETS model, specifically a damped
Holt-Winters model with multiplicative errors and seasonality,
ETS(M,Ad,M). While the damping will likely not be visible over the
relatively short forecast period, the model otherwise fits the training
data closely, with a training MAPE of 1.74%.
## Series: TechSales
## Model: ETS(M,Ad,M)
## Smoothing parameters:
## alpha = 0.4163731
## beta = 0.0001011928
## gamma = 0.0001010908
## phi = 0.9766682
##
## Initial states:
## l[0] b[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5]
## 8600.328 -33.07871 1.477373 1.220624 0.9306403 0.9519484 0.9901403 0.9387185
## s[-6] s[-7] s[-8] s[-9] s[-10] s[-11]
## 0.9450128 0.9199612 0.8567557 0.954311 0.8967048 0.9178102
##
## sigma^2: 7e-04
##
## AIC AICc BIC
## 709.7198 733.3060 743.4014
## # A tibble: 1 × 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ETS Training 0.330 173. 140. -0.0159 1.74 0.439 0.437 -0.130
## [1] 0.9827512
As R² is the only one of my evaluation metrics that can be interpreted
in isolation, it is my sole focus until the other two models have been
fitted. With an R² of 0.98, the ETS model is performing very well.
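The following worked check shows why damping should barely show over a 12-month horizon. It uses the estimated φ = 0.9766682 from the report above. In the damped trend forecast equation, written here for multiplicative seasonality with period m = 12, the trend enters through the sum φ_h = φ + φ² + ... + φ^h instead of h:

$$
\hat{y}_{T+h\mid T} = \left(\ell_T + \phi_h\, b_T\right) s_{T+h-m(k+1)},
\qquad \phi_h = \sum_{i=1}^{h} \phi^i = \frac{\phi\,(1-\phi^h)}{1-\phi},
\qquad k = \left\lfloor \frac{h-1}{m} \right\rfloor
$$

$$
\phi_{12} = \frac{0.9766682\,(1 - 0.9766682^{12})}{1 - 0.9766682}
\approx \frac{0.9766682 \times 0.2467}{0.0233318} \approx 10.33
$$

At the end of the evaluation year, the damped trend therefore contributes about 10.33 b_T instead of 12 b_T, a gap of roughly 1.67|b_T|. This assumes |b_T| is no larger than the initial slope b_0 ≈ -33, which the tiny estimated β makes plausible. Under that assumption the gap is at most about 55 units before seasonal scaling, well under 1% of monthly sales.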
For the ARIMA model, I used the automatic ARIMA() function, which
searches over candidate model orders and selects the one with the
lowest AICc.
## Series: TechSales
## Model: ARIMA(1,1,0)(0,1,0)[12]
##
## Coefficients:
## ar1
## -0.4581
## s.e. 0.1566
##
## sigma^2 estimated as 68927: log likelihood=-244.23
## AIC=492.46 AICc=492.83 BIC=495.57
## # A tibble: 1 × 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ARIMA Training 11.9 221. 153. 0.235 1.95 0.479 0.558 0.0117
## [1] 0.9926409
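Written out, the selected ARIMA(1,1,0)(0,1,0)[12] model applies one ordinary and one seasonal difference, then fits an AR(1) to the result. The coefficient φ₁ = -0.4581 comes from the report above, and B is the backshift operator (B y_t = y_{t-1}):

$$
(1 - \phi_1 B)(1 - B)(1 - B^{12})\, y_t = \varepsilon_t,
\qquad \phi_1 = -0.4581, \quad \sigma^2 \approx 68927
$$

$$
w_t = (1 - B)(1 - B^{12})\, y_t = y_t - y_{t-1} - y_{t-12} + y_{t-13},
\qquad w_t = -0.4581\, w_{t-1} + \varepsilon_t
$$

The likelihood is computed for w_t, which loses the first 13 training months. This model's AIC and BIC are therefore on a different footing from those of the ETS model, which is fitted to the undifferenced series.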
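Using the ARIMA() function worked exceptionally well. The R² increased
to 0.99 relative to the ETS model, and the AIC and BIC values are also
much lower for the ARIMA model. These information criteria are not
strictly comparable between ETS and ARIMA models, however, because the
ARIMA likelihood is computed on the differenced series. The RMSE is
slightly higher for the ARIMA model (221 versus 173), which seems at
odds with the other metrics. Note, however, that the RMSE values in
these tables are in-sample errors (`.type` is Training), not errors on
the 2019 evaluation year. Overall, the ARIMA model still appears to be
an improvement upon our forecast.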
Because my dataset contains no additional regressor variables, I
created a binary Sales variable for the dynamic regression model. It
takes the value one if the month is October, November, or December, as
those months have the most discounts.
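The construction of the dummy is not shown in the paper. The sketch below builds it in base R, assuming it is derived from a Date column like `DATE` in `head(Data)`. The helper `make_sales_dummy()` is hypothetical.

```r
# Hypothetical helper: 1 for the discount-heavy months (October, November,
# December) and 0 otherwise. `dates` is assumed to be a Date vector.
make_sales_dummy <- function(dates) {
  month_num <- as.integer(format(dates, "%m"))  # month number 1-12
  as.integer(month_num %in% c(10L, 11L, 12L))
}

# Example: the twelve months of 2015
months_2015 <- seq(as.Date("2015-01-01"), by = "month", length.out = 12)
make_sales_dummy(months_2015)
# [1] 0 0 0 0 0 0 0 0 0 1 1 1
```

In fable, one option is to add the column before the train/test split, for example `Data <- Data %>% mutate(Sales = make_sales_dummy(DATE))`, and write the model as `ARIMA(TechSales ~ Sales)`. This lets `forecast(fit, new_data = Test)` use the 2019 values of the dummy, where `fit` is a hypothetical name for the fitted model. The coefficient name `Train$Sales` in the report below suggests the regressor was referenced from outside the data. That likely ties it to the 48 training values rather than the rows being forecast.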
## Series: TechSales
## Model: LM w/ ARIMA(0,0,2) errors
##
## Coefficients:
## ma1 ma2 Train$Sales intercept
## 0.2678 -0.5421 1810.0257 7458.7428
## s.e. 0.1399 0.1164 333.9044 116.1878
##
## sigma^2 estimated as 654535: log likelihood=-387.98
## AIC=785.96 AICc=787.39 BIC=795.31
## # A tibble: 1 × 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 lag0 Training 5.34 775. 561. -0.783 6.92 1.76 1.96 -0.0241
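Written out from the report above, the fitted dynamic regression model is a linear regression on the Sales dummy with MA(2) errors:

$$
y_t = 7458.74 + 1810.03\,\text{Sales}_t + \eta_t,
\qquad \eta_t = \varepsilon_t + 0.2678\,\varepsilon_{t-1} - 0.5421\,\varepsilon_{t-2},
\qquad \sigma^2 \approx 654535
$$

Q4 months are thus estimated to be about 1,810 units above the 7,459 baseline. Because the error model has no seasonal terms, any seasonal pattern outside October to December has to be absorbed by two lags of noise. This is consistent with an innovation variance roughly 9.5 times that of the ARIMA model (654,535 versus 68,927).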
Unfortunately, this model does not perform well. Its RMSE of 775 is
more than three times that of either of the other two models, and its
AIC and BIC are also the highest of the three. This is consistent with
its visibly poor fit on the graph.
For my ensemble model, I give equal weight to all three models and
average their forecasts.
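Below is a minimal sketch of the equal-weight combination in base R. `equal_weight_ensemble()` is a hypothetical helper that averages any number of forecast vectors of the same length, month by month.

```r
# Hypothetical helper: each argument is one model's forecast vector over the
# same horizon; the ensemble forecast is the simple mean across models.
equal_weight_ensemble <- function(...) {
  forecasts <- cbind(...)  # one column per model, one row per month
  rowMeans(forecasts)      # equal weight (1 / number of models) for each
}
```

In fable, the same average can be taken on a fitted mable, as in `fit %>% mutate(Ensemble = (ETS + ARIMA + lag0) / 3)`. This assumes the three models were fitted together in one `model()` call under the names shown in the accuracy tables. With equal weights, a member with much larger errors, such as the dynamic regression here, pulls the average away from the two stronger models.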
## [1] 0.9430493
As the graph and R² value above show, the ensemble model is
unfortunately dragged down by the dynamic regression model (DRM),
reaching an R² of only 0.94. To improve on this, I would gather
additional external regressors for the DRM.
The race was close, but the ARIMA model is clearly the best
performer. It has the highest R² of the three models, and every metric
used except the in-sample RMSE favors it. With an R² of over 99%, I
believe such a model would be an effective way for retailers to
forecast demand for electronic appliances and optimize their inventory.