Introduction

      While predicting the performance of the tech industry as a whole is a task rarely performed well, demand for technological appliances and products is much easier to forecast. With seasonal sales events such as Black Friday and the Christmas holidays, the technological appliances industry can almost always expect a spike in demand in Q4 of any given year.
      Despite this strong seasonality, technology manufacturers still need to know what the demand for their products is likely to be, so that they can stock up accurately. This need for demand forecasting is the focus of this paper.
      Using data from the US Census Bureau, I track monthly sales of technology products from 2015 to 2019; the years 2015 to 2018 are used to train my models, and 2019 is my evaluation year. I evaluate the performance of ETS, ARIMA, and dynamic regression models, and also create an ensemble model to try to capitalize on the best-performing aspects of each. To evaluate these models, I use the AIC, BIC, and RMSE values, as well as the R2 of the models.

Data %>%
  autoplot(TechSales)

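# Training set: January 2015 to December 2018 (48 months)
Train <- Data[1:48, ]
# Evaluation set: January 2019 to December 2019 (12 months)
Test <- Data[49:60, ]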

head(Data)

## # A tsibble: 6 x 4 [1M]
##   Date          TechSales DATE       YearMonth
##   <chr>             <dbl> <date>         <mth>
## 1 January 2015       7985 2015-01-01  2015 Jan
## 2 February 2015      7701 2015-02-01  2015 Feb
## 3 March 2015         7872 2015-03-01  2015 Mar
## 4 April 2015         7102 2015-04-01  2015 Apr
## 5 May 2015           7616 2015-05-01  2015 May
## 6 June 2015          7915 2015-06-01  2015 Jun

Models

ETS

The first model I created is an ETS model, specifically a Holt-Winters damped method model. While the damping likely won’t be visible due to the relatively short forecast period, the remaining aspects of the model performed quite well.
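
To get a rough sense of why the damping is unlikely to be visible, the sketch below compares the cumulative trend multiplier of a damped trend (phi + phi^2 + ... + phi^h) with the undamped multiplier h over a 12-month horizon. It uses the damping parameter reported in the fitted model output; the variable names are my own, and this illustrates the general damped-trend formula rather than reproducing the fitted forecasts.

```r
# Damping parameter as reported in the fitted ETS output
phi <- 0.9766682
h <- 12  # forecast horizon in months (the 2019 evaluation year)

# Cumulative multiplier applied to the trend term at horizons 1..h
damped_multiplier <- cumsum(phi^(1:h))
undamped_multiplier <- 1:h

# Share of the undamped trend that remains at the final horizon
ratio_at_h <- damped_multiplier[h] / undamped_multiplier[h]
print(round(ratio_at_h, 2))
```

With phi this close to one, most of the undamped trend is still present after twelve months, so the damped and undamped forecasts would differ only slightly over the evaluation year.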

## Series: TechSales 
## Model: ETS(M,Ad,M) 
##   Smoothing parameters:
##     alpha = 0.4163731 
##     beta  = 0.0001011928 
##     gamma = 0.0001010908 
##     phi   = 0.9766682 
## 
##   Initial states:
##      l[0]      b[0]     s[0]    s[-1]     s[-2]     s[-3]     s[-4]     s[-5]
##  8600.328 -33.07871 1.477373 1.220624 0.9306403 0.9519484 0.9901403 0.9387185
##      s[-6]     s[-7]     s[-8]    s[-9]    s[-10]    s[-11]
##  0.9450128 0.9199612 0.8567557 0.954311 0.8967048 0.9178102
## 
##   sigma^2:  7e-04
## 
##      AIC     AICc      BIC 
## 709.7198 733.3060 743.4014

## # A tibble: 1 × 10
##   .model .type       ME  RMSE   MAE     MPE  MAPE  MASE RMSSE   ACF1
##   <chr>  <chr>    <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 ETS    Training 0.330  173.  140. -0.0159  1.74 0.439 0.437 -0.130

## [1] 0.9827512

Because the R2 is the only evaluation metric that can be interpreted in isolation (the AIC, BIC, and RMSE are mainly useful for comparing models against one another), it is my sole point of focus before running the following two models. With an R2 of 0.98, the model is performing very well.
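
The report does not show how the R2 was computed, so the following is only a sketch under the assumption that it is defined as one minus the ratio of the residual to the total sum of squares. The actual values are the first six months from the data preview; the predictions are invented purely for illustration, and the helper names `rmse` and `r_squared` are my own.

```r
# Actual values: first six months of TechSales from the data preview
actual <- c(7985, 7701, 7872, 7102, 7616, 7915)
# Hypothetical predictions, invented purely for illustration
predicted <- c(7900, 7750, 7800, 7200, 7600, 7950)

# Root mean squared error, in the same units as the series
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# R2 as 1 - SSE/SST (an assumed definition, not confirmed by the report)
r_squared <- function(actual, predicted) {
  sse <- sum((actual - predicted)^2)     # residual sum of squares
  sst <- sum((actual - mean(actual))^2)  # total sum of squares
  1 - sse / sst
}

print(rmse(actual, predicted))
print(r_squared(actual, predicted))
```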

ARIMA

For the ARIMA model, I used the ARIMA function, which automatically selects the model orders; here it chose an ARIMA(1,1,0)(0,1,0)[12].
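
To make the selected specification concrete, the sketch below fits the same orders explicitly with base R's `stats::arima()` on a simulated monthly series. The simulated data and object names are placeholders rather than the Census series, and this is not the call used in the analysis, only an illustration of what ARIMA(1,1,0)(0,1,0)[12] means.

```r
set.seed(1)
# Simulated monthly series with a repeating seasonal pattern (placeholder data)
n <- 48
seasonal_pattern <- rep(c(0, -3, -1, -9, -4, -1, 0, 1, -2, 2, 15, 30),
                        length.out = n)
y <- ts(80 + seasonal_pattern + cumsum(rnorm(n)),
        frequency = 12, start = c(2015, 1))

# ARIMA(1,1,0)(0,1,0)[12]: one AR term, one regular difference,
# and one seasonal difference at lag 12
fit <- arima(y, order = c(1, 1, 0),
             seasonal = list(order = c(0, 1, 0), period = 12))

# Forecast the next 12 months
fc <- predict(fit, n.ahead = 12)
print(fit)
print(fc$pred)
```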

## Series: TechSales 
## Model: ARIMA(1,1,0)(0,1,0)[12] 
## 
## Coefficients:
##           ar1
##       -0.4581
## s.e.   0.1566
## 
## sigma^2 estimated as 68927:  log likelihood=-244.23
## AIC=492.46   AICc=492.83   BIC=495.57

## # A tibble: 1 × 10
##   .model .type       ME  RMSE   MAE   MPE  MAPE  MASE RMSSE   ACF1
##   <chr>  <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 ARIMA  Training  11.9  221.  153. 0.235  1.95 0.479 0.558 0.0117

## [1] 0.9926409

The ARIMA function worked exceptionally well. The R2 increased relative to the ETS model, to 0.99, and the AIC and BIC values are also much lower for the ARIMA model, although information criteria are not directly comparable between ETS and ARIMA models because the ARIMA likelihood is computed on the differenced series. The training RMSE is somewhat higher for the ARIMA model (221 versus 173), which is unexpected given that the ARIMA forecast looks superior in every other respect; still, the ARIMA model seems to be an improvement upon our forecast.

DRM

Because my dataset does not include any additional regressor variables, I created a binary Sales variable, which takes the value one if the month is October, November, or December, as those months have the highest quantity of discounts, and zero otherwise.
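
A minimal sketch of how such an indicator could be built in base R is shown below. The object names `dates`, `month_number`, and `sales_dummy` are hypothetical; the report does not show the code that actually created the Sales column.

```r
# Monthly dates covering the study period (first day of each month)
dates <- seq(as.Date("2015-01-01"), as.Date("2019-12-01"), by = "month")

# Sales indicator: 1 for October, November, and December; 0 otherwise
month_number <- as.integer(format(dates, "%m"))
sales_dummy <- as.integer(month_number %in% c(10, 11, 12))

print(table(sales_dummy))
```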

## Series: TechSales 
## Model: LM w/ ARIMA(0,0,2) errors 
## 
## Coefficients:
##          ma1      ma2  Train$Sales  intercept
##       0.2678  -0.5421    1810.0257  7458.7428
## s.e.  0.1399   0.1164     333.9044   116.1878
## 
## sigma^2 estimated as 654535:  log likelihood=-387.98
## AIC=785.96   AICc=787.39   BIC=795.31

## # A tibble: 1 × 10
##   .model .type       ME  RMSE   MAE    MPE  MAPE  MASE RMSSE    ACF1
##   <chr>  <chr>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 lag0   Training  5.34  775.  561. -0.783  6.92  1.76  1.96 -0.0241

Unfortunately, the model does not perform well. Its AIC, BIC, and RMSE values are the highest of the three models (the training RMSE is 775, compared with 173 for ETS and 221 for ARIMA), which is consistent with the performance of the model on the graph.

Ensemble

For my ensemble model, I give equal weight to all three models and average their predictions.
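
The equal-weight combination can be sketched as follows. The three forecast vectors are invented numbers standing in for the ETS, ARIMA, and DRM forecasts, and the object names are hypothetical.

```r
# Hypothetical 3-month forecasts from each model (invented numbers)
fc_ets   <- c(8000, 7700, 7900)
fc_arima <- c(8100, 7650, 7950)
fc_drm   <- c(7500, 7500, 7500)

# Equal-weight ensemble: average the three forecasts at each horizon
forecasts <- cbind(ets = fc_ets, arima = fc_arima, drm = fc_drm)
fc_ensemble <- rowMeans(forecasts)

print(fc_ensemble)
```

Because every model receives a weight of one third, a single poorly performing model pulls the combined forecast toward its own predictions.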

## [1] 0.9430493

As can be seen from the above graph and R2 value, the ensemble model is unfortunately dragged down by the DRM, showing a lower R2 of only 0.94. To improve upon this, I would likely gather some additional external regressors for the DRM.

Homework 2

Samuel C. Singer

2024-04-15
