1. Business Problem


“The future belongs to those who prepare for it today” - M.K. Gandhi

This phrase still holds true today, especially for revenue leaders, sales executives, and CEOs who are keen to drive favorable sales outcomes for their teams. Accurately forecasting their sales numbers can be an invaluable tool in helping them achieve those outcomes. These outcomes are invariably linked to a company’s health and growth. Sales forecasting can indeed make or break a business.

Before we deep-dive into sales prediction methods and tools, let’s define what sales forecasting actually is.

Sales forecasting is the process of predicting how much revenue a company, team, or person will generate within a specific timeframe. This could be a week, month, quarter, or even a year. Having an accurate sales forecast is critical, because a missed forecast in either direction can prove costly.

For example, if an organization over-forecasts and under-delivers, it can hurt the company’s valuation. Whereas if it under-forecasts and over-delivers, which is admittedly a good problem to have, the organization ends up leaving money on the table, because it lacks the lead time to capitalize through marketing plans, product launches, and hiring.

Given these stakes, a company needs a reliable fortune teller: a method that can accurately predict its condition in the future. In this project, we will use Simple Moving Average, Exponential Smoothing, Autoregressive Integrated Moving Average (ARIMA), and Nonparametric Regression (with a Fourier series estimator) to predict monthly sales, then compare their performance.



2. Dependencies


# load library
library(tsdl)
library(zoo)
library(dplyr)
library(forecast)
library(MLmetrics)
library(Metrics)
library(TSstudio)
library(TTR)
library(tseries)
library(ggplot2)



3. Data Preparation / Data Preprocessing


In this project, we’re going to use monthly sales data (frequency = 12) from the Time Series Data Library (TSDL). TSDL was created by Rob Hyndman, Professor of Statistics at Monash University, Australia.

To use the dataset from TSDL, we must first install its development version from GitHub: devtools::install_github("FinYang/tsdl").

# load data
sales <- subset(tsdl, 12, "Sales")
sales_ts <- sales[[1]]
sales_ts
##         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
## 1960  87695  86890  96442  98133 113615 123924 128924 134775 117357 114626
## 1961  92188  88591  98683  99207 125485 124677 132543 140735 124008 121194
## 1962 101007  94228 104255 106922 130621 125251 140318 146174 122318 128770
## 1963 108497 100482 106140 118581 132371 132042 151938 150997 130931 137018
## 1964 109894 106061 112539 125745 136251 140892 158390 148314 144148 140138
## 1965 109895 109044 122499 124264 142296 150693 163331 165837 151731 142491
## 1966 116963 118049 137869 127392 154166 160227 165869 173522 155828 153771
## 1967 124046 121260 138870 129782 162312 167211 172897 189689 166496 160754
## 1968 139625 137361 138963 155301 172026 165004 185861 190270 163903 174270
## 1969 146182 137728 148932 156751 177998 174559 198079 189073 175702 180097
## 1970 154277 144998 159644 168646 166273 190176 205541 193657 182617 189614
## 1971 158167 156261 176353 175720 193939 201269 218960 209861 198688 190474
## 1972 166286 170699 181468 174241 210802 212262 218099 229001 203200 212557
## 1973 188992 175347 196265 203526 227443 233038 234119 255133 216478 232868
## 1974 194784 189756 193522 212870 248565 221532 252642 255007 206826 233231
## 1975 199024 191813 195997 208684 244113 243108 255918 244642 237579 237579
##         Nov    Dec
## 1960 107677 108087
## 1961 111634 111565
## 1962 117518 115492
## 1963 121271 123548
## 1964 124075 136485
## 1965 140229 140463
## 1966 143963 143898
## 1967 155582 145936
## 1968 160272 165614
## 1969 155202 174508
## 1970 174176 184416
## 1971 194502 190755
## 1972 197095 193693
## 1973 221616 209893
## 1974 212678 217173
## 1975 217775 227621

The data from TSDL already come as a time-series object spanning Jan 1960 - Dec 1975. For convenience in later steps, we also convert this time-series object into a dataframe.

# change ts object to dataframe
sales_df <- data.frame(date = as.Date(as.yearmon(time(sales_ts))), y = as.matrix(sales_ts))
sales_df$index <- 1:nrow(sales_df)
sales_df

After creating a dataframe, we must check whether the data have any missing values.

# check missing values
anyNA(sales_ts)
## [1] FALSE

There are no missing values in the dataset, so we can proceed to the next step.



4. Exploratory Data Analysis


# decomposition
sales_dc <- decompose(sales_ts, type = "additive")
autoplot(sales_dc)

Based on the decomposition, the seasonal pattern has been captured well (the trend component is smooth), and the data show an additive trend and additive seasonality; the additive model is written out below.
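For the additive model used here, the observed series is decomposed as

$$y_t = T_t + S_t + R_t$$

where $T_t$ is the trend component, $S_t$ the seasonal component, and $R_t$ the remainder.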



5. Cross Validation


In order to evaluate prediction performance, the dataset is divided into two parts: sales_train (all but the last 24 months) to train the models and sales_test (the last 24 months) to evaluate them.

# cross validation
sales_train <- head(sales_ts, n = length(sales_ts) - 24) 
sales_test <- tail(sales_ts, n = 24)



6. Model Fitting and Forecast


Simple Moving Average (SMA)

SMA is usually not recommended for forecasting because it simply averages the previous n observations, giving every observation the same weight (recent and older data count equally); a small sketch of this computation follows.
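As a quick illustration (this snippet is a sanity check, not part of the original analysis), the 12-month moving average produced by TTR::SMA can be reproduced with stats::filter using equal weights of 1/12:

# equal weights over the current and previous 11 months; sides = 1 uses past values only
manual_sma <- stats::filter(sales_train, rep(1/12, 12), sides = 1)
# the two series agree (the first 11 values are NA by construction)
head(cbind(TTR = SMA(sales_train, n = 12), manual = manual_sma), 15)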

# model fitting
model_sma <- SMA(sales_train, n = 12)
model_sma
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1960       NA       NA       NA       NA       NA       NA       NA       NA
## 1961 110219.8 110361.6 110548.3 110637.8 111627.0 111689.8 111991.3 112488.0
## 1962 114944.1 115413.8 115878.2 116521.1 116949.1 116996.9 117644.8 118098.1
## 1963 120030.3 120551.5 120708.6 121680.2 121826.0 122391.9 123360.2 123762.2
## 1964 126267.8 126732.7 127265.9 127862.9 128186.2 128923.8 129461.4 129237.8
## 1965 131911.1 132159.7 132989.7 132866.2 133370.0 134186.8 134598.5 136058.8
## 1966 139153.4 139903.8 141184.7 141445.3 142434.5 143229.0 143440.5 144080.9
## 1967 146550.0 146817.6 146901.0 147100.2 147779.0 148361.0 148946.7 150293.9
## 1968 154201.2 155542.9 155550.7 157677.2 158486.8 158302.8 159383.2 159431.6
## 1969 162918.9 162949.5 163780.2 163901.1 164398.8 165195.0 166213.2 166113.4
## 1970 168575.5 169181.3 170074.0 171065.2 170088.2 171389.6 172011.4 172393.4
## 1971 176493.8 177432.3 178824.8 179414.2 181719.8 182644.2 183762.4 185112.8
## 1972 189422.3 190625.5 191051.8 190928.5 192333.8 193249.8 193178.1 194773.1
## 1973 199342.4 199729.8 200962.8 203403.2 204790.0 206521.3 207856.3 210034.0
##           Sep      Oct      Nov      Dec
## 1960       NA       NA       NA 109845.4
## 1961 113042.2 113589.6 113919.3 114209.2
## 1962 117957.2 118588.6 119078.9 119406.2
## 1963 124479.9 125167.2 125480.0 126151.3
## 1964 130339.2 130599.2 130832.9 131911.0
## 1965 136690.7 136886.8 138232.9 138564.4
## 1966 144422.3 145362.3 145673.5 145959.8
## 1967 151182.9 151764.8 152733.1 152902.9
## 1968 159215.5 160341.8 160732.7 162372.5
## 1969 167096.7 167582.2 167159.8 167900.9
## 1970 172969.7 173762.8 175343.9 176169.6
## 1971 186452.0 186523.7 188217.5 188745.8
## 1972 195149.1 196989.3 197205.4 197450.2
## 1973 211140.5 212833.1 214876.5 216226.5
# forecast
forecast_sma <- forecast(object = model_sma, h = 24)
forecast_sma
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 1974       217934.2 217073.8 218794.7 216618.3 219250.2
## Feb 1974       219581.7 218464.1 220699.4 217872.4 221291.1
## Mar 1974       221229.2 219805.4 222653.1 219051.7 223406.8
## Apr 1974       222876.7 221106.5 224646.9 220169.4 225584.0
## May 1974       224524.2 222372.9 226675.6 221234.0 227814.4
## Jun 1974       226171.7 223608.1 228735.3 222251.0 230092.4
## Jul 1974       227819.2 224814.8 230823.6 223224.3 232414.1
## Aug 1974       229466.7 225994.8 232938.6 224156.8 234776.6
## Sep 1974       231114.2 227149.5 235078.9 225050.7 237177.7
## Oct 1974       232761.7 228280.1 237243.3 225907.6 239615.7
## Nov 1974       234409.2 229387.5 239430.9 226729.2 242089.2
## Dec 1974       236056.7 230472.5 241640.8 227516.5 244596.9
## Jan 1975       237704.2 231535.9 243872.4 228270.6 247137.7
## Feb 1975       239351.7 232578.1 246125.2 228992.5 249710.8
## Mar 1975       240999.2 233599.8 248398.5 229682.8 252315.5
## Apr 1975       242646.6 234601.3 250691.9 230342.4 254950.9
## May 1975       244294.1 235583.2 253005.1 230971.8 257616.4
## Jun 1975       245941.6 236545.6 255337.6 231571.7 260311.6
## Jul 1975       247589.1 237489.1 257689.2 232142.4 263035.8
## Aug 1975       249236.6 238413.8 260059.4 232684.5 265788.7
## Sep 1975       250884.1 239320.1 262448.1 233198.5 268569.8
## Oct 1975       252531.6 240208.2 264855.0 233684.6 271378.6
## Nov 1975       254179.1 241078.3 267279.9 234143.2 274215.0
## Dec 1975       255826.6 241930.8 269722.4 234574.7 277078.4
# visualization
autoplot(sales_ts) +
  autolayer(model_sma) +
  autolayer(forecast_sma$mean)

# mape
mape_sma <- MAPE(y_pred = forecast_sma$mean, 
                 y_true = sales_test)*100
mape_sma
## [1] 9.996427

The model’s predictions have a MAPE of 9.996%; on average, the forecasts deviate by about 9.996% from the actual values.
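For reference, the quantity computed by MLmetrics::MAPE is

$$\text{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|$$

which we multiply by 100 to express as a percentage.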



Holt-Winters Exponential Smoothing

Exponential smoothing gives recent observations more weight than older ones. Smoothing is done by taking a weighted average of previous values, with the weights controlled by smoothing coefficients:

  • Error/level: alpha
  • Trend: beta
  • Seasonal: gamma

Alpha, beta, and gamma range from 0 to 1, with the following interpretation:

  • close to 1: recent data receive most of the weight, while older data are largely ignored
  • close to 0: recent and older data receive nearly equal weight

Because our data have both trend and seasonality, the suitable exponential smoothing variant is Holt-Winters Exponential Smoothing (triple exponential smoothing); its additive recursions are sketched below.
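As a reference, following the parameterization in R’s stats::HoltWinters documentation, the additive recursions for level, trend, and season with period $p = 12$ are

$$
\begin{aligned}
\ell_t &= \alpha\,(y_t - s_{t-p}) + (1-\alpha)\,(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta\,(\ell_t - \ell_{t-1}) + (1-\beta)\,b_{t-1} \\
s_t &= \gamma\,(y_t - \ell_t) + (1-\gamma)\,s_{t-p}
\end{aligned}
$$

with h-step forecasts $\hat{y}_{t+h} = \ell_t + h\,b_t + s_{t+h-p}$ (for $h \le p$).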

# model fitting
model_holtw <- HoltWinters(sales_train, seasonal = "additive")
model_holtw
## Holt-Winters exponential smoothing with trend and additive seasonal component.
## 
## Call:
## HoltWinters(x = sales_train, seasonal = "additive")
## 
## Smoothing parameters:
##  alpha: 0.03998687
##  beta : 0.2948723
##  gamma: 0.09662213
## 
## Coefficients:
##           [,1]
## a   220821.762
## b     1495.224
## s1  -20174.366
## s2  -25018.888
## s3  -13113.638
## s4  -11439.715
## s5   10183.605
## s6   12476.648
## s7   23048.920
## s8   26336.362
## s9    6273.779
## s10   7350.147
## s11  -3549.882
## s12  -3633.538
# forecast
forecast_holtw <- forecast(object = model_holtw, h = 24)
forecast_holtw
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 1974       202142.6 195117.0 209168.2 191397.9 212887.3
## Feb 1974       198793.3 191758.3 205828.3 188034.2 209552.4
## Mar 1974       212193.8 205144.6 219242.9 201413.0 222974.5
## Apr 1974       215362.9 208293.9 222432.0 204551.8 226174.1
## May 1974       238481.5 231386.0 245577.0 227629.9 249333.1
## Jun 1974       242269.8 235140.3 249399.2 231366.2 253173.3
## Jul 1974       254337.3 247165.5 261509.0 243369.0 265305.5
## Aug 1974       259119.9 251896.7 266343.2 248072.9 270166.9
## Sep 1974       240552.6 233267.9 247837.2 229411.7 251693.4
## Oct 1974       243124.2 235767.6 250480.7 231873.2 254375.1
## Nov 1974       233719.3 226279.6 241159.1 222341.2 245097.5
## Dec 1974       235130.9 227596.3 242665.6 223607.6 246654.2
## Jan 1975       220085.3 212308.2 227862.4 208191.3 231979.3
## Feb 1975       216736.0 208841.3 224630.8 204662.1 228810.0
## Mar 1975       230136.5 222111.4 238161.6 217863.1 242409.8
## Apr 1975       233305.6 225137.2 241474.1 220813.1 245798.2
## May 1975       256424.2 248099.3 264749.1 243692.3 269156.0
## Jun 1975       260212.4 251717.9 268707.0 247221.2 273203.7
## Jul 1975       272279.9 263602.5 280957.3 259009.0 285550.9
## Aug 1975       277062.6 268189.2 285936.0 263491.9 290633.4
## Sep 1975       258495.2 249412.7 267577.8 244604.7 272385.8
## Oct 1975       261066.8 251762.3 270371.4 246836.7 275297.0
## Nov 1975       251662.0 242122.7 261201.4 237072.8 266251.2
## Dec 1975       253073.6 243286.9 262860.3 238106.2 268041.0
# visualization
test_forecast(actual = sales_ts, 
              forecast.obj = forecast_holtw, 
              test = sales_test)
# mape
mape_hw <- MAPE(y_pred = forecast_holtw$mean, 
                y_true = sales_test)*100
mape_hw
## [1] 8.488444

The model’s predictions have a MAPE of 8.49%; on average, the forecasts deviate by about 8.49% from the actual values.



Autoregressive Integrated Moving Average (ARIMA)

Autoregressive Integrated Moving Average (ARIMA) combines the Autoregressive (AR) and Moving Average (MA) models with differencing (the “Integrated” part).

The ARIMA model has 3 parameters (summarized in the formula after this list):

  • p: order of the AR part
  • d: the number of times the data are differenced until they become stationary
  • q: order of the MA part
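In standard backshift notation (with $B y_t = y_{t-1}$ and $m$ the seasonal period), a seasonal ARIMA(p,d,q)(P,D,Q)[m] model, such as the one auto.arima() selects below, can be written as

$$\phi_p(B)\,\Phi_P(B^m)\,(1-B)^d\,(1-B^m)^D\,y_t = \theta_q(B)\,\Theta_Q(B^m)\,\varepsilon_t$$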

As the earlier decomposition plot shows, our dataset has a trend and is therefore not stationary, so differencing is necessary.

# differencing
adf.test(diff(sales_train, lag = 12) %>% diff())
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(sales_train, lag = 12) %>% diff()
## Dickey-Fuller = -8.0316, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

After one seasonal difference (lag = 12) and one regular difference, the ADF test rejects the unit-root null hypothesis (p-value = 0.01), so the differenced series can be treated as stationary.

# ACF & PACF plot
tsdisplay(diff(sales_train, lag = 12) %>% diff(),
          lag.max = 36
          )

From the ACF and PACF plots above we could determine the ARIMA orders manually, but to avoid subjectivity, let’s just use the auto.arima() function.

# model fitting
model_auto_arima <- auto.arima(sales_train, seasonal = T)
summary(model_auto_arima)
## Series: sales_train 
## ARIMA(2,1,0)(0,1,1)[12] 
## 
## Coefficients:
##           ar1      ar2     sma1
##       -1.0407  -0.7023  -0.8117
## s.e.   0.0573   0.0572   0.0852
## 
## sigma^2 = 23451105:  log likelihood = -1541.02
## AIC=3090.03   AICc=3090.3   BIC=3102.21
## 
## Training set error measures:
##                    ME     RMSE      MAE      MPE     MAPE     MASE        ACF1
## Training set 500.6924 4606.264 3489.947 0.202112 2.173995 0.406734 -0.09215646
# forecast
forecast_model_auto_arima <- forecast(model_auto_arima, h = 24)
# mape
mape_auto_arima <-  MAPE(y_pred = forecast_model_auto_arima$mean, 
                         y_true = sales_test)*100
mape_auto_arima
## [1] 5.693121

ARIMA(2,1,0)(0,1,1)[12] generates predictions with MAPE = 5.69%; on average, the forecasts deviate by about 5.69% from the actual values.

# visualization
test_forecast(actual = sales_ts, 
              forecast.obj = forecast_model_auto_arima, 
              test = sales_test)

Assumption Check:

a] Normality of Errors

Using the Shapiro-Wilk hypothesis test:

  • H0: errors are normally distributed (desired)
  • H1: errors are not normally distributed
# normality check
shapiro.test(model_auto_arima$residuals) 
## 
##  Shapiro-Wilk normality test
## 
## data:  model_auto_arima$residuals
## W = 0.99214, p-value = 0.4923

Insight: the p-value (0.4923) is greater than 0.05, so we fail to reject H0; errors are normally distributed -> assumption is met

# residual plot
plot(model_auto_arima$residuals)

b] No Autocorrelation in Errors

  • H0: there is no autocorrelation in the errors (desired)
  • H1: there is autocorrelation in the errors
# Box-Pierce test (the default type of Box.test())
Box.test(model_auto_arima$residuals)
## 
##  Box-Pierce test
## 
## data:  model_auto_arima$residuals
## X-squared = 1.4268, df = 1, p-value = 0.2323

Insight: the p-value (0.2323) is greater than 0.05, so we fail to reject H0; there is no autocorrelation in the errors -> assumption is met
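Note that Box.test() computes the Box-Pierce statistic by default, which is why the output above is labeled “Box-Pierce test”. The closely related Ljung-Box variant can be requested explicitly if preferred:

# Ljung-Box variant of the portmanteau test
Box.test(model_auto_arima$residuals, type = "Ljung-Box")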



Nonparametric Regression with Fourier Series Estimator

ARIMA (and seasonal ARIMA) is a parametric approach whose assumptions must be satisfied. In some conditions those assumptions are hard to achieve, which makes the predictions unreliable. Therefore, we also use nonparametric regression with a Fourier series estimator, which does not impose many assumptions.

The nonparametric regression approach is used to capture data patterns that cannot be estimated by parametric curve models. One of the estimators used in nonparametric regression is the Fourier series, which represents a function as a sum of sines and cosines and is well suited to fluctuating, repeating data. Since our sales series shows a clear periodic pattern, the Fourier series estimator is a natural fit; the concrete form used in this project is given below.
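Concretely, the estimator implemented in the GCV() and estimasi() functions below is

$$\hat{m}(t_i) = \hat{\beta}_0 + \sum_{j=1}^{k}\left[a_j \cos\frac{2\pi j (i-1)}{n} + b_j \sin\frac{2\pi j (i-1)}{n}\right]$$

with $\hat{\beta}_0 = \bar{Y}$, $a_j = \frac{2}{n}\sum_{i=1}^{n} Y_i \cos\frac{2\pi j (i-1)}{n}$, and $b_j = \frac{2}{n}\sum_{i=1}^{n} Y_i \sin\frac{2\pi j (i-1)}{n}$. The number of terms $k$ (called lambda in the code) is chosen to minimize

$$\text{GCV}(k) = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{m}(t_i)\right)^2}{\left(1 - \frac{2k+1}{n}\right)^2}$$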

# find the optimal number of Fourier terms (lambda) with Generalized Cross-Validation (GCV)
# data: two columns -- time index first, response second
# lawal, lakhir: first and last candidate lambda values to evaluate
GCV<-function(data,lawal,lakhir)
{
    prediktor<-data[,1]
    dataurut<-data[order(prediktor),1:2]   # sort rows by the time index
    Y<-dataurut[,2]                        # response (sales)
    t<-dataurut[,1]                        # time index
    n<-length(Y)
    nlambda<-lakhir-lawal+1
    GCV<-matrix(0,nlambda,2)               # column 1: lambda, column 2: its GCV value
    GCV[,1]<-c(lawal:lakhir)
    for(k in lawal:lakhir)
    {
        # cosine coefficients a_j of the k-term Fourier series
        a<-c(0,k)
        for(j in 1:k)
        {
            a[j]<-0
            for(i in 1:n)
            {
                a[j]<-a[j]+((2/n)*Y[i]*cos((2*pi*j*(i-1))/n))
            }
        }
        # sine coefficients b_j
        b<-c(0,k)
        for(j in 1:k)
        {
            b[j]<-0
            for(i in 1:n)
            {
                b[j]<-b[j]+((2/n)*Y[i]*sin((2*pi*j*(i-1))/n))
            }
        }
        betanol<-sum(Y)/n   # intercept: the mean of Y
        GCVatas<-0          # accumulates the in-sample MSE (GCV numerator)
        for(i in 1:n)
        {
            smt<-0
            for(j in 1:k)
            {
                smt<-smt+(a[j]*cos((2*pi*j*(i-1))/n)+b[j]*sin((2*pi*j*(i-1))/n))
            }
            mt<-betanol+smt
            GCVatas<-GCVatas+((1/n)*(Y[i]-mt)^2)
        }
        GCV[k-lawal+1,2]<-GCVatas/(1-(2*k+1)/n)^2   # GCV(k) = MSE/(1-(2k+1)/n)^2
    }
    cat("\n")
    cat(" GCV matrix :\n")
    cat("\n")
    terkecil<-GCV[1,2]   # track the smallest GCV value
    for(k in 1:nlambda)
    {
        if(GCV[k,2]<terkecil)
        {
            terkecil<-GCV[k,2]
        }
    }
    cat("=====================================================\n")
    cat("          lambda             GCV value \n")
    cat("=====================================================\n")
    cat("\n")
    for(k in 1:nlambda)
    {
        cat("          ",format(GCV[k,1]),"               ",format(GCV[k,2]),"\n")
    }
    cat("\n")
    cat("======================================================\n")
    cat("\n")
    cat("Lambda with smallest GCV : ")
    cat("\n")
    for(k in 1:nlambda)
    {
        if(GCV[k,2]==terkecil)
        {
            cat("lambda = ", format(GCV[k,1]),", GCV = ",format(GCV[k,2]),"\n")
        }
    }
    cat("\n")
    plot(GCV[,1],GCV[,2],type="l",xlim=c(min(GCV[,1]),max(GCV[,1])),ylim=
    c(min(GCV[,2]),max(GCV[,2])),
    ylab="GCV",xlab="Lambda")
}
GCV(sales_df %>% select(-index),1,20)
## 
##  GCV matrix :
## 
## =====================================================
##           lambda             GCV value   
## =====================================================
## 
##            1                 877946805 
##            2                 646539452 
##            3                 563500556 
##            4                 529026276 
##            5                 511105376 
##            6                 506906840 
##            7                 497636284 
##            8                 489289591 
##            9                 488365754 
##            10                 488969813 
##            11                 492943933 
##            12                 497030868 
##            13                 502581203 
##            14                 503385198 
##            15                 509891522 
##            16                 144356893 
##            17                 146287630 
##            18                 148479411 
##            19                 149925341 
##            20                 151586170 
## 
## ======================================================
## 
## Lambda with smallest GCV : 
## lambda =  16 , GCV =  144356893

Based on the output above, k = 16 (called lambda in the code) is optimal, as it has the minimum GCV value. Hence, the Fourier series estimator with k = 16 is chosen to predict sales. The estimate for k = 16 is obtained as follows.

# change dataframe structure
sales_df <- sales_df %>% select(c(index,y))
sales_df_train <- head(sales_df, n = nrow(sales_df) - 24)   # nrow(), not length(): length() of a dataframe counts columns
sales_df_test <- tail(sales_df, n = 24)
# model fitting and forecast
# fit a k-term Fourier series on insample, then forecast the outsample period
# insample/outsample: two columns -- time index first, response second
estimasi<-function(insample,outsample,k)
{
    prediktor<-insample[,1]
    dataurut<-insample[order(prediktor),1:2]   # sort rows by time index
    Y<-dataurut[,2]
    t<-dataurut[,1]
    n<-length(t)
    a<-c(0,k)          # cosine coefficients a_j
    b<-c(0,k)          # sine coefficients b_j
    d<-matrix(0,k,2)   # coefficients collected side by side
    for(j in 1:k)
    {
        a[j]<-0
        b[j]<-0
        for(i in 1:n)
        {
            a[j]<-a[j]+((2/n)*Y[i]*cos((2*pi*j*(i-1))/n))
            b[j]<-b[j]+((2/n)*Y[i]*sin((2*pi*j*(i-1))/n))
        }
        d[j,1]<-a[j]
        d[j,2]<-b[j]
    }
    betanol<-sum(Y)/n   # intercept: the mean of Y
    mt<-rep(0,n)        # fitted values
    error<-rep(0,n)
    for(i in 1:n)
    {
        smt<-0
        for(j in 1:k)
        {
            smt<-smt+(a[j]*cos((2*pi*j*(i-1))/n)+b[j]*sin((2*pi*j*(i-1))/n))
        }
        mt[i]<-betanol+smt
    }
#   jpeg("fittedplot.jpg")
#   plot(t,Y,type="p",ylab="Y",xlab="t")
#   lines(t,mt,type="l",col="blue")
#   dev.off()
    errorin<-rep(0,n)
    errorin<-Y-mt
    cat("==========================================\n")
    cat("            FITTED VALUES            \n")
    cat("       r                 m(tr)       \n")
    cat("==========================================\n")
    for(i in 1:n)
    {
        cat("      ",t[i],"            ",mt[i],"\n")
    }
    SSerror<-sum((Y-mt)^2)
    SSTotal<-sum((Y-mean(mt))^2)
    R2<-(1-(SSerror/SSTotal))
    MAPE<-mape(Y,mt)
    cat("MAPE: ",MAPE,"\n")
    cat("R2  : ",R2,"\n")
    cat("==========================================\n")
    
    
    # forecast: evaluate the fitted Fourier curve at the out-of-sample time indices
    xout<-outsample[,1]
    p<-length(xout)
    n<-192   # full-series length, so the Fourier period stays fixed when extrapolating
    out<-rep(0,p)
    for(i in 1:p)
    {
        pred<-0
        for(j in 1:k)
        {
            pred<-pred+(a[j]*cos((2*pi*j*(xout[i]-1))/n)+b[j]*sin((2*pi*j*(xout[i]-1))/n))
        }
        out[i]<-betanol+pred
    }
    cat("==========================================\n")
    cat("               FORECAST            \n")
    cat("         r                m(tr)       \n")
    cat("==========================================\n")
    for(i in 1:p)
    {
        cat("      ",xout[i],"            ",out[i],"\n")
    }
    
    
    yout<-outsample[,2]
    MAPEo<-mape(yout,out)   # out-of-sample MAPE (a fraction, not a percentage)
    cat("MAPE :",MAPEo,"\n")
    cat("==========================================\n") 
    
#   jpeg("forecastplot.jpg")
#   plot(xout,yout,col="red",ylab="Y",xlab="t",ylim=c(100000,300000))
#   lines(xout,out,type="l",col="blue")
#   dev.off()
}
estimasi(sales_df_train,sales_df_test,16)
## ==========================================
##             FITTED VALUES            
##        r                 m(tr)       
## ==========================================
##        1              124604.8 
##        2              109282.6 
##        3              101892.3 
##        4              101913.7 
##        5              107326.7 
##        6              115153.8 
##        7              122223.2 
##        8              125963.6 
##        9              125027.9 
##        10              119586.6 
##        11              111213.5 
##        12              102397.8 
##        13              95812.28 
##        14              93539.13 
##        15              96464.47 
##        16              104010.5 
##        17              114280.4 
##        18              124578.4 
##        19              132160.1 
##        20              135004.5 
##        21              132390.7 
##        22              125113.1 
##        23              115273.2 
##        24              105702.6 
##        25              99175.76 
##        26              97634.42 
##        27              101640.7 
##        28              110214.2 
##        29              121100.2 
##        30              131395.6 
##        31              138357.9 
##        32              140169.8 
##        33              136447.8 
##        34              128349.6 
##        35              118253.1 
##        36              109096.2 
##        37              103561.4 
##        38              103333.7 
##        39              108636 
##        40              118169.6 
##        41              129475.1 
##        42              139610 
##        43              145955.8 
##        44              146931.5 
##        45              142419.7 
##        46              133790.3 
##        47              123520.2 
##        48              114515.9 
##        49              109325.1 
##        50              109452.8 
##        51              114963.3 
##        52              124474.1 
##        53              135536.4 
##        54              145298.4 
##        55              151269.6 
##        56              151982.7 
##        57              147377.8 
##        58              138811.1 
##        59              128689.6 
##        60              119830.7 
##        61              114719 
##        62              114854.7 
##        63              120363.9 
##        64              129970.2 
##        65              141333 
##        66              151661.1 
##        67              158443 
##        68              160101.3 
##        69              156403.1 
##        70              148519.3 
##        71              138718 
##        72              129770.3 
##        73              124223.1 
##        74              123727.7 
##        75              128599.3 
##        76              137723.8 
##        77              148840.6 
##        78              159130 
##        79              165954.6 
##        80              167561.8 
##        81              163562.5 
##        82              155055.2 
##        83              144357.4 
##        84              134410.2 
##        85              128007.7 
##        86              127053.2 
##        87              132041.7 
##        88              141908.9 
##        89              154294.9 
##        90              166156.8 
##        91              174572.7 
##        92              177522 
##        93              174432.7 
##        94              166344.2 
##        95              155636.5 
##        96              145394.1 
##        97              138574.4 
##        98              137204.9 
##        99              141828.4 
##        100              151350.6 
##        101              163335.6 
##        102              174669.3 
##        103              182412.8 
##        104              184609.3 
##        105              180822.7 
##        106              172257.7 
##        107              161427.6 
##        108              151460.4 
##        109              145235.7 
##        110              144591.5 
##        111              149821.3 
##        112              159599.6 
##        113              171355.6 
##        114              181989.4 
##        115              188729.6 
##        116              189894.6 
##        117              185346.1 
##        118              176513.3 
##        119              165985.1 
##        120              156788 
##        121              151553 
##        122              151803.8 
##        123              157559.4 
##        124              167355.2 
##        125              178670 
##        126              188632.3 
##        127              194808.1 
##        128              195852.6 
##        129              191851.3 
##        130              184265.3 
##        131              175506.2 
##        132              168264.2 
##        133              164778.5 
##        134              166245.1 
##        135              172518 
##        136              182173.6 
##        137              192909.5 
##        138              202163.1 
##        139              207781.9 
##        140              208569.7 
##        141              204574 
##        142              197048.6 
##        143              188115.1 
##        144              180215.5 
##        145              175501 
##        146              175308.4 
##        147              179847.2 
##        148              188166.6 
##        149              198401.2 
##        150              208230.9 
##        151              215441.4 
##        152              218457.1 
##        153              216722.9 
##        154              210852.7 
##        155              202509.2 
##        156              194041.2 
##        157              187953.2 
##        158              186320.4 
##        159              190273.6 
##        160              199669.1 
##        161              213023.8 
##        162              227740.7 
##        163              240593.6 
##        164              248378.3 
##        165              248596.4 
##        166              240024.9 
##        167              223037.6 
##        168              199594.3 
##        169              172884.2 
##        170              146689.9 
## MAPE:  0.04227138 
## R2  :  0.9422628 
## ==========================================
## ==========================================
##                FORECAST            
##          r                m(tr)       
## ==========================================
##        169              205939.7 
##        170              213234.5 
##        171              217597.5 
##        172              218317 
##        173              215381.6 
##        174              209494.5 
##        175              201957.2 
##        176              194444.4 
##        177              188705.3 
##        178              186242.1 
##        179              188019.5 
##        180              194255.3 
##        181              204332.3 
##        182              216849.3 
##        183              229810.9 
##        184              240929.1 
##        185              247991.3 
##        186              249233.8 
##        187              243658.1 
##        188              231230.2 
##        189              212924.1 
##        190              190595.1 
##        191              166697 
##        192              143889.9 
## MAPE : 0.1249785 
## ==========================================

Nonparametric regression with the Fourier series estimator generates predictions with MAPE = 12.498% on the test set (the printed value 0.1249785 is a fraction, not a percentage).

A visualization of the fitted values against the actual data is as follows.

And a visualization of the forecasts against the actual data (sales_test) is as follows.

It can be seen that the predictions follow the actual data pattern reasonably well, but the values are still somewhat inaccurate.



7. Conclusion


# comparison
model <- c("SMA","Holt-Winters Exp Smoothing","ARIMA(2,1,0)(0,1,1)[12]","Nonparametric Reg. with Fourier Estimator")
mape_fourier <- 0.1249785 * 100   # out-of-sample MAPE printed by estimasi() above, as a percentage
mape <- c(round(mape_sma,2),round(mape_hw,2),round(mape_auto_arima,2),round(mape_fourier,2))
data.frame(model = model, mape = mape)   # data.frame() keeps mape numeric, unlike cbind()

Based on the MAPE values (a common rule of thumb treats MAPE below 10% as highly accurate and 10-20% as good), Simple Moving Average, Holt-Winters Exponential Smoothing, and ARIMA(2,1,0)(0,1,1)[12] produce highly accurate predictions, while nonparametric regression with the Fourier series estimator produces fairly good predictions.

To wrap up: of all the models, ARIMA(2,1,0)(0,1,1)[12] produces the most accurate predictions, with a MAPE of only 5.69%. Moreover, this seasonal ARIMA model satisfies all of its residual assumptions.