“The future belongs to those who prepare for it today” - M.K. Gandhi
This phrase still holds true today, especially for revenue leaders, sales executives, and CEOs who are keen to drive favorable sales outcomes for their teams. Accurately forecasting their sales numbers can be an invaluable tool in helping them achieve those outcomes. These outcomes are invariably linked to a company’s health and growth. Sales forecasting can indeed make or break a business.
Before we deep-dive into sales prediction methods and tools, let’s define what sales forecasting actually is.
Sales forecasting is the process of predicting how much revenue a company, team, or person will generate within a specific timeframe. This could be a week, month, quarter, or even a year. An accurate sales forecast is critical: enterprises can rarely afford to miss their numbers, and a badly missed forecast can prove disastrous.
For example, if an organization over-forecasts and under-delivers, its valuation can suffer. Conversely, if it under-forecasts and over-delivers, which is a good problem to have, it still leaves money on the table because it lacked the lead time to plan marketing campaigns, product launches, and hiring accordingly.
In either case, the company needs a reliable way to look into its future. In this project, we will use Simple Moving Average, Exponential Smoothing, Autoregressive Integrated Moving Average (ARIMA), and Nonparametric Regression (with a Fourier series estimator) to predict monthly sales, then compare their performance.
# load library
library(tsdl)
library(zoo)
library(dplyr)
library(forecast)
library(MLmetrics)
library(Metrics)
library(TSstudio)
library(TTR)
library(tseries)
library(ggplot2)
In this project, we will use monthly sales data (frequency = 12) from the Time Series Data Library (TSDL). TSDL was created by Rob Hyndman, Professor of Statistics at Monash University, Australia.
To use datasets from TSDL, we first need to install the development version of the package from GitHub:
devtools::install_github("FinYang/tsdl")
# load data
sales <- subset(tsdl, 12, "Sales")
sales_ts <- sales[[1]]
sales_ts
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1960 87695 86890 96442 98133 113615 123924 128924 134775 117357 114626
## 1961 92188 88591 98683 99207 125485 124677 132543 140735 124008 121194
## 1962 101007 94228 104255 106922 130621 125251 140318 146174 122318 128770
## 1963 108497 100482 106140 118581 132371 132042 151938 150997 130931 137018
## 1964 109894 106061 112539 125745 136251 140892 158390 148314 144148 140138
## 1965 109895 109044 122499 124264 142296 150693 163331 165837 151731 142491
## 1966 116963 118049 137869 127392 154166 160227 165869 173522 155828 153771
## 1967 124046 121260 138870 129782 162312 167211 172897 189689 166496 160754
## 1968 139625 137361 138963 155301 172026 165004 185861 190270 163903 174270
## 1969 146182 137728 148932 156751 177998 174559 198079 189073 175702 180097
## 1970 154277 144998 159644 168646 166273 190176 205541 193657 182617 189614
## 1971 158167 156261 176353 175720 193939 201269 218960 209861 198688 190474
## 1972 166286 170699 181468 174241 210802 212262 218099 229001 203200 212557
## 1973 188992 175347 196265 203526 227443 233038 234119 255133 216478 232868
## 1974 194784 189756 193522 212870 248565 221532 252642 255007 206826 233231
## 1975 199024 191813 195997 208684 244113 243108 255918 244642 237579 237579
## Nov Dec
## 1960 107677 108087
## 1961 111634 111565
## 1962 117518 115492
## 1963 121271 123548
## 1964 124075 136485
## 1965 140229 140463
## 1966 143963 143898
## 1967 155582 145936
## 1968 160272 165614
## 1969 155202 174508
## 1970 174176 184416
## 1971 194502 190755
## 1972 197095 193693
## 1973 221616 209893
## 1974 212678 217173
## 1975 217775 227621
The data from TSDL already comes as a time-series object covering Jan 1960 - Dec 1975. For convenience, we can also convert this time-series object into a data frame.
# change ts object to dataframe
sales_df <- data.frame(date = as.Date(as.yearmon(time(sales_ts))), y = as.matrix(sales_ts))
sales_df$index <- 1:nrow(sales_df)
sales_df
After creating the data frame, we should check whether the data contain any missing values.
# check missing values
anyNA(sales_ts)
## [1] FALSE
There are no missing values in the dataset, so we can proceed to the next step.
# decomposition
sales_dc <- decompose(sales_ts, type = "additive")
autoplot(sales_dc)
Based on the decomposition, the seasonal pattern has been captured (the trend is already smooth), and the data show an additive trend and additive seasonality.
In order to evaluate the prediction performance, the dataset will be
divided into two sections: sales_train to train a model and
sales_test to evaluate the performance.
# cross validation
sales_train <- head(sales_ts, n = length(sales_ts) - 24)
sales_test <- tail(sales_ts, n = 24)
Simple Moving Average (SMA) is a method that is usually not recommended for forecasting, as it only calculates the mean of the n most recent observations, giving every observation the same weight (new data and past data are weighted equally).
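To make the equal weighting concrete, here is a minimal sketch of what a single smoothed point is under SMA with n = 12, using the sales_train series created above:
# a minimal illustration of the equal weighting: with n = 12, each smoothed point is just
# the unweighted mean of the 12 most recent observations, each carrying a weight of 1/12
mean(head(sales_train, 12))   # equals the first non-NA value returned by SMA(n = 12) below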
# model fitting
model_sma <- SMA(sales_train, n = 12)
model_sma
## Jan Feb Mar Apr May Jun Jul Aug
## 1960 NA NA NA NA NA NA NA NA
## 1961 110219.8 110361.6 110548.3 110637.8 111627.0 111689.8 111991.3 112488.0
## 1962 114944.1 115413.8 115878.2 116521.1 116949.1 116996.9 117644.8 118098.1
## 1963 120030.3 120551.5 120708.6 121680.2 121826.0 122391.9 123360.2 123762.2
## 1964 126267.8 126732.7 127265.9 127862.9 128186.2 128923.8 129461.4 129237.8
## 1965 131911.1 132159.7 132989.7 132866.2 133370.0 134186.8 134598.5 136058.8
## 1966 139153.4 139903.8 141184.7 141445.3 142434.5 143229.0 143440.5 144080.9
## 1967 146550.0 146817.6 146901.0 147100.2 147779.0 148361.0 148946.7 150293.9
## 1968 154201.2 155542.9 155550.7 157677.2 158486.8 158302.8 159383.2 159431.6
## 1969 162918.9 162949.5 163780.2 163901.1 164398.8 165195.0 166213.2 166113.4
## 1970 168575.5 169181.3 170074.0 171065.2 170088.2 171389.6 172011.4 172393.4
## 1971 176493.8 177432.3 178824.8 179414.2 181719.8 182644.2 183762.4 185112.8
## 1972 189422.3 190625.5 191051.8 190928.5 192333.8 193249.8 193178.1 194773.1
## 1973 199342.4 199729.8 200962.8 203403.2 204790.0 206521.3 207856.3 210034.0
## Sep Oct Nov Dec
## 1960 NA NA NA 109845.4
## 1961 113042.2 113589.6 113919.3 114209.2
## 1962 117957.2 118588.6 119078.9 119406.2
## 1963 124479.9 125167.2 125480.0 126151.3
## 1964 130339.2 130599.2 130832.9 131911.0
## 1965 136690.7 136886.8 138232.9 138564.4
## 1966 144422.3 145362.3 145673.5 145959.8
## 1967 151182.9 151764.8 152733.1 152902.9
## 1968 159215.5 160341.8 160732.7 162372.5
## 1969 167096.7 167582.2 167159.8 167900.9
## 1970 172969.7 173762.8 175343.9 176169.6
## 1971 186452.0 186523.7 188217.5 188745.8
## 1972 195149.1 196989.3 197205.4 197450.2
## 1973 211140.5 212833.1 214876.5 216226.5
# forecast
forecast_sma <- forecast(object = model_sma, h = 24)
forecast_sma
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1974 217934.2 217073.8 218794.7 216618.3 219250.2
## Feb 1974 219581.7 218464.1 220699.4 217872.4 221291.1
## Mar 1974 221229.2 219805.4 222653.1 219051.7 223406.8
## Apr 1974 222876.7 221106.5 224646.9 220169.4 225584.0
## May 1974 224524.2 222372.9 226675.6 221234.0 227814.4
## Jun 1974 226171.7 223608.1 228735.3 222251.0 230092.4
## Jul 1974 227819.2 224814.8 230823.6 223224.3 232414.1
## Aug 1974 229466.7 225994.8 232938.6 224156.8 234776.6
## Sep 1974 231114.2 227149.5 235078.9 225050.7 237177.7
## Oct 1974 232761.7 228280.1 237243.3 225907.6 239615.7
## Nov 1974 234409.2 229387.5 239430.9 226729.2 242089.2
## Dec 1974 236056.7 230472.5 241640.8 227516.5 244596.9
## Jan 1975 237704.2 231535.9 243872.4 228270.6 247137.7
## Feb 1975 239351.7 232578.1 246125.2 228992.5 249710.8
## Mar 1975 240999.2 233599.8 248398.5 229682.8 252315.5
## Apr 1975 242646.6 234601.3 250691.9 230342.4 254950.9
## May 1975 244294.1 235583.2 253005.1 230971.8 257616.4
## Jun 1975 245941.6 236545.6 255337.6 231571.7 260311.6
## Jul 1975 247589.1 237489.1 257689.2 232142.4 263035.8
## Aug 1975 249236.6 238413.8 260059.4 232684.5 265788.7
## Sep 1975 250884.1 239320.1 262448.1 233198.5 268569.8
## Oct 1975 252531.6 240208.2 264855.0 233684.6 271378.6
## Nov 1975 254179.1 241078.3 267279.9 234143.2 274215.0
## Dec 1975 255826.6 241930.8 269722.4 234574.7 277078.4
# visualization
autoplot(sales_ts) +
autolayer(model_sma) +
autolayer(forecast_sma$mean)
# mape
mape_sma <- MAPE(y_pred = forecast_sma$mean,
y_true = sales_test)*100
mape_sma
## [1] 9.996427
The model's predictions have a MAPE of 9.996%; on average, they deviate by about 9.996% (in either direction) from the actual values.
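For reference, MAPE is simply the average absolute percentage deviation; a minimal sketch equivalent to the MLmetrics::MAPE() call above:
# manual MAPE: mean of |actual - forecast| / |actual|, expressed as a percentage
mean(abs((sales_test - forecast_sma$mean) / sales_test)) * 100   # should match mape_sma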
Exponential Smoothing is a method that gives newer data a greater weight than older data. Smoothing is done by averaging the previous values after multiplying them by smoothing coefficients (weights) that decay exponentially with age. The smoothing parameters alpha (level), beta (trend), and gamma (seasonality) range from 0 to 1: values close to 1 put most of the weight on the most recent observations, while values close to 0 spread the weight over the more distant past.
Because our data has both trend and seasonality, the suitable exponential smoothing method is Holt-Winters Exponential Smoothing (Triple Exponential Smoothing).
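To see how the weighting decays with the age of an observation, a small illustrative sketch (alpha = 0.3 is a hypothetical value, not taken from the fitted model below):
# weight given to the k-th most recent observation is roughly alpha * (1 - alpha)^k,
# so the newest observations dominate and older ones decay geometrically
alpha <- 0.3
round(alpha * (1 - alpha)^(0:5), 3)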
# model fitting
model_holtw <- HoltWinters(sales_train, seasonal = "additive")
model_holtw
## Holt-Winters exponential smoothing with trend and additive seasonal component.
##
## Call:
## HoltWinters(x = sales_train, seasonal = "additive")
##
## Smoothing parameters:
## alpha: 0.03998687
## beta : 0.2948723
## gamma: 0.09662213
##
## Coefficients:
## [,1]
## a 220821.762
## b 1495.224
## s1 -20174.366
## s2 -25018.888
## s3 -13113.638
## s4 -11439.715
## s5 10183.605
## s6 12476.648
## s7 23048.920
## s8 26336.362
## s9 6273.779
## s10 7350.147
## s11 -3549.882
## s12 -3633.538
# forecast
forecast_holtw <- forecast(object = model_holtw, h = 24)
forecast_holtw
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1974 202142.6 195117.0 209168.2 191397.9 212887.3
## Feb 1974 198793.3 191758.3 205828.3 188034.2 209552.4
## Mar 1974 212193.8 205144.6 219242.9 201413.0 222974.5
## Apr 1974 215362.9 208293.9 222432.0 204551.8 226174.1
## May 1974 238481.5 231386.0 245577.0 227629.9 249333.1
## Jun 1974 242269.8 235140.3 249399.2 231366.2 253173.3
## Jul 1974 254337.3 247165.5 261509.0 243369.0 265305.5
## Aug 1974 259119.9 251896.7 266343.2 248072.9 270166.9
## Sep 1974 240552.6 233267.9 247837.2 229411.7 251693.4
## Oct 1974 243124.2 235767.6 250480.7 231873.2 254375.1
## Nov 1974 233719.3 226279.6 241159.1 222341.2 245097.5
## Dec 1974 235130.9 227596.3 242665.6 223607.6 246654.2
## Jan 1975 220085.3 212308.2 227862.4 208191.3 231979.3
## Feb 1975 216736.0 208841.3 224630.8 204662.1 228810.0
## Mar 1975 230136.5 222111.4 238161.6 217863.1 242409.8
## Apr 1975 233305.6 225137.2 241474.1 220813.1 245798.2
## May 1975 256424.2 248099.3 264749.1 243692.3 269156.0
## Jun 1975 260212.4 251717.9 268707.0 247221.2 273203.7
## Jul 1975 272279.9 263602.5 280957.3 259009.0 285550.9
## Aug 1975 277062.6 268189.2 285936.0 263491.9 290633.4
## Sep 1975 258495.2 249412.7 267577.8 244604.7 272385.8
## Oct 1975 261066.8 251762.3 270371.4 246836.7 275297.0
## Nov 1975 251662.0 242122.7 261201.4 237072.8 266251.2
## Dec 1975 253073.6 243286.9 262860.3 238106.2 268041.0
# visualization
test_forecast(actual = sales_ts,
forecast.obj = forecast_holtw,
test = sales_test)
# mape
mape_hw <- MAPE(y_pred = forecast_holtw$mean,
y_true = sales_test)*100
mape_hw
## [1] 8.488444
The Holt-Winters model's predictions have a MAPE of 8.49%; on average, they deviate by about 8.49% from the actual values.
Autoregressive Integrated Moving Average (ARIMA) combines two methods, Autoregressive (AR) and Moving Average (MA), with an Integrated (I) component that differences the data.
The ARIMA model has 3 parameters:
p: order of the AR model
d: the number of times the data are differenced until they become stationary
q: order of the MA model
It can be seen from the earlier autoplot that our dataset has a trend and is not stationary. Therefore, it is necessary to difference the data; a quick automated check is shown in the sketch below.
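The forecast package can suggest how many regular and seasonal differences the series needs; a minimal sketch (the exact suggestion depends on the unit-root test used):
# a quick automated check of the number of differences needed
ndiffs(sales_train)    # suggested number of regular differences
nsdiffs(sales_train)   # suggested number of seasonal differences (frequency = 12)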
# differencing
adf.test(diff(sales_train, lag = 12) %>% diff())
##
## Augmented Dickey-Fuller Test
##
## data: diff(sales_train, lag = 12) %>% diff()
## Dickey-Fuller = -8.0316, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
After one seasonal difference (lag 12) and one regular difference, the ADF test rejects the null hypothesis (p-value = 0.01), so the differenced series can be treated as stationary.
# ACF & PACF plot
tsdisplay(diff(sales_train, lag = 12) %>% diff(),
lag.max = 36
)
From the ACF and PACF plots above we could determine the ARIMA orders manually, but to avoid subjectivity, let's just use the auto.arima() function.
# model fitting
model_auto_arima <- auto.arima(sales_train, seasonal = T)
summary(model_auto_arima)
## Series: sales_train
## ARIMA(2,1,0)(0,1,1)[12]
##
## Coefficients:
## ar1 ar2 sma1
## -1.0407 -0.7023 -0.8117
## s.e. 0.0573 0.0572 0.0852
##
## sigma^2 = 23451105: log likelihood = -1541.02
## AIC=3090.03 AICc=3090.3 BIC=3102.21
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 500.6924 4606.264 3489.947 0.202112 2.173995 0.406734 -0.09215646
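For readers who prefer to set the orders by hand (for example, from the ACF/PACF plots), the same specification can be fitted explicitly with forecast::Arima(); a minimal sketch:
# a minimal sketch: refit the selected specification explicitly with Arima()
model_manual_arima <- Arima(sales_train,
                            order = c(2, 1, 0),
                            seasonal = list(order = c(0, 1, 1), period = 12))
summary(model_manual_arima)   # coefficients should match the auto.arima() fit above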
# forecast
forecast_model_auto_arima <- forecast(model_auto_arima, h = 24)
# mape
mape_auto_arima <- MAPE(y_pred = forecast_model_auto_arima$mean,
y_true = sales_test)*100
mape_auto_arima
## [1] 5.693121
ARIMA(2,1,0)(0,1,1)[12] generates predictions with a MAPE of 5.69%; on average, they deviate by about 5.69% from the actual values.
# visualization
test_forecast(actual = sales_ts,
forecast.obj = forecast_model_auto_arima,
test = sales_test)
Assumption Check:
a] Normality of Errors
Using Shapiro-Wilk hypothesis test:
# normality check
shapiro.test(model_auto_arima$residuals)
##
## Shapiro-Wilk normality test
##
## data: model_auto_arima$residuals
## W = 0.99214, p-value = 0.4923
Insight: the p-value (0.4923) is greater than 0.05, so the errors can be considered normally distributed -> assumption is met
# residual plot
plot(model_auto_arima$residuals)
b] No-Autocorrelation on Errors
# Box-Pierce test (the Box.test() default; set type = "Ljung-Box" for the Ljung-Box variant)
Box.test(model_auto_arima$residuals)
##
## Box-Pierce test
##
## data: model_auto_arima$residuals
## X-squared = 1.4268, df = 1, p-value = 0.2323
Insight: the p-value (0.2323) is greater than 0.05, so there is no significant autocorrelation in the errors -> assumption is met
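Both checks can also be run in one step with forecast::checkresiduals(), which plots the residuals, their ACF, and a histogram, and reports a Ljung-Box test; a minimal sketch:
# a one-call alternative: residual plots plus a Ljung-Box test on the ARIMA residuals
checkresiduals(model_auto_arima)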
ARIMA (including Seasonal ARIMA) is a parametric approach that relies on assumptions that should be satisfied. In some situations those assumptions are hard to meet, which makes the predictions unreliable. Therefore, we also use nonparametric regression with a Fourier series estimator, which requires far fewer assumptions.
The nonparametric regression approach is used to capture data patterns that cannot be estimated well by a parametric curve model. One of the estimators used in nonparametric regression is the Fourier series, a sum of sine and cosine terms that is well suited to data that fluctuate and repeat. In this case, the fluctuating, periodic pattern of the sales series makes it a natural fit for a Fourier series estimator.
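Conceptually, the estimator regresses sales on pairs of cosine and sine harmonics. A minimal, slightly simplified sketch of that model form (the names t_idx, n_obs, fit_fourier_demo and the value k = 3 are illustrative only; the proper k is chosen by GCV below):
# y_t = b0 + sum_j [ a_j * cos(2*pi*j*t/n) + b_j * sin(2*pi*j*t/n) ], illustrated with k = 3
k <- 3
t_idx <- sales_df$index
n_obs <- nrow(sales_df)
harmonics <- do.call(cbind, lapply(1:k, function(j)
  cbind(cos(2 * pi * j * t_idx / n_obs), sin(2 * pi * j * t_idx / n_obs))))
fit_fourier_demo <- lm(sales_df$y ~ harmonics)
summary(fit_fourier_demo)$r.squared   # rough fit with only 3 harmonic pairs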
# find the optimal number of Fourier terms (lambda) by Generalized Cross Validation (GCV)
# arguments: data   = data frame with the predictor in column 1 and the response in column 2
#            lawal  = smallest lambda to try, lakhir = largest lambda to try
GCV<-function(data,lawal,lakhir)
{
prediktor<-data[,1]
dataurut<-data[order(prediktor),1:2]
Y<-dataurut[,2]
t<-dataurut[,1]
n<-length(Y)
nlambda<-lakhir-lawal+1
GCV<-matrix(0,nlambda,2)
GCV[,1]<-c(lawal:lakhir)
for(k in lawal:lakhir)
{
# estimate the Fourier cosine coefficients a_j
a<-c(0,k)
for(j in 1:k)
{
a[j]<-0
for(i in 1:n)
{
a[j]<-a[j]+((2/n)*Y[i]*cos((2*pi*j*(i-1))/n))
}
}
# estimate the Fourier sine coefficients b_j
b<-c(0,k)
for(j in 1:k)
{
b[j]<-0
for(i in 1:n)
{
b[j]<-b[j]+((2/n)*Y[i]*sin((2*pi*j*(i-1))/n))
}
}
betanol<-sum(Y)/n # intercept term (overall mean of the response)
GCVatas<-0 # accumulates the in-sample mean squared error for this lambda
for(i in 1:n)
{
smt<-0
for(j in 1:k)
{
smt<-smt+(a[j]*cos((2*pi*j*(i-1))/n)+b[j]*sin((2*pi*j*(i-1))/n))
}
mt<-betanol+smt
GCVatas<-GCVatas+((1/n)*(Y[i]-mt)^2)
}
GCV[k-lawal+1,2]<-GCVatas/(1-(2*k+1)/n)^2 # GCV = MSE / (1 - (2k+1)/n)^2
}
cat("\n")
cat(" GCV matrix :\n")
cat("\n")
terkecil<-GCV[1,2] # track the smallest GCV value found
for(k in 1:nlambda)
{
if(GCV[k,2]<terkecil)
{
terkecil<-GCV[k,2]
}
}
cat("=====================================================\n")
cat(" lambda GCV value \n")
cat("=====================================================\n")
cat("\n")
for(k in 1:nlambda)
{
cat(" ",format(GCV[k,1])," ",format(GCV[k,2]),"\n")
}
cat("\n")
cat("======================================================\n")
cat("\n")
cat("Lambda with smallest GCV : ")
cat("\n")
for(k in 1:nlambda)
{
if(GCV[k,2]==terkecil)
{
cat("lambda = ", format(GCV[k,1]),", GCV = ",format(GCV[k,2]),"\n")
}
}
cat("\n")
plot(GCV[,1],GCV[,2],type="l",xlim=c(min(GCV[,1]),max(GCV[,1])),ylim=
c(min(GCV[,2]),max(GCV[,2])),
ylab="GCV",xlab="Lambda")
}
GCV(sales_df %>% select(-index),1,20)
##
## GCV matrix :
##
## =====================================================
## lambda GCV value
## =====================================================
##
## 1 877946805
## 2 646539452
## 3 563500556
## 4 529026276
## 5 511105376
## 6 506906840
## 7 497636284
## 8 489289591
## 9 488365754
## 10 488969813
## 11 492943933
## 12 497030868
## 13 502581203
## 14 503385198
## 15 509891522
## 16 144356893
## 17 146287630
## 18 148479411
## 19 149925341
## 20 151586170
##
## ======================================================
##
## Lambda with smallest GCV :
## lambda = 16 , GCV = 144356893
Based on the GCV values above, k = 16 is the optimal number of Fourier terms, since it yields the minimum GCV value. Hence, the Fourier series estimator in nonparametric regression with k = 16 is chosen to predict the sales. For k = 16, the estimator is obtained as follows.
# change dataframe structure
sales_df <- sales_df %>% select(c(index,y))
sales_df_train <- head(sales_df, n = nrow(sales_df) - 24)
sales_df_test <- tail(sales_df, n = 24)
# model fitting and forecast
# fit the Fourier series regression on insample, print the fitted values with in-sample MAPE and R2,
# then forecast the outsample points and print the out-of-sample MAPE
estimasi<-function(insample,outsample,k)
{
prediktor<-insample[,1]
dataurut<-insample[order(prediktor),1:2]
Y<-dataurut[,2]
t<-dataurut[,1]
n<-length(t)
a<-c(0,k) # cosine coefficients
b<-c(0,k) # sine coefficients
d<-matrix(0,k,2) # coefficients stored column-wise (a, b)
for(j in 1:k)
{
a[j]<-0
b[j]<-0
for(i in 1:n)
{
a[j]<-a[j]+((2/n)*Y[i]*cos((2*pi*j*(i-1))/n))
b[j]<-b[j]+((2/n)*Y[i]*sin((2*pi*j*(i-1))/n))
}
d[j,1]<-a[j]
d[j,2]<-b[j]
}
betanol<-sum(Y)/n # intercept (overall mean of the in-sample response)
mt<-rep(0,n) # in-sample fitted values
error<-rep(0,n)
for(i in 1:n)
{
smt<-0
for(j in 1:k)
{
smt<-smt+(a[j]*cos((2*pi*j*(i-1))/n)+b[j]*sin((2*pi*j*(i-1))/n))
}
mt[i]<-betanol+smt
}
# jpeg("fittedplot.jpg")
# plot(t,Y,type="p",ylab="Y",xlab="t")
# lines(t,mt,type="l",col="blue")
# dev.off()
errorin<-rep(0,n)
errorin<-Y-mt
cat("==========================================\n")
cat(" FITTED VALUES \n")
cat(" r m(tr) \n")
cat("==========================================\n")
for(i in 1:n)
{
cat(" ",t[i]," ",mt[i],"\n")
}
SSerror<-sum((Y-mt)^2)
SSTotal<-sum((Y-mean(mt))^2)
R2<-(1-(SSerror/SSTotal))
MAPE<-mape(Y,mt) # in-sample MAPE from the Metrics package (returned as a proportion)
cat("MAPE: ",MAPE,"\n")
cat("R2 : ",R2,"\n")
cat("==========================================\n")
xout<-outsample[,1]
p<-length(xout)
n<-192 # hardcoded full series length, used as the period of the forecast harmonics
out<-rep(0,p) # out-of-sample forecasts
for(i in 1:p)
{
pred<-0
for(j in 1:k)
{
pred<-pred+(a[j]*cos((2*pi*j*(xout[i]-1))/n)+b[j]*sin((2*pi*j*(xout[i]-1))/n))
}
out[i]<-betanol+pred
}
cat("==========================================\n")
cat(" FORECAST \n")
cat(" r m(tr) \n")
cat("==========================================\n")
for(i in 1:p)
{
cat(" ",xout[i]," ",out[i],"\n")
}
yout<-outsample[,2]
MAPEo<-mape(yout,out)
cat("MAPE :",MAPEo,"\n")
cat("==========================================\n")
# jpeg("forecastplot.jpg")
# plot(xout,yout,col="red",ylab="Y",xlab="t",ylim=c(100000,300000))
# lines(xout,out,type="l",col="blue")
# dev.off()
}
estimasi(sales_df_train,sales_df_test,16)
## ==========================================
## FITTED VALUES
## r m(tr)
## ==========================================
## 1 124604.8
## 2 109282.6
## 3 101892.3
## 4 101913.7
## 5 107326.7
## 6 115153.8
## 7 122223.2
## 8 125963.6
## 9 125027.9
## 10 119586.6
## 11 111213.5
## 12 102397.8
## 13 95812.28
## 14 93539.13
## 15 96464.47
## 16 104010.5
## 17 114280.4
## 18 124578.4
## 19 132160.1
## 20 135004.5
## 21 132390.7
## 22 125113.1
## 23 115273.2
## 24 105702.6
## 25 99175.76
## 26 97634.42
## 27 101640.7
## 28 110214.2
## 29 121100.2
## 30 131395.6
## 31 138357.9
## 32 140169.8
## 33 136447.8
## 34 128349.6
## 35 118253.1
## 36 109096.2
## 37 103561.4
## 38 103333.7
## 39 108636
## 40 118169.6
## 41 129475.1
## 42 139610
## 43 145955.8
## 44 146931.5
## 45 142419.7
## 46 133790.3
## 47 123520.2
## 48 114515.9
## 49 109325.1
## 50 109452.8
## 51 114963.3
## 52 124474.1
## 53 135536.4
## 54 145298.4
## 55 151269.6
## 56 151982.7
## 57 147377.8
## 58 138811.1
## 59 128689.6
## 60 119830.7
## 61 114719
## 62 114854.7
## 63 120363.9
## 64 129970.2
## 65 141333
## 66 151661.1
## 67 158443
## 68 160101.3
## 69 156403.1
## 70 148519.3
## 71 138718
## 72 129770.3
## 73 124223.1
## 74 123727.7
## 75 128599.3
## 76 137723.8
## 77 148840.6
## 78 159130
## 79 165954.6
## 80 167561.8
## 81 163562.5
## 82 155055.2
## 83 144357.4
## 84 134410.2
## 85 128007.7
## 86 127053.2
## 87 132041.7
## 88 141908.9
## 89 154294.9
## 90 166156.8
## 91 174572.7
## 92 177522
## 93 174432.7
## 94 166344.2
## 95 155636.5
## 96 145394.1
## 97 138574.4
## 98 137204.9
## 99 141828.4
## 100 151350.6
## 101 163335.6
## 102 174669.3
## 103 182412.8
## 104 184609.3
## 105 180822.7
## 106 172257.7
## 107 161427.6
## 108 151460.4
## 109 145235.7
## 110 144591.5
## 111 149821.3
## 112 159599.6
## 113 171355.6
## 114 181989.4
## 115 188729.6
## 116 189894.6
## 117 185346.1
## 118 176513.3
## 119 165985.1
## 120 156788
## 121 151553
## 122 151803.8
## 123 157559.4
## 124 167355.2
## 125 178670
## 126 188632.3
## 127 194808.1
## 128 195852.6
## 129 191851.3
## 130 184265.3
## 131 175506.2
## 132 168264.2
## 133 164778.5
## 134 166245.1
## 135 172518
## 136 182173.6
## 137 192909.5
## 138 202163.1
## 139 207781.9
## 140 208569.7
## 141 204574
## 142 197048.6
## 143 188115.1
## 144 180215.5
## 145 175501
## 146 175308.4
## 147 179847.2
## 148 188166.6
## 149 198401.2
## 150 208230.9
## 151 215441.4
## 152 218457.1
## 153 216722.9
## 154 210852.7
## 155 202509.2
## 156 194041.2
## 157 187953.2
## 158 186320.4
## 159 190273.6
## 160 199669.1
## 161 213023.8
## 162 227740.7
## 163 240593.6
## 164 248378.3
## 165 248596.4
## 166 240024.9
## 167 223037.6
## 168 199594.3
## 169 172884.2
## 170 146689.9
## MAPE: 0.04227138
## R2 : 0.9422628
## ==========================================
## ==========================================
## FORECAST
## r m(tr)
## ==========================================
## 169 205939.7
## 170 213234.5
## 171 217597.5
## 172 218317
## 173 215381.6
## 174 209494.5
## 175 201957.2
## 176 194444.4
## 177 188705.3
## 178 186242.1
## 179 188019.5
## 180 194255.3
## 181 204332.3
## 182 216849.3
## 183 229810.9
## 184 240929.1
## 185 247991.3
## 186 249233.8
## 187 243658.1
## 188 231230.2
## 189 212924.1
## 190 190595.1
## 191 166697
## 192 143889.9
## MAPE : 0.1249785
## ==========================================
Nonparametric Regression with Fourier Series Estimator generates predictions with MAPE = 12.498%.
The fitted values and the out-of-sample forecasts can also be plotted against the actual data (sales_test); the plotting calls inside estimasi() are commented out above, but the forecast plot can be reproduced as shown below.
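Since estimasi() only prints its results, here is a minimal sketch (assuming the same k = 16 and the train/test split above) that recomputes the Fourier coefficients outside the function and plots the test-set forecasts against the actual sales:
# recompute the Fourier coefficients on the training data (k = 16)
k <- 16
Y <- sales_df_train$y
n <- length(Y)
a <- sapply(1:k, function(j) sum((2 / n) * Y * cos(2 * pi * j * (1:n - 1) / n)))
b <- sapply(1:k, function(j) sum((2 / n) * Y * sin(2 * pi * j * (1:n - 1) / n)))
beta0 <- mean(Y)
# forecast the test indices, using the hardcoded period of 192 as in estimasi()
t_out <- sales_df_test$index
pred <- sapply(t_out, function(tt)
  beta0 + sum(a * cos(2 * pi * (1:k) * (tt - 1) / 192) +
              b * sin(2 * pi * (1:k) * (tt - 1) / 192)))
# plot forecasts against the actual test values
plot(t_out, sales_df_test$y, type = "b", col = "red",
     xlab = "t", ylab = "Sales", main = "Fourier series forecast vs actual (test set)")
lines(t_out, pred, type = "b", col = "blue")
legend("topleft", legend = c("actual", "forecast"), col = c("red", "blue"), lty = 1)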
It can be seen that the forecasts follow the actual data pattern reasonably well, but the values are still somewhat off.
# comparison
mape_fourier <- 0.1249785 * 100   # out-of-sample MAPE printed by estimasi() above
model <- c("SMA","Holt Winters Exp Smoothing","ARIMA(2,1,0)(0,1,1)[12]","Nonparametric Reg. with Fourier Estimator")
mape <- c(round(mape_sma,2),round(mape_hw,2),round(mape_auto_arima,2),round(mape_fourier,2))
as.data.frame(cbind(model,mape))
Based on the MAPE values and a commonly used interpretation scale (below 10%: highly accurate forecasting; 10-20%: good forecasting), Simple Moving Average, Holt-Winters Exponential Smoothing, and ARIMA(2,1,0)(0,1,1)[12] produce highly accurate predictions, while nonparametric regression with the Fourier series estimator produces fairly good predictions.
To wrap it up, of all the models, ARIMA(2,1,0)(0,1,1)[12] produces the most accurate predictions, with a MAPE of only 5.69%. Moreover, this seasonal ARIMA model satisfies all of its residual assumptions.
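With that comparison in hand, a natural next step is to refit the winning model on the full series and forecast ahead; a minimal sketch (the orders auto.arima() selects on the full data may differ slightly):
# refit the seasonal ARIMA on the full series and forecast the next 12 months
final_model <- auto.arima(sales_ts, seasonal = TRUE)
final_forecast <- forecast(final_model, h = 12)
autoplot(final_forecast)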