The time series forecasting method is a statistical method that estimates the future value of a variable using historical data as the basis for its model. Unlike regression analysis, this method requires the target variable Y to be a sequence of numeric values ordered in time, where Y is influenced by its own past values (Yt−1). Forecasting is the process of estimating a value that will occur in the future. This kind of analysis can forecast, for example, yearly rainfall in an area, monthly goat meat prices, or airplane passengers per semester (source).
Time series forecasting refers to the use of a model to predict future values based on previously observed values. Many researchers are familiar with time series forecasting, yet they struggle with specific types of time series data. One such type is data with seasonality, and there can be many kinds of seasonality present (e.g., time of day, daily, weekly, monthly, yearly) (source).
Keep in mind that time series analysis imposes some requirements on the data. Among others, the data must:
-not contain NA values
-be sorted by time
-not have any gaps in the time sequence
A sketch of how to check and fix these requirements follows below.
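This is a minimal sketch of those checks, assuming a hypothetical data frame df with a date column and a value column (the names are illustrative, not from the data set used later):

library(dplyr) # data wrangling
library(padr)  # completing time series data
library(zoo)   # imputing missing values

# `df` is a hypothetical data frame with `date` and `value` columns
df_clean <- df %>%
  arrange(date) %>%                             # sort by time
  pad() %>%                                     # insert NA rows for any gaps in the date sequence
  mutate(value = na.locf(value, na.rm = FALSE)) # impute NAs by carrying the last observation forward

anyNA(df_clean$value) # should be FALSE after imputation (unless the series starts with NA)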
We will use data from the Time Series Data Library (TSDL). Each series within tsdl is of class ts. We chose monthly data relating to hotels, motels, and guesthouses in Victoria from Jan 1980 to June 1995 (Australian Bureau of Statistics).
We will treat the data as having monthly seasonality and forecast it 24 months ahead, comparing the forecasting results of multiple models.
#install.packages("devtools")
devtools::install_github("FinYang/tsdl")
library(tsdl)      # for extracting the data set
library(forecast)  # for time series modeling and forecasting
library(dplyr)     # for general data wrangling
library(lubridate) # for date manipulation
library(padr)      # for completing time series data
library(zoo)       # for imputing missing values in time series data
library(MLmetrics) # for calculating errors
library(tseries)   # for adf.test
library(fpp)       # companion to Forecasting: Principles and Practice
library(TSstudio)  # for prettier time series visualization
library(ggplot2)   # for making intuitive plots
Because we are using the TSDL library, let's first check what time series data sets are available in it.
tsdl
## Time Series Data Library: 648 time series
##
## Frequency
## Subject 0.1 0.25 1 4 5 6 12 13 52 365 Total
## Agriculture 0 0 37 0 0 0 3 0 0 0 40
## Chemistry 0 0 8 0 0 0 0 0 0 0 8
## Computing 0 0 6 0 0 0 0 0 0 0 6
## Crime 0 0 1 0 0 0 2 1 0 0 4
## Demography 1 0 9 2 0 0 3 0 0 2 17
## Ecology 0 0 23 0 0 0 0 0 0 0 23
## Finance 0 0 23 5 0 0 20 0 2 1 51
## Health 0 0 8 0 0 0 6 0 1 0 15
## Hydrology 0 0 42 0 0 0 78 1 0 6 127
## Industry 0 0 9 0 0 0 2 0 1 0 12
## Labour market 0 0 3 4 0 0 17 0 0 0 24
## Macroeconomic 0 0 18 33 0 0 5 0 0 0 56
## Meteorology 0 0 18 0 0 0 17 0 0 12 47
## Microeconomic 0 0 27 1 0 0 7 0 1 0 36
## Miscellaneous 0 0 4 0 1 1 3 0 1 0 10
## Physics 0 0 12 0 0 0 4 0 0 0 16
## Production 0 0 4 14 0 0 28 1 1 0 48
## Sales 0 0 10 3 0 0 24 0 9 0 46
## Sport 0 1 1 0 0 0 0 0 0 0 2
## Transport and tourism 0 0 1 1 0 0 12 0 0 0 14
## Tree-rings 0 0 34 0 0 0 1 0 0 0 35
## Utilities 0 0 2 1 0 0 8 0 0 0 11
## Total 1 1 300 64 1 1 240 3 16 21 648
We will use the monthly Transport and tourism data (frequency 12) and choose one of its series.
xyz <- subset(tsdl,12,"Transport and tourism")
room <- xyz[[2]]
head(room)
## Room_nights Takings
## Jan 1980 276986 7673
## Feb 1980 260633 7472
## Mar 1980 291551 8339
## Apr 1980 275383 7805
## May 1980 275302 7889
## Jun 1980 231693 6619
## attr(,"source")
## [1] Australian Bureau of Statistics
## attr(,"description")
## [1] Monthly data relating to hotels, motels and guesthouses in Victoria. Jan 1980 – June 1995. First column: total number of room nights occupied; Second column: total takings from accommodation
## attr(,"subject")
## [1] Transport and tourism
A room night is a statistical metric for the hotel industry: one room night is one room occupied for one night, so, for example, five rooms occupied for three nights is 15 room nights.
In this LBB, we will forecast Room_nights, so we keep the Room_nights column and drop the Takings column.
room <- room[ , "Room_nights"]
Because our data set is already a ts object, we can skip the time series padding step. However, we must still check whether there are any missing values.
Check Missing Values
anyNA(room)
## [1] FALSE
It's important to understand the difference between a multiplicative and an additive time series before we go any further.
There are three components to a time series:
-trend: how the series changes overall
-seasonality: how the series changes within a given period, e.g. a year, month, week, or day
-error/residual/irregular: activity not explained by the trend or the seasonal component
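In an additive model these components sum, while in a multiplicative model they multiply. Written as simple equations (using $T_t$, $S_t$, and $E_t$ for the trend, seasonal, and error components):

Additive: $Y_t = T_t + S_t + E_t$
Multiplicative: $Y_t = T_t \times S_t \times E_t$

When the size of the seasonal swings grows with the level of the series, the multiplicative form is usually the better description.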
room %>% autoplot()
As we see in the graph above, our data is a multiplicative time series, because the amplitude of the seasonality increases over time, even if only slightly.
We decompose and plot the time series object to see the seasonal and trend components of our data set.
room %>% decompose(type = "multiplicative") %>%
autoplot()
Based on the plot, the trend does not show the repeated ups and downs that would indicate a multi-seasonal time series object, so we can conclude that our data has a single (monthly) seasonality.
We will check whether the data is stationary using the ADF (Augmented Dickey-Fuller) test. In statistics, the augmented Dickey-Fuller test tests the null hypothesis that a unit root is present in a time series sample.
#ADF Test
room %>% adf.test()
##
## Augmented Dickey-Fuller Test
##
## data: .
## Dickey-Fuller = -3.8287, Lag order = 5, p-value = 0.01902
## alternative hypothesis: stationary
Conclusion:
Our p-value from the ADF test is 0.01902. It is lower than alpha (0.05), so the data is already stationary and we do not need an extra differencing step before building the SARIMA model.
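As an optional cross-check (an aside, not part of the original analysis), the forecast package provides ndiffs() and nsdiffs(), which estimate how many ordinary and seasonal differences a series needs to become stationary:

ndiffs(room)  # estimated number of first differences required
nsdiffs(room) # estimated number of seasonal differences required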
The train/test splitting concept for time series models is similar to the usual one, but there are criteria that must be met, namely:
-train data uses the earliest observations
-test data uses the most recent observations
The test data stands in for the future data we want to forecast and is used for evaluation. Let's take the last 24 months of the room data (July 1993 - June 1995) as test data, while the earlier data will be used as train data.
#subsetting data test
room_test <- room %>% tail(24)
room_test
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1993 391000 380900 431400 465400
## 1994 446400 421500 504800 492071 421253 396682 428000 421900 465600 525793
## 1995 479499 473027 554410 489574 462157 420331
## Nov Dec
## 1993 471500 387500
## 1994 499855 435287
## 1995
#subsetting data train
room_train <- room %>% head(-24)
room_train
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1980 276986 260633 291551 275383 275302 231693 238829 274215 277808 299060
## 1981 294053 267510 309739 280733 287298 235672 256449 288997 290789 321898
## 1982 295469 258200 306102 281480 283101 237414 274834 299340 300383 340862
## 1983 322656 281563 323461 312579 310784 262785 273754 320036 310336 342206
## 1984 326988 300713 346414 317325 326208 270657 278158 324584 321801 343542
## 1985 330246 307344 375874 335309 339271 280264 293689 341161 345097 368712
## 1986 340981 319072 374214 344529 337271 281016 282224 320984 325426 366276
## 1987 359326 327610 383563 352405 329351 294486 333454 334339 358000 396057
## 1988 363909 344700 397561 376791 337085 299252 323136 329091 346991 461999
## 1989 415467 382110 432197 424254 386728 354508 375765 367986 402378 426516
## 1990 416834 381099 445673 412408 393997 348241 380134 373688 393588 434192
## 1991 411891 370497 437305 411270 385495 341273 384217 373223 415771 448634
## 1992 419104 398027 456059 430052 399757 362731 384896 385349 432289 468891
## 1993 439400 393900 468700 438800 430100 366300
## Nov Dec
## 1980 286629 232313
## 1981 291834 241380
## 1982 318794 265740
## 1983 320052 265582
## 1984 354040 278179
## 1985 369403 288384
## 1986 380296 300727
## 1987 386976 307155
## 1988 436533 360372
## 1989 433313 338461
## 1990 430731 344468
## 1991 454341 350297
## 1992 442702 370178
## 1993
Then we visualize our data after the splitting process.
autoplot(room_train) + autolayer(room_test)
The Holt-Winters method, also known as triple exponential smoothing (and closely related to the ETS family), is an incredibly popular and relatively simple method for time series forecasting, capable of capturing both trend and seasonality. Since our data has both trend and seasonality, it is suitable for ETS Holt-Winters.
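For reference, one common formulation of the multiplicative Holt-Winters recursions (following the parameterization used by R's stats::HoltWinters, with seasonal period $p$, level $\ell_t$, trend $b_t$, and seasonal component $s_t$):

$\ell_t = \alpha\, y_t / s_{t-p} + (1-\alpha)(\ell_{t-1} + b_{t-1})$
$b_t = \beta\,(\ell_t - \ell_{t-1}) + (1-\beta)\, b_{t-1}$
$s_t = \gamma\, y_t / \ell_t + (1-\gamma)\, s_{t-p}$
$\hat{y}_{t+h} = (\ell_t + h\, b_t)\, s_{t-p+h}$

The fitted alpha, beta, and gamma below are the smoothing weights for the level, trend, and seasonal components respectively.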
room_hw <- HoltWinters(room_train, seasonal="multiplicative")
room_hw
## Holt-Winters exponential smoothing with trend and multiplicative seasonal component.
##
## Call:
## HoltWinters(x = room_train, seasonal = "multiplicative")
##
## Smoothing parameters:
## alpha: 0.2918549
## beta : 0
## gamma: 0.4019896
##
## Coefficients:
## [,1]
## a 4.259205e+05
## b 1.019729e+03
## s1 9.320145e-01
## s2 9.336742e-01
## s3 1.017368e+00
## s4 1.117023e+00
## s5 1.094770e+00
## s6 8.797319e-01
## s7 1.047106e+00
## s8 9.572793e-01
## s9 1.117158e+00
## s10 1.044814e+00
## s11 9.894247e-01
## s12 8.645433e-01
Let’s visualize our model
room_train %>%
autoplot() +
autolayer(room_test, series = "Data Test") +
autolayer(room_hw$fitted[, 1])
### Forecasting
forecast_hw <- forecast(room_hw, h=24)
forecast_hw
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jul 1993 397914.5 388182.9 407646.1 383031.3 412797.7
## Aug 1993 399575.2 388663.7 410486.7 382887.5 416262.9
## Sep 1993 436430.2 424083.6 448776.8 417547.6 455312.7
## Oct 1993 480319.4 466406.8 494232.0 459041.9 501596.9
## Nov 1993 471866.7 457273.1 486460.3 449547.7 494185.8
## Dec 1993 380078.4 366412.8 393744.0 359178.7 400978.1
## Jan 1994 453458.2 437349.6 469566.8 428822.3 478094.2
## Feb 1994 415534.2 399637.3 431431.1 391222.0 439846.4
## Mar 1994 486073.3 467554.4 504592.3 457751.0 514395.6
## Apr 1994 455661.8 437411.6 473912.0 427750.5 483573.1
## May 1994 432514.7 414342.4 450686.9 404722.6 460306.8
## Jun 1994 378805.9 364722.0 392889.8 357266.5 400345.3
## Jul 1994 409319.3 389428.1 429210.5 378898.3 439740.3
## Aug 1994 411000.3 390485.9 431514.7 379626.3 442374.4
## Sep 1994 448879.5 426573.6 471185.3 414765.6 482993.4
## Oct 1994 493988.1 469593.0 518383.3 456679.0 531297.3
## Nov 1994 485263.1 460738.2 509788.1 447755.5 522770.8
## Dec 1994 390843.5 369336.1 412350.8 357950.8 423736.1
## Jan 1995 466271.4 441353.7 491189.1 428163.0 504379.7
## Feb 1995 427248.2 403450.4 451046.0 390852.6 463643.8
## Mar 1995 499743.7 472456.5 527030.9 458011.5 541475.9
## Apr 1995 468446.9 442118.2 494775.7 428180.6 508713.3
## May 1995 444622.0 418922.5 470321.5 405318.0 483926.0
## Jun 1995 389385.1 369467.5 409302.7 358923.8 419846.4
room_train %>%
autoplot() +
autolayer(room_test, series = "Data Test") +
autolayer(room_hw$fitted[, 1], series= "Model")+
autolayer(forecast_hw$mean, series="Forecast")
### Model Evaluation
acc_hw <- forecast::accuracy(forecast_hw$mean, room_test)
acc_hw
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set 14274.81 24001.2 19626.96 2.9607 4.241977 0.4214471 0.4853429
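For reference, MAPE, the percentage error we will mainly compare on, is defined as:

$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|$

so the Holt-Winters forecasts are off by about 4.2% of the actual value on average.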
The SARIMA (seasonal ARIMA) method is a development of the Box-Jenkins (ARIMA) method. The SARIMA model can capture seasonal patterns with a fixed period. This model involves several steps, such as checking the stationarity assumption, differencing, and data transformation.
room_sarima <- auto.arima(room_train, d=1, D=1)
room_sarima
## Series: room_train
## ARIMA(0,1,1)(0,1,1)[12]
##
## Coefficients:
## ma1 sma1
## -0.5833 -0.5447
## s.e. 0.0796 0.0752
##
## sigma^2 = 1.62e+08: log likelihood = -1621
## AIC=3248 AICc=3248.16 BIC=3257.01
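In backshift notation (where $B\,y_t = y_{t-1}$), the selected ARIMA(0,1,1)(0,1,1)[12], the classic "airline model", can be written as:

$(1 - B)(1 - B^{12})\, y_t = (1 + \theta_1 B)(1 + \Theta_1 B^{12})\, \varepsilon_t$

with $\theta_1 = -0.5833$ and $\Theta_1 = -0.5447$ from the fit above (R's arima() uses the plus-sign convention for MA coefficients).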
forecast_sarima <- forecast(room_sarima, h=24)
forecast_sarima
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jul 1993 402338.7 386028.9 418648.5 377395.1 427282.4
## Aug 1993 399722.4 382053.5 417391.4 372700.1 426744.8
## Sep 1993 438152.7 419221.8 457083.5 409200.4 467104.9
## Oct 1993 476523.0 456409.3 496636.7 445761.8 507284.2
## Nov 1993 464943.4 443712.7 486174.1 432473.8 497413.0
## Dec 1993 380727.5 358435.7 403019.4 346635.1 414820.0
## Jan 1994 449099.8 425795.1 472404.5 413458.3 484741.3
## Feb 1994 412448.5 388173.1 436723.9 375322.5 449574.5
## Mar 1994 479284.3 454075.6 504493.0 440730.9 517837.6
## Apr 1994 451681.3 425572.6 477789.9 411751.5 491611.0
## May 1994 432710.7 405732.1 459689.3 391450.5 473970.9
## Jun 1994 381819.0 353997.7 409640.4 339269.9 424368.1
## Jul 1994 413822.3 382576.7 445067.8 366036.4 461608.2
## Aug 1994 411206.0 378432.6 443979.3 361083.4 461328.5
## Sep 1994 449636.2 415403.1 483869.3 397281.2 501991.2
## Oct 1994 488006.5 452373.4 523639.6 433510.4 542502.6
## Nov 1994 476426.9 439446.8 513407.1 419870.8 532983.1
## Dec 1994 392211.1 353931.3 430490.8 333667.2 450754.9
## Jan 1995 460583.3 421046.6 500120.0 400117.2 521049.5
## Feb 1995 423932.0 383177.1 464686.9 361602.8 486261.3
## Mar 1995 490767.8 448830.1 532705.6 426629.6 554906.1
## Apr 1995 463164.8 420076.7 506252.9 397267.2 529062.4
## May 1995 444194.2 399985.7 488402.8 376583.1 511805.4
## Jun 1995 393302.6 348001.3 438603.9 324020.2 462584.9
room_train %>%
autoplot() +
autolayer(room_test, series = "Data Test") +
autolayer(forecast_sarima$fitted, series = "Model") +
autolayer(forecast_sarima$mean, series = "Forecast")
acc_sarima <- forecast::accuracy(forecast_sarima$mean, room_test)
acc_sarima
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set 16213.92 26169.59 21396.77 3.322237 4.582359 0.5129179 0.5248379
TBATS (Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend, and Seasonal components) is a forecasting method for modelling time series data. Its main aim is to forecast time series with complex seasonal patterns using exponential smoothing. TBATS takes multiple options into account and fits a number of candidate models:
-with and without a Box-Cox transformation.
-with and without a trend.
-with and without trend damping.
-with and without an ARMA(p,q) process for the residuals.
-a non-seasonal model.
-various numbers of harmonics used to model seasonal effects.
The final model is chosen using the Akaike information criterion (AIC) (source).
We can use the tbats() function from the forecast library.
room_tbats <- room_train %>% tbats(use.box.cox = T,
use.trend = T,
use.damped.trend = T)
room_tbats
## TBATS(0.001, {5,0}, 0.99, {<12,5>})
##
## Call: tbats(y = ., use.box.cox = T, use.trend = T, use.damped.trend = T)
##
## Parameters
## Lambda: 0.000618
## Alpha: 0.2120948
## Beta: -0.004433207
## Damping Parameter: 0.99004
## Gamma-1 Values: -9.151155e-05
## Gamma-2 Values: 3.860713e-05
## AR coefficients: -0.089713 0.250756 -0.064269 0.166968 -0.286575
##
## Seed States:
## [,1]
## [1,] 12.532488156
## [2,] 0.005150968
## [3,] 0.037851547
## [4,] -0.077567522
## [5,] -0.007010412
## [6,] 0.030235710
## [7,] 0.038233544
## [8,] -0.016030452
## [9,] 0.036742554
## [10,] 0.006382289
## [11,] 0.027205848
## [12,] -0.019113818
## [13,] 0.000000000
## [14,] 0.000000000
## [15,] 0.000000000
## [16,] 0.000000000
## [17,] 0.000000000
## attr(,"lambda")
## [1] 0.0006183753
##
## Sigma: 0.03667788
## AIC: 3932.559
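To decode the printed specification: forecast prints a TBATS model as TBATS(omega, {p,q}, phi, {<m1,k1>,...}), so TBATS(0.001, {5,0}, 0.99, {<12,5>}) means a Box-Cox parameter of 0.001, AR(5) errors with no MA terms, a damping parameter of 0.99, and a seasonal period of 12 modelled with 5 Fourier harmonic pairs.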
forecast_tbats <- forecast(room_tbats, h= 24)
forecast_tbats
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jul 1993 390734.3 372931.7 409386.2 363838.1 419617.4
## Aug 1993 418806.4 399587.5 438949.1 389773.0 450001.0
## Sep 1993 429365.3 407817.0 452051.5 396851.0 464541.8
## Oct 1993 479030.5 454810.5 504539.3 442488.7 518587.9
## Nov 1993 461291.4 436018.4 488028.4 423204.7 502803.5
## Dec 1993 376617.2 355921.0 398515.9 345429.3 410619.0
## Jan 1994 445254.5 419716.9 472344.9 406796.1 487346.2
## Feb 1994 413328.9 389619.4 438480.3 377623.7 452407.9
## Mar 1994 472435.6 444374.4 502267.7 430200.4 518814.6
## Apr 1994 448014.7 421402.5 476306.3 407960.5 491998.7
## May 1994 425312.9 399028.9 453327.1 385778.6 468895.9
## Jun 1994 373967.9 350838.4 398621.3 339178.9 412322.8
## Jul 1994 389464.0 364612.2 416008.4 352104.7 430784.5
## Aug 1994 428321.6 400960.5 457548.6 387190.8 473818.7
## Sep 1994 430623.7 402312.6 460925.9 388086.5 477820.2
## Oct 1994 492145.5 459753.7 526818.0 443478.1 546150.1
## Nov 1994 463606.0 432343.9 497127.2 416656.8 515841.9
## Dec 1994 386265.4 360174.0 414245.7 347082.8 429868.4
## Jan 1995 447768.1 416869.3 480955.6 401384.7 499507.9
## Feb 1995 423103.9 393853.9 454524.8 379197.1 472091.2
## Mar 1995 474722.4 441264.0 510716.0 424517.0 530861.1
## Apr 1995 457870.1 425529.3 492667.1 409343.8 512144.9
## May 1995 427536.6 396806.3 460645.1 381442.7 479196.6
## Jun 1995 381751.2 354243.3 411393.6 340492.8 428005.5
room_train %>%
autoplot() +
autolayer(room_test, series = "Data Test") +
autolayer(room_tbats$fitted, series= "Model")+
autolayer(forecast_tbats$mean, series="Forecast")
acc_tbats <- forecast::accuracy(forecast_tbats,room_test)
acc_tbats
## ME RMSE MAE MPE MAPE MASE
## Training set -533.7765 12860.30 9486.427 -0.3259455 2.772652 0.614035
## Test set 22020.8676 33278.31 27189.069 4.5646563 5.845190 1.759887
## ACF1 Theil's U
## Training set 0.03657143 NA
## Test set 0.44742401 0.6851876
acc_tbats[[5]] # element 5 of the accuracy matrix (column-major order), i.e. the training-set MAE
## [1] 9486.427
Then we can view our models and forecasting results in intuitive plots and compare the models visually.
a <- autoplot(room_train) +
autolayer(room_test, series = "Data Test") +
autolayer(room_hw$fitted[, 1], series= "Model")+
autolayer(forecast_hw$mean, series="Forecast")+
labs(title= "ETS Holt-Winters",
y = "Room Nights") +
theme_minimal()
b <- autoplot(room_train) +
autolayer(room_test, series = "Data Test") +
autolayer(room_sarima$fitted, series = "Model") +
autolayer(forecast_sarima$mean, series = "Forecast") +
labs( title = "SARIMA",
y = "Room Nights") +
theme_minimal()
c <- autoplot(room_train) +
autolayer(room_test, series = "Data Test") +
autolayer(room_tbats$fitted, series= "Model")+
autolayer(forecast_tbats$mean, series="Forecast") +
labs(title = "TBATS",
y = "Room Nights") +
theme_minimal()
gridExtra::grid.arrange(a,b,c)
The graph above shows that the ETS Holt-Winters error is smaller than the others', because its forecast line stays closest to the test data line.
compare <- data.frame(Model = c("Holt-Winters","SARIMA", "TBATS"),
                      MAE = c(acc_hw[[3]], acc_sarima[[3]], acc_tbats[[6]]),   # test-set MAE
                      MAPE = c(acc_hw[[5]], acc_sarima[[5]], acc_tbats[[10]]), # test-set MAPE
                      RMSE = c(acc_hw[[2]], acc_sarima[[2]], acc_tbats[[4]]))  # test-set RMSE
compare
## Model MAE MAPE RMSE
## 1 Holt-Winters 19626.96 4.241977 24001.20
## 2 SARIMA 21396.77 4.582359 26169.59
## 3 TBATS 27189.07 5.845190 33278.31
Based on the table above, the lowest-error model is the Holt-Winters model, but we will try to tune it further.
The following assumption checks apply only to the SARIMA model.
No autocorrelation in residuals
Autocorrelation can be checked using the Ljung-Box test. If there are correlations between residuals, then there is information left in the residuals which should have been used in computing the forecasts. Hypothesis testing criteria:
-H0: no autocorrelation (p-value > 0.05)
-H1: autocorrelation (p-value < 0.05)
Box.test(room_sarima$residuals, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: room_sarima$residuals
## X-squared = 0.063182, df = 1, p-value = 0.8015
Based on the p-value above (0.8015 > 0.05), the room_sarima model does not have autocorrelated residuals.
Normality of residuals
We will also check whether the model's residuals are normally distributed using the Shapiro-Wilk test. If the residuals are not normally distributed, the parameter estimates may be biased and the forecasts less than optimal. It also indicates that we can still tweak our model to get better performance. Hypothesis testing criteria:
-H0: residuals are normally distributed (p-value > 0.05)
-H1: residuals are not normally distributed (p-value < 0.05)
shapiro.test(room_sarima$residuals)
##
## Shapiro-Wilk normality test
##
## data: room_sarima$residuals
## W = 0.88747, p-value = 9.561e-10
Based on the p-value above (9.561e-10 < 0.05), the room_sarima model's residuals are not normally distributed.
If the parameters alpha, beta, and gamma are not specified, HoltWinters() will try to determine the best values for these three parameters automatically. However, these values may not be the most appropriate for our data, so we can try to specify the parameter values manually:
room_tuning <- HoltWinters(room_train, beta=0.4, seasonal = "multiplicative")
forecast_tuning <- forecast(room_tuning, h = 24)
forecast::accuracy(forecast_tuning$mean, room_test)
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set 5232.791 17100.47 13082.97 1.017326 2.870605 0.0608822 0.3523958
forecast::accuracy(forecast_hw$mean, room_test)
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set 14274.81 24001.2 19626.96 2.9607 4.241977 0.4214471 0.4853429
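Rather than fixing beta = 0.4 by hand, we could also search over candidate values. This is a minimal sketch (the grid and the choice to tune only beta are illustrative assumptions, not part of the original analysis):

# Fit one Holt-Winters model per candidate beta and keep the lowest test-set MAPE
betas <- seq(0.1, 0.9, by = 0.1) # arbitrary candidate grid
mapes <- sapply(betas, function(b) {
  fit <- HoltWinters(room_train, beta = b, seasonal = "multiplicative")
  fc <- forecast(fit, h = 24)
  forecast::accuracy(fc$mean, room_test)[, "MAPE"]
})
betas[which.min(mapes)] # beta with the lowest MAPE on the test set

For the rest of this section we keep the manually tuned model (beta = 0.4) above.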
acc_tuning <- forecast::accuracy(forecast_tuning$mean, room_test)
forecast_tuning$mean
## Jan Feb Mar Apr May Jun Jul Aug
## 1993 393075.1 405805.5
## 1994 458431.7 421314.9 492921.5 461154.7 441186.2 385154.4 413847.0 427156.1
## 1995 482033.7 442913.3 518083.3 484595.2 463517.1 404567.3
## Sep Oct Nov Dec
## 1993 435799.6 483633.2 474972.9 381616.8
## 1994 458628.2 508857.4 499638.1 401348.7
## 1995
autoplot(forecast_tuning$mean)
The highest forecast room_nights value falls in March 1995.
compare <- data.frame(Model = c("Holt-Winters","SARIMA", "TBATS", "Holt-Winters Tuning"),
MAE = c(acc_hw[[3]], acc_sarima[[3]],acc_tbats[[6]], acc_tuning[[3]]),
MAPE = c(acc_hw[[5]], acc_sarima[[5]],acc_tbats[[10]], acc_tuning[[5]]),
RMSE = c(acc_hw[[2]], acc_sarima[[2]],acc_tbats[[4]], acc_tuning[[2]]))
compare
## Model MAE MAPE RMSE
## 1 Holt-Winters 19626.96 4.241977 24001.20
## 2 SARIMA 21396.77 4.582359 26169.59
## 3 TBATS 27189.07 5.845190 33278.31
## 4 Holt-Winters Tuning 13082.97 2.870605 17100.47
Based on the table above, the MAE, MAPE, and RMSE values decrease after tuning, but they are still fairly large. There are some methods that can be tried to improve this:
-Look for model misspecification: when a model fails to consider all the pertinent information in the data, it is said to be misspecified. The residuals may show patterns or trends as a result. This can be resolved by verifying the model's underlying assumptions, such as that the variance remains constant over time and that the residuals are normally distributed.
-Consider including more predictors: patterns or trends in the residuals can be a signal that more predictors should be added to the model. For instance, if the residuals exhibit a seasonal pattern, it may be essential to include seasonal variables in the model.
-Transform the data: if the residuals are not normally distributed, we can try to achieve normality by transforming the data. Taking the square root or logarithm of the data is a common transformation (source).
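As a quick illustration of the last point, here is a minimal sketch (an aside, not part of the original analysis): log-transform the series to stabilize its variance, fit an additive Holt-Winters model, then back-transform the forecasts with exp().

# Log-transform, fit, forecast, then back-transform the point forecasts
room_hw_log <- HoltWinters(log(room_train), seasonal = "additive")
forecast_log <- forecast(room_hw_log, h = 24)
forecast::accuracy(exp(forecast_log$mean), room_test) # compare against the untransformed models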
### References
- https://medium.com/analytics-vidhya/a-thorough-introduction-to-holt-winters-forecasting-c21810b8c0e6
- https://timeseriesreasoning.com/contents/holt-winters-exponential-smoothing/
- https://medium.com/analytics-vidhya/time-series-forecasting-using-tbats-model-ce8c429442a9
- https://bicorner.com/2015/05/02/what-is-time-seires-analysis-part-ii/
- https://medium.com/@ShankiiZ_/residual-analysis-in-time-series-612a450b08f5