The time series forecasting method is a statistical method that estimates the future value of a variable using historical data as the basis for its model. Unlike regression analysis, this method requires the target variable Y to be a sequence of numeric values ordered in time, where Y is influenced by its own past values (Yt−1). Forecasting is the process of estimating a value that will occur in the future. This kind of analysis can forecast, for example, yearly rainfall in an area, monthly goat meat prices, or airplane passengers per semester (source).
Time series forecasting refers to the use of a model to predict future values based on previously observed values. Many researchers are familiar with time series forecasting, yet they struggle with specific types of time series data. One such type is data with seasonality, and there can be many kinds of seasonality present (e.g., time of day, daily, weekly, monthly, yearly) (source).
Keep in mind that time series analysis imposes some requirements on the data. Among others, the data must:
-not contain NA values
-be sorted by time
-not have any gaps in the time sequence
A sketch of how to check and fix these requirements follows below.
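This is a minimal sketch of those checks, assuming a hypothetical data frame df with a date column and a value column (the names are illustrative, not from the data set used later):

library(dplyr) # data wrangling
library(padr)  # completing time series data
library(zoo)   # imputing missing values

# `df` is a hypothetical data frame with `date` and `value` columns
df_clean <- df %>%
  arrange(date) %>%                             # sort by time
  pad() %>%                                     # insert NA rows for any gaps in the date sequence
  mutate(value = na.locf(value, na.rm = FALSE)) # impute NAs by carrying the last observation forward

anyNA(df_clean$value) # should be FALSE after imputation (unless the series starts with NA)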
We will use data from the Time Series Data Library (TSDL). Each series within tsdl is of class ts. We chose monthly data relating to hotels, motels, and guesthouses in Victoria from Jan 1980 to June 1995 (Australian Bureau of Statistics).
We will treat the data as having monthly seasonality and forecast it 24 months ahead, comparing the forecasting results of multiple models.
#install.packages("devtools")
devtools::install_github("FinYang/tsdl")
library(tsdl)      # for extracting the data set
library(forecast)  # for time series modeling and forecasting
library(dplyr)     # for general data wrangling
library(lubridate) # for date manipulation
library(padr)      # for completing time series data
library(zoo)       # for imputing missing values in time series data
library(MLmetrics) # for calculating errors
library(tseries)   # for adf.test
library(fpp)       # companion to Forecasting: Principles and Practice
library(TSstudio)  # for prettier time series visualization
library(ggplot2)   # for making intuitive plots
Because we are using the TSDL library, let's first check what time series data sets are available in it.
tsdl
## Time Series Data Library: 648 time series
##
## Frequency
## Subject 0.1 0.25 1 4 5 6 12 13 52 365 Total
## Agriculture 0 0 37 0 0 0 3 0 0 0 40
## Chemistry 0 0 8 0 0 0 0 0 0 0 8
## Computing 0 0 6 0 0 0 0 0 0 0 6
## Crime 0 0 1 0 0 0 2 1 0 0 4
## Demography 1 0 9 2 0 0 3 0 0 2 17
## Ecology 0 0 23 0 0 0 0 0 0 0 23
## Finance 0 0 23 5 0 0 20 0 2 1 51
## Health 0 0 8 0 0 0 6 0 1 0 15
## Hydrology 0 0 42 0 0 0 78 1 0 6 127
## Industry 0 0 9 0 0 0 2 0 1 0 12
## Labour market 0 0 3 4 0 0 17 0 0 0 24
## Macroeconomic 0 0 18 33 0 0 5 0 0 0 56
## Meteorology 0 0 18 0 0 0 17 0 0 12 47
## Microeconomic 0 0 27 1 0 0 7 0 1 0 36
## Miscellaneous 0 0 4 0 1 1 3 0 1 0 10
## Physics 0 0 12 0 0 0 4 0 0 0 16
## Production 0 0 4 14 0 0 28 1 1 0 48
## Sales 0 0 10 3 0 0 24 0 9 0 46
## Sport 0 1 1 0 0 0 0 0 0 0 2
## Transport and tourism 0 0 1 1 0 0 12 0 0 0 14
## Tree-rings 0 0 34 0 0 0 1 0 0 0 35
## Utilities 0 0 2 1 0 0 8 0 0 0 11
## Total 1 1 300 64 1 1 240 3 16 21 648
We will use the monthly Transport and tourism data (frequency 12) and choose one of its series.
xyz <- subset(tsdl,12,"Transport and tourism")
room <- xyz[[2]]
head(room)
## Room_nights Takings
## Jan 1980 276986 7673
## Feb 1980 260633 7472
## Mar 1980 291551 8339
## Apr 1980 275383 7805
## May 1980 275302 7889
## Jun 1980 231693 6619
## attr(,"source")
## [1] Australian Bureau of Statistics
## attr(,"description")
## [1] Monthly data relating to hotels, motels and guesthouses in Victoria. Jan 1980 – June 1995. First column: total number of room nights occupied; Second column: total takings from accommodation
## attr(,"subject")
## [1] Transport and tourism
A room night is a statistical metric for the hotel industry: one room night is one room occupied for one night, so, for example, five rooms occupied for three nights is 15 room nights.
In this LBB, we will forecast Room_nights, so we keep the Room_nights column and drop the Takings column.
room <- room[ , "Room_nights"]
Because our data set is already a ts object, we can skip the time series padding step. However, we must still check whether there are any missing values.
Check Missing Values
anyNA(room)
## [1] FALSE
It's important to understand the difference between a multiplicative and an additive time series before we go any further.
There are three components to a time series:
-trend: how the series changes overall
-seasonality: how the series changes within a given period, e.g. a year, month, week, or day
-error/residual/irregular: activity not explained by the trend or the seasonal component
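In an additive model these components sum, while in a multiplicative model they multiply. Written as simple equations (using $T_t$, $S_t$, and $E_t$ for the trend, seasonal, and error components):

Additive: $Y_t = T_t + S_t + E_t$
Multiplicative: $Y_t = T_t \times S_t \times E_t$

When the size of the seasonal swings grows with the level of the series, the multiplicative form is usually the better description.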
room %>% autoplot()
As we see in the graph above, our data is a multiplicative time series, because the amplitude of the seasonality increases over time, even if only slightly.
We decompose and plot the time series object to see the seasonal and trend components of our data set.
room %>% decompose(type = "multiplicative") %>%
autoplot()
Based on the plot, the trend does not show the repeated ups and downs that would indicate a multi-seasonal time series object, so we can conclude that our data has a single (monthly) seasonality.
We will check whether the data is stationary using the ADF (Augmented Dickey-Fuller) test. In statistics, the augmented Dickey-Fuller test tests the null hypothesis that a unit root is present in a time series sample.
#ADF Test
room %>% adf.test()
##
## Augmented Dickey-Fuller Test
##
## data: .
## Dickey-Fuller = -3.8287, Lag order = 5, p-value = 0.01902
## alternative hypothesis: stationary
Conclusion:
Our p-value from the ADF test is 0.01902. It is lower than alpha (0.05), so the data is already stationary and we do not need an extra differencing step before building the SARIMA model.
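As an optional cross-check (an aside, not part of the original analysis), the forecast package provides ndiffs() and nsdiffs(), which estimate how many ordinary and seasonal differences a series needs to become stationary:

ndiffs(room)  # estimated number of first differences required
nsdiffs(room) # estimated number of seasonal differences required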
The train/test splitting concept for time series models is similar to the usual one, but there are criteria that must be met, namely:
-train data uses the earliest observations
-test data uses the most recent observations
The test data stands in for the future data we want to forecast and is used for evaluation. Let's take the last 24 months of the room data (July 1993 - June 1995) as test data, while the earlier data will be used as train data.
#subsetting data test
room_test <- room %>% tail(24)
room_test
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1993 391000 380900 431400 465400
## 1994 446400 421500 504800 492071 421253 396682 428000 421900 465600 525793
## 1995 479499 473027 554410 489574 462157 420331
## Nov Dec
## 1993 471500 387500
## 1994 499855 435287
## 1995
#subsetting data train
room_train <- room %>% head(-24)
room_train
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1980 276986 260633 291551 275383 275302 231693 238829 274215 277808 299060
## 1981 294053 267510 309739 280733 287298 235672 256449 288997 290789 321898
## 1982 295469 258200 306102 281480 283101 237414 274834 299340 300383 340862
## 1983 322656 281563 323461 312579 310784 262785 273754 320036 310336 342206
## 1984 326988 300713 346414 317325 326208 270657 278158 324584 321801 343542
## 1985 330246 307344 375874 335309 339271 280264 293689 341161 345097 368712
## 1986 340981 319072 374214 344529 337271 281016 282224 320984 325426 366276
## 1987 359326 327610 383563 352405 329351 294486 333454 334339 358000 396057
## 1988 363909 344700 397561 376791 337085 299252 323136 329091 346991 461999
## 1989 415467 382110 432197 424254 386728 354508 375765 367986 402378 426516
## 1990 416834 381099 445673 412408 393997 348241 380134 373688 393588 434192
## 1991 411891 370497 437305 411270 385495 341273 384217 373223 415771 448634
## 1992 419104 398027 456059 430052 399757 362731 384896 385349 432289 468891
## 1993 439400 393900 468700 438800 430100 366300
## Nov Dec
## 1980 286629 232313
## 1981 291834 241380
## 1982 318794 265740
## 1983 320052 265582
## 1984 354040 278179
## 1985 369403 288384
## 1986 380296 300727
## 1987 386976 307155
## 1988 436533 360372
## 1989 433313 338461
## 1990 430731 344468
## 1991 454341 350297
## 1992 442702 370178
## 1993
Then we visualize our data after the splitting process.
autoplot(room_train) + autolayer(room_test)
The Holt-Winters method, also known as triple exponential smoothing (and closely related to the ETS family), is an incredibly popular and relatively simple method for time series forecasting, capable of capturing both trend and seasonality. Since our data has both trend and seasonality, it is suitable for ETS Holt-Winters.
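For reference, one common formulation of the multiplicative Holt-Winters recursions (following the parameterization used by R's stats::HoltWinters, with seasonal period $p$, level $\ell_t$, trend $b_t$, and seasonal component $s_t$):

$\ell_t = \alpha\, y_t / s_{t-p} + (1-\alpha)(\ell_{t-1} + b_{t-1})$
$b_t = \beta\,(\ell_t - \ell_{t-1}) + (1-\beta)\, b_{t-1}$
$s_t = \gamma\, y_t / \ell_t + (1-\gamma)\, s_{t-p}$
$\hat{y}_{t+h} = (\ell_t + h\, b_t)\, s_{t-p+h}$

The fitted alpha, beta, and gamma below are the smoothing weights for the level, trend, and seasonal components respectively.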
room_hw <- HoltWinters(room_train, seasonal="multiplicative")
room_hw
## Holt-Winters exponential smoothing with trend and multiplicative seasonal component.
##
## Call:
## HoltWinters(x = room_train, seasonal = "multiplicative")
##
## Smoothing parameters:
## alpha: 0.2918549
## beta : 0
## gamma: 0.4019896
##
## Coefficients:
## [,1]
## a 4.259205e+05
## b 1.019729e+03
## s1 9.320145e-01
## s2 9.336742e-01
## s3 1.017368e+00
## s4 1.117023e+00
## s5 1.094770e+00
## s6 8.797319e-01
## s7 1.047106e+00
## s8 9.572793e-01
## s9 1.117158e+00
## s10 1.044814e+00
## s11 9.894247e-01
## s12 8.645433e-01
Let’s visualize our model
room_train %>%
autoplot() +
autolayer(room_test, series = "Data Test") +
autolayer(room_hw$fitted[, 1])
### Forecasting
forecast_hw <- forecast(room_hw, h=24)
forecast_hw
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jul 1993 397914.5 388182.9 407646.1 383031.3 412797.7
## Aug 1993 399575.2 388663.7 410486.7 382887.5 416262.9
## Sep 1993 436430.2 424083.6 448776.8 417547.6 455312.7
## Oct 1993 480319.4 466406.8 494232.0 459041.9 501596.9
## Nov 1993 471866.7 457273.1 486460.3 449547.7 494185.8
## Dec 1993 380078.4 366412.8 393744.0 359178.7 400978.1
## Jan 1994 453458.2 437349.6 469566.8 428822.3 478094.2
## Feb 1994 415534.2 399637.3 431431.1 391222.0 439846.4
## Mar 1994 486073.3 467554.4 504592.3 457751.0 514395.6
## Apr 1994 455661.8 437411.6 473912.0 427750.5 483573.1
## May 1994 432514.7 414342.4 450686.9 404722.6 460306.8
## Jun 1994 378805.9 364722.0 392889.8 357266.5 400345.3
## Jul 1994 409319.3 389428.1 429210.5 378898.3 439740.3
## Aug 1994 411000.3 390485.9 431514.7 379626.3 442374.4
## Sep 1994 448879.5 426573.6 471185.3 414765.6 482993.4
## Oct 1994 493988.1 469593.0 518383.3 456679.0 531297.3
## Nov 1994 485263.1 460738.2 509788.1 447755.5 522770.8
## Dec 1994 390843.5 369336.1 412350.8 357950.8 423736.1
## Jan 1995 466271.4 441353.7 491189.1 428163.0 504379.7
## Feb 1995 427248.2 403450.4 451046.0 390852.6 463643.8
## Mar 1995 499743.7 472456.5 527030.9 458011.5 541475.9
## Apr 1995 468446.9 442118.2 494775.7 428180.6 508713.3
## May 1995 444622.0 418922.5 470321.5 405318.0 483926.0
## Jun 1995 389385.1 369467.5 409302.7 358923.8 419846.4
room_train %>%
autoplot() +
autolayer(room_test, series = "Data Test") +
autolayer(room_hw$fitted[, 1], series= "Model")+
autolayer(forecast_hw$mean, series="Forecast")
### Model Evaluation
acc_hw <- forecast::accuracy(forecast_hw$mean, room_test)
acc_hw
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set 14274.81 24001.2 19626.96 2.9607 4.241977 0.4214471 0.4853429
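For reference, MAPE, the percentage error we will mainly compare on, is defined as:

$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|$

so the Holt-Winters forecasts are off by about 4.2% of the actual value on average.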
The SARIMA (seasonal ARIMA) method is a development of the Box-Jenkins (ARIMA) method. The SARIMA model can capture seasonal patterns with a fixed period. This model involves several steps, such as checking the stationarity assumption, differencing, and data transformation.
room_sarima <- auto.arima(room_train, d=1, D=1)
room_sarima
## Series: room_train
## ARIMA(0,1,1)(0,1,1)[12]
##
## Coefficients:
## ma1 sma1
## -0.5833 -0.5447
## s.e. 0.0796 0.0752
##
## sigma^2 = 1.62e+08: log likelihood = -1621
## AIC=3248 AICc=3248.16 BIC=3257.01
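In backshift notation (where $B\,y_t = y_{t-1}$), the selected ARIMA(0,1,1)(0,1,1)[12], the classic "airline model", can be written as:

$(1 - B)(1 - B^{12})\, y_t = (1 + \theta_1 B)(1 + \Theta_1 B^{12})\, \varepsilon_t$

with $\theta_1 = -0.5833$ and $\Theta_1 = -0.5447$ from the fit above (R's arima() uses the plus-sign convention for MA coefficients).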
forecast_sarima <- forecast(room_sarima, h=24)
forecast_sarima
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jul 1993 402338.7 386028.9 418648.5 377395.1 427282.4
## Aug 1993 399722.4 382053.5 417391.4 372700.1 426744.8
## Sep 1993 438152.7 419221.8 457083.5 409200.4 467104.9
## Oct 1993 476523.0 456409.3 496636.7 445761.8 507284.2
## Nov 1993 464943.4 443712.7 486174.1 432473.8 497413.0
## Dec 1993 380727.5 358435.7 403019.4 346635.1 414820.0
## Jan 1994 449099.8 425795.1 472404.5 413458.3 484741.3
## Feb 1994 412448.5 388173.1 436723.9 375322.5 449574.5
## Mar 1994 479284.3 454075.6 504493.0 440730.9 517837.6
## Apr 1994 451681.3 425572.6 477789.9 411751.5 491611.0
## May 1994 432710.7 405732.1 459689.3 391450.5 473970.9
## Jun 1994 381819.0 353997.7 409640.4 339269.9 424368.1
## Jul 1994 413822.3 382576.7 445067.8 366036.4 461608.2
## Aug 1994 411206.0 378432.6 443979.3 361083.4 461328.5
## Sep 1994 449636.2 415403.1 483869.3 397281.2 501991.2
## Oct 1994 488006.5 452373.4 523639.6 433510.4 542502.6
## Nov 1994 476426.9 439446.8 513407.1 419870.8 532983.1
## Dec 1994 392211.1 353931.3 430490.8 333667.2 450754.9
## Jan 1995 460583.3 421046.6 500120.0 400117.2 521049.5
## Feb 1995 423932.0 383177.1 464686.9 361602.8 486261.3
## Mar 1995 490767.8 448830.1 532705.6 426629.6 554906.1
## Apr 1995 463164.8 420076.7 506252.9 397267.2 529062.4
## May 1995 444194.2 399985.7 488402.8 376583.1 511805.4
## Jun 1995 393302.6 348001.3 438603.9 324020.2 462584.9
room_train %>%
autoplot() +
autolayer(room_test, series = "Data Test") +
autolayer(forecast_sarima$fitted, series = "Model") +
autolayer(forecast_sarima$mean, series = "Forecast")
acc_sarima <- forecast::accuracy(forecast_sarima$mean, room_test)
acc_sarima
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set 16213.92 26169.59 21396.77 3.322237 4.582359 0.5129179 0.5248379
TBATS (Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend, and Seasonal components) is a forecasting method for modelling time series data. Its main aim is to forecast time series with complex seasonal patterns using exponential smoothing. TBATS takes multiple options into account and fits a number of candidate models:
-with and without a Box-Cox transformation.
-with and without a trend.
-with and without trend damping.
-with and without an ARMA(p,q) process for the residuals.
-a non-seasonal model.
-various numbers of harmonics used to model seasonal effects.
The final model is chosen using the Akaike information criterion (AIC) (source).
We can use the tbats() function from the forecast library.
room_tbats <- room_train %>% tbats(use.box.cox = T,
use.trend = T,
use.damped.trend = T)
room_tbats
## TBATS(0.001, {5,0}, 0.99, {<12,5>})
##
## Call: tbats(y = ., use.box.cox = T, use.trend = T, use.damped.trend = T)
##
## Parameters
## Lambda: 0.000618
## Alpha: 0.2120948
## Beta: -0.004433207
## Damping Parameter: 0.99004
## Gamma-1 Values: -9.151155e-05
## Gamma-2 Values: 3.860713e-05
## AR coefficients: -0.089713 0.250756 -0.064269 0.166968 -0.286575
##
## Seed States:
## [,1]
## [1,] 12.532488156
## [2,] 0.005150968
## [3,] 0.037851547
## [4,] -0.077567522
## [5,] -0.007010412
## [6,] 0.030235710
## [7,] 0.038233544
## [8,] -0.016030452
## [9,] 0.036742554
## [10,] 0.006382289
## [11,] 0.027205848
## [12,] -0.019113818
## [13,] 0.000000000
## [14,] 0.000000000
## [15,] 0.000000000
## [16,] 0.000000000
## [17,] 0.000000000
## attr(,"lambda")
## [1] 0.0006183753
##
## Sigma: 0.03667788
## AIC: 3932.559
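To decode the printed specification: forecast prints a TBATS model as TBATS(omega, {p,q}, phi, {<m1,k1>,...}), so TBATS(0.001, {5,0}, 0.99, {<12,5>}) means a Box-Cox parameter of 0.001, AR(5) errors with no MA terms, a damping parameter of 0.99, and a seasonal period of 12 modelled with 5 Fourier harmonic pairs.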
forecast_tbats <- forecast(room_tbats, h= 24)
forecast_tbats
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jul 1993 390734.3 372931.7 409386.2 363838.1 419617.4
## Aug 1993 418806.4 399587.5 438949.1 389773.0 450001.0
## Sep 1993 429365.3 407817.0 452051.5 396851.0 464541.8
## Oct 1993 479030.5 454810.5 504539.3 442488.7 518587.9
## Nov 1993 461291.4 436018.4 488028.4 423204.7 502803.5
## Dec 1993 376617.2 355921.0 398515.9 345429.3 410619.0
## Jan 1994 445254.5 419716.9 472344.9 406796.1 487346.2
## Feb 1994 413328.9 389619.4 438480.3 377623.7 452407.9
## Mar 1994 472435.6 444374.4 502267.7 430200.4 518814.6
## Apr 1994 448014.7 421402.5 476306.3 407960.5 491998.7
## May 1994 425312.9 399028.9 453327.1 385778.6 468895.9
## Jun 1994 373967.9 350838.4 398621.3 339178.9 412322.8
## Jul 1994 389464.0 364612.2 416008.4 352104.7 430784.5
## Aug 1994 428321.6 400960.5 457548.6 387190.8 473818.7
## Sep 1994 430623.7 402312.6 460925.9 388086.5 477820.2
## Oct 1994 492145.5 459753.7 526818.0 443478.1 546150.1
## Nov 1994 463606.0 432343.9 497127.2 416656.8 515841.9
## Dec 1994 386265.4 360174.0 414245.7 347082.8 429868.4
## Jan 1995 447768.1 416869.3 480955.6 401384.7 499507.9
## Feb 1995 423103.9 393853.9 454524.8 379197.1 472091.2
## Mar 1995 474722.4 441264.0 510716.0 424517.0 530861.1
## Apr 1995 457870.1 425529.3 492667.1 409343.8 512144.9
## May 1995 427536.6 396806.3 460645.1 381442.7 479196.6
## Jun 1995 381751.2 354243.3 411393.6 340492.8 428005.5
room_train %>%
autoplot() +
autolayer(room_test, series = "Data Test") +
autolayer(room_tbats$fitted, series= "Model")+
autolayer(forecast_tbats$mean, series="Forecast")
acc_tbats <- forecast::accuracy(forecast_tbats,room_test)
acc_tbats
## ME RMSE MAE MPE MAPE MASE
## Training set -533.7765 12860.30 9486.427 -0.3259455 2.772652 0.614035
## Test set 22020.8676 33278.31 27189.069 4.5646563 5.845190 1.759887
## ACF1 Theil's U
## Training set 0.03657143 NA
## Test set 0.44742401 0.6851876
acc_tbats[[5]] # element 5 of the accuracy matrix (column-major order), i.e. the training-set MAE
## [1] 9486.427
Then we can view our models and forecasting results in intuitive plots and compare the models visually.
a <- autoplot(room_train) +
autolayer(room_test, series = "Data Test") +
autolayer(room_hw$fitted[, 1], series= "Model")+
autolayer(forecast_hw$mean, series="Forecast")+
labs(title= "ETS Holt-Winters",
y = "Room Nights") +
theme_minimal()
b <- autoplot(room_train) +
autolayer(room_test, series = "Data Test") +
autolayer(room_sarima$fitted, series = "Model") +
autolayer(forecast_sarima$mean, series = "Forecast") +
labs( title = "SARIMA",
y = "Room Nights") +
theme_minimal()
c <- autoplot(room_train) +
autolayer(room_test, series = "Data Test") +
autolayer(room_tbats$fitted, series= "Model")+
autolayer(forecast_tbats$mean, series="Forecast") +
labs(title = "TBATS",
y = "Room Nights") +
theme_minimal()
gridExtra::grid.arrange(a,b,c)
The graph above shows that the ETS Holt-Winters error is smaller than the others', because its forecast line stays closest to the test data line.
compare <- data.frame(Model = c("Holt-Winters","SARIMA", "TBATS"),
                      MAE = c(acc_hw[[3]], acc_sarima[[3]], acc_tbats[[6]]),   # test-set MAE
                      MAPE = c(acc_hw[[5]], acc_sarima[[5]], acc_tbats[[10]]), # test-set MAPE
                      RMSE = c(acc_hw[[2]], acc_sarima[[2]], acc_tbats[[4]]))  # test-set RMSE
compare
## Model MAE MAPE RMSE
## 1 Holt-Winters 19626.96 4.241977 24001.20
## 2 SARIMA 21396.77 4.582359 26169.59
## 3 TBATS 27189.07 5.845190 33278.31
Based on the table above, the lowest-error model is the Holt-Winters model, but we will try to tune it further.
The following assumption checks apply only to the SARIMA model.
No autocorrelation in residuals
Autocorrelation can be checked using the Ljung-Box test. If there are correlations between residuals, then there is information left in the residuals which should have been used in computing the forecasts. Hypothesis testing criteria:
-H0: no autocorrelation (p-value > 0.05)
-H1: autocorrelation (p-value < 0.05)
Box.test(room_sarima$residuals, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: room_sarima$residuals
## X-squared = 0.063182, df = 1, p-value = 0.8015
Based on the p-value above (0.8015 > 0.05), the room_sarima model does not have autocorrelated residuals.
Normality of residuals
We will also check whether the model's residuals are normally distributed using the Shapiro-Wilk test. If the residuals are not normally distributed, the parameter estimates may be biased and the forecasts less than optimal. It also indicates that we can still tweak our model to get better performance. Hypothesis testing criteria:
-H0: residuals are normally distributed (p-value > 0.05)
-H1: residuals are not normally distributed (p-value < 0.05)
shapiro.test(room_sarima$residuals)
##
## Shapiro-Wilk normality test
##
## data: room_sarima$residuals
## W = 0.88747, p-value = 9.561e-10
Based on the p-value above (9.561e-10 < 0.05), the room_sarima model's residuals are not normally distributed.
If the parameters alpha, beta, and gamma are not specified, HoltWinters() will try to determine the best values for these three parameters automatically. However, these values may not be the most appropriate for our data, so we can try to specify the parameter values manually:
room_tuning <- HoltWinters(room_train, beta=0.4, seasonal = "multiplicative")
forecast_tuning <- forecast(room_tuning, h = 24)
forecast::accuracy(forecast_tuning$mean, room_test)
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set 5232.791 17100.47 13082.97 1.017326 2.870605 0.0608822 0.3523958
forecast::accuracy(forecast_hw$mean, room_test)
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set 14274.81 24001.2 19626.96 2.9607 4.241977 0.4214471 0.4853429
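Rather than fixing beta = 0.4 by hand, we could also search over candidate values. This is a minimal sketch (the grid and the choice to tune only beta are illustrative assumptions, not part of the original analysis):

# Fit one Holt-Winters model per candidate beta and keep the lowest test-set MAPE
betas <- seq(0.1, 0.9, by = 0.1) # arbitrary candidate grid
mapes <- sapply(betas, function(b) {
  fit <- HoltWinters(room_train, beta = b, seasonal = "multiplicative")
  fc <- forecast(fit, h = 24)
  forecast::accuracy(fc$mean, room_test)[, "MAPE"]
})
betas[which.min(mapes)] # beta with the lowest MAPE on the test set

For the rest of this section we keep the manually tuned model (beta = 0.4) above.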
acc_tuning <- forecast::accuracy(forecast_tuning$mean, room_test)
forecast_tuning$mean
## Jan Feb Mar Apr May Jun Jul Aug
## 1993 393075.1 405805.5
## 1994 458431.7 421314.9 492921.5 461154.7 441186.2 385154.4 413847.0 427156.1
## 1995 482033.7 442913.3 518083.3 484595.2 463517.1 404567.3
## Sep Oct Nov Dec
## 1993 435799.6 483633.2 474972.9 381616.8
## 1994 458628.2 508857.4 499638.1 401348.7
## 1995
autoplot(forecast_tuning$mean)
The highest forecast room_nights value falls in March 1995.
compare <- data.frame(Model = c("Holt-Winters","SARIMA", "TBATS", "Holt-Winters Tuning"),
MAE = c(acc_hw[[3]], acc_sarima[[3]],acc_tbats[[6]], acc_tuning[[3]]),
MAPE = c(acc_hw[[5]], acc_sarima[[5]],acc_tbats[[10]], acc_tuning[[5]]),
RMSE = c(acc_hw[[2]], acc_sarima[[2]],acc_tbats[[4]], acc_tuning[[2]]))
compare
## Model MAE MAPE RMSE
## 1 Holt-Winters 19626.96 4.241977 24001.20
## 2 SARIMA 21396.77 4.582359 26169.59
## 3 TBATS 27189.07 5.845190 33278.31
## 4 Holt-Winters Tuning 13082.97 2.870605 17100.47
Based on the table above, the MAE, MAPE, and RMSE values decrease after tuning, but they are still fairly large. There are some methods that can be tried to improve this:
-Look for model misspecification: when a model fails to consider all the pertinent information in the data, it is said to be misspecified. The residuals may show patterns or trends as a result. This can be resolved by verifying the model's underlying assumptions, such as that the variance remains constant over time and that the residuals are normally distributed.
-Consider including more predictors: patterns or trends in the residuals can be a signal that more predictors should be added to the model. For instance, if the residuals exhibit a seasonal pattern, it may be essential to include seasonal variables in the model.
-Transform the data: if the residuals are not normally distributed, we can try to achieve normality by transforming the data. Taking the square root or logarithm of the data is a common transformation (source).
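As a quick illustration of the last point, here is a minimal sketch (an aside, not part of the original analysis): log-transform the series to stabilize its variance, fit an additive Holt-Winters model, then back-transform the forecasts with exp().

# Log-transform, fit, forecast, then back-transform the point forecasts
room_hw_log <- HoltWinters(log(room_train), seasonal = "additive")
forecast_log <- forecast(room_hw_log, h = 24)
forecast::accuracy(exp(forecast_log$mean), room_test) # compare against the untransformed models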
### References
- https://medium.com/analytics-vidhya/a-thorough-introduction-to-holt-winters-forecasting-c21810b8c0e6
- https://timeseriesreasoning.com/contents/holt-winters-exponential-smoothing/
- https://medium.com/analytics-vidhya/time-series-forecasting-using-tbats-model-ce8c429442a9
- https://bicorner.com/2015/05/02/what-is-time-seires-analysis-part-ii/
- https://medium.com/@ShankiiZ_/residual-analysis-in-time-series-612a450b08f5