Forecasting Tourist Arrival Data Using ARIMA Seasonal and Holt Winters Model

Gaos Tipki Alpandi | @gaostipkialpandi

2022-06-12

Data Explanation

Objective

The data in this study are secondary data sourced from the Central Statistics Agency or Badan Pusat Statistik (BPS) in Muhammad Farhan Naufal’s final project entitled “FORECASTING NUMBER OF FOREIGN TOURISTS ARRIVAL TO INDONESIA BASED ON ENTRANCE USING SUPPORT VECTOR MACHINE (SVM) METHOD” from Institute of Technology Sepuluh Nopember. This data consists of data on the number of tourists who come to Indonesia and the date. This data gathered from January 2008 to December 2015.

Why?

The development of the number of foreign tourists who come to Indonesia has experienced a significant development since 2008 until now. BPS (2009) noted that the number of foreign tourists who came to Indonesia in December 2008 reached 610 thousand people, an increase of 17.69% compared to the previous year. The number of tourist arrivals to Indonesia is calculated based on the main entrance. Batam, as one of the main entry points for foreign tourists to Indonesia, is one of the outermost regions of Indonesia which is directly adjacent to neighboring countries such as Singapore and Malaysia (Border and Land Agency, 2014). In addition, as a free trade area and free port, Batam is a tourist destination because it has a variety of attractive tourist destinations (Badan Pengusahaan Batam or Batam Concession Agency, 2021).

As the main entrance, Batam is the largest contributor to the number of foreign tourist arrivals in the Riau Islands Province when compared to other entrances such as Karimun, Bintan, and Tanjung Pinang (BPS, 2016). Therefore, data modeling was carried out to predict the number of foreign tourists who came to Indonesia in the period 2008 – 2015 using the seasonal ARIMA method. Then, an evaluation of the model was carried out using the Mean Absolute Percentage Error (MAPE) on the prediction results with the original data in 2016.

Data Preparation

library(dplyr) 
library(lubridate) 
library(forecast) 
library(tseries) 
library(ggplot2)

Reading Data

batam <- read.csv("batam_new.csv")

Changing to Date Type

batam$waktu<- ym(batam$waktu)

Dropping Rows With NaN Value

batam <- batam %>% 
  drop_na()

Checking Completeness of The Date Data

all(seq.Date(from = as.Date("2008-01-01"), to = as.Date("2015-12-01"), by="month")==batam$waktu)
## [1] TRUE

Result: The data has been completed. There is no missed date in the dataset.

Data Table

rmarkdown::paged_table(batam)

Exploratory Data Analysis

Changing Data Type to Time Series Data Type

batam_ts <- ts(data = batam$Wisatawan, start = c(2008,1), frequency = 12)

Result: We have to input start = c(2008,1) because the data is started from January 2008 with frequency = 12 that means the data is annual.

Splitting Data into Data Train and Data Test

train_batam <- head(batam_ts, n=-12)
test_batam  <- tail(batam_ts, n=12)

Plotting Time Series Data Train

train_batam %>% autoplot()

Result: If we focused on observing the plot, we can see that the plot will be at the top of the plot on December in every year and at the lowest point on January. Due to repeating pattern, we can diagnoze that there is seasonality on the data.

Checking the Trend and Seasonality

batam_decomp <- decompose(train_batam)
batam_decomp %>% autoplot()

Result: Based on the plot above, we can see that there is an increasing trend. Repeated pattern on the plot indicates that there is seasonality in the data.

Stationary Test

Due to trend existed on the data, we need to do differencing method to make data stationary distributed.

diff_ns <- diff(train_batam)
diff_ns %>% autoplot()

Result: After differencing the data, we can observe that the increasing trend disappear. Then, we have to take a look on ACF and PACF plot.

acf(diff_ns)

Result: Based on the acf plot, there are repeating pattern on the lag that goes out of the boundary line. So we have to do differencing seasonal with lag 12.

diff_s <- diff(diff_ns, lag = 12)
diff_s %>% tsdisplay(lag.max = 36)

Result: We get that non-seasonal parameter p = 0,1,2 ; d = 1; q = 0,1 and seasonal parameter P = 0,1; D = 1; Q = 0,1. The best model for this data would probably one of the combination of all parameters.

Modelling: ARIMA Seasonal

Auto-Modelling

arima_model <- auto.arima(y = train_batam, seasonal = T)
summary(arima_model)
## Series: train_batam 
## ARIMA(2,1,0)(0,1,1)[12] 
## 
## Coefficients:
##           ar1      ar2     sma1
##       -0.8188  -0.4719  -0.6136
## s.e.   0.1055   0.1071   0.1395
## 
## sigma^2 = 69869669:  log likelihood = -743.68
## AIC=1495.36   AICc=1495.96   BIC=1504.41
## 
## Training set error measures:
##                    ME     RMSE      MAE     MPE     MAPE      MASE        ACF1
## Training set 879.3273 7520.715 5512.268 0.45692 5.718135 0.5400405 -0.02401754

Result: From auto.arima() function we get that ARIMA with p=2, d=1, q=1, P=0, D=1, Q=1 or ARIMA(2,1,1)(0,1,1)^12 is the best model for the time series data.

ARIMA Assumptions Test

Ljung-Box Test

Box.test(arima_model$residuals, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  arima_model$residuals
## X-squared = 0.050206, df = 1, p-value = 0.8227

Result: The p-value 0.9835 or greater than alpha 0.05 indicates that there is no autocorrelation in residual data. This means that the data has fulfilled autocorrelation assumption.

Normality Test

shapiro.test(arima_model$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  arima_model$residuals
## W = 0.96438, p-value = 0.02007

Result: The p-value is greater than alpha 0.05 indicates that the residual data is normally distributed. So, the residual has fulfilled normality assumption.

Forecasting Using ARIMA Seasonal Model

for_arima <- forecast(arima_model, h=12)

Visualizing Forecasting Result and Data Test

train_batam %>% 
  autoplot()+
  autolayer(test_batam, series = "Data Test")+
  autolayer(for_arima$mean, series = "Forecast")

Evaluating ARIMA Model

accuracy(for_arima$mean, test_batam)
##                 ME     RMSE      MAE       MPE     MAPE       ACF1 Theil's U
## Test set -7418.614 12755.93 11137.22 -6.184633 8.765224 -0.2394782 0.6217098

Result: The value of MAPE 11.95% shows that the model is good enough to forecasting the number of tourist who come to Indonesia through Batam.

Modelling: Holt Winters

holt_model <- HoltWinters(train_batam)
holt_model
## Holt-Winters exponential smoothing with trend and additive seasonal component.
## 
## Call:
## HoltWinters(x = train_batam)
## 
## Smoothing parameters:
##  alpha: 0.2631331
##  beta : 0.07898211
##  gamma: 0.3856263
## 
## Coefficients:
##            [,1]
## a   126336.3359
## b     1055.1906
## s1   -3467.6148
## s2  -10731.8628
## s3    6033.6144
## s4   -6185.1665
## s5    -157.5064
## s6   16694.0424
## s7   -9702.7124
## s8   -2335.0155
## s9   -5171.0284
## s10   -656.0962
## s11   3888.2640
## s12  37811.6980

Forecasting Using Holt Winters Model

for_holt<-forecast(holt_model, h=12)

Evaluating Holt Winters Model

accuracy(for_holt$mean, test_batam)
##                 ME     RMSE      MAE       MPE     MAPE      ACF1 Theil's U
## Test set -6545.293 11855.97 10367.56 -5.507612 8.181812 -0.089083 0.5875311

Best Model and Conclusion

By comparing the evaluation result of two models, we can finally choose the best model by looking at MAPE and RMSE value.

accuracy(for_arima$mean, test_batam)
##                 ME     RMSE      MAE       MPE     MAPE       ACF1 Theil's U
## Test set -7418.614 12755.93 11137.22 -6.184633 8.765224 -0.2394782 0.6217098
accuracy(for_holt$mean, test_batam)
##                 ME     RMSE      MAE       MPE     MAPE      ACF1 Theil's U
## Test set -6545.293 11855.97 10367.56 -5.507612 8.181812 -0.089083 0.5875311

Result: From the result above, we can see that ARIMA seasonal model is better than Holt Winters model in forecasting the data of number of tourist arrival to Indonesia through Batam gate with the value of MAPE and RMSE are 11.95% and 16486.05 respectively.

References

  1. Badan Pengusahaan Batam. 2021. Destinasi Wisata. URL: https://bpbatam.go.id/tentang-batam/destinasi-wisata/. Retrieved June 12nd 2022
  2. Badan Pusat Statistik. 2016. Jumlah Wisatawan Mancanegara yang Datang. URL: https://kepri.bps.go.id/indicator/16/29/7/jumlah-wisatawan-mancanegara-yang-datang.html. Retrieved June 12nd 2022
  3. Badan Pusat Statistik. 2009. Jumlah Wisatawan Mancanegara (Wisman) Desember 2008 Naik 16,46 Persen. URL: https://www.bps.go.id/pressrelease/2009/02/02/728/jumlah-wisatawan-mancanegara--wisman---desember-2008-naik-16-46-persen.html. Retrieved June 12nd 2022
  4. Naufal, M. F. 2017. FORECASTING NUMBER OF FOREIGN TOURISTS ARRIVAL TO INDONESIA BASED ON ENTRANCE USING SUPPORT VECTOR MACHINE (SVM) METHOD. Final Project. Faculty of Technological Information, Institute of Technology Sepuluh Nopember. Indonesia.