Data Explanation
Objective
The data in this study are secondary data sourced from the Central Statistics Agency or Badan Pusat Statistik (BPS) in Muhammad Farhan Naufal’s final project entitled “FORECASTING NUMBER OF FOREIGN TOURISTS ARRIVAL TO INDONESIA BASED ON ENTRANCE USING SUPPORT VECTOR MACHINE (SVM) METHOD” from Institute of Technology Sepuluh Nopember. This data consists of data on the number of tourists who come to Indonesia and the date. This data gathered from January 2008 to December 2015.
Why?
The development of the number of foreign tourists who come to Indonesia has experienced a significant development since 2008 until now. BPS (2009) noted that the number of foreign tourists who came to Indonesia in December 2008 reached 610 thousand people, an increase of 17.69% compared to the previous year. The number of tourist arrivals to Indonesia is calculated based on the main entrance. Batam, as one of the main entry points for foreign tourists to Indonesia, is one of the outermost regions of Indonesia which is directly adjacent to neighboring countries such as Singapore and Malaysia (Border and Land Agency, 2014). In addition, as a free trade area and free port, Batam is a tourist destination because it has a variety of attractive tourist destinations (Badan Pengusahaan Batam or Batam Concession Agency, 2021).
As the main entrance, Batam is the largest contributor to the number of foreign tourist arrivals in the Riau Islands Province when compared to other entrances such as Karimun, Bintan, and Tanjung Pinang (BPS, 2016). Therefore, data modeling was carried out to predict the number of foreign tourists who came to Indonesia in the period 2008 – 2015 using the seasonal ARIMA method. Then, an evaluation of the model was carried out using the Mean Absolute Percentage Error (MAPE) on the prediction results with the original data in 2016.
Data Preparation
library(dplyr)
library(lubridate)
library(forecast)
library(tseries)
library(ggplot2)
Reading Data
batam <- read.csv("batam_new.csv")
Changing to Date Type
batam$waktu<- ym(batam$waktu)
Dropping Rows With NaN Value
batam <- batam %>%
drop_na()
Checking Completeness of The Date Data
all(seq.Date(from = as.Date("2008-01-01"), to = as.Date("2015-12-01"), by="month")==batam$waktu)
## [1] TRUE
Result: The data has been completed. There is no missed date in the dataset.
Data Table
rmarkdown::paged_table(batam)
Exploratory Data Analysis
Changing Data Type to Time Series Data Type
batam_ts <- ts(data = batam$Wisatawan, start = c(2008,1), frequency = 12)
Result: We have to input start = c(2008,1) because the data is started from January 2008 with frequency = 12 that means the data is annual.
Splitting Data into Data Train and Data Test
train_batam <- head(batam_ts, n=-12)
test_batam <- tail(batam_ts, n=12)
Plotting Time Series Data Train
train_batam %>% autoplot()
Result: If we focused on observing the plot, we can see that the plot will be at the top of the plot on December in every year and at the lowest point on January. Due to repeating pattern, we can diagnoze that there is seasonality on the data.
Checking the Trend and Seasonality
batam_decomp <- decompose(train_batam)
batam_decomp %>% autoplot()
Result: Based on the plot above, we can see that there is an increasing trend. Repeated pattern on the plot indicates that there is seasonality in the data.
Stationary Test
Due to trend existed on the data, we need to do differencing method to make data stationary distributed.
diff_ns <- diff(train_batam)
diff_ns %>% autoplot()
Result: After differencing the data, we can observe that the increasing trend disappear. Then, we have to take a look on ACF and PACF plot.
acf(diff_ns)
Result: Based on the acf plot, there are repeating pattern on the lag that goes out of the boundary line. So we have to do differencing seasonal with lag 12.
diff_s <- diff(diff_ns, lag = 12)
diff_s %>% tsdisplay(lag.max = 36)
Result: We get that non-seasonal parameter p = 0,1,2 ; d = 1; q = 0,1 and seasonal parameter P = 0,1; D = 1; Q = 0,1. The best model for this data would probably one of the combination of all parameters.
Modelling: ARIMA Seasonal
Auto-Modelling
arima_model <- auto.arima(y = train_batam, seasonal = T)
summary(arima_model)
## Series: train_batam
## ARIMA(2,1,0)(0,1,1)[12]
##
## Coefficients:
## ar1 ar2 sma1
## -0.8188 -0.4719 -0.6136
## s.e. 0.1055 0.1071 0.1395
##
## sigma^2 = 69869669: log likelihood = -743.68
## AIC=1495.36 AICc=1495.96 BIC=1504.41
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 879.3273 7520.715 5512.268 0.45692 5.718135 0.5400405 -0.02401754
Result: From auto.arima() function we get that ARIMA with p=2, d=1, q=1, P=0, D=1, Q=1 or ARIMA(2,1,1)(0,1,1)^12 is the best model for the time series data.
ARIMA Assumptions Test
Ljung-Box Test
Box.test(arima_model$residuals, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: arima_model$residuals
## X-squared = 0.050206, df = 1, p-value = 0.8227
Result: The p-value 0.9835 or greater than alpha 0.05 indicates that there is no autocorrelation in residual data. This means that the data has fulfilled autocorrelation assumption.
Normality Test
shapiro.test(arima_model$residuals)
##
## Shapiro-Wilk normality test
##
## data: arima_model$residuals
## W = 0.96438, p-value = 0.02007
Result: The p-value is greater than alpha 0.05 indicates that the residual data is normally distributed. So, the residual has fulfilled normality assumption.
Forecasting Using ARIMA Seasonal Model
for_arima <- forecast(arima_model, h=12)
Visualizing Forecasting Result and Data Test
train_batam %>%
autoplot()+
autolayer(test_batam, series = "Data Test")+
autolayer(for_arima$mean, series = "Forecast")
Evaluating ARIMA Model
accuracy(for_arima$mean, test_batam)
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set -7418.614 12755.93 11137.22 -6.184633 8.765224 -0.2394782 0.6217098
Result: The value of MAPE 11.95% shows that the model is good enough to forecasting the number of tourist who come to Indonesia through Batam.
Modelling: Holt Winters
holt_model <- HoltWinters(train_batam)
holt_model
## Holt-Winters exponential smoothing with trend and additive seasonal component.
##
## Call:
## HoltWinters(x = train_batam)
##
## Smoothing parameters:
## alpha: 0.2631331
## beta : 0.07898211
## gamma: 0.3856263
##
## Coefficients:
## [,1]
## a 126336.3359
## b 1055.1906
## s1 -3467.6148
## s2 -10731.8628
## s3 6033.6144
## s4 -6185.1665
## s5 -157.5064
## s6 16694.0424
## s7 -9702.7124
## s8 -2335.0155
## s9 -5171.0284
## s10 -656.0962
## s11 3888.2640
## s12 37811.6980
Forecasting Using Holt Winters Model
for_holt<-forecast(holt_model, h=12)
Evaluating Holt Winters Model
accuracy(for_holt$mean, test_batam)
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set -6545.293 11855.97 10367.56 -5.507612 8.181812 -0.089083 0.5875311
Best Model and Conclusion
By comparing the evaluation result of two models, we can finally choose the best model by looking at MAPE and RMSE value.
accuracy(for_arima$mean, test_batam)
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set -7418.614 12755.93 11137.22 -6.184633 8.765224 -0.2394782 0.6217098
accuracy(for_holt$mean, test_batam)
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set -6545.293 11855.97 10367.56 -5.507612 8.181812 -0.089083 0.5875311
Result: From the result above, we can see that ARIMA seasonal model is better than Holt Winters model in forecasting the data of number of tourist arrival to Indonesia through Batam gate with the value of MAPE and RMSE are 11.95% and 16486.05 respectively.
References
- Badan Pengusahaan Batam. 2021. Destinasi Wisata. URL: https://bpbatam.go.id/tentang-batam/destinasi-wisata/. Retrieved June 12nd 2022
- Badan Pusat Statistik. 2016. Jumlah Wisatawan Mancanegara yang Datang. URL: https://kepri.bps.go.id/indicator/16/29/7/jumlah-wisatawan-mancanegara-yang-datang.html. Retrieved June 12nd 2022
- Badan Pusat Statistik. 2009. Jumlah Wisatawan Mancanegara (Wisman) Desember 2008 Naik 16,46 Persen. URL: https://www.bps.go.id/pressrelease/2009/02/02/728/jumlah-wisatawan-mancanegara--wisman---desember-2008-naik-16-46-persen.html. Retrieved June 12nd 2022
- Naufal, M. F. 2017. FORECASTING NUMBER OF FOREIGN TOURISTS ARRIVAL TO INDONESIA BASED ON ENTRANCE USING SUPPORT VECTOR MACHINE (SVM) METHOD. Final Project. Faculty of Technological Information, Institute of Technology Sepuluh Nopember. Indonesia.