The tourism industry plays a significant role in the economic landscape of countries worldwide. Accurate forecasting of tourist arrivals is crucial for policymakers, businesses, and other stakeholders to make informed decisions and plan effectively for the future. This write-up focuses on forecasting tourist arrivals in Australia, examining historical data and applying several time series forecasting methods.
Australia, known for its diverse landscapes, unique wildlife, and vibrant cities, attracts millions of tourists each year. Understanding and predicting the patterns of tourist arrivals can aid in optimizing resource allocation, marketing strategies, and infrastructure development.
pacman::p_load(e1071, tidyverse, caret, rmarkdown, corrplot, readxl,
ModelMetrics, fpp2, expsmooth, CombMSC, openxlsx)
theme_set(theme_classic())
The dataset used for this analysis contains monthly short-term overseas visitors to Australia (in thousands) from May 1985 to April 2005. It is the visitors series from the expsmooth package (attached by fpp2), giving 20 years of monthly history of tourist arrivals.
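The span and sampling frequency described above can be checked directly on the ts object. This is a minimal sketch assuming fpp2 is installed; it loads visitors explicitly from expsmooth.

```r
library(fpp2)  # attaches forecast, ggplot2, fma and expsmooth
data(visitors, package = "expsmooth")

# visitors is a monthly ts object, so it carries its own calendar
start(visitors)      # year and month of the first observation
end(visitors)        # year and month of the last observation
frequency(visitors)  # observations per year
length(visitors)     # number of months in the sample
```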
data(visitors)
view(visitors)
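# Export the series to Excel with the openxlsx package
library(openxlsx)
# A ts stores its dates only as attributes, so build a data frame with an explicit month column
visitors_df <- data.frame(
month = seq(as.Date("1985-05-01"), by = "month", length.out = length(visitors)),
visitors = as.numeric(visitors)
)
write.xlsx(visitors_df, file = "visitors_data.xlsx", rowNames = FALSE)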
The time series plot reveals essential insights into the overall trend, seasonality, and variance in the data. Over the years 1985 to 2005, there is a noticeable increasing trend, accompanied by apparent seasonality. The plot displays peaks, suggesting specific periods of higher tourist activity.
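One rough numerical check of the trend and the changing variance is to summarise each consecutive 12-month block of the series. This is a sketch using base R's aggregate() on the ts object; because the sample starts in May 1985, each block runs from May to the following April rather than over calendar years.

```r
library(fpp2)
data(visitors, package = "expsmooth")

# Collapse the 240 monthly values into 20 consecutive 12-month blocks (May to April)
block_mean <- aggregate(visitors, nfrequency = 1, FUN = mean)
block_sd   <- aggregate(visitors, nfrequency = 1, FUN = sd)

# A rising block mean reflects the trend; a rising block standard deviation
# indicates that the seasonal swings grow with the level of the series
round(cbind(mean = block_mean, sd = block_sd), 1)
```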
Time Plots:
### Monthly Australian short-term overseas visitors. May 1985-April 2005
### Time Plot
autoplot(visitors) +
ggtitle("Australian short-term overseas visitors") +
xlab("Year") + ylab("# of Visitors (Thousands)")
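ANALYSIS: The time series plot displays a clear upward trend from 1985 to 2005 together with strong seasonality, and the seasonal swings grow larger as the number of visitors rises.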
Seasonal plots and polar seasonal plots further illustrate the seasonality in the data. Peaks in tourist arrivals during certain months, such as July and December, indicate recurring patterns. Understanding these seasonal variations is crucial for accurate forecasting.
Seasonal Plots:
ggseasonplot(visitors, year.labels = TRUE) +
ylab("# of Visitors (Thousands)") +
ggtitle("Monthly Australian short-term overseas visitors")
### Polar seasonal plot
ggseasonplot(visitors, polar = TRUE) +
ylab("# of Visitors (Thousands)") +
ggtitle("Monthly Australian short-term overseas visitors")
ANALYSIS:
Both the seasonal plot and the polar seasonal plot show the same within-year pattern repeating every year, so the data are clearly seasonal.
Subseries plots provide a detailed view of the mean number of visitors by month over the years. December consistently exhibits the highest mean number of visitors, followed by February, while May has the lowest mean. Weather may influence these patterns: December is summer in Australia, while May is late autumn, just before winter.
Subseries Plot (y vs. year, by month): the blue horizontal line in each panel marks the mean for that month.
ggsubseriesplot(visitors) +
ylab("# of Visitors (Thousands)") +
ggtitle("Monthly Australian short-term overseas visitors")
ANALYSIS:
Over 1985-2005 the mean number of visitors is highest in December, followed by February, and lowest in May.
This may be due to the Australian seasons: December is summer, while May is late autumn, heading into winter.
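The monthly means drawn as blue lines in the subseries plot can also be computed directly, which makes the December, February and May comparison easy to verify. This is a sketch using base R's cycle() and tapply().

```r
library(fpp2)
data(visitors, package = "expsmooth")

# cycle() gives each observation's month (1 = Jan, ..., 12 = Dec), so tapply()
# averages every January, every February, and so on across 1985-2005
monthly_means <- setNames(
  as.vector(tapply(as.numeric(visitors), cycle(visitors), mean)),
  month.abb
)

# Months ordered from highest to lowest mean number of visitors (thousands)
round(sort(monthly_means, decreasing = TRUE), 1)
```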
Autocorrelation plots reveal strong seasonality, with positive correlations at all lags. The slow decrease in the autocorrelation function (ACF) as the lags increase is attributed to the trend, and the “scalloped” shape is indicative of seasonality.
Autocorrelation Plots:
# Lag Plot
gglagplot(visitors)
# ACF or Correlogram
ggAcf(visitors)
autoplot(visitors) + xlab("Year") + ylab("# of Visitors(Thousands)")
ggAcf(visitors, lag.max = 48)
tail(visitors)
##        Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
## 2004                                                             479.9 593.1
## 2005 462.4 501.6 504.7 409.5
ANALYSIS:
Lag Plot:
The lag plots show a positive relationship at every lag, consistent with the strong trend and seasonality in the data.
Correlogram:
The dashed blue lines mark the bounds within which about 95% of the autocorrelations would fall if the series were white noise. All of the autocorrelation coefficients lie beyond these limits, confirming that the data are not white noise.
The slow decrease in the ACF as the lags increase is due to the trend, while the “scalloped” shape is due to the seasonality.
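The correlogram check can be reproduced numerically by comparing each sample autocorrelation with the dashed-line bound that ggAcf() draws, ±qnorm(0.975)/sqrt(T) for T observations. This is a sketch using stats::acf() without plotting.

```r
library(fpp2)
data(visitors, package = "expsmooth")

# Sample autocorrelations up to lag 48 (four years), without plotting
r <- acf(visitors, lag.max = 48, plot = FALSE)
rk <- r$acf[-1]  # drop lag 0, which is always 1

# Bound used for the dashed lines: qnorm(0.975) / sqrt(T)
bound <- qnorm(0.975) / sqrt(length(visitors))

# Proportion of lags whose autocorrelation lies outside the bounds;
# for white noise we would expect roughly 5%
mean(abs(rk) > bound)
```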
Test Train Split
# A time series must be split chronologically. Random sampling (dplyr::sample_frac) or
# stratified sampling (caret::createDataPartition) would scatter test months through the
# training period, and visitors is a one-dimensional ts object, so df[train_index, ] fails.
# Hold out the last 24 months (May 2003 to April 2005) as the test set, matching h = 24 below.
visitors_split <- list(
train = window(visitors, end = c(2003, 4)),
test = window(visitors, start = c(2003, 5))
)
length(visitors_split$train) # 216 months
length(visitors_split$test) # 24 months
Holt-Winters Additive and Multiplicative Methods
# Fit both seasonal variants on the training data and forecast the 24 test months
fit1 <- hw(visitors_split$train, seasonal = "additive", h = 24)
fit2 <- hw(visitors_split$train, seasonal = "multiplicative", h = 24)
# Print the smoothing parameters and information criteria of each fitted model
fit1[["model"]]
fit2[["model"]]
autoplot(visitors) +
autolayer(fit1, series = "HW additive forecasts", PI = FALSE) +
autolayer(fit2, series = "HW multiplicative forecasts", PI = FALSE) +
xlab("Year") +
ylab("# of Visitors (Thousands)") +
ggtitle("Monthly Australian overseas visitors") +
guides(colour = guide_legend(title = "Forecast"))
Holt-Winters’ additive method
Smoothing parameters: alpha = 0.4095, beta = 1e-04, gamma = 0.3117
AIC = 2422.918, AICc = 2426.008, BIC = 2480.297
Holt-Winters’ multiplicative method
Smoothing parameters: alpha = 0.4379, beta = 0.0164, gamma = 1e-04
AIC = 2326.608, AICc = 2329.699, BIC = 2383.988
The AIC, AICc and BIC are all lower for the multiplicative model (for example, AIC 2326.6 versus 2422.9), so on information criteria the multiplicative model is better.
This agrees with the time plot: the seasonal swings do not stay constant but grow as the level of the series rises, which is the pattern multiplicative seasonality is designed to capture.
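For reference, the multiplicative Holt-Winters method in the component form presented in Hyndman and Athanasopoulos's Forecasting: Principles and Practice is sketched below, with seasonal period $m = 12$, level $\ell_t$, trend $b_t$, seasonal index $s_t$ and $k$ the integer part of $(h-1)/m$. The seasonal index multiplies the level-plus-trend term, so the seasonal swings scale with the level of the series. The additive method replaces the division by $s_{t-m}$ and $(\ell_{t-1}+b_{t-1})$ with subtraction, and the final multiplication by $s$ with addition. Note that the beta reported by hw() follows the state-space parameterisation, where it corresponds to $\alpha\beta^*$ in this notation.

```latex
\begin{aligned}
\hat{y}_{t+h|t} &= (\ell_t + h\,b_t)\, s_{t+h-m(k+1)} \\
\ell_t &= \alpha \frac{y_t}{s_{t-m}} + (1-\alpha)(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*)\, b_{t-1} \\
s_t &= \gamma \frac{y_t}{\ell_{t-1} + b_{t-1}} + (1-\gamma)\, s_{t-m}
\end{aligned}
```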
ETS
# Let ets() choose the error, trend and seasonal components automatically (by AICc)
fit <- ets(visitors_split$train)
summary(fit)
autoplot(fit)
# Compare innovation residuals with one-step forecast errors (response residuals)
cbind('Residuals' = residuals(fit),
'Forecast errors' = residuals(fit, type = 'response')) %>%
autoplot(facet = TRUE) +
xlab("Year") + ylab("")
### Forecasts with ETS Models
fit %>% forecast(h = 24) %>%
autoplot() + ylab("Monthly Australian overseas visitors")
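To see which model ets() actually selected, the fitted object can be queried directly. This is a sketch that rebuilds a training set with a hypothetical 24-month holdout (May 1985 to April 2003) so the block runs on its own.

```r
library(fpp2)
data(visitors, package = "expsmooth")

# Hypothetical split: hold out the last 24 months
train <- window(visitors, end = c(2003, 4))

# With the default model = "ZZZ", ets() fits the candidate combinations of
# error, trend and seasonal components and keeps the one with the lowest AICc
fit_auto <- ets(train)
fit_auto$method  # selected model, written as ETS(error, trend, seasonal)
fit_auto$aicc    # AICc of the selected model
```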
Additive ETS with Box-Cox Transformed Series
# Estimate lambda on the training data only, so the test period does not inform the transform
lambda <- BoxCox.lambda(visitors_split$train)
# Additive-only ETS on the Box-Cox scale (note: this overwrites the Holt-Winters fit2 above)
fit2 <- ets(visitors_split$train, additive.only = TRUE, lambda = lambda)
summary(fit2)
autoplot(fit2)
cbind('Residuals' = residuals(fit2),
'Forecast errors' = residuals(fit2, type = 'response')) %>%
autoplot(facet = TRUE) +
xlab("Year") + ylab("")
### Forecasts with ETS Models
fit2 %>% forecast(h = 24) %>%
autoplot() + ylab("Monthly Australian overseas visitors")
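The Box-Cox step can be illustrated on its own: the series is transformed as w_t = (y_t^lambda - 1)/lambda (or log y_t when lambda = 0), modelled on that scale, and forecasts are mapped back with the inverse transform. This is a sketch using forecast's BoxCox() and InvBoxCox() with a hypothetical 24-month holdout.

```r
library(fpp2)
data(visitors, package = "expsmooth")
train <- window(visitors, end = c(2003, 4))  # hypothetical split: last 24 months held out

# BoxCox.lambda() uses Guerrero's method by default to choose lambda
lambda <- BoxCox.lambda(train)

# Transform: w_t = (y_t^lambda - 1) / lambda, or log(y_t) when lambda = 0
w <- BoxCox(train, lambda)

# Back-transform; ets(..., lambda = lambda) applies this step to its forecasts
y_back <- InvBoxCox(w, lambda)
all.equal(as.numeric(y_back), as.numeric(train))  # TRUE: the transform is invertible
```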
Seasonal Naive
# Seasonal naive: each forecast equals the value from the same month of the last observed year
seasonal_naive <- snaive(visitors_split$train, h = 24)
summary(seasonal_naive)
autoplot(visitors) +
autolayer(seasonal_naive, series = "Seasonal naive", PI = FALSE) +
ggtitle("Forecasts for Monthly Australian overseas visitors") +
xlab("Year") + ylab("# of Visitors (Thousands)") +
guides(color = guide_legend(title = "Forecast"))
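The seasonal naive rule is simple enough to reproduce by hand: the forecast for T + h is the observation from the same month in the last observed year, y[T + h - 12(k + 1)] with k the integer part of (h - 1)/12. This sketch, again assuming a hypothetical 24-month holdout, checks that repeating the last 12 training values matches snaive().

```r
library(fpp2)
data(visitors, package = "expsmooth")
train <- window(visitors, end = c(2003, 4))  # hypothetical split: last 24 months held out

# Repeat the final observed year to cover the 24-month horizon
last_year <- tail(as.numeric(train), 12)
manual_fc <- rep(last_year, length.out = 24)

sn <- snaive(train, h = 24)
all.equal(manual_fc, as.numeric(sn$mean))  # TRUE: identical point forecasts
```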
Which method is best: ETS, additive ETS with Box-Cox transformation, or seasonal naive?
### Forecast accuracy (using Test data)
accuracy(forecast(fit, h = 24), visitors_split$test)
accuracy(forecast(fit2, h = 24), visitors_split$test)
# seasonal_naive is already a forecast object, so it can be passed to accuracy() directly
accuracy(seasonal_naive, visitors_split$test)
Analysis:
The test-set RMSE is lowest for the seasonal naive method, so on forecast accuracy the seasonal naive method is the best of the three models.
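The test-set RMSE used in this comparison is the square root of the mean squared forecast error over the held-out months. This sketch computes it by hand for the seasonal naive method and compares it with the "Test set" row of accuracy(); the 24-month holdout is a hypothetical choice made so the block is self-contained.

```r
library(fpp2)
data(visitors, package = "expsmooth")
train <- window(visitors, end = c(2003, 4))    # hypothetical split
test  <- window(visitors, start = c(2003, 5))  # last 24 months

sn <- snaive(train, h = 24)

# RMSE = sqrt(mean((actual - forecast)^2)) over the test months
errors <- as.numeric(test) - as.numeric(sn$mean)
rmse_manual <- sqrt(mean(errors^2))

# accuracy() reports the same figure in its "Test set" row
acc <- accuracy(sn, test)
c(manual = rmse_manual, accuracy = acc["Test set", "RMSE"])
```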
Residual Test
# Residual diagnostics: each call plots the residuals, their ACF and a histogram, and prints a Ljung-Box test
checkresiduals(fit)
checkresiduals(fit2)
checkresiduals(seasonal_naive)
Analysis:
ETS and additive ETS with Box-Cox transformation: the residuals behave like white noise, since the ACF shows no significant autocorrelation, and their histograms look approximately normal.
Seasonal Naive: the residual histogram does not look like a normal distribution, and fewer than 95% of the ACF spikes fall within the blue bounds (the 95% limits), so the residuals are not white noise and the model fails the residual test.
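The white-noise judgement can be backed by the Ljung-Box test that checkresiduals() reports. This sketch runs it directly on the seasonal naive residuals with 24 lags (two seasonal periods), assuming a hypothetical 24-month holdout; a small p-value indicates remaining autocorrelation, i.e. residuals that are not white noise.

```r
library(fpp2)
data(visitors, package = "expsmooth")
train <- window(visitors, end = c(2003, 4))  # hypothetical split: last 24 months held out
sn <- snaive(train, h = 24)

# The first 12 residuals are NA (there is no value from a year earlier), so drop them
res <- na.omit(residuals(sn))

# Ljung-Box portmanteau test over 24 lags; snaive estimates no parameters, so fitdf = 0
lb <- Box.test(res, lag = 24, type = "Ljung-Box", fitdf = 0)
lb$p.value
```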
END