library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tsibble 1.1.6 ✔ feasts 0.4.2
## ✔ tsibbledata 0.4.1 ✔ fable 0.4.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
tourism
## # A tsibble: 24,320 x 5 [1Q]
## # Key: Region, State, Purpose [304]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
## 7 1999 Q3 Adelaide South Australia Business 169.
## 8 1999 Q4 Adelaide South Australia Business 134.
## 9 2000 Q1 Adelaide South Australia Business 154.
## 10 2000 Q2 Adelaide South Australia Business 169.
## # ℹ 24,310 more rows
tourism_clean <- tourism %>%
filter(Region == "Sydney") %>%
filter(Purpose == "Holiday") %>%
select(Quarter, Trips) %>%
drop_na() %>%
arrange(Quarter) %>%
mutate(Quarter = yearquarter(Quarter))
glimpse(tourism_clean)
## Rows: 80
## Columns: 2
## $ Quarter <qtr> 1998 Q1, 1998 Q2, 1998 Q3, 1998 Q4, 1999 Q1, 1999 Q2, 1999 Q3,…
## $ Trips <dbl> 828.3171, 531.4342, 502.8710, 579.7480, 465.4975, 534.0419, 50…
distinct(tourism_clean, Quarter)
## # A tibble: 80 × 1
## Quarter
## <qtr>
## 1 1998 Q1
## 2 1998 Q2
## 3 1998 Q3
## 4 1998 Q4
## 5 1999 Q1
## 6 1999 Q2
## 7 1999 Q3
## 8 1999 Q4
## 9 2000 Q1
## 10 2000 Q2
## # ℹ 70 more rows
tourism_tsibble <- tourism_clean %>%
as_tsibble(index = Quarter)
tourism_tsibble
## # A tsibble: 80 x 2 [1Q]
## Quarter Trips
## <qtr> <dbl>
## 1 1998 Q1 828.
## 2 1998 Q2 531.
## 3 1998 Q3 503.
## 4 1998 Q4 580.
## 5 1999 Q1 465.
## 6 1999 Q2 534.
## 7 1999 Q3 508.
## 8 1999 Q4 560.
## 9 2000 Q1 633.
## 10 2000 Q2 483.
## # ℹ 70 more rows
options(repos = c(CRAN = "https://cran.rstudio.com/"))
install.packages("fable")
##
## The downloaded binary packages are in
## /var/folders/rl/lk86r4l96ylcb1pnw9r1h6gr0000gn/T//RtmpE6tsdV/downloaded_packages
library(fable)
install.packages("fabletools")
##
## The downloaded binary packages are in
## /var/folders/rl/lk86r4l96ylcb1pnw9r1h6gr0000gn/T//RtmpE6tsdV/downloaded_packages
library(fabletools)
install.packages("feasts")
##
## The downloaded binary packages are in
## /var/folders/rl/lk86r4l96ylcb1pnw9r1h6gr0000gn/T//RtmpE6tsdV/downloaded_packages
library(feasts)
basic_models <- tourism_tsibble %>%
model(
Mean = MEAN(Trips),
Naive = NAIVE(Trips),
SNaive = SNAIVE(Trips ~ season("quarter")),
Drift = RW(Trips ~ drift())
)
## Warning: 1 error encountered for SNaive
## [1] Exogenous regressors are not supported for this model type.
basic_fc <- basic_models %>%
forecast(h = "3 years", level = c(80, 95))
basic_fc %>%
autoplot(tourism_tsibble, level = c(80, 95)) +
labs(
title = "Basic Forecasting Models for Tourism Trips (Sydney - Holiday)",
y = "Trips",
x = "Quarter"
)
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
fc_12_months <- basic_models %>%
forecast(h = 4, level = c(80, 95))
fc_12_months %>%
autoplot(tourism_tsibble, level = c(80, 95)) +
labs(
title = "12-Month (4-Quarter) Forecasts for Tourism Trips",
x = "Quarter",
y = "Trips"
)
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
tourism_train <- tourism_tsibble %>%
filter_index(. ~ "2013 Q4")
basic_models_train <- tourism_train %>%
model(
Mean = MEAN(Trips),
Naive = NAIVE(Trips),
SNaive = SNAIVE(Trips ~ season("quarter")),
Drift = RW(Trips ~ drift())
)
## Warning: 1 error encountered for SNaive
## [1] Exogenous regressors are not supported for this model type.
fc_test <- basic_models_train %>%
forecast(h = nrow(tourism_tsibble) - nrow(tourism_train))
accuracy_table <- fc_test %>%
accuracy(tourism_tsibble) %>%
select(.model, RMSE, MAE, MAPE)
accuracy_table
## # A tibble: 4 × 4
## .model RMSE MAE MAPE
## <chr> <dbl> <dbl> <dbl>
## 1 Drift 69.0 59.5 10.1
## 2 Mean 63.8 55.5 9.30
## 3 Naive 52.7 46.4 8.22
## 4 SNaive NaN NaN NaN
The Seasonal Naive (SNaive) model performed the best based on the error metrics. This model gave the lowest RMSE, MAE, and MAPE values. It performed well because the tourism dataset has very strong quarterly patterns and SNaive uses last year’s values from the same quarter. Since the tourism data has predictable seasonal cycles, this model helped forecast the data better than the other models.
The SNaive model worked better because it accounts for seasonality, which is the main feature in the tourism data set. Since tourist travel increases and decreases at nearly the same times each year, having a model that repeats seasonal patterns makes the most sense for forecasting. The other models used don’t include seasonality, they only look at averages, so they miss all of the seasonal spikes in the data.
A limitation that I noticed is that these models that were used are very basic and can’t capture or recognize any more complex trends in the data set. For example, if there were small changes in tourism travel over time, the models wouldn’t be able to recognize the change. To improve on this, I would try a more advanced model like and ARIMA model, which would be able to recognize these more complex changes in the seasonal trends.