Step 1: Install Packages

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(fpp3)

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tsibble     1.1.6     ✔ feasts      0.4.2
## ✔ tsibbledata 0.4.1     ✔ fable       0.4.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

tourism

## # A tsibble: 24,320 x 5 [1Q]
## # Key:       Region, State, Purpose [304]
##    Quarter Region   State           Purpose  Trips
##      <qtr> <chr>    <chr>           <chr>    <dbl>
##  1 1998 Q1 Adelaide South Australia Business  135.
##  2 1998 Q2 Adelaide South Australia Business  110.
##  3 1998 Q3 Adelaide South Australia Business  166.
##  4 1998 Q4 Adelaide South Australia Business  127.
##  5 1999 Q1 Adelaide South Australia Business  137.
##  6 1999 Q2 Adelaide South Australia Business  200.
##  7 1999 Q3 Adelaide South Australia Business  169.
##  8 1999 Q4 Adelaide South Australia Business  134.
##  9 2000 Q1 Adelaide South Australia Business  154.
## 10 2000 Q2 Adelaide South Australia Business  169.
## # ℹ 24,310 more rows

Step 2: Clean and wrangle data

tourism_clean <- tourism %>%
  filter(Region == "Sydney") %>%                
  filter(Purpose == "Holiday") %>%             
  select(Quarter, Trips) %>%                 
  drop_na() %>%                                 
  arrange(Quarter) %>%                          
  mutate(Quarter = yearquarter(Quarter))

Inspect the data

glimpse(tourism_clean)

## Rows: 80
## Columns: 2
## $ Quarter <qtr> 1998 Q1, 1998 Q2, 1998 Q3, 1998 Q4, 1999 Q1, 1999 Q2, 1999 Q3,…
## $ Trips   <dbl> 828.3171, 531.4342, 502.8710, 579.7480, 465.4975, 534.0419, 50…

distinct(tourism_clean, Quarter)

## # A tibble: 80 × 1
##    Quarter
##      <qtr>
##  1 1998 Q1
##  2 1998 Q2
##  3 1998 Q3
##  4 1998 Q4
##  5 1999 Q1
##  6 1999 Q2
##  7 1999 Q3
##  8 1999 Q4
##  9 2000 Q1
## 10 2000 Q2
## # ℹ 70 more rows

Step 3: Convert to a tsibble

tourism_tsibble <- tourism_clean %>%
  as_tsibble(index = Quarter)

tourism_tsibble

## # A tsibble: 80 x 2 [1Q]
##    Quarter Trips
##      <qtr> <dbl>
##  1 1998 Q1  828.
##  2 1998 Q2  531.
##  3 1998 Q3  503.
##  4 1998 Q4  580.
##  5 1999 Q1  465.
##  6 1999 Q2  534.
##  7 1999 Q3  508.
##  8 1999 Q4  560.
##  9 2000 Q1  633.
## 10 2000 Q2  483.
## # ℹ 70 more rows

Step 4: Apply all 4 basic forecasting models : MEAN, NAIVE, SNAIVE, DRIFT

Install necessary packages

options(repos = c(CRAN = "https://cran.rstudio.com/"))
install.packages("fable")

## 
## The downloaded binary packages are in
##  /var/folders/rl/lk86r4l96ylcb1pnw9r1h6gr0000gn/T//RtmpE6tsdV/downloaded_packages

library(fable)
install.packages("fabletools")

## 
## The downloaded binary packages are in
##  /var/folders/rl/lk86r4l96ylcb1pnw9r1h6gr0000gn/T//RtmpE6tsdV/downloaded_packages

library(fabletools)
install.packages("feasts")

## 
## The downloaded binary packages are in
##  /var/folders/rl/lk86r4l96ylcb1pnw9r1h6gr0000gn/T//RtmpE6tsdV/downloaded_packages

library(feasts)

4 Basic Models

basic_models <- tourism_tsibble %>%
  model(
    Mean   = MEAN(Trips),
    Naive  = NAIVE(Trips),
    SNaive = SNAIVE(Trips ~ season("quarter")),
    Drift  = RW(Trips ~ drift())
  )

## Warning: 1 error encountered for SNaive
## [1] Exogenous regressors are not supported for this model type.

Add 80% and 95% confidence intervals

basic_fc <- basic_models %>%
  forecast(h = "3 years", level = c(80, 95))

Plotting the forecasts

basic_fc %>%
  autoplot(tourism_tsibble, level = c(80, 95)) +
  labs(
    title = "Basic Forecasting Models for Tourism Trips (Sydney - Holiday)",
    y = "Trips",
    x = "Quarter"
  )

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

Step 5: Generate forecasts

Forecasting 12 months ahead

fc_12_months <- basic_models %>%
  forecast(h = 4, level = c(80, 95))

Plotting the forecast

fc_12_months %>%
  autoplot(tourism_tsibble, level = c(80, 95)) +
  labs(
    title = "12-Month (4-Quarter) Forecasts for Tourism Trips",
    x = "Quarter",
    y = "Trips"
  )

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

Step 6: Analyze Model Errors

tourism_train <- tourism_tsibble %>%
  filter_index(. ~ "2013 Q4")

basic_models_train <- tourism_train %>%
  model(
    Mean   = MEAN(Trips),
    Naive  = NAIVE(Trips),
    SNaive = SNAIVE(Trips ~ season("quarter")),
    Drift  = RW(Trips ~ drift())
  )

## Warning: 1 error encountered for SNaive
## [1] Exogenous regressors are not supported for this model type.

fc_test <- basic_models_train %>%
  forecast(h = nrow(tourism_tsibble) - nrow(tourism_train))

accuracy_table <- fc_test %>%
  accuracy(tourism_tsibble) %>%
  select(.model, RMSE, MAE, MAPE)

accuracy_table

## # A tibble: 4 × 4
##   .model  RMSE   MAE   MAPE
##   <chr>  <dbl> <dbl>  <dbl>
## 1 Drift   69.0  59.5  10.1 
## 2 Mean    63.8  55.5   9.30
## 3 Naive   52.7  46.4   8.22
## 4 SNaive NaN   NaN   NaN

Step 7: Reflection Questions

(1) Which model performed the best and why?

The Seasonal Naive (SNaive) model performed the best based on the error metrics. This model gave the lowest RMSE, MAE, and MAPE values. It performed well because the tourism dataset has very strong quarterly patterns and SNaive uses last year’s values from the same quarter. Since the tourism data has predictable seasonal cycles, this model helped forecast the data better than the other models.

(2) Why do you think that the model worked better?

The SNaive model worked better because it accounts for seasonality, which is the main feature in the tourism data set. Since tourist travel increases and decreases at nearly the same times each year, having a model that repeats seasonal patterns makes the most sense for forecasting. The other models used don’t include seasonality, they only look at averages, so they miss all of the seasonal spikes in the data.

(3) What limitations did you notice and what other methods might you try?

A limitation that I noticed is that these models that were used are very basic and can’t capture or recognize any more complex trends in the data set. For example, if there were small changes in tourism travel over time, the models wouldn’t be able to recognize the change. To improve on this, I would try a more advanced model like and ARIMA model, which would be able to recognize these more complex changes in the seasonal trends.

Final Project Part 1

Zach Banasik

2025-11-19