Forecasting: Principles and Practice Chapter 5 Exercises

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)

Bricks (aus_production)

NSW Lambs (aus_livestock)

Household wealth (hh_budget).

Australian takeaway food turnover (aus_retail).

global_economy |>
  filter(Country == "Australia") |>
  model(RW(Population ~ drift())) |>
  forecast(h = 5) |>
  autoplot(global_economy) +
  labs(title = "Australia Population",
       subtitle = "5 Year Population Forecast")

bricks <- aus_production |>
  filter_index("1970 Q1" ~ "2004 Q4") |>
  select(Bricks)

bricks

## # A tsibble: 140 x 2 [1Q]
##    Bricks Quarter
##     <dbl>   <qtr>
##  1    386 1970 Q1
##  2    428 1970 Q2
##  3    434 1970 Q3
##  4    417 1970 Q4
##  5    385 1971 Q1
##  6    433 1971 Q2
##  7    453 1971 Q3
##  8    436 1971 Q4
##  9    399 1972 Q1
## 10    461 1972 Q2
## # ℹ 130 more rows

bricks |>
  model(SNAIVE(Bricks)) |>
  forecast(h = 5) |>
  autoplot(bricks) +
  labs(title = "Bricks",
       subtitle = "Future 5 Quarter Forecast")

aus_livestock |>
  filter(State == "New South Wales", Animal == 'Lambs') |>
  model(SNAIVE(Count)) |>
  forecast(h = 24) |>
  autoplot(aus_livestock) +
  labs(title = "Lambs in New South Wales",
       subtitle = "Dec 2018 - Dec 2020 Forecast")

hh_budget |>
  model(RW(Wealth ~ drift())) |>
  forecast(h = 5) |>
  autoplot(hh_budget) +
  labs(title = "Household Wealth",
       subtitle = "5 Year Household Wealth Forecast")

aus_retail |>
  filter(Industry == "Cafes, restaurants and takeaway food services") |>
  model(RW(Turnover ~ drift())) |>
  forecast( h = 24) |>
  autoplot(aus_retail) +
  labs(title = "Australian Takeaway Food Turnover",
       subtitle = "Apr 1982 - Dec 2018, Forecasted until Dec 2021") +
  facet_wrap(~State, scales = "free")

Use the Facebook stock price (data set gafa_stock) to do the following:

Produce a time plot of the series.

Produce forecasts using the drift method and plot them.

Show that the forecasts are identical to extending the line drawn between the first and last observations.

Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

fb_stock <- gafa_stock |>
  filter(Symbol == "FB") |>
  mutate(trading_day = row_number()) |>
  update_tsibble(index = trading_day, regular = TRUE)

fb_stock |> autoplot() +
    labs(title = "FB Stock Open Price")

## Plot variable not specified, automatically selected `.vars = Open`

fb_stock |>
  model(RW(Open ~ drift())) |>
  forecast(h = 63) |>
  autoplot(fb_stock) +
  labs(title = "FB Stock Open Price",
       y = "USD($)")

fb_stock |>
  model(RW(Open ~ drift())) |>
  forecast(h = 63) |>
  autoplot(fb_stock) +
  labs(title = "FB Stock Open Price") +
    geom_segment(aes(x = 1, y = 54.83, xend = 1258, yend = 134.45),
               colour = "red", linetype = "dashed")

## Warning in geom_segment(aes(x = 1, y = 54.83, xend = 1258, yend = 134.45), : All aesthetics have length 1, but the data has 1258 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
##   a single row.

fb_stock |>
  model(Drift = NAIVE(Open ~ drift()),
        Mean = MEAN(Open),
        Naive = NAIVE(Open)) |>
  forecast(h = 63) |>
  autoplot(fb_stock, level = NULL) +
  labs(title = "FB Stock Open Price",
       y = "USD($)")

None of the additional benchmarks are particularly useful. However, if I had to pick one it would be the drift.

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

# Look a some forecasts
fit |> forecast() |> autoplot(recent_production)

The residuals are not white noise. This can be concluded from the results from the ACF function demonstrating peaks in Q4.

Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case

aus_exports <- global_economy |>
  filter(Country == "Australia")

aus_exports

## # A tsibble: 58 x 9 [1Y]
## # Key:       Country [1]
##    Country   Code   Year          GDP Growth   CPI Imports Exports Population
##    <fct>     <fct> <dbl>        <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
##  1 Australia AUS    1960 18573188487.  NA     7.96    14.1    13.0   10276477
##  2 Australia AUS    1961 19648336880.   2.49  8.14    15.0    12.4   10483000
##  3 Australia AUS    1962 19888005376.   1.30  8.12    12.6    13.9   10742000
##  4 Australia AUS    1963 21501847911.   6.21  8.17    13.8    13.0   10950000
##  5 Australia AUS    1964 23758539590.   6.98  8.40    13.8    14.9   11167000
##  6 Australia AUS    1965 25931235301.   5.98  8.69    15.3    13.2   11388000
##  7 Australia AUS    1966 27261731437.   2.38  8.98    15.1    12.9   11651000
##  8 Australia AUS    1967 30389741292.   6.30  9.29    13.9    12.9   11799000
##  9 Australia AUS    1968 32657632434.   5.10  9.52    14.5    12.3   12009000
## 10 Australia AUS    1969 36620002240.   7.04  9.83    13.3    12.0   12263000
## # ℹ 48 more rows

fit <- aus_exports |>
  model(NAIVE(Exports))

fit |> gg_tsresiduals() +
  ggtitle("Residual Plot for Australian Exports")

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).

fit |> forecast() |> autoplot(aus_exports) +
  ggtitle("Australian Exports")

fit <- bricks |>
  model(SNAIVE(Bricks))

fit |> gg_tsresiduals() +
  ggtitle("Residual Plot for Brick Production")

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

fit |> forecast() |> autoplot(bricks) +
  ggtitle("Australian Brick Production")

This data is not seasonal for the exports, therefore NAIVE is a better choice of model. For the Bricks data, the SNAIVE is better as this is broken down into quarters. The lag plot demonstrates clear seasonality with brick production increasing during Q4 and a reduction in production until the following quarter next year.

Produce forecasts for the 7 Victorian series in aus_livestock using SNAIVE(). Plot the resulting forecasts including the historical data. Is this a reasonable benchmark for these series?

aus_livestock |>
  filter(State == "Victoria") |>
  model(SNAIVE(Count ~ lag("2 years"))) |>
  forecast(h = "2 years") %>%
  autoplot(aus_livestock) +
  labs(title = "Animals in Victoria") +
  facet_wrap(~Animal, scales = "free")

Lambs and Calves may benefit from a trend model as overtime the number of calves has increased steadily and the number of lambs increased. For the rest, there is clear seasonal patterns within the data so a SNAIVE model is more suitable.

Are the following statements true or false? Explain your answer.

Good forecast methods should have normally distributed residuals.

Yes normally distributed residuals outlines that there is less of a chance of the data being white noise and the model being more accurate.

A model with small residuals will give good forecasts.

No, even though there is little difference between the forecast and the actual values, the pattern of the data might change.

The best measure of forecast accuracy is MAPE.

This is generally the most used metric.

If your model doesn’t forecast well, you should make it more complicated.

Nope, sometimes choosing a more simplistic model can provide better results.

Always choose the model with the best forecast accuracy as measured on the test set.

No, this can result in overfitting. Consider the metric the model has against the validation/testing set.

For your retail time series (from Exercise 7 in Section 2.10):

Create a training dataset consisting of observations before 2011 using

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries_train <- myseries |>
  filter(year(Month) < 2011)

Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

#### Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train |>
  model(SNAIVE(Turnover))

Check the residuals.

fit |> gg_tsresiduals()

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).

#### Produce forecasts for the test data

fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

fc |> autoplot(myseries)

Compare the accuracy of your forecasts against the actual values.

fit |> accuracy()

## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Norther… Clothin… SNAIV… Trai… 0.439  1.21 0.915  5.23  12.4     1     1 0.768

fc |> accuracy(myseries)

## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Nort… Clothin… Test  0.836  1.55  1.24  5.94  9.06  1.36  1.28 0.601

Consider the number of pigs slaughtered in New South Wales (data set aus_livestock).

Produce some plots of the data in order to become familiar with it.

pigs <- aus_livestock |>
  filter(Animal == "Pigs" & State == "New South Wales")

pigs

## # A tsibble: 558 x 4 [1M]
## # Key:       Animal, State [1]
##       Month Animal State            Count
##       <mth> <fct>  <fct>            <dbl>
##  1 1972 Jul Pigs   New South Wales  97400
##  2 1972 Aug Pigs   New South Wales 114700
##  3 1972 Sep Pigs   New South Wales 109900
##  4 1972 Oct Pigs   New South Wales 108300
##  5 1972 Nov Pigs   New South Wales 122200
##  6 1972 Dec Pigs   New South Wales 106900
##  7 1973 Jan Pigs   New South Wales  96600
##  8 1973 Feb Pigs   New South Wales  96700
##  9 1973 Mar Pigs   New South Wales 121200
## 10 1973 Apr Pigs   New South Wales  99300
## # ℹ 548 more rows

pigs |> autoplot()

## Plot variable not specified, automatically selected `.vars = Count`

pigs |> gg_season()

## Plot variable not specified, automatically selected `y = Count`

pigs |> gg_subseries()

## Plot variable not specified, automatically selected `y = Count`

#### Create a training set of 486 observations, withholding a test set of 72 observations (6 years).

pigs_train <- pigs |>
  filter(year(Month) < 2014)

autoplot(pigs, Count) +
  autolayer(pigs_train, Count, colour = "red")

Try using various benchmark methods to forecast the training set and compare the results on the test set. Which method did best?

pigs_fit <- pigs_train |>
  model(SNAIVE(Count))

accuracy(pigs_fit)

## # A tibble: 1 × 12
##   Animal State    .model .type    ME   RMSE    MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <fct>  <fct>    <chr>  <chr> <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Pigs   New Sou… SNAIV… Trai… -848. 14385. 10482. -1.85  10.1     1     1 0.606

pigs_fc <- pigs_fit |>
  forecast(h = 12, level = NULL) |>
  autoplot(pigs_train) +
  labs(title = "Piggy Forecast")
pigs_fc

Due to the stucture of the time series index, seasonal will likely always perform better.

Check the residuals of your preferred method. Do they resemble white noise?

pigs_fit |> gg_tsresiduals() +
  ggtitle("Residuals")

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).

No, there does not appear to be white noise as the residuals are consistent.

Create a training set for household wealth (hh_budget) by withholding the last four years as a test set.

hh_budget_train <- hh_budget |>
  filter(Year <= 2010)

fit <- hh_budget_train |>
  model(RW(Wealth ~ drift()))

accuracy(fit)

## # A tibble: 4 × 11
##   Country   .model  .type       ME  RMSE   MAE     MPE  MAPE  MASE RMSSE    ACF1
##   <chr>     <chr>   <chr>    <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 Australia RW(Wea… Trai… 0         23.0  14.9 -0.173   4.26 0.984 0.997 -0.136 
## 2 Canada    RW(Wea… Trai… 1.60e-14  24.2  19.5 -0.137   4.46 0.957 0.984  0.0828
## 3 Japan     RW(Wea… Trai… 1.49e-14  17.0  12.2  0.0106  2.40 0.796 0.914  0.130 
## 4 USA       RW(Wea… Trai… 1.85e-14  32.8  26.9 -0.186   5.32 0.962 0.995  0.0907

fc <- fit |>
  forecast(h = 6, level=NULL) |>
  autoplot(hh_budget_train)+
  labs(title = "6 Year Household Wealth Forecast")

fc

#### Create a training set for Australian takeaway food turnover (aus_retail) by withholding the last four years as a test set.

takeaway <- aus_retail |>
  filter(Industry == "Cafes, restaurants and takeaway food services")

tw_train <- takeaway |>
  filter(year(Month) <= 2014)

Fit all the appropriate benchmark methods to the training set and forecast the periods covered by the test set.

fit <- takeaway |>
  model(RW(Turnover ~ drift()))

fit

## # A mable: 8 x 3
## # Key:     State, Industry [8]
##   State                        Industry                   RW(Turnover ~ drift(…¹
##   <chr>                        <chr>                                     <model>
## 1 Australian Capital Territory Cafes, restaurants and ta…          <RW w/ drift>
## 2 New South Wales              Cafes, restaurants and ta…          <RW w/ drift>
## 3 Northern Territory           Cafes, restaurants and ta…          <RW w/ drift>
## 4 Queensland                   Cafes, restaurants and ta…          <RW w/ drift>
## 5 South Australia              Cafes, restaurants and ta…          <RW w/ drift>
## 6 Tasmania                     Cafes, restaurants and ta…          <RW w/ drift>
## 7 Victoria                     Cafes, restaurants and ta…          <RW w/ drift>
## 8 Western Australia            Cafes, restaurants and ta…          <RW w/ drift>
## # ℹ abbreviated name: ¹`RW(Turnover ~ drift())`

fit |>
  forecast ( h = 48) |>
  autoplot(takeaway) + 
  labs(title = "Australian Takeaway Food Turnover",
       subtitle = "Apr 1982 - Dec 2018, Forecasted until Dec 2022") +
  facet_wrap(~State, scales = "free")

#### We will use the Bricks data from aus_production (Australian quarterly clay brick production 1956–2005) for this exercise.

Use an STL decomposition to calculate the trend-cycle and seasonal indices. (Experiment with having fixed or changing seasonality.)

bricks_additive <- bricks |>
  model(classical_decomposition(Bricks, type = "additive")) |>
  components() |>
  autoplot() +
  labs(title = "Classical additive decomposition of Australian brick production")

bricks_additive

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

bricks_multi <- bricks |>
  model(classical_decomposition(Bricks, type = "multiplicative")) |>
  components() |>
  autoplot() +
  labs(title = "Classical multiplicative decomposition of Australian brick production")

bricks_multi

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

bricks_stl <- bricks |>
  model(STL(Bricks ~ season(window = 4), robust = TRUE)) |>
  components() |>
  autoplot() +
  labs(title = "STL decomposition of Australian brick production")

bricks_stl

bricks_stl_per <- bricks |>
  model(STL(Bricks ~ season(window = 'periodic'), robust = TRUE)) |>
  components() |>
  autoplot() +
  labs(title = "Periodic STL decomposition of Australian brick production")

bricks_stl_per

Compute and plot the seasonally adjusted data.

bricks

## # A tsibble: 140 x 2 [1Q]
##    Bricks Quarter
##     <dbl>   <qtr>
##  1    386 1970 Q1
##  2    428 1970 Q2
##  3    434 1970 Q3
##  4    417 1970 Q4
##  5    385 1971 Q1
##  6    433 1971 Q2
##  7    453 1971 Q3
##  8    436 1971 Q4
##  9    399 1972 Q1
## 10    461 1972 Q2
## # ℹ 130 more rows

bricks_season <- bricks |>
  model(stl = STL(Bricks)) 

brick_comp <- components(bricks_season)

brick_season_adjust <- bricks |>
  autoplot(Bricks, color = "green") +
  autolayer(components(bricks_season), season_adjust, color = "red") +
  labs(title = "Seasonally adjusted Australian bricks data")

brick_season_adjust

Use a naïve method to produce forecasts of the seasonally adjusted data.

brick_trend <- brick_comp |>
  select(-c(.model, Bricks, trend, season_year, remainder))

brick_trend |>
  model(NAIVE(season_adjust)) |>
  forecast( h = "5 years") |>
  autoplot(brick_trend) +
  labs(title = "Seasonally adjusted Naive forecast", y = "Bricks")

tourism contains quarterly visitor nights (in thousands) from 1998 to 2017 for 76 regions of Australia.

Extract data from the Gold Coast region using filter() and aggregate total overnight trips (sum over Purpose) using summarise(). Call this new dataset gc_tourism.

gc_tourism <- tourism |>
  filter(Region == "Gold Coast") |>
  group_by(Purpose) |>
  summarise(Total_Overnight_Trips = sum(Trips, na.rm = TRUE))

range(gc_tourism$Quarter)

## <yearquarter[2]>
## [1] "1998 Q1" "2017 Q4"
## # Year starts on: January

gc_train_1 <- gc_tourism |>
  slice(1:(n()-4))

gc_train_2 <- gc_tourism |>
  slice(1:(n()-8))

gc_train_3 <- gc_tourism |>
  slice(1:(n()-12))

gc_train_1

## # A tsibble: 316 x 3 [1Q]
## # Key:       Purpose [4]
##    Purpose  Quarter Total_Overnight_Trips
##    <chr>      <qtr>                 <dbl>
##  1 Business 1998 Q1                  65.5
##  2 Business 1998 Q2                  70.8
##  3 Business 1998 Q3                 110. 
##  4 Business 1998 Q4                 111. 
##  5 Business 1999 Q1                  98.0
##  6 Business 1999 Q2                  61.3
##  7 Business 1999 Q3                 121. 
##  8 Business 1999 Q4                 104. 
##  9 Business 2000 Q1                  95.2
## 10 Business 2000 Q2                  84.9
## # ℹ 306 more rows

gc_train_2

## # A tsibble: 312 x 3 [1Q]
## # Key:       Purpose [4]
##    Purpose  Quarter Total_Overnight_Trips
##    <chr>      <qtr>                 <dbl>
##  1 Business 1998 Q1                  65.5
##  2 Business 1998 Q2                  70.8
##  3 Business 1998 Q3                 110. 
##  4 Business 1998 Q4                 111. 
##  5 Business 1999 Q1                  98.0
##  6 Business 1999 Q2                  61.3
##  7 Business 1999 Q3                 121. 
##  8 Business 1999 Q4                 104. 
##  9 Business 2000 Q1                  95.2
## 10 Business 2000 Q2                  84.9
## # ℹ 302 more rows

gc_train_3

## # A tsibble: 308 x 3 [1Q]
## # Key:       Purpose [4]
##    Purpose  Quarter Total_Overnight_Trips
##    <chr>      <qtr>                 <dbl>
##  1 Business 1998 Q1                  65.5
##  2 Business 1998 Q2                  70.8
##  3 Business 1998 Q3                 110. 
##  4 Business 1998 Q4                 111. 
##  5 Business 1999 Q1                  98.0
##  6 Business 1999 Q2                  61.3
##  7 Business 1999 Q3                 121. 
##  8 Business 1999 Q4                 104. 
##  9 Business 2000 Q1                  95.2
## 10 Business 2000 Q2                  84.9
## # ℹ 298 more rows

Compute one year of forecasts for each training set using the seasonal naïve (SNAIVE()) method. Call these gc_fc_1, gc_fc_2 and gc_fc_3, respectively.

gc_tourism <- tourism |>
  filter(Region == "Gold Coast") |>
  group_by(Purpose) |>
  summarise(Total_Overnight_Trips = sum(Trips, na.rm = TRUE))

range(gc_tourism$Quarter)

## <yearquarter[2]>
## [1] "1998 Q1" "2017 Q4"
## # Year starts on: January

gc_train_1 <- gc_tourism |>
  slice(1:(n()-4))

gc_train_2 <- gc_tourism |>
  slice(1:(n()-8))

gc_train_3 <- gc_tourism |>
  slice(1:(n()-12))

gc_train_1

## # A tsibble: 316 x 3 [1Q]
## # Key:       Purpose [4]
##    Purpose  Quarter Total_Overnight_Trips
##    <chr>      <qtr>                 <dbl>
##  1 Business 1998 Q1                  65.5
##  2 Business 1998 Q2                  70.8
##  3 Business 1998 Q3                 110. 
##  4 Business 1998 Q4                 111. 
##  5 Business 1999 Q1                  98.0
##  6 Business 1999 Q2                  61.3
##  7 Business 1999 Q3                 121. 
##  8 Business 1999 Q4                 104. 
##  9 Business 2000 Q1                  95.2
## 10 Business 2000 Q2                  84.9
## # ℹ 306 more rows

gc_train_2

## # A tsibble: 312 x 3 [1Q]
## # Key:       Purpose [4]
##    Purpose  Quarter Total_Overnight_Trips
##    <chr>      <qtr>                 <dbl>
##  1 Business 1998 Q1                  65.5
##  2 Business 1998 Q2                  70.8
##  3 Business 1998 Q3                 110. 
##  4 Business 1998 Q4                 111. 
##  5 Business 1999 Q1                  98.0
##  6 Business 1999 Q2                  61.3
##  7 Business 1999 Q3                 121. 
##  8 Business 1999 Q4                 104. 
##  9 Business 2000 Q1                  95.2
## 10 Business 2000 Q2                  84.9
## # ℹ 302 more rows

gc_train_3

## # A tsibble: 308 x 3 [1Q]
## # Key:       Purpose [4]
##    Purpose  Quarter Total_Overnight_Trips
##    <chr>      <qtr>                 <dbl>
##  1 Business 1998 Q1                  65.5
##  2 Business 1998 Q2                  70.8
##  3 Business 1998 Q3                 110. 
##  4 Business 1998 Q4                 111. 
##  5 Business 1999 Q1                  98.0
##  6 Business 1999 Q2                  61.3
##  7 Business 1999 Q3                 121. 
##  8 Business 1999 Q4                 104. 
##  9 Business 2000 Q1                  95.2
## 10 Business 2000 Q2                  84.9
## # ℹ 298 more rows

gc_fc_1 <- gc_train_1 |>
  model(SNAIVE(Total_Overnight_Trips)) |>
  forecast( h = 4) 

gc_fc_1 |>
  autoplot(gc_train_1) +
  labs(title = "GC Train 1 SNaive Forecast")

gc_fc_2 <- gc_train_2 |>
  model(SNAIVE(Total_Overnight_Trips)) |>
  forecast( h = 4)

gc_fc_2 |> autoplot(gc_train_2) +
  labs(title = "GC Train 2 SNaive Forecast")

gc_fc_3 <- gc_train_3 |>
  model(SNAIVE(Total_Overnight_Trips)) |>
  forecast( h = 4) 

gc_fc_3 |>   autoplot(gc_train_3) +
  labs(title = "GC Train 3 SNaive Forecast")

gc_fc_1 |> accuracy(gc_tourism)

## Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing. 
## 4 observations are missing between 2018 Q1 and 2018 Q4

## # A tibble: 4 × 11
##   .model       Purpose .type    ME  RMSE   MAE    MPE  MAPE   MASE  RMSSE   ACF1
##   <chr>        <chr>   <chr> <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl>  <dbl>  <dbl>
## 1 SNAIVE(Tota… Busine… Test  NaN   NaN   NaN   NaN    NaN   NaN    NaN    NA    
## 2 SNAIVE(Tota… Holiday Test  NaN   NaN   NaN   NaN    NaN   NaN    NaN    NA    
## 3 SNAIVE(Tota… Other   Test  NaN   NaN   NaN   NaN    NaN   NaN    NaN    NA    
## 4 SNAIVE(Tota… Visiti… Test   20.4  64.2  59.0   5.41  17.8   1.39   1.20 -0.574

gc_fc_2 |> accuracy(gc_tourism)

## Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing. 
## 4 observations are missing between 2018 Q1 and 2018 Q4

## # A tibble: 4 × 11
##   .model     Purpose .type     ME  RMSE   MAE     MPE  MAPE   MASE  RMSSE   ACF1
##   <chr>      <chr>   <chr>  <dbl> <dbl> <dbl>   <dbl> <dbl>  <dbl>  <dbl>  <dbl>
## 1 SNAIVE(To… Busine… Test  NaN    NaN   NaN   NaN     NaN   NaN    NaN    NA    
## 2 SNAIVE(To… Holiday Test  NaN    NaN   NaN   NaN     NaN   NaN    NaN    NA    
## 3 SNAIVE(To… Other   Test  NaN    NaN   NaN   NaN     NaN   NaN    NaN    NA    
## 4 SNAIVE(To… Visiti… Test    7.05  53.9  47.2   0.253  15.7   1.12   1.01 -0.724

gc_fc_3 |> accuracy(gc_tourism)

## Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing. 
## 4 observations are missing between 2018 Q1 and 2018 Q4

## # A tibble: 4 × 11
##   .model    Purpose .type    ME  RMSE   MAE    MPE   MAPE    MASE   RMSSE   ACF1
##   <chr>     <chr>   <chr> <dbl> <dbl> <dbl>  <dbl>  <dbl>   <dbl>   <dbl>  <dbl>
## 1 SNAIVE(T… Busine… Test  NaN   NaN   NaN   NaN    NaN    NaN     NaN     NA    
## 2 SNAIVE(T… Holiday Test  NaN   NaN   NaN   NaN    NaN    NaN     NaN     NA    
## 3 SNAIVE(T… Other   Test  NaN   NaN   NaN   NaN    NaN    NaN     NaN     NA    
## 4 SNAIVE(T… Visiti… Test   13.3  28.4  24.9   5.20   8.84   0.575   0.520 -0.251

It’s clear that the gc_fc_3 has the lowest MAPE. This is due removing the last 3 years of observations.