DATA624_HW3_CameronGray

library(fpp3)
library(tidyverse)

5.1

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)

For the population of a country it normally does not have any seasonality to it so just using the drift function what is best for this data.

aus_pop <- global_economy |>
  filter(Country == 'Australia') |>
  select(Year, Population)

aus_pop |>
  model(RW(Population ~ drift())) |> 
  forecast(h = 5) |>
  autoplot() +
  autolayer(
    filter_index(aus_pop)
  )

* Bricks (aus_production)

SNAIVE is really able to capture the seasonal trend from the data that the other two models cannot.

bricks <- aus_production |> 
  select(Quarter, Bricks) |>
  filter(!is.na(Bricks)) 

bricks |>
  model(SNAIVE(Bricks)) |>
  forecast(h = 20)|>
  autoplot() +
  autolayer(
    filter_index(bricks)
  )

* NSW Lambs (aus_livestock)

Again with this seasonal data SNAIVE is the best option.

nswlambs <- aus_livestock |> 
  filter(State == 'New South Wales', Animal == "Lambs") |>
  select(Count)

nswlambs |>
  model(SNAIVE(Count)) |>
  forecast(h = 60)|>
  autoplot() +
  autolayer(
    filter_index(nswlambs)
  )

* Household wealth (hh_budget).

Wealth normally is just growth and not really a seasonal dataset so just using the RW drift is able to capture a good prediction.

wealth <- hh_budget |> 
  select(Wealth)

wealth |>
  model(RW(Wealth ~ drift())) |>
  forecast(h = 5) |>
  autoplot() +
  autolayer(
    filter_index(wealth)
  )

* Australian takeaway food turnover (aus_retail).

Finally, having seasonal data like what can be seen here for the turnover using SNAIVE is best again.

food_turnover <- aus_retail |>
  as.tibble() |>
  filter(Industry == 'Takeaway food services') |>
  select(Month, Turnover)|>
  group_by(Month) |>
  summarise(Turnover = sum(Turnover)) |>
  as_tsibble(index=Month)

food_turnover |>
  model(SNAIVE(Turnover)) |>
  forecast(h = 60) |>
  autoplot() +
  autolayer(
    filter_index(food_turnover)
  )

5.2

Use the Facebook stock price (data set gafa_stock) to do the following:

Produce a time plot of the series.

fb_close <- gafa_stock |>
  filter(Symbol == 'FB') |>
  select(Close) |>
  mutate(day = row_number()) |>
  update_tsibble(index = day, regular = TRUE) |>
  print()

## # A tsibble: 1,258 x 3 [1]
##    Close Date         day
##    <dbl> <date>     <int>
##  1  54.7 2014-01-02     1
##  2  54.6 2014-01-03     2
##  3  57.2 2014-01-06     3
##  4  57.9 2014-01-07     4
##  5  58.2 2014-01-08     5
##  6  57.2 2014-01-09     6
##  7  57.9 2014-01-10     7
##  8  55.9 2014-01-13     8
##  9  57.7 2014-01-14     9
## 10  57.6 2014-01-15    10
## # ℹ 1,248 more rows

fb_close |> 
  autoplot()

b) Produce forecasts using the drift method and plot them.

fb_close |> 
  model(RW(Close ~ drift())) |>
  forecast(h = 90) |>
  autoplot() +
  autolayer(
    filter_index(fb_close)
  )

c) Show that the forecasts are identical to extending the line drawn between the first and last observations.

I just filtered out the first and last values and drew a line between them. You can see the two lines match up perfectly showing that the forecast is just an extension of this line.

min_max <- fb_close |>
  filter(day == min(day) | day == max(day))

fb_close |> 
  model(RW(Close ~ drift())) |>
  forecast(h = 90) |>
  autoplot() +
  autolayer(
    filter_index(fb_close)
  )+
  autolayer(
    filter_index(min_max),
    color = "red"
  )

d) Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

Bootstrapping NAIVE seems to have the best chances because MEAN and RW don’t have much insight into the data. RW mostly just have a general direction for the data which works as almost a trend line of where to expect the data to go but outliers can easily fall off this line. MEAN doesn’t do much other than to just take the mean but just from looking at the confidence intervals you can see a major flaw, it can be anywhere in a massive range. The bootstrapping NAIVE just has a range of options to choose from and is taking the previous data into account which is hard to see with the other functions.

fb_close |>
  model(MEAN(Close)) |>
  forecast(h = 90) |>
  autoplot() +
  autolayer(
    filter_index(fb_close)
  )

fb_close |>
  model(NAIVE(Close)) |>
  generate(h = 90, times = 5, bootstrap = TRUE) |>
  autoplot() +
  autolayer(
    filter_index(fb_close)
  )

5.3

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

It does appear as though the residuals are white noise because they randomly fluctuate from quarter to quarter with very little rhyme or reason.

# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()

# Look a some forecasts
fit |> forecast() |> autoplot(recent_production)

What do you conclude?

Based on the given graphs it looks like there is a little variance of \(\pm 30\) for any given quarter but overall the seasonality and the amounts seems to stay relatively consistent. In the lag plot there seem to be periods of growth or decline by the lags but then they swing wildly in the other direction which seems to indicate randomness.

5.4

Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

In this case NAIVE makes more sense since the data is yearly and there isn’t decernable seasonality from the yearly data.

# Extract data of interest
aus_exports <- global_economy |>
  filter(Country == "Australia") |>
  select(Exports) |>
  print()

## # A tsibble: 58 x 2 [1Y]
##    Exports  Year
##      <dbl> <dbl>
##  1    13.0  1960
##  2    12.4  1961
##  3    13.9  1962
##  4    13.0  1963
##  5    14.9  1964
##  6    13.2  1965
##  7    12.9  1966
##  8    12.9  1967
##  9    12.3  1968
## 10    12.0  1969
## # ℹ 48 more rows

# Define and estimate a model
fit <- aus_exports |> model(NAIVE(Exports))
# Look at the residuals
fit |> gg_tsresiduals()

# Look a some forecasts
fit |> forecast() |> autoplot(aus_exports)

For the Bricks it makes sense to use the SNAIVE model since it can read into the seasonlity of the quarterly data that is present in the bricks to have a better prediction.

# Extract data of interest
bricks <- aus_production |>
  select(Bricks) |>
  filter(!is.na(Bricks)) |>
  print()

## # A tsibble: 198 x 2 [1Q]
##    Bricks Quarter
##     <dbl>   <qtr>
##  1    189 1956 Q1
##  2    204 1956 Q2
##  3    208 1956 Q3
##  4    197 1956 Q4
##  5    187 1957 Q1
##  6    214 1957 Q2
##  7    227 1957 Q3
##  8    222 1957 Q4
##  9    199 1958 Q1
## 10    229 1958 Q2
## # ℹ 188 more rows

# Define and estimate a model
fit <- bricks |> model(SNAIVE(Bricks))
# Look at the residuals
fit |> gg_tsresiduals()

# Look a some forecasts
fit |> forecast() |> autoplot(bricks)

5.7

For your retail time series (from Exercise 7 in Section 2.10):

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

Create a training dataset consisting of observations before 2011 using

myseries_train <- myseries |>
  filter(year(Month) < 2011)

Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train |>
  model(SNAIVE(Turnover))

Check the residuals.

fit |> gg_tsresiduals()

Do the residuals appear to be uncorrelated and normally distributed?

The residuals do seem to have a clear correlation because the there appears to be a bit of a function to the lag in the ACF graph. The residuals are normally distributed based on the histogram.

Produce forecasts for the test data

fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))
fc |> autoplot(myseries)

Compare the accuracy of your forecasts against the actual values.

fit |> accuracy()

## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Norther… Clothin… SNAIV… Trai… 0.439  1.21 0.915  5.23  12.4     1     1 0.768

fc |> accuracy(myseries)

## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Nort… Clothin… Test  0.836  1.55  1.24  5.94  9.06  1.36  1.28 0.601

How sensitive are the accuracy measures to the amount of training data used?

Based on the chart produced down below with different sets of training data it is pretty clear that the more training data there is the more accurate the model is. Despite the weird fluctuation with the second set of data but other than that almost every form of error decreases while the amount of training data increases. I initially just tested by changed the ranges of the cutoff but that led me to realize the models with more training had less data to have to match up with so I also added a cutoff to the models so they had to compete with the same range of data after training. It gave the models with more training data an unfair advantage to have less follow-up data to have to compare to.

myseries_train <- myseries |>
  filter(year(Month) < 1990)
my_series_train_plus5 <- myseries |>
  filter(year(Month) < 1995)
fit <- myseries_train |>
  model(SNAIVE(Turnover))
fc <- fit |>
  forecast(new_data = anti_join(my_series_train_plus5, myseries_train))
fc |> autoplot(my_series_train_plus5)

fit_acc <- accuracy(fit)
fc_acc <- accuracy(fc, my_series_train_plus5)

for (x in c(1995, 2000, 2005, 2010)){
  myseries_train <- myseries |>
    filter(year(Month) < x)
  my_series_train_plus5 <- myseries |>
    filter(year(Month) < x + 5)
  fit <- myseries_train |>
    model(SNAIVE(Turnover))
  fc <- fit |>
    forecast(new_data = anti_join(my_series_train_plus5, myseries_train))
  fc |> autoplot(my_series_train_plus5) |> print()
  fit_acc <- rbind(fit_acc, accuracy(fit))
  fc_acc <- rbind(fc_acc, accuracy(fc, my_series_train_plus5))
}

fit_acc <- cbind(year_cutoff = c(1990,1995, 2000, 2005, 2010), fit_acc)
fc_acc <- cbind(year_cutoff = c(1990,1995, 2000, 2005, 2010), fc_acc)
fit_acc |> print()

##   year_cutoff              State
## 1        1990 Northern Territory
## 2        1995 Northern Territory
## 3        2000 Northern Territory
## 4        2005 Northern Territory
## 5        2010 Northern Territory
##                                              Industry           .model    .type
## 1 Clothing, footwear and personal accessory retailing SNAIVE(Turnover) Training
## 2 Clothing, footwear and personal accessory retailing SNAIVE(Turnover) Training
## 3 Clothing, footwear and personal accessory retailing SNAIVE(Turnover) Training
## 4 Clothing, footwear and personal accessory retailing SNAIVE(Turnover) Training
## 5 Clothing, footwear and personal accessory retailing SNAIVE(Turnover) Training
##          ME      RMSE       MAE       MPE     MAPE MASE RMSSE      ACF1
## 1 0.5222222 0.6137318 0.5444444 13.943347 14.73700    1     1 0.1514912
## 2 0.5376812 0.8791465 0.6884058 10.896813 13.67569    1     1 0.6668788
## 3 0.3472868 1.3961742 1.0248062  5.323851 16.77478    1     1 0.8439995
## 4 0.4010582 1.2843791 0.9587302  5.513408 14.44744    1     1 0.8164384
## 5 0.4397590 1.2182185 0.9192771  5.369537 12.68836    1     1 0.7948338

fc_acc |> print()

##   year_cutoff           .model              State
## 1        1990 SNAIVE(Turnover) Northern Territory
## 2        1995 SNAIVE(Turnover) Northern Territory
## 3        2000 SNAIVE(Turnover) Northern Territory
## 4        2005 SNAIVE(Turnover) Northern Territory
## 5        2010 SNAIVE(Turnover) Northern Territory
##                                              Industry .type       ME     RMSE
## 1 Clothing, footwear and personal accessory retailing  Test 2.250000 2.380126
## 2 Clothing, footwear and personal accessory retailing  Test 0.815000 1.509801
## 3 Clothing, footwear and personal accessory retailing  Test 2.148333 2.324256
## 4 Clothing, footwear and personal accessory retailing  Test 1.746667 2.233234
## 5 Clothing, footwear and personal accessory retailing  Test 0.535000 1.127460
##         MAE       MPE      MAPE      MASE     RMSSE      ACF1
## 1 2.2500000 40.161467 40.161467 4.1326531 3.8781211 0.6415629
## 2 1.0783333  9.289529 14.545537 1.5664211 1.7173490 0.8144617
## 3 2.1516667 23.993435 24.050906 2.0995840 1.6647322 0.6926770
## 4 1.8233333 14.673009 15.521786 1.9018212 1.7387653 0.6743633
## 5 0.8416667  3.335469  6.470488 0.9155745 0.9254992 0.3727848

DATA624_HW3_CameronGray

Cameron Gray

2025-02-18

5.1

5.2

5.3

5.4

5.7