DATA 624 HW3

suppressPackageStartupMessages(library(forecast))
suppressPackageStartupMessages(library(fable))
suppressPackageStartupMessages(library(feasts))
suppressPackageStartupMessages(library(seasonal))
suppressPackageStartupMessages(library(lubridate))
suppressPackageStartupMessages(library(tidyr))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(fpp3))

Home Work #3

Question 5.1

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)

global_economy %>% 
  filter(Country == "Australia") %>% 
  model(RW(GDP ~ drift())) %>%
  forecast(h = "10 years") %>%
  autoplot(global_economy)

Bricks (aus_production)

aus_production %>%
  filter(!is.na(Bricks)) %>%
  model(SNAIVE(Bricks)) %>%
  forecast(h = "5 years") %>%
  autoplot(aus_production)

## Warning: Removed 20 row(s) containing missing values (geom_path).

NSW Lambs (aus_livestock)

aus_livestock %>% 
  filter(Animal == "Lambs") %>%
  filter(State == "New South Wales") %>% 
  model(RW(Count ~ drift())) %>%
  forecast(h = "5 years") %>%
  autoplot(aus_livestock)

Household wealth (hh_budget).

hh_budget %>%
  filter(!is.na(Savings)) %>%
  model(NAIVE(Savings)) %>%
  forecast(h = "10 years") %>%
  autoplot(hh_budget)

Australian takeaway food turnover (aus_retail).

takeaway <- aus_retail %>%
  filter(Industry == "Takeaway food services") %>%
  summarise(Turnover = sum(Turnover)) 

takeaway %>%
  model(SNAIVE(Turnover ~ drift())) %>%
  forecast(h = "5 years") %>%
  autoplot(takeaway)

Question 5.2

Use the Facebook stock price (data set gafa_stock) to do the following:

Produce a time plot of the series.

fb_stock <- gafa_stock %>%
  filter(Symbol == "FB")
fb_stock %>%
  autoplot(Close)

Produce forecasts using the drift method and plot them.

fb_stock <- fb_stock %>%
mutate(trading_day = row_number()) %>% update_tsibble(index = trading_day, regular = TRUE)

fb_stock %>%
  model(RW(Close ~ drift())) %>%
  forecast(h = 100) %>%
  autoplot(fb_stock)

Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

fb_stock %>%
  model(NAIVE(Close)) %>%
  forecast(h = 100) %>%
  autoplot(fb_stock)

Question 5.3

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

Extract data of interest

recent_production <- aus_production %>%
  filter(year(Quarter) >= 1992)

Define and estimate a model

fit <- recent_production %>% model(SNAIVE(Beer))

Look at the residuals

fit %>% gg_tsresiduals()

## Warning: Removed 4 row(s) containing missing values (geom_path).

## Warning: Removed 4 rows containing missing values (geom_point).

## Warning: Removed 4 rows containing non-finite values (stat_bin).

Look at some forecasts

fit %>% forecast() %>% autoplot(recent_production)

What do you conclude? The null is rejected - The Residuals are correlated

Question 5.4

Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

aus_exports <- global_economy %>%
  filter(Country == "Australia") 

aus_exports %>% autoplot(.vars = Exports)

Australian Exports series

# Define and estimate a model
fit <- aus_exports %>% model(NAIVE(Exports))

# Look at the residuals
fit %>% gg_tsresiduals()

## Warning: Removed 1 row(s) containing missing values (geom_path).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing non-finite values (stat_bin).

Bricks series from aus_production

# Define and estimate a model
fit <- aus_production %>% model(NAIVE(Bricks))

# Look at the residuals
fit %>% gg_tsresiduals()

## Warning: Removed 21 row(s) containing missing values (geom_path).

## Warning: Removed 21 rows containing missing values (geom_point).

## Warning: Removed 21 rows containing non-finite values (stat_bin).

Question 5.7

For your retail time series (from Exercise 8 in Section 2.10):

Create a training dataset consisting of observations before 2011 using

set.seed(12345678)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries_train <- myseries %>%
  filter(year(Month) < 2011)

Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train %>%
  model(SNAIVE(Turnover))

Check the residuals.

fit %>% gg_tsresiduals()

## Warning: Removed 12 row(s) containing missing values (geom_path).

## Warning: Removed 12 rows containing missing values (geom_point).

## Warning: Removed 12 rows containing non-finite values (stat_bin).

Do the residuals appear to be uncorrelated and normally distributed? Residuals are correlated and Normally distributed

Produce forecasts for the test data

fc <- fit %>%
    forecast(new_data = anti_join(myseries, myseries_train))

## Joining, by = c("State", "Industry", "Series ID", "Month", "Turnover")

fc %>% autoplot(myseries)

Compare the accuracy of your forecasts against the actual values.

fit %>% accuracy()

## # A tibble: 1 × 12
##   State  Industry   .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>  <chr>      <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 North… Clothing,… SNAIV… Trai… 0.439  1.21 0.915  5.23  12.4     1     1 0.768

fc %>% accuracy(myseries)

## # A tibble: 1 × 12
##   .model  State  Industry  .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>   <chr>  <chr>     <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE… North… Clothing… Test  0.836  1.55  1.24  5.94  9.06  1.36  1.28 0.601

How sensitive are the accuracy measures to the amount of training data used?

The accuracy measures are very sensitive to the amount of training data used, the more data used the more accurate the model.