Data 624 - Hw3

library(dplyr)
library(feasts)
library(fpp3)
library(ggfortify)
library(httr)
library(lubridate)
library(readr)
library(readxl)
library(seasonal)
library(stats)
library(tsibble)
library(tsibbledata)
library(tidyr)
library(USgas)

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)

global_economy %>% 
  filter(Country == "Australia") %>%
  model(RW(Population ~ drift())) %>%
  forecast(h = 6) %>%
  autoplot(global_economy)

Bricks (aus_production)

aus_production %>% 
  filter(!is.na(Bricks)) %>%
  model(SNAIVE(Bricks)) %>%
  forecast(h = 15) %>%
  autoplot(aus_production)

## Warning: Removed 20 rows containing missing values (`geom_line()`).

NSW Lambs (aus_livestock)

aus_livestock %>%
  filter(State == "New South Wales", 
         Animal == "Lambs") %>%
  model(NAIVE(Count)) %>%
  forecast(h = 25) %>%
  autoplot(aus_livestock)

Household wealth (hh_budget)

hh_budget %>%
  model(RW(Wealth ~ drift())) %>%
  forecast(h = 6) %>%
  autoplot(hh_budget)

Australian takeaway food turnover (aus_retail)

aus_retail %>%
  filter(Industry == "Cafes, restaurants and takeaway food services") %>%
  model(RW(Turnover ~ drift())) %>%
  forecast(h = 35) %>%
  autoplot(aus_retail) +
  facet_wrap(~State, scales = "free")

Use the Facebook stock price (data set gafa_stock) to do the following:

Produce a time plot of the series.

fb_stock <- gafa_stock %>%
  filter(Symbol == "FB") %>%
  mutate(day = row_number()) %>%
  update_tsibble(index = day, regular = TRUE)

fb_stock %>%
  autoplot(Open)

Produce forecasts using the drift method and plot them.

fb_stock %>%
  model(RW(Open ~ drift())) %>%
  forecast(h = 64) %>%
  autoplot(fb_stock)

Show that the forecasts are identical to extending the line drawn between the first and last observations.

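fb_stock %>%
  model(RW(Open ~ drift())) %>%
  forecast(h = 63) %>%
  autoplot(fb_stock) +
  # Take the segment endpoints from the data instead of typing them in by hand
  annotate("segment",
           x = 1, y = first(fb_stock$Open),
           xend = nrow(fb_stock), yend = last(fb_stock$Open),
           colour = "green", linetype = "dashed")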
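The plot gives a visual check only. A numeric check can be sketched as follows, assuming the series has no missing `Open` values: the drift forecast at horizon h should equal the last observation plus h times the slope of the line through the first and last observations. The names `slope`, `line_fc`, and `drift_fc` are introduced here for illustration.

```r
library(fpp3)

# Rebuild the re-indexed Facebook series so the block is self-contained
fb_stock <- gafa_stock %>%
  filter(Symbol == "FB") %>%
  mutate(day = row_number()) %>%
  update_tsibble(index = day, regular = TRUE)

h <- 63
n <- nrow(fb_stock)

# Slope of the straight line joining the first and last observations
slope <- (last(fb_stock$Open) - first(fb_stock$Open)) / (n - 1)

# Extend that line h steps beyond the last observation
line_fc <- last(fb_stock$Open) + slope * seq_len(h)

# Point forecasts from the drift method
drift_fc <- fb_stock %>%
  model(RW(Open ~ drift())) %>%
  forecast(h = h)

# Compare the two sets of values up to numerical tolerance
all.equal(drift_fc$.mean, line_fc)
```

If the two vectors agree, the drift forecasts lie exactly on the extended line.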

Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?
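The mean method performs poorly because it ignores the current price level, and the naïve method gives a flat forecast at the last observed value. The drift method, which also carries forward the average historical change, is probably the most effective of the three.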

fb_stock %>%
  model(Mean = MEAN(Open),
        `Naïve` = NAIVE(Open),
        Drift = NAIVE(Open ~ drift())) %>%
  forecast(h = 64) %>%
  autoplot(fb_stock, level = NULL)
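The comparison above is visual. One way to put numbers behind it is to hold out the last 64 trading days and score each benchmark on them. This is only a sketch: the 64-day hold-out and the names `fb_train`, `fb_fc`, and `fb_acc` are assumptions made for illustration, not part of the exercise.

```r
library(fpp3)

# Rebuild the re-indexed Facebook series so the block is self-contained
fb_stock <- gafa_stock %>%
  filter(Symbol == "FB") %>%
  mutate(day = row_number()) %>%
  update_tsibble(index = day, regular = TRUE)

# Hold out the final 64 trading days as a test set (assumed split)
fb_train <- fb_stock %>%
  filter(day <= max(day) - 64)

# Fit the three benchmarks on the training portion and forecast the hold-out
fb_fc <- fb_train %>%
  model(Mean  = MEAN(Open),
        Naive = NAIVE(Open),
        Drift = RW(Open ~ drift())) %>%
  forecast(h = 64)

# Score the forecasts against the observed values
fb_acc <- fb_fc %>%
  accuracy(fb_stock) %>%
  select(.model, RMSE, MAE, MAPE)

fb_acc
```

Lower RMSE, MAE, and MAPE indicate better forecasts on the held-out days. A single split like this is only one piece of evidence, and the ranking may change with a different hold-out period.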

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.
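The residuals do not look like white noise. They are centered at about zero, but the ACF plot shows autocorrelation outside the significance bounds, and a portmanteau test p-value below 0.05 means the residuals are distinguishable from white noise. The seasonal naïve model therefore leaves information in the residuals and is not a good fit.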

# Extract data of interest
recent_production <- aus_production %>% filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production %>% model(SNAIVE(Beer))
# Look at the residuals
fit %>% gg_tsresiduals()

## Warning: Removed 4 rows containing missing values (`geom_line()`).

## Warning: Removed 4 rows containing missing values (`geom_point()`).

## Warning: Removed 4 rows containing non-finite values (`stat_bin()`).

# Look at some forecasts
fit %>% forecast() %>% autoplot(recent_production)
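The answer refers to a p-value, but the test call itself is not shown. A minimal sketch of a Ljung-Box test on these residuals follows. The choice of `lag = 8` (twice the seasonal period for quarterly data) and the names `fit_beer` and `beer_lb` are assumptions made here.

```r
library(fpp3)

# Same data and model as above, under a separate name to avoid overwriting `fit`
recent_production <- aus_production %>% filter(year(Quarter) >= 1992)
fit_beer <- recent_production %>% model(SNAIVE(Beer))

# Ljung-Box test on the innovation residuals
# lag = 8 is an assumed choice: two seasonal periods of quarterly data
beer_lb <- fit_beer %>%
  augment() %>%
  features(.innov, ljung_box, lag = 8)

beer_lb
```

A value of `lb_pvalue` below 0.05 would support the conclusion that the residuals are distinguishable from white noise.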

Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.
Australian Exports: the naïve model is adequate because the residuals have a mean of about zero and cannot be distinguished from white noise (both portmanteau p-values are above 0.05).
Bricks: the seasonal naïve model, SNAIVE(), does not fit well.

# Extract data of interest
ae <- global_economy %>%
  filter(Country == "Australia")

# Define and estimate a model
fit <- ae %>% model(NAIVE(Exports))

# Look at the residuals
fit %>% gg_tsresiduals() +
  ggtitle("Residual Plots for Australian Exports")

## Warning: Removed 1 row containing missing values (`geom_line()`).

## Warning: Removed 1 rows containing missing values (`geom_point()`).

## Warning: Removed 1 rows containing non-finite values (`stat_bin()`).

# Look at some forecasts
fit %>% forecast() %>% autoplot(ae) +
  ggtitle("Annual Australian Exports")

# Box-Pierce test
fit %>%
  augment() %>% 
  features(.innov, box_pierce, lag = 11, dof = 0)

## # A tibble: 1 × 4
##   Country   .model         bp_stat bp_pvalue
##   <fct>     <chr>            <dbl>     <dbl>
## 1 Australia NAIVE(Exports)    15.3     0.169

# Ljung-Box test
fit %>%
  augment() %>%
  features(.innov, ljung_box, lag = 11, dof = 0)

## # A tibble: 1 × 4
##   Country   .model         lb_stat lb_pvalue
##   <fct>     <chr>            <dbl>     <dbl>
## 1 Australia NAIVE(Exports)    17.3    0.0999
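The code above covers only the Exports series. The same steps for Bricks might look like the sketch below, assuming SNAIVE() is the choice for this quarterly series. The names `bricks`, `fit_bricks`, and `bricks_lb`, and the `lag = 8` setting, are introduced here for illustration.

```r
library(fpp3)

# Extract the Bricks series and drop the missing values at the end
bricks <- aus_production %>%
  filter(!is.na(Bricks)) %>%
  select(Quarter, Bricks)

# Define and estimate a seasonal naive model
fit_bricks <- bricks %>% model(SNAIVE(Bricks))

# Look at the residuals
fit_bricks %>% gg_tsresiduals()

# Ljung-Box test; lag = 8 is an assumed choice (two years of quarterly data)
bricks_lb <- fit_bricks %>%
  augment() %>%
  features(.innov, ljung_box, lag = 8)
bricks_lb

# Look at some forecasts
fit_bricks %>% forecast() %>% autoplot(bricks)
```

The residual plots and `lb_pvalue` can then be read the same way as for the Exports series.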

For your retail time series (from Exercise 8 in Section 2.10):

Create a training dataset consisting of observations before 2011 using

# set.seed(12345678)  # uncomment to make the random series selection reproducible
myseries <- aus_retail %>% filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_train <- myseries %>% filter(year(Month) < 2011)

Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train %>%
  model(SNAIVE(Turnover))

Check the residuals. Do the residuals appear to be uncorrelated and normally distributed?

fit %>% gg_tsresiduals()

## Warning: Removed 12 rows containing missing values (`geom_line()`).

## Warning: Removed 12 rows containing missing values (`geom_point()`).

## Warning: Removed 12 rows containing non-finite values (`stat_bin()`).
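The residual plots can be backed by formal tests. The sketch below applies a Ljung-Box test for autocorrelation and a Shapiro-Wilk test for normality. It assumes the seed from the commented-out line is used, so the series it selects may differ from the one shown in this report. The `lag = 24` setting and the names `resid_tbl`, `resid_lb`, and `resid_sw` are also assumptions.

```r
library(fpp3)

# Assumed seed, taken from the commented-out line in the exercise code
set.seed(12345678)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
myseries_train <- myseries %>% filter(year(Month) < 2011)

fit <- myseries_train %>% model(SNAIVE(Turnover))

# Innovation residuals from the fitted model
resid_tbl <- fit %>% augment()

# Ljung-Box test for autocorrelation; lag = 24 is two seasonal periods (assumed)
resid_lb <- resid_tbl %>% features(.innov, ljung_box, lag = 24)
resid_lb

# Shapiro-Wilk test for normality (missing residuals are dropped by the test)
resid_sw <- shapiro.test(resid_tbl$.innov)
resid_sw
```

Small p-values would suggest, respectively, that the residuals are autocorrelated and that they are not normally distributed.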

Produce forecasts for the test data

fc <- fit %>%  forecast(new_data = anti_join(myseries, myseries_train))

## Joining, by = c("State", "Industry", "Series ID", "Month", "Turnover")

fc %>% autoplot(myseries)

Compare the accuracy of your forecasts against the actual values.

fit %>% accuracy()

## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Western… Cafes, … SNAIV… Trai…  10.5  16.5  12.4  9.03  10.6     1     1 0.830
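The table above reports training-set accuracy only. To compare the forecasts against the actual values, the forecast object can be passed to `accuracy()` together with the full series. A minimal sketch follows, again assuming the seed from the commented-out line (so the selected series may differ from the one shown here). The names `myseries_test`, `train_acc`, and `test_acc` are introduced for illustration.

```r
library(fpp3)

# Assumed seed, taken from the commented-out line in the exercise code
set.seed(12345678)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
myseries_train <- myseries %>% filter(year(Month) < 2011)
myseries_test  <- myseries %>% filter(year(Month) >= 2011)

fit <- myseries_train %>% model(SNAIVE(Turnover))
fc  <- fit %>% forecast(new_data = myseries_test)

# Training accuracy (in-sample one-step errors)
train_acc <- fit %>% accuracy()

# Test accuracy (forecasts compared with the held-out actual values)
test_acc <- fc %>% accuracy(myseries)

bind_rows(train_acc, test_acc) %>%
  select(.model, .type, RMSE, MAE, MAPE, MASE)
```

The row with `.type` equal to "Test" is the one that answers the question. The training row is kept for comparison.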

How sensitive are the accuracy measures to the amount of training data used?
Accuracy generally improves as more training data is used, but only up to a point; beyond that, additional training data adds little.
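One way to examine this is to refit the model on training windows that start in different years and score each fit on the same test period. The sketch below assumes the seed from the commented-out line and an arbitrary set of start years. The names `start_years`, `series_k`, and `sensitivity` are hypothetical. Because seasonal naïve point forecasts are built only from the final year of the training data, measures that depend only on the forecasts may change little here.

```r
library(fpp3)

# Assumed seed, taken from the commented-out line in the exercise code
set.seed(12345678)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# Assumed training-window start years to compare
start_years <- c(1982, 1990, 2000, 2005)

sensitivity <- lapply(start_years, function(start) {
  # Keep only data from the chosen start year onward
  series_k <- myseries %>% filter(year(Month) >= start)
  train_k  <- series_k %>% filter(year(Month) < 2011)
  test_k   <- series_k %>% filter(year(Month) >= 2011)

  # Fit on the shortened training set and score on the common test period
  train_k %>%
    model(SNAIVE(Turnover)) %>%
    forecast(new_data = test_k) %>%
    accuracy(series_k) %>%
    mutate(train_start = start)
}) %>%
  bind_rows() %>%
  select(train_start, RMSE, MAE, MAPE, MASE)

sensitivity
```

Each row gives the test-set accuracy for one training window, so the rows can be compared directly.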