Data 624 Homework 3

Exercise 5.1

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:.

Australian Population (global_economy)
Bricks (aus_production)
NSW Lambs (aus_livestock)
Household wealth (hh_budget). -Australian takeaway food turnover (aus_retail).

A. Australian Population (global_economy).

global_economy %>%
  filter(Code == 'AUS') %>% 
  model(RW(Population ~ drift())) %>% 
  forecast(h = '15 years') %>% 
  autoplot(global_economy) +
  labs(title = 'Australian Population Drift Forecast')

B. Bricks (aus_production).

aus_production %>% 
  # Remove NAs from the "Bricks" column.
  drop_na(Bricks) %>%
  # Apply a seasonal naive forecast and plot the data.
  model(SNAIVE(Bricks)) %>% 
  forecast(h = '15 years') %>% 
  autoplot(aus_production) +
  labs(title = 'Australian Clay Brick Production Seasonal Naive Forecast')

C. NSW Lambs (aus_livestock).

# Disable scientific numbers for readability purposes.
options(scipen = 999)

# Filter the "aus_livestock" dataset by Animal (Lambs) and State (New South Wales).
NSWLambs <- aus_livestock %>% 
  filter(Animal == 'Lambs' & State == 'New South Wales')

# Apply a seasonal naive forecast and plot the data.
NSWLambs %>%
  model(SNAIVE(Count)) %>% 
  forecast(h = '15 years') %>% 
  autoplot(NSWLambs) +
  labs(title = 'New South Wales Lambs Seasonal Naive Forecast')

D. Household wealth (hh_budget).

hh_budget %>%
  model(RW(Wealth ~ drift())) %>% 
  forecast(h = '15 years') %>% 
  autoplot(hh_budget) +
  labs(title = 'Household Wealth Drift Forecast')

E. Australian takeaway food turnover (aus_retail).

TakeAwayFoodTurnover <- aus_retail %>%
  filter(Industry == 'Cafes, restaurants and takeaway food services')
TakeAwayFoodTurnover %>%
  model(RW(Turnover ~ drift())) %>% 
  forecast(h = '15 years') %>% 
  autoplot(TakeAwayFoodTurnover) +
  labs(title = 'Australian Takeaway Food Turnover Drift Forecast')

Exercise 5.2

Use the Facebook stock price data set (gafa_stock) to do the following:.

a. Produce a time plot of the series.

b. Produce forecasts using the drift method and plot them.

c. Show that the forecasts are identical to extending the line drawn between the first and last observations.

d. Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

A. Produce a time plot of the series.

facebook_stock <- gafa_stock %>% 
  filter(Symbol == 'FB')

facebook_stock %>% autoplot(Close) +
  labs(title = 'Facebook Stock Time Plot',
       y = 'Closing Value',
       x = 'Date')

B. Produce forecasts using the drift method and plot them.

# Due to an issue with the "gafa_stock" dataset, I was getting the following error -
# "Error: Problem with `mutate()` input `RW(Close ~ drift())`. x `new_data.tbl_ts(.data, round(n))`
# can't handle tsibble of irregular interval. ℹ Input `RW(Close ~ drift())` is `(function (object, ...)".
# To fix this issue, I altered the suggested fix for this issue documented here - https://github.com/tidyverts/tsibbledata/issues/8. 
facebook_stock <- gafa_stock %>%
  filter(Symbol == "FB", year(Date) >= 2018) %>%
  mutate(Date = Date) %>%
  update_tsibble(index = Date, regular = TRUE)

fb_forecast <- facebook_stock %>%
  fill_gaps() %>% 
  model(RW(Close ~ drift())) %>%
  forecast(h = 15)

fb_forecast %>% autoplot(facebook_stock) +
  labs(title = 'Facebook Stock Drift Forecast',
       y = 'Closing Value',
       x = 'Date')

C. Show that the forecasts are identical to extending the line drawn between the first and last observations.

fb_forecast %>% autoplot(facebook_stock) +
  labs(title = 'Facebook Stock Drift Forecast',
       y = 'Closing Value',
       x = 'Date') +
  geom_segment(
    aes(
      xend = Date[length(Date)],
      yend = Close[length(Close)],
      x = Date[1],
      y = Close[1]
    ),
    linetype = 'dashed',
    color = '#DC143C',
    linetype = 'dashed'
  )

D. Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

facebook_stock <- gafa_stock %>%
  filter(Symbol == "FB", year(Date) >= 2018) %>%
  mutate(Date = Date) %>%
  update_tsibble(index = Date, regular = TRUE)

fb_forecast <- facebook_stock %>%
  fill_gaps() %>% 
  model(
      Naive = NAIVE(Close),
      `Seasonal Naive` = SNAIVE(Close),
      Mean = MEAN(Close),
      Drift = RW(Close ~ drift())
  ) %>%
  forecast(h = 150)

fb_forecast %>% autoplot(facebook_stock, level = FALSE) +
  labs(title = 'Facebook Stock Forecasts Multiple Forecast Models',
       y = 'Closing Value',
       x = 'Date')

Due to the volatility of stock values (they tend to rise and fall unpredictably), I would say the Drift forecast model is the best fit for this kind of data. The above data is linear in nature and can go up and down which is also true of drift forecasts which makes it the best candidate for capturing upward and downward trends in the data.

Exercise 5.3

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

# Extract data of interest.
recent_production <- aus_production %>%
  filter(year(Quarter) >= 1992)
# Define and estimate a model.
fit <- recent_production %>% model(SNAIVE(Beer))
# Look at the residuals.
fit %>% gg_tsresiduals()

# Look at some forecasts.
fit %>% forecast() %>% autoplot(recent_production)

A. What do you conclude?

According to the .resid plot, the residuals are close to, but not normal. The acf plot shows a large spike for the 4th lag of the first quarter. The fact that this spike is outside the bounds for the residuals to be considered white noise tells us that they are not white noise. Additionally, the forecasts plot shows a strong seasonal pattern for beer production.

Exercise 5.4

Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

A. Repeat the previous exercise using the Australian Exports series from global_economy.

australian_exports <- global_economy %>% 
  filter(Code == 'AUS')

# Define and estimate the Naive model.
australian_exports_fit <- australian_exports %>% model(NAIVE(Exports))

# Look at the residuals.
australian_exports_fit %>% gg_tsresiduals()

# Look at the forecasts.
australian_exports_fit %>% forecast() %>% autoplot(australian_exports)

A Conclusion. What do you conclude?

According to the .resid plot in this case, the residuals appear to be close to normal (more so than those of the Beer Production data). The acf plot shows a spike around the 1st lag that falls outside the bounds for residual white noise suggesting that the residuals should not be considered as white noise. The forecasts plot for this case does not show a strong seasonal pattern; rather it displays a cyclical and trending pattern.

B. Repeat the previous exercise using the Bricks series from aus_production.

# Define and estimate the SNaive model.
australian_bricks_fit <- aus_production %>% model(SNAIVE(Bricks))

# Look at the residuals.
australian_bricks_fit %>% gg_tsresiduals()

# Look at the forecasts.
australian_bricks_fit %>% forecast() %>% autoplot(aus_production)

B Conclusion. What do you conclude?

The .resid plot for the Bricks series suggests that the residuals are not normally distributed due to the long left tail. The acf plot shows numerous spikes that fall outside the bounds for residual white noise suggesting that the residuals are not white noise. The forecasts plot for this series shows seasonality.

Exercise 5.7

For your retail time series (from Exercise 8 in Section 2.10):.

a. Create a training dataset consisting of observations before 2011 using:

myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

myseries_train <- myseries %>%
  filter(year(Month) < 2011)

b. Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

c. Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train %>%
  model(SNAIVE(Turnover))

retail_forecast <- fit %>% 
  forecast()

d. Check the residuals.

fit %>% gg_tsresiduals()

Do the residuals appear to be uncorrelated and normally distributed?

The residuals do look like they are close to normal, and they do not look like white noise according to the ACF plot so therefore they are correlated.

e. Produce forecasts for the test data

fc <- fit %>%
  forecast(new_data = anti_join(myseries, myseries_train))
fc %>% autoplot(myseries)

f. Compare the accuracy of your forecasts against the actual values.

fit %>% accuracy()

## # A tibble: 1 x 12
##   State  Industry  .model  .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>  <chr>     <chr>   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 South… Liquor r… SNAIVE… Trai…  1.28  2.62  2.00  5.89  10.8  1.00     1 0.677

fc %>% accuracy(myseries)

## # A tibble: 1 x 12
##   .model   State  Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>  <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(… South… Liquor … Test   5.57  7.07  5.73  11.0  11.4  2.86  2.70 0.571

g. How sensitive are the accuracy measures to the amount of training data used?

The accuracy measures are highly sensitive to the training/test data split due to the fact that there is a lot of trend and variability within the data. Going off of the "accuracy()" table results, Percentage based measurements (i.e. MAPE, MPE, etc.) are less effected by this than the Mean based error measurements (MAE, ME, etc.).

Data 624 Homework 3

Week 4 Forecasting

Stephen Haslett

9/14/2021

Exercise 5.1

Exercise 5.2

Exercise 5.3

Exercise 5.4

Exercise 5.7