Homework 3

5.1

5.1.1

The Australian population data is most appropriately modeled using the drift method, and we have forecast the population growth for 4 years.

aus_pop <- global_economy |>
  filter(Country == "Australia") |>
  select(Country, Year, Population)

aus_pop_fc <- aus_pop |>
  model(RW(Population ~ drift())) |>
  forecast(h = 4)

aus_pop_fc |>
  autoplot(aus_pop)

5.1.2

Due to the seasonality in Bricks the seasonal naive method was used.

bricks <- aus_production |>
  select(Quarter, Bricks) |>
  drop_na()

brick_fc <- bricks |>
  model(snaive = SNAIVE(Bricks)) |>
  forecast(h = 8)

brick_fc |>
  autoplot(bricks)

5.1.3

Given that there is an annual seasonality to lamb slaughter numbers the seasonal naive forecast is the best option.

lambs <- aus_livestock |>
  filter(Animal == "Lambs", State == "New South Wales") |>
  select(Month, Count)

lambs_fc <- lambs |>
  model(snaive = SNAIVE(Count)) |>
  forecast(h = 24)

lambs_fc |> autoplot(lambs)

5.1.4

Given the positive trend of the data a drift forcast was used. Due to the fact wealth is expressed as a percentage of disposable income no aggregation or summarisation was performed as no population data was provided making accurate weighting impossible.

wealth <- hh_budget |>
  select(Country, Year, Wealth)

wealth_fc <- wealth |>
  model(RW(Wealth ~ drift())) |>
  forecast(h = 5)

wealth_fc |>
  autoplot(wealth)

5.1.5

Due to the clear seasonal component a seasonal naive forecast was used.

takeaway <- aus_retail |>
  filter(Industry == "Takeaway food services") |>
  summarise(total_turnover = sum(Turnover)) 

takeaway_fc <- takeaway |>
  model(snaive = SNAIVE(total_turnover)) |>
  forecast(h = 36)

takeaway_fc |> 
  autoplot(takeaway)

5.2

5.2.1

The adjusted closing price of the stock was plotted as it more accurately tracks the value of the stock.

facebook <- gafa_stock |>
  filter(Symbol == "FB") |>
  mutate(day = row_number()) |>
  update_tsibble(index = day, regular = TRUE)

facebook |>
  ggplot(aes(x = Date, y = Adj_Close)) +
  geom_line() +
  labs(x = "Date", y = "Price (USD)", title = "Facebook Adjusted Closing Price 2014-2019")

5.2.2

The drift forecast for the Facebook adjusted close. A long forecast window was chosen to increase visibility.

facebook_fc <- facebook |>
  model(RW(Adj_Close ~ drift())) |>
  forecast(h = 180)

facebook_fc |>
  autoplot(facebook) +
  labs(title = "Facebook Drift Forecast", x = "Trading Day", y = "Price (USD)")

5.2.3

By drawing a line between the first and last points of the facebook data we can see it matches the drift model.

facebook_fc |>
  autoplot(facebook) + 
  geom_segment(x = first(facebook$day),
               y = first(facebook$Adj_Close),
               xend = last(facebook$day),
               yend = last(facebook$Adj_Close),
               color = "red")

5.2.4

Plotting the naive, drift, and a seasonally adjusted naive forecast we can see the seasonally adjusted and naive plots are identical. This indicates no seasonality. The naive forecast is best as the stock price is influenced by many factors that are not in the dataset. This means any trends cannot be adequately explained and therefore cannot be accurately modeled.

facebook_fc <- facebook |>
  model(drift = RW(Adj_Close ~ drift()),
        naive = NAIVE(Adj_Close),
        stlf = decomposition_model(
          STL(Adj_Close ~ trend(window = 7), robust = TRUE),
          NAIVE(season_adjust)
        )) |>
  forecast(h = 180)

facebook_fc |>
  autoplot(facebook) +
  labs(title = "Facebook Drift Forecast", x = "Trading Day", y = "Price (USD)")

5.3

Looking at the residuals plots we see the average is around 0 which is a good sign. The ACF plot does show one period of statistical significance but this would not be unusual given the number of datapoints and is not repeated on future intervals so we can conclude there is no significant correlation. The residuals do not appear normally distributed as there is a bit of a dip in the middle which means prediction intervals assuming normal distribution will be inaccurate.

# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()

# Look a some forecasts
fit |> forecast() |> autoplot(recent_production)

From these plots we can conclude the residuals look like white noise and the seasonal naive method is a reasonable forecasting method for this data.

5.4

5.4.1

In the case of the Australian exports we can see there is no seasonality in the data so we will use the naive forecast.

# Extract data of interest
aus_exports <- global_economy |>
  filter(Country == "Australia")

aus_fit <- aus_exports |>
  model(NAIVE(Exports))

aus_fit |>
  forecast() |>
  autoplot(aus_exports)

Looking at the residuals we see no statistically significant correlation and the mean of the residuals is close to 0. Additionally the residuals appear normally distributed, so we can conclude the naive method produces forecasts that account for all available information and the residuals are white noise.

aus_fit |> gg_tsresiduals()

Performing a Ljung-Box test we see the p value is approximately 0.089 which while large enough for us to accept the white noise hypothesis is close to 0.05 threshold.

aus_fit |> augment() |> features(.resid, ljung_box, lag = 10)

## # A tibble: 1 × 4
##   Country   .model         lb_stat lb_pvalue
##   <fct>     <chr>            <dbl>     <dbl>
## 1 Australia NAIVE(Exports)    16.4    0.0896

5.4.2

Repeating the exercise for the Bricks data we see a high degree of variance in the residuals with some large outliers. There is clear statistically significant autocorrelation and the residuals are not normally distributed. We can conclude the seasonal naive method is not a good forecasting method for this data and the residuals are not white noise.

brick_fit <- bricks |>
  model(SNAIVE(Bricks))

brick_fit |>
  forecast(h = 10) |>
  autoplot(bricks)

brick_fit |> gg_tsresiduals()

5.7

5.7.1

set.seed(38472)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries_train <- myseries |>
  filter(year(Month) < 2011)

5.7.2

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

5.7.3

fit <- myseries_train |>
  model(SNAIVE(Turnover))

fit |>
  forecast() |>
  autoplot(myseries_train)

5.7.4

Looking at the residuals we see significant correlation from the ACF in multiple lag periods. There are some large outliers in the residuals that could skew the average far enough from 0 to be meaningful. The distribution of the residuals is not normal as there is a rather long leading edge.

fit |> gg_tsresiduals()

5.7.5

fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))
fc |> autoplot(myseries)

Windowing the forecast vs. actual section of the data to make the scale easier to see.

fc |> autoplot(
  myseries |>
    filter(ym(Month) >= ym("2010 Jan")))

5.7.6

Looking at the error scores we can see the seasonal naive method is not a great forecast but not terrible either. Overall the degree of error is acceptable but could be much better.

fit |> accuracy()

## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Queensl… Footwea… SNAIV… Trai…  2.82  6.50  4.75  5.85  8.42     1     1 0.700

fc |> accuracy(myseries)

## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Quee… Footwea… Test   10.4  14.6  11.9  9.07  10.9  2.51  2.25 0.620

5.7.7

We can see that with the smaller dataset from 1995-2011 the accuracy measures are exactly the same. This is because we are using the seasonal naive forecasting method which forecasts by using the last observed value for the same season. The forecast (and subsequently the errors) would only change if we used a training set with fewer values than the last season, or if we ended the training set earlier.

smaller_train <- myseries |>
  filter(1995 < year(Month),
         year(Month) < 2011)

small_fit <- smaller_train |>
  model(SNAIVE(Turnover))

small_fc <- small_fit |>
  forecast(new_data = anti_join(myseries |>
                                  filter(1995 < year(Month)), smaller_train))

small_fit |> accuracy()

## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Queensl… Footwea… SNAIV… Trai…  2.92  8.00  6.18  3.58  7.45     1     1 0.707

small_fc |> accuracy(myseries)

## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Quee… Footwea… Test   10.4  14.6  11.9  9.07  10.9  2.51  2.25 0.620