DATA624 HW3

Do exercises 5.1, 5.2, 5.3, 5.4 and 5.7 in the Hyndman book.

Ex 5.1

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)
Bricks (aus_production)
NSW Lambs (aus_livestock)
Household wealth (hh_budget).
Australian takeaway food turnover (aus_retail).

library(fpp3)

global_economy |>
  filter(Country == 'Australia')|>
  #model(NAIVE(Population))|>
  model(RW(Population~drift()))|>
  forecast(h=10)|>
  autoplot(global_economy)

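The series is annual, so it has no seasonal pattern, and it trends steadily upward. RW with drift makes sense because the drift term carries the historical trend forward.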

aus_production |>
  filter(!is.na(Bricks))|>
  model(SNAIVE(Bricks))|>
  forecast(h=10)|>
  autoplot(aus_production)

The series has a seasonal pattern, so SNAIVE makes sense.

aus_livestock |>
  filter(Animal=='Lambs', State=='New South Wales')|>
  model(SNAIVE(Count))|>
  forecast(h=10)|>
  autoplot(aus_livestock)

I prefer SNAIVE to NAIVE for this one.

hh_budget |>
  filter(Country=='Australia')|>
  model(RW(Wealth~drift()))|>
  forecast(h=10)|>
  autoplot(hh_budget)

The series is annual, so it is not seasonal. I prefer RW with drift to NAIVE.

aus_retail |>
  filter(Industry=='Takeaway food services', State=='New South Wales')|>
  #model(RW(Turnover~drift()))|>
  model(SNAIVE(Turnover~lag('year')))|>
  forecast(h=12)|>
  autoplot(aus_retail)

SNAIVE is the best choice in this case.
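To make the choice between the three benchmarks more concrete, here is a minimal base-R sketch of what each method computes as a point forecast. The series `y` is a made-up toy quarterly series (not one of the data sets above), and the names `naive_fc`, `snaive_fc` and `drift_fc` are just illustrative.

```r
# Toy quarterly series (hypothetical values, for illustration only)
y <- c(10, 14, 8, 12, 11, 15, 9, 13)
m <- 4          # seasonal period (quarterly)
h <- 4          # forecast horizon
n <- length(y)

# Naive: every forecast equals the last observation
naive_fc <- rep(y[n], h)

# Seasonal naive: repeat the last observed season
snaive_fc <- rep(tail(y, m), length.out = h)

# Drift: last observation plus h times the average historical change
drift_fc <- y[n] + (1:h) * (y[n] - y[1]) / (n - 1)

print(rbind(naive_fc, snaive_fc, drift_fc))
```

The naive and drift forecasts ignore the seasonal pattern entirely, which is why SNAIVE is preferred for the seasonal series above and drift for the trending annual ones.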

Ex 5.2

Use the Facebook stock price (data set gafa_stock) to do the following:

Produce a time plot of the series.
Produce forecasts using the drift method and plot them.
Show that the forecasts are identical to extending the line drawn between the first and last observations.
Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

gafa_stock|>
  filter(Symbol=='FB')|>
  autoplot(Close)

# Re-index based on trading days
fb_stock <- gafa_stock |>
  filter(Symbol == "FB") |>
  mutate(day = row_number()) |>
  update_tsibble(index = day, regular = TRUE)

# Fit the models
fb_stock |>
  model(
    Drift = RW(Close ~ drift())
  )|>
  forecast(h=180)|>
  autoplot()+
  autolayer(fb_stock, Close, colour = "black") +
  labs(y = "$US",
       title = "Facebook daily closing stock prices") +
  guides(colour = guide_legend(title = "Forecast"))

# Add a line
fb_stock |>
  model(
    Drift = RW(Close ~ drift())
  )|>
  forecast(h=180)|>
  autoplot()+
  autolayer(fb_stock, Close, colour = "black") +
  # Line through the first and last observations (no hard-coded row count)
  geom_line(data = data.frame(x = fb_stock$day[c(1, nrow(fb_stock))],
                              y = fb_stock$Close[c(1, nrow(fb_stock))]),
            aes(x = x, y = y), colour = "red") +
  labs(y = "$US",
       title = "Facebook daily closing stock prices") +
  guides(colour = guide_legend(title = "Forecast"))

# Fit the models
fb_stock |>
  model(
   # `Naïve` = NAIVE(Close)
     SN = SNAIVE(Close)
  )|>
  forecast(h=180)|>
  autoplot()+
  autolayer(fb_stock, Close, colour = "black") +
  labs(y = "$US",
       title = "Facebook daily closing stock prices") +
  guides(colour = guide_legend(title = "Forecast"))

## Warning: 1 error encountered for SN
## [1] Non-seasonal model specification provided, use RW() or provide a different lag specification.

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning -Inf

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning -Inf

## Warning: Removed 180 rows containing missing values (`()`).

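SNAIVE fails on this series: after re-indexing by trading day there is no seasonal period, so the model returns the error shown above and no forecasts are plotted. NAIVE (commented out above) gives a flat forecast at the last observed value and does not follow the historical trend. The drift method is a good approximation because it extends the overall trend between the first and last observations.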
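The claim that the drift forecasts are the same as extending the line between the first and last observations can also be checked numerically. The sketch below uses a simulated series in base R rather than the Facebook data, so the numbers are only illustrative. The drift forecast is y_T + h(y_T - y_1)/(T - 1), and the average of the one-step changes is exactly (y_T - y_1)/(T - 1).

```r
set.seed(1)
y <- 100 + cumsum(rnorm(250))   # simulated prices, NOT the Facebook data
n <- length(y)
h <- 1:20                       # forecast horizons

# Drift method: last value plus h times the average historical change
drift_fc <- y[n] + h * mean(diff(y))

# Straight line through (1, y[1]) and (n, y[n]), evaluated at time n + h
slope   <- (y[n] - y[1]) / (n - 1)
line_fc <- y[1] + slope * (n + h - 1)

# The two sets of values agree up to floating-point error
same <- isTRUE(all.equal(drift_fc, line_fc))
print(same)
```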

Ex 5.3

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)

# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))

# Look at the residuals
fit |> gg_tsresiduals()

# Look at some forecasts
fit |> forecast() |> autoplot(recent_production)

# Try the Ljung-Box test using l = 2m; the data are quarterly, so m = 4 and l = 8
fit |> augment() |> features(.innov, ljung_box, lag = 8)

The residuals have a mean close to zero and are approximately normally distributed. The ACF shows a significant spike at lag 4 and little autocorrelation at the other lags. The data are seasonal and the Ljung-Box p-value is small, so we conclude that the residuals are distinguishable from a white noise series: the seasonal naive method leaves some structure in the data unexplained, even though the forecasts look reasonable.

Ex 5.4

Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

#Australian Exports

# Extract data of interest
aus_exports <- global_economy |>
  filter(Country == 'Australia')

# Define and estimate a model
fit_aus <- aus_exports |> model(NAIVE(Exports))

# Look at the residuals
fit_aus |> gg_tsresiduals()

# Look at some forecasts
fit_aus |> forecast() |> autoplot(aus_exports)

# Try the Ljung-Box test using l = 10
fit_aus |> augment() |> features(.innov, ljung_box, lag = 10)

SNAIVE() gives an error here because the data are annual and therefore have no seasonal period, so NAIVE() is the only option. I use lag = 10 because the data are non-seasonal. The p-value is not small, so we conclude that the residuals are indistinguishable from a white noise series.
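For reference, the Ljung-Box statistic can be computed by hand as Q* = T(T + 2) * sum over k = 1..l of r_k^2 / (T - k), where r_k is the lag-k autocorrelation of the residuals, and compared with a chi-squared distribution with l degrees of freedom. The sketch below uses a simulated random walk (not the Exports series) and checks the hand calculation against base R's `Box.test()`. I am assuming that the `ljung_box` feature used above reports the same statistic.

```r
set.seed(42)
y <- cumsum(rnorm(60))   # simulated random walk (hypothetical data)
e <- diff(y)             # naive residuals: y[t] - y[t-1]
n <- length(e)
l <- 10                  # lag used for non-seasonal data

# Sample autocorrelations at lags 1..l (drop lag 0)
r <- acf(e, lag.max = l, plot = FALSE)$acf[-1]

# Ljung-Box statistic and p-value computed by hand
q_manual <- n * (n + 2) * sum(r^2 / (n - 1:l))
p_manual <- pchisq(q_manual, df = l, lower.tail = FALSE)

# Same test using base R
bt <- Box.test(e, lag = l, type = "Ljung-Box")
print(c(manual = q_manual, box_test = unname(bt$statistic)))
```

A large p-value means the autocorrelations up to lag l are jointly small enough to be consistent with white noise.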

#Bricks from aus_production

#Use recent production from above

# Define and estimate a model
fit_bricks <- recent_production |> model(SNAIVE(Bricks))

# Look at the residuals
fit_bricks |> gg_tsresiduals()

# Look at some forecasts
fit_bricks |> forecast() |> autoplot(recent_production)

# Try the Ljung-Box test using l = 2m; the data are quarterly, so m = 4 and l = 8
fit_bricks |> augment() |> features(.innov, ljung_box, lag = 8)

The residuals do not have a mean close to zero, and their distribution is left-skewed. The ACF has spikes at lags 1 and 2. The data are seasonal and the Ljung-Box p-value is small, so we conclude that the residuals are distinguishable from a white noise series.

Ex 5.7

#For your retail time series (from Exercise 8 in Section 2.10):
set.seed(123456)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
#Create a training dataset consisting of observations before 2011 using

myseries_train <- myseries |>
  filter(year(Month) < 2011)

#Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

#Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit_turnover <- myseries_train |> model(SNAIVE(Turnover))

#Check the residuals.

fit_turnover |> gg_tsresiduals()

#Do the residuals appear to be uncorrelated and normally distributed?

#Produce forecasts for the test data

fc <- fit_turnover |>
  forecast(new_data = anti_join(myseries, myseries_train))

## Joining, by = c("State", "Industry", "Series ID", "Month", "Turnover")

fc |> autoplot(myseries)

#Compare the accuracy of your forecasts against the actual values.

fit_turnover |> accuracy()

fc |> accuracy(myseries)

#How sensitive are the accuracy measures to the amount of training data used?

The forecasts are not very good. The accuracy measures are not very sensitive to the amount of training data used.
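One reason for the low sensitivity is that seasonal naive point forecasts depend only on the last full season of the training data, so dropping earlier observations does not change them. The sketch below illustrates this in base R using the built-in `AirPassengers` series as a stand-in for the retail series; the split point and the helper `snaive_rmse()` are my own illustrative choices.

```r
y <- as.numeric(AirPassengers)   # monthly, Jan 1949 to Dec 1960 (stand-in data)
m <- 12                          # seasonal period
train_end <- 108                 # Dec 1957; the test set is 1958 to 1960
test <- y[(train_end + 1):length(y)]

# Test-set RMSE of seasonal naive forecasts for a given training start index
snaive_rmse <- function(train_start) {
  train <- y[train_start:train_end]
  fc <- rep(tail(train, m), length.out = length(test))  # repeat last season
  sqrt(mean((test - fc)^2))
}

rmse_all    <- snaive_rmse(1)    # training data from Jan 1949
rmse_recent <- snaive_rmse(61)   # training data from Jan 1954 only
print(c(all = rmse_all, recent = rmse_recent))
```

Unscaled test-set measures such as RMSE and MAE are therefore the same for both training windows. Scaled measures such as MASE can still change, because they are scaled by the errors on the training data.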