HW3 Data 624 Predictive Analytics 2024 Spring Term

Question 5.1
Question 5.2
Question 5.3
Question 5.4
Question 5.7

Do exercises 5.1, 5.2, 5.3, 5.4 and 5.7 in the Hyndman book. (https://oteinsom/fpp3/)

library(fpp3)
library(tidyverse)

Question 5.1

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

a.Australian Population (global_economy)

australian_population <- global_economy %>%
  filter(Country == "Australia")%>% 
  select(Population)

australian_population%>% 
  model(SNAIVE = SNAIVE(Population ~ lag("year")),
    Naive = NAIVE(Population),
    Drift = RW(Population ~ drift())
    )%>%
  forecast(h=10)%>%
  autoplot(australian_population, level = NULL)+
  labs(title = "Australian Population") +
  guides(colour = guide_legend(title = "Forecast"))

Comparing the forecast methods in Australian Population in the global_economy data set, the drift method seems to represent the linear growth the best.

b.Bricks (aus_production)

aus_brick <- aus_production %>% 
  filter(!is.na(Bricks)) %>%
  select(Bricks)

aus_brick%>%
  model(SNAIVE = SNAIVE(Bricks),
    Naive = NAIVE(Bricks),
    Drift = RW(Bricks ~ drift())
    )%>%
  forecast(h=10)%>%
  autoplot(aus_brick, level = NULL)+
  labs(title = "Brick Production in Australia") +
  guides(colour = guide_legend(title = "Forecast"))

The seasonal naive method projects the closest forecast for the quarterly estimates of brick production in Australia.

c.NSW Lambs (aus_livestock)

aus_NSW_Lambs <- aus_livestock %>%
  filter(State == "New South Wales",
         Animal == "Lambs",
         !is.na(Count))%>% 
  select(Count)

aus_NSW_Lambs%>%
  model(SNAIVE = SNAIVE(Count),
    Naive = NAIVE(Count),
    Drift = RW(Count ~ drift())
    )%>%
  forecast(h=50)%>%
  autoplot(aus_NSW_Lambs, level = NULL)+
  labs(title = "Lamb Production in New South Wales,Australia ") +
  guides(colour = guide_legend(title = "Forecast"))

The seasonal naive method projects the closest forecast for the year/month estimates of lamb production in New South Wales, Australia.

d.Household wealth (hh_budget).

hh_budget_wealth <- hh_budget %>%
  filter(!is.na(Wealth))%>% 
  select(Wealth)

hh_budget_wealth%>%
  model(SNAIVE = SNAIVE(Wealth),
    Naive = NAIVE(Wealth),
    Drift = RW(Wealth ~ drift())
    )%>%
  forecast(h=10)%>%
  autoplot(hh_budget_wealth, level = NULL)+
  labs(title = "Wealth for Australia, Japan, Canada and USA") +
  guides(colour = guide_legend(title = "Forecast"))+
  facet_wrap(~Country, scales = "free")

The drift method would be the most appropriate forecasting model.

e.Australian takeaway food turnover (aus_retail)

aus_retail_takeaway <- aus_retail %>%
  filter(Industry == "Takeaway food services",
         !is.na(Turnover))%>% 
  select(Turnover)

aus_retail_takeaway %>%
  model(SNAIVE = SNAIVE(Turnover),
    Naive = NAIVE(Turnover),
    Drift = RW(Turnover ~ drift())
    )%>%
  forecast(h=50)%>%
  autoplot(aus_retail_takeaway , level = NULL)+
  labs(title = "") +
  guides(colour = guide_legend(title = "Forecast"))+
  facet_wrap(~State, scales = "free")

For all the different State, the seasonal naive method is best for forecasting.

Question 5.2

Use the Facebook stock price (data set gafa_stock) to do the following:

a.Produce a time plot of the series.

gafa_stock_FB <-gafa_stock %>%
  filter(Symbol == "FB") %>%
  mutate(day = row_number()) %>%
  update_tsibble(index = day, regular = TRUE)

gafa_stock_FB%>%
  autoplot(Close)+
  labs(title = "Closing Facebook Stock Price")

b.Produce forecasts using the drift method and plot them.

gafa_stock_FB %>%
  model(RW(Close ~ drift()))%>%
  forecast(h = 100)%>%
  autoplot(gafa_stock_FB)+
  labs(title = "Closing Facebook Stock Price with Drift Forecast")

c.Show that the forecasts are identical to extending the line drawn between the first and last observations.

gafa_stock_FB %>%
  model(RW(Close ~ drift()))%>%
  forecast(h = 100)%>%
  autoplot(gafa_stock_FB)+
  labs(title = "Closing Facebook Stock Price with Drift Forecast") +  
  geom_segment(aes(x = day[1], y = Close[1], xend = day[nrow(gafa_stock_FB)], yend = Close[nrow(gafa_stock_FB)]),color = "red", linetype = 'dashed')

d.Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

gafa_stock_FB %>%
  model(Naive = NAIVE(Close),
    Drift = RW(Close ~ drift()),
    Mean = MEAN(Close)
    )%>%
  forecast(h=200)%>%
  autoplot(gafa_stock_FB , level = NULL)+
  labs(title = "") +
  guides(colour = guide_legend(title = "Forecast"))

The drift method produces the best results because we can see that there is some upward trend in the data.

Question 5.3

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.What do you conclude?

# Extract data of interest
recent_production <- aus_production %>%
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production %>%
  model(SNAIVE(Beer))
# Look at the residuals
fit %>%
  gg_tsresiduals()

# Look a some forecasts
fit %>%
  forecast() %>%
  autoplot(recent_production)

The residuals exhibit a mean close to zero, and there’s no notable correlation in the residual series. In addition, in the ACF graph, for white noise, approximately 95% of the spikes should fall within plus or minus 2 divided by the square root of T, where T represents the time series length. However, if one or more significant spikes extend beyond these bounds, it’s likely that the time series isn’t white noise.We could also test p-values with a Box-Pierce test or a Ljung-Box test to determine if the value is sufficiently small, indicating significance and not in fact white noise.

augment(fit) %>%
  features(.innov, box_pierce, lag=10)

## # A tibble: 1 × 3
##   .model       bp_stat bp_pvalue
##   <chr>          <dbl>     <dbl>
## 1 SNAIVE(Beer)    34.4  0.000160

augment(fit) %>%
  features(.innov, ljung_box, lag=10)

## # A tibble: 1 × 3
##   .model       lb_stat lb_pvalue
##   <chr>          <dbl>     <dbl>
## 1 SNAIVE(Beer)    37.8 0.0000412

Question 5.4

Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

Australian Exports series from global_economy

# Extract data of interest
global_economy_aus <- global_economy %>%
  filter(Country == "Australia")
# Define and estimate a model
fit_global_economy_aus <- global_economy_aus %>%
  model(NAIVE(Exports))
# Look at the residuals
fit_global_economy_aus %>%
  gg_tsresiduals()

# Look a some forecasts
fit_global_economy_aus %>%
  forecast(h = 5) %>%
  autoplot(global_economy_aus)

augment(fit_global_economy_aus) %>%
  features(.innov, box_pierce, lag=10)

## # A tibble: 1 × 4
##   Country   .model         bp_stat bp_pvalue
##   <fct>     <chr>            <dbl>     <dbl>
## 1 Australia NAIVE(Exports)    14.6     0.148

augment(fit_global_economy_aus) %>%
  features(.innov, ljung_box, lag=10)

## # A tibble: 1 × 4
##   Country   .model         lb_stat lb_pvalue
##   <fct>     <chr>            <dbl>     <dbl>
## 1 Australia NAIVE(Exports)    16.4    0.0896

Bricks series from aus_production

# Extract data of interest
aus_production_Bricks <- aus_production %>%
  filter(!is.na(Bricks))
# Define and estimate a model
fit_aus_production_Bricks  <- aus_production_Bricks  %>%
  model(SNAIVE(Bricks))
# Look at the residuals
fit_aus_production_Bricks %>%
  gg_tsresiduals()

# Look a some forecasts
fit_aus_production_Bricks %>%
  forecast(h = 20) %>%
  autoplot(aus_production_Bricks )

augment(fit_aus_production_Bricks) %>%
  features(.innov, box_pierce, lag=10)

## # A tibble: 1 × 3
##   .model         bp_stat bp_pvalue
##   <chr>            <dbl>     <dbl>
## 1 SNAIVE(Bricks)    292.         0

augment(fit_aus_production_Bricks) %>%
  features(.innov, ljung_box, lag=10)

## # A tibble: 1 × 3
##   .model         lb_stat lb_pvalue
##   <chr>            <dbl>     <dbl>
## 1 SNAIVE(Bricks)    301.         0

Question 5.7

For your retail time series (from Exercise 7 in Section 2.10): a.Create a training dataset consisting of observations before 2011 using

set.seed(1234)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_train <- myseries |>
  filter(year(Month) < 2011)

b.Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

c.Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train |>
  model(SNAIVE(Turnover))

d.Check the residuals.Do the residuals appear to be uncorrelated and normally distributed?

fit |> gg_tsresiduals()

e.Produce forecasts for the test data

fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train,
                                by = join_by(State, Industry, `Series ID`, Month, Turnover)))
fc |> autoplot(myseries)

f.Compare the accuracy of your forecasts against the actual values.

fit %>%
    accuracy() %>%
    select(c(".model", ".type", "RMSE", "MAE", "MAPE", "MASE"))

## # A tibble: 1 × 6
##   .model           .type     RMSE   MAE  MAPE  MASE
##   <chr>            <chr>    <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(Turnover) Training  2.90  2.22  10.7     1

fc %>%
    accuracy(myseries) %>%
    select(c(".model", ".type", "RMSE", "MAE", "MAPE", "MASE"))

## # A tibble: 1 × 6
##   .model           .type  RMSE   MAE  MAPE  MASE
##   <chr>            <chr> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(Turnover) Test   9.13  7.58  14.4  3.42

g.How sensitive are the accuracy measures to the amount of training data used?

Accuracy measures tend to be quite sensitive to the amount of training data utilized. As the quantity of training data increases, accuracy typically improves, reflecting a more comprehensive understanding of the underlying patterns and relationships within the dataset.