Homework 3

1 Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)

Bricks (aus_production)

NSW Lambs (aus_livestock)

Household wealth (hh_budget).

Australian takeaway food turnover (aus_retail).

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.3.3

## Warning: package 'ggplot2' was built under R version 4.3.3

## Warning: package 'tsibble' was built under R version 4.3.3

## Warning: package 'tsibbledata' was built under R version 4.3.3

## Warning: package 'feasts' was built under R version 4.3.3

## Warning: package 'fabletools' was built under R version 4.3.3

## Warning: package 'fable' was built under R version 4.3.3

library(tidyverse)

The population of Australia is increasing , so there’s an upward trend.For that reason I’ll use RW(y ~ drift())

aus_economy <- global_economy %>%
    filter(Code == "AUS")

aus_economy %>%
  model(Drift = RW(Population ~ drift())) %>%
  forecast(h = 15) %>%
  autoplot(aus_economy) +
    labs(title = "Australian Population Forcast")

Bricks data (as seen on previous assignments) has apparent seasonality so SNAIVE(y) will be a good choice

aus_production %>%
  filter(!is.na(Bricks)) %>%
  model(SNAIVE(Bricks ~ lag("year"))) %>%
  forecast(h = 15) %>%
  autoplot(aus_production) +
    labs(title = "Australian Bricks Production Forcast")

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

NSW Lambs has neither seasonality nor cyclic patterns therefore a good fit will be NAIVE(y)

aus_livestock %>%
  filter(State == "New South Wales", 
         Animal == "Lambs") %>%
  model(NAIVE(Count)) %>%
  forecast(h = 24) %>%
  autoplot(aus_livestock) +
  labs(title = "Lambs in New South Wales",
       subtitle = "July 1976 - Dec 2018, Forecasted until Dec 2020")

Household wealth now has a slightly upward trend so I’ll use RW(y ~ drift())

hh_budget %>%
  model(Drift = RW(Wealth ~ drift())) %>%
  forecast(h = 15) %>%
  autoplot(hh_budget) +
    labs(title = "Household wealth Forcast")

Retail turnover, especially for takeaway food, can exhibit seasonal patterns so I’ll use SNAIVE(y)

aus_retail %>%
  filter(State == "South Australia",
         Industry == 'Takeaway food services') %>%
  model(SNAIVE(Turnover ~ lag("year"))) %>%
  forecast(h = 15) %>%
  autoplot(aus_retail) +
    labs(title = "South Australian takeaway food turnover Forcast")

2 Use the Facebook stock price (data set gafa_stock) to do the following:

1) Produce a time plot of the series.

2) Produce forecasts using the drift method and plot them.

3)Show that the forecasts are identical to extending the line drawn between the first and last observations.

4)Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

1)

fb_stock <- gafa_stock %>%
  filter(Symbol == "FB")

fb_stock_ts <-  as_tsibble(fb_stock, key = "Symbol", index = "Date", regular = TRUE) %>% fill_gaps()

autoplot(fb_stock, Close) + 
  ggtitle("Facebook Stock Price Time Series") + 
  xlab("Date") + 
  ylab("Stock Price")

2)

fb_stock_ts %>%
model(Drift = RW(Close ~ drift())) %>%
  forecast(h = 30) %>%
  autoplot(fb_stock) +
    labs(title = "Facebook Close Price Forcast")

3)

data <- data.frame(x1 = as.Date('2014-01-02'), x2 = as.Date('2018-12-31'), y1 = 54.71, y2 = 131.09)

fb_stock_ts %>%
model(Drift = RW(Close ~ drift())) %>%
  forecast(h = 90) %>%
  autoplot(fb_stock) +
    labs(title = "Facebook Close Price Forcast") +
  geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2, colour = "segment"), data = data)

4)

fb_stock_ts %>%
  model(
      Mean = MEAN(Close),
      Naive = NAIVE(Close),
      Drift = RW(Close ~ drift())
  ) %>%
  forecast(h = 90) %>%
  autoplot(fb_stock_ts)

The naive benchmark provides the most accurate forecast for this dataset, as its prediction interval is the narrowest compared to other methods.

3 Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.What do you conclude?

# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

# Look a some forecasts
fit |> forecast() |> autoplot(recent_production)

The plot indicates that the results differ significantly from a white noise series, as the values are relatively small. The residuals exhibit a mean centered around zero and maintain constant variance, suggesting that the series is not simply white noise. Additionally, the ACF plot reveals that the value at lag 4 is notably larger than the others. This peak can be linked to seasonal patterns, with peaks occurring every fourth quarter (Q4) and troughs in the second quarter (Q2). This cyclical behavior highlights the influence of seasonal effects on the data, indicating that further exploration of seasonal decomposition could provide additional insights into the underlying trends.

4) Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

Australian Exports

# Extract data of interest
aus_economy <- global_economy %>%
    filter(Country == "Australia") 
# Define and estimate a model
fit <- aus_economy %>% model(NAIVE(Exports))
# Look at the residuals
fit %>% gg_tsresiduals()

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).

# Look a some forecasts
fit %>% forecast() %>% autoplot(aus_economy)

The graph shows that the residuals of the model fitted to Australian exports are mostly random, with no strong autocorrelations and a roughly normal distribution. The model seems to capture the overall trend well, though there are some larger deviations in certain years, especially in the early 2000s.

Bricks

fit <- aus_production %>%
  filter(!is.na(Bricks)) %>%
  model(SNAIVE(Bricks ~ lag("year")))
# Look at the residuals
fit %>% gg_tsresiduals()

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

# Look a some forecasts
# Look a some forecasts
fit %>% forecast() %>% 
  autoplot(aus_production)

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

The residuals for the Bricks series from aus_production show some large deviations and autocorrelation at shorter lags, indicating the model does not fully capture certain time-dependent patterns. The residuals are skewed, with some extreme values, suggesting the model could be improved to better fit the data.

7 For your retail time series (from Exercise 7 in Section 2.10):

Create a training dataset consisting of observations before 2011 using

set.seed(2001)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) 

myseries_train <- myseries |>
  filter(year(Month) < 2011)

Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train |>
  model(SNAIVE(Turnover))

Check the residuals.Do the residuals appear to be uncorrelated and normally distributed?

fit |> gg_tsresiduals()

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).

The ACF plot shows significant autocorrelation at the first few lags, with values outside the confidence bounds, indicating the residuals are not entirely uncorrelated. The histogram of the residuals on the other hand is roughly bell-shaped, indicating a near-normal distribution. However, there is some asymmetry, the tails are slightly thicker than in a normal distribution, suggesting a slight deviation from normality.

Produce forecasts for the test data

fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

fc |> autoplot(myseries)

Compare the accuracy of your forecasts against the actual values

fit |> accuracy()

## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 South A… Takeawa… SNAIV… Trai…  1.60  5.20  4.12  3.44  10.2     1     1 0.797

fc |> accuracy(myseries)

## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Sout… Takeawa… Test   15.9  19.2  16.2  18.5  19.0  3.94  3.70 0.907

fit %>% accuracy() %>% select(MAE, RMSE, MAPE, MASE, RMSSE)

## # A tibble: 1 × 5
##     MAE  RMSE  MAPE  MASE RMSSE
##   <dbl> <dbl> <dbl> <dbl> <dbl>
## 1  4.12  5.20  10.2     1     1

fc %>% accuracy(myseries) %>% select(MAE, RMSE, MAPE, MASE, RMSSE)

## # A tibble: 1 × 5
##     MAE  RMSE  MAPE  MASE RMSSE
##   <dbl> <dbl> <dbl> <dbl> <dbl>
## 1  16.2  19.2  19.0  3.94  3.70

The model performs well on the training data but poorly on the test data, with much higher errors like MAPE, indicating overfitting. This happens when the model captures specific patterns in the training data but fails to generalize to new data. To fix this, the model can be simplified, more training data added, or regularization techniques used to improve its ability to generalize.

How sensitive are the accuracy measures to the amount of training data used?

Accuracy measures are highly sensitive to the amount of training data used. Larger training datasets typically lead to improved metrics like MAE, RMSE, and MAPE, reflecting better model performance. With more different examples, the model captures underlying patterns and reduces overfitting risk. Small training sets can enhance training accuracy while resulting in poor test performance. Overall, more training data usually leads to more reliable forecasts, while insufficient data weakens effectiveness.

Homework 3

Nikoleta Emanouilidi

2024-09-20

1 Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)

Bricks (aus_production)

NSW Lambs (aus_livestock)

Household wealth (hh_budget).

Australian takeaway food turnover (aus_retail).

The population of Australia is increasing , so there’s an upward trend.For that reason I’ll use RW(y ~ drift())

Bricks data (as seen on previous assignments) has apparent seasonality so SNAIVE(y) will be a good choice

NSW Lambs has neither seasonality nor cyclic patterns therefore a good fit will be NAIVE(y)

Household wealth now has a slightly upward trend so I’ll use RW(y ~ drift())

Retail turnover, especially for takeaway food, can exhibit seasonal patterns so I’ll use SNAIVE(y)

2 Use the Facebook stock price (data set gafa_stock) to do the following:

1) Produce a time plot of the series.

2) Produce forecasts using the drift method and plot them.

3)Show that the forecasts are identical to extending the line drawn between the first and last observations.

4)Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

1)

2)

3)

4)

The naive benchmark provides the most accurate forecast for this dataset, as its prediction interval is the narrowest compared to other methods.

3 Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.What do you conclude?

4) Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

Australian Exports

Bricks

7 For your retail time series (from Exercise 7 in Section 2.10):

Create a training dataset consisting of observations before 2011 using

Check that your data have been split appropriately by producing the following plot.

Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

Check the residuals.Do the residuals appear to be uncorrelated and normally distributed?

Produce forecasts for the test data

Compare the accuracy of your forecasts against the actual values

How sensitive are the accuracy measures to the amount of training data used?