HW 3 Data 624

1. Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

library(fpp3)

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.0 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.5
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.3.2
## ✔ lubridate   1.9.3     ✔ fable       0.3.4
## ✔ ggplot2     3.5.0     ✔ fabletools  0.4.2

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

#This data has an increasing trend (population), so using RW(y ~ drift()) will be best

aus_economy <- global_economy %>%
    filter(Code == "AUS")

aus_economy %>%
  model(Drift = RW(Population ~ drift())) %>%
  forecast(h = 15) %>%
  autoplot(aus_economy) +
    labs(title = "Australian Population Forecast")

#Because the data here is seasonal in quarters we will the SNAIVE method

autoplot(aus_production, Bricks) +
  labs( title = "Quarterly Production of Bricks in Australia")

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production %>%
  filter(!is.na(Bricks)) %>%
  model(SNAIVE(Bricks ~ lag("year"))) %>%
  forecast(h = 15) %>%
  autoplot(aus_production) +
    labs(title = "Australian Bricks Production Forecast")

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_livestock %>%
  filter(State == "New South Wales", 
         Animal == "Lambs") %>%
  model(NAIVE(Count)) %>%
  forecast(h = 24) %>%
  autoplot(aus_livestock) +
  labs(title = "New South Wales Lambs")

#As there in no repetitive constant trend in seasonality I decided to use the NAIVE method.


hh_budget %>% 
  autoplot(Wealth)

#This data set is on household wealth, which we see has a positive trend, so we'll use the RW(y ~ drift()) method.

hh_budget %>%
  model(Drift = RW(Wealth ~ drift())) %>%
  forecast(h = 15) %>%
  autoplot(hh_budget) +
    labs(title = "Household wealth Forecast")

# This chart shows seasonality per year so SNAIVE is ideal here 

aus_retail %>%
  filter(State == "South Australia",
         Industry == 'Takeaway food services') %>%
  model(SNAIVE(Turnover ~ lag("year"))) %>%
  forecast(h = 15) %>%
  autoplot(aus_retail) +
    labs(title = "South Australia Takeout Turnover Forecast")

2. Use the Facebook stock price (data set gafa_stock) to do the following:

#Produce a time plot of the series.

data("gafa_stock")
data("hh_budget")
unique(gafa_stock$Symbol)

## [1] "AAPL" "AMZN" "FB"   "GOOG"

#  
gafa_stock %>%
  filter( Symbol == "FB") %>%
  autoplot(Close) +
  labs(y = "USD", Title = "Closing FB Prices")

#Produce forecasts using the drift method and plot them.

fb <- gafa_stock %>%
  filter( Symbol == "FB")

fb_tibble <- as_tsibble(fb, key = "Symbol", index = "Date", regular = TRUE) %>% fill_gaps()

fb_tibble %>%
  model(Drift = RW(Close ~ drift())) %>%
  forecast(h = 15) %>%
  autoplot(fb_tibble) +
    labs(title = "FB Closing Prices Forecast")

#Show that the forecasts are identical to extending the line drawn between the first and last observations.

df <- data.frame(x1 = as.Date('2014-01-02'), x2 = as.Date('2018-12-31'), y1 = 54.71, y2 = 131.09)

fb_tibble %>%
model(Drift = RW(Close ~ drift())) %>%
  forecast(h = 120) %>%
  autoplot(fb) +
    labs(title = "FB Closing Price Forecast") +
  geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2, colour = "segment"), data = df)

# We can conclude that the line between the first and last observation does match the forecasts.

#Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

fb_tibble %>%
  model(
      Mean = MEAN(Close),
      Naive = NAIVE(Close),
      Drift = RW(Close ~ drift())
  ) %>%
  forecast(h = 120) %>%
  autoplot(fb_tibble) +
    labs(title = "South Australian Takeout Turnover Forecast")

#From the used benchmark functions, the naive benchmark has the smallest prediction interval, therefore is the most suitable.

3. Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

data("aus_production")

# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

# Look a some forecasts
fit |> forecast() |> autoplot(recent_production)

What do you conclude?

We can conclude that the results are not white noise, the values don’t seem independent and random.The does appear to be a very gradual downwards trend in beer production. Because of this the snaive model is overestimating future beer production from the prior year’s production amount.

4. Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

data("global_economy")

# Extract data of interest
aus_exports <- global_economy |>
  filter(Country == "Australia")

# Define and estimate a model
fit_aus_exports <- aus_exports |> model(NAIVE(Exports))
# Look at the residuals
fit_aus_exports |> gg_tsresiduals()

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).

# Look a some forecasts
fit_aus_exports |> forecast() |> autoplot(aus_exports)

For the Aus Exports, the native method was more viable as it is an economic time series. This data set does seem to contain white noise as the graph should a large count of residuals at 0.The lags are also close to the dashed line, we can conclude that the forecast is not bias.

# Define and estimate a model
fit <- aus_production |>
  filter(!is.na(Bricks)) |>
  model(SNAIVE(Bricks ~ lag("year")))
# Look at the residuals
fit |> gg_tsresiduals()

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

# Look a some forecasts
fit |> forecast() |> autoplot(aus_production)

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

There is a high autocorrelation between multiple lags, as well as a seasonal pattern. The histogram is also not uniform, meaning that the model is not suitable for the time series.

5. For your retail time series (from Exercise 7 in Section 2.10):

Create a training dataset consisting of observations before 2011 using

set.seed(15)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries_train <- myseries |>
  filter(year(Month) < 2011)

Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

# the data has been split correctly (split is from 2011 onwards)

Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train |>
  model(SNAIVE(Turnover))

Check the residuals.

fit |> gg_tsresiduals()

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).

# The residuals are autocorrelated and are not normally distributed, they are right-tailed.

Produce forecasts for the test data

fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

fc |> autoplot(myseries)

Compare the accuracy of your forecasts against the actual values.

fit |> accuracy()

## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 New Sou… Furnitu… SNAIV… Trai…  8.41  18.4  13.3  4.83  8.10     1     1 0.667

fc |> accuracy(myseries)

## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… New … Furnitu… Test   66.9  80.9  67.5  16.9  17.1  5.07  4.40 0.902

#The accuracy measures from the training set is smaller in comparison.

How sensitive are the accuracy measures to the amount of training data used?

The errors on the training data are much smaller than the test data. The model performed poorly on the test data, which had fewer points. When compared, all the metrics performed worse on the test dataset in comparison to the training set.

HW 3 Data 624

Natalie Kalukeerthie

2024-09-20

1. Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

2. Use the Facebook stock price (data set gafa_stock) to do the following:

3. Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

What do you conclude?

We can conclude that the results are not white noise, the values don’t seem independent and random.The does appear to be a very gradual downwards trend in beer production. Because of this the snaive model is overestimating future beer production from the prior year’s production amount.

4. Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

For the Aus Exports, the native method was more viable as it is an economic time series. This data set does seem to contain white noise as the graph should a large count of residuals at 0.The lags are also close to the dashed line, we can conclude that the forecast is not bias.

There is a high autocorrelation between multiple lags, as well as a seasonal pattern. The histogram is also not uniform, meaning that the model is not suitable for the time series.

5. For your retail time series (from Exercise 7 in Section 2.10):

Create a training dataset consisting of observations before 2011 using

Check that your data have been split appropriately by producing the following plot.

Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

Check the residuals.

Produce forecasts for the test data

Compare the accuracy of your forecasts against the actual values.

How sensitive are the accuracy measures to the amount of training data used?

The errors on the training data are much smaller than the test data. The model performed poorly on the test data, which had fewer points. When compared, all the metrics performed worse on the test dataset in comparison to the training set.