Homework 3

All libraries needed for the Homework

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.3.3

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## -- Attaching packages -------------------------------------------- fpp3 1.0.0 --

## v tibble      3.2.1     v tsibble     1.1.5
## v dplyr       1.1.2     v tsibbledata 0.4.1
## v tidyr       1.3.0     v feasts      0.3.2
## v lubridate   1.9.2     v fable       0.3.4
## v ggplot2     3.5.1     v fabletools  0.4.2

## Warning: package 'ggplot2' was built under R version 4.3.3

## Warning: package 'tsibble' was built under R version 4.3.3

## Warning: package 'tsibbledata' was built under R version 4.3.3

## Warning: package 'feasts' was built under R version 4.3.3

## Warning: package 'fabletools' was built under R version 4.3.3

## Warning: package 'fable' was built under R version 4.3.3

## -- Conflicts ------------------------------------------------- fpp3_conflicts --
## x lubridate::date()    masks base::date()
## x dplyr::filter()      masks stats::filter()
## x tsibble::intersect() masks base::intersect()
## x tsibble::interval()  masks lubridate::interval()
## x dplyr::lag()         masks stats::lag()
## x tsibble::setdiff()   masks base::setdiff()
## x tsibble::union()     masks base::union()

library(forecast)

## Warning: package 'forecast' was built under R version 4.3.3

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library(tidyverse)

## -- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --
## v forcats 1.0.0     v readr   2.1.4
## v purrr   1.0.1     v stringr 1.5.0

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter()     masks stats::filter()
## x tsibble::interval() masks lubridate::interval()
## x dplyr::lag()        masks stats::lag()
## i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)
library(lubridate)
library(tsibble)
library(pracma)

## Warning: package 'pracma' was built under R version 4.3.3

## 
## Attaching package: 'pracma'
## 
## The following object is masked from 'package:purrr':
## 
##     cross

5.1 - Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. How has this changed over time? .Australian Population (global_economy) .Bricks(aus_production) .NSW Lambs(aus_livestock) .Household wealth(hh_budget) .Australian takeaway food turnover(aus_retail)

Australian Population (global_economy)

# I will now plot the data for the Australian population
australian_pop <- global_economy %>% filter(Country == "Australia") %>% select(Population)
autoplot(australian_pop, Population)

#Because there is no seasonality in the plot and there is an upward trendline, 
#the drift method is suitable for the situation. I will forecast and plot
drift_population <- australian_pop %>% model(RW(Population ~ drift())) 
drift_forecast <- drift_population |> forecast(h = 14)
drift_forecast  %>% autoplot(australian_pop, level = NULL) + 
  autolayer(
    australian_pop,
    colour = "brown"
  ) + labs(title="Drift forecast Australian Population")

## Plot variable not specified, automatically selected `.vars = Population`

Bricks(aus_production)

# I will now plot the data for the Bricks population
bricks_pop <- aus_production %>% select(Bricks)
autoplot(bricks_pop, Bricks)

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

#From the chart above it is clear that the data is seasonal. In consequence,
#the best method for this data is: seasonal-naive.
bricks <- aus_production %>%
  filter(!is.na(Bricks))
bricks_fit <- bricks %>%
  model(SNAIVE(Bricks ~ lag("year"))) %>%
  forecast(h=6)
bricks_fit %>% autoplot(bricks) + labs(title="Seasonal Naive forecast Australian Bricks Production")

It looks like the seasonal-naive model perfectly compliments/aligns for forecasting the seasonality of the bricks data.

NSW Lambs(aus_livestock)

# I will now plot the data for the Lambs population
lambs_pop <- aus_livestock %>% filter (State == "New South Wales" & Animal == "Lambs")
lambs_pop %>% autoplot()

## Plot variable not specified, automatically selected `.vars = Count`

#From the chart above we can tell there is some seasonality but not obvious. 
#Thus, I will use the Naive method to forecast and plot
lambs_naive<- lambs_pop %>% model(NAIVE(Count)) 
lamb_forecast <- lambs_naive|> forecast(h = 20)
lamb_forecast %>% autoplot(lambs_pop, level=NULL) + labs(title="Naive forecast Australian Livestock")

There is no descernable pattern in the data,so analyzing the very last observation is the best way to go in this case.

Household wealth (hh_budget)

# I will now plot the data for household wealth - as a percentage of income
household_wealth <- hh_budget %>% filter(Country == "Australia") %>% select(Wealth)
household_wealth |> autoplot()

## Plot variable not specified, automatically selected `.vars = Wealth`

#Not very seasonal but there are spurts of a positive trend. I will choose the Drift 
#method to forecast and plot
wealth_drift <- household_wealth %>% model(RW(Wealth ~ drift())) 
household_wealth_forecast <- wealth_drift |> forecast(h = 5)
household_wealth_forecast%>% autoplot(household_wealth, level = NULL) + 
  autolayer(
    household_wealth,
    colour = "brown"
  ) + labs(title="Drift forecast Australian Food Turnover")

## Plot variable not specified, automatically selected `.vars = Wealth`

Australian takeaway food turnover (aus_retail)

# I will now plot the data for the total take out food turnover
australian_takeout <- aus_retail %>% filter(Industry == "Takeaway food services") %>% index_by(yearmonth(Month)) %>% summarise(sum_turnover=sum(Turnover))
autoplot(australian_takeout, sum_turnover)

#Since there is a very clear upward trend with the presence of seasonality,
#I will use the Seasonal-naive method to forecast and plot
australian_takeout_sn <- australian_takeout |>
  model(
    naive = SNAIVE(sum_turnover)
  )
australian_takeout_forecast <- australian_takeout_sn |> forecast(h = 12)
australian_takeout_forecast %>% autoplot(australian_takeout, level=NULL) + 
  autolayer(
    australian_takeout,
    colour = "green"
  ) + 
  guides(colour = guide_legend(title = "Forecast for Australian Takeout"))

## Plot variable not specified, automatically selected `.vars = sum_turnover`

The seasonal naive model does a fine job predicting as it is consistent with the data’s seasonality.

5.2 - Use the Facebook stock price (data set gafa_stock) to do the following: a. Produce a time plot of the series b. Produce forecasts using the drift method and plot them c. Show that the forecasts are identical to extending the line drawn between the first and last observations d. Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

Produce a time plot of the series

#I will filter the data set to retrieve and fill any data gaps in  Facebook's stock
facebook <- gafa_stock %>% filter(Symbol=="FB") %>% update_tsibble(regular = TRUE) %>% fill_gaps()

#I will plot a time series of Facebook's closing stock price
autoplot(facebook,Close)+labs(y="$USD", title="Facebook's Closing Stock Price")

Produce forecasts using the drift method and plot them

#I will utilize the drift method to forecast the Facebook stock price
facebook_drift <- facebook %>% model(Drift = RW(Close ~ drift()))
#I will now forecast and plot the drift 60 days in the future
facebook_forecast <- facebook_drift|> forecast(h = 60)
facebook_forecast %>% autoplot(facebook, level = NULL) + 
  autolayer(
    facebook,
    colour = "green"
  ) + guides(colour = guide_legend(title = "Stock Closing Price Forecast using Drift")) + labs(title= "Stock Closing Price Forecast using Drift")

## Plot variable not specified, automatically selected `.vars = Open`

Show that the forecasts are identical to extending the line drawn between the first and last observations

# I will filter facebook data for the first observation being 1/2/2014 and
#the last observation being 12/31/2018 to show that the forecasts are identical
facebook_first_last <-data.frame(x1 = as.Date('2014-01-02'), x2 = as.Date('2018-12-31'), y1 = 70.71, y2 = 149.77)

#I will add the line between my first and last observations
  autoplot(facebook) +
    labs(title = "Facebook Closing Price on Last Observation") +
  geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2, colour = "segment"), data =facebook_first_last)

## Plot variable not specified, automatically selected `.vars = Open`

The line touches the peeks of the first and last observation and this shows and we can see that this forecast is identical to the one in part b.

Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

#I will utilize the Mean, Naive and Drift as benchmark functions to forecast 
#and plot the same data set for 60 days into the future.
facebook %>%
  model(
      Mean = MEAN(Close),
      Naive = NAIVE(Close),
      Drift = RW(Close ~ drift())
  ) %>%
  forecast(h = 60) %>%
  autoplot(facebook)

I believe that the Drift method is the best benchmark. It shows a clear upward trend and best result as compared with Mean and Naive. Also important to note that Naive is implemented during seasonal data, which this is clearly not.

5.3 - Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help. What do you conclude?

# code from textbook
# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

# Look a some forecasts

#Utilize the naive method 
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
aus_beer_prod <- recent_production %>% model(SNAIVE(Beer))

#plotting to check if residuals look like white noise
aus_beer_prod %>% gg_tsresiduals()

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

#plotting the forecast
aus_beer_prod %>% forecast() %>% autoplot(recent_production)

After plotting the forecast and from the residuals histogram above we can see a normal distribution which is a strong indicator that the residuals did not look like white noise.

5.4 - Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

#repeating exercise using the Australian exports series -- not including empty data
australian_bricks <- recent_production %>% filter(!is.na(Bricks))
australian_bricks_naive <- australian_bricks%>% model(NAIVE(Bricks))

#plot and check if residuals look like white noise
australian_bricks_naive %>% gg_tsresiduals()

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).

#plotting the forecast
australian_bricks_naive %>% forecast() %>% autoplot(recent_production)

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

After plotting the forecast and from the residuals histogram above we can see a normal distribution which is a strong indicator that the residuals did not look like white noise.

5.7 - For your retail time series (from Exercise 7 in Section 2.10): a.Create a training dataset consisting of observations before 2011 using b.Check that your data have been split appropriately by producing the following plot. c.Fit a seasonal naïve model using SNAIVE() applied to your training data d.Check the residuals. e.Produce forecasts for the test data f.Compare the accuracy of your forecasts against the actual values. g.How sensitive are the accuracy measures to the amount of training data used?

a.Create a training dataset consisting of observations before 2011 using

set.seed(6758)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries_train <- myseries %>%
  filter(year(Month) < 2011)

b.Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

c.Fit a seasonal naïve model using SNAIVE() applied to your training data

fit <- myseries_train |>
  model(SNAIVE(Turnover))

d.Check the residuals. Do the residuals appear to be uncorrelated and normally distributed?

fit %>% gg_tsresiduals()

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).

The residual histogram shows that the residuals are not normally distributed and are skewed to the right. The lack of correlation between the residuals is further depicted in the acf plot.

e.Produce forecasts for the test data

fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

fc |> autoplot(myseries)

f.Compare the accuracy of your forecasts against the actual values.

fit |> accuracy()

## # A tibble: 1 x 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 New Sou~ Hardwar~ SNAIV~ Trai~  9.46  26.3  21.2  4.66  12.8     1     1 0.807

fc |> accuracy(myseries)

## # A tibble: 1 x 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T~ New ~ Hardwar~ Test   78.4  98.6  79.6  17.5  17.9  3.75  3.75 0.936

The Mean Absolute Percentage Error of the training set is: 12.76 and for the test it is: 17.9. These results show that accuracy was reduced.

g.How sensitive are the accuracy measures to the amount of training data used?

The sensitivity of the accuracy measures to the amount used in the training set depends on what method you want to use. If you elect to use Naive() method all forecasts will be the value of the last observation. However, if we use seasonal-naive all data from the past will have a bearing on the training data thus,increasing its sensitivity to the quantity of training data.

Homework 3

Vladimir Nimchenko

2024-09-19