In this assignment, the problems 5.1, 5.2, 5.3, 5.4 and 5.7 have been solved from the Hyndman book (book link:https://otexts.com/fpp3/toolbox-exercises.html)

Libraries

library(fpp3)

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.0 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.5
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.3.2
## ✔ lubridate   1.9.3     ✔ fable       0.3.4
## ✔ ggplot2     3.5.1     ✔ fabletools  0.4.2

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

Question-5.1:

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case.

Australian Population (global_economy)

#View(global_economy)

population_australia<-global_economy|>filter(Country=="Australia")
population_australia|>autoplot(Population)+labs(title="Australian Population")

The population time series shows almost a steady upward trend. Therefore, the most appropriate forecasting method would be Random Walk with Drift (RW(y ~ drift())). This method is basically ideal for time series with a clear trend because it incorporates a “drift” term, which accounts for the consistent upward movement observed in the data.

Forecasting of Australian population using drift method:

population_fit<-population_australia |> model(RW(Population ~ drift()))

population_fc <- population_fit |>forecast(h=20)
  
population_fc |>autoplot(population_australia,level=NULL) +
       labs(y = "Population",
       title = "Drift Forecast of the Population of Australia",
       subtitle = "A 20-Year Forecast")

Bricks (aus_production)

#head(aus_production)

bricks <- aus_production|> filter(!is.na(Bricks))

bricks|>autoplot(Bricks)+labs(title="Brick Production of Australia")

The time series of Australian brick production data shows a strong seasonality with troughs happening in first quarter each year. This suggests that SNAIVE method would be the most appropriate method of forecast here.

Forecasting of Australia’s brick production using SNAIVE

bricks_fit<-bricks |> model(SNAIVE(Bricks~ lag("year")))

bricks_fc <- bricks_fit |>forecast(h=10)
  
bricks_fc |>autoplot(bricks,level=NULL) +
       labs(y = "Bricks",
       title = "SNAIVE Forecast of the Brick Production of Australia",
       subtitle = "A 10-Year Forecast")

NSW Lambs (aus_livestock)

#print(aus_livestock)

lambs <- aus_livestock |> 
  filter(Animal == "Lambs",
         State == "New South Wales")

lambs|>autoplot()+labs(title= "New South Wales Monthly Lamb Slaughters")

## Plot variable not specified, automatically selected `.vars = Count`

It seems like the data has seasonal effects. To discover any potential seasonal pattern in the data it will be good to decompose the series. I will apply an STL decomposition here.

lambs_decomp <- lambs|>model(stl = STL(Count)) 

components(lambs_decomp)|> autoplot()

The decomposition of the time series reveals that the data has seasonality, which suggesting that SNAIVE method will be the most appropriate method to forecast this data.

lambs_fit<-lambs |> model(SNAIVE(Count~ lag("year")))
lambs_fc <- lambs_fit |>forecast(h=10)
lambs_fc |>autoplot(lambs,level=NULL) +
       labs(y = "Count",
       title = "SNAIVE Forecast for Monthly Lamb Slaughters in New South Wales",
       subtitle = "A 10-Year Forecast")

Household wealth (hh_budget)

#head(hh_budget)
hh_budget|>autoplot(Wealth)+labs(title="Household Wealth")

It seems like the data has short-term ups and downs but an overall upward trend.Therefore, a random walk drift method will be a good choice to forecast data.

budget_fit<-hh_budget|>model( RW(Wealth~drift() ) )
budget_fc<-budget_fit|>forecast(h = 10)
budget_fc|> autoplot(hh_budget,level=NULL) + 
  labs(title="Drift Household Wealth Forecast",y="Household Wealth")

Australian takeaway food turnover (aus_retail).

#print(aus_retail)

aus_takeaway<-aus_retail|> filter( Industry == "Takeaway food services", State == "Australian Capital Territory")|>
  select(State, Industry, Month, Turnover)
#head(aus_takeaway)
aus_takeaway|>autoplot(Turnover)+labs(title="Australian Takeaway Food Turnover")

It seems like the data has upward trend with seasonality. It will be more clear if a decomposition is done for this series.

aus_takeway_dcomp<-aus_takeaway|>model(STL(Turnover))
  
components(aus_takeway_dcomp)|> autoplot(show.legend = FALSE)

The plot clearly shows seasonal effect in the data, along with upward trend. SNAIVE method will be the most appropriate method here to forecast the data.

takeway_fit<-aus_takeaway |> model(SNAIVE(Turnover~ lag("year")))
takeway_fc <- takeway_fit |>forecast(h=24)
takeway_fc |>autoplot(aus_takeaway,level=NULL) +
       labs(y = "Turnover",
       title = "SNAIVE Forecast for Monthly Australian Takeway Food Services Turnover",
       subtitle = "A 24-Month Forecast")

Question-5.2:

Use the Facebook stock price (data set gafa_stock) to do the following:

a.Produce a time plot of the series.

#View(gafa_stock)
# Data adjustment for trading days 
fb_price<-gafa_stock|>filter(Symbol=="FB")|>mutate(Day = row_number())|>select(Symbol,Date,Close,Day)|>
  update_tsibble(index = Day, regular = TRUE)
#head(fb_price)

fb_price|>autoplot(Close)+
  labs(title= "Daily closing Stock Prices of Facebook")

b.Produce forecasts using the drift method and plot them.

fb_price_fit<-fb_price|>model( RW(Close ~ drift() ) )
fb_price_fc<-fb_price_fit|>forecast(h =180)
fb_price_fc|> autoplot(fb_price,level=NULL) + 
  labs(title="Drift Facebook Daily Closing Price Forecast",y="Close Price")

c.Show that the forecasts are identical to extending the line drawn between the first and last observations.

fb_price_fit<-fb_price|>model( RW(Close ~ drift() ) )
fb_price_fc<-fb_price_fit|>forecast(h =180)
fb_price_fc|> autoplot(fb_price,level=NULL) + 
  labs(title="Drift Facebook Daily Closing Price Forecast")+
 geom_segment(aes(x = Day[1], y = Close[1], xend = Day[nrow(fb_price)], yend = Close[nrow(fb_price)]),color = "green", linetype = "solid")

## Warning in geom_segment(aes(x = Day[1], y = Close[1], xend = Day[nrow(fb_price)], : All aesthetics have length 1, but the data has 1258 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
##   a single row.

d.Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

fb_price|>
  model(Naive = NAIVE(Close),
        Drift = RW(Close ~ drift()),
        Mean = MEAN(Close)
    ) |>
  forecast(h=180)|>
  autoplot(fb_price , level = NULL)+
  labs(title = "Forecast of Facebook Stock Daily Closing Price Using Differet Methods") +
  guides(colour = guide_legend(title = "Forecast"))

I think the drift benchmark shows the best forecast here as it captures the average changes of the closing prices over the time. This benchmark also shows a positive slope, which also seems aligning with some upward trend in the data.

Question-5.3:

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help. What do you conclude?

#head(aus_production)

# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

#aug<-fit|>augment()
#head(aug)
#mean(aug$.innov, na.rm=TRUE)

# Look a some forecasts
fit |> forecast() |> autoplot(recent_production)

The residuals plot shows that the residuals have mean close to zero and the autocorrelation plot reveals that there is significant correlation in the residuals series. For a white noise series, it is expected that 95% of the spikes to lie within the bounds represented by blue dashed lines in the ACF plot. The ACF plot above clearly shows that one or more i.e. more than 5% spikes are outside these bounds.Therefore, the residuals series here is not a white noise. As a consequence, the NAIVE method used here will not use all the available information in the data to make the forecast.

Additionally, the residuals here don’t have constant variance and are not normally distributed. Therefore,forecasts from this method will probably be quite good, but prediction intervals computed by assuming normal distribution may be inaccurate.

Question-5.4:

Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

#head(global_economy)
# Extract data of interest
aus_export <- global_economy|>
  filter(Country=="Australia")
# Define and estimate a model
fit_aus_export <- aus_export |> model(NAIVE(Exports))
# Look at the residuals
fit_aus_export |> gg_tsresiduals()

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).

mean(augment(fit_aus_export)$.innov , na.rm = TRUE)

## [1] 0.1451912

The ACF plot shows slight autocorrelation in the residual series.The mean of 0.15 suggests the forecast is slightly biased. Additionally, the histogram of the residuals show that they are normally distributed.

# Look a some forecasts
fit_aus_export |> forecast() |> autoplot(aus_export)

The NAIVE method usually performs well for financial data. Here, the forecasts seemed fairly good.

aus_bricks <- aus_production |>filter(!is.na(Bricks))
   
# Define and estimate a model
fit_bricks <- aus_bricks |> model(SNAIVE(Bricks))
# Look at the residuals
fit_bricks |> gg_tsresiduals()

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

mean(augment(fit_bricks)$.innov , na.rm = TRUE)

## [1] 4.21134

The ACF plot shows high autocorrelation in the residual series.The mean is not close to zero, which suggests the forecast is biased. Additionally, the histogram of the residuals show that they are left-skewed.

# Look a some forecasts
fit_bricks |> forecast(h=10) |> autoplot(aus_bricks)

The forecasts appear accurate with the SNAIVE method, as this method usually performs well on production data with seasonal effects.

Question-5.7:

For your retail time series (from Exercise 7 in Section 2.10):

set.seed(12345)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
#head(myseries)

a.Create a training dataset consisting of observations before 2011 using

myseries_train <- myseries |>
  filter(year(Month) < 2011)

Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

c.Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train |>
  model(SNAIVE(Turnover))

d.Check the residuals.

fit |> gg_tsresiduals()

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).

mean(augment(fit)$.innov , na.rm = TRUE)

## [1] 4.573874

Do the residuals appear to be uncorrelated and normally distributed?

The ACF plot of the residuals shows that they are not uncorrelated, rather, it shows significant autocorrelation. However, no clear pattern is detected. Additionally, the mean of the residuals is not close to zero. It suggests that the forecasts are biased. Moreover, although the residuals appear normally distributed, they are actually slightly right-skewed.

Produce forecasts for the test data

fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

fc |> autoplot(myseries)

Compare the accuracy of your forecasts against the actual values.

fit |> accuracy()

## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Western… Clothin… SNAIV… Trai…  4.57  11.9  8.93  5.29  9.64     1     1 0.771

fc |> accuracy(myseries)

## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… West… Clothin… Test   4.31  11.8  9.25  2.00  5.55  1.04 0.992 0.392

The model shows almost similar performance between the training and test data. This suggests that the training data used here was enough, and the accuracy measures show that the model is performing well. However, MPE and MAPE are much lower on the test data. This could mean the model is overestimating during training.

How sensitive are the accuracy measures to the amount of training data used?

The accuracy measures vary in sensitivity to the amount of training data used. Metrics like RMSE and MAE show consistent values between training and test data, indicating low sensitivity.On the other hand, MPE and MAPE show significant differences, suggesting higher sensitivity to actual data. ACF1 also reflects sensitivity, with a noticeable drop. It indicates that the model captures different patterns in training compared to test data. In summary, while some metrics remain stable, others show a greater response to the amount of training data.

Data624_HW3

Mahmud Hasan Al Raji

2024-09-22