Assignment 3 involves answering questions 5.1, 5.2, 5.3, 5.4 and 5.7 from the textbook Forecasting: principles and practice by Rob J Hyndman and George Athanasopoulos.
Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:
Before responding, it’s important to define each technique.
The Naive Forecast method is an important method used as a baseline when examining multiple forecasting methods. The Naive method assumes that the next value will be identical to the last value. The eventual forecasting method used should ideally outperform the naive forecast method. It is also ideal to use for data that has no significant trend (upward or downward) or seasonality.
The Seasonal Naive Forecast method or SNAIVE is the Seasonal equivalent of the NAIVE method. If a data has a seasonal pattern then this method is ideal.
The Random Walk with Drift method like the Naive method assumes that the future value based on the previous value. The Random walk method also takes into effect the drift or average change amount over time. This method is ideal if there is a upward or downward trend.
There are any visible patterns however, we can see an upward growth trend. Based on this information, I believe the Random Walk with Drift is most appropriate.
global_economy_aus <- global_economy %>%
filter(Code == "AUS") %>%
as_tsibble(index = Year) %>%
filter(Year >= 1960 & Year <= 1964)
autoplot(global_economy_aus,Population)
rwf_drift_model <- global_economy_aus %>%
model(rw_model = RW(Population ~ drift()))
rwf_drift_forecast <- rwf_drift_model %>%
forecast(h = 5)
autoplot(rwf_drift_forecast, global_economy_aus)
Based on the strong seasonal pattern seen, the SNAIVE method is the best method to use. Due to the nulls in the data after 2003, I will filter for the data before 2003.
aus_production_bricks <- aus_production %>%
select(Quarter, Bricks) %>%
as_tsibble(index = Quarter) %>%
filter(year(Quarter) <= 2003)
autoplot(aus_production_bricks,Bricks)
snaive_model <- aus_production_bricks %>%
model(SNAIVE = SNAIVE(Bricks))
snaive_forecast <- snaive_model %>%
forecast(h = 20)
autoplot(snaive_forecast, aus_production_bricks)
Given the seasonal patterns that are visible, the SNAIVE method is appropriate.
nsw_lambs <- aus_livestock %>%
filter(State == "New South Wales", Animal == "Lambs")
autoplot(nsw_lambs, Count)
snaive_model2 <- nsw_lambs %>%
model(SNAIVE = SNAIVE(Count))
snaive_forecast2 <- snaive_model2 %>%
forecast(h=48)
autoplot(snaive_forecast2, nsw_lambs)
Because there is no strong seasonal pattern and an overall growth trend, the Random Walk with Drift method.
hh_budget_aus <- hh_budget %>%
filter(Country == "Australia")
autoplot(hh_budget_aus, Wealth)
rwf_drift_model_2 <- hh_budget_aus %>%
model(rw_model = RW(Wealth ~ drift()))
rwf_drift_forecast_2 <- rwf_drift_model_2 %>%
forecast(h=10)
autoplot(rwf_drift_forecast_2, hh_budget_aus)
There is a visible upward trend and some random fluctuations. The Random Walk with Drift method seems most appropriate.
aus_retail_food <- aus_retail %>%
filter(Industry == "Cafes, restaurants and takeaway food services", State == "Australian Capital Territory")
autoplot(aus_retail_food, Turnover)
rwf_drift_model_3 <- aus_retail_food %>%
model(rw_model = RW(Turnover ~ drift()))
rwf_drift_forecast_3 <- rwf_drift_model_3 %>%
forecast(h=60)
autoplot(rwf_drift_forecast_3, aus_retail_food)
Use the Facebook stock price (data set gafa_stock) to do the following:
Produce a time plot of the series.
Produce forecasts using the drift method and plot them.
Show that the forecasts are identical to extending the line drawn between the first and last observations.
Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?
gafa_stock_fb <- gafa_stock %>%
filter(Symbol == "FB") %>%
mutate(day = row_number()) %>%
update_tsibble(index = day, regular = TRUE)
autoplot(gafa_stock_fb, Close)
rwf_drift_model_4 <- gafa_stock_fb %>%
model(rw_model = RW(Close ~ drift()))
rwf_drift_forecast_4 <- rwf_drift_model_4 %>%
forecast(h = 180)
autoplot(rwf_drift_forecast_4, gafa_stock_fb)
first_observation <- gafa_stock_fb %>%
slice(1) %>%
pull(Close)
last_observation <- gafa_stock_fb %>%
slice(n()) %>%
pull(Close)
n_periods <- nrow(gafa_stock_fb)
slope <- (last_observation - first_observation) / (n_periods - 1)
gafa_stock_fb <- gafa_stock_fb %>%
mutate(extended_line = first_observation + slope * (row_number() - 1))
autoplot(gafa_stock_fb, Close) +
geom_line(aes(y = extended_line), color = "blue", linetype = "dashed")
The RW with Drift method performs best based on the MAPE value. MAPE is the average of absolute percentage errors. The MAPE numbers is lower for the drift model at 3.490983 compared SNaive which is 35.657264. This means that the Drift method has less errors, and is therefore more accurate.
training <- gafa_stock_fb %>%
filter(year(Date) <= 2017)
validation <- gafa_stock_fb %>%
filter(year(Date) == 2018)
fit <- training %>%
model(
SNaive = SNAIVE(Close ~ lag(251)),
Drift = RW(Close~drift())
)
forecast_fb <- fit %>%
forecast(data = validation)
accuracy(forecast_fb,validation)
## # A tibble: 2 × 11
## .model Symbol .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Drift FB Test 6.40 6.59 6.40 3.49 3.49 NaN NaN -0.5
## 2 SNaive FB Test 65.3 65.3 65.3 35.7 35.7 NaN NaN -0.5
Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.
Looking at the residual plots the residual don’t resemble white noise. This tell us that the seasonal naive model is not capturing the complexity of the data.
# Extract data of interest
recent_production <- aus_production |>
filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).
# Look a some forecasts
fit |> forecast() |> autoplot(recent_production)
Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.
The naive model is appropriate for the global_economy set because there is no strong seasonal pattern. When looking at the residual plots we see residuals that are not white noise.
global_economy_aus <- global_economy %>%
filter(Code == "AUS")
autoplot(global_economy_aus, Population)
fit_2 <- global_economy_aus %>%
model(NAIVE(Population))
fit_2 %>%
gg_tsresiduals()
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).
fit_2 %>%
forecast() %>%
autoplot(global_economy_aus)
When can observe seasonal patterns and therefore the snaive model is appropriate. THe residuals have a consistent pattern which tells us the model is not capturing it properly.
aus_production_bricks <- aus_production %>%
select(Quarter, Bricks)
autoplot(aus_production_bricks, Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
fit_3 <- aus_production_bricks %>%
model(NAIVE(Bricks))
fit_3 %>%
gg_tsresiduals()
## Warning: Removed 21 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 21 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 21 rows containing non-finite outside the scale range
## (`stat_bin()`).
fit_3 %>%
forecast() %>%
autoplot(aus_production_bricks)
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
For your retail time series (from Exercise 7 in Section 2.10):
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_train <- myseries |>
filter(year(Month) < 2011)
autoplot(myseries, Turnover) +
autolayer(myseries_train, Turnover, colour = "red")
fit_4 <- myseries_train |>
model(SNAIVE(Turnover))
fit_4 |> gg_tsresiduals()
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).
The residuals don’t appear to be uncorrelated based on the acf plot. However, we can see the residuals are normally distributed.
fc <- fit_4 |>
forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc |> autoplot(myseries)
fit |> accuracy()
## # A tibble: 1 × 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(Beer) Training -1.57 16.1 13.6 -0.426 3.16 1 1 -0.237
fc |> accuracy(myseries)
## # A tibble: 1 × 12
## .model State Industry .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Nort… Clothin… Test 0.836 1.55 1.24 5.94 9.06 1.36 1.28 0.601
The accuracy measures such as ME, RMSE, MAE, MAPE, MASE, and ACF1 are all sensitive to the amount of training data used.