library(fpp3)
library(tidyverse)
Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:
Australian Population (global_economy)
Bricks (aus_production)
NSW Lambs (aus_livestock)
Household wealth (hh_budget).
Australian takeaway food turnover (aus_retail).
Since the Australian population has a steadily increasing trend, we use RW(y ~ drift()), because the drift method works best with trended data.
aus_economy <- global_economy %>%
  filter(Code == "AUS")

aus_economy %>%
  model(Drift = RW(Population ~ drift())) %>%
  forecast(h = 15) %>%
  autoplot(aus_economy) +
  labs(title = "Australian Population Forecast")
Bricks production is quarterly data with a clear seasonal pattern, so a seasonal naive model will work well. The summary below also shows 20 missing values, which need to be removed before modelling.
summary(aus_production$Bricks)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 187.0 349.0 417.0 405.5 475.0 589.0 20
aus_production %>%
  filter(!is.na(Bricks)) %>%
  model(SNAIVE(Bricks ~ lag("year"))) %>%
  forecast(h = 15) %>%
  autoplot(aus_production) +
  labs(title = "Australian Bricks Production Forecast")
## Warning: Removed 20 rows containing missing values (`geom_line()`).
The NSW lamb series has no consistent trend and no stable seasonal pattern, so the NAIVE() method is the best choice here.
aus_livestock %>%
  filter(State == "New South Wales",
         Animal == "Lambs") %>%
  model(NAIVE(Count)) %>%
  forecast(h = 24) %>%
  autoplot(aus_livestock) +
  labs(title = "Lambs in New South Wales",
       subtitle = "July 1976 - Dec 2018, Forecasted until Dec 2020")
Household wealth has a slightly positive trend, so a drift model is suitable: it accounts for the average change over time.
hh_budget %>%
  model(Drift = RW(Wealth ~ drift())) %>%
  forecast(h = 15) %>%
  autoplot(hh_budget) +
  labs(title = "Household Wealth Forecast")
The turnover series is strongly seasonal, so SNAIVE() will work well. First, check the available industries and states:
unique(aus_retail$Industry)
## [1] "Cafes, restaurants and catering services"
## [2] "Cafes, restaurants and takeaway food services"
## [3] "Clothing retailing"
## [4] "Clothing, footwear and personal accessory retailing"
## [5] "Department stores"
## [6] "Electrical and electronic goods retailing"
## [7] "Food retailing"
## [8] "Footwear and other personal accessory retailing"
## [9] "Furniture, floor coverings, houseware and textile goods retailing"
## [10] "Hardware, building and garden supplies retailing"
## [11] "Household goods retailing"
## [12] "Liquor retailing"
## [13] "Newspaper and book retailing"
## [14] "Other recreational goods retailing"
## [15] "Other retailing"
## [16] "Other retailing n.e.c."
## [17] "Other specialised food retailing"
## [18] "Pharmaceutical, cosmetic and toiletry goods retailing"
## [19] "Supermarket and grocery stores"
## [20] "Takeaway food services"
unique(aus_retail$State)
## [1] "Australian Capital Territory" "New South Wales"
## [3] "Northern Territory" "Queensland"
## [5] "South Australia" "Tasmania"
## [7] "Victoria" "Western Australia"
aus_retail %>%
  filter(State == "South Australia",
         Industry == "Takeaway food services") %>%
  model(SNAIVE(Turnover ~ lag("year"))) %>%
  forecast(h = 15) %>%
  autoplot(aus_retail) +
  labs(title = "South Australian Takeaway Food Turnover Forecast")
Use the Facebook stock price (data set gafa_stock) to do the following:
Produce a time plot of the series.
Produce forecasts using the drift method and plot them.
Show that the forecasts are identical to extending the line drawn between the first and last observations.
Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?
fb_data <- gafa_stock %>%
  filter(Symbol == "FB")

fb_data2 <- as_tsibble(fb_data, key = "Symbol", index = "Date", regular = TRUE) %>%
  fill_gaps()
autoplot(fb_data, Close)
fb_data2 %>%
  model(Drift = RW(Close ~ drift())) %>%
  forecast(h = 30) %>%
  autoplot(fb_data) +
  labs(title = "Facebook Close Price Forecast")
The line drawn between the first and last observations does match the forecasts: extending it forward traces the drift forecasts exactly, as shown below.
df <- data.frame(x1 = as.Date("2014-01-02"), x2 = as.Date("2018-12-31"),
                 y1 = 54.71, y2 = 131.09)

fb_data2 %>%
  model(Drift = RW(Close ~ drift())) %>%
  forecast(h = 90) %>%
  autoplot(fb_data) +
  labs(title = "Facebook Close Price Forecast") +
  geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2, colour = "segment"),
               data = df)
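This is not a coincidence: the drift coefficient in RW(y ~ drift()) is the mean of the first differences, which telescopes to (y_T - y_1) / (T - 1), i.e. the slope of the line joining the first and last observations. A minimal sketch to verify this, reusing the endpoint values from the segment above (the match should hold up to rounding on the gap-filled series):

# Slope of the line joining the first and last observations
# (54.71 on 2014-01-02, 131.09 on 2018-12-31):
(131.09 - 54.71) / (nrow(fb_data2) - 1)

# Compare with the estimated drift coefficient:
fb_data2 %>%
  model(Drift = RW(Close ~ drift())) %>%
  report()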
fb_data2 %>%
  model(
    Mean = MEAN(Close),
    Naive = NAIVE(Close),
    Drift = RW(Close ~ drift())
  ) %>%
  forecast(h = 90) %>%
  autoplot(fb_data2) +
  labs(title = "Facebook Close Price Benchmark Forecasts")
The naive benchmark looks best for this data set, as it has the narrowest prediction intervals. This is expected for stock prices, where the most recent price is typically the best available predictor of the next one.
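To go beyond eyeballing the plot, one option is a simple holdout comparison (a sketch; the 90-day test window is an arbitrary choice, not part of the exercise):

# Hold out the last 90 days and score each benchmark on them.
fb_train <- fb_data2 %>%
  filter(Date <= max(Date) - 90)
fb_train %>%
  model(
    Mean = MEAN(Close),
    Naive = NAIVE(Close),
    Drift = RW(Close ~ drift())
  ) %>%
  forecast(h = 90) %>%
  fabletools::accuracy(fb_data2) %>%
  select(.model, MAE, RMSE, MAPE)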
Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.
# Extract data of interest
recent_production <- aus_production %>%
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production %>% model(SNAIVE(Beer))
# Look at the residuals
fit %>% gg_tsresiduals()
## Warning: Removed 4 rows containing missing values (`geom_line()`).
## Warning: Removed 4 rows containing missing values (`geom_point()`).
## Warning: Removed 4 rows containing non-finite values (`stat_bin()`).
# Look at some forecasts
fit %>% forecast() %>% autoplot(recent_production)
What do you conclude?
The residuals are centred around zero with fairly constant variance, but they do not look like white noise: the ACF plot shows a significant spike at lag 4, reflecting the quarterly seasonal pattern (peaks every Q4 and troughs every Q2) that the seasonal naive method has not fully removed.
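A Ljung-Box test makes this conclusion concrete (a sketch; lag = 8 follows the common rule of thumb of using 2m lags for seasonal period m = 4):

# Portmanteau test on the innovation residuals: a small p-value
# means the residuals are distinguishable from white noise.
augment(fit) %>%
  features(.innov, ljung_box, lag = 8)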
Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.
# Extract data of interest
aus_economy <- global_economy %>%
  filter(Country == "Australia")
# Define and estimate a model
fit <- aus_economy %>% model(NAIVE(Exports))
# Look at the residuals
fit %>% gg_tsresiduals()
## Warning: Removed 1 row containing missing values (`geom_line()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 1 rows containing non-finite values (`stat_bin()`).
# Look at some forecasts
fit %>% forecast() %>% autoplot(aus_economy)
mean(augment(fit)$.innov, na.rm = TRUE)
## [1] 0.1451912
As all lags fall within (or close to) the dashed significance bounds, the residuals are consistent with white noise. The mean of the innovation residuals is very small, indicating the forecasts are not biased.
fit <- aus_production %>%
  filter(!is.na(Bricks)) %>%
  model(SNAIVE(Bricks ~ lag("year")))
# Look at the residuals
fit %>% gg_tsresiduals()
## Warning: Removed 4 rows containing missing values (`geom_line()`).
## Warning: Removed 4 rows containing missing values (`geom_point()`).
## Warning: Removed 4 rows containing non-finite values (`stat_bin()`).
# Look at some forecasts
fit %>% forecast() %>% autoplot(aus_production)
## Warning: Removed 20 rows containing missing values (`geom_line()`).
mean(augment(fit)$.innov, na.rm = TRUE)
## [1] 4.21134
There is high autocorrelation at multiple lags, with a clear seasonal pattern in the ACF. The large mean of the innovation residuals indicates the forecasts are biased, and the histogram suggests the residuals are not normally distributed. The seasonal naive model is not ideal for this time series.
For your retail time series (from Exercise 7 in Section 2.10):
Solution
set.seed(15)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

myseries_train <- myseries %>%
  filter(year(Month) < 2011)

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

fit <- myseries_train %>%
  model(SNAIVE(Turnover))

fit %>% gg_tsresiduals()
The residuals are autocorrelated and do not follow a normal distribution (the histogram is right-skewed).
fc <- fit %>%
  forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc %>% autoplot(myseries)

fit %>% fabletools::accuracy() %>% select(MAE, RMSE, MAPE, MASE, RMSSE)
## # A tibble: 1 × 5
## MAE RMSE MAPE MASE RMSSE
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 13.3 18.4 8.10 1 1
fc %>% fabletools::accuracy(myseries) %>% select(MAE, RMSE, MAPE, MASE, RMSSE)
## # A tibble: 1 × 5
## MAE RMSE MAPE MASE RMSSE
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 67.5 80.9 17.1 5.07 4.40
The errors are much smaller on the training data than on the test data: the model performs poorly out of sample. All accuracy measures are much worse on the test set; for example, the test-set MAPE (17.1) is roughly twice the training-set MAPE (8.10).
The accuracy measures are highly sensitive to the amount of training data used and to where the train/test split is placed. Including more or less data in training changes the fitted model and its forecasts, which in turn changes the accuracy measurements.
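As a quick illustration of that sensitivity (a sketch; the 2014 cutoff is an arbitrary alternative split, not part of the exercise):

# Same SNAIVE model, longer training window: every accuracy
# measure changes relative to the 2011 split used above.
myseries_train2 <- myseries %>%
  filter(year(Month) < 2014)
myseries_train2 %>%
  model(SNAIVE(Turnover)) %>%
  forecast(new_data = anti_join(myseries, myseries_train2)) %>%
  fabletools::accuracy(myseries) %>%
  select(MAE, RMSE, MAPE, MASE, RMSSE)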