Produce forecasts for the following series using whichever of
NAIVE(y), SNAIVE(y) or
RW(y ~ drift()) is more appropriate in each case:
global_economy)For this, let’s predict the Australian population for the next ten years. The population is steadily increasing without seasonal trends so we can use the drift method to forecast the population based on the most recent population size and the average change in the population over the years.
aus_pop_fc <- global_economy |>
filter(Country == 'Australia') |>
model(RW(Population~drift())) |>
forecast(h = 10)
aus_pop_fc |>
autoplot(global_economy) +
labs(title = "Australian Population 10 Year Forecast", y = "Population (in millions)")
aus_production)Brick production displays a seasonal trend, so we will use the seasonal naive method to forecast future production for the next 2 years (8 quarters).
brick_prod_fc_snaive <- aus_production |>
filter(!is.na(Bricks)) |>
model(SNAIVE(Bricks)) |>
forecast(h = "2 years")
brick_prod_fc_snaive |>
autoplot(aus_production) +
labs(title = "Australian Brick Production 2 Year Forecast", subtitle = "Seasonal Naive Approach")
Australian brick production is also affected by factors other than just seasonality, as can be seen in the graph. We will also forecast using a naive approach with a bootstrap interval.
brick_prod_fc_naive <- aus_production |>
filter(!is.na(Bricks)) |>
model(NAIVE(Bricks)) |>
forecast(h = "2 years", bootstrap=T, times=1000)
brick_prod_fc_naive |>
autoplot(aus_production) +
labs(title = "Australian Brick Production 2 Year Forecast", subtitle = "Naive Approach with Bootstrap Interval")
aus_livestock)For this, we will use a seasonal naive approach.
#seasonal naive
aus_livestock_fc_snaive <- aus_livestock |>
filter(Animal == "Lambs", State == "New South Wales") |>
model(SNAIVE(Count)) |>
forecast(h = "2 years")
aus_livestock_fc_snaive |>
autoplot(aus_livestock) +
labs(title = "Australian Lamb Production 2 Year Forecast", subtitle = "Seasonal Naive Approach") +
scale_y_continuous(labels = scales::comma)
hh_budget)The data does not display seasonal trends nor does it follow a linear increase, so we will use the naive method here.
hh_wealth_fc <- hh_budget |>
model(NAIVE(Wealth)) |>
forecast(h = "5 years")
hh_wealth_fc |>
autoplot(hh_budget) +
labs(title = "Household Wealth Forecast Next 5 Years")
aus_retail)The data seems to mostly increase, even with some seasonality exhibited, so we will use
aus_takeout <- aus_retail |>
filter(Industry == "Takeaway food services") |>
select(State, Month, Turnover)
aus_takeout_fc <- aus_takeout |>
model(RW(Turnover~drift())) |>
forecast(h = 10)
autoplot(aus_takeout, Turnover) +
autolayer(aus_takeout_fc)
Use the Facebook stock price (data set gafa_stock) to do
the following:
The market is not open on Sundays so we will re-index using row numbers as stock trading days.
Let’s plot the close price for each day.
fb_stock <- gafa_stock |>
filter(Symbol == "FB") |>
mutate(Day = row_number()) |>
update_tsibble(index=Day, regular=T)
fb_stock |>
autoplot(Close) +
labs(title = "Facebook Stock Prices Overtime", y = "Closing Price")
fb_stock_drift_fc <- fb_stock |>
model(RW(Close~drift())) |>
forecast(h = 50)
fb_stock_drift_fc |>
autoplot(fb_stock) +
labs(title = "Facebook Stock 50 Day Forecast", y = "Close")
df <- data.frame(x1 = min(fb_stock$Day), x2 = max(fb_stock$Day), y1 = fb_stock$Close[min(fb_stock$Day)], y2 = fb_stock$Close[max(fb_stock$Day)])
fb_stock_drift_fc |>
autoplot(fb_stock) +
labs(title = "Facebook Stock 50 Day Forecast", y = "Close") +
geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2), color="red", lty="dashed", data=df)
#naive
fb_stock |>
model(NAIVE(Close)) |>
forecast(h = 50) |>
autoplot(fb_stock) +
labs(title = "Facebook Stock 50 Day Forecast", subtitle = "Naive Method", y = "Close")
#mean
fb_stock |>
model(MEAN(Close)) |>
forecast(h = 50) |>
autoplot(fb_stock) +
labs(title = "Facebook Stock 50 Day Forecast", subtitle = "Mean Method", y = "Close")
The mean method is not suited for this dataset as the values of the stock price rises and falls erratically. The naive or drift method would both be options here, as they start off with the last observed value. Since the data is non-linear and the drift method here basically assumes the data will be increasing, even though we can see the more recent decreasing trend, the naive method would be best here.
Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.
# Extract data of interest
recent_production <- aus_production |>
filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |>
model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()
# Look a some forecasts
fit |> forecast() |>
autoplot(recent_production)
What do you conclude?
The variance of the residuals are not exactly constant. Also, the distribution of the residuals shows a bimodal distribution. From the ACF plot, we can see that lag 4 is significantly larger than the other lags. All this would indicate that these results are different from the white noise series and that there is autocorrelation and non-normality of the residuals.
Repeat the previous exercise using the Australian Exports series from
global_economy and the Bricks series from
aus_production. Use whichever of NAIVE() or
SNAIVE() is more appropriate in each case.
# Extract data of interest
australian_economy <- global_economy |>
filter(Country == "Australia")
# Define and estimate a model
fit <- australian_economy |>
model(NAIVE(Exports))
# Look at the residuals
fit |> gg_tsresiduals()
# Look a some forecasts
fit |> forecast() |>
autoplot(australian_economy)
The residuals look to be normally distributed and there are no significantly outstanding lags in the ACF plot so there is no significant correlation in the residual series. The mean of the residuals is about zero, but there does seem to be slightly higher variance towards the right of the graph than the left of the graph.
# Define and estimate a model
fit <- aus_production |>
filter(!is.na(Bricks)) |>
model(SNAIVE(Bricks))
# Look at the residuals
fit |> gg_tsresiduals()
# Look a some forecasts
fit |> forecast() |>
autoplot(recent_production)
The residuals are not normally distributed, seem to have high correlation, and do not have constant variance. This model is not ideal.
For your retail time series (from Exercise 7 in Section 2.10):
a. Create a training dataset consisting of observations before 2011 using
set.seed(613)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_train <- myseries |>
filter(year(Month) < 2011)
b. Check that your data have been split appropriately by producing the following plot.
autoplot(myseries, Turnover) +
autolayer(myseries_train, Turnover, colour = "red")
c. Fit a seasonal naïve model using SNAIVE() applied to your
training data (myseries_train).
fit <- myseries_train |>
model(SNAIVE(Turnover))
d. Check the residuals.
fit |> gg_tsresiduals()
## Warning: Removed 12 rows containing missing values (`geom_line()`).
## Warning: Removed 12 rows containing missing values (`geom_point()`).
## Warning: Removed 12 rows containing non-finite values (`stat_bin()`).
Do the residuals appear to be uncorrelated and normally distributed?
No. From the ACF plot we can see that the residuals seem to be correlated and we can also see, from the histogram, that they do not appear to be normally distributed. Also, the mean of the residuals is not centered around zero.
e. Produce forecasts for the test data
fc <- fit |>
forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc |> autoplot(myseries)
f. Compare the accuracy of your forecasts against the actual values.
fit |> accuracy()
## # A tibble: 1 × 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Western… Other r… SNAIV… Trai… 2.37 7.24 5.14 5.59 11.4 1 1 0.796
fc |> accuracy(myseries)
## # A tibble: 1 × 12
## .model State Industry .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… West… Other r… Test 40.2 45.3 40.8 31.7 32.6 7.93 6.26 0.681
The errors are much smaller on the training data than the testing data, so the model performs very poorly for the testing set.
g. How sensitive are the accuracy measures to the amount of training data used?
Accuracy measures are pretty sensitive to the amount of training data being used. If there is less data in the training set, the model may likely overfit and so the accuracy of the training set will be very high but the model will perform poorly for future forecasts. The more data used in a training set, the better the accuracy will be for both the training and the testing sets, as the model will be able to capture more of the trends and patterns within the data.