Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:
First a quick look at the data.
library(fpp3)    # loads tsibble, fable, feasts, ggplot2, and the textbook data sets
library(scales)  # for label_number()

AUSpop <- global_economy |>
  filter(Code == "AUS") |>
  select(Population)

autoplot(AUSpop) +
  scale_y_continuous(labels = label_number(suffix = "M", scale = 1e-6)) +
  theme_minimal()
Given the quick look, the series trends steadily upward with no seasonality, so the drift method seems most appropriate.
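For reference, the drift forecast extends the line between the first and last observations (standard textbook notation, where $T$ is the number of observations):

$$\hat{y}_{T+h|T} = y_T + h\left(\frac{y_T - y_1}{T - 1}\right)$$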
AUSpop_m <- AUSpop |>
  model(RW(Population ~ drift()))
AUSpop_fc <- AUSpop_m |> forecast(h = 5) # 5 years
autoplot(AUSpop_fc) +
  autolayer(AUSpop, color = "black") +
  scale_y_continuous(labels = label_number(suffix = "M", scale = 1e-6)) +
  theme_minimal()
First a quick look at the data.
bricks <- aus_production |>
  filter(!is.na(Bricks)) |>
  select(Bricks)
autoplot(bricks) +
  theme_minimal()
The plot shows clear quarterly seasonality, so we will try SNAIVE(y).
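For reference, the seasonal naïve forecast repeats the last observed value from the same season (textbook notation, with seasonal period $m = 4$ for quarterly data and $k$ the integer part of $(h-1)/m$):

$$\hat{y}_{T+h|T} = y_{T+h-m(k+1)}$$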
bricks_m <- bricks |>
  model(SNAIVE(Bricks))
bricks_fc <- bricks_m |> forecast(h = 14) # 14 quarters
autoplot(bricks_fc) +
  autolayer(bricks, color = "black") +
  theme_minimal()
First a quick look at the data.
lambs <- aus_livestock |>
  filter(Animal == "Lambs", State == "New South Wales")
autoplot(lambs) +
  scale_y_continuous(labels = label_number(suffix = "K", scale = 1e-3)) +
  theme_minimal()
This series also shows seasonality, so we will use SNAIVE(y) again.
lambs_m <- lambs |>
  model(SNAIVE(Count))
lambs_fc <- lambs_m |> forecast(h = 24) # 24 months
autoplot(lambs_fc) +
  autolayer(lambs, color = "black") +
  theme_minimal()
First a quick look at the data.
wealth <- hh_budget |>
  select(Wealth)
autoplot(wealth) +
  theme_minimal()
Each country's wealth series trends without seasonality, so drift looks to be a good fit here.
wealth_m <- wealth |>
  model(RW(Wealth ~ drift()))
wealth_fc <- wealth_m |> forecast(h = 5) # 5 years
autoplot(wealth_fc, level = NULL) +
  autolayer(wealth) +
  theme_minimal() +
  theme(legend.position = "none")
First a quick look at the data.
tft <- aus_retail |>
  filter(Industry == "Takeaway food services")
autoplot(tft) +
  theme_minimal() +
  labs(
    y = "Takeaway Food Turnover",
    title = "AUS Takeaway Food Turnover by State"
  ) +
  theme(legend.position = "none")
Let's try NAIVE(y) here.
tft_m <- tft |>
  model(NAIVE(Turnover))
tft_fc <- tft_m |> forecast(h = 24) # 24 months
autoplot(tft_fc, level = NULL) +
  autolayer(tft) +
  theme_minimal() +
  theme(legend.position = "none")
Use the Facebook stock price (data set gafa_stock) to do the following:
Note: the exercise does not specify which field (Open, Close, High, Low) to use; I assumed Close.
fb <- gafa_stock |>
  filter(Symbol == "FB") |>
  filter(!is.na(Close)) |>
  select(Close) |>
  mutate(trading_day = row_number()) |>
  # re-index on trading day, since prices are only observed on
  # (irregularly spaced) trading days
  update_tsibble(index = trading_day, regular = TRUE)
autoplot(fb) +
  theme_minimal()
## Plot variable not specified, automatically selected `.vars = Close`
fb_m <- fb |>
  model(RW(Close ~ drift()))
fb_fc <- fb_m |> forecast(h = 60) # 60 trading days
autoplot(fb_fc) +
  autolayer(fb) +
  theme_minimal() +
  theme(legend.position = "none")
The drift forecast should lie on the line extending from the first observation to the last. I wonder whether there is a cleverer way to draw that line than hard-coding the endpoint values, but hard-coding works:
autoplot(fb_fc) +
  autolayer(fb) +
  annotate("segment",
    x = 1, y = 54.71, xend = 1258, yend = 131.09,
    color = "orange"
  ) +
  theme_minimal() +
  theme(legend.position = "none")
## Plot variable not specified, automatically selected `.vars = Close`
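One less hard-coded alternative (a sketch, reusing the fb tsibble from above): pull the segment endpoints out of the data with dplyr's first() and last() instead of typing them in.

autoplot(fb_fc) +
  autolayer(fb) +
  annotate("segment",
    x = min(fb$trading_day), xend = max(fb$trading_day),
    y = first(fb$Close), yend = last(fb$Close), # dplyr first()/last()
    color = "orange"
  ) +
  theme_minimal() +
  theme(legend.position = "none")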
There is no seasonality in daily stock prices, so we will try the other two benchmark methods:
fb_m2 <- fb |>
  model(
    mean = MEAN(Close),
    naïve = NAIVE(Close)
  )
fb_fc2 <- fb_m2 |> forecast(h = 60) # 60 trading days
autoplot(fb_fc2, level = NULL) +
  autolayer(fb) +
  guides(color = guide_legend(title = "Forecast")) +
  theme_minimal()
## Plot variable not specified, automatically selected `.vars = Close`
Thinking about which method makes the most sense:
The mean of all the data does not make sense for trading data, since prices wander far from their long-run average.
Both the naïve and drift forecasts start from the value of the last data point, so they are equal for the first forecast value.
Because the last value is higher than the first, the drift method forecasts upward. But the most recent 250 trading days trend downward, so I think the naïve method is best here: it projects flat from the current point.
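A quick sanity check of that claim (a sketch, reusing the fb tsibble from above): compare the last close with the close roughly 250 trading days earlier.

# If the earlier close is higher than the last close, the most recent
# ~250 trading days are indeed down.
fb |>
  filter(trading_day %in% c(max(trading_day) - 250, max(trading_day)))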
Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.
# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()
# Look at some forecasts
fit |> forecast() |> autoplot(recent_production)
What do you conclude?
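Beyond the visual check from gg_tsresiduals(), a Ljung-Box test gives a formal check for white noise. A sketch (lag = 8, i.e. two seasonal periods for quarterly data); a small p-value would indicate the residuals are distinguishable from white noise:

fit |>
  augment() |>
  features(.innov, ljung_box, lag = 8)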
Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.
# Extract data of interest
aus_ex <- global_economy |>
  filter(Code == "AUS")
# Define and estimate a model
fit2 <- aus_ex |> model(NAIVE(Exports))
# Look at the residuals
fit2 |> gg_tsresiduals()
# Look at some forecasts
fit2 |> forecast() |> autoplot(aus_ex)
We used NAIVE() because the series is annual, so there is no seasonality to model.
Conclusions:
# Extract data of interest
# Note: bricks was created in the earlier exercise above.
# Define and estimate a model
fit3 <- bricks |> model(SNAIVE(Bricks))
# Look at the residuals
fit3 |> gg_tsresiduals()
# Look at some forecasts
fit3 |> forecast() |> autoplot(bricks)
We used SNAIVE() because the quarterly series shows clear seasonality.
Conclusions:
For your retail time series (from Exercise 7 in Section 2.10):
Create a training dataset consisting of observations before 2011 using
myseries <- aus_retail |>
  filter(`Series ID` == "A3349849A")
myseries_train <- myseries |>
  filter(year(Month) < 2011)
Check that your data have been split appropriately by producing the following plot.
autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red") +
  theme_minimal()
Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).
fit4 <- myseries_train |>
  model(SNAIVE(Turnover)) # note: added Turnover, the variable to forecast
Check the residuals.
fit4 |> gg_tsresiduals()
Do the residuals appear to be uncorrelated and normally distributed?
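As with the beer series, a Ljung-Box test complements the visual check from gg_tsresiduals() (a sketch; lag = 24 covers two seasonal periods for monthly data):

fit4 |>
  augment() |>
  features(.innov, ljung_box, lag = 24)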
Produce forecasts for the test data
fc <- fit4 |>
  forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc |> autoplot(myseries) +
  theme_minimal()
Compare the accuracy of your forecasts against the actual values.
# This was the original code in the exercise (note the stray parentheses,
# fixed here):
# fit4 |> accuracy()
# fc |> accuracy(myseries)
# The forecast accuracy on the test data:
accuracy(fc, myseries) |>
  arrange(.model) |>
  select(.model, .type, MAE, RMSE, MAPE, MASE, RMSSE)
## # A tibble: 1 × 7
## .model .type MAE RMSE MAPE MASE RMSSE
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(Turnover) Test 7.38 8.42 19.0 2.92 2.50
Since we are only using one model here, we do not have multiple models for which we can compare accuracy. However, here are the full names of each measure, with some comments for my own notes:
MAE: Mean Absolute Error (in the units of the data)
RMSE: Root Mean Squared Error (in the units of the data; penalizes large errors more heavily)
MAPE: Mean Absolute Percentage Error (unit-free, but problematic when values are near zero)
MASE: Mean Absolute Scaled Error (scaled by in-sample naïve forecast errors)
RMSSE: Root Mean Squared Scaled Error (squared-error analogue of MASE)
A scaled error is less than one if it arises from a better forecast than the average one-step naïve forecast computed on the training data. Since both MASE and RMSSE here are greater than one, the forecasts are worse than the average one-step naïve forecast computed on the training data.
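For reference, the scaled error underlying MASE (textbook definition with non-seasonal scaling; for a seasonal series the denominator uses $y_t - y_{t-m}$ instead):

$$q_j = \frac{e_j}{\frac{1}{T-1}\sum_{t=2}^{T}|y_t - y_{t-1}|}, \qquad \text{MASE} = \text{mean}(|q_j|)$$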
How sensitive are the accuracy measures to the amount of training data used?
The accuracy measures are sensitive to the amount of training data used. In general, more data is better: too small a sample can lead to identifying spurious patterns that are not present in the larger data. An exception might be particularly noisy data, where a more focused approach, honing in on what is relevant to the question one is trying to answer, might be more appropriate.
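One way to see this sensitivity directly (a sketch; the start years are arbitrary choices for illustration): refit SNAIVE() on training windows of different lengths, all ending before 2011, and score each on the same post-2010 test set. Note that MASE and RMSSE will also shift because their scaling factor is computed on each training set.

test <- myseries |> filter(year(Month) >= 2011)

purrr::map_dfr(c(1990, 2000, 2008), function(start_year) {
  myseries |>
    filter(year(Month) >= start_year, year(Month) < 2011) |>
    model(SNAIVE(Turnover)) |>
    forecast(new_data = test) |>
    accuracy(myseries) |>
    # tag each row with the (hypothetical) training start year
    mutate(train_start = start_year, .before = 1)
})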