5.1, 5.2, 5.3, 5.4 and 5.7
5.1
Produce forecasts for the following series using whichever of NAIVE(y) , SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:
Australian Population ( global_economy ) Bricks ( aus_production ) NSW Lambs ( aus_livestock ) Household wealth ( hh_budget ). Australian takeaway food turnover ( aus_retail ).
Use SNAIVE if data appears Seasonal Use NAIVE if no clear trend. Use drift if consistent increase.
aus_pop <- global_economy %>%
filter(Country == "Australia")
aus_pop %>%
model(RW(Population ~ drift())) %>%
forecast(h = 25) %>%
autoplot(aus_pop)bricks <- aus_production
bricks <- subset(bricks, Bricks != is.na(Bricks))
bricks %>% model(SNAIVE(Bricks)) %>%
forecast(h=25)%>%
autoplot(bricks)lambs <- subset(aus_livestock,Animal == "Lambs" & State == "New South Wales")
autoplot(lambs)## Plot variable not specified, automatically selected `.vars = Count`
lambs %>%
model(NAIVE(Count)) %>%
forecast(h=25) %>%
autoplot(lambs)house <- hh_budget
#autoplot(house,Wealth)
house %>%
model(NAIVE(Wealth)) %>%
forecast(h=25) %>%
autoplot(house)takeaway <- subset(aus_retail, Industry == "Takeaway food services")
autoplot(takeaway,Turnover)takeaway %>%
model(RW(Turnover ~ drift())) %>%
forecast(h = 25) %>%
autoplot(takeaway)+
facet_wrap(~State,scales ="free")5.2
Use the Facebook stock price (data set gafa_stock ) to do the following: a. Produce a time plot of the series. b. Produceforecastsusingthedriftmethodandplotthem. c. Showthattheforecastsareidenticaltoextendingthelinedrawnbetweenthe first and last observations. d. Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?
A and B
fb <- subset(gafa_stock,Symbol == "FB")
fb <- fb %>%
mutate(day = row_number()) %>%
update_tsibble(index = day, regular = TRUE)
autoplot(fb)## Plot variable not specified, automatically selected `.vars = Open`
fb %>%
model(RW(Open ~ drift())) %>%
forecast(h = 100) %>%
autoplot(fb)C
fb %>%
model(RW(Open ~ drift())) %>%
forecast(h = 100) %>%
autoplot(fb) +
geom_segment(aes(x = 0,y = fb$Open[1], xend = 1258, yend = fb$Open[1258]))## Warning: Use of `fb$Open` is discouraged. Use `Open` instead.
## Use of `fb$Open` is discouraged. Use `Open` instead.
D
In part D, I could not get SNAIVE to work. Perhaps if the open price were converted to an average for each quarter or month it would respond well, but I received errors with SNAIVE.
Naive:
fb %>%
model(NAIVE(Open)) %>%
forecast(h = 100) %>%
autoplot(fb)5.3
Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.
# Extract data of interest
recent_production <- aus_production |> filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer)) # Look at the residuals
fit |> gg_tsresiduals()## Warning: Removed 4 row(s) containing missing values (geom_path).
## Warning: Removed 4 rows containing missing values (geom_point).
## Warning: Removed 4 rows containing non-finite values (stat_bin).
# Look a some forecasts
fit |> forecast() |> autoplot(recent_production)Q4 is consistently when the production of beer hits its peak for the given year. The residuals follow a near normal distribution, so the model provides a sufficient fit.
5.4
RepeatthepreviousexerciseusingtheAustralianExportsseriesfrom global_economy and the Bricks series from aus_production . Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.
aus_exports <- subset(global_economy,Country == "Australia")
aus_exports %>%
model(RW(Exports ~ drift())) %>%
forecast(h = 25) %>%
autoplot(aus_exports)aus_exports %>%
model(NAIVE(Exports)) %>%
forecast(h = 25) %>%
autoplot(aus_exports)#aus_exports_nona <- subset(aus_exports,Exports != is.na(Exports))
#aus_exports_nona %>% model(SNAIVE(Exports)) %>%
# forecast(h=25)%>%
# autoplot(aus_exports_nona)
fit <- aus_exports |> model(NAIVE(Exports))
fit |> gg_tsresiduals()## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).
fit |> forecast() |> autoplot(aus_exports)The Residuals plot is very near normally distributed. The Naive method is appropriate for the aus_exports data.
bricks <- aus_production
bricks <- subset(bricks, Bricks != is.na(Bricks))
bricks %>% model(SNAIVE(Bricks)) %>%
forecast(h=25)%>%
autoplot(bricks)bricks %>%
model(RW(Bricks ~ drift())) %>%
forecast(h = 25) %>%
autoplot(bricks)bricks %>%
model(NAIVE(Bricks)) %>%
forecast(h = 25) %>%
autoplot(bricks)fit <- bricks |> model(SNAIVE(Bricks))
fit |> gg_tsresiduals()## Warning: Removed 4 row(s) containing missing values (geom_path).
## Warning: Removed 4 rows containing missing values (geom_point).
## Warning: Removed 4 rows containing non-finite values (stat_bin).
fit |> forecast() |> autoplot(bricks)Bricks with Naive model:
fit <- bricks |> model(NAIVE(Bricks))
fit |> gg_tsresiduals()## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).
fit |> forecast() |> autoplot(bricks)For the Bricks data, it is difficult to see whether Naive or Seasonal Naive is the right approach. The Seasonal Naive model is skewed to the left, but the standard Naive model shows a somewhat normal but left skewed residual plot as well.
The seasonal model’s ACF Plot shows a heavily autocorrelated pattern. The data is validated as non-random from the ACF plot for the Seasonal Naive approach. The standard Naive approach also shows a non-negligible correlation in the ACF plot.
Thus, either model would work well for the Bricks dataset, but SNAIVE may have the edge due to the heavy autocorrelation in the ACF plot.
5.7
Part A and B
set.seed(34)
aus_retail_series <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
autoplot(aus_retail_series)## Plot variable not specified, automatically selected `.vars = Turnover`
gg_season(aus_retail_series)## Plot variable not specified, automatically selected `y = Turnover`
gg_subseries(aus_retail_series)## Plot variable not specified, automatically selected `y = Turnover`
gg_lag(aus_retail_series)## Plot variable not specified, automatically selected `y = Turnover`
aus_retail_series %>%
ACF(Turnover) %>%
autoplot()train_2011 <- subset(aus_retail_series,year(Month) < 2011)
autoplot(aus_retail_series, Turnover) + autolayer(train_2011, Turnover, colour = "red")Part C and D
fit <- train_2011 |> model(SNAIVE(Turnover))
fit |> gg_tsresiduals()## Warning: Removed 12 row(s) containing missing values (geom_path).
## Warning: Removed 12 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing non-finite values (stat_bin).
The residuals are normally distributed and the ACF plot has a tail off shape at the beginning of the series.
Part E
fc <- fit |>
forecast(new_data = anti_join(aus_retail_series, train_2011))## Joining, by = c("State", "Industry", "Series ID", "Month", "Turnover")
fc |> autoplot(aus_retail_series)The forecast fell short of matching the actual values in even the 95% confidence interval. It appears that the early 2000s were taken into much greater account than the growth that was observed leading up to 2000.
Part F
The accuracy of the forecast is significantly far off from the actual values.
fit |> accuracy()fc |> accuracy(aus_retail_series)Part G
The accuracy measures are very dependent on the data that was chosen as the training set. In this case, the training set may have been oversampled. The forecast was unable to project the steep increase post 2011. If the forecast were updated yearly after 2011, then the forecast would have been much more likely to have predicted the sharp increase.