Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:
global_economy_hw <- global_economy |>
  mutate(Country = as.character(Country)) |>
  filter(Country == 'Austria') |>
  select(Year, Country, Population) |>
  as_tsibble(index = Year, key = Country)
#global_economy_hw |> autoplot()
pop_mod <- global_economy_hw |> model(RW(Population ~ drift()))
pop_fc <- pop_mod |> forecast()
pop_fc |> autoplot() +
  autolayer(global_economy_hw, colour = "black")
## Plot variable not specified, automatically selected `.vars = Population`
brc_df <- aus_production |> select(Bricks)
brc_mod <- brc_df |> filter(!is.na(Bricks)) |> model(SNAIVE(Bricks))
brc_fc <- brc_mod |> forecast()
brc_fc |> autoplot() +
  autolayer(brc_df, colour = "black")
## Plot variable not specified, automatically selected `.vars = Bricks`
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
lamb_df <- aus_livestock |> filter(State == 'New South Wales', Animal == 'Lambs')
lamb_mod <- lamb_df |> model(MEAN(Count))
lamb_fc <- lamb_mod |> forecast()
lamb_fc |> autoplot() +
  autolayer(lamb_df, colour = "black")
## Plot variable not specified, automatically selected `.vars = Count`
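Since the exercise limits the choice to NAIVE, SNAIVE or the drift method, and the NSW lamb slaughter series is monthly (and plausibly seasonal), here is a minimal sketch of the seasonal naïve alternative. This is my own addition for comparison; the MEAN fit above is kept as-is.
# Seasonal naive alternative (sketch): monthly livestock data is likely seasonal
lamb_df |>
  model(SNAIVE(Count)) |>
  forecast(h = "2 years") |>
  autoplot(lamb_df)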
house_df <- hh_budget |> select(Year, Wealth) |> as_tsibble(index = Year)
house_mod <- house_df |> model(RW(Wealth ~ drift()) )
house_fc <- house_mod |> forecast()
house_fc |> autoplot() +
  autolayer(house_df, colour = "black")
## Plot variable not specified, automatically selected `.vars = Wealth`
Here I grouped by Month and created a total Australian turnover series called ‘Tot_Turnover’.
retail_df <- aus_retail |>
  filter(Industry == 'Takeaway food services') |>
  as.data.frame() |>
  group_by(Month) |>
  mutate(Tot_Turnover = sum(Turnover)) |>
  select(Month, Tot_Turnover) |>
  ungroup() |>
  distinct() |>
  as_tsibble(index = Month)
retail_mod <- retail_df |> model(RW(Tot_Turnover ~ drift()) )
retail_fc <- retail_mod |> forecast()
retail_fc |> autoplot() +
  autolayer(retail_df, colour = "black")
## Plot variable not specified, automatically selected `.vars = Tot_Turnover`
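As an aside, the same monthly total can be obtained more directly: summarising a tsibble aggregates across the keys (here, the States) within each index value, so the group_by/distinct steps aren't needed. This is just an alternative sketch; retail_df2 is an illustrative name.
# Equivalent aggregation: summarise() on a tsibble sums across all series for each Month
retail_df2 <- aus_retail |>
  filter(Industry == 'Takeaway food services') |>
  summarise(Tot_Turnover = sum(Turnover))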
Use the Facebook stock price (data set gafa_stock) to do the following:
# Drop the irregular Date index and re-index by trading day
fb_df <- gafa_stock |>
  filter(Symbol == 'FB') |>
  as_tibble() |>
  mutate(new_index = row_number()) |>
  select(new_index, Close) |>
  as_tsibble(index = new_index)
fb_df |> autoplot()
## Plot variable not specified, automatically selected `.vars = Close`
fb_mod <- fb_df |> model(RW(Close ~ drift()))
fb_fc <- fb_mod |> forecast(h=50)
fb_fc |> autoplot() +
  autolayer(fb_df, colour = "black")
## Plot variable not specified, automatically selected `.vars = Close`
Note to self: I had to switch the order of the layers to make the geom_segment appear.
fb_df |> autoplot() +
  autolayer(fb_fc, colour = "blue") +
  geom_segment(
    x = min(fb_df$new_index),
    y = fb_df$Close[which.min(fb_df$new_index)],
    xend = max(fb_df$new_index),
    yend = fb_df$Close[which.max(fb_df$new_index)],
    colour = "red"
  )
## Plot variable not specified, automatically selected `.vars = Close`
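To back up the picture numerically, the fitted drift coefficient should equal the slope of that red segment, i.e. the rise from the first to the last closing price divided by the number of steps. This is a small check I added, reusing fb_mod and fb_df from above.
# The drift term reported by tidy() should match this hand-computed slope
fb_mod |> tidy()
(last(fb_df$Close) - first(fb_df$Close)) / (nrow(fb_df) - 1)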
This is an interesting question. Based on the efficient market hypothesis, the last observed price is the best estimate of what the stock is worth, so the naïve forecast might be the best choice.
Then again, the stock is in a downward trend, so maybe the historic mean, which is lower, is a better estimate.
Or possibly the trend will persist and the random walk with drift is the best forecast? I don't think there is a clear answer without some forecast accuracy testing (a quick sketch of that follows after the plot below).
fb_mod2 <- fb_df |> model(RW(Close ~ drift()), NAIVE(Close), MEAN(Close))
fb_fc2 <- fb_mod2 |> forecast(h=150)
fb_fc2 |> autoplot() +
  autolayer(fb_df, colour = "black")
## Plot variable not specified, automatically selected `.vars = Close`
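Here is a rough sketch of that accuracy testing (my own addition, not part of the exercise code): hold out the last 100 trading days, refit the three benchmarks on the rest, and score them on the held-out data. The 100-day holdout and the model labels are arbitrary choices.
# Compare the benchmarks on a simple holdout of the last 100 observations
fb_train <- fb_df |> filter(new_index <= max(new_index) - 100)
fb_train |>
  model(Drift = RW(Close ~ drift()),
        Naive = NAIVE(Close),
        Mean  = MEAN(Close)) |>
  forecast(h = 100) |>
  accuracy(fb_df)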
Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.
What do you conclude?
The residuals appear to be uncorrelated and have a mean of zero. In the ACF plot, only one spike exceeds the significance bounds (around 0.2), further supporting the claim that the residuals are uncorrelated. The histogram of the residuals is slightly bimodal but not skewed, and it is centered on zero.
# Extract data of interest
recent_production <- aus_production |>
filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).
# Look at some forecasts
fit |> forecast() |> autoplot(recent_production)
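To back up the visual check with a formal test, a Ljung-Box test on the innovation residuals can be added (my addition; lag = 8 is twice the seasonal period for quarterly data, and a large p-value is consistent with white noise).
# Portmanteau test: a large lb_pvalue means no evidence against white-noise residuals
augment(fit) |>
  features(.innov, ljung_box, lag = 8)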
Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.
Wow, the Bricks time series has some issues: the residuals are not centered on zero and, based on the ACF plot, they are also correlated. I would recommend a transformation (a sketch follows below).
# Extract data of interest
brc_df <- aus_production |> select(Bricks)
# Define and estimate a model
brc_mod <- brc_df |> filter(!is.na(Bricks)) |> model(SNAIVE(Bricks))
# Look at the residuals
brc_mod |> gg_tsresiduals()
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).
# Look at some forecasts
brc_mod |> forecast() |> autoplot(brc_df)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
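A sketch of the transformation idea (my addition): pick a Box-Cox lambda with the Guerrero feature, refit the seasonal naïve model on the transformed series, and re-check the residuals. The lambda value is whatever Guerrero selects; this may not remove the autocorrelation, but it should stabilise the variance.
# Choose a Box-Cox lambda, then refit SNAIVE on the transformed series
lambda <- brc_df |>
  filter(!is.na(Bricks)) |>
  features(Bricks, features = guerrero) |>
  pull(lambda_guerrero)
brc_df |>
  filter(!is.na(Bricks)) |>
  model(SNAIVE(box_cox(Bricks, lambda))) |>
  gg_tsresiduals()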
The Exports residuals appear to be uncorrelated and centered around zero. The ACF plot shows no significant correlation, and the histogram is approximately normal.
# Extract data of interest
global_economy_hw <- global_economy |>
  mutate(Country = as.character(Country)) |>
  filter(Country == 'Austria') |>
  select(Year, Country, Exports) |>
  as_tsibble(index = Year, key = Country)
# Define and estimate a model
exprt_mod <- global_economy_hw |> model(NAIVE(Exports))
# Look at the residuals
exprt_mod |> gg_tsresiduals()
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).
# Look at some forecasts
exprt_mod |> forecast() |> autoplot(global_economy_hw)
For your retail time series (from Exercise 7 in Section 2.10):
set.seed(555)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_train <- myseries |>
filter(year(Month) < 2011)
autoplot(myseries, Turnover) +
autolayer(myseries_train, Turnover, colour = "red")
fit <- myseries_train |>
model(SNAIVE(Turnover))
The residuals are NOT uncorrelated and are NOT normally distributed.
fit |> gg_tsresiduals()
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).
fc <- fit |>
forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc |> autoplot(myseries)
The error measures on the test data are actually a little lower than on the training data (for example, RMSE 7.97 vs 8.10 and MAPE 3.57 vs 6.25). Even so, I would be cautious about this model, given that the residuals are correlated and not centered around zero.
fit |> accuracy()
## # A tibble: 1 × 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Wester… Departm… SNAIV… Trai… 4.76 8.10 6.26 4.82 6.25 1 1 0.0518
fc |> accuracy(myseries)
## # A tibble: 1 × 12
## .model State Industry .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… West… Departm… Test 1.73 7.97 6.24 0.746 3.57 0.996 0.983 0.107
I’m going to reduce the amount of training data (cutting the training set off at the year 2000) and see how that changes the accuracy measures.
myseries_train2 <- myseries |>
filter(year(Month) < 2000)
fit2 <- myseries_train2 |>
model(SNAIVE(Turnover))
fc2 <- fit2 |>
forecast(new_data = anti_join(myseries, myseries_train2))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc2 |> autoplot(myseries)
The accuracy measures are sensitive to the quality of the model. Reducing the training data to the point that it misses an important trend meant the model was way off, resulting in a massive drop in accuracy: the training RMSE is 6.12, but the test RMSE is 58.7, compared with a test RMSE of 7.97 when the longer training set was used.
fit2 |> accuracy()
## # A tibble: 1 × 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Weste… Departm… SNAIV… Trai… 3.87 6.12 4.94 5.20 6.54 1 1 -0.0404
fc2 |> accuracy(myseries)
## # A tibble: 1 × 12
## .model State Industry .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… West… Departm… Test 51.8 58.7 52.0 31.1 31.3 10.5 9.60 0.726
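To get a less split-dependent picture of how sensitive these measures are, here is a time series cross-validation sketch (my addition): stretch_tsibble() re-fits the seasonal naïve model over a growing sequence of training sets, and accuracy() then averages the forecast errors across all of them. The .init and .step values are arbitrary choices.
# TSCV: each training set is 12 months longer than the last, starting from a 4-year base
myseries |>
  stretch_tsibble(.init = 48, .step = 12) |>
  model(SNAIVE(Turnover)) |>
  forecast(h = 12) |>
  accuracy(myseries)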