Australian Population (global_economy)
aus_population <- global_economy %>%
filter(Country == "Australia") %>%
select(Year, Population)
aus_population %>% model(NAIVE(Population)) %>%
forecast(h = 20) %>%
autoplot(aus_population)
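As a rough illustration of what `NAIVE()` computes, here is a base-R sketch on a made-up vector (not the `global_economy` data). Every point forecast equals the last observation, and for the naïve method the forecast standard deviation grows with √h. The helper `naive_forecast()` is hypothetical, and its σ estimate (root mean square of one-step changes) is a simplifying assumption that may not match fable's exact estimator.

```r
# Hypothetical helper: naive point forecasts with approximate normal intervals.
naive_forecast <- function(y, h, level = 0.95) {
  e <- diff(y)                     # one-step naive residuals, y_t - y_(t-1)
  sigma <- sqrt(mean(e^2))         # simple residual scale estimate (assumption)
  z <- qnorm(0.5 + level / 2)      # about 1.96 for a 95% interval
  steps <- seq_len(h)
  point <- rep(y[length(y)], h)    # every forecast equals the last observation
  half <- z * sigma * sqrt(steps)  # interval half-width grows with sqrt(h)
  data.frame(h = steps, mean = point, lower = point - half, upper = point + half)
}

y <- c(10, 12, 11, 13, 14)  # made-up values, not global_economy
fc <- naive_forecast(y, h = 3)
print(fc)
```

The flat point forecasts and widening intervals are what the fan in the population plot shows.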
Bricks (aus_production)
aus_bricks_all <- aus_production %>% filter(!is.na(Bricks))
aus_bricks_all %>% model(NAIVE(Bricks)) %>%
forecast(h = 20) %>%
autoplot(aus_bricks_all)
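`Bricks` stops being recorded before the end of `aus_production`, which leaves a trailing run of `NA`s. Dropping only that trailing run keeps the series regular, so the naïve forecast starts from the last observed value. A base-R sketch on a made-up vector, using a hypothetical helper `trim_trailing_na()`:

```r
# Hypothetical helper: keep everything up to the last non-missing value.
trim_trailing_na <- function(y) {
  observed <- which(!is.na(y))
  if (length(observed) == 0) return(y[0])
  y[seq_len(max(observed))]
}

bricks_toy <- c(400, 420, 410, 430, NA, NA, NA)  # made-up values
trimmed <- trim_trailing_na(bricks_toy)
naive_next <- trimmed[length(trimmed)]           # naive forecast for every horizon
print(trimmed)
print(naive_next)
```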
NSW Lambs (aus_livestock)
nsw_lambs <- aus_livestock %>%
filter(Animal == "Lambs", State == "New South Wales") %>%
select(Count)
nsw_lambs %>% model(NAIVE(Count)) %>%
forecast(h = 20) %>%
autoplot(nsw_lambs)
Household wealth (hh_budget)
hh_budget %>% model(NAIVE(Wealth)) %>%
forecast(h = 15) %>%
autoplot(hh_budget)
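`hh_budget` contains several countries (its key), and `model()` fits a separate model for each key value. A base-R sketch of the same idea: split the data by country and apply the naïve rule within each group. The data frame, its values, and the helper `naive_by_key()` are made up for illustration.

```r
wealth <- data.frame(
  Country = rep(c("Australia", "Canada"), each = 4),
  Year = rep(2013:2016, times = 2),
  Wealth = c(500, 510, 520, 530, 400, 405, 398, 410)  # made-up values
)

# Hypothetical helper: one naive model per key, i.e. each country's last value.
naive_by_key <- function(df, h) {
  groups <- split(df, df$Country)
  out <- lapply(groups, function(g) {
    g <- g[order(g$Year), ]
    data.frame(Country = g$Country[1],
               Year = max(g$Year) + seq_len(h),
               Wealth = g$Wealth[nrow(g)])
  })
  do.call(rbind, out)
}

fc <- naive_by_key(wealth, h = 2)
print(fc)
```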
Australian takeaway food turnover (aus_retail)
aus_food <- aus_retail %>%
filter(Industry == "Takeaway food services") %>%
select(Turnover)
aus_food %>% model(NAIVE(Turnover)) %>%
forecast(h = 15) %>%
autoplot(aus_food)
Facebook stock price (gafa_stock)
Produce a time plot of the series.
fb_stock <- gafa_stock %>%
filter(Symbol=='FB') %>%
mutate(Day = row_number()) %>%
update_tsibble(index=Day, regular=TRUE)
fb_stock %>% autoplot(Open)
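Trading days skip weekends and holidays, so a date index is irregular; `mutate(Day = row_number())` replaces it with consecutive integers. A base-R sketch with made-up dates shows the difference:

```r
dates <- as.Date(c("2018-01-04", "2018-01-05", "2018-01-08", "2018-01-09"))
gaps_in_days <- as.numeric(diff(dates))  # calendar gaps: the weekend makes one gap 3 days
day_index <- seq_along(dates)            # trading-day index: always steps by 1
print(data.frame(Date = dates, Day = day_index))
```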
Produce forecasts using the drift method and plot them.
fb_fc <- fb_stock %>% model(RW(Open ~ drift())) %>%
forecast(h = 100)
autoplot(fb_stock, Open) + autolayer(fb_fc)
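The drift method adds the average historical change to the last observation: y_T + h * (y_T - y_1) / (T - 1). A base-R sketch of that formula on a made-up price vector, using a hypothetical helper `drift_forecast()`:

```r
# Hypothetical helper: drift forecasts for horizons 1..h.
drift_forecast <- function(y, h) {
  n <- length(y)
  slope <- (y[n] - y[1]) / (n - 1)  # average change per step
  y[n] + slope * seq_len(h)
}

prices <- c(100, 103, 101, 106, 110)  # made-up values
fc <- drift_forecast(prices, h = 3)
print(fc)
```

Here the average change is (110 - 100) / 4 = 2.5 per step, so the forecasts rise by 2.5 each day.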
Show that the forecasts are identical to extending the line drawn between the first and last observations.
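n <- nrow(fb_stock)
slope <- (fb_stock$Open[n] - fb_stock$Open[1]) / (n - 1)  # average change per trading day
# Line from the first observation, through the last one, extended over the 100-day horizon
autoplot(fb_stock, Open) + autolayer(fb_fc) +
annotate("segment", x = 1, y = fb_stock$Open[1], xend = n + 100,
yend = fb_stock$Open[n] + 100 * slope, colour = "red", linetype = "dashed")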
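The same fact can be checked numerically: the straight line through (1, y_1) and (T, y_T), evaluated at T + h, equals the drift forecast. A base-R sketch on made-up values, fitting the line through the two points with `lm()`:

```r
prices <- c(100, 103, 101, 106, 110)  # made-up values
n <- length(prices)
h <- seq_len(5)

# Drift forecasts: last value plus h times the average change per step.
drift <- prices[n] + h * (prices[n] - prices[1]) / (n - 1)

# Straight line through the first and last observations, extended to n + h.
two_points <- data.frame(x = c(1, n), y = prices[c(1, n)])
line_fit <- lm(y ~ x, data = two_points)
extended <- unname(predict(line_fit, newdata = data.frame(x = n + h)))

print(cbind(h = h, drift = drift, line = extended))
```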
Try using some of the other benchmark functions to forecast the same data set.
# SNAIVE() is left out: the trading-day index has no seasonal period, so it fails with
# "Non-seasonal model specification provided" and produces only missing forecasts.
fb_fc <- fb_stock %>% model(Mean = MEAN(Open),
Naive = NAIVE(Open),
Drift = RW(Open ~ drift())) %>% forecast(h = 60)
autoplot(fb_stock, Open) + autolayer(fb_fc, level = NULL)
Which do you think is best? Why?
The drift method looks best. The series trends upward over the sample, and the drift method extrapolates the average change per trading day, whereas the mean method ignores the trend and the naïve method holds the last value flat. This is a judgment from the plot rather than from accuracy measures, and a past trend in a stock price need not continue.
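Whether drift actually forecasts best can be checked by holding out the end of the series and comparing errors. A base-R sketch on a simulated upward-trending random walk (not the FB data), comparing mean, naïve and drift forecasts by RMSE; the seed, drift and split point are arbitrary assumptions:

```r
set.seed(1)
y <- cumsum(c(50, rnorm(299, mean = 0.5, sd = 2)))  # simulated trending series
train <- y[1:250]
test <- y[251:300]
h <- length(test)
n <- length(train)

forecasts <- list(
  Mean  = rep(mean(train), h),
  Naive = rep(train[n], h),
  Drift = train[n] + seq_len(h) * (train[n] - train[1]) / (n - 1)
)

# Root mean squared error of each method over the held-out period.
rmse <- sapply(forecasts, function(f) sqrt(mean((test - f)^2)))
print(round(rmse, 2))
```

With real stock data the same comparison would use `accuracy()` on a held-out period.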
# Extract data of interest
recent_production <- aus_production |>
filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).
# Look at some forecasts
fit |> forecast() |> autoplot(recent_production)
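For quarterly data such as Beer, `SNAIVE()` sets each forecast equal to the value from the same quarter of the last observed year. A base-R sketch on made-up quarterly values, assuming a seasonal period `m = 4` and using a hypothetical helper `snaive_forecast()`:

```r
# Hypothetical helper: seasonal naive forecasts with seasonal period m.
snaive_forecast <- function(y, h, m) {
  n <- length(y)
  last_season <- y[(n - m + 1):n]     # the final m observations
  idx <- ((seq_len(h) - 1) %% m) + 1  # cycle through that last season
  last_season[idx]
}

beer_toy <- c(430, 380, 400, 480, 425, 375, 405, 490)  # made-up quarterly values
fc <- snaive_forecast(beer_toy, h = 6, m = 4)
print(fc)
```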
What do you conclude?
The residuals do not appear to be white noise. The ACF plot shows significant autocorrelation, so the seasonal naïve model leaves some pattern in the data unexplained, and the histogram of the residuals looks bimodal rather than normal.
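A Ljung-Box test turns the visual ACF check into a single p-value. With feasts this is usually done as `augment(fit) |> features(.innov, ljung_box, lag = 8)`; the base-R sketch below uses `stats::Box.test()` on simulated residuals instead of the Beer residuals. The lag of 8 (twice the quarterly period) follows the common rule of thumb and is an assumption here.

```r
set.seed(42)
white <- rnorm(100)                                           # uncorrelated residuals
correlated <- as.numeric(arima.sim(list(ar = 0.7), n = 100))  # autocorrelated residuals

lb_white <- Box.test(white, lag = 8, type = "Ljung-Box")
lb_corr  <- Box.test(correlated, lag = 8, type = "Ljung-Box")

# Small p-values mean the residuals are distinguishable from white noise.
print(c(white = lb_white$p.value, correlated = lb_corr$p.value))
```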
aus_bricks <- recent_production %>% filter(!is.na(Bricks))
aus_bricks_fc <- aus_bricks %>% model(NAIVE(Bricks))
aus_bricks_fc %>% gg_tsresiduals()
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).
The residuals don’t look like white noise. Several lags in the ACF plot go beyond the significance bounds, which suggests the residuals are correlated when they should be uncorrelated, and the histogram is bimodal, which suggests they are not normally distributed either.
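The dashed lines in the ACF panel are approximate 95% significance bounds of about ±1.96/√T, where T is the number of residuals. A base-R sketch of counting the lags that fall outside them, using `stats::acf()` on a simulated correlated series rather than the Bricks residuals:

```r
set.seed(7)
res <- as.numeric(arima.sim(list(ar = 0.6), n = 80))  # stand-in residuals
T_len <- length(res)
bound <- qnorm(0.975) / sqrt(T_len)                   # approx. 95% significance bound

r <- acf(res, lag.max = 16, plot = FALSE)$acf[-1]     # drop lag 0
outside <- which(abs(r) > bound)
print(outside)                                        # lags outside the bounds
```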
Forecasting
aus_bricks_fc %>% forecast(h=20) %>%
autoplot(aus_bricks)
set.seed(21)
myseries <- aus_retail |>
filter(`Series ID` == 'A3349849A')
myseries_train <- myseries |>
filter(year(Month) < 2011)
myseries |> autoplot(Turnover) +
autolayer(myseries_train, Turnover, colour = "red")
fit <- myseries_train |>
model(SNAIVE(Turnover))
fit |> gg_tsresiduals()
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).
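The 12 removed rows in these warnings are expected: a seasonal naïve residual is e_t = y_t - y_(t-m), so the first m = 12 monthly residuals have nothing to compare against (the Beer model earlier loses 4 rows the same way). A base-R sketch on a simulated monthly vector, using a hypothetical helper `snaive_residuals()`:

```r
# Hypothetical helper: seasonal naive residuals, fitted value = same month last year.
snaive_residuals <- function(y, m) {
  n <- length(y)
  fitted <- c(rep(NA, m), y[seq_len(n - m)])
  y - fitted
}

set.seed(3)
monthly <- 100 + 10 * sin(2 * pi * (1:36) / 12) + rnorm(36)  # simulated series
e <- snaive_residuals(monthly, m = 12)
print(sum(is.na(e)))  # number of residuals that cannot be computed
```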
Do the residuals appear to be uncorrelated and normally distributed?
No. Several lags in the ACF plot go beyond the significance bounds, so the residuals are correlated rather than uncorrelated; the lag-1 autocorrelation of the training errors (ACF1 = 0.826 in the `accuracy()` output) points the same way. The histogram is bimodal, which suggests the residuals are not normally distributed either.
fc <- fit |>
forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc |> autoplot(myseries)
fit |> accuracy()
## # A tibble: 1 × 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Austral… Cafes, … SNAIV… Trai… 0.985 3.37 2.53 5.05 16.1 1 1 0.826
fc |> accuracy(myseries)
## # A tibble: 1 × 12
## .model State Industry .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Aust… Cafes, … Test 5.86 8.42 7.38 13.4 19.0 2.92 2.50 0.847
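The training MASE and RMSSE are exactly 1 because, for seasonal data, `accuracy()` scales errors by the in-sample seasonal naïve errors (the seasonal differences of the training data), and for an SNAIVE model those are its own residuals. A base-R sketch of the two scaled measures on simulated data, assuming m = 12; `mase()` and `rmsse()` are hypothetical helpers, not the fabletools functions:

```r
# Hypothetical helpers: errors scaled by in-sample seasonal naive errors.
mase <- function(actual, forecast, train, m) {
  scale <- mean(abs(diff(train, lag = m)))  # in-sample seasonal naive MAE
  mean(abs(actual - forecast)) / scale
}
rmsse <- function(actual, forecast, train, m) {
  scale <- mean(diff(train, lag = m)^2)     # in-sample seasonal naive MSE
  sqrt(mean((actual - forecast)^2) / scale)
}

set.seed(5)
m <- 12
series <- 50 + 0.3 * (1:72) + 5 * sin(2 * pi * (1:72) / m) + rnorm(72)
train <- series[1:60]
test <- series[61:72]

# SNAIVE on the training set: fitted value = observation 12 months earlier.
fit_train <- train[1:(60 - m)]
act_train <- train[(m + 1):60]
fc_test <- train[(60 - m + 1):60]  # next 12 months repeat the last observed year

print(c(train_MASE = mase(act_train, fit_train, train, m),
        test_MASE  = mase(test, fc_test, train, m)))
```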
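On this split the test errors are much larger than the training errors (RMSE 8.42 vs 3.37, MAE 7.38 vs 2.53, MASE 2.92 vs 1), so the model fits the training data well but does not perform nearly as well on new data. The measures are therefore sensitive to the split: changing the amount of training data changes both the data the model is fitted to and the period being forecast, so good training accuracy says little about out-of-sample performance.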
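One way to check this sensitivity is to repeat the split at several cut-off points and recompute both errors. A base-R sketch on a simulated trending seasonal series (not the retail data); the cut-offs, the helper `snaive_errors()` and the simulation settings are hypothetical:

```r
set.seed(11)
m <- 12
n <- 120
y <- 100 + 0.5 * (1:n) + 8 * sin(2 * pi * (1:n) / m) + rnorm(n, sd = 2)

# Hypothetical helper: SNAIVE training and test RMSE for one cut-off.
snaive_errors <- function(y, cutoff, m) {
  train <- y[1:cutoff]
  test <- y[(cutoff + 1):length(y)]
  h <- length(test)
  fc <- train[cutoff - m + ((seq_len(h) - 1) %% m) + 1]  # repeat last observed year
  c(cutoff = cutoff,
    train_RMSE = sqrt(mean(diff(train, lag = m)^2)),     # in-sample SNAIVE residuals
    test_RMSE = sqrt(mean((test - fc)^2)))
}

results <- t(sapply(c(48, 72, 96), function(k) snaive_errors(y, k, m)))
print(round(results, 2))
```

In this setup, moving the cut-off also changes how far ahead the test set reaches, which is one reason test errors can shift with the amount of training data.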