1. Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)

aus_population <- global_economy %>%
  filter(Country == "Australia") %>%
  select(Year, Population)

# Annual population trends steadily upward with no seasonality, so use the drift method
aus_population %>% model(RW(Population ~ drift())) %>%
  forecast(h = 20) %>%
  autoplot(aus_population)

Bricks (aus_production)

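# Keep every quarter that has a Bricks value; na.omit() would also drop
# rows where only some other column is missing
aus_production %>% filter(!is.na(Bricks)) %>%
  model(SNAIVE(Bricks)) %>%   # quarterly series with a seasonal pattern
  forecast(h = 20) %>%
  autoplot(aus_production)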

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

NSW Lambs (aus_livestock)

nsw_lambs <- aus_livestock %>% 
  filter(Animal=='Lambs',State=='New South Wales') %>% 
  select(Count)

# Monthly slaughter counts are seasonal, so use the seasonal naïve method
nsw_lambs %>% model(SNAIVE(Count)) %>%
  forecast(h = 20) %>%
  autoplot(nsw_lambs)

Household wealth (hh_budget).

# Annual data with no seasonality; the drift method allows for a trend in each country
hh_budget %>% model(RW(Wealth ~ drift())) %>%
  forecast(h = 15) %>%
  autoplot(hh_budget)

Australian takeaway food turnover (aus_retail).

aus_food <- aus_retail %>% 
  filter(Industry=='Takeaway food services') %>% 
  select(Turnover)

# Monthly retail turnover is seasonal, so use the seasonal naïve method
aus_food %>% model(SNAIVE(Turnover)) %>%
  forecast(h = 15) %>%
  autoplot(aus_food)
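
For reference, the three benchmark methods named in this exercise reduce to very simple rules. The sketch below writes them out in base R on a toy quarterly vector. The helper names (naive_fc, snaive_fc, drift_fc) are hypothetical and are not part of fable; they only illustrate what NAIVE(), SNAIVE() and RW(y ~ drift()) compute as point forecasts.

```r
# Hypothetical helpers (not fable functions) for the three benchmark point forecasts

# Naive: every forecast equals the last observation
naive_fc <- function(y, h) rep(y[length(y)], h)

# Seasonal naive: repeat the last full season (m = seasonal period)
snaive_fc <- function(y, h, m) {
  last_season <- y[(length(y) - m + 1):length(y)]
  last_season[((seq_len(h) - 1) %% m) + 1]
}

# Drift: last observation plus h times the average historical change
drift_fc <- function(y, h) {
  n <- length(y)
  slope <- (y[n] - y[1]) / (n - 1)
  y[n] + slope * seq_len(h)
}

y <- c(10, 20, 30, 40, 12, 22, 32, 42)  # toy quarterly series, m = 4

naive_fc(y, 3)          # 42 42 42
snaive_fc(y, 6, m = 4)  # 12 22 32 42 12 22
drift_fc(y, 3)          # 42 + (32 / 7) * 1:3
```

A series with a trend but no seasonality favours the drift rule, and a series with a strong seasonal pattern favours the seasonal naïve rule.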

2. Use the Facebook stock price (data set gafa_stock) to do the following:

Produce a time plot of the series.

# Trading days are irregularly spaced, so re-index by row number to get a regular series
fb_stock <- gafa_stock %>%
  filter(Symbol == 'FB') %>%
  mutate(Day = row_number()) %>%
  update_tsibble(index = Day, regular = TRUE)

fb_stock %>% autoplot(Open)

Produce forecasts using the drift method and plot them.

fb_fc <- fb_stock %>% model(RW(Open ~ drift()))  %>% 
   forecast(h = 100)

autoplot(fb_stock) + autolayer(fb_fc)

## Plot variable not specified, automatically selected `.vars = Open`

Show that the forecasts are identical to extending the line drawn between the first and last observations.

autoplot(fb_stock) + autolayer(fb_fc) +
  annotate("segment",
           x = min(fb_stock$Day), y = fb_stock$Open[which.min(fb_stock$Day)],
           xend = max(fb_stock$Day), yend = fb_stock$Open[which.max(fb_stock$Day)])

## Plot variable not specified, automatically selected `.vars = Open`
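
The plot shows the equivalence visually. It can also be checked numerically: the average of the one-step changes telescopes to (last - first) / (n - 1), which is exactly the slope of the line through the first and last observations. The sketch below uses a simulated series as a stand-in for the Open prices (an assumption, so that it runs without the fpp3 data).

```r
set.seed(1)
y <- 100 + cumsum(rnorm(50))   # simulated series standing in for fb_stock$Open
n <- length(y)
h <- 1:10

# Drift forecasts: last value plus h times the average historical change
drift <- y[n] + h * mean(diff(y))

# Straight line through (1, y[1]) and (n, y[n]), evaluated at times n + h
slope <- (y[n] - y[1]) / (n - 1)
line  <- y[1] + slope * (n + h - 1)

all.equal(drift, line)   # TRUE
```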

Try using some of the other benchmark functions to forecast the same data set.

fb_fc<- fb_stock %>% model(Mean = MEAN(Open),
                        Naive = NAIVE(Open),
                        SNaive = SNAIVE(Open),
                        Drift = RW(Open ~ drift())) %>% forecast(h=60)

## Warning: 1 error encountered for SNaive
## [1] Non-seasonal model specification provided, use RW() or provide a different lag specification.

autoplot(fb_stock) + autolayer(fb_fc)

## Plot variable not specified, automatically selected `.vars = Open`

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 60 rows containing missing values or values outside the scale range
## (`geom_line()`).

Which do you think is best? Why?

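Of the benchmarks tried here, the drift method looks the most reasonable. The stock price trends upward over the sample and the series is non-stationary. The drift method captures this by extending the average historical rate of change, whereas the naïve method gives a flat line at the last value and the mean method gives a flat line at the historical average. SNAIVE() could not be fitted at all, because the row-number index has no seasonal period.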
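
A visual judgement like this can be backed up by holding out the end of the series and comparing forecast errors. The following is a minimal base R sketch on a simulated upward-trending series (an assumption, not the Facebook data); with fable the same comparison would normally be done by fitting on a training subset and calling accuracy() on the forecasts.

```r
set.seed(42)
y <- 50 + cumsum(rnorm(120, mean = 0.5))  # simulated trending series, not gafa_stock

train <- y[1:100]
test  <- y[101:120]
n <- length(train)
h <- seq_along(test)

# Point forecasts from three benchmark methods
fc <- list(
  Mean  = rep(mean(train), length(test)),
  Naive = rep(train[n], length(test)),
  Drift = train[n] + h * (train[n] - train[1]) / (n - 1)
)

# Root mean squared error of each method on the held-out observations
rmse <- sapply(fc, function(f) sqrt(mean((test - f)^2)))
rmse
```

The method with the smallest held-out RMSE is the one the data favour over that particular test window; a different window can give a different ranking.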

3. Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

# Look at some forecasts
fit |> forecast() |> autoplot(recent_production)

What do you conclude?

The residuals do not look like white noise. The histogram appears bimodal rather than approximately normal, and the ACF plot shows significant autocorrelation.
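
The visual check can be complemented with a portmanteau test. The sketch below computes seasonal naïve residuals by hand and applies the Ljung-Box test from base R's stats package. The series is simulated (an assumption standing in for the beer data), and the choice of 2m lags for seasonal data is a common rule of thumb rather than something fixed by the exercise.

```r
set.seed(123)
m <- 4  # quarterly seasonal period

# Simulated quarterly series: a repeating seasonal pattern plus noise (not the beer data)
y <- rep(c(440, 410, 420, 500), times = 20) + rnorm(80, sd = 10)

# Seasonal naive residuals: each value minus the value one year earlier
res <- diff(y, lag = m)

# Ljung-Box test over the first 2m lags; a small p-value is evidence against white noise
lb <- Box.test(res, lag = 2 * m, type = "Ljung-Box")
lb$p.value

# Sample autocorrelations, the numbers behind the ACF panel of gg_tsresiduals()
res_acf <- acf(res, lag.max = 2 * m, plot = FALSE)
res_acf
```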

Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

aus_bricks <- recent_production %>% filter(!is.na(Bricks))
aus_bricks_fc <- aus_bricks %>% model(NAIVE(Bricks))

aus_bricks_fc %>% gg_tsresiduals()

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).

The residuals don’t look like white noise. The histogram appears bimodal, which suggests the residuals aren’t normally distributed. In addition, the ACF plot shows several lags beyond the significance bounds, so the residuals are correlated when they should be independent. For a quarterly series, that leftover structure suggests SNAIVE() may be the more appropriate benchmark.

Forecasting

aus_bricks_fc %>% forecast(h=20) %>% 
  autoplot(aus_bricks)

For your retail time series (from Exercise 7 in Section 2.10):

set.seed(21)
myseries <- aus_retail|>
  filter(`Series ID` == 'A3349849A')

Create a training dataset consisting of observations before 2011 using

myseries_train <- myseries |>
  filter(year(Month) < 2011)

Check that your data have been split appropriately by producing the following plot.

myseries |> autoplot(Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train |>
  model(SNAIVE(Turnover))

Check the residuals.

fit |> gg_tsresiduals()

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).

Do the residuals appear to be uncorrelated and normally distributed?

The residuals do not appear to be uncorrelated: the training-set accuracy output below reports a lag-1 autocorrelation (ACF1) of 0.826. The mean error of 0.985 is also positive rather than zero, so the model tends to under-forecast on the training data.

Produce forecasts for the test data

fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

fc |> autoplot(myseries)

Compare the accuracy of your forecasts against the actual values.

fit |> accuracy()

## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Austral… Cafes, … SNAIV… Trai… 0.985  3.37  2.53  5.05  16.1     1     1 0.826

fc |> accuracy(myseries)

## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Aust… Cafes, … Test   5.86  8.42  7.38  13.4  19.0  2.92  2.50 0.847

How sensitive are the accuracy measures to the amount of training data used?

In this split the test-set errors are much larger than the training-set errors (RMSE 8.42 vs 3.37, MASE 2.92 vs 1), so a model that fits the training data well may still perform poorly on new data. As more training data is added, training-set accuracy tends to improve, but test-set accuracy does not necessarily improve with it and can get worse.
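
One way to probe this sensitivity is to move the split point and recompute the test error each time. The base R sketch below does that on a simulated monthly series with trend and seasonality (an assumption, not the retail data); the helper snaive_test_rmse is hypothetical. For the seasonal naïve method the point forecasts depend only on the last m training observations, so what matters is where the training period ends, not how much history precedes it. Scaled measures such as MASE and RMSSE do still depend on the full training set through their scaling term.

```r
set.seed(21)
m <- 12
n <- 240
t <- seq_len(n)

# Simulated monthly series with trend and seasonality (not the aus_retail data)
y <- 20 + 0.1 * t + 5 * sin(2 * pi * t / m) + rnorm(n)

# Hypothetical helper: train on y[1:train_end], forecast the next h months
# with the seasonal naive rule, and return the test-set RMSE
snaive_test_rmse <- function(train_end, h = 24) {
  last_season <- y[(train_end - m + 1):train_end]
  fc <- last_season[((seq_len(h) - 1) %% m) + 1]
  actual <- y[(train_end + 1):(train_end + h)]
  sqrt(mean((actual - fc)^2))
}

# Test RMSE for three different split points
split_rmse <- sapply(c(120, 168, 216), snaive_test_rmse)
split_rmse
```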

Data HW 3

Mikhail Broomes

2024-09-22
