Data 624 HW 3

9/24/2021

Gabe Abreu

Chapter 5 Exercises

5.1

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)
global_economy %>%
    filter(Country == "Australia") %>%
    autoplot(Population)

The series shows no seasonality, only a clear upward trend. A drift model is appropriate, as it allows the trend to continue into the forecast period.

global_economy %>%
    filter(Country == "Australia") %>%
    model(RW(Population ~ drift())) %>%
    forecast(h="10 years") %>%
    autoplot(global_economy)

Bricks (aus_production)
aus_production %>%
    autoplot(Bricks)
## Warning: Removed 20 row(s) containing missing values (geom_path).

There is no clear trend, but there is seasonality; therefore, a seasonal naive model is my choice for this series.

When I first ran the forecast model, it returned an error because of NA values in the dataset. On the second attempt I removed the NA values and the model was fitted successfully.
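
As a quick sanity check (a minimal sketch), the missing quarters can be listed before fitting the model:

aus_production %>%
    filter(is.na(Bricks)) %>%
    select(Quarter, Bricks)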

aus_production %>%
    filter(!is.na(Bricks)) %>%
    model(SNAIVE(Bricks ~ lag("year"))) %>%
    forecast(h = 6) %>%
    autoplot(aus_production)
## Warning: Removed 20 row(s) containing missing values (geom_path).

NSW Lambs (aus_livestock)
aus_livestock %>%
    filter(State == "New South Wales", Animal == "Lambs") %>%
    autoplot(Count)

There is no noticeable trend, but there appears to be seasonality. Let’s investigate the seasonality further by creating an ACF plot.

aus_livestock %>%
    filter(State == "New South Wales", Animal == "Lambs") %>%
    ACF() %>%
    autoplot()
## Response variable not specified, automatically selected `var = Count`

The ACF plot shows definitive signs of seasonality. Given the seasonality, applying SNAIVE is optimal for forecasting this series.

aus_livestock %>%
    filter(State == "New South Wales", Animal == "Lambs") %>%
    model(SNAIVE(Count ~ lag("year"))) %>%
    forecast(h = 6) %>%
    autoplot(aus_livestock)

lambs_fit <- aus_livestock %>%
            filter(State == "New South Wales", Animal == "Lambs") %>%
            model(SNAIVE(Count ~ lag("year")))
lambs_forecast <- lambs_fit %>% forecast(h = 6)
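
The stored forecast can then be plotted against the full series (a short usage sketch of the objects defined above):

# Plot the stored 6-month SNAIVE forecast alongside the historical data
lambs_forecast %>% autoplot(aus_livestock)
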
Household Wealth (hh_budget)
hh_budget %>%
    autoplot(Wealth)

The series shows no signs of seasonality or cyclic behaviour, but there is an upward trend. Because of that trend, a random walk with drift forecast will probably work best.

hh_budget %>%
    model(RW(Wealth ~ drift())) %>%
    forecast(h="5 years") %>%
    autoplot(hh_budget)

Australian takeaway food turnover (aus_retail)
aus_retail %>% filter(Industry == "Takeaway food services") %>% autoplot(Turnover)

The takeaway food services in Australia show signs of a strong upward trend and seasonality. We can decompose the series to find which component contributes the most to the series.

x11_dcmp <- aus_retail %>%
    filter(Industry == "Takeaway food services") %>%
    model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
    components()
autoplot(x11_dcmp) +
    labs(title = "Decomposition of Takeaway Food Services Time Series Data")

Given the decomposition, the trend is the dominant component of the series. In this case, a random walk with drift is the appropriate benchmark.
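
To back this up, the strength of the trend and seasonal components can be quantified with STL-based features (a sketch using feat_stl from the feasts package):

# Trend and seasonal strength for each state's takeaway turnover series
aus_retail %>%
    filter(Industry == "Takeaway food services") %>%
    features(Turnover, feat_stl)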

takeawayModel <- aus_retail %>% 
    filter(Industry == "Takeaway food services") %>%
    model(RW(Turnover ~ drift())) 
takeawayForecast <- takeawayModel %>% forecast(h=12)

The forecast plot is hard to read when every state is shown at once, so let’s select the top states for takeaway food services.

takeawayForecast %>%
    filter(State %in% c("Victoria","New South Wales", "Queensland", "Western Australia" )) %>%
    autoplot(aus_retail)

5.2

Use the Facebook stock price (data set gafa_stock) to do the following:

  1. Produce a time plot of the series
fbStock <- gafa_stock %>%
           filter(Symbol == "FB") %>%
           as_tsibble(key = "Symbol", index = "Date", regular = TRUE)
autoplot(fbStock)
## Plot variable not specified, automatically selected `.vars = Open`

  2. Produce forecasts using the drift method and plot them.

Because trading happens only on weekdays, the Date index of the series is irregular, so I re-index by trading day to make it regular before modelling.

fb_stock <- gafa_stock %>%
  filter(Symbol == "FB") %>%
  mutate(day = row_number()) %>%
  update_tsibble(index = day, regular = TRUE)

Compute a 30 day forecast

fb_stock %>%
    model(RW(Close ~ drift())) %>%
    forecast(h=30) %>%
    autoplot(fb_stock)

  3. Show that the forecasts are identical to extending the line drawn between the first and last observations.

Using the last mean value of the forecast (day = 1288, mean closing value = 132.9129), we can draw a line from the first observation (day 1, close = 54.7) to that forecast point and see that the drift forecasts lie exactly on the straight line joining the first and last observations.

fb_stock %>%
    model(RW(Close ~ drift())) %>%
    forecast(h=30) -> fbforecast

fbplot <- ggplot(fb_stock, aes(x = day, y = Close)) + geom_line() 
fbplot + geom_segment(aes(x = 1, y = 54.7, xend = 1288, yend = 132.9129 ))
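
As a numerical check (a minimal sketch using the fb_stock series built above), the h-step drift forecast is the last observation plus h times the slope of the line joining the first and last observations, so computing it by hand should reproduce the 132.91 value used above:

# Drift point forecast: y_T + h * (y_T - y_1) / (T - 1)
y <- fb_stock$Close
n <- length(y)
slope <- (y[n] - y[1]) / (n - 1)
y[n] + 30 * slope  # approximately 132.91, matching the plotted segment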

  4. Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

I believe the naive method will work best: naive forecasts tend to perform well for financial data, and since this series is not seasonal, NAIVE() is the most appropriate benchmark for forecasting the data.

fb_stock %>%
    model(NAIVE(Close)) %>%
    forecast(h=30) %>%
    autoplot(fb_stock)
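
To compare the benchmarks directly, here is a sketch that fits the mean, naive and drift methods in a single model() call and overlays their 30-day forecasts:

# Fit several benchmark methods at once and overlay their forecasts
fb_stock %>%
    model(
        Mean = MEAN(Close),
        Naive = NAIVE(Close),
        Drift = RW(Close ~ drift())
    ) %>%
    forecast(h = 30) %>%
    autoplot(fb_stock, level = NULL)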

5.3

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

# Extract data of interest
recent_production <- aus_production %>%
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production %>% model(SNAIVE(Beer))
# Look at the residuals
fit %>% gg_tsresiduals()
## Warning: Removed 4 row(s) containing missing values (geom_path).
## Warning: Removed 4 rows containing missing values (geom_point).
## Warning: Removed 4 rows containing non-finite values (stat_bin).

# Look at some forecasts
fit %>% forecast() %>% autoplot(recent_production)

What do you conclude?

The residuals do resemble white noise. The variance is more or less constant, the residual histogram shows that the mean is close to zero and the distribution is approximately normal, and there is no significant autocorrelation. The residuals look like what we expect from a good forecasting model.
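
A Ljung-Box test can back up the visual check (a sketch; using lag = 8, twice the seasonal period for quarterly data, is my assumption):

# A large p-value means the residuals are consistent with white noise
augment(fit) %>%
    features(.innov, ljung_box, lag = 8)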

5.4

Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

Australian Exports
ausexports <- global_economy %>%
                filter(Country=="Australia") %>%
                select(c("Country", "Year", "Exports"))

# Define and estimate a model
fit <- ausexports %>% model(NAIVE(Exports))
# Look at the residuals
fit %>% gg_tsresiduals()
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).

The residuals resemble white noise: autocorrelation is low, the residuals are approximately normally distributed with a mean close to zero, and the variance is roughly constant (homoscedastic).

# Look at some forecasts
fit %>% forecast(h = 5) %>% autoplot(ausexports)

Bricks
bricks_prod <- aus_production %>%
               select(c("Quarter", "Bricks"))

Apply SNAIVE as the time series shows seasonality.

fit2 <- bricks_prod %>% model(SNAIVE(Bricks))
fit2 %>% gg_tsresiduals()
## Warning: Removed 24 row(s) containing missing values (geom_path).
## Warning: Removed 24 rows containing missing values (geom_point).
## Warning: Removed 24 rows containing non-finite values (stat_bin).

The residuals do not resemble white noise. There is seasonality in the ACF plot, the mean is not close to zero, and the histogram appears skewed to the left (not normally distributed).

fit2 <- bricks_prod %>% model(NAIVE(Bricks))
fit2 %>% gg_tsresiduals()
## Warning: Removed 21 row(s) containing missing values (geom_path).
## Warning: Removed 21 rows containing missing values (geom_point).
## Warning: Removed 21 rows containing non-finite values (stat_bin).

The NAIVE model does not produce better results, as the seasonality is still present in the residuals. Neither benchmark works well in this case.

5.7

For your retail time series (from Exercise 8 in Section 2.10)

  1. Create a training dataset consisting of observations before 2011 using:
set.seed(128)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_train <- myseries %>%
  filter(year(Month) < 2011)
  2. Check that your data have been split appropriately by producing the following plot.
autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

  3. Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).
fit <- myseries_train %>%
  model(SNAIVE(Turnover))
  4. Check the residuals.
fit %>% gg_tsresiduals()
## Warning: Removed 12 row(s) containing missing values (geom_path).
## Warning: Removed 12 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing non-finite values (stat_bin).

The residuals are skewed, and the variance appears to increase as the level of the series increases. The ACF also shows significant autocorrelation, so the residuals do not resemble white noise.

  5. Produce forecasts for the test data.
fc <- fit %>%
  forecast(new_data = anti_join(myseries, myseries_train))
## Joining, by = c("State", "Industry", "Series ID", "Month", "Turnover")
fc %>% autoplot(myseries)

  6. Compare the accuracy of your forecasts against the actual values.
fit %>% accuracy()
## # A tibble: 1 x 12
##   State   Industry  .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>   <chr>     <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Wester~ Other re~ SNAIV~ Trai~  8.18  14.4  10.9  6.05  8.11     1     1 0.725
fc %>% accuracy(myseries)
## # A tibble: 1 x 12
##   .model   State  Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>  <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(~ Weste~ Other r~ Test   78.7  87.4  78.9  21.6  21.7  7.27  6.06 0.591

The model performed much better on the training set than on the test set. The ME and MAE on the test set were almost 80 (versus roughly 8 and 11 on the training set), and the MAPE was over 21%. The model failed to capture the upward trend of the series.

  7. How sensitive are the accuracy measures to the amount of training data used?

The accuracy measures are quite sensitive to the amount of training data used. When the training set was shortened to observations before 2000, the test-set errors roughly doubled (RMSE 160 vs. 87 and MAPE 45.7% vs. 21.7%), even though the training-set errors looked better. Part of this is that the shorter training set also leaves a much longer test period, but for this series more training data clearly produced more accurate forecasts.

myseries_train2 <- myseries %>%
  filter(year(Month) < 2000)

fit2 <- myseries_train2 %>%
  model(SNAIVE(Turnover))
fc2 <- fit2 %>%
  forecast(new_data = anti_join(myseries, myseries_train2))
## Joining, by = c("State", "Industry", "Series ID", "Month", "Turnover")
fc2 %>% autoplot(myseries)

fit2 %>% accuracy()
## # A tibble: 1 x 12
##   State   Industry  .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>   <chr>     <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Wester~ Other re~ SNAIV~ Trai~  4.89  9.35  7.30  5.76  8.00     1     1 0.625
fc2 %>% accuracy(myseries)
## # A tibble: 1 x 12
##   .model   State  Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>  <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(~ Weste~ Other r~ Test   134.  160.  134.  45.7  45.7  18.4  17.1 0.939