Exercise 5.1.

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)

global_economy %>%
  filter(Country == "Australia") %>%
  autoplot(Population) + labs(title = "Population of Australia")

We can observe an increasing trend with little fluctuation, so the drift method is a good choice for forecasting.

global_economy %>% 
  filter(Country == "Australia") %>%
  model(RW(Population ~ drift())) %>%
  forecast(h = 10) %>%
  autoplot(global_economy) +
  labs(title = "Forecasted in next 10 years")

Bricks (aus_production)

aus_production %>%
  autoplot(Bricks) + ggtitle("Brick production in Australia")
## Warning: Removed 20 rows containing missing values (`geom_line()`).

Since the data shows a clear seasonal pattern, the SNAIVE() method is a good choice.

aus_production %>%
  filter(!is.na(Bricks)) %>%
  model(SNAIVE(Bricks)) %>%
  forecast(h = 12) %>% 
  autoplot(aus_production) + labs(title = "Forecasted in next 3 years")
## Warning: Removed 20 rows containing missing values (`geom_line()`).

NSW Lambs (aus_livestock)

aus_livestock %>%
  filter(Animal == "Lambs") %>%
  filter(State == "New South Wales") %>%
  autoplot(Count) + ggtitle("NSW lambs livestock")

The data fluctuates a lot but shows a clear seasonal pattern, so we can use the seasonal naive method to forecast.

aus_livestock %>%
  filter(Animal == "Lambs") %>%
  filter(State == "New South Wales") %>%
  model(SNAIVE(Count)) %>%
  forecast(h = "5 years") %>%
  autoplot(aus_livestock) + labs(title = "Lambs livestock forecast for the next 5 years")

Household wealth (hh_budget).

hh_budget %>%
  autoplot(Wealth) + ggtitle("Household wealth by country")

The data shows an increasing trend with little fluctuation and no seasonal pattern in each country, so we can use the drift method to forecast.

hh_budget %>%
  model(RW(Wealth ~ drift())) %>%
  forecast(h = 5) %>%
  autoplot(hh_budget) + ggtitle("Forecast for the next 5 years")

Australian takeaway food turnover (aus_retail).

aus_retail %>%
  filter(Industry == "Takeaway food services") %>%
  autoplot(Turnover) + ggtitle("Australian takeaway food turnover")

The data shows an increasing trend with small fluctuations in most of the states, so we can use the drift method in this case.

aus_retail %>%
  filter(Industry == "Takeaway food services") %>%
  model(RW(Turnover ~ drift())) %>%
  forecast(h = 36) %>%
  autoplot(aus_retail) + ggtitle("Forecast for the next 3 years") +
  facet_wrap(~State)

Exercise 5.2.

a.

Produce a time plot of the series.

gafa_stock %>%
  filter(Symbol == "FB") %>%
  autoplot(Close) + ggtitle("Facebook closing price")

b.

Produce forecasts using the drift method and plot them.

fb_stock <- gafa_stock %>%
  filter(Symbol == "FB") %>%
  mutate(day = row_number()) %>%
  update_tsibble(index = day, regular = TRUE) 

fb_stock %>%
  model(RW(Close ~ drift())) %>%
  forecast(h = 90) %>%
  autoplot(fb_stock) + ggtitle("Drift method forecast")

c. 

Show that the forecasts are identical to extending the line drawn between the first and last observations.

fb_stock %>%
  model(RW(Close ~ drift())) %>%
  forecast(h = 90) %>%
  autoplot(fb_stock, level = NULL) +
  geom_line(data = slice(fb_stock, range(cumsum(!is.na(Close)))),
            aes(y = Close), linetype = "dashed")

d. 

Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

fb_stock %>%
  model(Mean = MEAN(Close),
        Naive = NAIVE(Close),
        Drift = RW(Close ~ drift())) %>%
  forecast(h = 90) %>%
  autoplot(fb_stock, level = NULL)

Overall, all three benchmarks forecast this series poorly; however, the drift method appears to be the best choice of the three.
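
One way to back this up is to hold out the last part of the series as a test set and compare the accuracy of the three benchmarks on it. The sketch below is not part of the original exercise code; the names fb_train and fb_test and the 90-day split are arbitrary choices for illustration.

# Hold out the final 90 trading days and compare benchmark accuracy on them
fb_train <- fb_stock %>% filter(day <= max(day) - 90)
fb_test  <- fb_stock %>% filter(day >  max(day) - 90)

fb_train %>%
  model(Mean  = MEAN(Close),
        Naive = NAIVE(Close),
        Drift = RW(Close ~ drift())) %>%
  forecast(new_data = fb_test) %>%
  accuracy(fb_stock)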

Exercise 5.3.

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. What do you conclude?

# Extract data of interest
recent_production <- aus_production %>%
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production %>% model(SNAIVE(Beer))
# Look at the residuals
fit %>% gg_tsresiduals()
## Warning: Removed 4 rows containing missing values (`geom_line()`).
## Warning: Removed 4 rows containing missing values (`geom_point()`).
## Warning: Removed 4 rows containing non-finite values (`stat_bin()`).

The time plot and ACF of the residuals suggest they are close to white noise. Let's double-check with a Ljung-Box test.

# Ljung-Box test on the residuals (lag = 2m = 8 for quarterly data)
augment(fit) %>%
  features(.innov, ljung_box, lag = 8)
fit %>% forecast() %>% autoplot(recent_production)

In the Ljung-Box test, a small p-value means we reject the hypothesis that the residuals are white noise. Apart from the spike at lag 4 in the ACF, which is larger than the others and can be attributed to peaks occurring every fourth quarter (Q4) and troughs occurring every Q2, the residuals look close to white noise. Overall, the seasonal naive method produces reasonably good forecasts here.
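
The individual autocorrelations behind the ACF panel of gg_tsresiduals() can also be listed numerically; a quick sketch:

# Numerical autocorrelations of the innovation residuals
augment(fit) %>%
  ACF(.innov)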

Exercise 5.7.

a.

Create a training dataset consisting of observations before 2011 using

set.seed(12)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries_train <- myseries %>%
  filter(year(Month) < 2011)

b.

Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

c. 

Fit a seasonal naïve model using SNAIVE() applied to your training data.

fit <- myseries_train %>%
  model(SNAIVE(Turnover))

d.

Check the residuals.

fit %>% gg_tsresiduals()
## Warning: Removed 12 rows containing missing values (`geom_line()`).
## Warning: Removed 12 rows containing missing values (`geom_point()`).
## Warning: Removed 12 rows containing non-finite values (`stat_bin()`).

The ACF plot shows many large spikes from lag 1 to lag 9, meaning those autocorrelations are significantly different from zero, and the residual histogram is right-skewed. So the residuals appear to be autocorrelated and not normally distributed.
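
To complement the visual check, we can apply a Ljung-Box test to the innovation residuals; a short sketch (lag = 24, i.e. two seasonal periods for monthly data, is a common choice):

# Ljung-Box test on the residuals; a small p-value indicates remaining autocorrelation
augment(fit) %>%
  features(.innov, ljung_box, lag = 24)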

e.

Produce forecasts for the test data.

fc <- fit %>%
  forecast(new_data = anti_join(myseries, myseries_train))
## Joining, by = c("State", "Industry", "Series ID", "Month", "Turnover")
fc %>% autoplot(myseries)

The forecasts are not very accurate: they reproduce the seasonal shape well but miss the increasing trend.

f. 

Compare the accuracy of your forecasts against the actual values.

fit %>% accuracy()
fc %>% accuracy(myseries)

g.

How sensitive are the accuracy measures to the amount of training data used?

The accuracy measures are quite sensitive to the amount of training data used. With less training data the model has less information with which to estimate the seasonal pattern, so the forecasts tend to be less accurate; with more training data the estimates stabilise and the forecasts generally improve. How much data is needed for accurate forecasts depends on the complexity of the time series and on the forecasting method used.
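
This can be illustrated by refitting the same seasonal naive model on training windows of different lengths and comparing test-set accuracy. The sketch below is not part of the original solution; the helper accuracy_by_start and the cut-off years 1990, 2000 and 2005 are arbitrary illustrative choices.

# Refit SNAIVE on training sets starting in different years and compare
# accuracy on the post-2010 test set
test_set <- myseries %>% filter(year(Month) >= 2011)

accuracy_by_start <- function(start_year) {
  myseries %>%
    filter(year(Month) >= start_year, year(Month) < 2011) %>%
    model(SNAIVE(Turnover)) %>%
    forecast(new_data = test_set) %>%
    accuracy(myseries) %>%
    mutate(train_start = start_year, .before = 1)
}

bind_rows(lapply(c(1990, 2000, 2005), accuracy_by_start))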