Week 4 Forecasting |21-Feb 27-Feb

Do exercises 5.1, 5.2, 5.3, 5.4 and 5.7 in the Hyndman book.

5.1 Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)

From below we can see a clear indication that pop. is increasing at a constant rate. The RW drift method is a forecasting method that unlike averages/naive methods does not have a constant (flat) forecast, instead the drift method can increase or decrease over time, this is why its a great method when it comes to forecasting linear trends.

global_economy %>%
  filter(Country == "Australia") %>%
  autoplot(Population)

Australia <- global_economy %>%
  filter(Country=="Australia")

Australia_popu_fit <- Australia %>%
  model(RW(Population ~ drift())) %>%
  forecast(h=10)


Australia_popu_fit %>% autoplot(Australia)

Bricks (aus_production)

From below we can interpret as seasonality detection. The most appropriate would be Seasonal Naive Method. Seasonal naive methods: This method is like the naive method but predicts the last observed value of the same season of the year.

aus_production %>%
  autoplot(Bricks)
## Warning: Removed 20 row(s) containing missing values (geom_path).

australia_bricks <- aus_production %>%
  filter(!is.na(Bricks))


australia_bricks %>%
  autoplot(Bricks)

We can see from SNAIVE use below the continuation of the pattern

australia_bricks_fit <- australia_bricks %>%
  model(SNAIVE(Bricks ~ lag("year"))) %>%
  forecast(h=5)
australia_bricks_fit %>% autoplot(australia_bricks) +
labs(title="SNAIVE Forecast of the Australia Brick Prod", 
       subtitle = "Five Year Forecast", 
       xlab="The Year" )

NSW Lambs (aus_livestock)

aus_livestock %>%
  filter(Animal == "Lambs",
         State == "New South Wales"
         ) %>%
  autoplot(Count) +
  labs(title = "The NSW Lambs")

aus_livestock %>%
  filter(State == "New South Wales",
         Animal == "Lambs") %>%
  model(NAIVE(Count)) %>%
  forecast(h = 15)
## # A fable: 15 x 6 [1M]
## # Key:     Animal, State, .model [1]
##    Animal State           .model          Month              Count  .mean
##    <fct>  <fct>           <chr>           <mth>             <dist>  <dbl>
##  1 Lambs  New South Wales NAIVE(Count) 2019 Jan N(351600, 2.5e+09) 351600
##  2 Lambs  New South Wales NAIVE(Count) 2019 Feb N(351600, 5.1e+09) 351600
##  3 Lambs  New South Wales NAIVE(Count) 2019 Mar N(351600, 7.6e+09) 351600
##  4 Lambs  New South Wales NAIVE(Count) 2019 Apr   N(351600, 1e+10) 351600
##  5 Lambs  New South Wales NAIVE(Count) 2019 May N(351600, 1.3e+10) 351600
##  6 Lambs  New South Wales NAIVE(Count) 2019 Jun N(351600, 1.5e+10) 351600
##  7 Lambs  New South Wales NAIVE(Count) 2019 Jul N(351600, 1.8e+10) 351600
##  8 Lambs  New South Wales NAIVE(Count) 2019 Aug   N(351600, 2e+10) 351600
##  9 Lambs  New South Wales NAIVE(Count) 2019 Sep N(351600, 2.3e+10) 351600
## 10 Lambs  New South Wales NAIVE(Count) 2019 Oct N(351600, 2.5e+10) 351600
## 11 Lambs  New South Wales NAIVE(Count) 2019 Nov N(351600, 2.8e+10) 351600
## 12 Lambs  New South Wales NAIVE(Count) 2019 Dec   N(351600, 3e+10) 351600
## 13 Lambs  New South Wales NAIVE(Count) 2020 Jan N(351600, 3.3e+10) 351600
## 14 Lambs  New South Wales NAIVE(Count) 2020 Feb N(351600, 3.5e+10) 351600
## 15 Lambs  New South Wales NAIVE(Count) 2020 Mar N(351600, 3.8e+10) 351600

The naive model would be sufficient in this case due to possible no trend/seasonality. From the plot below there doesn’t seem to be an upward or downward trend or a seasonal trend

aus_livestock %>%
  filter(State == "New South Wales",
         Animal == "Lambs") %>%
  model(NAIVE(Count)) %>%
  forecast(h = 15) %>%
  autoplot(aus_livestock)

Household wealth (hh_budget).

hh_budget %>% 
  autoplot(Wealth) + labs(title = "Household wealth")

From the graph we can see that there seem to be some sort of cycle but no seasonal trend. There is also a noteable upward trend for the last 7-8 years so we will perform a drift model.

hh_budget %>%
  model(drift = RW(Wealth ~ drift())) %>% 
  forecast(h = 10) %>% 
  autoplot(hh_budget) + 
  labs(title = 'Wealth')

Australian takeaway food turnover (aus_retail).

From the graph below we can see we have a noteable upward trend. A seasonal trend is possibly not clear by just looking at the graph (this appears to fluctate randomly).

aus_foturnover  <-
  aus_retail %>% 
  filter(stringr::str_detect(State,"Australian") &
           stringr::str_detect(Industry,"takeaway food")) %>% 
  select(c(Month,Turnover))
aus_foturnover %>% autoplot(Turnover) 

aus_retail <- tsibbledata::aus_retail

x <- aus_retail  %>% filter(Industry == 'Takeaway food services')  %>% summarize(Turnover=sum(Turnover))
fit<-x%>% model(RW(Turnover ~ drift()))
fit %>%
  forecast(h = "10 years")   %>%
  autoplot(x)+labs(title='Turnover')

5.2 Use the Facebook stock price (data set gafa_stock) to do the following:

Produce a time plot of the series.

unique(gafa_stock$Symbol)
## [1] "AAPL" "AMZN" "FB"   "GOOG"
head(gafa_stock)
## # A tsibble: 6 x 8 [!]
## # Key:       Symbol [1]
##   Symbol Date        Open  High   Low Close Adj_Close    Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>     <dbl>
## 1 AAPL   2014-01-02  79.4  79.6  78.9  79.0      67.0  58671200
## 2 AAPL   2014-01-03  79.0  79.1  77.2  77.3      65.5  98116900
## 3 AAPL   2014-01-06  76.8  78.1  76.2  77.7      65.9 103152700
## 4 AAPL   2014-01-07  77.8  78.0  76.8  77.1      65.4  79302300
## 5 AAPL   2014-01-08  77.0  77.9  77.0  77.6      65.8  64632400
## 6 AAPL   2014-01-09  78.1  78.1  76.5  76.6      65.0  69787200
facebook_stock <- gafa_stock %>%
  filter(Symbol == 'FB', year(Date) >= 2015) %>%
  mutate(day = row_number()) %>%
  update_tsibble(index = day, regular = TRUE) %>%
  select(Date, Close)

facebook_stock %>%
  autoplot(Close) +
  labs(y = '$US', title = 'The Facebook Stock Price')

Produce forecasts using the drift method and plot them.

From the text book

facebook_stock_2015 <- facebook_stock %>%
  filter(year(Date) == 2015) %>%
  select(day, Close)

facebook_stock_fit <- facebook_stock_2015 %>%
  model(Drift = RW(Close ~ drift()))

facebook_stock_2016 <- facebook_stock %>%
  filter(yearmonth(Date) == yearmonth('2016 Jan')) %>%
  select(day, Close)

facebook_forecast <- facebook_stock_fit %>%
  forecast(new_data = facebook_stock_2016)

facebook_forecast %>%
  autoplot(facebook_stock_2015, level = NULL) +
  autolayer(facebook_stock_2016, Close, color = 'black') +
  labs(y = '$US',
       title = 'Facebook daily closing stock prices',
       subtitle = 'Jan 2015 - Jan 2016'
       ) +
  guides(color = guide_legend((title = 'Forecasts')))

Show that the forecasts are identical to extending the line drawn between the first and last observations.

facebook_stock2 <- facebook_stock %>%
  filter(year(Date) == 2015)

facebook_forecast %>% 
  autoplot(facebook_stock2, level = NULL) +
  geom_line(data = slice(facebook_stock2, range(cumsum(!is.na(Close)))),
                         aes(y=Close), linetype = 'dashed')

Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

facebook_fit2 <- facebook_stock_2015 %>%
  model(
    Mean = MEAN(Close),
    Naive = NAIVE(Close),
    Drift = NAIVE(Close ~ drift())
  )
# to make the forecasts for the trading days in January 2016
facebook_jan_2016 <- facebook_stock %>%
  filter(yearmonth(Date) == yearmonth("2016 Jan"))
facebook_forecast2 <- facebook_fit2 %>%
  forecast(new_data = facebook_jan_2016)
# Plotting
facebook_forecast2 %>%
  autoplot(facebook_stock_2015, level = NULL) +
  autolayer(facebook_jan_2016, Close, colour = "blue") +
  labs(y = "$USD",
       title = "FB dly closing stock prices",
       subtitle = "(Jan 2015 - Jan 2016)") +
  guides(colour = guide_legend(title = "The Forecast"))

The naïve method of forecasting dictates that we use the previous period to forecast for the next period.

5.3 Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

Extract data of interest

recent_production <- aus_production %>% filter(year(Quarter) >= 1992) # Define and estimate a model fit <- recent_production %>% model(SNAIVE(Beer)) # Look at the residuals fit %>% gg_tsresiduals() # Look a some forecasts fit %>% forecast() %>% autoplot(recent_production)

# Extract data of interest
recent_production <- aus_production %>%
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production %>% model(SNAIVE(Beer))
# Look at the residuals
fit %>% gg_tsresiduals()
## Warning: Removed 4 row(s) containing missing values (geom_path).
## Warning: Removed 4 rows containing missing values (geom_point).
## Warning: Removed 4 rows containing non-finite values (stat_bin).

head(recent_production)
## # A tsibble: 6 x 7 [1Q]
##   Quarter  Beer Tobacco Bricks Cement Electricity   Gas
##     <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
## 1 1992 Q1   443    5777    383   1289       38332   117
## 2 1992 Q2   410    5853    404   1501       39774   151
## 3 1992 Q3   420    6416    446   1539       42246   175
## 4 1992 Q4   532    5825    420   1568       38498   129
## 5 1993 Q1   433    5724    394   1450       39460   116
## 6 1993 Q2   421    6036    462   1668       41356   149
fit %>% forecast() %>% autoplot(recent_production)

What do you conclude?

White noise is an important concept in time series forecasting. If a time series is white noise, it is a sequence of random numbers and cannot be predicted. If the series of forecast errors are not white noise, it suggests improvements could be made to the predictive model. The residuals do appear as white noise.The seasonal naive method produces forecasts for the current availabe data. We coudld could seasonal naive forecast is valid.

5.4 Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

recent_production <- global_economy %>%
  filter(Country == 'Australia')


head(recent_production)
## # A tsibble: 6 x 9 [1Y]
## # Key:       Country [1]
##   Country   Code   Year          GDP Growth   CPI Imports Exports Population
##   <fct>     <fct> <dbl>        <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 Australia AUS    1960 18573188487.  NA     7.96    14.1    13.0   10276477
## 2 Australia AUS    1961 19648336880.   2.49  8.14    15.0    12.4   10483000
## 3 Australia AUS    1962 19888005376.   1.30  8.12    12.6    13.9   10742000
## 4 Australia AUS    1963 21501847911.   6.21  8.17    13.8    13.0   10950000
## 5 Australia AUS    1964 23758539590.   6.98  8.40    13.8    14.9   11167000
## 6 Australia AUS    1965 25931235301.   5.98  8.69    15.3    13.2   11388000
tail(recent_production)
## # A tsibble: 6 x 9 [1Y]
## # Key:       Country [1]
##   Country   Code   Year     GDP Growth   CPI Imports Exports Population
##   <fct>     <fct> <dbl>   <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 Australia AUS    2012 1.54e12   3.89  105.    21.8    21.5   22742475
## 2 Australia AUS    2013 1.57e12   2.64  108.    21.3    20.0   23145901
## 3 Australia AUS    2014 1.46e12   2.56  110.    21.5    21.1   23504138
## 4 Australia AUS    2015 1.35e12   2.35  112.    21.5    20.0   23850784
## 5 Australia AUS    2016 1.21e12   2.83  113.    21.5    19.3   24210809
## 6 Australia AUS    2017 1.32e12   1.96  116.    20.6    21.3   24598933
fit <- recent_production %>% model(NAIVE(Exports))

fit %>% gg_tsresiduals()
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).

fit %>% forecast() %>% autoplot(recent_production)

mean(augment(fit)$.innov , na.rm = TRUE)
## [1] 0.1451912

BRICKS

x_bricks <- aus_production %>% 
  select(Bricks)

f_bricks <- x_bricks %>% model(SNAIVE(Bricks))

f_bricks %>% gg_tsresiduals()
## Warning: Removed 24 row(s) containing missing values (geom_path).
## Warning: Removed 24 rows containing missing values (geom_point).
## Warning: Removed 24 rows containing non-finite values (stat_bin).

f_bricks %>% forecast() %>% autoplot(x_bricks)
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 8 row(s) containing missing values (geom_path).
## Warning: Removed 20 row(s) containing missing values (geom_path).

In the case of bricks data, neither SNAIVE or NAIVE looks like a good candidate to model the residuals.I believe there is non-white noise. This suggests that the residuals do not look like white noise and the forecast model can possibly be bettered.

5.7 For your retail time series (from Exercise 8 in Section 2.10):

Create a training dataset consisting of observations before 2011 using

set.seed(718)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))


myseries_train <- myseries %>%
  filter(year(Month) < 2011)

Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train %>%
  model(SNAIVE(Turnover))

Check the residuals.

fit %>% gg_tsresiduals()
## Warning: Removed 12 row(s) containing missing values (geom_path).
## Warning: Removed 12 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing non-finite values (stat_bin).

mean(augment(fit)$.innov , na.rm = TRUE)
## [1] 0.9867868

Do the residuals appear to be uncorrelated and normally distributed?

The residuals are normally distributed with a mean of 0. However, the residuals seem to be correlated. The ACF graph shows that there is significant correlation in most lag periods. Moreover, the autocorrelation changes from positive to negative at lag 10.

Produce forecasts for the test data

fc <- fit %>%
  forecast(new_data = anti_join(myseries, myseries_train))
## Joining, by = c("State", "Industry", "Series ID", "Month", "Turnover")
fc %>% autoplot(myseries)

The residuals are close to being normally distributed. Perhaps there is room for error. acf plot shows that there is still a lot of patterns that are not accounted for in this simple model.

Compare the accuracy of your forecasts against the actual values.

fit %>% accuracy()
## # A tibble: 1 x 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Austral~ Electri~ SNAIV~ Trai~ 0.987  3.05  2.18  5.27  13.0     1     1 0.755
fc %>% accuracy(myseries)
## # A tibble: 1 x 12
##   .model   State Industry .type     ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr> <chr>    <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(~ Aust~ Electri~ Test  -0.508  3.65  2.99 -2.83  9.61  1.37  1.20 0.713

How sensitive are the accuracy measures to the amount of training data used? I would say accuracy measures are pretty sensitive to the amount of training data in use. By adding more data one can conclue that possibly forecast accuracy readings are affected in negative manner.