Do exercises 5.1, 5.2, 5.3, 5.4 and 5.7 in the Hyndman book.
5.1 Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:
Australian Population (global_economy)
From below we can see a clear indication that pop. is increasing at a constant rate. The RW drift method is a forecasting method that unlike averages/naive methods does not have a constant (flat) forecast, instead the drift method can increase or decrease over time, this is why its a great method when it comes to forecasting linear trends.
global_economy %>%
filter(Country == "Australia") %>%
autoplot(Population)
Australia <- global_economy %>%
filter(Country=="Australia")
Australia_popu_fit <- Australia %>%
model(RW(Population ~ drift())) %>%
forecast(h=10)
Australia_popu_fit %>% autoplot(Australia)
Bricks (aus_production)
From below we can interpret as seasonality detection. The most appropriate would be Seasonal Naive Method. Seasonal naive methods: This method is like the naive method but predicts the last observed value of the same season of the year.
aus_production %>%
autoplot(Bricks)
## Warning: Removed 20 row(s) containing missing values (geom_path).
australia_bricks <- aus_production %>%
filter(!is.na(Bricks))
australia_bricks %>%
autoplot(Bricks)
We can see from SNAIVE use below the continuation of the pattern
australia_bricks_fit <- australia_bricks %>%
model(SNAIVE(Bricks ~ lag("year"))) %>%
forecast(h=5)
australia_bricks_fit %>% autoplot(australia_bricks) +
labs(title="SNAIVE Forecast of the Australia Brick Prod",
subtitle = "Five Year Forecast",
xlab="The Year" )
NSW Lambs (aus_livestock)
aus_livestock %>%
filter(Animal == "Lambs",
State == "New South Wales"
) %>%
autoplot(Count) +
labs(title = "The NSW Lambs")
aus_livestock %>%
filter(State == "New South Wales",
Animal == "Lambs") %>%
model(NAIVE(Count)) %>%
forecast(h = 15)
## # A fable: 15 x 6 [1M]
## # Key: Animal, State, .model [1]
## Animal State .model Month Count .mean
## <fct> <fct> <chr> <mth> <dist> <dbl>
## 1 Lambs New South Wales NAIVE(Count) 2019 Jan N(351600, 2.5e+09) 351600
## 2 Lambs New South Wales NAIVE(Count) 2019 Feb N(351600, 5.1e+09) 351600
## 3 Lambs New South Wales NAIVE(Count) 2019 Mar N(351600, 7.6e+09) 351600
## 4 Lambs New South Wales NAIVE(Count) 2019 Apr N(351600, 1e+10) 351600
## 5 Lambs New South Wales NAIVE(Count) 2019 May N(351600, 1.3e+10) 351600
## 6 Lambs New South Wales NAIVE(Count) 2019 Jun N(351600, 1.5e+10) 351600
## 7 Lambs New South Wales NAIVE(Count) 2019 Jul N(351600, 1.8e+10) 351600
## 8 Lambs New South Wales NAIVE(Count) 2019 Aug N(351600, 2e+10) 351600
## 9 Lambs New South Wales NAIVE(Count) 2019 Sep N(351600, 2.3e+10) 351600
## 10 Lambs New South Wales NAIVE(Count) 2019 Oct N(351600, 2.5e+10) 351600
## 11 Lambs New South Wales NAIVE(Count) 2019 Nov N(351600, 2.8e+10) 351600
## 12 Lambs New South Wales NAIVE(Count) 2019 Dec N(351600, 3e+10) 351600
## 13 Lambs New South Wales NAIVE(Count) 2020 Jan N(351600, 3.3e+10) 351600
## 14 Lambs New South Wales NAIVE(Count) 2020 Feb N(351600, 3.5e+10) 351600
## 15 Lambs New South Wales NAIVE(Count) 2020 Mar N(351600, 3.8e+10) 351600
The naive model would be sufficient in this case due to possible no trend/seasonality. From the plot below there doesn’t seem to be an upward or downward trend or a seasonal trend
aus_livestock %>%
filter(State == "New South Wales",
Animal == "Lambs") %>%
model(NAIVE(Count)) %>%
forecast(h = 15) %>%
autoplot(aus_livestock)
Household wealth (hh_budget).
hh_budget %>%
autoplot(Wealth) + labs(title = "Household wealth")
From the graph we can see that there seem to be some sort of cycle but no seasonal trend. There is also a noteable upward trend for the last 7-8 years so we will perform a drift model.
hh_budget %>%
model(drift = RW(Wealth ~ drift())) %>%
forecast(h = 10) %>%
autoplot(hh_budget) +
labs(title = 'Wealth')
Australian takeaway food turnover (aus_retail).
From the graph below we can see we have a noteable upward trend. A seasonal trend is possibly not clear by just looking at the graph (this appears to fluctate randomly).
aus_foturnover <-
aus_retail %>%
filter(stringr::str_detect(State,"Australian") &
stringr::str_detect(Industry,"takeaway food")) %>%
select(c(Month,Turnover))
aus_foturnover %>% autoplot(Turnover)
aus_retail <- tsibbledata::aus_retail
x <- aus_retail %>% filter(Industry == 'Takeaway food services') %>% summarize(Turnover=sum(Turnover))
fit<-x%>% model(RW(Turnover ~ drift()))
fit %>%
forecast(h = "10 years") %>%
autoplot(x)+labs(title='Turnover')
5.2 Use the Facebook stock price (data set gafa_stock) to do the following:
Produce a time plot of the series.
unique(gafa_stock$Symbol)
## [1] "AAPL" "AMZN" "FB" "GOOG"
head(gafa_stock)
## # A tsibble: 6 x 8 [!]
## # Key: Symbol [1]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
## 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
## 3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
## 4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
## 5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
## 6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
facebook_stock <- gafa_stock %>%
filter(Symbol == 'FB', year(Date) >= 2015) %>%
mutate(day = row_number()) %>%
update_tsibble(index = day, regular = TRUE) %>%
select(Date, Close)
facebook_stock %>%
autoplot(Close) +
labs(y = '$US', title = 'The Facebook Stock Price')
Produce forecasts using the drift method and plot them.
From the text book
facebook_stock_2015 <- facebook_stock %>%
filter(year(Date) == 2015) %>%
select(day, Close)
facebook_stock_fit <- facebook_stock_2015 %>%
model(Drift = RW(Close ~ drift()))
facebook_stock_2016 <- facebook_stock %>%
filter(yearmonth(Date) == yearmonth('2016 Jan')) %>%
select(day, Close)
facebook_forecast <- facebook_stock_fit %>%
forecast(new_data = facebook_stock_2016)
facebook_forecast %>%
autoplot(facebook_stock_2015, level = NULL) +
autolayer(facebook_stock_2016, Close, color = 'black') +
labs(y = '$US',
title = 'Facebook daily closing stock prices',
subtitle = 'Jan 2015 - Jan 2016'
) +
guides(color = guide_legend((title = 'Forecasts')))
Show that the forecasts are identical to extending the line drawn between the first and last observations.
facebook_stock2 <- facebook_stock %>%
filter(year(Date) == 2015)
facebook_forecast %>%
autoplot(facebook_stock2, level = NULL) +
geom_line(data = slice(facebook_stock2, range(cumsum(!is.na(Close)))),
aes(y=Close), linetype = 'dashed')
Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?
facebook_fit2 <- facebook_stock_2015 %>%
model(
Mean = MEAN(Close),
Naive = NAIVE(Close),
Drift = NAIVE(Close ~ drift())
)
# to make the forecasts for the trading days in January 2016
facebook_jan_2016 <- facebook_stock %>%
filter(yearmonth(Date) == yearmonth("2016 Jan"))
facebook_forecast2 <- facebook_fit2 %>%
forecast(new_data = facebook_jan_2016)
# Plotting
facebook_forecast2 %>%
autoplot(facebook_stock_2015, level = NULL) +
autolayer(facebook_jan_2016, Close, colour = "blue") +
labs(y = "$USD",
title = "FB dly closing stock prices",
subtitle = "(Jan 2015 - Jan 2016)") +
guides(colour = guide_legend(title = "The Forecast"))
The naïve method of forecasting dictates that we use the previous period to forecast for the next period.
5.3 Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.
recent_production <- aus_production %>% filter(year(Quarter) >= 1992) # Define and estimate a model fit <- recent_production %>% model(SNAIVE(Beer)) # Look at the residuals fit %>% gg_tsresiduals() # Look a some forecasts fit %>% forecast() %>% autoplot(recent_production)
# Extract data of interest
recent_production <- aus_production %>%
filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production %>% model(SNAIVE(Beer))
# Look at the residuals
fit %>% gg_tsresiduals()
## Warning: Removed 4 row(s) containing missing values (geom_path).
## Warning: Removed 4 rows containing missing values (geom_point).
## Warning: Removed 4 rows containing non-finite values (stat_bin).
head(recent_production)
## # A tsibble: 6 x 7 [1Q]
## Quarter Beer Tobacco Bricks Cement Electricity Gas
## <qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1992 Q1 443 5777 383 1289 38332 117
## 2 1992 Q2 410 5853 404 1501 39774 151
## 3 1992 Q3 420 6416 446 1539 42246 175
## 4 1992 Q4 532 5825 420 1568 38498 129
## 5 1993 Q1 433 5724 394 1450 39460 116
## 6 1993 Q2 421 6036 462 1668 41356 149
fit %>% forecast() %>% autoplot(recent_production)
What do you conclude?
White noise is an important concept in time series forecasting. If a time series is white noise, it is a sequence of random numbers and cannot be predicted. If the series of forecast errors are not white noise, it suggests improvements could be made to the predictive model. The residuals do appear as white noise.The seasonal naive method produces forecasts for the current availabe data. We coudld could seasonal naive forecast is valid.
5.4 Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.
recent_production <- global_economy %>%
filter(Country == 'Australia')
head(recent_production)
## # A tsibble: 6 x 9 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Australia AUS 1960 18573188487. NA 7.96 14.1 13.0 10276477
## 2 Australia AUS 1961 19648336880. 2.49 8.14 15.0 12.4 10483000
## 3 Australia AUS 1962 19888005376. 1.30 8.12 12.6 13.9 10742000
## 4 Australia AUS 1963 21501847911. 6.21 8.17 13.8 13.0 10950000
## 5 Australia AUS 1964 23758539590. 6.98 8.40 13.8 14.9 11167000
## 6 Australia AUS 1965 25931235301. 5.98 8.69 15.3 13.2 11388000
tail(recent_production)
## # A tsibble: 6 x 9 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Australia AUS 2012 1.54e12 3.89 105. 21.8 21.5 22742475
## 2 Australia AUS 2013 1.57e12 2.64 108. 21.3 20.0 23145901
## 3 Australia AUS 2014 1.46e12 2.56 110. 21.5 21.1 23504138
## 4 Australia AUS 2015 1.35e12 2.35 112. 21.5 20.0 23850784
## 5 Australia AUS 2016 1.21e12 2.83 113. 21.5 19.3 24210809
## 6 Australia AUS 2017 1.32e12 1.96 116. 20.6 21.3 24598933
fit <- recent_production %>% model(NAIVE(Exports))
fit %>% gg_tsresiduals()
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).
fit %>% forecast() %>% autoplot(recent_production)
mean(augment(fit)$.innov , na.rm = TRUE)
## [1] 0.1451912
BRICKS
x_bricks <- aus_production %>%
select(Bricks)
f_bricks <- x_bricks %>% model(SNAIVE(Bricks))
f_bricks %>% gg_tsresiduals()
## Warning: Removed 24 row(s) containing missing values (geom_path).
## Warning: Removed 24 rows containing missing values (geom_point).
## Warning: Removed 24 rows containing non-finite values (stat_bin).
f_bricks %>% forecast() %>% autoplot(x_bricks)
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 8 row(s) containing missing values (geom_path).
## Warning: Removed 20 row(s) containing missing values (geom_path).
In the case of bricks data, neither SNAIVE or NAIVE looks like a good candidate to model the residuals.I believe there is non-white noise. This suggests that the residuals do not look like white noise and the forecast model can possibly be bettered.
5.7 For your retail time series (from Exercise 8 in Section 2.10):
Create a training dataset consisting of observations before 2011 using
set.seed(718)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_train <- myseries %>%
filter(year(Month) < 2011)
Check that your data have been split appropriately by producing the following plot.
autoplot(myseries, Turnover) +
autolayer(myseries_train, Turnover, colour = "red")
Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).
fit <- myseries_train %>%
model(SNAIVE(Turnover))
Check the residuals.
fit %>% gg_tsresiduals()
## Warning: Removed 12 row(s) containing missing values (geom_path).
## Warning: Removed 12 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing non-finite values (stat_bin).
mean(augment(fit)$.innov , na.rm = TRUE)
## [1] 0.9867868
Do the residuals appear to be uncorrelated and normally distributed?
The residuals are normally distributed with a mean of 0. However, the residuals seem to be correlated. The ACF graph shows that there is significant correlation in most lag periods. Moreover, the autocorrelation changes from positive to negative at lag 10.
Produce forecasts for the test data
fc <- fit %>%
forecast(new_data = anti_join(myseries, myseries_train))
## Joining, by = c("State", "Industry", "Series ID", "Month", "Turnover")
fc %>% autoplot(myseries)
The residuals are close to being normally distributed. Perhaps there is room for error. acf plot shows that there is still a lot of patterns that are not accounted for in this simple model.
Compare the accuracy of your forecasts against the actual values.
fit %>% accuracy()
## # A tibble: 1 x 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Austral~ Electri~ SNAIV~ Trai~ 0.987 3.05 2.18 5.27 13.0 1 1 0.755
fc %>% accuracy(myseries)
## # A tibble: 1 x 12
## .model State Industry .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(~ Aust~ Electri~ Test -0.508 3.65 2.99 -2.83 9.61 1.37 1.20 0.713
How sensitive are the accuracy measures to the amount of training data used? I would say accuracy measures are pretty sensitive to the amount of training data in use. By adding more data one can conclue that possibly forecast accuracy readings are affected in negative manner.