This document contains the homework problems for the Data 624 course. Link: https://otexts.com/fpp3/graphics-exercises.html
Do exercises 5.1, 5.2, 5.3, 5.4 and 5.7 in the Hyndman book.
library(fpp3)
## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──
## ✔ tibble 3.1.8 ✔ tsibble 1.1.3
## ✔ dplyr 1.1.0 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.0 ✔ feasts 0.3.0
## ✔ lubridate 1.9.2 ✔ fable 0.3.2
## ✔ ggplot2 3.4.1 ✔ fabletools 0.3.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2
## ──
## ✔ readr 2.1.4 ✔ stringr 1.5.0
## ✔ purrr 1.0.1 ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ lubridate::as.difftime() masks base::as.difftime()
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks lubridate::intersect(), base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks lubridate::setdiff(), base::setdiff()
## ✖ tsibble::union() masks lubridate::union(), base::union()
5.1 Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:
NAIVE():For seasonal data, a related idea is to use the corresponding season from the last year of data. naive() is simply a wrapper to rwf() for simplicity. For naïve forecasts, we simply set all forecasts to be the value of the last observation, a naïve forecast is optimal when data follow a random walk (see Section 9.1), these are also called random walk forecasts .
SNAIVE() returns forecasts and prediction intervals from an ARIMA(0,0,0)(0,1,0)m model where m is the seasonal period.
RW(y~ drift()): A variation on the naïve method is to allow the forecasts to increase or decrease over time, where the amount of change over time (called the drift) is set to be the average change seen in the historical data. This is equivalent to drawing a line between the first and last observations, and extrapolating it into the future
From below we can see a clear indication that pop. is increasing at a constant rate. The RW drift method is a forecasting method that unlike averages/naive methods does not have a constant (flat) forecast, instead the drift method can increase or decrease over time, this is why its a great method when it comes to forecasting linear trends.
global_economy %>%
filter(Country == "Australia") %>%
autoplot(Population) +
labs(title = "Population of Australia")
Australia <- global_economy %>%
filter(Country=="Australia")
auspop_fit <- Australia %>%
model(RW(Population ~ drift())) %>%
forecast(h=5)
auspop_fit %>% autoplot(Australia)
From below we can interpret as seasonality detection. The most appropriate would be Seasonal Naive Method. Seasonal naive methods: This method is like the naive method but predicts the last observed value of the same season of the year.
aus_production %>%
autoplot(Bricks)
australia_bricks <- aus_production %>%
filter(!is.na(Bricks))
australia_bricks %>%
autoplot(Bricks)
Upon application of SNAIVE, we see that there is continuation of the
trend in the variable
brick<-aus_production %>%
filter(!is.na(Bricks))
brick %>%
model(SNAIVE(Bricks ~ lag("year")))%>%
forecast(h = 10 )%>%
autoplot(brick)+
labs(title="SNAIVE Forecast ",
subtitle = "10 Year Forecast",
xlab="Year" )
### NSW Lambs (aus_livestock)
aus_livestock %>%
filter(Animal == "Lambs",
State == "New South Wales"
) %>%
autoplot(Count) +
labs(title = "The NSW Lambs")
aus_livestock %>%
filter(State == "New South Wales",
Animal == "Lambs") %>%
model(NAIVE(Count)) %>%
forecast(h = 15)
## # A fable: 15 x 6 [1M]
## # Key: Animal, State, .model [1]
## Animal State .model Month Count .mean
## <fct> <fct> <chr> <mth> <dist> <dbl>
## 1 Lambs New South Wales NAIVE(Count) 2019 Jan N(351600, 2.5e+09) 351600
## 2 Lambs New South Wales NAIVE(Count) 2019 Feb N(351600, 5.1e+09) 351600
## 3 Lambs New South Wales NAIVE(Count) 2019 Mar N(351600, 7.6e+09) 351600
## 4 Lambs New South Wales NAIVE(Count) 2019 Apr N(351600, 1e+10) 351600
## 5 Lambs New South Wales NAIVE(Count) 2019 May N(351600, 1.3e+10) 351600
## 6 Lambs New South Wales NAIVE(Count) 2019 Jun N(351600, 1.5e+10) 351600
## 7 Lambs New South Wales NAIVE(Count) 2019 Jul N(351600, 1.8e+10) 351600
## 8 Lambs New South Wales NAIVE(Count) 2019 Aug N(351600, 2e+10) 351600
## 9 Lambs New South Wales NAIVE(Count) 2019 Sep N(351600, 2.3e+10) 351600
## 10 Lambs New South Wales NAIVE(Count) 2019 Oct N(351600, 2.5e+10) 351600
## 11 Lambs New South Wales NAIVE(Count) 2019 Nov N(351600, 2.8e+10) 351600
## 12 Lambs New South Wales NAIVE(Count) 2019 Dec N(351600, 3e+10) 351600
## 13 Lambs New South Wales NAIVE(Count) 2020 Jan N(351600, 3.3e+10) 351600
## 14 Lambs New South Wales NAIVE(Count) 2020 Feb N(351600, 3.5e+10) 351600
## 15 Lambs New South Wales NAIVE(Count) 2020 Mar N(351600, 3.8e+10) 351600
Let’s start with the naive model in this case. From the plot below there doesn’t seem to be an upward or downward trend or a seasonal trend from naive’s forecast. I want to try other methods as well.
aus_livestock %>%
filter(State == "New South Wales",
Animal == "Lambs") %>%
model(NAIVE(Count)) %>%
forecast(h = 15) %>%
autoplot(aus_livestock)
train <- aus_livestock %>%
filter(State == "New South Wales",
Animal == "Lambs") %>%
filter_index("2000 Jan" ~ "2020 Jan")
# Fit the models
Lamb_fit <- train %>%
model(
Mean = MEAN(Count),
`Naïve` = NAIVE(Count),
`Seasonal naïve` = SNAIVE(Count)
)
# Generate forecasts for 14 quarters
Lamb_fc <- Lamb_fit %>% forecast(h = 20)
# Plot forecasts against actual values
Lamb_fc %>%
autoplot(train, level = NULL) +
autolayer(
filter_index(aus_livestock %>%
filter(State == "New South Wales",
Animal == "Lambs"), "2007 Q1" ~ .),
colour = "black"
) +
labs(
y = "Number of Lambs Slaughtered",
title = "Australian Lambs slaughtered from the state of New South Wales"
) +
guides(colour = guide_legend(title = "Forecast"))
Seasonal Naive is far better than other two forecast approaches.
hh_budget %>%
autoplot(Wealth) + labs(title = "Household wealth")
There is also a noteable upward trend for the last 7-8 years so we will
perform a drift model.
hh_budget %>%
model(drift = RW(Wealth ~ drift())) %>%
forecast(h = 5) %>%
autoplot(hh_budget) +
labs(title = 'Wealth')
aus_foturnover <-
aus_retail %>%
filter(stringr::str_detect(State,"Australian") &
stringr::str_detect(Industry,"takeaway food")) %>%
select(c(Month,Turnover))
aus_foturnover %>% autoplot(Turnover)
aus_tfs <- aus_retail %>%
filter(Industry == "Takeaway food services")
aus_tfs %>%
autoplot(Turnover)+
labs(y = "Retail turnover in $Million AU", title = " Australian Retail Trade Turnover")
Let’s pick one of the state/industry to identify the right technique
train= aus_retail %>%
filter(State=="Tasmania" & Industry=="Takeaway food services")
train_fit=train%>%
model(`Seasonal Naïve` = SNAIVE(Turnover),
`Naïve` = NAIVE(Turnover),
Drift= NAIVE(Turnover~drift()))
train_fit %>%
forecast(h = 10)%>%
autoplot(train)
5.2 Use the Facebook stock price (data set gafa_stock) to do the following:
Produce a time plot of the series.
unique(gafa_stock$Symbol)
## [1] "AAPL" "AMZN" "FB" "GOOG"
head(gafa_stock)
## # A tsibble: 6 x 8 [!]
## # Key: Symbol [1]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
## 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
## 3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
## 4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
## 5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
## 6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
facebook_stock <- gafa_stock %>%
filter(Symbol == 'FB', year(Date) >= 2015) %>%
mutate(day = row_number()) %>%
update_tsibble(index = day, regular = TRUE) %>%
select(Date, Close)
facebook_stock %>%
autoplot(Close) +
labs(y = '$US', title = 'The Facebook Stock Price')
Produce forecasts using the drift method and plot them.
From the text book
facebook_stock_2015 <- facebook_stock %>%
filter(year(Date) == 2015) %>%
select(day, Close)
facebook_stock_fit <- facebook_stock_2015 %>%
model(Drift = RW(Close ~ drift()))
facebook_stock_2016 <- facebook_stock %>%
filter(yearmonth(Date) == yearmonth('2016 Jan')) %>%
select(day, Close)
facebook_forecast <- facebook_stock_fit %>%
forecast(new_data = facebook_stock_2016)
facebook_forecast %>%
autoplot(facebook_stock_2015, level = NULL) +
autolayer(facebook_stock_2016, Close, color = 'black') +
labs(y = '$US',
title = 'Facebook daily closing stock prices',
subtitle = 'Jan 2015 - Jan 2016'
) +
guides(color = guide_legend((title = 'Forecasts')))
Show that the forecasts are identical to extending the line drawn between the first and last observations.
facebook_stock2 <- facebook_stock %>%
filter(year(Date) == 2015)
facebook_forecast %>%
autoplot(facebook_stock2, level = NULL) +
geom_line(data = slice(facebook_stock2, range(cumsum(!is.na(Close)))),
aes(y=Close), linetype = 'dashed')
Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?
facebook_fit2 <- facebook_stock_2015 %>%
model(
Mean = MEAN(Close),
Naive = NAIVE(Close),
Drift = NAIVE(Close ~ drift())
)
# to make the forecasts for the trading days in January 2016
facebook_jan_2016 <- facebook_stock %>%
filter(yearmonth(Date) == yearmonth("2016 Jan"))
facebook_forecast2 <- facebook_fit2 %>%
forecast(new_data = facebook_jan_2016)
# Plotting
facebook_forecast2 %>%
autoplot(facebook_stock_2015, level = NULL) +
autolayer(facebook_jan_2016, Close, colour = "blue") +
labs(y = "$USD",
title = "FB dly closing stock prices",
subtitle = "(Jan 2015 - Jan 2016)") +
guides(colour = guide_legend(title = "The Forecast"))
I think that the best model for this data is the naive method as these type of data such as the price of stock follow a random walk.
5.3 Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.
# Extract data of interest
recent_production <- aus_production %>%
filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production %>% model(SNAIVE(Beer))
# Look at the residuals
fit %>% gg_tsresiduals()
head(recent_production)
## # A tsibble: 6 x 7 [1Q]
## Quarter Beer Tobacco Bricks Cement Electricity Gas
## <qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1992 Q1 443 5777 383 1289 38332 117
## 2 1992 Q2 410 5853 404 1501 39774 151
## 3 1992 Q3 420 6416 446 1539 42246 175
## 4 1992 Q4 532 5825 420 1568 38498 129
## 5 1993 Q1 433 5724 394 1450 39460 116
## 6 1993 Q2 421 6036 462 1668 41356 149
fit %>% forecast() %>% autoplot(recent_production)
What do you conclude?
White noise is an important concept in time series forecasting. If a time series is white noise, it is a sequence of random numbers and cannot be predicted. If the series of forecast errors are not white noise, it suggests improvements could be made to the predictive model. The residuals do appear as white noise.The seasonal naive method produces forecasts for the current available data.
5.4 Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.
recent_production <- global_economy %>%
filter(Country == 'Australia')
head(recent_production)
## # A tsibble: 6 x 9 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Australia AUS 1960 18573188487. NA 7.96 14.1 13.0 10276477
## 2 Australia AUS 1961 19648336880. 2.49 8.14 15.0 12.4 10483000
## 3 Australia AUS 1962 19888005376. 1.30 8.12 12.6 13.9 10742000
## 4 Australia AUS 1963 21501847911. 6.21 8.17 13.8 13.0 10950000
## 5 Australia AUS 1964 23758539590. 6.98 8.40 13.8 14.9 11167000
## 6 Australia AUS 1965 25931235301. 5.98 8.69 15.3 13.2 11388000
tail(recent_production)
## # A tsibble: 6 x 9 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Australia AUS 2012 1.54e12 3.89 105. 21.8 21.5 22742475
## 2 Australia AUS 2013 1.57e12 2.64 108. 21.3 20.0 23145901
## 3 Australia AUS 2014 1.46e12 2.56 110. 21.5 21.1 23504138
## 4 Australia AUS 2015 1.35e12 2.35 112. 21.5 20.0 23850784
## 5 Australia AUS 2016 1.21e12 2.83 113. 21.5 19.3 24210809
## 6 Australia AUS 2017 1.32e12 1.96 116. 20.6 21.3 24598933
fit <- recent_production %>% model(NAIVE(Exports))
fit %>% gg_tsresiduals()
fit %>% forecast() %>% autoplot(recent_production)
mean(augment(fit)$.innov , na.rm = TRUE)
## [1] 0.1451912
BRICKS
x_bricks <- aus_production %>%
select(Bricks)
f_bricks <- x_bricks %>% model(SNAIVE(Bricks))
f_bricks %>% gg_tsresiduals()
f_bricks %>% forecast() %>% autoplot(x_bricks)
5.7 For your retail time series (from Exercise 8 in Section 2.10):
Create a training dataset consisting of observations before 2011 using
set.seed(718)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_train <- myseries %>%
filter(year(Month) < 2011)
Check that your data have been split appropriately by producing the following plot.
autoplot(myseries, Turnover) +
autolayer(myseries_train, Turnover, colour = "red")
Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).
fit <- myseries_train %>%
model(SNAIVE(Turnover))
Check the residuals.
fit %>% gg_tsresiduals()
## Warning: Removed 12 rows containing missing values (`geom_line()`).
## Warning: Removed 12 rows containing missing values (`geom_point()`).
## Warning: Removed 12 rows containing non-finite values (`stat_bin()`).
mean(augment(fit)$.innov , na.rm = TRUE)
## [1] 0.9867868
Do the residuals appear to be uncorrelated and normally distributed?
It seems that the residuals on the time plot show an increasing variability from the year 2000 and on. We can also observe that there is correlation on the ACF of the residuals and the histogram is right skewed.
Produce forecasts for the test data
fc <- fit %>%
forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc %>% autoplot(myseries)
The residuals are close to being normally distributed. Perhaps there is
room for error. acf plot shows that there is still a lot of patterns
that are not accounted for in this simple model.
Compare the accuracy of your forecasts against the actual values.
fit %>% accuracy()
## # A tibble: 1 × 12
## State Indus…¹ .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Australi… Electr… SNAIV… Trai… 0.987 3.05 2.18 5.27 13.0 1 1 0.755
## # … with abbreviated variable name ¹Industry
fc %>% accuracy(myseries)
## # A tibble: 1 × 12
## .model State Indus…¹ .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Aust… Electr… Test -0.508 3.65 2.99 -2.83 9.61 1.37 1.20 0.713
## # … with abbreviated variable name ¹Industry
How sensitive are the accuracy measures to the amount of training data used? I think the sensitivity of accuracy measures to the amount of training data used would depend on the model being used. As it was explained in the book, the NAIVE() method for example will “set all forecasts to be the value of the last observation”, which means that past historic data will technically have no effect. As with other methods such as the MEAN() and SNAIVE(), that base their forecasts on past data in the series, will be more sensitive to the amount of training data.