Load the following packages:
library(fpp3)
## Warning: package 'fpp3' was built under R version 4.2.2
## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──
## ✔ tibble 3.1.8 ✔ tsibble 1.1.3
## ✔ dplyr 1.0.10 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.2.1 ✔ feasts 0.3.0
## ✔ lubridate 1.8.0 ✔ fable 0.3.2
## ✔ ggplot2 3.3.6 ✔ fabletools 0.3.2
## Warning: package 'tsibble' was built under R version 4.2.2
## Warning: package 'tsibbledata' was built under R version 4.2.2
## Warning: package 'feasts' was built under R version 4.2.2
## Warning: package 'fabletools' was built under R version 4.2.2
## Warning: package 'fable' was built under R version 4.2.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
options(warn = - 1)
Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:
Australian Population (global_economy):
p=global_economy %>%
filter(Country == "Australia")
autoplot(p,Population, col="blue") + labs(title = "Population", subtitle = "Australia")
1960-2017:
aus_pop <- global_economy %>%
filter(Country == "Australia") %>%
mutate(Population = Population/1e6) %>%
select(c(Country, Code, Year, Population))
Set training data from 1960 to 2002
train <- aus_pop %>%
filter_index("1960" ~ "2002")
Fit the models
pop_fit <- train %>%
model(
Naive = NAIVE(Population),
`Seasonal naive` = SNAIVE(Population),
`Random walk` = RW(Population ~ drift())
)
Generate forecasts for 14 years
pop_fc <- pop_fit %>% forecast(h = "14 years")
Plot
pop_fc %>%
autoplot(train, level = NULL) +
autolayer( filter_index(aus_pop, "2003" ~ "2017"), colour = "blue"
) +
labs( y = "Population (in millions)",
title = "Forecasts for annual population", subtitle = "Australia" ) +
guides(colour = guide_legend(title = "Forecast"))
Random walk with drift() appears most appropriate as the overall plot shows an increasing trend. Naive not appropriate as naive doesn’t capture trend and Seasonal Naive not appropriate as there doesn’t appear to be a seasonal nature to the dataset.
Bricks (aus_production):
1956 Q1 to 2005 Q2 (198 quarters)
aus_bricks <- aus_production %>%
select(c(Quarter, Bricks)) %>% na.omit(aus_bricks)
Set training data from 1992 to 2006
train <- aus_bricks %>%
filter_index("1956 Q1" ~ "1993 Q4")
Fit the models
brick_fit <- train %>%
model(
`Naive` = NAIVE(Bricks),
`Seasonal naive` = SNAIVE(Bricks),
`Random walk` = RW(Bricks ~ drift())
)
Generate forecasts for 14 quarters
brick_fc <- brick_fit %>% forecast(h = 46)
Plot
brick_fc %>%
autoplot(train, level = NULL) +
autolayer( filter_index(aus_bricks, "1994 Q1" ~ .), colour = "blue"
) +
labs( y = "Millions", title = "Forecasts for quarterly", subtitle = "brick production" ) + guides(colour = guide_legend(title = "Forecast"))
Analyzing the graph the Naive, Seasonal Naive or Random Walk look good.
I’ll choose Seasonal Naive, as the data set appears to have a strong
seasonal pattern. I tried Seasonal Naive with drift, but the forecasts
showed an increasing trend that didn’t follow the data, as the data
shows a more recent decreasing trend.
NSW Lambs (aus_livestock):
1972 JUL to 2018 DEC
nsw_lambs <- aus_livestock %>%
filter(State == 'New South Wales' &
Animal == 'Lambs') %>%
mutate(Count = Count/1e3) %>%
select(c(Month, Count))
Set training data from 1972 through 2006
train <- nsw_lambs %>%
filter_index("1972 JUL" ~ "2006 DEC")
Fit the models
lamb_fit <- train %>%
model(
`Naive` = NAIVE(Count),
`Seasonal naive` = SNAIVE(Count),
`Random walk` = RW(Count ~ drift())
)
Generate forecasts for 144 months
lamb_fc <- lamb_fit %>% forecast(h = 144)
Plot
lamb_fc %>%
autoplot(train, level = NULL) +
autolayer(filter_index(nsw_lambs, "2007 JAN" ~ .),colour = "blue"
) +
labs( y = "Thousands", title = "Forecasts for lambs slaughtered", subtitle = "NSW"
) + guides(colour = guide_legend(title = "Forecast"))
I selected the naive seasonal forecast for NSW lambs. The seasonal
nature of the NSW lamb data is better itself to the Seasonal Naive
forecasting method.
Household wealth (hh_budget):
Wealth as a percentage of net disposable income
hh_wealth <- hh_budget %>%
select(c(Country, Year, Wealth))
Set training data from 1995 through 2010
train <- hh_wealth %>%
filter_index("1995" ~ "2010")
Fit the models
hh_w_fit <- train %>%
model(
`Naive` = NAIVE(Wealth),
`Seasonal naive` = SNAIVE(Wealth),
`Random walk` = RW(Wealth ~ drift())
)
Generate forecasts for 6 years
hh_w_fc <- hh_w_fit %>% forecast(h = 6)
Plot
hh_w_fc %>%
autoplot(train, level = NULL) +
autolayer( filter_index(hh_wealth, "2011" ~ .), colour = "blue"
) +
labs( y = "Percentage", title = "Forecasts for wealth", subtitle = "percentage of net disposable income"
) + guides(colour = guide_legend(title = "Forecast"))
Every country shows an increasing trend in household wealth, I think
Random Walk with drift() is the best forecasting model.
Australian takeaway food turnover (aus_retail):
8 states, 1982 Apr - 2018 Dec
aus_ta_to <- aus_retail %>%
filter(Industry == "Takeaway food services") %>%
select(c(State, Month, Turnover))
Set training data from 1982 Apr through 2008 Dec
train <- aus_ta_to %>%
filter_index("1982 Apr" ~ "2008 DEC")
Fit the models
austato_fit <- train %>%
model(
`Naive` = NAIVE(Turnover),
`Seasonal naive` = SNAIVE(Turnover),
`Random walk` = RW(Turnover ~ drift())
)
Generate forecasts for 120 months
austato_fc <- austato_fit %>% forecast(h = 120)
Plot
austato_fc %>%
autoplot(train, level = NULL) +
autolayer(
filter_index(aus_ta_to, "2009 JAN" ~ .),colour = "blue"
) +
labs( y = "$Million AUD", title = "Retail turnover", subtitle = "Australian takeaway food"
) + guides(colour = guide_legend(title = "Forecast"))
By presenting each of the eight Australian states individually, a
pattern emerges where Seasonal Naive captures the seasonal nature of
retail turnover, but Random Walk with drift actually captures the
growing trend better. The random Walk with Drift is the most appropriate
forecasting method for this data set.
Use the Facebook stock price (data set gafa_stock) to do the following:
Tail shows the last day is 2018-12-31
fb_stock <- gafa_stock %>%
filter(Symbol == 'FB')
fb_stock %>% autoplot(Close, col="blue") +
labs(
y = "Price (in USD)", title = "Closing Stock Price", subtitle = "Facebook"
)
Straightforward plot using the autoplot() function.
Re-index based on trading days
fb_stock <- gafa_stock %>%
filter(Symbol == "FB") %>%
mutate(day = row_number()) %>%
update_tsibble(index = day, regular = TRUE)
Fit the models
fb_fit <- fb_stock %>%
model(
`Naive Drift` = NAIVE(Close ~ drift()),
`Random Walk` = RW(Close ~ drift())
)
Produce forecasts for next 253 days (~1 year)
fb_fc <- fb_fit %>% forecast(h = 253)
Plot
fb_fc %>%
autoplot(fb_stock, level = NULL) +
autolayer(fb_stock, Close, colour = "blue") +
labs(y = "$US", title = "Daily closing stock prices", subtitle = "Facebook"
) + guides(colour = guide_legend(title = "Forecast"))
Naive Drift and Random Walk with Drift have the same line, therefore
only one line appears on the plot above.
Plot
fb_fc %>%
autoplot(fb_stock, level = NULL) +
autolayer(fb_stock, Close, colour = "blue") +
labs(y = "$US",title = "Daily closing stock prices", subtitle = "Facebook"
) + guides(colour = guide_legend(title = "Forecast")) + geom_segment(aes(x=first(fb_stock$day), y=first(fb_stock$Close), xend=last(fb_stock$day), yend=last(fb_stock$Close)), linetype='dashed')
By drawing a straight dashed line from the first point to the endpoint
of the provided observations, the dashed line extends to the colored
lines for Naive Drift and Random Walk with Drift.
Different benchmark
fb_fit_2 <- fb_stock %>%
model(
Mean = MEAN(Close),
Naive = NAIVE(Close),
`Seasonal naive` = SNAIVE(Close),
`Random Walk` = RW(Close)
)
Produce forecasts for next 253 days
fb_fc_2 <- fb_fit_2 %>% forecast(h = 253)
Plot
fb_fc_2 %>%
autoplot(fb_stock, level = NULL) +
autolayer(fb_stock, Close, colour = "blue") +
labs(y = "$US", title = "Daily closing stock prices", subtitle = "Facebook") + guides(colour = guide_legend(title = "Forecast"))
For the forecast use MEAN, NAIVE, SNAIVE, and RW without drift. I did not find any of the forecasts to hit the visual trend well. No seasonal pattern was detected. I think the best option is to forecast the stock price based on previous observations.
Apply a seasonal naive method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help .
# Extract data of interest
recent_production <- aus_production %>%
filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production %>% model(SNAIVE(Beer))
# Look at the residuals
fit %>% gg_tsresiduals()
Look at some forecasts
fit %>% forecast() %>% autoplot(recent_production)
What do you conclude?
Analyzing the graphs I can see that the graph of innovation residuals shows a mean around zero along with a constant variance around the mean.
The ACF plot shows that there is a delay with a significant correlation in the residual series.
The histogram of the residuals is almost a normal distribution, it has no outliers.
In conclusion the naive seasonal generates forecasts that represent almost all the available information.
Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.
Australian Exports, 1960-2017
aus_exports <- global_economy %>%
filter(Country == 'Australia')
Define and estimate a model
fit <- aus_exports %>% model(NAIVE(Exports ~ drift()))
Look at the residuals
fit %>% gg_tsresiduals()
Look at some forecasts
fit %>% forecast() %>% autoplot(aus_exports)
The innovation residuals plot is Mean near zero, near constant variance
white noise. ACF is No significant correlation. The Histogram is normal
distribution. NAIVE is Selected over SNAIVE as no seasonal pattern
appears.
Naive is the better option. The residuals do show the model accounts for most of the available data.
Bricks
1956 Q1 to 2005 Q2 (198 quarters)
aus_bricks <- aus_production %>%
select(c(Quarter, Bricks)) %>% na.omit(aus_bricks)
Define and estimate a model
fit <- aus_bricks %>% model(SNAIVE(Bricks))
Look at the residuals
fit %>% gg_tsresiduals()
Look at some forecasts
fit %>% forecast() %>% autoplot(aus_bricks)
The residual plot shows that the model does not take into account most of the available data. The histogram is not a normal distribution. The residual innovation is not white noise. The ACF is a Clear correlation present.
I selected seasonal naive because I only had two options, and the NAIVE method indicated a clear quarterly correlation in the ACF, so I selected SNAIVE.
Analyzing the forecast chart, I think the forecasts follow the visual seasonal pattern of the data. The seasonal naivety is better, but not the best model for this data set.
For your retail time series (from Exercise 8 in Section 2.10):
retail time series
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_train <- myseries %>%
filter(year(Month) < 2011)
autoplot(myseries, Turnover) +
autolayer(myseries_train, Turnover, colour = "red")
lambda <- 0.24
fit <- myseries_train %>%
model(SNAIVE(box_cox(Turnover, lambda) ~ drift()))
fit_3 <- myseries_train %>%
model(
`SNAIVE` = SNAIVE(Turnover),
`SNAIVE with drift` = SNAIVE(Turnover ~ drift()),
`SNAIVE with drift and Box-Cox` = SNAIVE(box_cox(Turnover, lambda) ~ drift()))
I applied the Box-Cox transformation with lambda 0.24 along with drift to the seasonal naive model to the retail time series. I created a model using the basic seasonal naive method, the seasonal naive drift method, and finally the seasonal naive drift method and the Box-Cox transformation.
fit %>% gg_tsresiduals()
Do the residuals appear to be uncorrelated and normally distributed?
No, The residuals show correlation in the ACF plot. The innovation residual plot shows nearly constant variance. The histogram plot is normal form.
fc <- fit_3 %>%
forecast(new_data = anti_join(myseries, myseries_train))
fc %>% autoplot(myseries)
For the Box-Cox transform forecast, the forecast certainly follows the
path of the time series. The actual values appear with an 80% confidence
interval. We can conclude is a good forecast.
The basic seasonal naive forecast simply forecasts a seasonal pattern without taking trend into account.
The drift forecast provides the seasonal pattern along with the increasing trend.
fit_3 %>% accuracy()
## # A tibble: 3 × 12
## State Indus…¹ .model .type ME RMSE MAE MPE MAPE MASE RMSSE
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Western A… Takeaw… SNAIVE Trai… 4.12e+ 0 8.16 6.19 7.76 11.4 1 1
## 2 Western A… Takeaw… SNAIV… Trai… 4.34e-16 7.04 5.34 -3.18 10.5 0.864 0.863
## 3 Western A… Takeaw… SNAIV… Trai… -2.11e- 1 7.04 5.26 -0.498 9.51 0.851 0.863
## # … with 1 more variable: ACF1 <dbl>, and abbreviated variable name ¹Industry
According to the comparison of the models in the fitted values, the naive seasonal draft model outperforms the other two with a MAE of 9.68, MAPE of 7.65, RMSE of 13.2, and MASE of 0.840. The second best performing model in the fitted values is the seasonal naive model with drift and the Box-Cox transformation.
fc %>% accuracy(myseries)
## # A tibble: 3 × 12
## .model State Indus…¹ .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE West… Takeaw… Test 23.8 29.3 24.0 15.1 15.2 3.87 3.59 0.887
## 2 SNAIVE … West… Takeaw… Test 5.29 11.3 9.41 3.06 6.14 1.52 1.39 0.739
## 3 SNAIVE … West… Takeaw… Test -21.8 25.7 22.6 -14.2 14.9 3.66 3.15 0.775
## # … with abbreviated variable name ¹Industry
For evaluating the models on the forecast data, the seasonal naive model with drift and the Box-Cox transformation clearly outperforms the other two. The model with Box-Cox transformation shows results of a MAE of 21.8, MAPE of 7.17, RMSE of 27.6, and MASE of 1.89. The MAPE of 7.17 is the best score for this metric across all six combinations.
The most accurate model is the naive seasonal model with Box-Cox transformation.
Accuracy measures depend on the amount of training data used. The amount of training data is directly involved in calculating the model. The calculation of the model has a direct impact on the fitted values and the point forecasts.
According to the text, a test set should use about 20% of the total sample, which implies that about 80% of the training data should be used.
A model with too little training data may perform poorly on the test set.