Import Libraries
library(fpp3)## Warning: package 'fpp3' was built under R version 4.1.2
## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──
## ✓ tibble 3.1.6 ✓ tsibble 1.1.3
## ✓ dplyr 1.0.8 ✓ tsibbledata 0.4.1
## ✓ tidyr 1.2.0 ✓ feasts 0.3.0
## ✓ lubridate 1.8.0 ✓ fable 0.3.2
## ✓ ggplot2 3.3.5 ✓ fabletools 0.3.2
## Warning: package 'dplyr' was built under R version 4.1.2
## Warning: package 'tidyr' was built under R version 4.1.2
## Warning: package 'tsibble' was built under R version 4.1.2
## Warning: package 'tsibbledata' was built under R version 4.1.2
## Warning: package 'feasts' was built under R version 4.1.2
## Warning: package 'fable' was built under R version 4.1.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## x lubridate::date() masks base::date()
## x dplyr::filter() masks stats::filter()
## x tsibble::intersect() masks base::intersect()
## x tsibble::interval() masks lubridate::interval()
## x dplyr::lag() masks stats::lag()
## x tsibble::setdiff() masks base::setdiff()
## x tsibble::union() masks base::union()
library(stats)Question 1
Australian Pop
Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:
australia=global_economy %>%
filter(Country == "Australia")
autoplot(australia ,Population)The population of Australia possesses a clear increasing trend so it would make the most sense to use the drift method for forecasting.
australia%>%
model(RW(Population ~ drift())) %>%
forecast(h = "10 years") %>%
autoplot(global_economy) +
labs(y = "Population", title = "Population of Australia")BRICKS
aus_production %>%
autoplot(Bricks)## Warning: Removed 20 row(s) containing missing values (geom_path).
This dataset has clear seasonal data. Every Q1 has a local maximum. When we see data with seasonality lie this it is appropriate to use the SNAIVE forecasting method.
b<-aus_production %>%
filter(!is.na(Bricks))
b %>%
model(SNAIVE(Bricks ~ lag("year")))%>%
forecast(h = 10) %>%
autoplot(b)+
labs(title="SNAIVE 10 Year Forecast",
xlab="Year" )New South Wales
aus_livestock %>%
filter(State=="New South Wales" & Animal=="Lambs") %>%
autoplot(Count)Without any clear trend or seasonality in the series, one could use any of the forecasting methods. Let’s see what they look like.
lambs=aus_livestock %>%
filter(State=="New South Wales" & Animal=="Lambs")
lambs_fit=lambs%>%
model(Mean = MEAN(Count),
`Seasonal Naïve` = SNAIVE(Count),
`Naïve` = NAIVE(Count),
Drift= NAIVE(Count~drift()))
lambs_fit %>%
forecast(h =10) %>%
autoplot(lambs)Household Wealth
Household wealth has increased steadily over the years. In order to properly capture this trend we can use a drift forecasting model.
wealth_model <- hh_budget %>%
model( naive = NAIVE(Wealth))
wealth_model %>%
forecast(h =5) %>%
autoplot(hh_budget) +
labs(title="Drift Forecast",
ylab="Net Income Percentage")Australian takeaway food turnover
We can see that the states of Australia have dratically different rates of growth, but across the board we see some positive growth. For this I will use the drift model across all states, but it is possible that certain states might fare well with other forecasting methods.
aus_retail %>%
filter(Industry=="Takeaway food services") %>%
autoplot(Turnover)aus_retail %>%
filter(Industry == "Takeaway food services") %>%
model(RW(Turnover ~ drift())) %>%
forecast(h = 36) %>%
autoplot(aus_retail) +
labs(title = "Takeaway Food Orders") +
facet_wrap(~State, scales = "free")Question 2
Produce a time plot of the series.
fb <- gafa_stock %>%
filter(Symbol == "FB") %>%
mutate(day = row_number()) %>%
update_tsibble(index = day, regular = TRUE)
fb%>%
autoplot(Open) +
labs(title= "Trading Prices of FB Stock", y = "USD")Produce forecasts using the drift method and plot them.
fb %>%
model(RW(Open ~ drift())) %>%
forecast(h = 63) %>%
autoplot(fb) +
labs(title = "Daily Open Price of Facebook", y = "USD")Show that the forecasts are identical to extending the line drawn between the first and last observations.
fb %>%
model(RW(Open ~ drift())) %>%
forecast(h = 63) %>%
autoplot(fb) +
labs(title = "Daily Open Price of Facebook", y = "USD") +
geom_segment(aes(x = 1, y = fb$Open[1], xend = length(fb$Open), yend = fb$Open[length(fb$Open)]),
colour = "red")## Warning: Use of `fb$Open` is discouraged. Use `Open` instead.
## Use of `fb$Open` is discouraged. Use `Open` instead.
## Use of `fb$Open` is discouraged. Use `Open` instead.
## Use of `fb$Open` is discouraged. Use `Open` instead.
Question 3
production <- aus_production%>%
filter(year(Quarter) >= 1992)
fit <- production %>% model(SNAIVE(Beer))
fit %>%
gg_tsresiduals()## Warning: Removed 4 row(s) containing missing values (geom_path).
## Warning: Removed 4 rows containing missing values (geom_point).
## Warning: Removed 4 rows containing non-finite values (stat_bin).
# Look a some forecasts
fit %>%
forecast() %>%
autoplot(production)Box.test(production$Beer, lag = 24, type = "Ljung")##
## Box-Ljung test
##
## data: production$Beer
## X-squared = 494.29, df = 24, p-value < 2.2e-16
The P value from the Box-Ljung test are very small, which suggests that the results from the test are not white noise. Additionally, we see that the residuals are normally distributed with a mean centered on zero.
Question 4
Australia Exports
recent_exports <- global_economy %>%
filter(Year>= 1960, Country == "Australia")
fit <- recent_exports %>% model(NAIVE(Exports ))
fit %>% gg_tsresiduals()## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).
Box.test(recent_exports$Exports, lag = 24, type = "Ljung")##
## Box-Ljung test
##
## data: recent_exports$Exports
## X-squared = 398.24, df = 24, p-value < 2.2e-16
fit %>%
forecast(h = 5) %>%
autoplot(recent_exports)Because of the annual trend present in the global_economy dataset it is appropriate to use a NAIVE forecasting model. Additionally we see the residuals follow a normal distribution which signals that they are distinguishable from a white noise series.
Bricks Series
bricks_series <- aus_production %>%
filter(!is.na(Bricks))
fit <- bricks_series %>%
model(SNAIVE(Bricks))
fit %>% gg_tsresiduals() +
ggtitle("Residual Plots for Australian Production of Bricks")## Warning: Removed 4 row(s) containing missing values (geom_path).
## Warning: Removed 4 rows containing missing values (geom_point).
## Warning: Removed 4 rows containing non-finite values (stat_bin).
fit %>% forecast() %>% autoplot(aus_production) +
ggtitle("Australian Production of Bricks")## Warning: Removed 20 row(s) containing missing values (geom_path).
Because of the seasonal pattern in the bricks dataset it is best to use the SNAIVE method for forecasting. Looking at the results from the autocorrelation analysis we see that the residuals are distinguishable from a white noise series.
Question 7
set.seed(666)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
autoplot(myseries,.vars=Turnover)## Create a training dataset consisting of observations before 2011 using
pre2011 <- myseries %>%
filter(year(Month) < 2011)## Check that your data have been split appropriately by producing the following plot.
autoplot(myseries, Turnover) +
autolayer(pre2011, Turnover, colour = "red")## Fit a seasonal naïve model using SNAIVE() applied to your training data
fit <- pre2011 %>%
model(SNAIVE(Turnover ~ lag(12)))## Check the residuals.
fit %>%
gg_tsresiduals()## Warning: Removed 12 row(s) containing missing values (geom_path).
## Warning: Removed 12 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing non-finite values (stat_bin).
## Produce forecasts for the test data
fc <- fit %>%
forecast(new_data = anti_join(myseries, pre2011))## Joining, by = c("State", "Industry", "Series ID", "Month", "Turnover")
## Produce forecasts for the test data
fc %>%
autoplot(myseries)## Compare the accuracy of your forecasts against the actual values.
fit %>%
accuracy()## Compare the accuracy of your forecasts against the actual values.
fc %>%
accuracy(myseries)