The midterm investigates three time series datasets, simulated white noise, hh_budget and aus_retail. The questions ask about the ACF, decomposition methods, forecasting, and the use of training and test datasets to measure forecast accuracy.
Simulate a white noise time series with 250 data points. Plot the time series and the ACF of the time series. Is there are trend? Is there a seasonal pattern? Are there any meaningful statistically significant correlations? Use a seed of 1234.
set.seed(1234)
df <- data.frame(
time = seq(as.Date("2022-01-01"), by = "day", length.out = 250),
value = rnorm(250))
df <- tsibble(df)
## Using `time` as index variable.
df %>% autoplot(color = "#69b3a2") +
ggtitle("White Noise Simulate")
## Plot variable not specified, automatically selected `.vars = value`
ACF(df, y = value) %>% autoplot()
The data shows there is no trend, values fluctuate around mean = 0. From ACF plot we can observe there is no autocorrelation, most the lag spikes are under blue-dash line, close to 0 which means there is no significant correlations.
Try the X11, SEATS, and STL Decomposition methods on the Household Budget data, hh_budget to estimate the tends in Wealth for the different countries in the dataset.
Which methods work? If not, why does the method fail?
hh_budget %>%
autoplot(Wealth)
# X11 Decomposition
hh_budget %>%
model(x11 = X_13ARIMA_SEATS(Wealth ~ x11())) %>%
components() %>%
autoplot() +
labs(title ="X-11 Decomposition")
## Warning: 4 errors (1 unique) encountered for x11
## [4] X-13 run failed
##
## Errors:
## - Seasonal period must be 4 or 12 if a seasonal adjustment is
## done.
##
## Notes:
## - Correct input errors in the order they are detected since the
## first one or two may be responsible for the others (especially
## if there are errors in the SERIES or COMPOSITE spec).
## Error in `transmute()`:
## ! Problem while computing `cmp = map(.fit, components)`.
## Caused by error in `UseMethod()`:
## ! no applicable method for 'components' applied to an object of class "null_mdl"
# SEATS Decomposition
hh_budget %>%
model(x11 = X_13ARIMA_SEATS(Wealth ~ seats())) %>%
components() %>%
autoplot() +
labs(title ="SEATS Decomposition")
## Warning: 4 errors (1 unique) encountered for x11
## [4] X-13 run failed
##
## Errors:
## - Need monthly or quarterly data to perform aictest for Easter.
## No seasonal adjustment this run
## - Need monthly or quarterly data to perform aictest for Easter.
## No seasonal adjustment this run
##
## Warnings:
## - Need monthly or quarterly data to perform aictest for Easter.
## No seasonal adjustment this run
##
## Notes:
## - Correct input errors in the order they are detected since the
## first one or two may be responsible for the others (especially
## if there are errors in the SERIES or COMPOSITE spec).
## Error in `transmute()`:
## ! Problem while computing `cmp = map(.fit, components)`.
## Caused by error in `UseMethod()`:
## ! no applicable method for 'components' applied to an object of class "null_mdl"
#STL Decompostion
p1 <- hh_budget %>% filter(Country == "Australia") %>%
model(STL(Wealth ~ trend() + season(window = "periodic"), robust = TRUE)) %>% components() %>%
autoplot() + ggtitle("Australia")
p2 <- hh_budget %>% filter(Country == "Canada") %>%
model(STL(Wealth ~ trend() + season(window = "periodic"), robust = TRUE)) %>% components() %>%
autoplot() + ggtitle("Canada")
p3 <- hh_budget %>% filter(Country == "Japan") %>%
model(STL(Wealth ~ trend() + season(window = "periodic"), robust = TRUE)) %>% components() %>%
autoplot() + ggtitle("Japan")
p4 <- hh_budget %>% filter(Country == "USA") %>%
model(STL(Wealth ~ trend() + season(window = "periodic"), robust = TRUE)) %>% components() %>%
autoplot() + ggtitle("USA")
p1 +p2
p3 + p4
X11, SEATS do not work due to the observation data is by year, these two model can work only with monthly or quarterly data. STL decompositon still works in this case since in STL we can choose the trend window.
Is there are seasonal component in these times series?
The data show no seasonal component, it fluctuates with no clear pattern, only increasing trends are observed in all countries.
Try different forecasting methods to forecast 12 steps into the future the Turnover in the Liquor Industry in New South Wales, Australia using the aus_retail dataset.
Try the following: MEAN, RW, TSLM, TSLM (+ season()), NAIVE, SNAIVE
Try all of the methods and determine a best method by visual inspection of forecasts for one year.
aus_retail_sw <- aus_retail %>% filter(State == "New South Wales" & str_detect(Industry, "^L"))
aus_retail_sw %>% autoplot(Turnover)
retail_fit <- aus_retail_sw %>%
model(
Mean = MEAN(Turnover),
Naive = NAIVE(Turnover),
SNaive = SNAIVE(Turnover),
Drift = RW(Turnover ~ drift()),
TSLM = TSLM(Turnover ~ trend()),
TSLM_S = TSLM(Turnover ~ trend() + season())
)
retail_fit %>% forecast(h = 12) %>% autoplot(aus_retail_sw, level = NULL) +
labs(y = "$Million AUD",
title = "Turnover in the Liquor Industry in New South Wales, Australia") +
guides(colour = guide_legend(title = "Forecast"))
retail_fit <- aus_retail_sw %>%
model(
SNaive = SNAIVE(Turnover),
TSLM_S = TSLM(Turnover ~ trend() + season())
)
retail_fit %>% forecast(h = 12) %>% autoplot(aus_retail_sw, level = NULL) +
labs(y = "$Million AUD",
title = "TSLM_S and SNAIVE") +
guides(colour = guide_legend(title = "Forecast"))
From forecast plots, Seasonal Naive and TSLM with seasonal appear to be good forecasts in this case. However, if to choose one, Seasonal Naive is visually observed to be the best method since its trend is closest to the previous years.
Now split the data into training and testing subsets of the data. Use the data until 2017 as the training data. Using the method you have selected measure its error for forecasting the testing data, which is to 2018
train <- aus_retail_sw %>%
filter(year(Month) <= 2017)
test <- aus_retail_sw %>%
filter(year(Month) > 2017)
#check if data have been split appropriately
autoplot(aus_retail_sw, Turnover) +
autolayer(train, Turnover, colour = "red") +
ggtitle("check if data have been split appropriately")
#Seasonal NAIVE
liquor_fit <- train %>%
model(SNaive = SNAIVE(Turnover))
liquor_fit %>% gg_tsresiduals()
## Warning: Removed 12 rows containing missing values (`geom_line()`).
## Warning: Removed 12 rows containing missing values (`geom_point()`).
## Warning: Removed 12 rows containing non-finite values (`stat_bin()`).
The ACF shows there are autocorrelation between lags and residuals plot follow a normal distribution. So far, SNAIVE appears to be a good choice. Next we will compute the validation.
#forecast
fc1 <- liquor_fit %>%
forecast(h = 12)
fc1 %>%
autoplot(bind_rows(train, test),
level = NULL) +
guides(colour = guide_legend(title = "Forecast")) +
ggtitle("forecasting 2018 vs actual 2018 with SNAIVE")
#Validation
fc2 <- train %>% model(TSLM_S = TSLM(Turnover ~ trend() + season())) %>% forecast(h = 12)
fc3 <- train %>% model(Drift = RW(Turnover ~ drift())) %>% forecast(h = 12)
fc4 <- train %>% model(TSLM = TSLM(Turnover ~ trend())) %>% forecast(h = 12)
ac1 <- accuracy(fc1, aus_retail_sw)
ac2 <- accuracy(fc2, aus_retail_sw)
ac3 <- accuracy(fc3, aus_retail_sw)
ac4 <- accuracy(fc4, aus_retail_sw)
rbind(ac1,ac2,ac3,ac4)
From the validation table, Seasonal NAIVE has the best performance in all accuracy measurements (RMSE, MAE, MAPE,…) we can confirm the previous visually conclusion is correct that SNAIVE is the best forecasting method in this case.