Stat. 674 Midterm

Author

Yogesh Gupta

Instructions:

The midterm investigates three time series datasets, simulated white noise, hh_budget and aus_retail. The questions ask about the ACF, decomposition methods, forecasting, and the use of training and test datasets to measure forecast accuracy.

library(tsibble)


Attaching package: 'tsibble'

The following objects are masked from 'package:base':

    intersect, setdiff, union

library(patchwork)
library(fpp3)

── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──

✔ tibble      3.1.8     ✔ tsibbledata 0.4.1
✔ dplyr       1.1.0     ✔ feasts      0.3.0
✔ tidyr       1.3.0     ✔ fable       0.3.2
✔ lubridate   1.9.2     ✔ fabletools  0.3.2
✔ ggplot2     3.4.1

── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
✖ lubridate::date()     masks base::date()
✖ dplyr::filter()       masks stats::filter()
✖ tsibble::intersect()  masks base::intersect()
✖ lubridate::interval() masks tsibble::interval()
✖ dplyr::lag()          masks stats::lag()
✖ tsibble::setdiff()    masks base::setdiff()
✖ tsibble::union()      masks base::union()

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats 1.0.0     ✔ readr   2.1.4
✔ purrr   1.0.1     ✔ stringr 1.5.0

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()       masks stats::filter()
✖ lubridate::interval() masks tsibble::interval()
✖ dplyr::lag()          masks stats::lag()
ℹ Use the ]8;;http://conflicted.r-lib.org/conflicted package]8;; to force all conflicts to become errors

library(ggplot2)

Question 1

Simulate a white noise time series with 250 data points. Plot the time series and the ACF of the time series. Is there are trend? Is there a seasonal pattern? Are there any meaningful statistically significant correlations?

Use a seed of 1234.

Answer

The White Noise time series’ narrative lacks any discernible seasonal rhythms or trends. Moreover, no statistically significant relationships are visible in this ACF plot. In this randomly produced series, there are no discernible statistically significant associations.

Provide your code here.

set.seed(1234)
wn <- rnorm(250, mean = 0, sd = 1)
plot(wn, type = "l", xlab = "Time", ylab = "White Noise Time Series")

acf(wn)

Question 2

Try the X11, SEATS, and STL Decomposition methods on the Household Budget data, hh_budget to estimate the tends in Wealth for the different countries in the dataset.

Which methods work? If not, why does the method fail?
Is there are seasonal component in these times series?

Answer

X11, SEATS do not work as the observation is by year.they can work only with monthly or quarterly data.STL will work and we can choose the tend window.

Provide your code here.

Hint: Plot the time series, set up the model, etc.

hh_budget %>% 
  autoplot (Wealth)

#X11 Decomposition
#hh_budget %>%
#model(x11 = X_13ARIMA_SEATS(Wealth ~ ×11())) %>%
#components() %>% 
 # autoplot() +
#labs (title ="X-11 Decomposition")

# SEATS Decomposition
#hh_budget %>%
#model (x11 = X_13ARIMA_SEATS (Wealth ~ seats ())) %>%
#components() %>%
#autoplot() +
 # labs (title="SEATS Decomposition")

#STL Decompostion
p1 <- hh_budget %>% filter(Country == "Australia") %>%
    model(STL(Wealth ~ trend() + season(window = "periodic"),robust = TRUE)) %>% components() %>%
    autoplot() + ggtitle("Australia")
p2 <- hh_budget %>% filter(Country == "Canada") %>%
    model(STL(Wealth ~ trend() + season(window = "periodic"), robust = TRUE)) %>% components() %>%
    autoplot() + ggtitle("Canada")
p3 <- hh_budget %>% filter(Country == "Japan") %>%
    model(STL(Wealth ~ trend() + season(window = "periodic"), robust = TRUE)) %>% components() %>%
    autoplot() + ggtitle("Japan")
p4 <- hh_budget %>% filter(Country == "USA") %>%
    model(STL(Wealth ~ trend() + season(window = "periodic"), robust = TRUE)) %>% components() %>%
    autoplot() + ggtitle("USA")
p1+p2

p3+p4

There is no seasonal component or clear pattern. It fluctuates.In all the countries increasing trend is observed.

Question 3

Try different forecasting methods to forecast 12 steps into the future the Turnover in the Liquor Industry in New South Wales, Australia using the aus_retail dataset.

Try the following: MEAN, RW, TSLM(Turnover ~ trend(), TSLM(Turnover ~ trend() + season(), NAIVE, SNAIVE

Try all of the methods and determine a best method by visual inspection of forecasts for one year.
Now split the data into training and testing subsets of the data. Use the data until Jan 2019 as the training data. Using the method you have selected measure its error for forecasting the testing data, which is 2020 data.

Provide your code here.

aus_retail_sw <- aus_retail %>% 
  filter(State == "New South Wales" & str_detect(Industry, "^L"))
aus_retail_sw %>% 
  autoplot(Turnover)

retail_fit <- aus_retail_sw %>%
    model(
        Mean = MEAN(Turnover),
        Naive = NAIVE(Turnover),
        SNaive = SNAIVE(Turnover),
        Drift = RW(Turnover ~ drift()),
        TSLM = TSLM(Turnover ~ trend()),
        TSLM_S = TSLM(Turnover ~ trend() + season())
    )
retail_fit %>% forecast(h = 12) %>% autoplot(aus_retail_sw, level = NULL) +
    labs(y = "$Million AUD",
        title = "Turnover in the Liquor Industry in New South Walves, Australia") +
guides(colour = guide_legend(title = "Forecast"))

retail_fit <- aus_retail_sw %>%
    model(
        SNaive = SNAIVE(Turnover),
        TSLM_S = TSLM(Turnover ~ trend() + season())
    )
retail_fit %>% forecast(h = 12) %>% autoplot(aus_retail_sw, level = NULL) +
    labs(y = "$Million AUD",
        title = "TSLM_S and SNAIVE") +
    guides(colour = guide_legend(title = "Forecast"))

Seasonal Naive and TSLM with seasonal forecasts appear to be accurate in this situation based on forecast graphs. But, if forced to pick just one, seasonal Ivaive is evidently the greatest approach because of how closely its trajectory resembles earlier seasons.

Splitting the data

train <- aus_retail_sw %>%
    filter(year(Month) <= 2017)
test <- aus_retail_sw %>%
    filter(year(Month) <= 2017)
#check if data have been split appropriately
autoplot(aus_retail_sw, Turnover) +
autolayer(train, Turnover, colour = "blue") +
    ggtitle("check if data have been split appropriately")

#Seasonal Naive

liquor_fit <- train %>%
    model(SNaive = SNAIVE(Turnover))
liquor_fit %>% gg_tsresiduals()

Warning: Removed 12 rows containing missing values (`geom_line()`).

Warning: Removed 12 rows containing missing values (`geom_point()`).

Warning: Removed 12 rows containing non-finite values (`stat_bin()`).

The residuals plot in the ACH indicates a normal distribution and there is autocorrelation between the lattice parameters. SNAIVE is a good decision right now.

#forecast
fc1 <- liquor_fit %>%
    forecast(h = 12)
#fc1 %>%
#   autoplot(bind_rows(train, test),
#       level = NULL) +
#   guides(colour = guide_legend(title = "Forecast")) +
#   ggtitle("forecasting 2018 vs actual 2018 with SNAIVE")

# Validation

fc2 <- train %>% model(TSLM_S = TSLM(Turnover ~ trend() + season())) %>% forecast(h= 12)
fc3 <- train %>% model(Drift = RW(Turnover ~ drift())) %>% forecast(h = 12)
fc4 <- train %>% model(TSLM = TSLM(Turnover ~ trend())) %>% forecast(h = 12)

ac1 <- accuracy(fc1, aus_retail_sw)
ac2 <- accuracy(fc2, aus_retail_sw)
ac3 <- accuracy(fc3, aus_retail_sw)
ac4 <- accuracy(fc4, aus_retail_sw)
rbind(ac1,ac2,ac3,ac4)

# A tibble: 4 × 12
  .model State Indus…¹ .type      ME  RMSE   MAE    MPE  MAPE  MASE RMSSE   ACF1
  <chr>  <chr> <chr>   <chr>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
1 SNaive New … Liquor… Test     9.82  17.8  14.6   3.57  4.73  1.55  1.40 0.414 
2 TSLM_S New … Liquor… Test    22.2   34.1  22.2   6.36  6.36  2.36  2.67 0.0212
3 Drift  New … Liquor… Test  -188.   194.  188.  -63.8  63.8  20.0  15.2  0.0923
4 TSLM   New … Liquor… Test    21.6   53.4  25.7   5.29  6.76  2.73  4.18 0.101 
# … with abbreviated variable name ¹Industry

We can corroborate the prior visually drawn conclusion that SNAIVE is the best forecasting method in this instance by looking at the validation table, which shows that Seasonal NAIVE has the best performance in all accuracy measurements (RIMSE, MAL, MAPE, etc.).