1 Background

In this markdown, we are going to perform time series analysis and forecasting using dataset Weekly U.S. Product Supplied of Finished Motor Gasoline by The US Energy Information Administration. The source of the dataset is from Rob J Hyndman.

2 Load Packages

First thing first, let’s load necessary library for our analysis.

library(tidyverse)
library(forecast)
library(lubridate)
library(plotly)
library(TSstudio)
library(tseries)

3 Read data

We read the gas data which contains the data of gasoline in unit of thousand barrels from February 1991 until July 2005 with weekly interval. Therefore we use start = 1991 + 31/365.25 to indicate that our data is starting from February 1991 with weekly interval, defined by frequency = 365.25/7 or approximately 52.18 weeks.

gas <- read.csv("https://robjhyndman.com/data/gasoline.csv")
gas_ts <- ts(gas, start = 1991 + 31/365.25, frequency = 365.25/7)
ts_info(gas_ts)
##  The gas_ts series is a ts object with 1 variable and 744 observations
##  Frequency: 52.1785714285714 
##  Start time: 1991.0848733744 
##  End time: 2005.32443531828

4 Explore the data

As a first step, It’s a good practice by creating a line plot of our time series data.

ts_plot <- 
  gas_ts %>%
  autoplot() +
  theme_minimal()

ggplotly(ts_plot) %>% 
  layout(title = "Weekly US Product Supplied of Finished Motor Gasoline",
         xaxis = list(title = "Year"),
         yaxis = list(title = "Thousand Barrels"))

As we can see from the plot:

  1. The line has an upward trend, meaning on average the gasoline supply from year to year is increasing.
  2. This is an additive time series, since the peaks and troughs of seasonality are roughly the same for each year.

5 Decompose

From the previous plot, we know the time series is additive. Hence, we will decompose the gasoline data using classical decomposition which include three parts according to the formula \(Gasoline = Trend + Seasonality + Error\):

  1. Trend: the patterns of data in general, is it upwards or downwards.

  2. Seasonality: repeating pattern with a fixed period of time.

  3. Error: remaining components that are not captured by the Trend and Seasonality.

5.1 Single Seasonality

Let’s plot the decomposition of additive time series with yearly seasonality as follows:

decompose(gas_ts) %>%
  autoplot() +
  theme_minimal()

From the decompose plot above:

  1. The trend is an upward trend

  2. There are 14 oscillating lines in the seasonal plot, indicating monthly seasonality from February 1991 to July 2005.

  3. There are still hidden seasonality in the remainder plot. In order to capture these, we have to use multiple seasonality by adding the seasonal.periods parameter.

5.2 Multiple Seasonality

Here we use the seasonal.periods of 4 and 365.25/7 which indicating monthly and yearly seasonality respectively.

gas %>% 
  msts(seasonal.periods = c(4, 365.25/7)) %>% 
  mstl() %>% 
  autoplot() +
  theme_minimal()

6 Train-test Split

Before we jump into the modeling step, we split our data into gas_train and gas_test. The test data will be the last one year of our original data which is July 2003 - July 2005

gas_train <- head(gas_ts, length(gas_ts) - 2*52)
gas_test <- tail(gas_ts, 2*52)

7 Modeling

7.1 STLM

Seasonal Trend with Loess Model time series is de-seasonalised. It decompose a time series into seasonal, trend, and irregular components using loess. Then it forecast after decomposition and re-seasonalizing using the last year of the seasonal component.

gas_stlm <- 
  gas_train %>% 
  log() %>% 
  stlm(lambda = 0) %>% 
  forecast(h = 2*52) 

plot(gas_stlm)

Let’s plot the train, test, and forecasted data in one plot.

gas_ts %>%
  autoplot(series = "Train data") +
  autolayer(gas_test, series = "Test data") +
  autolayer(exp(gas_stlm$mean), series = "Forecast") +
  scale_color_manual(values = c("firebrick", "blue", "black")) +
  labs(x = "Year",
       y = "Thousand Barrels",
       title = "Forecasted Data using STLM")

7.2 TBATS

The TBATS model is a time series model for series exhibiting multiple complex seasonalities:

  • T for trigonometric regressors to model multiple-seasonalities
  • B for Box-Cox transformations
  • A for ARMA errors
  • T for trend
  • S for seasonality

Advantage of using TBATS: the seasonality is allowed to change over time, but the drawback is slower computation.

gas_tbats <-
  gas_train %>% 
  log() %>% 
  tbats(
    use.trend = TRUE,
    use.arma.errors = TRUE,
    biasadj = TRUE,
    seasonal.periods = c(4, 365.25/7)) %>% 
  forecast(h = 2*52)

plot(gas_tbats)

Let’s plot the train, test, and forecasted data in one plot.

gas_ts %>%
  autoplot(series = "Train data") +
  autolayer(gas_test, series = "Test data") +
  autolayer(exp(gas_tbats$mean), series = "Forecast") +
  scale_color_manual(values = c("firebrick", "blue", "black")) + 
  labs(x = "Year",
       y = "Thousand Barrels",
       title = "Forecasted Data using TBATS")

8 Model Evaluation

Let’s evaluate both of the model using error value, the difference between test data and forecasted data.

eval <-
  rbind(
    accuracy(as.vector(exp(gas_stlm$mean)), gas_test),
    accuracy(as.vector(exp(gas_tbats$mean)), gas_test)
    )

rownames(eval) <- c("STLM", "TBATS")
eval
##              ME     RMSE      MAE       MPE     MAPE      ACF1 Theil's U
## STLM   59.11323 224.4460 174.3122 0.6181228 1.917658 0.1107267 0.8518699
## TBATS 258.32969 348.6979 308.6590 2.8181408 3.394018 0.3805819 1.3348008

9 Conclusion

We tried to forecast weekly supply of gasoline by using STLM and TBATS with monthly and yearly seasonality. In the end, the STLM is lower in MAPE than TBATS, but both of the method are still not capture the seasonality well. The model can be further improved by tuning various parameters in the stlm() and tbats() function, or even using other forecasting method such as ARIMA or SARIMA.