1 Background

This is a learn by building project to perform data forecasting and analysis of dataset of U.S. finished motor gasoline products supplied from The U.S. Energy Information Administration (EIA). Data set consists of weekly data from February 1991 to July 2005, analyzed by using STLM and TBATS forecasting method.

The U.S. Energy Information Administration (EIA) collects, analyzes, and disseminates independent and impartial energy information to promote sound policymaking, efficient markets, and public understanding of energy and its interaction with the economy and the environment.

https://www.eia.gov/about/

2 Source of Dataset

The analysis will use dataset from https://robjhyndman.com/publications/complex-seasonality/.

3 Initialization Library

## Warning: package 'forecast' was built under R version 3.6.3
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## Warning: package 'tidyverse' was built under R version 3.6.2
## -- Attaching packages ------------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.2.1     v purrr   0.3.3
## v tibble  2.1.3     v dplyr   0.8.5
## v tidyr   1.0.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## Warning: package 'ggplot2' was built under R version 3.6.2
## Warning: package 'tibble' was built under R version 3.6.2
## Warning: package 'tidyr' was built under R version 3.6.2
## Warning: package 'readr' was built under R version 3.6.2
## Warning: package 'purrr' was built under R version 3.6.2
## Warning: package 'dplyr' was built under R version 3.6.3
## Warning: package 'stringr' was built under R version 3.6.2
## Warning: package 'forcats' was built under R version 3.6.2
## -- Conflicts ---------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## Warning: package 'lubridate' was built under R version 3.6.2
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
## Warning: package 'scales' was built under R version 3.6.2
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor

4 Import Data

First, we will read a weekly dataset from February 1991 to July 2005.

5 Create Object Time-Series

We will store dataset in a time series object by using the ts() function.

Sometimes the time series dataset that we have may have been collected at regular intervals that were less than one year, for example, weekly, monthly or quarterly.

In this case, we will specify the number of times that data was collected per weekly basis by using the frequency parameter in the ts() function.

Due to each rows representing a data within weekly interval, we can set the frequency = 365.25/7 or 52.18 weeks on average.

6 Plot Object Time-Series

Below is a weekly autoplot of dataset from February 1991 to July 2005.

7 Decompose Object Time-Series

We can estimate the trend component and seasonal component of a seasonal time series by using the decompose() function.

This function estimates the trend, seasonal, and irregular components of a time series.

We can also use msts() function for multi seasonal time series objects, which is intended to be used for models that support multiple seasonal periods.

8 Cross Validation

We can see a clearer trend and could confirm the weekly and monthly seasonality for the data. Before we’re going to build model for our data, we’re going to split our data as data train and data test.

9 Building Model

9.1 STLM Model

We will make prediction for 4 weeks and plot them. In modelling the log transformation, we can use lambda = 0 in stlm() setting.

9.2 TBATS Models

A TBATS model differs from dynamic harmonic regression in such a way that the seasonality is allowed to change slowly over time. Meanwhile, harmonic regression terms force the seasonal patterns to repeat periodically without changing.

One drawback of TBATS models is that they can be slow to estimate, especially with long time series. One advantage of TBATS model is that the seasonality is allowed to change slowly over time.

10 Model Evaluation

We will check the accuracy for each model and define MAPE (Mean Absolute Percentage Error) for evaluation of our forecast.

##                   ME     RMSE      MAE      MPE     MAPE        ACF1 Theil's U
## stlm_model  101.3071 193.3721 134.6652 1.109806 1.476876 0.198615050  1.118724
## tbats_model 268.3773 292.3979 268.3773 2.958026 2.958026 0.004214353  1.750994

11 Summary

The time series dataset has two types of seasonality: weekly and monthly.

The STLM model reveals lower MAPE (1.476876) than the TBATS model (2.958026), meaning that the STLM model shows a better perfomance.

12 References

De Livera, A. M., Hyndman, R. J., & Snyder, R. D. (2011). Forecasting time series with complex seasonal patterns using exponential smoothing. J American Statistical Association, 106(496), 15131527. https://robjhyndman.com/publications/complex-seasonality/