This is a learn by building project to perform data forecasting and analysis of dataset of U.S. finished motor gasoline products supplied from The U.S. Energy Information Administration (EIA). Data set consists of weekly data from February 1991 to July 2005, analyzed by using STLM and TBATS forecasting method.
The U.S. Energy Information Administration (EIA) collects, analyzes, and disseminates independent and impartial energy information to promote sound policymaking, efficient markets, and public understanding of energy and its interaction with the economy and the environment.
The analysis will use dataset from https://robjhyndman.com/publications/complex-seasonality/.
## Warning: package 'forecast' was built under R version 3.6.3
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Warning: package 'tidyverse' was built under R version 3.6.2
## -- Attaching packages ------------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.2.1 v purrr 0.3.3
## v tibble 2.1.3 v dplyr 0.8.5
## v tidyr 1.0.0 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## Warning: package 'ggplot2' was built under R version 3.6.2
## Warning: package 'tibble' was built under R version 3.6.2
## Warning: package 'tidyr' was built under R version 3.6.2
## Warning: package 'readr' was built under R version 3.6.2
## Warning: package 'purrr' was built under R version 3.6.2
## Warning: package 'dplyr' was built under R version 3.6.3
## Warning: package 'stringr' was built under R version 3.6.2
## Warning: package 'forcats' was built under R version 3.6.2
## -- Conflicts ---------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## Warning: package 'lubridate' was built under R version 3.6.2
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
## Warning: package 'scales' was built under R version 3.6.2
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
First, we will read a weekly dataset from February 1991 to July 2005.
We will store dataset in a time series object by using the ts()
function.
Sometimes the time series dataset that we have may have been collected at regular intervals that were less than one year, for example, weekly, monthly or quarterly.
In this case, we will specify the number of times that data was collected per weekly basis by using the frequency parameter in the ts()
function.
Due to each rows representing a data within weekly interval, we can set the frequency = 365.25/7 or 52.18 weeks on average.
Below is a weekly autoplot of dataset from February 1991 to July 2005.
We can estimate the trend component and seasonal component of a seasonal time series by using the decompose()
function.
This function estimates the trend, seasonal, and irregular components of a time series.
We can also use msts()
function for multi seasonal time series objects, which is intended to be used for models that support multiple seasonal periods.
msts_cons <- gas%>%
msts(seasonal.periods = c(7, 7*4))
msts_cons%>%
head(7*52) %>%
mstl()%>%
autoplot()
We can see a clearer trend and could confirm the weekly and monthly seasonality for the data. Before we’re going to build model for our data, we’re going to split our data as data train and data test.
We will make prediction for 4 weeks and plot them. In modelling the log transformation, we can use lambda = 0 in stlm()
setting.
A TBATS model differs from dynamic harmonic regression in such a way that the seasonality is allowed to change slowly over time. Meanwhile, harmonic regression terms force the seasonal patterns to repeat periodically without changing.
One drawback of TBATS models is that they can be slow to estimate, especially with long time series. One advantage of TBATS model is that the seasonality is allowed to change slowly over time.
tbats_mod <- msts_train %>%
log() %>%
tbats(use.box.cox = FALSE,
use.trend = TRUE,
use.damped.trend = TRUE)
tbats_model <- forecast(tbats_mod, h = 7*4)
plot(tbats_model)
We will check the accuracy for each model and define MAPE (Mean Absolute Percentage Error) for evaluation of our forecast.
result<-rbind(accuracy(as.vector(stlm_model$mean) , msts_test),
accuracy(as.vector(exp(tbats_model$mean)) , msts_test))
rownames(result) <- c("stlm_model","tbats_model")
result
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## stlm_model 101.3071 193.3721 134.6652 1.109806 1.476876 0.198615050 1.118724
## tbats_model 268.3773 292.3979 268.3773 2.958026 2.958026 0.004214353 1.750994
The time series dataset has two types of seasonality: weekly and monthly.
The STLM model reveals lower MAPE (1.476876) than the TBATS model (2.958026), meaning that the STLM model shows a better perfomance.
De Livera, A. M., Hyndman, R. J., & Snyder, R. D. (2011). Forecasting time series with complex seasonal patterns using exponential smoothing. J American Statistical Association, 106(496), 15131527. https://robjhyndman.com/publications/complex-seasonality/