Time series analysis covers a set of research problems in which observations are collected at regular time intervals and successive observations can be assumed to be correlated. The principal idea is to learn from past observations any inherent structure or pattern in the data, with the objective of forecasting future values of the series. A time series may contain multiple seasonal cycles of different lengths. A fundamental goal for multiple seasonal (MS) processes is to allow the seasonal terms that represent a seasonal cycle to be updated more than once during the period of the cycle. This article presents the performance of STL (Seasonal and Trend decomposition using Loess) with multiple seasonal periods and compares it with TBATS (Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend and Seasonal components).
In this case, we’re going to forecast total Internet traffic recorded at 5-minute intervals, using the seasonal patterns observed at the same times in earlier periods. The data come from the "Internet Traffic Data from 11 European Cities" dataset. You can read the data into R with the dmlist() function, which fetches data from DataMarket.com.
#load the package
library(rdatamarket)
library(lubridate)
library(dplyr)
library(forecast)
library(ggplot2)
traffic<- dmlist("http://bit.ly/1W1mCQ3")
After you have read the time series data into R, the next step is to store the data in a time series object, so that you can use R’s many functions for analysing time series data. To do this, we can use the ts() function in R. Time series data are often collected at regular intervals shorter than one year, for example monthly or quarterly. In that case, the "frequency" parameter of ts() specifies the number of observations per seasonal period (per year in the monthly example, so frequency = 12). Because each row of our data represents a 5-minute interval, there are 12 observations per hour and 12*24 = 288 per day, so we set frequency = 12*24 to describe the daily cycle.
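# Keep the timestamp and value columns, and parse the timestamps
traffic <- traffic %>%
  select(datetime = DateTime, traffic = Value) %>%
  mutate(datetime = ymd_hms(datetime))
# One seasonal period = one day = 12*24 five-minute observations
ts_train <- traffic$traffic %>% ts(frequency = 12*24)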
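As a quick check of what the frequency setting means, the sketch below builds a ts object from synthetic 5-minute data and inspects how R indexes it. The values and the names `toy_traffic` and `toy_ts` are made up for illustration and are not the DataMarket series. With frequency = 12*24, one unit of time corresponds to one day.

```r
# Synthetic stand-in for the traffic series: 3 days of 5-minute observations.
# (Illustrative values only, not the DataMarket data.)
obs_per_day <- 12 * 24            # 12 five-minute slots per hour, 24 hours
n_obs <- 3 * obs_per_day

set.seed(1)
toy_traffic <- 100 + 10 * sin(2 * pi * seq_len(n_obs) / obs_per_day) + rnorm(n_obs)

toy_ts <- ts(toy_traffic, frequency = obs_per_day)

frequency(toy_ts)       # observations per seasonal period (one day)
range(time(toy_ts))     # time axis measured in days
head(cycle(toy_ts))     # position of each observation within its day
```

The cycle() values restart at 1 at the beginning of every day, which is what lets functions such as decompose() average "the same time slot" across days.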
To estimate the trend and seasonal components of a seasonal time series, we can use the decompose() function in R. This function estimates the trend, seasonal, and irregular components of a time series. Now let’s inspect the seasonal component and notice that its values repeat identically every day.
plot(decompose(ts_train))
Decomposing a time series means separating it into its constituent components.
The first panel from top is the original, observed time series. Note that a series with multiplicative effects can often be transformed into one with additive effect through a log transformation.
The second panel plots the trend component. The estimated trend itself shows a recurring pattern, which may come from additional seasonality with a longer natural period that the daily frequency does not capture (in this case, a weekly cycle), so the series can be treated as multi-seasonal data. To handle this complex seasonality, we need to convert the data into an msts() object, which accepts multiple frequency settings.
The third panel plots the seasonal component, with the figure being computed by taking the average for each time unit over all periods and then centering it around the mean.
The bottom-most panel shows the error component, which is obtained by removing the trend and the seasonal figure from the original series.
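The panel descriptions above can be reproduced by hand. The following is a minimal sketch of a classical additive decomposition on synthetic data, assuming an even period and a series that starts at the beginning of a cycle; the toy period `f = 12` and all variable names are illustrative choices, not part of the traffic analysis.

```r
set.seed(42)
f <- 12                                    # observations per seasonal period (toy value)
n <- 6 * f                                 # six full periods
y <- ts(0.5 * seq_len(n) + 5 * sin(2 * pi * seq_len(n) / f) + rnorm(n),
        frequency = f)

# 1. Trend: centred moving average spanning one full period.
ma_weights <- c(0.5, rep(1, f - 1), 0.5) / f
trend <- stats::filter(y, ma_weights, sides = 2)

# 2. Seasonal figure: average the detrended values for each position in the
#    period, then centre the figure so that it averages to zero.
detrended <- as.numeric(y - trend)
position  <- as.integer(cycle(y))
figure <- as.numeric(tapply(detrended, position, mean, na.rm = TRUE))
figure <- figure - mean(figure)

# 3. Remainder: what is left after removing the trend and seasonal component.
seasonal  <- figure[position]
remainder <- as.numeric(y) - as.numeric(trend) - seasonal

# The hand-computed figure should agree with the one from decompose().
matches <- all.equal(figure, as.numeric(decompose(y)$figure))
matches
```

Because the seasonal figure is a single averaged cycle repeated over the whole series, the seasonal panel looks identical from one period to the next.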
To deal with such series, we will use the msts class which handles multiple seasonality time series. This allows you to specify all of the frequencies that might be relevant. It is also flexible enough to handle non-integer frequencies.
msts_traffic<-traffic$traffic %>% msts( seasonal.periods = c(12*24,12*24*7))
msts_traffic %>% head( 12 * 24 *7 *4 ) %>% mstl() %>% autoplot()
Now we can see a clearer trend and can confirm the daily and weekly seasonality in the data. Before modelling, we split the data into a training set and a test set.
# Hold out the last 3 days as the test set and train on everything before them
msts_test <- msts_traffic %>% tail(12*24*3)
msts_train <- msts_traffic %>% head(length(msts_traffic) - 12*24*3)
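One way to hold out the final three days so that the test observations are never seen during fitting can be sketched with base R's window() on synthetic data. The names `toy_ts`, `train`, `test` and `h` are illustrative, and the random values merely stand in for the traffic series.

```r
obs_per_day <- 12 * 24
set.seed(7)
toy_ts <- ts(rnorm(10 * obs_per_day), frequency = obs_per_day)   # 10 toy days

h  <- 3 * obs_per_day          # forecast horizon: 3 days
n  <- length(toy_ts)
tt <- time(toy_ts)             # time stamp of every observation

# Training set: everything up to the last h observations.
train <- window(toy_ts, end = tt[n - h])
# Test set: the final h observations only.
test  <- window(toy_ts, start = tt[n - h + 1])

length(train)                  # n - h observations
length(test)                   # h observations
```

Using window() keeps the time series attributes, so the test set continues on the same time axis where the training set ends.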
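We will forecast 3 days ahead and plot the result. To model the series on the log scale, we can set lambda = 0 in stlm(), which applies a Box-Cox transformation with lambda = 0, that is, a log transformation. Setting biasadj = TRUE adjusts the back-transformed forecasts so that they are means rather than medians.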
stlm_model <- msts_train %>%
stlm(lambda = 0, biasadj = TRUE) %>%
forecast(h = 12*24*3)
plot(stlm_model)
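To see why lambda = 0 corresponds to a log transformation, here is a minimal sketch of the Box-Cox transform and its inverse. The helpers `box_cox()` and `inv_box_cox()` are hypothetical names written for this illustration; the forecast package ships its own BoxCox() and InvBoxCox() functions, which are what stlm() relies on internally.

```r
# Box-Cox transform; lambda = 0 is defined as the natural log.
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Inverse transform, used to bring forecasts back to the original scale.
inv_box_cox <- function(z, lambda) {
  if (lambda == 0) exp(z) else (lambda * z + 1)^(1 / lambda)
}

y <- c(50, 200, 1000, 5000)      # toy positive values

box_cox(y, 0)                    # same as log(y)
box_cox(y, 1e-6) - log(y)        # differences shrink toward 0 as lambda -> 0
inv_box_cox(box_cox(y, 0), 0)    # round trip recovers y
```

A log scale turns multiplicative seasonal swings into additive ones, which is the form STL-based methods expect.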
A TBATS model differs from dynamic harmonic regression in that the seasonality is allowed to change slowly over time, whereas harmonic regression terms force the seasonal patterns to repeat periodically without changing. One drawback of TBATS models, however, is that they can be slow to estimate, especially with long time series.
tbats_mod <- msts_train %>%
log() %>%
tbats(use.box.cox = FALSE)
tbats_model <- forecast(tbats_mod,h=12*24*3)
plot(tbats_model)
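The "trigonometric" part of TBATS refers to representing each seasonal pattern with sine and cosine pairs. The sketch below illustrates only that building block, using a plain harmonic regression with fixed coefficients on synthetic data; it is not TBATS itself, which lets the coefficients evolve over time. The period `m`, the number of harmonics `K` and the toy series are all assumptions made for the example.

```r
set.seed(123)
m <- 288                        # daily period for 5-minute data
t_idx <- seq_len(7 * m)         # one week of time indices
K <- 3                          # number of harmonics (arbitrary choice here)

# Build K sine/cosine pairs for period m.
fourier_terms <- do.call(cbind, lapply(seq_len(K), function(k) {
  cbind(sin(2 * pi * k * t_idx / m), cos(2 * pi * k * t_idx / m))
}))
colnames(fourier_terms) <- paste0(rep(c("S", "C"), K), rep(seq_len(K), each = 2))

# Toy series: a smooth daily shape plus noise.
y <- 10 + 3 * sin(2 * pi * t_idx / m) + 1.5 * cos(4 * pi * t_idx / m) +
  rnorm(length(t_idx), sd = 0.5)

# Fixed-coefficient harmonic regression.
fit <- lm(y ~ fourier_terms)
round(coef(fit), 2)
```

Because the period only enters through 2 * pi * k * t / m, nothing requires m to be an integer, which is why trigonometric seasonality copes with non-integer periods.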
Let’s check the accuracy of each model, using MAPE (Mean Absolute Percentage Error) as the main measure for evaluating our forecasts.
result<-rbind(accuracy(as.vector(stlm_model$mean) , msts_test),
accuracy(as.vector(exp(tbats_model$mean)) , msts_test))
rownames(result) <- c("stlm_model","tbats_model")
result
##                     ME       RMSE        MAE        MPE     MAPE      ACF1 Theil's U
## stlm_model   291112161  926961514  715431672   1.063555 17.30040 0.9857483  4.532804
## tbats_model -866315652 2087206595 1674816359 -51.447975 64.29527 0.9960421 21.357871
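For reference, MAPE and RMSE can be computed by hand as in the sketch below. The helper names `mape()` and `rmse()` and the three toy values are illustrative assumptions; the table above comes from accuracy() in the forecast package, not from these helpers.

```r
# Mean Absolute Percentage Error: average absolute error relative to the actual value.
mape <- function(actual, predicted) {
  mean(abs((actual - predicted) / actual)) * 100
}

# Root Mean Squared Error: penalises large errors more heavily than MAE.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

actual    <- c(100, 200, 400)    # toy observations
predicted <- c(110, 180, 400)    # toy forecasts

mape(actual, predicted)
rmse(actual, predicted)
```

MAPE is scale-free, which makes it convenient for comparing models on a series with values as large as this one, but it is undefined when an actual value is zero.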
The two models perform differently, with the STLM model showing clear advantages over the TBATS model: its MAPE is about 17.3%, against about 64.3% for TBATS. We can then compare the forecasts in a plot.
# Collect the actual values and both forecasts for the 3-day test period
accuracyData <- data.frame(
  datetime = traffic$datetime %>% tail(12*24*3),
  actual = as.vector(msts_test),
  stlmForecast = as.vector(stlm_model$mean),
  tbatsForecast = as.vector(exp(tbats_model$mean))
)
# Plot the actual series against each forecast
accuracyData %>%
  ggplot(aes(x = datetime)) +
  geom_line(aes(y = actual, colour = "actual")) +
  geom_line(aes(y = stlmForecast, colour = "stlm")) +
  geom_line(aes(y = tbatsForecast, colour = "tbats")) +
  labs(
    title = "Forecast from STLM and TBATS model",
    y = "Number of Traffic",
    x = "Date",
    colour = ""
  )
The aim of this post was to present forecasting for multi-seasonal data. This series has two types of seasonality, daily and weekly. STLM and TBATS models are both designed for series with multiple seasonal patterns. The forecast from stlm() shows better performance here because it follows the two seasonal patterns very well.
De Livera, A. M., Hyndman, R. J., & Snyder, R. D. (2011). Forecasting time series with complex seasonal patterns using exponential smoothing. Journal of the American Statistical Association, 106(496), 1513–1527. https://robjhyndman.com/publications/complex-seasonality/
Hyndman, R. J. (2019). Time series data library. Retrieved from https://datamarket.com/data/list/?q=provider:tsdl