Time Series Decomposition

1 Introduction

In describing time series, we identify three parts:

Trend

A trend is a long-term increase or decrease in the data. It does not have to be linear. We refer to a trend as changing direction when it changes from a general increase to a decrease.

Season

A seasonal pattern occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week.

Seasonality is always of a fixed and known frequency.

Cycle

A cycle occurs when the data shows rises and falls that are not of a fixed frequency.

These fluctuations are usually due to economic conditions, and are often related to a business cycle. The duration of these fluctuations is usually at least 2 years.

Many people confuse cyclic behaviour with seasonal behaviour, but they are really quite different.

  • If the fluctuations are not of a fixed frequency then they are cyclic.
  • If the frequency is unchanging and associated with some aspect of the calendar, then the pattern is seasonal.
  • In general, the average length of cycles is longer than the length of a seasonal pattern, and the magnitudes of cycles tend to be more variable than the magnitudes of seasonal patterns.

Many time series include trend, cycles and seasonality.

When choosing a forecasting method, we will first need to identify the time series patterns in the data, and then choose a method that is able to capture the patterns properly.

In this reader, we consider common methods for extracting:

  • trend,
  • cycles, and
  • seasonality

from a time series.

2 Additive and Multiplicative Time Series

If we assume an additive decomposition, then we can write

y_t = S_t + T_t + R_t

where:

  • y_t is the data
  • S_t is the seasonal component
  • T_t is the trend-cycle component, and
  • R_t is the remainder component

all at period t.

Alternatively, a multiplicative decomposition would be written as

y_t = S_t * T_t * R_t

The additive decomposition is the most appropriate if the magnitude of the seasonal fluctuations, or the variation around the trend-cycle, does not vary with the level of the time series.

When the variation in the seasonal pattern, or the variation around the trend-cycle, appears to be proportional to the level of the time series, then a multiplicative decomposition is more appropriate.

Multiplicative decompositions are common with economic time series.
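
As an illustration of the multiplicative case (using the built-in AirPassengers series rather than our own data), the seasonal swings in the plot below clearly grow with the level of the series:

library(fpp2)   # loads the forecast and ggplot2 packages used throughout this reader
# Built-in monthly series: the seasonal variation grows with the level,
# the typical signature of a multiplicative pattern
autoplot(AirPassengers) + xlab("Year") + ylab("Passengers") +
  ggtitle("AirPassengers: Seasonal Variation Grows with the Level")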

Advanced

An alternative to using a multiplicative decomposition is to first transform the data until the variation in the series appears to be stable over time, then use an additive decomposition.

When a log transformation has been used, this is equivalent to using a multiplicative decomposition because:

y_t = S_t * T_t * R_t

is equivalent to

log y_t = log S_t + log T_t + log R_t

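As a quick check (again using the built-in AirPassengers series shown above, not our own data), an additive decomposition of the log-transformed series plays the same role as a multiplicative decomposition of the original series:

# Additive decomposition on the log scale ~ multiplicative decomposition of the raw series
autoplot(decompose(log(AirPassengers), type="additive")) +
  ggtitle("Additive Decomposition of log(AirPassengers)")
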
End Advanced

3 Estimating Trend & Cycle: Moving Averages

The first step in a classical decomposition is to use a moving average method to estimate the trend-cycle.

We read data from testdata_short_12.dta. The data set contains quarterly data. The series additive contains data generated from an additive model. The series multiplicative follows a multiplicative pattern.

We start with plotting the additive series.

library(fpp2)     # forecasting and plotting tools (ma, autoplot, decompose plots, ...)
library(foreign)  # read.dta() for reading Stata (.dta) files
testdata <- read.dta("testdata_short_12.dta")
head(testdata)
# Quarterly time series, starting in 2010 Q1
as <- ts(testdata$additive, start=2010, frequency=4)
autoplot(as) + xlab("Year") + ylab("A-Series") + ggtitle("Time Series")

To filter out seasonal and random fluctuations, and thereby estimate the combined trend-cycle, we can use a moving average of an appropriate order.

The data show a (linear) trend, but no clear cycle.

Since we don’t have a clear cycle, we can stick to de-seasonalizing the data.

As an analogy: nowadays the authorities publish the weekly number of new COVID-19 infections. This makes sense, since some days of the week are systematically lower than other days of the week (people don't want to take a test on weekends; reports on some days are delayed; and so on). A weekly report filters out these daily effects.

For quarterly data, and no apparent cycle, we can use a moving average of order 4.

ma(as, 4)
##          Qtr1     Qtr2     Qtr3     Qtr4
## 2010       NA       NA 102.7699 104.2090
## 2011 106.0815 107.6997 108.5828 109.6743
## 2012 110.2265 110.6738 111.9317 112.9012
## 2013 113.7112 115.3706 116.8095 117.1785
## 2014 117.1883 117.1983 117.3639 118.2013
## 2015 119.9565 121.7170 122.5453 123.3809
## 2016 124.8600 125.9250 127.2462 128.4098
## 2017 128.9765 129.8423 130.6656 132.0307
## 2018 133.2946 134.0168 135.0112 135.6217
## 2019 135.6437 136.3058       NA       NA

Note that we don’t have MAs for the first and last two quarters, since we don’t have the data to compute them!

autoplot(as, series="Data") +
  autolayer(ma(as,4), series="4-MA") +
  xlab("Time") + ylab("Series") +
  ggtitle("Series & Moving Average") +
  scale_colour_manual(values=c("Data"="grey50","4-MA"="red"),
                      breaks=c("Data","4-MA"))

In case of business cycles, we can try MAs of larger order. In general, the larger the order, the smoother the curve. Unfortunately, a larger order also means more missing MA values at the ends - which is especially problematic if we have a short series to start with.
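
As a sketch, we can overlay MAs of different orders to see this trade-off between smoothness and lost end points:

# Larger order: smoother curve, but more missing values at both ends
autoplot(as, series="Data") +
  autolayer(ma(as, 4), series="4-MA") +
  autolayer(ma(as, 8), series="8-MA") +
  xlab("Time") + ylab("Series") +
  ggtitle("Moving Averages of Different Orders")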

MAs of odd order (3; 5; 7; …) are automatically centered.

In our example, the 5-MA at t=3 is a simple average:

5-MA = (y_1 + y_2 + y_3 + y_4 + y_5) / 5

But if the order is even, then centering the MA is not straightforward:

4-MA = (y_1 + y_2 + y_3 + y_4) / 4 is the MA of the point in time just in between t=2 and t=3!

A way out is to compute two consecutive 4-MAs, and to take their average as the MA at t=3.

4-MA [t=2,3] = (y_1 + y_2 + y_3 + y_4) / 4

4-MA [t=3,4] = (y_2 + y_3 + y_4 + y_5) / 4

4-MA [t=3] = (4-MA [t=2,3] + 4-MA [t=3,4]) / 2

Then we have an MA centered at t=3. But in the process, the average has become a weighted average: y_1 and y_5 get only half the weight of y_2 to y_4.
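
A minimal check, computing this centered 4-MA at t=3 by hand and comparing it with the ma() output:

# Hand computation of the centered (2x4) MA at t = 3
y <- as.numeric(as)
ma4_a <- mean(y[1:4])     # 4-MA centered between t = 2 and t = 3
ma4_b <- mean(y[2:5])     # 4-MA centered between t = 3 and t = 4
(ma4_a + ma4_b) / 2       # should match ma(as, 4)[3]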

Again, as an analogy, we can compute a 7-MA for each day. The daily MA on, say, Wednesday is just the average of the preceding Sunday, Monday and Tuesday; the Wednesday itself; and the Thursday, Friday and Saturday that follow. In this (simple) average, all days of the week are represented, i.e., we have effectively filtered out the effect of any specific day!

If we have quarterly data, as in our example, the 4-MA for, say, the second quarter is computed from the values for quarters 1, 2 and 3, plus half of the preceding year's quarter 4 and half of this year's quarter 4. We then have a quarterly MA centered in Q2, with the effects of specific quarters filtered out. But it is a weighted average!

By default, the ma() function centers the data, which is good. You can detect that from the following commands. So, we don’t have to worry!

head(ma(as, 4, centre=F),8)
##          Qtr1     Qtr2     Qtr3     Qtr4
## 2010       NA 102.0100 103.5299 104.8881
## 2011 107.2748 108.1245 109.0410 110.3075
head(ma(as, 4, centre=T),8)
##          Qtr1     Qtr2     Qtr3     Qtr4
## 2010       NA       NA 102.7699 104.2090
## 2011 106.0815 107.6997 108.5828 109.6743
head(ma(as, 4),8)
##          Qtr1     Qtr2     Qtr3     Qtr4
## 2010       NA       NA 102.7699 104.2090
## 2011 106.0815 107.6997 108.5828 109.6743

Normally, we filter out seasonal and random fluctuations, in order to estimate the trend-cycle component.

4 Classical Decomposition

The classical decomposition method dates back to the 1920s.

It is a relatively simple procedure, and is the starting point for most other methods of time series decomposition.

There are two forms of classical decomposition:

  • Additive decomposition
  • Multiplicative decomposition.

These are described below for a time series with seasonal period m (e.g., m=4 for quarterly data, m=12 for monthly data, and m=7 for daily data with a weekly pattern).

In classical decomposition, we assume that the seasonal component is constant from year to year.

Additive decomposition

Step 1

If m is an even number, compute the trend-cycle component (TC) using double smoothing, i.e., a centered 2×m moving average.

If m is an odd number, compute the trend-cycle component (TC) using simple smoothing with an m-MA.

Step 2

Calculate the detrended series: y_t - TC_t

Step 3

To estimate the seasonal component for each season, average the detrended values for that season.

For example, with monthly data, the seasonal component for March is the average of all the detrended March values in the data.

These seasonal component values are then adjusted to ensure that they add to zero.

Step 4

The remainder component is calculated by subtracting the estimated seasonal and trend-cycle components from the data: R_t = y_t - TC_t - S_t.

Multiplicative Decomposition

A classical multiplicative decomposition is similar, except that the subtractions are replaced by divisions.
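
For the additive case, the four steps can be sketched by hand for our quarterly series; this is essentially what the decompose() function used below computes for us.

# Step 1: trend-cycle via a centered 4-MA (m = 4 is even, so double smoothing)
tc <- ma(as, 4)
# Step 2: detrended series
det <- as - tc
# Step 3: average the detrended values per quarter, then center them on zero
sidx <- tapply(det, cycle(as), mean, na.rm=TRUE)
sidx <- sidx - mean(sidx)
seas <- ts(rep(sidx, length.out=length(as)), start=start(as), frequency=4)
# Step 4: remainder
rem <- as - tc - seas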

Applied to our example, additive decomposition of the additive series gives:

as_dc <- decompose(as, type="additive")
autoplot(as_dc) + xlab("Time") +
  ggtitle("Additive Decomposition")

The figure shows the data, and its decomposition into a trend, a (fixed) seasonal pattern, and the remainder (or error).

We can also display the decomposition:

as_dc
## $x
##           Qtr1      Qtr2      Qtr3      Qtr4
## 2010  90.57998 122.41821 131.12149  63.92031
## 2011  96.65942 127.85110 140.66855  67.31885
## 2012 100.32567 132.91687 140.02065  71.54516
## 2013 106.16291 134.83517 144.58228  80.25904
## 2014 108.96011 134.99005 144.50587  80.41535
## 2015 110.12840 140.52097 153.01678  85.98853
## 2016 111.18143 146.15276 159.21785  88.30746
## 2017 119.43221 147.21069 162.69342  91.75832
## 2018 122.56803 154.99507 165.02081  95.20829
## 2019 127.07301 155.37416 164.81825 100.70761
## 
## $seasonal
##            Qtr1       Qtr2       Qtr3       Qtr4
## 2010  -9.707668  19.575125  29.777800 -39.645258
## 2011  -9.707668  19.575125  29.777800 -39.645258
## 2012  -9.707668  19.575125  29.777800 -39.645258
## 2013  -9.707668  19.575125  29.777800 -39.645258
## 2014  -9.707668  19.575125  29.777800 -39.645258
## 2015  -9.707668  19.575125  29.777800 -39.645258
## 2016  -9.707668  19.575125  29.777800 -39.645258
## 2017  -9.707668  19.575125  29.777800 -39.645258
## 2018  -9.707668  19.575125  29.777800 -39.645258
## 2019  -9.707668  19.575125  29.777800 -39.645258
## 
## $trend
##          Qtr1     Qtr2     Qtr3     Qtr4
## 2010       NA       NA 102.7699 104.2090
## 2011 106.0815 107.6997 108.5828 109.6743
## 2012 110.2265 110.6738 111.9317 112.9012
## 2013 113.7112 115.3706 116.8095 117.1785
## 2014 117.1883 117.1983 117.3639 118.2013
## 2015 119.9565 121.7170 122.5453 123.3809
## 2016 124.8600 125.9250 127.2462 128.4098
## 2017 128.9765 129.8423 130.6656 132.0307
## 2018 133.2946 134.0168 135.0112 135.6217
## 2019 135.6437 136.3058       NA       NA
## 
## $random
##            Qtr1       Qtr2       Qtr3       Qtr4
## 2010         NA         NA -1.4262370 -0.6434033
## 2011  0.2856225  0.5763175  2.3079874 -2.7101577
## 2012 -0.1931592  2.6679480 -1.6888960 -1.7107680
## 2013  2.1594028 -0.1105659 -2.0050257  2.7257878
## 2014  1.4794616 -1.7833832 -2.6358098  1.8593251
## 2015 -0.1204415 -0.7711828  0.6936857  2.2528817
## 2016 -3.9709054  0.6526224  2.1938287 -0.4570945
## 2017  0.1633758 -2.2067345  2.2499811 -0.6270831
## 2018 -1.0189362  1.4031403  0.2318384 -0.7681353
## 2019  1.1369323 -0.5068090         NA         NA
## 
## $figure
## [1]  -9.707668  19.575125  29.777800 -39.645258
## 
## $type
## [1] "additive"
## 
## attr(,"class")
## [1] "decomposed.ts"

The decomposition tells us that observations in Q1 are, on average, 9.7 below the trend-cycle, while observations in Q3 are 29.8 above it. We can, in reverse, use these figures to obtain seasonally adjusted data: an observation in Q3 has to be lowered by 29.8 to make it comparable to the seasonally adjusted figures of other quarters.
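
A minimal sketch of this seasonal adjustment, using the seasadj() function from the forecast package (loaded with fpp2):

# Seasonally adjusted data: subtract the seasonal component (additive model)
as_sa <- seasadj(as_dc)    # equivalent to as_dc$x - as_dc$seasonal
autoplot(as, series="Data") +
  autolayer(as_sa, series="Seasonally adjusted") +
  xlab("Year") + ylab("A-Series") + ggtitle("Seasonal Adjustment")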

Let’s have a look at the other time series, labeled multiplicative.

ms <- ts(testdata$multiplicative, start=2010,frequency=4) 
autoplot(ms) + ggtitle("M-Series")+ xlab("Year") + ylab("M-Series")

ggseasonplot(ms, year.labels=TRUE, year.labels.left=TRUE) +
  ylab("M-Series") + ggtitle("M-Series Seasonplot")

ggsubseriesplot(ms) + ylab("M-Series") + ggtitle("Subseries")

This series has a less clear structure. At first glance, it seems that, apart from an upward trend, the seasonal variation also grows somewhat over time. This would justify a multiplicative model.

Let’s fit both models.

fita <- decompose(ms, type="additive") 
autoplot(fita) + xlab("Year") + ggtitle("Additive Decomposition")

fitm <- decompose(ms, type="multiplicative") 
autoplot(fitm) + xlab("Year") + ggtitle("Multiplicative Decomposition")

We are already familiar with the contents of fitted additive models.

Let’s display some of the seasonal components of the multiplicative model.

head(fitm$seasonal, 8)
##           Qtr1      Qtr2      Qtr3      Qtr4
## 2010 0.9233097 1.1878845 1.2876078 0.6011980
## 2011 0.9233097 1.1878845 1.2876078 0.6011980

The factors displayed can be used to seasonally adjust the data by division (rather than subtraction)!

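A short sketch of this adjustment under the multiplicative model, again using seasadj():

# Multiplicative model: seasonal adjustment by division
ms_sa <- seasadj(fitm)     # equivalent to fitm$x / fitm$seasonal
autoplot(ms_sa) + xlab("Year") + ylab("M-Series") +
  ggtitle("M-Series, Seasonally Adjusted")
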
It’s not quite clear which model is better.

What we can do is look at the error terms: the differences between the actual observations and what is predicted from the trend-cycle and seasonal components.

We combine the error terms in a data frame (predict), for convenience.

# Additive model: fitted value = trend + seasonal
errorA <- fita$x - fita$trend - fita$seasonal
# Multiplicative model: fitted value = trend * seasonal
# (both error series are in the original units, so they are directly comparable)
errorM <- fitm$x - fitm$trend * fitm$seasonal
predict <- as.data.frame(cbind(fita$x, fitm$x, errorA, errorM))

We can use descriptive statistics from the psych package.

# install.packages("psych")
library(psych)
psych::describe(predict[,3:4])

From the overview we see that:

  • The standard deviation of errorM (the error term in the multiplicative model) is smaller than the one in the additive model.
  • The maximum size of the errors, and the range of errors, is also lower in the multiplicative model.

That would give the multiplicative model an edge over the additive one.

Another check would be a test of normally distributed error terms.

shapiro.test(predict[,3])
## 
##  Shapiro-Wilk normality test
## 
## data:  predict[, 3]
## W = 0.94237, p-value = 0.06024
shapiro.test(predict[,4])
## 
##  Shapiro-Wilk normality test
## 
## data:  predict[, 4]
## W = 0.94794, p-value = 0.09009

In both cases, there is no real reason for concern. The hypothesis that the error terms are normally distributed cannot be rejected, as the p-values of the test statistics exceed 0.05 (5%).

Classical decomposition is widely used, and a powerful tool for revealing seasonal factors - which is useful information by itself. However, when it comes to forecasting, there are better methods.

Some of the problems with classical decomposition are summarized below.

  • The estimate of the trend-cycle is unavailable for the first few and last few observations. For example, if m=12, there is no trend-cycle estimate for the first six or the last six observations.
  • Classical decomposition methods assume that the seasonal component repeats from year to year. For many series, this is a reasonable assumption, but for some longer series it is not.
  • Occasionally, the values of the time series in a small number of periods may be particularly unusual. The classical method is not robust to these kinds of unusual values.

5 STL

More flexible alternatives to classical decomposition are X11, SEATS and STL. All of them address some of the shortcomings of classical decomposition pointed out above.

We will only discuss STL.

STL is a versatile and robust method for decomposing time series. STL is an acronym for Seasonal and Trend decomposition using Loess.

Loess is a method for estimating nonlinear relationships.

STL has several advantages over the classical, SEATS and X11 decomposition methods:

  • Unlike SEATS and X11, STL will handle any type of seasonality, not only monthly and quarterly data.
  • The seasonal component is allowed to change over time, and the rate of change can be controlled by the user.
  • The smoothness of the trend-cycle can also be controlled by the user.
  • It is more robust to outliers.

On the other hand, STL has some disadvantages. In particular, it only provides facilities for additive decompositions - although multiplicative specifications can be obtained by transformations of the data.
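
As a sketch, a multiplicative-style decomposition of our M-series can be obtained by applying STL to the log-transformed data (using the mstl() function introduced just below):

# STL on the log scale: an additive decomposition of log(ms)
# plays the role of a multiplicative decomposition of ms
autoplot(mstl(log(ms))) + ggtitle("STL Decomposition of log(M-Series)")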

Below, we use the mstl() function, which selects sensible default parameters for us.

We then compute the error term, using the remainder() function, and assign it to errorSTL.

In a data frame predict2, we combine the error terms of the additive, multiplicative and STL models, for comparison.

# STL
fitstl <- mstl(ms) 
autoplot(fitstl)

errorSTL <- as.data.frame(remainder(fitstl))
predict2 <- as.data.frame(cbind(fita$x, errorA,errorM,errorSTL))

psych::describe(predict2)
# Normality check
shapiro.test(predict2[,2])
## 
##  Shapiro-Wilk normality test
## 
## data:  predict2[, 2]
## W = 0.94237, p-value = 0.06024
shapiro.test(predict2[,3])
## 
##  Shapiro-Wilk normality test
## 
## data:  predict2[, 3]
## W = 0.94794, p-value = 0.09009
shapiro.test(predict2[,4])
## 
##  Shapiro-Wilk normality test
## 
## data:  predict2[, 4]
## W = 0.95279, p-value = 0.09459

We can use the STL model for forecasts.

This is done in 2 steps.

In the first step, the seasonally adjusted data are forecasted, using one out of many methods. The naive method simply takes the last value as the forecast for the next periods.

In the second step, these forecasts are re-seasonalized.

fitstl_fc <- forecast(fitstl, method="naive") 
autoplot(fitstl_fc) + ylab("Forecast")

A short-cut for the same procedure is to use the stlf() function. Again, we have to specify the preferred method for forecasting the seasonally adjusted series (i.e., the trend-cycle plus remainder).

fcast1 <- stlf(ms, method='naive')
fcast1
##         Point Forecast     Lo 80    Hi 80      Lo 95    Hi 95
## 2020 Q1       120.4254 102.03187 138.8190  92.294903 148.5560
## 2020 Q2       156.7887 130.77625 182.8011 117.006098 196.5713
## 2020 Q3       164.7991 132.94048 196.6577 116.075553 213.5226
## 2020 Q4        77.6910  40.90386 114.4781  21.429919 133.9521
## 2021 Q1       120.4254  79.29617 161.5547  57.523642 183.3272
## 2021 Q2       156.7887 111.73382 201.8436  87.883219 225.6942
## 2021 Q3       164.7991 116.13426 213.4639  90.372663 239.2255
## 2021 Q4        77.6910  25.66612 129.7159  -1.874184 157.2562
autoplot(fcast1)

If no method is specified, the stlf() function uses the so-called ETS approach. Discussion of that method is beyond the scope of this introduction. Suffice it to say that it produces good predictions in many settings.

fcast2 <- stlf(ms)
fcast2
##         Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## 2020 Q1      129.79465 117.33699 142.2523 110.74231 148.8470
## 2020 Q2      167.11063 154.65297 179.5683 148.05829 186.1630
## 2020 Q3      176.07375 163.61609 188.5314 157.02141 195.1261
## 2020 Q4       89.91841  77.46075 102.3761  70.86606 108.9708
## 2021 Q1      133.60558 121.14792 146.0632 114.55324 152.6579
## 2021 Q2      170.92156 158.46390 183.3792 151.86921 189.9739
## 2021 Q3      179.88468 167.42702 192.3423 160.83233 198.9370
## 2021 Q4       93.72934  81.27167 106.1870  74.67698 112.7817
autoplot(fcast2)

It is recommended to verify the predictive power of the model by splitting the data into a training and a test set. The hold-out cases in the test set can then be used to see how good the forecasts are for known observations that were not used in fitting the model. This procedure is often referred to as back-testing.
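
A minimal sketch of such a split for our quarterly M-series (the split point at the end of 2017 is an arbitrary, illustrative choice):

# Hold out the last two years as a test set (illustrative split point)
train <- window(ms, end=c(2017, 4))
test  <- window(ms, start=c(2018, 1))
fc    <- stlf(train, method="naive", h=length(test))
accuracy(fc, test)   # forecast accuracy on the held-out observations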

Still, there are more advanced methods that often lead to better predictions, including exponential smoothing and ARIMA. These techniques will be introduced in the next session.