Title: Time Series Decomposition (Forecasting Methods & Applications by Makridakis et al.)


Concept

Many forecasting methods are based on the concept that when an underlying pattern exists in a data series, that pattern can be distinguished from randomness by smoothing (averaging) past values. The effect of this smoothing is to eliminate randomness so the pattern can be projected into the future and used as the forecast. In many instances the pattern can be broken down (decomposed) into subpatterns that identify each component of the time series separately. Such a breakdown can frequently aid in better understanding the behavior of the series, which facilitates improved accuracy in forecasting.

Decomposition methods usually try to identify two separate components of the basic underlying pattern that tend to characterize economic and business series. These are the trend-cycle and the seasonal factors. The seasonal factor relates to periodic fluctuations of constant length that are caused by such things as temperature, rainfall, month of the year, timing of holidays, and corporate policies. The trend-cycle represents longer-term changes in the level of the series. The trend-cycle is sometimes separated into trend and cyclical components, but the distinction is somewhat artificial and most decomposition procedures leave the trend and cycle as a single component known as the trend-cycle.

Decomposition assumes that the data are made up as follows:

          data = pattern + error = f(trend-cycle, seasonality, error).

Thus, in addition to the components of the pattern, an element of error or randomness is also assumed to be present. This error is assumed to be the difference between the combined effect of the two subpatterns of the series and the actual data. Therefore, it is often called the “irregular” or the “remainder” component.

There are several alternative approaches to decomposing a time series, all of which aim to isolate each component of the series as accurately as possible. The basic concept in such separation is empirical and consists of first removing the trend-cycle, then isolating the seasonal component. Any residual is assumed to be randomness which, while it cannot be predicted, can be identified. From a statistical point of view there are a number of theoretical weaknesses in the decomposition approach. Practitioners, however, have largely ignored these weaknesses and have used the approach with considerable success.
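The empirical procedure described above (estimate the trend-cycle, remove it, isolate the seasonal component, and treat the rest as randomness) can be sketched in a few lines of R. This is only an illustration on simulated data: the series, the seed, and all variable names are assumptions made for the example, and a centred moving average is just one possible smoother for the trend-cycle.

```r
# Simulated monthly series: linear trend + fixed seasonal pattern + noise.
# All numbers here are made up for illustration.
set.seed(1)
n <- 120
trend_true <- seq(100, 160, length.out = n)
seasonal_true <- rep(10 * sin(2 * pi * (1:12) / 12), times = n / 12)
y <- ts(trend_true + seasonal_true + rnorm(n, sd = 2), frequency = 12)

# Step 1: estimate the trend-cycle with a centred 2x12 moving average.
w <- c(0.5, rep(1, 11), 0.5) / 12
trend_hat <- stats::filter(y, filter = w, sides = 2)

# Step 2: remove the trend-cycle, then average the detrended values
# month by month to isolate the seasonal component.
detrended <- y - trend_hat
seasonal_idx <- as.numeric(
  tapply(as.numeric(detrended), as.integer(cycle(y)), mean, na.rm = TRUE)
)
seasonal_idx <- seasonal_idx - mean(seasonal_idx)  # centre so effects sum to zero
seasonal_hat <- ts(seasonal_idx[as.integer(cycle(y))],
                   start = start(y), frequency = 12)

# Step 3: whatever is left is treated as the irregular (remainder) component.
remainder_hat <- y - trend_hat - seasonal_hat
```

The moving average cannot be computed for the first and last six months, so `trend_hat` and `remainder_hat` contain `NA` at both ends of the series.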

Decomposition methods are among the oldest approaches to time series analysis. They originated around the beginning of the twentieth century and were initiated from two different directions. First, it was recognized that to study the serial correlation within or between variables, any spurious correlation that might exist because of trend must be eliminated. As early as 1884, Poynting attempted to eliminate trend and some seasonal fluctuations by averaging prices over several years. Hooker (1901) followed Poynting’s example, but was more precise in his methods for eliminating trend. His work was followed by Spencer (1904) and Anderson and Nochmals (1914), who generalized the procedure of trend elimination to include higher-order polynomials.

A second direction for work in this area originated with economists who worried about the impact of depressions and sought ways to predict them. They felt that the elements of economic activity should be separated so that changes in the business cycle could be isolated from seasonal and other changes. France appointed a committee that in 1911 presented a report analyzing the causes of the 1907 economic crisis. This group introduced the idea of leading and coincidental indicators and attempted to separate the trend from the cycle so that the movement of the latter could be followed.

In the United States this idea was expanded and the concept of constructing barometers of business activity was developed. Furthermore, an attempt to separate the seasonal fluctuation from the rest of the components was made as early as 1915 (Copeland). The process of decomposition, as it is known today, is due to Macaulay (1930) who, in the 1920s, introduced the ratio-to-moving averages method that forms the basis of Census II. (For a summary article, see Burman, 1979.)

An impetus in the development of decomposition came with the introduction and widespread use of computers. Shiskin (1957) developed a computer program that could perform the needed computations easily and quickly. This gave rise to Census II, which has become the most widely used of the decomposition methods. Since that time, decomposition approaches have been used widely by both economists and business analysts.

More recently, the advantages of decomposition approaches have been recognized and efforts have been made to upgrade these approaches. These efforts have been in the direction of introducing statistical rigor into the approach without losing its intuitive appeal. (See Dagum, 1982; Cleveland, 1983.)

We introduce the ideas behind decomposition and seasonal adjustment in Section 3/1. A key step in all decomposition methods involves smoothing the original data. In Section 3/2 we describe moving average smoothers and their variations that are used in most decomposition methodology. An alternative smoother which is becoming increasingly popular is a local linear regression smoother; it is introduced in Section 3/3. The classical decomposition method, dating back to the 1920s, was once the most popular technique and still forms the basis for most other methods. Classical decomposition is discussed in Section 3/4. Today, the most popular method of decomposition is Census II, which lies behind a great many basic economic series used in the private and public sectors. We will study the latest variant of Census II (X-12-ARIMA) in Section 3/5. Then, in Section 3/6 we look at a relatively new decomposition method, STL, which is based on local linear regressions. Finally, in Section 3/7, we briefly review the role of time series decomposition in forecasting.


3/1 Principles of decomposition

3/1/1 Decomposition models

The general mathematical representation of the decomposition approach is:

            Yt = f(St, Tt, Et)          (1)

where Yt is the time series value (actual data) at period t, St is the seasonal component (or index) at period t, Tt is the trend-cycle component at period t, and Et is the irregular (or remainder) component at period t.

The exact functional form of equation (1) depends on the decomposition method actually used. A common approach is to assume equation (1) has the additive form

            Yt = St + Tt + Et.

That is, the seasonal, trend-cycle and irregular components are simply added together to give the observed series. Alternatively, the multiplicative decomposition has the formula

            Yt = St × Tt × Et.

That is, the seasonal, trend-cycle and irregular components are multiplied together to give the observed series.

An additive model is appropriate if the magnitude of the seasonal fluctuations does not vary with the level of the series. But if the seasonal fluctuations increase and decrease proportionally with increases and decreases in the level of the series, then a multiplicative model is appropriate. Multiplicative decomposition is more prevalent with economic series because most seasonal economic series do have seasonal variation which increases with the level of the series.
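A small simulation can make the distinction concrete. The sketch below builds one additive and one multiplicative series from the same made-up trend and seasonal pattern (no error term, to keep the comparison clean) and measures the seasonal swing within each year. The variable names and the helper `swing_by_year` are hypothetical and exist only for this illustration.

```r
# Made-up components: a rising trend and a repeating 12-month pattern.
n_years <- 10
trend <- seq(50, 200, length.out = 12 * n_years)
pattern <- rep(sin(2 * pi * (1:12) / 12), n_years)

additive_y       <- trend + 10 * pattern          # Yt = Tt + St
multiplicative_y <- trend * (1 + 0.2 * pattern)   # Yt = Tt x St, with St = 1 + 0.2 * pattern

# Hypothetical helper: size of the seasonal swing (max minus min of the
# detrended values) within each year. The known trend is used for detrending.
swing_by_year <- function(y) {
  year <- rep(seq_len(n_years), each = 12)
  as.numeric(tapply(y - trend, year, function(v) diff(range(v))))
}

additive_swing       <- swing_by_year(additive_y)
multiplicative_swing <- swing_by_year(multiplicative_y)

# Constant swing suggests an additive model; swing that grows with the
# level of the series suggests a multiplicative one.
print(round(additive_swing, 2))
print(round(multiplicative_swing, 2))
```

With real data the trend is unknown, so in practice the same check is usually made by eye on a time plot of the series.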

Rather than choosing either an additive or multiplicative decomposition, we could use a transformation. Very often the transformed series can be modeled additively, when the original data are not additive.

Logarithms, in particular, turn a multiplicative relationship into an additive relationship, since if

         Yt = St × Tt × Et

then

         log Yt = log St + log Tt + log Et.

So we can fit a multiplicative relationship by fitting an additive relationship to the logarithms of the data. Other transformations allow a decomposition which is somewhere between the additive and multiplicative forms.
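As a rough illustration of the logarithm route, the sketch below fits an additive decomposition to the logged series and then exponentiates the components to get back to the original scale. It uses `AirPassengers`, a monthly series that ships with R, purely as a convenient stand-in; the variable names are assumptions for this example.

```r
# AirPassengers is a built-in dataset (package "datasets"), used here only
# as a stand-in series whose seasonal swings grow with its level.
y <- AirPassengers

# Additive decomposition of the logged series:
#   log Yt = log St + log Tt + log Et
log_fit <- decompose(log(y), type = "additive")

# Exponentiating each component gives multiplicative components
# on the original scale: Yt = St x Tt x Et.
trend_hat     <- exp(log_fit$trend)
seasonal_hat  <- exp(log_fit$seasonal)
irregular_hat <- exp(log_fit$random)

# Where the trend estimate exists, the product reproduces the data.
ok <- !is.na(trend_hat)
reconstructs <- all.equal(
  as.numeric(trend_hat * seasonal_hat * irregular_hat)[ok],
  as.numeric(y)[ok]
)
print(reconstructs)
```

The back-transformed components are generally close to, but not identical to, those from `decompose(y, type = "multiplicative")`, because averaging on the log scale is not the same as averaging on the original scale.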

A further decomposition method is pseudo-additive decomposition which takes the form

         Yt = Tt(St + Et − 1).

This type of decomposition is useful in series where there is one month (or quarter) that is much higher or lower than all the other months (or quarters). For example, many European series take large dips in August when companies shut down for vacations. Baxter (1994) describes it and its applications in detail.
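One way to read the pseudo-additive formula is to expand it as Yt = Tt×St + Tt×(Et − 1): the seasonal effect scales with the trend, as in the multiplicative model, but the irregular term is scaled by the trend only and not by the seasonal factor. The sketch below compares this with a purely multiplicative series on simulated data with an artificial August dip. All values, the seed, and the variable names are assumptions for illustration.

```r
set.seed(42)
n_years <- 8
n <- 12 * n_years
month <- rep(1:12, n_years)

trend <- seq(100, 180, length.out = n)
# Seasonal factors average to 1, with a very low August (month 8).
seasonal <- ifelse(month == 8, 0.12, 1.08)
irregular <- rnorm(n, mean = 1, sd = 0.03)

y_pseudo <- trend * (seasonal + irregular - 1)   # pseudo-additive
y_mult   <- trend * seasonal * irregular         # multiplicative

# Contribution of the irregular component to each series.
noise_pseudo <- y_pseudo - trend * seasonal      # equals Tt * (Et - 1)
noise_mult   <- y_mult   - trend * seasonal      # equals Tt * St * (Et - 1)

# In August the multiplicative model shrinks the noise by the tiny seasonal
# factor, while the pseudo-additive model leaves it at its usual size.
print(sd(noise_pseudo[month == 8]))
print(sd(noise_mult[month == 8]))
```

This also hints at why a purely multiplicative fit can be awkward for such series: recovering the irregular component means dividing by a seasonal factor that is close to zero.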


Example

Install required Packages

# Install the packages that are not already available (run once).
install.packages("astsa")
install.packages("printr")

# Load the packages used in this example.
library(astsa, quietly = TRUE, warn.conflicts = FALSE)
library(ggplot2)
library(knitr)
library(printr)
library(plyr)
library(dplyr)
library(lubridate)
library(gridExtra)
library(reshape2)
library(TTR)

Let us use the monthly New York births series, which runs from January 1946 to December 1959.

The data are taken from the link below:

New York Births per minute - 1946 to 1960

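# Read the raw values (one number per observation) from the web.
births <- scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")
# Convert to a monthly time series starting in January 1946.
births_ts <- ts(births, frequency = 12, start = c(1946, 1))
births_ts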
##         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
## 1946 26.663 23.598 26.931 24.740 25.806 24.364 24.477 23.901 23.175 23.227
## 1947 21.439 21.089 23.709 21.669 21.752 20.761 23.479 23.824 23.105 23.110
## 1948 21.937 20.035 23.590 21.672 22.222 22.123 23.950 23.504 22.238 23.142
## 1949 21.548 20.000 22.424 20.615 21.761 22.874 24.104 23.748 23.262 22.907
## 1950 22.604 20.894 24.677 23.673 25.320 23.583 24.671 24.454 24.122 24.252
## 1951 23.287 23.049 25.076 24.037 24.430 24.667 26.451 25.618 25.014 25.110
## 1952 23.798 22.270 24.775 22.646 23.988 24.737 26.276 25.816 25.210 25.199
## 1953 24.364 22.644 25.565 24.062 25.431 24.635 27.009 26.606 26.268 26.462
## 1954 24.657 23.304 26.982 26.199 27.210 26.122 26.706 26.878 26.152 26.379
## 1955 24.990 24.239 26.721 23.475 24.767 26.219 28.361 28.599 27.914 27.784
## 1956 26.217 24.218 27.914 26.975 28.527 27.139 28.982 28.169 28.056 29.136
## 1957 26.589 24.848 27.543 26.896 28.878 27.390 28.065 28.141 29.048 28.484
## 1958 27.132 24.924 28.963 26.589 27.931 28.009 29.229 28.759 28.405 27.945
## 1959 26.076 25.286 27.660 25.951 26.398 25.565 28.865 30.000 29.261 29.012
##         Nov    Dec
## 1946 21.672 21.870
## 1947 21.759 22.073
## 1948 21.059 21.573
## 1949 21.519 22.025
## 1950 22.084 22.991
## 1951 22.964 23.981
## 1952 23.162 24.707
## 1953 25.246 25.180
## 1954 24.712 25.688
## 1955 25.693 26.881
## 1956 26.291 26.987
## 1957 26.634 27.735
## 1958 25.912 26.619
## 1959 26.992 27.897

Decomposing Trend-Cycle & Seasonality from the births time series data
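# Additive decomposition (the default): Yt = St + Tt + Et
births_ts_decomposed_additive <- decompose(births_ts, type = "additive")

# Panels: observed series, trend, seasonal and random components
plot(births_ts_decomposed_additive)

# Multiplicative decomposition: Yt = St × Tt × Et
births_ts_decomposed_multiplicative <- decompose(births_ts, type = "multiplicative")

plot(births_ts_decomposed_multiplicative)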
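The object returned by `decompose()` is a list whose elements include `x`, `trend`, `seasonal`, `random`, `figure` and `type`, so the components can be used directly, for example to produce a seasonally adjusted series. The sketch below is a minimal illustration. `seasonally_adjust` is a hypothetical helper written for this example, not part of any package, and the built-in `co2` series stands in for the births data so that the snippet runs without a network connection.

```r
# Hypothetical helper: remove the estimated seasonal component from an
# object returned by decompose().
seasonally_adjust <- function(fit) {
  if (fit$type == "multiplicative") {
    fit$x / fit$seasonal   # Yt / St
  } else {
    fit$x - fit$seasonal   # Yt - St
  }
}

# co2 is a built-in monthly series, used here only as stand-in data.
fit <- decompose(co2)
adjusted <- seasonally_adjust(fit)

# The 12 estimated monthly seasonal effects (additive, so they sum to zero).
print(round(fit$figure, 2))

# Seasonally adjusted series: trend-cycle plus irregular component.
plot(adjusted, main = "Seasonally adjusted series")
```

The same helper could be applied to `births_ts_decomposed_additive` or `births_ts_decomposed_multiplicative` from the example above.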