Time Series Decomposition
library(fpp2)
library(forecast)
library(tidyverse)
library(timetk)
library(seasonal)
A time series consists of two general components: systematic and non-systematic
The systematic part is typically divided into three components: trend, seasonality, and cyclicity
The nonsystematic part is called noise
Trend is the change in the series from one period to the next
Seasonality describes a short-term cyclical behavior that can be observed several times within the given series
Lastly, noise is the random variation that results from measurement error or other causes that are not accounted for.
It is always present in a time series to some degree, although we cannot observe it directly
The different components are commonly considered to be either additive or multiplicative
A time series with additive components can be written as: y_t = T_t + S_t + R_t
A time series with multiplicative components can be written as: y_t = T_t \times S_t \times R_t
Forecasting methods attempt to isolate the systematic part and quantify the noise level
The systematic part is used for generating point forecasts and the level of noise helps assess the uncertainty associated with the point forecasts
Trend patterns are commonly approximated by linear, exponential and other mathematical functions
For seasonal patterns, two common approximations are additive seasonality (where values in different seasons vary by a constant amount) and multiplicative seasonality (where values in different seasons vary by a percentage)
knitr::include_graphics("pic10.png")
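To make the additive and multiplicative forms concrete, here is a small illustrative sketch (the synthetic series below are invented purely for illustration): the additive series has seasonal swings of roughly constant size, while the multiplicative series has swings that grow with the level.
set.seed(1)
t <- 1:120                                    # ten years of monthly observations
trend  <- 100 + 0.5 * t                       # linear trend component
season <- 10 * sin(2 * pi * t / 12)           # repeating 12-month seasonal pattern
noise  <- rnorm(120, sd = 3)                  # random noise

y_add  <- ts(trend + season + noise, frequency = 12)          # additive components
y_mult <- ts(trend * (1 + season / 100) * (1 + noise / 100),  # multiplicative components
             frequency = 12)

par(mfrow = c(2, 1))
plot(y_add,  main = "Additive: constant seasonal swing")
plot(y_mult, main = "Multiplicative: seasonal swing grows with the level")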
The classical method of time series decomposition originated in the 1920s and was widely used until the 1950s.
It still forms the basis of many time series decomposition methods, so it is important to understand how it works.
The first step in a classical decomposition is to use a moving average method to estimate the trend-cycle, so let us revisit and discuss more about moving averages.
\hat{T}_t = \frac{1}{m} \sum_{j=-k}^k y_{t+j}
where m = 2k+1. That is, the estimate of the trend-cycle at time t is obtained by averaging values of the time series within
k periods of t.
Note that observations that are nearby in time are also likely to be close in value. Therefore, the average eliminates some of the randomness in the data, leaving a smooth trend-cycle component.
We call this an m-MA, meaning a moving average of order m.
The order of the moving average determines the smoothness of the trend-cycle estimate.
In general, a larger order means a smoother curve
There are no trend-cycle estimates near the tails of the series
For a 3-MA, there is no value for the first and last time points
For a 5-MA, there are no values for the first two and last two time points
The larger the order of the MA, the more points are lost at the ends
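To see both the averaging formula and the lost end points in action, here is a minimal sketch using stats::filter() on a short toy series (the values are made up for illustration):
# Toy series (values invented for illustration only)
x <- ts(c(5, 8, 7, 9, 12, 10, 14, 13, 16, 15))

# 3-MA: average each observation with k = 1 neighbour on either side
ma3 <- stats::filter(x, rep(1/3, 3), sides = 2)

# 5-MA: k = 2 neighbours on either side
ma5 <- stats::filter(x, rep(1/5, 5), sides = 2)

cbind(x, ma3, ma5)
# ma3 is NA at the first and last point; ma5 at the first two and last two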
In practice, moving averages can be computed easily using the ma() function in the forecast package
<- read.csv("CO2 1980-1987.csv")
co2 <-ts(co2$CO2,
co2_tsstart=c(1980,1),
end=c(1987,9),
frequency=12)
autoplot(co2_ts) +
autolayer(ma(co2_ts, order = 3)) +
labs(x = "Year",
y ="CO2 concentration",
title = "3-MA") +
theme(legend.position = "none")
autoplot(co2_ts) +
autolayer(ma(co2_ts, order = 5)) +
labs(x = "Year",
y ="CO2 concentration",
title = "5-MA") +
theme(legend.position = "none")
autoplot(co2_ts) +
autolayer(ma(co2_ts, order = 7)) +
labs(x = "Year",
y ="CO2 concentration",
title = "7-MA") +
theme(legend.position = "none")
autoplot(co2_ts) +
autolayer(ma(co2_ts, order = 9)) +
labs(x = "Year",
y ="CO2 concentration",
title = "9-MA") +
theme(legend.position = "none")
Simple moving averages such as these are usually of an odd order (e.g., 3, 5, 7, etc.). This is so they are symmetric: in a moving average of order m = 2k + 1, the middle observation, and k observations on either side, are averaged. But if m is even, it would no longer be symmetric.
It is possible to apply a moving average to a moving average. One reason for doing this is to make an even-order moving average symmetric
For example, we might take a moving average of order 4, and then apply another moving average of order 2 to the results.
<- read.csv("Carameat.csv")
carameat %>%
carameat ::kable(col.names=c("Year", "Quarter", "Xt", "4-MA", "2x4-MA")) %>%
knitr::kable_styling() kableExtra
Year | Quarter | Xt | 4-MA | 2x4-MA |
---|---|---|---|---|
2000 | Quarter 1 | 1493 | NA | NA |
2000 | Quarter 2 | 2161 | 1909.50 | NA |
2000 | Quarter 3 | 1841 | 1911.25 | 1910.38 |
2000 | Quarter 4 | 2143 | 1921.00 | 1916.13 |
2001 | Quarter 1 | 1500 | 1939.00 | 1930.00 |
2001 | Quarter 2 | 2200 | 1890.75 | 1914.88 |
2001 | Quarter 3 | 1913 | 1888.75 | 1889.75 |
2001 | Quarter 4 | 1950 | 1918.25 | 1903.50 |
2002 | Quarter 1 | 1492 | 1894.25 | 1906.25 |
2002 | Quarter 2 | 2318 | 1943.75 | 1919.00 |
2002 | Quarter 3 | 1817 | 1937.25 | 1940.50 |
2002 | Quarter 4 | 2148 | 1927.00 | 1932.13 |
2003 | Quarter 1 | 1466 | 1858.75 | 1892.88 |
2003 | Quarter 2 | 2277 | 1907.00 | 1882.88 |
2003 | Quarter 3 | 1544 | 2054.00 | 1980.50 |
2003 | Quarter 4 | 2341 | 1942.50 | 1998.25 |
When a 2-MA follows a moving average of an even order (such as 4), it is called a “centered moving average of order 4”. This is because the results are now symmetric.
To see that this is the case, we can write the 2×4-MA as follows: \begin{align} \hat{T}_t &= \frac{1}{2}\left[\frac{1}{4} (y_{t-2} + y_{t-1} + y_t + y_{t+1}) + \frac{1}{4} (y_{t-1} + y_t + y_{t+1} + y_{t+2}) \right] \notag \\ &= \frac{1}{8}y_{t-2}+\frac{1}{4}y_{t-1}+\frac{1}{4}y_t+\frac{1}{4}y_{t+1}+\frac{1}{8}y_{t+2} \notag \end{align}
It is now a symmetric weighted average of observations. By default, the ma() function in R returns a centred moving average for even orders (unless centre = FALSE is specified).
Other combinations of moving averages are also possible. For example, a 3×3-MA is often used, and consists of a moving average of order 3 followed by another moving average of order 3.
In general, an even order MA should be followed by an even order MA to make it symmetric.
Similarly, an odd order MA should be followed by an odd order MA.
To remove seasonality from a series so that the trend is easier to see, we use a moving average whose order equals the seasonal period.
If the seasonal period m is odd, we use an m-MA to estimate the trend-cycle. For example, we would use a 7-MA to estimate the trend-cycle of daily data with weekly seasonality
If the seasonal period m is even, we use a 2\times m-MA to estimate the trend-cycle.
For example, a 2 \times 12-MA can be used to estimate the trend-cycle of monthly data: \frac{1}{24}y_{t-6}+\frac{1}{12}y_{t-5}+\frac{1}{12}y_{t-4}+\cdots+\frac{1}{12}y_{t+4}+\frac{1}{12}y_{t+5}+\frac{1}{24}y_{t+6}
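The weights above can be checked directly in R by comparing ma() with an explicit weighted filter; a quick sketch using the built-in AirPassengers series (monthly data, revisited later in these notes):
library(forecast)

# Centred 2x12-MA via ma()
ap_2x12 <- ma(AirPassengers, order = 12, centre = TRUE)

# The same estimate written out with the weights shown above
w <- c(0.5, rep(1, 11), 0.5) / 12
ap_manual <- stats::filter(AirPassengers, w, sides = 2)

# The two agree apart from the missing end points
max(abs(ap_2x12 - ap_manual), na.rm = TRUE)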
Classical decomposition is a relatively simple procedure, and it forms the starting point for most other methods of time series decomposition
Two forms of classical decomposition: an additive decomposition and a multiplicative decomposition
In classical decomposition, we assume that the seasonal component is constant from year to year
For multiplicative seasonality, the m values that form the seasonal component are sometimes called the seasonal indices
Additive decomposition (seasonal period m):
If m is an even number, compute the trend-cycle component \hat{T}_t using a 2\times m-MA. If m is an odd number, compute the trend-cycle component \hat{T}_t using an m-MA.
Calculate the detrended series: y_t - \hat{T}_t
To estimate the seasonal component for each season, simply average the detrended values for that season. For example, with monthly data, the seasonal component for March is the average of all the detrended March values in the data. These seasonal component values are then adjusted to ensure that they add to zero. The seasonal component is obtained by stringing together these monthly values, and then replicating the sequence for each year of data. This gives \hat{S}_t
The remainder component is calculated by subtracting the estimated seasonal and trend-cycle components: \hat{R}_t = y_t - \hat{T}_t - \hat{S}_t (these steps are sketched in code after the multiplicative version below)
Multiplicative decomposition (seasonal period m):
If m is an even number, compute the trend-cycle component \hat{T}_t using a 2\times m-MA. If m is an odd number, compute the trend-cycle component \hat{T}_t using an m-MA.
Calculate the detrended series: \frac{y_t}{\hat{T}_t}
To estimate the seasonal component for each season, simply average the detrended values for that season. For example, with monthly data, the seasonal component for March is the average of all the detrended March values in the data. These seasonal component values are then adjusted to ensure that they add to m. The seasonal component is obtained by stringing together these monthly values, and then replicating the sequence for each year of data. This gives \hat{S}_t.
The remainder component is calculated by dividing out the estimated seasonal and trend-cycle components: \hat{R}_t = \frac{y_t}{\hat{T}_t \times \hat{S}_t}
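As a check on these steps, here is a minimal hand-rolled version of the additive algorithm, run on the built-in AirPassengers series on the log scale (the log transform is an assumption made only so that an additive model is reasonable) and compared with R's decompose() function, introduced just below:
library(forecast)  # for ma()

y <- log(AirPassengers)   # log scale so that an additive model is plausible
m <- 12                   # monthly data, so the seasonal period is even

# Step 1: trend-cycle via a centred 2x12-MA
T_hat <- ma(y, order = m, centre = TRUE)

# Step 2: detrended series
detrended <- y - T_hat

# Step 3: average the detrended values month by month, adjust to sum to zero,
#         then replicate the 12 values across every year of data
idx <- tapply(detrended, cycle(y), mean, na.rm = TRUE)
idx <- idx - mean(idx)
S_hat <- ts(rep(idx, length.out = length(y)), start = start(y), frequency = m)

# Step 4: remainder
R_hat <- y - T_hat - S_hat

# The hand-computed seasonal indices should match decompose() (differences ~ 0)
round(idx - decompose(y, type = "additive")$figure, 10)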
The basic command in R for (classical) decomposition is decompose()
For an additive model, the code is: decompose(name of series, type = "additive")
For a multiplicative decomposition, the code is: decompose(name of series, type = "multiplicative")
Let us consider the dataset of the number of births per month in New York City, from January 1946 to December 1959, to illustrate decomposition of an additive model
births <- scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")
births.ts <- ts(births,
                frequency = 12,
                start = c(1946, 1))
plot(births.ts,
ylab = "No. of births")
We can see some seasonal variation in the number of births per month; there is a peak every summer, and a trough every winter
Seasonal fluctuations are roughly constant in size over time and do not seem to depend on the level of the time series
Hence, an additive model can be applied
births.comp <- decompose(births.ts, type = "additive")
autoplot(births.comp) + xlab("Year") +
  ggtitle("Additive decomposition")
trend <- births.comp$trend
seasonality <- births.comp$seasonal
random <- births.comp$random
seasonal.indices <- births.comp$figure
plot(trend)
plot(seasonality)
plot(random)
print(seasonal.indices)
[1] -0.6771947 -2.0829607 0.8625232 -0.8016787 0.2516514 -0.1532556
[7] 1.4560457 1.1645938 0.6916162 0.7752444 -1.1097652 -0.3768197
head(seasonality, n=24) #seasonal component is obtained by stringing together these monthly values, and then replicating the sequence for each year of data
Jan Feb Mar Apr May Jun
1946 -0.6771947 -2.0829607 0.8625232 -0.8016787 0.2516514 -0.1532556
1947 -0.6771947 -2.0829607 0.8625232 -0.8016787 0.2516514 -0.1532556
Jul Aug Sep Oct Nov Dec
1946 1.4560457 1.1645938 0.6916162 0.7752444 -1.1097652 -0.3768197
1947 1.4560457 1.1645938 0.6916162 0.7752444 -1.1097652 -0.3768197
round(sum(seasonal.indices),4)#equals zero for additive decomposition
[1] 0
deseasonalized <- births - seasonality
head(cbind(births, seasonality, deseasonalized), n = 10)
births seasonality deseasonalized
Jan 1946 26.663 -0.6771947 27.34019
Feb 1946 23.598 -2.0829607 25.68096
Mar 1946 26.931 0.8625232 26.06848
Apr 1946 24.740 -0.8016787 25.54168
May 1946 25.806 0.2516514 25.55435
Jun 1946 24.364 -0.1532556 24.51726
Jul 1946 24.477 1.4560457 23.02095
Aug 1946 23.901 1.1645938 22.73641
Sep 1946 23.175 0.6916162 22.48338
Oct 1946 23.227 0.7752444 22.45176
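Plotting the deseasonalized series against the original makes the effect of removing the seasonal component visible (continuing with the objects created above):
autoplot(births.ts, series = "Original series") +
  autolayer(deseasonalized, series = "Deseasonalized series") +
  labs(x = "Year",
       y = "No. of births",
       title = "Births: original vs deseasonalized")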
data(AirPassengers)
AP <- AirPassengers
autoplot(AP) +
labs(x = "Year",
y = "No. of passengers (1000s)" )
The passenger numbers increase from year to year, which is indicative of an increasing linear trend, perhaps due to growing demand for air travel and the commercialization of airlines in that period
Let’s use the boxplot function to see any seasonal effects.
boxplot(AP ~ cycle(AP),
xlab = "Months",
ylab = "No. of passengers (1000s)",
main = "Boxplot of the monthly air passengers from 1949 to 1961")
There are more passengers travelling in June to September with higher means and higher variances than the other months, indicating seasonality with an apparent cycle of 12 months
We will apply multiplicative decomposition to this data
AP.comp <- decompose(AP, type = "multiplicative")
sum(AP.comp$figure)  # equals 12 for monthly data
[1] 12
autoplot(AP.comp) +
labs(x = "Year",
title = "Multipliative decomposition")
Another popular method for decomposing quarterly and monthly data is the X11 method which originated in the US Census Bureau and Statistics Canada
Includes many extra steps and features in order to overcome the drawbacks of classical decomposition
trend-cycle estimates are available for all time periods
seasonal component is allowed to vary slowly over time
highly robust to outliers and level shifts in the time series
It has sophisticated methods for handling trading day variation, holiday effects and the effects of known predictors
Handles both additive and multiplicative decomposition
The X11 method is available using the seas() function from the seasonal package
births.comp.x11 <- seas(births.ts, x11 = "")
autoplot(births.comp.x11) +
labs(x = "Year",
title = "Example of X11 decomposition")
autoplot(births.ts, series="Original series") +
autolayer(trendcycle(births.comp.x11), series="Trend") +
autolayer(seasadj(births.comp.x11), series="Seasonally adjusted series") +
labs(x = "Year",
y = "Number of births per month",
title = "X11 decomposition of the number of births per month")
AP.comp.x11 <- seas(AP, x11 = "")
autoplot(AP, series = "Original series") +
autolayer(trendcycle(AP.comp.x11), series="Trend") +
autolayer(seasadj(AP.comp.x11), series="Seasonally adjusted series") +
labs(x = "Year",
y = "Number of paseengers",
title = "X11 decomposition of the number of passengers per month")
“SEATS” stands for “Seasonal Extraction in ARIMA Time Series”
Works only with quarterly and monthly data
Uses the same functions to extract the components and the seasonally adjusted series
AP.comp.seats <- seas(AP)
autoplot(AP.comp.seats) +
labs(x = "Year",
title = "Example of SEATS decomposition")
STL (“Seasonal and Trend decomposition using Loess”) is a versatile and robust method for decomposing time series
Loess is a method for estimating nonlinear relationships
STL has several advantages over the classical, SEATS and X11 decomposition methods
Unlike SEATS and X11, STL will handle any type of seasonality, not only monthly and quarterly data
The seasonal component is allowed to change over time, and the rate of change can be controlled by the user
The smoothness of the trend-cycle can also be controlled by the user
It can be robust to outliers (occasional unusual observations will not affect the estimates of the trend-cycle and seasonal components but can affect the remainder component)
It cannot handle trading day or calendar variation automatically, and it only provides facilities for additive decompositions
It is possible to obtain a multiplicative decomposition by first taking logarithms of the data, then back-transforming the components
Decompositions between additive and multiplicative can be obtained using a Box-Cox transformation of the data with 0 < \lambda <1
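For example, a multiplicative-style STL decomposition of the air passenger series can be obtained by working on the log scale and back-transforming the components afterwards (a minimal sketch, reusing AP from above; stl() itself is described next):
# STL on the log scale gives an additive decomposition of log(AP) ...
AP.log.stl <- stl(log(AP), s.window = "periodic")

# ... and exponentiating the components gives multiplicative-style factors
trend.mult  <- exp(trendcycle(AP.log.stl))
season.mult <- exp(seasonal(AP.log.stl))

autoplot(AP, series = "Original series") +
  autolayer(trend.mult, series = "Back-transformed trend") +
  labs(x = "Year",
       y = "No. of passengers (1000s)")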
The two main parameters to be chosen when using STL are the trend-cycle window (t.window) and the seasonal window (s.window)
t.window is the number of consecutive observations to be used when estimating the trend-cycle
s.window is the number of consecutive years to be used in estimating each value in the seasonal component
These control how rapidly the trend-cycle and seasonal components can change
Smaller values allow for more rapid changes
Both must be odd numbers
The user must specify s.window as there is no default
Specifying t.window is optional, and a default value will be used if it is omitted
Use the same functions as with X11 and SEATS methods to extract the components
births.comp.stl <- stl(births.ts, s.window = "periodic")
autoplot(births.comp.stl) +
labs(x = "Year",
title = "Example of STL decomposition")
3.3 Comments on classical decomposition
While classical decomposition is still widely used, it is not recommended, as there are now several much better methods. Some of the problems with classical decomposition are summarized below.
The estimate of the trend-cycle is unavailable for the first few and last few observations. For example, if m=12, there is no trend-cycle estimate for the first six or the last six observations. Consequently, there is also no estimate of the remainder component for the same time periods.
See code below for an example.
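A quick check using births.comp from the additive example earlier shows the missing values at both ends of the trend-cycle and remainder components:
# With m = 12, the first six and last six trend-cycle estimates are NA
head(births.comp$trend, n = 12)
tail(births.comp$trend, n = 12)

# The remainder component has the same missing values
head(births.comp$random, n = 12)
tail(births.comp$random, n = 12)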
The trend-cycle estimate tends to over-smooth rapid rises and falls in the data
Classical decomposition methods assume that the seasonal component repeats from year to year. For many series, this is a reasonable assumption, but for some longer series it is not. For example, electricity demand patterns have changed over time as air conditioning has become more widespread. The classical decomposition methods are unable to capture these seasonal changes over time.
Occasionally, the values of the time series in a small number of periods may be particularly unusual. For example, the monthly air passenger traffic may be affected by an industrial dispute, making the traffic during the dispute different from usual. The classical method is not robust to these kinds of unusual values.
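To illustrate this last point, one can corrupt a single observation and compare classical decomposition with a robust STL fit (the outlier below is artificial, inserted only for illustration):
# Copy the air passenger series and inject an artificial outlier
AP.out <- AP
AP.out[60] <- AP.out[60] * 3

# Classical decomposition: the outlier leaks into the trend-cycle and seasonal estimates
AP.out.classical <- decompose(AP.out, type = "multiplicative")
autoplot(AP.out.classical)

# Robust STL (on the log scale, for a multiplicative-style model):
# most of the outlier ends up in the remainder instead
AP.out.stl <- stl(log(AP.out), s.window = "periodic", robust = TRUE)
autoplot(AP.out.stl)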