Stat 142 (Time Series Analysis)

Time Series Decomposition

Author

Norberto E. Milla, Jr.

Published

January 24, 2024

Code
library(fpp2)
library(forecast)
library(tidyverse)
library(timetk)
library(seasonal)

1 Review of time series components

  • A time series consists of two general components: systematic and non-systematic

  • The systematic part is typically divided into three components: trend, seasonality, and cyclicity

  • The nonsystematic part is called noise

  • Trend is the change in the series from one period to the next

  • Seasonality describes a short-term cyclical behavior that can be observed several times within the given series

  • Lastly, noise is the random variation that results from measurement error or other causes that are not accounted for.

  • It is always present in a time series to some degree, although we cannot observe it directly

  • The different components are commonly considered to be either additive or multiplicative

  • A time series with additive components can be written as: y_t = T_t + S_t + R_t

  • A time series with multiplicative components can be written as: y_t = T_t \times S_t \times R_t

  • Forecasting methods attempt to isolate the systematic part and quantify the noise level

  • The systematic part is used for generating point forecasts and the level of noise helps assess the uncertainty associated with the point forecasts

  • Trend patterns are commonly approximated by linear, exponential and other mathematical functions

  • For seasonal patterns, two common approximations are additive seasonality (where values in different seasons vary by a constant amount) and multiplicative seasonality (where values in different seasons vary by a percentage)

Code
knitr::include_graphics("pic10.png")

  • The classical method of time series decomposition originated in the 1920s and was widely used until the 1950s.

  • It still forms the basis of many time series decomposition methods, so it is important to understand how it works.

  • The first step in a classical decomposition is to use a moving average method to estimate the trend-cycle, so let us revisit and discuss more about moving averages.

2 More on moving averages

  • A moving average of order m can be written as

\hat{T}_t = \frac{1}{m} \sum_{j=-k}^k y_{t+j}

where m = 2k+1. That is, the estimate of the trend-cycle at time t is obtained by averaging values of the time series within
k periods of t.

  • Note that observations that are nearby in time are also likely to be close in value. Therefore, the average eliminates some of the randomness in the data, leaving a smooth trend-cycle component.

  • We call this an m-MA, meaning a moving average of order m.

  • The order of the moving average determines the smoothness of the trend-cycle estimate.

  • In general, a larger order means a smoother curve

  • There are no trend-cycle estimates near the tails of the series

    • For an 3-MA, no value for the first and last time points

    • For an 5-MA, no values for the first 2 and last 2 time points

  • The larger the order of MA, more points lost at ends

  • Moving averages can be easily implemented using the ma() function in the stats package

Code
co2 <- read.csv("CO2 1980-1987.csv")
co2_ts<-ts(co2$CO2,
           start=c(1980,1),
           end=c(1987,9),
           frequency=12)

autoplot(co2_ts) +
  autolayer(ma(co2_ts, order = 3)) +
  labs(x = "Year",
       y ="CO2 concentration",
       title = "3-MA") +
  theme(legend.position = "none")
autoplot(co2_ts) +
  autolayer(ma(co2_ts, order = 5)) +
  labs(x = "Year",
       y ="CO2 concentration",
       title = "5-MA") +
  theme(legend.position = "none")
autoplot(co2_ts) +
  autolayer(ma(co2_ts, order = 7)) +
  labs(x = "Year",
       y ="CO2 concentration",
       title = "7-MA") +
  theme(legend.position = "none")
autoplot(co2_ts) +
  autolayer(ma(co2_ts, order = 9)) +
  labs(x = "Year",
       y ="CO2 concentration",
       title = "9-MA") +
  theme(legend.position = "none")

  • Simple moving averages such as these are usually of an odd order (e.g., 3, 5, 7, etc.). This is so they are symmetric: in a moving average of order m = 2k + 1, the middle observation, and k observations on either side, are averaged. But if m is even, it would no longer be symmetric.

  • It is possible to apply a moving average to a moving average. One reason for doing this is to make an even-order moving average symmetric

  • For example, we might take a moving average of order 4, and then apply another moving average of order 2 to the results.

Code
carameat <- read.csv("Carameat.csv")
carameat %>% 
  knitr::kable(col.names=c("Year", "Quarter", "Xt", "4-MA", "2x4-MA")) %>% 
  kableExtra::kable_styling()
Year Quarter Xt 4-MA 2x4-MA
2000 Quarter 1 1493 NA NA
2000 Quarter 2 2161 1909.50 NA
2000 Quarter 3 1841 1911.25 1910.38
2000 Quarter 4 2143 1921.00 1916.13
2001 Quarter 1 1500 1939.00 1930.00
2001 Quarter 2 2200 1890.75 1914.88
2001 Quarter 3 1913 1888.75 1889.75
2001 Quarter 4 1950 1918.25 1903.50
2002 Quarter 1 1492 1894.25 1906.25
2002 Quarter 2 2318 1943.75 1919.00
2002 Quarter 3 1817 1937.25 1940.50
2002 Quarter 4 2148 1927.00 1932.13
2003 Quarter 1 1466 1858.75 1892.88
2003 Quarter 2 2277 1907.00 1882.88
2003 Quarter 3 1544 2054.00 1980.50
2003 Quarter 4 2341 1942.50 1998.25
  • When a 2-MA follows a moving average of an even order (such as 4), it is called a “centered moving average of order 4”. This is because the results are now symmetric.

  • To see that this is the case, we can write the 2×4 -MA as follows: \begin{align} \hat{T}_t &= \frac{1}{2}\left[\frac{1}{4} (y_{t-2} + y_{t-1} + y_t +y_{t+1}) + \frac{1}{4} (y_{t-1} + y_t +y_{t+1}+ y_{t+2}) \right] \notag \\ &= \frac{1}{8}x_{t-2}+\frac{1}{4}x_{t-1}+\frac{1}{4}x_t+\frac{1}{4}x_{t+1}+\frac{1}{8}x_{t+2} \notag \end{align}

  • It is now a weighted average of observations that is symmetric. By default, the ma() function in R will return a centered moving average for even orders (unless center=FALSE is specified).

  • Other combinations of moving averages are also possible. For example, a 3×3-MA is often used, and consists of a moving average of order 3 followed by another moving average of order 3.

  • In general, an even order MA should be followed by an even order MA to make it symmetric.

  • Similarly, an odd order MA should be followed by an odd order MA.

  • To take away seasonality from a series so we can better see trend, we would use a moving average with order = seasonal span.

  • If the seasonal period is odd and of order m, we use a m-MA to estimate the trend-cycle. For example, we will use a 7-MA to estimate the trend-cycle of daily data with a weekly seasonality

  • If the seasonal period is even and of order m, we use a 2×m-MA to estimate the trend-cycle.

  • For example, a 2 \times 12-MA can be used to estimate the trend-cycle of monthly data \frac{1}{24}x_{t-6}+\frac{1}{12}x_{t-5}+\frac{1}{12}x_{t-4}+\cdots+\frac{1}{12}x_{t+4}+\frac{1}{12}x_{t+5}+\frac{1}{24}x_{t+6}

3 Classical decomposition

  • It is a relatively simple procedure, and forms the starting point for most other methods of time series decomposition

  • Two forms of classical decomposition: an additive decomposition and a multiplicative decomposition

  • In classical decomposition, we assume that the seasonal component is constant from year to year

  • For multiplicative seasonality, the m values that form the seasonal component are sometimes called the seasonal indices

3.1 Additive decomposition

  1. If m is an even number, compute the trend-cycle component \hat{T}_t using a 2\times m-MA. If m is an odd number, compute the trend-cycle component \hat{T}_t using an m-MA.

  2. Calculate the detrended series: y_t - \hat{T}_t

  3. To estimate the seasonal component for each season, simply average the detrended values for that season. For example, with monthly data, the seasonal component for March is the average of all the detrended March values in the data. These seasonal component values are then adjusted to ensure that they add to zero. The seasonal component is obtained by stringing together these monthly values, and then replicating the sequence for each year of data. This gives \hat{S}_t

  4. The remainder component is calculated by subtracting the estimated seasonal and trend-cycle components: \hat{R}_t = y_t - \hat{T}_t - \hat{S}_t

3.2 Multiplicative decomposition

  1. If m is an even number, compute the trend-cycle component \hat{T}_t using a 2\times m-MA. If m is an odd number, compute the trend-cycle component \hat{T}_t using an m-MA.

  2. Calculate the detrended series: \frac{y_t}{\hat{T}_t}

  3. To estimate the seasonal component for each season, simply average the detrended values for that season. For example, with monthly data, the seasonal component for March is the average of all the detrended March values in the data. These seasonal component values are then adjusted to ensure that they add to m. The seasonal component is obtained by stringing together these monthly values, and then replicating the sequence for each year of data. This gives \hat{S}_t.

  4. The remainder component is calculated by subtracting the estimated seasonal and trend-cycle components: \hat{R}_t = \frac{y_t}{\hat{T}_t \times \hat{S}_t}

  • The basic command in R for (classical) decomposition is decompose()

  • For an additive model, the code is: decompose(name of series, type = “additive”)

  • Code for a multiplicative decomposition: decompose(name of series, type =“multiplicative”)

  • Let us consider the dataset of number of births per month in New York city, from January 1946 to December 1959 to illustrate decompsition of an additive model

Code
births <- scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")
births.ts <- ts(births, 
                frequency = 12, 
                start = c(1946, 1))
plot(births.ts,
        ylab = "No. of births")

  • We can see some seasonal variation in the number of births per month; there is a peak every summer, and a trough every winter

  • Seasonal fluctuations are roughly constant in size over time and do not seem to depend on the level of the time series

  • Hence, an additive model can be applied

Code
births.comp <- decompose(births.ts, type="additive")
autoplot(births.comp) + xlab("Year") +
  ggtitle("Additive decomposition")

  • We can extract the component values from the output of the decompose() function
trend <- births.comp$trend
seasonality <- births.comp$seasonal
random <- births.comp$random
seasonal.indices <- births.comp$figure
plot(trend)

plot(seasonality)

plot(random)

print(seasonal.indices)
 [1] -0.6771947 -2.0829607  0.8625232 -0.8016787  0.2516514 -0.1532556
 [7]  1.4560457  1.1645938  0.6916162  0.7752444 -1.1097652 -0.3768197
head(seasonality, n=24) #seasonal component is obtained by stringing together these monthly values, and then replicating the sequence for each year of data
            Jan        Feb        Mar        Apr        May        Jun
1946 -0.6771947 -2.0829607  0.8625232 -0.8016787  0.2516514 -0.1532556
1947 -0.6771947 -2.0829607  0.8625232 -0.8016787  0.2516514 -0.1532556
            Jul        Aug        Sep        Oct        Nov        Dec
1946  1.4560457  1.1645938  0.6916162  0.7752444 -1.1097652 -0.3768197
1947  1.4560457  1.1645938  0.6916162  0.7752444 -1.1097652 -0.3768197
round(sum(seasonal.indices),4)#equals zero for additive decomposition
[1] 0
deseasonalized <- births - seasonality
head(cbind(births, seasonality, deseasonalized),n=10)
         births seasonality deseasonalized
Jan 1946 26.663  -0.6771947       27.34019
Feb 1946 23.598  -2.0829607       25.68096
Mar 1946 26.931   0.8625232       26.06848
Apr 1946 24.740  -0.8016787       25.54168
May 1946 25.806   0.2516514       25.55435
Jun 1946 24.364  -0.1532556       24.51726
Jul 1946 24.477   1.4560457       23.02095
Aug 1946 23.901   1.1645938       22.73641
Sep 1946 23.175   0.6916162       22.48338
Oct 1946 23.227   0.7752444       22.45176
  • To demonstrate classical decomposition of a multiplicative model, let us use the AirPassengers data which contains monthly totals of a US airline passengers, from 1949 to 1960.
Code
data(AirPassengers)
AP <- AirPassengers

autoplot(AP) + 
    labs(x = "Year",
         y = "No. of passengers (1000s)" )

  • The passenger numbers increase over time with each year which may be indicative of an increasing linear trend, perhaps due to increasing demand for flight travel and commercialization of airlines in that time period

  • Let’s use the boxplot function to see any seasonal effects.

Code
boxplot(AP ~ cycle(AP),
        xlab = "Months", 
        ylab = "No. of passengers (1000s)",
        main = "Boxplot of the monthly air passengers from 1949 to 1961")

  • There are more passengers travelling in June to September with higher means and higher variances than the other months, indicating seasonality with an apparent cycle of 12 months

  • We will apply multiplicative decomposition to this data

Code
AP.comp <- decompose(AP, type = "multiplicative")
sum(AP.comp$figure)#equals 12 for monthly data
[1] 12
Code
autoplot(AP.comp) +
  labs(x = "Year",
       title = "Multipliative decomposition")

3.3 Comments on classical decomposition

While classical decomposition is still widely used, it is not recommended, as there are now several much better methods. Some of the problems with classical decomposition are summarized below.

  • The estimate of the trend-cycle is unavailable for the first few and last few observations. For example, if m=12, there is no trend-cycle estimate for the first six or the last six observations. Consequently, there is also no estimate of the remainder component for the same time periods.

  • See code below for an example.

Code
head(trend, n=10)
          Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
1946       NA       NA       NA       NA       NA       NA 23.98433 23.66213
          Sep      Oct
1946 23.42333 23.16112
Code
tail(trend, n=10)
          Mar      Apr      May      Jun      Jul      Aug      Sep      Oct
1959 27.09250 27.17263 27.26208 27.36033       NA       NA       NA       NA
          Nov      Dec
1959       NA       NA
  • The trend-cycle estimate tends to over-smooth rapid rises and falls in the data

  • Classical decomposition methods assume that the seasonal component repeats from year to year. For many series, this is a reasonable assumption, but for some longer series it is not. For example, electricity demand patterns have changed over time as air conditioning has become more widespread. The classical decomposition methods are unable to capture these seasonal changes over time.

  • Occasionally, the values of the time series in a small number of periods may be particularly unusual. For example, the monthly air passenger traffic may be affected by an industrial dispute, making the traffic during the dispute different from usual. The classical method is not robust to these kinds of unusual values.

4 X11 decomposition

  • Another popular method for decomposing quarterly and monthly data is the X11 method which originated in the US Census Bureau and Statistics Canada

  • Includes many extra steps and features in order to overcome the drawbacks of classical decomposition

    • trend-cycle estimates are available for all time periods

    • seasonal component is allowed to vary slowly over time

    • highly robust to outliers and level shifts in the time series

  • It has sophisticated methods for handling trading day variation, holiday effects and the effects of known predictors

  • Handles both additive and multiplicative decomposition

  • The X11 method is available using the seas() function from the seasonal package

Code
births.comp.x11 <- seas(births.ts, x11 = "")
autoplot(births.comp.x11) + 
  labs(x = "Year",
       title = "Example of X11 decomposition")

  • Given the output from the seas() function, we can use
    • seasonal() to extract the seasonal component
    • trendcycle() to extract the trend-cycle component
    • remainder() to extract the remainder component
    • seasadj() to extraxt the seasonally adjusted time series
  • The following code chunk will plot the original series together with the trend component and the seasonally adjusted series
Code
autoplot(births.ts, series="Original series") +
  autolayer(trendcycle(births.comp.x11), series="Trend") + 
  autolayer(seasadj(births.comp.x11), series="Seasonally adjusted series") + 
  labs(x = "Year",
       y = "Number of births per month",
       title = "X11 decomposition of the number of births per month")

Code
AP.comp.x11 <- seas(AP, x11 = "")
autoplot(AP, series="Original series") +
  autolayer(trendcycle(AP.comp.x11), series="Trend") + 
  autolayer(seasadj(AP.comp.x11), series="Seasonally adjusted series") + 
  labs(x = "Year",
       y = "Number of paseengers",
       title = "X11 decomposition of the number of passengers per month")

5 SEATS decomposition

  • “SEATS” stands for “Seasonal Extraction in ARIMA Time Series”

  • Works only with quarterly and monthly data

  • Uses the same functions to extract the components and the seasonally adjusted series

Code
AP.comp.seats <- seas(AP)
autoplot(AP.comp.seats) + 
  labs(x = "Year",
       title = "Example of SEATS decomposition")

6 STL decomposition

  • STL (“Seasonal and Trend decomposition using Loess”) is a versatile and robust method for decomposing time series

  • Loess is a method for estimating nonlinear relationships

  • STL has several advantages over the classical, SEATS and X11 decomposition methods

    • Unlike SEATS and X11, STL will handle any type of seasonality, not only monthly and quarterly data

    • The seasonal component is allowed to change over time, and the rate of change can be controlled by the user

    • The smoothness of the trend-cycle can also be controlled by the user

    • It can be robust to outliers (occasional unusual observations will not affect the estimates of the trend-cycle and seasonal components but can affect the remainder component)

  • It can not handle trading day or calendar variation automatically, and it only provides facilities for additive decomposition

  • It is possible to obtain a multiplicative decomposition by first taking logarithms of the data, then back-transforming the components

  • Decompositions between additive and multiplicative can be obtained using a Box-Cox transformation of the data with 0 < \lambda <1

    • A value of \lambda=0 corresponds to the multiplicative decomposition
    • A value of \lambda=1 corresponds to the additive decomposition
  • The two main parameters to be chosen when using STL are the trend-cycle window (t.window) and the seasonal window (s.window)

    • t.window is the number of consecutive observations to be used when estimating the trend-cycle

    • s.window is the number of consecutive years to be used in estimating each value in the seasonal component

    • These control how rapidly the trend-cycle and seasonal components can change

    • Smaller values allow for more rapid changes

    • Both must be odd numbers

    • The user must specify s.window as there is no default

    • Specifying t.window is optional, and a default value will be used if it is omitted

  • Use the same functions as with X11 and SEATS methods to extract the components

Code
births.comp.stl <- stl(births.ts, s.window="periodic")
autoplot(births.comp.stl) +
  labs(x = "Year",
       title = "Example of STL decomposition")