Chapter 3 Time Series Decomposition

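Time series decomposition breaks a series down into its underlying patterns, or components, such as trend, seasonality, and cycles.

We usually split a series into three components: a trend-cycle component, a seasonal component, and a remainder component (everything else in the series).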

So how do we determine the components and observe them?

3.1 Transformation and Adjustments

Calendar adjustments (for example, removing variation caused by the different number of days in each month).

Population adjustments (using per-capita data to factor out population changes and see true increases).

global_economy %>%
  filter(Country == "Australia") %>%
  autoplot(GDP/Population) +
  labs(title= "GDP per capita", y = "$US")

Inflation adjustments express prices relative to a base year so that values are comparable over time. Even though prices have increased, a company’s sales revenue might have decreased in real terms.
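As a rough illustration of the inflation adjustment, the sketch below deflates a nominal series by a price index. All numbers are made up, and the names `revenue`, `cpi`, and `real_revenue` are hypothetical; the first year is assumed to be the base year.

```r
# Made-up nominal revenue and price index for three consecutive years
revenue <- c(100, 104, 107)   # nominal revenue, rising every year
cpi     <- c(100, 105, 110)   # price index; first year is the base year

# Express every year in base-year prices
real_revenue <- revenue / cpi * cpi[1]

round(real_revenue, 2)
```

Nominal revenue rises each year, but once prices are held at the base-year level the adjusted series falls.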


Mathematical transformations

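A log transformation is useful because it is interpretable: changes in a log value are relative (or percentage) changes on the original scale.

Box-Cox transformations depend on a parameter \(\lambda\). If \(\lambda = 0\), the transformation is the natural log, \(\log(y_t)\); otherwise it is \((\text{sign}(y_t)|y_t|^{\lambda}-1)/\lambda\). A good value of \(\lambda\) is one that makes the size of the seasonal variation about the same across the whole series.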
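To make the formula concrete, here is a minimal base R sketch of the transformation. The helper `box_cox_manual()` is a hypothetical name used only for illustration; it is not the implementation behind `box_cox()` in the code that follows.

```r
# Hypothetical helper that mirrors the Box-Cox formula term by term
box_cox_manual <- function(y, lambda) {
  if (lambda == 0) {
    log(y)                                   # lambda = 0: natural log
  } else {
    (sign(y) * abs(y)^lambda - 1) / lambda   # otherwise: scaled power transform
  }
}

y <- c(1, 4, 9, 16)        # made-up values
box_cox_manual(y, 0)       # identical to log(y)
box_cox_manual(y, 0.5)     # identical to 2 * (sqrt(y) - 1)
box_cox_manual(y, 1)       # identical to y - 1: a shift, shape unchanged
```

With \(\lambda = 1\) the series is only shifted down by one, so the shape of the plot does not change; values of \(\lambda\) closer to 0 compress large values more strongly.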

lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)
aus_production %>%
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed gas production with $\\lambda$ = ",
         round(lambda,2))))

3.2 Time Series Components

Additive Decomposition:

\[y_t=S_t+T_t+R_t\]

where \(y_t\) is the data, \(S_t\) is the seasonal component, \(T_t\) is the trend-cycle component, and \(R_t\) is the remainder component, all at period \(t\).

Multiplicative Decomposition:

\[y_t=S_t\times T_t\times R_t\]

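Use the additive decomposition if the magnitude of the seasonal fluctuations, or the variation around the trend-cycle, does not vary with the level of the time series. When that variation appears proportional to the level of the series, a multiplicative decomposition is more appropriate. Multiplicative decompositions are common with economic time series.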

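Alternatively, we can first transform the data until the variation appears stable over time, and then use an additive decomposition. With a log transformation this is equivalent to a multiplicative decomposition:

\[y_t=S_t\times T_t\times R_t \quad\text{is equivalent to}\quad \log y_t=\log S_t+\log T_t+\log R_t\]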
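The link between the multiplicative form and the log scale can be checked numerically. This is only a sketch: the component values are made up, and the variable names are hypothetical.

```r
# Made-up multiplicative components for three periods
seasonal  <- c(0.9, 1.1, 1.0)
trend     <- c(100, 102, 104)
remainder <- c(1.01, 0.99, 1.00)

# Multiplicative decomposition on the original scale
y <- seasonal * trend * remainder

# On the log scale the same components add up
log_sum <- log(seasonal) + log(trend) + log(remainder)
all.equal(log(y), log_sum)
```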

We will look at total employment in US retail and then apply the STL decomposition method:

us_retail_employment <- us_employment %>%
  filter(year(Month) >= 1990, Title == "Retail Trade") %>%
  select(-Series_ID)
autoplot(us_retail_employment, Employed) +
  labs(y = "Persons (thousands)",
       title = "Total employment in US retail")

dcmp <- us_retail_employment %>%
  model(stl = STL(Employed))
components(dcmp)
## # A dable: 357 x 7 [1M]
## # Key:     .model [1]
## # :        Employed = trend + season_year + remainder
##    .model    Month Employed  trend season_year remainder season_adjust
##    <chr>     <mth>    <dbl>  <dbl>       <dbl>     <dbl>         <dbl>
##  1 stl    1990 Jan   13256. 13288.      -33.0      0.836        13289.
##  2 stl    1990 Feb   12966. 13269.     -258.     -44.6          13224.
##  3 stl    1990 Mar   12938. 13250.     -290.     -22.1          13228.
##  4 stl    1990 Apr   13012. 13231.     -220.       1.05         13232.
##  5 stl    1990 May   13108. 13211.     -114.      11.3          13223.
##  6 stl    1990 Jun   13183. 13192.      -24.3     15.5          13207.
##  7 stl    1990 Jul   13170. 13172.      -23.2     21.6          13193.
##  8 stl    1990 Aug   13160. 13151.       -9.52    17.8          13169.
##  9 stl    1990 Sep   13113. 13131.      -39.5     22.0          13153.
## 10 stl    1990 Oct   13185. 13110.       61.6     13.2          13124.
## # ... with 347 more rows
components(dcmp) %>%
  as_tsibble() %>%
  autoplot(Employed, colour="gray") +
  geom_line(aes(y=trend), colour = "#D55E00") +
  labs(
    y = "Persons (thousands)",
    title = "Total employment in US retail"
  )

components(dcmp) %>% autoplot()

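This plot shows the data together with all three components: trend, seasonality (season_year), and remainder.

Here is the same series with the seasonally adjusted data overlaid. Seasonal adjustment removes regular within-year effects, such as seasonal workers hired for the holidays or school employees.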

components(dcmp) %>%
  as_tsibble() %>%
  autoplot(Employed, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(y = "Persons (thousands)",
       title = "Total employment in US retail")
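The season_adjust column can be checked by hand against the components(dcmp) output printed earlier. For an additive decomposition, the seasonally adjusted series is the data with the seasonal component removed:

\[y_t-S_t=T_t+R_t\]

For January 1990 this gives \(13256-(-33.0)=13289\), and equivalently \(13288+0.836\approx 13289\), which matches the printed season_adjust value up to the rounding used in the display. For a multiplicative decomposition the seasonally adjusted series would be \(y_t/S_t\) instead.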

3.3 Moving Averages

Moving averages are the first step of classical decomposition, a method that originated in the 1920s and was widely used until the 1950s. Their purpose is to estimate the trend-cycle.

\[\hat{T}_t=\frac{1}{m}\sum_{j=-k}^{k} y_{t+j},\]

where \(m=2k+1\). That is, the estimate of the trend-cycle at time \(t\) is obtained by averaging the values of the time series within \(k\) periods of \(t\). Observations that are close in time are also likely to be close in value, so the average eliminates some of the randomness and leaves a smooth trend-cycle component. We call this an \(m\)-MA, meaning a moving average of order \(m\).
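As a small sketch of the formula, the code below computes a 5-MA on a short made-up series using `stats::filter()` from base R (a stand-in for the `slider::slide_dbl()` call used in these notes), and checks one value by hand.

```r
y <- c(5, 7, 6, 9, 12, 11, 14)   # made-up series
m <- 5                           # order of the moving average
k <- (m - 1) / 2                 # k = 2 periods on either side of t

# sides = 2 centres the equally weighted window on each t
ma5 <- stats::filter(y, rep(1 / m, m), sides = 2)

# The estimate at t = 3 is the plain mean of y[1], ..., y[5]
by_hand <- mean(y[(3 - k):(3 + k)])

as.numeric(ma5)   # NA for the first and last k = 2 positions
by_hand           # equals ma5[3]
```

The first and last \(k\) values are missing because the window does not fit there, which is why plotting a 5-MA drops four points in total.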

global_economy %>%
  filter(Country == "Australia") %>%
  autoplot(Exports) +
  labs(y = "% of GDP", title = "Total Australian exports")

aus_exports <- global_economy %>%
  filter(Country == "Australia") %>%
  mutate(
    `5-MA` = slider::slide_dbl(Exports, mean,
                .before = 2, .after = 2, .complete = TRUE)
  )

aus_exports %>%
  autoplot(Exports) +
  geom_line(aes(y = `5-MA`), colour = "#D55E00") +
  labs(y = "% of GDP",
       title = "Total Australian exports") +
  guides(colour = guide_legend(title = "series"))
## Warning: Removed 4 row(s) containing missing values (geom_path).

The 5-MA captures the main movement of the series without the minor random fluctuations.

Moving averages of moving averages? Yes: applying a 2-MA to an even-order moving average makes the result symmetric (centered).

beer <- aus_production %>%
  filter(year(Quarter) >= 1992) %>%
  select(Quarter, Beer)
beer_ma <- beer %>%
  mutate(
    `4-MA` = slider::slide_dbl(Beer, mean,
                .before = 1, .after = 2, .complete = TRUE),
    `2x4-MA` = slider::slide_dbl(`4-MA`, mean,
                .before = 1, .after = 0, .complete = TRUE)
  )

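In practice, centered MAs are most often used to estimate the trend-cycle from seasonal data. For example, a \(2\times 12\)-MA estimates the trend-cycle of monthly data with annual seasonality: every month of the year receives equal weight, so the seasonal variation averages out.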

us_retail_employment_ma <- us_retail_employment %>%
  mutate(
    `12-MA` = slider::slide_dbl(Employed, mean,
                .before = 5, .after = 6, .complete = TRUE),
    `2x12-MA` = slider::slide_dbl(`12-MA`, mean,
                .before = 1, .after = 0, .complete = TRUE)
  )
us_retail_employment_ma %>%
  autoplot(Employed, colour = "gray") +
  geom_line(aes(y = `2x12-MA`), colour = "#D55E00") +
  labs(y = "Persons (thousands)",
       title = "Total employment in US retail")
## Warning: Removed 12 row(s) containing missing values (geom_path).

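Finally, there are weighted moving averages. For example, a \(2\times 4\)-MA is equivalent to a weighted 5-MA with weights \(\left[\frac{1}{8},\frac{1}{4},\frac{1}{4},\frac{1}{4},\frac{1}{8}\right]\). The benefit of a weighted MA is a smoother estimate of the trend-cycle: instead of observations entering and leaving the average at full weight, their weights increase slowly and then decrease slowly.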
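The equivalence between a 2x4-MA and a weighted 5-MA can be verified numerically. The sketch below uses `stats::filter()` from base R rather than slider, on made-up quarterly values. For an even-length filter with `sides = 2`, `stats::filter()` places the extra term forward in time, which corresponds to `.before = 1, .after = 2` in the code above.

```r
y <- c(40, 38, 45, 52, 43, 41, 47, 56)   # made-up quarterly values

# 4-MA covering t-1, t, t+1, t+2
ma4 <- stats::filter(y, rep(1 / 4, 4), sides = 2)

# 2-MA of the 4-MA, averaging the values at t-1 and t
ma2x4 <- stats::filter(ma4, c(1 / 2, 1 / 2), sides = 1)

# Weighted 5-MA applied directly to the data
w5 <- stats::filter(y, c(1 / 8, 1 / 4, 1 / 4, 1 / 4, 1 / 8), sides = 2)

data.frame(y, ma4 = as.numeric(ma4), ma2x4 = as.numeric(ma2x4),
           w5 = as.numeric(w5))
all.equal(as.numeric(ma2x4), as.numeric(w5))
```

Both routes are missing at the same positions (the first two and last two quarters) and agree everywhere else.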

3.4 Classical Decomposition

Classical decomposition comes in two forms: additive and multiplicative. Both assume that the seasonal component is constant from year to year (for example, that December has higher consumer spending than January by the same amount every year). For multiplicative seasonality, the \(m\) values that form the seasonal component are sometimes called the “seasonal indices”, where \(m\) is the seasonal period.

Additive decomposition.

  1. If \(m\) (the seasonal period, such as 4 or 12) is even, compute the trend-cycle component \(\hat{T}_t\) using a \(2\times m\)-MA. If \(m\) is odd, use an \(m\)-MA.

  2. Calculate the detrended series: \(y_t-\hat{T}_t\).

  3. Estimate the seasonal component \(\hat{S}_t\) for each season by averaging the detrended values for that season.

  4. Calculate the remainder by subtracting the estimated seasonal and trend-cycle components: \(\hat{R}_t=y_t-\hat{T}_t-\hat{S}_t\).
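The four steps can be sketched in base R. This is an illustration under assumptions, not the code behind `classical_decomposition()`: it uses the built-in `co2` monthly series as stand-in data, and it adds one step that is not in the list above, namely centering the seasonal averages so that they sum to zero (as `stats::decompose()` does).

```r
y <- co2                        # built-in monthly series, used as stand-in data
m <- frequency(y)               # seasonal period, 12 here

# Step 1: 2 x m-MA for even m (written as one weighted MA), m-MA for odd m
w <- if (m %% 2 == 0) c(0.5, rep(1, m - 1), 0.5) / m else rep(1 / m, m)
trend <- stats::filter(y, w, sides = 2)

# Step 2: detrended series
detrended <- y - trend

# Step 3: average the detrended values for each month
pos <- as.integer(cycle(y))     # position in the year, 1 to 12
seasonal_means <- tapply(as.numeric(detrended), pos, mean, na.rm = TRUE)
seasonal_means <- seasonal_means - mean(seasonal_means)  # extra step: centre on zero
seasonal <- as.numeric(seasonal_means[pos])              # repeat the pattern every year

# Step 4: remainder
remainder <- y - trend - seasonal

round(seasonal_means, 2)
```

The trend and remainder are missing for the first and last six months, because the centered window of 13 observations does not fit there.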

us_retail_employment %>%
  model(
    classical_decomposition(Employed, type = "additive")
  ) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical additive decomposition of total
                  US retail employment")
## Warning: Removed 6 row(s) containing missing values (geom_path).

Multiplicative decomposition: this is the same as the additive version, except that the subtractions are replaced with divisions.

  1. If \(m\) (the seasonal period, such as 4 or 12) is even, compute the trend-cycle component \(\hat{T}_t\) using a \(2\times m\)-MA. If \(m\) is odd, use an \(m\)-MA.

  2. Calculate the detrended series: \(y_t/\hat{T}_t\).

  3. Estimate the seasonal component \(\hat{S}_t\) for each season by averaging the detrended values for that season.

  4. Calculate the remainder by dividing out the estimated seasonal and trend-cycle components: \(\hat{R}_t=y_t/(\hat{T}_t\hat{S}_t)\).
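A base R sketch of the multiplicative steps looks almost the same, with ratios instead of differences. As assumptions for this illustration, it uses the built-in `AirPassengers` monthly series as stand-in data and rescales the seasonal indices so that they average to one, which is the convention `stats::decompose()` follows.

```r
y <- AirPassengers              # built-in monthly series, used as stand-in data
m <- frequency(y)               # seasonal period, 12 here

# Step 1: trend-cycle from a 2 x 12-MA, written as one weighted MA
w <- c(0.5, rep(1, m - 1), 0.5) / m
trend <- stats::filter(y, w, sides = 2)

# Step 2: detrend by division
detrended <- y / trend

# Step 3: seasonal indices, one per month
pos <- as.integer(cycle(y))
seasonal_idx <- tapply(as.numeric(detrended), pos, mean, na.rm = TRUE)
seasonal_idx <- seasonal_idx / mean(seasonal_idx)   # extra step: average to one
seasonal <- as.numeric(seasonal_idx[pos])

# Step 4: remainder by division
remainder <- y / (trend * seasonal)

round(seasonal_idx, 2)
```

An index above one marks a month that is typically above the trend-cycle, and an index below one marks a month that is typically below it.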

Classical decomposition is not recommended, as there are now better alternatives. Its main problems are: the trend-cycle estimate (and therefore the remainder) is unavailable for the first and last few observations; the trend-cycle estimate tends to over-smooth rapid rises and falls; the seasonal component is assumed to repeat identically from year to year; and the method is not robust to unusual values in particular years.

3.5 Methods used by official statistics agencies

Official statistics agencies commonly use variants of the X-11 method, which originated in the US Census Bureau, or the SEATS method. Both are designed to work only with quarterly and monthly data.

X-11 Method

X-11 is based on classical decomposition, with many modifications. It handles trading day variation, holiday effects, and the effects of known predictors, and it is robust to outliers.

x11_dcmp <- us_retail_employment %>%
  model(x11 = X_13ARIMA_SEATS(Employed ~ x11())) %>%
  components()
autoplot(x11_dcmp) +
  labs(title =
    "Decomposition of total US retail employment using X-11.")

The X-11 trend-cycle captures the sudden fall in employment around 2009 better than the STL and classical decompositions shown earlier.

x11_dcmp %>%
  ggplot(aes(x = Month)) +
  geom_line(aes(y = Employed, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Persons (thousands)",
       title = "Total employment in US retail") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

x11_dcmp %>%
  gg_subseries(seasonal)

SEATS (Seasonal Extraction in ARIMA Time Series) Method:

SEATS was developed at the Bank of Spain.

seats_dcmp <- us_retail_employment %>%
  model(seats = X_13ARIMA_SEATS(Employed ~ seats())) %>%
  components()
autoplot(seats_dcmp) +
  labs(title =
    "Decomposition of total US retail employment using SEATS")

3.6 STL Decomposition

STL is a method for decomposing time series. The name stands for “Seasonal and Trend decomposition using Loess”. Its advantages over the earlier methods are that it handles any type of seasonality (not only monthly and quarterly), the seasonal component is allowed to change over time, the smoothness of the trend-cycle can be controlled by the user, and it can be made robust to outliers.

STL also has disadvantages: it does not handle trading day or calendar variation automatically, and it only provides additive decompositions. A multiplicative decomposition can be obtained by first taking logs of the data and then back-transforming the components.

us_retail_employment %>%
  model(
    STL(Employed ~ trend(window = 7) +
                   season(window = "periodic"),
    robust = TRUE)) %>%
  components() %>%
  autoplot()

The two main parameters to choose are the trend-cycle window, trend(window = ?), and the seasonal window, season(window = ?). These control how rapidly the trend-cycle and seasonal components can change: the smaller the value, the faster the component is allowed to change. Both values should be odd numbers. Setting the seasonal window to be infinite, which is what season(window = "periodic") does, forces the seasonal component to be periodic, that is, identical across all years.
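The effect of the seasonal window can be seen with base R's `stats::stl()`, used here as a stand-in for `STL()` in the code above. As an assumption for this sketch, its `s.window` and `t.window` arguments are taken to play the roles of the seasonal and trend windows. The data is the built-in `nottem` monthly series, and `seasonal_spread()` is a hypothetical helper.

```r
y <- nottem   # built-in monthly series, used as stand-in data

# Periodic seasonality versus a short seasonal window that can evolve
fit_periodic <- stl(y, s.window = "periodic", t.window = 7, robust = TRUE)
fit_flexible <- stl(y, s.window = 7, t.window = 7, robust = TRUE)

# Hypothetical helper: for each calendar month, how much does the seasonal
# component vary across years? Returns the largest such range.
seasonal_spread <- function(fit) {
  s <- as.numeric(fit$time.series[, "seasonal"])
  max(tapply(s, as.integer(cycle(y)), function(v) diff(range(v))))
}

seasonal_spread(fit_periodic)   # zero: every year has the same seasonal pattern
seasonal_spread(fit_flexible)   # positive: the pattern changes over time
```

A small seasonal window lets the seasonal pattern drift from year to year, while the periodic setting pins it to a single repeating pattern.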

3.7 Exercises
