- Time series data can exhibit a variety of patterns.
- trend
- seasonality
- cycles
- remainder component
- It is often helpful to split a time series into several components, each representing an underlying pattern category.
2022-09-08
When decomposing a time series, it is helpful to first transform or adjust the series.
Transforming the series first helps keep the decomposition (and later analysis) as simple as possible.
So we will begin by discussing transformations and adjustments.
Some of the variation seen in seasonal data may be due to simple calendar effects.
In such cases, it is usually much easier to remove the variation before doing any further analysis.
Removing this variation is known as calendar adjustment, or trading day adjustment.
For example, monthly sales in a retail store vary between months simply because each month has a different number of trading days, in addition to the seasonal variation across the year.
So to perform the calendar adjustment, we can compute the sales per trading day in each month.
Then we effectively remove the calendar variation.
Sales
## # A tsibble: 12 x 3 [1M]
##       Month trading_days sales
##       <mth>        <dbl> <dbl>
##  1 2007 Jan           23     4
##  2 2007 Feb           20     4
##  3 2007 Mar           22     5
##  4 2007 Apr           21     6
##  5 2007 May           23     5
##  6 2007 Jun           21     6
##  7 2007 Jul           22     6
##  8 2007 Aug           23     4
##  9 2007 Sep           20     3
## 10 2007 Oct           23     4
## 11 2007 Nov           22     5
## 12 2007 Dec           21     7
Sales <- Sales %>%
  mutate(adj_sales = sales / trading_days)
Sales
## # A tsibble: 12 x 4 [1M]
##       Month trading_days sales adj_sales
##       <mth>        <dbl> <dbl>     <dbl>
##  1 2007 Jan           23     4     0.174
##  2 2007 Feb           20     4     0.2
##  3 2007 Mar           22     5     0.227
##  4 2007 Apr           21     6     0.286
##  5 2007 May           23     5     0.217
##  6 2007 Jun           21     6     0.286
##  7 2007 Jul           22     6     0.273
##  8 2007 Aug           23     4     0.174
##  9 2007 Sep           20     3     0.15
## 10 2007 Oct           23     4     0.174
## 11 2007 Nov           22     5     0.227
## 12 2007 Dec           21     7     0.333
global_economy %>% filter(Country == "Brazil") %>% autoplot(GDP)
global_economy %>% filter(Country == "Brazil") %>% autoplot(GDP / Population)
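To adjust for inflation, let \(z_t\) denote the price index and \(y_t\) the original value at time \(t\). The value expressed in 2010 prices is then \[x_{t} = \frac{y_{t}}{z_{t}} \times z_{2010}\]
- Examples of price indexes are the CPI and the GDP deflator.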
print_retail <- aus_retail %>%
  filter(Industry == "Newspaper and book retailing") %>%
  group_by(Industry) %>%
  index_by(Year = year(Month)) %>%
  summarise(Turnover = sum(Turnover))
aus_economy <- global_economy %>%
  filter(Code == "AUS")
# left_join() keeps every row of the left dataset and adds matching columns from the right one
# pivot_longer() reshapes the two turnover columns into tidy (long) format
xt <- print_retail %>%
  left_join(aus_economy, by = "Year") %>%
  mutate(Adjusted_turnover = Turnover / CPI * 100) %>%
  pivot_longer(c(Turnover, Adjusted_turnover), values_to = "Turnover") %>%
  ggplot(aes(x = Year, y = Turnover)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y") +
  labs(title = "Turnover: Australian print media industry", y = "$AU")
xt
If the data show variation that increases or decreases with the level of the series, then a transformation can be useful.
It should be kept in mind that data transformation simply changes the relative magnitude of the data and does not change the essential characteristics of the data patterns.
Growth rate - a transformation used when the interest is in forecasting the rate of growth rather than the absolute level of variables such as production, interest rates, unemployment, exports, or capital formation.
This transformation is particularly useful when an analyst is developing forecasts with multiple economic variables measured in different units.
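As a minimal sketch of the growth-rate transformation, the snippet below computes the period-over-period percentage change in base R. The formula \(100 \times (y_t - y_{t-1})/y_{t-1}\) is the usual definition and is assumed here, since the notes do not spell it out. The numbers are hypothetical and only illustrate the arithmetic.

```r
# Hypothetical levels of an economic variable over four periods (illustrative only).
y <- c(200, 210, 231, 219.45)

# Period-over-period growth rate in percent: 100 * (y_t - y_{t-1}) / y_{t-1}.
growth_pct <- 100 * diff(y) / head(y, -1)
growth_pct
```

Because the result is in percent, growth rates of variables measured in different units can be compared directly.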
Other common transformations are logarithms and power transformations.
Denote original observations as \(y_1,\dots,y_T\) and transformed observations as \(w_1, \dots, w_T\).
Logarithms are useful because they are interpretable: changes in a log value are relative (or percentage) changes on the original scale (\(w_t = \log(y_t)\))
Additionally, square roots and cube roots can be used. These are called power transformations because they can be written in the form \(w_t = \sqrt[p]{y_t}\) (although they are not so interpretable).
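The claim that changes in a log value are relative changes on the original scale can be checked numerically. The sketch below uses a hypothetical series that grows by exactly 10% per period. It is an illustration in base R, not part of the original notes.

```r
# Hypothetical series that grows by exactly 10% each period (illustrative only).
y <- c(100, 110, 121)

# Differences of logs are the same for every 10% step, whatever the level.
log_diff <- diff(log(y))

# Exact relative change on the original scale, for comparison.
pct_change <- diff(y) / head(y, -1)

log_diff           # both entries equal log(1.1)
exp(log_diff) - 1  # recovers the relative change
```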
food <- aus_retail %>% filter(Industry == "Food retailing") %>% summarise(Turnover = sum(Turnover))
## Mathematical transformations
food %>% autoplot(sqrt(Turnover)) + labs(y = "Square root turnover")
food %>% autoplot(Turnover^(1/3)) + labs(y = "Cube root turnover")
food %>% autoplot(log(Turnover)) + labs(y = "Log turnover")
food %>% autoplot(-1/Turnover) + labs(y = "Inverse turnover")
Each of these transformations is close to a member of the family of Box-Cox transformations: \[w_t = \left\{\begin{array}{ll} \log(y_t), & \quad \lambda = 0; \\ (\text{sign}(y_t)|y_t|^\lambda-1)/\lambda , & \quad \lambda \ne 0. \end{array}\right.\]
A good value of \(\lambda\) is one which makes the size of the seasonal variation about the same across the whole series, as that makes the forecasting model simpler.
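To make the piecewise Box-Cox definition concrete, here is a small base R sketch that follows the formula term by term. The name `box_cox_sketch` is hypothetical and the input values are illustrative. In practice the `box_cox()` function used elsewhere in these notes would be the natural choice.

```r
# A sketch of the Box-Cox family as defined above; box_cox_sketch is a
# hypothetical name, not the fpp3 function.
box_cox_sketch <- function(y, lambda) {
  if (lambda == 0) {
    log(y)
  } else {
    (sign(y) * abs(y)^lambda - 1) / lambda
  }
}

y <- c(1, 4, 9, 16)  # illustrative values only

box_cox_sketch(y, 0)    # same as log(y)
box_cox_sketch(y, 0.5)  # a shifted and scaled square root: 2 * (sqrt(y) - 1)
box_cox_sketch(y, 1)    # y - 1, so the shape of the series is unchanged
```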
The guerrero feature (Guerrero, 1993) can be used to choose a value of lambda for you.
Check the Book for an interactive transformation (3.1)
food %>% features(Turnover, features = guerrero)
## # A tibble: 1 × 1
##   lambda_guerrero
##             <dbl>
## 1          0.0524
food %>% autoplot(box_cox(Turnover, 0.0524)) + labs(y = "Box-Cox transformed turnover")
log1p() can also be useful for data with zeros; it computes \(\log(1+x)\).
Recall the three types of time series patterns:
Trend - pattern exists when there is a long-term increase or decrease in the data.
Cyclic - pattern exists when data exhibit rises and falls that are not of fixed period (duration usually of at least 2 years).
Seasonal - pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week).
\[y_t = f(S_t, T_t, R_t)\]
Additive decomposition: \(y_t = S_t + T_t + R_t.\)
Multiplicative decomposition: \(y_t = S_t \times T_t \times R_t.\)
\[ y_t = S_t \times T_t \times R_t \quad\Rightarrow\quad \log y_t = \log S_t + \log T_t + \log R_t. \]
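The equivalence between a multiplicative decomposition and an additive decomposition of the logged series can be verified with a few numbers. The component values below are hypothetical and chosen only so that all of them are positive.

```r
# Hypothetical positive components for four periods (illustrative only).
seasonal  <- c(0.9, 1.1, 1.2, 0.8)
trend     <- c(100, 102, 104, 106)
remainder <- c(1.01, 0.99, 1.02, 0.98)

# Multiplicative model on the original scale.
y <- seasonal * trend * remainder

# On the log scale the same components combine additively.
log_sum <- log(seasonal) + log(trend) + log(remainder)
all.equal(log(y), log_sum)
```

This is why a log transformation followed by an additive decomposition is a common substitute for a multiplicative decomposition.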
us_retail_employment <- us_employment %>%
  filter(year(Month) >= 1990, Title == "Retail Trade") %>%
  select(-Series_ID)
us_retail_employment
## # A tsibble: 357 x 3 [1M]
##       Month Title        Employed
##       <mth> <chr>           <dbl>
##  1 1990 Jan Retail Trade   13256.
##  2 1990 Feb Retail Trade   12966.
##  3 1990 Mar Retail Trade   12938.
##  4 1990 Apr Retail Trade   13012.
##  5 1990 May Retail Trade   13108.
##  6 1990 Jun Retail Trade   13183.
##  7 1990 Jul Retail Trade   13170.
##  8 1990 Aug Retail Trade   13160.
##  9 1990 Sep Retail Trade   13113.
## 10 1990 Oct Retail Trade   13185.
## # … with 347 more rows
us_retail_employment %>% autoplot(Employed) + labs(y="Persons (thousands)", title="Total employment in US retail")
us_retail_employment %>% model(stl = STL(Employed))
## # A mable: 1 x 1
##       stl
##   <model>
## 1   <STL>
dcmp <- us_retail_employment %>%
  model(stl = STL(Employed))
components(dcmp)
## # A dable: 357 x 7 [1M]
## # Key:     .model [1]
## # :        Employed = trend + season_year + remainder
##    .model    Month Employed  trend season_year remainder season_adjust
##    <chr>     <mth>    <dbl>  <dbl>       <dbl>     <dbl>         <dbl>
##  1 stl    1990 Jan   13256. 13288.      -33.0      0.836        13289.
##  2 stl    1990 Feb   12966. 13269.     -258.     -44.6          13224.
##  3 stl    1990 Mar   12938. 13250.     -290.     -22.1          13228.
##  4 stl    1990 Apr   13012. 13231.     -220.       1.05         13232.
##  5 stl    1990 May   13108. 13211.     -114.      11.3          13223.
##  6 stl    1990 Jun   13183. 13192.      -24.3     15.5          13207.
##  7 stl    1990 Jul   13170. 13172.      -23.2     21.6          13193.
##  8 stl    1990 Aug   13160. 13151.       -9.52    17.8          13169.
##  9 stl    1990 Sep   13113. 13131.      -39.5     22.0          13153.
## 10 stl    1990 Oct   13185. 13110.       61.6     13.2          13124.
## # … with 347 more rows
us_retail_employment %>% autoplot(Employed, color='gray') + autolayer(components(dcmp), trend, color='#D55E00') + labs(y="Persons (thousands)", title="Total employment in US retail")
components(dcmp) %>% autoplot()
If the variation due to seasonality is not of primary interest, the seasonally adjusted series can be useful.
For example, monthly unemployment data are usually seasonally adjusted in order to highlight variation due to the underlying state of the economy rather than the seasonal variation.
An increase in unemployment due to school leavers seeking work is seasonal variation, while an increase in unemployment due to an economic recession is non-seasonal.
Most economic analysts who study unemployment data are more interested in the non-seasonal variation. Consequently, employment data (and many other economic series) are usually seasonally adjusted.
us_retail_employment %>% autoplot(Employed, color='gray') + autolayer(components(dcmp), season_adjust, color='#0072B2') + labs(y="Persons (thousands)", title="Total employment in US retail")
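For an additive decomposition, the seasonally adjusted series is simply the data minus the seasonal component. The sketch below checks this against the first three rows of the components table printed earlier. The inputs are the displayed (rounded) values, so this is an arithmetic illustration rather than a re-run of the model.

```r
# First three rows of the components table printed earlier
# (values as displayed, so they are rounded).
employed    <- c(13256, 12966, 12938)
season_year <- c(-33.0, -258, -290)

# Additive seasonal adjustment: remove the seasonal component from the data.
season_adjust <- employed - season_year
season_adjust
```

The result agrees with the `season_adjust` column in the printed table.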
The classical method of time series decomposition originated in the 1920s and was widely used until the 1950s.
It still forms the basis of many time series decomposition methods, so it is important to understand how it works.
The first step in a classical decomposition is to use a moving average method to estimate the trend-cycle.
A moving average of order \(m\) can be written as:
\[ \hat{T}_t = \frac{1}{m}\sum_{j=-k}^ky_{t+j} \]
where \(m=2k+1\). That is, the estimate of the trend-cycle at time \(t\) is obtained by averaging values of the time series within \(k\) periods of \(t\).
The average eliminates some of the randomness in the data, leaving a smooth trend-cycle component. We call this an \(m\)-MA, meaning a moving average of order \(m\).
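The definition of an \(m\)-MA can be sketched in base R with `stats::filter()`, which applies a set of weights to a sliding window. The helper name `moving_average` and the input values are hypothetical. The notes themselves use `slider::slide_dbl()` for the same purpose.

```r
# A centred moving average of odd order m, using base R's stats::filter().
# moving_average is a hypothetical helper name.
moving_average <- function(y, m) {
  stopifnot(m %% 2 == 1)  # odd order, so that m = 2k + 1
  as.numeric(stats::filter(y, rep(1 / m, m), sides = 2))
}

y <- c(3, 5, 4, 8, 6, 7, 9)  # illustrative values only
ma3 <- moving_average(y, 3)
ma3  # NA at both ends, where fewer than k = 1 neighbours exist on one side
```

Each non-missing entry is the mean of the observation and its \(k\) neighbours on either side, which is why the first and last \(k\) values are missing.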
global_economy %>% filter(Country == "Australia") %>% autoplot(Exports) + labs(y = "% of GDP", title = "Total Australian exports")
A moving average can be computed using slide_dbl() from the slider package, which applies a function to sliding/moving time windows. Here we use the mean() function.
aus_exports <- global_economy %>%
filter(Country == "Australia") %>%
select(Exports)%>%
mutate(`5-MA` = slider::slide_dbl(Exports, mean,
.before = 2, .after = 2, .complete = TRUE))
aus_exports
## # A tsibble: 58 x 3 [1Y]
##    Exports  Year `5-MA`
##      <dbl> <dbl>  <dbl>
##  1    13.0  1960   NA
##  2    12.4  1961   NA
##  3    13.9  1962   13.5
##  4    13.0  1963   13.5
##  5    14.9  1964   13.6
##  6    13.2  1965   13.4
##  7    12.9  1966   13.3
##  8    12.9  1967   12.7
##  9    12.3  1968   12.6
## 10    12.0  1969   12.6
## # … with 48 more rows
The 5-MA column provides an estimate of the trend-cycle using a moving average of order 5.
aus_exports %>%
autoplot(Exports) +
geom_line(aes(y = `5-MA`), colour = "#D55E00") +
labs(y = "% of GDP",
title = "Total Australian exports") +
guides(colour = guide_legend(title = "series"))
Notice that the trend-cycle (in orange) is smoother than the original data and captures the main movement of the time series without all of the minor fluctuations.
The order of the moving average determines the smoothness of the trend-cycle estimate.
In general, a larger order means a smoother curve.
aus_exports %>%
  mutate(`7-MA` = slider::slide_dbl(Exports, mean,
                .before = 3, .after = 3, .complete = TRUE)) %>%
autoplot(Exports) +
geom_line(aes(y = `7-MA`), colour = "#0072B2") +
labs(y = "% of GDP",
title = "Total Australian exports - Moving Average 7") +
guides(colour = guide_legend(title = "series"))
aus_exports %>%
  mutate(`9-MA` = slider::slide_dbl(Exports, mean,
                .before = 4, .after = 4, .complete = TRUE)) %>%
autoplot(Exports) +
geom_line(aes(y = `9-MA`), colour = "#009E73") +
labs(y = "% of GDP",
title = "Total Australian exports - Moving Average 9") +
guides(colour = guide_legend(title = "series"))
Simple moving averages such as these are usually of an odd order (e.g., 3, 5, 7, etc.).
This is so they are symmetric: in a moving average of order \(m=2k+1\), the middle observation and \(k\) observations on either side are averaged.
But if \(m\) were even, it would no longer be symmetric.
It is possible to apply a moving average to a moving average.
One reason for doing this is to make an even-order moving average symmetric.
For example, we might take a moving average of order 4, and then apply another moving average of order 2.
beer <- aus_production %>%
filter(year(Quarter) >= 1992) %>%
select(Quarter, Beer)
beer_ma <- beer %>%
mutate(`4-MA` = slider::slide_dbl(Beer, mean,.before = 1, .after = 2, .complete = TRUE),
`2x4-MA` = slider::slide_dbl(`4-MA`, mean,.before = 1, .after = 0, .complete = TRUE))
beer_ma
## # A tsibble: 74 x 4 [1Q]
##    Quarter  Beer `4-MA` `2x4-MA`
##      <qtr> <dbl>  <dbl>    <dbl>
##  1 1992 Q1   443    NA       NA
##  2 1992 Q2   410   451.      NA
##  3 1992 Q3   420   449.     450
##  4 1992 Q4   532   452.     450.
##  5 1993 Q1   433   449      450.
##  6 1993 Q2   421   444      446.
##  7 1993 Q3   410   448      446
##  8 1993 Q4   512   438      443
##  9 1994 Q1   449   441.     440.
## 10 1994 Q2   381   446      444.
## # … with 64 more rows
The notation 2×4-MA in the last column means a 4-MA followed by a 2-MA.
The 2×4-MA becomes a weighted average of observations that is symmetric:
\[ \hat{T}_t = \frac{1}{2}\left[\frac{1}{4}(y_{t-2}+y_{t-1}+y_{t}+y_{t+1})+\frac{1}{4}(y_{t-1}+y_{t}+y_{t+1}+y_{t+2})\right] \]
\[ \hat{T}_t = \frac{1}{8}y_{t-2}+\frac{1}{4}y_{t-1}+\frac{1}{4}y_{t}+\frac{1}{4}y_{t+1}+\frac{1}{8}y_{t+2} \]
- Other combinations of moving averages are also possible.
In general, a 2×m-MA is equivalent to a weighted moving average of order \(m+1\) where all observations take the weight \(1/m\), except for the first and last terms, which take weights \(1/(2m)\).
So, if the seasonal period is even and of order \(m\), we use a 2×m-MA to estimate the trend-cycle.
If the seasonal period is odd and of order \(m\), we use an m-MA to estimate the trend-cycle.
For example, a 2×12-MA can be used to estimate the trend-cycle of monthly data with annual seasonality,
and a 7-MA can be used to estimate the trend-cycle of daily data with a weekly seasonality.
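The weights of a 2×m-MA can be derived by averaging two m-MAs that are offset by one period. The base R sketch below does this for \(m=4\) and applies the weights to the first ten quarterly beer values printed in the table above. The helper name `two_by_m_weights` is hypothetical.

```r
# Weights of a 2xm-MA: average two m-MAs that are offset by one period.
two_by_m_weights <- function(m) {
  (c(rep(1 / m, m), 0) + c(0, rep(1 / m, m))) / 2
}

w <- two_by_m_weights(4)
w  # 1/8, 1/4, 1/4, 1/4, 1/8

# First ten quarterly beer values as printed in the table above.
beer <- c(443, 410, 420, 532, 433, 421, 410, 512, 449, 381)
ma_2x4 <- as.numeric(stats::filter(beer, w, sides = 2))
ma_2x4[3:4]  # 450 and 450.125, in line with the `2x4-MA` column
```

Because the weights are symmetric and there are \(m+1\) of them, the estimate is centred on time \(t\) even though \(m\) is even.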
Let’s revisit the US retail employment data without using the STL() methodology.
These are monthly data with annual seasonality, so what should the order of the moving average be to account for seasonality?
A 2×12-MA.
us_retail_employment_ma <- us_retail_employment %>%
mutate(
`12-MA` = slider::slide_dbl(Employed, mean,
.before = 5, .after = 6, .complete = TRUE),
`2x12-MA` = slider::slide_dbl(`12-MA`, mean,
.before = 1, .after = 0, .complete = TRUE)
)
us_retail_employment_ma %>% autoplot(Employed, colour = "gray") +
geom_line(aes(y = `2x12-MA`), colour = "#D55E00") + labs(y = "Persons (thousands)",
title = "Total employment in US retail")
The smooth line shows no seasonality; it is almost the same as the trend-cycle estimated by the STL() function.
Any other choice for the order of the moving average (except for 24, 36, etc.) would have resulted in a smooth line that showed some seasonal fluctuations.
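One way to see why the order must match the seasonal period is to smooth a series that contains nothing but a seasonal pattern. In the sketch below the monthly pattern is hypothetical. The 2×12-MA returns a constant, while a 5-MA still shows seasonal swings.

```r
# A purely seasonal monthly pattern, repeated for four years (hypothetical values).
pattern <- c(5, 3, 2, 4, 6, 8, 9, 7, 6, 5, 8, 12)
seasonal_series <- rep(pattern, times = 4)

# 2x12-MA weights: 1/24 at both ends, 1/12 in between.
w_2x12 <- c(1 / 24, rep(1 / 12, 11), 1 / 24)
ma_2x12 <- as.numeric(stats::filter(seasonal_series, w_2x12, sides = 2))

# A 5-MA, by contrast, does not span a whole seasonal cycle.
ma_5 <- as.numeric(stats::filter(seasonal_series, rep(1 / 5, 5), sides = 2))

range(ma_2x12, na.rm = TRUE)  # both ends equal mean(pattern): seasonality removed
range(ma_5, na.rm = TRUE)     # a wide range: seasonal swings remain
```

Every window of the 2×12-MA gives each calendar month a total weight of 1/12, so the seasonal effects average out exactly.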
A major advantage of weighted moving averages is that they yield a smoother estimate of the trend-cycle.
Instead of observations entering and leaving the calculation at full weight, their weights slowly increase and then slowly decrease, resulting in a smoother curve.
The classical method originated in the 1920s, and it forms the starting point for most other methods of time series decomposition.
There are two forms of classical decomposition: additive and multiplicative.
Let’s assume a time series with seasonal period \(m\) (e.g., \(m=4\) for quarterly data, \(m=12\) for monthly data).
We assume that the seasonal component is constant from year to year.
us_retail_employment %>%
model(classical_decomposition(Employed, type = "additive")) %>%
components() %>% autoplot() +
labs(title = "Classical additive decomposition of total
US retail employment")
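The structure of a classical additive decomposition can also be inspected with base R's `decompose()`, which is a different function from the `classical_decomposition()` model used above but applies the same classical method. The sketch below runs it on the first ten quarterly beer values printed earlier. With so few observations the estimates are only illustrative.

```r
# First ten quarterly beer values printed earlier, as a quarterly ts from 1992 Q1.
beer <- ts(c(443, 410, 420, 532, 433, 421, 410, 512, 449, 381),
           frequency = 4, start = c(1992, 1))

# Base R's decompose() performs a classical decomposition.
dc <- decompose(beer, type = "additive")

dc$figure    # one seasonal effect per quarter, centred to sum to zero
dc$seasonal  # the same four effects repeated every year

# Where the trend estimate exists, the components add back up to the data.
recon <- as.numeric(dc$trend + dc$seasonal + dc$random)
ok <- !is.na(recon)
all.equal(recon[ok], as.numeric(beer)[ok])
```

The repeated values in `dc$seasonal` show the assumption that the seasonal component is constant from year to year, and the missing values at both ends of `dc$trend` come from the 2×4-MA.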
Advantages
Disadvantages
Advantages
Disadvantages
us_retail_employment %>%
model(STL(Employed ~ season(window=9), robust=TRUE)) %>%
components() %>% autoplot() +
labs(title = "STL decomposition: US retail employment")
us_retail_employment %>%
model(STL(Employed ~ season(window=7), robust=TRUE)) %>%
components() %>% autoplot() +
labs(title = "STL decomposition: US retail employment")
us_retail_employment %>%
model(STL(Employed ~ season(window=5), robust=TRUE)) %>%
components() %>% autoplot() +
labs(title = "STL decomposition: US retail employment")
us_retail_employment %>%
model(STL(Employed ~ trend(window=5) + season(window="periodic"), robust = TRUE)) %>%
components() %>% autoplot() +
labs(title = "STL decomposition: US retail employment")
The two main parameters to choose are the trend window trend(window = ?) and the seasonal window season(window = ?).
- The trend window is the number of consecutive observations to be used when estimating the trend-cycle.
- The season window is the number of consecutive years to be used in estimating each value in the seasonal component.
- season(window='periodic') is equivalent to an infinite window, so the seasonal component is identical across years.
By default, the STL() function provides a convenient automated STL decomposition using a seasonal window of season(window=13), and the trend window chosen automatically from the seasonal period.
The default setting for monthly data is trend(window=21).
But, as with any automated procedure, the default settings will need adjusting for some time series.
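The effect of a periodic seasonal window can be checked without fpp3 by using base R's `stl()`, whose `s.window` and `t.window` arguments play a role analogous to `season(window = ?)` and `trend(window = ?)`. This is a sketch with a different function and a different dataset (`nottem`, monthly temperatures shipped with base R), so it illustrates the idea rather than reproducing the decompositions above.

```r
# nottem: monthly air temperatures at Nottingham, shipped with base R's datasets.
fit <- stl(nottem, s.window = "periodic")

seasonal  <- as.numeric(fit$time.series[, "seasonal"])
trend     <- as.numeric(fit$time.series[, "trend"])
remainder <- as.numeric(fit$time.series[, "remainder"])

# With a periodic window the seasonal component repeats exactly every year.
all.equal(seasonal[1:12], seasonal[13:24])

# The three components add back up to the original series.
all.equal(seasonal + trend + remainder, as.numeric(nottem))
```

Replacing `"periodic"` with a small odd number such as `s.window = 7` lets the seasonal pattern change over time, which is the trade-off the window parameters control.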