Setup
Global Economy Dataset
## # A tsibble: 15,150 x 9 [1Y]
## # Key: Country [263]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1960 537777811. NA NA 7.02 4.13 8996351
## 2 Afghanistan AFG 1961 548888896. NA NA 8.10 4.45 9166764
## 3 Afghanistan AFG 1962 546666678. NA NA 9.35 4.88 9345868
## 4 Afghanistan AFG 1963 751111191. NA NA 16.9 9.17 9533954
## 5 Afghanistan AFG 1964 800000044. NA NA 18.1 8.89 9731361
## 6 Afghanistan AFG 1965 1006666638. NA NA 21.4 11.3 9938414
## 7 Afghanistan AFG 1966 1399999967. NA NA 18.6 8.57 10152331
## 8 Afghanistan AFG 1967 1673333418. NA NA 14.2 6.77 10372630
## 9 Afghanistan AFG 1968 1373333367. NA NA 15.2 8.90 10604346
## 10 Afghanistan AFG 1969 1408888922. NA NA 15.0 10.1 10854428
## # … with 15,140 more rows
Filter Australia
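- The generating chunk isn't shown; a minimal sketch that would reproduce the printout below (note the extra mean(GDP) column) and the standalone mean. The object name aus is a guess:
aus <- global_economy %>%
filter(Country == "Australia") %>% # Single-country subset: 58 annual observations
mutate(`mean(GDP)` = mean(GDP)) # Constant column holding the average GDP
aus
mean(aus$GDP) # Printed separately as [1] 417399409147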
## # A tsibble: 58 x 10 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Austra… AUS 1960 1.86e10 NA 7.96 14.1 13.0 10276477
## 2 Austra… AUS 1961 1.96e10 2.49 8.14 15.0 12.4 10483000
## 3 Austra… AUS 1962 1.99e10 1.30 8.12 12.6 13.9 10742000
## 4 Austra… AUS 1963 2.15e10 6.21 8.17 13.8 13.0 10950000
## 5 Austra… AUS 1964 2.38e10 6.98 8.40 13.8 14.9 11167000
## 6 Austra… AUS 1965 2.59e10 5.98 8.69 15.3 13.2 11388000
## 7 Austra… AUS 1966 2.73e10 2.38 8.98 15.1 12.9 11651000
## 8 Austra… AUS 1967 3.04e10 6.30 9.29 13.9 12.9 11799000
## 9 Austra… AUS 1968 3.27e10 5.10 9.52 14.5 12.3 12009000
## 10 Austra… AUS 1969 3.66e10 7.04 9.83 13.3 12.0 12263000
## # … with 48 more rows, and 1 more variable: `mean(GDP)` <dbl>
## [1] 417399409147
- We can see the average GDP is around $417 billion, which sits just below the point where the series starts to climb steeply
- How might we want to think about analyzing this data?
  - We probably want some form of relevant adjustment: we know there are more people there over time too, so maybe we think per capita
First Adjustment: Per Capita Adjustments
# Normalizing: GDP per capita
global_economy %>%
filter(Country == "Australia") %>%
autoplot(GDP/Population)
Highest GDP per Capita
global_economy %>%
filter(Year %in% c(2016, 2017)) %>% # Year is numeric, so no quotes needed
mutate(PerCap = GDP/Population) %>%
arrange(desc(PerCap))
## # A tsibble: 524 x 10 [1Y]
## # Key: Country [262]
## Country Code Year GDP Growth CPI Imports Exports Population PerCap
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Monaco MCO 2016 6.47e 9 3.21 NA NA NA 38499 1.68e5
## 2 Liechten… LIE 2016 6.21e 9 NA NA NA NA 37666 1.65e5
## 3 Luxembou… LUX 2017 6.24e10 2.30 111. 194. 230. 599449 1.04e5
## 4 Luxembou… LUX 2016 5.86e10 3.08 110. 186. 221. 582014 1.01e5
## 5 Macao SA… MAC 2017 5.04e10 9.10 136. 32.0 79.4 622567 8.09e4
## 6 Switzerl… CHE 2017 6.79e11 1.09 98.3 53.9 65.0 8466017 8.02e4
## 7 Switzerl… CHE 2016 6.69e11 1.38 97.7 54.6 65.8 8373338 7.99e4
## 8 Isle of … IMN 2016 6.59e 9 7.4 NA NA NA 83737 7.87e4
## 9 Norway NOR 2017 3.99e11 1.92 115. 33.1 35.5 5282223 7.55e4
## 10 Macao SA… MAC 2016 4.53e10 -0.863 134. 34.5 76.1 612167 7.40e4
## # … with 514 more rows
Second Adjustment: Inflation Adjustment
- FRED or World Bank data often supply an inflation adjuster such as the CPI
print_retail <- aus_retail %>%
filter(Industry == "Newspaper and book retailing") %>% # Looking at Print media industry
group_by(Industry) %>%
index_by(Year = year(Month)) %>% # Index by year
summarise(Turnover = sum(Turnover)) # Turnover = Same as revenue
aus_economy <- filter(global_economy, Code == "AUS")
# Create inflation adjusted data & Plot it
print_retail %>%
left_join(aus_economy, by = "Year") %>%
mutate(Adj_turnover = Turnover / CPI) %>% # adjusted turnover = turnover / consumer price index
pivot_longer(c(Turnover, Adj_turnover),
names_to = "Type", values_to = "Turnover") %>%
ggplot(aes(Year, Turnover)) + geom_line() +
facet_grid(vars(Type), scales = "free_y") +
xlab("Years") +ylab(NULL) +
ggtitle("Turnover: Aus print media industry")
Mathematical Transformations
library(tidyquant)
# Daily S&P 500 prices from Yahoo Finance
SnP500 <- tq_get("^GSPC", from = "2019-01-01")
SnP500 %>% as_tsibble(index = date) %>% autoplot(close) # Plot the closing price explicitly
# Let's take it back further
SnP500 <- tq_get("^GSPC", from = "1990-01-01")
SnP500 %>% as_tsibble(index = date) %>% autoplot(close)
Mathematical Transformations
food <- aus_retail %>%
filter(Industry == "Food retailing") %>%
summarize(Turnover = sum(Turnover))
food %>% autoplot(Turnover)
- Looking at this data on a monthly basis, back to the early 1980s
- Clearly an increase over time, and the level of volatility changes too
- A seasonal pattern repeats each year
- Australians spend more money in the retail sector over time
- What we want to do: smooth this volatility out
Square Root
- The least powerful of the three transformations
- Still a lot of volatility (not as extreme)
- Can't handle negative numbers
Cube Root
- Second most powerful (right in the middle): stronger than the square root, weaker than the log
- More effective at damping the variance, but volatility remains
- Can't handle negative numbers
Log Transformation
- Most powerful of the three
- Now the variation around the trend is roughly constant over time
- Doesn't handle zeros; if we have this issue, use the log1p() function, i.e. log(1 + x)
- A sketch applying all three transformations follows below
Box-Cox Transformation
- How to solve for Lambda?
- Guerrero!!
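- The chunk that produced the output below isn't shown; with the feasts features interface it would look like:
food %>% features(Turnover, features = guerrero) # Estimate the Box-Cox lambda via Guerrero's method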
## # A tibble: 1 x 1
## lambda_guerrero
## <dbl>
## 1 0.0524
- The optimal lambda ≈ 0.052; since it is close to zero, this is nearly a log transformation
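- Applying it, a sketch with fabletools' box_cox():
food %>% autoplot(box_cox(Turnover, 0.0524)) +
labs(y = "Box-Cox(Turnover, lambda = 0.0524)")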
Homework for practice
- For the following TS, find the appropriate transformation in order to stabilize the variance
  - US GDP from global_economy
  - Slaughter of Victorian bulls, bullocks, and steers from aus_livestock
  - Electricity demand from vic_elec
  - Gas production from aus_production
- Why is a Box-Cox transformation unhelpful for canadian_gas data?
DeSeasoning
# The us_employment data ships with fpp3; the underlying series can be grabbed from FRED
us_retail_employment <- us_employment %>%
filter(year(Month) >= 1990, Title == "Retail Trade") %>%
select(-Series_ID)
dcmp <- us_retail_employment %>%
model(STL(Employed))
# Original US Retail Employment data
autoplot(us_retail_employment, Employed)
# Seasonally adjusted Data
## We want to pick this data apart, have the trend and the cycle distinct from the season
autoplot(us_retail_employment, Employed, color = "gray") +
autolayer(components(dcmp), season_adjust, color = "blue") + # Season_adjust comes from the decomposition
labs(y = "Persons (thousands", title = "Total employment in US retail")
Moving Averages
aus_exports <- global_economy %>%
filter(Country == "Australia") %>%
select(Exports, Year)
head(aus_exports)
## # A tsibble: 6 x 2 [1Y]
## Exports Year
## <dbl> <dbl>
## 1 13.0 1960
## 2 12.4 1961
## 3 13.9 1962
## 4 13.0 1963
## 5 14.9 1964
## 6 13.2 1965
aus_exports <- global_economy %>%
filter(Country == "Australia") %>%
mutate(
`5-MA` = slider::slide_dbl(Exports, # Pull from Exports
mean, # Calculate the mean for the moving average
.before = 2, # Take the 2 observations before
.after = 2, # And the 2 observations after
.complete = TRUE) # Only compute where the full 5-obs window exists
)
head(aus_exports)
## # A tsibble: 6 x 10 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population `5-MA`
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Australia AUS 1960 1.86e10 NA 7.96 14.1 13.0 10276477 NA
## 2 Australia AUS 1961 1.96e10 2.49 8.14 15.0 12.4 10483000 NA
## 3 Australia AUS 1962 1.99e10 1.30 8.12 12.6 13.9 10742000 13.5
## 4 Australia AUS 1963 2.15e10 6.21 8.17 13.8 13.0 10950000 13.5
## 5 Australia AUS 1964 2.38e10 6.98 8.40 13.8 14.9 11167000 13.6
## 6 Australia AUS 1965 2.59e10 5.98 8.69 15.3 13.2 11388000 13.4
# Note: this can be slightly deceptive because the centred average uses numbers from after the observed date
aus_exports <- global_economy %>%
filter(Country == "Australia") %>%
select(Exports, Year) %>%
mutate(
`5-MA` = slider::slide_dbl(Exports,
mean,
.before = 2,
.after = 2,
.complete = TRUE))
head(aus_exports) # Notice there are 2 missing observations in the very beginning
## # A tsibble: 6 x 3 [1Y]
## Exports Year `5-MA`
## <dbl> <dbl> <dbl>
## 1 13.0 1960 NA
## 2 12.4 1961 NA
## 3 13.9 1962 13.5
## 4 13.0 1963 13.5
## 5 14.9 1964 13.6
## 6 13.2 1965 13.4
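- The next printout is presumably the tail of the same series:
tail(aus_exports) # The centred window also leaves the last 2 observations missing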
## # A tsibble: 6 x 3 [1Y]
## Exports Year `5-MA`
## <dbl> <dbl> <dbl>
## 1 21.5 2012 20.8
## 2 20.0 2013 20.8
## 3 21.1 2014 20.4
## 4 20.0 2015 20.3
## 5 19.3 2016 NA
## 6 21.3 2017 NA
- Always use odd window sizes, because an even window doesn't split evenly around the observation
  - Example… 3 before & 0 after is a total of 4 observations…
  - Or 1 before and 2 after is 4 observations (either way it is uneven)
# Now for a trailing 5-year moving average (past observations only)
aus_exports <- global_economy %>%
filter(Country == "Australia") %>%
select(Exports, Year) %>%
mutate(
`5-MA` = slider::slide_dbl(Exports,
mean,
.before = 4,
.after = 0,
.complete = TRUE))
head(aus_exports) # Missing the 4 initial observations
## # A tsibble: 6 x 3 [1Y]
## Exports Year `5-MA`
## <dbl> <dbl> <dbl>
## 1 13.0 1960 NA
## 2 12.4 1961 NA
## 3 13.9 1962 NA
## 4 13.0 1963 NA
## 5 14.9 1964 13.5
## 6 13.2 1965 13.5
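- And again, presumably the tail:
tail(aus_exports) # The trailing window leaves no gaps at the end of the series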
## # A tsibble: 6 x 3 [1Y]
## Exports Year `5-MA`
## <dbl> <dbl> <dbl>
## 1 21.5 2012 21.2
## 2 20.0 2013 21.2
## 3 21.1 2014 20.8
## 4 20.0 2015 20.8
## 5 19.3 2016 20.4
## 6 21.3 2017 20.3
autoplot(aus_exports, Exports) +
autolayer(aus_exports, `5-MA`, color = "red") + # Note that `5-MA` needs backticks (non-syntactic name)
labs(y = "Exports (% of GDP)", title = "Total Australian exports") +
guides(colour = guide_legend(title = "series"))
Taking the Moving Average of our Moving Average
aus_exports <- global_economy %>%
filter(Country == "Australia") %>%
select(Exports, Year) %>%
mutate(
`5-MA` = slider::slide_dbl(Exports,
mean,
.before = 2,
.after = 2,
.complete = TRUE))
# Take a 2-MA of our 5-MA (a centred 2x5-MA)
aus_exports2 <- aus_exports %>%
mutate(`2x5-MA` = slider::slide_dbl(`5-MA`, mean, .before = 1, .after = 0, .complete = TRUE)
)
head(aus_exports2,10)
## # A tsibble: 10 x 4 [1Y]
## Exports Year `5-MA` `2x5-MA`
## <dbl> <dbl> <dbl> <dbl>
## 1 13.0 1960 NA NA
## 2 12.4 1961 NA NA
## 3 13.9 1962 13.5 NA
## 4 13.0 1963 13.5 13.5
## 5 14.9 1964 13.6 13.6
## 6 13.2 1965 13.4 13.5
## 7 12.9 1966 13.3 13.3
## 8 12.9 1967 12.7 13.0
## 9 12.3 1968 12.6 12.6
## 10 12.0 1969 12.6 12.6
# Plot
autoplot(aus_exports2, Exports) +
autolayer(aus_exports2, `5-MA`, color = "red") +
autolayer(aus_exports2, `2x5-MA`, color = "blue") +
labs(y = "Exports (% of GDP)", title = "Total Australian exports") +
guides(colour = guide_legend(title = "series"))
- Red (5-MA) is slightly jagged
- Blue (2x5-MA) smooths the average a little more
- This is called a double moving average
us_retail_employment <- us_employment %>%
filter(year(Month) >= 1990, Title == "Retail Trade") %>%
select(-Series_ID)
us_retail_employment %>% head(20)
## # A tsibble: 20 x 3 [1M]
## Month Title Employed
## <mth> <chr> <dbl>
## 1 1990 Jan Retail Trade 13256.
## 2 1990 Feb Retail Trade 12966.
## 3 1990 Mar Retail Trade 12938.
## 4 1990 Apr Retail Trade 13012.
## 5 1990 May Retail Trade 13108.
## 6 1990 Jun Retail Trade 13183.
## 7 1990 Jul Retail Trade 13170.
## 8 1990 Aug Retail Trade 13160.
## 9 1990 Sep Retail Trade 13113.
## 10 1990 Oct Retail Trade 13185.
## 11 1990 Nov Retail Trade 13462.
## 12 1990 Dec Retail Trade 13673.
## 13 1991 Jan Retail Trade 13068.
## 14 1991 Feb Retail Trade 12744.
## 15 1991 Mar Retail Trade 12684.
## 16 1991 Apr Retail Trade 12687
## 17 1991 May Retail Trade 12781.
## 18 1991 Jun Retail Trade 12859
## 19 1991 Jul Retail Trade 12849.
## 20 1991 Aug Retail Trade 12871.
us_retail_employment %>%
autoplot(Employed) +
xlab("Year") + ylab("Persons (thousands)") +
ggtitle("Total employment in US retail")
- The volatility is relatively constant, meaning we can continue with an Additive Decomposition
Decomposition
Additive Decomposition
USREADC <- us_retail_employment %>% # Data we are interested in
model(classical_decomposition(Employed, # Sub-data we are interested in
type = "additive")) %>%
components() # Show me the components "so I can see what it looks like"
# Monthly, quarterly, and annual data work in this situation; higher-frequency data (weekly, daily) won't
## Too many missing values, calendar adjustments, etc.
USREADC
## # A dable: 357 x 7 [1M]
## # Key: .model [1]
## # Classical Decomposition: Employed = trend + seasonal + random
## .model Month Employed trend seasonal random season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "classical_decomposit… 1990 Jan 13256. NA -75.5 NA 13331.
## 2 "classical_decomposit… 1990 Feb 12966. NA -273. NA 13239.
## 3 "classical_decomposit… 1990 Mar 12938. NA -253. NA 13191.
## 4 "classical_decomposit… 1990 Apr 13012. NA -190. NA 13203.
## 5 "classical_decomposit… 1990 May 13108. NA -88.9 NA 13197.
## 6 "classical_decomposit… 1990 Jun 13183. NA -10.4 NA 13193.
## 7 "classical_decomposit… 1990 Jul 13170. 13178. -13.3 5.65 13183.
## 8 "classical_decomposit… 1990 Aug 13160. 13161. -9.99 8.80 13169.
## 9 "classical_decomposit… 1990 Sep 13113. 13141. -87.4 59.9 13201.
## 10 "classical_decomposit… 1990 Oct 13185. 13117. 34.6 33.8 13151.
## # … with 347 more rows
- These decompositions use sliding-window calculations (sketched below) to figure out:
  - What the seasonal component looks like
    - From the moving window, it averages the detrended figures for each Jan, Feb, etc., and calculates how far each month sits from that average; that distance is the seasonal number
  - What the trend-cycle component looks like
    - The decomposition calculates the trend over a given window
    - By default the window for monthly data is size 12 (for 12 months)
    - For daily data with a weekly pattern it would be 7
    - Quarterly data = size 4
    - As the window gets smaller the trend gets more jagged; as it gets bigger, it becomes smoother
  - Then, net of those two pieces, whatever is left over is the random component
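- A minimal by-hand sketch of these mechanics for monthly data (what classical_decomposition() automates; the object names and exact window alignment here are illustrative assumptions):
library(lubridate) # for month()
by_hand <- us_retail_employment %>%
mutate(
# 2x12-MA estimates the trend-cycle: a 12-MA, then a 2-MA to re-centre it
ma12 = slider::slide_dbl(Employed, mean, .before = 5, .after = 6, .complete = TRUE),
trend = slider::slide_dbl(ma12, mean, .before = 1, .after = 0, .complete = TRUE),
detrended = Employed - trend
) %>%
as_tibble() %>%
group_by(month = month(as.Date(Month))) %>%
# Average the detrended values for each calendar month
# (classical_decomposition also re-centres these so they sum to zero)
mutate(seasonal = mean(detrended, na.rm = TRUE)) %>%
ungroup() %>%
mutate(random = detrended - seasonal)
- The two dables that follow appear to be head(USREADC, 10) and tail(USREADC, 10)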
## # A dable: 10 x 7 [1M]
## # Key: .model [1]
## # Classical Decomposition: Employed = trend + seasonal + random
## .model Month Employed trend seasonal random season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "classical_decomposit… 1990 Jan 13256. NA -75.5 NA 13331.
## 2 "classical_decomposit… 1990 Feb 12966. NA -273. NA 13239.
## 3 "classical_decomposit… 1990 Mar 12938. NA -253. NA 13191.
## 4 "classical_decomposit… 1990 Apr 13012. NA -190. NA 13203.
## 5 "classical_decomposit… 1990 May 13108. NA -88.9 NA 13197.
## 6 "classical_decomposit… 1990 Jun 13183. NA -10.4 NA 13193.
## 7 "classical_decomposit… 1990 Jul 13170. 13178. -13.3 5.65 13183.
## 8 "classical_decomposit… 1990 Aug 13160. 13161. -9.99 8.80 13169.
## 9 "classical_decomposit… 1990 Sep 13113. 13141. -87.4 59.9 13201.
## 10 "classical_decomposit… 1990 Oct 13185. 13117. 34.6 33.8 13151.
## # A dable: 10 x 7 [1M]
## # Key: .model [1]
## # Classical Decomposition: Employed = trend + seasonal + random
## .model Month Employed trend seasonal random season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "classical_decomposit… 2018 Dec 16309. 15816. 573. -79.7 15736.
## 2 "classical_decomposit… 2019 Jan 15754. 15810. -75.5 18.8 15829.
## 3 "classical_decomposit… 2019 Feb 15567. 15804. -273. 36.7 15840.
## 4 "classical_decomposit… 2019 Mar 15577. 15797. -253. 32.5 15830.
## 5 "classical_decomposit… 2019 Apr 15625. NA -190. NA 15815.
## 6 "classical_decomposit… 2019 May 15692. NA -88.9 NA 15781.
## 7 "classical_decomposit… 2019 Jun 15776. NA -10.4 NA 15786.
## 8 "classical_decomposit… 2019 Jul 15786. NA -13.3 NA 15799.
## 9 "classical_decomposit… 2019 Aug 15750. NA -9.99 NA 15759.
## 10 "classical_decomposit… 2019 Sep 15611. NA -87.4 NA 15699.
- We notice the trend is missing for the first 6 values & the last 6 values
- That's because we have a 13-month window
  - 6 before + the given data point + 6 after = 13
  - The moving average of this window gives the trend
What does the Additive Decomp look like
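- The plotting chunk isn't shown; by analogy with the multiplicative section below, it would be:
autoplot(USREADC) +
labs(title = "Classical Additive decomposition of total US retail employment")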
- Four panels:
  - Original data
  - Trend-cycle
  - Seasonal
    - Drops off at the beginning of the year
    - Rises back up
    - Drops off
    - Jumps up
    - Drops again
  - Random
    - Should be rid of seasonality and trend, but if something worth looking at remains, dive into it
    - Pay attention to volatility
Multiplicative Decomposition
USREMDC <- us_retail_employment %>%
model(classical_decomposition(Employed,
type = "multiplicative")) %>%
components()
USREMDC
## # A dable: 357 x 7 [1M]
## # Key: .model [1]
## # Classical Decomposition: Employed = trend * seasonal * random
## .model Month Employed trend seasonal random season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "classical_decomposit… 1990 Jan 13256. NA 0.995 NA 13323.
## 2 "classical_decomposit… 1990 Feb 12966. NA 0.981 NA 13211.
## 3 "classical_decomposit… 1990 Mar 12938. NA 0.983 NA 13166.
## 4 "classical_decomposit… 1990 Apr 13012. NA 0.987 NA 13184.
## 5 "classical_decomposit… 1990 May 13108. NA 0.994 NA 13189.
## 6 "classical_decomposit… 1990 Jun 13183. NA 0.999 NA 13193.
## 7 "classical_decomposit… 1990 Jul 13170. 13178. 0.999 1.00 13183.
## 8 "classical_decomposit… 1990 Aug 13160. 13161. 0.999 1.00 13168.
## 9 "classical_decomposit… 1990 Sep 13113. 13141. 0.994 1.00 13190.
## 10 "classical_decomposit… 1990 Oct 13185. 13117. 1.00 1.00 13153.
## # … with 347 more rows
- Now the model is multiplicative
  - Conceptually, the first step is to take the log(): a multiplicative decomposition is roughly equivalent to a classical additive decomposition estimated on the logarithm
- Seasonal estimates are going to be slightly different
  - Much higher in December (above 1), below 1 in most other months
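- The two dables below appear to be head(USREMDC, 10) and tail(USREMDC, 10)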
## # A dable: 10 x 7 [1M]
## # Key: .model [1]
## # Classical Decomposition: Employed = trend * seasonal * random
## .model Month Employed trend seasonal random season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "classical_decomposit… 1990 Jan 13256. NA 0.995 NA 13323.
## 2 "classical_decomposit… 1990 Feb 12966. NA 0.981 NA 13211.
## 3 "classical_decomposit… 1990 Mar 12938. NA 0.983 NA 13166.
## 4 "classical_decomposit… 1990 Apr 13012. NA 0.987 NA 13184.
## 5 "classical_decomposit… 1990 May 13108. NA 0.994 NA 13189.
## 6 "classical_decomposit… 1990 Jun 13183. NA 0.999 NA 13193.
## 7 "classical_decomposit… 1990 Jul 13170. 13178. 0.999 1.00 13183.
## 8 "classical_decomposit… 1990 Aug 13160. 13161. 0.999 1.00 13168.
## 9 "classical_decomposit… 1990 Sep 13113. 13141. 0.994 1.00 13190.
## 10 "classical_decomposit… 1990 Oct 13185. 13117. 1.00 1.00 13153.
## # A dable: 10 x 7 [1M]
## # Key: .model [1]
## # Classical Decomposition: Employed = trend * seasonal * random
## .model Month Employed trend seasonal random season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "classical_decomposit… 2018 Dec 16309. 15816. 1.04 0.992 15696.
## 2 "classical_decomposit… 2019 Jan 15754. 15810. 0.995 1.00 15834.
## 3 "classical_decomposit… 2019 Feb 15567. 15804. 0.981 1.00 15861.
## 4 "classical_decomposit… 2019 Mar 15577. 15797. 0.983 1.00 15851.
## 5 "classical_decomposit… 2019 Apr 15625. NA 0.987 NA 15831.
## 6 "classical_decomposit… 2019 May 15692. NA 0.994 NA 15788.
## 7 "classical_decomposit… 2019 Jun 15776. NA 0.999 NA 15787.
## 8 "classical_decomposit… 2019 Jul 15786. NA 0.999 NA 15801.
## 9 "classical_decomposit… 2019 Aug 15750. NA 0.999 NA 15760.
## 10 "classical_decomposit… 2019 Sep 15611. NA 0.994 NA 15703.
What does the Multiplicative Decomp look like
autoplot(USREMDC) +
labs(title = "Classical Multiplicative decomposition of total US retail employment")
STL - Seasonal & Trend Decomposition using Loess
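- The model chunk isn't re-printed here; the dable below comes from the same call defined later in these notes:
dcmp <- us_retail_employment %>%
model(stl = STL(Employed)) # STL with default trend and season windows
components(dcmp)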
## # A dable: 357 x 7 [1M]
## # Key: .model [1]
## # STL Decomposition: Employed = trend + season_year + remainder
## .model Month Employed trend season_year remainder season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 stl 1990 Jan 13256. 13291. -38.1 3.08 13294.
## 2 stl 1990 Feb 12966. 13272. -261. -44.2 13227.
## 3 stl 1990 Mar 12938. 13252. -291. -23.0 13229.
## 4 stl 1990 Apr 13012. 13233. -221. 0.0892 13233.
## 5 stl 1990 May 13108. 13213. -115. 9.98 13223.
## 6 stl 1990 Jun 13183. 13193. -25.6 15.7 13208.
## 7 stl 1990 Jul 13170. 13173. -24.4 22.0 13194.
## 8 stl 1990 Aug 13160. 13152. -11.8 19.5 13171.
## 9 stl 1990 Sep 13113. 13131. -43.4 25.7 13157.
## 10 stl 1990 Oct 13185. 13110. 62.5 12.3 13123.
## # … with 347 more rows
- Decompose into trend and season
  - The seasonal estimates are averaged over moving windows
- Seasonally adjusted part
  - The seasonally adjusted series is trend plus remainder
What does this STL Decomposition look like?
us_retail_employment %>%
autoplot(Employed, color='gray') +
autolayer(components(dcmp), trend, color='red') +
xlab("Year") + ylab("Persons (thousands)") +
ggtitle("Total employment in US retail")
# Looking at the Seasonal Component
us_retail_employment %>%
autoplot(Employed, color='gray') +
autolayer(components(dcmp), season_year, color='red')
- Red is the seasonal component
- Grey is the original data
Playing with the Window Feature
dcmp25 <- us_retail_employment %>%
model(stl = STL(Employed ~ trend(window = 25))) # Widen the trend window to 25 observations
components(dcmp25)
## # A dable: 357 x 7 [1M]
## # Key: .model [1]
## # STL Decomposition: Employed = trend + season_year + remainder
## .model Month Employed trend season_year remainder season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 stl 1990 Jan 13256. 13300. -38.3 -5.63 13294.
## 2 stl 1990 Feb 12966. 13278. -261. -51.7 13227.
## 3 stl 1990 Mar 12938. 13257. -290. -29.1 13228.
## 4 stl 1990 Apr 13012. 13236. -219. -4.85 13231.
## 5 stl 1990 May 13108. 13214. -113. 7.39 13221.
## 6 stl 1990 Jun 13183. 13192. -24.9 15.5 13208.
## 7 stl 1990 Jul 13170. 13170. -24.3 24.1 13194.
## 8 stl 1990 Aug 13160. 13148. -12.4 23.8 13172.
## 9 stl 1990 Sep 13113. 13126. -44.7 32.1 13158.
## 10 stl 1990 Oct 13185. 13104. 60.5 20.8 13125.
## # … with 347 more rows
Adjusting Windows & Comparing
dcmp <- us_retail_employment %>%
model(stl = STL(Employed))
us_retail_employment %>%
autoplot(Employed, color='gray') +
autolayer(components(dcmp), trend, color='red')
# 25 Window Decomp visual
us_retail_employment %>%
autoplot(Employed, color='gray') +
autolayer(components(dcmp25), trend, color='red')
# Can do this for Season too
#dcmpS <- us_retail_employment %>%
# model(stl = STL(Employed ~ season(window = 25)))
#us_retail_employment %>%
# autoplot(Employed, color='gray') +
# autolayer(components(dcmpS), season_year, color='red')
- The trend line is a whole lot smoother because we averaged over a 25-observation window
STL Decomposition
- Seasonality appears stronger from 1990-2005, then it levels out a little
- Loess = local regression fitted over moving windows through the data
Visualizations w/ Core Decomp Components
- This breaks the season_year component down into individual months and shows what each one looks like as a time series (see the sketch below)
- In September/October, the size of the employment effect appears to be dropping over time
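- A hedged guess at the call behind that figure, using feasts' sub-series plot on the seasonal component:
components(dcmp) %>% gg_subseries(season_year) # One panel per month, plotted across the years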
Seasonally Adjusted Data
us_retail_employment %>%
autoplot(Employed, color='gray') +
autolayer(components(dcmp), season_adjust, color='blue') +
xlab("Year") + ylab("Persons (thousands)") +
ggtitle("Total employment in US retail")
- Broader trend in employment
- The seasonal component has been taken out
- Not a plot of the trend, but of the seasonally adjusted data
X-11 Decomposition
X11_dcmp <- us_retail_employment %>%
model(seats = feasts:::X11(Employed, type = "additive")) %>% # Label "seats" kept to match the output; this is X-11
# Note: newer feasts versions expose X-11 as X_13ARIMA_SEATS(Employed ~ x11())
components()
X11_dcmp
## # A dable: 357 x 7 [1M]
## # Key: .model [1]
## # X11 Decomposition: Employed = trend + seasonal + irregular
## .model Month Employed trend seasonal irregular season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 seats 1990 Jan 13256. 13260. -20.5 16.0 13276.
## 2 seats 1990 Feb 12966. 13248. -253. -29.1 13219.
## 3 seats 1990 Mar 12938. 13237. -291. -7.47 13229.
## 4 seats 1990 Apr 13012. 13227. -217. 2.31 13229.
## 5 seats 1990 May 13108. 13217. -111. 2.40 13219.
## 6 seats 1990 Jun 13183. 13204. -21.0 -0.192 13204.
## 7 seats 1990 Jul 13170. 13186. -21.1 5.09 13191.
## 8 seats 1990 Aug 13160. 13167. -2.20 -5.18 13162.
## 9 seats 1990 Sep 13113. 13150. -33.0 -3.86 13146.
## 10 seats 1990 Oct 13185. 13136. 52.4 -2.87 13133.
## # … with 347 more rows
- Irregular = remainder
- Otherwise, read the seasonal column the same way as before
SEATS Decomposition
seats_dcmp <- us_retail_employment %>%
model(seats = feasts:::SEATS(Employed)) %>%
# Note: newer feasts versions expose SEATS as X_13ARIMA_SEATS(Employed ~ seats())
components()
seats_dcmp
## # A dable: 357 x 7 [1M]
## # Key: .model [1]
## # X-13ARIMA-SEATS Decomposition: Employed = trend * seasonal * irregular
## .model Month Employed trend seasonal irregular season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 seats 1990 Jan 13256. 13265. 0.999 1.00 13269.
## 2 seats 1990 Feb 12966. 13244. 0.980 0.999 13235.
## 3 seats 1990 Mar 12938. 13236. 0.977 1.00 13238.
## 4 seats 1990 Apr 13012. 13232. 0.983 1.00 13234.
## 5 seats 1990 May 13108. 13221. 0.991 1.00 13222.
## 6 seats 1990 Jun 13183. 13205. 0.998 1.00 13204.
## 7 seats 1990 Jul 13170. 13186. 0.999 1.00 13189.
## 8 seats 1990 Aug 13160. 13165. 1.00 1.00 13162.
## 9 seats 1990 Sep 13113. 13145. 0.998 1.00 13145.
## 10 seats 1990 Oct 13185. 13129. 1.00 1.00 13126.
## # … with 347 more rows
- SEATS is based on a multiplicative model (note the trend * seasonal * irregular equation in the output header)
STL Decomposition
us_retail_employment %>%
model(STL(Employed ~ season(window=9), robust=TRUE)) %>%
components() %>% autoplot() +
ggtitle("STL decomposition: US retail employment")
Different Ways of Manipulating the Model
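- The dable below is presumably the components() of the model just plotted:
us_retail_employment %>%
model(STL(Employed ~ season(window = 9), robust = TRUE)) %>%
components()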
## # A dable: 357 x 7 [1M]
## # Key: .model [1]
## # STL Decomposition: Employed = trend + season_year + remainder
## .model Month Employed trend season_year remainder season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 STL(Employed ~ … 1990 Jan 13256. 13294. -2.16 -36.2 13258.
## 2 STL(Employed ~ … 1990 Feb 12966. 13273. -260. -47.3 13226.
## 3 STL(Employed ~ … 1990 Mar 12938. 13252. -289. -25.1 13227.
## 4 STL(Employed ~ … 1990 Apr 13012. 13231. -221. 2.25 13233.
## 5 STL(Employed ~ … 1990 May 13108. 13209. -111. 9.96 13219.
## 6 STL(Employed ~ … 1990 Jun 13183. 13188. -18.8 14.1 13202.
## 7 STL(Employed ~ … 1990 Jul 13170. 13166. -17.9 22.1 13188.
## 8 STL(Employed ~ … 1990 Aug 13160. 13144. -2.53 18.1 13162.
## 9 STL(Employed ~ … 1990 Sep 13113. 13122. -34.0 25.3 13147.
## 10 STL(Employed ~ … 1990 Oct 13185. 13100. 54.3 30.7 13131.
## # … with 347 more rows
us_retail_employment %>%
model(STL(Employed ~ trend(window=15) +
season(window="periodic"),
robust = TRUE)
) %>% components()
## # A dable: 357 x 7 [1M]
## # Key: .model [1]
## # STL Decomposition: Employed = trend + season_year + remainder
## .model Month Employed trend season_year remainder season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "STL(Employed ~… 1990 Jan 13256. 13247. -80.8 89.9 13337.
## 2 "STL(Employed ~… 1990 Feb 12966. 13235. -273. 4.72 13240.
## 3 "STL(Employed ~… 1990 Mar 12938. 13223. -258. -26.5 13197.
## 4 "STL(Employed ~… 1990 Apr 13012. 13211. -186. -12.6 13198.
## 5 "STL(Employed ~… 1990 May 13108. 13198. -88.4 -1.74 13197.
## 6 "STL(Employed ~… 1990 Jun 13183. 13186. -8.47 5.67 13191.
## 7 "STL(Employed ~… 1990 Jul 13170. 13173. -10.9 8.17 13181.
## 8 "STL(Employed ~… 1990 Aug 13160. 13157. -11.5 13.5 13171.
## 9 "STL(Employed ~… 1990 Sep 13113. 13142. -88.0 59.2 13201.
## 10 "STL(Employed ~… 1990 Oct 13185. 13116. 39.0 29.8 13146.
## # … with 347 more rows
- trend(window = ?) controls the wiggliness of the trend component
  - As the window becomes shorter, you're averaging over fewer observations, so the trend is more wiggly
  - As the window becomes bigger, you're averaging over more time, so the trend is less wiggly
- season(window = ?) controls variation in the seasonal component
  - Same idea as above, but for the seasons: how many Januaries, Februaries, etc. are we averaging over?
- season(window = 'periodic') is equivalent to an infinite window
  - Means the seasonal pattern is purely periodic, identical every year (see the sketch below)
- The season window defaults to 13 periods for monthly data
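- A quick sketch of the purely periodic case (output not shown in these notes):
us_retail_employment %>%
model(stl = STL(Employed ~ season(window = "periodic"))) %>% # Identical seasonal effect every year
components() %>%
autoplot()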