Setup
Global Economy Dataset
## # A tsibble: 15,150 x 9 [1Y]
## # Key: Country [263]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1960 537777811. NA NA 7.02 4.13 8996351
## 2 Afghanistan AFG 1961 548888896. NA NA 8.10 4.45 9166764
## 3 Afghanistan AFG 1962 546666678. NA NA 9.35 4.88 9345868
## 4 Afghanistan AFG 1963 751111191. NA NA 16.9 9.17 9533954
## 5 Afghanistan AFG 1964 800000044. NA NA 18.1 8.89 9731361
## 6 Afghanistan AFG 1965 1006666638. NA NA 21.4 11.3 9938414
## 7 Afghanistan AFG 1966 1399999967. NA NA 18.6 8.57 10152331
## 8 Afghanistan AFG 1967 1673333418. NA NA 14.2 6.77 10372630
## 9 Afghanistan AFG 1968 1373333367. NA NA 15.2 8.90 10604346
## 10 Afghanistan AFG 1969 1408888922. NA NA 15.0 10.1 10854428
## # … with 15,140 more rows
Filter Australia
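- The generating chunk isn't shown; a minimal sketch that would reproduce the printout below (note the extra mean(GDP) column) and the standalone mean. The object name aus is a guess:
aus <- global_economy %>%
filter(Country == "Australia") %>% # Single-country subset: 58 annual observations
mutate(`mean(GDP)` = mean(GDP)) # Constant column holding the average GDP
aus
mean(aus$GDP) # Printed separately as [1] 417399409147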
## # A tsibble: 58 x 10 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Austra… AUS 1960 1.86e10 NA 7.96 14.1 13.0 10276477
## 2 Austra… AUS 1961 1.96e10 2.49 8.14 15.0 12.4 10483000
## 3 Austra… AUS 1962 1.99e10 1.30 8.12 12.6 13.9 10742000
## 4 Austra… AUS 1963 2.15e10 6.21 8.17 13.8 13.0 10950000
## 5 Austra… AUS 1964 2.38e10 6.98 8.40 13.8 14.9 11167000
## 6 Austra… AUS 1965 2.59e10 5.98 8.69 15.3 13.2 11388000
## 7 Austra… AUS 1966 2.73e10 2.38 8.98 15.1 12.9 11651000
## 8 Austra… AUS 1967 3.04e10 6.30 9.29 13.9 12.9 11799000
## 9 Austra… AUS 1968 3.27e10 5.10 9.52 14.5 12.3 12009000
## 10 Austra… AUS 1969 3.66e10 7.04 9.83 13.3 12.0 12263000
## # … with 48 more rows, and 1 more variable: `mean(GDP)` <dbl>
## [1] 417399409147
- We can see the average GDP is around $417 billion, which sits just below the point where the series starts to climb steeply
- How might we want to think about analyzing this data?
  - We probably want some form of relevant adjustment: we know there are more people there over time too, so maybe we think per capita
First Adjustment: Per Capita Adjustments
# Normalizing: GDP per capita
global_economy %>%
filter(Country == "Australia") %>%
autoplot(GDP/Population)
Highest GDP per Capita
global_economy %>%
filter(Year %in% c(2016, 2017)) %>% # Year is numeric, so no quotes needed
mutate(PerCap = GDP/Population) %>%
arrange(desc(PerCap))
## # A tsibble: 524 x 10 [1Y]
## # Key: Country [262]
## Country Code Year GDP Growth CPI Imports Exports Population PerCap
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Monaco MCO 2016 6.47e 9 3.21 NA NA NA 38499 1.68e5
## 2 Liechten… LIE 2016 6.21e 9 NA NA NA NA 37666 1.65e5
## 3 Luxembou… LUX 2017 6.24e10 2.30 111. 194. 230. 599449 1.04e5
## 4 Luxembou… LUX 2016 5.86e10 3.08 110. 186. 221. 582014 1.01e5
## 5 Macao SA… MAC 2017 5.04e10 9.10 136. 32.0 79.4 622567 8.09e4
## 6 Switzerl… CHE 2017 6.79e11 1.09 98.3 53.9 65.0 8466017 8.02e4
## 7 Switzerl… CHE 2016 6.69e11 1.38 97.7 54.6 65.8 8373338 7.99e4
## 8 Isle of … IMN 2016 6.59e 9 7.4 NA NA NA 83737 7.87e4
## 9 Norway NOR 2017 3.99e11 1.92 115. 33.1 35.5 5282223 7.55e4
## 10 Macao SA… MAC 2016 4.53e10 -0.863 134. 34.5 76.1 612167 7.40e4
## # … with 514 more rows
Second Adjustment: Inflation Adjustment
- FRED or World Bank data often supply an inflation adjuster such as the CPI
print_retail <- aus_retail %>%
filter(Industry == "Newspaper and book retailing") %>% # Looking at Print media industry
group_by(Industry) %>%
index_by(Year = year(Month)) %>% # Index by year
summarise(Turnover = sum(Turnover)) # Turnover = Same as revenue
aus_economy <- filter(global_economy, Code == "AUS")
# Create inflation adjusted data & Plot it
print_retail %>%
left_join(aus_economy, by = "Year") %>%
mutate(Adj_turnover = Turnover / CPI) %>% # adjusted turnover = turnover / consumer price index
pivot_longer(c(Turnover, Adj_turnover),
names_to = "Type", values_to = "Turnover") %>%
ggplot(aes(Year, Turnover)) + geom_line() +
facet_grid(vars(Type), scales = "free_y") +
xlab("Years") +ylab(NULL) +
ggtitle("Turnover: Aus print media industry")
Mathematical Transformations
library(tidyquant)
# Daily S&P 500 prices from Yahoo Finance
SnP500 <- tq_get("^GSPC", from = "2019-01-01")
SnP500 %>% as_tsibble(index = date) %>% autoplot(close) # Plot the closing price explicitly
# Let's take it back further
SnP500 <- tq_get("^GSPC", from = "1990-01-01")
SnP500 %>% as_tsibble(index = date) %>% autoplot(close)
Mathematical Transformations
food <- aus_retail %>%
filter(Industry == "Food retailing") %>%
summarize(Turnover = sum(Turnover))
food %>% autoplot(Turnover)
- Looking at this data on a monthly basis, back to the early 1980s
- Clearly an increase over time, and the level of volatility changes too
- A seasonal pattern repeats each year
- Australians spend more money in the retail sector over time
- What we want to do: smooth this volatility out
Square Root
- The least powerful of the three transformations
- Still a lot of volatility (not as extreme)
- Can't handle negative numbers
Cube Root
- Second most powerful (right in the middle): stronger than the square root, weaker than the log
- More effective at damping the variance, but volatility remains
- Can't handle negative numbers
Log Transformation
- Most powerful of the three
- Now the variation around the trend is roughly constant over time
- Doesn't handle zeros; if we have this issue, use the log1p() function, i.e. log(1 + x)
- A sketch applying all three transformations follows below
Box-Cox Transformation
- How to solve for Lambda?
- Guerrero!!
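- The chunk that produced the output below isn't shown; with the feasts features interface it would look like:
food %>% features(Turnover, features = guerrero) # Estimate the Box-Cox lambda via Guerrero's method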
## # A tibble: 1 x 1
## lambda_guerrero
## <dbl>
## 1 0.0524
- The optimal lambda ≈ 0.052; since it is close to zero, this is nearly a log transformation
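- Applying it, a sketch with fabletools' box_cox():
food %>% autoplot(box_cox(Turnover, 0.0524)) +
labs(y = "Box-Cox(Turnover, lambda = 0.0524)")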
Homework for practice
- For the following TS, find the appropriate transformation in order to stabilize the variance
  - US GDP from global_economy
  - Slaughter of Victorian bulls, bullocks, and steers from aus_livestock
  - Electricity demand from vic_elec
  - Gas production from aus_production
- Why is a Box-Cox transformation unhelpful for canadian_gas data?
DeSeasoning
# The us_employment data ships with fpp3; the underlying series can be grabbed from FRED
us_retail_employment <- us_employment %>%
filter(year(Month) >= 1990, Title == "Retail Trade") %>%
select(-Series_ID)
dcmp <- us_retail_employment %>%
model(STL(Employed))
# Original US Retail Employment data
autoplot(us_retail_employment, Employed)
# Seasonally adjusted Data
## We want to pick this data apart, have the trend and the cycle distinct from the season
autoplot(us_retail_employment, Employed, color = "gray") +
autolayer(components(dcmp), season_adjust, color = "blue") + # Season_adjust comes from the decomposition
labs(y = "Persons (thousands", title = "Total employment in US retail")
Moving Averages
aus_exports <- global_economy %>%
filter(Country == "Australia") %>%
select(Exports, Year)
head(aus_exports)
## # A tsibble: 6 x 2 [1Y]
## Exports Year
## <dbl> <dbl>
## 1 13.0 1960
## 2 12.4 1961
## 3 13.9 1962
## 4 13.0 1963
## 5 14.9 1964
## 6 13.2 1965
aus_exports <- global_economy %>%
filter(Country == "Australia") %>%
mutate(
`5-MA` = slider::slide_dbl(Exports, # Pull from Exports
mean, # Calculate the mean for the moving average
.before = 2, # Take the 2 observations before
.after = 2, # And the 2 observations after
.complete = TRUE) # Only compute where the full 5-obs window exists
)
head(aus_exports)
## # A tsibble: 6 x 10 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population `5-MA`
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Australia AUS 1960 1.86e10 NA 7.96 14.1 13.0 10276477 NA
## 2 Australia AUS 1961 1.96e10 2.49 8.14 15.0 12.4 10483000 NA
## 3 Australia AUS 1962 1.99e10 1.30 8.12 12.6 13.9 10742000 13.5
## 4 Australia AUS 1963 2.15e10 6.21 8.17 13.8 13.0 10950000 13.5
## 5 Australia AUS 1964 2.38e10 6.98 8.40 13.8 14.9 11167000 13.6
## 6 Australia AUS 1965 2.59e10 5.98 8.69 15.3 13.2 11388000 13.4
# Note: this can be slightly deceptive because the centred average uses numbers from after the observed date
aus_exports <- global_economy %>%
filter(Country == "Australia") %>%
select(Exports, Year) %>%
mutate(
`5-MA` = slider::slide_dbl(Exports,
mean,
.before = 2,
.after = 2,
.complete = TRUE))
head(aus_exports) # Notice there are 2 missing observations in the very beginning
## # A tsibble: 6 x 3 [1Y]
## Exports Year `5-MA`
## <dbl> <dbl> <dbl>
## 1 13.0 1960 NA
## 2 12.4 1961 NA
## 3 13.9 1962 13.5
## 4 13.0 1963 13.5
## 5 14.9 1964 13.6
## 6 13.2 1965 13.4
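- The next printout is presumably the tail of the same series:
tail(aus_exports) # The centred window also leaves the last 2 observations missing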
## # A tsibble: 6 x 3 [1Y]
## Exports Year `5-MA`
## <dbl> <dbl> <dbl>
## 1 21.5 2012 20.8
## 2 20.0 2013 20.8
## 3 21.1 2014 20.4
## 4 20.0 2015 20.3
## 5 19.3 2016 NA
## 6 21.3 2017 NA
- Always use odd window sizes, because an even window doesn't split evenly around the observation
  - Example… 3 before & 0 after is a total of 4 observations…
  - Or 1 before and 2 after is 4 observations (either way it is uneven)
# Now for a trailing 5-year moving average (past observations only)
aus_exports <- global_economy %>%
filter(Country == "Australia") %>%
select(Exports, Year) %>%
mutate(
`5-MA` = slider::slide_dbl(Exports,
mean,
.before = 4,
.after = 0,
.complete = TRUE))
head(aus_exports) # Missing the 4 initial observations
## # A tsibble: 6 x 3 [1Y]
## Exports Year `5-MA`
## <dbl> <dbl> <dbl>
## 1 13.0 1960 NA
## 2 12.4 1961 NA
## 3 13.9 1962 NA
## 4 13.0 1963 NA
## 5 14.9 1964 13.5
## 6 13.2 1965 13.5
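- And again, presumably the tail:
tail(aus_exports) # The trailing window leaves no gaps at the end of the series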
## # A tsibble: 6 x 3 [1Y]
## Exports Year `5-MA`
## <dbl> <dbl> <dbl>
## 1 21.5 2012 21.2
## 2 20.0 2013 21.2
## 3 21.1 2014 20.8
## 4 20.0 2015 20.8
## 5 19.3 2016 20.4
## 6 21.3 2017 20.3
autoplot(aus_exports, Exports) +
autolayer(aus_exports, `5-MA`, color = "red") + # Note that `5-MA` needs backticks (non-syntactic name)
labs(y = "Exports (% of GDP)", title = "Total Australian exports") +
guides(colour = guide_legend(title = "series"))
Taking the Moving Average of our Moving Average
aus_exports <- global_economy %>%
filter(Country == "Australia") %>%
select(Exports, Year) %>%
mutate(
`5-MA` = slider::slide_dbl(Exports,
mean,
.before = 2,
.after = 2,
.complete = TRUE))
# Take a 2-MA of our 5-MA (a centred 2x5-MA)
aus_exports2 <- aus_exports %>%
mutate(`2x5-MA` = slider::slide_dbl(`5-MA`, mean, .before = 1, .after = 0, .complete = TRUE)
)
head(aus_exports2,10)
## # A tsibble: 10 x 4 [1Y]
## Exports Year `5-MA` `2x5-MA`
## <dbl> <dbl> <dbl> <dbl>
## 1 13.0 1960 NA NA
## 2 12.4 1961 NA NA
## 3 13.9 1962 13.5 NA
## 4 13.0 1963 13.5 13.5
## 5 14.9 1964 13.6 13.6
## 6 13.2 1965 13.4 13.5
## 7 12.9 1966 13.3 13.3
## 8 12.9 1967 12.7 13.0
## 9 12.3 1968 12.6 12.6
## 10 12.0 1969 12.6 12.6
# Plot
autoplot(aus_exports2, Exports) +
autolayer(aus_exports2, `5-MA`, color = "red") +
autolayer(aus_exports2, `2x5-MA`, color = "blue") +
labs(y = "Exports (% of GDP)", title = "Total Australian exports") +
guides(colour = guide_legend(title = "series"))
- Red (5-MA) is slightly jagged
- Blue (2x5-MA) smooths the average a little more
- This is called a double moving average
us_retail_employment <- us_employment %>%
filter(year(Month) >= 1990, Title == "Retail Trade") %>%
select(-Series_ID)
us_retail_employment %>% head(20)
## # A tsibble: 20 x 3 [1M]
## Month Title Employed
## <mth> <chr> <dbl>
## 1 1990 Jan Retail Trade 13256.
## 2 1990 Feb Retail Trade 12966.
## 3 1990 Mar Retail Trade 12938.
## 4 1990 Apr Retail Trade 13012.
## 5 1990 May Retail Trade 13108.
## 6 1990 Jun Retail Trade 13183.
## 7 1990 Jul Retail Trade 13170.
## 8 1990 Aug Retail Trade 13160.
## 9 1990 Sep Retail Trade 13113.
## 10 1990 Oct Retail Trade 13185.
## 11 1990 Nov Retail Trade 13462.
## 12 1990 Dec Retail Trade 13673.
## 13 1991 Jan Retail Trade 13068.
## 14 1991 Feb Retail Trade 12744.
## 15 1991 Mar Retail Trade 12684.
## 16 1991 Apr Retail Trade 12687
## 17 1991 May Retail Trade 12781.
## 18 1991 Jun Retail Trade 12859
## 19 1991 Jul Retail Trade 12849.
## 20 1991 Aug Retail Trade 12871.
us_retail_employment %>%
autoplot(Employed) +
xlab("Year") + ylab("Persons (thousands)") +
ggtitle("Total employment in US retail")
- The volatility is relatively constant, meaning we can continue with an Additive Decomposition
Decomposition
Additive Decomposition
USREADC <- us_retail_employment %>% # Data we are interested in
model(classical_decomposition(Employed, # Sub-data we are interested in
type = "additive")) %>%
components() # Show me the components "so I can see what it looks like"
# Monthly, quarterly, and annual data work in this situation; higher-frequency data (weekly, daily) won't
## Too many missing values, calendar adjustments, etc.
USREADC
## # A dable: 357 x 7 [1M]
## # Key: .model [1]
## # Classical Decomposition: Employed = trend + seasonal + random
## .model Month Employed trend seasonal random season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "classical_decomposit… 1990 Jan 13256. NA -75.5 NA 13331.
## 2 "classical_decomposit… 1990 Feb 12966. NA -273. NA 13239.
## 3 "classical_decomposit… 1990 Mar 12938. NA -253. NA 13191.
## 4 "classical_decomposit… 1990 Apr 13012. NA -190. NA 13203.
## 5 "classical_decomposit… 1990 May 13108. NA -88.9 NA 13197.
## 6 "classical_decomposit… 1990 Jun 13183. NA -10.4 NA 13193.
## 7 "classical_decomposit… 1990 Jul 13170. 13178. -13.3 5.65 13183.
## 8 "classical_decomposit… 1990 Aug 13160. 13161. -9.99 8.80 13169.
## 9 "classical_decomposit… 1990 Sep 13113. 13141. -87.4 59.9 13201.
## 10 "classical_decomposit… 1990 Oct 13185. 13117. 34.6 33.8 13151.
## # … with 347 more rows
- These decompositions use sliding-window calculations (sketched below) to figure out:
  - What the seasonal component looks like
    - From the moving window, it averages the detrended figures for each Jan, Feb, etc., and calculates how far each month sits from that average; that distance is the seasonal number
  - What the trend-cycle component looks like
    - The decomposition calculates the trend over a given window
    - By default the window for monthly data is size 12 (for 12 months)
    - For daily data with a weekly pattern it would be 7
    - Quarterly data = size 4
    - As the window gets smaller the trend gets more jagged; as it gets bigger, it becomes smoother
  - Then, net of those two pieces, whatever is left over is the random component
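- A minimal by-hand sketch of these mechanics for monthly data (what classical_decomposition() automates; the object names and exact window alignment here are illustrative assumptions):
library(lubridate) # for month()
by_hand <- us_retail_employment %>%
mutate(
# 2x12-MA estimates the trend-cycle: a 12-MA, then a 2-MA to re-centre it
ma12 = slider::slide_dbl(Employed, mean, .before = 5, .after = 6, .complete = TRUE),
trend = slider::slide_dbl(ma12, mean, .before = 1, .after = 0, .complete = TRUE),
detrended = Employed - trend
) %>%
as_tibble() %>%
group_by(month = month(as.Date(Month))) %>%
# Average the detrended values for each calendar month
# (classical_decomposition also re-centres these so they sum to zero)
mutate(seasonal = mean(detrended, na.rm = TRUE)) %>%
ungroup() %>%
mutate(random = detrended - seasonal)
- The two dables that follow appear to be head(USREADC, 10) and tail(USREADC, 10)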
## # A dable: 10 x 7 [1M]
## # Key: .model [1]
## # Classical Decomposition: Employed = trend + seasonal + random
## .model Month Employed trend seasonal random season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "classical_decomposit… 1990 Jan 13256. NA -75.5 NA 13331.
## 2 "classical_decomposit… 1990 Feb 12966. NA -273. NA 13239.
## 3 "classical_decomposit… 1990 Mar 12938. NA -253. NA 13191.
## 4 "classical_decomposit… 1990 Apr 13012. NA -190. NA 13203.
## 5 "classical_decomposit… 1990 May 13108. NA -88.9 NA 13197.
## 6 "classical_decomposit… 1990 Jun 13183. NA -10.4 NA 13193.
## 7 "classical_decomposit… 1990 Jul 13170. 13178. -13.3 5.65 13183.
## 8 "classical_decomposit… 1990 Aug 13160. 13161. -9.99 8.80 13169.
## 9 "classical_decomposit… 1990 Sep 13113. 13141. -87.4 59.9 13201.
## 10 "classical_decomposit… 1990 Oct 13185. 13117. 34.6 33.8 13151.
## # A dable: 10 x 7 [1M]
## # Key: .model [1]
## # Classical Decomposition: Employed = trend + seasonal + random
## .model Month Employed trend seasonal random season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "classical_decomposit… 2018 Dec 16309. 15816. 573. -79.7 15736.
## 2 "classical_decomposit… 2019 Jan 15754. 15810. -75.5 18.8 15829.
## 3 "classical_decomposit… 2019 Feb 15567. 15804. -273. 36.7 15840.
## 4 "classical_decomposit… 2019 Mar 15577. 15797. -253. 32.5 15830.
## 5 "classical_decomposit… 2019 Apr 15625. NA -190. NA 15815.
## 6 "classical_decomposit… 2019 May 15692. NA -88.9 NA 15781.
## 7 "classical_decomposit… 2019 Jun 15776. NA -10.4 NA 15786.
## 8 "classical_decomposit… 2019 Jul 15786. NA -13.3 NA 15799.
## 9 "classical_decomposit… 2019 Aug 15750. NA -9.99 NA 15759.
## 10 "classical_decomposit… 2019 Sep 15611. NA -87.4 NA 15699.
- We notice the trend is missing for the first 6 values & the last 6 values
- That's because we have a 13-month window
  - 6 before + the given data point + 6 after = 13
  - The moving average of this window gives the trend
What does the Additive Decomp look like
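- The plotting chunk isn't shown; by analogy with the multiplicative section below, it would be:
autoplot(USREADC) +
labs(title = "Classical Additive decomposition of total US retail employment")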
- Four panels:
  - Original data
  - Trend-cycle
  - Seasonal
    - Drops off at the beginning of the year
    - Rises back up
    - Drops off
    - Jumps up
    - Drops again
  - Random
    - Should be rid of seasonality and trend, but if something worth looking at remains, dive into it
    - Pay attention to volatility
Multiplicative Decomposition
USREMDC <- us_retail_employment %>%
model(classical_decomposition(Employed,
type = "multiplicative")) %>%
components()
USREMDC
## # A dable: 357 x 7 [1M]
## # Key: .model [1]
## # Classical Decomposition: Employed = trend * seasonal * random
## .model Month Employed trend seasonal random season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "classical_decomposit… 1990 Jan 13256. NA 0.995 NA 13323.
## 2 "classical_decomposit… 1990 Feb 12966. NA 0.981 NA 13211.
## 3 "classical_decomposit… 1990 Mar 12938. NA 0.983 NA 13166.
## 4 "classical_decomposit… 1990 Apr 13012. NA 0.987 NA 13184.
## 5 "classical_decomposit… 1990 May 13108. NA 0.994 NA 13189.
## 6 "classical_decomposit… 1990 Jun 13183. NA 0.999 NA 13193.
## 7 "classical_decomposit… 1990 Jul 13170. 13178. 0.999 1.00 13183.
## 8 "classical_decomposit… 1990 Aug 13160. 13161. 0.999 1.00 13168.
## 9 "classical_decomposit… 1990 Sep 13113. 13141. 0.994 1.00 13190.
## 10 "classical_decomposit… 1990 Oct 13185. 13117. 1.00 1.00 13153.
## # … with 347 more rows
- Now the model is multiplicative
  - Conceptually, the first step is to take the log(): a multiplicative decomposition is roughly equivalent to a classical additive decomposition estimated on the logarithm
- Seasonal estimates are going to be slightly different
  - Much higher in December (above 1), below 1 in most other months
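- The two dables below appear to be head(USREMDC, 10) and tail(USREMDC, 10)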
## # A dable: 10 x 7 [1M]
## # Key: .model [1]
## # Classical Decomposition: Employed = trend * seasonal * random
## .model Month Employed trend seasonal random season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "classical_decomposit… 1990 Jan 13256. NA 0.995 NA 13323.
## 2 "classical_decomposit… 1990 Feb 12966. NA 0.981 NA 13211.
## 3 "classical_decomposit… 1990 Mar 12938. NA 0.983 NA 13166.
## 4 "classical_decomposit… 1990 Apr 13012. NA 0.987 NA 13184.
## 5 "classical_decomposit… 1990 May 13108. NA 0.994 NA 13189.
## 6 "classical_decomposit… 1990 Jun 13183. NA 0.999 NA 13193.
## 7 "classical_decomposit… 1990 Jul 13170. 13178. 0.999 1.00 13183.
## 8 "classical_decomposit… 1990 Aug 13160. 13161. 0.999 1.00 13168.
## 9 "classical_decomposit… 1990 Sep 13113. 13141. 0.994 1.00 13190.
## 10 "classical_decomposit… 1990 Oct 13185. 13117. 1.00 1.00 13153.
## # A dable: 10 x 7 [1M]
## # Key: .model [1]
## # Classical Decomposition: Employed = trend * seasonal * random
## .model Month Employed trend seasonal random season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "classical_decomposit… 2018 Dec 16309. 15816. 1.04 0.992 15696.
## 2 "classical_decomposit… 2019 Jan 15754. 15810. 0.995 1.00 15834.
## 3 "classical_decomposit… 2019 Feb 15567. 15804. 0.981 1.00 15861.
## 4 "classical_decomposit… 2019 Mar 15577. 15797. 0.983 1.00 15851.
## 5 "classical_decomposit… 2019 Apr 15625. NA 0.987 NA 15831.
## 6 "classical_decomposit… 2019 May 15692. NA 0.994 NA 15788.
## 7 "classical_decomposit… 2019 Jun 15776. NA 0.999 NA 15787.
## 8 "classical_decomposit… 2019 Jul 15786. NA 0.999 NA 15801.
## 9 "classical_decomposit… 2019 Aug 15750. NA 0.999 NA 15760.
## 10 "classical_decomposit… 2019 Sep 15611. NA 0.994 NA 15703.
What does the Multiplicative Decomp look like
autoplot(USREMDC) +
labs(title = "Classical Multiplicative decomposition of total US retail employment")
STL - Seasonal & Trend Decomposition using Loess
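- The model chunk isn't re-printed here; the dable below comes from the same call defined later in these notes:
dcmp <- us_retail_employment %>%
model(stl = STL(Employed)) # STL with default trend and season windows
components(dcmp)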
## # A dable: 357 x 7 [1M]
## # Key: .model [1]
## # STL Decomposition: Employed = trend + season_year + remainder
## .model Month Employed trend season_year remainder season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 stl 1990 Jan 13256. 13291. -38.1 3.08 13294.
## 2 stl 1990 Feb 12966. 13272. -261. -44.2 13227.
## 3 stl 1990 Mar 12938. 13252. -291. -23.0 13229.
## 4 stl 1990 Apr 13012. 13233. -221. 0.0892 13233.
## 5 stl 1990 May 13108. 13213. -115. 9.98 13223.
## 6 stl 1990 Jun 13183. 13193. -25.6 15.7 13208.
## 7 stl 1990 Jul 13170. 13173. -24.4 22.0 13194.
## 8 stl 1990 Aug 13160. 13152. -11.8 19.5 13171.
## 9 stl 1990 Sep 13113. 13131. -43.4 25.7 13157.
## 10 stl 1990 Oct 13185. 13110. 62.5 12.3 13123.
## # … with 347 more rows
- Decompose into trend and season
  - The seasonal estimates are averaged over moving windows
- Seasonally adjusted part
  - The seasonally adjusted series is trend plus remainder
What does this STL Decomposition look like?
us_retail_employment %>%
autoplot(Employed, color='gray') +
autolayer(components(dcmp), trend, color='red') +
xlab("Year") + ylab("Persons (thousands)") +
ggtitle("Total employment in US retail")
# Looking at the Seasonal Component
us_retail_employment %>%
autoplot(Employed, color='gray') +
autolayer(components(dcmp), season_year, color='red')
- Red is the seasonal component
- Grey is the original data
Playing with the Window Feature
dcmp25 <- us_retail_employment %>%
model(stl = STL(Employed ~ trend(window = 25))) # Widen the trend window to 25 observations
components(dcmp25)
## # A dable: 357 x 7 [1M]
## # Key: .model [1]
## # STL Decomposition: Employed = trend + season_year + remainder
## .model Month Employed trend season_year remainder season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 stl 1990 Jan 13256. 13300. -38.3 -5.63 13294.
## 2 stl 1990 Feb 12966. 13278. -261. -51.7 13227.
## 3 stl 1990 Mar 12938. 13257. -290. -29.1 13228.
## 4 stl 1990 Apr 13012. 13236. -219. -4.85 13231.
## 5 stl 1990 May 13108. 13214. -113. 7.39 13221.
## 6 stl 1990 Jun 13183. 13192. -24.9 15.5 13208.
## 7 stl 1990 Jul 13170. 13170. -24.3 24.1 13194.
## 8 stl 1990 Aug 13160. 13148. -12.4 23.8 13172.
## 9 stl 1990 Sep 13113. 13126. -44.7 32.1 13158.
## 10 stl 1990 Oct 13185. 13104. 60.5 20.8 13125.
## # … with 347 more rows
Adjusting Windows & Comparing
dcmp <- us_retail_employment %>%
model(stl = STL(Employed))
us_retail_employment %>%
autoplot(Employed, color='gray') +
autolayer(components(dcmp), trend, color='red')
# 25 Window Decomp visual
us_retail_employment %>%
autoplot(Employed, color='gray') +
autolayer(components(dcmp25), trend, color='red')
# Can do this for Season too
#dcmpS <- us_retail_employment %>%
# model(stl = STL(Employed ~ season(window = 25)))
#us_retail_employment %>%
# autoplot(Employed, color='gray') +
# autolayer(components(dcmpS), season_year, color='red')
- The trend line is a whole lot smoother because we averaged over a 25-observation window
STL Decomposition
- Seasonality appears stronger from 1990-2005, then it levels out a little
- Loess = local regression fitted over moving windows through the data
Visualizations w/ Core Decomp Components
- This breaks the season_year component down into individual months and shows what each one looks like as a time series (see the sketch below)
- In September/October, the size of the employment effect appears to be dropping over time
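- A hedged guess at the call behind that figure, using feasts' sub-series plot on the seasonal component:
components(dcmp) %>% gg_subseries(season_year) # One panel per month, plotted across the years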
Seasonally Adjusted Data
us_retail_employment %>%
autoplot(Employed, color='gray') +
autolayer(components(dcmp), season_adjust, color='blue') +
xlab("Year") + ylab("Persons (thousands)") +
ggtitle("Total employment in US retail")
- Broader trend in employment
- The seasonal component has been taken out
- Not a plot of the trend, but of the seasonally adjusted data
X-11 Decomposition
X11_dcmp <- us_retail_employment %>%
model(seats = feasts:::X11(Employed, type = "additive")) %>% # Label "seats" kept to match the output; this is X-11
# Note: newer feasts versions expose X-11 as X_13ARIMA_SEATS(Employed ~ x11())
components()
X11_dcmp
## # A dable: 357 x 7 [1M]
## # Key: .model [1]
## # X11 Decomposition: Employed = trend + seasonal + irregular
## .model Month Employed trend seasonal irregular season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 seats 1990 Jan 13256. 13260. -20.5 16.0 13276.
## 2 seats 1990 Feb 12966. 13248. -253. -29.1 13219.
## 3 seats 1990 Mar 12938. 13237. -291. -7.47 13229.
## 4 seats 1990 Apr 13012. 13227. -217. 2.31 13229.
## 5 seats 1990 May 13108. 13217. -111. 2.40 13219.
## 6 seats 1990 Jun 13183. 13204. -21.0 -0.192 13204.
## 7 seats 1990 Jul 13170. 13186. -21.1 5.09 13191.
## 8 seats 1990 Aug 13160. 13167. -2.20 -5.18 13162.
## 9 seats 1990 Sep 13113. 13150. -33.0 -3.86 13146.
## 10 seats 1990 Oct 13185. 13136. 52.4 -2.87 13133.
## # … with 347 more rows
- Irregular = remainder
- Otherwise, read the seasonal column the same way as before
SEATS Decomposition
seats_dcmp <- us_retail_employment %>%
model(seats = feasts:::SEATS(Employed)) %>%
# Note: newer feasts versions expose SEATS as X_13ARIMA_SEATS(Employed ~ seats())
components()
seats_dcmp
## # A dable: 357 x 7 [1M]
## # Key: .model [1]
## # X-13ARIMA-SEATS Decomposition: Employed = trend * seasonal * irregular
## .model Month Employed trend seasonal irregular season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 seats 1990 Jan 13256. 13265. 0.999 1.00 13269.
## 2 seats 1990 Feb 12966. 13244. 0.980 0.999 13235.
## 3 seats 1990 Mar 12938. 13236. 0.977 1.00 13238.
## 4 seats 1990 Apr 13012. 13232. 0.983 1.00 13234.
## 5 seats 1990 May 13108. 13221. 0.991 1.00 13222.
## 6 seats 1990 Jun 13183. 13205. 0.998 1.00 13204.
## 7 seats 1990 Jul 13170. 13186. 0.999 1.00 13189.
## 8 seats 1990 Aug 13160. 13165. 1.00 1.00 13162.
## 9 seats 1990 Sep 13113. 13145. 0.998 1.00 13145.
## 10 seats 1990 Oct 13185. 13129. 1.00 1.00 13126.
## # … with 347 more rows
- SEATS is based on a multiplicative model (note the trend * seasonal * irregular equation in the output header)
STL Decomposition
us_retail_employment %>%
model(STL(Employed ~ season(window=9), robust=TRUE)) %>%
components() %>% autoplot() +
ggtitle("STL decomposition: US retail employment")
Different Ways of Manipulating the Model
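- The dable below is presumably the components() of the model just plotted:
us_retail_employment %>%
model(STL(Employed ~ season(window = 9), robust = TRUE)) %>%
components()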
## # A dable: 357 x 7 [1M]
## # Key: .model [1]
## # STL Decomposition: Employed = trend + season_year + remainder
## .model Month Employed trend season_year remainder season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 STL(Employed ~ … 1990 Jan 13256. 13294. -2.16 -36.2 13258.
## 2 STL(Employed ~ … 1990 Feb 12966. 13273. -260. -47.3 13226.
## 3 STL(Employed ~ … 1990 Mar 12938. 13252. -289. -25.1 13227.
## 4 STL(Employed ~ … 1990 Apr 13012. 13231. -221. 2.25 13233.
## 5 STL(Employed ~ … 1990 May 13108. 13209. -111. 9.96 13219.
## 6 STL(Employed ~ … 1990 Jun 13183. 13188. -18.8 14.1 13202.
## 7 STL(Employed ~ … 1990 Jul 13170. 13166. -17.9 22.1 13188.
## 8 STL(Employed ~ … 1990 Aug 13160. 13144. -2.53 18.1 13162.
## 9 STL(Employed ~ … 1990 Sep 13113. 13122. -34.0 25.3 13147.
## 10 STL(Employed ~ … 1990 Oct 13185. 13100. 54.3 30.7 13131.
## # … with 347 more rows
us_retail_employment %>%
model(STL(Employed ~ trend(window=15) +
season(window="periodic"),
robust = TRUE)
) %>% components()
## # A dable: 357 x 7 [1M]
## # Key: .model [1]
## # STL Decomposition: Employed = trend + season_year + remainder
## .model Month Employed trend season_year remainder season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "STL(Employed ~… 1990 Jan 13256. 13247. -80.8 89.9 13337.
## 2 "STL(Employed ~… 1990 Feb 12966. 13235. -273. 4.72 13240.
## 3 "STL(Employed ~… 1990 Mar 12938. 13223. -258. -26.5 13197.
## 4 "STL(Employed ~… 1990 Apr 13012. 13211. -186. -12.6 13198.
## 5 "STL(Employed ~… 1990 May 13108. 13198. -88.4 -1.74 13197.
## 6 "STL(Employed ~… 1990 Jun 13183. 13186. -8.47 5.67 13191.
## 7 "STL(Employed ~… 1990 Jul 13170. 13173. -10.9 8.17 13181.
## 8 "STL(Employed ~… 1990 Aug 13160. 13157. -11.5 13.5 13171.
## 9 "STL(Employed ~… 1990 Sep 13113. 13142. -88.0 59.2 13201.
## 10 "STL(Employed ~… 1990 Oct 13185. 13116. 39.0 29.8 13146.
## # … with 347 more rows
- trend(window = ?) controls the wiggliness of the trend component
  - As the window becomes shorter, you're averaging over fewer observations, so the trend is more wiggly
  - As the window becomes bigger, you're averaging over more time, so the trend is less wiggly
- season(window = ?) controls variation in the seasonal component
  - Same idea as above, but for the seasons: how many Januaries, Februaries, etc. are we averaging over?
- season(window = 'periodic') is equivalent to an infinite window
  - Means the seasonal pattern is purely periodic, identical every year (see the sketch below)
- The season window defaults to 13 periods for monthly data
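- A quick sketch of the purely periodic case (output not shown in these notes):
us_retail_employment %>%
model(stl = STL(Employed ~ season(window = "periodic"))) %>% # Identical seasonal effect every year
components() %>%
autoplot()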