head(housingsales)
## # A tsibble: 6 x 3 [1M]
## DATE HSN1FNSA Time
## <chr> <dbl> <mth>
## 1 2009-12-01 24 2009 Dec
## 2 2010-01-01 24 2010 Jan
## 3 2010-02-01 27 2010 Feb
## 4 2010-03-01 36 2010 Mar
## 5 2010-04-01 41 2010 Apr
## 6 2010-05-01 26 2010 May
For this discussion, I used a FRED dataset that tracks sales of new
single-family homes in the United States. It is a monthly time series,
with one observation per month. The raw data has two variables: DATE,
which gives the month and year of the observation, and HSN1FNSA, the
number of new single-family houses sold that month, in thousands (not
seasonally adjusted). The tsibble printed above adds a third column,
Time, a monthly index that matches DATE.
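The tsibble above appears to have been built with a monthly `Time` index (the `<mth>` column type suggests tsibble's `yearmonth()`). As a package-free illustration of the same idea, the base-R sketch below turns FRED-style `DATE` strings into a monthly `ts` object. It uses only the six values shown by `head(housingsales)`; the names `fred_rows` and `sales_ts` are hypothetical and are not from the original analysis.

```r
# Sketch: turn FRED-style DATE strings into a monthly series.
# The six values are the first rows printed by head(housingsales) above.
fred_rows <- data.frame(
  DATE     = c("2009-12-01", "2010-01-01", "2010-02-01",
               "2010-03-01", "2010-04-01", "2010-05-01"),
  HSN1FNSA = c(24, 24, 27, 36, 41, 26)
)

# Parse the character dates, then read off the first year and month
first_date  <- as.Date(fred_rows$DATE[1])
start_year  <- as.integer(format(first_date, "%Y"))
start_month <- as.integer(format(first_date, "%m"))

# frequency = 12 encodes "one observation per month"
sales_ts <- ts(fred_rows$HSN1FNSA,
               start = c(start_year, start_month),
               frequency = 12)
print(sales_ts)
```

The `frequency = 12` argument is what later lets seasonal methods know that the seasonal period is one year.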
housingsales %>%
  autoplot(HSN1FNSA) +
  labs(y = "Units Sold (Thousands)",
       title = "New Single Family Houses Sold",
       subtitle = "Source: FRED")
Plotting the series with autoplot() shows signs of heteroskedasticity:
the seasonal swings in the early years are much smaller than those in
the later years, so the variance of the series appears to grow with its
level.
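One way to back up the visual impression is to compare the spread of each 12-month block with its mean. The sketch below does this with base R's `aggregate()` on a SYNTHETIC series (not the FRED data): a level that steps up each year, multiplied by a fixed +/-20% seasonal pattern. The series and variable names are illustrative assumptions.

```r
# Sketch: check whether seasonal swings grow with the level of a series.
# SYNTHETIC data, not the FRED series: a level that steps up each year,
# multiplied by a fixed +/-20% seasonal pattern.
months <- 0:47
level  <- rep(c(20, 30, 40, 50), each = 12)
x <- ts(level * (1 + 0.2 * sin(2 * pi * months / 12)),
        start = c(2010, 1), frequency = 12)

# Mean and standard deviation of each consecutive 12-month block
block_mean <- aggregate(x, nfrequency = 1, FUN = mean)
block_sd   <- aggregate(x, nfrequency = 1, FUN = sd)

# If the spread rises together with the mean, the variance is level-dependent
print(cbind(mean = block_mean, sd = block_sd))
```

Applied to the real series, a block SD that rises roughly in proportion to the block mean is the pattern that motivates either a variance-stabilizing transformation or a multiplicative model.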
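# Plot the fifth root of the series to check whether the variance stabilises
housingsales %>%
  autoplot(HSN1FNSA^(1/5)) +
  labs(y = "Fifth root of units sold (thousands)",
       title = "New Single Family Houses Sold (fifth-root transformed)",
       subtitle = "Source: FRED")
# Overwrite the column with the transformed values for the decompositions
housingsales$HSN1FNSA <- housingsales$HSN1FNSA^(1/5)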
To account for the unequal variance, I applied a Box-Cox-style power
transformation with λ = 1/5; in other words, I took the fifth root of
the variable. (The Box-Cox transform proper is (y^λ - 1)/λ, which for
λ = 1/5 differs from the fifth root only by a shift and a rescaling, so
the shape of the series is unchanged.) Comparing the two graphs, the
difference in the size of the seasonal swings between early and late
years is much less pronounced in the transformed graph than in the
untransformed one.
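To make the link between the fifth root and the Box-Cox family concrete, the sketch below writes the Box-Cox formula out in base R (`box_cox_sketch` is a hypothetical helper; for positive data it should agree with `fabletools::box_cox()`) and checks that λ = 1/5 gives 5 * y^(1/5) - 5 on the six values shown earlier. If a data-driven λ is wanted instead of 1/5, feasts' `guerrero` feature (e.g. `features(HSN1FNSA, features = guerrero)` on the untransformed series) is one option.

```r
# Sketch of the Box-Cox family for positive data, written in base R so it
# runs without extra packages.
box_cox_sketch <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# First six HSN1FNSA values printed by head(housingsales) above
y <- c(24, 24, 27, 36, 41, 26)

bc    <- box_cox_sketch(y, lambda = 1/5)
root5 <- y^(1/5)

# With lambda = 1/5 the Box-Cox value equals 5 * y^(1/5) - 5: the same shape
# as the plain fifth root, only shifted and rescaled.
print(all.equal(bc, 5 * root5 - 5))
```

Because the shift and rescaling are linear, an additive decomposition of the fifth root and of the Box-Cox-transformed series carry the same information.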
The next step is to isolate the three time-series components: the
trend-cycle (T), the seasonal component (S), and the remainder
component (R). Classical decomposition comes in two forms: additive
decomposition, which assumes y = T + S + R, and multiplicative
decomposition, which assumes y = T × S × R.
## Warning: Removed 6 rows containing missing values (`geom_line()`).
## Warning: Removed 6 rows containing missing values (`geom_line()`).
## Warning: Removed 6 rows containing missing values (`geom_line()`).
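The code for the decompositions above is not shown, so here is a stand-in sketch using base R's `decompose()`, which implements both classical variants (this is a swapped-in function; the original analysis presumably used fpp3 tooling). It runs on a SYNTHETIC monthly series, not the FRED data. It also shows why missing values appear: the trend is a centred 2×12 moving average, which is undefined for the first and last six months, and plots of such components typically trigger ggplot2's "Removed ... rows containing missing values" warning.

```r
# Sketch: the two classical decompositions with base R's decompose().
# SYNTHETIC monthly data, not the FRED series: rising trend times a
# seasonal pattern.
idx <- 0:59
x <- ts((20 + 0.5 * idx) * (1 + 0.2 * sin(2 * pi * idx / 12)),
        start = c(2010, 1), frequency = 12)

# Additive: x = T + S + R;  multiplicative: x = T * S * R
dec_add  <- decompose(x, type = "additive")
dec_mult <- decompose(x, type = "multiplicative")

# The trend is a centred 2x12 moving average, so it is NA for the first six
# and last six months.
print(sum(is.na(dec_add$trend)))

# Seasonal indices: additive ones sum to 0, multiplicative ones average 1
print(round(dec_add$figure, 3))
print(round(dec_mult$figure, 3))
```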
All of the classical decompositions look very similar. Since the graphs
are hard to tell apart, I chose between them based on the assumptions
behind each method. Additive decomposition is appropriate when the
magnitude of the seasonal fluctuations does not vary with the level of
the time series. That is not the case here: the variance of the
untransformed series clearly grows over time, so multiplicative
decomposition might be preferable. However, applying the Box-Cox
transformation before the first additive decomposition accounts for the
unequal variance just as well. I would therefore use the Box-Cox-adjusted
additive decomposition, as it gives me more control over the data (for
example, the strength of the variance stabilization can be tuned through
the choice of λ).
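One practical consequence of choosing the transformed additive decomposition is that its components live on the fifth-root scale. A minimal sketch of mapping them back to thousands of houses, assuming the same SYNTHETIC series as before (not the FRED data) and base R's `decompose()` as a stand-in: remove the seasonal component on the transformed scale, then raise to the fifth power.

```r
# Sketch: additive decomposition on the fifth-root scale, then mapping the
# results back to the original units. SYNTHETIC data, not the FRED series.
idx <- 0:59
x <- ts((20 + 0.5 * idx) * (1 + 0.2 * sin(2 * pi * idx / 12)),
        start = c(2010, 1), frequency = 12)

x_root <- x^(1/5)                             # same transform as in the analysis
dec    <- decompose(x_root, type = "additive")

# Seasonally adjusted series: subtract S on the transformed scale, then undo
# the fifth root by raising to the fifth power.
adjusted_root <- x_root - dec$seasonal
adjusted      <- adjusted_root^5

# The trend-cycle can be back-transformed the same way (still NA at both ends)
trend_original <- dec$trend^5

print(round(head(cbind(original = x, adjusted = adjusted)), 2))
```

Back-transforming this way keeps every component in the original units, which is one concrete form of the extra control that the transformed additive approach offers.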