Introduction

head(housingsales)

## # A tsibble: 6 x 3 [1M]
##   DATE       HSN1FNSA     Time
##   <chr>         <dbl>    <mth>
## 1 2009-12-01       24 2009 Dec
## 2 2010-01-01       24 2010 Jan
## 3 2010-02-01       27 2010 Feb
## 4 2010-03-01       36 2010 Mar
## 5 2010-04-01       41 2010 Apr
## 6 2010-05-01       26 2010 May

For this discussion, I used a FRED dataset that tracks sales of new single-family homes in the United States. It is a monthly time series, with one observation per month. The dataset has two variables: DATE, which gives the month and year of the observation, and HSN1FNSA, which gives the number of new single-family houses sold that month, in thousands. The Time column in the output above is the year-month index of the tsibble.
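The code that built the tsibble is not shown, so the following is only a sketch of one way a table with this shape could be produced. It assumes the `tsibble` package is installed and types in the six rows printed above by hand instead of reading the FRED file.

```r
library(tsibble)

# The six rows printed by head(housingsales), typed in by hand (assumption:
# the real data were read from a FRED download, which is not shown here).
raw <- data.frame(
  DATE = c("2009-12-01", "2010-01-01", "2010-02-01",
           "2010-03-01", "2010-04-01", "2010-05-01"),
  HSN1FNSA = c(24, 24, 27, 36, 41, 26)
)

# Convert the character date to a year-month value and use it as the index.
raw$Time <- yearmonth(as.Date(raw$DATE))
housingsales <- as_tsibble(raw, index = Time)

housingsales
```

Using a year-month index lets tsibble recognise the data as monthly, which is what the `[1M]` in the printed header indicates.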

Graphs

housingsales%>%
  autoplot(HSN1FNSA) + 
  labs(y = "Units Sold (Thousands)",
       title = "New Single Family Houses Sold",
       subtitle = "Source : FRED")

After plotting the variable with the autoplot function, we see that the time series shows signs of heterogeneity of variance (heteroskedasticity): the seasonal swings in the earlier, left-hand part of the series are much smaller than those in the later, right-hand part.
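A rough numerical check of this visual impression is to compare the spread within each year to that year's average level. The sketch below uses a synthetic monthly series, not the FRED data, whose seasonal swing is built to grow with its level; all variable names are illustrative.

```r
# Synthetic monthly series (NOT the FRED data): the seasonal swing is
# proportional to a rising level, mimicking the pattern described above.
months <- 1:120
level  <- 20 + 0.4 * months
y      <- level * (1 + 0.3 * sin(2 * pi * months / 12))

# Group the 120 months into 10 consecutive years.
year_id     <- rep(1:10, each = 12)
yearly_mean <- tapply(y, year_id, mean)
yearly_sd   <- tapply(y, year_id, sd)

# A strong positive correlation suggests the spread grows with the level.
spread_vs_level <- cor(yearly_mean, yearly_sd)
spread_vs_level
```

Running the same summary on the real series, before and after a transformation, would give a simple way to judge how much the transformation evens out the seasonal variance.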

housingsales%>%
  autoplot(HSN1FNSA^(1/5)) + 
  labs(y = "Fifth Root of Units Sold (Thousands)",
       title = "New Single Family Houses Sold (Fifth-Root Transformed)",
       subtitle = "Source : FRED")

# Overwrite the column with its fifth root; later steps see the transformed values
housingsales$HSN1FNSA <- (housingsales$HSN1FNSA)^(1/5)

To account for the heterogeneity of variance, I applied a power transformation from the Box-Cox family: I raised the variable to the power of 1/5, in other words, I took its fifth root. This matches a Box-Cox transformation with λ = 1/5 up to a shift and a rescaling, so the shape of the transformed series is the same. Comparing the two graphs, we see that the heterogeneity of variance across seasonal patterns is less pronounced in the transformed graph than in the untransformed one.
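The link between the fifth root and the textbook Box-Cox formula can be sketched in base R. The helper `box_cox_manual()` is a hypothetical name written for this illustration; in the fpp3 workflow the `box_cox()` function from fabletools plays the same role.

```r
# Box-Cox transform; lambda = 0 is defined as the natural log.
# (Hypothetical helper written for this illustration.)
box_cox_manual <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

y      <- c(24, 24, 27, 36, 41, 26)  # first six values from head(housingsales)
lambda <- 1 / 5

fifth_root <- y^lambda
bc         <- box_cox_manual(y, lambda)

# The two differ only by a shift and a rescale, so plots have the same shape.
same_up_to_scale <- isTRUE(all.equal(bc, (fifth_root - 1) / lambda))
same_up_to_scale
cor(bc, fifth_root)
```

Because the two versions are perfectly linearly related, choosing one over the other changes the axis scale of the plot but not the pattern it shows.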

Decomposition

The next step is to isolate the three time series components: the trend-cycle (T), the seasonal component (S), and the remainder component (R). There are two classical ways to do so: additive decomposition and multiplicative decomposition.

Additive Decomposition

\(y_{t} = S_{t} + T_{t} + R_{t}\)

## Warning: Removed 6 rows containing missing values (`geom_line()`).
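The code behind this decomposition is not shown. As a minimal sketch of the classical additive method, the example below applies base R's `decompose()` to a synthetic monthly series, not the FRED data. In the fpp3 workflow the corresponding tool would be `classical_decomposition()` from feasts.

```r
# Synthetic monthly series (NOT the FRED data): trend + fixed seasonal + noise.
set.seed(1)
n <- 120
x <- ts(50 + 0.2 * (1:n) + 5 * sin(2 * pi * (1:n) / 12) + rnorm(n),
        start = c(2010, 1), frequency = 12)

fit <- decompose(x, type = "additive")

# The centred 2x12 moving average cannot be computed for the first and
# last six months, so the trend (and the remainder) is NA there.
n_missing_trend <- sum(is.na(fit$trend))
n_missing_trend

# Where all components exist, they add back up to the original series.
recon <- fit$trend + fit$seasonal + fit$random
ok    <- !is.na(recon)
adds_up <- isTRUE(all.equal(as.numeric(recon[ok]), as.numeric(x[ok])))
adds_up
```

The missing trend values at the two ends of the series are a likely source of the missing-value warning printed with the plot.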

Additive Decomposition without Box-Cox

\(y_{t} = S_{t} + T_{t} + R_{t}\)

## Warning: Removed 6 rows containing missing values (`geom_line()`).

Multiplicative Decomposition

\(y_{t} = S_{t} \times T_{t} \times R_{t}\)

## Warning: Removed 6 rows containing missing values (`geom_line()`).
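As with the additive case, the original code is not shown, so this is a sketch of the multiplicative variant using base R's `decompose()` on a synthetic series, not the FRED data, whose seasonal swing scales with its level.

```r
# Synthetic monthly series (NOT the FRED data): the seasonal factor and the
# noise multiply a rising trend, so the swings grow with the level.
set.seed(1)
n <- 120
trend_part    <- 50 + 0.2 * (1:n)
seasonal_part <- 1 + 0.1 * sin(2 * pi * (1:n) / 12)
noise_part    <- exp(rnorm(n, sd = 0.02))
x <- ts(trend_part * seasonal_part * noise_part,
        start = c(2010, 1), frequency = 12)

fit <- decompose(x, type = "multiplicative")

# Where all components exist, their product recovers the original series.
recon <- fit$trend * fit$seasonal * fit$random
ok    <- !is.na(recon)
multiplies_back <- isTRUE(all.equal(as.numeric(recon[ok]), as.numeric(x[ok])))
multiplies_back

# The twelve seasonal indices are scaled to average one over a year.
mean(fit$figure)
```

Here the seasonal component is a set of ratios around one, whereas in the additive case it is a set of offsets around zero, which is why the multiplicative form suits series whose swings grow with the level.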

Conclusion

All three classical decompositions perform very similarly. Since it is hard to see a difference between the graphs, I opt to play to the strengths of the individual methods. Additive decomposition is appropriate when the magnitude of the seasonal fluctuations does not vary with the level of the series. That is not the case here: the variance is clearly heterogeneous, so multiplicative decomposition might be preferable. However, applying the Box-Cox transformation before the first additive decomposition accounts for the heterogeneity of variance about equally well. I would therefore choose the Box-Cox-adjusted additive decomposition, as it gives me more control over the data.
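One way to see why a transformed additive decomposition and a multiplicative decomposition can look so alike is that the log is the limiting member of the Box-Cox family (λ → 0), and taking logs turns the multiplicative model into an additive one:

\[
y_{t} = S_{t} \times T_{t} \times R_{t}
\quad\Longrightarrow\quad
\log y_{t} = \log S_{t} + \log T_{t} + \log R_{t}
\]

The fifth root used here (λ = 1/5) sits between the untransformed series (λ = 1) and the log, so this illustrates the mechanism and is not an exact equivalence.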

Discussion #1

Samuel C. Singer

2024-03-13