# Decomposition
One approach to the analysis of time series data is based on smoothing past data in order to separate the underlying pattern in the data series from randomness.
The underlying pattern then can be projected into the future and used as the forecast.
The underlying pattern can also be broken down into sub-patterns to identify the component factors that influence each of the values in the series.
This procedure is called *decomposition*.
Decomposition methods usually try to identify two separate components of the basic underlying pattern that tend to characterize economic and business series:

- the trend-cycle
- seasonal factors
The trend-cycle represents long-term changes in the level of the series. The seasonal factor consists of periodic fluctuations of constant length, usually caused by known factors such as rainfall, the month of the year, temperature, the timing of holidays, etc. The decomposition model assumes that the data have the following form:
\[Data = Pattern + Error\] or
\[Data = f (trend, cyclicity, seasonality, error)\]
In general, the decomposition model can be written as
\[Y_t = f(S_t, C_t, T_t, \epsilon_t)\]
where \(S_t\) is the seasonal component, \(C_t\) the cyclical component, \(T_t\) the trend, and \(\epsilon_t\) the error at period \(t\). Normally, the multi-year cyclical component is modeled by exception rather than explicitly, so in practice it is folded into the trend-cycle. The exact functional form depends on the decomposition model actually used. Two common approaches are the additive model
\[Y_t = S_t+T_t+\epsilon_t\]
and the multiplicative model
\[Y_t = S_t \times T_t \times \epsilon_t\]

## Additive vs. Multiplicative?
An additive model is appropriate if the magnitude of the seasonal fluctuation does not vary with the level of the series. The multiplicative model is more prevalent with economic series, since most seasonal economic series have seasonal variation that increases with the level of the series. That said, taking logarithms converts a multiplicative model to an additive one: \(\log(Y_t)=\log(S_t)+\log(T_t)+\log(\epsilon_t)\).
For additive models, you can remove the seasonal component by subtraction: \(Y_t-S_t=T_t+\epsilon_t\). Most published economic series are seasonally adjusted because seasonal variation is usually not of primary interest. For multiplicative models, the adjustment is a division: \(\frac{Y_t}{S_t}=T_t \times \epsilon_t\). The process of deseasonalizing the data has useful results: it lets us see the underlying pattern in the data more clearly, it gives us measures of the extent of seasonality in the form of seasonal indexes, and it provides a tool for projecting what one quarter's (or month's) observation may portend for the entire year.
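To make the algebra concrete, here is a minimal sketch in R; the toy series y and the object names are made up purely for illustration.
set.seed(1)
y<-ts(100+1:36+10*sin(2*pi*(1:36)/12)+rnorm(36),frequency=12) #toy monthly series with trend and seasonality
d_add<-decompose(y,type="additive")
d_mult<-decompose(y,type="multiplicative")
adj_add<-y-d_add$seasonal #additive seasonal adjustment: Y_t - S_t
adj_mult<-y/d_mult$seasonal #multiplicative seasonal adjustment: Y_t / S_t
d_log<-decompose(log(y),type="additive") #taking logs turns the multiplicative form into an additive one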
Let’s look at an example of decomposition using Microsoft stock.
require(forecast)
## Loading required package: forecast
## Warning: package 'forecast' was built under R version 4.0.2
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
require(fpp)
## Loading required package: fpp
## Warning: package 'fpp' was built under R version 4.0.2
## Loading required package: fma
## Loading required package: expsmooth
## Loading required package: lmtest
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: tseries
require(car)
## Loading required package: car
## Loading required package: carData
require(MASS)
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following objects are masked from 'package:fma':
##
## cement, housing, petrol
mydata<-read.csv("d:/msft.csv") #read in the MSFT data
t=seq(2015.8,2020.7, length.out=length(mydata$Adj.Close)) #decimal-year time index for plotting
#look at the data
par(mfrow=c(1,1))
scatter.smooth(t,mydata$Adj.Close, col="orange", main="MSFT") #simple time series plot
lines(mydata$Adj.Close~t,col="blue") #simple line
abline(reg=lm(mydata$Adj.Close~t), col="red") #simple regression
#look at the histogram of the data
hist(mydata$Adj.Close)
xgrid<-seq(min(mydata$Adj.Close),max(mydata$Adj.Close),length.out=100) #grid of prices spanning the data
lines(xgrid,250*dnorm(xgrid,mean(mydata$Adj.Close),sd(mydata$Adj.Close))) #overlay a normal density scaled to the count axis
#define a time series
myts<-ts(mydata$Adj.Close,frequency=12,start=c(1986,3))
#look at it
plot(myts)
#decompose
dec1<-decompose(myts,type="additive") #build an additive decomposition model
dec2<-decompose(myts,type="multiplicative") #build a multiplicative decomposition model
plot(dec1)
plot(dec2)
#the decomposition provides seasonal components, time series components, and random components.
s=dec1$seasonal
t=dec1$trend
r=dec1$random #these are really the residuals!
#if you add these items together for the additive, you get the original data set
proveit<-s+t+r
#if you multiply for the multiplicative, you get the original data set
proveit2<-dec2$seasonal*dec2$trend*dec2$random
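#quick sanity check: both reconstructions should match the original series exactly
#(the NAs at each end come from the centered moving average used for the trend)
max(abs(proveit-myts),na.rm=TRUE) #essentially zero
max(abs(proveit2-myts),na.rm=TRUE) #essentially zero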
#Note: for decomposition, we will not generally forecast; we use it to remove seasonality (if present). A very naive forecast from a decomposition would be to fit an overall linear trend (or some order of polynomial) to the seasonally adjusted series and couple that with the seasonal component, for example:
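#a minimal sketch of that idea (hypothetical object names, reusing myts and dec1 from above)
seasadj1<-myts-dec1$seasonal #remove the additive seasonal component
trendfit<-tslm(seasadj1~trend) #fit a linear trend to the seasonally adjusted series
trendfc<-forecast(trendfit,h=12) #project the trend 12 months ahead
naivefc<-trendfc$mean+as.numeric(tail(dec1$seasonal,12)) #add back the last year of seasonal effects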
#how do you evaluate the "goodness" of the decomposition models? Which one is better for deseasonalization?
#manually...we will use forecast for other models!
error1<-na.omit(dec1$random) #generate residuals without NA
error2<-na.omit(dec2$random) #generate residuals without NA
error1<-as.vector(error1) #set residuals as a vector, not a time series
error2<-as.vector(error2)
datavector<-length(dec2$random)-length(dec2$random[is.na(dec2$random)]) #number of non-NA residuals
abserror1<-abs(error1)
abserror2<-abs(error2)
sqerror1<-error1^2
sqerror2<-error2^2
pererror1<-(abserror1/datavector)*100
pererror2<-(abserror2/datavector)*100
temp1=temp2=temp3=temp4=k=0
#Calculate the denominator for MASE, which compares MAE of model with MAE of Naive or SNaive
for (i in seq(13,length(na.omit(dec2$random)), by=12)){
  temp1=abs(dec1$x[i]-dec1$x[i-12])
  temp2=abs(dec2$x[i]-dec2$x[i-12])
  temp3=temp3+temp1
  temp4=temp4+temp2
  k=k+1
}
denom1=temp3/k
denom2=temp4/k
#Calculate statistics
me<-c(mean(error1),mean(error2))
mad<-c(mean(abserror1), mean(abserror2))
mse<-c(mean(sqerror1),mean(sqerror2))
rmse<-sqrt(mse)
mape<-c(mean(pererror1),mean(pererror2))
mase=c(mad[1]/denom1, mad[2]/denom2)
rbind(me,mad,mse,rmse,mape, mase)
## [,1] [,2]
## me -0.5362755 0.99238562
## mad 2.8561456 0.99238562
## mse 15.1707638 0.98631741
## rmse 3.8949665 0.99313514
## mape 5.9503033 2.06747004
## mase 0.1227485 0.04264975
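For reference, MASE (as it is usually defined) scales the model's mean absolute error by the in-sample mean absolute error of a seasonal naive forecast,
\[MASE=\frac{MAE}{\frac{1}{n-m}\sum_{i=m+1}^{n}\left|Y_i-Y_{i-m}\right|}\]
where \(m\) is the seasonal period (12 for monthly data) and \(n\) is the number of observations. For later models we will let the forecast package do this bookkeeping; as a sketch, its accuracy() function reports ME, RMSE, MAE, MPE, MAPE, and MASE in one call, for example against a seasonal naive benchmark:
accuracy(snaive(myts)) #accuracy measures for a seasonal naive forecast of the MSFT series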
One of the most popular methods for decomposing quarterly and monthly data is X-12-ARIMA, which has its origins in methods developed by the US Bureau of the Census. It is now widely used by the Bureau and government agencies around the world. Earlier versions of the method included X-11 and X-11-ARIMA. An X-13-ARIMA method was recently released as well.
The X-12-ARIMA method is based on classical decomposition, but with many extra steps and features to overcome the drawbacks of classical decomposition that were discussed in the previous section. In particular, the trend estimate is available for all observations including the end points, and the seasonal component is allowed to vary slowly over time. It is also relatively robust to occasional unusual observations. X-12-ARIMA handles both additive and multiplicative decomposition, but only allows for quarterly and monthly data.
X-12-ARIMA also has some sophisticated methods to handle trading day variation, holiday effects and the effects of known predictors, which are not covered here.
A complete discussion of the method is available in Ladiray and Quenneville (2001).
There is currently no R package that implements X-12-ARIMA decomposition itself. However, free software that implements the method is available from the US Census Bureau, and an R interface to that software is provided by the x12 package.
We are likely to discuss X-12 and X-13 in the advanced forecasting sections.
STL is an acronym for “Seasonal and Trend decomposition using Loess”, where loess is a method for estimating nonlinear relationships. The STL method was developed by Cleveland et al. (1990). STL has several advantages over the classical decomposition method and X-12-ARIMA:
- Unlike X-12-ARIMA, STL will handle any type of seasonality, not only monthly and quarterly data.
- The seasonal component is allowed to change over time, and the rate of change can be controlled by the user.
- The smoothness of the trend-cycle can also be controlled by the user.
- It can be robust to outliers (i.e., the user can specify a robust decomposition), so occasional unusual observations will not affect the estimates of the trend-cycle and seasonal components. They will, however, affect the remainder component.
On the other hand, STL has some disadvantages. In particular, it does not automatically handle trading day or calendar variation, and it only provides facilities for additive decompositions. It is possible to obtain a multiplicative decomposition by first taking logs of the data and then back-transforming the components. Decompositions that lie somewhere between additive and multiplicative can be obtained using a Box-Cox transformation of the data with \(0<\lambda<1\). A value of \(\lambda=0\) corresponds to the multiplicative case (logs), while \(\lambda=1\) corresponds to the additive case (the data raised to the first power).
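As a sketch of that idea (reusing the myts series from above; the back-transformation step is only indicated in a comment), the forecast package's BoxCox.lambda() and BoxCox() functions can be combined with stl():
lambda<-BoxCox.lambda(na.omit(myts)) #estimate a Box-Cox parameter from the data
stl_bc<-stl(BoxCox(na.omit(myts),lambda),s.window="periodic") #decompose the transformed series
plot(stl_bc)
#the components are on the transformed scale; InvBoxCox(x,lambda) back-transforms them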
The two main parameters to be chosen when using STL are the trend window (t.window) and seasonal window (s.window). These control how rapidly the trend and seasonal components can change. Small values allow more rapid change. Setting the seasonal window to be infinite is equivalent to forcing the seasonal component to be periodic (i.e., identical across years).
mystl=stl(na.omit(myts), t.window=15, s.degree=0, s.window="periodic", robust=TRUE) #STL decomposition with a fixed (periodic) seasonal pattern
plot(mystl)
#mystl$weights
seasonadjust=seasadj(mystl) #seasonally adjusted series (seasonal component removed)
plot(naive(seasonadjust)) #naive forecast of the seasonally adjusted series
fcast=forecast(mystl, method="naive") #naive forecast of the adjusted series, reseasonalized with a seasonal naive forecast of the seasonal component
plot(fcast)
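As a variation on the STL fit above (the window values are just illustrative), a finite seasonal window lets the seasonal pattern evolve over time instead of forcing it to be identical across years:
mystl2=stl(na.omit(myts), t.window=15, s.window=13, robust=TRUE) #seasonal component allowed to change slowly
plot(mystl2)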
While decomposition is primarily useful for studying time series data and exploring historical changes over time, it can also be used in forecasting. To forecast a decomposed time series, we separately forecast the seasonal component and the seasonally adjusted component. It is usually assumed that the seasonal component is unchanging, or changing extremely slowly, so it is forecast by simply taking the last year of the estimated component. In other words, a seasonal naïve method is used for the seasonal component.
To forecast the seasonally adjusted component, any non-seasonal forecasting method may be used. For example, a random walk with drift model, Holt’s method, or a non-seasonal ARIMA model (discussed in Chapter 8) may be used.
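For instance, reusing the STL fit from above, forecast() can combine a random walk with drift forecast of the seasonally adjusted series with a seasonal naive forecast of the seasonal component (the 24-month horizon is arbitrary):
fcast2=forecast(mystl, method="rwdrift", h=24) #drift model for the seasonally adjusted series
plot(fcast2)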