We consider monthly time series data for Housing Starts from the FRED database. The data are not seasonally adjusted, and we restrict the sample to an end date of December 2013. We work with year-over-year (yoy) differences of the series; taking yoy differences seasonally adjusts the data.
\[yoy_t = housing_t - housing_{t-12} \]
Here \(housing_t\) is the monthly Housing Starts value. Next we plot the data to make sure the yoy series is at least weakly stationary.
library("Quandl")
housing<-Quandl("FRED/HOUSTNSA", type="zoo")
housing2<-window(housing,end="Dec 2013")
season<-diff(housing2,12)
fullseason<-diff(housing,12)
We plot the raw housing data and the seasonally adjusted housing data. We keep the entire transformed series in the "fullseason" object for later use when we test our forecasting estimators.
par(mfrow=c(1,2))
plot(housing2,xlab="Time", ylab="Housing Data", main="Raw Housing Data")
plot(season,xlab="Time", ylab="Year over Year",main="Seasonally Adj Housing")
We graph the time series \(yoy_t = housing_t - housing_{t-12}\), and the plot suggests the seasonally adjusted data are weakly stationary, so we do not have to take log differences as in the previous problems.
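As a quick formal check (a sketch using the tseries package, not part of the original analysis), an augmented Dickey-Fuller test can back up the visual judgment; the null hypothesis is a unit root, so a small p-value supports stationarity.
library("tseries")
library("zoo")
# ADF test on the yoy series; rejecting the unit-root null supports stationarity
adf.test(coredata(na.omit(season)))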
Next we examine the ACF and PACF of the seasonally adjusted data to determine which models to use for estimation.
par(mfrow=c(1,2))
acf(season,type='correlation',na.action=na.pass,lag.max=96)
acf(season,type='partial',na.action=na.pass,lag.max=96)
There does not seem to be a clear-cut answer. However, a reasonable guess is an AR(2) component with a large first coefficient and a small negative second one. Let's fit an ARMA(2,2) and look at the Ljung-Box statistic to see if that makes sense.
# Fit an ARMA(2,2) and run residual diagnostics, including Ljung-Box p-values
ARMA22 <- arima(season, order=c(2,0,2))
tsdiag(ARMA22,gof.lag=24)
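tsdiag plots the Ljung-Box p-values across a range of lags; as a sketch of the same check done explicitly (the choice of lag=24 and fitdf=4 for the four estimated ARMA coefficients is our assumption), one could run:
# Ljung-Box test on the ARMA(2,2) residuals at lag 24;
# fitdf=4 adjusts the degrees of freedom for ar1, ar2, ma1, ma2
Box.test(residuals(ARMA22), lag=24, type="Ljung-Box", fitdf=4)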
Looking at the diagnostics for the fitted model, the ARMA(2,2) residuals show some semblance of white noise, so the model is a plausible candidate.
Next we examine the p-values of the estimated coefficients.
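The code behind the output below is not shown in the original; a common way to obtain two-sided p-values from an arima fit, and plausibly what was done here, is a normal approximation using the coefficient estimates and their standard errors:
# Approximate two-sided z-test p-values for each coefficient
se <- sqrt(diag(ARMA22$var.coef))
2*(1 - pnorm(abs(ARMA22$coef)/se))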
## ar1 ar2 ma1 ma2 intercept
## 0.0000000 0.0000000 0.0000000 0.0000000 0.7895113
We see that the AR and MA coefficients of the ARMA(2,2) model are all highly significant, while the intercept is not.
Let's also fit an AR model by maximum likelihood and see which order its AIC-based selection suggests.
ml <- ar(season,method="mle")
ml$order
## [1] 12
ml$aic
## 0 1 2 3 4 5 6
## 902.82691 107.98011 39.70776 38.27675 40.22692 37.53032 38.39098
## 7 8 9 10 11 12
## 36.77910 36.54342 38.53976 30.95503 32.80236 0.00000
We can see that ML suggests AR(12). So we have two candidate models: ARMA(2,2) and AR(12). Let's compare their BICs to see which one is better by that measure.
BIC(ARMA22)
## [1] 5204.8
AR12<-arima(season, order=c(12,0,0))
BIC(AR12)
## [1] 5213.925
Our BIC comparison shows that ARMA(2,2) is a better estimate than AR(12), because its BIC is lower (5204.8 versus 5213.9).
Now we will construct forecasts using the ARMA(2,2) model and estimate how close we get to the actual housing starts values from Jan 2014-Dec 2015. We will also do the same with the AR(12) to see which one is a better estimator.
library("forecast")
# 24-step-ahead forecasts covering Jan 2014 through Dec 2015
ARMA22fore<-forecast(ARMA22,h=24)
AR12fore<-forecast(AR12,h=24)
With the forecasts in hand, we plot them against the actual data.
par(mfrow=c(1,2), cex=0.65)
plot(ARMA22fore, xlim=c(2010,2015),ylim=c(-60,60))
lines(ARMA22fore$mean, type="p", pch=16, col="blue")
lines(fullseason, type="o", pch=16)
plot(AR12fore, xlim=c(2010,2015),ylim=c(-60,60))
lines(AR12fore$mean, type="p", pch=16, col="blue")
lines(fullseason, type="o", pch=16)
As we can see, both estimators produce relatively similar forecasts. We cannot definitively say that the ARMA(2,2) model is significantly better than the AR(12) model.
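To go beyond the visual comparison, one could score both models against the held-back Jan 2014 to Dec 2015 values with the forecast package's accuracy function. This is a sketch; it assumes fullseason carries a yearmon index, which is typical of Quandl's zoo output.
library("zoo")
# Actual yoy values for the held-out two years
actual <- window(fullseason, start=as.yearmon("2014-01"), end=as.yearmon("2015-12"))
# Test-set error measures (RMSE, MAE, ...) for each model
accuracy(ARMA22fore, as.numeric(actual))
accuracy(AR12fore, as.numeric(actual))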