Q3

We consider monthly time series data for Housing Starts from the FRED database. The data are not seasonally adjusted, and we restrict the estimation sample to an end date of December 2013. We work with the year-over-year (yoy) differences of the series; because each observation is compared with the same month of the previous year, taking yoy differences seasonally adjusts the data.

\[yoy_t = housing_t - housing_{t-12} \]

Here \(housing_t\) is the monthly Housing Starts value. Next we plot the data to check that the transformed series is at least weakly stationary.

Plot the Data

library("Quandl")
housing<-Quandl("FRED/HOUSTNSA", type="zoo")
housing2<-window(housing,end="Dec 2013")
season<-diff(housing2,12)

fullseason<-diff(housing,12)

We plot the raw housing data alongside the seasonally adjusted series. The yoy transformation of the entire sample is kept in the “fullseason” object so we can evaluate our forecasts against the realized values at the end.

par(mfrow=c(1,2))
plot(housing2,xlab="Time", ylab="Housing Data", main="Raw Housing Data")
plot(season,xlab="Time", ylab="Year over Year",main="Seasonally Adj Housing")

The plot of \(yoy_t = housing_t - housing_{t-12}\) suggests the seasonally adjusted data are weakly stationary, so we do not need to take log differences as in the previous problems.
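Beyond eyeballing the plot, a unit-root test can back up the stationarity claim; here is a minimal sketch using adf.test from the tseries package (an addition of mine, not part of the original model selection):

library("tseries")

# Augmented Dickey-Fuller test on the yoy series;
# a small p-value rejects the unit-root null
adf.test(coredata(na.omit(season)))

Next we examine the ACF and PACF of the seasonally adjusted data to determine which models to consider for estimation.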

ACF & PACF Suggestions for Estimated Model

Let's look at the ACF and PACF of our data.

par(mfrow=c(1,2))
acf(season,type='correlation',na.action=na.pass,lag.max=96)
acf(season,type='partial',na.action=na.pass,lag.max=96)

There does not seem to be a clear-cut answer. However, a reasonable guess is an AR(2)-type model with a large first coefficient and a small negative second coefficient.
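As a quick sanity check on that guess (an aside, separate from the model selection below), we can fit a plain AR(2) and inspect the estimated coefficients:

# Fit a plain AR(2); the guess above predicts a large positive ar1
# and a small negative ar2
AR2 <- arima(season, order=c(2,0,0))
coef(AR2)

Next, let's look at the Ljung-Box statistic to see whether a model of this form leaves residuals resembling white noise.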

Ljung-Box Q Statistic

# Fit ARMA(2,2), i.e. ARIMA(2,0,2), then run residual diagnostics,
# including Ljung-Box p-values up to lag 24
ARMA22 <- arima(season, order=c(2,0,2))
tsdiag(ARMA22,gof.lag=24)

Looking at the diagnostics from the arima fit, the ARMA(2,2) model leaves residuals that show some semblance of white noise.
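tsdiag plots Ljung-Box p-values across a range of lags; the same statistic can also be computed directly on the residuals. A sketch (the fitdf correction of 4, one per estimated ARMA coefficient, is my assumption):

# Ljung-Box test at lag 24, adjusting the degrees of freedom
# for the two AR and two MA coefficients estimated above
Box.test(residuals(ARMA22), lag=24, type="Ljung-Box", fitdf=4)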

Next we examine the p-values of the estimated coefficients.

##       ar1       ar2       ma1       ma2 intercept 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.7895113
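The code behind these p-values is not shown above; a minimal sketch of how they can be computed from the fit, assuming the usual asymptotic normal approximation for the ML estimates:

# Two-sided p-values from the coefficients and their standard errors
se <- sqrt(diag(ARMA22$var.coef))
round(2*pnorm(-abs(coef(ARMA22)/se)), 7)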

We see that all four ARMA(2,2) coefficients are highly significant, while the intercept is not (p ≈ 0.79).

Let's also fit an AR model by maximum likelihood and see which order its AIC comparison suggests.

AIC Check

# Fit an AR model by maximum likelihood; ar() selects the order by AIC
ml <- ar(season,method="mle")
ml$order
## [1] 12
ml$aic
##         0         1         2         3         4         5         6 
## 902.82691 107.98011  39.70776  38.27675  40.22692  37.53032  38.39098 
##         7         8         9        10        11        12 
##  36.77910  36.54342  38.53976  30.95503  32.80236   0.00000
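Note that ar() reports these AIC values as differences from the minimum, so the selected order is wherever the difference is zero; a one-liner to recover it directly:

# Recover the AIC-minimizing order (names run from order 0 upward)
as.integer(names(which.min(ml$aic)))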

The ML fit suggests AR(12). So we have two candidate models, ARMA(2,2) and AR(12). Let's use BIC to see which one is preferred by that measure.

BIC Check

BIC(ARMA22)
## [1] 5204.8
# Fit the AR(12) that ar() selected so we can compare BICs
AR12<-arima(season, order=c(12,0,0))
BIC(AR12)
## [1] 5213.925

Our BIC comparison shows that ARMA(2,2) is a better fit than AR(12) because its BIC is lower. This is consistent with BIC penalizing the twelve AR parameters more heavily than AIC does.

Forecasts

Now we construct 24-month forecasts from the ARMA(2,2) model and measure how close they come to the actual yoy housing starts values from Jan 2014 to Dec 2015. We do the same with the AR(12) model to see which one forecasts better.

library("forecast")
ARMA22fore<-forecast.Arima(ARMA22,h=24)
AR12fore<-forecast(AR12,h=24)

With our forecasts, we plot them with the actual data.

par(mfrow=c(1,2), cex=0.65)
# ARMA(2,2) forecast: forecast means in blue, realized yoy series in black
plot(ARMA22fore, xlim=c(2010,2015),ylim=c(-60,60))
lines(ARMA22fore$mean, type="p", pch=16, col="blue")
lines(fullseason, type="o", pch=16)
# AR(12) forecast on the same axes for comparison
plot(AR12fore, xlim=c(2010,2015),ylim=c(-60,60))
lines(AR12fore$mean, type="p", pch=16, col="blue")
lines(fullseason, type="o", pch=16)

As we can see, both estimators produce quite similar forecasts. We cannot definitively say that the ARMA(2,2) model is significantly better than the AR(12) model.
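One way to make this comparison concrete is to score each forecast against the held-out yoy values; a sketch using accuracy() from the forecast package (the evaluation window is my reading of the Jan 2014-Dec 2015 period):

# Realized yoy values over the 24 forecast months
actual <- window(fullseason, start=as.yearmon("2014-01"), end=as.yearmon("2015-12"))

# RMSE, MAE, etc. of each model on the held-out sample
accuracy(ARMA22fore, coredata(actual))
accuracy(AR12fore, coredata(actual))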