Problem 3

The first step is to import the number of New Privately Owned Housing Units (Thousands of Units Not Seasonally Adjusted), FRED/HOUSTNSA, using Quandl packages. We will denote FRED/HOUSTNSA as \(house_{t}\).

house <- Quandl("FRED/HOUSTNSA", type="ts")
str(house)
 Time-Series [1:685] from 1959 to 2016: 96.2 99 127.7 150.8 152.5 ...

We then split the sample into two parts (up to 2013 and after 2013) in order to be able to show out model accuracy after forecast.

# split sample into two parts - estimation sample and prediction samplesplit sample into two parts - estimation sample and prediction sample
houseall <- house
house1 <- window(houseall, end=c(2013,12))
house2 <- window(houseall, start=c(2014,1))

# first part used to identify and estimate the model
house <- house1

The next step is to construct a new time series transformation of \(house_{t}\) in log (\(\log{house_{t}}\)), log-change (\(\Delta{\log{house_{t}}}\)), seasonal log change (\(\Delta_{12}{\log{house_{t}}}\)), and 2nd difference of seasonal log change (\(\Delta_{12}\Delta{\log{house_{t}}}\))). Note that since \(house_{t}\) is a monthly data, we set the seasonal lag to be 12 months.

lhouse <- log(house) #Log number of New Privately Owned Housing Units
dlhouse1 <- diff(lhouse) #Log-changes of number of New Privately Owned Housing Units
dlhouse12 <- diff(lhouse,4) #Seasonal log-changes of number of New Privately Owned Housing Units
dlhouse12_1 <- diff(diff(lhouse,12)) #2d diff of seasonal log-changes of number of New Privately Owned Housing Units

The following figures shows the original and time series transformation of FRED/HOUSTNSA, \(house_{t}\):

par(mfrow=c(2,3))
plot(house, main=expression(house))
plot(lhouse, main=expression(log(house)))
plot.new()
plot(dlhouse1, main=expression(paste(Delta, "log(house)")))
plot(dlhouse12, main=expression(paste(Delta[12], "log(house)")))
plot(dlhouse12_1, main=expression(paste(Delta, Delta[12], "log(house)")))

In the next part, we will conducting model identification, estimation, and adequacy check for all transformation (\(\Delta{\log{house_{t}}}\), \(\Delta_{12}{\log{house_{t}}}\), and \(\Delta_{12}\Delta{\log{house_{t}}}\)), with the exception of \(\log{house_{t}}\), since the time series plot indicate that \(\log{house_{t}}\) is nonstationary.

Model identification, estimation, and adequacy check for \(\Delta{\log{house_{t}}}\)

The first step in model identification is to plot the ACF and PACF for all transformation of \(\Delta{\log{house_{t}}}\):

From the ACF and PACF of \(\Delta{\log{house_{t}}}\), the most likely specification is ARIMA(1,0,0)(1,0,0)[12], this is equivalent with ARIMA(1,1,0)(1,0,0)[12] specification for \(\log{house_{t}}\). We can estimate the specification using the following code:

m1 <- arima(dlhouse1,order=c(1,0,0),seasonal=list(order=c(1,0,0),period=12))
m1
## 
## Call:
## arima(x = dlhouse1, order = c(1, 0, 0), seasonal = list(order = c(1, 0, 0), 
##     period = 12))
## 
## Coefficients:
##           ar1    sar1  intercept
##       -0.3009  0.7550    -0.0002
## s.e.   0.0398  0.0264     0.0127
## 
## sigma^2 estimated as 0.01195:  log likelihood = 518.62,  aic = -1029.23
tsdiag(m1,gof.lag=12)

Based on the plots above, ARIMA(1,0,0)(1,0,0)[12] specification is borderline adequate. We can also test the specification further for adequacy using Ljung-Box test on the residuals. Given that we have 659 observations, \(\log{T}\) is 6.4907 or approximately 6, we choose \(m=6\).

m1.LB <- Box.test(m1$residuals, lag=6, type="Ljung")
m1.LB
## 
##  Box-Ljung test
## 
## data:  m1$residuals
## X-squared = 2.7538, df = 6, p-value = 0.8391

From the estimation result above, ARIMA(1,0,0)(1,0,0)[12] for \(\Delta{\log{house_{t}}}\) is actually adequate.

Comparing the result to alternative specification gives us the following (codes and outputs are suppressed for simplicity):

Specification Q-Statistics AIC BIC
ARIMA(1,0,0)(1,0,0)[12] 0.8391 -1029.23 -1011.269
ARIMA(2,0,0)(1,0,0)[12] 0.9646 -1028.92 -1006.471
ARIMA(3,0,0)(1,0,0)[12] 0.9664 -1026.94 -999.998
ARIMA(4,0,0)(1,0,0)[12] 0.9999 -1026.27 -994.835

From the table above we can see that all of the specification is actually adequate based on the Q-statistics. However, our initial specification ARIMA(1,0,0)(1,0,0)[12] has the lowest AIC and BIC, indicate that it is the most adequate specification.

To further scrutinize our result, we will estimate alternative specification of using different time series transformation (i.e. \(\Delta_{12}{\log{house_{t}}}\) and \(\Delta_{12}\Delta{\log{house_{t}}}\)).

Model identification, estimation, and adequacy check for \(\Delta_{12}{\log{house_{t}}}\)

The first step in model identification is to plot the ACF and PACF for all transformation of \(\Delta_{12}{\log{house_{t}}}\):

From the ACF and PACF of \(\Delta_{12}{\log{house_{t}}}\), the most likely specification is ARIMA(2,0,0)(1,0,0)[12] with zero mean, We can estimate the specification using the following code:

m2 <- arima(dlhouse12,order=c(2,0,0),seasonal=list(order=c(1,0,0),period=12), include.mean=FALSE)
m2
## 
## Call:
## arima(x = dlhouse12, order = c(2, 0, 0), seasonal = list(order = c(1, 0, 0), 
##     period = 12), include.mean = FALSE)
## 
## Coefficients:
##          ar1     ar2    sar1
##       0.4777  0.1668  0.8274
## s.e.  0.0404  0.0407  0.0222
## 
## sigma^2 estimated as 0.01953:  log likelihood = 352.99,  aic = -697.97
tsdiag(m2,gof.lag=12)

From the estimation result above, ARIMA(2,0,0)(1,0,0)[12] for \(\Delta_{12}{\log{house_{t}}}\) is inadequate, even though there is no signigicant serial correlation in the ACF of residuals, we have extremely low value \(p\)-values for Ljung-Box statistics for lag beyond lag 2.

Model identification, estimation, and adequacy check for \(\Delta_{12}\Delta{\log{house_{t}}}\)

The first step in model identification is to plot the ACF and PACF for all transformation of \(\Delta_{12}\Delta{\log{house_{t}}}\):

From the ACF and PACF of \(\Delta_{12}\Delta{\log{house_{t}}}\), the most likely specification is ARIMA(2,0,2)(2,0,1)[12], We can estimate the specification using the following code:

m3 <- arima(dlhouse12_1,order=c(2,0,2),seasonal=list(order=c(2,0,1),period=12))
m3
## 
## Call:
## arima(x = dlhouse12_1, order = c(2, 0, 2), seasonal = list(order = c(2, 0, 1), 
##     period = 12))
## 
## Coefficients:
##          ar1      ar2      ma1     ma2    sar1    sar2     sma1  intercept
##       0.6175  -0.1777  -0.9720  0.4326  0.0739  0.0028  -0.8890      0e+00
## s.e.  0.1861   0.1065   0.1773  0.1109  0.0458  0.0450   0.0243      4e-04
## 
## sigma^2 estimated as 0.007975:  log likelihood = 636.26,  aic = -1254.53
tsdiag(m3,gof.lag=12)

Based on the plots above, ARIMA(2,0,2)(2,0,1)[12] specification for \(\Delta_{12}\Delta{\log{house_{t}}}\) is adequate. We can also test the specification further for adequacy using Ljung-Box test on the residuals. Given that we have 647 observations, \(\log{T}\) is 6.472 or approximately 6, we choose \(m=6\).

m3.LB <- Box.test(m3$residuals, lag=6, type="Ljung")
m3.LB
## 
##  Box-Ljung test
## 
## data:  m3$residuals
## X-squared = 2.1559, df = 6, p-value = 0.9048

Comparing the result to alternative specifications gives us the following (codes and outputs are suppressed for simplicity):

Specification Q-Statistics AIC BIC
ARIMA(1,0,0)(1,0,0)[12] 0.0006 -1080.11 -1062.22
ARIMA(1,0,1)(1,0,0)[12] 0.0125 -1085.88 -1063.52
ARIMA(1,0,1)(1,0,1)[12] 0.1568 -1252.18 -1225.34
ARIMA(2,0,0)(1,0,0)[12] 0.0520 -1090.25 -1067.89
ARIMA(2,0,1)(1,0,0)[12] 0.0864 -1089.22 -1062.39
ARIMA(2,0,1)(1,0,1)[12] 0.4206 -1253.19 -1221.89
ARIMA(2,0,0)(2,0,0)[12] 0.0137 -1142.05 -1115.22
ARIMA(2,0,1)(2,0,0)[12] 0.0288 -1141.78 -1110.48
ARIMA(2,0,1)(2,0,1)[12] 0.4199 -1251.19 -1215.42
ARIMA(2,0,1)(2,0,2)[12] 0.4235 -1250.93 -1210.67
ARIMA(2,0,2)(2,0,0)[12] 0.8009 -1151.18 -1115.40
ARIMA(2,0,2)(2,0,1)[12] 0.9048 -1254.53 -1214.28
ARIMA(2,0,2)(2,0,2)[12] 0.8927 -1254.21 -1209.49

Our specification of ARIMA(2,0,2)(2,0,1)[12] is indeed the most adequate specification for \(\Delta_{12}\Delta{\log{house_{t}}}\) based on Q-statistics and AIC.

Forecasting

Based on previous result, we ended up with two potential model to be used for forecasting:

  1. ARIMA(1,0,0)(1,0,0)[12] specification for \(\Delta{\log{house_{t}}}\)
  2. ARIMA(2,0,2)(2,0,1)[12] specification for \(\Delta_{12}\Delta{\log{house_{t}}}\)

To determine the best specification to be used to forecast FRED/HOUSTNSA, we will compare the forecasting accuracy of both specification. We will forecast both \(\Delta{\log{house_{t}}}\) and \(\Delta_{12}\Delta{\log{house_{t}}}\) for 25 months ahead (up until January 2016) using estimation sample (1959 - 2013) and compare it to the actual data for the out-of-sample observation (January 2014 - January 2016).

Forecasting with ARIMA(1,0,0)(1,0,0)[12] using \(\Delta{\log{house_{t}}}\) transformation:

m1.fcast <- forecast(m1, h=25)
plot(m1.fcast, xlim=c(2012,2016))
lines(diff(log(houseall)))

Forecasting with ARIMA(2,0,2)(2,0,1)[12] using \(\Delta_{12}\Delta{\log{house_{t}}}\) transformation:

m3.fcast <- forecast(m3, h=25)
plot(m3.fcast, xlim=c(2012,2016))
lines(diff(diff(log(houseall),12)))

Conclusion

From our comparison of forecast accuracy, the most adequate specification to forecast FRED/HOUSTNSA is ARIMA(1,0,0)(1,0,0)[12] using \(\Delta{\log{house_{t}}}\) transformation.