The first step is to import the number of New Privately Owned Housing Units (Thousands of Units Not Seasonally Adjusted), FRED/HOUSTNSA, using Quandl packages. We will denote FRED/HOUSTNSA as \(house_{t}\).
house <- Quandl("FRED/HOUSTNSA", type="ts")
str(house)
Time-Series [1:685] from 1959 to 2016: 96.2 99 127.7 150.8 152.5 ...
We then split the sample into two parts (up to 2013 and after 2013) in order to be able to show out model accuracy after forecast.
# split sample into two parts - estimation sample and prediction samplesplit sample into two parts - estimation sample and prediction sample
houseall <- house
house1 <- window(houseall, end=c(2013,12))
house2 <- window(houseall, start=c(2014,1))
# first part used to identify and estimate the model
house <- house1
The next step is to construct a new time series transformation of \(house_{t}\) in log (\(\log{house_{t}}\)), log-change (\(\Delta{\log{house_{t}}}\)), seasonal log change (\(\Delta_{12}{\log{house_{t}}}\)), and 2nd difference of seasonal log change (\(\Delta_{12}\Delta{\log{house_{t}}}\))). Note that since \(house_{t}\) is a monthly data, we set the seasonal lag to be 12 months.
lhouse <- log(house) #Log number of New Privately Owned Housing Units
dlhouse1 <- diff(lhouse) #Log-changes of number of New Privately Owned Housing Units
dlhouse12 <- diff(lhouse,4) #Seasonal log-changes of number of New Privately Owned Housing Units
dlhouse12_1 <- diff(diff(lhouse,12)) #2d diff of seasonal log-changes of number of New Privately Owned Housing Units
The following figures shows the original and time series transformation of FRED/HOUSTNSA, \(house_{t}\):
par(mfrow=c(2,3))
plot(house, main=expression(house))
plot(lhouse, main=expression(log(house)))
plot.new()
plot(dlhouse1, main=expression(paste(Delta, "log(house)")))
plot(dlhouse12, main=expression(paste(Delta[12], "log(house)")))
plot(dlhouse12_1, main=expression(paste(Delta, Delta[12], "log(house)")))
In the next part, we will conducting model identification, estimation, and adequacy check for all transformation (\(\Delta{\log{house_{t}}}\), \(\Delta_{12}{\log{house_{t}}}\), and \(\Delta_{12}\Delta{\log{house_{t}}}\)), with the exception of \(\log{house_{t}}\), since the time series plot indicate that \(\log{house_{t}}\) is nonstationary.
The first step in model identification is to plot the ACF and PACF for all transformation of \(\Delta{\log{house_{t}}}\):
From the ACF and PACF of \(\Delta{\log{house_{t}}}\), the most likely specification is ARIMA(1,0,0)(1,0,0)[12], this is equivalent with ARIMA(1,1,0)(1,0,0)[12] specification for \(\log{house_{t}}\). We can estimate the specification using the following code:
m1 <- arima(dlhouse1,order=c(1,0,0),seasonal=list(order=c(1,0,0),period=12))
m1
##
## Call:
## arima(x = dlhouse1, order = c(1, 0, 0), seasonal = list(order = c(1, 0, 0),
## period = 12))
##
## Coefficients:
## ar1 sar1 intercept
## -0.3009 0.7550 -0.0002
## s.e. 0.0398 0.0264 0.0127
##
## sigma^2 estimated as 0.01195: log likelihood = 518.62, aic = -1029.23
tsdiag(m1,gof.lag=12)
Based on the plots above, ARIMA(1,0,0)(1,0,0)[12] specification is borderline adequate. We can also test the specification further for adequacy using Ljung-Box test on the residuals. Given that we have 659 observations, \(\log{T}\) is 6.4907 or approximately 6, we choose \(m=6\).
m1.LB <- Box.test(m1$residuals, lag=6, type="Ljung")
m1.LB
##
## Box-Ljung test
##
## data: m1$residuals
## X-squared = 2.7538, df = 6, p-value = 0.8391
From the estimation result above, ARIMA(1,0,0)(1,0,0)[12] for \(\Delta{\log{house_{t}}}\) is actually adequate.
Comparing the result to alternative specification gives us the following (codes and outputs are suppressed for simplicity):
| Specification | Q-Statistics | AIC | BIC |
|---|---|---|---|
| ARIMA(1,0,0)(1,0,0)[12] | 0.8391 | -1029.23 | -1011.269 |
| ARIMA(2,0,0)(1,0,0)[12] | 0.9646 | -1028.92 | -1006.471 |
| ARIMA(3,0,0)(1,0,0)[12] | 0.9664 | -1026.94 | -999.998 |
| ARIMA(4,0,0)(1,0,0)[12] | 0.9999 | -1026.27 | -994.835 |
From the table above we can see that all of the specification is actually adequate based on the Q-statistics. However, our initial specification ARIMA(1,0,0)(1,0,0)[12] has the lowest AIC and BIC, indicate that it is the most adequate specification.
To further scrutinize our result, we will estimate alternative specification of using different time series transformation (i.e. \(\Delta_{12}{\log{house_{t}}}\) and \(\Delta_{12}\Delta{\log{house_{t}}}\)).
The first step in model identification is to plot the ACF and PACF for all transformation of \(\Delta_{12}{\log{house_{t}}}\):
From the ACF and PACF of \(\Delta_{12}{\log{house_{t}}}\), the most likely specification is ARIMA(2,0,0)(1,0,0)[12] with zero mean, We can estimate the specification using the following code:
m2 <- arima(dlhouse12,order=c(2,0,0),seasonal=list(order=c(1,0,0),period=12), include.mean=FALSE)
m2
##
## Call:
## arima(x = dlhouse12, order = c(2, 0, 0), seasonal = list(order = c(1, 0, 0),
## period = 12), include.mean = FALSE)
##
## Coefficients:
## ar1 ar2 sar1
## 0.4777 0.1668 0.8274
## s.e. 0.0404 0.0407 0.0222
##
## sigma^2 estimated as 0.01953: log likelihood = 352.99, aic = -697.97
tsdiag(m2,gof.lag=12)
From the estimation result above, ARIMA(2,0,0)(1,0,0)[12] for \(\Delta_{12}{\log{house_{t}}}\) is inadequate, even though there is no signigicant serial correlation in the ACF of residuals, we have extremely low value \(p\)-values for Ljung-Box statistics for lag beyond lag 2.
The first step in model identification is to plot the ACF and PACF for all transformation of \(\Delta_{12}\Delta{\log{house_{t}}}\):
From the ACF and PACF of \(\Delta_{12}\Delta{\log{house_{t}}}\), the most likely specification is ARIMA(2,0,2)(2,0,1)[12], We can estimate the specification using the following code:
m3 <- arima(dlhouse12_1,order=c(2,0,2),seasonal=list(order=c(2,0,1),period=12))
m3
##
## Call:
## arima(x = dlhouse12_1, order = c(2, 0, 2), seasonal = list(order = c(2, 0, 1),
## period = 12))
##
## Coefficients:
## ar1 ar2 ma1 ma2 sar1 sar2 sma1 intercept
## 0.6175 -0.1777 -0.9720 0.4326 0.0739 0.0028 -0.8890 0e+00
## s.e. 0.1861 0.1065 0.1773 0.1109 0.0458 0.0450 0.0243 4e-04
##
## sigma^2 estimated as 0.007975: log likelihood = 636.26, aic = -1254.53
tsdiag(m3,gof.lag=12)
Based on the plots above, ARIMA(2,0,2)(2,0,1)[12] specification for \(\Delta_{12}\Delta{\log{house_{t}}}\) is adequate. We can also test the specification further for adequacy using Ljung-Box test on the residuals. Given that we have 647 observations, \(\log{T}\) is 6.472 or approximately 6, we choose \(m=6\).
m3.LB <- Box.test(m3$residuals, lag=6, type="Ljung")
m3.LB
##
## Box-Ljung test
##
## data: m3$residuals
## X-squared = 2.1559, df = 6, p-value = 0.9048
Comparing the result to alternative specifications gives us the following (codes and outputs are suppressed for simplicity):
| Specification | Q-Statistics | AIC | BIC |
|---|---|---|---|
| ARIMA(1,0,0)(1,0,0)[12] | 0.0006 | -1080.11 | -1062.22 |
| ARIMA(1,0,1)(1,0,0)[12] | 0.0125 | -1085.88 | -1063.52 |
| ARIMA(1,0,1)(1,0,1)[12] | 0.1568 | -1252.18 | -1225.34 |
| ARIMA(2,0,0)(1,0,0)[12] | 0.0520 | -1090.25 | -1067.89 |
| ARIMA(2,0,1)(1,0,0)[12] | 0.0864 | -1089.22 | -1062.39 |
| ARIMA(2,0,1)(1,0,1)[12] | 0.4206 | -1253.19 | -1221.89 |
| ARIMA(2,0,0)(2,0,0)[12] | 0.0137 | -1142.05 | -1115.22 |
| ARIMA(2,0,1)(2,0,0)[12] | 0.0288 | -1141.78 | -1110.48 |
| ARIMA(2,0,1)(2,0,1)[12] | 0.4199 | -1251.19 | -1215.42 |
| ARIMA(2,0,1)(2,0,2)[12] | 0.4235 | -1250.93 | -1210.67 |
| ARIMA(2,0,2)(2,0,0)[12] | 0.8009 | -1151.18 | -1115.40 |
| ARIMA(2,0,2)(2,0,1)[12] | 0.9048 | -1254.53 | -1214.28 |
| ARIMA(2,0,2)(2,0,2)[12] | 0.8927 | -1254.21 | -1209.49 |
Our specification of ARIMA(2,0,2)(2,0,1)[12] is indeed the most adequate specification for \(\Delta_{12}\Delta{\log{house_{t}}}\) based on Q-statistics and AIC.
Based on previous result, we ended up with two potential model to be used for forecasting:
To determine the best specification to be used to forecast FRED/HOUSTNSA, we will compare the forecasting accuracy of both specification. We will forecast both \(\Delta{\log{house_{t}}}\) and \(\Delta_{12}\Delta{\log{house_{t}}}\) for 25 months ahead (up until January 2016) using estimation sample (1959 - 2013) and compare it to the actual data for the out-of-sample observation (January 2014 - January 2016).
Forecasting with ARIMA(1,0,0)(1,0,0)[12] using \(\Delta{\log{house_{t}}}\) transformation:
m1.fcast <- forecast(m1, h=25)
plot(m1.fcast, xlim=c(2012,2016))
lines(diff(log(houseall)))
Forecasting with ARIMA(2,0,2)(2,0,1)[12] using \(\Delta_{12}\Delta{\log{house_{t}}}\) transformation:
m3.fcast <- forecast(m3, h=25)
plot(m3.fcast, xlim=c(2012,2016))
lines(diff(diff(log(houseall),12)))
From our comparison of forecast accuracy, the most adequate specification to forecast FRED/HOUSTNSA is ARIMA(1,0,0)(1,0,0)[12] using \(\Delta{\log{house_{t}}}\) transformation.