ECO 5316, Spring 2016: Homework 2, Question 3

Problem 3

Consider the monthly time series for Housting Starts (New Privately Owned Housing Units), FRED/HOUSTNA. Note that the data is not seasonally adjusted. Following the Box-Jenkins methodology of forecasting I’ll built a time series model based on the data until the end of 2013. After I’ll check the model for adequacy and then use it to plot a forecast until the end of 2016, then I’ll compare the forecasting’s accuracy.

Data

HStarts <- read.csv(url("https://research.stlouisfed.org/fred2/data/HOUSTNSA.csv"))
summary(HStarts)

##          DATE         VALUE      
##  1959-01-01:  1   Min.   : 31.9  
##  1959-02-01:  1   1st Qu.: 91.3  
##  1959-03-01:  1   Median :120.9  
##  1959-04-01:  1   Mean   :120.2  
##  1959-05-01:  1   3rd Qu.:147.3  
##  1959-06-01:  1   Max.   :234.0  
##  (Other)   :679

tail(HStarts)

##           DATE VALUE
## 680 2015-08-01  99.2
## 681 2015-09-01 111.6
## 682 2015-10-01  90.9
## 683 2015-11-01  89.9
## 684 2015-12-01  77.5
## 685 2016-01-01  73.6

plot(HStarts[,2], type = "l",col="blue", xlab = "Years 1959 to 2016", ylab = "Housing Starts, Thousands", main= "Housing Starts Historical Plot")

Log-difference of the Data

lnHStarts <- diff(HStarts[,2])
plot(lnHStarts,type = "l",col="dark green", xlab="Years to 2016", ylab="", main="Log-Difference in Housing Starts")

Because the data is now stationary, we can go ahead and identify the basic model.

Plot an ACF and PACF: To help Indentify ‘p’ and ‘q’ in AR(p) and MA(q), respectively.

acf(lnHStarts, type="correlation", lag=100, xlab="Weekly Lag", ylab="Correlations", main="Sample ACF")

acf(lnHStarts, type="partial", lag=100, xlab="Weekly Lag", ylab="Correlations", main="Sample PACF")

As you may discern from the Sample ACF you can tell there is a basic seasonal pattern that’s repeated over repeated seasonal intervals. So we need to apply seasonal differencing (SD) and as autoregressive and moving average tools are available with the overall series, so too, are they available for seasonal phenomena using seasonal autoregressive parameter (SAR) and seasonal moving average (SMA)

Seasonal Differencing (SD)

SDHS <- diff(lnHStarts,12)
plot(SDHS,type = "l",col="red", xlab="Years to 2016", ylab="", main="Seasonal Log-Difference in Housing Starts")

acf(SDHS, type="correlation", lag=100, xlab="Weekly Lag", ylab="Correlations", main="Double Differenced ACF")

acf(SDHS, type="partial", lag=100, xlab="Weekly Lag", ylab="Correlations", main="Double Differenced PACF")

Given the results of the Double Differenced ACF and Doubled Differenced PACF we can start to estimate some models as see which one works best for the forecast.

Estimating A Model for The Forecast

AR(2)

ar1 <- arima(SDHS, order = c(2,0,0), seasonal = list(order = c(1,0,1), period = 12))
ar1

## 
## Call:
## arima(x = SDHS, order = c(2, 0, 0), seasonal = list(order = c(1, 0, 1), period = 12))
## 
## Coefficients:
##           ar1      ar2    sar1     sma1  intercept
##       -0.3474  -0.0999  0.1461  -0.8873     0.0029
## s.e.   0.0391   0.0384  0.0449   0.0214     0.0440
## 
## sigma^2 estimated as 113:  log likelihood = -2549.87,  aic = 5111.74

tsdiag(ar1, gof.lag=50)

The ACF of residuals don’t show any trend; the p-values of the Ljung-Box indicates that the autocorrelations are significant untill the 10th lag.

Forecasting and Evaluation

library(forecast)

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## Loading required package: timeDate

## This is forecast 6.2

par(mfrow = c(1,1))
fcast.ar1 <- forecast(ar1, h = 25)
par(mfrow = c(1,1))
plot(fcast.ar1)
lines(lnHStarts)