Consider the monthly time series for Housting Starts (New Privately Owned Housing Units), FRED/HOUSTNA. Note that the data is not seasonally adjusted. Following the Box-Jenkins methodology of forecasting I’ll built a time series model based on the data until the end of 2013. After I’ll check the model for adequacy and then use it to plot a forecast until the end of 2016, then I’ll compare the forecasting’s accuracy.
HStarts <- read.csv(url("https://research.stlouisfed.org/fred2/data/HOUSTNSA.csv"))
summary(HStarts)
## DATE VALUE
## 1959-01-01: 1 Min. : 31.9
## 1959-02-01: 1 1st Qu.: 91.3
## 1959-03-01: 1 Median :120.9
## 1959-04-01: 1 Mean :120.2
## 1959-05-01: 1 3rd Qu.:147.3
## 1959-06-01: 1 Max. :234.0
## (Other) :679
tail(HStarts)
## DATE VALUE
## 680 2015-08-01 99.2
## 681 2015-09-01 111.6
## 682 2015-10-01 90.9
## 683 2015-11-01 89.9
## 684 2015-12-01 77.5
## 685 2016-01-01 73.6
plot(HStarts[,2], type = "l",col="blue", xlab = "Years 1959 to 2016", ylab = "Housing Starts, Thousands", main= "Housing Starts Historical Plot")
lnHStarts <- diff(HStarts[,2])
plot(lnHStarts,type = "l",col="dark green", xlab="Years to 2016", ylab="", main="Log-Difference in Housing Starts")
Because the data is now stationary, we can go ahead and identify the basic model.
acf(lnHStarts, type="correlation", lag=100, xlab="Weekly Lag", ylab="Correlations", main="Sample ACF")
acf(lnHStarts, type="partial", lag=100, xlab="Weekly Lag", ylab="Correlations", main="Sample PACF")
As you may discern from the Sample ACF you can tell there is a basic seasonal pattern that’s repeated over repeated seasonal intervals. So we need to apply seasonal differencing (SD) and as autoregressive and moving average tools are available with the overall series, so too, are they available for seasonal phenomena using seasonal autoregressive parameter (SAR) and seasonal moving average (SMA)
SDHS <- diff(lnHStarts,12)
plot(SDHS,type = "l",col="red", xlab="Years to 2016", ylab="", main="Seasonal Log-Difference in Housing Starts")
acf(SDHS, type="correlation", lag=100, xlab="Weekly Lag", ylab="Correlations", main="Double Differenced ACF")
acf(SDHS, type="partial", lag=100, xlab="Weekly Lag", ylab="Correlations", main="Double Differenced PACF")
Given the results of the Double Differenced ACF and Doubled Differenced PACF we can start to estimate some models as see which one works best for the forecast.
ar1 <- arima(SDHS, order = c(2,0,0), seasonal = list(order = c(1,0,1), period = 12))
ar1
##
## Call:
## arima(x = SDHS, order = c(2, 0, 0), seasonal = list(order = c(1, 0, 1), period = 12))
##
## Coefficients:
## ar1 ar2 sar1 sma1 intercept
## -0.3474 -0.0999 0.1461 -0.8873 0.0029
## s.e. 0.0391 0.0384 0.0449 0.0214 0.0440
##
## sigma^2 estimated as 113: log likelihood = -2549.87, aic = 5111.74
tsdiag(ar1, gof.lag=50)
The ACF of residuals don’t show any trend; the p-values of the Ljung-Box indicates that the autocorrelations are significant untill the 10th lag.
library(forecast)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: timeDate
## This is forecast 6.2
par(mfrow = c(1,1))
fcast.ar1 <- forecast(ar1, h = 25)
par(mfrow = c(1,1))
plot(fcast.ar1)
lines(lnHStarts)