library("zoo")
## Warning: package 'zoo' was built under R version 3.2.3
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library("forecast")
## Warning: package 'forecast' was built under R version 3.2.3
## Loading required package: timeDate
## Warning: package 'timeDate' was built under R version 3.2.3
## This is forecast 6.2
library("tseries")
## Warning: package 'tseries' was built under R version 3.2.3

Problem 3

Consider the monthly time series for Housing Starts (New Privately Owned Housing Units), FRED/HOUSTNSA.Note that the data is not seasonally adjusted. * Follow the Box-Jenkins methodology to build a time series model based on the data until the end of 2013. * After you check the model for adequacy use it to produce and plot a forecast until the end of 2016. * If there are several competing specifications for the model comparetheir forecasting accuracy.

Introduction:

In this problem, data is analysed about monthly privately owned housing units from 1960 to 2015.

DATA:

The data for this problem is collectd from Quand.Quandl collected the data from Federal Reserve Economic Data website. The unit of private owned house used in this anlayis is monthly data from 1960 to 2015.

priv_own_d<- read.csv(url("http://research.stlouisfed.org/fred2/data/HOUSTNSA.csv"))
priv_own<- ts(priv_own_d[,2], start=c(1959,1), frequency=12)

str(priv_own)
##  Time-Series [1:685] from 1959 to 2016: 96.2 99 127.7 150.8 152.5 ...
head(priv_own)
## [1]  96.2  99.0 127.7 150.8 152.5 147.8
tail(priv_own)
## [1]  99.2 111.6  90.9  89.9  77.5  73.6
plot(priv_own, xlab=" Years 1960-2015", ylab="Unit of Privately Owned Houses", main="New Privately Owned Housing Units, Monthly")

The graphs shows the monthly number of privatelty owned housing between 1959 and 2016. The number of house ownwer has been highly volatile. In 2008, there was lowest number of house owner and after 2010 the number of units are increasing trend.It seem the number house owners are affected by seasons, as there is increae and decrese within a year. The is also not stationary.

# ploting every year data to see the seasonal trend within a specific year   

priv_own_month <-matrix(priv_own,ncol = 12, byrow = TRUE)
## Warning in matrix(priv_own, ncol = 12, byrow = TRUE): data length [685] is
## not a sub-multiple or multiple of the number of rows [58]
par(mfrow=c(4,1))
plot(priv_own_month[1,], xlab=" Months", ylab="Unit of Privately Owned Houses", main="year 1959, Privately Owned Housing Units, Monthly")
plot(priv_own_month[2,], xlab=" Months", ylab="Unit of Privately Owned Houses", main="year 1960, Privately Owned Housing Units, Monthly")
plot(priv_own_month[3,], xlab=" Months", ylab="Unit of Privately Owned Houses", main="year 1961, Privately Owned Housing Units, Monthly")
plot(priv_own_month[4,], xlab=" Months", ylab="Unit of Privately Owned Houses", main="year 1962, Privately Owned Housing Units, Monthly")

par(mfrow=c(3,1))
plot(priv_own_month[50,], xlab=" Months", ylab="Unit of Privately Owned Houses", main="year 2011, Privately Owned Housing Units, Monthly")
plot(priv_own_month[53,], xlab=" Months", ylab="Unit of Privately Owned Houses", main="year 2014, Privately Owned Housing Units, Monthly")
plot(priv_own_month[55,], xlab=" Months", ylab="Unit of Privately Owned Houses", main="year 2016, Privately Owned Housing Units, Monthly")

The abouve plots shows that high variance among the month within a year. From April to October the number of unit owned is higher than other months.

 #split sample - into two parts in order to test the forecasting  adequecy of the model
priv_own_all <- priv_own
priv_own_2013 <- window(priv_own_all, end=c(2013,12))
priv_own_2014 <- window(priv_own_all, start=c(2014,1))

The following command is converting the data into stationary data

lpriv_own <- log(priv_own_2013)
tsdisplay(lpriv_own ,xlab=" Years 1959-2013", ylab="Log Unit of Privately Owned Houses", main=expression(log(priv_own_2013)))

There is higher variance within months of a year and across the years. The log of the data is taken to decrease the variance. The plot shows the seasonaly in the data set. Due to log the variance has decreased.

# taking the seasonal difference of the log data set
dlpriv_own12 <- diff(lpriv_own,12)
tsdisplay(dlpriv_own12,xlab=" Years 1959-2016", ylab="seasonally differenced Log Unit of Privately Owned Houses", main=expression(paste(Delta[12], "log(priv_own)")))

The above plot is clearly non-stationary.lets take further one order difference.

# taking further difference of the seasonal difference of the log data set
ddlpriv_own12 <- diff(diff(lpriv_own,12))
tsdisplay(ddlpriv_own12,xlab=" Years 1959-2016", ylab="difference of the seasonally differenced Log Unit of Privately Owned Houses", main=expression(paste(Delta, Delta[12], "log(priv_own)")),lag.max = 48)

The above plot appears stationary.

Modeling

Studying teh spike of ACF and PACF plot helps to decide on ARMA model. On PACF plot,there is 4 spike first four lag and 3 spike on seasonal order. This suggest AR(4)(3)_12 model. On the ACF plot, there is one spike on the first lag and 3 spike on seasonal order.Thi suggest MA(1)(3)_12 model.The complete model is ARMA(4,1)(3,3)_12 model.

arma1 <- Arima(ddlpriv_own12, order=c(4,0,1), seasonal=list(order=c(3,0,3),period=12))
tsdisplay(residuals(arma1))

arma1
## Series: ddlpriv_own12 
## ARIMA(4,0,1)(3,0,3)[12] with non-zero mean 
## 
## Coefficients:
## Warning in sqrt(diag(x$var.coef)): NaNs produced
##          ar1     ar2     ar3     ar4      ma1     sar1     sar2    sar3
##       0.1044  0.0668  0.1117  0.0442  -0.4499  -1.0665  -0.5303  0.0000
## s.e.  0.0027     NaN     NaN     NaN      NaN   0.0201   0.0265  0.0192
##         sma1     sma2     sma3  intercept
##       0.2484  -0.4005  -0.5299      0e+00
## s.e.     NaN      NaN   0.0187      4e-04
## 
## sigma^2 estimated as 0.007959:  log likelihood=636.82
## AIC=-1247.63   AICc=-1247.06   BIC=-1189.49
tsdiag(arma1,gof.lag=40)

The model forcastabilty is higher for short timeframe.The model forcastablity decrease after 12 lags showing seagonal effects. Lets add one more lag in AR seasonal.The complete model is ARMA(4,1)(4,3)_12 model.

arma2 <- Arima(ddlpriv_own12, order=c(4,0,1), seasonal=list(order=c(4,0,3),period=12))
tsdisplay(residuals(arma2))

arma2
## Series: ddlpriv_own12 
## ARIMA(4,0,1)(4,0,3)[12] with non-zero mean 
## 
## Coefficients:
## Warning in sqrt(diag(x$var.coef)): NaNs produced
##           ar1     ar2     ar3     ar4      ma1     sar1     sar2    sar3
##       -0.0051  0.0317  0.1043  0.0554  -0.3454  -1.2886  -0.8761  0.0429
## s.e.   0.0092  0.0049  0.0032  0.0058      NaN   0.0123   0.0268  0.0320
##          sar4    sma1     sma2     sma3  intercept
##       -0.0018  0.4901  -0.2200  -0.8845      0e+00
## s.e.   0.0224  0.0145   0.0226   0.0358      4e-04
## 
## sigma^2 estimated as 0.007748:  log likelihood=638.77
## AIC=-1249.54   AICc=-1248.87   BIC=-1186.92
tsdiag(arma2,gof.lag=40)

The model forcastabilty is still higher for short timeframe and loww for the longer time farme. .The model forcastablity decrease after 12 lags showing seagonal effects.

Forecast

Lets use the mode to forecast and check its forecasting power.

par(mfrow=c(1,1))
ARMA.fcast1 <- forecast(arma2, h=25)
plot(ARMA.fcast1, xlim=c(2000,2016))
lines(diff(diff(log(priv_own_2014),12)))

As indicated in the modeling section, the forecasting is good for early time period. The forecasting power decreases as the time increases.