Download the delinquency rate and real GDP data from Quercus and put them into your working directory.
Calculate the changes of real GDP using the following R code.
library(forecast)  # for auto.arima(), forecast(), accuracy()
# Quarterly delinquency rate series (DRCLACBS), ending 2020 Q2
CL = ts(read.csv("DRCLACBS.csv")[,2], frequency = 4, end = c(2020,2))
dat = read.csv("GDPC1.csv")
# Quarterly changes of real GDP (first differences of GDPC1, scaled by 100)
RGDP = ts(diff(dat[,2])/100, frequency = 4, end = c(2020,3))
Your analyses should include:
Conduct a prewhitening analysis to identify the lead-lag relationship between the changes of real GDP and the delinquency rates;
source("preWhitening.R")
#' @prewhiten x and y
#'
rgdp = RGDP; cl = CL   # lower-case aliases used in the prewhitening code below
mod.arma <- auto.arima(rgdp, max.p = 52, max.q = 52, stationary = TRUE)
p = mod.arma$arma[1]; q = mod.arma$arma[2]; npq = p + q
plot(mod.arma)
## Series: rgdp
## ARIMA(2,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 mean
## 0.2857 0.1926 0.8235
## s.e. 0.0865 0.0864 0.1200
##
## sigma^2 estimated as 0.5204: log likelihood=-137.32
## AIC=282.63 AICc=282.96 BIC=294.01
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0008729149 0.7128208 0.5250615 -3.771976 97.28674 0.6819326
## ACF1
## Training set 0.0003294011
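The analysis below refers to an LBTest plot for this AR(2) model, but the corresponding call is not shown above. A minimal sketch, assuming the LBTest helper sourced from preWhitening.R takes the residual series, the number of fitted ARMA coefficients (nPQ), and an ifPlot flag (as in its use further below), would be:
#' @check prewhitening model residuals (sketch; not in the original)
#'
# Ljung-Box p-values of the AR(2) residuals by lag; npq = p + q was computed above
LBTest(mod.arma$residuals, nPQ = npq, ifPlot = TRUE)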
#Analysis: We use the auto.arima function from the forecast package to fit the changes of real GDP (rgdp), and then use the fitted \(x_t\) model to prewhiten both x and y as shown above. An AR(2) model with non-zero mean is selected by AIC. The inverse AR roots all lie inside the unit circle, so the fitted model is causal (and, having no MA part, trivially invertible). In the LBTest plot, all Ljung-Box p-values lie above the red 5% line, so the residuals show no significant autocorrelation and the prewhitening model is adequate.
#' @ccf plot prewhiten x and y
#'
mod = mod.arma; nAR = mod$arma[1]; nMA = mod$arma[2]
if(nMA != 0){
  xf = PreWhiten.arma(rgdp, ar = mod$coef[1:nAR],
                      ma = mod$coef[(1:nMA) + nAR])[-(1:nAR)]
  yf = PreWhiten.arma(cl,   ar = mod$coef[1:nAR],
                      ma = mod$coef[(1:nMA) + nAR])[-(1:nAR)]
} else {
  xf = PreWhiten.arma(rgdp, ar = mod$coef[1:nAR], ma = 0)[-(1:nAR)]
  yf = PreWhiten.arma(cl,   ar = mod$coef[1:nAR], ma = 0)[-(1:nAR)]
}
par(cex=0.75)
ccf(c(xf), c(yf), lwd=4, ylab="Cross-correlation functions",
main="CCF of prewhitened change of real GDP and delinquency rates")
abline(v=0, col="gold", lwd=1, lty="dashed")
text(-0.5, -0.2, "0", col=2)
text(-1.5, -0.2, "-1", col=2)
text(-2.5, -0.2, "-2", col=2)
text(-3.5, -0.2, "-3", col=2)
text(-4.5, -0.2, "-4", col=2)
text(-5.5, -0.2, "-5", col=2)
Conclusion: As indicated in the cross-correlation plot above, the cross-correlations are significant at lags 0 through \(-5\), so we include \(rgdp_t\), \(rgdp_{t-1}\), \(rgdp_{t-2}\), \(rgdp_{t-3}\), \(rgdp_{t-4}\), and \(rgdp_{t-5}\) in our transfer function noise model.
Fit a regression model using the lags of \(x_t\) selected in the prewhitening step, i.e. \[y_t = \sum_i v_i x_{t-i} +\xi_t,~~~(1)\] where \(y_t\) and \(x_t\) denote the output and input process, respectively, and \(\xi_t\) is the noise process. (Hint: Use prewhitening to select the lagged \(\{x_{t-i}\}\) in the regression.)
#' @fit Equation (1)
#'
y <- CL
x <- RGDP
dat <- ts.intersect(y, x, stats::lag(x, -1), stats::lag(x, -2), stats::lag(x, -3), stats::lag(x, -4), stats::lag(x, -5))
colnames(dat) <- c("CL", "RGDP", "RGDP1", "RGDP2", "RGDP3", "RGDP4", "RGDP5")
dat.obs = window(dat, start = c(1987,1), end = c(2018,3))
mod.lm = lm(CL ~ RGDP + RGDP1 + RGDP2 + RGDP3 + RGDP4 + RGDP5, data = dat.obs)
summary(mod.lm)
##
## Call:
## lm(formula = CL ~ RGDP + RGDP1 + RGDP2 + RGDP3 + RGDP4 + RGDP5,
## data = dat.obs)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.18072 -0.25376 0.06047 0.33032 0.97158
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.86949 0.09895 39.106 <2e-16 ***
## RGDP -0.12977 0.06896 -1.882 0.0623 .
## RGDP1 -0.13089 0.07163 -1.827 0.0701 .
## RGDP2 -0.11823 0.07290 -1.622 0.1074
## RGDP3 -0.10950 0.07299 -1.500 0.1362
## RGDP4 -0.13460 0.07185 -1.873 0.0635 .
## RGDP5 -0.16141 0.06939 -2.326 0.0217 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5531 on 120 degrees of freedom
## Multiple R-squared: 0.3156, Adjusted R-squared: 0.2814
## F-statistic: 9.224 on 6 and 120 DF, p-value: 2.651e-08
#' @plot residual ACF and PACF of the above regression
#'
par(mfrow=c(1,2))
acf(mod.lm$residuals)
pacf(mod.lm$residuals)
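Equation (2) is not reproduced in this excerpt; given the model fitted below, it is presumably the regression (1) with the white-noise error replaced by an ARMA noise process, i.e. something of the form \[y_t = \sum_i v_i x_{t-i} + n_t, \qquad \phi(B)\,n_t = \theta(B)\,\xi_t,~~~(2)\] where \(\xi_t\) is white noise.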
#' @fit Equation (2) and show the fitted model
#'
mod.tfn = auto.arima(dat.obs[,1], xreg = dat.obs[,-1], stationary = TRUE)
summary(mod.tfn)
## Series: dat.obs[, 1]
## Regression with ARIMA(2,0,2)(2,0,0)[4] errors
##
## Coefficients:
## ar1 ar2 ma1 ma2 sar1 sar2 intercept
## 1.8881 -0.9023 -0.5593 0.0438 -0.1497 -0.3804 3.4561
## s.e. 0.0550 0.0548 0.1086 0.0937 0.0895 0.0885 0.1725
## RGDP RGDP1 RGDP2 RGDP3 RGDP4 RGDP5
## -0.0517 -0.0570 -0.0496 -0.0315 -0.0472 -0.0435
## s.e. 0.0101 0.0121 0.0129 0.0130 0.0124 0.0101
##
## sigma^2 estimated as 0.007977: log likelihood=130.67
## AIC=-233.33 AICc=-229.58 BIC=-193.51
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set 8.68984e-06 0.0846206 0.06370182 -0.1146379 2.019421
## MASE ACF1
## Training set 0.2058136 0.00876814
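The rounded coefficient table below is presumably produced by a separate call that is not shown in the original, e.g.:
round(coef(mod.tfn), 2)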
## ar1 ar2 ma1 ma2 sar1 sar2 intercept
## 1.89 -0.90 -0.56 0.04 -0.15 -0.38 3.46
## RGDP RGDP1 RGDP2 RGDP3 RGDP4 RGDP5
## -0.05 -0.06 -0.05 -0.03 -0.05 -0.04
# My answer:
Mathematically, the fitted model is a linear regression with ARMA errors:
\[y_t = 3.46 - 0.05\,rgdp_t - 0.06\,rgdp_{t-1} - 0.05\,rgdp_{t-2} - 0.03\,rgdp_{t-3} - 0.05\,rgdp_{t-4} - 0.04\,rgdp_{t-5} + n_t,\]
where the noise \(n_t\) follows the fitted ARIMA(2,0,2)(2,0,0)[4] model
\[(1-1.89B+0.90B^2)(1+0.15B^4+0.38B^8)\,n_t = (1-0.56B+0.04B^2)\,\xi_t,\]
with \(\xi_t\) white noise. Using compact notation, the fitted model becomes
\[y_t = 3.46 + (-0.05 - 0.06B - 0.05B^2 - 0.03B^3 - 0.05B^4 - 0.04B^5)\,rgdp_t + \frac{1-0.56B+0.04B^2}{(1-1.89B+0.90B^2)(1+0.15B^4+0.38B^8)}\,\xi_t.\]
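The residual portmanteau test and cross-correlation check that the analysis below refers to are not shown above. A minimal sketch, reusing the LBTest helper from preWhitening.R and mirroring the linear-regression checks further below, might look like this:
#' @check model adequacy of the TFN model (sketch; not in the original)
#'
npq.tfn = sum(mod.tfn$arma[1:4])   # p + q + P + Q of the fitted ARMA error model
par(mfrow = c(1,2))
# Ljung-Box p-values of the TFN residuals by lag
LBTest(mod.tfn$residuals, nPQ = npq.tfn, ifPlot = TRUE)
# align the prewhitened input with the TFN residuals on their common sample, then check the CCF
both = ts.intersect(mod.arma$residuals, mod.tfn$residuals)
ccf(c(both[,1]), c(both[,2]), lag.max = 40,
    main = "CCF: prewhitened input vs TFN residuals")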
#Analysis: In the residual portmanteau test, all p-values are above the 5% significance level, so the noise part of the TFN model is adequate. The cross-correlation check, however, is not passed: some cross-correlations between the prewhitened input and the TFN residuals are significant, which suggests remaining correlation between the input series and the noise term.
#Linear Regression Model adequacy check
#' @check model adequacy of linear regression serial correlation
#'
m = 40
lags = 1:m
df <- (0+6+1):m
n = length(mod.lm$res)
rccf = ccf(as.numeric(mod$residuals), mod.lm$residuals, plot = FALSE, lag.max = m)$acf[-(1:m)]
Qm = n * (n + 2) * cumsum((rccf^2)/(n - (0:m)))[df]
pv <- 1 - pchisq(Qm, df)
a = cbind(df, Qm, pv)
#' @check model adequacy of regression cross-correlation
#'
par(mfrow = c(1,2))
LBTest(mod.lm$res, nPQ = 0, ifPlot = TRUE)
plot(x = a[,1],y = a[,3],
ylim = c(0,1), pch = 15, col =4,
ylab = "p-value", xlab = "m",
main = "Cross-correlation check")
abline(h = 0.05, col = 2)
grid()
#Analysis: In the Ljung-Box portmanteau test, all p-values fall below the 5% level, so the residuals of the linear regression model are highly serially correlated; the plain regression in Equation (1) is therefore not adequate on its own.
\[RMSE = \sqrt {\frac{\sum_{i=1}^L (y_{t+i}-\hat y_t(i))^2}{L}}\] \[MAE = \frac{\sum_{i=1}^L \left|y_{t+i}-\hat y_t(i)\right|}{L}\] \[MAPE = \frac{1}{L}\sum_{i=1}^L \left|1-\frac{\hat y_t(i)}{y_{t+i}}\right|,\]
where \(\hat y_t(i)\) denotes the forecast at origin \(t\) with lead time \(i\). (The accuracy() function reports MAPE in percent.)
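As a quick numeric illustration of these definitions (the numbers below are made up and are not taken from the data):
y.obs = c(4.2, 4.0, 3.9)      # hypothetical observed values y_{t+i}, i = 1, 2, 3
y.hat = c(4.0, 3.8, 4.1)      # hypothetical forecasts \hat y_t(i)
e = y.obs - y.hat
c(RMSE = sqrt(mean(e^2)),         # 0.20
  MAE  = mean(abs(e)),            # 0.20
  MAPE = 100*mean(abs(e/y.obs)))  # about 4.96 (in percent, as accuracy() reports)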
#' @forecast using tfn
#'
dat.test = window(dat, start = c(2018, 4))
pred.tfn <- forecast(mod.tfn, xreg = dat.test[, -1])
pred.lm <- forecast(mod.lm, newdata = as.data.frame(dat.test[, -1]))
#' @calculate MSE, MAE, MAPE
#'
result = rbind(accuracy(pred.tfn, dat.test[, 1])[2, c(2, 3, 5)], accuracy(pred.lm, dat.test[, 1])[2, c(2, 3, 5)])
rownames(result) = c("Transfer function noise model", "Linear Model")
knitr::kable(result, digit = 2, caption = "Out-of-sample forecast accuracy")
| | RMSE | MAE | MAPE |
|---|---|---|---|
| Transfer function noise model | 0.78 | 0.45 | 21.25 |
| Linear Model | 1.60 | 1.15 | 53.18 |
#Analysis: Comparing the RMSE, MAE, and MAPE values above, the transfer function noise model outperforms the linear model, since all three error measures are smaller.
Fit an ARIMA model to the delinquency rates using auto.arima, but ensure that the fitted model passes the Ljung-Box test.
## ar1 ar2
## 0.49 0.16
This is an AR(2) model. It passes the Ljung-Box portmanteau test, since all p-values in the LBTest plot lie above the red 5% line.
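The chunk that fits this benchmark ARIMA model is not shown above. A minimal sketch, assuming the model is fitted to the observed delinquency-rate series dat.obs[,1] and checked with the LBTest helper, might be:
#' @fit benchmark ARIMA model for the delinquency rates (sketch; not in the original)
#'
mod.arima = auto.arima(dat.obs[,1])
round(coef(mod.arima), 2)      # rounded coefficients reported above: ar1 = 0.49, ar2 = 0.16
LBTest(mod.arima$residuals, nPQ = sum(mod.arima$arma[1:4]), ifPlot = TRUE)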
#' @forecast
pred.arima = forecast(mod.arima, h = length(dat.test[,1]))
#' @calculate MSE, MAE, MAPE
#'
result = rbind(accuracy(pred.tfn, dat.test[,1])[2, c(2,3,5)],
               accuracy(pred.arima, dat.test[,1])[2, c(2,3,5)])
rownames(result) = c("TFN", "ARIMA")
knitr::kable(result, digit = 2, caption = "Out-of-sample forecast accuracy")
| | RMSE | MAE | MAPE |
|---|---|---|---|
| TFN | 0.78 | 0.45 | 21.25 |
| ARIMA | 0.15 | 0.08 | 3.85 |
#Conclusion: Based on the results above, the ARIMA model is the better forecaster, since its RMSE, MAE, and MAPE are all smaller than those of the TFN model.
The combined forecaster \(\hat f_t(i)\) may be given by
\[\hat f_t(i) = w_a ~ \hat y_t^{(a)}(i)+w_b~ \hat y_t^{(b)}(i),\]
where the superscripts \((a)\) and \((b)\) stand for the transfer function noise model and the ARIMA model, respectively. For the equal-weight scheme, \(w_a = w_b = 0.5\); for the MSE weighting scheme, the weight \(w_a\) is the solution of
\[\min_{w_a} \sqrt {\sum_{t=1}^n \{y_t -w_a \hat y_t^{(a)}-(1-w_a)\hat y_t^{(b)}\}^2},\]
where \(w_a, w_b \in[0,1]\), \(w_a+w_b=1\), \(\hat y_t^{(a)}\) and \(\hat y_t^{(b)}\) denote the fitted values at time \(t\) in the training sample, and \(n\) is the series length.
#' @calculate MSE, MAE, MAPE for the equal weight forecast
#'
pred.equal.combine = 0.5 * pred.tfn[["mean"]]+0.5*pred.arima[["mean"]]
accuracy(pred.equal.combine, dat.test[,1])[1,c(2,3,5)]
## RMSE MAE MAPE
## 0.4519857 0.2441970 11.6818376
#' @calculate MSE scheme weight
#'
wa = seq(0,1,0.001)
sumsquare = sapply(wa, function(w){sum((w*mod.tfn$fitted+(1-w)*mod.arima$fitted-dat.obs[,1])^2)})
minw = wa[which.min(sumsquare)]
pred.mse.combine = minw*pred.tfn[["mean"]]+(1-minw)*pred.arima[["mean"]]
accuracy(pred.mse.combine, dat.test[,1])[1,c(2,3,5)]
## RMSE MAE MAPE
## 0.7610087 0.4395943 20.7906699
#' @calculate MSE, MAE, MAPE for the above combination forecast
#'
output = rbind(accuracy(pred.tfn, dat.test[,1])[2, c(2,3,5)],
               accuracy(pred.arima, dat.test[,1])[2, c(2,3,5)],
               accuracy(pred.equal.combine, dat.test[,1])[1, c(2,3,5)],
               accuracy(pred.mse.combine, dat.test[,1])[1, c(2,3,5)])
rownames(output) = c("TFN", "ARIMA", "Equal-Scheme", "MSE-Scheme")
knitr::kable(output, digit = 2)
| | RMSE | MAE | MAPE |
|---|---|---|---|
| TFN | 0.78 | 0.45 | 21.25 |
| ARIMA | 0.15 | 0.08 | 3.85 |
| Equal-Scheme | 0.45 | 0.24 | 11.68 |
| MSE-Scheme | 0.76 | 0.44 | 20.79 |
#Analysis: Comparing the RMSE, MAE, and MAPE values of the four forecasters listed above, the ARIMA model performs best, since it has the smallest values for all three measures.