1. Download delinquency rates and real GDP data from Quercus and put them into your working directory.

  2. Calculate the changes of real GDP using the following R codes.

CL = ts(read.csv("DRCLACBS.csv")[,2],frequency = 4, end = c(2020,2))
dat = read.csv("GDPC1.csv")
RGDP = ts(diff(dat[,2])/100,frequency = 4, end = c(2020,3))
Use the data between 1987 Q1 and 2018 Q3 to study the relationship between the changes of real GDP and delinquency rates.

Model this relationship using the transfer function noise model.(For simplicity, assume that both delinquency rates and changes of real GDP are stationary.)

Your analyses should include:

  1. Conduct prewhitening analysis to identify the lead-lag relationship between changes of real GDP and delinquency rates;

    • ARMA model for changes of real GDP and its residual ACF and PACF plots
    • Use cross correlation plot of prewhitened processes to identify transfer function (\(\nu_i\))

## Series: rgdp 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2    mean
##       0.2857  0.1926  0.8235
## s.e.  0.0865  0.0864  0.1200
## 
## sigma^2 estimated as 0.5204:  log likelihood=-137.32
## AIC=282.63   AICc=282.96   BIC=294.01
## 
## Training set error measures:
##                        ME      RMSE       MAE       MPE     MAPE      MASE
## Training set 0.0008729149 0.7128208 0.5250615 -3.771976 97.28674 0.6819326
##                      ACF1
## Training set 0.0003294011

#Analysis: We use the auto.arima function in forecast package to fit rgdp, then use the \(x_t\) model to prewhiten x and y as shown above. An AR(2) model is selected based on AIC. The inverse AR roots are all inside the unit circle so the fitted ARMA model is causal and invertible. Through the LBTest plot, we can see that all points are above the redline.

Conclusion: As indicated in the above cross-correlation plot, we include \(rgdp_t\), \(rgdp_{t-1}\), \(rgdp_{t-2}\), \(rgdp_{t-3}\), \(rgdp_{t-4}\), \(rgdp_{t-5}\) in our transfer function noise model.

  1. Fit a multiple regression using the findings in the prewhitening step, i.e. \[y_t = \sum_i v_i x_{t-i} +\xi_t,~~~(1)\] where \(y_t\) and \(x_t\) denote the output and input process, respectively, and \(\xi_t\) is the noise process.(Hint: Use prewhitening to select the lagged \(\{x_i\}\) in the regression)
## 
## Call:
## lm(formula = CL ~ RGDP + RGDP1 + RGDP2 + RGDP3 + RGDP4 + RGDP5, 
##     data = dat.obs)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.18072 -0.25376  0.06047  0.33032  0.97158 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.86949    0.09895  39.106   <2e-16 ***
## RGDP        -0.12977    0.06896  -1.882   0.0623 .  
## RGDP1       -0.13089    0.07163  -1.827   0.0701 .  
## RGDP2       -0.11823    0.07290  -1.622   0.1074    
## RGDP3       -0.10950    0.07299  -1.500   0.1362    
## RGDP4       -0.13460    0.07185  -1.873   0.0635 .  
## RGDP5       -0.16141    0.06939  -2.326   0.0217 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5531 on 120 degrees of freedom
## Multiple R-squared:  0.3156, Adjusted R-squared:  0.2814 
## F-statistic: 9.224 on 6 and 120 DF,  p-value: 2.651e-08

  1. Fit a transfer function noise model using the rational distributed lag function, i.e. \[y_t = \frac{\delta(B)}{\omega(B)}x_t+n_t,~~~(2)\] where \(\delta(B)\) and \(\omega(B)\) are polynomials in the backward shift operator \(B\), and \(n_t\) follows an ARMA process. Write down the mathematical representation of the fitted model.
## Series: dat.obs[, 1] 
## Regression with ARIMA(2,0,2)(2,0,0)[4] errors 
## 
## Coefficients:
##          ar1      ar2      ma1     ma2     sar1     sar2  intercept
##       1.8881  -0.9023  -0.5593  0.0438  -0.1497  -0.3804     3.4561
## s.e.  0.0550   0.0548   0.1086  0.0937   0.0895   0.0885     0.1725
##          RGDP    RGDP1    RGDP2    RGDP3    RGDP4    RGDP5
##       -0.0517  -0.0570  -0.0496  -0.0315  -0.0472  -0.0435
## s.e.   0.0101   0.0121   0.0129   0.0130   0.0124   0.0101
## 
## sigma^2 estimated as 0.007977:  log likelihood=130.67
## AIC=-233.33   AICc=-229.58   BIC=-193.51
## 
## Training set error measures:
##                       ME      RMSE        MAE        MPE     MAPE
## Training set 8.68984e-06 0.0846206 0.06370182 -0.1146379 2.019421
##                   MASE       ACF1
## Training set 0.2058136 0.00876814
##       ar1       ar2       ma1       ma2      sar1      sar2 intercept 
##      1.89     -0.90     -0.56      0.04     -0.15     -0.38      3.46 
##      RGDP     RGDP1     RGDP2     RGDP3     RGDP4     RGDP5 
##     -0.05     -0.06     -0.05     -0.03     -0.05     -0.04

#my answer:

Mathematically, we can express it as \[y_t = 3.46 +(-0.15y_{t-4}-0.38y_{t-8})(1.89y_{t-1} - 0.90y_{t-2} )- 0.05rgdp_t - 0.06rgdp_{t-1} - 0.05rgdp_{t-2} - 0.03rgdp_{t-3} - 0.05rgdp_{t-4} - 0.04rgdp_{t-5} + \xi_t - 0.56\xi_{t-1} + 0.04\xi_{t-2}\]

Using compact notation, the fitted model becomes \[(1+0.15B^4+0.38B^8)(1-1.89B+0.90B^2)y_t = 3.46 + (-0.05 - 0.06B - 0.05B^2 - 0.03B^3 - 0.05B^4 - 0.04B^5)rgdp_t + (1 -0.56B + 0.04B^2)n_t\]

Rearranging this equation, we have \[y_t = 0.02 - \frac{0.05- 0.06B - 0.05B^2 - 0.03B^3 - 0.05B^4 - 0.04B^5}{(1+0.15B^4+0.38B^8)(1-1.89B+0.90B^2)}rgdp_t+\frac{1--0.56B + 0.04B^2}{(1+0.15B^4+0.38B^8)(1-1.89B+0.90B^2)}n_t\]

where \[\frac{3.46}{(1+0.15+0.38)(1-1.89+0.90)}≈0.02\]

  1. Conduct the model adequacy tests (diagnostics) on the above models and conclude your inference.

#Analysis: For residual portmanteau test, all P values are above 5% significance level, so the tfn model is adequate. For the cross-correlation check plot, it did not get passed. This implies that there exists correlation between input series and noise term.

#Analysis: From Ljung-Box portmanteau test, the residuals from linear regression model are highly correlated since p-values are all significant.

    

Conduct the out of sample forecasts of the above fitted models using the remaining observations. Calculate the forecast performance using Mean squared error (MSE), Mean absolute error (MAE), and Mean absolute percentage error (MAPE):

\[RMSE = \sqrt \frac{\sum_{i=1}^L (y_{t+i}-\hat y_t(i))^2}{L}\] \[MAE = \frac{\sum_{i=1}^L \left|y_{t+i}-\hat y_t(i)\right|}{L}\] \[MAPE = \frac{1}{L}\sum_{i=1}^L \left|1-\frac{\hat y_t(i)}{y_{t+i}}\right|,\]

where \(\hat y_t(i)\) denotes the forecast at origin \(t\) with lead time \(i\)

Out-of-sample forecast accuracy
RMSE MAE MAPE
Transfer function noise model 0.78 0.45 21.25
Linear Model 1.60 1.15 53.18

#Analysis: From above data, the Transfer function noise model is better when we comparing RMSE, MAE, MAPE values since it is smaller than values from the linear Model.

    

4. Conduct the same out of sample forecasts soley on \(y_t\) using an ARIMA model. Compare and discuss its peformance metrics with the TFN model.
##  ar1  ar2 
## 0.49 0.16

This is an AR2 model, it passed the Ljung-Box portmanteau test since all dots are above the redline.

Out-of-sample forecast accuracy
RMSE MAE MAPE
TFN 0.78 0.45 21.25
ARIMA 0.15 0.08 3.85

#Conclusion: From the result above, we can conclude that ARIMA model is better since the value of RMSE, MAE, MAPE are smaller than the value from TFN model.

Conduct the same out of sample forecast analysis using forecast combination of the fitted TFN model and ARIMA model (equal weight and MSE weighting). Compare its forecast metrics with those in the previous two questions

The combined forecaster \(\hat f_t(i)\) may be given by

\[\hat f_t(i) = w_a ~ \hat y_t^{(a)}(i)+w_b~ \hat y_t^{(b)}(i),\]

where the superscripts \((a)\) and \((b)\) stand for transfer function noise model and ARIMA model, respectively. For the equal weight scheme, \(w_a = w_b = 0.5\), and for the MSE weighting scheme, its weights is the solution of

\[\min_{w_a} \sqrt {\sum_{t=1}^n \{y_t -w_a \hat y_t^{(a)}-(1-w_a)\hat y_t^{(b)}\}^2},\]

where \(w_a, w_b \in[0,1]\), \(w_a+w_b=1\), and \(\hat y_t^{(a)}\) denote the fitted value at time \(t\) in the training sample and \(n\) is the series length.

##       RMSE        MAE       MAPE 
##  0.4519857  0.2441970 11.6818376
##       RMSE        MAE       MAPE 
##  0.7610087  0.4395943 20.7906699
RMSE MAE MAPE
TFN 0.78 0.45 21.25
ARIMA 0.15 0.08 3.85
Equal-Scheme 0.45 0.24 11.68
MSE-Scheme 0.76 0.44 20.79

#Analysis: After comparing RMSE, MAE, and MAPE values of four models listed above, we can conclude that ARIMA model is better than other models since it has smaller values.