Econometrics: Methods and Applications by Erasmus University Rotterdam

Week 3 Assignment: Model Selection

This document was created with R Markdown, and then printed as pdf for peer-graded evaluation purposes.

Code chunks will not be echoed in the paper.


Data set

Monthly economic data for the USA over the period 1960 through 2014. The data are from the St. Louis Federal Reserve Economic Dataset (FRED). The variables are:
- INTRATE: Federal funds interest rate
- INFL: Inflation
- PROD: Production
- UNEMPL: Unemployment
- COMMPRI: Commodity prices
- PCE: Personal consumption expenditure
- PERSINC: Personal income
- HOUST: Housing starts

Questions

This test exercise is of an applied nature and uses data that are available in the data file TestExer3. We consider the so-called Taylor rule for setting the (nominal) interest rate. This model describes the level of the nominal interest rate that the central bank sets as a function of the equilibrium real interest rate and inflation, and considers the current level of inflation and production. Taylor (1993) considers the model: \[i_t = r^* + \pi_t + 0.5(\pi_t - \pi^*) + 0.5 g_t\] with \(i_t\) the Federal funds target interest rate at time \(t\), \(r^*\) the equilibrium real federal funds rate, \(\pi_t\) a measure of inflation, \(\pi^*\) the target inflation rate and \(g_t\) the output gap (how much actual output deviates from potential output). We simplify the Taylor rule in two manners. First, we avoid determining \(r^*\) and \(\pi^*\) and simply add an intercept to the model to capture these two variables (and any other deviations in the means). Second, we consider production \(y_t\) rather than the output gap. In this form the Taylor rule is \[i_t = \beta_1 + \beta_2 \pi_t + \beta_3 y_t + \varepsilon_t \tag{1}\]
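As a quick numerical illustration of the two forms above, the sketch below evaluates Taylor's original rule and shows the intercept and inflation slope it implies once the constants are collected. The values of `r_star` and `pi_star` are illustrative assumptions (the exercise deliberately leaves them undetermined), and the function names are hypothetical.

```r
# Taylor (1993) rule: i_t = r* + pi_t + 0.5 * (pi_t - pi*) + 0.5 * g_t
# r_star and pi_star are illustrative assumptions, not values from the exercise.
taylor_rate <- function(infl, gap, r_star = 2, pi_star = 2) {
  r_star + infl + 0.5 * (infl - pi_star) + 0.5 * gap
}

# Collecting constants gives i_t = (r* - 0.5 * pi*) + 1.5 * pi_t + 0.5 * g_t,
# so the original rule implies an intercept and a slope of 1.5 on inflation.
implied_intercept <- function(r_star = 2, pi_star = 2) r_star - 0.5 * pi_star
implied_infl_slope <- 1.5

# Both ways of writing the rule give the same rate for the same inputs.
rate_direct    <- taylor_rate(infl = 3, gap = 1)
rate_collected <- implied_intercept() + implied_infl_slope * 3 + 0.5 * 1
print(c(direct = rate_direct, collected = rate_collected))
```

The estimated \(\beta_2\) in the simplified regression need not equal 1.5, since production replaces the output gap and the intercept absorbs \(r^*\) and \(\pi^*\).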


(a) Use general-to-specific to come to a model. Start by regressing the federal funds rate on the other 7 variables and eliminate 1 variable at a time.

We build the model by backward elimination, starting from the full model that includes all seven predictors:

## 
## Call:
## lm(formula = INTRATE ~ INFL + PROD + UNEMPL + COMMPRI + PCE + 
##     PERSINC + HOUST, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.4066 -1.4340 -0.1175  1.3555  7.7386 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.221161   0.244995  -0.903   0.3670    
## INFL         0.696059   0.062229  11.185  < 2e-16 ***
## PROD        -0.057743   0.039900  -1.447   0.1483    
## UNEMPL       0.102481   0.096757   1.059   0.2899    
## COMMPRI     -0.005521   0.002974  -1.857   0.0638 .  
## PCE          0.344380   0.069455   4.958 9.08e-07 ***
## PERSINC      0.246999   0.060590   4.077 5.13e-05 ***
## HOUST       -0.019411   0.004672  -4.155 3.68e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.188 on 652 degrees of freedom
## Multiple R-squared:  0.6385, Adjusted R-squared:  0.6346 
## F-statistic: 164.5 on 7 and 652 DF,  p-value: < 2.2e-16
## AIC: 2916.3  -  BIC: 2956.73

The predictor with the least explanatory power, judged by its p-value (0.290), is Unemployment, so we fit a second model that excludes it:

## 
## Call:
## lm(formula = INTRATE ~ INFL + PROD + COMMPRI + PCE + PERSINC + 
##     HOUST, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.5322 -1.4982 -0.1005  1.3882  7.6954 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.290851   0.236016  -1.232   0.2183    
## INFL         0.693309   0.062180  11.150  < 2e-16 ***
## PROD        -0.025460   0.025752  -0.989   0.3232    
## COMMPRI     -0.006514   0.002822  -2.308   0.0213 *  
## PCE          0.368561   0.065602   5.618 2.86e-08 ***
## PERSINC      0.251581   0.060441   4.162 3.57e-05 ***
## HOUST       -0.021023   0.004417  -4.760 2.39e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.188 on 653 degrees of freedom
## Multiple R-squared:  0.6379, Adjusted R-squared:  0.6346 
## F-statistic: 191.7 on 6 and 653 DF,  p-value: < 2.2e-16
## AIC: 2915.44  -  BIC: 2951.37

Production also has a high p-value (0.323), so we remove it in the third and final round. All remaining variables then have absolute t-values above 2 and p-values below 0.05, so they are deemed significant:

## 
## Call:
## lm(formula = INTRATE ~ INFL + COMMPRI + PCE + PERSINC + HOUST, 
##     data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.1631 -1.5244 -0.1125  1.3715  7.6725 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.240119   0.230366  -1.042  0.29764    
## INFL         0.717527   0.057152  12.555  < 2e-16 ***
## COMMPRI     -0.007501   0.002640  -2.841  0.00464 ** 
## PCE          0.340525   0.059156   5.756 1.32e-08 ***
## PERSINC      0.240242   0.059342   4.048 5.77e-05 ***
## HOUST       -0.020530   0.004389  -4.678 3.52e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.188 on 654 degrees of freedom
## Multiple R-squared:  0.6374, Adjusted R-squared:  0.6346 
## F-statistic: 229.9 on 5 and 654 DF,  p-value: < 2.2e-16
## AIC: 2914.42  -  BIC: 2945.87

The final model predicts Interest Rates using Inflation, Commodity Prices, Personal Expenditure, Personal Income, and Housing Starts.

The coefficients for Commodity Prices and Housing Starts both have a negative sign, indicating that higher asset values are associated with lower Interest Rates. This is consistent with economic theory: lower Interest Rates reduce the future value of money and raise asset values today.

The coefficients for Inflation, Personal Expenditure and Personal Income all have positive signs, indicating that higher spending leads to higher Interest Rates. This is as expected, with higher Interest Rates acting to “cool down” the economy.
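The general-to-specific procedure used in (a) can be sketched in base R as a loop that drops the least significant predictor one at a time. This is only an illustration on simulated data: the data frame `sim`, the helper `backward_eliminate` and the 5% threshold are assumptions, not the code used to produce the output above.

```r
# Simulated data (hypothetical): y depends on x1 and x2 only; x3 and x4 are noise.
set.seed(1)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - 1.5 * sim$x2 + rnorm(n)

# Repeatedly drop the predictor with the largest p-value until all remaining
# predictors are significant at level alpha. The intercept is always kept.
backward_eliminate <- function(data, response, alpha = 0.05) {
  predictors <- setdiff(names(data), response)
  repeat {
    rhs <- if (length(predictors) > 0) predictors else "1"
    fit <- lm(reformulate(rhs, response = response), data = data)
    if (length(predictors) == 0) return(fit)
    # p-values of the slope coefficients (the intercept row is excluded)
    pvals <- summary(fit)$coefficients[-1, "Pr(>|t|)", drop = FALSE]
    if (max(pvals[, 1]) <= alpha) return(fit)
    worst <- rownames(pvals)[which.max(pvals[, 1])]
    predictors <- setdiff(predictors, worst)
  }
}

final_fit <- backward_eliminate(sim, response = "y")
print(formula(final_fit))
```

As in the output above, the intercept stays in the model even when it is not significant.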


(b) Use specific-to-general to come to a model. Start by regressing the federal funds rate on only a constant and add 1 variable at a time. Is the model the same as in (a)?

Running model selection forwards, we start with a model containing only an intercept, whose estimate coincides with the mean Interest Rate over all time periods.

Using forward selection (with AIC as the selection criterion), the first variable to be added is Inflation. It is the strongest single linear predictor of Interest Rates when compared with the intercept-only model. Repeating the process, the AIC is lowered further by adding Personal Income, Personal Expenditure, Housing Starts and Commodity Prices.

Adding Unemployment or Production raises the AIC, so both are excluded.

## 
## Call:
## lm(formula = INTRATE ~ INFL + PCE + HOUST + PERSINC + COMMPRI, 
##     data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.1631 -1.5244 -0.1125  1.3715  7.6725 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.240119   0.230366  -1.042  0.29764    
## INFL         0.717527   0.057152  12.555  < 2e-16 ***
## PCE          0.340525   0.059156   5.756 1.32e-08 ***
## HOUST       -0.020530   0.004389  -4.678 3.52e-06 ***
## PERSINC      0.240242   0.059342   4.048 5.77e-05 ***
## COMMPRI     -0.007501   0.002640  -2.841  0.00464 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.188 on 654 degrees of freedom
## Multiple R-squared:  0.6374, Adjusted R-squared:  0.6346 
## F-statistic: 229.9 on 5 and 654 DF,  p-value: < 2.2e-16
## AIC: 2914.42  -  BIC: 2945.87

The final model predicts Interest Rates using Inflation, Personal Expenditure, Housing Starts, Personal Income and Commodity Prices.

Whilst the order of the variables is different to the model using backward selection, the two linear models are identical. The model in (b) is the same as in (a).
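A minimal sketch of specific-to-general selection by AIC, using `step()` from base R on simulated data. The data frame `sim` and its variables are hypothetical, and this is not necessarily how the output above was produced.

```r
# Simulated data (hypothetical): y depends on x1 and x2 only.
set.seed(2)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - 1.5 * sim$x2 + rnorm(n)

# Start from the intercept-only model; its coefficient is the sample mean of y.
null_fit <- lm(y ~ 1, data = sim)

# At each step, add the candidate that lowers the AIC the most and stop when
# no remaining candidate lowers it any further.
forward_fit <- step(null_fit,
                    scope = list(lower = ~ 1, upper = ~ x1 + x2 + x3 + x4),
                    direction = "forward", trace = 0)
print(formula(forward_fit))
```

The order of the terms in the resulting formula reflects the order in which `step()` added them, which is why a forward-selected model can list the same variables in a different order than a backward-selected one.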


(c) Compare your model from (a) and the Taylor rule of equation (1). Consider \(R^2\), AIC and BIC.

Which of the models do you prefer?

The linear regression model from (a) has an \(R^2\) of 0.637, an AIC of 2914.42, and a BIC of 2945.87.

The linear regression using the Taylor rule of equation (1) has an \(R^2\) of 0.575, an AIC of 3013.62, and a BIC of 3031.59.

## 
## Call:
## lm(formula = INTRATE ~ INFL + PROD, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1592 -1.6762  0.0141  1.3730  7.9203 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.24890    0.17619   7.088 3.51e-12 ***
## INFL         0.97498    0.03273  29.785  < 2e-16 ***
## PROD         0.09472    0.01971   4.805 1.92e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.364 on 657 degrees of freedom
## Multiple R-squared:  0.5747, Adjusted R-squared:  0.5734 
## F-statistic: 443.9 on 2 and 657 DF,  p-value: < 2.2e-16
## AIC: 3013.62  -  BIC: 3031.59

The \(R^2\) of our model is higher, and its AIC and BIC are lower, than those of the Taylor rule of equation (1). We therefore prefer the model from question (a), as it fits the data better even after its additional parameters are penalized. It explains more of the variation in historic Interest Rates and should perform better in predicting future changes.
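To make the comparison concrete, the sketch below recovers AIC and BIC from the residual standard error and degrees of freedom printed in the two summaries. It assumes the reported criteria follow the convention of R's `AIC()` and `BIC()` for `lm` fits (Gaussian log-likelihood, with the error variance counted as one extra parameter); the helper `ic_from_summary` is hypothetical.

```r
# Recover AIC and BIC from an lm() summary, assuming R's Gaussian
# log-likelihood convention with the error variance as an extra parameter.
ic_from_summary <- function(sigma, df_resid, n_coef) {
  n <- df_resid + n_coef                 # sample size
  rss <- sigma^2 * df_resid              # residual sum of squares
  minus2loglik <- n * (log(2 * pi) + log(rss / n) + 1)
  n_par <- n_coef + 1                    # coefficients plus error variance
  c(AIC = minus2loglik + 2 * n_par,
    BIC = minus2loglik + log(n) * n_par)
}

# Inputs taken from the printed summaries of the model in (a) and the Taylor rule.
model_a <- ic_from_summary(sigma = 2.188, df_resid = 654, n_coef = 6)
taylor  <- ic_from_summary(sigma = 2.364, df_resid = 657, n_coef = 3)
print(round(rbind(model_a, taylor), 1))
```

Because the residual standard error is printed to only three decimals, the recovered values should agree with the reported ones only up to rounding. BIC multiplies the parameter count by \(\log(n)\) instead of 2, so it penalizes the larger model more heavily, yet it still favours the model from (a).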


(d) Test the Taylor rule of equation (1) using the RESET test, Chow break and forecast test (with in both tests as break date January 1980) and a Jarque-Bera test. What do you conclude?

The RESET test on fitted values is not significant (p = 0.112): it does not reject the null hypothesis that the linear specification is correct, that is, that adding the squared fitted values as an extra regressor would not improve the explanatory power of the model.

## 
##  RESET test
## 
## data:  model_t
## RESET = 2.5371, df1 = 1, df2 = 656, p-value = 0.1117
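The mechanics of the RESET test can be sketched in base R as an F-test on one added regressor. The `df1 = 1` in the output above suggests that only the squared fitted values were added, which is the assumption made here; the simulated data frame `sim` is hypothetical.

```r
# Simulated, correctly specified linear model (hypothetical data).
set.seed(42)
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 0.8 * sim$x1 + 0.3 * sim$x2 + rnorm(n)

# Step 1: fit the model under test.
base_fit <- lm(y ~ x1 + x2, data = sim)

# Step 2: add the squared fitted values as an extra regressor.
sim$fitted_sq <- fitted(base_fit)^2
aug_fit <- lm(y ~ x1 + x2 + fitted_sq, data = sim)

# Step 3: F-test of the restricted model against the augmented model.
reset <- anova(base_fit, aug_fit)
reset_F <- reset$F[2]
reset_p <- reset$`Pr(>F)`[2]
print(c(F = reset_F, df1 = reset$Df[2], df2 = reset$Res.Df[2], p = reset_p))
```

With a single added term, the F statistic equals the squared t-value of that term in the augmented regression.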

The Chow break test is significant (F = 28.74, p < 0.001), implying a structural break in January 1980.

##      F value        d.f.1        d.f.2      P value 
## 2.873501e+01 3.000000e+00 6.540000e+02 1.836802e-17
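A minimal base R sketch of the Chow break test, which compares the residual sum of squares of the pooled regression with that of two separate regressions. The simulated data and the helper `chow_break` are hypothetical and only illustrate the formula.

```r
# Simulated data (hypothetical) with a change in intercept and slope at obs 101.
set.seed(3)
n <- 200
break_obs <- 101                       # first observation of the second regime
sim <- data.frame(x = rnorm(n))
regime2 <- seq_len(n) >= break_obs
sim$y <- ifelse(regime2, 3 + 0.5 * sim$x, 1.5 * sim$x) + rnorm(n)

rss <- function(fit) sum(residuals(fit)^2)

# F = ((RSS_pooled - RSS_1 - RSS_2) / k) / ((RSS_1 + RSS_2) / (n - 2k))
chow_break <- function(formula, data, break_obs) {
  n <- nrow(data)
  first <- seq_len(n) < break_obs
  pooled <- lm(formula, data = data)
  fit1 <- lm(formula, data = data[first, , drop = FALSE])
  fit2 <- lm(formula, data = data[!first, , drop = FALSE])
  k <- length(coef(pooled))            # parameters per regime
  rss_split <- rss(fit1) + rss(fit2)
  f_stat <- ((rss(pooled) - rss_split) / k) / (rss_split / (n - 2 * k))
  c(F = f_stat, df1 = k, df2 = n - 2 * k,
    p = pf(f_stat, k, n - 2 * k, lower.tail = FALSE))
}

res_break <- chow_break(y ~ x, sim, break_obs)
print(res_break)
```

For the Taylor rule regression, k = 3 and n = 660, which gives the degrees of freedom 3 and 654 shown in the output above.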

The Chow forecast test gives a similar result (F = 5.511, p < 0.001).
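The Chow forecast test differs from the break test in that the model is estimated only on the first subsample and then judged on how well it extends to the second. A base R sketch on simulated data follows; the data and the helper `chow_forecast` are hypothetical.

```r
# Simulated data (hypothetical): the last 20 observations are shifted upwards.
set.seed(4)
n <- 120
n1 <- 100                              # size of the estimation sample
sim <- data.frame(x = rnorm(n))
sim$y <- 1 + 0.5 * sim$x + rnorm(n)
sim$y[(n1 + 1):n] <- sim$y[(n1 + 1):n] + 4

rss <- function(fit) sum(residuals(fit)^2)

# F = ((RSS_pooled - RSS_1) / n2) / (RSS_1 / (n1 - k)), with n2 = n - n1
chow_forecast <- function(formula, data, n1) {
  n2 <- nrow(data) - n1
  pooled <- lm(formula, data = data)
  fit1 <- lm(formula, data = data[seq_len(n1), , drop = FALSE])
  k <- length(coef(fit1))
  f_stat <- ((rss(pooled) - rss(fit1)) / n2) / (rss(fit1) / (n1 - k))
  c(F = f_stat, df1 = n2, df2 = n1 - k,
    p = pf(f_stat, n2, n1 - k, lower.tail = FALSE))
}

res_forecast <- chow_forecast(y ~ x, sim, n1)
print(res_forecast)
```

With a break at observation 241 in a sample of 660, the estimation sample has 240 observations and the forecast sample 420.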

The Jarque-Bera normality test is also significant (p = 0.002): it rejects the null hypothesis that the residuals are normally distributed.

## 
##  Jarque Bera Test
## 
## data:  model_t$residuals
## X-squared = 12.444, df = 2, p-value = 0.001985
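The Jarque-Bera statistic combines the sample skewness \(S\) and kurtosis \(K\) of the residuals as \(JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)\) and is compared with a \(\chi^2(2)\) distribution. The base R sketch below implements this formula (the helper `jarque_bera` is hypothetical) and recomputes the p-value from the statistic reported above.

```r
# Jarque-Bera statistic from sample skewness and kurtosis (moments divided by n).
jarque_bera <- function(x) {
  n <- length(x)
  d <- x - mean(x)
  m2 <- mean(d^2)
  skew <- mean(d^3) / m2^1.5
  kurt <- mean(d^4) / m2^2
  stat <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
  list(statistic = stat,
       p_value = pchisq(stat, df = 2, lower.tail = FALSE))
}

# p-value implied by the reported statistic of 12.444 with 2 degrees of freedom.
reported_p <- pchisq(12.444, df = 2, lower.tail = FALSE)
print(signif(reported_p, 4))

# Small symmetric toy sample: skewness is zero, so only kurtosis contributes.
toy <- jarque_bera(c(-1, 0, 1))
print(toy$statistic)
```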

This is visible in the residual plot, which shows a lot of variation in the residuals. The residuals are negative before the suggested break (observation 241, January 1980) and mostly positive for a few years afterwards, returning to a narrower range in the 1990s.

We can therefore conclude that a structural break in interest rate setting policy occurred in 1980. Indeed, the United States changed its interest rate policy completely under Federal Reserve Chairman Paul Volcker, probably the most influential central banker of his time.

To get the best results, we should estimate two models: one for the period before and one for the period after Volcker's policy change.
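The split into two subsamples can be set up as in the sketch below. It assumes the sample is monthly and runs from January 1960 to December 2014, which is consistent with the 660 observations implied by the regression output; the data frame `df` mentioned in the final comment is the one named in the `lm()` calls above and is not loaded here.

```r
# Assumption: monthly observations from January 1960 to December 2014.
dates <- seq(as.Date("1960-01-01"), as.Date("2014-12-01"), by = "month")

# Position of the break date in the sample.
break_obs <- which(dates == as.Date("1980-01-01"))
pre_break <- seq_along(dates) < break_obs

split_info <- c(n = length(dates), break_obs = break_obs,
                n_pre = sum(pre_break), n_post = sum(!pre_break))
print(split_info)

# With a data frame `df` ordered by date, the two models would be fitted as
# lm(INTRATE ~ INFL + PROD, data = df[pre_break, ]) and
# lm(INTRATE ~ INFL + PROD, data = df[!pre_break, ]).
```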