MOOC Econometrics

Test Exercise 3

Notes:

• See website for how to submit your answers and how feedback is organized. • This exercise uses the datafile TestExer3 and requires a computer. Goals and skills being used: • Experience the process of model selection. • Apply methods to compare models. • Apply tests to evaluate a model.

Questions

This test exercise is of an applied nature and uses data that are available in the data file TestExer3. We consider the so-called Taylor rule for setting the (nominal) interest rate. This model describes the level of the nominal interest rate that the central bank sets as a function of equilibrium real interest rate and inflation, and considers the current level of inflation and production. Taylor (1993)1 considers the model:

# it = r∗ + πt + 0.5(πt − π∗) + 0.5gt,

with it the Federal funds target interest rate at time t, r∗ the equilibrium real federal funds rate, πt a measure of inflation, π∗ the target inflation rate and gt the output gap (how much actual output deviates from potential output). We simplify the Taylor rule in two manners. First, we avoid determining r∗ and π∗ and simply add an intercept to the model to capture these two variables (and any other deviations in the means). Second, we consider production yy rather than the output gap. In this form the Taylor rule is

#it = β1 + β2πt + β3yt + εt. (1)

Monthly data are available for the USA over the period 1960 through 2014 for the following variables:

• INTRATE: Federal funds interest rate

• INFL: Inflation

• PROD: Production

• UNEMPL: Unemployment

• COMMPRI: Commodity prices

• PCE: Personal consumption expenditure

• PERSINC: Personal income

• HOUST: Housing starts

(a) Use general-to-specific to come to a model. Start by regressing the federal funds rate on the other 7 variables and eliminate 1 variable at a time.

First we need to load the data and summarize the variables:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(readxl)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
df <- read_excel("finaldata.xlsx",col_names = TRUE)
head(df)
## # A tibble: 6 x 9
##   OBS    INTRATE  INFL   PROD UNEMPL COMMPRI   PCE PERSINC  HOUST
##   <chr>    <dbl> <dbl>  <dbl>  <dbl>   <dbl> <dbl>   <dbl>  <dbl>
## 1 1960:1    3.99  1.24 10.0     3.42    7.95  5.71  1.68   -11.9 
## 2 1960:2    3.97  1.41  6.96    3.47   -8.56  5.06  1.33    -9.84
## 3 1960:3    3.84  1.52  4.50    2.72  -16.8   5.56  0.892  -31.5 
## 4 1960:4    3.92  1.93  1.51    2.80   -5.03  7.77  0.676  -18.9 
## 5 1960:5    3.85  1.83 -0.114   1.73  -12.4   4.39  0.337  -15.2 
## 6 1960:6    3.32  1.72 -1.48    1.24  -24.3   3.74  0.0672 -17.0
start_date <- as.Date("1960/01/01")
dates <- seq(start_date, by= "month", length.out = 660)
df$dates <- dates

head(df)
## # A tibble: 6 x 10
##   OBS    INTRATE  INFL   PROD UNEMPL COMMPRI   PCE PERSINC  HOUST dates     
##   <chr>    <dbl> <dbl>  <dbl>  <dbl>   <dbl> <dbl>   <dbl>  <dbl> <date>    
## 1 1960:1    3.99  1.24 10.0     3.42    7.95  5.71  1.68   -11.9  1960-01-01
## 2 1960:2    3.97  1.41  6.96    3.47   -8.56  5.06  1.33    -9.84 1960-02-01
## 3 1960:3    3.84  1.52  4.50    2.72  -16.8   5.56  0.892  -31.5  1960-03-01
## 4 1960:4    3.92  1.93  1.51    2.80   -5.03  7.77  0.676  -18.9  1960-04-01
## 5 1960:5    3.85  1.83 -0.114   1.73  -12.4   4.39  0.337  -15.2  1960-05-01
## 6 1960:6    3.32  1.72 -1.48    1.24  -24.3   3.74  0.0672 -17.0  1960-06-01
df%>% ggplot(aes(dates,INTRATE)) + geom_line()

So next we need to fit the model:

fit <- lm(formula = INTRATE ~ INFL + PROD + UNEMPL + COMMPRI + PCE + PERSINC + HOUST, data = df)
summary(fit)

Call:
lm(formula = INTRATE ~ INFL + PROD + UNEMPL + COMMPRI + PCE + 
    PERSINC + HOUST, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.4066 -1.4340 -0.1175  1.3555  7.7386 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.221161   0.244995  -0.903   0.3670    
INFL         0.696059   0.062229  11.185  < 2e-16 ***
PROD        -0.057743   0.039900  -1.447   0.1483    
UNEMPL       0.102481   0.096757   1.059   0.2899    
COMMPRI     -0.005521   0.002974  -1.857   0.0638 .  
PCE          0.344380   0.069455   4.958 9.08e-07 ***
PERSINC      0.246999   0.060590   4.077 5.13e-05 ***
HOUST       -0.019411   0.004672  -4.155 3.68e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.188 on 652 degrees of freedom
Multiple R-squared:  0.6385,    Adjusted R-squared:  0.6346 
F-statistic: 164.5 on 7 and 652 DF,  p-value: < 2.2e-16
print(paste("AIC:" ,AIC(fit)," and BIC :", BIC(fit)))
[1] "AIC: 2916.30140865829  and BIC : 2956.73156717348"

The variable with the least explanatory power, based on p-value, is Unemployment, so is necessary to eliminate it and create a second model that excludes it:

fit <- lm(formula = INTRATE ~ INFL + PROD + COMMPRI + PCE + PERSINC + HOUST, data = df)
summary(fit)
## 
## Call:
## lm(formula = INTRATE ~ INFL + PROD + COMMPRI + PCE + PERSINC + 
##     HOUST, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.5322 -1.4982 -0.1005  1.3882  7.6954 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.290851   0.236016  -1.232   0.2183    
## INFL         0.693309   0.062180  11.150  < 2e-16 ***
## PROD        -0.025460   0.025752  -0.989   0.3232    
## COMMPRI     -0.006514   0.002822  -2.308   0.0213 *  
## PCE          0.368561   0.065602   5.618 2.86e-08 ***
## PERSINC      0.251581   0.060441   4.162 3.57e-05 ***
## HOUST       -0.021023   0.004417  -4.760 2.39e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.188 on 653 degrees of freedom
## Multiple R-squared:  0.6379, Adjusted R-squared:  0.6346 
## F-statistic: 191.7 on 6 and 653 DF,  p-value: < 2.2e-16
print(paste("AIC:" ,AIC(fit)," and BIC :", BIC(fit)))
## [1] "AIC: 2915.43601815001  and BIC : 2951.37393683018"

Also Production has high p-value so we remove it for our third and final round. In fact all remaining variables has absolute t-values above 2, with p-values below 0.05 so they are significant:

fit <- lm(formula = INTRATE ~ INFL + COMMPRI + PCE + PERSINC + HOUST, data = df)
summary(fit)
## 
## Call:
## lm(formula = INTRATE ~ INFL + COMMPRI + PCE + PERSINC + HOUST, 
##     data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.1631 -1.5244 -0.1125  1.3715  7.6725 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.240119   0.230366  -1.042  0.29764    
## INFL         0.717527   0.057152  12.555  < 2e-16 ***
## COMMPRI     -0.007501   0.002640  -2.841  0.00464 ** 
## PCE          0.340525   0.059156   5.756 1.32e-08 ***
## PERSINC      0.240242   0.059342   4.048 5.77e-05 ***
## HOUST       -0.020530   0.004389  -4.678 3.52e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.188 on 654 degrees of freedom
## Multiple R-squared:  0.6374, Adjusted R-squared:  0.6346 
## F-statistic: 229.9 on 5 and 654 DF,  p-value: < 2.2e-16
print(paste("AIC:" ,AIC(fit)," and BIC :", BIC(fit)))
## [1] "AIC: 2914.42324696844  and BIC : 2945.86892581358"
R2_A <- 0.6374

Inflation, Commodity Prices, Personal Expenditure, Personal Income, and Housing Starts, predicts Interest Rates.

Commodity Prices and Housing Starts have a negative relationship with Interest Rates, so higher asset values reduce Interest Rates. This fits economic theory, with lower Interest Rates reducing the future value of money, and increasing asset values today.

On the other hand, Inflation, Personal Expenditure and Personal Income all have positive relationship, so higher spending leads to higher Interest Rates. This is as expected, with higher Interest Rates acting to “cool down” the economy.

(b) Use specific-to-general to come to a model. Start by regressing the federal funds rate on only a constant and add 1 variable at a time. Is the model the same as in (a)?

First we start with a model containing only an intercept(the mean average Interest Rate over all time periods), then, thanks to AIC y BIC we added Inflation and continuing with this process is necessary to add Personal Income, then Personal Expenditure, Housing Starts and Commodity Prices.

Also as you know, Unemployment and Production get the AIC higher so are excluded.So the final model shows us:

lm(formula = INTRATE ~ INFL + PCE + HOUST + PERSINC + COMMPRI, data = df)
## 
## Call:
## lm(formula = INTRATE ~ INFL + PCE + HOUST + PERSINC + COMMPRI, 
##     data = df)
## 
## Coefficients:
## (Intercept)         INFL          PCE        HOUST      PERSINC      COMMPRI  
##   -0.240119     0.717527     0.340525    -0.020530     0.240242    -0.007501

Interest Rates uses Inflation, Personal Expenditure, Housing Starts, Personal Income and Commodity Prices.

While the order of the variables is different to the model using backward selection, the two linear models are identical. The model in (b) is the same as in (a).

(c) Compare your model from (a) and the Taylor rule of equation (1). Consider R2, AIC and BIC. Which of the models do you prefer?

So we need to remember that:

print(paste("The model A has R square :",R2_A))
## [1] "The model A has R square : 0.6374"
print(paste("AIC:" ,AIC(fit)," and BIC :", BIC(fit)))
## [1] "AIC: 2914.42324696844  and BIC : 2945.86892581358"

Now using the Taylor Rule:

taylor <-lm(formula = INTRATE ~ INFL + PROD, data = df)
summary(taylor)
## 
## Call:
## lm(formula = INTRATE ~ INFL + PROD, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1592 -1.6762  0.0141  1.3730  7.9203 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.24890    0.17619   7.088 3.51e-12 ***
## INFL         0.97498    0.03273  29.785  < 2e-16 ***
## PROD         0.09472    0.01971   4.805 1.92e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.364 on 657 degrees of freedom
## Multiple R-squared:  0.5747, Adjusted R-squared:  0.5734 
## F-statistic: 443.9 on 2 and 657 DF,  p-value: < 2.2e-16
R2_taylor <- 0.5747
print(paste("The model Taylor Rule(model B) has R square :",R2_taylor))
## [1] "The model Taylor Rule(model B) has R square : 0.5747"
print(paste("AIC:" ,AIC(taylor)," and BIC :", BIC(taylor)))
## [1] "AIC: 3013.61634323759  and BIC : 3031.58530257768"

The R2 in our model is higher, and the AIC and BIC lower compared to Taylor rule of equation. We must therefore prefer the model from question (a) as it fits the data better. It explains more of the variation in historic Interest Rates and should be prove itself more performing in predicting future changes.

resid <- residuals.lm(fit)
df$Resid_model_a <- resid
df %>% ggplot(aes(Resid_model_a,INTRATE))+ geom_point()

(d) Test the Taylor rule of equation (1) using the RESET test, Chow break and forecast test (with in both tests as break date January 1980) and a Jarque-Bera test. What do you conclude?

The RESET test on fitted values is not significant: it does not reject the Null hypothesis that additional variables would not improve the explanatory power of the model.

library(lmtest)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
taylor <-lm(formula = INTRATE ~ INFL + PROD, data = df)
summary(taylor)
## 
## Call:
## lm(formula = INTRATE ~ INFL + PROD, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1592 -1.6762  0.0141  1.3730  7.9203 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.24890    0.17619   7.088 3.51e-12 ***
## INFL         0.97498    0.03273  29.785  < 2e-16 ***
## PROD         0.09472    0.01971   4.805 1.92e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.364 on 657 degrees of freedom
## Multiple R-squared:  0.5747, Adjusted R-squared:  0.5734 
## F-statistic: 443.9 on 2 and 657 DF,  p-value: < 2.2e-16
resettest(taylor)
## 
##  RESET test
## 
## data:  taylor
## RESET = 2.2578, df1 = 2, df2 = 655, p-value = 0.1054
# Ho : There is linearity in the model

So we accept the Ho.

We can see that the Chow test is significant, implying a structural break in 1980 as we did in the classes previously.

To know if there is a structural change:

# Testing 
library(strucchange)
## Warning: package 'strucchange' was built under R version 4.0.2
## Loading required package: sandwich
sctest(taylor)
## 
##  M-fluctuation test
## 
## data:  taylor
## f(efp) = 5.6454, p-value < 2.2e-16
#HO: There is no structural change

#Chow break testing

library(gap)
## Warning: package 'gap' was built under R version 4.0.2
## gap version 1.2.2
grp <- df[df$dates < "1980-01-01", ]
x1 <- grp[, c("INFL", "PROD")]; y1 <- data.frame( INTRATE = grp["INTRATE"] )
grp <- df[df$dates >= "1980-01-01", ]
x2 <- grp[, c("INFL", "PROD")]; y2 <- data.frame( INTRATE = grp["INTRATE"] )
#chow.test
chow.test(y1, x1, y2, x2)
##      F value        d.f.1        d.f.2      P value 
## 2.873501e+01 3.000000e+00 6.540000e+02 1.836802e-17

There is structural change and that change is in 1980, Similar results from Chow forecast (Test statistic: F=5.511 with p<0.001).

Finally the Shapiro test:

library(nortest)
hist(residuals.lm(taylor))

boxplot(residuals.lm(taylor))

shapiro.test(residuals.lm(taylor))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals.lm(taylor)
## W = 0.987, p-value = 1.323e-05
#ho : There is normal distribution 
qqplot(df$INTRATE,predict(taylor))

So we can conclude that that we need to reject the Ho,and say that there is no normal distribution in the residuals.