• See website for how to submit your answers and how feedback is organized. • This exercise uses the datafile TestExer3 and requires a computer. Goals and skills being used: • Experience the process of model selection. • Apply methods to compare models. • Apply tests to evaluate a model.
This test exercise is of an applied nature and uses data that are available in the data file TestExer3. We consider the so-called Taylor rule for setting the (nominal) interest rate. This model describes the level of the nominal interest rate that the central bank sets as a function of equilibrium real interest rate and inflation, and considers the current level of inflation and production. Taylor (1993)1 considers the model:
# it = r∗ + πt + 0.5(πt − π∗) + 0.5gt,
with it the Federal funds target interest rate at time t, r∗ the equilibrium real federal funds rate, πt a measure of inflation, π∗ the target inflation rate and gt the output gap (how much actual output deviates from potential output). We simplify the Taylor rule in two manners. First, we avoid determining r∗ and π∗ and simply add an intercept to the model to capture these two variables (and any other deviations in the means). Second, we consider production yy rather than the output gap. In this form the Taylor rule is
#it = β1 + β2πt + β3yt + εt. (1)
Monthly data are available for the USA over the period 1960 through 2014 for the following variables:
• INTRATE: Federal funds interest rate
• INFL: Inflation
• PROD: Production
• UNEMPL: Unemployment
• COMMPRI: Commodity prices
• PCE: Personal consumption expenditure
• PERSINC: Personal income
• HOUST: Housing starts
First we need to load the data and summarize the variables:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(readxl)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
df <- read_excel("finaldata.xlsx",col_names = TRUE)
head(df)
## # A tibble: 6 x 9
## OBS INTRATE INFL PROD UNEMPL COMMPRI PCE PERSINC HOUST
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1960:1 3.99 1.24 10.0 3.42 7.95 5.71 1.68 -11.9
## 2 1960:2 3.97 1.41 6.96 3.47 -8.56 5.06 1.33 -9.84
## 3 1960:3 3.84 1.52 4.50 2.72 -16.8 5.56 0.892 -31.5
## 4 1960:4 3.92 1.93 1.51 2.80 -5.03 7.77 0.676 -18.9
## 5 1960:5 3.85 1.83 -0.114 1.73 -12.4 4.39 0.337 -15.2
## 6 1960:6 3.32 1.72 -1.48 1.24 -24.3 3.74 0.0672 -17.0
start_date <- as.Date("1960/01/01")
dates <- seq(start_date, by= "month", length.out = 660)
df$dates <- dates
head(df)
## # A tibble: 6 x 10
## OBS INTRATE INFL PROD UNEMPL COMMPRI PCE PERSINC HOUST dates
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <date>
## 1 1960:1 3.99 1.24 10.0 3.42 7.95 5.71 1.68 -11.9 1960-01-01
## 2 1960:2 3.97 1.41 6.96 3.47 -8.56 5.06 1.33 -9.84 1960-02-01
## 3 1960:3 3.84 1.52 4.50 2.72 -16.8 5.56 0.892 -31.5 1960-03-01
## 4 1960:4 3.92 1.93 1.51 2.80 -5.03 7.77 0.676 -18.9 1960-04-01
## 5 1960:5 3.85 1.83 -0.114 1.73 -12.4 4.39 0.337 -15.2 1960-05-01
## 6 1960:6 3.32 1.72 -1.48 1.24 -24.3 3.74 0.0672 -17.0 1960-06-01
df%>% ggplot(aes(dates,INTRATE)) + geom_line()
So next we need to fit the model:
fit <- lm(formula = INTRATE ~ INFL + PROD + UNEMPL + COMMPRI + PCE + PERSINC + HOUST, data = df)
summary(fit)
Call:
lm(formula = INTRATE ~ INFL + PROD + UNEMPL + COMMPRI + PCE +
PERSINC + HOUST, data = df)
Residuals:
Min 1Q Median 3Q Max
-7.4066 -1.4340 -0.1175 1.3555 7.7386
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.221161 0.244995 -0.903 0.3670
INFL 0.696059 0.062229 11.185 < 2e-16 ***
PROD -0.057743 0.039900 -1.447 0.1483
UNEMPL 0.102481 0.096757 1.059 0.2899
COMMPRI -0.005521 0.002974 -1.857 0.0638 .
PCE 0.344380 0.069455 4.958 9.08e-07 ***
PERSINC 0.246999 0.060590 4.077 5.13e-05 ***
HOUST -0.019411 0.004672 -4.155 3.68e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.188 on 652 degrees of freedom
Multiple R-squared: 0.6385, Adjusted R-squared: 0.6346
F-statistic: 164.5 on 7 and 652 DF, p-value: < 2.2e-16
print(paste("AIC:" ,AIC(fit)," and BIC :", BIC(fit)))
[1] "AIC: 2916.30140865829 and BIC : 2956.73156717348"
The variable with the least explanatory power, based on p-value, is Unemployment, so is necessary to eliminate it and create a second model that excludes it:
fit <- lm(formula = INTRATE ~ INFL + PROD + COMMPRI + PCE + PERSINC + HOUST, data = df)
summary(fit)
##
## Call:
## lm(formula = INTRATE ~ INFL + PROD + COMMPRI + PCE + PERSINC +
## HOUST, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.5322 -1.4982 -0.1005 1.3882 7.6954
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.290851 0.236016 -1.232 0.2183
## INFL 0.693309 0.062180 11.150 < 2e-16 ***
## PROD -0.025460 0.025752 -0.989 0.3232
## COMMPRI -0.006514 0.002822 -2.308 0.0213 *
## PCE 0.368561 0.065602 5.618 2.86e-08 ***
## PERSINC 0.251581 0.060441 4.162 3.57e-05 ***
## HOUST -0.021023 0.004417 -4.760 2.39e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.188 on 653 degrees of freedom
## Multiple R-squared: 0.6379, Adjusted R-squared: 0.6346
## F-statistic: 191.7 on 6 and 653 DF, p-value: < 2.2e-16
print(paste("AIC:" ,AIC(fit)," and BIC :", BIC(fit)))
## [1] "AIC: 2915.43601815001 and BIC : 2951.37393683018"
Also Production has high p-value so we remove it for our third and final round. In fact all remaining variables has absolute t-values above 2, with p-values below 0.05 so they are significant:
fit <- lm(formula = INTRATE ~ INFL + COMMPRI + PCE + PERSINC + HOUST, data = df)
summary(fit)
##
## Call:
## lm(formula = INTRATE ~ INFL + COMMPRI + PCE + PERSINC + HOUST,
## data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.1631 -1.5244 -0.1125 1.3715 7.6725
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.240119 0.230366 -1.042 0.29764
## INFL 0.717527 0.057152 12.555 < 2e-16 ***
## COMMPRI -0.007501 0.002640 -2.841 0.00464 **
## PCE 0.340525 0.059156 5.756 1.32e-08 ***
## PERSINC 0.240242 0.059342 4.048 5.77e-05 ***
## HOUST -0.020530 0.004389 -4.678 3.52e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.188 on 654 degrees of freedom
## Multiple R-squared: 0.6374, Adjusted R-squared: 0.6346
## F-statistic: 229.9 on 5 and 654 DF, p-value: < 2.2e-16
print(paste("AIC:" ,AIC(fit)," and BIC :", BIC(fit)))
## [1] "AIC: 2914.42324696844 and BIC : 2945.86892581358"
R2_A <- 0.6374
Inflation, Commodity Prices, Personal Expenditure, Personal Income, and Housing Starts, predicts Interest Rates.
Commodity Prices and Housing Starts have a negative relationship with Interest Rates, so higher asset values reduce Interest Rates. This fits economic theory, with lower Interest Rates reducing the future value of money, and increasing asset values today.
On the other hand, Inflation, Personal Expenditure and Personal Income all have positive relationship, so higher spending leads to higher Interest Rates. This is as expected, with higher Interest Rates acting to “cool down” the economy.
First we start with a model containing only an intercept(the mean average Interest Rate over all time periods), then, thanks to AIC y BIC we added Inflation and continuing with this process is necessary to add Personal Income, then Personal Expenditure, Housing Starts and Commodity Prices.
Also as you know, Unemployment and Production get the AIC higher so are excluded.So the final model shows us:
lm(formula = INTRATE ~ INFL + PCE + HOUST + PERSINC + COMMPRI, data = df)
##
## Call:
## lm(formula = INTRATE ~ INFL + PCE + HOUST + PERSINC + COMMPRI,
## data = df)
##
## Coefficients:
## (Intercept) INFL PCE HOUST PERSINC COMMPRI
## -0.240119 0.717527 0.340525 -0.020530 0.240242 -0.007501
Interest Rates uses Inflation, Personal Expenditure, Housing Starts, Personal Income and Commodity Prices.
While the order of the variables is different to the model using backward selection, the two linear models are identical. The model in (b) is the same as in (a).
So we need to remember that:
print(paste("The model A has R square :",R2_A))
## [1] "The model A has R square : 0.6374"
print(paste("AIC:" ,AIC(fit)," and BIC :", BIC(fit)))
## [1] "AIC: 2914.42324696844 and BIC : 2945.86892581358"
Now using the Taylor Rule:
taylor <-lm(formula = INTRATE ~ INFL + PROD, data = df)
summary(taylor)
##
## Call:
## lm(formula = INTRATE ~ INFL + PROD, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.1592 -1.6762 0.0141 1.3730 7.9203
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.24890 0.17619 7.088 3.51e-12 ***
## INFL 0.97498 0.03273 29.785 < 2e-16 ***
## PROD 0.09472 0.01971 4.805 1.92e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.364 on 657 degrees of freedom
## Multiple R-squared: 0.5747, Adjusted R-squared: 0.5734
## F-statistic: 443.9 on 2 and 657 DF, p-value: < 2.2e-16
R2_taylor <- 0.5747
print(paste("The model Taylor Rule(model B) has R square :",R2_taylor))
## [1] "The model Taylor Rule(model B) has R square : 0.5747"
print(paste("AIC:" ,AIC(taylor)," and BIC :", BIC(taylor)))
## [1] "AIC: 3013.61634323759 and BIC : 3031.58530257768"
The R2 in our model is higher, and the AIC and BIC lower compared to Taylor rule of equation. We must therefore prefer the model from question (a) as it fits the data better. It explains more of the variation in historic Interest Rates and should be prove itself more performing in predicting future changes.
resid <- residuals.lm(fit)
df$Resid_model_a <- resid
df %>% ggplot(aes(Resid_model_a,INTRATE))+ geom_point()
The RESET test on fitted values is not significant: it does not reject the Null hypothesis that additional variables would not improve the explanatory power of the model.
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
taylor <-lm(formula = INTRATE ~ INFL + PROD, data = df)
summary(taylor)
##
## Call:
## lm(formula = INTRATE ~ INFL + PROD, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.1592 -1.6762 0.0141 1.3730 7.9203
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.24890 0.17619 7.088 3.51e-12 ***
## INFL 0.97498 0.03273 29.785 < 2e-16 ***
## PROD 0.09472 0.01971 4.805 1.92e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.364 on 657 degrees of freedom
## Multiple R-squared: 0.5747, Adjusted R-squared: 0.5734
## F-statistic: 443.9 on 2 and 657 DF, p-value: < 2.2e-16
resettest(taylor)
##
## RESET test
##
## data: taylor
## RESET = 2.2578, df1 = 2, df2 = 655, p-value = 0.1054
# Ho : There is linearity in the model
So we accept the Ho.
We can see that the Chow test is significant, implying a structural break in 1980 as we did in the classes previously.
To know if there is a structural change:
# Testing
library(strucchange)
## Warning: package 'strucchange' was built under R version 4.0.2
## Loading required package: sandwich
sctest(taylor)
##
## M-fluctuation test
##
## data: taylor
## f(efp) = 5.6454, p-value < 2.2e-16
#HO: There is no structural change
#Chow break testing
library(gap)
## Warning: package 'gap' was built under R version 4.0.2
## gap version 1.2.2
grp <- df[df$dates < "1980-01-01", ]
x1 <- grp[, c("INFL", "PROD")]; y1 <- data.frame( INTRATE = grp["INTRATE"] )
grp <- df[df$dates >= "1980-01-01", ]
x2 <- grp[, c("INFL", "PROD")]; y2 <- data.frame( INTRATE = grp["INTRATE"] )
#chow.test
chow.test(y1, x1, y2, x2)
## F value d.f.1 d.f.2 P value
## 2.873501e+01 3.000000e+00 6.540000e+02 1.836802e-17
There is structural change and that change is in 1980, Similar results from Chow forecast (Test statistic: F=5.511 with p<0.001).
Finally the Shapiro test:
library(nortest)
hist(residuals.lm(taylor))
boxplot(residuals.lm(taylor))
shapiro.test(residuals.lm(taylor))
##
## Shapiro-Wilk normality test
##
## data: residuals.lm(taylor)
## W = 0.987, p-value = 1.323e-05
#ho : There is normal distribution
qqplot(df$INTRATE,predict(taylor))
So we can conclude that that we need to reject the Ho,and say that there is no normal distribution in the residuals.