MOOC Econometrics

Test Exercise 6

Notes:

• See website for how to submit your answers and how feedback is organized. • This exercise uses the datafile TestExer6 and requires a computer. • The dataset TestExer6 is available on the website.

Goals and skills being used:

• Experience the process of practical application of time series analysis. • Get hands-on experience with the analysis of time series. • Give correct interpretation of outcomes of the analysis.

Questions

This test exercise uses data that are available in the data file TestExer6. The question of interest is to model monthly inflation in the Euro area and to investigate whether inflation in the United States of America has predictive power for inflation in the Euro area. Monthly data on the consumer price index (CPI) for the Euro area and the USA are available from January 2000 until December 2011. The data for January 2000 until December 2010 are used for specification and estimation of models, and the data for 2011 are left out for forecast evaluation purposes.

(a) Make time series plots of the CPI of the Euro area and the USA, and also of their logarithm log(CPI) and of the two monthly inflation series DP = ∆log(CPI). What conclusions do you draw from these plots?

First we need to load the data:

library(readxl)
library(ggplot2)
library(dplyr)
library(aTSA)
library(knitr)
library(dynlm)
## Warning: package 'dynlm' was built under R version 4.0.2
df <- read_excel("finaldata6.xlsx", col_names = TRUE)
head(df)
## # A tibble: 6 x 8
##   `YYYY-MM` TREND CPI_EUR CPI_USA LOGPEUR LOGPUSA DPEUR           DPUSA         
##   <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <chr>           <chr>         
## 1 2000M01       1    105.    108.    4.65    4.68 NA              NA            
## 2 2000M02       2    105.    108.    4.66    4.68 2.850358224356~ 6.48450627926~
## 3 2000M03       3    106.    109.    4.66    4.69 3.787883316936~ 7.35973883208~
## 4 2000M04       4    106.    109.    4.66    4.69 9.447331831617~ 9.16170471779~
## 5 2000M05       5    106     109.    4.66    4.69 9.438415047062~ 9.15331871688~
## 6 2000M06       6    106.    110.    4.67    4.70 3.766482795477~ 5.47446622708~
tail(df)
## # A tibble: 6 x 8
##   `YYYY-MM` TREND CPI_EUR CPI_USA LOGPEUR LOGPUSA DPEUR           DPUSA         
##   <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <chr>           <chr>         
## 1 2011M07     139    134.    144     4.90    4.97 -5.96571493399~ 6.94685682678~
## 2 2011M08     140    134.    144.    4.90    4.97 1.494768589229~ 2.77392688272~
## 3 2011M09     141    135.    145.    4.90    4.97 7.440510516582~ 2.07540717871~
## 4 2011M10     142    135.    144.    4.91    4.97 3.699597264464~ -2.0754071787~
## 5 2011M11     143    136.    144.    4.91    4.97 7.382798415811~ -1.3860016078~
## 6 2011M12     144    136     144.    4.91    4.97 3.683245416296~ -2.0826109575~
str(df)
## tibble [144 x 8] (S3: tbl_df/tbl/data.frame)
##  $ YYYY-MM: chr [1:144] "2000M01" "2000M02" "2000M03" "2000M04" ...
##  $ TREND  : num [1:144] 1 2 3 4 5 6 7 8 9 10 ...
##  $ CPI_EUR: num [1:144] 105 105 106 106 106 ...
##  $ CPI_USA: num [1:144] 108 108 109 109 109 ...
##  $ LOGPEUR: num [1:144] 4.65 4.66 4.66 4.66 4.66 ...
##  $ LOGPUSA: num [1:144] 4.68 4.68 4.69 4.69 4.69 ...
##  $ DPEUR  : chr [1:144] "NA" "2.85035822435642E-3" "3.78788331693691E-3" "9.4473318316179401E-4" ...
##  $ DPUSA  : chr [1:144] "NA" "6.4845062792606703E-3" "7.35973883208007E-3" "9.1617047177994205E-4" ...

So we need to add the dates correctly:

start_d <- as.Date("2000-01-01")
date <- seq(start_d,by="month", length.out = nrow(df))
df$Date <- date
head(df)
## # A tibble: 6 x 9
##   `YYYY-MM` TREND CPI_EUR CPI_USA LOGPEUR LOGPUSA DPEUR     DPUSA     Date      
##   <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <chr>     <chr>     <date>    
## 1 2000M01       1    105.    108.    4.65    4.68 NA        NA        2000-01-01
## 2 2000M02       2    105.    108.    4.66    4.68 2.850358~ 6.484506~ 2000-02-01
## 3 2000M03       3    106.    109.    4.66    4.69 3.787883~ 7.359738~ 2000-03-01
## 4 2000M04       4    106.    109.    4.66    4.69 9.447331~ 9.161704~ 2000-04-01
## 5 2000M05       5    106     109.    4.66    4.69 9.438415~ 9.153318~ 2000-05-01
## 6 2000M06       6    106.    110.    4.67    4.70 3.766482~ 5.474466~ 2000-06-01

So we need to plot the graphs:

df %>% ggplot(aes(x= Date, y= CPI_EUR))+ 
  geom_line(color= "red")+
  geom_line(y= df$CPI_USA, color= "blue") +
  ylim(c(105,145))+
  ggtitle("CPI_EUR(red) and CPI_USA(blue) Evolution")

The two consumer price index (CPI) plots indicate that:

USA and EURO prices seem to be correlated. While USA prices are typically higher than EURO prices, and the difference seems to slightly increase over time. Both indexes are steadily increasing over time, with very few exceptions (i.e. year 2008).The indexes increasing trend seems to be rather logarithmic than linear.

df %>% ggplot(aes(x= Date, y= LOGPEUR))+ 
  geom_line(color= "red")+
  geom_line(y= df$LOGPUSA, color= "blue") +
  ylim(4.65, 5)+
  ggtitle("CPI_EUR(red) and CPI_USA(blue) Evolution Logaritmic Scale")

Is the same in the logarithm case an the corresponding diagram looks very much similar to the previous one. The apparently linear logarithmic increase confirms that the indexes increase over time is probably logarithmic rather than linear.

df$DPEUR <- as.numeric(df$DPEUR) 
## Warning: NAs introducidos por coerción
df$DPUSA <- as.numeric(df$DPUSA) 
## Warning: NAs introducidos por coerción
head(df)
## # A tibble: 6 x 9
##   `YYYY-MM` TREND CPI_EUR CPI_USA LOGPEUR LOGPUSA     DPEUR     DPUSA Date      
##   <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>     <dbl>     <dbl> <date>    
## 1 2000M01       1    105.    108.    4.65    4.68 NA        NA        2000-01-01
## 2 2000M02       2    105.    108.    4.66    4.68  0.00285   0.00648  2000-02-01
## 3 2000M03       3    106.    109.    4.66    4.69  0.00379   0.00736  2000-03-01
## 4 2000M04       4    106.    109.    4.66    4.69  0.000945  0.000916 2000-04-01
## 5 2000M05       5    106     109.    4.66    4.69  0.000944  0.000915 2000-05-01
## 6 2000M06       6    106.    110.    4.67    4.70  0.00377   0.00547  2000-06-01
df %>% ggplot(aes(x= Date, y= DPEUR))+ 
  geom_line(color= "red")+
  geom_line(y =df$DPUSA ,color= "blue") +
  ylim(c(-0.02, 0.02))+
  ggtitle("CPI_EUR(red) and CPI_USA(blue) Evolution Logaritmic Scale")
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 row(s) containing missing values (geom_path).

The monthly inflation series DP=Δlog(CPI) plots indicate that:

  • USA and EURO inflation series seem to be correlated even in the difference series.

  • Both inflation series seems to be stationary with E[yi]= 0.

  • USA inflation usually seems to fluctuate more than EURO inflation, with the exception of the most recent years (i.e. 2009 and later).

  • There was a negative inflation peak (indicating a significant deflation), especially in the USA, at the year 2008. This observation is consistent with the unusual CPI decrease shown by the previous diagrams.

(b) Perform the Augmented Dickey-Fuller (ADF) test for the two log(CPI) series. In the ADF test equation, include a constant (α), a deterministic trend term (βt), three lags of DP = ∆log(CPI) and, of course, the variable of interest log(CPIt−1). Report the coefficient of log(CPIt−1) and its standard error and t-value, and draw your conclusion.

According with the Dicker Fuller Test:

adf1 <- adf.test(df$LOGPEUR) # is stationary with 4 Lags
## Augmented Dickey-Fuller Test 
## alternative: stationary 
##  
## Type 1: no drift no trend 
##      lag  ADF p.value
## [1,]   0 6.17    0.99
## [2,]   1 4.96    0.99
## [3,]   2 5.38    0.99
## [4,]   3 5.79    0.99
## [5,]   4 6.24    0.99
## Type 2: with drift no trend 
##      lag    ADF p.value
## [1,]   0 -0.544   0.856
## [2,]   1 -0.498   0.873
## [3,]   2 -0.417   0.900
## [4,]   3 -0.486   0.877
## [5,]   4 -0.602   0.836
## Type 3: with drift and trend 
##      lag   ADF p.value
## [1,]   0 -3.36  0.0643
## [2,]   1 -3.78  0.0222
## [3,]   2 -3.24  0.0841
## [4,]   3 -2.83  0.2309
## [5,]   4 -2.41  0.4038
## ---- 
## Note: in fact, p.value = 0.01 means p.value <= 0.01

with the other variable :

adf1 <- adf.test(df$LOGPUSA) # is stationary with 4 Lags
## Augmented Dickey-Fuller Test 
## alternative: stationary 
##  
## Type 1: no drift no trend 
##      lag  ADF p.value
## [1,]   0 5.78    0.99
## [2,]   1 2.85    0.99
## [3,]   2 3.51    0.99
## [4,]   3 3.57    0.99
## [5,]   4 3.50    0.99
## Type 2: with drift no trend 
##      lag    ADF p.value
## [1,]   0 -1.084   0.667
## [2,]   1 -0.850   0.749
## [3,]   2 -0.715   0.796
## [4,]   3 -0.848   0.750
## [5,]   4 -0.817   0.761
## Type 3: with drift and trend 
##      lag   ADF p.value
## [1,]   0 -1.96  0.5901
## [2,]   1 -3.56  0.0395
## [3,]   2 -2.82  0.2315
## [4,]   3 -2.73  0.2687
## [5,]   4 -2.67  0.2959
## ---- 
## Note: in fact, p.value = 0.01 means p.value <= 0.01

So we can conclude that for both variables, the one difference with drift and trend are consider stationary series, but others no.

(c) As the two series of log(CPI) are not cointegrated (you need not check this), we continue by modelling the monthly inflation series DPEUR = ∆log(CPIEUR) for the Euro area. Determine the sample autocorrelations and the sample partial autocorrelations of this series to motivate the use of the following AR model: DPEURt = α+β1DPEURt−6 +β2DPEURt−12 +εt . Estimate the parameters of this model (sample Jan 2000 - Dec 2010).

demean <-  TRUE

# Select sample Jan 2000 - Dec 2010
datWork <- df[1:(12*11),]

nn <- nrow(datWork) - 1
# Prepare results data.frame
ac <- data.frame(lag = 1:nn)
ac$AC <- NA;    ac$PAC <- NA
for (i in 1:nn) {
    acf <- acf(datWork$DPEUR, lag.max = i, type = 'correlation', 
               na.action = na.pass, plot = FALSE, demean = demean)
    #print(acf)
    pcf <- acf(datWork$DPEUR, lag.max = i, type = 'partial', 
               na.action = na.pass, plot = FALSE, demean = demean)
    #print(pcf)
    ac[i, 1] <- i
    ac[i, 2] <- acf$acf[i + 1, 1, 1]
    ac[i, 3] <- pcf$acf[i + 0, 1, 1]
}

# Print-out ac data.frame as a table

 table_ac_pac <- kable(ac, format = "markdown", align = 'c', digits = 2)
 head(table_ac_pac)
## [1] "| lag |  AC   |  PAC  |" "|:---:|:-----:|:-----:|"
## [3] "|  1  | 0.08  | 0.08  |" "|  2  | -0.11 | -0.12 |"
## [5] "|  3  | -0.20 | -0.18 |" "|  4  | -0.16 | -0.15 |"

Thus, the findings motivate using lags 6 and 12 with the AR model. And as this is not an MA model (residuals are not included), AR model could be estimated using the OLS method (instead of the MLE method).

Estimating the AR model for lags 6 and 12 and using OLS, produces the following Autoregressive Fit Model:

ar_digits = 3
ar_method = 'ols'

ar12 <- ar(datWork$DPEUR[2:(12*11-1)], order.max = 12, 
           method = ar_method)
ar           <- round(ar12$ar,          digits = ar_digits)
ar.intercept <- round(ar12$x.intercept, digits = ar_digits)
ar12

Call:
ar(x = datWork$DPEUR[2:(12 * 11 - 1)], order.max = 12, method = ar_method)

Coefficients:
      1        2        3        4        5        6        7        8  
 0.0590   0.0014  -0.0972   0.0082  -0.1393   0.1943  -0.0567  -0.1271  
      9       10       11       12  
-0.0431  -0.1074   0.0603   0.5168  

Intercept: -2.317e-05 (0.000223) 

Order selected 12  sigma^2 estimated as  5.858e-06

(d) Extend the AR model of part (c) by adding lagged values of monthly inflation in the USA at lags 1, 6, and 12. Check that the coefficient at lag 6 is not significant, and estimate the ADL model DPEURt = α + β1DPEURt−6 + β2DPEURt−12 + γ1DPUSAt−1 + γ2DPUSAt−12 + εt (sample Jan 2000 - Dec 2010).

First ADL model estimation

# Convert sample Jan 2000 - Dec 2010 to timeseries object(s) for dynlm(.) function
dpEur <- ts(datWork$DPEUR, start =2)
dpUsa <- ts(datWork$DPUSA, start =2)
# Estimating Autoregressive Distributed Lag model: ADL(p; r).
model <- dynlm(dpEur ~ L(dpEur, 6) + L(dpEur, 12) + L(dpUsa, 1) + L(dpUsa, 6) + L(dpUsa, 12))
print(ADL.summary <- summary(model))
## 
## Time series regression with "ts" data:
## Start = 15, End = 133
## 
## Call:
## dynlm(formula = dpEur ~ L(dpEur, 6) + L(dpEur, 12) + L(dpUsa, 
##     1) + L(dpUsa, 6) + L(dpUsa, 12))
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0065866 -0.0016535 -0.0000117  0.0012633  0.0082683 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.0004407  0.0002853   1.545    0.125    
## L(dpEur, 6)   0.2029831  0.0785535   2.584    0.011 *  
## L(dpEur, 12)  0.6367563  0.0874784   7.279 4.78e-11 ***
## L(dpUsa, 1)   0.2264303  0.0511299   4.429 2.20e-05 ***
## L(dpUsa, 6)  -0.0560495  0.0547668  -1.023    0.308    
## L(dpUsa, 12) -0.2300590  0.0541714  -4.247 4.47e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.002272 on 113 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.5602, Adjusted R-squared:  0.5408 
## F-statistic: 28.79 on 5 and 113 DF,  p-value: < 2.2e-16
print(paste("The R-squared: ", round(ADL.summary$r.squared,4)))
## [1] "The R-squared:  0.5602"

BUt the other model is Simplified ADL model estimation:

# Estimating Autoregressive Distributed Lag model: ADL(p; r).
model <- dynlm(dpEur ~ L(dpEur, 6) + L(dpEur, 12) + L(dpUsa, 1) + L(dpUsa, 12))
print(ADL.summary <- summary(model))
## 
## Time series regression with "ts" data:
## Start = 15, End = 133
## 
## Call:
## dynlm(formula = dpEur ~ L(dpEur, 6) + L(dpEur, 12) + L(dpUsa, 
##     1) + L(dpUsa, 12))
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0067809 -0.0016353  0.0000532  0.0013660  0.0082449 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.0003391  0.0002676   1.267   0.2076    
## L(dpEur, 6)   0.1687277  0.0710803   2.374   0.0193 *  
## L(dpEur, 12)  0.6551636  0.0856272   7.651 6.93e-12 ***
## L(dpUsa, 1)   0.2326463  0.0507784   4.582 1.19e-05 ***
## L(dpUsa, 12) -0.2265061  0.0540712  -4.189 5.55e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.002273 on 114 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.5561, Adjusted R-squared:  0.5406 
## F-statistic: 35.71 on 4 and 114 DF,  p-value: < 2.2e-16
print(paste("The R-squared: ", round(ADL.summary$r.squared,4)))
## [1] "The R-squared:  0.5561"

(e) Use the models of parts (c) and (d) to make two series of 12 monthly inflation forecasts for 2011. At each month, you should use the data that are then available, for example, to forecast inflation for September 2011 you can use the data up to and including August 2011. However, do not re-estimate the model and use the coefficients as obtained in parts (c) and (d). For each of the two forecast series, compute the values of the root mean squared error (RMSE), mean absolute error (MAE), and the sum of the forecast errors (SUM). Finally,give your interpretation of the outcomes.

The (simpler) AR model estimated in part (c):

# DPEURt^ = 0+0.194⋅DPEURt−6+0.517⋅DPEURt−12+ϵt

and the simplified ADL model estimated in part (d):

# DPEURt^=0+0.169⋅DPEURt−6+0.655⋅DPEURt−12+0.233⋅DPUSAt−1−0.226⋅DPUSAt−12+ϵt

both, are used to forecast the EURO 2011 monthly inflation series, and compared against the real 2011 values:

fitted_val<- predict(model)
str(fitted_val)
##  Named num [1:119] 0.00236 0.00298 0.00117 0.0021 0.00322 ...
##  - attr(*, "names")= chr [1:119] "15" "16" "17" "18" ...
dpeur <- df$DPEUR[26:nrow(df)]
Date_r <- df$Date[26:nrow(df)]

df1 <- data.frame(dpeur,fitted_val)
head(df1)
##           dpeur   fitted_val
## 15  0.001814883 0.0023599579
## 16  0.005424968 0.0029832921
## 17  0.004498433 0.0011652210
## 18  0.001793722 0.0020984438
## 19  0.000000000 0.0032235269
## 20 -0.000896459 0.0001195738
plot(df1, col= "blue")