• See website for how to submit your answers and how feedback is organized. • This exercise uses the datafile TestExer6 and requires a computer. • The dataset TestExer6 is available on the website.
• Experience the process of practical application of time series analysis. • Get hands-on experience with the analysis of time series. • Give correct interpretation of outcomes of the analysis.
This test exercise uses data that are available in the data file TestExer6. The question of interest is to model monthly inflation in the Euro area and to investigate whether inflation in the United States of America has predictive power for inflation in the Euro area. Monthly data on the consumer price index (CPI) for the Euro area and the USA are available from January 2000 until December 2011. The data for January 2000 until December 2010 are used for specification and estimation of models, and the data for 2011 are left out for forecast evaluation purposes.
First we need to load the data:
library(readxl)
library(ggplot2)
library(dplyr)
library(aTSA)
library(knitr)
library(dynlm)
## Warning: package 'dynlm' was built under R version 4.0.2
df <- read_excel("finaldata6.xlsx", col_names = TRUE)
head(df)
## # A tibble: 6 x 8
## `YYYY-MM` TREND CPI_EUR CPI_USA LOGPEUR LOGPUSA DPEUR DPUSA
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 2000M01 1 105. 108. 4.65 4.68 NA NA
## 2 2000M02 2 105. 108. 4.66 4.68 2.850358224356~ 6.48450627926~
## 3 2000M03 3 106. 109. 4.66 4.69 3.787883316936~ 7.35973883208~
## 4 2000M04 4 106. 109. 4.66 4.69 9.447331831617~ 9.16170471779~
## 5 2000M05 5 106 109. 4.66 4.69 9.438415047062~ 9.15331871688~
## 6 2000M06 6 106. 110. 4.67 4.70 3.766482795477~ 5.47446622708~
tail(df)
## # A tibble: 6 x 8
## `YYYY-MM` TREND CPI_EUR CPI_USA LOGPEUR LOGPUSA DPEUR DPUSA
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 2011M07 139 134. 144 4.90 4.97 -5.96571493399~ 6.94685682678~
## 2 2011M08 140 134. 144. 4.90 4.97 1.494768589229~ 2.77392688272~
## 3 2011M09 141 135. 145. 4.90 4.97 7.440510516582~ 2.07540717871~
## 4 2011M10 142 135. 144. 4.91 4.97 3.699597264464~ -2.0754071787~
## 5 2011M11 143 136. 144. 4.91 4.97 7.382798415811~ -1.3860016078~
## 6 2011M12 144 136 144. 4.91 4.97 3.683245416296~ -2.0826109575~
str(df)
## tibble [144 x 8] (S3: tbl_df/tbl/data.frame)
## $ YYYY-MM: chr [1:144] "2000M01" "2000M02" "2000M03" "2000M04" ...
## $ TREND : num [1:144] 1 2 3 4 5 6 7 8 9 10 ...
## $ CPI_EUR: num [1:144] 105 105 106 106 106 ...
## $ CPI_USA: num [1:144] 108 108 109 109 109 ...
## $ LOGPEUR: num [1:144] 4.65 4.66 4.66 4.66 4.66 ...
## $ LOGPUSA: num [1:144] 4.68 4.68 4.69 4.69 4.69 ...
## $ DPEUR : chr [1:144] "NA" "2.85035822435642E-3" "3.78788331693691E-3" "9.4473318316179401E-4" ...
## $ DPUSA : chr [1:144] "NA" "6.4845062792606703E-3" "7.35973883208007E-3" "9.1617047177994205E-4" ...
So we need to add the dates correctly:
start_d <- as.Date("2000-01-01")
date <- seq(start_d,by="month", length.out = nrow(df))
df$Date <- date
head(df)
## # A tibble: 6 x 9
## `YYYY-MM` TREND CPI_EUR CPI_USA LOGPEUR LOGPUSA DPEUR DPUSA Date
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <date>
## 1 2000M01 1 105. 108. 4.65 4.68 NA NA 2000-01-01
## 2 2000M02 2 105. 108. 4.66 4.68 2.850358~ 6.484506~ 2000-02-01
## 3 2000M03 3 106. 109. 4.66 4.69 3.787883~ 7.359738~ 2000-03-01
## 4 2000M04 4 106. 109. 4.66 4.69 9.447331~ 9.161704~ 2000-04-01
## 5 2000M05 5 106 109. 4.66 4.69 9.438415~ 9.153318~ 2000-05-01
## 6 2000M06 6 106. 110. 4.67 4.70 3.766482~ 5.474466~ 2000-06-01
So we need to plot the graphs:
df %>% ggplot(aes(x= Date, y= CPI_EUR))+
geom_line(color= "red")+
geom_line(y= df$CPI_USA, color= "blue") +
ylim(c(105,145))+
ggtitle("CPI_EUR(red) and CPI_USA(blue) Evolution")
The two consumer price index (CPI) plots indicate that:
USA and EURO prices seem to be correlated. While USA prices are typically higher than EURO prices, and the difference seems to slightly increase over time. Both indexes are steadily increasing over time, with very few exceptions (i.e. year 2008).The indexes increasing trend seems to be rather logarithmic than linear.
df %>% ggplot(aes(x= Date, y= LOGPEUR))+
geom_line(color= "red")+
geom_line(y= df$LOGPUSA, color= "blue") +
ylim(4.65, 5)+
ggtitle("CPI_EUR(red) and CPI_USA(blue) Evolution Logaritmic Scale")
Is the same in the logarithm case an the corresponding diagram looks very much similar to the previous one. The apparently linear logarithmic increase confirms that the indexes increase over time is probably logarithmic rather than linear.
df$DPEUR <- as.numeric(df$DPEUR)
## Warning: NAs introducidos por coerción
df$DPUSA <- as.numeric(df$DPUSA)
## Warning: NAs introducidos por coerción
head(df)
## # A tibble: 6 x 9
## `YYYY-MM` TREND CPI_EUR CPI_USA LOGPEUR LOGPUSA DPEUR DPUSA Date
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <date>
## 1 2000M01 1 105. 108. 4.65 4.68 NA NA 2000-01-01
## 2 2000M02 2 105. 108. 4.66 4.68 0.00285 0.00648 2000-02-01
## 3 2000M03 3 106. 109. 4.66 4.69 0.00379 0.00736 2000-03-01
## 4 2000M04 4 106. 109. 4.66 4.69 0.000945 0.000916 2000-04-01
## 5 2000M05 5 106 109. 4.66 4.69 0.000944 0.000915 2000-05-01
## 6 2000M06 6 106. 110. 4.67 4.70 0.00377 0.00547 2000-06-01
df %>% ggplot(aes(x= Date, y= DPEUR))+
geom_line(color= "red")+
geom_line(y =df$DPUSA ,color= "blue") +
ylim(c(-0.02, 0.02))+
ggtitle("CPI_EUR(red) and CPI_USA(blue) Evolution Logaritmic Scale")
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 row(s) containing missing values (geom_path).
The monthly inflation series DP=Δlog(CPI) plots indicate that:
USA and EURO inflation series seem to be correlated even in the difference series.
Both inflation series seems to be stationary with E[yi]= 0.
USA inflation usually seems to fluctuate more than EURO inflation, with the exception of the most recent years (i.e. 2009 and later).
There was a negative inflation peak (indicating a significant deflation), especially in the USA, at the year 2008. This observation is consistent with the unusual CPI decrease shown by the previous diagrams.
According with the Dicker Fuller Test:
adf1 <- adf.test(df$LOGPEUR) # is stationary with 4 Lags
## Augmented Dickey-Fuller Test
## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 6.17 0.99
## [2,] 1 4.96 0.99
## [3,] 2 5.38 0.99
## [4,] 3 5.79 0.99
## [5,] 4 6.24 0.99
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -0.544 0.856
## [2,] 1 -0.498 0.873
## [3,] 2 -0.417 0.900
## [4,] 3 -0.486 0.877
## [5,] 4 -0.602 0.836
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -3.36 0.0643
## [2,] 1 -3.78 0.0222
## [3,] 2 -3.24 0.0841
## [4,] 3 -2.83 0.2309
## [5,] 4 -2.41 0.4038
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01
with the other variable :
adf1 <- adf.test(df$LOGPUSA) # is stationary with 4 Lags
## Augmented Dickey-Fuller Test
## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 5.78 0.99
## [2,] 1 2.85 0.99
## [3,] 2 3.51 0.99
## [4,] 3 3.57 0.99
## [5,] 4 3.50 0.99
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -1.084 0.667
## [2,] 1 -0.850 0.749
## [3,] 2 -0.715 0.796
## [4,] 3 -0.848 0.750
## [5,] 4 -0.817 0.761
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -1.96 0.5901
## [2,] 1 -3.56 0.0395
## [3,] 2 -2.82 0.2315
## [4,] 3 -2.73 0.2687
## [5,] 4 -2.67 0.2959
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01
So we can conclude that for both variables, the one difference with drift and trend are consider stationary series, but others no.
demean <- TRUE
# Select sample Jan 2000 - Dec 2010
datWork <- df[1:(12*11),]
nn <- nrow(datWork) - 1
# Prepare results data.frame
ac <- data.frame(lag = 1:nn)
ac$AC <- NA; ac$PAC <- NA
for (i in 1:nn) {
acf <- acf(datWork$DPEUR, lag.max = i, type = 'correlation',
na.action = na.pass, plot = FALSE, demean = demean)
#print(acf)
pcf <- acf(datWork$DPEUR, lag.max = i, type = 'partial',
na.action = na.pass, plot = FALSE, demean = demean)
#print(pcf)
ac[i, 1] <- i
ac[i, 2] <- acf$acf[i + 1, 1, 1]
ac[i, 3] <- pcf$acf[i + 0, 1, 1]
}
# Print-out ac data.frame as a table
table_ac_pac <- kable(ac, format = "markdown", align = 'c', digits = 2)
head(table_ac_pac)
## [1] "| lag | AC | PAC |" "|:---:|:-----:|:-----:|"
## [3] "| 1 | 0.08 | 0.08 |" "| 2 | -0.11 | -0.12 |"
## [5] "| 3 | -0.20 | -0.18 |" "| 4 | -0.16 | -0.15 |"
Thus, the findings motivate using lags 6 and 12 with the AR model. And as this is not an MA model (residuals are not included), AR model could be estimated using the OLS method (instead of the MLE method).
Estimating the AR model for lags 6 and 12 and using OLS, produces the following Autoregressive Fit Model:
ar_digits = 3
ar_method = 'ols'
ar12 <- ar(datWork$DPEUR[2:(12*11-1)], order.max = 12,
method = ar_method)
ar <- round(ar12$ar, digits = ar_digits)
ar.intercept <- round(ar12$x.intercept, digits = ar_digits)
ar12
Call:
ar(x = datWork$DPEUR[2:(12 * 11 - 1)], order.max = 12, method = ar_method)
Coefficients:
1 2 3 4 5 6 7 8
0.0590 0.0014 -0.0972 0.0082 -0.1393 0.1943 -0.0567 -0.1271
9 10 11 12
-0.0431 -0.1074 0.0603 0.5168
Intercept: -2.317e-05 (0.000223)
Order selected 12 sigma^2 estimated as 5.858e-06
First ADL model estimation
# Convert sample Jan 2000 - Dec 2010 to timeseries object(s) for dynlm(.) function
dpEur <- ts(datWork$DPEUR, start =2)
dpUsa <- ts(datWork$DPUSA, start =2)
# Estimating Autoregressive Distributed Lag model: ADL(p; r).
model <- dynlm(dpEur ~ L(dpEur, 6) + L(dpEur, 12) + L(dpUsa, 1) + L(dpUsa, 6) + L(dpUsa, 12))
print(ADL.summary <- summary(model))
##
## Time series regression with "ts" data:
## Start = 15, End = 133
##
## Call:
## dynlm(formula = dpEur ~ L(dpEur, 6) + L(dpEur, 12) + L(dpUsa,
## 1) + L(dpUsa, 6) + L(dpUsa, 12))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0065866 -0.0016535 -0.0000117 0.0012633 0.0082683
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0004407 0.0002853 1.545 0.125
## L(dpEur, 6) 0.2029831 0.0785535 2.584 0.011 *
## L(dpEur, 12) 0.6367563 0.0874784 7.279 4.78e-11 ***
## L(dpUsa, 1) 0.2264303 0.0511299 4.429 2.20e-05 ***
## L(dpUsa, 6) -0.0560495 0.0547668 -1.023 0.308
## L(dpUsa, 12) -0.2300590 0.0541714 -4.247 4.47e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.002272 on 113 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.5602, Adjusted R-squared: 0.5408
## F-statistic: 28.79 on 5 and 113 DF, p-value: < 2.2e-16
print(paste("The R-squared: ", round(ADL.summary$r.squared,4)))
## [1] "The R-squared: 0.5602"
BUt the other model is Simplified ADL model estimation:
# Estimating Autoregressive Distributed Lag model: ADL(p; r).
model <- dynlm(dpEur ~ L(dpEur, 6) + L(dpEur, 12) + L(dpUsa, 1) + L(dpUsa, 12))
print(ADL.summary <- summary(model))
##
## Time series regression with "ts" data:
## Start = 15, End = 133
##
## Call:
## dynlm(formula = dpEur ~ L(dpEur, 6) + L(dpEur, 12) + L(dpUsa,
## 1) + L(dpUsa, 12))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0067809 -0.0016353 0.0000532 0.0013660 0.0082449
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0003391 0.0002676 1.267 0.2076
## L(dpEur, 6) 0.1687277 0.0710803 2.374 0.0193 *
## L(dpEur, 12) 0.6551636 0.0856272 7.651 6.93e-12 ***
## L(dpUsa, 1) 0.2326463 0.0507784 4.582 1.19e-05 ***
## L(dpUsa, 12) -0.2265061 0.0540712 -4.189 5.55e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.002273 on 114 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.5561, Adjusted R-squared: 0.5406
## F-statistic: 35.71 on 4 and 114 DF, p-value: < 2.2e-16
print(paste("The R-squared: ", round(ADL.summary$r.squared,4)))
## [1] "The R-squared: 0.5561"
The (simpler) AR model estimated in part (c):
# DPEURt^ = 0+0.194⋅DPEURt−6+0.517⋅DPEURt−12+ϵt
and the simplified ADL model estimated in part (d):
# DPEURt^=0+0.169⋅DPEURt−6+0.655⋅DPEURt−12+0.233⋅DPUSAt−1−0.226⋅DPUSAt−12+ϵt
both, are used to forecast the EURO 2011 monthly inflation series, and compared against the real 2011 values:
fitted_val<- predict(model)
str(fitted_val)
## Named num [1:119] 0.00236 0.00298 0.00117 0.0021 0.00322 ...
## - attr(*, "names")= chr [1:119] "15" "16" "17" "18" ...
dpeur <- df$DPEUR[26:nrow(df)]
Date_r <- df$Date[26:nrow(df)]
df1 <- data.frame(dpeur,fitted_val)
head(df1)
## dpeur fitted_val
## 15 0.001814883 0.0023599579
## 16 0.005424968 0.0029832921
## 17 0.004498433 0.0011652210
## 18 0.001793722 0.0020984438
## 19 0.000000000 0.0032235269
## 20 -0.000896459 0.0001195738
plot(df1, col= "blue")