Here are the notes for working with time series (generally.) Included
in this section will be the plotline told from:
1. Understanding the series (nature of stochastic processes).
2. Transformations depending on the aforementioned nature.
3. Regressions (ECM, FGLS)
4. Regression diagnostics
5. Finality (interpretations and HAC errors.)
We begin with basic diagnostics.
setwd("~/Downloads/R/515")
# load package for the data
library(readxl)
# load the data
data_time_series <- read_excel("MICH.xlsx")
With the data loaded, we begin with initial tests and observations.
# make time series variables
pcr <- data_time_series$pcr
pce <- data_time_series$pce
expec <- data_time_series$expec
pcr <- ts(pcr, frequency = 12, start = c("2020", "3"))
pce <- ts(pce, frequency = 12, start = c("2020", "3"))
expec <- ts(expec, frequency = 12, start = c("2020", "3"))
# make plots with time mean and labels
ts.plot(pcr, mean(pcr)) # appears to be white noise
ts.plot(pce, mean(pce)) # appears to be trending
ts.plot(expec, mean(expec)) # probably random walk
# at this point we suspect certain types of behavior in the series. the next check is with autocorrelograms.
acf(pcr, lag.max = 10, plot = TRUE)
acf(pce, lag.max = 10, plot = TRUE)
acf(expec, lag.max = 10, plot = TRUE)
# evidence continues to convince us of autocorrelative behavior (pre-regression.) Formality is ascertained from the following tests.
# to begin, a Ljung-Box test should suffice. All in base R, under null H0: series is non-autocorrelated
Box.test(pcr, lag = 10, type = c("Ljung-Box")) # note that lag length must be logical
##
## Box-Ljung test
##
## data: pcr
## X-squared = 100.03, df = 10, p-value < 2.2e-16
Box.test(pce, lag = 1, type = c("Ljung-Box")) # rejects, so autocorrelated
##
## Box-Ljung test
##
## data: pce
## X-squared = 55.341, df = 1, p-value = 1.014e-13
Box.test(expec, lag = 5, type = c("Ljung-Box"))
##
## Box-Ljung test
##
## data: expec
## X-squared = 168.27, df = 5, p-value < 2.2e-16
# A brief note: we can also understand which order $q$ of AR(q) the series is, informally, by regressing it on itself.
# for time series regressions:
library(dynlm)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
pcr_1 <- ts((c(NA, pcr[-length(pcr)])), frequency = 12, start = c("2020", "3"))
pcr_arq <- dynlm(pcr ~ pcr_1)
stargazer(pcr_arq, type = "text")
##
## ===============================================
## Dependent variable:
## ---------------------------
## pcr
## -----------------------------------------------
## pcr_1 0.678***
## (0.089)
##
## Constant 0.292***
## (0.083)
##
## -----------------------------------------------
## Observations 58
## R2 0.507
## Adjusted R2 0.498
## Residual Std. Error 0.068 (df = 56)
## F Statistic 57.542*** (df = 1; 56)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
# manual AR(1) has B_1 of 0.678; not informally a unit root but notable. defer to Ljung-Box
At this point, we have formally confirmed autocorrelative behavior (random walk on \(pcr\), random walk with trend on \(pce\), and random walk on \(expec\).) Now we need to search for cointegrating relationships, beginning with a simple augmented Dickey-Fuller.
# start with ADF tests under null H0: series is nonstationary
library(fUnitRoots)
adfTest(pcr, lags = 10, type = "nc") # fails to reject; has a unit root
##
## Title:
## Augmented Dickey-Fuller Test
##
## Test Results:
## PARAMETER:
## Lag Order: 10
## STATISTIC:
## Dickey-Fuller: 0.2156
## P VALUE:
## 0.6799
##
## Description:
## Sun Jun 1 20:14:31 2025 by user:
adfTest(pce, lags=10, type = "ct")
##
## Title:
## Augmented Dickey-Fuller Test
##
## Test Results:
## PARAMETER:
## Lag Order: 10
## STATISTIC:
## Dickey-Fuller: -3.9315
## P VALUE:
## 0.01877
##
## Description:
## Sun Jun 1 20:14:31 2025 by user:
adfTest(expec, lags=10, type = "c")
##
## Title:
## Augmented Dickey-Fuller Test
##
## Test Results:
## PARAMETER:
## Lag Order: 10
## STATISTIC:
## Dickey-Fuller: -2.2071
## P VALUE:
## 0.2421
##
## Description:
## Sun Jun 1 20:14:31 2025 by user:
# all series are I(1), implying two things: they are or aren't cointegrated. in the case of cointegration, an ECM must be applied. if not, apply FGLS or first-differencing on OLS (no cointegrating vector.)
# i am under the impression Engle-Grangers are just a worse Johansen test, so it will be ignored.
library(urca)
##
## Attaching package: 'urca'
## The following objects are masked from 'package:fUnitRoots':
##
## punitroot, qunitroot, unitrootTable
# begin by binding the variables without the date
matrix <- cbind(pcr, pce, expec)
result <- ca.jo(matrix, type = c("trace"), ecdet = c("trend"), K = 10)
summary(result)
##
## ######################
## # Johansen-Procedure #
## ######################
##
## Test type: trace statistic , with linear trend in cointegration
##
## Eigenvalues (lambda):
## [1] 7.596875e-01 3.980277e-01 1.530043e-01 -6.646899e-16
##
## Values of teststatistic and critical values of test:
##
## test 10pct 5pct 1pct
## r <= 2 | 8.14 10.49 12.25 16.26
## r <= 1 | 33.01 22.76 25.32 30.45
## r = 0 | 102.87 39.06 42.44 48.45
##
## Eigenvectors, normalised to first column:
## (These are the cointegration relations)
##
## pcr.l10 pce.l10 expec.l10 trend.l10
## pcr.l10 1.0000000000 1.0000000000 1.0000000000 1.0000000000
## pce.l10 0.0006666181 -0.0001019701 -0.0003043738 0.0005861086
## expec.l10 -0.1943670716 0.0604923166 0.0360075149 -0.3023095599
## trend.l10 -0.0663770999 0.0084886157 0.0314426533 -0.0753092021
##
## Weights W:
## (This is the loading matrix)
##
## pcr.l10 pce.l10 expec.l10 trend.l10
## pcr.d -4.483948e-01 0.5277479 -0.6287569 4.765308e-11
## pce.d -1.083850e+03 -626.3938313 401.3680084 2.160714e-07
## expec.d -7.951866e-02 -2.8223374 -3.1001021 -3.562967e-10
# with nulls on the LHS, we see evidence of exactly TWO cointegrating relationships.
This part of the story told by the data confirms the presence of cointegration, i.e., some combination of the series such that they are stationary in the longrun. So, the correct interpretation is just the regression involving the differenced series (since all are I(1)) and the lagged cointegrating vector(s). To acquire the cointegrating vectors in this section, simply take the residuals from the eigenvectors. They imply a long-run equilibria, whose \(\gamma\) parameter is the speed of adjustment back towards equilibrium (interpreted as a percent at the speed of unit indices.)
# we needa trend!
t <- ts(seq(1,59, 1), start = c("2020", "3"), frequency = 12)
cointegrating_relation <- pcr - (0.0006666181*(pce)) + (0.1943670716*(expec)) + (0.0663770999*(t))
# now we proceed with the full ECM.
#diff each variable
delta_pcr <- diff(pcr)
delta_pce <- diff(pce)
delta_expec <- diff(expec)
cointegrating_relation_1 <- ts((c(NA, cointegrating_relation[-length(cointegrating_relation)])), frequency = 12, start = c("2020", "3")) # manual time series lag without needing to learn zoo package syntax
# create ecm
ecm_lm <- dynlm(delta_pcr ~ delta_pce + delta_expec + cointegrating_relation_1)
stargazer(ecm_lm, type = "text")
##
## ====================================================
## Dependent variable:
## ---------------------------
## delta_pcr
## ----------------------------------------------------
## delta_pce 0.0001*
## (0.00003)
##
## delta_expec -0.001
## (0.021)
##
## cointegrating_relation_1 -0.111***
## (0.035)
##
## Constant -0.898***
## (0.280)
##
## ----------------------------------------------------
## Observations 58
## R2 0.164
## Adjusted R2 0.118
## Residual Std. Error 0.070 (df = 54)
## F Statistic 3.535** (df = 3; 54)
## ====================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
With the regression complete, standard time series diagnostics apply. If autocorrelation is present, use Newey-West errors. If not, still use them just because. Onwards:
# ljung-box on residuals
Box.test(ecm_lm$residuals, lag = 5, type = c("Ljung-Box")) #clean!
##
## Box-Ljung test
##
## data: ecm_lm$residuals
## X-squared = 4.9647, df = 5, p-value = 0.4202
#breusch-godfrey
library(lmtest)
bgtest(dynlm(delta_pcr ~ delta_pce + delta_expec + cointegrating_relation_1), order = 1)
##
## Breusch-Godfrey test for serial correlation of order up to 1
##
## data: dynlm(delta_pcr ~ delta_pce + delta_expec + cointegrating_relation_1)
## LM test = 2.6153, df = 1, p-value = 0.1058
# evidence of autocorrelation in one lag. HAC errors now justifiable
# standard heteroskedasticity tests apply also.
bptest(dynlm(delta_pcr ~ delta_pce + delta_expec + cointegrating_relation_1))
##
## studentized Breusch-Pagan test
##
## data: dynlm(delta_pcr ~ delta_pce + delta_expec + cointegrating_relation_1)
## BP = 5.6975, df = 3, p-value = 0.1273
gqtest(dynlm(delta_pcr ~ delta_pce + delta_expec + cointegrating_relation_1))
##
## Goldfeld-Quandt test
##
## data: dynlm(delta_pcr ~ delta_pce + delta_expec + cointegrating_relation_1)
## GQ = 0.71707, df1 = 25, df2 = 25, p-value = 0.7943
## alternative hypothesis: variance increases from segment 1 to 2
# applying HAC errors:
library(sandwich)
ecm_lm_nw <- NeweyWest(ecm_lm, lag = 1, prewhite = FALSE)
final_model <- coeftest(ecm_lm, vcov = ecm_lm_nw)
final_model
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.9826e-01 1.3658e-01 -6.5767 2.003e-08 ***
## delta_pce 5.7000e-05 1.7242e-05 3.3058 0.001687 **
## delta_expec -9.7672e-04 2.1782e-02 -0.0448 0.964400
## cointegrating_relation_1 -1.1081e-01 1.7133e-02 -6.4678 3.006e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# significances increased. congratulations