Given two series m_t and n_t defined by the following equations:

m_t = 1.5 + m_{t-1} + em_t
n_t = -2 + n_{t-1} + en_t

where em_t and en_t are both white noise.
Theoretically, both series take the form x_t = c + x_{t-1} + e_t, which is
a random walk with drift c: each value depends on its previous value
(m_{t-1} or n_{t-1}) plus a white-noise shock e_t. The autoregressive
coefficient on the lag is 1, which means the process accumulates shocks
over time, leading to nonstationarity.
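Iterating the recursion makes this explicit. Assuming the shocks e_t are
i.i.d. with variance σ², repeated substitution gives

x_t = x_0 + c·t + Σ_{i=1}^{t} e_i,

so the mean grows linearly through the drift term c·t and Var(x_t) = t·σ²
grows without bound; neither is constant in time, which is what the ADF
tests below are designed to detect.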
data <- read.csv("M3b_stationary/module3b_data_Simulated_Nonstationary.CSV")
# Ensure the white noise series em and en are available
em <- data$em
en <- data$en
n <- length(em)
# Initialize mt and nt
mt <- numeric(n)
nt <- numeric(n)
mt[1] <- 0
nt[1] <- 0
# Generate mt and nt using the recursive formulas
for (t in 2:n) {
mt[t] <- 1.5 + mt[t-1] + em[t]
nt[t] <- -2 + nt[t-1] + en[t]
}
# Add to data frame for plotting
data$mt <- mt
data$nt <- nt
# Plot mt and nt
par(mfrow = c(2, 1))  # stack the two plots so both series are visible
plot(mt, type = "l", col = "blue", main = "Series mt", ylab = "mt", xlab = "Time")
plot(nt, type = "l", col = "red", main = "Series nt", ylab = "nt", xlab = "Time")
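Before differencing, it is worth confirming that the levels themselves fail
a stationarity test. A minimal sketch using the same tseries package that
is loaded below (output omitted; with a unit root we should fail to reject
the null):

library(tseries)
# ADF on the levels: expect large p-values, i.e., no rejection of the unit root
adf.test(mt)
adf.test(nt)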
Check the first differences of mt and nt for stationarity.
library(tseries)
dmt <- diff(mt)
dnt <- diff(nt)
adf.test(dmt)
##
## Augmented Dickey-Fuller Test
##
## data: dmt
## Dickey-Fuller = -7.8931, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
adf.test(dnt)
##
## Augmented Dickey-Fuller Test
##
## data: dnt
## Dickey-Fuller = -9.018, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
A p-value of 0.01 from the Augmented Dickey-Fuller (ADF) test means we can
reject the null hypothesis of a unit root at the 1% significance level;
0.01 is also the smallest p-value adf.test will report, so the true
p-value may be even lower. Both first-differenced series, dmt and dnt, are
stationary.
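As a complementary check, one could also run the KPSS test from the same
package, which reverses the hypotheses (its null is stationarity). This is
a sketch with output omitted:

# KPSS null: level stationarity; a large p-value here would agree with the
# ADF conclusion that the differenced series are stationary
kpss.test(dmt)
kpss.test(dnt)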
plot(dmt, type = "l", main = "Differenced mt", col = "blue")
plot(dnt, type = "l", main = "Differenced nt", col = "red")
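The result is built into the simulation: subtracting consecutive values
cancels the unit root and leaves only drift plus noise. A quick sketch
verifying this on the generated series:

# mt[t] - mt[t-1] = 1.5 + em[t], so each differenced series is exactly its
# drift plus white noise (up to floating-point tolerance)
all.equal(dmt, 1.5 + em[2:n])
all.equal(dnt, -2 + en[2:n])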
# Create lagged variable
data$nt_lag <- dplyr::lag(data$nt, 1)
data$mt_lag <- dplyr::lag(data$mt, 1)
data_ols <- na.omit(data)
ols_ar1_mt <- lm(mt ~ mt_lag, data = data_ols)
summary(ols_ar1_mt)
##
## Call:
## lm(formula = mt ~ mt_lag, data = data_ols)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4897 -0.6957 -0.0056 0.6934 3.1326
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.5355834 0.0898756 17.09 <2e-16 ***
## mt_lag 0.9996247 0.0002176 4594.42 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9888 on 497 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 2.111e+07 on 1 and 497 DF, p-value: < 2.2e-16
# Estimate AR(1) model for n
ols_ar1_nt <- lm(nt ~ nt_lag, data = data_ols)
summary(ols_ar1_nt)
##
## Call:
## lm(formula = nt ~ nt_lag, data = data_ols)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.08574 -0.72978 -0.02884 0.71308 3.05588
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.0168557 0.0919398 -21.94 <2e-16 ***
## nt_lag 0.9999038 0.0001616 6187.81 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.019 on 497 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 3.829e+07 on 1 and 497 DF, p-value: < 2.2e-16
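Both lag coefficients are estimated at essentially 1 (0.9996 and 0.9999),
exactly what a unit-root process implies. The intervals below are a sketch
for illustration only: under a unit root the usual t-based OLS inference is
invalid for testing the coefficient against 1, which is why the
Dickey-Fuller test uses its own critical values.

# OLS confidence intervals for the AR(1) coefficients (illustrative only)
confint(ols_ar1_mt, "mt_lag", level = 0.95)
confint(ols_ar1_nt, "nt_lag", level = 0.95)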
reg_model <- lm(mt ~ nt, data = data)
summary(reg_model)
##
## Call:
## lm(formula = mt ~ nt, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.093 -4.544 -1.541 4.462 18.205
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6886154 0.5624114 6.559 1.36e-10 ***
## nt -0.7202340 0.0009865 -730.065 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.239 on 498 degrees of freedom
## Multiple R-squared: 0.9991, Adjusted R-squared: 0.9991
## F-statistic: 5.33e+05 on 1 and 498 DF, p-value: < 2.2e-16
Slope (β) = −0.720, a highly significant relationship. R-squared = 0.9991:
essentially all of the variation in mt is apparently explained by nt, an
exceptionally high value. The extremely high R² and the very small standard
error on the slope suggest mt and nt are nearly perfectly collinear. This
is a red flag for non-stationarity: both series could be trending or
integrated, I(1), in which case the regression could be spurious.
Consistent with this, the estimated slope (−0.72) is close to the ratio of
the two drifts, 1.5/(−2) = −0.75, because both series are dominated by
their deterministic trends.
residuals_model <- residuals(reg_model)
adf.test(residuals_model)
##
## Augmented Dickey-Fuller Test
##
## data: residuals_model
## Dickey-Fuller = -1.8667, Lag order = 7, p-value = 0.6347
## alternative hypothesis: stationary
The p-value (0.6347) is greater than 0.05, so we fail to reject the
unit-root null: the residuals are non-stationary, mt and nt are not
cointegrated, and the regression is likely spurious.
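The residual-based ADF check above is essentially the Engle-Granger
approach to testing for cointegration. The tseries package also provides
the Phillips-Ouliaris test as an alternative; a sketch (the first column of
the matrix is treated as the dependent variable):

# Phillips-Ouliaris cointegration test on the levels
po.test(cbind(mt, nt))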
acf(residuals_model, main = "ACF of Regression Residuals")
The autocorrelations decay slowly and remain significant over many lags:
all bars lie well outside the blue bands, which mark the 95% confidence
interval. This pattern is typical of a non-stationary process, likely an
integrated I(1) series.
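If the residuals really are integrated of order one, a single difference
should remove the persistence. A quick sketch to corroborate the I(1)
reading (output omitted):

# ACF of the differenced residuals: the slow decay should disappear
acf(diff(residuals_model), main = "ACF of Differenced Residuals")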
Box.test(residuals_model, lag = 10, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: residuals_model
## X-squared = 3783.3, df = 10, p-value < 2.2e-16
Null hypothesis (H₀): the residuals are white noise (no autocorrelation).
The p-value < 0.05, so we reject H₀: the residuals are strongly
autocorrelated. The strong association between mt and nt should therefore
be described as spurious, since the two series are by construction driven
by unrelated shocks (em_t and en_t), so there is no true relationship
between them.
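A standard remedy is to re-estimate the relationship in first differences,
where both series are stationary. Since the differenced series are, by
construction, unrelated noise around their drifts, the slope should come
out insignificant. A sketch:

# Regression in first differences: the spurious "relationship" should vanish
diff_model <- lm(dmt ~ dnt)
summary(diff_model)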