Identifying Non-Stationarity

Given two series mt and nt defined by the following equations:

m_t = 1.5 + m_{t-1} + em_t
n_t = -2 + n_{t-1} + en_t

where em_t and en_t are both white noise.

Theoretically, both series take the form x_t = c + x_{t-1} + e_t, which is a random walk with drift c: each value depends on its previous value (m_{t-1} and n_{t-1}) plus a white-noise shock e_t. The autoregressive coefficient is 1, which means the process accumulates shocks over time, leading to non-stationarity.
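By recursive substitution, x_t = x_0 + c·t + (e_1 + e_2 + … + e_t), so E[x_t] = x_0 + c·t and Var(x_t) = t·σ². Both the mean and the variance depend on t, which is precisely what makes the process non-stationary; the drift also explains why mt trends upward (c = 1.5) while nt trends downward (c = −2).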

data <- read.csv("M3b_stationary/module3b_data_Simulated_Nonstationary.CSV")

# Ensure the white noise series em and en are available
em <- data$em
en <- data$en
n <- length(em)

# Initialize mt and nt
mt <- numeric(n)
nt <- numeric(n)
mt[1] <- 0
nt[1] <- 0

# Generate mt and nt using the recursive formulas
for (t in 2:n) {
  mt[t] <- 1.5 + mt[t-1] + em[t]
  nt[t] <- -2 + nt[t-1] + en[t]
}

# Add to data frame for plotting
data$mt <- mt
data$nt <- nt

# Plot mt and nt
par(mfrow = c(1, 1))
plot(mt, type = "l", col = "blue", main = "Series mt", ylab = "mt", xlab = "Time")

plot(nt, type = "l", col = "red", main = "Series nt", ylab = "nt", xlab = "Time")
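Before differencing, it is worth confirming that the levels themselves are non-stationary. A minimal check with tseries::adf.test (output omitted; since each series is a random walk with drift, we expect large p-values, i.e. failure to reject the unit-root null):

library(tseries)

# ADF test on the levels of mt and nt; failing to reject the
# unit-root null confirms that both series are non-stationary
adf.test(mt)
adf.test(nt)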

Check the First Differences of mt and nt for Stationarity.

library(tseries)
dmt <- diff(mt)
dnt <- diff(nt)

adf.test(dmt)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  dmt
## Dickey-Fuller = -7.8931, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
adf.test(dnt)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  dnt
## Dickey-Fuller = -9.018, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary

A p-value of 0.01 from the Augmented Dickey-Fuller (ADF) test means we can reject the null hypothesis of a unit root at the 1% significance level (0.01 is also the smallest p-value that tseries::adf.test reports). Both first-differenced series, dmt and dnt, are therefore statistically stationary.
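As a complementary check (a sketch, not part of the original output), the KPSS test from the same tseries package reverses the hypotheses: its null is stationarity, so large p-values on the differenced series would agree with the ADF conclusion.

# KPSS test: the null hypothesis is (level) stationarity, so a
# large p-value here is consistent with the ADF result above
kpss.test(dmt)
kpss.test(dnt)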

plot(dmt, type = "l", main = "Differenced mt", col = "blue")

plot(dnt, type = "l", main = "Differenced nt", col = "red")


Estimate AR(1)

# Create lagged variables with dplyr::lag
data$nt_lag <- dplyr::lag(data$nt, 1)
data$mt_lag <- dplyr::lag(data$mt, 1)

data_ols <- na.omit(data)

ols_ar1_mt <- lm(mt ~ mt_lag, data = data_ols)
summary(ols_ar1_mt)
## 
## Call:
## lm(formula = mt ~ mt_lag, data = data_ols)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4897 -0.6957 -0.0056  0.6934  3.1326 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.5355834  0.0898756   17.09   <2e-16 ***
## mt_lag      0.9996247  0.0002176 4594.42   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9888 on 497 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 2.111e+07 on 1 and 497 DF,  p-value: < 2.2e-16
# Estimate AR(1) model for n
ols_ar1_nt <- lm(nt ~ nt_lag, data = data_ols)
summary(ols_ar1_nt)
## 
## Call:
## lm(formula = nt ~ nt_lag, data = data_ols)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3.08574 -0.72978 -0.02884  0.71308  3.05588 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.0168557  0.0919398  -21.94   <2e-16 ***
## nt_lag       0.9999038  0.0001616 6187.81   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.019 on 497 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 3.829e+07 on 1 and 497 DF,  p-value: < 2.2e-16
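Both estimated AR(1) slopes, 0.9996 for mt and 0.9999 for nt, are essentially 1, exactly what the random-walk construction implies. As a rough follow-up (an illustrative sketch beyond the original analysis), we can compute a t-type statistic for H₀: ρ = 1 from the OLS output; from the summaries above it comes out to roughly −1.72 for mt and −0.60 for nt, well short of the Dickey-Fuller 5% critical value of about −2.86, so a unit root cannot be rejected in levels. Note that under the unit-root null this statistic follows the Dickey-Fuller distribution, not Student's t, so treat it as descriptive.

# t-type statistic for H0: rho = 1, built from the OLS summaries;
# values this close to zero mean the slope is indistinguishable from 1
rho_m <- coef(summary(ols_ar1_mt))["mt_lag", ]
(rho_m["Estimate"] - 1) / rho_m["Std. Error"]

rho_n <- coef(summary(ols_ar1_nt))["nt_lag", ]
(rho_n["Estimate"] - 1) / rho_n["Std. Error"]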


Run a regression of mt on nt, together with a constant term.

reg_model <- lm(mt ~ nt, data = data)
summary(reg_model)
## 
## Call:
## lm(formula = mt ~ nt, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.093  -4.544  -1.541   4.462  18.205 
## 
## Coefficients:
##               Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)  3.6886154  0.5624114    6.559 1.36e-10 ***
## nt          -0.7202340  0.0009865 -730.065  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.239 on 498 degrees of freedom
## Multiple R-squared:  0.9991, Adjusted R-squared:  0.9991 
## F-statistic: 5.33e+05 on 1 and 498 DF,  p-value: < 2.2e-16


Slope (β) = −0.720 with a very small standard error, so the relationship is highly significant. R-squared = 0.9991, meaning that 99.91% of the variation in mt is apparently explained by nt, an exceptionally high figure. An extremely high R² together with a tiny standard error on the slope suggests that mt and nt move almost in lockstep. This is a warning sign of non-stationarity: both series could be trending or integrated (I(1)), in which case the regression could be spurious.

Check stationarity of the residuals.

residuals_model <- residuals(reg_model)
adf.test(residuals_model)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  residuals_model
## Dickey-Fuller = -1.8667, Lag order = 7, p-value = 0.6347
## alternative hypothesis: stationary


The p-value (0.6347) is greater than 0.05, so we fail to reject the unit-root null: the residuals are non-stationary, mt and nt are not cointegrated, and the regression is likely spurious.
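Strictly speaking, running adf.test on estimated residuals uses critical values tabulated for observed series; the Engle-Granger procedure calls for adjusted critical values. A dedicated cointegration test, such as the Phillips-Ouliaris test included in tseries, sidesteps this (a sketch; its null hypothesis is no cointegration, so a large p-value again points to a spurious regression):

# Phillips-Ouliaris cointegration test on mt and nt jointly;
# the null is "not cointegrated"
po.test(cbind(data$mt, data$nt))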

Plot ACF of Residuals

acf(residuals_model, main = "ACF of Regression Residuals")


The autocorrelations decay slowly and remain significant over many lags.
Every bar lies well outside the dashed blue confidence bands (which represent the 95% CI).
This pattern is typical of a non-stationary, integrated (I(1)) process.
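For contrast (an added check, not in the original output), the ACF of the differenced residuals should die out after the first few lags, as expected of a stationary series:

# Differencing removes the unit root, so these autocorrelations
# should fall inside the confidence bands after a few lags
acf(diff(residuals_model), main = "ACF of Differenced Residuals")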

Ljung-Box Q-Test

Box.test(residuals_model, lag = 10, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  residuals_model
## X-squared = 3783.3, df = 10, p-value < 2.2e-16


Null hypothesis (H₀): the residuals are white noise (no autocorrelation). Since the p-value < 0.05, we reject H₀: the residuals are strongly autocorrelated. The strong association between mt and nt should therefore be described as spurious, since the two series are unrelated by construction (their driving shocks em and en are independent white noise).
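As a final sanity check (a sketch beyond the original analysis), the valid way to relate two I(1) series that are not cointegrated is to regress their first differences. Because the underlying shocks em and en are independent, the slope should be statistically indistinguishable from zero:

# Regression in first differences: with independent shocks, no
# significant relationship should survive differencing
diff_model <- lm(dmt ~ dnt)
summary(diff_model)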