Stationarity in Time Series

Author

AS

1 1. What is Stationarity?

A stationary time series is one whose statistical properties do not change over time.

  1. Mean is constant
  2. Variance is constant
  3. Autocovariance depends only on lag, not on time

https://towardsdatascience.com/stationarity-in-time-series-a-comprehensive-guide-8beabe20d68/

Important: The Three Pillars

To have full Covariance Stationarity, your mental movie needs all three:

  1. Mean: The “Center Line” stays flat.

  2. Variance: The “Tube Width” stays constant.

  3. Covariance: The “Internal Rhythm” (how fast it wiggles) stays consistent.

Tip: Mean Stationarity

1.0.1 1. The “Invisible Flat Line”

Imagine drawing a perfectly straight, horizontal line across your graph at a specific height (e.g., \(y=5\)).

  • Stationary Mean means that this invisible line is the permanent “center of gravity” for your data.

  • No matter how far right you go on the X-axis (into the future), the data points will always tend to return to this line. They might jump above it or dive below it, but they are always “orbiting” this specific level.

1.0.2 2. The “Sliding Window” Test

Imagine you take a window (like a picture frame) and slide it along your X-axis.

  • If the Mean is Stationary: If you calculate the average of all the dots inside your frame, it should be roughly the same number whether the frame is at the beginning, the middle, or the end of the graph.

  • If the Mean is NOT Stationary: As you slide the frame to the right, the average keeps getting higher (an upward trend) or lower.
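
As a concrete illustration of this sliding-window test, here is a minimal sketch in base R (the AR coefficient, trend slope, and window width of 50 are arbitrary choices for illustration):

Code
set.seed(42)
n       <- 300
y_stat  <- arima.sim(model = list(ar = 0.6), n = n)   # stationary AR(1)
y_trend <- 0.05 * (1:n) + rnorm(n)                    # deterministic upward trend + noise

# average of the observations inside a sliding window of width 50
window_mean <- function(y, width = 50) {
  sapply(seq_len(length(y) - width + 1), function(i) mean(y[i:(i + width - 1)]))
}

par(mfrow = c(1, 2))
plot.ts(window_mean(y_stat),  main = "Stationary: rolling mean stays flat",  ylab = "window mean")
plot.ts(window_mean(y_trend), main = "Trending: rolling mean keeps rising",  ylab = "window mean")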

Tip: Variance Stationarity

1.0.3 1. The “Tube” or “Tunnel” Analogy

Imagine drawing two boundary lines parallel to your central Mean line—one above it and one below it.

  • Variance Stationarity means your data fits comfortably inside a straight “tube” or “tunnel” of constant width.

  • At \(t=1\), the dots scatter about 5 inches up and down.

  • At \(t=100\), the dots still scatter about 5 inches up and down.

  • The “cloud” of data points doesn’t get fatter or thinner as you move to the right.

1.0.4 2. The Counter-Example: “The Funnel” (Non-Stationary)

The easiest way to understand constant variance is to look at what happens when you don’t have it (Heteroscedasticity).

  • The Megaphone/Funnel: Imagine the data starts tight near the line on the left, but as you move to the right, the dots start swinging wildly further and further away. The “tunnel” becomes a “megaphone.”

  • The Burst: Imagine periods of calm (tight dots) followed by periods of chaos (dots everywhere), like a seismograph during an earthquake. This is non-stationary variance.

1.0.5 3. The “Volume Knob” Analogy

Think of the vertical spikes of your data (\(Y_t - \mu\)) as sound waves.

  • Stationary Variance: The volume knob is left alone. The noise level stays at a steady “background hum.”

  • Non-Stationary Variance: Someone keeps turning the volume knob up and down. Sometimes the signal is quiet; sometimes it is blasting loud.
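
The "tube" versus the "funnel" is easy to simulate in base R (a minimal sketch; the standard deviations below are arbitrary choices):

Code
set.seed(7)
n      <- 300
tube   <- rnorm(n, mean = 0, sd = 1)                              # constant variance: a tube of fixed width
funnel <- rnorm(n, mean = 0, sd = seq(0.2, 3, length.out = n))    # growing variance: the megaphone/funnel

par(mfrow = c(1, 2))
plot.ts(tube,   main = "Stationary variance: constant-width tube",  ylab = "y")
plot.ts(funnel, main = "Non-stationary variance: widening funnel",  ylab = "y")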

Tip: Covariance Stationarity

1.0.6 1. The “Sliding Template” Visualization

Imagine you take a transparent sheet of plastic and trace the “shape” of the data for a short period (say, from \(t=1\) to \(t=5\)).

  • Covariance Stationarity means that if you find that same shape later in the timeline (say, \(t=50\) to \(t=55\)), the relationship between the points inside that shape should be the same.

  • If a “high” point at \(t\) is usually followed by a “low” point at \(t+1\) (negative correlation), this “zigzag” pattern must hold true whether you are looking at 1990, 2000, or 2020.

  • Non-Stationary Covariance: In the 1990s, the data zigzags (high-low-high). In the 2000s, the data becomes “sticky” (high-high-high). The internal relationship changed.

1.0.7 2. The “Lag Plot” Test

Imagine plotting \(Y_t\) on the x-axis and \(Y_{t+1}\) (the next day’s value) on the y-axis.

  • This creates a scatter plot showing how today relates to tomorrow.

  • Stationarity: If you make this scatter plot for the first half of your data, and then make another one for the second half, the two blobs of dots should look identical (same slope, same shape).

  • Non-Stationary: The first blob is a tight upward line (strong correlation), but the second blob is a fuzzy circle (no correlation). This means the covariance structure broke down.
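
The lag-plot check can be sketched as follows (base R; a hypothetical series whose dependence structure is deliberately changed halfway through, so the two halves should look different):

Code
set.seed(1)
n    <- 400
half <- n / 2
y    <- c(arima.sim(model = list(ar = 0.8), n = half),   # strong lag-1 dependence in the first half
          rnorm(half))                                   # no lag-1 dependence in the second half

first  <- y[1:half]
second <- y[(half + 1):n]

par(mfrow = c(1, 2))
plot(head(first, -1),  tail(first, -1),  xlab = "Y_t", ylab = "Y_{t+1}",
     main = "First half: tight upward blob")
plot(head(second, -1), tail(second, -1), xlab = "Y_t", ylab = "Y_{t+1}",
     main = "Second half: fuzzy circle")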

1.0.8 3. The “Elastic Band” Analogy

Think of the connection between \(Y_t\) and \(Y_{t+1}\) as an elastic band.

  • Covariance Stationarity: The “stiffness” of that elastic band never changes. If you pull \(Y_t\) up, \(Y_{t+1}\) gets pulled with the same force, regardless of when it happens.

  • Non-Stationary: Sometimes the band is tight (strong memory), and sometimes it is loose or snapped (random walk).


2 2. Types of Stationarity

Strict Stationarity
- The joint distribution of any collection \((Y_{t_1}, Y_{t_2}, \dots, Y_{t_n})\) is unchanged when all of the time indices are shifted by the same amount. This is a very strong condition: not just the mean and variance, but all higher-order moments (skewness, kurtosis, etc.) are invariant over time.
- A strong requirement, rarely used directly in applications.

Weak (Covariance) Stationarity

Only the first two moments need to be time-invariant, which is much easier to satisfy:
- \(E[Y_t] = \mu\) (constant mean)
- \(Var(Y_t) = \sigma^2\) (constant variance)
- \(Cov(Y_t, Y_{t+k})\) depends only on \(k\) (lag)
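
As a concrete check, a stationary AR(1) process \(Y_t = \phi Y_{t-1} + \varepsilon_t\) with \(|\phi| < 1\) has mean 0, variance \(\sigma^2/(1-\phi^2)\), and autocovariance \(\gamma(k) = \phi^k \, \sigma^2/(1-\phi^2)\), which depends only on the lag \(k\). A minimal simulation in base R (\(\phi = 0.6\) and \(\sigma = 1\) are arbitrary choices):

Code
set.seed(2025)
phi <- 0.6
y   <- arima.sim(model = list(ar = phi), n = 5000)   # long sample so the estimates settle down

mean(y)   # ~ 0                         (constant mean)
var(y)    # ~ 1 / (1 - 0.6^2) = 1.5625  (constant variance)
acf(y, lag.max = 1, type = "covariance", plot = FALSE)$acf[2]   # ~ 0.6 * 1.5625 = 0.9375 (lag-1 autocovariance)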

2.0.1 Key Difference

  • Strict Stationarity → requires invariance of the entire probability distribution.

  • Covariance Stationarity → only requires invariance of the mean, variance, and autocovariance function.

2.0.2 In practice:

  • Econometricians and data scientists usually assume covariance stationarity, because it’s sufficient for most linear time series methods.

  • Strict stationarity is more theoretical and rarely testable with real data.


3 3. Why It Matters

  • Many econometric models (ARMA, ARIMA, VAR, GARCH) require stationarity.
  • Without it:
    • Risk of spurious regressions (high \(R^2\) but meaningless results).
    • Standard errors and tests become unreliable.
  • Stationarity ensures sample moments converge to population moments.
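
The spurious-regression risk listed above is easy to reproduce: regress one random walk on another random walk generated completely independently of it. A minimal sketch in base R (the sample size is arbitrary):

Code
set.seed(99)
n <- 200
x <- cumsum(rnorm(n))            # random walk 1
y <- cumsum(rnorm(n))            # random walk 2, independent of x

summary(lm(y ~ x))               # typically a high R-squared and a "significant" slope despite no true relationship
summary(lm(diff(y) ~ diff(x)))   # after differencing, the apparent relationship disappears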

4 4. Sources of Non-Stationarity

  1. Trends
    • Deterministic: linear or polynomial trend.

      Code
      # --- Packages ---------------------------------------------------------------
      suppressPackageStartupMessages({
        library(tseries)       # ADF, KPSS tests
        library(forecast)      # autoplot.ts, ggtsdisplay (optional)
      })
      
      set.seed(123)
      
      # --- Deterministic trend: linear & quadratic --------------------------------
      n          <- 300
      time       <- 1:n
      eps        <- rnorm(n, 0, 1)
      
      y_lin      <- 0.5 + 0.02*time + eps                      # linear trend
      y_poly     <- 0.5 + 0.02*time - 0.00005*time^2 + eps     # polynomial (quadratic)
      
      # --- Stochastic trend: random walk ------------------------------------------
      u          <- rnorm(n, 0, 1)
      y_rw       <- cumsum(u)                                   # random walk (unit root)
      
      # --- Quick plots -------------------------------------------------------------
      par(mfrow = c(1,3))
      plot.ts(y_lin,  main = "Deterministic: Linear Trend", ylab="y")
      plot.ts(y_poly, main = "Deterministic: Quadratic Trend", ylab="y")
      plot.ts(y_rw,   main = "Stochastic: Random Walk", ylab="y")

    • Stochastic: random walk.

  2. Changing Variance
    • E.g., volatility clustering in finance (ARCH/GARCH).

      Code
      # fpp3 attaches tsibble, tsibbledata (gafa_stock), feasts, fable, dplyr, ggplot2, etc.
      library(fpp3)

      # Amazon closing price clearly looks like a stochastic trend (unit-root-ish)
      amzn <-
        gafa_stock |>
        filter(Symbol == "AMZN") |>
        as_tsibble(index = Date)
      
      #autoplot(amzn, Close) +
      #  labs(title = "gafa_stock (AMZN): price with stochastic trend", y = "Close")
      
      # difference the log price to remove the stochastic trend
      amzn_ret <-
        amzn |>
        mutate(ret = difference(log(Close))) |>
        drop_na()
      
      #autoplot(amzn_ret, ret) +
      #  labs(title = "AMZN: differenced log price (≈ stationary)", y = "log return")
      
      # rolling volatility visualization
      amzn_ret |>
        mutate(roll_sd = slider::slide_dbl(ret, sd, .before = 20, .complete = TRUE)) |>
        ggplot(aes(Date)) +
        geom_line(aes(y = ret)) +
        geom_line(aes(y = roll_sd)) +
        labs(title = "AMZN: returns and rolling SD (≈ volatility)",
             y = "return / rolling sd", x = NULL)

      Code
      # squared returns often show autocorrelation when volatility clusters
      amzn_ret |>
        mutate(r2 = ret^2) |>
        ACF(r2) |>
        autoplot() +
        labs(title = "AMZN: ACF of squared returns → volatility persistence")

  3. Structural Breaks
    • Policy change, financial crisis, regime shifts.

      Code
      # aggregate half-hourly demand to daily; visualize possible shifts
      elec_daily <-
        vic_elec |>
        index_by(Date = as_date(Time)) |>
        summarise(Demand = sum(Demand))
      
      autoplot(elec_daily, Demand) +
        labs(title = "vic_elec: daily demand (visual check for breaks)", x = NULL, y = "MWh")

      Code
      # real GDP growth for united states; look for break around the GFC
      library(strucchange)
      
      us_growth <-
        global_economy |>
        filter(Country == "United States") |>
        mutate(g = difference(log(GDP))) |>
        drop_na()
      
      autoplot(us_growth, g) +
        labs(title = "United States: GDP growth (log diff of GDP)", y = "growth")

      Code
      # estimate possible break(s) in the mean of growth
      bp <- breakpoints(g ~ 1, data = as.data.frame(us_growth))
      bp
      
           Optimal 4-segment partition: 
      
      Call:
      breakpoints.formula(formula = g ~ 1, data = as.data.frame(us_growth))
      
      Breakpoints at observation number:
      11 21 47 
      
      Corresponding to breakdates:
      0.1964286 0.375 0.8392857 
      Code
      # map the 1-break solution's observation index back to its Year
      # (breakdates() here returns sample fractions, so it cannot be passed to year())
      break_year <- us_growth$Year[breakpoints(bp, breaks = 1)$breakpoints]
      autoplot(us_growth, g) +
        geom_vline(xintercept = break_year, linetype = 2, color = "red") +
        labs(title = "Break in mean growth (Bai–Perron, 1 break)")


5 5. How to Achieve Stationarity

  1. Detrending: Removes systematic trend component; useful when trend is deterministic

  2. Differencing: Removes stochastic trends; first difference often sufficient. \(\Delta Y_t = Y_t - Y_{t-1}\).

  3. Log Transformations: Applied before other transformations to stabilize variance

  4. Seasonal Differencing: Handle seasonal effects. Use lag = m where m is seasonal period (12 for monthly data)

Code
library(fpp3)

# Create a single dataset with trend, seasonality, and changing variance
set.seed(123)
dates <- yearmonth("2010 Jan") + 0:119
trend <- 1:120
seasonal <- 10 * sin(2 * pi * (1:120) / 12)
noise <- rnorm(120, 0, 2)
y <- exp(0.01 * trend) * (50 + trend + seasonal + noise)

ts_data <- tsibble(
  date = dates,
  value = y,
  index = date
)

# 1. DETRENDING (remove linear trend)
fit <- ts_data %>% model(TSLM(value ~ trend()))
detrended <- augment(fit) %>% select(date, .resid)

p1 <- autoplot(ts_data, value) + 
  labs(title = "Original Series", y = "Value")
p2 <- autoplot(detrended, .resid) + 
  labs(title = "Detrended", y = "Residuals")

# 2. DIFFERENCING
differenced <- ts_data %>% mutate(diff_value = difference(value))

p3 <- autoplot(ts_data, value) + 
  labs(title = "Original Series", y = "Value")
p4 <- autoplot(differenced, diff_value) + 
  labs(title = "First Difference", y = "Δ Value")

# 3. LOG TRANSFORMATION (stabilize variance)
log_transformed <- ts_data %>% mutate(log_value = log(value))

p5 <- autoplot(ts_data, value) + 
  labs(title = "Original Series", y = "Value")
p6 <- autoplot(log_transformed, log_value) + 
  labs(title = "Log Transformed", y = "log(Value)")

# 4. SEASONAL DIFFERENCING
seasonal_diff <- ts_data %>% mutate(seas_diff = difference(value, lag = 12))

p7 <- autoplot(ts_data, value) + 
  labs(title = "Original Series", y = "Value")
p8 <- autoplot(seasonal_diff, seas_diff) + 
  labs(title = "Seasonal Difference (lag=12)", y = "Change in y_{12} Value") # \Delta
  
# Display all plots
library(patchwork)
(p1 / p2) | (p3 / p4)

Code
(p5 / p6) | (p7 / p8)


6 6. Testing for Stationarity

  1. Visual Inspection: Time plot.

  2. Correlogram (ACF/PACF): Slowly decaying ACF \(\Rightarrow\) likely non-stationary.

  3. Formal Tests:

    • Augmented Dickey-Fuller (ADF)
    • Phillips-Perron (PP)
    • KPSS (tests stationarity as null)
Code
library(forecast) 
library(tseries)

set.seed(123) 
y1 <- arima.sim(model = list(ar = 0.6),
                n = 200) # stationary AR(1) 
y2 <- cumsum(rnorm(200)) # non-stationary random walk

#par(mfrow = c(2,1)) 
plot.ts(y1, main = "Stationary AR(1) Process") 

Code
plot.ts(y2, main = "Non-Stationary Random Walk")
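
Continuing with y1 and y2 from the chunk above, the correlogram check in point 2 can be made explicit: the ACF of the stationary AR(1) dies out quickly, while the ACF of the random walk decays very slowly (a minimal sketch using base R's acf()):

Code
par(mfrow = c(1, 2))
acf(y1, main = "ACF of AR(1): dies out quickly")
acf(y2, main = "ACF of random walk: decays very slowly")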

6.1 Three Formal Tests - Key Differences:

6.1.1 1. Augmented Dickey-Fuller (ADF)

  • Null hypothesis: the series has a unit root (non-stationary); rejecting the null → the series is stationary.
  • Specification: tests for a unit root, with optional trend/drift terms.
  • Usage: the most common test; good for trend-stationary processes.
  • Weakness: low power against near-unit-root processes.

6.1.2 2. Phillips-Perron (PP)

  • Null hypothesis: the series has a unit root (non-stationary); rejecting the null → the series is stationary.
  • Specification: non-parametric correction for serial correlation.
  • Difference from ADF: robust to heteroskedasticity and serial correlation without adding lagged difference terms.
  • Usage: better when the error structure is complex.

6.1.3 3. KPSS (Kwiatkowski-Phillips-Schmidt-Shin)

  • Null hypothesis: the series is stationary (the OPPOSITE of ADF/PP!); rejecting the null → the series is non-stationary.
  • Specification: tests stationarity around a deterministic level or trend.
  • Usage: confirmatory test; use it WITH ADF/PP for robustness.
  • Strategy: if ADF rejects non-stationarity AND KPSS does not reject stationarity → strong evidence for stationarity.
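
As a minimal illustration (not a full workflow), the three tests can be applied to the simulated series y1 (stationary AR(1)) and y2 (random walk) from the chunk above, using tseries with its default settings:

Code
library(tseries)

adf.test(y1)    # expect to reject the unit-root null  -> evidence of stationarity
adf.test(y2)    # expect to fail to reject             -> consistent with a unit root

pp.test(y1)     # Phillips-Perron: same null as ADF
pp.test(y2)

kpss.test(y1)   # null is stationarity: expect NOT to reject for y1
kpss.test(y2)   # expect to reject for y2 (non-stationary)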

Tip

Use KPSS and ADF together.

Agreement = clear conclusion.

Disagreement = borderline case requiring judgment.

7 7. White Noise vs Random Walk

7.0.1 A. White Noise Process

A white noise process is the simplest stationary process:

\[Y_t = \varepsilon_t\]

where \(\varepsilon_t \sim \text{iid}(0, \sigma^2)\) (independent and identically distributed).

Properties:

  • Mean: \(E(Y_t) = 0\)

  • Variance: \(\text{Var}(Y_t) = \sigma^2\) (constant)

  • Autocorrelation: \(\rho_k = 0\) for all \(k \neq 0\)

  • Stationary: Mean and variance don’t change over time

7.0.2 B. Random Walk Process

A random walk is a non-stationary process:

\[Y_t = Y_{t-1} + \varepsilon_t\]

where \(\varepsilon_t \sim \text{iid}(0, \sigma^2)\).

Alternatively, expanding recursively from initial value \(Y_0\):

\[Y_t = Y_0 + \sum_{i=1}^{t} \varepsilon_i\]

Properties:

  • Mean: \(E(Y_t) = Y_0\) (constant over time if the starting value \(Y_0\) is fixed)

  • Variance: \(\text{Var}(Y_t) = t\sigma^2\) (increases with time!)

  • Non-stationary: Variance is time-dependent

  • First difference is stationary: \(\Delta Y_t = Y_t - Y_{t-1} = \varepsilon_t\) (white noise)

Important: Key Insight
  • Random walk = cumulative sum of white noise.

  • Differencing a random walk recovers the underlying white noise process.
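
This relationship is easy to verify by simulation (a minimal sketch in base R):

Code
set.seed(123)
eps <- rnorm(300)      # white noise
rw  <- cumsum(eps)     # random walk = cumulative sum of the white noise

all.equal(diff(rw), eps[-1])   # TRUE: differencing the random walk recovers the underlying noise

par(mfrow = c(1, 2))
acf(rw,       main = "Random walk: ACF decays very slowly")
acf(diff(rw), main = "First difference: looks like white noise")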

8 Appendix

8.1 More Rigorous Introduction

The structural analysis of temporal data requires a rigorous mathematical foundation to ensure that the statistical properties inferred from historical observations remain valid for future predictions. Central to this foundation is the concept of stationarity, a condition where the statistical characteristics of a stochastic process are invariant to shifts in time.

In the broader context of time series analysis, stationarity serves as the primary gateway for applying the laws of large numbers and central limit theorems to dependent data, thereby facilitating robust estimation and forecasting. While the theoretical definitions of mean and covariance stationarity are mathematically precise, their practical application extends into the complex realms of financial engineering and life-cycle planning.

8.2 Theoretical Foundations of Stationary Processes

A stochastic process is formally defined as a family or collection of random variables \(\{X_t, t \in T\}\) indexed by a set \(T\), which typically represents time. These variables are defined on a common probability space \((\Omega, \mathcal{F}, P)\). For a fixed time \(t\), \(X_t(\omega)\) is a random variable over outcomes \(\omega \in \Omega\); for a fixed outcome \(\omega\), the map \(t \mapsto X_t(\omega)\) is a realization (sample path) of the process. Stationarity, at its core, is the property that the statistical “rules” governing the generation of these random variables do not change as the index \(t\) progresses.

8.3 Mechanics of Autocovariance and Correlation

The autocovariance function \(\gamma(h)\) is the primary tool for measuring linear dependence within a stationary process. Under the assumption of covariance stationarity, \(\gamma(h)\) exhibits several essential mathematical properties:

  • Symmetry: The autocovariance is an even function, meaning \(\gamma(h) = \gamma(-h)\). This reflects the fact that the correlation between two points separated by \(h\) units of time is the same regardless of the direction of the shift.

  • Maximum Value at Zero Lag: The absolute value of the autocovariance at any lag \(h\) cannot exceed the variance of the process: \(|\gamma(h)| \le \gamma(0)\). This is derived from the Cauchy-Schwarz inequality.

  • Normalization: To facilitate comparisons across different processes, the autocovariance is often normalized to produce the autocorrelation function (ACF), denoted \(\rho(h) = \frac{\gamma(h)}{\gamma(0)}\).

The values of \(\rho(h)\) are bounded within the range \([-1, 1]\), providing a scale-free measure of temporal dependence.
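
These properties can be checked numerically on a simulated stationary series (a minimal sketch in base R; acf() with type = "covariance" returns the sample autocovariances):

Code
set.seed(11)
y     <- arima.sim(model = list(ar = 0.5), n = 2000)

gamma <- acf(y, lag.max = 10, type = "covariance",  plot = FALSE)$acf[, 1, 1]   # gamma(0), gamma(1), ...
rho   <- acf(y, lag.max = 10, type = "correlation", plot = FALSE)$acf[, 1, 1]   # rho(0), rho(1), ...

all(abs(gamma[-1]) <= gamma[1])   # TRUE: |gamma(h)| <= gamma(0)
max(abs(rho - gamma / gamma[1]))  # ~ 0:  rho(h) = gamma(h) / gamma(0)

# symmetry: the sample cross-covariance of y with itself is an even function of the lag
cc <- ccf(y, y, lag.max = 5, type = "covariance", plot = FALSE)$acf[, 1, 1]
all.equal(cc, rev(cc))            # TRUE: gamma(h) = gamma(-h)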

Important: Moments

In statistics, moments are used to describe the shape and characteristics of a probability distribution. They are categorized into raw moments (about the origin), central moments (about the mean), and standardized moments.

The formulas for the first four primary moments are:

  • Mean (1st Raw Moment): Represents the central tendency or balancing point of the distribution.

    \(\mu = \mathbb{E}[X]\).

  • Variance (2nd Central Moment): Measures the spread or variability of the data around the mean.

    \(\sigma^2 = \mathbb{E}[(X - \mu)^2]\).

  • Skewness (3rd Standardized Moment): Quantifies the asymmetry of the distribution. A positive value indicates a longer right tail, while a negative value indicates a longer left tail.

    \(\gamma_1 = \mathbb{E}[(\frac{X - \mu}{\sigma})^3]\).

  • Kurtosis (4th Standardized Moment): Measures the “tailedness” of the distribution, describing the thickness of the tails relative to a normal distribution.

    \(\beta_2 = \mathbb{E}[(\frac{X - \mu}{\sigma})^4]\).

Moments that follow kurtosis, such as the 5th and 6th orders, are collectively referred to as higher moments or higher-order moments. While these moments can be used to estimate further shape parameters, they have very little practical use in standard probability and statistics compared to the first four.
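
A minimal sketch of computing these four moments from a sample (base R only; these are simple plug-in estimates without the small-sample corrections that some packages apply):

Code
set.seed(3)
x <- rexp(10000, rate = 1)             # a right-skewed example distribution, Exp(1)

m    <- mean(x)                        # 1st raw moment: mean (theory: 1)
v    <- mean((x - m)^2)                # 2nd central moment: variance (theory: 1)
skew <- mean(((x - m) / sqrt(v))^3)    # 3rd standardized moment: skewness (theory: 2)
kurt <- mean(((x - m) / sqrt(v))^4)    # 4th standardized moment: kurtosis (theory: 9)

c(mean = m, variance = v, skewness = skew, kurtosis = kurt)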

Tip: Moments Math

There is a unifying mathematical framework that defines all moments as the expected value of a random variable raised to a power.

The general formula for the \(k^{th}\) moment of a random variable \(X\) about a constant value \(c\) is:

\[\mu_k(c) = \mathbb{E}[(X - c)^k]\]

For a continuous random variable with a probability density function \(f(x)\), this is expressed as:

\[\mathbb{E}[(X - c)^k] = \int_{-\infty}^{\infty} (x - c)^k f(x) dx\]

Depending on the value chosen for \(c\), this formula generates the three main types of moments:

  1. Raw Moments (\(c = 0\)): The \(k^{th}\) raw moment is \(\mathbb{E}[X^k]\). The first raw moment (\(k=1\)) is the mean.

  2. Central Moments (\(c = \mu\)): The \(k^{th}\) central moment is \(\mathbb{E}[(X - \mu)^k]\), where \(\mu\) is the mean. The second central moment (\(k=2\)) is the variance.

  3. Standardized Moments: These are central moments normalized by the standard deviation (\(\sigma\)) to make them scale-invariant: \(\mathbb{E}[(\frac{X - \mu}{\sigma})^k]\). The third and fourth standardized moments are skewness and kurtosis, respectively.

The Unifying Function: Moment Generating Function (MGF)

The Moment Generating Function \(M_X(t)\) is considered a “unifying” tool because it encodes all raw moments into a single power series. It is defined as:

\[M_X(t) = \mathbb{E}[e^{tX}] = \sum_{k=0}^{\infty} \mathbb{E}[X^k] \frac{t^k}{k!}\]

You can derive any \(k^{th}\) raw moment by taking the \(k^{th}\) derivative of the MGF with respect to \(t\) and evaluating it at \(t = 0\):

\[\mathbb{E}[X^k] = \frac{d^k}{dt^k} M_X(t) \Big|_{t=0}\]
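
As a quick illustration, for a standard normal random variable the MGF is known in closed form, and differentiating it reproduces the familiar moments:

\[M_X(t) = e^{t^2/2}, \qquad M_X'(t) = t\,e^{t^2/2}, \qquad M_X''(t) = (1 + t^2)\,e^{t^2/2}\]

Evaluating at \(t = 0\) gives \(\mathbb{E}[X] = M_X'(0) = 0\) and \(\mathbb{E}[X^2] = M_X''(0) = 1\), so the variance is \(\mathbb{E}[X^2] - (\mathbb{E}[X])^2 = 1\), as expected for \(N(0, 1)\).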