A time series is a sequence of data points captured in time order, most commonly taken at successive, equally spaced points in time. This post is the first in a series of blogs on time series methods and forecasting.
In this blog, we will discuss stationarity, random walks, deterministic drift and other vocabulary that forms the foundation of time series analysis:
A random or stochastic process is a collection of random variables ordered in time, denoted \(Y_t\). For example, the in-time of an employee is a stochastic process. How is in-time a stochastic process? Suppose the in-time on a particular day is 9:00 AM. In theory, the in-time could have been any value, depending on many factors such as traffic, workload, weather etc. The figure 9:00 AM is one particular realization out of many such possibilities. Therefore we can say that in-time is a stochastic process, whereas the actual values observed are a particular realization (sample) of that process.
A stochastic process is said to be stationary if the following conditions are met:
1. Mean is constant over time
2. Variance is constant over time
3. The covariance between values at two time periods depends only on the distance, gap or lag between the two periods and not on the actual time at which the covariance is computed
Such a process is also called a weakly stationary, covariance stationary, second-order stationary or wide-sense stationary process.
Written mathematically, the conditions are: \[ \text{Mean: } E(Y_t) = \mu \] \[ \text{Variance: } var(Y_t) = E(Y_t-\mu)^2 = \sigma^2 \] \[ \text{Covariance: } \gamma_k = E[(Y_t - \mu)(Y_{t+k} - \mu)] \]
A stochastic process is called purely random, or white noise, if it has zero mean, constant variance and is serially uncorrelated. An example of white noise is the error term in a linear regression, which has zero mean, constant standard deviation and no auto-correlation. A few simulated realizations of such a process are shown below:
| t | date | realization_1 | realization_2 | realization_25 | realization_50 | realization_100 |
|---|---|---|---|---|---|---|
| 1 | 2019-12-16 | 0.7356201 | 0.2374115 | 0.0360584 | 0.8730372 | 0.5718014 |
| 2 | 2019-12-17 | 0.1441992 | 0.5452946 | 0.6921414 | 0.7099068 | 0.1587868 |
| 3 | 2019-12-18 | 0.3230618 | 0.1497708 | 0.3391369 | 0.0973547 | 0.6085889 |
| 10 | 2019-12-25 | 0.1017506 | 0.4812825 | 0.7688191 | 0.1277465 | 0.1499435 |
| 15 | 2019-12-30 | 0.1308073 | 0.2781965 | 0.1058099 | 0.2748190 | 0.7266108 |
| 30 | 2020-01-14 | 0.2226795 | 0.6059500 | 0.2601266 | 0.6362089 | 0.4759561 |
Computing the mean, variance and covariance across the samples (realizations) at each point in time, we find that they stay roughly constant, as expected: for a stationary process, the mean, variance and covariance are constant.
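For illustration, here is a minimal Python sketch (assuming numpy and pandas; the seed, dates and column names such as realization_1 simply mirror the tables above and are not from the original post) that simulates many white noise realizations and checks that the mean, variance and lag-1 covariance computed across realizations stay roughly constant over time:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

n_days, n_realizations = 30, 100
dates = pd.date_range("2019-12-16", periods=n_days, freq="D")

# Each column is one realization of a purely random (white noise) process.
white_noise = pd.DataFrame(
    rng.normal(loc=0.0, scale=1.0, size=(n_days, n_realizations)),
    index=dates,
    columns=[f"realization_{i}" for i in range(1, n_realizations + 1)],
)

# Moments computed across realizations at each point in time.
summary = pd.DataFrame({
    "mean": white_noise.mean(axis=1),        # hovers around 0
    "variance": white_noise.var(axis=1),     # hovers around 1
    # Lag-1 covariance: the mean is 0, so E[Y_t * Y_(t+1)] estimates cov(Y_t, Y_(t+1)).
    "lag1_cov": (white_noise * white_noise.shift(-1)).mean(axis=1),  # hovers around 0
})
print(summary.head(10))
```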
If a time series is not stationary, it is called a non-stationary time series. In other words, a non-stationary time series has a time-varying mean, a time-varying variance, or both. The random walk and the random walk with drift are examples of non-stationary processes.
Suppose \(\epsilon_t\) is a white noise error term with mean 0 and variance \(\sigma^2\). Then the series \(Y_t\) is said to be a random walk if \[ Y_t = Y_{t−1} + \epsilon_t \] In the random walk model, the value of Y at time t is equal to its value at time (t − 1) plus a random shock.
For a random walk, \[ Y_1 = Y_0 + \epsilon_1 \] \[ Y_2 = Y_1 + \epsilon_2 = Y_0 + \epsilon_1 + \epsilon_2 \] \[ Y_3 = Y_2 + \epsilon_3 = Y_0 + \epsilon_1 + \epsilon_2 + \epsilon_3 \] and so on. In general we can write
\[ Y_t = Y_0 + \sum_{i=1}^{t} \epsilon_i \] so that \[ E(Y_t) = E\left(Y_0 + \sum_{i=1}^{t} \epsilon_i\right) = Y_0 \] \[ var(Y_t) = t\times \sigma^2 \]
Although the mean is constant over time, the variance is proportional to time and therefore grows without bound. A few simulated realizations of a random walk are shown below:
| t | date | realization_1 | realization_2 | realization_25 | realization_50 | realization_100 |
|---|---|---|---|---|---|---|
| 1 | 2019-12-16 | 4.000000 | 4.000000 | 4.000000 | 4.000000 | 4.000000 |
| 2 | 2019-12-17 | 3.882959 | 3.116363 | 4.015556 | 2.224053 | 3.472241 |
| 3 | 2019-12-18 | 3.586484 | 3.178970 | 5.510334 | 1.651408 | 2.836190 |
| 10 | 2019-12-25 | 3.423350 | 5.718359 | 5.279429 | 4.355010 | 4.016813 |
| 15 | 2019-12-30 | 4.152690 | 5.739801 | 7.333779 | 2.978225 | 2.694669 |
| 30 | 2020-01-14 | 2.958519 | 4.031016 | 9.864177 | 3.122688 | -4.095760 |
Computing the mean, variance and covariance across the samples (realizations) at each point in time, we see that the mean of Y stays at its initial, or starting, value, which is constant, but as t increases its variance increases indefinitely, thus violating a condition of stationarity.
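The claim that the variance grows like \(t\sigma^2\) while the mean stays at the starting value is easy to verify by simulation. Below is a minimal Python sketch (assuming numpy; the starting value of 4 and \(\sigma = 1\) are illustrative choices mirroring the table):

```python
import numpy as np

rng = np.random.default_rng(0)

n_steps, n_realizations, sigma, y0 = 30, 5000, 1.0, 4.0

# Random walk: Y_t = Y_{t-1} + eps_t, built as a cumulative sum of shocks.
shocks = rng.normal(0.0, sigma, size=(n_steps, n_realizations))
walks = y0 + np.cumsum(shocks, axis=0)

for t in (1, 5, 15, 30):
    values_at_t = walks[t - 1]  # all realizations at time t
    print(f"t={t:2d}  mean={values_at_t.mean():6.3f}  "
          f"variance={values_at_t.var():6.3f}  (t*sigma^2 = {t * sigma**2:.1f})")
```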
A random walk process is also called a unit root process; we will return to this below. A random walk with drift adds a constant term \(\delta\) to each step: \[ Y_t = \delta + Y_{t−1} + \epsilon_t \] so that \(E(Y_t) = Y_0 + t\delta\) while \(var(Y_t) = t\sigma^2\) as before. A few simulated realizations of a random walk with drift are shown below:
| t | date | realization_1 | realization_2 | realization_25 | realization_50 | realization_100 |
|---|---|---|---|---|---|---|
| 1 | 2019-12-16 | 4.000000 | 4.000000 | 4.000000 | 4.000000 | 4.000000 |
| 2 | 2019-12-17 | 4.916676 | 3.445681 | 3.732304 | 3.489562 | 4.687424 |
| 3 | 2019-12-18 | 4.406755 | 4.341785 | 5.879230 | 3.927224 | 5.858994 |
| 10 | 2019-12-25 | 5.271076 | 3.892017 | 5.619834 | 8.790161 | 9.149255 |
| 15 | 2019-12-30 | 5.264659 | 4.524672 | 6.942081 | 10.377750 | 12.176134 |
| 30 | 2020-01-14 | 11.929499 | 15.110744 | 17.040556 | 20.335734 | 20.527694 |
With drift, the mean, variance and covariance are all dependent on time.
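A similar simulation shows the effect of the drift term: the mean now grows as \(Y_0 + \delta t\) while the variance still grows as \(t\sigma^2\). Here is a minimal Python sketch (assuming numpy; \(\delta = 0.5\), \(Y_0 = 4\) and \(\sigma = 1\) are illustrative choices, not values from the original post):

```python
import numpy as np

rng = np.random.default_rng(1)

n_steps, n_realizations, sigma, y0, delta = 30, 5000, 1.0, 4.0, 0.5

# Random walk with drift: Y_t = delta + Y_{t-1} + eps_t,
# i.e. Y_t = Y_0 + delta*t + cumulative sum of shocks.
shocks = rng.normal(0.0, sigma, size=(n_steps, n_realizations))
walks = y0 + delta * np.arange(1, n_steps + 1)[:, None] + np.cumsum(shocks, axis=0)

for t in (1, 15, 30):
    print(f"t={t:2d}  mean={walks[t - 1].mean():6.2f}  (expected {y0 + delta * t:.1f})  "
          f"variance={walks[t - 1].var():5.2f}  (expected {t * sigma**2:.0f})")
```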
Unit root stochastic process is another name for a random walk process. A random walk can be written as \[ Y_t = \rho \times Y_{t−1} + \epsilon_t \] with \(\rho = 1\). If \(|\rho| < 1\), the process represents a Markov first-order autoregressive (AR(1)) model, which is stationary; only for \(\rho = 1\) do we get non-stationarity. For \(\rho = 0.5\), for example, the mean, variance and covariance settle to constant values over time, consistent with stationarity.
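To contrast the stationary (\(|\rho| < 1\)) and unit root (\(\rho = 1\)) cases, the following Python sketch (assuming numpy; parameter values are illustrative) iterates the AR(1) recursion and tracks the variance across realizations, which levels off for \(\rho = 0.5\) but keeps growing for \(\rho = 1\):

```python
import numpy as np

rng = np.random.default_rng(2)
n_steps, n_realizations = 200, 5000

for rho in (0.5, 1.0):
    y = np.zeros(n_realizations)
    variances = []
    for t in range(n_steps):
        # AR(1) recursion: Y_t = rho * Y_{t-1} + eps_t
        y = rho * y + rng.normal(0.0, 1.0, n_realizations)
        variances.append(y.var())
    # For |rho| < 1 the variance converges to sigma^2 / (1 - rho^2);
    # for rho = 1 (a random walk) it grows roughly like t.
    print(f"rho={rho}: variance at t=10 -> {variances[9]:.2f}, "
          f"t=100 -> {variances[99]:.2f}, t=200 -> {variances[199]:.2f}")
```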
A non-stationary series can also arise from a purely deterministic trend, for example \[ Y_t = \beta_1 + \beta_2 t + \epsilon_t \] where the mean grows linearly with time even though the shocks \(\epsilon_t\) are purely random. Such a series is called trend stationary, because removing the deterministic trend leaves a stationary series. A few simulated realizations of such a process are shown below:

| t | date | realization_1 | realization_2 | realization_25 | realization_50 | realization_100 |
|---|---|---|---|---|---|---|
| 1 | 2019-12-16 | 0.7844548 | 1.351543 | 1.959021 | -0.5513578 | 1.592412 |
| 2 | 2019-12-17 | 3.5492975 | 1.517891 | 1.491118 | 1.1883721 | 1.076729 |
| 3 | 2019-12-18 | 3.9577508 | 2.623527 | 4.115253 | 2.0636018 | 3.616692 |
| 10 | 2019-12-25 | 11.7488688 | 9.578051 | 10.242385 | 8.4459909 | 11.751336 |
| 15 | 2019-12-30 | 13.6550047 | 15.117834 | 15.668208 | 14.9922993 | 18.064545 |
| 30 | 2020-01-14 | 29.6632534 | 29.150716 | 30.654134 | 30.3425522 | 28.767587 |
A combination of deterministic and stochastic trend could also exist in a process.
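One common way to write such a process (a standard textbook formulation, not something derived above) is \[ Y_t = \beta_1 + \beta_2 t + \beta_3 Y_{t-1} + \epsilon_t \] where the \(\beta_2 t\) term contributes a deterministic trend and, when \(\beta_3 = 1\), the lagged term contributes a stochastic trend as well.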
A comparison of all the processes is shown below:
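Since the comparison figure is not reproduced here, the following Python sketch (assuming numpy and matplotlib; the parameter values are arbitrary illustrative choices) simulates one realization of each process discussed above and plots them together:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
t = np.arange(1, 101)
eps = rng.normal(0.0, 1.0, size=t.size)

# One realization of each process, sharing the same shocks for easy comparison.
series = {
    "white noise": eps,
    "random walk": 4 + np.cumsum(eps),
    "random walk with drift": 4 + 0.5 * t + np.cumsum(eps),
    "deterministic trend": 1 + 1.0 * t + eps,
}

fig, ax = plt.subplots(figsize=(9, 5))
for label, y in series.items():
    ax.plot(t, y, label=label)
ax.set_xlabel("time")
ax.set_ylabel("Y_t")
ax.legend()
plt.show()
```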