T K Chakrabarty
2026-04-03
We have seen a number of examples of time series in our last two chapters. We can now say that a time series is a collection of observations made sequentially in time.
Our interest will not be in such series that are deterministic but rather in those whose values behave according to the laws of probability.
As such, each observation \(x_t\) of a time series at time \(t\) is a realization of a random variable \(X_t\). In this chapter, we will discuss the fundamentals involved in the statistical analysis of time series. To begin, we must be more careful in our definition of a time series: a time series is a special type of stochastic process.
A time series is a stochastic process \(\{X_t|t\in T\}\), that is, a collection of random variables \(X_t\) ordered sequentially over a time index set \(T\). If \(T\) is the set \(T=\{0, 1, 2,...\}\) or \(T=\{0,\pm 1,\pm 2,...\}\), we refer to it as a discrete parameter time series. If \(T=(-\infty ,\infty )\) or \(T=(0, \infty)\), the series is a continuous parameter process.
A time series model for the observed data \(\{x_t\}\) is a specification of the joint distributions (or possibly only the means and covariances) of a sequence of random variables \(\{X_t\}\) of which \(\{x_t\}\) is postulated to be a realization.
A complete probabilistic time series model for the sequence of random variables \(\{X_1,X_2, . . .\}\) would specify all of the joint distributions of the sequence; such a full specification is usually avoided.
Instead we specify only the first- and second-order moments of the joint distributions, i.e., the expected values \(E(X_t)\) and the expected products \(E(X_{t+h}X_t ), t = 1, 2, . . ., h = 0, 1, 2, . . .\), focusing on properties of the sequence \(\{X_t\}\) that depend only on these. Such properties of \(\{X_t\}\) are referred to as second-order properties.
We now discuss various measures that describe the general behavior of a time series process as it evolves over time. While defining these measures, we shall be restricting our attention only to the second-order properties as stated before.
Let \(\{X_t\}\) be a time series with \(E(X_t^2)<\infty\).
The mean function of \(\{X_t\}\) is \(\mu_X(t)= E(X_t)\).
The covariance function of \(\{X_t\}\) is
\(\gamma_X(r, s) = Cov(X_r,X_s)\)
\(= E[(X_r-\mu_X(r))(X_s-\mu_X(s))]\)
for all integers r and s.
A time series \(\{X_t\}\) with finite variance is said to be (weakly) stationary if
(i) the mean function, \(\mu_X(t)\), is constant, i.e., independent of t, and
(ii) the covariance function, \(\gamma_X(r,s)\), depends on the times r and s only through their time difference or lag \(|s-r|=h\).
In view of the condition (ii), whenever we use the term covariance function with reference to a stationary time series \(\{X_t\}\), we shall mean the function \(\gamma_X\) of one variable, defined by
\(\gamma_X(h) = \gamma_X(t+h,t)\).
The function \(\gamma_X(.)\) will be referred to as the autocovariance function and \(\gamma_X(h)\) as its value at lag h.
Let \(\{X_t\}\) be a stationary time series. The autocovariance function (ACVF) of \(\{X_t\}\) at lag h is defined as \[\begin{equation} \gamma_X(h) =\gamma_X(t + h, t)= Cov(X_{t+h},X_t). \end{equation}\] The autocorrelation function (ACF) of \(\{X_t\}\) at lag h is \[\begin{equation} \rho_X(h) = \frac{\gamma_X(h)}{\gamma_X(0)}= Cor(X_{t+h},X_t). \end{equation}\]
Correlation measures the linear association between a pair of variables and is obtained by standardising the covariance, i.e., dividing it by the product of the standard deviations of the two variables.
A value of \(+1\) or \(-1\) indicates an exact linear association with the pairs falling on a straight line of positive or negative slope respectively.
In time series, observations tend to be serially correlated; the corresponding measure of linear dependence is called autocorrelation.
The adjective “auto”, which means self, is used to refer to the relation between the same variable at different time points.
Because it is a correlation, we have \(-1\le \rho(h)\le 1\) for all \(h\), enabling one to assess the relative importance of a given autocorrelation value by comparing with the extreme values \(-1\) and \(1\).
In this section, we will learn the properties of the autocovariance and autocorrelation functions for stationary time series. If a time series is weakly stationary, then the autocovariance function depends only on the lag h. Thus, for stationary processes, we denote this autocovariance function by \(\gamma(h)\). Similarly, the autocorrelation function for a stationary process is given by \(\rho(h)=\frac{\gamma(h)}{\gamma(0)}\). The autocovariance function of a stationary time series satisfies the following properties:
Theorem
\(\gamma(0)\ge 0\).
\(|\gamma(h)|\le\gamma(0)\) for all \(h\).
\(\gamma(.)\) is even, i.e., \(\gamma(h)=\gamma(-h)\) for all h.
The function \(\gamma(.)\) is positive semidefinite. That is, for any set of time points \(t_1, t_2,...,t_k \in T\) and all real \(a_1, a_2,...,a_k\), we have \[\begin{equation} \sum_{i=1}^{k}\sum_{j=1}^{k}a_i\gamma(t_i-t_j)a_j\ge0. \end{equation}\]
Proof. The first property is simply the statement that \(\gamma(0)=Var(X_t)\ge0\), the second is an immediate consequence of the fact that correlations are less than or equal to \(1\) in absolute value (or the Cauchy–Schwarz inequality), and the third is established by observing that \[\begin{equation*} \gamma(h)=Cov(X_{t+h},X_t)=Cov(X_t,X_{t+h})=\gamma(-h). \end{equation*}\] To prove (4), let \(W=\sum_{i=1}^{k}a_iX_{t_i}\), write \(a=(a_1,\ldots,a_k)\), and let \(D(X)\) denote the covariance (dispersion) matrix of \((X_{t_1},\ldots,X_{t_k})\), whose \((i,j)\)th entry is \(\gamma(t_i-t_j)\). Then \(Var(W)=aD(X)a^T\), so \[\begin{align*} Var(W)&\ge 0\\ aD(X)a^T&\ge 0\\ \sum_{i=1}^{k}\sum_{j=1}^{k}a_i\gamma(t_i-t_j)a_j&\ge 0 \end{align*}\] and the result follows.
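The step from \(Var(W)\ge 0\) to the double sum can be written out explicitly using the bilinearity of covariance. The following is a sketch of the standard argument, using only the definitions above: \[\begin{align*} Var(W) &= Cov\left(\sum_{i=1}^{k}a_iX_{t_i},\ \sum_{j=1}^{k}a_jX_{t_j}\right)\\ &= \sum_{i=1}^{k}\sum_{j=1}^{k}a_i\,Cov(X_{t_i},X_{t_j})\,a_j\\ &= \sum_{i=1}^{k}\sum_{j=1}^{k}a_i\gamma(t_i-t_j)a_j, \end{align*}\] where the last equality uses stationarity, i.e., \(Cov(X_{t_i},X_{t_j})=\gamma(t_i-t_j)\).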
The inequality (3.3) is equivalent to the statement that the autocovariance matrix \[\begin{align} \Gamma_k & = \begin{pmatrix} \gamma_0 & \gamma_1 & \cdots & \gamma_k \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{k-1} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_k & \gamma_{k-1} & \cdots & \gamma_0 \\ \end{pmatrix}, \end{align}\] where \(\gamma_h=\gamma(h)\), is positive semidefinite for each k.
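As a quick check, consider the smallest case \(k=1\), writing \(\gamma_h\) for \(\gamma(h)\) (a worked example added for illustration). A symmetric \(2\times 2\) matrix is positive semidefinite exactly when its diagonal entries and its determinant are non-negative, so \[\begin{equation*} \begin{pmatrix} \gamma_0 & \gamma_1 \\ \gamma_1 & \gamma_0 \end{pmatrix} \text{ is positive semidefinite} \iff \gamma_0\ge 0 \ \text{ and } \ \gamma_0^2-\gamma_1^2\ge 0, \end{equation*}\] which recovers the properties \(\gamma(0)\ge 0\) and \(|\gamma(1)|\le\gamma(0)\) of the theorem.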
The autocorrelation function satisfies the following analogous properties: Theorem
\(\rho(0) = 1\).
\(|\rho(h)| \le 1\) for all \(h\).
\(\rho(.)\) is even, i.e., \(\rho(h)=\rho(-h)\) for all h.
The function \(\rho(h)\) is positive semidefinite, and the matrix \[\begin{align} \mathrm{P}_k & = \begin{pmatrix} 1 & \rho_1 & \cdots & \rho_k \\ \rho_1 & 1 & \cdots & \rho_{k-1} \\ \vdots & \vdots & \vdots & \vdots \\ \rho_k & \rho_{k-1} & \cdots & 1 \\ \end{pmatrix}, \end{align}\] is positive semidefinite for each k.
For data analysis, only the sample values, \(x_1, x_2, . . . , x_n,\) are available for estimating the mean, autocovariance, and autocorrelation functions. In this case, the assumption of stationarity becomes critical and allows the use of averaging to estimate the population mean and covariance functions. Accordingly, if a time series is stationary, the mean function \(\mu_t = \mu\) is constant so we can estimate it by the sample mean, \[\begin{equation} \overline{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \end{equation}\] The sample autocovariance function is defined as
\[\begin{equation} \hat{\gamma}(h) = \frac{1}{n} \sum_{t=1}^{n-h} (x_{t+h} - \bar{x})(x_t - \bar{x}) \end{equation}\] for \(h=0,1,2,...,n-1\). In particular, at lag \(h=0\) this gives the sample variance, \[\begin{equation} \hat{\gamma}(0) = \frac{1}{n} \sum_{t=1}^{n} (x_{t} - \bar{x})^2 \end{equation}\]
The sample autocorrelation function is defined as \[\begin{equation} \hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)} = \frac{\sum_{t=1}^{n-h} (x_{t+h} - \bar{x})(x_t - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2} \end{equation}\] for \(h=0,1,...n-1\). The sum in the numerator above runs over a restricted range because \(x_{t+h}\) is not available for \(t + h > n\). Note that we are in fact estimating the autocovariance function by \(\hat{\gamma}(h)\), with \(\hat{\gamma}(-h)=\hat{\gamma}(h)\) for \(h=0,1,...,n-1\). That is, we divide by \(n\) even though there are only \(n-h\) pairs of observations at lag h, \(\{(x_{t+h},x_t);t=1,...,n-h\}\).
Dividing by \(n\) ensures that the sample autocovariance function is itself positive semidefinite, so that it behaves like a true autocovariance function; for example, it will not give a negative value when estimating \(Var(\bar{x})\) by replacing \(\gamma_x(h)\) with \(\hat{\gamma}_x(h)\).
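The effect of the divisor can be seen on a very small example. The following is a minimal sketch in plain Python, not a reference implementation: the function names, the `divide_by_n` flag, the toy data `x` and the weight vector `a` are all hypothetical and chosen only to keep the arithmetic easy to follow. It computes the sample autocovariances with divisor \(n\) and with divisor \(n-h\), and evaluates the quadratic form \(\sum_i\sum_j a_i\hat{\gamma}(|i-j|)a_j\) for both.

```python
def sample_acvf(x, h, divide_by_n=True):
    """Sample autocovariance at lag h >= 0.

    divide_by_n=True  -> divisor n (the definition used in the text)
    divide_by_n=False -> divisor n - h (shown only for comparison)
    """
    n = len(x)
    xbar = sum(x) / n
    s = sum((x[t + h] - xbar) * (x[t] - xbar) for t in range(n - h))
    return s / (n if divide_by_n else n - h)


def sample_acf(x, h):
    """Sample autocorrelation at lag h: gamma_hat(h) / gamma_hat(0)."""
    return sample_acvf(x, h) / sample_acvf(x, 0)


def quadratic_form(gammas, a):
    """Evaluate sum_i sum_j a_i * gamma(|i - j|) * a_j."""
    k = len(a)
    return sum(a[i] * gammas[abs(i - j)] * a[j]
               for i in range(k) for j in range(k))


# Hypothetical toy data and weights (assumptions, not from the text).
x = [1.0, 2.0, 3.0]
a = [1.0, 0.0, 1.0]

gam_n = [sample_acvf(x, h, divide_by_n=True) for h in range(3)]
gam_nh = [sample_acvf(x, h, divide_by_n=False) for h in range(3)]

q_n = quadratic_form(gam_n, a)    # divisor n: non-negative
q_nh = quadratic_form(gam_nh, a)  # divisor n - h: negative for this data

print(q_n, q_nh)
```

For this data the deviations from the mean are \(-1, 0, 1\), so \(\hat{\gamma}(0)=2/3\), \(\hat{\gamma}(1)=0\) and \(\hat{\gamma}(2)=-1/3\) with divisor \(n\), giving a quadratic form of \(2/3\). With divisor \(n-h\) the lag-2 value becomes \(-1\) and the quadratic form is \(-2/3\), so the resulting matrix is not positive semidefinite.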
If \(x_t\) is white noise, then for large \(n\) and under mild conditions, the sample ACF, \(\hat{\rho}_x(h)\), for \(h = 1, 2, . . . , H\), where \(H\) is fixed but arbitrary, is approximately normal with zero mean and standard deviation given by \(\frac{1}{\sqrt{n}}\).
Based on this property, we obtain a rough method for assessing whether a series is white noise by determining how many values of \(\hat{\rho}(h)\) are outside the interval \(\pm 1.96\frac{1}{\sqrt{n}}\) (roughly two standard errors); for white noise, approximately 95% of the sample autocorrelations should be within these limits.
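This rough check can be sketched in a few lines of plain Python. The sample size `n`, maximum lag `H`, random seed and the helper name `sample_acf` are assumptions made for illustration; the simulated series is Gaussian white noise, which is only one special case of white noise.

```python
import math
import random


def sample_acf(x, max_lag):
    """Sample ACF at lags 0..max_lag, using divisor n as in the text."""
    n = len(x)
    xbar = sum(x) / n
    dev = [v - xbar for v in x]
    gamma0 = sum(d * d for d in dev) / n
    return [sum(dev[t + h] * dev[t] for t in range(n - h)) / n / gamma0
            for h in range(max_lag + 1)]


random.seed(0)      # arbitrary seed, only for reproducibility
n, H = 500, 40      # hypothetical sample size and maximum lag
x = [random.gauss(0.0, 1.0) for _ in range(n)]

rho_hat = sample_acf(x, H)
bound = 1.96 / math.sqrt(n)

# Lags 1..H whose sample autocorrelation falls outside the band.
outside = [h for h in range(1, H + 1) if abs(rho_hat[h]) > bound]
frac_inside = 1 - len(outside) / H
print(f"bound = {bound:.4f}, fraction inside = {frac_inside:.2f}")
```

For a white noise series one would expect the printed fraction to be close to 0.95, although it will vary from one simulated series to another.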
A correlogram is a graphical tool used in time series analysis to study the dependence structure across time.
Core Idea: In a time series, observations are often correlated with their past values. A correlogram displays how this correlation changes as the time lag increases.
What is plotted? A plot of \(\rho (k)\) against \(k\) is called the correlogram.
It is important to understand the extent to which a plot of the autocorrelations for a given model describes the behavior in time series realizations from that model. We shall now consider a few basic stationary models to illustrate the behaviour of their correlation function.
The horizontal axis shows the lag \(h\).
The vertical axis shows autocorrelation values (between \(-1\) and \(+1\)).
Bars (or spikes) represent correlation at each lag
Confidence bands help identify statistically significant correlations
Slow decay of autocorrelation → indicates trend / non-stationarity
Sharp cutoff after a few lags → suggests a short-memory process
Oscillating pattern → indicates seasonality or cyclic behavior
No significant spikes → resembles white noise
Helps detect serial dependence
Guides identification of models like AR, MA, or ARMA
Assists in checking stationarity and model adequacy
In short, a correlogram provides a compact visual summary of how the past influences the present in a time series.
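To make the elements of a correlogram concrete, the following minimal sketch prints a text version of one: each line shows the lag, the autocorrelation value, a flag `*` when the value lies outside the approximate band \(\pm 1.96/\sqrt{n}\), and a bar whose length is proportional to the magnitude. The function name `text_correlogram`, the bar width, and the geometrically decaying input values are all hypothetical and used only for illustration.

```python
import math


def text_correlogram(rho, n, width=20):
    """Return one text line per lag: lag, value, flag and bar.

    A '*' marks lags h > 0 where |rho(h)| exceeds 1.96 / sqrt(n).
    """
    bound = 1.96 / math.sqrt(n)
    lines = []
    for h, r in enumerate(rho):
        bar = "#" * round(abs(r) * width)
        sign = "-" if r < 0 else "+"
        flag = "*" if h > 0 and abs(r) > bound else " "
        lines.append(f"{h:>3} {r:+.2f} {flag} {sign}{bar}")
    return lines


# Hypothetical autocorrelations decaying geometrically (illustration only).
rho = [0.8 ** h for h in range(11)]
for line in text_correlogram(rho, n=100):
    print(line)
```

With \(n=100\) the band is \(\pm 0.196\), so in this example lags 1 to 7 are flagged and lags 8 to 10 fall inside the band.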
In practice, correlogram analysis relies on sample (empirical) counterparts of theoretical quantities, because the true distribution of the time series is unknown.
Why are sample quantities used? Population quantities are unknown.
The theoretical ACF depends on:
These are unobservable, so we estimate them using sample moments.
Consistency and large-sample justification
Approximate sampling distribution
Let \(\{X_t\}\) be a sequence of uncorrelated random variables, each with zero mean and variance \(\sigma^2 < \infty\). Such a sequence is referred to as white noise (with mean 0 and variance \(\sigma^2\)). This is indicated by the notation \(X_t \sim WN(0,\sigma^2)\). Since the random variables are uncorrelated, the autocovariance function of the process can be written as \[ \gamma_X(t+h, t) = \begin{cases} \sigma^2, & \text{if } h=0, \\ 0, & \text{if } h\neq 0, \end{cases} \] which does not depend on \(t\). Hence white noise with finite second moment is stationary.
The following figure shows 200 simulated values of independent and identically distributed (iid) standard normal noise, denoted by \(iid\ N(0, 1)\). It also shows the corresponding sample autocorrelation function and partial autocorrelation function at lags \(0, 1, . . . , 40\). Since \(\rho(h)=0\) for \(h > 0\), one would also expect the corresponding sample autocorrelations to be near 0.
200 simulated values of iid N(0,\(1\)) noise
The time plot conveys an apparent stationarity in the process around the mean line passing through 0, with a more or less stable constant variance.
There is no evidence of any significant sample autocorrelation.
Similarly, there is no evidence of any significant partial autocorrelation.
Hence one can reasonably conclude that the data are consistent with an iid process.
Our second example consists of a realization from a stationary AR(1) process. The series is plotted in the upper panel of the accompanying figure below, and shows a wandering or piecewise trending behavior. Note that it is typical for \(x_t\) and \(x_{t+1}\) to be relatively close to each other; that is, the value of \(X_{t+1}\) is usually not very far from the value of \(X_t\), and, as a consequence, there is a rather strong positive correlation between the random variables \(X_t\) and \(X_{t+1}\). Note also that, for large lags \(k\), there seems to be less correlation between \(X_t\) and \(X_{t+k}\), to the extent that there is very little correlation between \(X_t\) and, say, \(X_{t+40}\). We see this behavior manifested in the lower panels of the figure, which display the true autocorrelations associated with the model from which the realization was generated. In this plot \(\rho_1 \approx 0.96\), while as the lag increases the autocorrelations decrease, and by lag 40 the autocorrelation has decreased to \(\rho_{40} \approx 0.1\).
Simulated data from an AR(1) process and corresponding sample ACF and PACF
\(\hat{\rho}(h)\), as a function of h, decreases very slowly.
For moderately high values of h, \(\hat{\rho}(h)\) is still considerably high, indicating an \(AR(1)\) model with the value of the AR coefficient close to \(1\).
This intuition is validated by the plot of the sample ACF, which shows that higher-order autocorrelations are of considerable magnitude; such slow decay makes the series appear close to nonstationary, even though the generating AR(1) process is stationary.
The plot of the partial autocorrelation function does not convey anything more.
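The slow decay described above can be reproduced with a small simulation. The sketch below assumes the usual AR(1) form \(X_t=\phi X_{t-1}+Z_t\) with Gaussian white noise \(Z_t\), for which the theoretical autocorrelation of the stationary process is \(\rho(h)=\phi^{|h|}\); this formula is a standard result, not one derived in the text. The coefficient \(\phi=0.95\), the sample size, the burn-in length, the seed and the function names are all assumptions chosen for illustration, not values taken from the figure.

```python
import random


def simulate_ar1(phi, n, sigma=1.0, burn_in=500, seed=1):
    """Simulate X_t = phi * X_{t-1} + Z_t with Z_t ~ iid N(0, sigma^2).

    The first burn_in values are discarded so that the start-up
    value 0 has little influence on the returned series.
    """
    rng = random.Random(seed)
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, sigma)
        if t >= burn_in:
            out.append(x)
    return out


def sample_acf(x, max_lag):
    """Sample ACF at lags 0..max_lag, using divisor n as in the text."""
    n = len(x)
    xbar = sum(x) / n
    dev = [v - xbar for v in x]
    gamma0 = sum(d * d for d in dev) / n
    return [sum(dev[t + h] * dev[t] for t in range(n - h)) / n / gamma0
            for h in range(max_lag + 1)]


phi = 0.95                                 # assumed coefficient
x = simulate_ar1(phi, n=2000)
rho_true = [phi ** h for h in range(41)]   # theoretical ACF, phi^h
rho_hat = sample_acf(x, 40)

for h in (1, 10, 20, 40):
    print(f"lag {h:>2}: true {rho_true[h]:.3f}, sample {rho_hat[h]:.3f}")
```

Under these assumptions the theoretical autocorrelation falls from 0.95 at lag 1 to about 0.13 at lag 40; the sample values depend on the simulated series and will differ somewhat from the theoretical ones, particularly at higher lags.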