Univariate Time-Series Modelling and Forecasting

Introduction:

Univariate time-series models are a class of specifications where one attempts to model and to predict financial variables using only information contained in their own past values and possibly current and past values of an error term.

Time-series models are usually a-theoretical, implying that their construction and use are not based upon any underlying theoretical model of the behaviour of a variable.

Some Notation and Concepts

Stationary process:

A stationary process is one whose statistical properties and behaviour do not depend on the time at which the series is observed.

A Strictly Stationary Process:

A strictly stationary process is one where, for any \(t_1, t_2, ..., t_T ∈ Z, \;\text{any}\; k ∈ Z \; \text{and} \; T = 1,2, ...\) \[F_{y_{t_1},y_{t_2},...,y_{t_T}}(y_{t_1}, y_{t_2}, ..., y_{t_T})=F_{y_{{t_1}+k},y_{{t_2}+k},...,y_{{t_T}+k}}(y_{t_1}, y_{t_2}, ..., y_{t_T})\] where F denotes the joint distribution function of the set of random variables

A series is strictly stationary if the distribution of its values remains the same as time progresses, implying that the probability that y falls within a particular interval is the same now as at any time in the past or the future.

A Weakly Stationary Process:

If a series satisfies \((1) − (3)\) for \(t = 1, 2, . . . , \infty\), it is said to be weakly or covariance stationary

  1. \(\mathbb E(y_t) = \mu\) A stationary process should have a constant mean
  2. \(\mathbb E(y_t-\mu)(y_t-\mu)= \sigma^2 < \infty\) A stationary process should have a constant variance
  3. \(\mathbb E(y_{t_1}-\mu)(y_{t_2}-\mu)=\gamma_{t_2-t_1}\;\;\forall\; t_1, t_2\) A stationary process should have a constant autocovariance structure

The autocovariances determine how y is related to its previous values, and for a stationary series they depend only on the difference between \(t_1\) and \(t_2\), so that the covariance between \(y_t\) and \(y_{t-1}\) is the same as the covariance between \(y_{t-10}\) and \(y_{t-11}\).

The moment \(\mathbb E(y_t-\mathbb E(y_t))(y_{t-s}-\mathbb E(y_{t-s}))=\gamma_s,\;\;s=0,1,2,...\)

is known as the autocovariance function. When \(s = 0\), the autocovariance at lag zero is obtained, which is the autocovariance of \(y_t\) with \(y_t\), i.e., the variance of y. These covariances, \(γ_s\), are also known as autocovariances since they are the covariances of y with its own previous values. The autocovariances are not a particularly useful measure of the relationship between y and its previous values, however, since the values of the autocovariances depend on the units of measurement of \(y_t\), and hence the values that they take have no immediate interpretation.

It is thus more convenient to use the autocorrelations, which are the autocovariances normalised by dividing by the variance \[\tau _s=\frac{\gamma_s}{\gamma_0}\;\;\;s=0,1,2,...\] The series \(\tau_s\) now has the standard property of correlation coefficients that the values are bounded to lie between ±1. In the case that s = 0, the autocorrelation at lag zero is obtained, i.e., the correlation of \(y_t\) with \(y_t\), which is of course 1. If \(\tau_s\) is plotted against s = 0,1,2, …, a graph known as the autocorrelation function (acf) or correlogram is obtained.
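The autocovariances and autocorrelations can be estimated directly from these definitions. A minimal sketch in Python (numpy only; the AR(1) series and its coefficient 0.7 are purely illustrative, not from the text) computes \(\gamma_s\) and \(\tau_s\) for the first few lags of a simulated sample:

```python
import numpy as np

def autocovariance(y, s):
    """Sample autocovariance gamma_s: average of (y_t - ybar)(y_{t-s} - ybar)."""
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    T = len(y)
    return np.sum((y[s:] - ybar) * (y[:T - s] - ybar)) / T

def acf(y, max_lag):
    """Sample autocorrelations tau_s = gamma_s / gamma_0 for s = 0..max_lag."""
    gamma0 = autocovariance(y, 0)
    return np.array([autocovariance(y, s) / gamma0 for s in range(max_lag + 1)])

rng = np.random.default_rng(0)
u = rng.standard_normal(500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.7 * y[t - 1] + u[t]   # illustrative stationary series

print(acf(y, 5).round(3))          # tau_0 = 1 by construction, then declining values
```

Plotting these values against s gives the correlogram described above.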

Remark:

  • Time series with a trend or with seasonality are non-stationary
  • Stationary time series have no predictable patterns in the long run
  • A time series with cyclic behaviour (but with no trend or seasonality) is still stationary, because the cycles are not of fixed length
  • The ACF of a stationary process decays to zero quickly
  • The ACF of a non-stationary process decays slowly (see the sketch below)
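A quick sketch of the last two remarks, assuming statsmodels is available (the coefficient 0.5, sample size and seed are arbitrary choices): the correlogram of a stationary AR(1) dies out quickly, while that of a random walk decays very slowly.

```python
import numpy as np
from statsmodels.tsa.stattools import acf   # sample autocorrelation function

rng = np.random.default_rng(1)
T = 500
u = rng.standard_normal(T)

# Stationary AR(1): y_t = 0.5 * y_{t-1} + u_t
y_stat = np.zeros(T)
for t in range(1, T):
    y_stat[t] = 0.5 * y_stat[t - 1] + u[t]

# Non-stationary random walk: y_t = y_{t-1} + u_t
y_rw = np.cumsum(u)

print("AR(1) ACF:      ", acf(y_stat, nlags=10).round(2))  # decays quickly
print("Random walk ACF:", acf(y_rw, nlags=10).round(2))    # decays very slowly
```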

White noise process:

A white noise process (often assumed for the disturbance terms) is a stationary process with no discernible (perceptible) structure:

  1. \(\mathbb E(u_t)=\mu_u ;\;\;t =1,2,..,\infty\) White noise process has constant mean

  2. \(\text{var}(u_t)=\sigma^2_u < \infty\) White noise process has constant variance

  3. \(\gamma_s=\left\{\begin{matrix} \sigma^2_u &\text{if}\; s=0 \\ 0 & \text{if}\; s \neq 0 \end{matrix}\right.\)

Thus a white noise process has zero autocovariances, except at lag zero; another way to state this last condition is to say that each observation is uncorrelated with all other values in the sequence.

Hence the autocorrelation function for a white noise process will be zero apart from a single peak of 1 at s = 0. \[\tau_s=\frac{\gamma_s}{\gamma_0}=\left\{\begin{matrix} 1 &\text{if}\; s=0 \\ 0 & \text{if}\; s \neq 0 \end{matrix}\right.\] If \(\mu_u=0\), and the three conditions hold, the process is known as zero mean white noise.

Furthermore, if it is assumed that \(u_t\) is normally distributed, then the sample autocorrelation coefficients are also approximately normally distributed

\[\hat{\tau}_s\sim \mathbb N\left(0, \frac{1}{T}\right)\] where T is the sample size, and \(\hat{\tau}_s\) denotes the autocorrelation coefficient at lag s estimated from a sample

Hypothesis testing for \(\tau_s\):

Constructing a non-rejection region:

\[\begin{matrix} H_0: \tau_s=0\\ H_A: \tau_s\neq 0 \end{matrix}\] \(95\%\) non - rejection region: \((-1.96\times\frac{1}{\sqrt{T}},1.96\times\frac{1}{\sqrt{T}})\)

If the sample autocorrelation coefficient \(\hat{\tau}_s\) falls outside this region for a given value of s, then the null hypothesis that the true value of the coefficient at that lag s is zero is rejected.

Alternatively, we could calculate the test statistic \(TS=\hat{\tau}_s\times\sqrt T\)

  • \(|TS|> Z_{\frac{\alpha}{2}}\longrightarrow\) Reject \(H_0\)
  • \(|TS|\leq Z_{\frac{\alpha}{2}}\longrightarrow\) Fail to reject \(H_0\)
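A minimal sketch of this decision rule, assuming statsmodels for the sample ACF (sample size and seed are arbitrary): the data are pure white noise, so every true \(\tau_s\) is zero and roughly 5% of lags should be flagged by chance.

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(2)
T = 200
u = rng.standard_normal(T)          # white noise: true tau_s = 0 for all s > 0

tau_hat = acf(u, nlags=5)[1:]       # sample autocorrelations at lags 1..5
for s, t_hat in enumerate(tau_hat, start=1):
    TS = t_hat * np.sqrt(T)         # approximately N(0, 1) under H0
    decision = "reject H0" if abs(TS) > 1.96 else "fail to reject H0"
    print(f"lag {s}: tau_hat = {t_hat: .3f}, TS = {TS: .2f} -> {decision}")
```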

Box-Pierce test

  • \(H_0: \tau_{s_1}=0, \tau_{s_2}=0, ..., \tau_{s_m}=0\)
  • \(H_a\): otherwise
  • \(TS=Q=T\times \Sigma^{m}_{k=1}\hat{\tau_{s_k}}^2\sim\chi^2_{m}\) where T = sample size, m = maximum lag length
  • \(|TS|>\chi^{2}_{\alpha,m} \rightarrow\) Reject \(H_0\)
  • \(|TS|\leq \chi^{2}_{\alpha,m}\rightarrow\) Fail to reject \(H_0\)

As for any joint hypothesis test, only one autocorrelation coefficient needs to be statistically significant for the test to result in a rejection. However, the Box–Pierce test has poor small sample properties, implying that it leads to the wrong decision too frequently for small samples. A variant of the Box–Pierce test, having better small sample properties, has been developed. The modified statistic is known as the Ljung–Box (1978) statistic.

Ljung-Box test

  • \(H_0:\tau_{s_1}=0, \tau_{s_2}=0, \tau_{s_3}=0, ..., \tau_{s_m}=0\)
  • \(H_a: \text{Otherwise}\)

\[TS = Q^*=T\times(T+2)\times\sum^{m}_{k=1}\frac{\hat{\tau}_{s_k}^2}{T-k}\sim\chi^2_m\] It should be clear from the form of the statistic that asymptotically (that is, as the sample size increases towards infinity), the (T + 2) and (T − k) terms in the Ljung–Box formulation will cancel out, so that the statistic is equivalent to the Box–Pierce test. This statistic is very useful as a portmanteau (general) test of linear dependence in time series.
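Both portmanteau statistics are implemented in statsmodels. A hedged sketch, assuming a reasonably recent statsmodels version (where acorr_ljungbox returns a DataFrame) and using simulated white noise, for which the joint null should rarely be rejected:

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(3)
u = rng.standard_normal(300)                      # white noise, so H0 is true

# boxpierce=True returns the Box-Pierce Q alongside the Ljung-Box Q*
res = acorr_ljungbox(u, lags=[5, 10], boxpierce=True)
print(res)   # columns: lb_stat, lb_pvalue, bp_stat, bp_pvalue
```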

Example:

Suppose that a researcher had estimated the first five autocorrelation coefficients using a series of length 100 observations, and found them to be

Lag                            1        2        3        4        5
Autocorrelation coefficient    0.207   -0.013    0.086    0.005   -0.022

Test each of the individual correlation coefficients for significance, and test all five jointly using the Box–Pierce and Ljung–Box tests.

  1. Constructing a non-rejection region: 95% non-rejection region: \((-1.96\times\frac{1}{\sqrt T}, 1.96\times\frac{1}{\sqrt T})\) where T = 100 in this case. The decision rule is thus to reject the null hypothesis that a given coefficient is zero in the cases where the coefficient lies outside the range (−0.196,0.196). For this example, it would be concluded that only the first autocorrelation coefficient is significantly different from zero at the 5% level.

  2. Box–Pierce and Ljung–Box tests: Turning to the joint tests, the null hypothesis is that all of the first five autocorrelation coefficients are jointly zero, i.e.

\[H_0: \tau_{S_1}=0, \tau_{S_2}=0, \tau_{S_3}=0, \tau_{S_4}=0, \tau_{S_5}=0\] or simply \[H_0: \tau_{1}=0, \tau_{2}=0, \tau_{3}=0, \tau_{4}=0, \tau_{5}=0\] The test statistics for the Box–Pierce and Ljung–Box tests are given respectively, as

\[Q=T\sum^{m}_{k=1}\hat{\tau}_{s_k}^2=100\times(0.207^2 + (-0.013)^2+ 0.086^2 + 0.005^2 + (-0.022)^2)=5.09\] \[Q^*=T(T+2)\sum^{m}_{k=1}\frac{\hat{\tau}_{s_k}^2}{T-k}=100\times102\times\left(\frac{0.207^2}{100-1} + \frac{(-0.013)^2}{100-2}+ \frac{0.086^2}{100-3} + \frac{0.005^2}{100-4} + \frac{(-0.022)^2}{100-5}\right)=5.26\] Both statistics are compared with a \(\chi^2_5\) critical value (approximately 11.07 at the 5% level); since both are well below it, the joint null hypothesis that the first five autocorrelation coefficients are zero is not rejected.
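These hand calculations can be reproduced straight from the formulas; a short sketch using the five coefficients quoted above (scipy is used only for the \(\chi^2\) critical value):

```python
import numpy as np
from scipy.stats import chi2

T = 100
tau_hat = np.array([0.207, -0.013, 0.086, 0.005, -0.022])   # lags 1..5
k = np.arange(1, len(tau_hat) + 1)

Q      = T * np.sum(tau_hat**2)                              # Box-Pierce
Q_star = T * (T + 2) * np.sum(tau_hat**2 / (T - k))          # Ljung-Box
crit   = chi2.ppf(0.95, df=len(tau_hat))                     # 5% critical value, chi2(5)

print(f"Q = {Q:.2f}, Q* = {Q_star:.2f}, critical value = {crit:.2f}")
# Q is about 5.09 and Q* about 5.26, both well below ~11.07 -> fail to reject H0
```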

Moving average process MA(q):

Properties:

Process invertibility \(MA(1) \sim AR(\infty)\):

Autoregressive process AR(p)

Stationary condition:

Wold’s Decomposition Theorem

Example:

The Partial Autocorrelation Function

The PACF is useful for telling the difference between an AR process and an ARMA process.

In the case of an AR(p), there are direct connections between \(y_t\) and \(y_{t-s}\) only for s ≤ p. So for an AR(p), the theoretical PACF will be zero after lag p.

In the case of an MA(q), if it is invertible (the roots of the characteristic equation \(\theta(\text{z})=0\) lie outside the unit circle), it can be written as an AR(∞), so there are direct connections between \(y_t\) and all its previous values. For an MA(q), the theoretical PACF will be geometrically declining.

  • At lag 1: \(\tau_{11}=\tau_1\)
  • At lag 2: \(\tau_{22}=\frac{\tau_2-\tau_1^2}{1-\tau_1^2}\)
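The lag-1 and lag-2 expressions above can be checked against a library implementation. A sketch assuming statsmodels (the AR(1) coefficient 0.6, sample size and seed are arbitrary; the two sets of numbers should be close, not identical, since estimation details differ):

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(4)
T = 1000
u = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + u[t]              # illustrative AR(1)

r = acf(y, nlags=2)                            # r[0]=1, r[1], r[2] = sample tau_1, tau_2
tau_11 = r[1]                                  # PACF at lag 1
tau_22 = (r[2] - r[1]**2) / (1 - r[1]**2)      # PACF at lag 2

print("formula:    ", round(tau_11, 3), round(tau_22, 3))
print("statsmodels:", pacf(y, nlags=2)[1:].round(3))
```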

ARMA(p,q):

By combining the AR(p) and MA(q) models, an ARMA(p, q) model is obtained. Such a model states that the current value of some series y depends linearly on its own previous values plus a combination of current and previous values of a white noise error term. The model could be written

For \(\phi(L)=1-\phi_1L-\phi_2L^2-...-\phi_pL^p\)

And \(\theta(L)= 1+\theta_1L+\theta_2L^2+...+ \theta_qL^q\), we have \[\phi(L)y_t=\mu+\theta(L)u_t\] or \[y_t=\mu + \phi_1y_{t-1}+\phi_2y_{t-2}+...+\phi_py_{t-p}+\theta_1u_{t-1}+\theta_2u_{t-2}+...+ \theta_qu_{t-q}+u_t\]
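As an illustration of the ARMA(p, q) form, the sketch below simulates an ARMA(1,1) and fits it back, assuming a recent statsmodels version; the coefficients 0.6 and 0.4 are arbitrary choices, and arma_generate_sample expects the full lag-polynomial coefficients of \(\phi(L)\) and \(\theta(L)\), including the leading 1.

```python
import numpy as np
from statsmodels.tsa.arima_process import arma_generate_sample
from statsmodels.tsa.arima.model import ARIMA

np.random.seed(5)

# phi(L) = 1 - 0.6L, theta(L) = 1 + 0.4L  (illustrative values)
ar = [1, -0.6]           # coefficients of phi(L)
ma = [1, 0.4]            # coefficients of theta(L)
y = arma_generate_sample(ar, ma, nsample=1000)

# An ARMA(1,1) is an ARIMA with no differencing: order = (1, 0, 1)
fit = ARIMA(y, order=(1, 0, 1)).fit()
print(fit.params.round(2))   # estimates should be near 0.6 (AR) and 0.4 (MA)
```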

Stationary series: \(y_t\) is stationary when \[\left\{\begin{matrix} E(y_t)=\mu\\ \text{Var}(y_t)=\sigma^2<\infty\\ \text{Cov}(y_t, y_{t-s})=\gamma_s \end{matrix}\right.\]

In the classical linear regression model (CLRM), regressing y on x presumes that x genuinely affects y. What if the series involved are non-stationary? Non-stationarity can arise from:

\(\Rightarrow\) Trend

\(\Rightarrow\) Seasonality

\(\Rightarrow\) Random walk

If y and x are independent but both non-stationary, a regression of y on x tends to look spuriously good: \(R^2\geq 0.5\) in about 16% of cases, and the t-ratio \[TS=\frac{\hat{\beta}}{SE(\hat{\beta})}\] satisfies \(|TS|>2\) in about 98% of cases, even though there is no genuine relationship. This is the spurious regression problem (see the sketch below).
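A small Monte Carlo sketch of the spurious regression effect (the sample size, number of replications and therefore the exact frequencies printed are arbitrary assumptions; only the qualitative pattern is the point): two completely independent random walks are generated and y is regressed on x.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
T, n_reps = 200, 1000
big_r2, big_t = 0, 0

for _ in range(n_reps):
    x = np.cumsum(rng.standard_normal(T))        # independent random walk
    y = np.cumsum(rng.standard_normal(T))        # independent random walk
    res = sm.OLS(y, sm.add_constant(x)).fit()
    big_r2 += res.rsquared >= 0.5
    big_t  += abs(res.tvalues[1]) > 2            # t-ratio on the slope

print(f"R^2 >= 0.5 in {100 * big_r2 / n_reps:.0f}% of replications")
print(f"|t| > 2    in {100 * big_t / n_reps:.0f}% of replications")
```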

Two types of non-stationarity

  1. The random walk with drift: \(y_t=\mu+y_{t-1}+u_t\)

  2. The deterministic trend process: \(y_t=\alpha+\beta t+u_t\)

Example: check whether the AR(1) process \(y_t=y_{t-1}+u_t\) is stationary.

Characteristic equation: \((1-L)y_t=u_t\)

Solve \(\phi(z)=0\Leftrightarrow 1-z=0 \Leftrightarrow z=1\)

\(\Rightarrow\) Non-stationary, since the root \(z=1\) lies on, not outside, the unit circle

Use \(z_t=\Delta y_t=y_t-y_{t-1}=u_t\), which is stationary

Consider the AR(1) process:

\(y_t=\phi y_{t-1}+u_t= \phi(\phi y_{t-2}+u_{t-1})+u_t=\phi^2y_{t-2}+\phi u_{t-1}+ u_t=...\)

  • Case 1: \(|\phi| <1\) \(\rightarrow\) stationary; repeated substitution gives \(y_t=\sum_{i=0}^{\infty}\phi^i u_{t-i}\), so the effect of a shock dies away geometrically (see the sketch below)
  • Case 2: \(|\phi| \geq 1\) \(\rightarrow\) non-stationary; for \(\phi=1\), \(y_t=y_0+\sum_{i=1}^{t} u_i\), so shocks never die away
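A tiny numeric sketch of the two cases (the coefficient values are arbitrary): under repeated substitution the weight on a shock that occurred i periods ago is \(\phi^i\), which shrinks to zero when \(|\phi|<1\) and stays at 1 when \(\phi=1\).

```python
import numpy as np

i = np.arange(0, 21, 5)              # how long ago the shock occurred
for phi in (0.5, 1.0):
    weights = phi ** i               # weight phi^i on u_{t-i} from repeated substitution
    print(f"phi = {phi}: {weights.round(4)}")
# phi = 0.5 -> weights shrink towards zero (shock dies away)
# phi = 1.0 -> weights all equal 1     (shock persists: random walk)
```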

Integration of order d: \(y_t\sim I(d)\) means that \(y_t\) must be differenced d times to become stationary, i.e. \(\Delta^d y_t\sim I(0)\)

  • If the series is already stationary, then \(y_t\sim I(0)\)

\[y_t \sim I(1)\; \text{if}\;\left\{\begin{matrix} y_t \text{ is not stationary}\\ \Delta y_t\sim I(0) \text{ is stationary} \end{matrix}\right.\]
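Finally, a short sketch of the integration idea (numpy only; the simulated data are illustrative): a random walk built from white noise is I(1), and taking the first difference recovers the stationary white noise, i.e. \(\Delta y_t = u_t\).

```python
import numpy as np

rng = np.random.default_rng(7)
u = rng.standard_normal(500)     # white noise, I(0)
y = np.cumsum(u)                 # random walk y_t = y_{t-1} + u_t, hence I(1)

dy = np.diff(y)                  # first difference Delta y_t
print(np.allclose(dy, u[1:]))    # True: differencing once recovers the I(0) series
```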