Overview

This document explains the derivation in Section 7.2.3 of a spatial-temporal statistics text, where the author simplifies expressions involving:

  • \(\Sigma_{12} H^{-1}\)
  • \(\Sigma_{12} H^{-1} \Sigma_{21}\)

for prediction at the space-time point \((s_0, t_0)\) under a separable covariance structure.


Notation

| Symbol | Meaning |
|---|---|
| \(\Sigma_s\) | \(n \times n\) spatial covariance matrix, \((\Sigma_s)_{ij} = \sigma_s(s_i - s_j)\) |
| \(\Sigma_t\) | \(T \times T\) temporal covariance matrix, \((\Sigma_t)_{km} = \sigma_t(t_k - t_m)\) |
| \(H\) | \(\Sigma_s \otimes \Sigma_t\) (Kronecker product), the \(nT \times nT\) covariance of the observed data |
| \(\Sigma_{21}\) | \(nT \times 1\) column vector of covariances between the observed data and the prediction point |
| \(\Sigma_{12}\) | \(1 \times nT\) row vector, equal to \(\Sigma_{21}^\top\) |
| \(s_0, t_0\) | Prediction spatial location and time |
| \(t'\) | Alternative symbol for the prediction time \(t_0\) (the two are used interchangeably) |

The Joint Covariance Matrix

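Ordering the prediction point first and the observed data second, and scaling both covariance functions so that \(\sigma_s(0) = \sigma_t(0) = 1\), the full joint covariance is: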

\[ \begin{bmatrix} 1 & \Sigma_{12} \\ \Sigma_{21} & H \end{bmatrix} \]

where:

  • \(\Sigma_{21} = \mathbf{v}_s \otimes \mathbf{v}_t\), with:
    • \(\mathbf{v}_s\): \(n \times 1\), \((\mathbf{v}_s)_i = \sigma_s(s_i - s_0)\)
    • \(\mathbf{v}_t\): \(T \times 1\), \((\mathbf{v}_t)_k = \sigma_t(t_k - t_0)\)

Step 1: Simplifying \(\Sigma_{12} H^{-1}\)

Since \(H = \Sigma_s \otimes \Sigma_t\), its inverse is:

\[ H^{-1} = \Sigma_s^{-1} \otimes \Sigma_t^{-1} \]
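
The Kronecker inverse identity can be checked numerically. The sketch below is illustrative only: the site coordinates, the range parameters, and the use of an exponential covariance for the spatial factor are assumptions made for the example, not values from the source text.

```python
import numpy as np

def exp_cov_matrix(coords, phi):
    """Exponential covariance matrix exp(-|x_i - x_j| / phi) for 1-D coordinates."""
    coords = np.asarray(coords, dtype=float)
    return np.exp(-np.abs(coords[:, None] - coords[None, :]) / phi)

# Hypothetical inputs: n = 3 sites on a line, T = 4 equally spaced times.
Sigma_s = exp_cov_matrix([0.0, 0.7, 1.9], phi=1.5)
Sigma_t = exp_cov_matrix([1, 2, 3, 4], phi=2.0)

H = np.kron(Sigma_s, Sigma_t)                  # nT x nT covariance of the data
H_inv_direct = np.linalg.inv(H)                # invert the full matrix
H_inv_kron = np.kron(np.linalg.inv(Sigma_s),   # invert only the two small factors
                     np.linalg.inv(Sigma_t))

kron_inverse_matches = bool(np.allclose(H_inv_direct, H_inv_kron))
print(kron_inverse_matches)
```

The practical point is cost: the right-hand side inverts an \(n \times n\) and a \(T \times T\) matrix instead of one \(nT \times nT\) matrix.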

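Order the observations with the spatial index outermost, matching \(H = \Sigma_s \otimes \Sigma_t\), so that the pair \((j,k)\) sits at position \((j-1)T + k\). Let \(b_{jk}(s_0, t_0)\) be the entry of the row vector \(\Sigma_{12} H^{-1}\) at that position, where: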

  • \(j = 1,\dots,n\) (spatial index)
  • \(k = 1,\dots,T\) (temporal index)

Then:

\[ b_{jk}(s_0, t_0) = \sum_{i=1}^n \sum_{m=1}^T \sigma_s(s_i - s_0) \sigma_t(t_m - t_0) (\Sigma_s^{-1})_{ij} (\Sigma_t^{-1})_{mk} \]

Factor the sums:

\[ b_{jk}(s_0, t_0) = \underbrace{\left[ \sum_{i=1}^n \sigma_s(s_i - s_0) (\Sigma_s^{-1})_{ij} \right]}_{=: b_s(j, s_0)} \cdot \underbrace{\left[ \sum_{m=1}^T \sigma_t(t_m - t_0) (\Sigma_t^{-1})_{mk} \right]}_{=: b_t(k, t_0)} \]

Thus:

\[ \boxed{b_{jk}(s_0, t_0) = b_s(j, s_0) \cdot b_t(k, t_0)} \]
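
As a rough numerical illustration of this factorization, the sketch below compares the weights obtained from the full \(nT \times nT\) system with the outer product of the spatial and temporal weights. All numbers (sites, \(s_0\), \(t_0\), range parameters) and the exponential form of the spatial covariance are assumed for the example.

```python
import numpy as np

def exp_cov(lag, phi):
    """Exponential covariance exp(-|lag| / phi)."""
    return np.exp(-np.abs(lag) / phi)

# Hypothetical setup: 3 sites on a line, times 1..4, arbitrary prediction point.
sites = np.array([0.0, 0.7, 1.9])
times = np.arange(1, 5, dtype=float)
s0, t0 = 1.2, 2.5
phi_s, phi_t = 1.5, 2.0

Sigma_s = exp_cov(sites[:, None] - sites[None, :], phi_s)
Sigma_t = exp_cov(times[:, None] - times[None, :], phi_t)
v_s = exp_cov(sites - s0, phi_s)
v_t = exp_cov(times - t0, phi_t)

# Direct route: solve with the full Kronecker matrix. H is symmetric,
# so H^{-1} Sigma_21 holds the same numbers as the row vector Sigma_12 H^{-1}.
Sigma_21 = np.kron(v_s, v_t)
b_direct = np.linalg.solve(np.kron(Sigma_s, Sigma_t), Sigma_21)

# Factored route: two small solves, then b[j, k] = b_s[j] * b_t[k].
b_s = np.linalg.solve(Sigma_s, v_s)
b_t = np.linalg.solve(Sigma_t, v_t)
b_factored = np.outer(b_s, b_t)

# Reshaping to (n, T) recovers the (j, k) indexing of the Kronecker ordering.
weights_match = bool(np.allclose(b_direct.reshape(len(sites), len(times)), b_factored))
print(weights_match)
```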


Step 2: Simplifying \(b_t(k, t_0)\)

We now focus on:

\[ b_t(k, t_0) = \sum_{m=1}^T \sigma_t(t_m - t_0) (\Sigma_t^{-1})_{mk} \]

Case 1: \(t_0 \le T\) (interpolation)

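Assume the observed times are \(t_k = k\) for \(k = 1,\dots,T\), so that \(t_0 \in \{1,\dots,T\}\) is itself an observed time. Then \(\sigma_t(t_m - t_0) = (\Sigma_t)_{m, t_0}\), which is column \(t_0\) of \(\Sigma_t\). Since \(\Sigma_t\) is symmetric, \((\Sigma_t)_{m, t_0} = (\Sigma_t)_{t_0, m}\), and therefore:

\[ b_t(k, t_0) = \sum_{m=1}^T (\Sigma_t)_{m, t_0} (\Sigma_t^{-1})_{mk} = (\Sigma_t \Sigma_t^{-1})_{t_0, k} = \delta_{k, t_0} \]

where \(\delta_{i,j} = 1\) if \(i=j\), else \(0\).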
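
A minimal numerical sketch of Case 1 follows, assuming observed times \(1,\dots,5\), an exponential temporal covariance, and a hypothetical range parameter; the result itself only needs \(\Sigma_t\) to be symmetric and invertible.

```python
import numpy as np

phi_t = 2.0                               # hypothetical temporal range parameter
times = np.arange(1, 6, dtype=float)      # observed times 1..T with T = 5
t0 = 3                                    # prediction time equal to an observed time

Sigma_t = np.exp(-np.abs(times[:, None] - times[None, :]) / phi_t)
v_t = np.exp(-np.abs(times - t0) / phi_t)

# Temporal weights b_t(k, t0) = (Sigma_t^{-1} v_t)_k, using symmetry of Sigma_t.
b_t = np.linalg.solve(Sigma_t, v_t)

unit_vector = np.zeros(len(times))
unit_vector[t0 - 1] = 1.0                 # 0-based position of time t0

is_column = bool(np.allclose(v_t, Sigma_t[:, t0 - 1]))     # v_t is column t0
is_kronecker_delta = bool(np.allclose(b_t, unit_vector))   # weights are delta_{k, t0}
print(is_column, is_kronecker_delta)
```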

Case 2: \(t_0 > T\) (forecasting)

Now \(t_0\) is beyond all observed times, so \(\sigma_t(t_m - t_0)\) is no longer an entry of \(\Sigma_t\) and the argument of Case 1 does not apply directly. The author introduces a key property of the exponential covariance function.

The Exponential Covariance Function

For exponential covariance:

\[ \sigma_t(\tau) = \exp\left(-\frac{|\tau|}{\phi}\right) \]

The Factorization Property (Equation 7.9)

For \(t_0 > T\) and \(m = 1,\dots,T\):

\[ \sigma_t(t_m - t_0) = \sigma_t(t_0 - T) \cdot \sigma_t(T - t_m) \]

Verification:

  • Since \(t_0 > T \ge t_m\), we have \(t_0 - T > 0\) and \(T - t_m \ge 0\)
  • Also \(t_m - t_0 < 0\), so \(|t_m - t_0| = t_0 - t_m\)
  • Note: \(t_0 - t_m = (t_0 - T) + (T - t_m)\)

Therefore:

\[ \begin{aligned} \sigma_t(t_m - t_0) &= \exp\left(-\frac{t_0 - t_m}{\phi}\right) \\ &= \exp\left(-\frac{t_0 - T}{\phi}\right) \cdot \exp\left(-\frac{T - t_m}{\phi}\right) \\ &= \sigma_t(t_0 - T) \cdot \sigma_t(T - t_m) \end{aligned} \]
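
The factorization can be checked numerically. For contrast, the sketch below runs the same check on a squared-exponential (Gaussian) covariance, which is not discussed in the source and is included here only to suggest that the property is specific to the exponential form. The values of \(T\), \(t_0\), and \(\phi\) are hypothetical.

```python
import numpy as np

def exp_cov(lag, phi):
    """Exponential covariance exp(-|lag| / phi)."""
    return np.exp(-np.abs(lag) / phi)

def gauss_cov(lag, phi):
    """Squared-exponential covariance exp(-(lag / phi)^2), used only as a contrast."""
    return np.exp(-(lag / phi) ** 2)

T, t0, phi = 5, 7.5, 2.0                      # hypothetical values with t0 > T
times = np.arange(1, T + 1, dtype=float)      # observed times 1..T

# Exponential: sigma(t_m - t0) == sigma(t0 - T) * sigma(T - t_m) for every m.
lhs_exp = exp_cov(times - t0, phi)
rhs_exp = exp_cov(t0 - T, phi) * exp_cov(T - times, phi)
exp_factorizes = bool(np.allclose(lhs_exp, rhs_exp))

# Squared-exponential: the same product does not reproduce the covariance,
# because (a + b)^2 differs from a^2 + b^2 whenever a and b are both nonzero.
lhs_gauss = gauss_cov(times - t0, phi)
rhs_gauss = gauss_cov(t0 - T, phi) * gauss_cov(T - times, phi)
gauss_factorizes = bool(np.allclose(lhs_gauss, rhs_gauss))

print(exp_factorizes, gauss_factorizes)
```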

Substituting into \(b_t(k, t_0)\)

\[ \begin{aligned} b_t(k, t_0) &= \sum_{m=1}^T \left[ \sigma_t(t_0 - T) \cdot \sigma_t(T - t_m) \right] (\Sigma_t^{-1})_{mk} \\ &= \sigma_t(t_0 - T) \sum_{m=1}^T \sigma_t(T - t_m) (\Sigma_t^{-1})_{mk} \end{aligned} \]

Recognizing \(\sigma_t(T - t_m)\) as an entry of the last row of \(\Sigma_t\)

Assuming \(t_T = T\) (observed times are \(1, 2, \dots, T\)):

\[ (\Sigma_t)_{T, m} = \sigma_t(t_T - t_m) = \sigma_t(T - t_m) \]

Thus:

\[ b_t(k, t_0) = \sigma_t(t_0 - T) \sum_{m=1}^T (\Sigma_t)_{T, m} (\Sigma_t^{-1})_{mk} \]

Using the Matrix Inverse Property

By definition of the inverse matrix:

\[ \sum_{m=1}^T (\Sigma_t)_{T, m} (\Sigma_t^{-1})_{mk} = \delta_{T, k} \]

This equals \(1\) if \(k = T\), and \(0\) otherwise.

Final Result for \(t_0 > T\)

\[ b_t(k, t_0) = \sigma_t(t_0 - T) \cdot \delta_{k, T} \]
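
The forecasting result can be illustrated in the same way. This is a sketch under assumed values (times \(1,\dots,5\), \(t_0 = 8\), \(\phi = 2\)); it compares the numerically solved temporal weights with the closed form, which puts all weight on the last observed time.

```python
import numpy as np

phi_t = 2.0                                   # hypothetical temporal range parameter
T = 5
times = np.arange(1, T + 1, dtype=float)      # observed times 1..T
t0 = 8.0                                      # forecast time, beyond T

Sigma_t = np.exp(-np.abs(times[:, None] - times[None, :]) / phi_t)
v_t = np.exp(-np.abs(times - t0) / phi_t)

# Numerical temporal weights: Sigma_t^{-1} v_t.
b_t_numeric = np.linalg.solve(Sigma_t, v_t)

# Closed form: sigma_t(t0 - T) at k = T, zero at every earlier time.
b_t_closed = np.zeros(T)
b_t_closed[-1] = np.exp(-(t0 - T) / phi_t)

forecast_weights_match = bool(np.allclose(b_t_numeric, b_t_closed))
print(forecast_weights_match)
```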

Summary for \(b_t(k, t_0)\)

\[ \boxed{b_t(k, t_0) = \begin{cases} \delta_{k, t_0}, & t_0 \le T \\[6pt] \delta_{k, T} \cdot \sigma_t(t_0 - T), & t_0 > T \end{cases}} \]


Step 3: Applying to \(\Sigma_{12} H^{-1} a\)

Let \(a\) be any \(nT \times 1\) vector with entries \(a_{jk}\).

\[ \Sigma_{12} H^{-1} a = \sum_{j=1}^n \sum_{k=1}^T b_{jk}(s_0, t_0) a_{jk} = \sum_{j=1}^n b_s(j, s_0) \sum_{k=1}^T a_{jk} b_t(k, t_0) \]

Using the \(b_t\) result:

  • If \(t_0 \le T\): \(\sum_{k=1}^T a_{jk} b_t(k, t_0) = a_{j, t_0}\)
  • If \(t_0 > T\): \(\sum_{k=1}^T a_{jk} b_t(k, t_0) = a_{j, T} \cdot \sigma_t(t_0 - T)\)

Therefore:

\[ \boxed{\Sigma_{12} H^{-1} a = \begin{cases} \sum_{j=1}^n b_s(j, s_0) a_{j, t_0}, & t_0 \le T \\[6pt] \sigma_t(t_0 - T) \sum_{j=1}^n b_s(j, s_0) a_{j, T}, & t_0 > T \end{cases}} \]
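
One way to read this result is that \(\Sigma_{12} H^{-1} a\) needs only a single time slice of \(a\). The sketch below illustrates that with a hypothetical helper, `apply_weights`, and made-up inputs; it assumes observed times \(1,\dots,T\), an exponential temporal covariance, and (for the example only) an exponential spatial covariance on a line.

```python
import numpy as np

def exp_cov(lag, phi):
    """Exponential covariance exp(-|lag| / phi)."""
    return np.exp(-np.abs(lag) / phi)

def apply_weights(a, b_s, t0, T, phi_t):
    """Closed-form Sigma_12 H^{-1} a for a stored as an (n, T) array a[j, k].

    Hypothetical helper: assumes times 1..T and that t0 is either an
    observed time in {1, ..., T} or a forecast time greater than T.
    """
    if t0 <= T:
        if t0 < 1 or t0 != int(t0):
            raise ValueError("t0 <= T must be one of the observed times 1..T")
        return b_s @ a[:, int(t0) - 1]              # only the slice at time t0
    return exp_cov(t0 - T, phi_t) * (b_s @ a[:, -1])  # only the last slice, damped

sites = np.array([0.0, 0.7, 1.9])                   # hypothetical sites
T, s0 = 4, 1.2
phi_s, phi_t = 1.5, 2.0
times = np.arange(1, T + 1, dtype=float)

Sigma_s = exp_cov(sites[:, None] - sites[None, :], phi_s)
Sigma_t = exp_cov(times[:, None] - times[None, :], phi_t)
H = np.kron(Sigma_s, Sigma_t)
v_s = exp_cov(sites - s0, phi_s)
b_s = np.linalg.solve(Sigma_s, v_s)                 # spatial weights b_s(j, s0)

rng = np.random.default_rng(0)
a = rng.normal(size=(len(sites), T))                # arbitrary vector, a[j, k]

results = {}
for t0 in (2, 6.5):                                 # one observed time, one forecast
    Sigma_12 = np.kron(v_s, exp_cov(times - t0, phi_t))
    direct = Sigma_12 @ np.linalg.solve(H, a.reshape(-1))
    results[t0] = (direct, apply_weights(a, b_s, t0, T, phi_t))

closed_form_matches = all(np.isclose(d, c) for d, c in results.values())
print(closed_form_matches)
```

Flattening `a` in row-major order places \(a_{jk}\) at position \((j-1)T + k\), which matches the ordering implied by \(\Sigma_s \otimes \Sigma_t\).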


Step 4: Simplifying \(\Sigma_{12} H^{-1} \Sigma_{21}\)

Now set \(a = \Sigma_{21}\), so \(a_{jk} = \sigma_s(s_j - s_0) \sigma_t(t_k - t_0)\).

Case \(t_0 \le T\):

\[ \begin{aligned} \Sigma_{12} H^{-1} \Sigma_{21} &= \sum_{j=1}^n b_s(j, s_0) \cdot \sigma_s(s_j - s_0) \cdot \sigma_t(t_0 - t_0) \\ &= \sum_{j=1}^n b_s(j, s_0) \sigma_s(s_j - s_0) \quad (\text{since } \sigma_t(0)=1) \end{aligned} \]

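Substituting the definition of \(b_s(j, s_0)\) from Step 1 turns this sum into the spatial quadratic form \(\mathbf{v}_s^\top \Sigma_s^{-1} \mathbf{v}_s\). Define: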

\[ a_s(s_0) := \sum_{i=1}^n \sum_{j=1}^n \sigma_s(s_i - s_0) (\Sigma_s^{-1})_{ij} \sigma_s(s_j - s_0) \]

Then:

\[ \Sigma_{12} H^{-1} \Sigma_{21} = a_s(s_0) \]

Case \(t_0 > T\):

\[ \begin{aligned} \Sigma_{12} H^{-1} \Sigma_{21} &= \sigma_t(t_0 - T) \sum_{j=1}^n b_s(j, s_0) \cdot \sigma_s(s_j - s_0) \cdot \sigma_t(T - t_0) \\ &= \sigma_t(t_0 - T) \cdot \sigma_t(T - t_0) \cdot \sum_{j=1}^n b_s(j, s_0) \sigma_s(s_j - s_0) \end{aligned} \]

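Because a covariance function is even in its lag, \(\sigma_t(T - t_0) = \sigma_t(t_0 - T)\), and the remaining sum equals \(a_s(s_0)\) exactly as in the case \(t_0 \le T\). Thus: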

\[ \Sigma_{12} H^{-1} \Sigma_{21} = a_s(s_0) \cdot [\sigma_t(t_0 - T)]^2 \]


Step 5: Conditional Variance

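Since the prediction point has unit marginal variance (the \(1\) in the joint covariance matrix), the conditional variance of \(y(s_0, t_0)\) given the observed data is, writing \(\delta^2\) for a variance rather than the Kronecker delta: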

\[ \delta^2(s_0, t_0) = 1 - \Sigma_{12} H^{-1} \Sigma_{21} \]

Therefore:

\[ \boxed{\delta^2(s_0, t_0) = 1 - a_s(s_0) \cdot a_t(t_0)} \]

where:

\[ a_t(t_0) = \begin{cases} 1, & t_0 \le T \\[4pt] [\sigma_t(t_0 - T)]^2, & t_0 > T \end{cases} \]
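
To make the closed form concrete, the sketch below compares it with the variance computed from the full \(nT \times nT\) system at one observed time and two forecast horizons. The function name `conditional_variance`, the inputs, and the exponential spatial covariance are assumptions for illustration.

```python
import numpy as np

def exp_cov(lag, phi):
    """Exponential covariance exp(-|lag| / phi)."""
    return np.exp(-np.abs(lag) / phi)

def conditional_variance(a_s, t0, T, phi_t):
    """1 - a_s * a_t(t0): a_t = 1 at observed times, sigma_t(t0 - T)^2 beyond T."""
    a_t = 1.0 if t0 <= T else exp_cov(t0 - T, phi_t) ** 2
    return 1.0 - a_s * a_t

sites = np.array([0.0, 0.7, 1.9])                 # hypothetical sites on a line
T, s0 = 4, 1.2
phi_s, phi_t = 1.5, 2.0
times = np.arange(1, T + 1, dtype=float)          # observed times 1..T

Sigma_s = exp_cov(sites[:, None] - sites[None, :], phi_s)
Sigma_t = exp_cov(times[:, None] - times[None, :], phi_t)
H = np.kron(Sigma_s, Sigma_t)

v_s = exp_cov(sites - s0, phi_s)
a_s = v_s @ np.linalg.solve(Sigma_s, v_s)         # spatial quadratic form a_s(s0)

variances = {}
for t0 in (3, 5, 9):                              # observed time, then two forecasts
    Sigma_21 = np.kron(v_s, exp_cov(times - t0, phi_t))
    direct = 1.0 - Sigma_21 @ np.linalg.solve(H, Sigma_21)
    variances[t0] = (direct, conditional_variance(a_s, t0, T, phi_t))

variance_matches = all(np.isclose(d, c) for d, c in variances.values())
print(variance_matches)
```

In this setup the variance at an observed time equals the purely spatial value \(1 - a_s(s_0)\), and it increases toward \(1\) as the forecast horizon \(t_0 - T\) grows.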


Summary Table

| Quantity | Condition | Expression |
|---|---|---|
| \(b_{jk}(s_0, t_0)\) | any \(t_0\) | \(b_s(j, s_0) \cdot b_t(k, t_0)\) |
| \(b_t(k, t_0)\) | \(t_0 \le T\) | \(\delta_{k, t_0}\) |
| \(b_t(k, t_0)\) | \(t_0 > T\) | \(\delta_{k, T} \cdot \sigma_t(t_0 - T)\) |
| \(\Sigma_{12} H^{-1} \Sigma_{21}\) | \(t_0 \le T\) | \(a_s(s_0)\) |
| \(\Sigma_{12} H^{-1} \Sigma_{21}\) | \(t_0 > T\) | \(a_s(s_0) \cdot [\sigma_t(t_0 - T)]^2\) |
| Conditional variance | any \(t_0\) | \(1 - a_s(s_0) a_t(t_0)\) |

Key Insights

  1. Separability (Kronecker product) allows spatial and temporal computations to factor.
  2. For exponential covariance, the covariance between a future time \(t_0\) and observed times factors as \(\sigma_t(t_0 - T) \cdot \sigma_t(T - t_m)\).
  3. When predicting at an observed time (\(t_0 \le T\)), only data from that same time receive nonzero weight.
  4. When forecasting (\(t_0 > T\)), only data from the last observed time \(t_T\) receive nonzero weight, scaled by \(\sigma_t(t_0 - T)\).
  5. These simplifications are used in Chapter 9 for forecasting.