The Econometric Analysis of Data Aggregation / Mixed Frequency Data

Definition

Mi(xed) Da(ta) S(ampling) regressions (henceforth MIDAS regressions) are regressions that combine data sampled at different frequencies.

Motivation

A dilemma faced by forecasters is that data are not all sampled at the same frequency.

Example

Regressions combining monthly and quarterly data, such as exploiting high-frequency financial data to predict low-frequency macro data.

Models of stock market volatility
- low frequency variable: quadratic variation
- high frequency data: past market information
Macroeconomic data
- sampled monthly: price series and monetary aggregates
- sampled quarterly or annually: real activity series like GDP
News impact on the stock market
- low frequency event: macro and corporate news
- high frequency: individual stock returns

Common solutions to the mixed-frequency problem

Temporal aggregation issue

The mathematical structure commonly assumes that the underlying stochastic processes evolve in continuous time and data are collected at equi-distant discrete points in time.
TIME AGGREGATION :
- Pros: admittedly easier
- Cons: many, including
  - Loss of (past) information
  - Shocks, impulse and response mechanics mis-specified
  - No scope for real-time updating (so-called 'now-casting')
  - Econometric issues of bias and inefficiencies.
- Making use of the disaggregate information, under general conditions, is theoretically preferable, since it can lead to more efficiency.
Time Averaging
- simple averaging is the most common method
Step Weighting
- uses a (normalized) weighting function: one might have the prior belief, for example, that more weight should be given to the samples of X that are more contemporaneous with the observed Y.
- leads to parameter proliferation
- leads to overfitting
MIDAS :
- employs (exogenously chosen) distributed lag polynomials as weighting functions.
- Benefits:
  - preserves the timing information
  - achieves flexibility while maintaining parsimony
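The difference between time averaging and step weighting can be made concrete with a small sketch. The Python code below is illustrative only: the function names (`time_average`, `weighted_aggregate`), the sample values, and the step weights are assumptions made for this example and do not come from any library or dataset.

```python
# Two ways to collapse one quarter of monthly data into a single regressor.
# All numbers are made up for illustration.

def time_average(x_months):
    """Flat (equal-weight) aggregation: the simple average."""
    return sum(x_months) / len(x_months)

def weighted_aggregate(x_months, weights):
    """Aggregate with arbitrary weights, normalized to sum to one.

    x_months[0] is the most recent month and is paired with weights[0].
    """
    total = sum(weights)
    return sum(w / total * x for w, x in zip(weights, x_months))

# Monthly observations within one quarter, most recent first (hypothetical).
x_quarter = [3.0, 2.0, 1.0]

flat = time_average(x_quarter)
# Step weighting: one free weight per month, so 3 parameters for 3 months.
step = weighted_aggregate(x_quarter, [0.6, 0.3, 0.1])

print(flat, step)
```

With step weighting the number of free weights grows with the number of high-frequency lags (for instance, 60 daily lags would need 60 weights per regressor), which is the parameter proliferation noted above. A MIDAS polynomial instead generates all the weights from a small, fixed number of parameters.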

Explicitly modeling the flow of data (e.g., using mixed data sampling) may be more beneficial to the forecaster, especially if the forecaster is interested in constructing intra-period forecasts.

Features

MIDAS regressions involve regressors with different sampling frequencies
- Similar to distributed lag models \[Y_t = \beta_0 + B(L)X_t +\epsilon_t\] where \(B(L)\) is some finite or infinite lag polynomial operator.

Augmented distributed lag functions

When the difference in sampling frequencies between the regressand and the regressors is large, distributed lag functions are typically employed to model the dynamics while avoiding parameter proliferation.

Distributed Lag Models

A distributed lag model is a model for time series data in which a regression equation is used to predict current values of a dependent variable based on both the current values of an explanatory variable and the lagged values of this explanatory variable.
finite \[y_{t+1}^Q = \mu + \beta_1 x_t^Q + \beta_2 x_{t-1}^Q + \beta_3 x_{t-2}^Q + \dots + \beta_n x_{t-n+1}^Q + u_{t+1}\]
infinite \[y_{t+1}^Q = \mu + \beta_1 x_t^Q + \beta_2 x_{t-1}^Q + \beta_3 x_{t-2}^Q + \dots + u_{t+1}\]

Distributed Lag Models - Unrestricted estimation

The simplest way to estimate the parameters is by OLS, assuming a fixed maximum lag. However, multicollinearity among the lagged regressors often arises, leading to high variance of the coefficient estimates.
The most important finite distributed lag model is due to Almon (1965). The Almon lag assumes that the \(n + 1\) lag weights are related to \(P + 1\) linearly estimable underlying parameters \((P < n)\) according to \[\beta_i = \sum_{j=0}^P \theta_j i^j\] where \(i = 0, \dots, n\).
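As a rough illustration of the Almon restriction, the Python sketch below generates the lag weights from a handful of polynomial coefficients and checks that the weighted sum of lags can equivalently be written as a regression on \(P + 1\) transformed regressors. The function names (`almon_weights`, `almon_regressors`) and all numerical values are hypothetical, chosen only for this example.

```python
def almon_weights(theta, n):
    """Return beta_0..beta_n with beta_i = sum_j theta[j] * i**j."""
    return [sum(t * i ** j for j, t in enumerate(theta)) for i in range(n + 1)]

def almon_regressors(x_lags, P):
    """Return z_0..z_P with z_j = sum_i i**j * x_{t-i}.

    x_lags[i] holds x_{t-i}. Regressing y on the z_j by OLS would
    estimate theta directly instead of the n + 1 separate betas.
    """
    return [sum(i ** j * xv for i, xv in enumerate(x_lags)) for j in range(P + 1)]

# Hypothetical quadratic polynomial (P = 2) over n = 4 lags:
theta = [1.0, 0.5, -0.25]
betas = almon_weights(theta, 4)
print(betas)  # [1.0, 1.25, 1.0, 0.25, -1.0]

# Made-up lag values x_t, x_{t-1}, ..., x_{t-4}:
x_lags = [2.0, 1.0, 0.0, -1.0, 3.0]
direct = sum(b * xv for b, xv in zip(betas, x_lags))
z = almon_regressors(x_lags, P=2)
via_theta = sum(t * zj for t, zj in zip(theta, z))
```

Here five lag weights are pinned down by three parameters, and `direct` and `via_theta` coincide, which is what makes the underlying parameters linearly estimable.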

Autoregressive Distributed Lag Models

Autoregressive Distributed Lag Model - \(ADL(P_Y, P_X)\) \[y_{t+1}^Q = \mu + \alpha_1 y_t^Q + \dots + \alpha_{P_Y} y_{t-P_Y+1}^Q + \beta_1 x_t^Q + \beta_2 x_{t-1}^Q + \dots + \beta_{P_X} x_{t-P_X+1}^Q + u_{t+1}\]
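To show how the lags of both variables line up against the one-step-ahead target, the following Python sketch builds the regressor rows of an ADL regression from two same-frequency series. It is a minimal sketch under stated assumptions: `adl_design` is a hypothetical helper, both series are assumed to have equal length with lag orders of at least one, and the data are invented.

```python
def adl_design(y, x, p_y, p_x):
    """Build (rows, targets) for an ADL(p_y, p_x) regression.

    The row for time t is [1, y_t, ..., y_{t-p_y+1}, x_t, ..., x_{t-p_x+1}]
    and is paired with the target y_{t+1}. Indices are 0-based list positions.
    """
    start = max(p_y, p_x) - 1            # first t with enough history
    rows, targets = [], []
    for t in range(start, len(y) - 1):   # stop one early: the target is y[t + 1]
        y_lags = [y[t - i] for i in range(p_y)]
        x_lags = [x[t - i] for i in range(p_x)]
        rows.append([1.0] + y_lags + x_lags)
        targets.append(y[t + 1])
    return rows, targets

# Made-up quarterly series of equal length:
y = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [10.0, 20.0, 30.0, 40.0, 50.0]

rows, targets = adl_design(y, x, p_y=1, p_x=2)
for row, target in zip(rows, targets):
    print(row, "->", target)
```

The resulting rows and targets could then be passed to any OLS routine. Each extra lag adds a column, which is why long lag structures quickly become expensive in parameters.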

Original univariate MIDAS regression model

\[y_t = \alpha + \beta \, midas^K (\theta) x_t^{(m)} + \epsilon_t\]

where \(midas^K\) smooths the \(K\) past values of the covariate \(x_t^{(m)}\), sampled \(m\) times per low-frequency period, using a functional polynomial: \[midas^K (\theta) x_t^{(m)} := \sum_{k=1}^K \frac {f_K (k,\theta)}{\sum_{l=1}^K f_K (l,\theta)} x_{t-(k-1)/m}^{(m)} \]
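The normalized weighting in this definition can be sketched in a few lines of Python. This is not the implementation of any package: `midas_weights` and `midas_term` are hypothetical names, the geometric weight function is just a placeholder for \(f_K\), and the monthly data and coefficient values are invented.

```python
def midas_weights(f, K, theta):
    """Normalized weights f(k, theta) / sum_l f(l, theta) for k = 1..K."""
    raw = [f(k, theta) for k in range(1, K + 1)]
    total = sum(raw)
    return [r / total for r in raw]

def midas_term(x_high, t_index, K, f, theta):
    """Weighted sum of the K most recent high-frequency values.

    x_high  : high-frequency observations in time order
    t_index : list position of the last high-frequency observation
              belonging to low-frequency period t
    """
    if t_index - (K - 1) < 0:
        raise ValueError("not enough high-frequency history for K lags")
    w = midas_weights(f, K, theta)
    # k = 1 picks x_high[t_index], k = 2 the observation before it, and so on.
    return sum(w[k - 1] * x_high[t_index - (k - 1)] for k in range(1, K + 1))

# Placeholder weight function (an assumption): geometric decay in k.
def geometric(k, theta):
    return theta ** k

# Twelve made-up monthly observations; index 11 closes the fourth quarter.
x_monthly = [float(v) for v in range(1, 13)]
w = midas_weights(geometric, 3, 0.5)
term = midas_term(x_monthly, 11, 3, geometric, 0.5)

# Fitted value for hypothetical coefficients alpha = 0.2 and beta = 0.1.
alpha, beta = 0.2, 0.1
y_hat = alpha + beta * term
print(w, term, y_hat)
```

Because the weights are normalized to sum to one, the slope \(\beta\) is identified separately from the shape parameters \(\theta\): \(\beta\) scales the overall effect, while \(\theta\) only decides how that effect is spread across lags.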

An extended MIDAS regression

\(y_t^Q\): the quarterly sampled stationary variable that we aim to predict,
\(x_t^M\): a vector of \(N_M\) stationary monthly variables,
\(x_t^D\): a vector of \(N_D\) stationary daily variables.
The extended MIDAS model enables the mixing of daily and monthly information: \[y_t^Q = \alpha + \phi y_{t-1}^Q + \sum_{j=1}^{N_M} \gamma_j \, midas^{K_M} (\omega_j) x_{t,j}^M +\sum_{i=1}^{N_D} \beta_i \, midas^{K_D} (\theta_i) x_{t,i}^D + \epsilon_t\]

Parameterization

There are several possible parameterizations of the MIDAS polynomial weights including, for example, the U-MIDAS (unrestricted MIDAS polynomial), normalized Beta probability density function, normalized exponential Almon lag polynomial, and polynomial specification with step functions.

The exponential Almon function \(\gamma\): \[f_K (k,\theta) = \gamma_K(k,\theta_1,\theta_2) = \exp(\theta_1 k +\theta_2 k^2)\]
The “Beta-Lag” function \(\Xi\), based on the Beta function: \[f_K (k,\theta) = \Xi_K(k,\theta_1,\theta_2) = \left(\frac{k}{K}\right)^{\theta_1 -1} \left(1-\frac{k}{K}\right)^{\theta_2 -1} \frac{\Gamma(\theta_1 +\theta_2)}{\Gamma(\theta_1)\Gamma(\theta_2)}\]
The restricted “Beta-Lag” function \(\Psi\), with \(\theta_1 = 1\) and \(\theta_2 > 1\), of the form: \[f_K (k,\theta) = \Psi_K(k,\theta) = \theta \left(1-\frac{k}{K}\right)^{\theta -1}\]
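The three weighting functions are straightforward to evaluate numerically. The Python sketch below transcribes them as written here and normalizes the results. The function names and parameter values are assumptions for illustration, and the code is not meant to reproduce any particular package.

```python
import math

def exp_almon(k, theta1, theta2):
    """Exponential Almon: exp(theta1 * k + theta2 * k**2)."""
    return math.exp(theta1 * k + theta2 * k ** 2)

def beta_lag(k, K, theta1, theta2):
    """Beta-Lag function evaluated at k / K.

    Note: at k = K the factor (1 - k/K) is zero, so theta2 < 1 would
    raise ZeroDivisionError; this sketch assumes theta2 >= 1.
    """
    u = k / K
    const = math.gamma(theta1 + theta2) / (math.gamma(theta1) * math.gamma(theta2))
    return u ** (theta1 - 1) * (1 - u) ** (theta2 - 1) * const

def restricted_beta_lag(k, K, theta):
    """Restricted Beta-Lag (theta1 = 1): theta * (1 - k/K)**(theta - 1)."""
    return theta * (1 - k / K) ** (theta - 1)

def normalize(raw):
    """Rescale raw weights so that they sum to one."""
    total = sum(raw)
    return [r / total for r in raw]

K = 5
ks = range(1, K + 1)

# With theta1 = theta2 = 0 the exponential Almon weights are flat,
# so simple time averaging is recovered as a special case.
flat = normalize([exp_almon(k, 0.0, 0.0) for k in ks])

# Hypothetical shape parameter theta = 3 gives declining weights.
declining = normalize([restricted_beta_lag(k, K, 3.0) for k in ks])

# The full Beta-Lag with theta1 = 1 agrees with the restricted form,
# since Gamma(1 + theta) / (Gamma(1) * Gamma(theta)) = theta.
full = [beta_lag(k, K, 1.0, 3.0) for k in ks]
restricted = [restricted_beta_lag(k, K, 3.0) for k in ks]

print(flat)
print(declining)
```

One design consequence worth noting: the restricted Beta-Lag with \(\theta > 1\) assigns zero weight to the last lag \(k = K\), so the effective number of lags is \(K - 1\).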

AR-MIDAS

U-MIDAS: MIDAS regressions with unrestricted lag polynomials

FAMIDAS : Dynamic factor MIDAS

The choice of indicators
Different methods to tackle the “curse of dimensionality”:
- factor models [Giannone et al., 2008]
- bridge models [Barhoumi et al., 2012]
- ridge regression models [Exterkate et al., 2011]
- LASSO/LARS [Tibshirani, 1996]
- Bayesian shrinkage [De Mol et al., 2008]
In the context of MIDAS regression, the corresponding approach is the dynamic factor MIDAS (FAMIDAS) [Frale and Monteforte, 2011].

MS-U-MIDAS : Markov-switching MIDAS model

MIDAS regression R

# Install the released version from CRAN
install.packages("midasr")
library(midasr)
# or install the development version from GitHub
# install.packages("devtools")
library(devtools)
install_github("mpiktas/midasr")

The midasr R package provides econometric methods for working with mixed frequency data. The package provides tools for estimating time series MIDAS regression, where response and explanatory variables are of different frequency, e.g. quarterly vs monthly. The fitted regression model can be tested for adequacy and then used for forecasting.

Regional Analysis and Aggregation

From a forecasting point of view, pooling disaggregated forecasts should help reduce the variance of the forecast errors, because potential heterogeneity can be captured better.
- improvements in forecast accuracy
- the use of pooled region-specific forecasts improves upon the forecasting performance for countrywide GDP