Estimation and model selection for mixing graphical models

Magno Severino & Florencia Leonardi

October 26, 2023

Agenda

  • Introduction and motivation.
  • Graph Estimator.
  • Algorithms for estimation.
  • Applications to real data.
  • Directions for future work.

  • Process \(\mathbf{X}=\{X^{(i)}\colon -\infty < i < \infty\}\).

  • \(X^{(i)}=(X^{(i)}_{1}, \ldots, X^{(i)}_{d}),\) with set of vertices \(V = \{1, \dots, d\}.\)

  • \(X^{(i)}_{v} \in A\), \(A\) is a finite alphabet.

  • Process observed at time \(i \in \{1, \ldots, n\}.\)

  • Subscript indicates the vertex and superscript the time the observation was taken.

  • We assume that the process \(\mathbf{X}\) has an underlying graph \(G^*.\)

Motivation 1: Leonardi et al. (2023)

Consider these estimated neighborhoods for a set of 5 discrete random variables: \[ \mathrm{ne(X_1)=\{2,3\}}, \qquad \mathrm{ne(X_2)=\{3\}}, \qquad\mathrm{ne(X_3)=\{1,2,4,5\}}, \] \[ \mathrm{ne(X_4)=\{3\}}, \qquad \mathrm{ne(X_5)=\{3,4\}}. \]

Motivation 2: Leonardi et al. (2021)

  • Leonardi et al. (2021) presents a model selection criterion for independence within random vectors.

  • \(\mathbf{Y}=\{Y^{(i)}\}\) is multivariate stochastic process.

  • \(Y^{(i)}=(Y^{(i)}_{1}, \ldots, Y^{(i)}_{7}),\) with joint probability distribution \(p(y_1,\ldots,y_7).\)

  • The method decomposes the vector’s distribution function into independent blocks.

  • If the set of independence of \(\mathbf{Y}\) is \(U = \{3\}\), then \[p(y_1,\ldots,y_7) = p(y_1,y_2,y_3)p(y_4,y_5,y_6,y_7).\]

  • Therefore, \((Y_1,Y_2,Y_3) \perp (Y_4,Y_5,Y_6,Y_7).\)

Objectives of the research

  • Proposal: Method to estimate the graph of conditional dependencies for multivariate stochastic processes with mixing conditions.

  • Aims to overcome limitations of previous works:

    • Leonardi et al. (2023): estimator for iid data only
    • Leonardi et al. (2021): method assumes decomposition into subvectors with immediate neighbor dependencies.
  • Proposed solution: penalized pseudo-likelihood criterion for entire graph estimation for multivariate processes with mixing conditions.

  • Key advantages:

    • Handles non-iid data.
    • Provides a global estimation approach.

Mixing condition

Definition: For \(i<j\), let \(X^{(i:j)}\) denote the sequence of vectors \(X^{(i)}, X^{(i+1)}, \ldots, X^{(j)}.\) We say the process \(\mathbf{X} = \{X^{(i)}\colon -\infty < i < \infty\}\) satisfies a mixing condition with rate \(\{\psi(\ell)\}_{\ell \in \mathbb R}\) if for each \(k, m \in \mathbb N\) and each \(x^{(1:k)} \in (A^d)^k\), \(x^{(1:m)}\in (A^d)^m\) with \(\mathbb{P}(X^{(1:m)}=x^{(1:m)})>0,\) we have that \[ \Bigl| \mathbb{P} \bigl(X^{(n:(n+k-1))}=x^{(1:k)}\, |\, X^{(1:m)}=x^{(1:m)}\bigr) - \mathbb{P} \bigl(X^{(n:(n+k-1))} =x^{(1:k)}\bigr)\Bigr| \qquad\qquad \] \[ \qquad\qquad\qquad\qquad\qquad\qquad\qquad \leq \psi(n-m) \mathbb{P}\bigl(X^{(n:(n+k-1))}=x^{(1:k)}\bigr), \] for \(n\geq m+\ell.\)

Regularized graph estimator

We take a regularized pseudo maximum likelihood approach to estimate the graph \(G^*\), given a sample \(x^{(1)},\dotsc, x^{(n)}\) of the stochastic process.

The pseudo log likelihood estimator is given by \[ \log \widehat L(G) \;=\; \sum_{v \in V} \:\sum_{(a_v\in A)} \:\sum_{a_{G(v)}\in A^{|G(v)|}} N(a_v,a_{G(v)})\log \widehat\pi(a_v|a_{G(v)})\,, \] The conditional probabilities are estimated from the observed data \[ \hat\pi(a_v|a_{G(v)}) = \frac{N(a_v, a_{G(v)})}{N(a_{G(v)})}. \]

Therefore, the graph estimator is \(\widehat G = \underset{G}{\arg\max}\Big\{\log \widehat L(G) - \lambda_n \sum_{v \in V} |A|^{|G(v)|}\Big\}.\)

We prove the consistency of this estimator.

Algorithms for estimation

  • Exact algorithm
  • Simulated Annealing
  • Stepwise selection algorithm

Illustrative example

Forward stepwise algorithm




Backward stepwise algorithm




São Francisco river data

Directions for future work

  • Generalize the approach to handle both discrete and continuous random variables.
    • It would broaden the applicability of the estimator.
  • Adapt the model to estimate the direction of edges in graphs.
    • Address intricate dependencies among random variables.
    • Improve accuracy and insights in various applications.

Conclusion

  • Study motivated by Leonardi et al. (2021) and Leonardi et al. (2023).

  • Overcame limitations of previous works.

    • Handles non-iid data .
    • Provides a global estimation approach for the entire graph.
  • Proved consistency and convergence rate of the proposed estimator.

  • Proposed practical implementation methods: exact, simulated annealing, and stepwise algorithm.

  • Factors affecting estimation process: number of nodes, alphabet size, and graph structure.

References

  • Leonardi, F., Carvalho, R., and Frondana, I. (2023). Structure recovery for partially observed discrete markov random fields on graphs under not necessarily positive distributions. Scandinavian Journal of Statistics.

  • Leonardi, F., Lopez-Rosenfeld, M., Rodriguez, D., Severino, M. T. F., and Sued, M. (2021). Independent block identification in multivariate time series. Journal of Time Series Analysis, 42(1):19–33.

Acknowledgements



This work was produced as part of the activities of the Research, Innovation and Dissemination Center for Neuromathematics (grant FAPESP 2013/07699-0).

The authors also received financial support from CAPES and CNPq.