Magno Severino
Area: 1.a. Inference for Stochastic Processes.
Related area: 3.a. Statistical Learning and Data Science.
Introduction
Theoretical Background
Research Proposal
Viability
Previous research (Severino, 2025, Stochastic Processes and their Applications):
Research proposals:
\(\bf X^{(i)} = \big(X_1^{(i)}, X_2^{(i)}, \dots, X_d^{(i)}\big) \in A^d,\) where \(A\) is a finite alphabet.
Invariant distribution \(\pi\); probability space \(\big((A^d)^\mathbb{N}, \mathcal{F}, \mathbb{P}\big).\)
Process \(\bf X = \{X^{(i)}: i \in\mathbb{N}\}\) is stationary and ergodic.
We assume the process \(\bf X\) has an underlying graph \(G^* = (V, E^*)\) encoding the conditional dependencies in \(\pi\).
Let \(X^{(i:j)}\) denote the sequence of vectors \(X^{(i)}, X^{(i+1)}, \ldots, X^{(j)}.\)
\(\bf X = \{X^{(i)}\colon -\infty < i < \infty\}\) satisfies a mixing condition with rate \(\{\psi(\ell)\}_{\ell \in \mathbb N}\) if \[\begin{equation*} \begin{split} \Bigl| \mathbb P \bigl(X^{(n:(n+k-1))}=x^{(1:k)}\, |\, X^{(1:m)}=x^{(1:m)}&\bigr) - \mathbb P \bigl( X^{(n:(n+k-1))} =x^{(1:k)}\bigr)\Bigr| \\ &\leq \psi(n-m) \mathbb P\bigl(X^{(n:(n+k-1))}=x^{(1:k)}\bigr), \end{split} \end{equation*}\] for \(n\geq m+\ell\) and for each \(k, m \in \mathbb N\) and each \(x^{(1:k)} \in (A^d)^k\), \(x^{(1:m)}\in (A^d)^m\) with \(\mathbb P(X^{(1:m)}=x^{(1:m)})>0.\)
Example: consider a process \(\bf X\) taking values in \(\{a,b\}^3\) and the sample of size \(5\) below. Then
| \(X_1\) | \(X_2\) | \(X_3\) |
|---|---|---|
| \(b\) | \(a\) | \(a\) |
| \(a\) | \(b\) | \(b\) |
| \(b\) | \(b\) | \(a\) |
| \(a\) | \(a\) | \(a\) |
| \(b\) | \(a\) | \(a\) |
\[ \widehat{\pi}\big(\{X_1=a\}\big) = \frac{2}{5}, \qquad \widehat{\pi}\big(\{X_1=a, X_3=a\}\big) = \frac{1}{5}, \] \[ \widehat{\pi}\big(\{X_1=b\}\big\vert\{X_2=a, X_3=a\}\big) = \frac{2}{3}, \] since \(\hat\pi(\{X_2=a, X_3=a\})>0.\)
Formally, assume we observe a sample of size \(n\) of the process, denoted by \(\{x^{(i)}\colon i=1,\dots,n\}\), and let \(N(a_W)\) denote the number of indices \(i\) with \(x^{(i)}_W = a_W\). Then, for any \(W\subset V\) and any \(a_W\in A^{W}\), \[\begin{equation*} \widehat{\pi}(a_W) = \frac{N(a_W)}{n}; \quad \quad \widehat\pi(a_{W'}|a_{W}) = \frac{\widehat\pi(a_{W'\cup W})}{\widehat\pi(a_{W})}\,, \end{equation*}\] for \(\:\widehat\pi(a_W)>0,\) two disjoint subsets \(W,W'\subset V,\) and configurations \(a_W\in A^W, a_{W'}\in A^{W'}\).
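These estimators reduce to simple counting. Below is a minimal Python sketch (the helper names `empirical_prob` and `empirical_cond_prob` are illustrative, not from the paper) that reproduces the numbers in the example above:

```python
# Sample from the example above: n = 5 observations of (X_1, X_2, X_3),
# stored 0-indexed, so vertex 0 corresponds to X_1.
sample = [("b", "a", "a"),
          ("a", "b", "b"),
          ("b", "b", "a"),
          ("a", "a", "a"),
          ("b", "a", "a")]

def empirical_prob(sample, assignment):
    """pi_hat(a_W): fraction of observations matching `assignment`,
    a dict {vertex: value} describing the configuration a_W."""
    n_match = sum(all(x[v] == a for v, a in assignment.items()) for x in sample)
    return n_match / len(sample)

def empirical_cond_prob(sample, target, given):
    """pi_hat(a_W' | a_W), defined whenever pi_hat(a_W) > 0."""
    return empirical_prob(sample, {**target, **given}) / empirical_prob(sample, given)

print(empirical_prob(sample, {0: "a"}))                         # 0.4   (= 2/5)
print(empirical_prob(sample, {0: "a", 2: "a"}))                 # 0.2   (= 1/5)
print(empirical_cond_prob(sample, {0: "b"}, {1: "a", 2: "a"}))  # 0.66… (= 2/3)
```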
Given a graph \(G=(V,E)\) and \(v \in V,\) define \[G(v) = \big\{u \in V: (u,v) \in E \big\},\] the set of neighbors of \(v\) in graph \(G.\)
In the example above, if \(v = X_1\) and its neighbors are \(X_2\) and \(X_3\), then \(G(v)=\{X_2, X_3\}.\)
Then \[\begin{equation*} \widehat\pi(a_v|a_{G(v)}) = \frac{\widehat\pi(a_{\{v\}\cup G(v)})}{\widehat\pi(a_{G(v)})}. \end{equation*}\]
Given a graph \(G=(V,E)\) and a sample of the process, we define the log-pseudo-likelihood function by \[\begin{equation} \log L(G) \;=\; \sum_{v \in V} \:\sum_{a_v\in A} \:\sum_{a_{G(v)}\in A^{|G(v)|}} N(a_v,a_{G(v)})\log \pi (a_v \vert a_{G(v)}). \end{equation}\]
As the conditional probabilities \(\pi\) are not known, we can estimate them from the data: \[\begin{equation} \log \widehat L(G) \;=\; \sum_{v \in V} \:\sum_{a_v\in A} \:\sum_{a_{G(v)}\in A^{|G(v)|}} N(a_v,a_{G(v)})\log \widehat \pi (a_v \vert a_{G(v)}). \end{equation}\]
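A direct implementation accumulates \(N(a_v, a_{G(v)})\log\widehat\pi(a_v \mid a_{G(v)})\) over observed configurations; zero counts contribute nothing. A minimal sketch (a hypothetical helper, not the authors' code), reusing the `sample` list above and representing \(G\) by its neighborhood map:

```python
import math
from collections import Counter

def log_pseudo_likelihood(sample, G):
    """log L_hat(G) for a neighborhood map G: dict vertex -> set of neighbors.
    Terms with N(a_v, a_G(v)) = 0 contribute nothing, so it suffices to sum
    over configurations actually observed in the sample."""
    total = 0.0
    for v, nbrs in G.items():
        order = sorted(nbrs)
        joint = Counter((x[v], tuple(x[u] for u in order)) for x in sample)
        marginal = Counter(tuple(x[u] for u in order) for x in sample)
        for (a_v, a_nbrs), n in joint.items():
            # pi_hat(a_v | a_G(v)) = N(a_v, a_G(v)) / N(a_G(v))
            total += n * math.log(n / marginal[a_nbrs])
    return total

# Example: complete graph on the three variables of the earlier sample.
print(log_pseudo_likelihood(sample, {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}))
```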
Theorem (Severino, 2025, Stochastic Processes and their Applications):
Let \(\{X^{(i)}: i \in \mathbb{N}\}\) be a stationary process that satisfies the mixing condition presented before with rate \(\psi(\ell) = O(1/\ell^{1+\epsilon})\) for some \(\epsilon>0.\)
Then, by taking \(\lambda_n = c \log n,\) for \(c>0,\) we have that \[\begin{equation*} \widehat G = \underset{G}{\arg\max}\Big\{\log \widehat L(G) - \lambda_n \sum_{v \in V} |A|^{|G(v)|}\Big\} \end{equation*}\] satisfies \(\widehat G=G^*\) eventually almost surely as \(n\to \infty.\)
Combine Proposal 1 (countably infinite graphs) and Proposal 2 (continuous data).
Develop a unified method for estimating MRFs with continuous variables and countably infinitely many vertices.
Provide a versatile tool for modeling complex stochastic processes.
Minimal financial costs.
Theoretical development does not require any additional resources.
Simulation and real data application of the developed algorithms will be conducted using existing computer server resources at the department.
Anticipated expenses: presenting the research at international scientific conferences.
Severino, M. T. F., & Leonardi, F. (2025). Model selection for Markov random fields on graphs under a mixing condition. Stochastic Processes and their Applications.
Leonardi, F., Lopez-Rosenfeld, M., Rodriguez, D., Severino, M. T. F., & Sued, M. (2021). Independent block identification in multivariate time series. Journal of Time Series Analysis.
Leonardi, F., Carvalho, R., & Frondana, I. (2023). Structure recovery for partially observed discrete Markov random fields on graphs under not necessarily positive distributions. Scandinavian Journal of Statistics.
Lauritzen, S. L. (1996). Graphical models (Vol. 17). Clarendon Press.
Oodaira, H., & Yoshihara, K. I. (1971). The law of the iterated logarithm for stationary processes satisfying mixing conditions. Kodai Mathematical Seminar Reports.
Based on the Law of the Iterated Logarithm for stationary processes with polynomially decaying mixing rates, proved in Oodaira and Yoshihara (1971), we can derive the rate of convergence of the empirical probabilities to the true probabilities of the process.
Proposition 1 (Typicality): Assume the process \(\{X^{(i)}\colon i \in \mathbb{Z}\}\) satisfies the mixing condition with rate \(\psi(\ell) = O(1/\ell^{1+\epsilon}),\) for some \(\epsilon>0.\) Then, for any \(W \subset V,\) any \(a_W \in A^W,\) and any \(\delta > 0,\) \[\begin{equation*} \Big\vert \widehat{\pi}(a_W) - \pi(a_W) \Big\vert < \sqrt{\frac{\delta\log n}{n}}, \end{equation*}\] eventually almost surely as \(n \rightarrow \infty.\)
Proposition 2 (Conditional typicality): Under the same mixing assumption, for any \(\delta > 0\), any disjoint sets \(W, W' \subset V,\) and any \(a_W \in A^{W}\) and \(a_{W'} \in A^{{W'}},\) we have that \[\begin{equation*} \Big\vert \widehat\pi(a_W|a_{W'}) - \pi(a_W|a_{W'}) \Big\vert < \sqrt{\frac{\delta\log n}{N(a_W)}}, \end{equation*}\] eventually almost surely as \(n \rightarrow \infty.\)
Overview of Theorem Proof
Consider \(\{\widehat G\neq G^*\} = \big\{G^* \subsetneq \widehat G \big\} \cup \big\{G^* \not\subset \widehat G \big\}.\)
We prove that, eventually almost surely as \(n\to\infty,\) neither of the cases above can happen, which implies that \(\widehat G = G^*.\)
Define \[H(G) = \log \widehat L(G) - \lambda_n \sum_{v \in V} |A|^{|G(v)|}.\]
Focus: determine the maximal value of \(H(\cdot)\) and identify the argument at which this maximum occurs.
Define \[\mathcal{G} = \Big\{ G = (V, E): E \subseteq (V\times V) \setminus \{(v,v):v \in V\}\Big\}.\]
Set \[\widehat G = \arg\max_{G \:\in\: \mathcal{G}} H(G).\]
Drawback: computational complexity, since for \(|V|=d\) vertices the search space has \(|\mathcal{G}| = 2^{{d(d-1)}/{2}}\) graphs.
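For small \(d\), the exhaustive maximization can still be written directly. A sketch, assuming the hypothetical `log_pseudo_likelihood` helper above and \(\lambda_n = c \log n\) supplied by the caller:

```python
from itertools import combinations

def H(sample, G, lam, A_size):
    """H(G) = log L_hat(G) - lambda_n * sum_v |A|^{|G(v)|}."""
    penalty = sum(A_size ** len(nbrs) for nbrs in G.values())
    return log_pseudo_likelihood(sample, G) - lam * penalty

def exhaustive_search(sample, d, lam, A_size):
    """Enumerate all 2^{d(d-1)/2} undirected graphs on d vertices and
    return an H-maximizer.  Feasible only for small d."""
    pairs = list(combinations(range(d), 2))
    best_G, best_val = None, float("-inf")
    for k in range(len(pairs) + 1):
        for edges in combinations(pairs, k):
            G = {v: set() for v in range(d)}
            for u, w in edges:
                G[u].add(w); G[w].add(u)
            val = H(sample, G, lam, A_size)
            if val > best_val:
                best_G, best_val = G, val
    return best_G
```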
Forward stepwise algorithm:
Starts with an empty graph.
Adds edges one at a time, selecting the edge that maximizes improvement in fit.
Stops when no further enhancement in fit is achieved.
Backward stepwise algorithm:
Begins with a complete graph.
Removes edges one at a time, selecting the edge that maximizes improvement in fit.
Stops when no further enhancement in fit is achieved.
\(H(G) = \log \widehat L(G) - \lambda_n \sum_{v \in V} |A|^{|G(v)|}.\)
A neighbor of a graph \(G\) is any graph obtained from \(G\) by adding or removing a single edge; both algorithms move between neighboring graphs (a sketch of the forward search follows).
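A sketch of the forward stepwise search under these conventions (a greedy heuristic built on the hypothetical `H` helper above; it need not find the global maximizer):

```python
from itertools import combinations

def forward_stepwise(sample, d, lam, A_size):
    """Greedy forward search: start from the empty graph and repeatedly move
    to the neighboring graph (one added edge) that most increases H,
    stopping when no single edge improves the fit."""
    G = {v: set() for v in range(d)}
    current = H(sample, G, lam, A_size)
    remaining = set(combinations(range(d), 2))
    while remaining:
        scored = []
        for u, w in remaining:
            G[u].add(w); G[w].add(u)          # try adding edge (u, w)
            scored.append((H(sample, G, lam, A_size), (u, w)))
            G[u].discard(w); G[w].discard(u)  # undo
        best_val, (u, w) = max(scored)
        if best_val <= current:               # no improvement: stop
            break
        G[u].add(w); G[w].add(u)
        current = best_val
        remaining.discard((u, w))
    return G
```

The backward version is symmetric: start from the complete graph and greedily remove the edge whose deletion most increases \(H\).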
Objective: Assess the performance of the proposed algorithms via simulation studies.
Scenario 1: Fixed True Graph
Scenario 2: Random Graphs with Varying Edge Number
Consider \(\mathbf{X} = (X_1, \dots, X_5).\)
Joint probability function of these variables: \[ p(x_1,x_2,x_3,x_4,x_5) = p(x_3) p(x_1|x_3) p(x_2|x_1,x_3) p(x_4|x_3) p(x_5|x_3). \]
Aim: assess the performance of the algorithms for estimation.
Data generation: Gibbs sampler algorithm.
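A generic Gibbs-sampler skeleton for this kind of data; the full conditionals, which by the Markov property depend only on each variable's neighbors, must be supplied by the user, and the burn-in and thinning values are illustrative:

```python
import random

def gibbs_sampler(full_conditionals, alphabet, d, n, burn_in=1000, thin=10):
    """Generic Gibbs sampler.  `full_conditionals[v]` is a user-supplied
    function mapping the current configuration x (a list of length d) to a
    dict {value: P(X_v = value | rest)}.  Returns n (thinned) samples."""
    x = [random.choice(alphabet) for _ in range(d)]
    out = []
    for it in range(burn_in + n * thin):
        for v in range(d):
            # Resample coordinate v from its full conditional by inversion.
            probs = full_conditionals[v](x)
            r, acc = random.random(), 0.0
            for value, p in probs.items():
                acc += p
                if r < acc:
                    x[v] = value
                    break
        if it >= burn_in and (it - burn_in) % thin == 0:
            out.append(tuple(x))
    return out
```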
Effects of sample size and penalizing constant value. Here, \(\lambda_N = c\log N.\)
Objective: generate several samples and assess the performance of the algorithms.
Metrics: underestimation error (\(ue\)), overestimation error (\(oe\)), and total error (\(te\)).
\(ue(G, \hat G) = \frac{\sum_{(v, w)}\mathbf{1}\{(v, w)\in E \text{ and } (v,w) \not\in \hat E\}}{\sum_{(v, w)}\mathbf{1}\{(v, w) \in E\}},\)
\(oe(G, \hat G) = \frac{\sum_{(v, w)}\mathbf{1}\{(v, w)\not\in E \text{ and } (v,w) \in \hat E\}}{\sum_{(v, w)}\mathbf{1}\{(v, w) \not\in E\}},\)
\(\qquad \qquad te(G, \hat G) = \frac{ue \sum_{(v, w)}\mathbf{1}\{(v, w) \in E\} + oe \sum_{(v, w)}\mathbf{1}\{(v, w) \not\in E\}}{|V|(|V|-1)/2},\) i.e., the total number of misclassified pairs divided by the number of vertex pairs.
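These metrics are simple to compute from edge sets. A sketch for graphs stored as sets of unordered edges (an illustrative representation, not from the paper):

```python
from itertools import combinations

def edge_errors(E_true, E_hat, d):
    """ue, oe, te for graphs given as sets of frozenset edges on {0,...,d-1}."""
    pairs = [frozenset(p) for p in combinations(range(d), 2)]
    missed = sum(1 for p in pairs if p in E_true and p not in E_hat)
    extra = sum(1 for p in pairs if p not in E_true and p in E_hat)
    n_edges = len(E_true)
    n_non = len(pairs) - n_edges
    ue = missed / n_edges if n_edges else 0.0
    oe = extra / n_non if n_non else 0.0
    te = (missed + extra) / len(pairs)  # = (ue*n_edges + oe*n_non) / total pairs
    return ue, oe, te

# Example on 4 vertices: true edges 0-1 and 1-2; estimated edges 0-1 and 2-3.
E = {frozenset({0, 1}), frozenset({1, 2})}
E_hat = {frozenset({0, 1}), frozenset({2, 3})}
print(edge_errors(E, E_hat, 4))  # ue = 1/2, oe = 1/4, te = 2/6
```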
Utilize \(k\)-fold cross-validation to assess model performance and select optimal penalizing constant values.
Compute the pseudo-log-likelihood over the validation sets for different constant values, where \(N_i\) denotes the counts on the \(i\)-th validation fold: \[\begin{equation} \mathrm{CV}_k^{(i)}(c) = \sum_{v \in V} \:\sum_{a_v\in A} \:\sum_{a_{\hat G(v)}\in A^{|\hat G(v)|}} N_i(a_v,a_{\widehat G(v)})\log \widehat\pi(a_v|a_{\widehat G(v)})\,. \end{equation}\]
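A sketch of the selection loop, assuming the hypothetical `forward_stepwise` and `H` helpers above; counts \(N_i\) are taken on the validation fold and \(\widehat\pi\) is estimated on the training folds:

```python
import math
from collections import Counter

def cv_score(train, valid, G):
    """CV fold score: counts N_i from the validation fold, conditional
    probabilities pi_hat estimated from the training folds."""
    score = 0.0
    for v, nbrs in G.items():
        order = sorted(nbrs)
        joint = Counter((x[v], tuple(x[u] for u in order)) for x in train)
        marg = Counter(tuple(x[u] for u in order) for x in train)
        for cfg, n_i in Counter((x[v], tuple(x[u] for u in order)) for x in valid).items():
            if joint[cfg] > 0:  # skip configurations unseen in training
                score += n_i * math.log(joint[cfg] / marg[cfg[1]])
    return score

def select_constant(sample, d, A_size, constants, k=5):
    """k-fold CV over penalizing constants c, refitting the graph per fold.
    (For dependent data, contiguous blocks may be preferable to this
    interleaved split.)"""
    folds = [sample[i::k] for i in range(k)]
    best_c, best = None, float("-inf")
    for c in constants:
        total = 0.0
        for i in range(k):
            train = [x for j, f in enumerate(folds) if j != i for x in f]
            lam = c * math.log(len(train))
            total += cv_score(train, folds[i], forward_stepwise(train, d, lam, A_size))
        if total > best:
            best_c, best = c, total
    return best_c
```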
\(5\)-fold cross-validation error for Example 11, for a sample of size \(5{,}000\) and a grid of penalizing constant values.
Water volume measured at \(d\) stations along the river’s course, denoted as \(X_{u}\), where \(u = 1, \ldots, d\).
Vector \(\mathbf{X}\) observed at discrete time intervals (10-day mean), from January 1977 to December 2016.
Process \(\mathbf{X}^n=\{\mathbf{X}^{(i)}: 1 \leq i \leq n\}\), \(\mathbf{X}^{(i)}= (X_{1}^{(i)}, \ldots, X_{d}^{(i)})\).
Forward stepwise algorithm combined with the \(5\)-fold cross-validation approach described above; results below.
Results are similar to those of Leonardi et al. (2021).
Estimated graph, considering the penalizing constant value chosen by cross-validation.