An Introduction to BME

Kinspride Duah

2024-06-16

Introduction

The bme package is used make spatial interpolations for unobserved locations using hard and soft-interval data. This vignette provides an overview of basic features in bme. We load bme by running

library(bme.kd)

If you use bme in a formal publication or report, please cite it. Citing bme lets us devote more resources to it in the future. We view the bme citation by running

citation(package = "bme.kd")

The Data

Environmental data are often imprecise due to various limitations and uncertainties in the measuring process. As a result, they often consist of a combination of both precise and imprecise information, referred to as hard and soft data, respectively. Often in practice, soft data are characterized as intervals as a simple form to properly preserve the underlying imprecision. All the data sets we use in this vignette are data frames objects. The data sets comprise of reliability-targeted snow loads in California, Colorado and Idaho. Each data set has 9 columns: station, latitude, longitude, elevation, center, lower, upper, eco3} and type. The column description are given below:

Sources consist of measurements of snow depth (SNWD) and snow load, which are measured as the water equivalent of snow on the ground (WESD). WESD is precise (hard data) whilst SNWD is is imprecise (soft data). Here you can load and view the california data.

# Load the example dataset
data("california_data", package = "bme.kd")

# View the dataset
head(california_data)

Bayesian Maximum Entropy (BME)

Bayesian maximum entropy (BME; Christakos (1990)) is a spatiotemporal random field (STRF) prediction method that incorporates knowledge and data from various sources and of varying precision, and combines them systematically to achieve improved prediction accuracy. The BME approach combines the entropy theory and operational Bayesian statistics to create a scientific mathematical framework for analysis and spatial/spatiotemporal mapping (Christakos, 1990). The entropy theory allows for integrating the general and specification knowledge bases. The general knowledge relies on background knowledge and the expertise of others. On the other hand, the specification knowledge is gained through personal experience with the particular situation, including empirical observations and site-specific data (Christakos, 1998). This feature is particularly advantageous in data-driven analytical environments as it allows for the utilization of general knowledge bases that might otherwise be overlooked.

One of the key benefits of BME is its ability to utilize measured data beyond single-value observations (Christakos, 2000) by distinguishing between hard and soft data. Accurate measurements obtained from real-time observation devices, numerical simulation, etc., are considered hard data, while uncertain measurements expressed in terms of interval values, probability statements, empirical charts, among others, are classified as soft data. Soft data can result from various sources of uncertainty such as measurement inaccuracies, modeling uncertainties, and human error. The BME framework enables rigorous incorporation of both hard and soft data types in the interpolation process to maximize the information input and create a most inclusive prediction (Christakos, 1998).

The Bayesian maximum entropy (BME) for spatiotemporal interpolation was introduced by Christakos (1990). The method aims to create a systematic framework for organizing scientific information of multiple types into the general \((\mathcal{G})\) and specification \((\mathcal{S})\) knowledge bases, which are then combined into the total physical knowledge \(\mathcal{K}\), thus, \(\mathcal{K} = \mathcal{G} \cup \mathcal{S}\). By utilizing spatiotemporal mapping and the appropriate \(\mathcal{K}\), one can gain a broader and more complete understanding of a natural occurrence or situation. This approach leads to the three main stages of knowledge acquisition, processing, and integration: a prior stage, pre-posterior stage, and posterior stage (Christakos, 1998).

Let \(\boldsymbol{Z}_{map} = \{\boldsymbol{Z}_h, \boldsymbol{Z}_s, Z_k\}\) denote the SRF of the complete set of hard data locations \(\mathcal{H}\), soft data locations \(\mathcal{S}\), and an arbitrary unmeasured location \(\boldsymbol{x}_k\). Let \(\boldsymbol{z}_{map} = \{\boldsymbol{z}_h, \boldsymbol{z}_s, z_k \}\) be a realization of \(\boldsymbol{Z}_{map}\). We are interested in predicting \(z_k\) based on the data \(\boldsymbol{z}=\{\boldsymbol{z}_h, \boldsymbol{z}_s\}\).

Prior stage

The goal of the prior stage is to maximize information relative to the general knowledge \(\mathcal{G}\) (Serre, 1999), which begins with a set of assumptions and constraints. Equation (2) represents the mathematical formulation of the prior knowledge in terms of the constraints \[\begin{equation}\label{eqn:E2} \bar{g}_\beta = \int g_\beta\left(\boldsymbol{z}_{map}\right) f_{\mathcal{G}}\left(\boldsymbol{z}_{map}\right) d\boldsymbol{z}_{map},\ \ \beta = 1, 2, \dots, N_{\beta}. \end{equation}\] , N_{}. \end{equation} Here, \(g_{\beta}\) are a set of known functions of \(\boldsymbol{z}_{map}\), and \(f_{\mathcal{G}} \left(\boldsymbol{z}_{map}\right)\) is the prior pdf of \(\boldsymbol{Z}_{map}\) that represents our knowledge before observing the data. The expectations \(\bar{g}_{\beta}\) provide the spatial moments of interest. There are a series of popular choices regarding the form of \(g_{\alpha}\) to represent the prior knowledge. For example, (3) and (4) represent the means, and covariances, where \(|\mathcal{H}|\) and \(|\mathcal{S}|\) are the sizes of hard and soft data, respectively. \[\begin{equation}\label{eqn:cons_mean} g_{\alpha}\left(\boldsymbol{z}_{map}\right) =\boldsymbol{z}_{map,\alpha},\ \ \alpha=1,\cdots,|\mathscr{H}|+|\mathscr{S}|+1, \end{equation}\]s,||+||+1, \end{equation} \[\begin{eqnarray}\label{eqn:cons_cov} g_{\alpha}(\boldsymbol{z}_{map})= \left(\boldsymbol{z}_{map,\alpha}-\bar{\boldsymbol{z}}_{map,\alpha}\right) \left(\boldsymbol{z}_{map,\alpha'}-\bar{\boldsymbol{z}}_{map,\alpha'}\right),\ \ \alpha, \alpha' =1,\cdots,|\mathscr{H}|+|\mathscr{S}|+1, \end{eqnarray}\]athscr{H}|+||+1, \end{eqnarray} Higher spatial orders, if desired, can be formulated similarly. Shannon’s information entropy is used as a measure of uncertainty contained in the prior pdf \(f_{\mathcal{G}}\), which is expressed as
\[\begin{equation}\label{eqn:shannon} \mathbb{E}\left[\text{Info}_\mathcal{G}(\boldsymbol{Z}_{map})\right] = - \int \log\left[{f_\mathcal{G} (\boldsymbol{z}_{map})}\right] \hspace{.2em} f_\mathcal{G} (\boldsymbol{z}_{map}) \hspace{.2em} d \boldsymbol{z}_{map}. \end{equation}\] In general, a more concentrated pdf has lower entropy. To determine the shape of the prior pdf, a process is carried out that maximizes the entropy function (\(\ref{eqn:shannon}\)) subject to the constraints specified in (2) that represent the prior knowledge . In particular, if the constrains are restricted to the first two spatial moments (3)-(4), the \(f_\mathcal{G}\) that maximizes the Shannon’s entropy (\(\ref{eqn:shannon}\)) is the multivariate Gaussian distribution with mean \(\overline{\boldsymbol{z}}_{map}\) and covariance matrix \(\textbf{C}_{map}\) . In our study, since the data is the regression residual and has zero mean, we have \(\overline{\boldsymbol{z}}_{map} = \boldsymbol{0}\), and consequently the prior pdf is given by \[\begin{equation}\label{eqn:prior_pdf} f_\mathcal{G} (\boldsymbol{z}_{map})=(2\pi)^{-(n+1)/2} \rvert{\textbf{C}_{map}\lvert}^{-1/2} \exp\left\{ -\dfrac{1}{2} \boldsymbol{z}_{map}^T \hspace{.3em}\textbf{C}_{map}^{-1}\boldsymbol{z}_{map} \right\}, \end{equation}\] where \(\textbf{C}_{map}\) has the form \[\begin{equation}\label{eqn:C} \textbf{C}_{map} = \begin{bmatrix} \textbf{C}_{h,h} & \textbf{C}_{h,s} & \textbf{C}_{h,k} \\ \textbf{C}_{s,h} & \textbf{C}_{s,s} & \textbf{C}_{s,k} \\ \textbf{C}_{k,h} & \textbf{C}_{k,s} & \textbf{C}_{k,k} \end{bmatrix} \end{equation}\] with regards to the covariances of hard data, soft data, and prediction locations, as well as their cross-covariances.

Pre-posterior stage

This stage considers the specification knowledge \(\mathcal{S}\), mostly the physical data \(\boldsymbol{z}_{data} = \boldsymbol{z} = \{\boldsymbol{z}_h, \boldsymbol{z}_s \}\), to be incorporated in the posterior stage below to provide information leading to the posterior estimation. The process mainly involves collecting and organizing additional information in the forms of hard and soft data suitable for the BME framework. In practice, this is often a nontrivial process and considered more of an art rather than a quantitative task. The soft data in this study are restricted to intervals, but generally they can take a variety of other forms such as probability functions, physical models, and functional data. An important feature of BME, and kriging as well, is that some or all of the data are often used twice in the whole estimation process. More specifically, data are used implicitly in the prior stage to derive the prior pdf, and then used again explicitly in the posterior stage to calculate the posterior estimates. Particularly, in our study, the component blocks \(\boldsymbol{C}_{h,h}, \boldsymbol{C}_{s,s}, \boldsymbol{C}_{k,k}, \boldsymbol{C}_{s,h}, \boldsymbol{C}_{k,h}\) of the prior covariance matrix \(\boldsymbol{C}_{map}\) defined in (\(\ref{eqn:C}\)) are estimated empirically from the data \(\boldsymbol{z}_{data}\) by the corresponding sample covariances. The covariance \(\boldsymbol{C}_{s,s}\) for the sole soft data is computed with an added uncertainty from a prior distribution \(f_s(\boldsymbol{z}_s)\) for the soft data. More precisely, the diagonals of \(\boldsymbol{C}_{s,s}\) are defined as the sum of the usual covariance function at lag \(0\) and the variance of the prior soft data distribution, i.e., \[\begin{equation} diag\left\{\boldsymbol{C}_{s,s}\right\} =C(0)+\sigma^2\left\{f_s(\boldsymbol{z}_s)\right\}, \end{equation}\] where \(C(0)\) is estimated by the sample variance. The value of \(\sigma^2\left\{f_s(\boldsymbol{z}_s)\right\}\) depends on the form of \(f_s\), which is usually assumed to be a uniform distribution without additional prior information, and thus takes the form
\[\begin{equation} \sigma^2\left\{f_s(\boldsymbol{z}_s)\right\}=\frac{1}{12}\left(\boldsymbol{b}-\boldsymbol{a}\right)^2, \end{equation}\] where \(\boldsymbol{b}\) and \(\boldsymbol{a}\) are the upper and lower bounds of the soft-intervals.

Posterior stage

The goal of the posterior stage is to maximize the posterior probability given both \(\mathcal{G}\) and \(\mathcal{S}\), with a focus on coherency. This stage is further divided into two steps. In the first step, the posterior probability density for any realization \(\tilde{\boldsymbol{z}}\) of all locations conditioning on the data \(\boldsymbol{z}(\boldsymbol{x})\) is computed according to the Bayes rule as: \[\begin{equation}\label{eqn:post} \pi(\tilde{\boldsymbol{z}})=A^{-1}l(\boldsymbol{z}|\tilde{\boldsymbol{z}})f_G(\boldsymbol{z}), \end{equation}\] where \(A\) is a normalizing constant. The likelihood function \(l(\boldsymbol{z}|\tilde{\boldsymbol{z}})\) of the data \(\boldsymbol{z}(\boldsymbol{x})\) given a realization \(\tilde{\boldsymbol{z}}\) is the key in this Bayesian formulation, which is defined for three cases. If \(\boldsymbol{z}(\boldsymbol{x}_h)\ne\tilde{\boldsymbol{z}}_h\), then the likelihood is zero. If \(\boldsymbol{z}(\boldsymbol{x}_h)=\tilde{\boldsymbol{z}}_h\) and there are no soft data, then the likelihood is one. The third case is when the hard data coincides exactly with the realization \(\tilde{\boldsymbol{z}}_h\) and there are additional soft data \(\boldsymbol{z}(\boldsymbol{x}_s)\), and the likelihood is \(l(\boldsymbol{z}(\boldsymbol{x})|\tilde{\boldsymbol{z}})=\Pi_{i=1}^{n_s}f_{s,i}(\tilde{z}_{s,i}):=f_s(\tilde{\boldsymbol{z}}_s)\). Namely, \[\begin{equation}\label{eqn:likelihood} l(\boldsymbol{z}(\boldsymbol{x})|\tilde{\boldsymbol{z}})= \begin{cases} f_s(\tilde{\boldsymbol{z}}_s)=\Pi_{i=1}^{n_s}f_{s,i}(\tilde{z}_{s,i}) & \boldsymbol{z}_h=\tilde{\boldsymbol{z}}_h,\ n_s>0,\\ 1, & \boldsymbol{z}_h=\tilde{\boldsymbol{z}}_h,\ n_s=0,\\ 0, & \boldsymbol{z}_h\ne\tilde{\boldsymbol{z}}_h. \end{cases} \end{equation}\] In the second step, the posterior pdf \(\pi(\tilde{z}_k)\) of \(Z(\boldsymbol{x}_k)\) is calculated as the marginal of the posterior joint pdf (\(\ref{eqn:post}\)). For the case when there are both hard and soft data, since the hard data are fixed values, the integration is only with respect to the soft data. More precisely, \[\begin{eqnarray*} \pi(\tilde{z}_k) &=& A^{-1}\int\pi(\tilde{\boldsymbol{z}})d(\tilde{\boldsymbol{z}}_h, \tilde{\boldsymbol{z}}_s) =A^{-1}\int l(\boldsymbol{z}|\tilde{\boldsymbol{z}})f_G(\tilde{\boldsymbol{z}}) d(\tilde{\boldsymbol{z}}_h, \tilde{\boldsymbol{z}}_s)\\ &=&A^{-1}\int_{\tilde{z}_s}f_s(\tilde{\boldsymbol{z}}_s)f_G(\boldsymbol{z}_h,\tilde{\boldsymbol{z}}_s,\tilde{z}_k)d(\tilde{\boldsymbol{z}}_s). \end{eqnarray*}\] For simplification of notation, the above formula for the posterior pdf of \(Z(\boldsymbol{x}_k)\) is equivalently expressed as \[\begin{equation}\label{eqn:post-k} \pi(z_k)=A^{-1}\int f_s(\boldsymbol{z}_s)f_G(\boldsymbol{z}_h, \boldsymbol{z}_s, z_k)d\boldsymbol{z}_s, \end{equation}\] with the note that the \(\boldsymbol{z}_s\) in the integral is not the actual soft data but any realization within the associated domain. For the special case when \(\boldsymbol{z}_s\) are intervals with lower bounds \(\boldsymbol{a}\) and upper bounds \(\boldsymbol{b}\), the posterior pdf (\(\ref{eqn:post-k}\)) takes the form: \[\begin{equation}\label{eqn:post-k-int} \pi(z_k)=A^{-1}\int_{\boldsymbol{a}}^{\boldsymbol{b}}f_G(\boldsymbol{z}_h, \boldsymbol{z}_s, z_k)d\boldsymbol{z}_s, \end{equation}\] where \(\int_{\boldsymbol{a}}^{\boldsymbol{b}}\) is an \(n_s\)-dimensional integral over the intervals \([a_i, b_i]\), \(i=1,\cdots, n_s\).

The updated pdf (\(\ref{eqn:post-k}\)) at the posterior stage is a conditional pdf expressed in terms of the prior pdf and the new data and information considered at the pre-posterior stage as follows \[\begin{equation}\label{E6} f_{\mathcal{K}}(z_k|\boldsymbol{z}_{data}) = A^{-1} \int f(\boldsymbol{z}_s) f_\mathcal{G} (\boldsymbol{z}_{map}) d\boldsymbol{z}_s. \end{equation}\] Equation (\(\ref{E6}\)) represents an application of the Bayes law, where \(\boldsymbol{z}_{data}\) indicates a known knowledge and \(z_k|\boldsymbol{z}_{data}\) represents the potential values \(z_k\) of the unknown knowledge. It is important to recognize that Equation (\(\ref{E6}\)) differs from the traditional Bayesian law by using the mapping prior \(f_{\mathcal{K}}(\boldsymbol{z}_{map})\) rather than the usual prior \(f_{\mathcal{G}}(z_k)\). The mathematical analysis involved in the three stages of the BME approach leads to the posterior pdf at location \(\boldsymbol{x}_k\) expressed as \[\begin{equation}\label{E7} f_{\mathcal{K}}(z_k) = A^{-1} \int^{\boldsymbol{b}}_{\boldsymbol{a}} f_\mathcal{G} (\boldsymbol{z}_s | \boldsymbol{z}_h, z_k) d\boldsymbol{z}_s f_\mathcal{G} (\boldsymbol{z}_h, z_k). \end{equation}\] This posterior pdf, which may not be Gaussian provides a full picture of the prediction situation at the target point. The posterior information, which refers to the updated knowledge after the data \(\boldsymbol{z}(\boldsymbol{x})\) of the pre-posterior stage have been incorporated, is given by \[\begin{equation*} \text{Info}_\mathcal{G}(z_k|\boldsymbol{z}_{data}) = -\log\left[{f_{\mathcal{K}} (z_k|\boldsymbol{z}_{data})}\right]. \end{equation*}\] Interested readers are referred to for a more detailed description of the BME framework.

BME estimate and uncertainty

For a targeted location \(\boldsymbol{x}_k\) that is typically found on a regular grid that includes specific points \(\boldsymbol{x}_i, i = 1,\dots,n\) of the available data \(\boldsymbol{z}_{data}\), the BME prediction is defined as the posterior mean
\[\begin{equation}\label{eqn:post-mean} \bar{z}_{k|\mathcal{K}} = \int z_k f_{\mathcal{K}}(z_k)d z_k, \end{equation}\] which is a nonlinear function of the available data and aims to minimize the squared prediction error. The prediction variance defined as \[\begin{equation}\label{eqn:post-var} \sigma^2_{k|\mathcal{K}} = \int \left(z_k - \bar{z}_{k|K} \right)^2 f_{\mathcal{K}}(z_k)d z_k, \end{equation}\] provides a measure of uncertainty associated with the predicted values. It is of note that the quantity \(\sigma^2_{k|\mathcal{K}}\) corresponds to the variance of the prediction error \(e_k = z_k - \bar{z}_{k|\mathcal{K}}\), not the variance of the actual predictor \(\bar{z}_{k|K}\). It is a commonly used accuracy measure in traditional estimation theory that effectively evaluates the estimation error, especially when the error distribution is not overly complex.

Calculate Group Mean

The calculate_group_mean function calculates the mean of a numeric variable for rows where a specified binary variable is TRUE. In our example dataset, it can be used to find the mean age of people who like coffee.

Here is how you can use the calculate_group_mean function:

calculate_group_mean(example_data, "Age", "Likes_Coffee")