The bme package is used make spatial interpolations for
unobserved locations using hard and soft-interval data. This vignette
provides an overview of basic features in bme. We load
bme by running
If you use bme in a formal publication or report, please
cite it. Citing bme lets us devote more resources to it in
the future. We view the bme citation by running
Environmental data are often imprecise due to various limitations and
uncertainties in the measuring process. As a result, they often consist
of a combination of both precise and imprecise information, referred to
as hard and soft data, respectively. Often in practice, soft data are
characterized as intervals as a simple form to properly preserve the
underlying imprecision. All the data sets we use in this vignette are
data frames objects. The data sets comprise of reliability-targeted snow
loads in California, Colorado and Idaho. Each data set has 9 columns:
station, latitude, longitude, elevation, center, lower, upper, eco3}
and type. The column description are given below:
station: name of the snow measuring station.(latitude, longitude): geographic location of the
measuring station.eco3: eco-region of the measuring station.center: name of the snow measuring station.(lower, upper): lower and upper limits of soft-interval
data.type: type of snow measurement.Sources consist of measurements of snow depth (SNWD) and snow load,
which are measured as the water equivalent of snow on the ground (WESD).
WESD is precise (hard data) whilst SNWD is is imprecise (soft data).
Here you can load and view the california data.
Bayesian maximum entropy (BME; Christakos (1990)) is a spatiotemporal random field (STRF) prediction method that incorporates knowledge and data from various sources and of varying precision, and combines them systematically to achieve improved prediction accuracy. The BME approach combines the entropy theory and operational Bayesian statistics to create a scientific mathematical framework for analysis and spatial/spatiotemporal mapping (Christakos, 1990). The entropy theory allows for integrating the general and specification knowledge bases. The general knowledge relies on background knowledge and the expertise of others. On the other hand, the specification knowledge is gained through personal experience with the particular situation, including empirical observations and site-specific data (Christakos, 1998). This feature is particularly advantageous in data-driven analytical environments as it allows for the utilization of general knowledge bases that might otherwise be overlooked.
One of the key benefits of BME is its ability to utilize measured data beyond single-value observations (Christakos, 2000) by distinguishing between hard and soft data. Accurate measurements obtained from real-time observation devices, numerical simulation, etc., are considered hard data, while uncertain measurements expressed in terms of interval values, probability statements, empirical charts, among others, are classified as soft data. Soft data can result from various sources of uncertainty such as measurement inaccuracies, modeling uncertainties, and human error. The BME framework enables rigorous incorporation of both hard and soft data types in the interpolation process to maximize the information input and create a most inclusive prediction (Christakos, 1998).
The Bayesian maximum entropy (BME) for spatiotemporal interpolation was introduced by Christakos (1990). The method aims to create a systematic framework for organizing scientific information of multiple types into the general \((\mathcal{G})\) and specification \((\mathcal{S})\) knowledge bases, which are then combined into the total physical knowledge \(\mathcal{K}\), thus, \(\mathcal{K} = \mathcal{G} \cup \mathcal{S}\). By utilizing spatiotemporal mapping and the appropriate \(\mathcal{K}\), one can gain a broader and more complete understanding of a natural occurrence or situation. This approach leads to the three main stages of knowledge acquisition, processing, and integration: a prior stage, pre-posterior stage, and posterior stage (Christakos, 1998).
Let \(\boldsymbol{Z}_{map} = \{\boldsymbol{Z}_h, \boldsymbol{Z}_s, Z_k\}\) denote the SRF of the complete set of hard data locations \(\mathcal{H}\), soft data locations \(\mathcal{S}\), and an arbitrary unmeasured location \(\boldsymbol{x}_k\). Let \(\boldsymbol{z}_{map} = \{\boldsymbol{z}_h, \boldsymbol{z}_s, z_k \}\) be a realization of \(\boldsymbol{Z}_{map}\). We are interested in predicting \(z_k\) based on the data \(\boldsymbol{z}=\{\boldsymbol{z}_h, \boldsymbol{z}_s\}\).
The goal of the prior stage is to maximize information relative to
the general knowledge \(\mathcal{G}\)
(Serre, 1999), which begins with a set of assumptions and constraints.
Equation (2) represents the mathematical formulation of the
prior knowledge in terms of the constraints \[\begin{equation}\label{eqn:E2}
\bar{g}_\beta =
\int
g_\beta\left(\boldsymbol{z}_{map}\right) f_{\mathcal{G}}\left(\boldsymbol{z}_{map}\right)
d\boldsymbol{z}_{map},\ \
\beta = 1, 2, \dots, N_{\beta}.
\end{equation}\] , N_{}. \end{equation} Here, \(g_{\beta}\) are a set of known functions of
\(\boldsymbol{z}_{map}\), and \(f_{\mathcal{G}}
\left(\boldsymbol{z}_{map}\right)\) is the prior pdf of \(\boldsymbol{Z}_{map}\) that represents our
knowledge before observing the data. The expectations \(\bar{g}_{\beta}\) provide the spatial
moments of interest. There are a series of popular choices regarding the
form of \(g_{\alpha}\) to represent the
prior knowledge. For example, (3) and (4) represent the means, and
covariances, where \(|\mathcal{H}|\)
and \(|\mathcal{S}|\) are the sizes of
hard and soft data, respectively. \[\begin{equation}\label{eqn:cons_mean}
g_{\alpha}\left(\boldsymbol{z}_{map}\right)
=\boldsymbol{z}_{map,\alpha},\ \
\alpha=1,\cdots,|\mathscr{H}|+|\mathscr{S}|+1,
\end{equation}\]s,||+||+1, \end{equation} \[\begin{eqnarray}\label{eqn:cons_cov}
g_{\alpha}(\boldsymbol{z}_{map})=
\left(\boldsymbol{z}_{map,\alpha}-\bar{\boldsymbol{z}}_{map,\alpha}\right)
\left(\boldsymbol{z}_{map,\alpha'}-\bar{\boldsymbol{z}}_{map,\alpha'}\right),\
\
\alpha, \alpha'
=1,\cdots,|\mathscr{H}|+|\mathscr{S}|+1,
\end{eqnarray}\]athscr{H}|+||+1, \end{eqnarray} Higher spatial
orders, if desired, can be formulated similarly. Shannon’s information
entropy is used as a measure of uncertainty contained in the prior pdf
\(f_{\mathcal{G}}\), which is expressed
as
\[\begin{equation}\label{eqn:shannon}
\mathbb{E}\left[\text{Info}_\mathcal{G}(\boldsymbol{Z}_{map})\right]
= - \int \log\left[{f_\mathcal{G} (\boldsymbol{z}_{map})}\right]
\hspace{.2em} f_\mathcal{G} (\boldsymbol{z}_{map}) \hspace{.2em} d
\boldsymbol{z}_{map}.
\end{equation}\] In general, a more concentrated pdf has lower
entropy. To determine the shape of the prior pdf, a process is carried
out that maximizes the entropy function (\(\ref{eqn:shannon}\)) subject to the
constraints specified in (2) that represent the prior knowledge . In
particular, if the constrains are restricted to the first two spatial
moments (3)-(4), the \(f_\mathcal{G}\)
that maximizes the Shannon’s entropy (\(\ref{eqn:shannon}\)) is the multivariate
Gaussian distribution with mean \(\overline{\boldsymbol{z}}_{map}\) and
covariance matrix \(\textbf{C}_{map}\)
. In our study, since the data is the regression residual and has zero
mean, we have \(\overline{\boldsymbol{z}}_{map} =
\boldsymbol{0}\), and consequently the prior pdf is given by
\[\begin{equation}\label{eqn:prior_pdf}
f_\mathcal{G} (\boldsymbol{z}_{map})=(2\pi)^{-(n+1)/2}
\rvert{\textbf{C}_{map}\lvert}^{-1/2}
\exp\left\{ -\dfrac{1}{2} \boldsymbol{z}_{map}^T
\hspace{.3em}\textbf{C}_{map}^{-1}\boldsymbol{z}_{map} \right\},
\end{equation}\] where \(\textbf{C}_{map}\) has the form \[\begin{equation}\label{eqn:C}
\textbf{C}_{map} =
\begin{bmatrix}
\textbf{C}_{h,h} & \textbf{C}_{h,s} & \textbf{C}_{h,k} \\
\textbf{C}_{s,h} & \textbf{C}_{s,s} & \textbf{C}_{s,k} \\
\textbf{C}_{k,h} & \textbf{C}_{k,s} & \textbf{C}_{k,k}
\end{bmatrix}
\end{equation}\] with regards to the covariances of hard data,
soft data, and prediction locations, as well as their
cross-covariances.
This stage considers the specification knowledge \(\mathcal{S}\), mostly the physical data
\(\boldsymbol{z}_{data} = \boldsymbol{z} =
\{\boldsymbol{z}_h, \boldsymbol{z}_s \}\), to be incorporated in
the posterior stage below to provide information leading to the
posterior estimation. The process mainly involves collecting and
organizing additional information in the forms of hard and soft data
suitable for the BME framework. In practice, this is often a nontrivial
process and considered more of an art rather than a quantitative task.
The soft data in this study are restricted to intervals, but generally
they can take a variety of other forms such as probability functions,
physical models, and functional data. An important feature of BME, and
kriging as well, is that some or all of the data are often used twice in
the whole estimation process. More specifically, data are used
implicitly in the prior stage to derive the prior pdf, and then used
again explicitly in the posterior stage to calculate the posterior
estimates. Particularly, in our study, the component blocks \(\boldsymbol{C}_{h,h}, \boldsymbol{C}_{s,s},
\boldsymbol{C}_{k,k}, \boldsymbol{C}_{s,h},
\boldsymbol{C}_{k,h}\) of the prior covariance matrix \(\boldsymbol{C}_{map}\) defined in (\(\ref{eqn:C}\)) are estimated empirically
from the data \(\boldsymbol{z}_{data}\)
by the corresponding sample covariances. The covariance \(\boldsymbol{C}_{s,s}\) for the sole soft
data is computed with an added uncertainty from a prior distribution
\(f_s(\boldsymbol{z}_s)\) for the soft
data. More precisely, the diagonals of \(\boldsymbol{C}_{s,s}\) are defined as the
sum of the usual covariance function at lag \(0\) and the variance of the prior soft data
distribution, i.e., \[\begin{equation}
diag\left\{\boldsymbol{C}_{s,s}\right\}
=C(0)+\sigma^2\left\{f_s(\boldsymbol{z}_s)\right\},
\end{equation}\] where \(C(0)\)
is estimated by the sample variance. The value of \(\sigma^2\left\{f_s(\boldsymbol{z}_s)\right\}\)
depends on the form of \(f_s\), which
is usually assumed to be a uniform distribution without additional prior
information, and thus takes the form
\[\begin{equation}
\sigma^2\left\{f_s(\boldsymbol{z}_s)\right\}=\frac{1}{12}\left(\boldsymbol{b}-\boldsymbol{a}\right)^2,
\end{equation}\] where \(\boldsymbol{b}\) and \(\boldsymbol{a}\) are the upper and lower
bounds of the soft-intervals.
The goal of the posterior stage is to maximize the posterior probability given both \(\mathcal{G}\) and \(\mathcal{S}\), with a focus on coherency. This stage is further divided into two steps. In the first step, the posterior probability density for any realization \(\tilde{\boldsymbol{z}}\) of all locations conditioning on the data \(\boldsymbol{z}(\boldsymbol{x})\) is computed according to the Bayes rule as: \[\begin{equation}\label{eqn:post} \pi(\tilde{\boldsymbol{z}})=A^{-1}l(\boldsymbol{z}|\tilde{\boldsymbol{z}})f_G(\boldsymbol{z}), \end{equation}\] where \(A\) is a normalizing constant. The likelihood function \(l(\boldsymbol{z}|\tilde{\boldsymbol{z}})\) of the data \(\boldsymbol{z}(\boldsymbol{x})\) given a realization \(\tilde{\boldsymbol{z}}\) is the key in this Bayesian formulation, which is defined for three cases. If \(\boldsymbol{z}(\boldsymbol{x}_h)\ne\tilde{\boldsymbol{z}}_h\), then the likelihood is zero. If \(\boldsymbol{z}(\boldsymbol{x}_h)=\tilde{\boldsymbol{z}}_h\) and there are no soft data, then the likelihood is one. The third case is when the hard data coincides exactly with the realization \(\tilde{\boldsymbol{z}}_h\) and there are additional soft data \(\boldsymbol{z}(\boldsymbol{x}_s)\), and the likelihood is \(l(\boldsymbol{z}(\boldsymbol{x})|\tilde{\boldsymbol{z}})=\Pi_{i=1}^{n_s}f_{s,i}(\tilde{z}_{s,i}):=f_s(\tilde{\boldsymbol{z}}_s)\). Namely, \[\begin{equation}\label{eqn:likelihood} l(\boldsymbol{z}(\boldsymbol{x})|\tilde{\boldsymbol{z}})= \begin{cases} f_s(\tilde{\boldsymbol{z}}_s)=\Pi_{i=1}^{n_s}f_{s,i}(\tilde{z}_{s,i}) & \boldsymbol{z}_h=\tilde{\boldsymbol{z}}_h,\ n_s>0,\\ 1, & \boldsymbol{z}_h=\tilde{\boldsymbol{z}}_h,\ n_s=0,\\ 0, & \boldsymbol{z}_h\ne\tilde{\boldsymbol{z}}_h. \end{cases} \end{equation}\] In the second step, the posterior pdf \(\pi(\tilde{z}_k)\) of \(Z(\boldsymbol{x}_k)\) is calculated as the marginal of the posterior joint pdf (\(\ref{eqn:post}\)). For the case when there are both hard and soft data, since the hard data are fixed values, the integration is only with respect to the soft data. More precisely, \[\begin{eqnarray*} \pi(\tilde{z}_k) &=& A^{-1}\int\pi(\tilde{\boldsymbol{z}})d(\tilde{\boldsymbol{z}}_h, \tilde{\boldsymbol{z}}_s) =A^{-1}\int l(\boldsymbol{z}|\tilde{\boldsymbol{z}})f_G(\tilde{\boldsymbol{z}}) d(\tilde{\boldsymbol{z}}_h, \tilde{\boldsymbol{z}}_s)\\ &=&A^{-1}\int_{\tilde{z}_s}f_s(\tilde{\boldsymbol{z}}_s)f_G(\boldsymbol{z}_h,\tilde{\boldsymbol{z}}_s,\tilde{z}_k)d(\tilde{\boldsymbol{z}}_s). \end{eqnarray*}\] For simplification of notation, the above formula for the posterior pdf of \(Z(\boldsymbol{x}_k)\) is equivalently expressed as \[\begin{equation}\label{eqn:post-k} \pi(z_k)=A^{-1}\int f_s(\boldsymbol{z}_s)f_G(\boldsymbol{z}_h, \boldsymbol{z}_s, z_k)d\boldsymbol{z}_s, \end{equation}\] with the note that the \(\boldsymbol{z}_s\) in the integral is not the actual soft data but any realization within the associated domain. For the special case when \(\boldsymbol{z}_s\) are intervals with lower bounds \(\boldsymbol{a}\) and upper bounds \(\boldsymbol{b}\), the posterior pdf (\(\ref{eqn:post-k}\)) takes the form: \[\begin{equation}\label{eqn:post-k-int} \pi(z_k)=A^{-1}\int_{\boldsymbol{a}}^{\boldsymbol{b}}f_G(\boldsymbol{z}_h, \boldsymbol{z}_s, z_k)d\boldsymbol{z}_s, \end{equation}\] where \(\int_{\boldsymbol{a}}^{\boldsymbol{b}}\) is an \(n_s\)-dimensional integral over the intervals \([a_i, b_i]\), \(i=1,\cdots, n_s\).
The updated pdf (\(\ref{eqn:post-k}\)) at the posterior stage is a conditional pdf expressed in terms of the prior pdf and the new data and information considered at the pre-posterior stage as follows \[\begin{equation}\label{E6} f_{\mathcal{K}}(z_k|\boldsymbol{z}_{data}) = A^{-1} \int f(\boldsymbol{z}_s) f_\mathcal{G} (\boldsymbol{z}_{map}) d\boldsymbol{z}_s. \end{equation}\] Equation (\(\ref{E6}\)) represents an application of the Bayes law, where \(\boldsymbol{z}_{data}\) indicates a known knowledge and \(z_k|\boldsymbol{z}_{data}\) represents the potential values \(z_k\) of the unknown knowledge. It is important to recognize that Equation (\(\ref{E6}\)) differs from the traditional Bayesian law by using the mapping prior \(f_{\mathcal{K}}(\boldsymbol{z}_{map})\) rather than the usual prior \(f_{\mathcal{G}}(z_k)\). The mathematical analysis involved in the three stages of the BME approach leads to the posterior pdf at location \(\boldsymbol{x}_k\) expressed as \[\begin{equation}\label{E7} f_{\mathcal{K}}(z_k) = A^{-1} \int^{\boldsymbol{b}}_{\boldsymbol{a}} f_\mathcal{G} (\boldsymbol{z}_s | \boldsymbol{z}_h, z_k) d\boldsymbol{z}_s f_\mathcal{G} (\boldsymbol{z}_h, z_k). \end{equation}\] This posterior pdf, which may not be Gaussian provides a full picture of the prediction situation at the target point. The posterior information, which refers to the updated knowledge after the data \(\boldsymbol{z}(\boldsymbol{x})\) of the pre-posterior stage have been incorporated, is given by \[\begin{equation*} \text{Info}_\mathcal{G}(z_k|\boldsymbol{z}_{data}) = -\log\left[{f_{\mathcal{K}} (z_k|\boldsymbol{z}_{data})}\right]. \end{equation*}\] Interested readers are referred to for a more detailed description of the BME framework.
For a targeted location \(\boldsymbol{x}_k\) that is typically found
on a regular grid that includes specific points \(\boldsymbol{x}_i, i = 1,\dots,n\) of the
available data \(\boldsymbol{z}_{data}\), the BME prediction
is defined as the posterior mean
\[\begin{equation}\label{eqn:post-mean}
\bar{z}_{k|\mathcal{K}} = \int z_k f_{\mathcal{K}}(z_k)d z_k,
\end{equation}\] which is a nonlinear function of the available
data and aims to minimize the squared prediction error. The prediction
variance defined as \[\begin{equation}\label{eqn:post-var}
\sigma^2_{k|\mathcal{K}} = \int \left(z_k - \bar{z}_{k|K} \right)^2
f_{\mathcal{K}}(z_k)d z_k,
\end{equation}\] provides a measure of uncertainty associated
with the predicted values. It is of note that the quantity \(\sigma^2_{k|\mathcal{K}}\) corresponds to
the variance of the prediction error \(e_k =
z_k - \bar{z}_{k|\mathcal{K}}\), not the variance of the actual
predictor \(\bar{z}_{k|K}\). It is a
commonly used accuracy measure in traditional estimation theory that
effectively evaluates the estimation error, especially when the error
distribution is not overly complex.
The calculate_group_mean function calculates the mean of
a numeric variable for rows where a specified binary variable is TRUE.
In our example dataset, it can be used to find the mean age of people
who like coffee.
Here is how you can use the calculate_group_mean
function: