This section contains the setup and the various utility functions used throughout.
Libraries used:
library(data.table)
library(mvtnorm)
library(ggplot2)
library(MASS)
library(cubature)
Compiled code (any models used are shown later):
# mod1 <- rstan::stan_model(".//stan//logit_notrand.stan", verbose = FALSE)
# rstan::expose_stan_functions(mod1)
Previously, various foundational concepts of entropy were introduced. By the chain rule, we have:
\[ \begin{aligned} H(X_1, X_2, \dots, X_n) &= \sum_{i=1}^n H(X_i | X_{i-1}, \dots, X_1) \\ &\le \sum_{i=1}^n H(X_i) \end{aligned} \]
Equality holds if and only if \(X_i\) is independent of \(X_{i-1}, \dots, X_1\) for all \(i\).
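As a quick numerical check of this inequality, consider a small, made-up joint distribution over two dependent binary variables (purely illustrative, and unrelated to the trial setting below); the joint entropy comes out below the sum of the marginal entropies.
# Arbitrary joint distribution over two dependent binary variables
p_xy <- matrix(c(0.4, 0.1,
                 0.1, 0.4), nrow = 2, byrow = TRUE)
# Shannon entropy of a discrete distribution
H <- function(p) -sum(p[p > 0] * log(p[p > 0]))
# Joint entropy (about 1.19) versus sum of the marginal entropies (about 1.39)
H(p_xy)
H(rowSums(p_xy)) + H(colSums(p_xy))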
Assume a K-arm parallel group trial where the responses are from independent normal distributions. Below, I generate data using rmvnorm with a covariance matrix whose off-diagonals are set to zero, i.e. the samples in each arm are independent. These data are simply meant to be indicative of what you would obtain as samples from a posterior distribution after observing the data; in other words, they are what you would have to work with after fitting a Bayesian model via MCMC.
# True response rates for the K arms on the log-odds scale
mu <- qlogis(c(0.15, 0.20, 0.27))
K <- length(mu)
S <- diag(K)
# Start from the identity and introduce non-zero off-diagonals
S2 <- S
S2[1,1] <- 0.2
S2[2,2] <- 0.5
S2[1,2] <- 0.8
S2[2,3] <- 0.7
S2[1,3] <- 0.5
S2[2,1] <- 0.8
S2[3,2] <- 0.7
S2[3,1] <- 0.5
# need it to be positive semi-definite
S2 <- S2 %*% t(S2)
# S1 retains only the diagonal of S2, i.e. the arms are independent
S1 <- S2
S1[upper.tri(S1)] <- 0
S1[lower.tri(S1)] <- 0
m1 <- rmvnorm(1e3, mu, S1)
pairs(m1)
Because independence is satisfied, the entropy of the joint distribution can be computed as the sum of the entropies from the marginals.
While the random variables are continuous, I compute the Shannon entropy by estimating a kernel density for each marginal and then discretising the continuous distribution over a grid. An alternative is to use a histogram as the approximation (sketched after the kernel-based version below), albeit one that is perhaps more choppy than the kernel estimate.
entropy1 <- function(m){
# For each column of random variates:
H <- sum(unlist(lapply(1:ncol(m), function(ii){
# Compute the density of the marginal
z <- density(m[, ii], from = -5, to = 5)
# Obtain the grid spacing
delta <- diff(z$x)[1]
# Probability
y <- z$y * delta
# Log probability
ly <- log(z$y * delta)
# Set problem values to zero
ly[y == 0] <- 0
# Calculate Shannon entropy
-sum(y * ly)
})))
H
}
entropy1(m1)
## [1] 16.56226
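As mentioned above, a histogram could be used in place of the kernel density. A rough sketch is below; to keep the discretisation comparable, the bins are given the same width as the 512-point kernel grid, though the raw bin frequencies will make the estimate somewhat more choppy.
entropy1_hist <- function(m){
  # 1022 bins of width 10/511, the same spacing as the 512-point kernel grid,
  # but spanning a wider range so that all samples are covered
  brk <- seq(-10, 10, length.out = 1023)
  sum(unlist(lapply(1:ncol(m), function(ii){
    h <- hist(m[, ii], breaks = brk, plot = FALSE)
    # Probability mass in each bin
    y <- h$counts / sum(h$counts)
    # Log probability, with problem values set to zero
    ly <- log(y)
    ly[y == 0] <- 0
    # Shannon entropy of the discretised marginal
    -sum(y * ly)
  })))
}
entropy1_hist(m1)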
Compare this to a direct calculation based on the joint distribution (also done using a grid approximation). The result is similar to the above.
# humongous grid; the grid locations come from the kernel density estimate,
# while the probabilities below are from the true joint density
z <- density(m1[, 1], from = -5, to = 5)
x <- z$x
grid <- expand.grid(
x1 = x, x2 = x, x3 = x
)
p <- dmvnorm(grid, mu, S1)
p <- p / sum(p)
lp <- log(p)
lp[p == 0] <- 0
-sum(p * lp)
## [1] 16.44398
However, in other situations the K arms could be dependent, as in dose-ranging studies.
m2 <- rmvnorm(1e3, mu, S2)
pairs(m2)
In such a setting, calculating the entropy of the joint distribution as the sum of the entropies from the marginals will overstate its true value.
entropy1(m2)
## [1] 16.505
Instead, we should compute the joint entropy directly from the joint distribution. This will produce a lower value:
# humongous grid
z <- density(m2[, 1], from = -5, to = 5)
x <- z$x
grid <- expand.grid(
x1 = x, x2 = x, x3 = x
)
p <- dmvnorm(grid, mu, S2)
p <- p / sum(p)
lp <- log(p)
lp[p == 0] <- 0
-sum(p * lp)
## [1] 14.45336
Shannon tried to extend the definition of entropy to the continuous setting via the differential entropy
\[ h(f) = -\int_{\mathcal{X}} f(x) \log(f(x)) dx \]
which essentially just replaces the summation with an integral. However, Jaynes went on to define the limiting density of discrete points (LDDP), an adjustment to the differential entropy aimed at addressing some of its limitations. We will ignore the LDDP for now and just work with the differential entropy. For the multivariate normal distribution, the differential entropy is
\[ \frac{k}{2} + \frac{k}{2} \log(2\pi) + \frac{1}{2}\log(|\Sigma|) \]
where \(|\Sigma|\) in the last term is the determinant of the covariance matrix. So for the two settings described above, we would obtain:
K/2 + K/2 * log(2*pi) + 1/2 * log(det(S1))
## [1] 4.658515
K/2 + K/2 * log(2*pi) + 1/2 * log(det(S2))
## [1] 2.662266
which we could approximate via numerical integration:
# Integrand for the differential entropy (could be vectorised to speed this up)
ent_norm <- function(x, mu, S){
  # note the negative
  - dmvnorm(x, mu, S) * dmvnorm(x, mu, S, log = TRUE)
}
lwr <- c(-5, -5, -5)
upr <- c(5, 5, 5)
Hmv1 <- adaptIntegrate(ent_norm, lwr, upr, mu = mu, S = S1)
Hmv2 <- adaptIntegrate(ent_norm, lwr, upr, mu = mu, S = S2)
Hmv1$integral
## [1] 4.632769
Hmv2$integral
## [1] 2.648844
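As an aside, and hinting at why the LDDP adjustment exists, the grid-based joint entropies computed earlier differ from these differential entropies by approximately \(-K\log(\Delta)\), where \(\Delta\) is the spacing of the discretisation grid. Assuming the default 512-point density() grid over \((-5, 5)\), the adjusted values land close to the grid-based figures above (roughly 16.4 and 14.5):
# Grid spacing used by density(..., from = -5, to = 5) with the default n = 512
delta <- 10 / 511
# Differential entropy plus the discretisation correction
Hmv1$integral - K * log(delta)
Hmv2$integral - K * log(delta)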