Priors on variance-covariance matrices

The blme package by Vincent Dorie performs the useful task of imposing priors on the parameters of mixed-model fits executed by lme4. In particular, it allows for Wishart priors on the variance-covariance matrix of the random effects.

Rather than implement my own Wishart distribution function, I'll take one from the LaplacesDemon package (according to sos::findFn("dwishart"), there's also one in the mixAK package). (The density function is not very complicated to implement, and might even be implementable in terms of the Cholesky factor of the variance-covariance matrix, which would make it marginally more efficient in the context of lme4, but for now it's more efficient to steal rather than reinvent the wheel.)

Replicating the sequence from ?modular (in the development version of lme4):

library("lme4")
## 1. Parse the data and formula:
lmod <- lFormula(Reaction ~ Days + (Days | Subject), sleepstudy)
## 2.  Create the deviance function to be optimized:
devfun <- do.call(mkLmerDevfun, lmod)
## 3.  Optimize the deviance function:
opt <- optimizeLmer(devfun, verbose = TRUE)

## (NM) 20: f = 1749.37 at  1.45556 0.151111 0.244444
## (NM) 40: f = 1749.01 at  1.38592 0.153137 0.259747
## (NM) 60: f = 1747.09 at   1.0233 0.136541 0.282537
## (NM) 80: f = 1744.4 at     1.04784 -0.00494353    0.279921
## (NM) 100: f = 1743.65 at 0.985232 0.017029 0.226103
## (NM) 120: f = 1743.63 at  0.972338 0.0164651  0.229532
## (NM) 140: f = 1743.63 at 0.965964 0.014934 0.230881
## (NM) 160: f = 1743.63 at  0.966659 0.0151337  0.230886
## (NM) 180: f = 1743.63 at  0.966758 0.0151711  0.230915

## 4.  Package up the results:
fit1 <- mkMerMod(environment(devfun), opt, lmod$reTrms, fr = lmod$fr)

Now impose a prior: blme uses

for a grouping factor of dimension \( K \), df = \( K + 3 \). When the mode of the distribution exists, the scale is chosen to set the mode equal 10^{-2} times the identity matrix. When the mode does not exist, the mean is set to that value.

make_vc <- function(theta, n0) {
    m <- matrix(0, nrow = n0, ncol = n0)
    m[lower.tri(m, diag = TRUE)] <- theta
    m %*% t(m)
}
library("LaplacesDemon")
devfun2 <- function(theta) {
    pstr <- 0.01
    d <- devfun(theta)
    n <- length(theta)
    n0 <- (-1 + sqrt(1 + 8 * n))/2  ## dimension
    nu <- n0 + 3
    logprior <- try(dwishart(make_vc(theta, n0), nu = nu, S = diag(n0) * pstr/nu, 
        log = TRUE), silent = TRUE)
    if (inherits(logprior, "try-error")) {
        print(theta)
        stop("non-pos-def VC matrix")
    }
    d2 <- d - 2 * logprior
    cat(theta, logprior, d, d2, "\n")
    d2
}
deep_copy <- function(e1) {
    e2 <- new.env()
    for (n in ls(e1, all.names = TRUE)) assign(n, get(n, e1), e2)
    e2
}
## deep copy: is this the right/best way to go about it??  note that we
## are copying the devfun environment *after* the model is fitted
environment(devfun2) <- deep_copy(environment(devfun))
opt2 <- optimizeLmer(devfun2, verbose = TRUE)

## 0.9667 0.01517 0.2309 -223.3 1744 2190 
## 0.9867 0.01517 0.2309 -233 1744 2210 
## 0.9667 0.03517 0.2309 -223.5 1744 2191 
## 0.9667 0.01517 0.2509 -225.5 1744 2195 
## 0.9467 0.0285 0.2442 -215.4 1744 2175 
## 0.9267 0.03517 0.2509 -206.9 1744 2158 
## 0.9401 0.04184 0.2242 -210.3 1744 2164 
## 0.9223 0.02628 0.2398 -203.5 1744 2151 
## 0.9001 0.02184 0.2442 -193.8 1744 2131 
## 0.8779 0.05072 0.2487 -185.1 1744 2114 
## 0.8334 0.0685 0.2576 -167.7 1745 2080 
## 0.8334 0.04184 0.2776 -169.5 1745 2084 
## 0.7845 0.05295 0.2687 -149 1745 2043 
## 0.7134 0.06184 0.2776 -123.9 1746 1994 
## 0.6867 0.09295 0.2976 -118.6 1748 1985 
## 0.5801 0.1285 0.3242 -91.12 1752 1934 
## 0.5845 0.1307 0.2954 -88.26 1751 1927 
## 0.4601 0.1752 0.3042 -60.91 1756 1877 
## 0.3356 0.1752 0.3465 -43.4 1761 1848 
## 0.08674 0.2285 0.3909 -33.15 1773 1839 
## (NM) 20: f = 1848.07 at 0.335629 0.175169 0.346466
## 0.03785 0.2929 0.402 -43.84 1776 1863 
## [1] 0.0000 0.3359 0.4072

## Error: non-pos-def VC matrix

Snag 1: we need to be able to compute the Wishart even when the variance-covariance matrix is only positive semidefinite … although I'm not sure there's not another bug in here somewhere, as I'm surprised that the optimization even hits the boundary in this case …

To do:

create a “penalized deviance function constructor” that allows modification of prior strength etc..
try with a very weak prior and see if we get back (close to) the lmer answer (i.e. do we only hit positive semi-definite cases for stronger priors? or am I still just chasing a boneheaded bug? Did I get the model specification right?)