Merging Multiple Dirichelt Distributions

Herein we describe the second of the two steps needed to go from the (multiple) raw values we elicited as Beta distributions from the expert into a coherent probability framework. What we describe here is how to take the individual Dirichlet distributions we created in the first vignette, and create a mixture of Dirichlet Distributions. As before we will outline the mathematics, use a toy example, and then work through a proper example with code.

Mathematical Background

Let’s assume we have a prior with \(M\) categories that is a mixture of Dirichlet’s:

\[f^{(0)} = \sum_{j = 1}^J k_j^{(0)} f_j^{(0)} (\pi)\]

\[f^{(0)} = \sum_{j = 1}^J k_j^{(0)} Dir\left( d^{(0)}_{j1}, \ldots, d^{(0)}_{jM} \right)\]

Note the non-standard notation where the \(^{(0)}\) superscript refers to the prior, and later \(^{(1)}\) will refer to the posterior.

To sample from the above distribution, we choose a component distribution \(j\) with probability \(k_j^{(0)}\). We then sample from \(f_j^{(0)} (\pi)\), assuming our data are \((n_1, \ldots, n_M)\). The posterior for each component is given as:

\[ f_j^{(1)} (n) = Dir \left( d^{(1)}_{j1} + n_1, \ldots, d^{(1)}_{jM} + n_M\right ),\]

with normalising constant \(C\) given as:

\[ C_j = \frac{\Gamma \left( \sum^M_{m = 1} (d_{jm}^{(0)} + n_m)\right)}{\prod^M_{m=1}\Gamma(d_{jm}^{(0)} + n_m)}.\]

The overall posterior is:

\[ f^{(1)} (\pi) = \sum^J_{j=1} k_j^{(1)} f^{(1)}_j (\pi),\]

with

\[k_j^{(1)} = \frac{k_j^{(0)} C_j}{\sum_{i=1}^J k_j^{(0)} C_j} .\]

To sample, you choose a component distribution \(j\)with probability \(k_j^{(0)}\). We then sample from \(f_j^{(1)} (\pi) = Dir \left( d^{(0)}_{j1} + n_1, \ldots, d^{(0)}_{jM} + n_M\right )\), assuming our data are \((n_1, \ldots, n_M)\).

Posterior Calculations

Ok, let’s work through a toy example with fake data. First, let’s assume we have data with \(M\) categories \((15, 25, 60, \ldots, 21)\) and one Dirichlet prior distribution:

\[f_1^{(0)} = Dir(d_{11}^{(0)}, d_{12}^{(0)}, \ldots, d_{1M}^{(0)})\]

The posterior from which we’d sample is:

\[ f_1^{(1)} = Dir(d_{11}^{(0)} + 15, d_{12}^{(0)} + 25, d_{13}^{(0)} + 60, \ldots, d_{1M}^{(0)} + 21).\]

This is easy. But what if we have multiple Dirichlet priors from 3 different experts? Now we have three Dirichlet distributions \(D_1, D_2, D_3\) with the data as enumerated above. The prior is

\[Prior\: = \: \frac{1}{3}f_1^{(0)} + \frac{1}{3}f_2^{(0)} + \frac{1}{3}f_3^{(0)} \]

The posterior is a mixture of these Dirichlet’s, but NOT with \(\frac{1}{3}\) weights. Now we explain the proper weighting.

Say the prior weights can be written as:

\[ k^{(0)}_1 = \frac{1}{3}, k^{(0)}_2 = \frac{1}{3}, k^{(0)}_3 = \frac{1}{3} \]

and that we have three posterior Dirichlet’s:

\[ f_1^{(1)} = Dir(d_{11}^{(0)} + 15, d_{12}^{(0)} + 25, d_{13}^{(0)} + 60, \ldots, d_{1M}^{(0)} + 21) \]

\[ f_2^{(1)} = Dir(d_{21}^{(0)} + 15, d_{22}^{(0)} + 25, d_{23}^{(0)} + 60, \ldots, d_{2M}^{(0)} + 21) \]

\[ f_3^{(1)} = Dir(d_{31}^{(0)} + 15, d_{32}^{(0)} + 25, d_{33}^{(0)} + 60, \ldots, d_{3M}^{(0)} + 21) \]

Our new posterior is:

\[ k^{(1)}_1 f_1^{(1)} + k^{(1)}_2 f_2^{(1)} + k^{(1)}_3 f_3^{(1)}. \]

Next we calculate the weights. To do this we first calculate the normalising constant \(C\) for each prior, and then calculate the full prior weight for each of the three distributions.

\[ C_1 = \frac{\Gamma \left( \sum^M_{m = 1} (d_{1m}^{(0)} + n_m)\right)}{\prod^M_{m=1}\Gamma(d_{1m}^{(0)} + n_m)}, \]

\[ C_2 = \frac{\Gamma \left( \sum^M_{m = 1} (d_{2m}^{(0)} + n_m)\right)}{\prod^M_{m=1}\Gamma(d_{2m}^{(0)} + n_m)}, \]

\[ C_3 = \frac{\Gamma \left( \sum^M_{m = 1} (d_{3m}^{(0)} + n_m)\right)}{\prod^M_{m=1}\Gamma(d_{3m}^{(0)} + n_m)}. \]

All three of these are then included in the complete calculation for \(K^{(1)}_j\)

\[ k^{(1)}_1 = \frac{k^{(0)}_1 C_1}{k^{(0)}_1 C_1 + k^{(0)}_2 C_2 + k^{(0)}_3 C_3} = \frac{\frac{1}{3} C_1}{\frac{1}{3} C_1 + \frac{1}{3} C_2 + \frac{1}{3} C_3},\]

\[ k^{(1)}_2 = \frac{k^{(0)}_2 C_2}{k^{(0)}_1 C_1 + k^{(0)}_2 C_2 + k^{(0)}_3 C_3} = \frac{\frac{1}{3} C_2}{\frac{1}{3} C_1 + \frac{1}{3} C_2 + \frac{1}{3} C_3},\]

\[ k^{(1)}_3 = \frac{k^{(0)}_3 C_3}{k^{(0)}_1 C_1 + k^{(0)}_2 C_2 + k^{(0)}_3 C_3} = \frac{\frac{1}{3} C_3}{\frac{1}{3} C_1 + \frac{1}{3} C_2 + \frac{1}{3} C_3}.\]

The posterior is:

\[ K^{(1)}_1 f_1^{(1)} + K^{(1)}_2 f_2^{(1)} + K^{(1)}_3 f_3^{(1)}. \]

To sample from this, we first sample 1, 2 or 3 with probability \(K\). Let’s say we choose 2, then we sample from:

\[ f_2^{(1)} = Dir(d_{21}^{(0)} + 15, d_{22}^{(0)} + 25, d_{23}^{(0)} + 60, \ldots, d_{2M}^{(0)} + 21) \]

We repeat this many times, and then build up the mixture distribution.

Worked Example

Here we’ll start with a very simple example: 2 experts and three categories. The real trial will include an expanded number of experts, but we’ll keep it compact for the sake of the example. Here are the data along with each of the expert’s Dirichlet’s. The experts have low, and high confidence, respectively. We create these \(n\) by simply assigning the data:

library(ggplot2)
dir1 <- c(1, 2, 4)
dir2 <- c(10, 5, 1)

simDat <- c(5, 5, 5)
allDirs <- data.frame(expert = rep(c(1,2), each = 3), priors = c(dir1, dir2))
df <- data.frame(Data = simDat, Expert1 = dir1, Expert2 = dir2)
knitr::kable(df, caption = "Table 1. Data for worked example and 2 experts' prior Dirichlet distributions.")

Table 1. Data for worked example and 2 experts’ prior Dirichlet distributions.
Data	Expert1	Expert2
5	1	10
5	2	5
5	4	1

First we’ll write the function to accept the prior and the data and return the normalising constant \(C_j\):

calcC <- function(data, prior){

  dd <- as.vector(data)
  pp <- as.vector(prior)
  if(length(dd) != length(pp)) stop('Data and prior lengths do not match')
  logc <- lgamma(sum(dd + pp)) - sum(lgamma(dd + pp)) 

  return(exp(logc))
}

calcC(simDat, dir1)

## [1] 14665931280

calcC(simDat, dir2)

## [1] 6.987269e+13

To build that up for each expert we would have this in R code:

c1 <- calcC(simDat, dir1)
c2 <- calcC(simDat, dir2)
nex <- 2
k1 <- (1/nex * c1) / (1/nex * c1 + 1/nex * c2)
k2 <- (1/nex * c2) / (1/nex * c1 + 1/nex * c2)

K <- c(k1, k2)

With those assembled, we now want to sample according to probability \(K\), which is:

round(K, 3)

## [1] 0 1

On to sampling for one iteration:

idx <- which(rmultinom(1, 1, K) == 1)
prior <- allDirs[allDirs$expert == idx, 'priors']
library(gtools)
post <- rdirichlet(1, simDat + prior)
round(post, 3)

##       [,1]  [,2]  [,3]
## [1,] 0.484 0.181 0.334

We can now do this in a proper sampling framework and build up the posterior distribution.

nsamp <- 10000
outdf <- matrix(nrow = nsamp, ncol = 3)
colnames(outdf) <- c('Area 1', 'Area 2', 'Area 3')
for(i in 1:nsamp){
  
  idx <- which(rmultinom(1, 1, K) == 1)
  prior <- allDirs[allDirs$expert == idx, 'priors']
  outdf[i, ] <- rdirichlet(1, simDat + prior)
  
}

And then finally, we can visualise it:

library(reshape2)
library(ggplot2)
dflong <- melt(as.data.frame(outdf))
p <- ggplot(data = dflong, aes(value, group = variable))+
  geom_density()+
  facet_grid(variable ~ .)
p

Merging Multiple Dirichelt Distributions

Rob Schick, PhD and Michail Papathomas, PhD

2015-11-30

Mathematical Background

Posterior Calculations

Worked Example