Tail biased factory generates with biases distributed around a mode of \(\omega_1\) = 0.25, with a consistency (or concentration) of \(_K\) = 12, so that the biases are distributed as \(\theta\) ~ beta(\(\theta\) | \(\omega_1\) (\(_K\) - 2) + 1, (1 - \(\omega_1\)(\(_K\) - 2) +1) = beta(\(\theta\) | 3.5, 8.5)
The head biased factor are distributed with a mode of \(\omega_2\) = 0.75, again with a consistency of \(_K\) = 12, such that the biases of coins from the factory are distributed as \(\theta\) ~ beta(\(\theta\) | \(\omega_2\) (\(_K\) - 2) + 1, (1 - \(\omega_2\)(\(_K\) - 2) +1) = beta(\(\theta\) | 8.5, 3.5)
Figure 10.2: Hiarchical diagram for two models of a coin. Model 1 is a tail-biased mint; model 2 is a head-biased mint. This diagram is a specific case of the right panel of figure 10.1, because the likelihood function is the same for both models and only the priors are different
For large and complex models, grid approximation or formal analysis is not possible. Two approaches.
source("Jags-Ydich-Xnom1subj-MbernBeta.R")
##
## *********************************************************************
## Kruschke, J. K. (2015). Doing Bayesian Data Analysis, Second Edition:
## A Tutorial with R, JAGS, and Stan. Academic Press / Elsevier.
## *********************************************************************
## Warning: package 'rjags' was built under R version 4.0.3
## Loading required package: coda
## Linked to JAGS 4.3.0
## Loaded modules: basemod,bugs
## Warning: package 'runjags' was built under R version 4.0.3
myData = c(rep(0, 9-6), rep(1,6))
mcmcCoda = genMCMC(data = myData, numSavedSteps = 10000)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 9
## Unobserved stochastic nodes: 1
## Total graph size: 21
##
## Initializing model
##
## Burning in the MCMC chain...
## Sampling final MCMC chain...
theta = as.matrix(mcmcCoda[, "theta"])
meanTheta = mean(theta)
sdTheta = sd(theta)
aPost = meanTheta * (meanTheta*(1-meanTheta)/sdTheta^2-1)
bPost = (1-meanTheta) * (meanTheta*(1-meanTheta)/sdTheta^2-1)
oneOverPD = mean( dbeta( theta , aPost , bPost ) /
( theta^6*(1-theta)^(9-6)
* dbeta( theta , 0.75*(12-2)+1 , (1-0.75)*(12-2)+1 ) ) )
PD = 1/oneOverPD
show(PD)
## [1] 0.002337505
Following figure 10.2, top level parameter is the index across models.
Figure 10.4: The prior and posterior distributions for scripts jags-ydich-xnom1subj-MbernBetaModelComp.R. The upper frame, which shows the prior distribution, has labels that indicate \[p(\theta|D)\] but the data set, \(D\), is empty. The lower frame shows the posterior distribution
using two distinct beta distributions that generate distinct \(\theta_m\) values. Resulting in a method that is more faithful to the diagram in figure 10.2.
The values for the pseudo-priors are determined as follows:
A complex model can find some combination of its parameter values that match the data better than the simpler model.
Bayesian model comparison compensates for model complexity by the fact that each model must have a prior distribution over its parameters, and more complex models must dilute their prior distributions over larger parameter spaces than simpler ones.
“Must-be-fair” model:
“Anything’s-possible” model:
Coin flip N = 20, z = 15 heads.
Figure 10.5b: Lower part shows the posterior distribution on the parameters.
* Note: The pseudo-priors nicely mimic the actual posteriors. + Because the ESS of the model index is far better than in the initial run, we trust its estimate in this run more than in the initial run. + The posterior probability of m = 2 is very nearly 92%.
mcmcMat = as.matrix( codaSamples )
m = mcmcMat[,"m"]
theta1M1 = mcmcMat[,"theta1"][ m ==1 ]
theta1M2 = mcmcMat[,"theta1"][ m ==2 ]
theta2M1 = mcmcMat[,"theta2"][ m ==1 ]
theta2M2 = mcmcMat[,"theta2"][ m ==2]
pD = function(z,N,a,b) { exp( lbeta(z+a,N-z+b) - lbeta(a,b) ) }
omega_1 <- 0.25
omega_2 <- 0.75
kappa <- 6
z <- 7
N <- 10
a_1 <- omega_1 * (kappa - 2) + 1
b_1 <- (1 - omega_1) * (kappa - 2) +1
a_2 <- omega_2 * (kappa - 2) + 1
b_2 <- (1 - omega_2) * (kappa - 2) +1
pd1 <- pD(z = z, N = N, a = a_1, b = b_1)
pd2 <- pD(z = z, N = N, a = a_2, b = b_2)
bf1 <- pd1/pd2
p1 <- .5
p2 <- 1 - p1
bfPrior = (pd1/pd2)*(p1/p2)
p1gd = bfPrior/(1.0+bfPrior)
p2gd = 1.0-p1gd
show(bf1)
## [1] 0.3333333
show(p1gd)
## [1] 0.25
show(p2gd)
## [1] 0.75
pD = function(z,N,a,b) { exp( lbeta(z+a,N-z+b) - lbeta(a,b) ) }
omega_1 <- 0.25
omega_2 <- 0.75
kappa <- 202
z <- 7
N <- 10
a_1 <- omega_1 * (kappa - 2) + 1
b_1 <- (1 - omega_1) * (kappa - 2) +1
a_2 <- omega_2 * (kappa - 2) + 1
b_2 <- (1 - omega_2) * (kappa - 2) +1
pd1 <- pD(z = z, N = N, a = a_1, b = b_1)
pd2 <- pD(z = z, N = N, a = a_2, b = b_2)
bf1 <- pd1/pd2
p1 <- .5
p2 <- 1 - p1
bfPrior = (pd1/pd2)*(p1/p2)
p1gd = bfPrior/(1.0+bfPrior)
p2gd = 1.0-p1gd
show(bf1)
## [1] 0.01621596
show(p1gd)
## [1] 0.0159572
show(p2gd)
## [1] 0.9840428
That is because with the larger kappa there is a larger number of consistencies.
Figure computed from file
Figure 10.2