# Clear the environment and load packages
rm(list = ls())
library(MCMCpack)
library(bayesplot)
library(coda)
library(ggplot2)
library(tidyverse)
library(posterior)
# setwd("~/Dropbox/IU25SP-EDUC-Y617 Psychometric Theory/Psychometric Theory Final Project/Paired Comparison MCMC CDAT")
data.folder <- "Data"
load(file.path(data.folder, "pairDataCDAT.Rda"))
# Prepare for use in MCMCpack
colnames(pairDataCDAT) <- c("rater","itemj1", "itemj2","choice")
# Remove rater 1 from raters' comparisons
pairDataCDAT <- pairDataCDAT[pairDataCDAT$rater > 1, ]
# Prepare the data for easy matching to student responses as PDFs labeled by 4-digit ID
# Create a named vector for the mapping: names are the original IDs and values are the new numbers.
lookup <- c("60" = 1121, "61" = 1154, "62" = 1222, "63" = 1637,
"64" = 1995, "65" = 3369, "66" = 4004, "67" = 4209,
"68" = 4268, "69" = 4841, "70" = 5074, "71" = 5495,
"72" = 5710, "73" = 8294, "74" = 8980, "75" = 9899)
# Map the itemj1 and itemj2 columns using the lookup vector.
pairDataCDAT$itemj1 <- lookup[as.character(pairDataCDAT$itemj1)]
pairDataCDAT$itemj2 <- lookup[as.character(pairDataCDAT$itemj2)]
pairDataCDAT$choice <- lookup[as.character(pairDataCDAT$choice)]
# Sort for easy matching in theta.start matrix below
pairDataCDAT <- pairDataCDAT[order(pairDataCDAT$itemj1), ]
save(pairDataCDAT, file = file.path(data.folder, "pairDataCDAT.1.Rda"))
n_chains <- 3 # Set the number of chains
set.seed(1234) # Set seed for randomized starting values
I <- length(unique(pairDataCDAT$rater))
The vector γ represents each respondent’s perceptual orientation in the 2D latent space in radians. Initially, all respondents are assigned a value of π/4π/4 (45° in radians). The second line randomizes the starting values within a small range [π/4−0.4,π/4+0.4][π/4−0.4,π/4+0.4] to introduce slight variation.
gamma_start <- runif(I, pi/4-.4, pi/4+.4)
Thetas start at 0 for the one-dimensional model
theta_start1d <- matrix(0, 16, 1) # All start at 0 in the model's probit units
Create the empty starting \(\theta\) matrix for 2d models
theta_start2d <- matrix(0, 16, 2) # 16 scripts, two dimensions
# Creates a 16×2 matrix (theta_start2d) to store the latent attributes of statements along two dimensions. Initially, all values are set to 0.
Find the matching indices for the anchor scripts in the matrix by viewing them in order.
unique(pairDataCDAT$itemj1)
## [1] 1121 1154 1222 1637 1995 3369 4004 4209 4268 4841 5074 5495 5710 8294 8980
## [16] 9899
Set initial values for two anchor items whose place in the matrix is determined above.
# These values anchor certain scripts in the 2d latent space and help define the rotation.
theta_start2d[11,1] <- 1 ## script 5074 is interpreted as having high creative elaboration
theta_start2d[11,2] <- -1 ## script 5074 is interpreted as having low conventional representation
theta_start2d[12,1] <- -1 ## script 5495 is interpreted as having low creative elaboration
theta_start2d[12,2] <- 1 ## script 5495 is interpreted as having high conventional representation
seeds <- c(0202, 8018, 1304) # Set seeds for each chain
chains1d <- vector("list", n_chains)
for (i in 1:n_chains) {
chains1d[[i]] <- MCMCpaircompare(
pairDataCDAT,
verbose=10000,
burnin=500, # recommended to keep this a small percent of chain
mcmc=50000, # Long chain needed for good mixing and sufficient sample size
thin=20, # Large thinning intervals are not standard practice
seed=seeds[i], # Ensures reproducibility of results.
alpha.fixed=FALSE,
alpha.start=1,
a=0, # The default prio mean of alpha is 0
A=.25, # The default prior precision is 0.25 (variance of 4)
store.theta=TRUE, # Saves item-specfic draws.
store.alpha=TRUE, # Saves both rater-specific draws.
tune=.3, # Sets the tuning parameter for the Metropolis-Hastings step.
procrustes=FALSE # Disables Procrustes alignment
)
}
##
##
## MCMCpaircompare iteration 1 of 50500
##
##
## MCMCpaircompare iteration 10001 of 50500
##
##
## MCMCpaircompare iteration 20001 of 50500
##
##
## MCMCpaircompare iteration 30001 of 50500
##
##
## MCMCpaircompare iteration 40001 of 50500
##
##
## MCMCpaircompare iteration 50001 of 50500
##
##
## MCMCpaircompare iteration 1 of 50500
##
##
## MCMCpaircompare iteration 10001 of 50500
##
##
## MCMCpaircompare iteration 20001 of 50500
##
##
## MCMCpaircompare iteration 30001 of 50500
##
##
## MCMCpaircompare iteration 40001 of 50500
##
##
## MCMCpaircompare iteration 50001 of 50500
##
##
## MCMCpaircompare iteration 1 of 50500
##
##
## MCMCpaircompare iteration 10001 of 50500
##
##
## MCMCpaircompare iteration 20001 of 50500
##
##
## MCMCpaircompare iteration 30001 of 50500
##
##
## MCMCpaircompare iteration 40001 of 50500
##
##
## MCMCpaircompare iteration 50001 of 50500
# Apply the mcmc() funciton to each chain and combine into one mcmc_list2dDP
mcmc_list1d <- mcmc.list(lapply(chains1d, mcmc))
# Save object
save(mcmc_list1d, file = "Out/mcmc_list1d.Rdata")
# Create a list of parameters for each type of parameter
thetas <- grep("^theta", colnames(mcmc_list1d[[1]]), value = TRUE)
alphas <- grep("^alpha", colnames(mcmc_list1d[[1]]), value = TRUE)
gelman_diag_1d <- gelman.diag(mcmc_list1d)
max_rhat_1d <- max(gelman_diag_1d$psrf[, "Point est."])
gelman_diag_1d
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## theta.1121 1.00 1.01
## theta.1154 1.00 1.02
## theta.1222 1.00 1.00
## theta.1637 1.00 1.01
## theta.1995 1.00 1.01
## theta.3369 1.00 1.00
## theta.4004 1.00 1.01
## theta.4209 1.00 1.01
## theta.4268 1.00 1.01
## theta.4841 1.00 1.01
## theta.5074 1.00 1.01
## theta.5495 1.00 1.01
## theta.5710 1.00 1.01
## theta.8294 1.00 1.01
## theta.8980 1.00 1.00
## theta.9899 1.00 1.00
## alpha.25 1.00 1.01
## alpha.26 1.01 1.02
## alpha.27 1.00 1.00
## alpha.28 1.00 1.01
## alpha.29 1.00 1.01
## alpha.30 1.00 1.00
## alpha.31 1.00 1.01
## alpha.34 1.00 1.00
## alpha.35 1.00 1.00
## alpha.36 1.00 1.01
## alpha.37 1.00 1.00
## alpha.38 1.00 1.00
##
## Multivariate psrf
##
## 1.01
All are within acceptable range of point estimate and upper CI, meaning convergence across chains is adequate.
The multivariate PSRF is a generalization of the univariate PSRF (Gelman-Rubin statistic, or \(\hat{R}\), which considers the joint convergence of all parameters simultaneously using the variance-covariance matrix across chains.
It quantifies how much potential gain in efficiency (via additional sampling) there is if the chains were continued. Values much above 1 indicate that the between-chain variance is still large relative to the within-chain variance.
The multivariate PSRF is based on the largest eigenvalue of the matrix:
\(Var^+/W\)
It essentially checks for convergence across linear combinations of parameters, not just marginally.
View a summary
summary1d <- round(
cbind(
as.data.frame(summary(mcmc_list1d)$statistics),
as.data.frame(summary(mcmc_list1d)$quantiles)
), 3)
summary1d$Model <- "1d"
save(summary1d, file = "Out/summary1d.Rdata")
summary1d
## Mean SD Naive SE Time-series SE 2.5% 25% 50% 75%
## theta.1121 -0.177 0.277 0.003 0.008 -0.730 -0.359 -0.173 0.010
## theta.1154 -0.947 0.343 0.004 0.009 -1.650 -1.172 -0.931 -0.711
## theta.1222 0.208 0.278 0.003 0.008 -0.342 0.022 0.204 0.398
## theta.1637 -0.029 0.270 0.003 0.008 -0.571 -0.209 -0.028 0.153
## theta.1995 -0.401 0.285 0.003 0.008 -0.971 -0.591 -0.400 -0.206
## theta.3369 1.719 0.480 0.006 0.011 0.845 1.379 1.696 2.028
## theta.4004 0.348 0.293 0.003 0.008 -0.224 0.145 0.345 0.543
## theta.4209 -0.211 0.279 0.003 0.008 -0.767 -0.390 -0.206 -0.022
## theta.4268 0.204 0.280 0.003 0.008 -0.347 0.015 0.205 0.391
## theta.4841 -0.485 0.288 0.003 0.008 -1.066 -0.668 -0.479 -0.287
## theta.5074 -0.064 0.273 0.003 0.008 -0.607 -0.244 -0.064 0.120
## theta.5495 -0.721 0.321 0.004 0.008 -1.387 -0.931 -0.713 -0.499
## theta.5710 -0.958 0.342 0.004 0.009 -1.664 -1.178 -0.945 -0.723
## theta.8294 0.290 0.279 0.003 0.008 -0.266 0.101 0.290 0.475
## theta.8980 0.803 0.333 0.004 0.009 0.167 0.572 0.793 1.021
## theta.9899 0.405 0.296 0.003 0.008 -0.177 0.203 0.402 0.609
## alpha.25 1.106 0.393 0.005 0.006 0.492 0.830 1.053 1.327
## alpha.26 2.027 0.626 0.007 0.011 1.033 1.576 1.959 2.384
## alpha.27 1.946 0.582 0.007 0.010 1.010 1.522 1.880 2.292
## alpha.28 1.114 0.416 0.005 0.006 0.452 0.816 1.058 1.349
## alpha.29 0.860 0.333 0.004 0.005 0.323 0.625 0.827 1.055
## alpha.30 0.691 0.284 0.003 0.004 0.229 0.494 0.655 0.854
## alpha.31 0.356 0.206 0.002 0.003 -0.008 0.215 0.337 0.482
## alpha.34 0.993 0.344 0.004 0.005 0.440 0.747 0.949 1.193
## alpha.35 0.515 0.235 0.003 0.003 0.126 0.351 0.490 0.650
## alpha.36 1.097 0.405 0.005 0.006 0.461 0.809 1.038 1.326
## alpha.37 1.018 0.389 0.004 0.006 0.420 0.745 0.962 1.231
## alpha.38 0.656 0.265 0.003 0.004 0.230 0.471 0.626 0.800
## 97.5% Model
## theta.1121 0.356 1d
## theta.1154 -0.310 1d
## theta.1222 0.763 1d
## theta.1637 0.496 1d
## theta.1995 0.144 1d
## theta.3369 2.742 1d
## theta.4004 0.935 1d
## theta.4209 0.328 1d
## theta.4268 0.750 1d
## theta.4841 0.066 1d
## theta.5074 0.465 1d
## theta.5495 -0.103 1d
## theta.5710 -0.317 1d
## theta.8294 0.843 1d
## theta.8980 1.484 1d
## theta.9899 0.985 1d
## alpha.25 2.028 1d
## alpha.26 3.446 1d
## alpha.27 3.289 1d
## alpha.28 2.065 1d
## alpha.29 1.609 1d
## alpha.30 1.348 1d
## alpha.31 0.794 1d
## alpha.34 1.781 1d
## alpha.35 1.047 1d
## alpha.36 2.025 1d
## alpha.37 1.947 1d
## alpha.38 1.288 1d
effectiveSize(mcmc_list1d)
## theta.1121 theta.1154 theta.1222 theta.1637 theta.1995 theta.3369 theta.4004
## 1183.106 1607.665 1259.876 1107.284 1236.771 2087.138 1404.252
## theta.4209 theta.4268 theta.4841 theta.5074 theta.5495 theta.5710 theta.8294
## 1232.249 1233.426 1281.698 1108.178 1531.724 1546.711 1235.135
## theta.8980 theta.9899 alpha.25 alpha.26 alpha.27 alpha.28 alpha.29
## 1471.531 1326.976 4040.197 3487.532 3308.899 4485.919 4632.910
## alpha.30 alpha.31 alpha.34 alpha.35 alpha.36 alpha.37 alpha.38
## 4542.408 5756.181 4347.620 5207.429 4511.091 4177.004 4795.647
print(summarize_draws(mcmc_list1d)[, c("variable", "ess_bulk")], n=28)
## # A tibble: 28 × 2
## variable ess_bulk
## <chr> <dbl>
## 1 theta.1121 1066.
## 2 theta.1154 1515.
## 3 theta.1222 1142.
## 4 theta.1637 1064.
## 5 theta.1995 1169.
## 6 theta.3369 1700.
## 7 theta.4004 1280.
## 8 theta.4209 1162.
## 9 theta.4268 1130.
## 10 theta.4841 1200.
## 11 theta.5074 1085.
## 12 theta.5495 1357.
## 13 theta.5710 1386.
## 14 theta.8294 1104.
## 15 theta.8980 1345.
## 16 theta.9899 1173.
## 17 alpha.25 4178.
## 18 alpha.26 3596.
## 19 alpha.27 3267.
## 20 alpha.28 4201.
## 21 alpha.29 4369.
## 22 alpha.30 4780.
## 23 alpha.31 5940.
## 24 alpha.34 4344.
## 25 alpha.35 5248.
## 26 alpha.36 4615.
## 27 alpha.37 4346.
## 28 alpha.38 4478.
The ESS seems to be sufficient for inference from the posterior only when each MCMC chain is run for 50,000 iterations. This is >1000 when burn-in is about 1% of iterations (500) and thinning is set conservatively at 20.
See companion page for plots of 1d model “PWC 1d MCMCpack CDAT”
seeds <- c(1304, 2406, 7916) # Set seeds for each chain
chains2d <- vector("list", n_chains)
for (i in 1:n_chains) {
chains2d[[i]] <- MCMCpaircompare2d(
pairDataCDAT,
theta.constraints=list(
'5074' = list(1, "+"),
'5074' = list(2, "-"),
'5495' = list(1, "-"),
'5495' = list(2, "+")
),
verbose=10000,
burnin=200,
mcmc=20000,
thin=20,
seed=seeds[i], # Ensures reproducibility of results.
gamma.start=gamma_start, # Uses the initialized respondent perception values.
theta.start=theta_start2d,
store.theta=TRUE, # Saves item-specfic draws.
store.gamma=TRUE, # Saves both rater-specific draws.
tune=.25, # Sets the tuning parameter for the Metropolis-Hastings proposals
# resulting acceptance rate of .44 target for univariate proposals
procrustes=FALSE # Disables Procrustes alignment
)
}
##
##
## MCMCpaircompare2d iteration 1 of 20200
##
## Summary of acceptance rates for gamma:
##
##
## MCMCpaircompare2d iteration 10001 of 20200
##
## Summary of acceptance rates for gamma:
##
##
## MCMCpaircompare2d iteration 20001 of 20200
##
## Summary of acceptance rates for gamma:
##
##
## MCMCpaircompare2d iteration 1 of 20200
##
## Summary of acceptance rates for gamma:
##
##
## MCMCpaircompare2d iteration 10001 of 20200
##
## Summary of acceptance rates for gamma:
##
##
## MCMCpaircompare2d iteration 20001 of 20200
##
## Summary of acceptance rates for gamma:
##
##
## MCMCpaircompare2d iteration 1 of 20200
##
## Summary of acceptance rates for gamma:
##
##
## MCMCpaircompare2d iteration 10001 of 20200
##
## Summary of acceptance rates for gamma:
##
##
## MCMCpaircompare2d iteration 20001 of 20200
##
## Summary of acceptance rates for gamma:
# Apply the mcmc() funciton to each chain and combine into one mcmc_list2dDP
mcmc_list2d <- mcmc.list(lapply(chains2d, mcmc))
#Save object
save(mcmc_list2d, file = "Out/mcmc_list2d.Rdata")
# Create a list of parameters for each type of parameter
thetas <- grep("^theta", colnames(mcmc_list2d[[1]]), value = TRUE)
alpha <- grep("^alpha", colnames(mcmc_list2d[[1]]), value = TRUE)
mean(attr(chains2d[[3]], "gamma_accept_rate"), na.rm=TRUE)
## [1] 0.4334241
Gamma acceptance rates are excellent with tuning proposals set to .25 for a rate of .43 where .44 is considered optimal for univariate updates (Roberts, et al., 1997)
effectiveSize(mcmc_list2d)
## theta1.1121 theta1.1154 theta1.1222 theta1.1637 theta1.1995 theta1.3369
## 1383.213 1443.011 1533.158 1307.613 1495.780 2236.972
## theta1.4004 theta1.4209 theta1.4268 theta1.4841 theta1.5074 theta1.5495
## 1575.418 1696.629 1672.043 1609.734 1547.080 1543.666
## theta1.5710 theta1.8294 theta1.8980 theta1.9899 theta2.1121 theta2.1154
## 1759.374 2024.255 1647.231 1673.853 1339.960 1451.198
## theta2.1222 theta2.1637 theta2.1995 theta2.3369 theta2.4004 theta2.4209
## 1433.658 1504.343 1222.556 1647.750 1591.155 1469.726
## theta2.4268 theta2.4841 theta2.5074 theta2.5495 theta2.5710 theta2.8294
## 1702.467 1621.296 1292.201 1571.017 1581.382 1338.498
## theta2.8980 theta2.9899 gamma.25 gamma.26 gamma.27 gamma.28
## 1668.773 1571.592 1130.004 1174.260 990.656 1365.406
## gamma.29 gamma.30 gamma.31 gamma.34 gamma.35 gamma.36
## 1186.631 1451.307 2361.445 1059.541 1859.609 1699.951
## gamma.37 gamma.38
## 1382.259 1909.395
# For 2D model
gelman_diag_2d <- gelman.diag(mcmc_list2d)
max_rhat_2d <- max(gelman_diag_2d$psrf[, "Point est."])
gelman_diag_2d
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## theta1.1121 1.001 1.01
## theta1.1154 0.999 1.00
## theta1.1222 1.001 1.01
## theta1.1637 1.001 1.00
## theta1.1995 1.002 1.01
## theta1.3369 1.000 1.00
## theta1.4004 1.003 1.01
## theta1.4209 1.000 1.00
## theta1.4268 1.002 1.01
## theta1.4841 1.000 1.00
## theta1.5074 0.999 1.00
## theta1.5495 1.000 1.00
## theta1.5710 1.002 1.00
## theta1.8294 1.002 1.01
## theta1.8980 1.000 1.00
## theta1.9899 1.003 1.01
## theta2.1121 1.001 1.01
## theta2.1154 1.001 1.00
## theta2.1222 0.999 1.00
## theta2.1637 1.002 1.00
## theta2.1995 1.003 1.01
## theta2.3369 1.000 1.00
## theta2.4004 1.004 1.02
## theta2.4209 1.008 1.03
## theta2.4268 1.005 1.02
## theta2.4841 0.999 1.00
## theta2.5074 1.003 1.01
## theta2.5495 1.005 1.02
## theta2.5710 1.001 1.00
## theta2.8294 1.002 1.01
## theta2.8980 1.001 1.01
## theta2.9899 1.005 1.02
## gamma.25 1.012 1.04
## gamma.26 1.004 1.02
## gamma.27 1.008 1.03
## gamma.28 1.003 1.01
## gamma.29 1.003 1.01
## gamma.30 1.008 1.03
## gamma.31 1.005 1.01
## gamma.34 1.005 1.02
## gamma.35 1.003 1.01
## gamma.36 1.004 1.02
## gamma.37 1.005 1.02
## gamma.38 1.005 1.02
##
## Multivariate psrf
##
## 1.03
Excellent \(\hat{R}\) and multivariate psrf for this model, all close to 1.0
View a summary of the 2d model parameter posterior distributions
summary2d <- round(
cbind(
as.data.frame(summary(mcmc_list2d)$statistics),
as.data.frame(summary(mcmc_list2d)$quantiles)
), 3)
summary2d$Model <- "2d"
save(summary2d, file="Out/summary2d.Rdata")
summary2d
## Mean SD Naive SE Time-series SE 2.5% 25% 50% 75%
## theta1.1121 1.243 0.401 0.007 0.011 0.475 0.969 1.241 1.519
## theta1.1154 0.007 0.387 0.007 0.010 -0.743 -0.258 0.005 0.265
## theta1.1222 0.613 0.362 0.007 0.009 -0.091 0.366 0.613 0.859
## theta1.1637 0.561 0.362 0.007 0.010 -0.118 0.309 0.555 0.805
## theta1.1995 0.872 0.407 0.007 0.011 0.101 0.594 0.860 1.150
## theta1.3369 1.631 0.463 0.008 0.010 0.743 1.317 1.620 1.936
## theta1.4004 -0.854 0.399 0.007 0.010 -1.671 -1.118 -0.861 -0.584
## theta1.4209 -1.234 0.391 0.007 0.010 -2.025 -1.491 -1.231 -0.965
## theta1.4268 -0.894 0.398 0.007 0.010 -1.697 -1.158 -0.887 -0.619
## theta1.4841 -0.912 0.383 0.007 0.010 -1.671 -1.167 -0.910 -0.645
## theta1.5074 1.154 0.386 0.007 0.010 0.429 0.880 1.150 1.420
## theta1.5495 -2.243 0.446 0.008 0.012 -3.162 -2.538 -2.234 -1.930
## theta1.5710 -0.873 0.395 0.007 0.009 -1.644 -1.140 -0.862 -0.606
## theta1.8294 -0.116 0.369 0.007 0.008 -0.844 -0.366 -0.115 0.137
## theta1.8980 0.679 0.384 0.007 0.009 -0.062 0.421 0.674 0.942
## theta1.9899 0.357 0.376 0.007 0.009 -0.410 0.098 0.363 0.624
## theta2.1121 -1.224 0.365 0.007 0.010 -1.940 -1.465 -1.217 -0.971
## theta2.1154 -1.546 0.359 0.007 0.009 -2.264 -1.781 -1.549 -1.305
## theta2.1222 -0.160 0.328 0.006 0.009 -0.788 -0.379 -0.154 0.068
## theta2.1637 -0.565 0.330 0.006 0.009 -1.220 -0.780 -0.565 -0.335
## theta2.1995 -1.239 0.363 0.007 0.010 -1.969 -1.476 -1.239 -0.998
## theta2.3369 1.183 0.394 0.007 0.010 0.389 0.919 1.182 1.454
## theta2.4004 1.316 0.354 0.006 0.009 0.660 1.069 1.299 1.554
## theta2.4209 0.764 0.334 0.006 0.009 0.132 0.532 0.758 0.980
## theta2.4268 1.122 0.345 0.006 0.008 0.470 0.888 1.114 1.339
## theta2.4841 -0.262 0.329 0.006 0.008 -0.892 -0.475 -0.270 -0.057
## theta2.5074 -1.056 0.358 0.007 0.010 -1.801 -1.289 -1.051 -0.806
## theta2.5495 0.579 0.332 0.006 0.009 0.046 0.327 0.549 0.793
## theta2.5710 -0.845 0.345 0.006 0.009 -1.505 -1.071 -0.848 -0.622
## theta2.8294 0.753 0.324 0.006 0.009 0.137 0.537 0.755 0.971
## theta2.8980 0.874 0.346 0.006 0.008 0.194 0.648 0.863 1.103
## theta2.9899 0.581 0.342 0.006 0.009 -0.106 0.354 0.589 0.806
## gamma.25 0.729 0.137 0.002 0.004 0.450 0.636 0.731 0.820
## gamma.26 0.832 0.127 0.002 0.004 0.582 0.747 0.833 0.918
## gamma.27 0.842 0.126 0.002 0.004 0.594 0.755 0.845 0.928
## gamma.28 0.463 0.140 0.003 0.004 0.190 0.365 0.463 0.562
## gamma.29 0.917 0.142 0.003 0.004 0.639 0.821 0.916 1.011
## gamma.30 0.327 0.148 0.003 0.004 0.054 0.221 0.321 0.426
## gamma.31 0.086 0.072 0.001 0.002 0.002 0.030 0.068 0.124
## gamma.34 0.934 0.147 0.003 0.005 0.629 0.839 0.938 1.032
## gamma.35 1.327 0.141 0.003 0.003 1.023 1.236 1.339 1.432
## gamma.36 1.309 0.125 0.002 0.003 1.048 1.227 1.312 1.400
## gamma.37 1.323 0.119 0.002 0.003 1.075 1.247 1.327 1.409
## gamma.38 1.442 0.091 0.002 0.002 1.236 1.386 1.457 1.514
## 97.5% Model
## theta1.1121 2.023 2d
## theta1.1154 0.785 2d
## theta1.1222 1.312 2d
## theta1.1637 1.255 2d
## theta1.1995 1.675 2d
## theta1.3369 2.558 2d
## theta1.4004 -0.082 2d
## theta1.4209 -0.472 2d
## theta1.4268 -0.152 2d
## theta1.4841 -0.185 2d
## theta1.5074 1.917 2d
## theta1.5495 -1.398 2d
## theta1.5710 -0.116 2d
## theta1.8294 0.596 2d
## theta1.8980 1.443 2d
## theta1.9899 1.067 2d
## theta2.1121 -0.533 2d
## theta2.1154 -0.845 2d
## theta2.1222 0.458 2d
## theta2.1637 0.060 2d
## theta2.1995 -0.527 2d
## theta2.3369 1.932 2d
## theta2.4004 2.034 2d
## theta2.4209 1.427 2d
## theta2.4268 1.844 2d
## theta2.4841 0.399 2d
## theta2.5074 -0.365 2d
## theta2.5495 1.333 2d
## theta2.5710 -0.164 2d
## theta2.8294 1.396 2d
## theta2.8980 1.570 2d
## theta2.9899 1.256 2d
## gamma.25 0.992 2d
## gamma.26 1.081 2d
## gamma.27 1.083 2d
## gamma.28 0.728 2d
## gamma.29 1.193 2d
## gamma.30 0.631 2d
## gamma.31 0.268 2d
## gamma.34 1.215 2d
## gamma.35 1.551 2d
## gamma.36 1.534 2d
## gamma.37 1.531 2d
## gamma.38 1.564 2d
effectiveSize(mcmc_list2d)
## theta1.1121 theta1.1154 theta1.1222 theta1.1637 theta1.1995 theta1.3369
## 1383.213 1443.011 1533.158 1307.613 1495.780 2236.972
## theta1.4004 theta1.4209 theta1.4268 theta1.4841 theta1.5074 theta1.5495
## 1575.418 1696.629 1672.043 1609.734 1547.080 1543.666
## theta1.5710 theta1.8294 theta1.8980 theta1.9899 theta2.1121 theta2.1154
## 1759.374 2024.255 1647.231 1673.853 1339.960 1451.198
## theta2.1222 theta2.1637 theta2.1995 theta2.3369 theta2.4004 theta2.4209
## 1433.658 1504.343 1222.556 1647.750 1591.155 1469.726
## theta2.4268 theta2.4841 theta2.5074 theta2.5495 theta2.5710 theta2.8294
## 1702.467 1621.296 1292.201 1571.017 1581.382 1338.498
## theta2.8980 theta2.9899 gamma.25 gamma.26 gamma.27 gamma.28
## 1668.773 1571.592 1130.004 1174.260 990.656 1365.406
## gamma.29 gamma.30 gamma.31 gamma.34 gamma.35 gamma.36
## 1186.631 1451.307 2361.445 1059.541 1859.609 1699.951
## gamma.37 gamma.38
## 1382.259 1909.395
print(summarize_draws(mcmc_list2d)[, c("variable", "ess_bulk")], n=28)
## # A tibble: 44 × 2
## variable ess_bulk
## <chr> <dbl>
## 1 theta1.1121 1307.
## 2 theta1.1154 1510.
## 3 theta1.1222 1459.
## 4 theta1.1637 1419.
## 5 theta1.1995 1440.
## 6 theta1.3369 1810.
## 7 theta1.4004 1147.
## 8 theta1.4209 1540.
## 9 theta1.4268 1149.
## 10 theta1.4841 1603.
## 11 theta1.5074 1421.
## 12 theta1.5495 1460.
## 13 theta1.5710 1818.
## 14 theta1.8294 1565.
## 15 theta1.8980 1678.
## 16 theta1.9899 1444.
## 17 theta2.1121 1156.
## 18 theta2.1154 1429.
## 19 theta2.1222 1283.
## 20 theta2.1637 1532.
## 21 theta2.1995 1282.
## 22 theta2.3369 1518.
## 23 theta2.4004 1538.
## 24 theta2.4209 1382.
## 25 theta2.4268 1666.
## 26 theta2.4841 1374.
## 27 theta2.5074 1170.
## 28 theta2.5495 1243.
## # ℹ 16 more rows
See companion page for plots of 2d model “PWC 2d MCMCpack CDAT”
seeds <- c(4610, 5712, 6814) # Set seeds for each chain
chains2dDP <- vector("list", n_chains)
for (i in 1:n_chains) {
chains2dDP[[i]] <- MCMCpaircompare2dDP(
pairDataCDAT,
# Main sampling chain defined:
ttheta.constraints=list(
'5074' = list(1, "+"),
'5074' = list(2, "-"),
'5495' = list(1, "-"),
'5495' = list(2, "+")
),
verbose=10000,
burnin=400,
mcmc=40000,
thin=20,
seed=seeds[i], # Ensures reproducibility of results.
gamma.start=gamma_start, # Uses the initialized respondent perception values.
theta.start=theta_start2d,
store.theta=TRUE, # Saves item-specfic draws.
store.gamma=TRUE, # Saves both rater-specific draws.
tune=.18, # Sets the tuning parameter for the proposals
procrustes=FALSE, # Disables Procrustes alignment
# Cluster MCMC at each stored iteration of the main chain:
alpha.start=1, # Sets the Dirichlet Process concentration parameter to start only if not fixed.
cluster.max=6, # Allows up to n clusters of respondents.
cluster.mcmc=100, # Runs __ iterations of MCMC for clustering at each stored iteration of main chain
alpha.fixed=FALSE, # Allows the concentration parameter to be learned from the data.
a0=.5, b0=2 # Hyperparameters for the Dirichlet Process prior for cluster sampling.
# To encourage more clustering of raters’ gamma values set the DP concentration parameter α to be low.
# Since α∼Gamma(a0,b0) choose hyperparameters a0 and b0 that favor small values of α.
# Strong clustering: a0 = .5, b0=2
# Balance clustering: a0 = 1, b0=1
# Minimal clustering: a0 = 5, b0=1
)
}
##
##
## MCMCpaircompare2dDP iteration 1 of 40400
##
##
## MCMCpaircompare2dDP iteration 10001 of 40400
##
##
## MCMCpaircompare2dDP iteration 20001 of 40400
##
##
## MCMCpaircompare2dDP iteration 30001 of 40400
##
##
## MCMCpaircompare2dDP iteration 40001 of 40400
##
##
## MCMCpaircompare2dDP iteration 1 of 40400
##
##
## MCMCpaircompare2dDP iteration 10001 of 40400
##
##
## MCMCpaircompare2dDP iteration 20001 of 40400
##
##
## MCMCpaircompare2dDP iteration 30001 of 40400
##
##
## MCMCpaircompare2dDP iteration 40001 of 40400
##
##
## MCMCpaircompare2dDP iteration 1 of 40400
##
##
## MCMCpaircompare2dDP iteration 10001 of 40400
##
##
## MCMCpaircompare2dDP iteration 20001 of 40400
##
##
## MCMCpaircompare2dDP iteration 30001 of 40400
##
##
## MCMCpaircompare2dDP iteration 40001 of 40400
# Apply the mcmc() funciton to each chain and combine into one mcmc_list2dDP
mcmc_list2dDP <- mcmc.list(lapply(chains2dDP, mcmc))
# Save object
save(mcmc_list2dDP, file = "Out/mcmc_list2dDP.Rdata")
# View the medians of posterior cluster assignments
summary(mcmc_list2dDP)$quantiles[33:56, 3]
## gamma.25 gamma.26 gamma.27 gamma.28 gamma.29 gamma.30 gamma.31
## 0.8726110 0.8937963 0.8951321 0.4653028 0.9037670 0.3582067 0.1065133
## gamma.34 gamma.35 gamma.36 gamma.37 gamma.38 cluster.25 cluster.26
## 0.9042405 1.4476224 1.4457542 1.4465269 1.4574008 2.0000000 2.0000000
## cluster.27 cluster.28 cluster.29 cluster.30 cluster.31 cluster.34 cluster.35
## 2.0000000 3.0000000 2.0000000 3.0000000 4.0000000 2.0000000 2.0000000
## cluster.36 cluster.37 cluster.38
## 2.0000000 2.0000000 2.0000000
Cluster assignments should be examined with the Maximum A Posteriori (MAP) Estimate or mode for each rater, rather than median or mean.
# Combine chains into one matrix
mcmc_matrix <- do.call(rbind, mcmc_list2dDP)
# Extract cluster columns and sort by rater ID
cluster_cols <- grep("^cluster\\.", colnames(mcmc_matrix), value = TRUE)
cluster_cols <- cluster_cols[order(as.numeric(sub("cluster\\.", "", cluster_cols)))]
# Define function to compute mode
get_mode <- function(x) {
as.numeric(names(which.max(table(x))))
}
# Compute modal cluster (MAP estimate) for each rater
modal_cluster_by_rater <- sapply(cluster_cols, function(col) {
get_mode(mcmc_matrix[, col])
})
print(modal_cluster_by_rater)
## cluster.25 cluster.26 cluster.27 cluster.28 cluster.29 cluster.30 cluster.31
## 1 1 1 3 1 3 4
## cluster.34 cluster.35 cluster.36 cluster.37 cluster.38
## 1 2 2 2 2
# Identify cluster columns
rater_ids <- cluster_cols
n_raters <- length(rater_ids)
n_samples <- nrow(mcmc_matrix)
# Initialize co-clustering matrix
co_clustering <- matrix(0, n_raters, n_raters)
colnames(co_clustering) <- rownames(co_clustering) <- sub("cluster\\.", "", rater_ids)
# Loop through each posterior draw (row of mcmc_matrix)
for (i in 1:n_samples) {
row <- mcmc_matrix[i, rater_ids]
for (j in 1:n_raters) {
for (k in 1:n_raters) {
if (row[[j]] == row[[k]]) {
co_clustering[j, k] <- co_clustering[j, k] + 1
}
}
}
}
# Normalize to get co-clustering proportions
co_clustering <- co_clustering / n_samples
library(pheatmap)
# Create cell labels with exactly 2 decimal places
co_clustering_labels <- formatC(co_clustering, digits = 2, format = "f")
# Define greyscale palette
grey_palette <- colorRampPalette(c("white", "black"))(100)
# Plot with dendrograms and formatted labels
pheatmap(co_clustering,
display_numbers = co_clustering_labels,
number_format = "%.2f",
fontsize_number = 10,
clustering_distance_rows = as.dist(1 - co_clustering),
clustering_distance_cols = as.dist(1 - co_clustering),
clustering_method = "average",
color = grey_palette,
main = "Co-Clustering Matrix (Greyscale with 2-Decimal Labels)")
mean(attr(chains2dDP[[1]], "gamma_accept_rate"), na.rm=TRUE)
## [1] 0.3646094
Gamma acceptance rates are low but acceptable with tuning proposals for this model
# For 2D-DP model
gelman_diag_2dDP <- gelman.diag(mcmc_list2dDP)
max_rhat_2dDP <- max(gelman_diag_2dDP$psrf[, "Point est."])
gelman_diag_2dDP
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## theta1.1121 1.00 1.00
## theta1.1154 1.00 1.01
## theta1.1222 1.00 1.00
## theta1.1637 1.00 1.00
## theta1.1995 1.00 1.00
## theta1.3369 1.00 1.00
## theta1.4004 1.00 1.00
## theta1.4209 1.00 1.00
## theta1.4268 1.00 1.01
## theta1.4841 1.00 1.00
## theta1.5074 1.00 1.00
## theta1.5495 1.00 1.01
## theta1.5710 1.00 1.01
## theta1.8294 1.00 1.00
## theta1.8980 1.00 1.00
## theta1.9899 1.00 1.00
## theta2.1121 1.00 1.01
## theta2.1154 1.00 1.00
## theta2.1222 1.00 1.01
## theta2.1637 1.00 1.00
## theta2.1995 1.00 1.01
## theta2.3369 1.00 1.00
## theta2.4004 1.00 1.00
## theta2.4209 1.00 1.01
## theta2.4268 1.00 1.01
## theta2.4841 1.00 1.00
## theta2.5074 1.00 1.01
## theta2.5495 1.00 1.01
## theta2.5710 1.00 1.01
## theta2.8294 1.00 1.01
## theta2.8980 1.00 1.00
## theta2.9899 1.00 1.01
## gamma.25 1.00 1.02
## gamma.26 1.01 1.02
## gamma.27 1.01 1.03
## gamma.28 1.00 1.00
## gamma.29 1.01 1.02
## gamma.30 1.00 1.02
## gamma.31 1.00 1.00
## gamma.34 1.01 1.02
## gamma.35 1.01 1.03
## gamma.36 1.01 1.03
## gamma.37 1.01 1.03
## gamma.38 1.01 1.03
## cluster.25 1.00 1.01
## cluster.26 1.00 1.01
## cluster.27 1.00 1.02
## cluster.28 1.00 1.01
## cluster.29 1.00 1.02
## cluster.30 1.00 1.00
## cluster.31 1.01 1.01
## cluster.34 1.00 1.02
## cluster.35 1.01 1.04
## cluster.36 1.01 1.04
## cluster.37 1.01 1.04
## cluster.38 1.01 1.04
## n.clusters 1.00 1.00
## alpha 1.00 1.00
##
## Multivariate psrf
##
## 1.03
All Gelman-Rubin statistics are close to 1.0 and the multivariate psrf is also within the range of acceptable.
summary2dDP <- round(
cbind(
as.data.frame(summary(mcmc_list2dDP)$statistics),
as.data.frame(summary(mcmc_list2dDP)$quantiles)
), 3)
summary2dDP$Model <- "2dDP"
save(summary2dDP, file = "Out/summary2dDP.Rdata")
summary2dDP
## Mean SD Naive SE Time-series SE 2.5% 25% 50% 75%
## theta1.1121 1.227 0.422 0.005 0.008 0.442 0.938 1.212 1.503
## theta1.1154 0.053 0.406 0.005 0.007 -0.724 -0.233 0.049 0.321
## theta1.1222 0.676 0.361 0.005 0.006 0.000 0.431 0.665 0.907
## theta1.1637 0.590 0.364 0.005 0.006 -0.111 0.342 0.581 0.832
## theta1.1995 0.862 0.433 0.006 0.008 0.060 0.562 0.843 1.142
## theta1.3369 1.539 0.451 0.006 0.007 0.654 1.234 1.538 1.839
## theta1.4004 -0.857 0.412 0.005 0.007 -1.698 -1.132 -0.847 -0.571
## theta1.4209 -1.300 0.415 0.005 0.007 -2.142 -1.575 -1.285 -1.015
## theta1.4268 -0.902 0.431 0.006 0.007 -1.769 -1.188 -0.885 -0.598
## theta1.4841 -0.894 0.387 0.005 0.006 -1.656 -1.152 -0.889 -0.630
## theta1.5074 1.222 0.404 0.005 0.007 0.468 0.943 1.208 1.483
## theta1.5495 -2.279 0.465 0.006 0.008 -3.229 -2.590 -2.263 -1.952
## theta1.5710 -0.807 0.400 0.005 0.006 -1.569 -1.081 -0.805 -0.538
## theta1.8294 -0.154 0.368 0.005 0.006 -0.884 -0.397 -0.152 0.095
## theta1.8980 0.655 0.389 0.005 0.006 -0.124 0.403 0.652 0.914
## theta1.9899 0.385 0.381 0.005 0.006 -0.368 0.134 0.380 0.643
## theta2.1121 -1.119 0.360 0.005 0.007 -1.848 -1.357 -1.107 -0.874
## theta2.1154 -1.546 0.360 0.005 0.007 -2.261 -1.783 -1.538 -1.305
## theta2.1222 -0.188 0.331 0.004 0.007 -0.852 -0.406 -0.185 0.039
## theta2.1637 -0.558 0.330 0.004 0.006 -1.221 -0.771 -0.561 -0.334
## theta2.1995 -1.173 0.356 0.005 0.007 -1.892 -1.407 -1.164 -0.930
## theta2.3369 1.270 0.401 0.005 0.007 0.468 1.005 1.272 1.538
## theta2.4004 1.215 0.356 0.005 0.007 0.538 0.970 1.208 1.451
## theta2.4209 0.713 0.357 0.005 0.007 0.021 0.471 0.709 0.942
## theta2.4268 1.028 0.354 0.005 0.006 0.345 0.786 1.023 1.268
## theta2.4841 -0.324 0.341 0.004 0.007 -0.994 -0.553 -0.325 -0.099
## theta2.5074 -1.035 0.365 0.005 0.007 -1.788 -1.276 -1.024 -0.781
## theta2.5495 0.455 0.409 0.005 0.008 -0.274 0.170 0.439 0.719
## theta2.5710 -0.936 0.357 0.005 0.007 -1.629 -1.174 -0.939 -0.702
## theta2.8294 0.747 0.333 0.004 0.007 0.075 0.523 0.747 0.966
## theta2.8980 0.885 0.341 0.004 0.007 0.208 0.659 0.887 1.110
## theta2.9899 0.549 0.336 0.004 0.006 -0.115 0.322 0.558 0.774
## gamma.25 0.859 0.153 0.002 0.003 0.527 0.760 0.873 0.970
## gamma.26 0.887 0.137 0.002 0.003 0.601 0.795 0.894 0.986
## gamma.27 0.888 0.137 0.002 0.003 0.602 0.795 0.895 0.987
## gamma.28 0.455 0.200 0.003 0.004 0.050 0.329 0.465 0.590
## gamma.29 0.897 0.139 0.002 0.003 0.612 0.803 0.904 0.996
## gamma.30 0.349 0.207 0.003 0.004 0.018 0.166 0.358 0.508
## gamma.31 0.129 0.100 0.001 0.001 0.005 0.050 0.107 0.183
## gamma.34 0.900 0.144 0.002 0.003 0.614 0.803 0.904 0.997
## gamma.35 1.424 0.115 0.001 0.002 1.137 1.365 1.448 1.514
## gamma.36 1.423 0.112 0.001 0.002 1.146 1.361 1.446 1.511
## gamma.37 1.423 0.113 0.001 0.002 1.138 1.361 1.447 1.512
## gamma.38 1.437 0.102 0.001 0.002 1.186 1.380 1.457 1.518
## cluster.25 2.514 1.516 0.020 0.037 1.000 1.000 2.000 3.000
## cluster.26 2.432 1.470 0.019 0.039 1.000 1.000 2.000 3.000
## cluster.27 2.450 1.480 0.019 0.038 1.000 1.000 2.000 3.000
## cluster.28 3.299 1.593 0.021 0.027 1.000 2.000 3.000 5.000
## cluster.29 2.475 1.491 0.019 0.037 1.000 1.000 2.000 3.000
## cluster.30 3.258 1.570 0.020 0.029 1.000 2.000 3.000 4.000
## cluster.31 3.579 1.604 0.021 0.030 1.000 2.000 4.000 5.000
## cluster.34 2.472 1.493 0.019 0.038 1.000 1.000 2.000 3.000
## cluster.35 2.675 1.501 0.019 0.039 1.000 1.000 2.000 4.000
## cluster.36 2.670 1.499 0.019 0.039 1.000 1.000 2.000 4.000
## cluster.37 2.673 1.502 0.019 0.037 1.000 1.000 2.000 4.000
## cluster.38 2.680 1.506 0.019 0.038 1.000 1.000 2.000 4.000
## n.clusters 4.476 0.782 0.010 0.012 3.000 4.000 4.000 5.000
## alpha 1.060 0.605 0.008 0.009 0.247 0.618 0.936 1.372
## 97.5% Model
## theta1.1121 2.093 2dDP
## theta1.1154 0.882 2dDP
## theta1.1222 1.402 2dDP
## theta1.1637 1.316 2dDP
## theta1.1995 1.746 2dDP
## theta1.3369 2.422 2dDP
## theta1.4004 -0.090 2dDP
## theta1.4209 -0.512 2dDP
## theta1.4268 -0.100 2dDP
## theta1.4841 -0.134 2dDP
## theta1.5074 2.048 2dDP
## theta1.5495 -1.404 2dDP
## theta1.5710 -0.014 2dDP
## theta1.8294 0.557 2dDP
## theta1.8980 1.417 2dDP
## theta1.9899 1.134 2dDP
## theta2.1121 -0.443 2dDP
## theta2.1154 -0.850 2dDP
## theta2.1222 0.459 2dDP
## theta2.1637 0.075 2dDP
## theta2.1995 -0.499 2dDP
## theta2.3369 2.038 2dDP
## theta2.4004 1.927 2dDP
## theta2.4209 1.443 2dDP
## theta2.4268 1.726 2dDP
## theta2.4841 0.354 2dDP
## theta2.5074 -0.352 2dDP
## theta2.5495 1.320 2dDP
## theta2.5710 -0.216 2dDP
## theta2.8294 1.408 2dDP
## theta2.8980 1.553 2dDP
## theta2.9899 1.213 2dDP
## gamma.25 1.115 2dDP
## gamma.26 1.130 2dDP
## gamma.27 1.131 2dDP
## gamma.28 0.842 2dDP
## gamma.29 1.143 2dDP
## gamma.30 0.732 2dDP
## gamma.31 0.372 2dDP
## gamma.34 1.158 2dDP
## gamma.35 1.564 2dDP
## gamma.36 1.565 2dDP
## gamma.37 1.564 2dDP
## gamma.38 1.565 2dDP
## cluster.25 6.000 2dDP
## cluster.26 6.000 2dDP
## cluster.27 6.000 2dDP
## cluster.28 6.000 2dDP
## cluster.29 6.000 2dDP
## cluster.30 6.000 2dDP
## cluster.31 6.000 2dDP
## cluster.34 6.000 2dDP
## cluster.35 6.000 2dDP
## cluster.36 6.000 2dDP
## cluster.37 6.000 2dDP
## cluster.38 6.000 2dDP
## n.clusters 6.000 2dDP
## alpha 2.560 2dDP
With a max of 6 clusters and 100 iterations for cluster at each stored iteration if the main MCMC chain, with a0=.5 and b0=2 strong clustering, the co-clustering matrix with dendrogram matches the visual plot of gamma medians precisely.
effectiveSize(mcmc_list2dDP)
## theta1.1121 theta1.1154 theta1.1222 theta1.1637 theta1.1995 theta1.3369
## 3021.321 3363.001 3902.996 3610.704 3039.552 4777.772
## theta1.4004 theta1.4209 theta1.4268 theta1.4841 theta1.5074 theta1.5495
## 3454.964 3531.749 3393.431 4092.856 3282.214 3691.647
## theta1.5710 theta1.8294 theta1.8980 theta1.9899 theta2.1121 theta2.1154
## 4232.548 3942.342 4048.296 3747.791 2784.995 3048.856
## theta2.1222 theta2.1637 theta2.1995 theta2.3369 theta2.4004 theta2.4209
## 2525.693 2738.471 3002.380 3017.804 2559.721 2724.787
## theta2.4268 theta2.4841 theta2.5074 theta2.5495 theta2.5710 theta2.8294
## 3066.203 2745.997 2650.245 2499.173 2882.105 2478.259
## theta2.8980 theta2.9899 gamma.25 gamma.26 gamma.27 gamma.28
## 2741.557 2871.350 2265.186 2112.832 2077.127 2717.815
## gamma.29 gamma.30 gamma.31 gamma.34 gamma.35 gamma.36
## 2007.122 2465.693 5653.283 2172.651 2754.706 2752.154
## gamma.37 gamma.38 cluster.25 cluster.26 cluster.27 cluster.28
## 2638.008 2794.730 1768.676 1487.658 1561.179 3541.323
## cluster.29 cluster.30 cluster.31 cluster.34 cluster.35 cluster.36
## 1629.082 3017.058 2928.989 1561.633 1506.192 1510.300
## cluster.37 cluster.38 n.clusters alpha
## 1682.182 1537.260 4507.174 4940.833
print(summarize_draws(mcmc_list2dDP)[, c("variable", "ess_bulk")], n=58)
## # A tibble: 58 × 2
## variable ess_bulk
## <chr> <dbl>
## 1 theta1.1121 2919.
## 2 theta1.1154 3203.
## 3 theta1.1222 3612.
## 4 theta1.1637 3437.
## 5 theta1.1995 3015.
## 6 theta1.3369 4600.
## 7 theta1.4004 3369.
## 8 theta1.4209 3587.
## 9 theta1.4268 3456.
## 10 theta1.4841 3929.
## 11 theta1.5074 3333.
## 12 theta1.5495 3528.
## 13 theta1.5710 4184.
## 14 theta1.8294 4005.
## 15 theta1.8980 3904.
## 16 theta1.9899 3678.
## 17 theta2.1121 2806.
## 18 theta2.1154 3066.
## 19 theta2.1222 2549.
## 20 theta2.1637 2580.
## 21 theta2.1995 2738.
## 22 theta2.3369 2950.
## 23 theta2.4004 2710.
## 24 theta2.4209 2678.
## 25 theta2.4268 2722.
## 26 theta2.4841 2672.
## 27 theta2.5074 2647.
## 28 theta2.5495 2626.
## 29 theta2.5710 2808.
## 30 theta2.8294 2614.
## 31 theta2.8980 2589.
## 32 theta2.9899 2712.
## 33 gamma.25 2289.
## 34 gamma.26 2009.
## 35 gamma.27 1974.
## 36 gamma.28 2678.
## 37 gamma.29 2047.
## 38 gamma.30 2552.
## 39 gamma.31 5534.
## 40 gamma.34 2058.
## 41 gamma.35 3091.
## 42 gamma.36 3041.
## 43 gamma.37 2944.
## 44 gamma.38 3161.
## 45 cluster.25 1332.
## 46 cluster.26 1206.
## 47 cluster.27 1173.
## 48 cluster.28 3275.
## 49 cluster.29 1196.
## 50 cluster.30 2963.
## 51 cluster.31 2534.
## 52 cluster.34 1229.
## 53 cluster.35 1147.
## 54 cluster.36 1198.
## 55 cluster.37 1187.
## 56 cluster.38 1185.
## 57 n.clusters 4306.
## 58 alpha 4806.
Chain length increased to sufficient ESS above 1000 for all variables.
See companion page for plots of 2dDP model “PWC 2dDP MCMCpack CDAT”
From Levy & Mislevy (2016) Ch. 10.2.4 & 10.2.5 Posterior Predictive Model Checking
The common feature across the 2-dimensional models are (a) posterior thetas of each item’s location in 2-dimensional latent space and precision of that distribution; and (b) the posterior gammas of each rater.
The common feature of all models is the choice of one item over another based on th model-implied latent traits.
The prediction of choices based on model-implied latent variables should be close obserbed choices.
The selection of the left item (j1) over the second (j2) is the test statistic to be examined with Posterior Predictive Model Checking.
(See Levy & Mislevy, 2016, pp. 237-239).
First, a function that will - combine all chains into one list of samples - iterate 200 sets of simulated choices based on comparisons that match the rater and theta IDs of the observed comparisons - for each iteration draw one value from all chains of \(\theta_{1}\), \(\theta_{2}\), and \(\gamma\). - for each comparison of the 960 observations, extract the two items and the rater for a simulated choice. - find the difference between the two items’ thetas according to the model - add the two thetas in proportion to the gamma for the variable \(\eta\) - use eta as input to the probit link function to find the probability of choosing j1 over j2 - make a random draw of 1 or 0 governed by the binomial distribution and probability computed from eta - add the 1 or 0 to a matrix of choices of j1 over j2 - return the simulated choices for the 960 comparisons 200 times
simulate_pp_data_mcmclist2d <- function(mcmc_list, pair_data) {
# Combine chains into one matrix
all_samples <- do.call(rbind, mcmc_list)
n_obs <- nrow(pair_data)
n_rep = 200
sim_choices <- matrix(NA, nrow = n_rep, ncol = n_obs)
# Identify column names for theta1, theta2, gamma
theta1_names <- grep("^theta1\\.", colnames(all_samples), value = TRUE)
theta2_names <- grep("^theta2\\.", colnames(all_samples), value = TRUE)
gamma_names <- grep("^gamma\\.", colnames(all_samples), value = TRUE)
for (r in 1:n_rep) { # Row by row iteration
s <- sample(1:nrow(all_samples), 1) # Draw from any chain
theta1 <- all_samples[s, theta1_names] # Draw one value for theta1
theta2 <- all_samples[s, theta2_names] # Draw one value for theta2
gamma <- all_samples[s, gamma_names] # Draw one value for gamma
for (i in 1:n_obs) { # Column by column iteration
j1 <- as.character(pair_data$itemj1[i]) # Locate the first item number for that row from the observed
j2 <- as.character(pair_data$itemj2[i]) # Locate the second item number for that row from the observed
rater <- as.character(pair_data$rater[i]) # Locate the rater for that row from the observed
# Find the model-implied diff between sampled theta1 of 2 items
diff1 <- theta1[paste0("theta1.", j1)] - theta1[paste0("theta1.", j2)]
# Find the model-implied diff between sampled theta2 of 2 items
diff2 <- theta2[paste0("theta2.", j1)] - theta2[paste0("theta2.", j2)]
g <- gamma[paste0("gamma.", rater)] # Find the model-implied sampled gamma of the rater
# eta will be the input to the probit link function
# cos() & sin() assume angles are in radians as they are
zi1 <- cos(g)
zi2 <- sin(g)
# Add thetas in proportion to sine or cosine of gamma, representing how much more favorable j1 is than j2
eta <- zi1 * diff1 + zi2 * diff2
# If eta is negative, then j2 is preferrable. If close to 0, then it's a toss-up.
p <- pnorm(eta) # transform eta using the probit link to the probability j1 is chosen over j2.
# If p is <.5 preference for j2, p = .5 no preference, p > .5 preference for j1.
# Generate 1 draw of 0 or 1 based on p, and add it to the sim_choices matrix
sim_choices[r, i] <- rbinom(n = 1, size = 1, p)
}
}
# return a set of binary choices where 1 is a choice of j1 and 0 is a choice of j2
return(sim_choices)
}
Run the function to simulate 200 sets of observed choices for each 2d model Each data set is a row in a matrix
First simulated data sets for the 2d without clusters
sim_2d <- simulate_pp_data_mcmclist2d(mcmc_list2d, pairDataCDAT)
Next simulate data sets for the 2d with clusters
sim_2dDP <- simulate_pp_data_mcmclist2d(mcmc_list2dDP, pairDataCDAT)
Convert observed item ID choices to 1/0 for comparison to simulated data. If itemj1 is chosen, then 1, if j2 is chosen, then 0.
#If choice is first item, 1, if choice is second item, 0
observed_binary <- ifelse(pairDataCDAT$choice == pairDataCDAT$itemj1, 1,
ifelse(pairDataCDAT$choice == pairDataCDAT$itemj2, 0, NA))
A function to compute the p-value of observed and simulated choices - given the binary version of the observed data, find the mean (proportion of j1 choices) for each simulated data set - compute the proportion of simulated data sets that have greater than or equal to the proportion of j1 choices in observed
compute_ppp_proportion <- function(observed_choices, sim_choices_matrix) {
obs_prop <- mean(observed_choices)
# The mean of the row is the proportion of same choice for that simulated data set
sim_prop <- rowMeans(sim_choices_matrix)
sim_correct <- rowMeans(sim_choices_matrix == matrix(observed_choices,
nrow = nrow(sim_choices_matrix),
ncol = length(observed_choices),
byrow = TRUE))
# Posterior predictive p-value
ppp <- mean(sim_prop >= obs_prop)
list(
observed = obs_prop,
simulated = sim_prop,
simulated_correct = sim_correct,
ppp = ppp
)
}
Compute Posterior Predictive P-value (PPP) for each 2d model
ppp2d <- compute_ppp_proportion(observed_binary, sim_2d)
ppp2d
## $observed
## [1] 0.5152198
##
## $simulated
## [1] 0.5197294 0.5107103 0.5129651 0.5028185 0.5298760 0.5140924 0.4904171
## [8] 0.5242390 0.5163472 0.5107103 0.5016911 0.4836528 0.5129651 0.5332582
## [15] 0.5163472 0.5062007 0.5231116 0.4813980 0.5016911 0.5490417 0.4994363
## [22] 0.5264938 0.4960541 0.5095829 0.5039459 0.5028185 0.4847802 0.5231116
## [29] 0.4915445 0.5073281 0.5084555 0.5062007 0.5062007 0.5028185 0.5084555
## [36] 0.5084555 0.4825254 0.4904171 0.5422773 0.5219842 0.5062007 0.4937993
## [43] 0.5219842 0.4723788 0.4915445 0.5118377 0.4712514 0.5062007 0.5118377
## [50] 0.5163472 0.5095829 0.4904171 0.4813980 0.4926719 0.5062007 0.4847802
## [57] 0.5152198 0.5231116 0.4971815 0.5050733 0.5253664 0.5208568 0.5095829
## [64] 0.5073281 0.4971815 0.5174746 0.5016911 0.4859076 0.4926719 0.5129651
## [71] 0.4983089 0.4937993 0.4983089 0.4678692 0.5084555 0.5062007 0.4994363
## [78] 0.5231116 0.5107103 0.5219842 0.5186020 0.5062007 0.5107103 0.4983089
## [85] 0.5343856 0.5005637 0.5287486 0.5107103 0.5062007 0.5118377 0.5005637
## [92] 0.5107103 0.5140924 0.4983089 0.5174746 0.5264938 0.5343856 0.5084555
## [99] 0.5050733 0.5028185 0.5107103 0.5388952 0.5242390 0.4881623 0.5016911
## [106] 0.5095829 0.5039459 0.5174746 0.5174746 0.5107103 0.4960541 0.5005637
## [113] 0.5050733 0.5163472 0.5028185 0.5016911 0.4791432 0.5073281 0.4949267
## [120] 0.4960541 0.4971815 0.5107103 0.5039459 0.4926719 0.4983089 0.5208568
## [127] 0.5287486 0.4904171 0.5129651 0.5434047 0.5186020 0.5095829 0.4983089
## [134] 0.5062007 0.4881623 0.5310034 0.5062007 0.5073281 0.5039459 0.5163472
## [141] 0.4847802 0.5129651 0.5129651 0.4915445 0.4949267 0.4780158 0.5039459
## [148] 0.4735062 0.5287486 0.5095829 0.4949267 0.5095829 0.5039459 0.5050733
## [155] 0.5434047 0.4802706 0.5186020 0.5152198 0.5039459 0.5219842 0.5366404
## [162] 0.4836528 0.4813980 0.5152198 0.4971815 0.4689966 0.4960541 0.4937993
## [169] 0.4791432 0.5062007 0.5242390 0.4994363 0.5005637 0.4791432 0.5084555
## [176] 0.4994363 0.5005637 0.5084555 0.5016911 0.5186020 0.4847802 0.4836528
## [183] 0.5050733 0.4937993 0.5253664 0.5118377 0.4937993 0.5073281 0.5084555
## [190] 0.4847802 0.4881623 0.4780158 0.4904171 0.4892897 0.5062007 0.4937993
## [197] 0.4802706 0.4949267 0.4915445 0.5005637
##
## $simulated_correct
## [1] 0.6527621 0.6843292 0.6865840 0.6561443 0.6922210 0.6944758 0.6753100
## [8] 0.6685457 0.6967306 0.6888388 0.6685457 0.6572717 0.6730552 0.6775648
## [15] 0.6448703 0.6820744 0.6764374 0.6617813 0.6888388 0.6640361 0.6933484
## [22] 0.6843292 0.6674183 0.6629087 0.6527621 0.6651635 0.6944758 0.6809470
## [29] 0.6967306 0.7012401 0.6910936 0.6595265 0.6640361 0.6809470 0.6595265
## [36] 0.6820744 0.7080045 0.6843292 0.6843292 0.7023675 0.6775648 0.6741826
## [43] 0.6775648 0.6843292 0.6809470 0.6989853 0.6809470 0.6753100 0.6877114
## [50] 0.6809470 0.6403608 0.6775648 0.6933484 0.6910936 0.6482525 0.6786922
## [57] 0.6865840 0.6583991 0.6572717 0.6516347 0.6719278 0.6764374 0.6899662
## [64] 0.6696731 0.6685457 0.6910936 0.6685457 0.6640361 0.6775648 0.6595265
## [71] 0.6967306 0.6741826 0.6674183 0.6775648 0.6662909 0.6685457 0.6910936
## [78] 0.6696731 0.6572717 0.6753100 0.7170237 0.6550169 0.6572717 0.6719278
## [85] 0.6719278 0.6764374 0.6888388 0.6730552 0.6662909 0.6786922 0.6944758
## [92] 0.6843292 0.6832018 0.6764374 0.6753100 0.6708005 0.6741826 0.6820744
## [99] 0.6877114 0.6764374 0.6933484 0.7057497 0.6459977 0.6572717 0.6708005
## [106] 0.6561443 0.7001127 0.6933484 0.6640361 0.6978579 0.6719278 0.6674183
## [113] 0.6629087 0.6741826 0.6741826 0.6482525 0.6414882 0.7034949 0.6572717
## [120] 0.6809470 0.6888388 0.6685457 0.6685457 0.6572717 0.7012401 0.7080045
## [127] 0.7113867 0.6730552 0.6933484 0.6651635 0.6651635 0.6741826 0.6629087
## [134] 0.6617813 0.6888388 0.6978579 0.6617813 0.6696731 0.6775648 0.6583991
## [141] 0.6516347 0.6753100 0.7001127 0.6809470 0.6910936 0.6741826 0.6798196
## [148] 0.6629087 0.6798196 0.6696731 0.6708005 0.6516347 0.6595265 0.6854566
## [155] 0.6448703 0.6899662 0.6561443 0.6978579 0.6595265 0.7001127 0.6786922
## [162] 0.6888388 0.6572717 0.6550169 0.6888388 0.6741826 0.6922210 0.6696731
## [169] 0.6437430 0.6775648 0.6865840 0.6550169 0.6854566 0.6775648 0.6595265
## [176] 0.6617813 0.6877114 0.6685457 0.6820744 0.6877114 0.6583991 0.6708005
## [183] 0.6629087 0.6561443 0.6786922 0.6944758 0.6674183 0.6741826 0.6414882
## [190] 0.6516347 0.6730552 0.6561443 0.6753100 0.6516347 0.6978579 0.6809470
## [197] 0.6629087 0.6730552 0.6944758 0.6741826
##
## $ppp
## [1] 0.24
summary(ppp2d$simulated_correct)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.6404 0.6629 0.6747 0.6751 0.6869 0.7170
ppp2dDP <- compute_ppp_proportion(observed_binary, sim_2dDP)
ppp2dDP
## $observed
## [1] 0.5152198
##
## $simulated
## [1] 0.5129651 0.5219842 0.5129651 0.5219842 0.5016911 0.5062007 0.5129651
## [8] 0.5231116 0.5062007 0.5107103 0.5095829 0.5107103 0.5208568 0.5253664
## [15] 0.5016911 0.5174746 0.5219842 0.4847802 0.5118377 0.5016911 0.5118377
## [22] 0.5140924 0.5186020 0.5355130 0.5152198 0.5028185 0.4971815 0.5107103
## [29] 0.5152198 0.5174746 0.5163472 0.4904171 0.4983089 0.5140924 0.4881623
## [36] 0.5118377 0.5039459 0.4892897 0.4735062 0.5174746 0.5005637 0.5242390
## [43] 0.5332582 0.5163472 0.4994363 0.5129651 0.5152198 0.5095829 0.4836528
## [50] 0.4949267 0.4994363 0.5163472 0.5129651 0.5084555 0.4757610 0.5039459
## [57] 0.5062007 0.5422773 0.5118377 0.4926719 0.4881623 0.5028185 0.5163472
## [64] 0.5118377 0.5129651 0.4994363 0.5005637 0.5197294 0.5197294 0.4881623
## [71] 0.5084555 0.5129651 0.5140924 0.5050733 0.4870349 0.4971815 0.5174746
## [78] 0.4667418 0.4937993 0.5084555 0.4870349 0.5016911 0.5107103 0.5400225
## [85] 0.5062007 0.5039459 0.4859076 0.5310034 0.5005637 0.5095829 0.4960541
## [92] 0.5005637 0.4994363 0.5163472 0.5016911 0.4892897 0.4983089 0.4949267
## [99] 0.5152198 0.5118377 0.4723788 0.5039459 0.4994363 0.5287486 0.5129651
## [106] 0.4802706 0.5152198 0.4960541 0.4949267 0.5039459 0.4847802 0.5050733
## [113] 0.4701240 0.5073281 0.5129651 0.5050733 0.4915445 0.4994363 0.4870349
## [120] 0.5073281 0.4735062 0.4994363 0.4994363 0.5005637 0.5095829 0.5140924
## [127] 0.5084555 0.5253664 0.5107103 0.5242390 0.5129651 0.4926719 0.5118377
## [134] 0.5028185 0.5005637 0.5028185 0.5321308 0.4926719 0.4949267 0.5084555
## [141] 0.5095829 0.4937993 0.5016911 0.5174746 0.5107103 0.5186020 0.5084555
## [148] 0.5016911 0.4994363 0.4960541 0.5039459 0.5062007 0.5140924 0.5107103
## [155] 0.5084555 0.4915445 0.4892897 0.5050733 0.5016911 0.5016911 0.4960541
## [162] 0.5264938 0.5073281 0.5107103 0.5242390 0.5016911 0.4813980 0.5242390
## [169] 0.4881623 0.4825254 0.5039459 0.4870349 0.5118377 0.5174746 0.5186020
## [176] 0.5107103 0.4926719 0.5016911 0.5107103 0.4915445 0.5073281 0.4881623
## [183] 0.5321308 0.4994363 0.5129651 0.5242390 0.4689966 0.5016911 0.5028185
## [190] 0.5062007 0.5118377 0.5118377 0.4881623 0.4960541 0.4678692 0.4960541
## [197] 0.4949267 0.5062007 0.4937993 0.5062007
##
## $simulated_correct
## [1] 0.6595265 0.6798196 0.6843292 0.6820744 0.6640361 0.6662909 0.7001127
## [8] 0.7057497 0.6662909 0.6640361 0.6583991 0.6437430 0.6764374 0.6899662
## [15] 0.6843292 0.6685457 0.6820744 0.6967306 0.6786922 0.6595265 0.6471251
## [22] 0.6764374 0.6832018 0.6662909 0.6933484 0.6944758 0.6753100 0.6753100
## [29] 0.6933484 0.6843292 0.6651635 0.7091319 0.6809470 0.6877114 0.7068771
## [36] 0.6674183 0.6956032 0.6674183 0.6854566 0.7023675 0.6651635 0.6753100
## [43] 0.6843292 0.6516347 0.6798196 0.6888388 0.6820744 0.6606539 0.6888388
## [50] 0.6910936 0.6617813 0.6854566 0.6482525 0.6798196 0.6719278 0.6640361
## [57] 0.6685457 0.6640361 0.6651635 0.6753100 0.6933484 0.6493799 0.6967306
## [64] 0.6786922 0.6956032 0.6662909 0.6809470 0.6662909 0.6888388 0.6753100
## [71] 0.6843292 0.6708005 0.6583991 0.6583991 0.6832018 0.6753100 0.6640361
## [78] 0.6899662 0.6629087 0.7068771 0.6809470 0.6685457 0.6708005 0.6708005
## [85] 0.6708005 0.6459977 0.6595265 0.6820744 0.6877114 0.6809470 0.6922210
## [92] 0.6606539 0.6753100 0.6809470 0.6640361 0.6786922 0.6651635 0.6753100
## [99] 0.6550169 0.6877114 0.6708005 0.6617813 0.6550169 0.6753100 0.6640361
## [106] 0.6832018 0.6775648 0.6381060 0.6662909 0.6753100 0.6922210 0.6989853
## [113] 0.7023675 0.6741826 0.6595265 0.6583991 0.6786922 0.6753100 0.6944758
## [120] 0.7012401 0.6899662 0.6865840 0.6595265 0.6696731 0.6448703 0.6696731
## [127] 0.6685457 0.6516347 0.6798196 0.6617813 0.6933484 0.6888388 0.6426156
## [134] 0.6674183 0.6809470 0.6719278 0.6854566 0.7023675 0.6708005 0.6505073
## [141] 0.6719278 0.7034949 0.6843292 0.6572717 0.6662909 0.6719278 0.6820744
## [148] 0.6843292 0.7068771 0.6832018 0.6662909 0.6933484 0.6606539 0.6888388
## [155] 0.6775648 0.7057497 0.6786922 0.6899662 0.6640361 0.6933484 0.6741826
## [162] 0.7001127 0.6944758 0.6753100 0.6662909 0.6685457 0.6910936 0.6550169
## [169] 0.6617813 0.6606539 0.6708005 0.6989853 0.6719278 0.6730552 0.6809470
## [176] 0.6798196 0.6324690 0.6662909 0.6933484 0.6786922 0.6786922 0.6798196
## [183] 0.6786922 0.6685457 0.6775648 0.6888388 0.6944758 0.6978579 0.6899662
## [190] 0.6775648 0.6651635 0.6606539 0.6572717 0.6809470 0.6595265 0.6786922
## [197] 0.6888388 0.6617813 0.6741826 0.6843292
##
## $ppp
## [1] 0.21
summary(ppp2dDP$simulated_correct)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.6325 0.6652 0.6759 0.6761 0.6869 0.7091
Let’s check the prediction of choices using the 1d model next. For each posterior sample of \(\theta\) and \(\alpha\), simulate responses.
A function to simulate responses using 1d model
simulate_pp_data_mcmclist1d <- function(mcmc_list, pair_data) {
# Combine chains into one matrix
all_samples <- do.call(rbind, mcmc_list)
n_obs <- nrow(pair_data)
n_rep = 200
sim_choices <- matrix(NA, nrow = n_rep, ncol = n_obs)
# Identify column names for theta1, theta2, gamma
theta_names <- grep("^theta\\.", colnames(all_samples), value = TRUE)
alpha_names <- grep("^alpha\\.", colnames(all_samples), value = TRUE)
for (r in 1:n_rep) {
s <- sample(1:nrow(all_samples), 1) # Select one draw from any chain in combined samples
theta <- all_samples[s, theta_names] # Draw one value for theta
alpha <- all_samples[s, alpha_names] # Draw one value for alpha
for (i in 1:n_obs) {
j1 <- as.character(pair_data$itemj1[i])
j2 <- as.character(pair_data$itemj2[i])
rater <- as.character(pair_data$rater[i])
# Safely extract theta/gamma values using column names
diff <- theta[paste0("theta.", j1)] - theta[paste0("theta.", j2)]
a <- alpha[paste0("alpha.", rater)]
eta <- a * diff # Add thetas
p <- pnorm(eta) # probit link is a normal distribution
sim_choices[r, i] <- rbinom(1, 1, p)
}
}
return(sim_choices)
}
Run the function for 1d model
sim_1d <- simulate_pp_data_mcmclist1d(mcmc_list1d, pairDataCDAT)
Compute Posterior Predictive P-value (PPP) for 1d model
ppp1d <- compute_ppp_proportion(observed_binary, sim_1d)
ppp1d
## $observed
## [1] 0.5152198
##
## $simulated
## [1] 0.5264938 0.4949267 0.5107103 0.5118377 0.5084555 0.5174746 0.5050733
## [8] 0.5005637 0.4971815 0.4937993 0.5118377 0.4971815 0.5140924 0.5028185
## [15] 0.5095829 0.4937993 0.5129651 0.5298760 0.5062007 0.5107103 0.4937993
## [22] 0.5140924 0.5174746 0.5084555 0.5321308 0.5050733 0.4836528 0.5219842
## [29] 0.5084555 0.5039459 0.5186020 0.5197294 0.5028185 0.4735062 0.5197294
## [36] 0.5050733 0.5253664 0.4960541 0.5332582 0.5095829 0.5140924 0.5095829
## [43] 0.4960541 0.5095829 0.5118377 0.4937993 0.4983089 0.5118377 0.5140924
## [50] 0.4870349 0.5163472 0.5005637 0.5186020 0.5016911 0.4926719 0.4667418
## [57] 0.4960541 0.5016911 0.5050733 0.4926719 0.4870349 0.5050733 0.4926719
## [64] 0.4949267 0.5253664 0.4960541 0.5343856 0.5084555 0.5332582 0.4892897
## [71] 0.5005637 0.5084555 0.4904171 0.4937993 0.4881623 0.5186020 0.4881623
## [78] 0.4881623 0.5084555 0.4994363 0.5208568 0.4859076 0.4836528 0.5129651
## [85] 0.5219842 0.5073281 0.5253664 0.4983089 0.5073281 0.4915445 0.4983089
## [92] 0.5118377 0.4847802 0.5107103 0.4915445 0.5107103 0.5016911 0.4971815
## [99] 0.5016911 0.4825254 0.4892897 0.5332582 0.5129651 0.5129651 0.5050733
## [106] 0.4994363 0.5118377 0.5118377 0.4904171 0.4994363 0.5140924 0.5050733
## [113] 0.4712514 0.4689966 0.5095829 0.4847802 0.4937993 0.5095829 0.5050733
## [120] 0.5231116 0.5028185 0.5163472 0.5016911 0.4994363 0.4926719 0.5084555
## [127] 0.4949267 0.5095829 0.4949267 0.5208568 0.5129651 0.4847802 0.5084555
## [134] 0.4791432 0.5298760 0.4870349 0.5016911 0.4937993 0.5152198 0.4802706
## [141] 0.4780158 0.4971815 0.4937993 0.5174746 0.5095829 0.5062007 0.5118377
## [148] 0.5174746 0.4554679 0.4949267 0.5163472 0.4746336 0.5084555 0.5107103
## [155] 0.5005637 0.5084555 0.4791432 0.4859076 0.5445321 0.4813980 0.4892897
## [162] 0.4971815 0.5219842 0.5163472 0.4791432 0.4813980 0.5152198 0.5028185
## [169] 0.5197294 0.5005637 0.4892897 0.4926719 0.5197294 0.4983089 0.5050733
## [176] 0.4904171 0.5174746 0.4994363 0.5231116 0.4949267 0.4825254 0.5197294
## [183] 0.4971815 0.5186020 0.5005637 0.5253664 0.4847802 0.5140924 0.5039459
## [190] 0.5084555 0.5186020 0.4836528 0.5231116 0.5005637 0.4994363 0.4915445
## [197] 0.4904171 0.5062007 0.4825254 0.5084555
##
## $simulated_correct
## [1] 0.6211950 0.5963923 0.6211950 0.6178129 0.5896280 0.6234498 0.6448703
## [8] 0.6223224 0.6144307 0.6042841 0.5794814 0.6279594 0.6178129 0.5997745
## [15] 0.6155581 0.6133033 0.5986471 0.6178129 0.6189402 0.6054115 0.6268320
## [22] 0.6268320 0.6324690 0.6414882 0.6133033 0.6290868 0.6144307 0.6347238
## [29] 0.6166855 0.6324690 0.6290868 0.6347238 0.6087937 0.6042841 0.6369786
## [36] 0.6133033 0.6245772 0.5794814 0.6347238 0.6313416 0.6268320 0.6335964
## [43] 0.6448703 0.6200676 0.6042841 0.6313416 0.5975197 0.6133033 0.6223224
## [50] 0.6020293 0.6223224 0.5862458 0.6290868 0.6121759 0.6324690 0.6133033
## [57] 0.6223224 0.6099211 0.6200676 0.6324690 0.6065389 0.6335964 0.6211950
## [64] 0.6234498 0.6133033 0.6178129 0.6268320 0.6369786 0.5963923 0.6381060
## [71] 0.6245772 0.5986471 0.6302142 0.5885006 0.6099211 0.6223224 0.6099211
## [78] 0.5828636 0.5941375 0.6054115 0.5839910 0.5918828 0.6347238 0.6437430
## [85] 0.6302142 0.6245772 0.6155581 0.6245772 0.5907554 0.5930101 0.6133033
## [92] 0.6358512 0.6155581 0.6054115 0.6087937 0.5918828 0.5738444 0.6189402
## [99] 0.6121759 0.6155581 0.6065389 0.6166855 0.6257046 0.6257046 0.6178129
## [106] 0.6392334 0.6110485 0.6223224 0.6189402 0.5963923 0.6223224 0.5907554
## [113] 0.6245772 0.5975197 0.6133033 0.6335964 0.6178129 0.6155581 0.6290868
## [120] 0.6110485 0.5907554 0.6313416 0.6054115 0.6121759 0.6189402 0.6302142
## [127] 0.6166855 0.6223224 0.6211950 0.6065389 0.6031567 0.6110485 0.6369786
## [134] 0.6099211 0.5704622 0.6313416 0.6054115 0.6200676 0.6211950 0.5997745
## [141] 0.5952649 0.6166855 0.6426156 0.6166855 0.6155581 0.6347238 0.6268320
## [148] 0.5896280 0.6200676 0.6031567 0.6065389 0.6189402 0.5986471 0.5986471
## [155] 0.6065389 0.6234498 0.6414882 0.6054115 0.6369786 0.6279594 0.6065389
## [162] 0.6054115 0.6031567 0.6358512 0.6099211 0.6234498 0.6324690 0.5862458
## [169] 0.6009019 0.6178129 0.5975197 0.6121759 0.5986471 0.6358512 0.6178129
## [176] 0.6009019 0.6459977 0.6189402 0.5997745 0.6099211 0.6471251 0.6257046
## [183] 0.6257046 0.6087937 0.6200676 0.6155581 0.5975197 0.6133033 0.6054115
## [190] 0.6302142 0.6200676 0.5806088 0.6178129 0.5997745 0.6527621 0.6268320
## [197] 0.6121759 0.6482525 0.6245772 0.6099211
##
## $ppp
## [1] 0.21
summary(ppp1d$simulated_correct)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5705 0.6054 0.6172 0.6158 0.6268 0.6528
Plot the observed proportion of selection of itemj1 in each comparison to the distribution of simulated selection of itemj1 in each comparison.
# Density of simulated proportions
dens <- density(ppp1d$simulated)
# Plot
plot(dens, main = "Posterior Predictive Density (1D Model)",
xlab = "Simulated Proportion of itemj1 Chosen",
ylab = "Density", lwd = 2, col = "steelblue")
# Add observed value as a vertical red line
abline(v = ppp1d$observed, col = "red", lwd = 2, lty = 2)
# Add legend
legend("topright", legend = "Observed Proportion", col = "red", lty = 2, lwd = 2)
Plot the density of the 2d model simulated proportions and the observed proportion (see Levy & Mislevy, 2016, pp. 238-239)
# Density of simulated proportions
dens <- density(ppp2d$simulated)
# Plot
plot(dens, main = "Posterior Predictive Density (2D Normal Prior Model)",
xlab = "Simulated Proportion of itemj1 Chosen",
ylab = "Density", lwd = 2, col = "steelblue")
# Add observed value as a vertical red line
abline(v = ppp2d$observed, col = "red", lwd = 2, lty = 2)
# Add legend
legend("topright", legend = "Observed Proportion", col = "red", lty = 2, lwd = 2)
Plot the density of the 2dDP model simulated proportions and the observed proportion
# Density of simulated proportions
dens <- density(ppp2dDP$simulated)
# Plot
plot(dens, main = "Posterior Predictive Density (2D DP Prior Model)",
xlab = "Simulated Proportion of itemj1 Chosen",
ylab = "Density", lwd = 2, col = "steelblue")
# Add observed value as a vertical red line
abline(v = ppp2dDP$observed, col = "red", lwd = 2, lty = 2)
# Add legend
legend("topright", legend = "Observed Proportion", col = "red", lty = 2, lwd = 2)
All models have similar PPP values (~.23) indicating that the distribution of simulated choices’ proportion of j1 preference is sufficiently amassed around the observed proportion of j1 preferences, based on the rater’s model-implied gamma and the items’ model-implied theta1 and theta2 for that same combination.
This begins to suggest that both of the 2d models produce simulated data that sufficiently resemble actual choices of items by raters.
Additionally, the bulk of simulated choices of j are within the narrow range of .05 (0.48 to 0.53).
They both consistently predict the proportion of preference for itemj1, but this does not mean that the row-by-row choices are the same. For that, the percent of correct choices was added to the ppp function.
The simulated choices using the model-implied latent variables produced twice as many correct as incorrect choices for each combination of rater and paired items, and slightly better for the 2-dimensional models (0.68) than the one-dimensional mode (0.62). This suggests that the two-dimensional models are just plain better at predicting actual choices, whatever other statistics we come up with.
From Chapter 10.3 of Levy & Mislevy (2016)
The Deviance Information Criterion is defined as “the difference between the posterior mean of the deviance, \(\overline{D(\theta)}\), and the deviance evaluated at the posterior mean, \(D(\bar{\theta})\)” (p. 248). Which is appropriate when the mean of the posterior is a “reasonable point summary of the parameters” (p. 249) which is the case for these parameters. The formula includes a penalty for complexity \(p_{D}\) based on the number of parameters estimated. Look to Gelman et al. (2013) Gill (2007) and Plummer (2008) for more advanced discussion of DIC.
DIC =\(\overline{D(\theta)} + p_{D} = 2\overline{D(\theta)} - D(\bar{\theta}) +2p_{D}\)
Loop over all MCMC iterations:
Compute \(p_{D}\) which is \(\overline{D(\theta)}\) - \(D(\bar{\theta})\).
Compute the DIC which 2 times \(\overline{D(\theta)}\) - \(\overline{D(\theta)}\).
First, the posterior mean of the deviance for 2d model.
computeDbar2d <- function(mcmc_matrix, pair_data, theta1_ids, theta2_ids, gamma_ids) {
# Extract item and judge IDs
itemj1 <- as.character(pair_data$itemj1)
itemj2 <- as.character(pair_data$itemj2)
choice <- pair_data$choice
rater <- as.character(pair_data$rater)
S <- nrow(mcmc_matrix)
N <- nrow(pair_data)
deviance_vec <- numeric(S)
for (s in 1:S) {
# Extract current parameter values
theta1 <- setNames(mcmc_matrix[s, theta1_ids], sub("theta1\\.", "", theta1_ids))
theta2 <- setNames(mcmc_matrix[s, theta2_ids], sub("theta2\\.", "", theta2_ids))
gamma <- setNames(mcmc_matrix[s, gamma_ids], sub("gamma\\.", "", gamma_ids))
eta <- numeric(N)
for (n in 1:N) {
j1 <- itemj1[n]
j2 <- itemj2[n]
r <- rater[n]
g <- gamma[r]
if (is.na(g)) g <- mean(gamma, na.rm = TRUE)
diff1 <- theta1[j2] - theta1[j1]
diff2 <- theta2[j2] - theta2[j1]
eta[n] <- (1 - g) * diff1 + g * diff2
}
p <- pnorm(eta)
p <- pmin(pmax(p, 1e-10), 1 - 1e-10) # avoid log(0)
is_j2_chosen <- choice == pair_data$itemj2
loglik <- dbinom(is_j2_chosen, size = 1, prob = p, log = TRUE) # Compute log-probability of the observed binary outcome
deviance_vec[s] <- -2 * sum(loglik)
}
mean(deviance_vec)
}
Note that the deviance is defined as minus two times the log-likelihood: \(D(\theta) = -2\log p(y | \theta)\)
So:
Then: \(\overline{D{\theta}}\) is computed by averaging this deviance over MCMC samples \(D(\bar{\theta})\) is computed by evaluating the deviance at the posterior mean of parameters
Use that function:
mcmc_matrix <- do.call(rbind, mcmc_list2d) # combine all chains
theta1_ids <- grep("^theta1\\.", colnames(mcmc_matrix), value = TRUE)
theta2_ids <- grep("^theta2\\.", colnames(mcmc_matrix), value = TRUE)
gamma_ids <- grep("^gamma\\.", colnames(mcmc_matrix), value = TRUE)
D_bar2d <- computeDbar2d(mcmc_matrix, pairDataCDAT, theta1_ids, theta2_ids, gamma_ids)
print(D_bar2d)
## [1] 1142.476
Now a function to compute the deviance at the posterior mean parameters
computeDevianceAtPosteriorMean2d <- function(mcmc_matrix, pair_data, theta1_ids, theta2_ids, gamma_ids) {
# Compute posterior means
theta1_mean <- setNames(colMeans(mcmc_matrix[, theta1_ids]), sub("theta1\\.", "", theta1_ids))
theta2_mean <- setNames(colMeans(mcmc_matrix[, theta2_ids]), sub("theta2\\.", "", theta2_ids))
gamma_mean <- setNames(colMeans(mcmc_matrix[, gamma_ids]), sub("gamma\\.", "", gamma_ids))
itemj1 <- as.character(pair_data$itemj1)
itemj2 <- as.character(pair_data$itemj2)
choice <- pair_data$choice
rater <- as.character(pair_data$rater)
N <- nrow(pair_data)
eta <- numeric(N)
for (n in 1:N) {
j1 <- itemj1[n]
j2 <- itemj2[n]
r <- rater[n]
g <- gamma_mean[r]
if (is.na(g)) g <- mean(gamma_mean, na.rm = TRUE)
diff1 <- theta1_mean[j2] - theta1_mean[j1]
diff2 <- theta2_mean[j2] - theta2_mean[j1]
eta[n] <- (1 - g) * diff1 + g * diff2
}
p <- pnorm(eta)
p <- pmin(pmax(p, 1e-10), 1 - 1e-10)
is_j2_chosen <- choice == pair_data$itemj2
loglik <- dbinom(is_j2_chosen, size = 1, prob = p, log = TRUE)
D_at_mean <- -2 * sum(loglik)
return(D_at_mean)
}
Use the function, with the same variables passed to it as created in previous step.
D_mean2d <- computeDevianceAtPosteriorMean2d(mcmc_matrix, pairDataCDAT, theta1_ids, theta2_ids, gamma_ids)
print(D_mean2d)
## [1] 1066.182
As a final step find the difference between twice the D_bar and the D_mean:
p_D2d <- D_bar2d - D_mean2d
DIC2d <- D_bar2d + p_D2d
print(DIC2d)
## [1] 1218.77
Repeat for 2dDP these steps using same functions as for the 2d normal prior model.
mcmc_matrix <- do.call(rbind, mcmc_list2dDP) # combine all chains
theta1_ids <- grep("^theta1\\.", colnames(mcmc_matrix), value = TRUE)
theta2_ids <- grep("^theta2\\.", colnames(mcmc_matrix), value = TRUE)
gamma_ids <- grep("^gamma\\.", colnames(mcmc_matrix), value = TRUE)
D_bar2dDP <- computeDbar2d(mcmc_matrix, pairDataCDAT, theta1_ids, theta2_ids, gamma_ids)
print(D_bar2dDP)
## [1] 1154.658
D_mean2dDP <- computeDevianceAtPosteriorMean2d(mcmc_matrix, pairDataCDAT, theta1_ids, theta2_ids, gamma_ids)
print(D_mean2dDP)
## [1] 1074.58
p_D2dDP <- D_bar2dDP - D_mean2dDP
DIC2dDP <- D_bar2dDP + p_D2dDP
print(DIC2dDP)
## [1] 1234.736
computeDbar1d <- function(mcmc_matrix, pair_data, theta_ids, alpha_ids) {
# Extract item and judge IDs
itemj1 <- as.character(pair_data$itemj1)
itemj2 <- as.character(pair_data$itemj2)
choice <- pair_data$choice
rater <- as.character(pair_data$rater)
S <- nrow(mcmc_matrix)
N <- nrow(pair_data)
deviance_vec <- numeric(S)
for (s in 1:S) {
# Extract current parameter values
theta <- setNames(mcmc_matrix[s, theta_ids], sub("theta\\.", "", theta_ids))
alpha <- setNames(mcmc_matrix[s, alpha_ids], sub("alpha\\.", "", alpha_ids))
eta <- numeric(N)
for (n in 1:N) {
j1 <- itemj1[n]
j2 <- itemj2[n]
r <- rater[n]
a <- alpha[r]
if (is.na(a)) a <- mean(alpha, na.rm = TRUE)
diff <- theta[j2] - theta[j1]
eta[n] <- a * diff
}
p <- pnorm(eta)
p <- pmin(pmax(p, 1e-10), 1 - 1e-10) # avoid log(0)
is_j2_chosen <- choice == pair_data$itemj2
loglik <- dbinom(is_j2_chosen, size = 1, prob = p, log = TRUE)
deviance_vec[s] <- -2 * sum(loglik)
}
mean(deviance_vec)
}
Use that function:
mcmc_matrix <- do.call(rbind, mcmc_list1d) # combine all chains
theta_ids <- grep("^theta\\.", colnames(mcmc_matrix), value = TRUE)
alpha_ids <- grep("^alpha\\.", colnames(mcmc_matrix), value = TRUE)
D_bar1d <- computeDbar1d(mcmc_matrix, pairDataCDAT, theta_ids, alpha_ids)
print(D_bar1d)
## [1] 999.6571
Now a function to compute the deviance at the posterior mean parameters
computeDevianceAtPosteriorMean1d <- function(mcmc_matrix, pair_data, theta_ids, alpha_ids) {
# Compute posterior means
theta_mean <- setNames(colMeans(mcmc_matrix[, theta_ids]), sub("theta\\.", "", theta_ids))
alpha_mean <- setNames(colMeans(mcmc_matrix[, alpha_ids]), sub("alpha\\.", "", alpha_ids))
itemj1 <- as.character(pair_data$itemj1)
itemj2 <- as.character(pair_data$itemj2)
choice <- pair_data$choice
rater <- as.character(pair_data$rater)
N <- nrow(pair_data)
eta <- numeric(N)
for (n in 1:N) {
j1 <- itemj1[n]
j2 <- itemj2[n]
r <- rater[n]
a <- alpha_mean[r]
if (is.na(a)) a <- mean(alpha_mean, na.rm = TRUE)
diff <- theta_mean[j2] - theta_mean[j1]
eta[n] <- a * diff
}
p <- pnorm(eta)
p <- pmin(pmax(p, 1e-10), 1 - 1e-10)
is_j2_chosen <- choice == pair_data$itemj2
loglik <- dbinom(is_j2_chosen, size = 1, prob = p, log = TRUE)
D_at_mean <- -2 * sum(loglik)
return(D_at_mean)
}
Use the function, with the same variables passed to it as created in previous step.
D_mean1d <- computeDevianceAtPosteriorMean1d(mcmc_matrix, pairDataCDAT, theta_ids, alpha_ids)
print(D_mean1d)
## [1] 973.8331
As a final step find the difference between twice the D_bar and the D_mean, then
p_D1d <- D_bar1d - D_mean1d
DIC1d <- D_bar1d + p_D1d
print(DIC1d)
## [1] 1025.481
dic_table <- data.frame(
Model = c("1d", "2d", "2dDP"),
DIC = c(DIC1d, DIC2d, DIC2dDP),
D_bar = c(D_bar1d, D_bar2d, D_bar2dDP),
D_mean = c(D_mean1d, D_mean2d, D_mean2dDP),
p_D = c(p_D1d, p_D2d, p_D2dDP)
)
print(dic_table)
## Model DIC D_bar D_mean p_D
## 1 1d 1025.481 999.6571 973.8331 25.82408
## 2 2d 1218.770 1142.4760 1066.1824 76.29365
## 3 2dDP 1234.736 1154.6579 1074.5799 80.07797
The 1d model has the lowest DIC by far, but is less expressive. The DIC statistic considers the trade off between fit and complexity, so the high complexity of the 2d models increases the DIC. Between the two 2d models, the additional parameters of the clustering does increase the deviance of the 2dDP model. The 2d and 2dDP models’ parameters have been shown above to be good at predicting the choices of raters when given the two items in a pairwise comparison. The clustering information from the 2dDP model does seem to be valid information in terms of co-clustering, but the increased complexity doesn’t result in better predictiveness overall. The max clusters do add substantially to the deviance statistic which considers complexity.
Even with the relatively sparse data set of comparisons, the model 2d works very well to explain the preferences of raters and the placement of items in 2d latent space.
The high DIC of the 2dDP model results partly from the maximum number of clusters explored, which when set at 6 allows the model to find the three clusters (by co-clustering matrix and dendrogram primarily) that are also visually evident in the gamma plot of medians.
Consider repeating the models with only the mid-angle gamma raters, cluster 2 raters. Does clustering raters by gamma angle improve the overall model’s predictiveness? Of course, no clustering was needed when modeling with just one cluster, but also no improvement in improvement in predictiveness was gained (See ‘Cluster Validity.R’). I think this shows that the rater perceptual orientation help to explain the observed data and actually reduce the need for grouping by rater agreement as I did previously with the BTL model on the web app. This could lead to larger groups of raters contributing to the theta estimates and shrinking credible intervals.
Notes: Set yourself up for further analysis, but not in this paper, more next steps.
Say which defaults you did not use in the functions, such as 3 chains, etc. Maybe say something about “we recognize that winbugs is standard software, but we used this because. . .”
No meeting until after returning July 14th, o
Gelman, A., Gilks, W. R., & Roberts, G. O. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. The Annals of Applied Probability, 7(1). https://doi.org/10.1214/aoap/1034625254
Levy, R., & Mislevy, R. J. (2016). Bayesian psychometric modeling. Chapman & Hall/CRC.
Yu, Q., & Quinn, K. M. (2022). A multidimensional pairwise comparison model for heterogeneous perceptions with an application to modelling the perceived truthfulness of public statements on COVID‐19. Journal of the Royal Statistical Society. Series A, (Statistics in Society), 10.1111/rssa.12810. https://doi.org/10.1111/rssa.12810