Paired Comparison Models with MCMCpack and CDAT Project Data

# Clear the environment and load packages
rm(list = ls())

library(MCMCpack)
library(bayesplot)
library(coda)
library(ggplot2)
library(tidyverse)
library(posterior)

Load CDAT paired comparison data

# setwd("~/Dropbox/IU25SP-EDUC-Y617 Psychometric Theory/Psychometric Theory Final Project/Paired Comparison MCMC CDAT")
data.folder <- "Data"
load(file.path(data.folder, "pairDataCDAT.Rda"))

# Prepare for use in MCMCpack
colnames(pairDataCDAT) <- c("rater","itemj1", "itemj2","choice")

# Remove rater 1 from raters' comparisons
pairDataCDAT <- pairDataCDAT[pairDataCDAT$rater > 1, ]

# Prepare the data for easy matching to student responses as PDFs labeled by 4-digit ID

# Create a named vector for the mapping: names are the original IDs and values are the new numbers.
lookup <- c("60" = 1121, "61" = 1154, "62" = 1222, "63" = 1637,
            "64" = 1995, "65" = 3369, "66" = 4004, "67" = 4209,
            "68" = 4268, "69" = 4841, "70" = 5074, "71" = 5495,
            "72" = 5710, "73" = 8294, "74" = 8980, "75" = 9899)

# Map the itemj1 and itemj2 columns using the lookup vector.
pairDataCDAT$itemj1 <- lookup[as.character(pairDataCDAT$itemj1)]
pairDataCDAT$itemj2 <- lookup[as.character(pairDataCDAT$itemj2)]
pairDataCDAT$choice <- lookup[as.character(pairDataCDAT$choice)]
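# Quick sanity check (a sketch, not part of the original workflow): confirm that
# every ID was matched by the lookup and no NAs were introduced by the recoding.
stopifnot(!anyNA(pairDataCDAT$itemj1),
          !anyNA(pairDataCDAT$itemj2),
          !anyNA(pairDataCDAT$choice))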

# Sort for easy matching in theta.start matrix below
pairDataCDAT <- pairDataCDAT[order(pairDataCDAT$itemj1), ] 


save(pairDataCDAT, file = file.path(data.folder, "pairDataCDAT.1.Rda"))

Set initial values for modeling parameters

n_chains <- 3 # Set the number of chains
set.seed(1234) # Set seed for randomized starting values
I <- length(unique(pairDataCDAT$rater))

Initialize respondent-specific parameters \(\gamma\)

The vector \(\gamma\) represents each respondent’s perceptual orientation in the 2D latent space, in radians. Starting values are drawn uniformly from a small range around \(\pi/4\) (45° in radians), namely \([\pi/4 - 0.4,\ \pi/4 + 0.4]\), to introduce slight variation across respondents.

gamma_start <- runif(I, pi/4-.4, pi/4+.4) 
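
As a quick optional check (a sketch, not in the original script), the starting range can be expressed in degrees, since the \(\gamma\) values are in radians:

range(gamma_start) * 180 / pi  # roughly 22 to 68 degrees, centered on 45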

Initialize item-specific parameters \(\theta\) for the 1d and 2d models

Thetas start at 0 for the one-dimensional model

theta_start1d <- matrix(0, 16, 1) # All start at 0 in the model's probit units

Create the empty starting \(\theta\) matrix for 2d models

theta_start2d <- matrix(0, 16, 2) # 16 scripts, two dimensions
# Creates a 16×2 matrix (theta_start2d) to store the latent attributes of the scripts along two dimensions. Initially, all values are set to 0.

Find the matching indices for the anchor scripts in the matrix by viewing them in order.

unique(pairDataCDAT$itemj1)
##  [1] 1121 1154 1222 1637 1995 3369 4004 4209 4268 4841 5074 5495 5710 8294 8980
## [16] 9899

Set initial values for two anchor items whose place in the matrix is determined above.

# These values anchor certain scripts in the 2d latent space and help define the rotation.
theta_start2d[11,1] <- 1   ## script 5074 is interpreted as having high creative elaboration
theta_start2d[11,2] <- -1  ## script 5074 is interpreted as having low conventional representation
theta_start2d[12,1] <- -1  ## script 5495 is interpreted as having low creative elaboration
theta_start2d[12,2] <- 1   ## script 5495 is interpreted as having high conventional representation
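
A programmatic alternative (a sketch; above, the row indices 11 and 12 were simply read off the sorted listing):

item_ids <- sort(unique(pairDataCDAT$itemj1))
which(item_ids %in% c(5074, 5495))  # expected: 11 12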

MCMC 1d with normal prior

seeds <- c(0202, 8018, 1304) # Set seeds for each chain
chains1d <- vector("list", n_chains)

for (i in 1:n_chains) {
  chains1d[[i]] <- MCMCpaircompare(
    pairDataCDAT,
    verbose=10000,      
    burnin=500,       # Recommended to keep burn-in a small percentage of the chain
    mcmc=50000,        # Long chain needed for good mixing and sufficient sample size
    thin=20,           # Thin by 20 to reduce autocorrelation; large thinning intervals are not standard practice
    seed=seeds[i],     # Ensures reproducibility of results.
    alpha.fixed=FALSE,
    alpha.start=1,
    a=0,               # The default prior mean of alpha is 0
    A=.25,             # The default prior precision is 0.25 (variance of 4)
    store.theta=TRUE,  # Saves item-specific draws.
    store.alpha=TRUE,  # Saves rater-specific draws.
    tune=.3,           # Sets the tuning parameter for the Metropolis-Hastings step.
    procrustes=FALSE   # Disables Procrustes alignment
    )
}
## 
## 
## MCMCpaircompare iteration 1 of 50500 
## 
## 
## MCMCpaircompare iteration 10001 of 50500 
## 
## 
## MCMCpaircompare iteration 20001 of 50500 
## 
## 
## MCMCpaircompare iteration 30001 of 50500 
## 
## 
## MCMCpaircompare iteration 40001 of 50500 
## 
## 
## MCMCpaircompare iteration 50001 of 50500 
## 
## 
## MCMCpaircompare iteration 1 of 50500 
## 
## 
## MCMCpaircompare iteration 10001 of 50500 
## 
## 
## MCMCpaircompare iteration 20001 of 50500 
## 
## 
## MCMCpaircompare iteration 30001 of 50500 
## 
## 
## MCMCpaircompare iteration 40001 of 50500 
## 
## 
## MCMCpaircompare iteration 50001 of 50500 
## 
## 
## MCMCpaircompare iteration 1 of 50500 
## 
## 
## MCMCpaircompare iteration 10001 of 50500 
## 
## 
## MCMCpaircompare iteration 20001 of 50500 
## 
## 
## MCMCpaircompare iteration 30001 of 50500 
## 
## 
## MCMCpaircompare iteration 40001 of 50500 
## 
## 
## MCMCpaircompare iteration 50001 of 50500
# Apply the mcmc() function to each chain and combine into one mcmc_list1d
mcmc_list1d <- mcmc.list(lapply(chains1d, mcmc)) 

# Save object
save(mcmc_list1d, file = "Out/mcmc_list1d.Rdata")

# Create a list of parameters for each type of parameter
thetas <- grep("^theta", colnames(mcmc_list1d[[1]]), value = TRUE)
alphas <- grep("^alpha", colnames(mcmc_list1d[[1]]), value = TRUE)

Check model convergence

gelman_diag_1d <- gelman.diag(mcmc_list1d)
max_rhat_1d <- max(gelman_diag_1d$psrf[, "Point est."])
gelman_diag_1d
## Potential scale reduction factors:
## 
##            Point est. Upper C.I.
## theta.1121       1.00       1.01
## theta.1154       1.00       1.02
## theta.1222       1.00       1.00
## theta.1637       1.00       1.01
## theta.1995       1.00       1.01
## theta.3369       1.00       1.00
## theta.4004       1.00       1.01
## theta.4209       1.00       1.01
## theta.4268       1.00       1.01
## theta.4841       1.00       1.01
## theta.5074       1.00       1.01
## theta.5495       1.00       1.01
## theta.5710       1.00       1.01
## theta.8294       1.00       1.01
## theta.8980       1.00       1.00
## theta.9899       1.00       1.00
## alpha.25         1.00       1.01
## alpha.26         1.01       1.02
## alpha.27         1.00       1.00
## alpha.28         1.00       1.01
## alpha.29         1.00       1.01
## alpha.30         1.00       1.00
## alpha.31         1.00       1.01
## alpha.34         1.00       1.00
## alpha.35         1.00       1.00
## alpha.36         1.00       1.01
## alpha.37         1.00       1.00
## alpha.38         1.00       1.00
## 
## Multivariate psrf
## 
## 1.01

All point estimates and upper CIs are within the acceptable range, indicating adequate convergence across chains.

The multivariate PSRF is a generalization of the univariate PSRF (the Gelman-Rubin statistic, or \(\hat{R}\)), which assesses the joint convergence of all parameters simultaneously using the variance-covariance matrix across chains.

It quantifies how much potential gain in efficiency (via additional sampling) there is if the chains were continued. Values much above 1 indicate that the between-chain variance is still large relative to the within-chain variance.

The multivariate PSRF is based on the largest eigenvalue of the matrix:

\(W^{-1}\,\mathrm{Var}^{+}\)

  • Where \(\mathrm{Var}^{+}\) is a weighted average of the within- and between-chain covariance matrices.
  • \(W\) is the within-chain covariance matrix.

It essentially checks for convergence across linear combinations of parameters, not just marginally.
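
For reference, coda stores this quantity in the object returned by gelman.diag(), so it can be extracted directly rather than read off the printed summary:

gelman_diag_1d$mpsrf  # multivariate PSRF for the 1d model
max_rhat_1d           # largest univariate point estimate, computed above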

Summary of MCMC 1d with normal prior

View a summary

summary1d <- round(
  cbind(
    as.data.frame(summary(mcmc_list1d)$statistics),
    as.data.frame(summary(mcmc_list1d)$quantiles)
  ), 3) 
summary1d$Model <- "1d"
save(summary1d, file = "Out/summary1d.Rdata")
summary1d
##              Mean    SD Naive SE Time-series SE   2.5%    25%    50%    75%
## theta.1121 -0.177 0.277    0.003          0.008 -0.730 -0.359 -0.173  0.010
## theta.1154 -0.947 0.343    0.004          0.009 -1.650 -1.172 -0.931 -0.711
## theta.1222  0.208 0.278    0.003          0.008 -0.342  0.022  0.204  0.398
## theta.1637 -0.029 0.270    0.003          0.008 -0.571 -0.209 -0.028  0.153
## theta.1995 -0.401 0.285    0.003          0.008 -0.971 -0.591 -0.400 -0.206
## theta.3369  1.719 0.480    0.006          0.011  0.845  1.379  1.696  2.028
## theta.4004  0.348 0.293    0.003          0.008 -0.224  0.145  0.345  0.543
## theta.4209 -0.211 0.279    0.003          0.008 -0.767 -0.390 -0.206 -0.022
## theta.4268  0.204 0.280    0.003          0.008 -0.347  0.015  0.205  0.391
## theta.4841 -0.485 0.288    0.003          0.008 -1.066 -0.668 -0.479 -0.287
## theta.5074 -0.064 0.273    0.003          0.008 -0.607 -0.244 -0.064  0.120
## theta.5495 -0.721 0.321    0.004          0.008 -1.387 -0.931 -0.713 -0.499
## theta.5710 -0.958 0.342    0.004          0.009 -1.664 -1.178 -0.945 -0.723
## theta.8294  0.290 0.279    0.003          0.008 -0.266  0.101  0.290  0.475
## theta.8980  0.803 0.333    0.004          0.009  0.167  0.572  0.793  1.021
## theta.9899  0.405 0.296    0.003          0.008 -0.177  0.203  0.402  0.609
## alpha.25    1.106 0.393    0.005          0.006  0.492  0.830  1.053  1.327
## alpha.26    2.027 0.626    0.007          0.011  1.033  1.576  1.959  2.384
## alpha.27    1.946 0.582    0.007          0.010  1.010  1.522  1.880  2.292
## alpha.28    1.114 0.416    0.005          0.006  0.452  0.816  1.058  1.349
## alpha.29    0.860 0.333    0.004          0.005  0.323  0.625  0.827  1.055
## alpha.30    0.691 0.284    0.003          0.004  0.229  0.494  0.655  0.854
## alpha.31    0.356 0.206    0.002          0.003 -0.008  0.215  0.337  0.482
## alpha.34    0.993 0.344    0.004          0.005  0.440  0.747  0.949  1.193
## alpha.35    0.515 0.235    0.003          0.003  0.126  0.351  0.490  0.650
## alpha.36    1.097 0.405    0.005          0.006  0.461  0.809  1.038  1.326
## alpha.37    1.018 0.389    0.004          0.006  0.420  0.745  0.962  1.231
## alpha.38    0.656 0.265    0.003          0.004  0.230  0.471  0.626  0.800
##             97.5% Model
## theta.1121  0.356    1d
## theta.1154 -0.310    1d
## theta.1222  0.763    1d
## theta.1637  0.496    1d
## theta.1995  0.144    1d
## theta.3369  2.742    1d
## theta.4004  0.935    1d
## theta.4209  0.328    1d
## theta.4268  0.750    1d
## theta.4841  0.066    1d
## theta.5074  0.465    1d
## theta.5495 -0.103    1d
## theta.5710 -0.317    1d
## theta.8294  0.843    1d
## theta.8980  1.484    1d
## theta.9899  0.985    1d
## alpha.25    2.028    1d
## alpha.26    3.446    1d
## alpha.27    3.289    1d
## alpha.28    2.065    1d
## alpha.29    1.609    1d
## alpha.30    1.348    1d
## alpha.31    0.794    1d
## alpha.34    1.781    1d
## alpha.35    1.047    1d
## alpha.36    2.025    1d
## alpha.37    1.947    1d
## alpha.38    1.288    1d

Check for high autocorrelation

effectiveSize(mcmc_list1d)
## theta.1121 theta.1154 theta.1222 theta.1637 theta.1995 theta.3369 theta.4004 
##   1183.106   1607.665   1259.876   1107.284   1236.771   2087.138   1404.252 
## theta.4209 theta.4268 theta.4841 theta.5074 theta.5495 theta.5710 theta.8294 
##   1232.249   1233.426   1281.698   1108.178   1531.724   1546.711   1235.135 
## theta.8980 theta.9899   alpha.25   alpha.26   alpha.27   alpha.28   alpha.29 
##   1471.531   1326.976   4040.197   3487.532   3308.899   4485.919   4632.910 
##   alpha.30   alpha.31   alpha.34   alpha.35   alpha.36   alpha.37   alpha.38 
##   4542.408   5756.181   4347.620   5207.429   4511.091   4177.004   4795.647
print(summarize_draws(mcmc_list1d)[, c("variable", "ess_bulk")], n=28)
## # A tibble: 28 × 2
##    variable   ess_bulk
##    <chr>         <dbl>
##  1 theta.1121    1066.
##  2 theta.1154    1515.
##  3 theta.1222    1142.
##  4 theta.1637    1064.
##  5 theta.1995    1169.
##  6 theta.3369    1700.
##  7 theta.4004    1280.
##  8 theta.4209    1162.
##  9 theta.4268    1130.
## 10 theta.4841    1200.
## 11 theta.5074    1085.
## 12 theta.5495    1357.
## 13 theta.5710    1386.
## 14 theta.8294    1104.
## 15 theta.8980    1345.
## 16 theta.9899    1173.
## 17 alpha.25      4178.
## 18 alpha.26      3596.
## 19 alpha.27      3267.
## 20 alpha.28      4201.
## 21 alpha.29      4369.
## 22 alpha.30      4780.
## 23 alpha.31      5940.
## 24 alpha.34      4344.
## 25 alpha.35      5248.
## 26 alpha.36      4615.
## 27 alpha.37      4346.
## 28 alpha.38      4478.

The ESS is sufficient for inference from the posterior (above 1000 for every parameter) only when each MCMC chain is run for 50,000 iterations, with burn-in set to about 1% of iterations (500) and a conservative thinning interval of 20.
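
One way to see this (a sketch, assuming 3 chains of 50,000 iterations each, thinned by 20, i.e. 7,500 retained draws in total) is to compare the smallest ESS to the number of retained draws:

retained_draws <- 3 * 50000 / 20                  # 2,500 stored draws per chain, 7,500 total
min(effectiveSize(mcmc_list1d)) / retained_draws  # fraction of effectively independent draws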

See companion page for plots of 1d model “PWC 1d MCMCpack CDAT”

MCMC 2d without clusters

seeds <- c(1304, 2406, 7916) # Set seeds for each chain
chains2d <- vector("list", n_chains)

for (i in 1:n_chains) {
  chains2d[[i]] <- MCMCpaircompare2d(
    pairDataCDAT,
    theta.constraints=list(
      '5074' = list(1, "+"),  
      '5074' = list(2, "-"),
      '5495' = list(1, "-"), 
      '5495' = list(2, "+")   
      ),
    verbose=10000,         
    burnin=200,      
    mcmc=20000,     
    thin=20,      
    seed=seeds[i],        # Ensures reproducibility of results.
    gamma.start=gamma_start, # Uses the initialized respondent perception values.
    theta.start=theta_start2d,
    store.theta=TRUE,  # Saves item-specific draws.
    store.gamma=TRUE,  # Saves rater-specific draws.
    tune=.25,          # Sets the tuning parameter for the Metropolis-Hastings proposals,
    # targeting the ~.44 acceptance rate recommended for univariate proposals
    procrustes=FALSE   # Disables Procrustes alignment
    )
}
## 
## 
## MCMCpaircompare2d iteration 1 of 20200 
## 
## Summary of acceptance rates for gamma:
## 
## 
## MCMCpaircompare2d iteration 10001 of 20200 
## 
## Summary of acceptance rates for gamma:
## 
## 
## MCMCpaircompare2d iteration 20001 of 20200 
## 
## Summary of acceptance rates for gamma:
## 
## 
## MCMCpaircompare2d iteration 1 of 20200 
## 
## Summary of acceptance rates for gamma:
## 
## 
## MCMCpaircompare2d iteration 10001 of 20200 
## 
## Summary of acceptance rates for gamma:
## 
## 
## MCMCpaircompare2d iteration 20001 of 20200 
## 
## Summary of acceptance rates for gamma:
## 
## 
## MCMCpaircompare2d iteration 1 of 20200 
## 
## Summary of acceptance rates for gamma:
## 
## 
## MCMCpaircompare2d iteration 10001 of 20200 
## 
## Summary of acceptance rates for gamma:
## 
## 
## MCMCpaircompare2d iteration 20001 of 20200 
## 
## Summary of acceptance rates for gamma:
# Apply the mcmc() function to each chain and combine into one mcmc_list2d
mcmc_list2d <- mcmc.list(lapply(chains2d, mcmc)) 

#Save object
save(mcmc_list2d, file = "Out/mcmc_list2d.Rdata")

# Create a list of parameters for each type of parameter
thetas <- grep("^theta", colnames(mcmc_list2d[[1]]), value = TRUE)
gammas <- grep("^gamma", colnames(mcmc_list2d[[1]]), value = TRUE)

Check gamma acceptance rate:

mean(attr(chains2d[[3]], "gamma_accept_rate"), na.rm=TRUE)
## [1] 0.4334241

Gamma acceptance rates are excellent: with the tuning parameter set to .25, the rate is about .43, close to the .44 considered optimal for univariate updates (Roberts et al., 1997).
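
The same check can be run on every chain rather than just the third (a sketch, assuming the "gamma_accept_rate" attribute is attached to each chain object as in the call below):

sapply(chains2d, function(ch) mean(attr(ch, "gamma_accept_rate"), na.rm = TRUE))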


Check model convergence

# For 2D model
gelman_diag_2d <- gelman.diag(mcmc_list2d)
max_rhat_2d <- max(gelman_diag_2d$psrf[, "Point est."])

gelman_diag_2d
## Potential scale reduction factors:
## 
##             Point est. Upper C.I.
## theta1.1121      1.001       1.01
## theta1.1154      0.999       1.00
## theta1.1222      1.001       1.01
## theta1.1637      1.001       1.00
## theta1.1995      1.002       1.01
## theta1.3369      1.000       1.00
## theta1.4004      1.003       1.01
## theta1.4209      1.000       1.00
## theta1.4268      1.002       1.01
## theta1.4841      1.000       1.00
## theta1.5074      0.999       1.00
## theta1.5495      1.000       1.00
## theta1.5710      1.002       1.00
## theta1.8294      1.002       1.01
## theta1.8980      1.000       1.00
## theta1.9899      1.003       1.01
## theta2.1121      1.001       1.01
## theta2.1154      1.001       1.00
## theta2.1222      0.999       1.00
## theta2.1637      1.002       1.00
## theta2.1995      1.003       1.01
## theta2.3369      1.000       1.00
## theta2.4004      1.004       1.02
## theta2.4209      1.008       1.03
## theta2.4268      1.005       1.02
## theta2.4841      0.999       1.00
## theta2.5074      1.003       1.01
## theta2.5495      1.005       1.02
## theta2.5710      1.001       1.00
## theta2.8294      1.002       1.01
## theta2.8980      1.001       1.01
## theta2.9899      1.005       1.02
## gamma.25         1.012       1.04
## gamma.26         1.004       1.02
## gamma.27         1.008       1.03
## gamma.28         1.003       1.01
## gamma.29         1.003       1.01
## gamma.30         1.008       1.03
## gamma.31         1.005       1.01
## gamma.34         1.005       1.02
## gamma.35         1.003       1.01
## gamma.36         1.004       1.02
## gamma.37         1.005       1.02
## gamma.38         1.005       1.02
## 
## Multivariate psrf
## 
## 1.03

Excellent \(\hat{R}\) values and multivariate PSRF for this model; all are close to 1.0.

Summary of MCMC 2d with normal prior

View a summary of the 2d model parameter posterior distributions

summary2d <- round(
  cbind(
    as.data.frame(summary(mcmc_list2d)$statistics),
    as.data.frame(summary(mcmc_list2d)$quantiles)
  ), 3)
summary2d$Model <- "2d"

save(summary2d, file="Out/summary2d.Rdata")
summary2d
##               Mean    SD Naive SE Time-series SE   2.5%    25%    50%    75%
## theta1.1121  1.243 0.401    0.007          0.011  0.475  0.969  1.241  1.519
## theta1.1154  0.007 0.387    0.007          0.010 -0.743 -0.258  0.005  0.265
## theta1.1222  0.613 0.362    0.007          0.009 -0.091  0.366  0.613  0.859
## theta1.1637  0.561 0.362    0.007          0.010 -0.118  0.309  0.555  0.805
## theta1.1995  0.872 0.407    0.007          0.011  0.101  0.594  0.860  1.150
## theta1.3369  1.631 0.463    0.008          0.010  0.743  1.317  1.620  1.936
## theta1.4004 -0.854 0.399    0.007          0.010 -1.671 -1.118 -0.861 -0.584
## theta1.4209 -1.234 0.391    0.007          0.010 -2.025 -1.491 -1.231 -0.965
## theta1.4268 -0.894 0.398    0.007          0.010 -1.697 -1.158 -0.887 -0.619
## theta1.4841 -0.912 0.383    0.007          0.010 -1.671 -1.167 -0.910 -0.645
## theta1.5074  1.154 0.386    0.007          0.010  0.429  0.880  1.150  1.420
## theta1.5495 -2.243 0.446    0.008          0.012 -3.162 -2.538 -2.234 -1.930
## theta1.5710 -0.873 0.395    0.007          0.009 -1.644 -1.140 -0.862 -0.606
## theta1.8294 -0.116 0.369    0.007          0.008 -0.844 -0.366 -0.115  0.137
## theta1.8980  0.679 0.384    0.007          0.009 -0.062  0.421  0.674  0.942
## theta1.9899  0.357 0.376    0.007          0.009 -0.410  0.098  0.363  0.624
## theta2.1121 -1.224 0.365    0.007          0.010 -1.940 -1.465 -1.217 -0.971
## theta2.1154 -1.546 0.359    0.007          0.009 -2.264 -1.781 -1.549 -1.305
## theta2.1222 -0.160 0.328    0.006          0.009 -0.788 -0.379 -0.154  0.068
## theta2.1637 -0.565 0.330    0.006          0.009 -1.220 -0.780 -0.565 -0.335
## theta2.1995 -1.239 0.363    0.007          0.010 -1.969 -1.476 -1.239 -0.998
## theta2.3369  1.183 0.394    0.007          0.010  0.389  0.919  1.182  1.454
## theta2.4004  1.316 0.354    0.006          0.009  0.660  1.069  1.299  1.554
## theta2.4209  0.764 0.334    0.006          0.009  0.132  0.532  0.758  0.980
## theta2.4268  1.122 0.345    0.006          0.008  0.470  0.888  1.114  1.339
## theta2.4841 -0.262 0.329    0.006          0.008 -0.892 -0.475 -0.270 -0.057
## theta2.5074 -1.056 0.358    0.007          0.010 -1.801 -1.289 -1.051 -0.806
## theta2.5495  0.579 0.332    0.006          0.009  0.046  0.327  0.549  0.793
## theta2.5710 -0.845 0.345    0.006          0.009 -1.505 -1.071 -0.848 -0.622
## theta2.8294  0.753 0.324    0.006          0.009  0.137  0.537  0.755  0.971
## theta2.8980  0.874 0.346    0.006          0.008  0.194  0.648  0.863  1.103
## theta2.9899  0.581 0.342    0.006          0.009 -0.106  0.354  0.589  0.806
## gamma.25     0.729 0.137    0.002          0.004  0.450  0.636  0.731  0.820
## gamma.26     0.832 0.127    0.002          0.004  0.582  0.747  0.833  0.918
## gamma.27     0.842 0.126    0.002          0.004  0.594  0.755  0.845  0.928
## gamma.28     0.463 0.140    0.003          0.004  0.190  0.365  0.463  0.562
## gamma.29     0.917 0.142    0.003          0.004  0.639  0.821  0.916  1.011
## gamma.30     0.327 0.148    0.003          0.004  0.054  0.221  0.321  0.426
## gamma.31     0.086 0.072    0.001          0.002  0.002  0.030  0.068  0.124
## gamma.34     0.934 0.147    0.003          0.005  0.629  0.839  0.938  1.032
## gamma.35     1.327 0.141    0.003          0.003  1.023  1.236  1.339  1.432
## gamma.36     1.309 0.125    0.002          0.003  1.048  1.227  1.312  1.400
## gamma.37     1.323 0.119    0.002          0.003  1.075  1.247  1.327  1.409
## gamma.38     1.442 0.091    0.002          0.002  1.236  1.386  1.457  1.514
##              97.5% Model
## theta1.1121  2.023    2d
## theta1.1154  0.785    2d
## theta1.1222  1.312    2d
## theta1.1637  1.255    2d
## theta1.1995  1.675    2d
## theta1.3369  2.558    2d
## theta1.4004 -0.082    2d
## theta1.4209 -0.472    2d
## theta1.4268 -0.152    2d
## theta1.4841 -0.185    2d
## theta1.5074  1.917    2d
## theta1.5495 -1.398    2d
## theta1.5710 -0.116    2d
## theta1.8294  0.596    2d
## theta1.8980  1.443    2d
## theta1.9899  1.067    2d
## theta2.1121 -0.533    2d
## theta2.1154 -0.845    2d
## theta2.1222  0.458    2d
## theta2.1637  0.060    2d
## theta2.1995 -0.527    2d
## theta2.3369  1.932    2d
## theta2.4004  2.034    2d
## theta2.4209  1.427    2d
## theta2.4268  1.844    2d
## theta2.4841  0.399    2d
## theta2.5074 -0.365    2d
## theta2.5495  1.333    2d
## theta2.5710 -0.164    2d
## theta2.8294  1.396    2d
## theta2.8980  1.570    2d
## theta2.9899  1.256    2d
## gamma.25     0.992    2d
## gamma.26     1.081    2d
## gamma.27     1.083    2d
## gamma.28     0.728    2d
## gamma.29     1.193    2d
## gamma.30     0.631    2d
## gamma.31     0.268    2d
## gamma.34     1.215    2d
## gamma.35     1.551    2d
## gamma.36     1.534    2d
## gamma.37     1.531    2d
## gamma.38     1.564    2d

Check for high autocorrelation

effectiveSize(mcmc_list2d)
## theta1.1121 theta1.1154 theta1.1222 theta1.1637 theta1.1995 theta1.3369 
##    1383.213    1443.011    1533.158    1307.613    1495.780    2236.972 
## theta1.4004 theta1.4209 theta1.4268 theta1.4841 theta1.5074 theta1.5495 
##    1575.418    1696.629    1672.043    1609.734    1547.080    1543.666 
## theta1.5710 theta1.8294 theta1.8980 theta1.9899 theta2.1121 theta2.1154 
##    1759.374    2024.255    1647.231    1673.853    1339.960    1451.198 
## theta2.1222 theta2.1637 theta2.1995 theta2.3369 theta2.4004 theta2.4209 
##    1433.658    1504.343    1222.556    1647.750    1591.155    1469.726 
## theta2.4268 theta2.4841 theta2.5074 theta2.5495 theta2.5710 theta2.8294 
##    1702.467    1621.296    1292.201    1571.017    1581.382    1338.498 
## theta2.8980 theta2.9899    gamma.25    gamma.26    gamma.27    gamma.28 
##    1668.773    1571.592    1130.004    1174.260     990.656    1365.406 
##    gamma.29    gamma.30    gamma.31    gamma.34    gamma.35    gamma.36 
##    1186.631    1451.307    2361.445    1059.541    1859.609    1699.951 
##    gamma.37    gamma.38 
##    1382.259    1909.395
print(summarize_draws(mcmc_list2d)[, c("variable", "ess_bulk")], n=28)
## # A tibble: 44 × 2
##    variable    ess_bulk
##    <chr>          <dbl>
##  1 theta1.1121    1307.
##  2 theta1.1154    1510.
##  3 theta1.1222    1459.
##  4 theta1.1637    1419.
##  5 theta1.1995    1440.
##  6 theta1.3369    1810.
##  7 theta1.4004    1147.
##  8 theta1.4209    1540.
##  9 theta1.4268    1149.
## 10 theta1.4841    1603.
## 11 theta1.5074    1421.
## 12 theta1.5495    1460.
## 13 theta1.5710    1818.
## 14 theta1.8294    1565.
## 15 theta1.8980    1678.
## 16 theta1.9899    1444.
## 17 theta2.1121    1156.
## 18 theta2.1154    1429.
## 19 theta2.1222    1283.
## 20 theta2.1637    1532.
## 21 theta2.1995    1282.
## 22 theta2.3369    1518.
## 23 theta2.4004    1538.
## 24 theta2.4209    1382.
## 25 theta2.4268    1666.
## 26 theta2.4841    1374.
## 27 theta2.5074    1170.
## 28 theta2.5495    1243.
## # ℹ 16 more rows

See companion page for plots of 2d model “PWC 2d MCMCpack CDAT”

MCMC 2d DP Prior Paired Comparison model

seeds <- c(4610, 5712, 6814) # Set seeds for each chain
chains2dDP <- vector("list", n_chains)

for (i in 1:n_chains) {
  chains2dDP[[i]] <- MCMCpaircompare2dDP(
    pairDataCDAT,
    
    # Main sampling chain defined:
    theta.constraints=list(
      '5074' = list(1, "+"),  
      '5074' = list(2, "-"),
      '5495' = list(1, "-"), 
      '5495' = list(2, "+")   
      ),
    verbose=10000,      
    burnin=400,       
    mcmc=40000,        
    thin=20,           
    seed=seeds[i],     # Ensures reproducibility of results.
    gamma.start=gamma_start, # Uses the initialized respondent perception values.
    theta.start=theta_start2d,
    store.theta=TRUE,  # Saves item-specific draws.
    store.gamma=TRUE,  # Saves rater-specific draws.
    tune=.18,          # Sets the tuning parameter for the Metropolis-Hastings proposals
    procrustes=FALSE,  # Disables Procrustes alignment
    
    # Cluster MCMC at each stored iteration of the main chain:
    alpha.start=1,     # Starting value for the Dirichlet Process concentration parameter (used only if alpha is not fixed).
    cluster.max=6,     # Allows up to 6 clusters of respondents.
    cluster.mcmc=100,  # Runs 100 iterations of MCMC for clustering at each stored iteration of the main chain
    alpha.fixed=FALSE, # Allows the concentration parameter to be learned from the data.
    a0=.5, b0=2        # Hyperparameters for the Dirichlet Process prior for cluster sampling.
    
    # To encourage more clustering of raters’ gamma values, set the DP concentration parameter α to be low.
    # Since α∼Gamma(a0,b0), choose hyperparameters a0 and b0 that favor small values of α.
    # Strong clustering: a0 = .5, b0 = 2
    # Balanced clustering: a0 = 1, b0 = 1
    # Minimal clustering: a0 = 5, b0 = 1
    )
}
## 
## 
## MCMCpaircompare2dDP iteration 1 of 40400 
## 
## 
## MCMCpaircompare2dDP iteration 10001 of 40400 
## 
## 
## MCMCpaircompare2dDP iteration 20001 of 40400 
## 
## 
## MCMCpaircompare2dDP iteration 30001 of 40400 
## 
## 
## MCMCpaircompare2dDP iteration 40001 of 40400 
## 
## 
## MCMCpaircompare2dDP iteration 1 of 40400 
## 
## 
## MCMCpaircompare2dDP iteration 10001 of 40400 
## 
## 
## MCMCpaircompare2dDP iteration 20001 of 40400 
## 
## 
## MCMCpaircompare2dDP iteration 30001 of 40400 
## 
## 
## MCMCpaircompare2dDP iteration 40001 of 40400 
## 
## 
## MCMCpaircompare2dDP iteration 1 of 40400 
## 
## 
## MCMCpaircompare2dDP iteration 10001 of 40400 
## 
## 
## MCMCpaircompare2dDP iteration 20001 of 40400 
## 
## 
## MCMCpaircompare2dDP iteration 30001 of 40400 
## 
## 
## MCMCpaircompare2dDP iteration 40001 of 40400
# Apply the mcmc() function to each chain and combine into one mcmc_list2dDP
mcmc_list2dDP <- mcmc.list(lapply(chains2dDP, mcmc)) 

# Save object
save(mcmc_list2dDP, file = "Out/mcmc_list2dDP.Rdata")

# View the medians of posterior cluster assignments
summary(mcmc_list2dDP)$quantiles[33:56, 3]
##   gamma.25   gamma.26   gamma.27   gamma.28   gamma.29   gamma.30   gamma.31 
##  0.8726110  0.8937963  0.8951321  0.4653028  0.9037670  0.3582067  0.1065133 
##   gamma.34   gamma.35   gamma.36   gamma.37   gamma.38 cluster.25 cluster.26 
##  0.9042405  1.4476224  1.4457542  1.4465269  1.4574008  2.0000000  2.0000000 
## cluster.27 cluster.28 cluster.29 cluster.30 cluster.31 cluster.34 cluster.35 
##  2.0000000  3.0000000  2.0000000  3.0000000  4.0000000  2.0000000  2.0000000 
## cluster.36 cluster.37 cluster.38 
##  2.0000000  2.0000000  2.0000000

Cluster assignments should be examined with the Maximum A Posteriori (MAP) Estimate or mode for each rater, rather than median or mean.

# Combine chains into one matrix
mcmc_matrix <- do.call(rbind, mcmc_list2dDP)

# Extract cluster columns and sort by rater ID
cluster_cols <- grep("^cluster\\.", colnames(mcmc_matrix), value = TRUE)
cluster_cols <- cluster_cols[order(as.numeric(sub("cluster\\.", "", cluster_cols)))]

# Define function to compute mode
get_mode <- function(x) {
  as.numeric(names(which.max(table(x))))
}

# Compute modal cluster (MAP estimate) for each rater
modal_cluster_by_rater <- sapply(cluster_cols, function(col) {
  get_mode(mcmc_matrix[, col])
})

print(modal_cluster_by_rater)
## cluster.25 cluster.26 cluster.27 cluster.28 cluster.29 cluster.30 cluster.31 
##          1          1          1          3          1          3          4 
## cluster.34 cluster.35 cluster.36 cluster.37 cluster.38 
##          1          2          2          2          2
# Identify cluster columns
rater_ids <- cluster_cols
n_raters <- length(rater_ids)
n_samples <- nrow(mcmc_matrix)

# Initialize co-clustering matrix
co_clustering <- matrix(0, n_raters, n_raters)
colnames(co_clustering) <- rownames(co_clustering) <- sub("cluster\\.", "", rater_ids)

# Loop through each posterior draw (row of mcmc_matrix)
for (i in 1:n_samples) {
  row <- mcmc_matrix[i, rater_ids]
  for (j in 1:n_raters) {
    for (k in 1:n_raters) {
      if (row[[j]] == row[[k]]) {
        co_clustering[j, k] <- co_clustering[j, k] + 1
      }
    }
  }
}

# Normalize to get co-clustering proportions
co_clustering <- co_clustering / n_samples
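
# An equivalent, more compact way to get the same proportions (a sketch; it
# replaces the explicit triple loop above and should match co_clustering up to
# row/column names):
co_clustering_alt <- sapply(rater_ids, function(a)
  sapply(rater_ids, function(b) mean(mcmc_matrix[, a] == mcmc_matrix[, b])))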
library(pheatmap)

# Create cell labels with exactly 2 decimal places
co_clustering_labels <- formatC(co_clustering, digits = 2, format = "f")

# Define greyscale palette
grey_palette <- colorRampPalette(c("white", "black"))(100)

# Plot with dendrograms and formatted labels
pheatmap(co_clustering,
         display_numbers = co_clustering_labels,
         number_format = "%.2f",
         fontsize_number = 10,
         clustering_distance_rows = as.dist(1 - co_clustering),
         clustering_distance_cols = as.dist(1 - co_clustering),
         clustering_method = "average",
         color = grey_palette,
         main = "Co-Clustering Matrix (Greyscale with 2-Decimal Labels)")

Check gamma acceptance rate:

mean(attr(chains2dDP[[1]], "gamma_accept_rate"), na.rm=TRUE)
## [1] 0.3646094

The gamma acceptance rate (about .36) is lower than the .44 target but acceptable given the tuning parameter of .18 used for this model.

Check model convergence

# For 2D-DP model
gelman_diag_2dDP <- gelman.diag(mcmc_list2dDP)
max_rhat_2dDP <- max(gelman_diag_2dDP$psrf[, "Point est."])

gelman_diag_2dDP
## Potential scale reduction factors:
## 
##             Point est. Upper C.I.
## theta1.1121       1.00       1.00
## theta1.1154       1.00       1.01
## theta1.1222       1.00       1.00
## theta1.1637       1.00       1.00
## theta1.1995       1.00       1.00
## theta1.3369       1.00       1.00
## theta1.4004       1.00       1.00
## theta1.4209       1.00       1.00
## theta1.4268       1.00       1.01
## theta1.4841       1.00       1.00
## theta1.5074       1.00       1.00
## theta1.5495       1.00       1.01
## theta1.5710       1.00       1.01
## theta1.8294       1.00       1.00
## theta1.8980       1.00       1.00
## theta1.9899       1.00       1.00
## theta2.1121       1.00       1.01
## theta2.1154       1.00       1.00
## theta2.1222       1.00       1.01
## theta2.1637       1.00       1.00
## theta2.1995       1.00       1.01
## theta2.3369       1.00       1.00
## theta2.4004       1.00       1.00
## theta2.4209       1.00       1.01
## theta2.4268       1.00       1.01
## theta2.4841       1.00       1.00
## theta2.5074       1.00       1.01
## theta2.5495       1.00       1.01
## theta2.5710       1.00       1.01
## theta2.8294       1.00       1.01
## theta2.8980       1.00       1.00
## theta2.9899       1.00       1.01
## gamma.25          1.00       1.02
## gamma.26          1.01       1.02
## gamma.27          1.01       1.03
## gamma.28          1.00       1.00
## gamma.29          1.01       1.02
## gamma.30          1.00       1.02
## gamma.31          1.00       1.00
## gamma.34          1.01       1.02
## gamma.35          1.01       1.03
## gamma.36          1.01       1.03
## gamma.37          1.01       1.03
## gamma.38          1.01       1.03
## cluster.25        1.00       1.01
## cluster.26        1.00       1.01
## cluster.27        1.00       1.02
## cluster.28        1.00       1.01
## cluster.29        1.00       1.02
## cluster.30        1.00       1.00
## cluster.31        1.01       1.01
## cluster.34        1.00       1.02
## cluster.35        1.01       1.04
## cluster.36        1.01       1.04
## cluster.37        1.01       1.04
## cluster.38        1.01       1.04
## n.clusters        1.00       1.00
## alpha             1.00       1.00
## 
## Multivariate psrf
## 
## 1.03

All Gelman-Rubin statistics are close to 1.0, and the multivariate PSRF is also within the acceptable range.

Summary of MCMC chain output

summary2dDP <- round(
  cbind(
    as.data.frame(summary(mcmc_list2dDP)$statistics),
    as.data.frame(summary(mcmc_list2dDP)$quantiles)
  ), 3)
summary2dDP$Model <- "2dDP"
save(summary2dDP, file = "Out/summary2dDP.Rdata")
summary2dDP
##               Mean    SD Naive SE Time-series SE   2.5%    25%    50%    75%
## theta1.1121  1.227 0.422    0.005          0.008  0.442  0.938  1.212  1.503
## theta1.1154  0.053 0.406    0.005          0.007 -0.724 -0.233  0.049  0.321
## theta1.1222  0.676 0.361    0.005          0.006  0.000  0.431  0.665  0.907
## theta1.1637  0.590 0.364    0.005          0.006 -0.111  0.342  0.581  0.832
## theta1.1995  0.862 0.433    0.006          0.008  0.060  0.562  0.843  1.142
## theta1.3369  1.539 0.451    0.006          0.007  0.654  1.234  1.538  1.839
## theta1.4004 -0.857 0.412    0.005          0.007 -1.698 -1.132 -0.847 -0.571
## theta1.4209 -1.300 0.415    0.005          0.007 -2.142 -1.575 -1.285 -1.015
## theta1.4268 -0.902 0.431    0.006          0.007 -1.769 -1.188 -0.885 -0.598
## theta1.4841 -0.894 0.387    0.005          0.006 -1.656 -1.152 -0.889 -0.630
## theta1.5074  1.222 0.404    0.005          0.007  0.468  0.943  1.208  1.483
## theta1.5495 -2.279 0.465    0.006          0.008 -3.229 -2.590 -2.263 -1.952
## theta1.5710 -0.807 0.400    0.005          0.006 -1.569 -1.081 -0.805 -0.538
## theta1.8294 -0.154 0.368    0.005          0.006 -0.884 -0.397 -0.152  0.095
## theta1.8980  0.655 0.389    0.005          0.006 -0.124  0.403  0.652  0.914
## theta1.9899  0.385 0.381    0.005          0.006 -0.368  0.134  0.380  0.643
## theta2.1121 -1.119 0.360    0.005          0.007 -1.848 -1.357 -1.107 -0.874
## theta2.1154 -1.546 0.360    0.005          0.007 -2.261 -1.783 -1.538 -1.305
## theta2.1222 -0.188 0.331    0.004          0.007 -0.852 -0.406 -0.185  0.039
## theta2.1637 -0.558 0.330    0.004          0.006 -1.221 -0.771 -0.561 -0.334
## theta2.1995 -1.173 0.356    0.005          0.007 -1.892 -1.407 -1.164 -0.930
## theta2.3369  1.270 0.401    0.005          0.007  0.468  1.005  1.272  1.538
## theta2.4004  1.215 0.356    0.005          0.007  0.538  0.970  1.208  1.451
## theta2.4209  0.713 0.357    0.005          0.007  0.021  0.471  0.709  0.942
## theta2.4268  1.028 0.354    0.005          0.006  0.345  0.786  1.023  1.268
## theta2.4841 -0.324 0.341    0.004          0.007 -0.994 -0.553 -0.325 -0.099
## theta2.5074 -1.035 0.365    0.005          0.007 -1.788 -1.276 -1.024 -0.781
## theta2.5495  0.455 0.409    0.005          0.008 -0.274  0.170  0.439  0.719
## theta2.5710 -0.936 0.357    0.005          0.007 -1.629 -1.174 -0.939 -0.702
## theta2.8294  0.747 0.333    0.004          0.007  0.075  0.523  0.747  0.966
## theta2.8980  0.885 0.341    0.004          0.007  0.208  0.659  0.887  1.110
## theta2.9899  0.549 0.336    0.004          0.006 -0.115  0.322  0.558  0.774
## gamma.25     0.859 0.153    0.002          0.003  0.527  0.760  0.873  0.970
## gamma.26     0.887 0.137    0.002          0.003  0.601  0.795  0.894  0.986
## gamma.27     0.888 0.137    0.002          0.003  0.602  0.795  0.895  0.987
## gamma.28     0.455 0.200    0.003          0.004  0.050  0.329  0.465  0.590
## gamma.29     0.897 0.139    0.002          0.003  0.612  0.803  0.904  0.996
## gamma.30     0.349 0.207    0.003          0.004  0.018  0.166  0.358  0.508
## gamma.31     0.129 0.100    0.001          0.001  0.005  0.050  0.107  0.183
## gamma.34     0.900 0.144    0.002          0.003  0.614  0.803  0.904  0.997
## gamma.35     1.424 0.115    0.001          0.002  1.137  1.365  1.448  1.514
## gamma.36     1.423 0.112    0.001          0.002  1.146  1.361  1.446  1.511
## gamma.37     1.423 0.113    0.001          0.002  1.138  1.361  1.447  1.512
## gamma.38     1.437 0.102    0.001          0.002  1.186  1.380  1.457  1.518
## cluster.25   2.514 1.516    0.020          0.037  1.000  1.000  2.000  3.000
## cluster.26   2.432 1.470    0.019          0.039  1.000  1.000  2.000  3.000
## cluster.27   2.450 1.480    0.019          0.038  1.000  1.000  2.000  3.000
## cluster.28   3.299 1.593    0.021          0.027  1.000  2.000  3.000  5.000
## cluster.29   2.475 1.491    0.019          0.037  1.000  1.000  2.000  3.000
## cluster.30   3.258 1.570    0.020          0.029  1.000  2.000  3.000  4.000
## cluster.31   3.579 1.604    0.021          0.030  1.000  2.000  4.000  5.000
## cluster.34   2.472 1.493    0.019          0.038  1.000  1.000  2.000  3.000
## cluster.35   2.675 1.501    0.019          0.039  1.000  1.000  2.000  4.000
## cluster.36   2.670 1.499    0.019          0.039  1.000  1.000  2.000  4.000
## cluster.37   2.673 1.502    0.019          0.037  1.000  1.000  2.000  4.000
## cluster.38   2.680 1.506    0.019          0.038  1.000  1.000  2.000  4.000
## n.clusters   4.476 0.782    0.010          0.012  3.000  4.000  4.000  5.000
## alpha        1.060 0.605    0.008          0.009  0.247  0.618  0.936  1.372
##              97.5% Model
## theta1.1121  2.093  2dDP
## theta1.1154  0.882  2dDP
## theta1.1222  1.402  2dDP
## theta1.1637  1.316  2dDP
## theta1.1995  1.746  2dDP
## theta1.3369  2.422  2dDP
## theta1.4004 -0.090  2dDP
## theta1.4209 -0.512  2dDP
## theta1.4268 -0.100  2dDP
## theta1.4841 -0.134  2dDP
## theta1.5074  2.048  2dDP
## theta1.5495 -1.404  2dDP
## theta1.5710 -0.014  2dDP
## theta1.8294  0.557  2dDP
## theta1.8980  1.417  2dDP
## theta1.9899  1.134  2dDP
## theta2.1121 -0.443  2dDP
## theta2.1154 -0.850  2dDP
## theta2.1222  0.459  2dDP
## theta2.1637  0.075  2dDP
## theta2.1995 -0.499  2dDP
## theta2.3369  2.038  2dDP
## theta2.4004  1.927  2dDP
## theta2.4209  1.443  2dDP
## theta2.4268  1.726  2dDP
## theta2.4841  0.354  2dDP
## theta2.5074 -0.352  2dDP
## theta2.5495  1.320  2dDP
## theta2.5710 -0.216  2dDP
## theta2.8294  1.408  2dDP
## theta2.8980  1.553  2dDP
## theta2.9899  1.213  2dDP
## gamma.25     1.115  2dDP
## gamma.26     1.130  2dDP
## gamma.27     1.131  2dDP
## gamma.28     0.842  2dDP
## gamma.29     1.143  2dDP
## gamma.30     0.732  2dDP
## gamma.31     0.372  2dDP
## gamma.34     1.158  2dDP
## gamma.35     1.564  2dDP
## gamma.36     1.565  2dDP
## gamma.37     1.564  2dDP
## gamma.38     1.565  2dDP
## cluster.25   6.000  2dDP
## cluster.26   6.000  2dDP
## cluster.27   6.000  2dDP
## cluster.28   6.000  2dDP
## cluster.29   6.000  2dDP
## cluster.30   6.000  2dDP
## cluster.31   6.000  2dDP
## cluster.34   6.000  2dDP
## cluster.35   6.000  2dDP
## cluster.36   6.000  2dDP
## cluster.37   6.000  2dDP
## cluster.38   6.000  2dDP
## n.clusters   6.000  2dDP
## alpha        2.560  2dDP

With a maximum of 6 clusters, 100 clustering iterations at each stored iteration of the main MCMC chain, and the strong-clustering hyperparameters a0 = .5 and b0 = 2, the co-clustering matrix with dendrogram matches the visual plot of the gamma medians precisely.
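
As a rough cross-check of that statement (a sketch, not the companion-page plot itself), the posterior median gammas can be plotted by rater; raters with similar medians fall into the same modal cluster found above:

q50 <- summary(mcmc_list2dDP)$quantiles[, "50%"]
gamma_medians <- q50[grep("^gamma\\.", names(q50))]
dotchart(sort(gamma_medians), xlab = "gamma (radians)",
         main = "Posterior median gamma by rater")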

Check for high autocorrelation

effectiveSize(mcmc_list2dDP)
## theta1.1121 theta1.1154 theta1.1222 theta1.1637 theta1.1995 theta1.3369 
##    3021.321    3363.001    3902.996    3610.704    3039.552    4777.772 
## theta1.4004 theta1.4209 theta1.4268 theta1.4841 theta1.5074 theta1.5495 
##    3454.964    3531.749    3393.431    4092.856    3282.214    3691.647 
## theta1.5710 theta1.8294 theta1.8980 theta1.9899 theta2.1121 theta2.1154 
##    4232.548    3942.342    4048.296    3747.791    2784.995    3048.856 
## theta2.1222 theta2.1637 theta2.1995 theta2.3369 theta2.4004 theta2.4209 
##    2525.693    2738.471    3002.380    3017.804    2559.721    2724.787 
## theta2.4268 theta2.4841 theta2.5074 theta2.5495 theta2.5710 theta2.8294 
##    3066.203    2745.997    2650.245    2499.173    2882.105    2478.259 
## theta2.8980 theta2.9899    gamma.25    gamma.26    gamma.27    gamma.28 
##    2741.557    2871.350    2265.186    2112.832    2077.127    2717.815 
##    gamma.29    gamma.30    gamma.31    gamma.34    gamma.35    gamma.36 
##    2007.122    2465.693    5653.283    2172.651    2754.706    2752.154 
##    gamma.37    gamma.38  cluster.25  cluster.26  cluster.27  cluster.28 
##    2638.008    2794.730    1768.676    1487.658    1561.179    3541.323 
##  cluster.29  cluster.30  cluster.31  cluster.34  cluster.35  cluster.36 
##    1629.082    3017.058    2928.989    1561.633    1506.192    1510.300 
##  cluster.37  cluster.38  n.clusters       alpha 
##    1682.182    1537.260    4507.174    4940.833
print(summarize_draws(mcmc_list2dDP)[, c("variable", "ess_bulk")], n=58)
## # A tibble: 58 × 2
##    variable    ess_bulk
##    <chr>          <dbl>
##  1 theta1.1121    2919.
##  2 theta1.1154    3203.
##  3 theta1.1222    3612.
##  4 theta1.1637    3437.
##  5 theta1.1995    3015.
##  6 theta1.3369    4600.
##  7 theta1.4004    3369.
##  8 theta1.4209    3587.
##  9 theta1.4268    3456.
## 10 theta1.4841    3929.
## 11 theta1.5074    3333.
## 12 theta1.5495    3528.
## 13 theta1.5710    4184.
## 14 theta1.8294    4005.
## 15 theta1.8980    3904.
## 16 theta1.9899    3678.
## 17 theta2.1121    2806.
## 18 theta2.1154    3066.
## 19 theta2.1222    2549.
## 20 theta2.1637    2580.
## 21 theta2.1995    2738.
## 22 theta2.3369    2950.
## 23 theta2.4004    2710.
## 24 theta2.4209    2678.
## 25 theta2.4268    2722.
## 26 theta2.4841    2672.
## 27 theta2.5074    2647.
## 28 theta2.5495    2626.
## 29 theta2.5710    2808.
## 30 theta2.8294    2614.
## 31 theta2.8980    2589.
## 32 theta2.9899    2712.
## 33 gamma.25       2289.
## 34 gamma.26       2009.
## 35 gamma.27       1974.
## 36 gamma.28       2678.
## 37 gamma.29       2047.
## 38 gamma.30       2552.
## 39 gamma.31       5534.
## 40 gamma.34       2058.
## 41 gamma.35       3091.
## 42 gamma.36       3041.
## 43 gamma.37       2944.
## 44 gamma.38       3161.
## 45 cluster.25     1332.
## 46 cluster.26     1206.
## 47 cluster.27     1173.
## 48 cluster.28     3275.
## 49 cluster.29     1196.
## 50 cluster.30     2963.
## 51 cluster.31     2534.
## 52 cluster.34     1229.
## 53 cluster.35     1147.
## 54 cluster.36     1198.
## 55 cluster.37     1187.
## 56 cluster.38     1185.
## 57 n.clusters     4306.
## 58 alpha          4806.

The chain length was increased until the ESS exceeded 1000 for all variables.

See companion page for plots of 2dDP model “PWC 2dDP MCMCpack CDAT”

https://rpubs.com/steveneheil/mcmc2dDP

Model Checking

From Levy & Mislevy (2016) Ch. 10.2.4 & 10.2.5 Posterior Predictive Model Checking

  1. What is the feature of the data that is of interest?

The common features across the 2-dimensional models are (a) the posterior thetas giving each item’s location in the 2-dimensional latent space, along with the precision of those distributions; and (b) the posterior gammas of each rater.

The common feature of all models is the choice of one item over another based on the model-implied latent traits.

  2. What do the models imply about how thetas and gammas ought to be?

The prediction of choices based on the model-implied latent variables should be close to the observed choices.

  3. How do choices of items found in the data compare to what the model implies about the raters’ choices?

The selection of the left item (j1) over the second (j2) is the test statistic to be examined with Posterior Predictive Model Checking.

PPMC with proportion of choice of item in a comparison as test statistic

(See Levy & Mislevy, 2016, pp. 237-239).

2d models with both thetas and a gamma governing rater preference. For each posterior sample of \(\theta_{1}\), \(\theta_{2}\), and \(\gamma\), simulate responses.

A function to simulate responses using the 2d model

First, a function that will:

  • combine all chains into one list of samples
  • iterate 200 sets of simulated choices based on comparisons that match the rater and item IDs of the observed comparisons
  • for each iteration, draw one posterior sample of \(\theta_{1}\), \(\theta_{2}\), and \(\gamma\) from the combined chains
  • for each of the 960 observed comparisons, extract the two items and the rater for a simulated choice
  • find the difference between the two items’ thetas on each dimension according to the model
  • combine the two differences, weighted by the cosine and sine of the rater’s gamma, to form the variable \(\eta\)
  • use \(\eta\) as input to the probit link function to find the probability of choosing j1 over j2
  • make a random draw of 1 or 0 from the binomial distribution with the probability computed from \(\eta\)
  • add the 1 or 0 to a matrix of choices of j1 over j2
  • return the simulated choices for the 960 comparisons, repeated 200 times

simulate_pp_data_mcmclist2d <- function(mcmc_list, pair_data) {
  # Combine chains into one matrix
  all_samples <- do.call(rbind, mcmc_list)
  n_obs <- nrow(pair_data)
  n_rep = 200
  sim_choices <- matrix(NA, nrow = n_rep, ncol = n_obs)
  
  # Identify column names for theta1, theta2, gamma
  theta1_names <- grep("^theta1\\.", colnames(all_samples), value = TRUE)
  theta2_names <- grep("^theta2\\.", colnames(all_samples), value = TRUE)
  gamma_names  <- grep("^gamma\\.", colnames(all_samples), value = TRUE)

  for (r in 1:n_rep) {                     # Row by row iteration
    s <- sample(1:nrow(all_samples), 1)    # Draw from any chain
    theta1 <- all_samples[s, theta1_names] # Draw one value for theta1
    theta2 <- all_samples[s, theta2_names] # Draw one value for theta2
    gamma  <- all_samples[s, gamma_names]  # Draw one value for gamma

    for (i in 1:n_obs) {                        # Column by column iteration
      j1 <- as.character(pair_data$itemj1[i])   # Locate the first item number for that row from the observed
      j2 <- as.character(pair_data$itemj2[i])   # Locate the second item number for that row from the observed
      rater <- as.character(pair_data$rater[i]) # Locate the rater for that row from the observed
      # Find the model-implied diff between sampled theta1 of 2 items
      diff1 <- theta1[paste0("theta1.", j1)] - theta1[paste0("theta1.", j2)] 
      # Find the model-implied diff between sampled theta2 of 2 items
      diff2 <- theta2[paste0("theta2.", j1)] - theta2[paste0("theta2.", j2)] 
      g <- gamma[paste0("gamma.", rater)]       # Find the model-implied sampled gamma of the rater
      # eta will be the input to the probit link function
      # cos() and sin() expect angles in radians, which gamma already is
      zi1 <- cos(g) 
      zi2 <- sin(g)
      # Add thetas in proportion to sine or cosine of gamma, representing how much more favorable j1 is than j2
      eta <- zi1 * diff1 + zi2 * diff2 
      # If eta is negative, then j2 is preferable. If close to 0, then it's a toss-up.
      p <- pnorm(eta) # transform eta using the probit link to the probability j1 is chosen over j2.
      # If p is <.5 preference for j2, p = .5 no preference, p > .5 preference for j1.
      # Generate 1 draw of 0 or 1 based on p, and add it to the sim_choices matrix
      sim_choices[r, i] <- rbinom(n = 1, size = 1, p) 
    }
  }
  # return a set of binary choices where 1 is a choice of j1 and 0 is a choice of j2
  return(sim_choices) 
} 

Simulate data based on model

Run the function to simulate 200 sets of choices for each 2d model. Each simulated data set is a row in a matrix.

First, simulate data sets for the 2d model without clusters

sim_2d <- simulate_pp_data_mcmclist2d(mcmc_list2d, pairDataCDAT)

Next, simulate data sets for the 2d model with clusters (DP prior)

sim_2dDP <- simulate_pp_data_mcmclist2d(mcmc_list2dDP, pairDataCDAT)

Convert observed item ID choices to 1/0 for comparison to the simulated data: if itemj1 is chosen, the value is 1; if itemj2 is chosen, 0.

#If choice is first item, 1, if choice is second item, 0
observed_binary <- ifelse(pairDataCDAT$choice == pairDataCDAT$itemj1, 1,
                          ifelse(pairDataCDAT$choice == pairDataCDAT$itemj2, 0, NA))
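
A quick check of the resulting test statistic (a sketch; the same value appears as $observed in the PPP output below):

mean(observed_binary, na.rm = TRUE)  # observed proportion of j1 choices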

A function to compute the posterior predictive p-value from the observed and simulated choices:

  • given the binary version of the observed data, find the mean (proportion of j1 choices) for each simulated data set
  • compute the proportion of simulated data sets whose proportion of j1 choices is greater than or equal to the observed proportion

compute_ppp_proportion <- function(observed_choices, sim_choices_matrix) {
  obs_prop <- mean(observed_choices)
  
  # The mean of the row is the proportion of same choice for that simulated data set
  sim_prop <- rowMeans(sim_choices_matrix) 
  sim_correct <- rowMeans(sim_choices_matrix == matrix(observed_choices,
                                                       nrow = nrow(sim_choices_matrix), 
                                                       ncol = length(observed_choices), 
                                                       byrow = TRUE))
  
  # Posterior predictive p-value
  ppp <- mean(sim_prop >= obs_prop)
  
  list(
    observed = obs_prop,
    simulated = sim_prop,
    simulated_correct = sim_correct,
    ppp = ppp
  )
}

Compute Posterior Predictive P-value (PPP) for each 2d model

2d with normal prior

ppp2d <- compute_ppp_proportion(observed_binary, sim_2d)
ppp2d
## $observed
## [1] 0.5152198
## 
## $simulated
##   [1] 0.5197294 0.5107103 0.5129651 0.5028185 0.5298760 0.5140924 0.4904171
##   [8] 0.5242390 0.5163472 0.5107103 0.5016911 0.4836528 0.5129651 0.5332582
##  [15] 0.5163472 0.5062007 0.5231116 0.4813980 0.5016911 0.5490417 0.4994363
##  [22] 0.5264938 0.4960541 0.5095829 0.5039459 0.5028185 0.4847802 0.5231116
##  [29] 0.4915445 0.5073281 0.5084555 0.5062007 0.5062007 0.5028185 0.5084555
##  [36] 0.5084555 0.4825254 0.4904171 0.5422773 0.5219842 0.5062007 0.4937993
##  [43] 0.5219842 0.4723788 0.4915445 0.5118377 0.4712514 0.5062007 0.5118377
##  [50] 0.5163472 0.5095829 0.4904171 0.4813980 0.4926719 0.5062007 0.4847802
##  [57] 0.5152198 0.5231116 0.4971815 0.5050733 0.5253664 0.5208568 0.5095829
##  [64] 0.5073281 0.4971815 0.5174746 0.5016911 0.4859076 0.4926719 0.5129651
##  [71] 0.4983089 0.4937993 0.4983089 0.4678692 0.5084555 0.5062007 0.4994363
##  [78] 0.5231116 0.5107103 0.5219842 0.5186020 0.5062007 0.5107103 0.4983089
##  [85] 0.5343856 0.5005637 0.5287486 0.5107103 0.5062007 0.5118377 0.5005637
##  [92] 0.5107103 0.5140924 0.4983089 0.5174746 0.5264938 0.5343856 0.5084555
##  [99] 0.5050733 0.5028185 0.5107103 0.5388952 0.5242390 0.4881623 0.5016911
## [106] 0.5095829 0.5039459 0.5174746 0.5174746 0.5107103 0.4960541 0.5005637
## [113] 0.5050733 0.5163472 0.5028185 0.5016911 0.4791432 0.5073281 0.4949267
## [120] 0.4960541 0.4971815 0.5107103 0.5039459 0.4926719 0.4983089 0.5208568
## [127] 0.5287486 0.4904171 0.5129651 0.5434047 0.5186020 0.5095829 0.4983089
## [134] 0.5062007 0.4881623 0.5310034 0.5062007 0.5073281 0.5039459 0.5163472
## [141] 0.4847802 0.5129651 0.5129651 0.4915445 0.4949267 0.4780158 0.5039459
## [148] 0.4735062 0.5287486 0.5095829 0.4949267 0.5095829 0.5039459 0.5050733
## [155] 0.5434047 0.4802706 0.5186020 0.5152198 0.5039459 0.5219842 0.5366404
## [162] 0.4836528 0.4813980 0.5152198 0.4971815 0.4689966 0.4960541 0.4937993
## [169] 0.4791432 0.5062007 0.5242390 0.4994363 0.5005637 0.4791432 0.5084555
## [176] 0.4994363 0.5005637 0.5084555 0.5016911 0.5186020 0.4847802 0.4836528
## [183] 0.5050733 0.4937993 0.5253664 0.5118377 0.4937993 0.5073281 0.5084555
## [190] 0.4847802 0.4881623 0.4780158 0.4904171 0.4892897 0.5062007 0.4937993
## [197] 0.4802706 0.4949267 0.4915445 0.5005637
## 
## $simulated_correct
##   [1] 0.6527621 0.6843292 0.6865840 0.6561443 0.6922210 0.6944758 0.6753100
##   [8] 0.6685457 0.6967306 0.6888388 0.6685457 0.6572717 0.6730552 0.6775648
##  [15] 0.6448703 0.6820744 0.6764374 0.6617813 0.6888388 0.6640361 0.6933484
##  [22] 0.6843292 0.6674183 0.6629087 0.6527621 0.6651635 0.6944758 0.6809470
##  [29] 0.6967306 0.7012401 0.6910936 0.6595265 0.6640361 0.6809470 0.6595265
##  [36] 0.6820744 0.7080045 0.6843292 0.6843292 0.7023675 0.6775648 0.6741826
##  [43] 0.6775648 0.6843292 0.6809470 0.6989853 0.6809470 0.6753100 0.6877114
##  [50] 0.6809470 0.6403608 0.6775648 0.6933484 0.6910936 0.6482525 0.6786922
##  [57] 0.6865840 0.6583991 0.6572717 0.6516347 0.6719278 0.6764374 0.6899662
##  [64] 0.6696731 0.6685457 0.6910936 0.6685457 0.6640361 0.6775648 0.6595265
##  [71] 0.6967306 0.6741826 0.6674183 0.6775648 0.6662909 0.6685457 0.6910936
##  [78] 0.6696731 0.6572717 0.6753100 0.7170237 0.6550169 0.6572717 0.6719278
##  [85] 0.6719278 0.6764374 0.6888388 0.6730552 0.6662909 0.6786922 0.6944758
##  [92] 0.6843292 0.6832018 0.6764374 0.6753100 0.6708005 0.6741826 0.6820744
##  [99] 0.6877114 0.6764374 0.6933484 0.7057497 0.6459977 0.6572717 0.6708005
## [106] 0.6561443 0.7001127 0.6933484 0.6640361 0.6978579 0.6719278 0.6674183
## [113] 0.6629087 0.6741826 0.6741826 0.6482525 0.6414882 0.7034949 0.6572717
## [120] 0.6809470 0.6888388 0.6685457 0.6685457 0.6572717 0.7012401 0.7080045
## [127] 0.7113867 0.6730552 0.6933484 0.6651635 0.6651635 0.6741826 0.6629087
## [134] 0.6617813 0.6888388 0.6978579 0.6617813 0.6696731 0.6775648 0.6583991
## [141] 0.6516347 0.6753100 0.7001127 0.6809470 0.6910936 0.6741826 0.6798196
## [148] 0.6629087 0.6798196 0.6696731 0.6708005 0.6516347 0.6595265 0.6854566
## [155] 0.6448703 0.6899662 0.6561443 0.6978579 0.6595265 0.7001127 0.6786922
## [162] 0.6888388 0.6572717 0.6550169 0.6888388 0.6741826 0.6922210 0.6696731
## [169] 0.6437430 0.6775648 0.6865840 0.6550169 0.6854566 0.6775648 0.6595265
## [176] 0.6617813 0.6877114 0.6685457 0.6820744 0.6877114 0.6583991 0.6708005
## [183] 0.6629087 0.6561443 0.6786922 0.6944758 0.6674183 0.6741826 0.6414882
## [190] 0.6516347 0.6730552 0.6561443 0.6753100 0.6516347 0.6978579 0.6809470
## [197] 0.6629087 0.6730552 0.6944758 0.6741826
## 
## $ppp
## [1] 0.24
summary(ppp2d$simulated_correct)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.6404  0.6629  0.6747  0.6751  0.6869  0.7170
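
A quick visual counterpart to the PPP value (a sketch; the companion pages have the full plots): the posterior predictive distribution of the proportion choosing j1, with the observed proportion marked.

hist(ppp2d$simulated, breaks = 20,
     xlab = "Simulated proportion of j1 choices",
     main = "Posterior predictive distribution (2d model)")
abline(v = ppp2d$observed, lwd = 2)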

2d model with DP prior and clusters:

ppp2dDP <- compute_ppp_proportion(observed_binary, sim_2dDP)
ppp2dDP
## $observed
## [1] 0.5152198
## 
## $simulated
##   [1] 0.5129651 0.5219842 0.5129651 0.5219842 0.5016911 0.5062007 0.5129651
##   [8] 0.5231116 0.5062007 0.5107103 0.5095829 0.5107103 0.5208568 0.5253664
##  [15] 0.5016911 0.5174746 0.5219842 0.4847802 0.5118377 0.5016911 0.5118377
##  [22] 0.5140924 0.5186020 0.5355130 0.5152198 0.5028185 0.4971815 0.5107103
##  [29] 0.5152198 0.5174746 0.5163472 0.4904171 0.4983089 0.5140924 0.4881623
##  [36] 0.5118377 0.5039459 0.4892897 0.4735062 0.5174746 0.5005637 0.5242390
##  [43] 0.5332582 0.5163472 0.4994363 0.5129651 0.5152198 0.5095829 0.4836528
##  [50] 0.4949267 0.4994363 0.5163472 0.5129651 0.5084555 0.4757610 0.5039459
##  [57] 0.5062007 0.5422773 0.5118377 0.4926719 0.4881623 0.5028185 0.5163472
##  [64] 0.5118377 0.5129651 0.4994363 0.5005637 0.5197294 0.5197294 0.4881623
##  [71] 0.5084555 0.5129651 0.5140924 0.5050733 0.4870349 0.4971815 0.5174746
##  [78] 0.4667418 0.4937993 0.5084555 0.4870349 0.5016911 0.5107103 0.5400225
##  [85] 0.5062007 0.5039459 0.4859076 0.5310034 0.5005637 0.5095829 0.4960541
##  [92] 0.5005637 0.4994363 0.5163472 0.5016911 0.4892897 0.4983089 0.4949267
##  [99] 0.5152198 0.5118377 0.4723788 0.5039459 0.4994363 0.5287486 0.5129651
## [106] 0.4802706 0.5152198 0.4960541 0.4949267 0.5039459 0.4847802 0.5050733
## [113] 0.4701240 0.5073281 0.5129651 0.5050733 0.4915445 0.4994363 0.4870349
## [120] 0.5073281 0.4735062 0.4994363 0.4994363 0.5005637 0.5095829 0.5140924
## [127] 0.5084555 0.5253664 0.5107103 0.5242390 0.5129651 0.4926719 0.5118377
## [134] 0.5028185 0.5005637 0.5028185 0.5321308 0.4926719 0.4949267 0.5084555
## [141] 0.5095829 0.4937993 0.5016911 0.5174746 0.5107103 0.5186020 0.5084555
## [148] 0.5016911 0.4994363 0.4960541 0.5039459 0.5062007 0.5140924 0.5107103
## [155] 0.5084555 0.4915445 0.4892897 0.5050733 0.5016911 0.5016911 0.4960541
## [162] 0.5264938 0.5073281 0.5107103 0.5242390 0.5016911 0.4813980 0.5242390
## [169] 0.4881623 0.4825254 0.5039459 0.4870349 0.5118377 0.5174746 0.5186020
## [176] 0.5107103 0.4926719 0.5016911 0.5107103 0.4915445 0.5073281 0.4881623
## [183] 0.5321308 0.4994363 0.5129651 0.5242390 0.4689966 0.5016911 0.5028185
## [190] 0.5062007 0.5118377 0.5118377 0.4881623 0.4960541 0.4678692 0.4960541
## [197] 0.4949267 0.5062007 0.4937993 0.5062007
## 
## $simulated_correct
##   [1] 0.6595265 0.6798196 0.6843292 0.6820744 0.6640361 0.6662909 0.7001127
##   [8] 0.7057497 0.6662909 0.6640361 0.6583991 0.6437430 0.6764374 0.6899662
##  [15] 0.6843292 0.6685457 0.6820744 0.6967306 0.6786922 0.6595265 0.6471251
##  [22] 0.6764374 0.6832018 0.6662909 0.6933484 0.6944758 0.6753100 0.6753100
##  [29] 0.6933484 0.6843292 0.6651635 0.7091319 0.6809470 0.6877114 0.7068771
##  [36] 0.6674183 0.6956032 0.6674183 0.6854566 0.7023675 0.6651635 0.6753100
##  [43] 0.6843292 0.6516347 0.6798196 0.6888388 0.6820744 0.6606539 0.6888388
##  [50] 0.6910936 0.6617813 0.6854566 0.6482525 0.6798196 0.6719278 0.6640361
##  [57] 0.6685457 0.6640361 0.6651635 0.6753100 0.6933484 0.6493799 0.6967306
##  [64] 0.6786922 0.6956032 0.6662909 0.6809470 0.6662909 0.6888388 0.6753100
##  [71] 0.6843292 0.6708005 0.6583991 0.6583991 0.6832018 0.6753100 0.6640361
##  [78] 0.6899662 0.6629087 0.7068771 0.6809470 0.6685457 0.6708005 0.6708005
##  [85] 0.6708005 0.6459977 0.6595265 0.6820744 0.6877114 0.6809470 0.6922210
##  [92] 0.6606539 0.6753100 0.6809470 0.6640361 0.6786922 0.6651635 0.6753100
##  [99] 0.6550169 0.6877114 0.6708005 0.6617813 0.6550169 0.6753100 0.6640361
## [106] 0.6832018 0.6775648 0.6381060 0.6662909 0.6753100 0.6922210 0.6989853
## [113] 0.7023675 0.6741826 0.6595265 0.6583991 0.6786922 0.6753100 0.6944758
## [120] 0.7012401 0.6899662 0.6865840 0.6595265 0.6696731 0.6448703 0.6696731
## [127] 0.6685457 0.6516347 0.6798196 0.6617813 0.6933484 0.6888388 0.6426156
## [134] 0.6674183 0.6809470 0.6719278 0.6854566 0.7023675 0.6708005 0.6505073
## [141] 0.6719278 0.7034949 0.6843292 0.6572717 0.6662909 0.6719278 0.6820744
## [148] 0.6843292 0.7068771 0.6832018 0.6662909 0.6933484 0.6606539 0.6888388
## [155] 0.6775648 0.7057497 0.6786922 0.6899662 0.6640361 0.6933484 0.6741826
## [162] 0.7001127 0.6944758 0.6753100 0.6662909 0.6685457 0.6910936 0.6550169
## [169] 0.6617813 0.6606539 0.6708005 0.6989853 0.6719278 0.6730552 0.6809470
## [176] 0.6798196 0.6324690 0.6662909 0.6933484 0.6786922 0.6786922 0.6798196
## [183] 0.6786922 0.6685457 0.6775648 0.6888388 0.6944758 0.6978579 0.6899662
## [190] 0.6775648 0.6651635 0.6606539 0.6572717 0.6809470 0.6595265 0.6786922
## [197] 0.6888388 0.6617813 0.6741826 0.6843292
## 
## $ppp
## [1] 0.21
summary(ppp2dDP$simulated_correct)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.6325  0.6652  0.6759  0.6761  0.6869  0.7091

1d model with a single theta and alpha governing choices

Let’s check the prediction of choices using the 1d model next. For each posterior sample of \(\theta\) and \(\alpha\), simulate responses.

A function to simulate responses using 1d model

simulate_pp_data_mcmclist1d <- function(mcmc_list, pair_data) {
  # Combine chains into one matrix
  all_samples <- do.call(rbind, mcmc_list)
  n_obs <- nrow(pair_data)
  n_rep <- 200
  sim_choices <- matrix(NA, nrow = n_rep, ncol = n_obs)
  
  # Identify column names for theta and alpha in the combined samples
  theta_names <- grep("^theta\\.", colnames(all_samples), value = TRUE)
  alpha_names <- grep("^alpha\\.", colnames(all_samples), value = TRUE)
  
  for (r in 1:n_rep) {
    s <- sample(1:nrow(all_samples), 1)  # Select one draw from any chain in combined samples
    theta <- all_samples[s, theta_names] # Draw one value for theta
    alpha  <- all_samples[s, alpha_names] # Draw one value for alpha

    for (i in 1:n_obs) {
      j1 <- as.character(pair_data$itemj1[i])
      j2 <- as.character(pair_data$itemj2[i])
      rater <- as.character(pair_data$rater[i])

      # Safely extract theta/alpha values using column names

      diff <- theta[paste0("theta.", j1)] - theta[paste0("theta.", j2)]
      a <- alpha[paste0("alpha.", rater)]

      eta <- a * diff # rater discrimination times the theta difference
      p <- pnorm(eta) # probit link: probability from the standard normal CDF
      sim_choices[r, i] <- rbinom(1, 1, p)
    }
  }
  return(sim_choices)
}

Run the function for 1d model

sim_1d <- simulate_pp_data_mcmclist1d(mcmc_list1d, pairDataCDAT)

Compute Posterior Predictive P-value (PPP) for 1d model

ppp1d <- compute_ppp_proportion(observed_binary, sim_1d)
ppp1d
## $observed
## [1] 0.5152198
## 
## $simulated
##   [1] 0.5264938 0.4949267 0.5107103 0.5118377 0.5084555 0.5174746 0.5050733
##   [8] 0.5005637 0.4971815 0.4937993 0.5118377 0.4971815 0.5140924 0.5028185
##  [15] 0.5095829 0.4937993 0.5129651 0.5298760 0.5062007 0.5107103 0.4937993
##  [22] 0.5140924 0.5174746 0.5084555 0.5321308 0.5050733 0.4836528 0.5219842
##  [29] 0.5084555 0.5039459 0.5186020 0.5197294 0.5028185 0.4735062 0.5197294
##  [36] 0.5050733 0.5253664 0.4960541 0.5332582 0.5095829 0.5140924 0.5095829
##  [43] 0.4960541 0.5095829 0.5118377 0.4937993 0.4983089 0.5118377 0.5140924
##  [50] 0.4870349 0.5163472 0.5005637 0.5186020 0.5016911 0.4926719 0.4667418
##  [57] 0.4960541 0.5016911 0.5050733 0.4926719 0.4870349 0.5050733 0.4926719
##  [64] 0.4949267 0.5253664 0.4960541 0.5343856 0.5084555 0.5332582 0.4892897
##  [71] 0.5005637 0.5084555 0.4904171 0.4937993 0.4881623 0.5186020 0.4881623
##  [78] 0.4881623 0.5084555 0.4994363 0.5208568 0.4859076 0.4836528 0.5129651
##  [85] 0.5219842 0.5073281 0.5253664 0.4983089 0.5073281 0.4915445 0.4983089
##  [92] 0.5118377 0.4847802 0.5107103 0.4915445 0.5107103 0.5016911 0.4971815
##  [99] 0.5016911 0.4825254 0.4892897 0.5332582 0.5129651 0.5129651 0.5050733
## [106] 0.4994363 0.5118377 0.5118377 0.4904171 0.4994363 0.5140924 0.5050733
## [113] 0.4712514 0.4689966 0.5095829 0.4847802 0.4937993 0.5095829 0.5050733
## [120] 0.5231116 0.5028185 0.5163472 0.5016911 0.4994363 0.4926719 0.5084555
## [127] 0.4949267 0.5095829 0.4949267 0.5208568 0.5129651 0.4847802 0.5084555
## [134] 0.4791432 0.5298760 0.4870349 0.5016911 0.4937993 0.5152198 0.4802706
## [141] 0.4780158 0.4971815 0.4937993 0.5174746 0.5095829 0.5062007 0.5118377
## [148] 0.5174746 0.4554679 0.4949267 0.5163472 0.4746336 0.5084555 0.5107103
## [155] 0.5005637 0.5084555 0.4791432 0.4859076 0.5445321 0.4813980 0.4892897
## [162] 0.4971815 0.5219842 0.5163472 0.4791432 0.4813980 0.5152198 0.5028185
## [169] 0.5197294 0.5005637 0.4892897 0.4926719 0.5197294 0.4983089 0.5050733
## [176] 0.4904171 0.5174746 0.4994363 0.5231116 0.4949267 0.4825254 0.5197294
## [183] 0.4971815 0.5186020 0.5005637 0.5253664 0.4847802 0.5140924 0.5039459
## [190] 0.5084555 0.5186020 0.4836528 0.5231116 0.5005637 0.4994363 0.4915445
## [197] 0.4904171 0.5062007 0.4825254 0.5084555
## 
## $simulated_correct
##   [1] 0.6211950 0.5963923 0.6211950 0.6178129 0.5896280 0.6234498 0.6448703
##   [8] 0.6223224 0.6144307 0.6042841 0.5794814 0.6279594 0.6178129 0.5997745
##  [15] 0.6155581 0.6133033 0.5986471 0.6178129 0.6189402 0.6054115 0.6268320
##  [22] 0.6268320 0.6324690 0.6414882 0.6133033 0.6290868 0.6144307 0.6347238
##  [29] 0.6166855 0.6324690 0.6290868 0.6347238 0.6087937 0.6042841 0.6369786
##  [36] 0.6133033 0.6245772 0.5794814 0.6347238 0.6313416 0.6268320 0.6335964
##  [43] 0.6448703 0.6200676 0.6042841 0.6313416 0.5975197 0.6133033 0.6223224
##  [50] 0.6020293 0.6223224 0.5862458 0.6290868 0.6121759 0.6324690 0.6133033
##  [57] 0.6223224 0.6099211 0.6200676 0.6324690 0.6065389 0.6335964 0.6211950
##  [64] 0.6234498 0.6133033 0.6178129 0.6268320 0.6369786 0.5963923 0.6381060
##  [71] 0.6245772 0.5986471 0.6302142 0.5885006 0.6099211 0.6223224 0.6099211
##  [78] 0.5828636 0.5941375 0.6054115 0.5839910 0.5918828 0.6347238 0.6437430
##  [85] 0.6302142 0.6245772 0.6155581 0.6245772 0.5907554 0.5930101 0.6133033
##  [92] 0.6358512 0.6155581 0.6054115 0.6087937 0.5918828 0.5738444 0.6189402
##  [99] 0.6121759 0.6155581 0.6065389 0.6166855 0.6257046 0.6257046 0.6178129
## [106] 0.6392334 0.6110485 0.6223224 0.6189402 0.5963923 0.6223224 0.5907554
## [113] 0.6245772 0.5975197 0.6133033 0.6335964 0.6178129 0.6155581 0.6290868
## [120] 0.6110485 0.5907554 0.6313416 0.6054115 0.6121759 0.6189402 0.6302142
## [127] 0.6166855 0.6223224 0.6211950 0.6065389 0.6031567 0.6110485 0.6369786
## [134] 0.6099211 0.5704622 0.6313416 0.6054115 0.6200676 0.6211950 0.5997745
## [141] 0.5952649 0.6166855 0.6426156 0.6166855 0.6155581 0.6347238 0.6268320
## [148] 0.5896280 0.6200676 0.6031567 0.6065389 0.6189402 0.5986471 0.5986471
## [155] 0.6065389 0.6234498 0.6414882 0.6054115 0.6369786 0.6279594 0.6065389
## [162] 0.6054115 0.6031567 0.6358512 0.6099211 0.6234498 0.6324690 0.5862458
## [169] 0.6009019 0.6178129 0.5975197 0.6121759 0.5986471 0.6358512 0.6178129
## [176] 0.6009019 0.6459977 0.6189402 0.5997745 0.6099211 0.6471251 0.6257046
## [183] 0.6257046 0.6087937 0.6200676 0.6155581 0.5975197 0.6133033 0.6054115
## [190] 0.6302142 0.6200676 0.5806088 0.6178129 0.5997745 0.6527621 0.6268320
## [197] 0.6121759 0.6482525 0.6245772 0.6099211
## 
## $ppp
## [1] 0.21
summary(ppp1d$simulated_correct)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5705  0.6054  0.6172  0.6158  0.6268  0.6528

Plot the observed proportion of itemj1 selections against the distribution of simulated proportions of itemj1 selections across the replicated data sets.

# Density of simulated proportions
dens <- density(ppp1d$simulated)

# Plot
plot(dens, main = "Posterior Predictive Density (1D Model)",
     xlab = "Simulated Proportion of itemj1 Chosen",
     ylab = "Density", lwd = 2, col = "steelblue")

# Add observed value as a vertical red line
abline(v = ppp1d$observed, col = "red", lwd = 2, lty = 2)

# Add legend
legend("topright", legend = "Observed Proportion", col = "red", lty = 2, lwd = 2)
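
The same check could also be drawn with ggplot2 (loaded earlier); a minimal sketch, assuming the ppp1d object created above:

# Optional ggplot2 version of the density check (a sketch, not part of the original analysis)
ggplot(data.frame(sim = ppp1d$simulated), aes(x = sim)) +
  geom_density(color = "steelblue", linewidth = 1) +
  geom_vline(xintercept = ppp1d$observed, color = "red", linetype = "dashed") +
  labs(title = "Posterior Predictive Density (1D Model)",
       x = "Simulated Proportion of itemj1 Chosen",
       y = "Density")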

Plot simulated and observed proportions

Plot the density of the 2d model simulated proportions and the observed proportion (see Levy & Mislevy, 2016, pp. 238-239)

# Density of simulated proportions
dens <- density(ppp2d$simulated)

# Plot
plot(dens, main = "Posterior Predictive Density (2D Normal Prior Model)",
     xlab = "Simulated Proportion of itemj1 Chosen",
     ylab = "Density", lwd = 2, col = "steelblue")

# Add observed value as a vertical red line
abline(v = ppp2d$observed, col = "red", lwd = 2, lty = 2)

# Add legend
legend("topright", legend = "Observed Proportion", col = "red", lty = 2, lwd = 2)

Plot the density of the 2dDP model simulated proportions and the observed proportion

# Density of simulated proportions
dens <- density(ppp2dDP$simulated)

# Plot
plot(dens, main = "Posterior Predictive Density (2D DP Prior Model)",
     xlab = "Simulated Proportion of itemj1 Chosen",
     ylab = "Density", lwd = 2, col = "steelblue")

# Add observed value as a vertical red line
abline(v = ppp2dDP$observed, col = "red", lwd = 2, lty = 2)

# Add legend
legend("topright", legend = "Observed Proportion", col = "red", lty = 2, lwd = 2)

All three models have similar PPP values (.21 to .24), indicating that the distribution of simulated proportions of itemj1 preference is adequately centered on the observed proportion of itemj1 preference, with each simulation driven by the raters’ model-implied perceptual parameters and the items’ model-implied theta values.

This begins to suggest that both of the 2d models produce simulated data that sufficiently resemble the raters’ actual choices of items.

Additionally, the bulk of the simulated proportions of itemj1 preference fall within a narrow range of about .05 (roughly 0.48 to 0.53).

All of the models consistently recover the overall proportion of preference for itemj1, but that does not mean the row-by-row choices match. To check that, the percent of correctly predicted choices was added to the PPP function.

The simulated choices based on the model-implied latent variables were correct roughly twice as often as they were incorrect across rater-by-pair combinations, and the two-dimensional models did somewhat better (about 0.68) than the one-dimensional model (about 0.62). This suggests that the two-dimensional models are simply better at predicting the actual choices, regardless of which other fit statistics are considered.
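
To put the three models side by side, a small summary can be assembled from the PPP objects computed above (this table is an addition for readability, not part of the original analysis):

ppp_summary <- data.frame(
  Model = c("1d", "2d normal", "2d DP"),
  PPP = c(ppp1d$ppp, ppp2d$ppp, ppp2dDP$ppp),
  Mean_prop_correct = c(mean(ppp1d$simulated_correct),
                        mean(ppp2d$simulated_correct),
                        mean(ppp2dDP$simulated_correct))
)
print(ppp_summary)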

Model Comparison with DIC

From Chapter 10.3 of Levy & Mislevy (2016)

The Deviance Information Criterion balances fit and complexity. Its complexity penalty, \(p_{D}\), is “the difference between the posterior mean of the deviance, \(\overline{D(\theta)}\), and the deviance evaluated at the posterior mean, \(D(\bar{\theta})\)” (p. 248), which is appropriate when the posterior mean is a “reasonable point summary of the parameters” (p. 249), as it is for these parameters. See Gelman et al. (2013), Gill (2007), and Plummer (2008) for more advanced discussion of DIC.

DIC \(= \overline{D(\theta)} + p_{D} = 2\overline{D(\theta)} - D(\bar{\theta}) = D(\bar{\theta}) + 2p_{D}\)

  1. Compute \(\overline{D(\theta)}\), the posterior mean of the deviance, by looping over all MCMC iterations.

  2. Compute \(D(\bar{\theta})\), the deviance evaluated at the posterior mean of the parameters.

  3. Compute \(p_{D} = \overline{D(\theta)} - D(\bar{\theta})\).

  4. Compute the DIC as \(\overline{D(\theta)} + p_{D}\), which equals \(2\overline{D(\theta)} - D(\bar{\theta})\); a small generic helper sketching this arithmetic appears after this list.
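
A minimal generic helper for this arithmetic (a sketch only; the analysis-specific deviance functions appear below):

# Generic DIC arithmetic (sketch): takes a vector of deviance values across
# posterior draws and the deviance evaluated at the posterior mean
dic_from_deviance <- function(deviance_draws, deviance_at_mean) {
  D_bar <- mean(deviance_draws)      # posterior mean of the deviance
  p_D   <- D_bar - deviance_at_mean  # effective number of parameters
  list(D_bar = D_bar, p_D = p_D, DIC = D_bar + p_D)
}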

First, compute the posterior mean of the deviance for the 2d model.

2d DIC

computeDbar2d <- function(mcmc_matrix, pair_data, theta1_ids, theta2_ids, gamma_ids) {
  # Extract item and judge IDs
  itemj1 <- as.character(pair_data$itemj1)
  itemj2 <- as.character(pair_data$itemj2)
  choice <- pair_data$choice
  rater  <- as.character(pair_data$rater)
  
  S <- nrow(mcmc_matrix)
  N <- nrow(pair_data)
  deviance_vec <- numeric(S)
  
  for (s in 1:S) {
    # Extract current parameter values
    theta1 <- setNames(mcmc_matrix[s, theta1_ids], sub("theta1\\.", "", theta1_ids))
    theta2 <- setNames(mcmc_matrix[s, theta2_ids], sub("theta2\\.", "", theta2_ids))
    gamma  <- setNames(mcmc_matrix[s, gamma_ids],  sub("gamma\\.", "", gamma_ids))
    
    eta <- numeric(N)
    for (n in 1:N) {
      j1 <- itemj1[n]
      j2 <- itemj2[n]
      r  <- rater[n]
      
      g <- gamma[r]
      if (is.na(g)) g <- mean(gamma, na.rm = TRUE)
      
      diff1 <- theta1[j2] - theta1[j1]
      diff2 <- theta2[j2] - theta2[j1]
      eta[n] <- (1 - g) * diff1 + g * diff2
    }
    
    p <- pnorm(eta)
    p <- pmin(pmax(p, 1e-10), 1 - 1e-10)  # avoid log(0)
    
    is_j2_chosen <- choice == pair_data$itemj2
    loglik <- dbinom(is_j2_chosen, size = 1, prob = p, log = TRUE) # Compute log-probability of the observed binary outcome
    deviance_vec[s] <- -2 * sum(loglik)
  }
  
  mean(deviance_vec)
}

Note that the deviance is defined as minus two times the log-likelihood: \(D(\theta) = -2\log p(y | \theta)\)

So:

  • When we compute log-likelihoods in our code via dbinom(…, log = TRUE), we are evaluating \(\log p(y \mid \theta)\)
  • Summing over all observations gives the log-likelihood for the whole dataset
  • Multiplying by -2 gives the deviance for that parameter draw

Then \(\overline{D(\theta)}\) is computed by averaging this deviance over the MCMC samples, and \(D(\bar{\theta})\) is computed by evaluating the deviance at the posterior mean of the parameters.

  • Log-likelihood, \(\log p(y \mid \theta)\), measures the fit of the data under the parameters
  • Deviance, \(-2\) times the log-likelihood, is a badness-of-fit measure; lower is better
  • Posterior mean deviance is the average deviance across posterior samples
  • DIC is a trade-off between fit and complexity
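
As a toy illustration of this arithmetic (made-up values, not the CDAT data):

y_toy <- c(1, 0, 1)        # binary outcomes
p_toy <- c(0.7, 0.4, 0.6)  # model-implied probabilities for one parameter draw
loglik_toy <- sum(dbinom(y_toy, size = 1, prob = p_toy, log = TRUE))  # log p(y | theta)
-2 * loglik_toy            # deviance D(theta) for this draw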

Use that function:

mcmc_matrix <- do.call(rbind, mcmc_list2d)  # combine all chains
theta1_ids <- grep("^theta1\\.", colnames(mcmc_matrix), value = TRUE)
theta2_ids <- grep("^theta2\\.", colnames(mcmc_matrix), value = TRUE)
gamma_ids  <- grep("^gamma\\.",  colnames(mcmc_matrix), value = TRUE)

D_bar2d <- computeDbar2d(mcmc_matrix, pairDataCDAT, theta1_ids, theta2_ids, gamma_ids)

print(D_bar2d)
## [1] 1142.476

Now, a function to compute the deviance at the posterior mean of the parameters.

computeDevianceAtPosteriorMean2d <- function(mcmc_matrix, pair_data, theta1_ids, theta2_ids, gamma_ids) {
  # Compute posterior means
  theta1_mean <- setNames(colMeans(mcmc_matrix[, theta1_ids]), sub("theta1\\.", "", theta1_ids))
  theta2_mean <- setNames(colMeans(mcmc_matrix[, theta2_ids]), sub("theta2\\.", "", theta2_ids))
  gamma_mean  <- setNames(colMeans(mcmc_matrix[, gamma_ids]),  sub("gamma\\.",  "", gamma_ids))
  
  itemj1 <- as.character(pair_data$itemj1)
  itemj2 <- as.character(pair_data$itemj2)
  choice <- pair_data$choice
  rater  <- as.character(pair_data$rater)
  
  N <- nrow(pair_data)
  eta <- numeric(N)
  
  for (n in 1:N) {
    j1 <- itemj1[n]
    j2 <- itemj2[n]
    r  <- rater[n]
    
    g <- gamma_mean[r]
    if (is.na(g)) g <- mean(gamma_mean, na.rm = TRUE)
    
    diff1 <- theta1_mean[j2] - theta1_mean[j1]
    diff2 <- theta2_mean[j2] - theta2_mean[j1]
    eta[n] <- (1 - g) * diff1 + g * diff2
  }
  
  p <- pnorm(eta)
  p <- pmin(pmax(p, 1e-10), 1 - 1e-10)
  
  is_j2_chosen <- choice == pair_data$itemj2
  loglik <- dbinom(is_j2_chosen, size = 1, prob = p, log = TRUE)
  
  D_at_mean <- -2 * sum(loglik)
  return(D_at_mean)
}

Use the function, passing the same variables created in the previous step.

D_mean2d <- computeDevianceAtPosteriorMean2d(mcmc_matrix, pairDataCDAT, theta1_ids, theta2_ids, gamma_ids)
print(D_mean2d)
## [1] 1066.182

As a final step, compute \(p_{D}\) as the difference between D_bar and D_mean, then the DIC as D_bar plus \(p_{D}\) (equivalently, twice D_bar minus D_mean):

p_D2d <- D_bar2d - D_mean2d
DIC2d <- D_bar2d + p_D2d
print(DIC2d)
## [1] 1218.77

Repeat these steps for the 2dDP model, using the same functions as for the 2d normal-prior model.

2dDP DIC

mcmc_matrix <- do.call(rbind, mcmc_list2dDP)  # combine all chains
theta1_ids <- grep("^theta1\\.", colnames(mcmc_matrix), value = TRUE)
theta2_ids <- grep("^theta2\\.", colnames(mcmc_matrix), value = TRUE)
gamma_ids  <- grep("^gamma\\.",  colnames(mcmc_matrix), value = TRUE)

D_bar2dDP <- computeDbar2d(mcmc_matrix, pairDataCDAT, theta1_ids, theta2_ids, gamma_ids)

print(D_bar2dDP)
## [1] 1154.658
D_mean2dDP <- computeDevianceAtPosteriorMean2d(mcmc_matrix, pairDataCDAT, theta1_ids, theta2_ids, gamma_ids)
print(D_mean2dDP)
## [1] 1074.58
p_D2dDP <- D_bar2dDP - D_mean2dDP
DIC2dDP <- D_bar2dDP + p_D2dDP
print(DIC2dDP)
## [1] 1234.736

1d DIC

computeDbar1d <- function(mcmc_matrix, pair_data, theta_ids, alpha_ids) {
  # Extract item and judge IDs
  itemj1 <- as.character(pair_data$itemj1)
  itemj2 <- as.character(pair_data$itemj2)
  choice <- pair_data$choice
  rater  <- as.character(pair_data$rater)
  
  S <- nrow(mcmc_matrix)
  N <- nrow(pair_data)
  deviance_vec <- numeric(S)
  
  for (s in 1:S) {
    # Extract current parameter values
    theta <- setNames(mcmc_matrix[s, theta_ids], sub("theta\\.", "", theta_ids))
    alpha <- setNames(mcmc_matrix[s, alpha_ids], sub("alpha\\.", "", alpha_ids))
    
    eta <- numeric(N)
    for (n in 1:N) {
      j1 <- itemj1[n]
      j2 <- itemj2[n]
      r  <- rater[n]
      
      a <- alpha[r]
      if (is.na(a)) a <- mean(alpha, na.rm = TRUE)
      
      diff <- theta[j2] - theta[j1]
      eta[n] <- a * diff
    }
    
    p <- pnorm(eta)
    p <- pmin(pmax(p, 1e-10), 1 - 1e-10)  # avoid log(0)
    
    is_j2_chosen <- choice == pair_data$itemj2
    loglik <- dbinom(is_j2_chosen, size = 1, prob = p, log = TRUE)
    deviance_vec[s] <- -2 * sum(loglik)
  }
  
  mean(deviance_vec)
}

Use that function:

mcmc_matrix <- do.call(rbind, mcmc_list1d)  # combine all chains
theta_ids <- grep("^theta\\.", colnames(mcmc_matrix), value = TRUE)
alpha_ids  <- grep("^alpha\\.",  colnames(mcmc_matrix), value = TRUE)

D_bar1d <- computeDbar1d(mcmc_matrix, pairDataCDAT, theta_ids, alpha_ids)

print(D_bar1d)
## [1] 999.6571

Now, a function to compute the deviance at the posterior mean of the parameters.

computeDevianceAtPosteriorMean1d <- function(mcmc_matrix, pair_data, theta_ids, alpha_ids) {
  # Compute posterior means
  theta_mean <- setNames(colMeans(mcmc_matrix[, theta_ids]), sub("theta\\.", "", theta_ids))
  alpha_mean  <- setNames(colMeans(mcmc_matrix[, alpha_ids]),  sub("alpha\\.",  "", alpha_ids))
  
  itemj1 <- as.character(pair_data$itemj1)
  itemj2 <- as.character(pair_data$itemj2)
  choice <- pair_data$choice
  rater  <- as.character(pair_data$rater)
  
  N <- nrow(pair_data)
  eta <- numeric(N)
  
  for (n in 1:N) {
    j1 <- itemj1[n]
    j2 <- itemj2[n]
    r  <- rater[n]
    
    a <- alpha_mean[r]
    if (is.na(a)) a <- mean(alpha_mean, na.rm = TRUE)
    
    diff <- theta_mean[j2] - theta_mean[j1]
    eta[n] <- a * diff
  }
  
  p <- pnorm(eta)
  p <- pmin(pmax(p, 1e-10), 1 - 1e-10)
  
  is_j2_chosen <- choice == pair_data$itemj2
  loglik <- dbinom(is_j2_chosen, size = 1, prob = p, log = TRUE)
  
  D_at_mean <- -2 * sum(loglik)
  return(D_at_mean)
}

Use the function, passing the same variables created in the previous step.

D_mean1d <- computeDevianceAtPosteriorMean1d(mcmc_matrix, pairDataCDAT, theta_ids, alpha_ids)
print(D_mean1d)
## [1] 973.8331

As a final step, compute \(p_{D}\) as the difference between D_bar and D_mean, then the DIC as D_bar plus \(p_{D}\):

p_D1d <- D_bar1d - D_mean1d
DIC1d <- D_bar1d + p_D1d
print(DIC1d)
## [1] 1025.481

DIC Table

dic_table <- data.frame(
  Model = c("1d", "2d", "2dDP"),
  DIC = c(DIC1d, DIC2d, DIC2dDP),
  D_bar = c(D_bar1d, D_bar2d, D_bar2dDP),
  D_mean = c(D_mean1d, D_mean2d, D_mean2dDP),
  p_D = c(p_D1d, p_D2d, p_D2dDP)
)

print(dic_table)
##   Model      DIC     D_bar    D_mean      p_D
## 1    1d 1025.481  999.6571  973.8331 25.82408
## 2    2d 1218.770 1142.4760 1066.1824 76.29365
## 3  2dDP 1234.736 1154.6579 1074.5799 80.07797
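
To make the comparison easier to read, the models can also be ordered by DIC with differences from the best-fitting model (a small optional step, not in the original analysis):

dic_table$delta_DIC <- dic_table$DIC - min(dic_table$DIC)
print(dic_table[order(dic_table$DIC), ])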

The 1d model has the lowest DIC by far, but it is also the least expressive. DIC trades fit against complexity, so the larger parameter count of the 2d models increases their DIC. Between the two 2d models, the additional clustering parameters increase the deviance of the 2dDP model. The 2d and 2dDP parameters were shown above to predict raters’ choices between paired items well, and the clustering information from the 2dDP model appears valid in terms of co-clustering, but the added complexity does not yield better predictiveness overall; the maximum number of clusters adds substantially to the complexity penalty in the DIC.

Even with the relatively sparse set of comparisons, the 2d model works very well to explain raters’ preferences and the placement of items in the 2d latent space.

The high DIC of the 2dDP model results partly from the maximum number of clusters explored: when set at 6, the model finds the three clusters (identified primarily by the co-clustering matrix and dendrogram) that are also visually evident in the plot of gamma medians.

Consider repeating the models with only the mid-angle gamma raters (the cluster 2 raters). Does clustering raters by gamma angle improve the overall model’s predictiveness? Of course, no clustering was needed when modeling with just one cluster, but no improvement in predictiveness was gained either (see ‘Cluster Validity.R’). I think this shows that the rater perceptual orientations help to explain the observed data and reduce the need for grouping by rater agreement, as I did previously with the BTL model on the web app. That could let larger groups of raters contribute to the theta estimates and shrink the credible intervals.

Notes (next steps, not addressed in this paper): document which function defaults were not used (e.g., 3 chains), and consider noting why MCMCpack was used rather than standard software such as WinBUGS.


References

Gelman, A., Gilks, W. R., & Roberts, G. O. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. The Annals of Applied Probability, 7(1). https://doi.org/10.1214/aoap/1034625254

Levy, R., & Mislevy, R. J. (2016). Bayesian psychometric modeling. Chapman & Hall/CRC.

Yu, Q., & Quinn, K. M. (2022). A multidimensional pairwise comparison model for heterogeneous perceptions with an application to modelling the perceived truthfulness of public statements on COVID-19. Journal of the Royal Statistical Society: Series A (Statistics in Society). https://doi.org/10.1111/rssa.12810