============================================================================================================
PREVIOUS: weakly informative prior for covariance matrices, https://rpubs.com/sherloconan/1081051

THIS: inverse-Wishart prior for the error covariance matrix, https://rpubs.com/sherloconan/1085223

1. Getting Started

 

In Loftus and Masson (1994), to-be-recalled 20-word lists were presented at a rate of 1, 2, or 5 sec per word. Suppose that the hypothetical experiment is run as a one-way within-subject (repeated-measures) design with three conditions.

dat <- matrix(c(10,6,11,22,16,15,1,12,9,8,
                13,8,14,23,18,17,1,15,12,9,
                13,8,14,25,20,17,4,17,12,12), ncol=3,
              dimnames=list(paste0("s",1:10), paste0("Level",1:3))) #wide data format
# dat <- dat * 100

df <- data.frame("ID"=factor(rep(paste0("s",1:10), 3)),
                 "Level"=factor(rep(paste0("Level",1:3), each=10)),
                 "Response"=c(dat)) #long data format

 

Alternatively, test a simulated data set.

mu.b <- 100; sigma.b <- 20; n <- 24
delta.mu <- 6.7; sigma.e <- 6.67 #rho=.9, power=.8
set.seed(277)
TrueScore <- rnorm(n, mu.b, sigma.b)
eij1 <- rnorm(n, 0, sigma.e)
eij2 <- rnorm(n, 0, sigma.e)

dat <- cbind("Level1"=TrueScore+eij1+delta.mu,
             "Level2"=TrueScore+eij2)
# dat <- dat * 100

df <- data.frame("ID"=factor(rep(paste0("s",1:n), 2)),
                 "Level"=factor(rep(c("Level1", "Level2"), each=n)),
                 "Response"=c(dat))
## The Bayes factors will be computed for the hypothetical data (in the first chunk).
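
As a sanity check on the rho=.9 annotation above: under this true-score model, the implied correlation between conditions is \(\rho=\sigma_b^2/(\sigma_b^2+\sigma_\epsilon^2)\). A quick check (an added sketch, not part of the original analysis):

sigma.b^2 / (sigma.b^2 + sigma.e^2) #implied correlation, approximately .90
cor(dat[,"Level1"], dat[,"Level2"]) #empirical correlation in the simulated sample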

 

Method: Constructing the Jeffreys-Zellner-Siow (JZS) Bayes factor.

\[\begin{align*} \tag{1} \mathcal{M}_1:\ Y_{ij}&=\mu+\sigma_\epsilon(d_i+b_j)+\epsilon_{ij}\\ \text{versus}\quad\mathcal{M}_0:\ Y_{ij}&=\mu+\sigma_\epsilon b_j+\epsilon_{ij},\qquad\epsilon_{ij}\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,\sigma_\epsilon^2) \text{ for } i=1,\dotsb,a;\ j=1,\dotsb,n. \end{align*}\]

\[\begin{equation} \tag{2} \pi(\mu,\sigma_\epsilon^2)\propto 1/\ \sigma_\epsilon^{2} \end{equation}\]

\[\begin{equation} \tag{3} d_i^\star\mid g\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,g) \end{equation}\]

\[\begin{equation} \tag{4} g\sim\text{Scale-inv-}\chi^2(1,h^2) \end{equation}\]

\[\begin{equation} \tag{5} (d_1^\star,\dotsb,d_{a-1}^\star)=(d_1,\dotsb,d_a)\cdot\mathbf{Q} \end{equation}\]

\[\begin{equation} \tag{6} \mathbf{I}_a-\frac{1}{a}\mathbf{J}_a=\mathbf{Q}\cdot\mathbf{Q}^\top \end{equation}\]

\[\begin{equation} \tag{7} b_j\mid g_b\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,g_b) \end{equation}\]

\[\begin{equation} \tag{8} g_b\sim\text{Scale-inv-}\chi^2(1,h_b^2) \end{equation}\]

\(\mathbf{Q}\) is an \(a\times(a-1)\) matrix of the \(a-1\) eigenvectors of unit length corresponding to the nonzero eigenvalues of the left-hand side of (6). For example, the projected main effect \(d^\star=(d_1-d_2)/\sqrt{2}\) when \(a=2\). In the other direction (given \(d_1+d_2=0\)), \(d_1=d^\star/\sqrt{2}\) and \(d_2=-d^\star/\sqrt{2}\).
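
For \(a=2\), the relation (6) can be verified numerically (an added check, not part of the original analysis):

Q <- matrix(c(1, -1)/sqrt(2), ncol=1) #unit eigenvector of I_2 - J_2/2
Q %*% t(Q)                            #reproduces I_2 - J_2/2 in (6)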

set.seed(277)
(JZS_BF <- anovaBF(Response~Level+ID, data=df, whichRandom="ID", progress=F))
## Bayes factor analysis
## --------------
## [1] Level + ID : 35959.93 ±0.56%
## 
## Against denominator:
##   Response ~ ID 
## ---
## Bayes factor type: BFlinearModel, JZS
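
If the numeric value is needed downstream, it can be extracted from the BayesFactor object (a convenience sketch):

extractBF(JZS_BF)$bf #the value printed above, about 3.6e4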

Note: Fixed effects are assumed. See the discussion here.

 

Markov chain Monte Carlo (MCMC) diagnostics: trace plots of iterations versus sampled values for each variable in the chain.

set.seed(277)
draws <- posterior(JZS_BF, iterations=1e5, progress=F)
summary(draws)
## 
## Iterations = 1:1e+05
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1e+05 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                  Mean      SD  Naive SE Time-series SE
## mu            12.7292  1.9846 0.0062759      0.0062759
## Level-Level1  -1.6542  0.2364 0.0007475      0.0008613
## Level-Level2   0.2538  0.2241 0.0007088      0.0007100
## Level-Level3   1.4004  0.2330 0.0007367      0.0008187
## ID-s1         -0.7238  2.0379 0.0064445      0.0064445
## ID-s10        -3.0369  2.0370 0.0064415      0.0064415
## ID-s2         -5.3524  2.0380 0.0064446      0.0064446
## ID-s3          0.2681  2.0370 0.0064416      0.0064416
## ID-s4         10.5135  2.0384 0.0064461      0.0064461
## ID-s5          5.2273  2.0393 0.0064489      0.0064489
## ID-s6          3.5731  2.0352 0.0064359      0.0064359
## ID-s7        -10.6347  2.0369 0.0064412      0.0064412
## ID-s8          1.9211  2.0361 0.0064386      0.0064386
## ID-s9         -1.7162  2.0395 0.0064495      0.0064495
## sig2           0.7938  0.3294 0.0010416      0.0023032
## g_Level        7.7913 94.4643 0.2987222      0.2987222
## g_ID          56.2794 39.9360 0.1262886      0.2052537
## 
## 2. Quantiles for each variable:
## 
##                  2.5%      25%      50%     75%    97.5%
## mu             8.7772  11.4887  12.7270 13.9666  16.6988
## Level-Level1  -2.1042  -1.8100  -1.6599 -1.5046  -1.1664
## Level-Level2  -0.1907   0.1090   0.2547  0.3998   0.6950
## Level-Level3   0.9238   1.2533   1.4041  1.5527   1.8510
## ID-s1         -4.7946  -2.0006  -0.7140  0.5661   3.3221
## ID-s10        -7.1093  -4.3147  -3.0322 -1.7511   0.9941
## ID-s2         -9.4374  -6.6315  -5.3413 -4.0637  -1.3013
## ID-s3         -3.8028  -1.0187   0.2716  1.5472   4.3035
## ID-s4          6.4649   9.2282  10.5097 11.7944  14.6162
## ID-s5          1.1702   3.9469   5.2212  6.5136   9.2945
## ID-s6         -0.4687   2.2866   3.5639  4.8580   7.6215
## ID-s7        -14.6990 -11.9076 -10.6180 -9.3487  -6.6167
## ID-s8         -2.1454   0.6405   1.9223  3.2007   5.9699
## ID-s9         -5.8027  -3.0012  -1.7110 -0.4266   2.3528
## sig2           0.3849   0.5714   0.7223  0.9325   1.6225
## g_Level        0.5096   1.5649   2.9401  6.0788  36.1203
## g_ID          14.7594  31.1880  46.1816 68.9989 157.4876
plot(draws, cex.lab=1.5, cex.axis=1.5, cex.main=1.5, cex.sub=1.5)

   

2A. Full Stan Model (unstandardized effect sizes)

 

\[\begin{align*} \tag{a} \mathcal{M}_1:\ Y_{ij}&=\mu+t_i+\epsilon_{ij}\\ \text{versus}\quad\mathcal{M}_0:\ Y_{ij}&=\mu+\epsilon_{ij}\\ (\epsilon_{1j},\dotsb,\epsilon_{aj})^\top&\overset{\text{i.i.d.}}{\sim}\mathbf{\mathcal{N}}_a(\mathbf{0},\mathbf{\Sigma})\qquad\text{for } i=1,\dotsb,a;\ j=1,\dotsb,n. \end{align*}\]

\[\begin{equation} \tag{b} \pi(\mu)\propto 1 \end{equation}\]

\[\begin{equation} \tag{c} \mathbf{\Sigma}\sim\mathcal{W}^{-1}(\mathbf{\Psi},\ \nu) \end{equation}\]

\[\begin{equation} \tag{3} t_i^\star\mid g\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,g) \end{equation}\]

\[\begin{equation} \tag{4} g\sim\text{Scale-inv-}\chi^2(1,h^2) \end{equation}\]

\[\begin{equation} \tag{5} (t_1^\star,\dotsb,t_{a-1}^\star)=(t_1,\dotsb,t_a)\cdot\mathbf{Q} \end{equation}\]

\[\begin{equation} \tag{6} \mathbf{I}_a-\frac{1}{a}\mathbf{J}_a=\mathbf{Q}\cdot\mathbf{Q}^\top \end{equation}\]

\(\mathbf{Q}\) is an \(a\times(a-1)\) matrix of the \(a-1\) eigenvectors of unit length corresponding to the nonzero eigenvalues of the left-hand side of (6). For example, the projected main effect \(t^\star=(t_1-t_2)/\sqrt{2}\) when \(a=2\). In the other direction (given \(t_1+t_2=0\)), \(t_1=t^\star/\sqrt{2}\) and \(t_2=-t^\star/\sqrt{2}\).

computeQ <- function(C) {
  #' reduced parametrization for the fixed treatment effects
  S <- diag(C) - matrix(1, nrow = C, ncol = C) / C
  e <- qr(S)
  Q <- qr.Q(e) %*% diag(sign(diag(qr.R(e))))
  Q[, which(abs(e$qraux) > 1e-3), drop = FALSE]
}
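
A quick numeric check of computeQ (a usage sketch with \(a=3\)):

Q <- computeQ(3)
round(t(Q) %*% Q, 10) #identity of order 2: the columns are orthonormal
round(Q %*% t(Q), 10) #recovers I_3 - J_3/3, as required by (6)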

 

Specify the full model \(\mathcal{M}_1\) in (a). stancodeM1 declares the reduced fixed treatment effects tf. Note that the effects are not standardized (\(d_i=t_i/\sigma_\epsilon\)), but the data are scaled. Compile and fit the model in Stan.

\[\begin{equation} \tag{c.1} \mathbf{\Sigma}\sim\mathcal{W}^{-1}(\mathbf{I}_{a},\ a) \end{equation}\]

Note that to compute the (log) marginal likelihood for a Stan model, we need to specify the model in a particular way. Instead of using “~” signs to specify distributions, we directly use the (log) density functions. The reason is that when the “~” sign is used, constant terms that are not needed for sampling from the posterior are dropped. For computing the marginal likelihood, however, these constants must be retained. For instance, instead of writing y ~ normal(mu, sigma); we would need to write target += normal_lpdf(y | mu, sigma); (Gronau, 2021). See also the discussion regarding the implicit uniform prior for computing the log marginal likelihood.

stancodeM1 <- "
data {
  int<lower=1> N;        // number of subjects
  int<lower=2> C;        // number of conditions
  // vector[C] Y[N];        // responses [removed features]
  array[N] vector[C] Y;  // responses [Stan 2.26+ syntax for array declarations]
  matrix[C,C-1] P;       // projecting C-1 fixed effects into C
  real<lower=0> ht;      // scale parameter of the variance of the reduced treatment effect
  matrix[C,C] V;         // scale matrix of the inverse-Wishart prior
  real<lower=C-1> nu;    // degrees of freedom of the inverse-Wishart prior
  real lwr;              // lower bound for the grand mean
  real upr;              // upper bound for the grand mean
}

parameters {
  real<lower=lwr, upper=upr> mu; // grand mean
  cov_matrix[C] Sigma;           // error covariance matrix
  real<lower=0> gt;              // variance of the reduced treatment effect
  vector[C-1] tf;                // reduced treatment effect
}

transformed parameters {
  vector[C] t;           // treatment effect
  t = P * tf;
}

model {
  target += uniform_lpdf(mu | lwr, upr);
  target += inv_wishart_lpdf(Sigma | nu, V);
  target += scaled_inv_chi_square_lpdf(gt | 1, ht);
  target += normal_lpdf(tf | 0, sqrt(gt));
  for (k in 1:N) {
    target += multi_normal_lpdf(Y[k] | mu + t, Sigma);
  }
}"

stanmodelM1 <- stan_model(model_code=stancodeM1)
datalistM1 <- list("N"=nrow(dat), "C"=ncol(dat),
                   "Y"=dat/sd(dat), #scale the data
                   "P"=computeQ(ncol(dat)), "ht"=.5,
                   "V"=diag(ncol(dat)), "nu"=ncol(dat))
datalistM1$lwr <- mean(datalistM1[["Y"]])-6*sd(datalistM1[["Y"]]) #sd(Y)=1 after scaling, so this is mean(Y)-6
datalistM1$upr <- mean(datalistM1[["Y"]])+6*sd(datalistM1[["Y"]])
stanfitM1 <- sampling(stanmodelM1, data=datalistM1,
                      iter=50000, warmup=10000, chains=4, seed=277, refresh=0)

rstan::summary(stanfitM1)[[1]] %>% kable(digits=4) %>% kable_classic(full_width=F)
               mean se_mean     sd     2.5%      25%      50%      75%    97.5%      n_eff   Rhat
mu           2.1556  0.0010 0.3392   1.4795   1.9424   2.1554   2.3696   2.8294  124133.64 1.0001
Sigma[1,1]   1.2071  0.0033 0.6838   0.4755   0.7726   1.0367   1.4352   2.9667   43957.39 1.0001
Sigma[1,2]   1.1183  0.0033 0.6691   0.3970   0.6940   0.9538   1.3423   2.8355   41644.26 1.0001
Sigma[1,3]   1.0995  0.0033 0.6563   0.3891   0.6817   0.9362   1.3205   2.7896   40607.91 1.0000
Sigma[2,1]   1.1183  0.0033 0.6691   0.3970   0.6940   0.9538   1.3423   2.8355   41644.26 1.0001
Sigma[2,2]   1.3125  0.0035 0.7414   0.5163   0.8420   1.1297   1.5592   3.2194   44336.54 1.0001
Sigma[2,3]   1.1400  0.0034 0.6812   0.4014   0.7061   0.9718   1.3688   2.8906   40970.54 1.0001
Sigma[3,1]   1.0995  0.0033 0.6563   0.3891   0.6817   0.9362   1.3205   2.7896   40607.91 1.0000
Sigma[3,2]   1.1400  0.0034 0.6812   0.4014   0.7061   0.9718   1.3688   2.8906   40970.54 1.0001
Sigma[3,3]   1.2695  0.0035 0.7149   0.4986   0.8130   1.0906   1.5090   3.1085   42421.92 1.0001
gt           0.3779  0.0046 1.2566   0.0397   0.0947   0.1672   0.3329   1.8549   75394.60 1.0000
tf[1]       -0.3293  0.0003 0.1126  -0.5474  -0.4018  -0.3313  -0.2590  -0.0994  122646.75 1.0000
tf[2]       -0.1304  0.0003 0.1161  -0.3589  -0.2048  -0.1311  -0.0573   0.1020  133501.98 1.0000
t[1]        -0.2689  0.0003 0.0920  -0.4470  -0.3281  -0.2705  -0.2114  -0.0812  122646.75 1.0000
t[2]         0.0422  0.0003 0.0936  -0.1447  -0.0171   0.0422   0.1019   0.2280  133492.43 1.0000
t[3]         0.2266  0.0003 0.0946   0.0345   0.1673   0.2279   0.2873   0.4105  128015.65 1.0001
lp__       -30.1349  0.0126 2.6863 -36.4063 -31.6730 -29.7439 -28.1765 -26.0760   45239.40 1.0001
mean(dat/sd(dat))
## [1] 2.162076
cov(dat/sd(dat))
##           Level1    Level2    Level3
## Level1 0.9674355 0.9994698 0.9834526
## Level2 0.9994698 1.0635384 1.0186903
## Level3 0.9834526 1.0186903 1.0238158

   

2B. Null Stan Model

 

\[\begin{align*} \tag{a} \mathcal{M}_1:\ Y_{ij}&=\mu+t_i+\epsilon_{ij}\\ \text{versus}\quad\mathcal{M}_0:\ Y_{ij}&=\mu+\epsilon_{ij}\\ (\epsilon_{1j},\dotsb,\epsilon_{aj})^\top&\overset{\text{i.i.d.}}{\sim}\mathbf{\mathcal{N}}_a(\mathbf{0},\mathbf{\Sigma})\qquad\text{for } i=1,\dotsb,a;\ j=1,\dotsb,n. \end{align*}\]

\[\begin{equation} \tag{b} \pi(\mu)\propto 1 \end{equation}\]

\[\begin{equation} \tag{c} \mathbf{\Sigma}\sim\mathcal{W}^{-1}(\mathbf{\Psi},\ \nu) \end{equation}\]

\[\begin{equation} \tag{3} t_i^\star\mid g\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,g) \end{equation}\]

\[\begin{equation} \tag{4} g\sim\text{Scale-inv-}\chi^2(1,h^2) \end{equation}\]

\[\begin{equation} \tag{5} (t_1^\star,\dotsb,t_{a-1}^\star)=(t_1,\dotsb,t_a)\cdot\mathbf{Q} \end{equation}\]

\[\begin{equation} \tag{6} \mathbf{I}_a-\frac{1}{a}\mathbf{J}_a=\mathbf{Q}\cdot\mathbf{Q}^\top \end{equation}\]

\(\mathbf{Q}\) is an \(a\times(a-1)\) matrix of the \(a-1\) eigenvectors of unit length corresponding to the nonzero eigenvalues of the left-hand side of (6). For example, the projected main effect \(t^\star=(t_1-t_2)/\sqrt{2}\) when \(a=2\). In the other direction (given \(t_1+t_2=0\)), \(t_1=t^\star/\sqrt{2}\) and \(t_2=-t^\star/\sqrt{2}\).

 

Specify the null model \(\mathcal{M}_0\) in (a). Compile and fit the model in Stan.

\[\begin{equation} \tag{c.1} \mathbf{\Sigma}\sim\mathcal{W}^{-1}(\mathbf{I}_{a},\ a) \end{equation}\]

stancodeM0 <- "
data {
  int<lower=1> N;        // number of subjects
  int<lower=2> C;        // number of conditions
  // vector[C] Y[N];        // responses [removed features]
  array[N] vector[C] Y;  // responses [Stan 2.26+ syntax for array declarations]
  matrix[C,C] V;         // scale matrix of the inverse-Wishart prior
  real<lower=C-1> nu;    // degrees of freedom of the inverse-Wishart prior
  real lwr;              // lower bound for the grand mean
  real upr;              // upper bound for the grand mean
}

parameters {
  real<lower=lwr, upper=upr> mu; // grand mean
  cov_matrix[C] Sigma;           // error covariance matrix
}

transformed parameters {
  vector[C] muS;         // grand mean vector
  for (i in 1:C) {
    muS[i] = mu;
  }
}

model {
  target += uniform_lpdf(mu | lwr, upr);
  target += inv_wishart_lpdf(Sigma | nu, V);
  for (k in 1:N) {
    target += multi_normal_lpdf(Y[k] | muS, Sigma);
  }
}"

stanmodelM0 <- stan_model(model_code=stancodeM0)
stanfitM0 <- sampling(stanmodelM0, data=datalistM1[-c(4,5)], #drop list elements 4-5 (P and ht), unused by M0
                      iter=50000, warmup=10000, chains=4, seed=277, refresh=0)

rstan::summary(stanfitM0)[[1]] %>% kable(digits=4) %>% kable_classic(full_width=F)
               mean se_mean     sd     2.5%      25%      50%      75%    97.5%     n_eff   Rhat
mu           2.0916  0.0021 0.4708   1.1467   1.7947   2.0932   2.3898   3.0293  51976.73 1.0000
Sigma[1,1]   1.3772  0.0047 0.8848   0.4950   0.8304   1.1420   1.6348   3.6719  36053.65 1.0001
Sigma[1,2]   1.2123  0.0046 0.8140   0.3835   0.7096   1.0013   1.4535   3.3372  31123.22 1.0001
Sigma[1,3]   1.1462  0.0046 0.8019   0.3226   0.6526   0.9411   1.3826   3.2372  30012.84 1.0001
Sigma[2,1]   1.2123  0.0046 0.8140   0.3835   0.7096   1.0013   1.4535   3.3372  31123.22 1.0001
Sigma[2,2]   1.4289  0.0050 0.8828   0.5298   0.8822   1.1992   1.6908   3.7435  30958.54 1.0001
Sigma[2,3]   1.2999  0.0052 0.8846   0.4191   0.7578   1.0672   1.5535   3.6134  29040.74 1.0001
Sigma[3,1]   1.1462  0.0046 0.8019   0.3226   0.6526   0.9411   1.3826   3.2372  30012.84 1.0001
Sigma[3,2]   1.2999  0.0052 0.8846   0.4191   0.7578   1.0672   1.5535   3.6134  29040.74 1.0001
Sigma[3,3]   1.4887  0.0056 0.9770   0.5278   0.8920   1.2298   1.7659   4.0336  30039.28 1.0001
muS[1]       2.0916  0.0021 0.4708   1.1467   1.7947   2.0932   2.3898   3.0293  51976.73 1.0000
muS[2]       2.0916  0.0021 0.4708   1.1467   1.7947   2.0932   2.3898   3.0293  51976.73 1.0000
muS[3]       2.0916  0.0021 0.4708   1.1467   1.7947   2.0932   2.3898   3.0293  51976.73 1.0000
lp__       -32.4984  0.0106 2.1629 -37.7277 -33.6743 -32.1213 -30.9072 -29.4231  41555.65 1.0001
mean(dat/sd(dat))
## [1] 2.162076
cov(dat/sd(dat))
##           Level1    Level2    Level3
## Level1 0.9674355 0.9994698 0.9834526
## Level2 0.9994698 1.0635384 1.0186903
## Level3 0.9834526 1.0186903 1.0238158

   

3. Bayes Factor via Bridge Sampling

 

The newly proposed models in (a): (i) do not include an explicit subject random-effects term, (ii) introduce a more general structure for the error term, (iii) do not standardize the treatment effects, and (iv) scale the data.

 

Compute log marginal likelihoods for full and null models via bridge sampling. Then, \(\textit{BF}_{10}=\frac{p(\text{Data}\ \mid\ \mathcal{M}_1)}{p(\text{Data}\ \mid\ \mathcal{M}_0)}\).

M1 <- bridge_sampler(stanfitM1, silent=T)
summary(M1)
## 
## Bridge sampling log marginal likelihood estimate 
## (method = "normal", repetitions = 1):
## 
##  -33.39117
## 
## Error Measures:
## 
##  Relative Mean-Squared Error: 6.167597e-06
##  Coefficient of Variation: 0.002483465
##  Percentage Error: 0%
## 
## Note:
## All error measures are approximate.
M0 <- bridge_sampler(stanfitM0, silent=T)
summary(M0)
## 
## Bridge sampling log marginal likelihood estimate 
## (method = "normal", repetitions = 1):
## 
##  -34.77583
## 
## Error Measures:
## 
##  Relative Mean-Squared Error: 4.782054e-06
##  Coefficient of Variation: 0.002186791
##  Percentage Error: 0%
## 
## Note:
## All error measures are approximate.
bf(M1, M0)
## Estimated Bayes factor in favor of M1 over M0: 3.99348
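
Equivalently, since each bridge sampling object stores its log marginal likelihood in $logml, the Bayes factor can be computed directly (this identity is used by the BFBS() helper in the all-in-one script below):

exp(M1$logml - M0$logml) #same value as bf(M1, M0)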

 

Do we expect the Bayes factor estimates to vary significantly despite different model specifications?

   

4. \(\nu\) += 1

 

\[\begin{equation} \tag{c.2} \mathbf{\Sigma}\sim\mathcal{W}^{-1}(\mathbf{I}_{a},\ a+1) \end{equation}\]

datalistM1$nu <- ncol(dat)+1

stanfitM1 <- sampling(stanmodelM1, data=datalistM1,
                      iter=50000, warmup=10000, chains=4, seed=277, refresh=0)

rstan::summary(stanfitM1)[[1]] %>% kable(digits=4) %>% kable_classic(full_width=F)
               mean se_mean     sd     2.5%      25%      50%      75%    97.5%      n_eff Rhat
mu           2.1581  0.0009 0.3216   1.5171   1.9560   2.1580   2.3605   2.7978  126136.07    1
Sigma[1,1]   1.0745  0.0028 0.5779   0.4423   0.7069   0.9366   1.2729   2.5101   41829.80    1
Sigma[1,2]   0.9953  0.0029 0.5679   0.3724   0.6351   0.8616   1.1906   2.4070   38494.72    1
Sigma[1,3]   0.9795  0.0028 0.5597   0.3644   0.6254   0.8467   1.1701   2.3695   38638.07    1
Sigma[2,1]   0.9953  0.0029 0.5679   0.3724   0.6351   0.8616   1.1906   2.4070   38494.72    1
Sigma[2,2]   1.1689  0.0031 0.6293   0.4796   0.7697   1.0195   1.3851   2.7313   40341.96    1
Sigma[2,3]   1.0157  0.0030 0.5829   0.3765   0.6477   0.8786   1.2153   2.4583   37734.75    1
Sigma[3,1]   0.9795  0.0028 0.5597   0.3644   0.6254   0.8467   1.1701   2.3695   38638.07    1
Sigma[3,2]   1.0157  0.0030 0.5829   0.3765   0.6477   0.8786   1.2153   2.4583   37734.75    1
Sigma[3,3]   1.1317  0.0030 0.6109   0.4670   0.7448   0.9861   1.3400   2.6574   40218.21    1
gt           0.4057  0.0125 3.4701   0.0402   0.0951   0.1673   0.3293   1.8736   76800.65    1
tf[1]       -0.3319  0.0003 0.1078  -0.5407  -0.4015  -0.3339  -0.2642  -0.1123  126415.15    1
tf[2]       -0.1316  0.0003 0.1107  -0.3505  -0.2029  -0.1322  -0.0610   0.0905  136349.99    1
t[1]        -0.2710  0.0002 0.0880  -0.4415  -0.3278  -0.2726  -0.2157  -0.0917  126415.15    1
t[2]         0.0425  0.0002 0.0896  -0.1350  -0.0147   0.0425   0.0998   0.2208  135296.73    1
t[3]         0.2285  0.0002 0.0900   0.0459   0.1717   0.2298   0.2866   0.4038  132586.92    1
lp__       -28.3448  0.0125 2.6646 -34.5977 -29.8547 -27.9541 -26.4085 -24.3125   45432.12    1
M1 <- bridge_sampler(stanfitM1, silent=T)
summary(M1)
## 
## Bridge sampling log marginal likelihood estimate 
## (method = "normal", repetitions = 1):
## 
##  -32.14397
## 
## Error Measures:
## 
##  Relative Mean-Squared Error: 5.658335e-06
##  Coefficient of Variation: 0.002378725
##  Percentage Error: 0%
## 
## Note:
## All error measures are approximate.
stanfitM0 <- sampling(stanmodelM0, data=datalistM1[-c(4,5)],
                      iter=50000, warmup=10000, chains=4, seed=277, refresh=0)
## Warning: There were 1 divergent transitions after warmup. See
## https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: Examine the pairs() plot to diagnose sampling problems
rstan::summary(stanfitM0)[[1]] %>% kable(digits=4) %>% kable_classic(full_width=F)
               mean se_mean     sd     2.5%      25%      50%      75%    97.5%     n_eff   Rhat
mu           2.0921  0.0019 0.4451   1.2085   1.8080   2.0906   2.3757   2.9803  57175.01 1.0001
Sigma[1,1]   1.2118  0.0036 0.7212   0.4635   0.7602   1.0254   1.4411   3.0603  41131.14 1.0001
Sigma[1,2]   1.0645  0.0035 0.6601   0.3628   0.6487   0.8986   1.2789   2.7620  35523.06 1.0001
Sigma[1,3]   1.0052  0.0035 0.6508   0.3078   0.5974   0.8430   1.2166   2.6703  34258.82 1.0001
Sigma[2,1]   1.0645  0.0035 0.6601   0.3628   0.6487   0.8986   1.2789   2.7620  35523.06 1.0001
Sigma[2,2]   1.2606  0.0038 0.7169   0.4970   0.8060   1.0807   1.4938   3.1079  34903.33 1.0001
Sigma[2,3]   1.1447  0.0040 0.7190   0.3946   0.6946   0.9613   1.3684   3.0026  32854.21 1.0001
Sigma[3,1]   1.0052  0.0035 0.6508   0.3078   0.5974   0.8430   1.2166   2.6703  34258.82 1.0001
Sigma[3,2]   1.1447  0.0040 0.7190   0.3946   0.6946   0.9613   1.3684   3.0026  32854.21 1.0001
Sigma[3,3]   1.3147  0.0043 0.7955   0.4943   0.8168   1.1102   1.5576   3.3674  34106.94 1.0001
muS[1]       2.0921  0.0019 0.4451   1.2085   1.8080   2.0906   2.3757   2.9803  57175.01 1.0001
muS[2]       2.0921  0.0019 0.4451   1.2085   1.8080   2.0906   2.3757   2.9803  57175.01 1.0001
muS[3]       2.0921  0.0019 0.4451   1.2085   1.8080   2.0906   2.3757   2.9803  57175.01 1.0001
lp__       -31.1359  0.0101 2.1281 -36.2739 -32.3107 -30.7570 -29.5698 -28.1179  44334.45 1.0000
M0 <- bridge_sampler(stanfitM0, silent=T)
summary(M0)
## 
## Bridge sampling log marginal likelihood estimate 
## (method = "normal", repetitions = 1):
## 
##  -33.86874
## 
## Error Measures:
## 
##  Relative Mean-Squared Error: 4.222428e-06
##  Coefficient of Variation: 0.002054855
##  Percentage Error: 0%
## 
## Note:
## All error measures are approximate.
bf(M1, M0) #Bayes factor
## Estimated Bayes factor in favor of M1 over M0: 5.61123

 

Do we expect the Bayes factor estimates to vary significantly despite different model specifications?

   

5. Prior and Posterior Means

 

\[\begin{equation} \tag{c.3} \mathbf{\Sigma}\sim\mathcal{W}^{-1}\left((\nu_0-a-1)\cdot\mathbf{S},\ \nu_0\right) \end{equation}\]

Center the prior on the sample covariance matrix \(\mathbf{S}\): with scale matrix \(\mathbf{\Psi}=(\nu_0-a-1)\cdot\mathbf{S}\), the prior mean of the inverse-Wishart, \(\mathbf{\Psi}/(\nu_0-a-1)\), equals \(\mathbf{S}\), and the posterior mean is likewise pulled toward the sample covariance matrix. Set \(\nu_0=a+2\), the smallest integer degrees of freedom for which the prior mean exists.
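
A minimal Monte Carlo check of the prior-mean identity \(\mathbb{E}[\mathbf{\Sigma}]=\mathbf{\Psi}/(\nu_0-a-1)\) (an added sketch using base R's rWishart; it uses a larger \(\nu_0\) than \(a+2\) so that the elementwise variances are finite and the average stabilizes):

a <- ncol(dat); S <- cov(dat/sd(dat))
nu0 <- a + 20
Psi <- (nu0 - a - 1) * S
#if W ~ Wishart(nu0, solve(Psi)), then solve(W) ~ Inv-Wishart(Psi, nu0)
W <- rWishart(5e4, nu0, solve(Psi))
round(matrix(rowMeans(apply(W, 3, solve)), a, a) - S, 2) #approximately the zero matrix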

datalistM1$nu <- ncol(dat)+2
datalistM1$V <- cov(datalistM1$Y)

#Increase MCMC iterations
stanfitM1 <- sampling(stanmodelM1, data=datalistM1,
                      iter=100000, warmup=50000, chains=4, seed=277, refresh=0)

rstan::summary(stanfitM1)[[1]] %>% kable(digits=4) %>% kable_classic(full_width=F)
               mean se_mean     sd     2.5%      25%      50%      75%    97.5%      n_eff Rhat
mu           2.1554  0.0009 0.3172   1.5260   1.9530   2.1560   2.3573   2.7843  132500.25    1
Sigma[1,1]   0.9660  0.0024 0.4832   0.4132   0.6507   0.8512   1.1450   2.1848   41076.62    1
Sigma[1,2]   0.9981  0.0025 0.5026   0.4233   0.6695   0.8788   1.1845   2.2688   40854.22    1
Sigma[1,3]   0.9821  0.0024 0.4938   0.4171   0.6594   0.8647   1.1654   2.2311   40921.06    1
Sigma[2,1]   0.9981  0.0025 0.5026   0.4233   0.6695   0.8788   1.1845   2.2688   40854.22    1
Sigma[2,2]   1.0622  0.0026 0.5311   0.4551   0.7150   0.9369   1.2594   2.3986   41391.03    1
Sigma[2,3]   1.0174  0.0025 0.5145   0.4284   0.6809   0.8958   1.2095   2.3148   40757.09    1
Sigma[3,1]   0.9821  0.0024 0.4938   0.4171   0.6594   0.8647   1.1654   2.2311   40921.06    1
Sigma[3,2]   1.0174  0.0025 0.5145   0.4284   0.6809   0.8958   1.2095   2.3148   40757.09    1
Sigma[3,3]   1.0224  0.0025 0.5115   0.4386   0.6883   0.9013   1.2127   2.3179   41386.71    1
gt           0.4049  0.0202 5.6790   0.0428   0.0972   0.1701   0.3304   1.8563   78704.34    1
tf[1]       -0.3578  0.0001 0.0322  -0.4213  -0.3784  -0.3580  -0.3376  -0.2929  127203.98    1
tf[2]       -0.1419  0.0001 0.0493  -0.2396  -0.1736  -0.1421  -0.1106  -0.0430  137760.47    1
t[1]        -0.2921  0.0001 0.0263  -0.3440  -0.3089  -0.2923  -0.2756  -0.2392  127203.98    1
t[2]         0.0457  0.0001 0.0389  -0.0318   0.0209   0.0457   0.0705   0.1235  125636.31    1
t[3]         0.2464  0.0001 0.0356   0.1750   0.2238   0.2466   0.2692   0.3163  151258.06    1
lp__       -12.7991  0.0111 2.6414 -18.9579 -14.3152 -12.4200 -10.8702  -8.7938   56123.04    1
M1 <- bridge_sampler(stanfitM1, silent=T)
summary(M1)
## 
## Bridge sampling log marginal likelihood estimate 
## (method = "normal", repetitions = 1):
## 
##  -22.00005
## 
## Error Measures:
## 
##  Relative Mean-Squared Error: 6.926012e-06
##  Coefficient of Variation: 0.002631732
##  Percentage Error: 0%
## 
## Note:
## All error measures are approximate.
stanfitM0 <- sampling(stanmodelM0, data=datalistM1[-c(4,5)],
                      iter=100000, warmup=50000, chains=4, seed=277, refresh=0)
## Warning: There were 597 divergent transitions after warmup. See
## https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: Examine the pairs() plot to diagnose sampling problems
rstan::summary(stanfitM0)[[1]] %>% kable(digits=4) %>% kable_classic(full_width=F)
               mean se_mean     sd     2.5%      25%      50%      75%    97.5%     n_eff   Rhat
mu           1.2708  0.0101 1.0836  -0.8580   0.5610   1.2634   1.9662   3.4554  11426.67 1.0005
Sigma[1,1]   2.2607  0.0207 2.3525   0.4940   0.9240   1.4711   2.6465   8.8663  12938.85 1.0005
Sigma[1,2]   2.4739  0.0231 2.6002   0.4917   0.9678   1.5928   2.9433   9.7780  12643.87 1.0005
Sigma[1,3]   2.5700  0.0247 2.7581   0.4532   0.9522   1.6301   3.0989  10.3053  12479.94 1.0005
Sigma[2,1]   2.4739  0.0231 2.6002   0.4917   0.9678   1.5928   2.9433   9.7780  12643.87 1.0005
Sigma[2,2]   2.8208  0.0258 2.8994   0.5648   1.1041   1.8394   3.4041  10.9745  12627.28 1.0005
Sigma[2,3]   2.9535  0.0275 3.0785   0.5387   1.1056   1.9110   3.6132  11.5865  12546.63 1.0005
Sigma[3,1]   2.5700  0.0247 2.7581   0.4532   0.9522   1.6301   3.0989  10.3053  12479.94 1.0005
Sigma[3,2]   2.9535  0.0275 3.0785   0.5387   1.1056   1.9110   3.6132  11.5865  12546.63 1.0005
Sigma[3,3]   3.1694  0.0293 3.2757   0.5659   1.1726   2.0648   3.9136  12.3552  12532.56 1.0005
muS[1]       1.2708  0.0101 1.0836  -0.8580   0.5610   1.2634   1.9662   3.4554  11426.67 1.0005
muS[2]       1.2708  0.0101 1.0836  -0.8580   0.5610   1.2634   1.9662   3.4554  11426.67 1.0005
muS[3]       1.2708  0.0101 1.0836  -0.8580   0.5610   1.2634   1.9662   3.4554  11426.67 1.0005
lp__       -31.0991  0.0100 2.0858 -36.1437 -32.2181 -30.7279 -29.5740 -28.1581  43122.11 1.0001
M0 <- bridge_sampler(stanfitM0, silent=T)
summary(M0)
## 
## Bridge sampling log marginal likelihood estimate 
## (method = "normal", repetitions = 1):
## 
##  -35.18453
## 
## Error Measures:
## 
##  Relative Mean-Squared Error: 2.31987e-05
##  Coefficient of Variation: 0.004816503
##  Percentage Error: 0%
## 
## Note:
## All error measures are approximate.
bf(M1, M0) #Bayes factor
## Estimated Bayes factor in favor of M1 over M0: 532041.14440

 

Do we expect the Bayes factor estimates to vary significantly despite different model specifications?

   

All-in-One R Script

 

library(rstan)
# devtools::install_github("quentingronau/bridgesampling@master")
library(bridgesampling)
library(BayesFactor)

run.this <- TRUE #if TRUE, test the hypothetical data; if FALSE, test the simulated data
if (run.this) {
  dat <- matrix(c(10,6,11,22,16,15,1,12,9,8,
                  13,8,14,23,18,17,1,15,12,9,
                  13,8,14,25,20,17,4,17,12,12), ncol=3,
                dimnames=list(paste0("s",1:10), paste0("Level",1:3))) #wide data format

  df <- data.frame("ID"=factor(rep(paste0("s",1:10), 3)),
                   "Level"=factor(rep(paste0("Level",1:3), each=10)),
                   "Response"=c(dat)) #long data format
} else {
  mu.b <- 100; sigma.b <- 20; n <- 24
  delta.mu <- 6.7; sigma.e <- 6.67 #rho=.9, power=.8
  set.seed(277)
  TrueScore <- rnorm(n, mu.b, sigma.b)
  eij1 <- rnorm(n, 0, sigma.e)
  eij2 <- rnorm(n, 0, sigma.e)

  dat <- cbind("Level1" = TrueScore + eij1 + delta.mu,
               "Level2" = TrueScore + eij2)

  df <- data.frame("ID" = factor(rep(paste0("s",1:n), 2)),
                   "Level" = factor(rep(c("Level1", "Level2"), each = n)),
                   "Response" = as.vector(dat))
}

set.seed(277)
(JZS_BF <- anovaBF(Response~Level+ID, data=df, whichRandom="ID", progress=F))
# set.seed(277)
# draws <- posterior(JZS_BF, iterations=1e5, progress=F)
# summary(draws)
# plot(draws, cex.lab=1.5, cex.axis=1.5, cex.main=1.5, cex.sub=1.5)


computeQ <- function(C) {
  #' reduced parametrization for the fixed treatment effects
  S <- diag(C) - matrix(1, nrow = C, ncol = C) / C
  e <- qr(S)
  Q <- qr.Q(e) %*% diag(sign(diag(qr.R(e))))
  Q[, which(abs(e$qraux) > 1e-3), drop = FALSE]
}

{
  stancodeM1 <- "
  data {
    int<lower=1> N;        // number of subjects
    int<lower=2> C;        // number of conditions
    // vector[C] Y[N];        // responses [removed features]
    array[N] vector[C] Y;  // responses [Stan 2.26+ syntax for array declarations]
    matrix[C,C-1] P;       // projecting C-1 fixed effects into C
    real<lower=0> ht;      // scale parameter of the variance of the reduced treatment effect
    matrix[C,C] V;         // scale matrix of the inverse-Wishart prior
    real<lower=C-1> nu;    // degrees of freedom of the inverse-Wishart prior
    real lwr;              // lower bound for the grand mean
    real upr;              // upper bound for the grand mean
  }

  parameters {
    real<lower=lwr, upper=upr> mu; // grand mean
    cov_matrix[C] Sigma;           // error covariance matrix
    real<lower=0> gt;              // variance of the reduced treatment effect
    vector[C-1] tf;                // reduced treatment effect
  }

  transformed parameters {
    vector[C] t;           // treatment effect
    t = P * tf;
  }

  model {
    target += uniform_lpdf(mu | lwr, upr);
    target += inv_wishart_lpdf(Sigma | nu, V);
    target += scaled_inv_chi_square_lpdf(gt | 1, ht);
    target += normal_lpdf(tf | 0, sqrt(gt));
    for (k in 1:N) {
      target += multi_normal_lpdf(Y[k] | mu + t, Sigma);
    }
  }"

  stanmodelM1 <- stan_model(model_code=stancodeM1)

  stancodeM0 <- "
  data {
    int<lower=1> N;        // number of subjects
    int<lower=2> C;        // number of conditions
    // vector[C] Y[N];        // responses [removed features]
    array[N] vector[C] Y;  // responses [Stan 2.26+ syntax for array declarations]
    matrix[C,C] V;         // scale matrix of the inverse-Wishart prior
    real<lower=C-1> nu;    // degrees of freedom of the inverse-Wishart prior
    real lwr;              // lower bound for the grand mean
    real upr;              // upper bound for the grand mean
  }

  parameters {
    real<lower=lwr, upper=upr> mu; // grand mean
    cov_matrix[C] Sigma;           // error covariance matrix
  }

  transformed parameters {
    vector[C] muS;         // grand mean vector
    for (i in 1:C) {
      muS[i] = mu;
    }
  }

  model {
    target += uniform_lpdf(mu | lwr, upr);
    target += inv_wishart_lpdf(Sigma | nu, V);
    for (k in 1:N) {
      target += multi_normal_lpdf(Y[k] | muS, Sigma);
    }
  }"

  stanmodelM0 <- stan_model(model_code=stancodeM0)
} #Stan models

datalistM1 <- list("N"=nrow(dat), "C"=ncol(dat),
                   "Y"=dat/sd(dat), #scale the data
                   "P"=computeQ(ncol(dat)), "ht"=.5,
                   "V"=diag(ncol(dat)), "nu"=ncol(dat)) # nu += 1
datalistM1$lwr <- mean(datalistM1[["Y"]])-6*sd(datalistM1[["Y"]])
datalistM1$upr <- mean(datalistM1[["Y"]])+6*sd(datalistM1[["Y"]])


BFBS <- function(seed=277, diagnostics=F) {
  stanfitM0 <- sampling(stanmodelM0, data=datalistM1[-c(4,5)],
                        iter=50000, warmup=10000, chains=4, seed=seed, refresh=0)

  stanfitM1 <- sampling(stanmodelM1, data=datalistM1,
                        iter=50000, warmup=10000, chains=4, seed=seed, refresh=0)
  
  set.seed(seed)
  M0 <- bridge_sampler(stanfitM0, silent=T)
  
  set.seed(seed)
  M1 <- bridge_sampler(stanfitM1, silent=T)
  
  if (diagnostics) {
    list("bf"=exp(M1$logml - M0$logml),
         "pererr.M1"=error_measures(M1)$cv,
         "pererr.M0"=error_measures(M0)$cv,
         "stan.M1"=stanfitM1, "stan.M0"=stanfitM0)
  } else {
    list("bf"=exp(M1$logml - M0$logml),
         "pererr.M1"=error_measures(M1)$cv,
         "pererr.M0"=error_measures(M0)$cv)
  }
}

BFBS()
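
# To gauge Monte Carlo variability of the Bayes factor estimate across seeds
# (an added sketch; each call refits both models, so it is slow):
# sapply(c(1, 42, 277), function(s) BFBS(seed=s)$bf)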


# MCMC <- BFBS(diagnostics=T)
# rstan::summary(MCMC$stan.M1)[[1]]
# rstan::summary(MCMC$stan.M0)[[1]]