=================================================================================================== PREVIOUS: https://rpubs.com/sherloconan/1285505
THIS: https://rpubs.com/sherloconan/1289178
\[\begin{align} \tag{1} y_{ijkr}=&\ \overbrace{\mu+\alpha_i+\beta_j+(\alpha\beta)_{ij}}^{\text{fixed effects}}+\overbrace{s_k+(\alpha s)_{ik}+(\beta s)_{jk}+(\alpha\beta s)_{ijk}}^{\text{random effects}}+\epsilon_{ijkr}, \\ \\ &\text{for}\ \ i\in\{1,2\};\ \ j\in\{1,2\};\ \ k=1,\dotsb,n;\ \ r=1,\dotsb,R, \end{align}\]
where \(\alpha_1+\alpha_2=0\), \(\quad\beta_1+\beta_2=0\), \(\quad(\alpha\beta)_{1j}+(\alpha\beta)_{2j}=(\alpha\beta)_{i1}+(\alpha\beta)_{i2}=0\), \(\quad s_{k}\overset{\mathrm{i.i.d.}}{\sim}\mathcal{N}(0,\ \sigma_{s}^2)\), \(\quad\epsilon_{ijkr}\overset{\mathrm{i.i.d.}}{\sim}\mathcal{N}(0,\ \sigma_{\epsilon}^2)\),
\(\hspace{1.4em} -(\alpha s)_{1k}=(\alpha s)_{2k}\sim\mathcal{N}(0,\ \sigma_{\alpha s}^2)\), \(\quad-(\beta s)_{1k}=(\beta s)_{2k}\sim\mathcal{N}(0,\ \sigma_{\beta s}^2)\),
and \((\alpha\beta s)_{11k}=-(\alpha\beta s)_{21k}=-(\alpha\beta s)_{12k}=(\alpha\beta s)_{22k}\sim\mathcal{N}(0,\ \sigma_{\alpha\beta s}^2)\).
We aim to simulate 2×2 repeated-measures data using a generating model that satisfies the following considerations:
(1) There are no fixed interaction effects between factors \(A\) and \(B\), i.e., \((\alpha\beta)_{ij}=0\).
c(-10.5, 4.5, -4.5, 10.5) # a1b1, a2b1, a1b2, a2b2 (A=15, B=6)(2) We need not worry about effect size standardization, as \(\sigma_{\epsilon}^2=1\), and all variance components are homogeneous.
(3) Each participant can receive a single (\(R=1\)) or balanced multiple trials (\(R>1\)) for each of the four conditions.
(4) Random intercepts, \(s_k\), are always included.
(5a) Random slopes are included, or (5b) Random slopes are excluded (\(\sigma_{\alpha s}^2=0\), \(\sigma_{\beta s}^2=0\), and \(\sigma_{\alpha\beta s}^2=0\)) in simulations.
rnorm(N, 1, 0) # constant ones (fixed slope)
Both set.seed(123); rnorm(3, 100, 10) and
set.seed(123); 100 + rnorm(3, 0, 10) generate identical
samples.
We have already implemented (1) to (5a) in Eqs. 2-7 (contrast-based parameterization) to investigate the distributional assumptions (by changing the distributions for Eqs. 4-7) within the 2×2 repeated-measures design. The same code can be easily extended to (5b) with minimal modifications.
\[\begin{align} y_{ijkr}&=\mu_k+b_{1k}\cdot C_{i}+b_{2k}\cdot C_{j}+b_{12k}\cdot C_{i}\cdot C_{j}+\epsilon_{ijkr} \tag{2} \\ &\quad -C_1=C_2=C\equiv.5 \tag{3} \\ \mu_k&\sim\mathcal{N}(\mu,\ \sigma_0^2) \tag{4} \\ b_{1k}&\sim\mathcal{N}(B_1,\ \sigma_1^2) \tag{5} \\ b_{2k}&\sim\mathcal{N}(B_2,\ \sigma_2^2) \tag{6} \\ b_{12k}&\sim\mathcal{N}(0,\ \sigma_{12}^2) \tag{$*$} \\ \epsilon_{ijkr}&\overset{\mathrm{i.i.d.}}{\sim}\mathcal{N}(0,\ 1) \tag{7} \end{align}\]
simData <- function(N, n=1, B1=0, sd0=0.5, isRIO=F, sd1=0.5, sd2=0.5, sd12=0) {
#' Input -
#' N: number of participants
#' n: number of (balanced) trials
#' B1: fixed effect size for factor A of interest
#' sd0: standard deviation for random intercepts
#' isRIO: if TRUE, random slopes have sd suppress to 0 (i.e. random intercepts only)
#' sd1: standard deviation for random slopes for (contrasted) factor A
#' sd2: standard deviation for random slopes for (contrasted) factor B
#' sd12: standard deviation for random slopes for the (contrasted) interaction AB
#'
#' Output - a long format data frame for a 2x2 repeated-measures design
C1 <- C2 <- c(-.5, .5) # contrast vectors for factors A and B, respectively (sum coding)
B2 <- 0.5 # fixed effect size for factor B
s <- rnorm(N, 0, sd0) # random intercepts for each participant
# fixed or random slopes for factor A (this is the SECOND factor in our paper)
b1 <- rnorm(N, B1, ifelse(isRIO, 0, sd1))
# fixed or random slopes for factor B (this is the FIRST factor in our paper)
b2 <- rnorm(N, B2, ifelse(isRIO, 0, sd2))
# fixed or random slopes for the interaction AB
b12 <- rnorm(N, 0, ifelse(isRIO, 0, sd12))
# a matrix of condition contrasts for the 2x2 design
conds <- expand.grid("A"=C1, "B"=C2)
conds$AB <- conds$A * conds$B
# each row of `M` represents one participant's set of four condition means
M <- s + cbind(b1, b2, b12) %*% t(conds) # a1b1, a2b1, a1b2, a2b2 (N×4 matrix)
DV <- rnorm(N*4*n, c(M), 1) # adding noise (residual variance = 1)
data.frame("DV" = DV,
"ID" = rep(paste0("s",1:N), 4*n),
"A" = rep(c("a1","a2"), each=N, times=2*n),
"B" = rep(c("b1","b2"), each=N*2, times=n),
"Trial" = rep(paste0("r",1:n), each=N*4),
stringsAsFactors = T) # long format data
}
set.seed(277)
df1 <- simData(N=1000, n=1, B1=0.25, isRIO=F) # simulating with random slopes for A and B
mat1 <- matrix(df1$DV, ncol=4,
dimnames=list(unique(df1$ID), c("a1b1", "a2b1", "a1b2", "a2b2"))) # wide format data
round(cor(mat1),2) # sample correlation matrix (compound symmetry is violated)
## a1b1 a2b1 a1b2 a2b2
## a1b1 1.00 0.16 0.18 0.08
## a2b1 0.16 1.00 0.08 0.14
## a1b2 0.18 0.08 1.00 0.21
## a2b2 0.08 0.14 0.21 1.00
\[\begin{align} \operatorname{Cov}(y_{ijkr},\ y_{lmkr})&=\operatorname{Cov}\big(\mu_k+b_{1k}\cdot C_{i}+b_{2k}\cdot C_{j}+b_{12k}\cdot C_{i}\cdot C_{j}+\epsilon_{ijkr}, \\ &\qquad\qquad\mu_k+b_{1k}\cdot C_{l}+b_{2k}\cdot C_{m}+b_{12k}\cdot C_{l}\cdot C_{m}+\epsilon_{lmkr}\big) \\ &=\operatorname{Var}(\mu_k)+C_{i}C_{l}\cdot\operatorname{Var}(b_{1k})+C_{j}C_{m}\cdot\operatorname{Var}(b_{2k})+C_{i}C_{j}C_{l}C_{m}\cdot\operatorname{Var}(b_{12k})+\operatorname{Cov}(\epsilon_{ijkr},\ \epsilon_{lmkr}) \tag{8} \end{align}\]
The marginal variance-covariance matrix of \(y_{11}\), \(y_{21}\), \(y_{12}\), and \(y_{22}\) is given by Eq. 9.
\[\begin{equation} \tag{9} \tiny \begin{pmatrix} \sigma_0^2+C^2\sigma_1^2+C^2\sigma_2^2+C^4\sigma_{12}^2+\sigma_\epsilon^2 &\sigma_0^2-C^2\sigma_1^2+C^2\sigma_2^2-C^4\sigma_{12}^2 &\sigma_0^2+C^2\sigma_1^2-C^2\sigma_2^2-C^4\sigma_{12}^2 &\sigma_0^2-C^2\sigma_1^2-C^2\sigma_2^2+C^4\sigma_{12}^2 \\ \bullet &\sigma_0^2+C^2\sigma_1^2+C^2\sigma_2^2+C^4\sigma_{12}^2+\sigma_\epsilon^2 &\sigma_0^2-C^2\sigma_1^2-C^2\sigma_2^2+C^4\sigma_{12}^2 &\sigma_0^2+C^2\sigma_1^2-C^2\sigma_2^2-C^4\sigma_{12}^2 \\ \bullet & \bullet &\sigma_0^2+C^2\sigma_1^2+C^2\sigma_2^2+C^4\sigma_{12}^2+\sigma_\epsilon^2 &\sigma_0^2-C^2\sigma_1^2+C^2\sigma_2^2-C^4\sigma_{12}^2 \\ \bullet & \bullet & \bullet &\sigma_0^2+C^2\sigma_1^2+C^2\sigma_2^2+C^4\sigma_{12}^2+\sigma_\epsilon^2 \end{pmatrix} \end{equation}\]
When \(C=.5\), \(\sigma_0=\sigma_1=\sigma_2=\sigma=0.5\),
\(\sigma_{12}=0\), and \(\sigma_\epsilon=1\),
the correlations
have two values, \(\frac{\sigma^2-2C^2\sigma^2}{\sigma^2+2C^2\sigma^2+\sigma_\epsilon^2}=\frac{1}{11}\approx.09\)
and \(\frac{\sigma^2}{\sigma^2+2C^2\sigma^2+\sigma_\epsilon^2}=\frac{2}{11}\approx.18\).
set.seed(277); R <- 3
df3 <- simData(N=1000, n=R, B1=0.25,
sd12=1, isRIO=F) # simulating with random slopes for A, B, and AB
df3 <- df3[order(df3$A, df3$B),]
mat3 <- matrix(df3$DV, ncol=4,
dimnames=list(rep(unique(df3$ID), R),
c("a1b1", "a1b2", "a2b1", "a2b2"))) # wide format data
round(cor(mat3),2) # sample correlation matrix
## a1b1 a1b2 a2b1 a2b2
## a1b1 1.00 0.10 0.14 0.12
## a1b2 0.10 1.00 0.11 0.12
## a2b1 0.14 0.11 1.00 0.12
## a2b2 0.12 0.12 0.12 1.00
Compound symmetry holds because the variance components of the random slopes for \(A\), \(B\), and \(AB\) are equal (Be sure to consider the contrast weight).
When \(C=.5\), \(\sigma_0=\sigma_1=\sigma_2=\sigma=0.5\),
\(\sigma_{12}=2\sigma=1\), and \(\sigma_\epsilon=1\),
the correlations
is \(\frac{\sigma^2-C^2\sigma^2}{\sigma^2+3C^2\sigma^2+\sigma_\epsilon^2}=\frac{3}{23}\approx.13\).
set.seed(277)
df2 <- simData(N=1000, n=1, B1=0.25, isRIO=T) # simulating without random slopes
mat2 <- matrix(df2$DV, ncol=4,
dimnames=list(unique(df2$ID), c("a1b1", "a2b1", "a1b2", "a2b2"))) # wide format data
round(cor(mat2),2) # sample correlation matrix (compound symmetry holds)
## a1b1 a2b1 a1b2 a2b2
## a1b1 1.00 0.17 0.21 0.18
## a2b1 0.17 1.00 0.20 0.18
## a1b2 0.21 0.20 1.00 0.16
## a2b2 0.18 0.18 0.16 1.00
The correlation is \(\frac{\sigma^2}{\sigma^2+\sigma_\epsilon^2}=\frac{0.5^2}{0.5^2+1}=.2\).
Based on \((\alpha
s)_{ik}\overset{\mathrm{i.i.d.}}{\sim}\mathcal{N}(0,\ \sigma_{\alpha
s}^2)\), \(\quad(\beta
s)_{jk}\overset{\mathrm{i.i.d.}}{\sim}\mathcal{N}(0,\ \sigma_{\beta
s}^2)\), and \((\alpha\beta
s)_{ijk}\overset{\mathrm{i.i.d.}}{\sim}\mathcal{N}(0,\
\sigma_{\alpha\beta s}^2)\quad\) (note “i.i.d.”),
the
marginal variance-covariance matrix for \(y_{11}\), \(y_{21}\), \(y_{12}\), and \(y_{22}\) is incorrectly given by Eq. ⛌.
\[\begin{equation} \tag{⛌} \scriptsize \begin{pmatrix} \sigma_s^2+\sigma_{\alpha s}^2+\sigma_{\beta s}^2+\sigma_{\alpha\beta s}^2+\sigma_\epsilon^2 & \sigma_s^2+\sigma_{\beta s}^2 & \sigma_s^2+\sigma_{\alpha s}^2 & \sigma_s^2 \\ \bullet & \sigma_s^2+\sigma_{\alpha s}^2+\sigma_{\beta s}^2+\sigma_{\alpha\beta s}^2+\sigma_\epsilon^2 & \sigma_s^2 & \sigma_s^2+\sigma_{\alpha s}^2 \\ \bullet & \bullet & \sigma_s^2+\sigma_{\alpha s}^2+\sigma_{\beta s}^2+\sigma_{\alpha\beta s}^2+\sigma_\epsilon^2 & \sigma_s^2+\sigma_{\beta s}^2 \\ \bullet & \bullet & \bullet & \sigma_s^2+\sigma_{\alpha s}^2+\sigma_{\beta s}^2+\sigma_{\alpha\beta s}^2+\sigma_\epsilon^2 \end{pmatrix} \end{equation}\]
Note that covariances cannot be negative in Eq. ⛌. This derivation is incorrect because it does not account for the sum-to-zero constraints.
\(y_{ijkr}=\mu+\alpha_i+\beta_j+(\alpha\beta)_{ij}+s_k+(\alpha s)_{ik}+(\beta s)_{jk}+(\alpha\beta s)_{ijk}+\epsilon_{ijkr}\)
\(y_{ijkr}=\mu_k+b_{1k}\cdot C_{i}+b_{2k}\cdot C_{j}+b_{12k}\cdot C_{i}\cdot C_{j}+\epsilon_{ijkr}\)
\[\begin{align} \alpha_i&=B_1\cdot C_i \tag{11} \\ \beta_j&=B_2\cdot C_j \tag{12} \\ (\alpha s)_{ik}&=(b_{1k}-B_1)\cdot C_i \tag{13} \\ (\beta s)_{jk}&=(b_{2k}-B_2)\cdot C_j \tag{14} \end{align}\]
Verify the effect size for factor \(A\): \(\quad\alpha_2-\alpha_1=B_1\cdot C_2-B_1\cdot C_1=.5B_1-(-.5)B_1=B_1\)
\[\begin{align} \sigma_{s}^2&=\sigma_0^2 \tag{15} \\ \sigma_{\alpha s}^2&=C^2\sigma_1^2 \tag{16} \\ \sigma_{\beta s}^2&=C^2\sigma_2^2 \tag{17} \\ \sigma_{\alpha\beta s}^2&=C^4\sigma_{12}^2 \tag{18} \end{align}\]
\[\begin{equation} \tag{$9^\prime$} \scriptsize \begin{pmatrix} \sigma_s^2+\sigma_{\alpha s}^2+\sigma_{\beta s}^2+\sigma_{\alpha\beta s}^2+\sigma_\epsilon^2 &\sigma_s^2-\sigma_{\alpha s}^2+\sigma_{\beta s}^2-\sigma_{\alpha\beta s}^2 &\sigma_s^2+\sigma_{\alpha s}^2-\sigma_{\beta s}^2-\sigma_{\alpha\beta s}^2 &\sigma_s^2-\sigma_{\alpha s}^2-\sigma_{\beta s}^2+\sigma_{\alpha\beta s}^2 \\ \bullet & \sigma_s^2+\sigma_{\alpha s}^2+\sigma_{\beta s}^2+\sigma_{\alpha\beta s}^2+\sigma_\epsilon^2 &\sigma_s^2-\sigma_{\alpha s}^2-\sigma_{\beta s}^2+\sigma_{\alpha\beta s}^2 &\sigma_s^2+\sigma_{\alpha s}^2-\sigma_{\beta s}^2-\sigma_{\alpha\beta s}^2 \\ \bullet & \bullet &\sigma_s^2+\sigma_{\alpha s}^2+\sigma_{\beta s}^2+\sigma_{\alpha\beta s}^2+\sigma_\epsilon^2 &\sigma_s^2-\sigma_{\alpha s}^2+\sigma_{\beta s}^2-\sigma_{\alpha\beta s}^2 \\ \bullet & \bullet & \bullet &\sigma_s^2+\sigma_{\alpha s}^2+\sigma_{\beta s}^2+\sigma_{\alpha\beta s}^2+\sigma_\epsilon^2 \end{pmatrix} \end{equation}\]
Question 1: Can any covariance term be negative? (Yes)
Question 2: How do we simulate unequal variance terms? (Heteroscedasticity \(\sigma_{\epsilon\ ij}^2\))
Question 3: How do we extend to an \(a\times b\) repeated-measures design?
\[\begin{align} \boldsymbol{\Sigma}_{ab} &= \sigma_\epsilon^2\mathbf{I}_{ab} +\sigma_s^2\mathbf{J}_{ab} +\sigma_{1}^2\mathbf{J}_b\otimes\big(\mathbf{Q}_{\alpha s}\mathbf{Q}_{\alpha s}^\top\big) +\sigma_{2}^2\big(\mathbf{Q}_{\beta s}\mathbf{Q}_{\beta s}^\top\big)\otimes\mathbf{J}_a +\sigma_{12}^2\big(\mathbf{Q}_{\beta s}\mathbf{Q}_{\beta s}^\top\big)\otimes\big(\mathbf{Q}_{\alpha s}\mathbf{Q}_{\alpha s}^\top\big) \\ &= \sigma_\epsilon^2\mathbf{I}_{ab} +\sigma_s^2\mathbf{J}_{ab} +\sigma_{1}^2\mathbf{J}_b\otimes\big(\mathbf{I}_a-a^{-1}\mathbf{J}_a\big) +\sigma_{2}^2\big(\mathbf{I}_b-b^{-1}\mathbf{J}_b\big)\otimes\mathbf{J}_a +\sigma_{12}^2\big(\mathbf{I}_b-b^{-1}\mathbf{J}_b\big)\otimes\big(\mathbf{I}_a-a^{-1}\mathbf{J}_a\big) \end{align}\]