Simulating 2×2 Repeated-Measures Data

=================================================================================================== PREVIOUS: https://rpubs.com/sherloconan/1285505

THIS: https://rpubs.com/sherloconan/1289178

Statistical Model

\[\begin{align} \tag{1} y_{ijkr}=&\ \overbrace{\mu+\alpha_i+\beta_j+(\alpha\beta)_{ij}}^{\text{fixed effects}}+\overbrace{s_k+(\alpha s)_{ik}+(\beta s)_{jk}+(\alpha\beta s)_{ijk}}^{\text{random effects}}+\epsilon_{ijkr}, \\ \\ &\text{for}\ \ i\in\{1,2\};\ \ j\in\{1,2\};\ \ k=1,\dotsb,n;\ \ r=1,\dotsb,R, \end{align}\]

where $\alpha_1+\alpha_2=0$, $\quad\beta_1+\beta_2=0$, $\quad(\alpha\beta)_{1j}+(\alpha\beta)_{2j}=(\alpha\beta)_{i1}+(\alpha\beta)_{i2}=0$, $\quad s_{k}\overset{\mathrm{i.i.d.}}{\sim}\mathcal{N}(0,\ \sigma_{s}^2)$, $\quad\epsilon_{ijkr}\overset{\mathrm{i.i.d.}}{\sim}\mathcal{N}(0,\ \sigma_{\epsilon}^2)$,

$\hspace{1.4em} -(\alpha s)_{1k}=(\alpha s)_{2k}\sim\mathcal{N}(0,\ \sigma_{\alpha s}^2)$, $\quad-(\beta s)_{1k}=(\beta s)_{2k}\sim\mathcal{N}(0,\ \sigma_{\beta s}^2)$,

and $(\alpha\beta s)_{11k}=-(\alpha\beta s)_{21k}=-(\alpha\beta s)_{12k}=(\alpha\beta s)_{22k}\sim\mathcal{N}(0,\ \sigma_{\alpha\beta s}^2)$.

Simulation Schemes

We aim to simulate 2×2 repeated-measures data using a generating model that satisfies the following considerations:

(1) There are no fixed interaction effects between factors $A$ and $B$, i.e., $(\alpha\beta)_{ij}=0$.

c(-10.5, 4.5, -4.5, 10.5) # a1b1, a2b1, a1b2, a2b2 (A=15, B=6)

(2) We need not worry about effect size standardization, as $\sigma_{\epsilon}^2=1$, and all variance components are homogeneous.

(3) Each participant can receive a single ($R=1$) or balanced multiple trials ($R>1$) for each of the four conditions.

(4) Random intercepts, $s_k$, are always included.

(5a) Random slopes are included, or (5b) Random slopes are excluded ($\sigma_{\alpha s}^2=0$, $\sigma_{\beta s}^2=0$, and $\sigma_{\alpha\beta s}^2=0$) in simulations.

rnorm(N, 1, 0) # constant ones (fixed slope)
Both set.seed(123); rnorm(3, 100, 10) and set.seed(123); 100 + rnorm(3, 0, 10) generate identical samples.

We have already implemented (1) to (5a) in Eqs. 2-7 (contrast-based parameterization) to investigate the distributional assumptions (by changing the distributions for Eqs. 4-7) within the 2×2 repeated-measures design. The same code can be easily extended to (5b) with minimal modifications.

\[\begin{align} y_{ijkr}&=\mu_k+b_{1k}\cdot C_{i}+b_{2k}\cdot C_{j}+b_{12k}\cdot C_{i}\cdot C_{j}+\epsilon_{ijkr} \tag{2} \\ &\quad -C_1=C_2=C\equiv.5 \tag{3} \\ \mu_k&\sim\mathcal{N}(\mu,\ \sigma_0^2) \tag{4} \\ b_{1k}&\sim\mathcal{N}(B_1,\ \sigma_1^2) \tag{5} \\ b_{2k}&\sim\mathcal{N}(B_2,\ \sigma_2^2) \tag{6} \\ b_{12k}&\sim\mathcal{N}(0,\ \sigma_{12}^2) \tag{$*$} \\ \epsilon_{ijkr}&\overset{\mathrm{i.i.d.}}{\sim}\mathcal{N}(0,\ 1) \tag{7} \end{align}\]

simData <- function(N, n=1, B1=0, sd0=0.5, isRIO=F, sd1=0.5, sd2=0.5, sd12=0) {
  #' Input -
  #' N:         number of participants
  #' n:         number of (balanced) trials
  #' B1:        fixed effect size for factor A of interest
  #' sd0:       standard deviation for random intercepts
  #' isRIO:     if TRUE, random slopes have sd suppress to 0 (i.e. random intercepts only)
  #' sd1:       standard deviation for random slopes for (contrasted) factor A
  #' sd2:       standard deviation for random slopes for (contrasted) factor B
  #' sd12:      standard deviation for random slopes for the (contrasted) interaction AB
  #'
  #' Output - a long format data frame for a 2x2 repeated-measures design
  
  C1 <- C2 <- c(-.5, .5) # contrast vectors for factors A and B, respectively (sum coding)
  B2 <- 0.5 # fixed effect size for factor B
  
  s <- rnorm(N, 0, sd0) # random intercepts for each participant
  
  # fixed or random slopes for factor A (this is the SECOND factor in our paper)
  b1 <- rnorm(N, B1, ifelse(isRIO, 0, sd1))
  
  # fixed or random slopes for factor B (this is the FIRST factor in our paper)
  b2 <- rnorm(N, B2, ifelse(isRIO, 0, sd2))
  
  # fixed or random slopes for the interaction AB
  b12 <- rnorm(N, 0, ifelse(isRIO, 0, sd12))
  
  # a matrix of condition contrasts for the 2x2 design
  conds <- expand.grid("A"=C1, "B"=C2)
  conds$AB <- conds$A * conds$B

  # each row of `M` represents one participant's set of four condition means
  M <- s + cbind(b1, b2, b12) %*% t(conds) # a1b1, a2b1, a1b2, a2b2 (N×4 matrix)

  DV <- rnorm(N*4*n, c(M), 1) # adding noise (residual variance = 1)

  data.frame("DV" = DV,
             "ID" = rep(paste0("s",1:N), 4*n),
             "A" = rep(c("a1","a2"), each=N, times=2*n),
             "B" = rep(c("b1","b2"), each=N*2, times=n),
             "Trial" = rep(paste0("r",1:n), each=N*4),
             stringsAsFactors = T) # long format data
}

With Random Slopes for $A$ and $B$

set.seed(277)
df1 <- simData(N=1000, n=1, B1=0.25, isRIO=F) # simulating with random slopes for A and B

mat1 <- matrix(df1$DV, ncol=4, 
               dimnames=list(unique(df1$ID), c("a1b1", "a2b1", "a1b2", "a2b2"))) # wide format data

round(cor(mat1),2) # sample correlation matrix (compound symmetry is violated)

##      a1b1 a2b1 a1b2 a2b2
## a1b1 1.00 0.16 0.18 0.08
## a2b1 0.16 1.00 0.08 0.14
## a1b2 0.18 0.08 1.00 0.21
## a2b2 0.08 0.14 0.21 1.00

\[\begin{align} \operatorname{Cov}(y_{ijkr},\ y_{lmkr})&=\operatorname{Cov}\big(\mu_k+b_{1k}\cdot C_{i}+b_{2k}\cdot C_{j}+b_{12k}\cdot C_{i}\cdot C_{j}+\epsilon_{ijkr}, \\ &\qquad\qquad\mu_k+b_{1k}\cdot C_{l}+b_{2k}\cdot C_{m}+b_{12k}\cdot C_{l}\cdot C_{m}+\epsilon_{lmkr}\big) \\ &=\operatorname{Var}(\mu_k)+C_{i}C_{l}\cdot\operatorname{Var}(b_{1k})+C_{j}C_{m}\cdot\operatorname{Var}(b_{2k})+C_{i}C_{j}C_{l}C_{m}\cdot\operatorname{Var}(b_{12k})+\operatorname{Cov}(\epsilon_{ijkr},\ \epsilon_{lmkr}) \tag{8} \end{align}\]

The marginal variance-covariance matrix of $y_{11}$, $y_{21}$, $y_{12}$, and $y_{22}$ is given by Eq. 9.

\[\begin{equation} \tag{9} \tiny \begin{pmatrix} \sigma_0^2+C^2\sigma_1^2+C^2\sigma_2^2+C^4\sigma_{12}^2+\sigma_\epsilon^2 &\sigma_0^2-C^2\sigma_1^2+C^2\sigma_2^2-C^4\sigma_{12}^2 &\sigma_0^2+C^2\sigma_1^2-C^2\sigma_2^2-C^4\sigma_{12}^2 &\sigma_0^2-C^2\sigma_1^2-C^2\sigma_2^2+C^4\sigma_{12}^2 \\ \bullet &\sigma_0^2+C^2\sigma_1^2+C^2\sigma_2^2+C^4\sigma_{12}^2+\sigma_\epsilon^2 &\sigma_0^2-C^2\sigma_1^2-C^2\sigma_2^2+C^4\sigma_{12}^2 &\sigma_0^2+C^2\sigma_1^2-C^2\sigma_2^2-C^4\sigma_{12}^2 \\ \bullet & \bullet &\sigma_0^2+C^2\sigma_1^2+C^2\sigma_2^2+C^4\sigma_{12}^2+\sigma_\epsilon^2 &\sigma_0^2-C^2\sigma_1^2+C^2\sigma_2^2-C^4\sigma_{12}^2 \\ \bullet & \bullet & \bullet &\sigma_0^2+C^2\sigma_1^2+C^2\sigma_2^2+C^4\sigma_{12}^2+\sigma_\epsilon^2 \end{pmatrix} \end{equation}\]

When $C=.5$, $\sigma_0=\sigma_1=\sigma_2=\sigma=0.5$, $\sigma_{12}=0$, and $\sigma_\epsilon=1$,
the correlations have two values, $\frac{\sigma^2-2C^2\sigma^2}{\sigma^2+2C^2\sigma^2+\sigma_\epsilon^2}=\frac{1}{11}\approx.09$ and $\frac{\sigma^2}{\sigma^2+2C^2\sigma^2+\sigma_\epsilon^2}=\frac{2}{11}\approx.18$.

With Random Slopes for All

set.seed(277); R <- 3
df3 <- simData(N=1000, n=R, B1=0.25, 
               sd12=1, isRIO=F) # simulating with random slopes for A, B, and AB
df3 <- df3[order(df3$A, df3$B),]

mat3 <- matrix(df3$DV, ncol=4, 
               dimnames=list(rep(unique(df3$ID), R), 
                             c("a1b1", "a1b2", "a2b1", "a2b2"))) # wide format data

round(cor(mat3),2) # sample correlation matrix

##      a1b1 a1b2 a2b1 a2b2
## a1b1 1.00 0.10 0.14 0.12
## a1b2 0.10 1.00 0.11 0.12
## a2b1 0.14 0.11 1.00 0.12
## a2b2 0.12 0.12 0.12 1.00

Compound symmetry holds because the variance components of the random slopes for $A$, $B$, and $AB$ are equal (Be sure to consider the contrast weight).

When $C=.5$, $\sigma_0=\sigma_1=\sigma_2=\sigma=0.5$, $\sigma_{12}=2\sigma=1$, and $\sigma_\epsilon=1$,
the correlations is $\frac{\sigma^2-C^2\sigma^2}{\sigma^2+3C^2\sigma^2+\sigma_\epsilon^2}=\frac{3}{23}\approx.13$.

Without Random Slopes

set.seed(277)
df2 <- simData(N=1000, n=1, B1=0.25, isRIO=T) # simulating without random slopes

mat2 <- matrix(df2$DV, ncol=4, 
               dimnames=list(unique(df2$ID), c("a1b1", "a2b1", "a1b2", "a2b2"))) # wide format data

round(cor(mat2),2) # sample correlation matrix (compound symmetry holds)

##      a1b1 a2b1 a1b2 a2b2
## a1b1 1.00 0.17 0.21 0.18
## a2b1 0.17 1.00 0.20 0.18
## a1b2 0.21 0.20 1.00 0.16
## a2b2 0.18 0.18 0.16 1.00

The correlation is $\frac{\sigma^2}{\sigma^2+\sigma_\epsilon^2}=\frac{0.5^2}{0.5^2+1}=.2$.

Covariance Matrix

Based on $(\alpha s)_{ik}\overset{\mathrm{i.i.d.}}{\sim}\mathcal{N}(0,\ \sigma_{\alpha s}^2)$, $\quad(\beta s)_{jk}\overset{\mathrm{i.i.d.}}{\sim}\mathcal{N}(0,\ \sigma_{\beta s}^2)$, and $(\alpha\beta s)_{ijk}\overset{\mathrm{i.i.d.}}{\sim}\mathcal{N}(0,\ \sigma_{\alpha\beta s}^2)\quad$ (note “i.i.d.”),
the marginal variance-covariance matrix for $y_{11}$, $y_{21}$, $y_{12}$, and $y_{22}$ is incorrectly given by Eq. ⛌.

\[\begin{equation} \tag{⛌} \scriptsize \begin{pmatrix} \sigma_s^2+\sigma_{\alpha s}^2+\sigma_{\beta s}^2+\sigma_{\alpha\beta s}^2+\sigma_\epsilon^2 & \sigma_s^2+\sigma_{\beta s}^2 & \sigma_s^2+\sigma_{\alpha s}^2 & \sigma_s^2 \\ \bullet & \sigma_s^2+\sigma_{\alpha s}^2+\sigma_{\beta s}^2+\sigma_{\alpha\beta s}^2+\sigma_\epsilon^2 & \sigma_s^2 & \sigma_s^2+\sigma_{\alpha s}^2 \\ \bullet & \bullet & \sigma_s^2+\sigma_{\alpha s}^2+\sigma_{\beta s}^2+\sigma_{\alpha\beta s}^2+\sigma_\epsilon^2 & \sigma_s^2+\sigma_{\beta s}^2 \\ \bullet & \bullet & \bullet & \sigma_s^2+\sigma_{\alpha s}^2+\sigma_{\beta s}^2+\sigma_{\alpha\beta s}^2+\sigma_\epsilon^2 \end{pmatrix} \end{equation}\]

Note that covariances cannot be negative in Eq. ⛌. This derivation is incorrect because it does not account for the sum-to-zero constraints.

From Equation 1 to 2

$y_{ijkr}=\mu+\alpha_i+\beta_j+(\alpha\beta)_{ij}+s_k+(\alpha s)_{ik}+(\beta s)_{jk}+(\alpha\beta s)_{ijk}+\epsilon_{ijkr}$

$y_{ijkr}=\mu_k+b_{1k}\cdot C_{i}+b_{2k}\cdot C_{j}+b_{12k}\cdot C_{i}\cdot C_{j}+\epsilon_{ijkr}$

\[\begin{align} \alpha_i&=B_1\cdot C_i \tag{11} \\ \beta_j&=B_2\cdot C_j \tag{12} \\ (\alpha s)_{ik}&=(b_{1k}-B_1)\cdot C_i \tag{13} \\ (\beta s)_{jk}&=(b_{2k}-B_2)\cdot C_j \tag{14} \end{align}\]

Verify the effect size for factor $A$: $\quad\alpha_2-\alpha_1=B_1\cdot C_2-B_1\cdot C_1=.5B_1-(-.5)B_1=B_1$

\[\begin{align} \sigma_{s}^2&=\sigma_0^2 \tag{15} \\ \sigma_{\alpha s}^2&=C^2\sigma_1^2 \tag{16} \\ \sigma_{\beta s}^2&=C^2\sigma_2^2 \tag{17} \\ \sigma_{\alpha\beta s}^2&=C^4\sigma_{12}^2 \tag{18} \end{align}\]

\[\begin{equation} \tag{$9^\prime$} \scriptsize \begin{pmatrix} \sigma_s^2+\sigma_{\alpha s}^2+\sigma_{\beta s}^2+\sigma_{\alpha\beta s}^2+\sigma_\epsilon^2 &\sigma_s^2-\sigma_{\alpha s}^2+\sigma_{\beta s}^2-\sigma_{\alpha\beta s}^2 &\sigma_s^2+\sigma_{\alpha s}^2-\sigma_{\beta s}^2-\sigma_{\alpha\beta s}^2 &\sigma_s^2-\sigma_{\alpha s}^2-\sigma_{\beta s}^2+\sigma_{\alpha\beta s}^2 \\ \bullet & \sigma_s^2+\sigma_{\alpha s}^2+\sigma_{\beta s}^2+\sigma_{\alpha\beta s}^2+\sigma_\epsilon^2 &\sigma_s^2-\sigma_{\alpha s}^2-\sigma_{\beta s}^2+\sigma_{\alpha\beta s}^2 &\sigma_s^2+\sigma_{\alpha s}^2-\sigma_{\beta s}^2-\sigma_{\alpha\beta s}^2 \\ \bullet & \bullet &\sigma_s^2+\sigma_{\alpha s}^2+\sigma_{\beta s}^2+\sigma_{\alpha\beta s}^2+\sigma_\epsilon^2 &\sigma_s^2-\sigma_{\alpha s}^2+\sigma_{\beta s}^2-\sigma_{\alpha\beta s}^2 \\ \bullet & \bullet & \bullet &\sigma_s^2+\sigma_{\alpha s}^2+\sigma_{\beta s}^2+\sigma_{\alpha\beta s}^2+\sigma_\epsilon^2 \end{pmatrix} \end{equation}\]

Discussion

Question 1: Can any covariance term be negative? (Yes)

Question 2: How do we simulate unequal variance terms? (Heteroscedasticity $\sigma_{\epsilon\ ij}^2$)

Question 3: How do we extend to an $a\times b$ repeated-measures design?

\[\begin{align} \boldsymbol{\Sigma}_{ab} &= \sigma_\epsilon^2\mathbf{I}_{ab} +\sigma_s^2\mathbf{J}_{ab} +\sigma_{1}^2\mathbf{J}_b\otimes\big(\mathbf{Q}_{\alpha s}\mathbf{Q}_{\alpha s}^\top\big) +\sigma_{2}^2\big(\mathbf{Q}_{\beta s}\mathbf{Q}_{\beta s}^\top\big)\otimes\mathbf{J}_a +\sigma_{12}^2\big(\mathbf{Q}_{\beta s}\mathbf{Q}_{\beta s}^\top\big)\otimes\big(\mathbf{Q}_{\alpha s}\mathbf{Q}_{\alpha s}^\top\big) \\ &= \sigma_\epsilon^2\mathbf{I}_{ab} +\sigma_s^2\mathbf{J}_{ab} +\sigma_{1}^2\mathbf{J}_b\otimes\big(\mathbf{I}_a-a^{-1}\mathbf{J}_a\big) +\sigma_{2}^2\big(\mathbf{I}_b-b^{-1}\mathbf{J}_b\big)\otimes\mathbf{J}_a +\sigma_{12}^2\big(\mathbf{I}_b-b^{-1}\mathbf{J}_b\big)\otimes\big(\mathbf{I}_a-a^{-1}\mathbf{J}_a\big) \end{align}\]

scroll to top