The Bayes Factors and P-Values under Model Misspecification

============================================================================================================ The Bayes factor: https://rpubs.com/sherloconan/1160305

1. Skewed $t$

1.1 Skew Student-$t$ Distribution

Same population mean and variance.

x <- seq(-3, 3, by=0.01)
p0 <- dnorm(x, mean=0, sd=1)
p1 <- fGarch::dsstd(x, mean=0, sd=1, nu=4, xi=20)

plot(x, p1, type="l", lty=2, lwd=2, col="red", xlab="x", ylab="Density")
lines(x, p0, type="l", lwd=2)
legend("topright",c("Violated","Normal"), col=c("red","black"), lty=c(2,1), lwd=2, bty="n")

1.2 Simulation (True Effect = 0)

NSim <- 500 #number of replicates

viz3(10, NSim, 0) #function is defined in Section 6

viz3(20, NSim, 0)

viz3(100, NSim, 0)

viz3(1000, NSim, 0)

1.3 Simulation (True Effect = 0.5)

viz3(10, NSim, 0.5)

viz3(20, NSim, 0.5)

viz3(100, NSim, 0.5)

viz3(1000, NSim, 0.5)

2. Uniform

2.1 Continuous Uniform Distribution

Same population mean and variance.

x <- seq(-3, 3, by=0.01)
p0 <- dnorm(x, mean=0, sd=1)
p1 <- dunif(x, min=-sqrt(3), max=sqrt(3))
# mean = (a + b) / 2
# variance = (b - a)^2 / 12 ==>
# a = mean - sqrt(3 * variance)
# b = mean + sqrt(3 * variance)

plot(x, p1, type="l", lty=2, lwd=2, col="red", xlab="x", ylab="Density", ylim=c(0, 0.42))
lines(x, p0, type="l", lwd=2)
legend("topright",c("Violated","Normal"), col=c("red","black"), lty=c(2,1), lwd=2, bty="n")

2.2 Simulation (True Effect = 0)

viz3(10, NSim, 0, dist="unif")

viz3(20, NSim, 0, dist="unif")

viz3(100, NSim, 0, dist="unif")

viz3(1000, NSim, 0, dist="unif")

2.3 Simulation (True Effect = 0.5)

viz3(10, NSim, 0.5, dist="unif")

viz3(20, NSim, 0.5, dist="unif")

viz3(100, NSim, 0.5, dist="unif")

viz3(1000, NSim, 0.5, dist="unif")

3. Triangular

3.1 Symmetric Triangular Distribution

Same population mean and variance.

x <- seq(-3, 3, by=0.01)
p0 <- dnorm(x, mean=0, sd=1)

sym_tri_pdf <- function(x, a, b) {
  #' define the symmetric triangular distribution PDF
  c <- (a + b) / 2
  ifelse(x < a | x > b, 0, ((b - c) - abs(c - x)) / ((b-c)^2))
}

plot(x, sym_tri_pdf(x, -sqrt(6), sqrt(6)),
     type="l", lty=2, lwd=2, col="red", xlab="x", ylab="Density")
lines(x, p0, type="l", lwd=2)
legend("topright",c("Violated","Normal"), col=c("red","black"), lty=c(2,1), lwd=2, bty="n")

3.2 Simulation (True Effect = 0)

viz3(10, NSim, 0, dist="tri")

viz3(20, NSim, 0, dist="tri")

viz3(100, NSim, 0, dist="tri")

viz3(1000, NSim, 0, dist="tri")

3.3 Simulation (True Effect = 0.5)

viz3(10, NSim, 0.5, dist="tri")

viz3(20, NSim, 0.5, dist="tri")

viz3(100, NSim, 0.5, dist="tri")

viz3(1000, NSim, 0.5, dist="tri")

4. Increase $\sigma_\epsilon^2$

4.1 Simulation (True Effect = 0)

viz3(10, NSim, 0, dist="100sd")

viz3(20, NSim, 0, dist="100sd")

viz3(100, NSim, 0, dist="100sd")

viz3(1000, NSim, 0, dist="100sd")

4.2 Simulation (True Effect = 0.5)

viz3(10, NSim, 0.5, dist="100sd")

viz3(20, NSim, 0.5, dist="100sd")

viz3(100, NSim, 0.5, dist="100sd")

viz3(1000, NSim, 0.5, dist="100sd")

5. The $3p\sqrt{n}$ Rule

\[\begin{equation} \tag{1} y_i\overset{\text{i.i.d.}}{\sim}\mathcal{N}(\mu,\ \sigma_\epsilon^2)\quad\text{for }i=1,\dotsb,n \end{equation}\]

The test statistic $W$ in the Wald test is $W=\left(\frac{\hat{\mu}-\mu_0}{\text{se}(\hat{\mu})}\right)^2\sim\chi_1^2$ under $\mathcal{H}_0$, where $\hat{\mu}$ is the maximum likelihood estimate, and $\text{se}(\hat{\mu})$ is the standard error. Therefore, $p$-value.

pchisq(W, 1, lower.tail=F) #W and p-value are one to one
qchisq(p, 1, lower.tail=F) #inverse CDF of chisq(1): 1-p → W → JAB

In a one-sample $t$-test, $\mathcal{H}_0:\ \mu=0$ versus $\mathcal{H}_1:\ \mu\neq 0$,

$\mu_0=0$, $\quad\hat{\mu}=\bar{y}$, $\quad\text{se}(\hat{\mu})=s_y/\sqrt{n}$, $\quad$ and $W=t^2$,

where $t=\bar{y}\sqrt{n}/s_y$, $\quad\bar{y}$ is the sample mean, and $s_y$ is the sample standard deviation.

Wagenmakers (2022; Eq. 6) reviewed the Jeffreys-style approximate Bayes factor (JAB) and proposed a piecewise approximation from $p$-value and sample size in Eq. 4.

\[\begin{equation} \tag{2} \textit{BF}_{01}=\frac{p(\boldsymbol{y}\mid\mathcal{H}_0)}{p(\boldsymbol{y}\mid\mathcal{H}_1)}=\frac{p(\boldsymbol{y}\mid\mu_0)}{\int p(\boldsymbol{y}\mid\mu)\cdot g(\mu)\text{d}\mu} \end{equation}\]

Using the normal unit-information prior for $\mu$, \[\begin{equation} \tag{3} \textit{JAB}_{01}=\sqrt{n}\exp\left\{-\frac{1}{2}W\right\} \end{equation}\]

\[ \textit{JAB}_{01}\approx \begin{cases} \tag{4} \text{$\ p^{\frac{1}{4}}\sqrt{n}$,} & \text{$p>0.5$} \\ \text{$\ \sqrt{pn}$,} & \text{$0.1<p\leqslant0.5$} \\ \text{$\ 3p\sqrt{n}$,} & \text{$p\leqslant0.1$} \end{cases} \]

6. Function

viz3 <- function(N, NSim, ES, SD=1, nu=4, sk=20, dist="skew") {
  #' Input -
  #' N:     number of subjects
  #' NSim:  number of replicates
  #' ES:    effect size
  #' SD:    error standard deviation
  #' nu:    shape parameter (degrees of freedom) of the skewed t
  #' sk:    skewness parameter of the skewed t
  #' dist:  distribution that violates the normality, but has the same population mean and variance
  #'        "skew" (default), skewed t.
  #'        "unif", continuous uniform.
  #'        "tri", symmetric triangular
  #'        "100sd", increase error standard deviation
  #' Dependencies - BayesFactor, fGarch, ggplot2
  #' Output - scatter plot of the p-values and the Bayes factors
  
  computeBF <- function(data) {
    #' Input -
    #' data:    one sample
    #' Output - the Bayes factor
    set.seed(277)
    bf <- BayesFactor::ttestBF(data)
    if (BayesFactor::extractBF(bf)$error < 0.01) { #negligible Monte Carlo errors
      BayesFactor::extractBF(bf)$bf
    } else {return(NA)}
  }
  
  BF <- BF_rep <- BF_vio <- pVal <- pVal_rep <- pVal_vio <- numeric(NSim)
  for (i in 1:NSim) {
    set.seed(i)
    y0 <- rnorm(N, ES, SD)
    pVal[i] <- t.test(y0)$p.value
    BF[i] <- computeBF(y0)

    set.seed(i-NSim)
    y0_rep <- rnorm(N, ES, SD)
    pVal_rep[i] <- t.test(y0_rep)$p.value
    BF_rep[i] <- computeBF(y0_rep)
    
    set.seed(i+NSim)
    if (dist=="skew") {
      y1 <- fGarch::rsstd(N, mean=ES, sd=SD, nu=nu, xi=sk)
      label <- "Skewed"
      
    } else if (dist=="unif") {
      # mean = (a + b) / 2
      # variance = (b - a)^2 / 12 
      y1 <- runif(N, min=ES-sqrt(3)*SD, max=ES+sqrt(3)*SD)
      label <- "Uniform"
      
    } else if (dist=="tri") {
      # mean = (a + b) / 2
      # variance = (b - a)^2 / 24
      a <- ES - sqrt(6) * SD
      b <- ES + sqrt(6) * SD
      u <- runif(N) #generate from U(0,1)
      y1 <- ifelse(u < 0.5,
                   a+sqrt((b-a)^2*u/2),
                   b-sqrt((b-a)^2*(1-u)/2)) #inverse transformation method
      label <- "Triangular"
      
    } else if (dist=="100sd") {
      y1 <- rnorm(N, ES, SD*100)
      label <- "Normal (100×SD)"
      
    } else {stop("Bad Input!")}
    pVal_vio[i] <- t.test(y1)$p.value
    BF_vio[i] <- computeBF(y1)
  }
  
  df <- data.frame("Model"=factor(rep(c("Normal","Normal (rep)",label),each=NSim),
                                  levels=c("Normal","Normal (rep)",label)),
                   "pVal"=c(pVal, pVal_rep, pVal_vio))
  
  if (ES==0) { #null
    df$BF <- 1/c(BF, BF_rep, BF_vio) #BF01
    ylab <- expression(italic("BF")["01"])
    yline <- c(1/3,3); ytext <- c("1/3","3")
  } else {
    df$BF <- log(c(BF, BF_rep, BF_vio)) #log(BF10)
    ylab <- expression("ln("~italic("BF")[10]~")")
    yline <- c(-log(3),log(3)); ytext <- c("-ln3","ln3")
  }

  print(ggplot(df, aes(pVal, BF))+
    geom_point(size=0.5,color="#8263d9",alpha=.2)+
    facet_grid(. ~ Model)+
    geom_hline(yintercept=yline,linetype="dashed")+
    geom_vline(xintercept=.05,color="red")+
    scale_x_continuous(breaks=c(.05,1))+
    annotate("text", x=Inf, y=yline[1], label=ytext[1], hjust=1.1, vjust=-0.5, color="gray")+
    annotate("text", x=Inf, y=yline[2], label=ytext[2], hjust=1.1, vjust=-0.5, color="gray")+
    labs(x=expression(italic("P")~"-Value"),y=ylab,
         title=expression("One-Sample"~italic(t)~"-Test"),
         subtitle=paste0("Number of Subjects = ",N,", Number of Replicates = ",NSim,", Effect Size = ",ES))+
    theme_classic())
  
  ggplot(df, aes(pVal, BF, color=Model))+
    geom_point(size=0.5,alpha=.1)+
    scale_color_manual(values=c("#bac2ff", "#4e91fd", "#ff4848"))+
    geom_hline(yintercept=yline,linetype="dashed")+
    geom_vline(xintercept=.05,color="red")+
    scale_x_continuous(breaks=c(.05,1))+
    annotate("text", x=Inf, y=yline[1], label=ytext[1], hjust=1.1, vjust=-0.5, color="gray")+
    annotate("text", x=Inf, y=yline[2], label=ytext[2], hjust=1.1, vjust=-0.5, color="gray")+
    labs(x=expression(italic("P")~"-Value"),y=ylab)+
    theme_classic()+
    theme(legend.position="bottom")+
    guides(colour=guide_legend(override.aes=list(size=3, alpha=1),nrow=1))
}

The Bayes Factors and \(P\)-Values
under Model Misspecification

Skewed Residuals in the One-Sample t-Test

Zhengxiao Wei, Farouk S. Nathoo, Michael E. J. Masson

2024-03-20

1. Skewed \(t\)

1.1 Skew Student-\(t\) Distribution

1.2 Simulation (True Effect = 0)

1.3 Simulation (True Effect = 0.5)

2. Uniform

2.1 Continuous Uniform Distribution

2.2 Simulation (True Effect = 0)

2.3 Simulation (True Effect = 0.5)

3. Triangular

3.1 Symmetric Triangular Distribution

3.2 Simulation (True Effect = 0)

3.3 Simulation (True Effect = 0.5)

4. Increase \(\sigma_\epsilon^2\)

4.1 Simulation (True Effect = 0)

4.2 Simulation (True Effect = 0.5)

5. The \(3p\sqrt{n}\) Rule

6. Function

The Bayes Factors and \(P\)-Values under Model Misspecification

Skewed Residuals in the One-Sample t-Test

Zhengxiao Wei, Farouk S. Nathoo, Michael E. J. Masson

2024-03-20

1. Skewed \(t\)

1.1 Skew Student-\(t\) Distribution

1.2 Simulation (True Effect = 0)

1.3 Simulation (True Effect = 0.5)

2. Uniform

2.1 Continuous Uniform Distribution

2.2 Simulation (True Effect = 0)

2.3 Simulation (True Effect = 0.5)

3. Triangular

3.1 Symmetric Triangular Distribution

3.2 Simulation (True Effect = 0)

3.3 Simulation (True Effect = 0.5)

4. Increase \(\sigma_\epsilon^2\)

4.1 Simulation (True Effect = 0)

4.2 Simulation (True Effect = 0.5)

5. The \(3p\sqrt{n}\) Rule

6. Function

The Bayes Factors and \(P\)-Values
under Model Misspecification