PREVIOUS: https://rpubs.com/sherloconan/1151407
THIS: https://rpubs.com/sherloconan/1160305
MORE: https://github.com/zhengxiaoUVic/Bayesian
The Kullback-Leibler (KL) divergence, also known as relative entropy, is an asymmetric measure of the statistical distance between two probability measures defined on the same sample space; it applies to probability mass functions (PMFs, Eq. 0, Row 1) and to probability density functions (PDFs, Eq. 0, Row 2).
\[ D_{\text{KL}}(P\parallel Q):=\mathbb{E}_{X\sim p(x)}\left[\log\frac{p(X)}{q(X)}\right]= \begin{cases} \sum_{x\in\mathcal{X}}P(x)\cdot\log\frac{P(x)}{Q(x)} \\[2ex] \int_{\mathcal{X}} p(x)\cdot\log\frac{p(x)}{q(x)}\,\text{d}x \end{cases} \tag{0} \]
For more detail, read the lecture notes by Wasserman (2020).
Typically, \(P\) may represent the posterior, the “true” distribution of data or observations, or a precisely calculated theoretical distribution. Distribution \(Q\) may represent a prior, a theory, a model, a description or an approximation of \(P\).
\(D_{\text{KL}}(P\parallel Q)\) measures how \(P\) is different from the reference \(Q\). Confusingly, in different contexts, \(D_{\text{KL}}(P\parallel Q)\) may be interpreted as how \(Q\) is different from the reference \(P\), using the same formula.
In certain contexts, it is a measure of the information lost when \(Q\) is used to approximate or dominate \(P\). Note that the units used in this calculation are called nats, which is short for “natural units of information”. When using a base-2 logarithm, the KL divergence is expressed in terms of bits rather than nats.
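For intuition, the discrete form (Eq. 0, Row 1) can be evaluated directly. The following sketch (illustrative PMFs, not from the original text) computes the divergence between two Bernoulli distributions in nats and in bits.
P <- c(0.5, 0.5); Q <- c(0.3, 0.7) #two illustrative PMFs over {0, 1}: Bernoulli(0.5) and Bernoulli(0.7)
sum(P * log(P / Q)) #natural log: about 0.087 nats
sum(P * log2(P / Q)) #base-2 log: about 0.126 bits, i.e., the value above divided by log(2)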
In the objective Bayesian framework, Bernardo’s reference prior is designed to maximize the expected KL divergence between the posterior and the prior distributions. This approach allows the data to exert maximum influence on the posterior estimates.
Example 1: The KL Divergence from a \(t\)-Distribution to a Standard Normal Distribution
curve(dnorm, -3, 3, lwd=2, xlab="x", ylab="Density", main="Probability Density Functions")
curve(dt(x, df=5), lwd=2, lty=2, col="red", add=T)
curve(dt(x, df=20), lwd=2, lty=3, col="blue", add=T)
legend("topright",c("Normal","Student t(5)","Student t(20)"),col=c("black","red","blue"),lty=1:3,lwd=2,bty="n")
We can compute the KL divergence between two PDFs, given two samples from their respective populations. However, the KL divergence is not directly computed from the samples themselves, but rather from the estimated PDFs, e.g., using smoothing methods.
Among these methods, the \(k\)-nearest-neighbors framework provides a consistent and asymptotically unbiased estimate of the entropy of a PDF directly from the data set, i.e., without explicitly estimating the PDF (Boltz, Debreuve, & Barlaud, 2007, Eq. 7).
The KL.divergence function from the “FNN” package (usage rank 327) returns a vector of \(k\) estimates, based on 1 to \(k\) nearest neighbors, respectively. Negative estimates can occur even though the true KL divergence is non-negative (and be careful with ChatGPT 4’s made-up answers).
nS <- 500; nK <- 10
set.seed(277)
sample_norm <- rnorm(nS)
sample_t5 <- rt(nS, df=5)
round(FNN::KL.divergence(sample_norm, sample_t5, k=5), 3)
## [1] -0.090 -0.015 0.018 0.045 0.058
round(FNN::KL.divergence(sample_norm, sample_t5, k=nK), 3)
## [1] -0.090 -0.015 0.018 0.045 0.058 0.057 0.040 0.031 0.038 0.063
sample_t20 <- rt(nS, df=20)
round(FNN::KL.divergence(sample_norm, sample_t20, k=nK), 3)
## [1] -0.041 0.011 0.036 0.035 0.043 0.025 -0.003 0.017 0.014 0.026
sample_rep <- rnorm(nS)
round(FNN::KL.divergence(sample_norm, sample_rep, k=nK), 3)
## [1] -0.097 -0.036 -0.016 -0.020 0.001 0.008 -0.006 0.013 0.011 0.009
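As a sanity check, the population-level counterpart of these estimates can be approximated by numerical integration of Eq. 0 (Row 2). The sketch below (not part of the original output) takes \(\mathcal{N}(0,1)\), the population of the first sample, as \(P\) and the \(t_5\) distribution as \(Q\); it is assumed here that FNN treats its first argument as the sample from \(P\).
kl_integrand <- function(x) dnorm(x) * (dnorm(x, log=TRUE) - dt(x, df=5, log=TRUE))
integrate(kl_integrand, -Inf, Inf) #a small positive value (in nats) to compare with the kNN estimates above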
ggplot(data.frame("x"=c(sample_norm, sample_rep, sample_t5, sample_t20),
"Model"=factor(rep(c("Normal",
"Normal (rep)",
"Student t(5)",
"Student t(20)"),each=nS))),
aes(x))+
geom_histogram(binwidth=0.3,fill="white",color="black")+
geom_rug(alpha=.1)+
facet_grid(. ~ Model)+
geom_vline(xintercept=0, color="purple")+
labs(x="Sample", y="Count", title="Histograms")+
theme_classic()
Example 2: The KL Divergence Between Exponential Distributions
\(f_X(x;\lambda)=\lambda e^{-\lambda x},\quad x\geqslant 0\)
\(D_{\text{KL}}(f_X(x;\lambda_1)\parallel f_X(x;\lambda_2))=\lambda_1\int_0^\infty\left(\ln\frac{\lambda_1}{\lambda_2}\cdot e^{-\lambda_1 x}-(\lambda_1-\lambda_2)x\cdot e^{-\lambda_1 x}\right)\text{d}x=\ln\frac{\lambda_1}{\lambda_2}+\frac{\lambda_2}{\lambda_1}-1\)
Warning: Work in Progress
KL_exp <- function(nS=10000, nK=5, seed=1000, ...) {
  #' Example code of ?KL.divergence
  set.seed(seed)
  X <- rexp(nS, rate=0.2)
  Y <- rexp(nS, rate=0.4) #sample sizes need not be equal
  KL_div <- FNN::KL.divergence(X, Y, k=nK)
  KL_div <- KL_div[KL_div < Inf & KL_div > -Inf]
  plot(KL_div, xlab=paste0("Number of Nearest Neighbors 1-", nK), ylab="KL Divergence Estimate",
       main="KL Divergence of Exp(0.2) from Exp(0.4)", sub=paste0("Sample Size = ", nS), ...)
  abline(h=1-log(2), col="red")
  # Theoretical divergence = log(0.2/0.4) + (0.4/0.2) - 1 = 1 - log(2) = 0.3068528
  # cat("The overall KL divergence of Exp(0.2) from Exp(0.4) is", mean(KL_div),
  #     "nats, \ngiven", nK, "nearest neighbors and", nS, "samples.")
}
par(mfrow=c(1,2))
KL_exp(nK=100, ylim=c(0.05, 0.35))
KL_exp(nS=1000, nK=100, ylim=c(0.05, 0.35))
Boltz, Debreuve, and Barlaud (2007, p. 3; one of the two key references) recommend choosing \(k=\sqrt{N}\), but that choice does not yield satisfactory results in the examples presented here. Indeed, the plots do not suggest any clear guideline for choosing \(k\).
The KL Divergence Between Univariate Normal Distributions.
\(D_{\text{KL}}(\mathcal{N}(\mu_1,\sigma_1^2)\parallel\mathcal{N}(\mu_2,\sigma_2^2))=\ln\frac{\sigma_2}{\sigma_1}+\frac{\sigma_1^2+(\mu_1-\mu_2)^2}{2\sigma_2^2}-\frac{1}{2}\)
KL_norm <- function(nS=10000, nK=5, seed=1000, ...) {
  set.seed(seed)
  X <- rnorm(nS)
  Y <- rnorm(nS) #sample sizes need not be equal
  KL_div <- FNN::KL.divergence(X, Y, k=nK)
  KL_div <- KL_div[KL_div < Inf & KL_div > -Inf]
  plot(KL_div, xlab=paste0("Number of Nearest Neighbors 1-", nK), ylab="KL Divergence Estimate",
       main="KL Divergence of Two N(0,1)", sub=paste0("Sample Size = ", nS), ...)
  abline(h=0, col="red")
}
par(mfrow=c(1,2))
KL_norm(nK=100, ylim=c(-0.04, 0.02))
KL_norm(nS=1000, nK=100, ylim=c(-0.04, 0.02))
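For reference, the closed form above can be coded directly; this small helper is a sketch (not part of the original) for comparing the kNN estimates against exact values for any pair of normal distributions.
KL_norm_exact <- function(mu1, sigma1, mu2, sigma2) {
  #' closed-form KL divergence of N(mu1, sigma1^2) from N(mu2, sigma2^2), in nats
  log(sigma2/sigma1) + (sigma1^2 + (mu1 - mu2)^2) / (2 * sigma2^2) - 0.5
}
KL_norm_exact(0, 1, 0, 1) #0: two identical N(0,1)
KL_norm_exact(0, 1, 0.5, 1) #0.125 nats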
Discussion
“[FNN] miserably fails sometimes, but it works in simple cases and with large samples (for samples smaller than 100 it can behave erratically).” Ref.
“Even for large datasets, [the KL divergence estimate] jumps around when the number of nearest neighbors is small.” Ref.
“This study is far from exhaustive, and timings are sensitive to implementation details. Please take with a pinch of salt.” Ref.
The square of the Hellinger distance between two PDFs \(P\) and \(Q\) is loosely defined as
\[\begin{equation*} H^2(P,Q)=\frac{1}{2}\int_{\mathcal{X}}\left(\sqrt{p(x)}-\sqrt{q(x)}\right)^2\text{d}x. \end{equation*}\]
Example 2 (Continued): The Hellinger Distance Between \(\text{Exp}(\lambda_1)\) and \(\text{Exp}(\lambda_2)\)
\(H(f_X(x;\lambda_1),\ f_X(x;\lambda_2))=\sqrt{1-\frac{2\sqrt{\lambda_1\lambda_2}}{\lambda_1+\lambda_2}}\)
Use the “statip” R package (v 0.2.3).
rate1 <- 0.2; rate2 <- 0.4
set.seed(277)
x <- rexp(nS, rate=rate1) #nS = 500
y <- rexp(nS, rate=rate2) #sample sizes need not be equal
statip::hellinger(x, y, 0, Inf)
## [1] 0.3132922
sqrt(1-2*sqrt(rate1*rate2)/(rate1+rate2)) #the theoretical Hellinger distance value
## [1] 0.2391463
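The closed form can also be checked against the definition of the squared Hellinger distance by numerical integration; this sketch (not part of the original output) reuses rate1 and rate2 from above.
h2_integrand <- function(x) (sqrt(dexp(x, rate1)) - sqrt(dexp(x, rate2)))^2
sqrt(0.5 * integrate(h2_integrand, 0, Inf)$value) #about 0.2391, matching the theoretical value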
The Hellinger Distance Between \(\mathcal{N}(\mu_1,\sigma_1^2)\) and \(\mathcal{N}(\mu_2,\sigma_2^2)\).
\(H(\mathcal{N}(\mu_1,\sigma_1^2),\ \mathcal{N}(\mu_2,\sigma_2^2))=\sqrt{1-\sqrt{\frac{2\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}}\exp{\left\{-\frac{1}{4}\frac{(\mu_1-\mu_2)^2}{\sigma_1^2+\sigma_2^2}\right\}}}\)
set.seed(277)
round(replicate(10, statip::hellinger(rnorm(nS), rnorm(nS), -Inf, Inf)), 3) # H(two N(0,1)) = 0
## [1] 0.066 0.066 0.059 0.062 0.067 0.040 0.089 0.059 0.053 0.072
Discussion
“[To produce some Hellinger-distance like measure of discrepancy between the two empirical distributions,] the answer is sensitive to the choice of bandwidth [with kernel density estimates].” Ref.
The Kolmogorov–Smirnov (KS) test is a nonparametric test that addresses two questions:
How likely is it that we would see a collection of samples like this if they were drawn from that probability distribution?
How likely is it that we would see two sets of samples like this if they were drawn from the same (but unknown) probability distribution?
Comparing a Sample to a Theoretical Distribution
One-Sample KS Statistic
\[\begin{equation*} D_N:=\sup_x\lvert F_N(x)-F(x)\rvert \end{equation*}\]
The largest vertical distance between the two cumulative distribution functions (CDFs) across \(x\). \(F_N(x)\) is the empirical CDF, which is a step function (right-continuous in “ggplot2” by default; geom_step(direction="hv", ...)).
set.seed(177); sample_t <- rt(500, df=5)
ks.test(sample_t, "pnorm", mean=0, sd=1) #compare a t sample to N(0,1)
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: sample_t
## D = 0.049633, p-value = 0.1702
## alternative hypothesis: two-sided
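The reported statistic can be reproduced from the definition: for a continuous \(F\), the supremum of \(\lvert F_N(x)-F(x)\rvert\) is attained at the sorted sample points. A sketch (not part of the original output):
xs <- sort(sample_t); n <- length(xs)
max(pmax(seq_len(n)/n - pnorm(xs), pnorm(xs) - (seq_len(n) - 1)/n)) #should match D above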
# low statistical power?
set.seed(177); round(replicate(10, ks.test(rt(500, df=5), "pnorm", mean=0, sd=1)$p.value), 3)
## [1] 0.170 0.514 0.013 0.267 0.173 0.126 0.399 0.372 0.120 0.023
set.seed(277); sample_unif <- runif(500)
ks.test(sample_unif, "punif", min=0, max=1) #compare a uniform sample to U[0,1]
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: sample_unif
## D = 0.075118, p-value = 0.007086
## alternative hypothesis: two-sided
# high false alarm rate?
set.seed(277); round(replicate(10, ks.test(runif(500), "punif", min=0, max=1)$p.value), 3)
## [1] 0.007 0.777 0.316 0.324 0.577 0.825 0.629 0.713 0.936 0.677
Comparing Two Samples
Two-Sample KS Statistic
\[\begin{equation*} D_{N,M}:=\sup_x\lvert F_{1,N}(x)-F_{2,M}(x)\rvert \end{equation*}\]
set.seed(377); sample_norm <- rnorm(500)
ks.test(sample_t, sample_norm) #compare a t sample to a normal sample
##
## Asymptotic two-sample Kolmogorov-Smirnov test
##
## data: sample_t and sample_norm
## D = 0.076, p-value = 0.1114
## alternative hypothesis: two-sided
set.seed(377); round(replicate(10, ks.test(rt(500, df=5), rnorm(500))$p.value), 3)
## [1] 0.560 0.257 0.129 0.150 0.257 0.172 0.024 0.370 0.770 0.095
set.seed(477); sample_rep <- rnorm(500)
ks.test(sample_rep, sample_norm) #compare two normal samples
##
## Asymptotic two-sample Kolmogorov-Smirnov test
##
## data: sample_rep and sample_norm
## D = 0.05, p-value = 0.5596
## alternative hypothesis: two-sided
set.seed(477); round(replicate(10, ks.test(rnorm(500), rnorm(500))$p.value), 3)
## [1] 0.612 0.960 0.257 0.460 0.035 0.612 0.460 0.508 0.292 0.996
set.seed(577); sample_norm2 <- rnorm(500, mean=0.5)
ks.test(sample_norm2, sample_norm) #compare two samples from N(0.5,1) and N(0,1)
##
## Asymptotic two-sample Kolmogorov-Smirnov test
##
## data: sample_norm2 and sample_norm
## D = 0.214, p-value = 2.273e-10
## alternative hypothesis: two-sided
set.seed(577); round(replicate(10, ks.test(rnorm(500, mean=0.5), rnorm(500))$p.value), 3)
## [1] 0 0 0 0 0 0 0 0 0 0
The Wasserstein 1-distance is an integral probability metric.
\[\begin{equation*} W_1(P,Q)=\sup_{f\in\mathcal{F}_L}\left\lvert\mathbb{E}_{X\sim P}[f(X)]-\mathbb{E}_{Y\sim Q}[f(Y)]\right\rvert, \end{equation*}\]
where \(\mathcal{F}_L\) is the class of 1-Lipschitz functions such that \(\lvert f(x_1)-f(x_2)\rvert\leqslant\lvert x_1-x_2\rvert\) for \(x_1,x_2\in\mathbb{R}\).
\(P\) and \(Q\) do not need to have the same support.
\(\widehat{W}_1(P,Q)=W_1(P_n,Q_n)\), where \(P_n\) and \(Q_n\) are empirical probability measures based on samples.
The Wasserstein 1-distance is also known as the earth mover’s distance. It represents the minimum “cost” required to move the mass in one distribution to match the other distribution, where the cost is the product of the amount of mass moved and the distance it is moved.
\[\begin{equation*} W_1(P,Q)=\inf_{\gamma\in\Gamma(P,Q)}\mathbb{E}_{(X,Y)^\top\sim\gamma}[d(X,Y)], \end{equation*}\]
where \(\Gamma(P,Q)\) denotes the set of all joint distributions \(\gamma\) whose marginals are \(P\) and \(Q\). In other words, \(\gamma\) is a coupling of \(P\) and \(Q\). \(d(X,Y)=\lvert X-Y\rvert\) in one dimension.
Example 2 (Continued): The Wasserstein 1-Distance Between \(\text{Exp}(\lambda_1)\) and \(\text{Exp}(\lambda_2)\)
\(W_1(f_X(x;\lambda_1),\ f_X(x;\lambda_2))=\left\lvert\frac{1}{\lambda_1}-\frac{1}{\lambda_2}\right\rvert\)
Use the “transport” R package (v 0.15-4).
rate1 <- 0.2; rate2 <- 0.4
set.seed(277)
x <- rexp(nS, rate=rate1) #nS = 500
y <- rexp(nS, rate=rate2) #sample sizes need not be equal
transport::wasserstein1d(x, y)
## [1] 2.985239
abs(1/rate1 - 1/rate2) #the theoretical Wasserstein 1-distance value
## [1] 2.5
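For equal sample sizes, the empirical Wasserstein 1-distance is simply the mean absolute difference between order statistics, because the optimal one-dimensional coupling matches sorted values. A sketch (not part of the original output), reusing x and y from above:
mean(abs(sort(x) - sort(y))) #should agree with transport::wasserstein1d(x, y) above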
The Wasserstein 1-Distance Between \(\mathcal{N}(\mu_1,\sigma_1^2)\) and \(\mathcal{N}(\mu_2,\sigma_2^2)\).
\(W_1(\mathcal{N}(\mu_1,\sigma_1^2),\ \mathcal{N}(\mu_2,\sigma_2^2))=\mathbb{E}\lvert Y\rvert,\quad Y\sim\mathcal{N}\!\left(\mu_1-\mu_2,\ (\sigma_1-\sigma_2)^2\right)\), which follows from the quantile coupling. In particular, \(W_1=\lvert\mu_1-\mu_2\rvert\) when \(\sigma_1=\sigma_2\), and \(W_1=\lvert\sigma_1-\sigma_2\rvert\sqrt{2/\pi}\) when \(\mu_1=\mu_2\); the simpler expression \(\lvert\mu_1-\mu_2\rvert+\lvert\sigma_1-\sigma_2\rvert\) is only an upper bound.
set.seed(277)
round(replicate(10, transport::wasserstein1d(rnorm(nS), rnorm(nS))), 3) # W1(two N(0,1)) = 0
## [1] 0.053 0.080 0.067 0.066 0.076 0.067 0.087 0.069 0.057 0.074
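As a quick check of the location-shift case (a sketch, not part of the original output), two large samples from \(\mathcal{N}(0,1)\) and \(\mathcal{N}(0.5,1)\) should give an empirical \(W_1\) close to \(\lvert\mu_1-\mu_2\rvert=0.5\).
set.seed(277)
transport::wasserstein1d(rnorm(1e5), rnorm(1e5, mean=0.5)) #close to 0.5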
\(X_1,\dotsb,X_N\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,1)\)
\(Y_1,\dotsb,Y_N\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,1)\)
\(Z_1,\dotsb,Z_N\overset{\text{i.i.d.}}{\sim}\mathcal{Q}(0,1)\), a non-normal distribution
Read Section 7 first.
\(f_X(x\mid\mu,\ \sigma,\ \nu,\ \xi)=\frac{1}{\sqrt{\nu}\ \text{B}(1/2,\ \nu/2)}\left(1+\frac{z^2}{\nu}\right)^{-\frac{\nu+1}{2}}\cdot a\cdot\sqrt{\frac{\nu}{\nu-2}}\cdot\frac{2}{(\xi+1/\xi)\ \sigma}\)
\(z=y\ \xi^{-\text{sign}(y)}\sqrt{\frac{\nu}{\nu-2}}\)
\(y=\frac{x-\mu}{\sigma}a+b\)
\(a^2=(1-M^2)(\xi^2+1/\xi^2)+2M^2-1\), \(\quad a>0\)
\(b=(\xi-1/\xi)M\)
\(M=\frac{2\sqrt{\nu-2}}{(\nu-1)\ \text{B}(1/2,\ \nu/2)}\)
\(\mu\): location parameter
\(\sigma>0\): scale parameter
\(\nu>2\): shape parameter (degrees of freedom)
\(\xi>0\): skewness parameter. \(\xi>1\) right-skewed; \(0<\xi<1\) left-skewed; \(\xi=1\) symmetric
Same population mean 0 and variance 1.
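The claim that these skew-\(t\) populations share mean 0 and variance 1 can be checked by numerically integrating the fGarch::dsstd density; this is a sketch under that assumption (not part of the original output), and numerical integration over an infinite range is only approximate.
integrate(function(x) x * fGarch::dsstd(x, mean=0, sd=1, nu=4, xi=20), -Inf, Inf) #approximately 0
integrate(function(x) x^2 * fGarch::dsstd(x, mean=0, sd=1, nu=4, xi=20), -Inf, Inf) #approximately 1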
NSim <- 500 #number of replicates
par(mfrow=c(1,2))
curve(fGarch::dsstd(x, mean=0, sd=1, nu=4, xi=20), -3, 3,
lty=2, lwd=2, col="red", xlab="x", ylab="Density", main="Probability Density Functions")
curve(fGarch::dsstd(x, mean=0, sd=1, nu=4, xi=1/20),
lty=3, lwd=2, col="blue", add=T)
curve(dnorm, lwd=2, add=T)
curve(fGarch::psstd(x, mean=0, sd=1, nu=4, xi=20), -3, 3,
lty=2, lwd=2, col="red", xlab="x", ylab="Probability", main="Cumulative Distribution Functions")
curve(fGarch::psstd(x, mean=0, sd=1, nu=4, xi=1/20),
lty=3, lwd=2, col="blue", add=T)
curve(pnorm, lwd=2, add=T)
legend("bottomright",c("Right-Skewed","Left-Skewed","Normal"),
col=c("red","blue","black"), lty=c(2,3,1), lwd=2, bty="n")
rSkew <- fGarch::rsstd(NSim, mean=0.5, sd=1, nu=4, xi=20)
lSkew <- 1-rSkew # axial symmetry (left-skewed)
par(mfrow=c(1,2))
hist(rSkew, breaks=6, prob=T, col="white",
yaxt="n", ylab="", xlab="Random Deviates", main="Right-Skewed")
hist(lSkew, breaks=6, prob=T, col="white",
yaxt="n", ylab="", xlab="Random Deviates", main="Mirroring")
BayesFactor::ttestBF(rSkew, mu=0.5)
## Bayes factor analysis
## --------------
## [1] Alt., r=0.707 : 0.1568436 ±0.14%
##
## Against denominator:
## Null, mu = 0.5
## ---
## Bayes factor type: BFoneSample, JZS
BayesFactor::ttestBF(lSkew, mu=0.5)
## Bayes factor analysis
## --------------
## [1] Alt., r=0.707 : 0.1568436 ±0.14%
##
## Against denominator:
## Null, mu = 0.5
## ---
## Bayes factor type: BFoneSample, JZS
| KL Divergence (nats) |  | Bayes Factor |  | P-Value |  |
|  |  | Misspecified | Correct | Misspecified | Correct |
| ES=0 | N=10 | 0.116 | 0.052 | 0.101 | 0.058 |
|  | N=20 | 0.031 | 0.045 | 0.037 | 0.049 |
|  | N=100 | 0.017 | 0.016 | 0.024 | 0.013 |
|  | N=1000 | 0.042 | 0.054 | 0.042 | 0.039 |
| ES=0.5 | N=10 | 0.063 | 0.034 | 0.114 | 0.023 |
|  | N=20 | 0.132 | NaN | 0.168 | 0.012 |
|  | N=100 | 0.134 | NaN | 0.122 | 0.011 |
|  | N=1000 | 0.017 | 0.045 | 0.075 | 0.029 |
viz(10, NSim, 0) #function is defined in Section 7
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.116 nats,
## while the overall KL divergence under correct specification is 0.052 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.064 and p-value = 0.257 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(20, NSim, 0)
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.031 nats,
## while the overall KL divergence under correct specification is 0.045 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.064 and p-value = 0.257 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(100, NSim, 0)
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.017 nats,
## while the overall KL divergence under correct specification is 0.016 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.042 and p-value = 0.77 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0)
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.042 nats,
## while the overall KL divergence under correct specification is 0.054 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.072 and p-value = 0.15 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(10, NSim, 0.5)
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.063 nats,
## while the overall KL divergence under correct specification is 0.034 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.1 and p-value = 0.013 under model misspecification,
## and D = 0.064 and p-value = 0.257 under correct specification.
viz(20, NSim, 0.5)
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.132 nats,
## while the overall KL divergence under correct specification is NaN nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.134 and p-value = 0 under model misspecification,
## and D = 0.068 and p-value = 0.198 under correct specification.
viz(100, NSim, 0.5)
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.134 nats,
## while the overall KL divergence under correct specification is NaN nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.146 and p-value = 0 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0.5)
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.017 nats,
## while the overall KL divergence under correct specification is 0.045 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.126 and p-value = 0.001 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(10, NSim, 0, plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.101 nats,
## while the overall KL divergence under correct specification is 0.058 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.045 under model misspecification,
## while the Hellinger distance is 0.023 under correct specification.
##
## The two-sample KS tests show D = 0.064 and p-value = 0.257 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(20, NSim, 0, plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.037 nats,
## while the overall KL divergence under correct specification is 0.049 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.037 under model misspecification,
## while the Hellinger distance is 0.043 under correct specification.
##
## The two-sample KS tests show D = 0.064 and p-value = 0.257 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(100, NSim, 0, plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.024 nats,
## while the overall KL divergence under correct specification is 0.013 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.031 under model misspecification,
## while the Hellinger distance is 0.045 under correct specification.
##
## The two-sample KS tests show D = 0.042 and p-value = 0.77 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0, plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.042 nats,
## while the overall KL divergence under correct specification is 0.039 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.05 under model misspecification,
## while the Hellinger distance is 0.029 under correct specification.
##
## The two-sample KS tests show D = 0.072 and p-value = 0.15 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(10, NSim, 0.5, plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.114 nats,
## while the overall KL divergence under correct specification is 0.023 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.1 and p-value = 0.013 under model misspecification,
## and D = 0.064 and p-value = 0.257 under correct specification.
viz(20, NSim, 0.5, plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.168 nats,
## while the overall KL divergence under correct specification is 0.012 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.134 and p-value = 0 under model misspecification,
## and D = 0.068 and p-value = 0.198 under correct specification.
viz(100, NSim, 0.5, plot=2, interact=F, xlim=c(0,.005))
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.122 nats,
## while the overall KL divergence under correct specification is 0.011 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.146 and p-value = 0 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0.5, plot=2, interact=F, xlim=c(0,1E-43))
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.075 nats,
## while the overall KL divergence under correct specification is 0.029 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.126 and p-value = 0.001 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(10, NSim, 0, plot=3)
viz(20, NSim, 0, plot=3)
viz(100, NSim, 0, plot=3)
viz(1000, NSim, 0, plot=3)
viz(10, NSim, 0.5, plot=3)
viz(20, NSim, 0.5, plot=3)
viz(100, NSim, 0.5, plot=3)
viz(1000, NSim, 0.5, plot=3)
\(X_1,\dotsb,X_N\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,1)\)
\(Y_1,\dotsb,Y_N\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,1)\)
\(Z_1,\dotsb,Z_N\overset{\text{i.i.d.}}{\sim}\mathcal{Q}(0,1)\), a non-normal distribution
Read Section 7 first.
Same population mean 0 and variance 1.
# mean = (a + b) / 2
# variance = (b - a)^2 / 12 ==>
# a = mean - sqrt(3 * variance)
# b = mean + sqrt(3 * variance)
curve(dunif(x, min=-sqrt(3), max=sqrt(3)), -3, 3,
lty=2, lwd=2, col="red", xlab="x", ylab="Density",
ylim=c(0, 0.42), main="Probability Density Functions")
curve(dnorm, lwd=2, add=T)
legend("topright",c("Violated","Normal"), col=c("red","black"), lty=c(2,1), lwd=2, bty="n")
| KL Divergence (nats) |  | Bayes Factor |  | P-Value |  |
|  |  | Misspecified | Correct | Misspecified | Correct |
| ES=0 | N=10 | 0.064 | 0.052 | 0.055 | 0.058 |
|  | N=20 | 0.034 | 0.045 | 0.033 | 0.049 |
|  | N=100 | 0.039 | 0.016 | 0.013 | 0.013 |
|  | N=1000 | 0.030 | 0.054 | 0.032 | 0.039 |
| ES=0.5 | N=10 | 0.040 | 0.034 | 0.047 | 0.023 |
|  | N=20 | NaN | NaN | 0.003 | 0.012 |
|  | N=100 | 0.021 | NaN | 0.025 | 0.011 |
|  | N=1000 | 0.011 | 0.045 | 0.016 | 0.029 |
viz(10, NSim, 0, dist="unif")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.064 nats,
## while the overall KL divergence under correct specification is 0.052 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.09 and p-value = 0.035 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(20, NSim, 0, dist="unif")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.034 nats,
## while the overall KL divergence under correct specification is 0.045 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.052 and p-value = 0.508 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(100, NSim, 0, dist="unif")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.039 nats,
## while the overall KL divergence under correct specification is 0.016 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.04 and p-value = 0.819 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0, dist="unif")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.03 nats,
## while the overall KL divergence under correct specification is 0.054 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.066 and p-value = 0.226 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(10, NSim, 0.5, dist="unif")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.04 nats,
## while the overall KL divergence under correct specification is 0.034 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.1 and p-value = 0.013 under model misspecification,
## and D = 0.064 and p-value = 0.257 under correct specification.
viz(20, NSim, 0.5, dist="unif")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is NaN nats,
## while the overall KL divergence under correct specification is NaN nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.106 and p-value = 0.007 under model misspecification,
## and D = 0.068 and p-value = 0.198 under correct specification.
viz(100, NSim, 0.5, dist="unif")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.021 nats,
## while the overall KL divergence under correct specification is NaN nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.074 and p-value = 0.129 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0.5, dist="unif")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.011 nats,
## while the overall KL divergence under correct specification is 0.045 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.056 and p-value = 0.413 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(10, NSim, 0, dist="unif", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.055 nats,
## while the overall KL divergence under correct specification is 0.058 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.062 under model misspecification,
## while the Hellinger distance is 0.023 under correct specification.
##
## The two-sample KS tests show D = 0.09 and p-value = 0.035 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(20, NSim, 0, dist="unif", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.033 nats,
## while the overall KL divergence under correct specification is 0.049 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.046 under model misspecification,
## while the Hellinger distance is 0.043 under correct specification.
##
## The two-sample KS tests show D = 0.052 and p-value = 0.508 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(100, NSim, 0, dist="unif", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.013 nats,
## while the overall KL divergence under correct specification is 0.013 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.021 under model misspecification,
## while the Hellinger distance is 0.045 under correct specification.
##
## The two-sample KS tests show D = 0.04 and p-value = 0.819 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0, dist="unif", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.032 nats,
## while the overall KL divergence under correct specification is 0.039 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.053 under model misspecification,
## while the Hellinger distance is 0.029 under correct specification.
##
## The two-sample KS tests show D = 0.066 and p-value = 0.226 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(10, NSim, 0.5, dist="unif", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.047 nats,
## while the overall KL divergence under correct specification is 0.023 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.1 and p-value = 0.013 under model misspecification,
## and D = 0.064 and p-value = 0.257 under correct specification.
viz(20, NSim, 0.5, dist="unif", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.003 nats,
## while the overall KL divergence under correct specification is 0.012 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.106 and p-value = 0.007 under model misspecification,
## and D = 0.068 and p-value = 0.198 under correct specification.
viz(100, NSim, 0.5, dist="unif", plot=2, interact=F, xlim=c(0,.005))
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.025 nats,
## while the overall KL divergence under correct specification is 0.011 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.074 and p-value = 0.129 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0.5, dist="unif", plot=2, interact=F, xlim=c(0,1E-43))
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.016 nats,
## while the overall KL divergence under correct specification is 0.029 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.056 and p-value = 0.413 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(10, NSim, 0, dist="unif", plot=3)
viz(20, NSim, 0, dist="unif", plot=3)
viz(100, NSim, 0, dist="unif", plot=3)
viz(1000, NSim, 0, dist="unif", plot=3)
viz(10, NSim, 0.5, dist="unif", plot=3)
viz(20, NSim, 0.5, dist="unif", plot=3)
viz(100, NSim, 0.5, dist="unif", plot=3)
viz(1000, NSim, 0.5, dist="unif", plot=3)
\(X_1,\dotsb,X_N\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,1)\)
\(Y_1,\dotsb,Y_N\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,1)\)
\(Z_1,\dotsb,Z_N\overset{\text{i.i.d.}}{\sim}\mathcal{Q}(0,1)\), a non-normal distribution
Read Section 7 first.
Same population mean 0 and variance 1.
sym_tri_pdf <- function(x, a, b) {
  #' PDF of the symmetric triangular distribution on [a, b], peaking at the midpoint
  m <- (a + b) / 2 #midpoint (mode)
  ifelse(x < a | x > b, 0, ((b - m) - abs(m - x)) / ((b - m)^2))
}
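A quick numerical check (a sketch, not part of the original) confirms that on \([-\sqrt{6},\sqrt{6}]\) this density integrates to 1 and, being symmetric around 0, has variance \((b-a)^2/24=1\), matching the normal benchmark.
integrate(sym_tri_pdf, -sqrt(6), sqrt(6), a=-sqrt(6), b=sqrt(6)) #total probability: 1
integrate(function(x) x^2 * sym_tri_pdf(x, -sqrt(6), sqrt(6)), -sqrt(6), sqrt(6)) #variance: 1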
curve(sym_tri_pdf(x, -sqrt(6), sqrt(6)), -3, 3, lty=2, lwd=2, col="red",
xlab="x", ylab="Density", main="Probability Density Functions")
curve(dnorm, lwd=2, add=T)
legend("topright",c("Violated","Normal"), col=c("red","black"), lty=c(2,1), lwd=2, bty="n")
| KL Divergence (nats) |  | Bayes Factor |  | P-Value |  |
|  |  | Misspecified | Correct | Misspecified | Correct |
| ES=0 | N=10 | 0.058 | 0.052 | 0.047 | 0.058 |
|  | N=20 | 0.036 | 0.045 | 0.037 | 0.049 |
|  | N=100 | 0.028 | 0.016 | 0.026 | 0.013 |
|  | N=1000 | 0.032 | 0.054 | 0.028 | 0.039 |
| ES=0.5 | N=10 | 0.051 | 0.034 | 0.050 | 0.023 |
|  | N=20 | NaN | NaN | NaN | 0.012 |
|  | N=100 | 0.007 | NaN | 0.010 | 0.011 |
|  | N=1000 | 0.013 | 0.045 | 0.027 | 0.029 |
viz(10, NSim, 0, dist="tri")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.058 nats,
## while the overall KL divergence under correct specification is 0.052 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.08 and p-value = 0.082 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(20, NSim, 0, dist="tri")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.036 nats,
## while the overall KL divergence under correct specification is 0.045 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.048 and p-value = 0.612 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(100, NSim, 0, dist="tri")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.028 nats,
## while the overall KL divergence under correct specification is 0.016 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.044 and p-value = 0.718 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0, dist="tri")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.032 nats,
## while the overall KL divergence under correct specification is 0.054 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.072 and p-value = 0.15 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(10, NSim, 0.5, dist="tri")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.051 nats,
## while the overall KL divergence under correct specification is 0.034 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.082 and p-value = 0.069 under model misspecification,
## and D = 0.064 and p-value = 0.257 under correct specification.
viz(20, NSim, 0.5, dist="tri")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is NaN nats,
## while the overall KL divergence under correct specification is NaN nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.082 and p-value = 0.069 under model misspecification,
## and D = 0.068 and p-value = 0.198 under correct specification.
viz(100, NSim, 0.5, dist="tri")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.007 nats,
## while the overall KL divergence under correct specification is NaN nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.074 and p-value = 0.129 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0.5, dist="tri")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.013 nats,
## while the overall KL divergence under correct specification is 0.045 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.058 and p-value = 0.37 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(10, NSim, 0, dist="tri", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.047 nats,
## while the overall KL divergence under correct specification is 0.058 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.057 under model misspecification,
## while the Hellinger distance is 0.023 under correct specification.
##
## The two-sample KS tests show D = 0.08 and p-value = 0.082 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(20, NSim, 0, dist="tri", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.037 nats,
## while the overall KL divergence under correct specification is 0.049 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.036 under model misspecification,
## while the Hellinger distance is 0.043 under correct specification.
##
## The two-sample KS tests show D = 0.048 and p-value = 0.612 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(100, NSim, 0, dist="tri", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.026 nats,
## while the overall KL divergence under correct specification is 0.013 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.021 under model misspecification,
## while the Hellinger distance is 0.045 under correct specification.
##
## The two-sample KS tests show D = 0.044 and p-value = 0.718 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0, dist="tri", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.028 nats,
## while the overall KL divergence under correct specification is 0.039 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.049 under model misspecification,
## while the Hellinger distance is 0.029 under correct specification.
##
## The two-sample KS tests show D = 0.072 and p-value = 0.15 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(10, NSim, 0.5, dist="tri", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.05 nats,
## while the overall KL divergence under correct specification is 0.023 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.082 and p-value = 0.069 under model misspecification,
## and D = 0.064 and p-value = 0.257 under correct specification.
viz(20, NSim, 0.5, dist="tri", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is NaN nats,
## while the overall KL divergence under correct specification is 0.012 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.082 and p-value = 0.069 under model misspecification,
## and D = 0.068 and p-value = 0.198 under correct specification.
viz(100, NSim, 0.5, dist="tri", plot=2, interact=F, xlim=c(0,.005))
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.01 nats,
## while the overall KL divergence under correct specification is 0.011 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.074 and p-value = 0.129 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0.5, dist="tri", plot=2, interact=F, xlim=c(0,1E-43))
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.027 nats,
## while the overall KL divergence under correct specification is 0.029 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.058 and p-value = 0.37 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(10, NSim, 0, dist="tri", plot=3)
viz(20, NSim, 0, dist="tri", plot=3)
viz(100, NSim, 0, dist="tri", plot=3)
viz(1000, NSim, 0, dist="tri", plot=3)
viz(10, NSim, 0.5, dist="tri", plot=3)
viz(20, NSim, 0.5, dist="tri", plot=3)
viz(100, NSim, 0.5, dist="tri", plot=3)
viz(1000, NSim, 0.5, dist="tri", plot=3)
\(X_1,\dotsb,X_N\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,1)\)
\(Y_1,\dotsb,Y_N\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,1)\)
\(Z_1,\dotsb,Z_N\overset{\text{i.i.d.}}{\sim}\mathcal{Q}(0,1)\), a non-normal distribution
Read Section 7 first.
\(f(x;\mu_1,\mu_2,\sigma_1^2,\sigma_2^2,w)=(1-w)\cdot\mathcal{N}(x;\mu_1,\sigma_1^2)+w\cdot\mathcal{N}(x;\mu_2,\sigma_2^2)\)
Same population mean 0 and variance 1. \(\mu_0=0\) and \(\sigma_0^2=1\).
The mean is \(\quad(1-w)\mu_1+w\mu_2:=\mu_0\).
The variance is \(\quad(1-w)\sigma_1^2+w\sigma_2^2+w(1-w)(\mu_1-\mu_2)^2:=\sigma_0^2\).
Let \(\quad\mu_1=\mu_0-\frac{\mu}{1-w}\), \(\quad\mu_2=\mu_0+\frac{\mu}{w}\), \(\quad\)and \(\sigma_1=\sigma_2=\sigma\).
Then, obtain \(\quad\sigma^2+\frac{\mu^2}{w(1-w)}=\sigma_0^2\).
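The constraint can be verified numerically; the sketch below (illustrative values of \(w\) and \(\mu\), not from the text) recovers mean 0 and variance 1 from the mixture moment formulas above.
w <- 0.5; mu <- 0.3 #illustrative choices
sigma <- sqrt(1 - mu^2 / (w * (1 - w))) #solve sigma^2 + mu^2/(w(1-w)) = 1
mu1 <- 0 - mu / (1 - w); mu2 <- 0 + mu / w
(1 - w) * mu1 + w * mu2 #mixture mean: 0
(1 - w) * sigma^2 + w * sigma^2 + w * (1 - w) * (mu1 - mu2)^2 #mixture variance: 1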
t <- seq(0, 2*pi, length.out=100)
horiz <- cos(t); vert <- 0.5*sin(t)
plot(horiz, vert, type="l", asp=1, col="blue", xaxt="n", yaxt="n",
xlab=expression(italic(σ)), ylab=expression(italic(μ)), main="A Standard Ellipse")
abline(h=0, v=0, col="gray", lty=2)
text(1, 0, expression(italic(σ)[0]), cex=1.5, col="red")
text(-1, 0, expression(-italic(σ)[0]), cex=1.5, col="red")
text(0, 0.5, expression(italic(σ)[0]*sqrt(italic(w)*(italic(1-w)))), cex=1.5, col="red")
text(0, -0.5, expression(-italic(σ)[0]*sqrt(italic(w)*(italic(1-w)))), cex=1.5, col="red")
text(0, 0, expression(italic(σ)^2 + frac(italic(μ)^2, italic(w)*(italic(1-w))) == italic(σ)[0]^2), col="blue")
m1 <- 0.9; sd1 <- sqrt(1-m1^2) #sd1^2 + m1^2 = 1, given w = .5 (symmetry)
# LaplacesDemon::dnormm(x, p=rep(.5,2), mu=c(-m1,m1), sigma=rep(sd1,2)) #same
curve(EnvStats::dnormMix(x, mean1=-m1, mean2=m1, sd1=sd1, sd2=sd1), -3, 3,
lty=2, lwd=2, col="red", xlab="x", ylab="Density", main="Probability Density Functions")
curve(dnorm, lwd=2, add=T)
legend("topright",c("Violated","Normal"), col=c("red","black"), lty=c(2,1), lwd=2, bty="n")
| KS Statistic |  | Bayes Factor |  | P-Value |  |
|  |  | Misspecified | Correct | Misspecified | Correct |
| ES=0 | N=10 | 0.058 | 0.036 | 0.058 | 0.036 |
|  | N=20 | 0.042 | 0.046 | 0.042 | 0.046 |
|  | N=100 | 0.038 | 0.056 | 0.038 | 0.056 |
|  | N=1000 | 0.056 | 0.046 | 0.056 | 0.046 |
| ES=0.5 | N=10 | 0.092 | 0.064 | 0.092 | 0.064 |
|  | N=20 | 0.082 | 0.068 | 0.082 | 0.068 |
|  | N=100 | 0.032 | 0.056 | 0.032 | 0.056 |
|  | N=1000 | 0.040 | 0.036 | 0.040 | 0.036 |
viz(10, NSim, 0, dist="bim")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.069 nats,
## while the overall KL divergence under correct specification is 0.052 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.058 and p-value = 0.37 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(20, NSim, 0, dist="bim")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.065 nats,
## while the overall KL divergence under correct specification is 0.045 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.042 and p-value = 0.77 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(100, NSim, 0, dist="bim")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.029 nats,
## while the overall KL divergence under correct specification is 0.016 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.038 and p-value = 0.863 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0, dist="bim")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.043 nats,
## while the overall KL divergence under correct specification is 0.054 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.056 and p-value = 0.413 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(10, NSim, 0.5, dist="bim")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.038 nats,
## while the overall KL divergence under correct specification is 0.034 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.092 and p-value = 0.029 under model misspecification,
## and D = 0.064 and p-value = 0.257 under correct specification.
viz(20, NSim, 0.5, dist="bim")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.001 nats,
## while the overall KL divergence under correct specification is NaN nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.082 and p-value = 0.069 under model misspecification,
## and D = 0.068 and p-value = 0.198 under correct specification.
viz(100, NSim, 0.5, dist="bim")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is NaN nats,
## while the overall KL divergence under correct specification is NaN nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.032 and p-value = 0.96 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0.5, dist="bim")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.039 nats,
## while the overall KL divergence under correct specification is 0.045 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.04 and p-value = 0.819 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(10, NSim, 0, dist="bim", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.062 nats,
## while the overall KL divergence under correct specification is 0.058 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.046 under model misspecification,
## while the Hellinger distance is 0.023 under correct specification.
##
## The two-sample KS tests show D = 0.058 and p-value = 0.37 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(20, NSim, 0, dist="bim", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.064 nats,
## while the overall KL divergence under correct specification is 0.049 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.03 under model misspecification,
## while the Hellinger distance is 0.043 under correct specification.
##
## The two-sample KS tests show D = 0.042 and p-value = 0.77 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(100, NSim, 0, dist="bim", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.021 nats,
## while the overall KL divergence under correct specification is 0.013 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.023 under model misspecification,
## while the Hellinger distance is 0.045 under correct specification.
##
## The two-sample KS tests show D = 0.038 and p-value = 0.863 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0, dist="bim", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.044 nats,
## while the overall KL divergence under correct specification is 0.039 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.041 under model misspecification,
## while the Hellinger distance is 0.029 under correct specification.
##
## The two-sample KS tests show D = 0.056 and p-value = 0.413 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(10, NSim, 0.5, dist="bim", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.039 nats,
## while the overall KL divergence under correct specification is 0.023 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.092 and p-value = 0.029 under model misspecification,
## and D = 0.064 and p-value = 0.257 under correct specification.
viz(20, NSim, 0.5, dist="bim", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.002 nats,
## while the overall KL divergence under correct specification is 0.012 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.082 and p-value = 0.069 under model misspecification,
## and D = 0.068 and p-value = 0.198 under correct specification.
viz(100, NSim, 0.5, dist="bim", plot=2, interact=F, xlim=c(0,.005))
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.004 nats,
## while the overall KL divergence under correct specification is 0.011 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.032 and p-value = 0.96 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0.5, dist="bim", plot=2, interact=F, xlim=c(0,1E-43))
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.04 nats,
## while the overall KL divergence under correct specification is 0.029 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.04 and p-value = 0.819 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(10, NSim, 0, dist="bim", plot=3)
viz(20, NSim, 0, dist="bim", plot=3)
viz(100, NSim, 0, dist="bim", plot=3)
viz(1000, NSim, 0, dist="bim", plot=3)
viz(10, NSim, 0.5, dist="bim", plot=3)
viz(20, NSim, 0.5, dist="bim", plot=3)
viz(100, NSim, 0.5, dist="bim", plot=3)
viz(1000, NSim, 0.5, dist="bim", plot=3)
p <- 1/3; m1 <- -0.45/(1-p); m2 <- 0.45/p; sd1 <- sqrt(1+m1*m2) #sd1^2 + 0.45^2/(p*(1-p)) = 1, given w = 1/3
curve(EnvStats::dnormMix(x, mean1=m1, mean2=m2, sd1=sd1, sd2=sd1, p.mix=p), -3, 3,
lty=2, lwd=2, col="red", xlab="x", ylab="Density", main="Probability Density Functions")
curve(dnorm, lwd=2, add=T)
legend("topright",c("Violated","Normal"), col=c("red","black"), lty=c(2,1), lwd=2, bty="n")
| KS Statistic |  | Bayes Factor |  | P-Value |  |
|  |  | Misspecified | Correct | Misspecified | Correct |
| ES=0 | N=10 | 0.040 | 0.036 | 0.040 | 0.036 |
|  | N=20 | 0.066 | 0.046 | 0.066 | 0.046 |
|  | N=100 | 0.032 | 0.056 | 0.032 | 0.056 |
|  | N=1000 | 0.040 | 0.046 | 0.040 | 0.046 |
| ES=0.5 | N=10 | 0.116 | 0.064 | 0.116 | 0.064 |
|  | N=20 | 0.110 | 0.068 | 0.110 | 0.068 |
|  | N=100 | 0.104 | 0.056 | 0.104 | 0.056 |
|  | N=1000 | 0.082 | 0.036 | 0.082 | 0.036 |
viz(10, NSim, 0, dist="bim", p.mix=1/3)
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.087 nats,
## while the overall KL divergence under correct specification is 0.052 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.04 and p-value = 0.819 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(20, NSim, 0, dist="bim", p.mix=1/3)
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.025 nats,
## while the overall KL divergence under correct specification is 0.045 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.066 and p-value = 0.226 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(100, NSim, 0, dist="bim", p.mix=1/3)
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.055 nats,
## while the overall KL divergence under correct specification is 0.016 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.032 and p-value = 0.96 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0, dist="bim", p.mix=1/3)
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.018 nats,
## while the overall KL divergence under correct specification is 0.054 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.04 and p-value = 0.819 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(10, NSim, 0.5, dist="bim", p.mix=1/3)
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.03 nats,
## while the overall KL divergence under correct specification is 0.034 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.116 and p-value = 0.002 under model misspecification,
## and D = 0.064 and p-value = 0.257 under correct specification.
viz(20, NSim, 0.5, dist="bim", p.mix=1/3)
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.035 nats,
## while the overall KL divergence under correct specification is NaN nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.11 and p-value = 0.005 under model misspecification,
## and D = 0.068 and p-value = 0.198 under correct specification.
viz(100, NSim, 0.5, dist="bim", p.mix=1/3)
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.014 nats,
## while the overall KL divergence under correct specification is NaN nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.104 and p-value = 0.009 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0.5, dist="bim", p.mix=1/3)
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.091 nats,
## while the overall KL divergence under correct specification is 0.045 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.082 and p-value = 0.069 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(10, NSim, 0, dist="bim", p.mix=1/3, plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.063 nats,
## while the overall KL divergence under correct specification is 0.058 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.037 under model misspecification,
## while the Hellinger distance is 0.023 under correct specification.
##
## The two-sample KS tests show D = 0.04 and p-value = 0.819 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(20, NSim, 0, dist="bim", p.mix=1/3, plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.032 nats,
## while the overall KL divergence under correct specification is 0.049 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.034 under model misspecification,
## while the Hellinger distance is 0.043 under correct specification.
##
## The two-sample KS tests show D = 0.066 and p-value = 0.226 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(100, NSim, 0, dist="bim", p.mix=1/3, plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.099 nats,
## while the overall KL divergence under correct specification is 0.013 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.022 under model misspecification,
## while the Hellinger distance is 0.045 under correct specification.
##
## The two-sample KS tests show D = 0.032 and p-value = 0.96 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0, dist="bim", p.mix=1/3, plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.01 nats,
## while the overall KL divergence under correct specification is 0.039 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.033 under model misspecification,
## while the Hellinger distance is 0.029 under correct specification.
##
## The two-sample KS tests show D = 0.04 and p-value = 0.819 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(10, NSim, 0.5, dist="bim", p.mix=1/3, plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.071 nats,
## while the overall KL divergence under correct specification is 0.023 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.116 and p-value = 0.002 under model misspecification,
## and D = 0.064 and p-value = 0.257 under correct specification.
viz(20, NSim, 0.5, dist="bim", p.mix=1/3, plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.035 nats,
## while the overall KL divergence under correct specification is 0.012 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.11 and p-value = 0.005 under model misspecification,
## and D = 0.068 and p-value = 0.198 under correct specification.
viz(100, NSim, 0.5, dist="bim", p.mix=1/3, plot=2, interact=F, xlim=c(0,.005))
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.031 nats,
## while the overall KL divergence under correct specification is 0.011 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.104 and p-value = 0.009 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0.5, dist="bim", p.mix=1/3, plot=2, interact=F, xlim=c(0,1E-43))
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.144 nats,
## while the overall KL divergence under correct specification is 0.029 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.082 and p-value = 0.069 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(10, NSim, 0, dist="bim", p.mix=1/3, plot=3)
viz(20, NSim, 0, dist="bim", p.mix=1/3, plot=3)
viz(100, NSim, 0, dist="bim", p.mix=1/3, plot=3)
viz(1000, NSim, 0, dist="bim", p.mix=1/3, plot=3)
viz(10, NSim, 0.5, dist="bim", p.mix=1/3, plot=3)
viz(20, NSim, 0.5, dist="bim", p.mix=1/3, plot=3)
viz(100, NSim, 0.5, dist="bim", p.mix=1/3, plot=3)
viz(1000, NSim, 0.5, dist="bim", p.mix=1/3, plot=3)
\(X_1,\dotsb,X_N\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,1)\)
\(Y_1,\dotsb,Y_N\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,1)\)
\(Z_1,\dotsb,Z_N\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,100^2)\)
Read Section 7 first.
| KL Divergence (nats) | | Bayes Factor (SD=100) | Bayes Factor (SD=1) | P-Value (SD=100) | P-Value (SD=1) |
|---|---|---|---|---|---|
| ES=0 | N=10 | 0.06 | 0.052 | 0.058 | 0.058 |
| | N=20 | 0.037 | 0.045 | 0.036 | 0.049 |
| | N=100 | 0.022 | 0.016 | 0.01 | 0.013 |
| | N=1000 | 0.043 | 0.054 | 0.051 | 0.039 |
| ES=0.5 | N=10 | 0.417 | 0.034 | 0.575 | 0.023 |
| | N=20 | 1.226 | NaN | 1.73 | 0.012 |
| | N=100 | 2.658 | NaN | 10.933 | 0.011 |
| | N=1000 | 1.591 | 0.045 | 111.771 | 0.029 |
viz(10, NSim, 0, dist="100sd")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.06 nats,
## while the overall KL divergence under correct specification is 0.052 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.058 and p-value = 0.37 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(20, NSim, 0, dist="100sd")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.037 nats,
## while the overall KL divergence under correct specification is 0.045 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.08 and p-value = 0.082 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(100, NSim, 0, dist="100sd")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.022 nats,
## while the overall KL divergence under correct specification is 0.016 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.048 and p-value = 0.612 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0, dist="100sd")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.043 nats,
## while the overall KL divergence under correct specification is 0.054 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.034 and p-value = 0.935 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(10, NSim, 0.5, dist="100sd")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 0.417 nats,
## while the overall KL divergence under correct specification is 0.034 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.432 and p-value = 0 under model misspecification,
## and D = 0.064 and p-value = 0.257 under correct specification.
viz(20, NSim, 0.5, dist="100sd")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 1.226 nats,
## while the overall KL divergence under correct specification is NaN nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.662 and p-value = 0 under model misspecification,
## and D = 0.068 and p-value = 0.198 under correct specification.
viz(100, NSim, 0.5, dist="100sd")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 2.658 nats,
## while the overall KL divergence under correct specification is NaN nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.978 and p-value = 0 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0.5, dist="100sd")
## The overall KL divergence between the distributions of the Bayes factors
## under model misspecification is 1.591 nats,
## while the overall KL divergence under correct specification is 0.045 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 1 and p-value = 0 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(10, NSim, 0, dist="100sd", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.058 nats,
## while the overall KL divergence under correct specification is 0.058 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.04 under model misspecification,
## while the Hellinger distance is 0.023 under correct specification.
##
## The two-sample KS tests show D = 0.058 and p-value = 0.37 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(20, NSim, 0, dist="100sd", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.036 nats,
## while the overall KL divergence under correct specification is 0.049 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.058 under model misspecification,
## while the Hellinger distance is 0.043 under correct specification.
##
## The two-sample KS tests show D = 0.08 and p-value = 0.082 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(100, NSim, 0, dist="100sd", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.01 nats,
## while the overall KL divergence under correct specification is 0.013 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.036 under model misspecification,
## while the Hellinger distance is 0.045 under correct specification.
##
## The two-sample KS tests show D = 0.048 and p-value = 0.612 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0, dist="100sd", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.051 nats,
## while the overall KL divergence under correct specification is 0.039 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The Hellinger distance between the distributions of p-values
## is 0.027 under model misspecification,
## while the Hellinger distance is 0.029 under correct specification.
##
## The two-sample KS tests show D = 0.034 and p-value = 0.935 under model misspecification,
## and D = 0.046 and p-value = 0.665 under correct specification.
viz(10, NSim, 0.5, dist="100sd", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 0.575 nats,
## while the overall KL divergence under correct specification is 0.023 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.432 and p-value = 0 under model misspecification,
## and D = 0.064 and p-value = 0.257 under correct specification.
viz(20, NSim, 0.5, dist="100sd", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 1.73 nats,
## while the overall KL divergence under correct specification is 0.012 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.662 and p-value = 0 under model misspecification,
## and D = 0.068 and p-value = 0.198 under correct specification.
viz(100, NSim, 0.5, dist="100sd", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 10.933 nats,
## while the overall KL divergence under correct specification is 0.011 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 0.978 and p-value = 0 under model misspecification,
## and D = 0.056 and p-value = 0.413 under correct specification.
viz(1000, NSim, 0.5, dist="100sd", plot=2)
## The overall KL divergence between the distributions of p-values
## under model misspecification is 111.771 nats,
## while the overall KL divergence under correct specification is 0.029 nats,
## given 10 nearest neighbors and 500 replicates.
##
## The two-sample KS tests show D = 1 and p-value = 0 under model misspecification,
## and D = 0.036 and p-value = 0.902 under correct specification.
viz(10, NSim, 0, dist="100sd", plot=3)
viz(20, NSim, 0, dist="100sd", plot=3)
viz(100, NSim, 0, dist="100sd", plot=3)
viz(1000, NSim, 0, dist="100sd", plot=3)
viz(10, NSim, 0.5, dist="100sd", plot=3)
viz(20, NSim, 0.5, dist="100sd", plot=3)
viz(100, NSim, 0.5, dist="100sd", plot=3)
viz(1000, NSim, 0.5, dist="100sd", plot=3)
The “BayesFactor” R package and JASP
\[\begin{equation} \tag{1} y_i\overset{\text{i.i.d.}}{\sim}\mathcal{N}(\sigma_\epsilon \delta,\ \sigma_\epsilon^2)\quad\text{for }i=1,\dotsb,N,\text{ where }\delta=\mu\ /\ \sigma_\epsilon \end{equation}\]
\[\begin{equation} \tag{2} \pi(\sigma_\epsilon^2)\propto\sigma_\epsilon^{-2} \end{equation}\]
\[\begin{equation} \tag{3} \delta\sim\text{Cauchy}(0,\ h)=\frac{h}{(\delta^2+h^2)\pi} \end{equation}\]
The Jeffreys-Zellner-Siow (JZS) Bayes factor for the Bayesian \(t\)-test, \(\mathcal{H}_0:\ \delta=0\) versus \(\mathcal{H}_1:\ \delta\neq 0\), is computed by integrating across one dimension in Eq. 4 (Ly et al., 2016, p. 24; Rouder et al., 2012, p. 360; Rouder et al., 2009, p. 237; Wei, Nathoo, & Masson, 2023, p. 1762; see also the balanced one-way between-subjects ANOVA in Morey et al., 2011, p. 374), using Gaussian quadrature.
\[\begin{equation} \tag{4} \textit{JZS-BF}_{10}=(2\pi)^{-\frac{1}{2}}h\left(1+\frac{t^2}{\nu}\right)^{\frac{\nu+1}{2}}\int_0^\infty(1+Ng)^{-\frac{1}{2}}\left(1+\frac{t^2}{(1+Ng)\nu}\right)^{-\frac{\nu+1}{2}}g^{-\frac{3}{2}}e^{-\frac{h^2}{2g}}\text{d}g, \end{equation}\]
where the default scale \(h=\sqrt{2}/2\), the degrees of freedom \(\nu=N-1\), \(\quad t=\bar{y}\sqrt{N}/s_y\), \(\quad\bar{y}\) is the sample mean, and \(s_y\) is the sample standard deviation in a one-sample \(t\)-test.
Unlike the anovaBF function, the ttestBF function of the “BayesFactor” R package (v 0.9.12-4.7) evaluates Eq. 4 by numerical integration, so it yields negligible approximation error and does not require specifying the number of Markov chain Monte Carlo iterations.
JZSBF10 <- function(h=sqrt(2)/2, t, N) {
#' Input -
#' h: the scale parameter of the Cauchy prior
#' t: the observed t test statistic
#' N: the sample size
#' Output - the JZS Bayes factor for a one-sample t-test
nu <- N - 1
coef <- (2*pi)^(-0.5) * h * (1+t^2/nu)^((nu+1)/2)
integrand <- function(g) {
(1+N*g)^(-0.5) * (1+t^2/((1+N*g)*nu))^(-(nu+1)/2) * g^(-1.5) * exp(-h^2/(2*g))
}
coef * integrate(integrand, lower=0, upper=Inf)$value
}
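As a quick sanity check (not part of the original analysis; the seed and data are arbitrary), the closed-form integral above should agree with BayesFactor::ttestBF, which uses the same JZS prior with the default scale:

set.seed(277)
y <- rnorm(25, 0.3) #an arbitrary one-sample data set
t_stat <- unname(t.test(y)$statistic) #observed t statistic
JZSBF10(t=t_stat, N=length(y)) #Eq. 4 with the default scale h
BayesFactor::extractBF(BayesFactor::ttestBF(y))$bf #should match up to numerical error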
N <- 20 #sample size
t_obs <- seq(0,5,by=0.2) #observed t test statistics
NSim <- length(t_obs)
pVal <- BF10 <- numeric(NSim)
for (i in 1:NSim) {
pVal[i] <- pt(t_obs[i], df=N-1, lower.tail=F)*2
BF10[i] <- JZSBF10(t=t_obs[i], N=N)
}
par(mfrow=c(1,2))
plot(pVal, 1/BF10,
main="The Theoretical Relationship",
sub=bquote(italic(N) == .(N)),
xlab=expression(italic("P")~"-Value"),
ylab=expression(italic("BF")["01"])) #closely match the empirical results
abline(v=.05, col="red")
abline(h=c(1/3,3), lty=2)
plot(t_obs, BF10,
main="The Theoretical Relationship",
sub=bquote(italic(N) == .(N)),
xlab=expression("|"~italic("t")~"|-Statistic"),
ylab=expression(italic("BF")[10]))
\[ \begin{align*} &\underset{N\to\infty}{\mathrm{plim}}\ \!\textit{JZS-BF}_{10}\left(t(N),\ N\right)= \begin{cases} 0, & \text{if $\delta=0$} \\ \infty, & \text{if $\delta\neq0$} \end{cases} \\ \\ &\lim_{t\to\infty}\textit{JZS-BF}_{10}\left(t(N),\ N\right)=\infty,\quad\forall N \end{align*} \]
The resulting \(\textit{JZS-BF}_{10}\) avoids two critical issues:

- Bartlett’s paradox: with an arbitrarily diffuse prior on the effect, the Bayes factor is driven toward zero, favoring \(\mathcal{H}_0\) regardless of the data.
- Information paradox: the Bayes factor remains bounded even when the evidence against \(\mathcal{H}_0\) is overwhelming (e.g., \(t\to\infty\) with \(N\) fixed).
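A small numerical sketch (reusing the JZSBF10 helper above; the particular t and N values are arbitrary) illustrates both limits: with t fixed at a value typical under \(\mathcal{H}_0\), the Bayes factor shrinks toward zero as \(N\) grows, while with \(N\) fixed it increases without bound as \(|t|\) grows.

round(sapply(c(10, 100, 1000, 10000), function(N) JZSBF10(t=1, N=N)), 3) #fixed t=1: BF10 -> 0 as N grows
sapply(c(2, 4, 6, 8), function(t) JZSBF10(t=t, N=20)) #fixed N=20: BF10 keeps growing with |t|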
Approximate Bayes Factor
The test statistic \(W\) in the Wald test is \(W=\left(\frac{\hat{\mu}-\mu_0}{\text{se}(\hat{\mu})}\right)^2\sim\chi_1^2\) under \(\mathcal{H}_0\), where \(\hat{\mu}\) is the maximum likelihood estimate, and \(\text{se}(\hat{\mu})\) is the standard error. Therefore, the \(p\)-value is \(p=\mathbb{P}(\chi_1^2\geqslant W)\), so \(W\) and the \(p\)-value are in one-to-one correspondence.
W <- t_obs^2 #Wald statistic of the one-sample t-test, reusing t_obs from above
p <- pchisq(W, 1, lower.tail=F) #W and p-value are one to one
qchisq(p, 1, lower.tail=F) #inverse CDF of chisq(1): 1-p → W → JAB
In a one-sample \(t\)-test, \(\mu_0=0\), \(\quad\hat{\mu}=\bar{y}\), \(\quad\text{se}(\hat{\mu})=s_y/\sqrt{N}\), \(\quad\) and \(W=t^2\).
Wagenmakers (2022; Eq. 6) reviewed the Jeffreys-style approximate Bayes factor (JAB) and proposed a piecewise approximation based only on the \(p\)-value and the sample size (Eq. 7).
\[\begin{equation} \tag{5} \textit{BF}_{01}=\frac{p(\boldsymbol{y}\mid\mathcal{H}_0)}{p(\boldsymbol{y}\mid\mathcal{H}_1)}=\frac{p(\boldsymbol{y}\mid\mu_0)}{\int p(\boldsymbol{y}\mid\mu)\cdot g(\mu)\text{d}\mu} \end{equation}\]
Using the normal unit-information prior for \(\mu\), \[\begin{equation} \tag{6} \textit{JAB}_{01}=\sqrt{N}\exp\left\{-\frac{1}{2}W\right\} \end{equation}\]
\[ \textit{JAB}_{01}\approx \begin{cases} \tag{7} \text{$\ p^{\frac{1}{4}}\sqrt{N}$,} & \text{$p>0.5$} \\ \text{$\ \sqrt{pN}$,} & \text{$0.1<p\leqslant0.5$} \\ \text{$\ 3p\sqrt{N}$,} & \text{$p\leqslant0.1$} \end{cases} \]
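The exact form in Eq. 6 and the piecewise rule in Eq. 7 can be compared directly. The sketch below (the function names JAB01 and JAB01_approx are ours, not from the cited paper) recovers \(W\) from the \(p\)-value via the inverse \(\chi_1^2\) CDF:

JAB01 <- function(p, N) sqrt(N) * exp(-qchisq(p, 1, lower.tail=FALSE)/2) #Eq. 6 with W recovered from p
JAB01_approx <- function(p, N) { #Eq. 7, piecewise in p
  ifelse(p > 0.5, p^(1/4)*sqrt(N),
         ifelse(p > 0.1, sqrt(p*N), 3*p*sqrt(N)))
}
p <- c(0.8, 0.3, 0.04); N <- 50
round(rbind(exact=JAB01(p, N), approx=JAB01_approx(p, N)), 3) #the two agree reasonably well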
Frequentist 101: Why are p-values uniformly distributed under the null hypothesis?
Left-sided \(p\)-value \(:=\mathbb{P}(T\leqslant t\mid\mathcal{H}_0)=F_T(t)\).
Given the PDF of the test statistic \(T\overset{\mathcal{H}_0}{\sim}f_T(t)\) and its continuous, strictly increasing CDF \(F_T(t)\),
the CDF of the \(p\)-value under \(\mathcal{H}_0\) is
\[\begin{align*} F_P(p):=&\mathbb{P}\left(P\leqslant p\right) \\ =&\mathbb{P}\left(F_T(T)\leqslant p\right) \\ =&\mathbb{P}\left(F_T^{-1}(F_T(T))\leqslant F_T^{-1}(p)\right) \\ =&\mathbb{P}\left(T\leqslant F_T^{-1}(p)\right) \\ =&F_T(F_T^{-1}(p)) \\ =&p \end{align*}\]
indicating that \(P\overset{\mathcal{H}_0}{\sim}U[0,1]\).
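A quick empirical check (a sketch with arbitrary settings, not from the original simulations): draw repeated null samples, compute the one-sample \(t\)-test \(p\)-values, and compare them with \(U[0,1]\).

set.seed(277)
p_null <- replicate(5000, t.test(rnorm(20))$p.value) #p-values under the null
hist(p_null, breaks=20, freq=FALSE, main="P-Values Under the Null", xlab="p-value")
abline(h=1, col="red", lwd=2) #density of U[0,1]
ks.test(p_null, "punif") #one-sample KS test against U[0,1]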
In each simulation run, we generate ① a sample of N observations from a normal model, ② another sample of N observations from the same normal model, and ③ a sample of N observations from a model that is not normal (such as a skewed-\(t\), uniform, triangular, or bimodal distribution) but has the same population mean and variance. The parameters that vary are the sample size N and the effect size ES.
For each data set, we calculate the Bayes factor and \(p\)-value of the one-sample \(t\)-test. We replicate the process
NSim=500 times. Consequently, we get NSim=500
pairs of Bayes factors and \(p\)-values
for each model scenario, allowing us to create boxplots and empirical
cumulative distribution functions (ECDFs).
We aim to compare the divergence between scenarios ③ and ①, using ② and ① as references.
When \(N\) is sufficiently large, the comparisons are summarized below for each non-normal distribution (“=” means the distribution under the misspecified model matches the distribution under the correct model; “Diverge” means it does not):

| | | Skewed t | Uniform | Triangular | Bimodal | Asymmetric | 100 SD |
|---|---|---|---|---|---|---|---|
| Bayes Factor | ES=0 | = | = | = | = | = | = |
| | ES=0.5 | Diverge | = | = | = | Diverge | Diverge |
| P-Value | ES=0 | = | = | = | = | = | = |
| | ES=0.5 | Diverge | = | = | = | Diverge | Diverge |
Model misspecification here means that the data are not normally distributed; its impact on the one-sample \(t\)-test is most pronounced for small sample sizes.
The normality assumption of the one-sample \(t\)-test is less critical when the sample size is large due to the Central Limit Theorem, which states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the distribution of the population.
viz <- function(N, NSim, ES, SD=1, nu=4, sk=20, dist="skew", k=10, p.mix=.5, plot=1, interact=T, ...) {
#' Input -
#' N: number of subjects
#' NSim: number of replicates
#' ES: effect size
#' SD: error standard deviation
#' nu: shape parameter (degrees of freedom) of the skewed t
#' sk: skewness parameter of the skewed t
#' dist: distribution that violates the normality,
#' but has the same population mean and variance
#' "skew" (default), a skewed t
#' "unif", a continuous uniform
#' "tri", a symmetric triangular
#' "bim", a bimodal (symmetric when p.mix=0.5)
#' "100sd", still a normal but 100 times the error standard deviation
#' k: maximum number of nearest neighbors used in the KL divergence
#' p.mix: mixing weight of the second component (the second mode)
#' plot: visualizations
#' 1 (default), distributions of the Bayes factors
#' 2, distributions of p-values
#' 3, scatter plot of the p-values and the Bayes factors
#' interact: TRUE (default), interactive; FALSE, base ECDF
#' Dependencies - BayesFactor, EnvStats, fGarch, FNN, ggplot2, plotly, statip
#' Output - visualizations
computeBF <- function(data) {
#' Input -
#' data: one sample
#' Output - the Bayes factor
set.seed(277)
bf <- BayesFactor::ttestBF(data)
if (BayesFactor::extractBF(bf)$error < 0.01) { #negligible Monte Carlo errors
BayesFactor::extractBF(bf)$bf
} else {return(NA)}
}
BF <- BF_rep <- BF_vio <- pVal <- pVal_rep <- pVal_vio <- numeric(NSim)
for (i in 1:NSim) {
set.seed(i)
y0 <- rnorm(N, ES, SD)
pVal[i] <- t.test(y0)$p.value
BF[i] <- computeBF(y0)
set.seed(i-NSim)
y0_rep <- rnorm(N, ES, SD)
pVal_rep[i] <- t.test(y0_rep)$p.value
BF_rep[i] <- computeBF(y0_rep)
set.seed(i+NSim)
if (dist=="skew") {
y1 <- fGarch::rsstd(N, mean=ES, sd=SD, nu=nu, xi=sk)
label <- "Skewed"
} else if (dist=="unif") {
# mean = (a + b) / 2
# variance = (b - a)^2 / 12
y1 <- runif(N, min=ES-sqrt(3)*SD, max=ES+sqrt(3)*SD)
label <- "Uniform"
} else if (dist=="tri") {
# mean = (a + b) / 2
# variance = (b - a)^2 / 24
a <- ES - sqrt(6) * SD
b <- ES + sqrt(6) * SD
u <- runif(N) #generate from U(0,1)
y1 <- ifelse(u < 0.5,
a+sqrt((b-a)^2*u/2),
b-sqrt((b-a)^2*(1-u)/2)) #inverse transformation method
label <- "Triangular"
} else if (dist=="bim") {
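# the component means ES - 0.45/(1-p.mix) and ES + 0.45/p.mix keep the mixture mean at ES,
# and sd_eq is chosen so that the overall mixture variance equals SD^2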
sd_eq <- sqrt(SD^2-0.45^2/(p.mix*(1-p.mix)))
y1 <- EnvStats::rnormMix(N, mean1=ES-0.45/(1-p.mix), mean2=ES+0.45/p.mix,
sd1=sd_eq, sd2=sd_eq, p.mix=p.mix)
label <- ifelse(p.mix==0.5, "Bimodal", "Bimodal (asymm)")
} else if (dist=="100sd") {
y1 <- rnorm(N, ES, SD*100)
label <- "Normal (100×SD)"
} else {stop("Bad Input!")}
pVal_vio[i] <- t.test(y1)$p.value
BF_vio[i] <- computeBF(y1)
}
df <- data.frame("Model"=factor(rep(c("Normal","Normal (rep)",label),each=NSim),
levels=c("Normal","Normal (rep)",label)),
"pVal"=c(pVal, pVal_rep, pVal_vio))
if (plot %in% c(1,3)) {
if (ES==0) { #null
df$BF <- 1/c(BF, BF_rep, BF_vio) #BF01
ylab <- expression(italic("BF")["01"])
yline <- c(1/3,3); ytext <- c("1/3","3")
} else {
df$BF <- log(c(BF, BF_rep, BF_vio)) #log(BF10)
ylab <- expression("ln("~italic("BF")[10]~")")
yline <- c(-log(3),log(3)); ytext <- c("-ln3","ln3")
}
}
if (plot==1) { #distributions of the Bayes factors
KL_div_ref <- FNN::KL.divergence(BF, BF_rep, k=k)
KL_div <- FNN::KL.divergence(BF, BF_vio, k=k)
KL_text <- "the Bayes factors"
KS_ref <- stats::ks.test(BF, BF_rep)
KS <- stats::ks.test(BF, BF_vio)
# HD_ref <- statip::hellinger(BF, BF_rep, 0, Inf)
# HD <- statip::hellinger(BF, BF_vio, 0, Inf) #the integral is probably divergent!
if (ES==0) { #null
df$err1 <- df$BF < 1/3 #type I errors
ypos <- 0.1
} else {
df$pwr <- df$BF > log(3) #statistical power
ypos <- 2.1
}
freq <- table(df[-c(2,3)])[,"TRUE"] / NSim
sample_norm <- log(BF)
sample_rep <- log(BF_rep)
sample_vio <- log(BF_vio)
label_x <- if (interact) "ln(BF10)" else expression("ln("~italic("BF")[10]~")") #plain string for ggplotly, which may not support the math expression
print(ggplot(df, aes(x=Model, y=BF))+
geom_boxplot()+
labs(x="True Model", y=ylab,
title=expression("Bayesian One-Sample"~italic(t)~"-Test"),
subtitle=paste0("Number of Subjects = ",N,", Number of Replicates = ",NSim,", Effect Size = ",ES))+
annotate("text", x=c(0.9,1.9,2.9), y=ypos, label=freq, color="red", angle=90)+
geom_hline(yintercept=yline, linetype="dashed", color="gray")+
annotate("text", x=Inf, y=yline[1], label=ytext[1], hjust=1.1, vjust=-0.5, color="gray")+
annotate("text", x=Inf, y=yline[2], label=ytext[2], hjust=1.1, vjust=-0.5, color="gray")+
theme_classic())
} else if (plot==2) { #distributions of p-values
KL_div_ref <- FNN::KL.divergence(pVal, pVal_rep, k=k)
KL_div <- FNN::KL.divergence(pVal, pVal_vio, k=k)
KL_text <- "p-values"
KS_ref <- stats::ks.test(pVal, pVal_rep)
KS <- stats::ks.test(pVal, pVal_vio)
if (ES==0) { #statip::hellinger may fail ("maximum number of subdivisions reached") when ES != 0, so HD is computed only under the null
HD_ref <- statip::hellinger(pVal, pVal_rep, 0, 1)
HD <- statip::hellinger(pVal, pVal_vio, 0, 1)
}
typeI <- c(sum(pVal < .05), sum(pVal_rep < .05), sum(pVal_vio < .05)) / NSim
sample_norm <- pVal
sample_rep <- pVal_rep
sample_vio <- pVal_vio
label_x <- if (interact) "P-Value" else expression(italic("P")~"-Value") #plain string for ggplotly, which may not support the math expression
print(ggplot(df, aes(x=Model, y=pVal))+
geom_boxplot()+
labs(x="True Model", y=expression(italic("P")~"-Value"),
title=expression("One-Sample"~italic(t)~"-Test"),
subtitle=paste0("Number of Subjects = ",N,", Number of Replicates = ",NSim,", Effect Size = ",ES))+
annotate("text", x=c(0.9,1.9,2.9), y=.015, label=typeI, color="red", angle=90)+
geom_hline(yintercept=.05, linetype="dotted", color="gray")+
geom_hline(yintercept=0, linetype="dashed", color="gray")+
geom_hline(yintercept=1, linetype="dashed", color="gray")+
scale_y_continuous(breaks=c(.01,.05,.1,1))+
theme_classic())
} else if (plot==3) { #scatter plot of the p-values and the Bayes factors
print(ggplot(df, aes(pVal, BF))+
geom_point(size=0.5,color="#8263d9",alpha=.2)+
facet_grid(. ~ Model)+
geom_hline(yintercept=yline,linetype="dashed")+
geom_vline(xintercept=.05,color="red")+
scale_x_continuous(breaks=c(.05,1))+
annotate("text", x=Inf, y=yline[1], label=ytext[1], hjust=1.1, vjust=-0.5, color="gray")+
annotate("text", x=Inf, y=yline[2], label=ytext[2], hjust=1.1, vjust=-0.5, color="gray")+
labs(x=expression(italic("P")~"-Value"),y=ylab,
title=expression("One-Sample"~italic(t)~"-Test"),
subtitle=paste0("Number of Subjects = ",N,", Number of Replicates = ",NSim,", Effect Size = ",ES))+
theme_classic())
print(ggplot(df, aes(pVal, BF, color=Model))+
geom_point(size=0.5,alpha=.1)+
scale_color_manual(values=c("#bac2ff", "#4e91fd", "#ff4848"))+
geom_hline(yintercept=yline,linetype="dashed")+
geom_vline(xintercept=.05,color="red")+
scale_x_continuous(breaks=c(.05,1))+
annotate("text", x=Inf, y=yline[1], label=ytext[1], hjust=1.1, vjust=-0.5, color="gray")+
annotate("text", x=Inf, y=yline[2], label=ytext[2], hjust=1.1, vjust=-0.5, color="gray")+
labs(x=expression(italic("P")~"-Value"),y=ylab)+
theme_classic()+
theme(legend.position="bottom")+
guides(colour=guide_legend(override.aes=list(size=3, alpha=1),nrow=1)))
} else {stop("Bad Input!")}
if (plot %in% c(1,2)) {
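# drop negative or infinite k-NN KL estimates before averaging; if none remain,
# mean() of the empty vector is NaN (the "NaN nats" reported in some runs above)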
KL_div <- KL_div[KL_div < Inf & KL_div >= 0]
KL_div_ref <- KL_div_ref[KL_div_ref < Inf & KL_div_ref >= 0]
cat("The overall KL divergence between the distributions of", KL_text,
"\n under model misspecification is", round(mean(KL_div),3),
"nats, \nwhile the overall KL divergence under correct specification is", round(mean(KL_div_ref),3),
"nats, \n given", k, "nearest neighbors and", NSim, "replicates.")
cat("\n\n")
if (plot==2 & ES==0) {
cat("The Hellinger distance between the distributions of", KL_text,
"\n is", round(HD,3), "under model misspecification,",
"\nwhile the Hellinger distance is", round(HD_ref,3), "under correct specification.")
cat("\n\n")
}
cat("The two-sample KS tests show D =", round(KS$statistic,3), "and p-value =", round(KS$p.value,3),
"under model misspecification, \n and D =",
round(KS_ref$statistic,3), "and p-value =", round(KS_ref$p.value,3), "under correct specification.")
ecdf_norm <- stats::ecdf(sample_norm) #compute an empirical cumulative distribution function
ecdf_rep <- stats::ecdf(sample_rep)
ecdf_vio <- stats::ecdf(sample_vio)
if(interact) { #interactive
ecdf_obj <- data.frame("x"=c(sample_norm, sample_rep, sample_vio),
"y"=c(ecdf_norm(sample_norm), ecdf_rep(sample_rep), ecdf_vio(sample_vio)),
"Model"=factor(rep(c("Normal", "Normal (rep)", label), each=NSim)))
ggplotly(ggplot(ecdf_obj, aes(x, y, color=Model))+
geom_step()+xlab(label_x)+ylab("ECDF")+
ggtitle(paste0("Empirical CDF Comparison, N.S. = ",N,
", N.R. = ",NSim,", ES = ",ES))+
scale_color_manual(values=c("black","blue","red"))+
geom_hline(yintercept=c(0,1),linetype="dashed",color="gray")+
theme_classic())
} else { #base
plot(ecdf_norm, main="Empirical CDF Comparison", xlab=label_x, ylab="ECDF",
sub=paste0("Number of Subjects = ",N,", Number of Replicates = ",NSim,", Effect Size = ",ES), ...)
lines(ecdf_rep, col="blue")
lines(ecdf_vio, col="red")
legend("bottomright", legend=c("Normal","Normal (rep)",label), col=c("black","blue","red"), lwd=2)
}
}
}