The Wasserstein prior for the Skew Normal distribution

The skew-normal pdf is defined as Azzalini (1985): \[ s(x \mid \mu,\sigma, \alpha) = \dfrac{2}{\sigma} \phi\left(\dfrac{x-\mu}{\sigma}\right) \Phi\left(\alpha \dfrac{x-\mu}{\sigma}\right), \] where \(\alpha \in {\mathbb R}\) is a skewness parameter, \(\mu\) is a location parameter and \(\sigma\) is a scale parameter.

Li and Rubio (2022) proposed the independence Wasserstein prior \[ \pi(\mu,\sigma,\alpha) \propto \sqrt{\int_{-\infty}^{\infty} \frac{\sqrt{2} e^{-\frac{1}{2} \left(2 \alpha ^2+1\right) x^2}}{\pi ^{3/2} \left(\alpha ^2+1\right)^2 \left(\mbox{erf}\left(\frac{\alpha x}{\sqrt{2}}\right)+1\right)}dx}. \] They showed that the posterior distribution of \((\mu,\sigma,\alpha)\) is proper if \(n > 2\). We will consider the Student-t aproximation to the Wasserstein prior of \(\alpha\) proposed in Li and Rubio (2022) in the following application.

The following R code shows a simple application of the Wasserstein prior.

See also: Natural (non-) informative priors for the skew-normal distribution and The Jeffreys prior for skew–symmetric models.

Body Mass Index data

rm(list=ls())

# Required packages
library(sn)
## Loading required package: stats4
## 
## Attaching package: 'sn'
## The following object is masked from 'package:stats':
## 
##     sd
library(Rtwalk)
## Package: Rtwalk, Version: 1.8.0
## The R Implementation of 't-walk' MCMC Sampler
## http://www.cimat.mx/~jac/twalk/
## For citations, please use:
## Christen JA and Fox C (2010). A general purpose sampling algorithm for
## continuous distributions (the t-walk). Bayesian Analysis, 5(2),
## pp. 263-282. <URL:
## http://ba.stat.cmu.edu/journal/2010/vol05/issue02/christen.pdf>.
library(TeachingDemos)

# Data
data(ais)

bmi = ais$BMI[1:100]

hist(bmi)

# Log-likelihood and MLE 
ll <- function(par){
if(par[2]>0) return(-sum(dsn(bmi,par[1],par[2],par[3],log=T)))
else return(Inf)
}

OPT <- optim(c(20,1,2),ll)

MLE <- OPT$par

##########################################################################################################################
# Log-posterior with Wasserstein prior
##########################################################################################################################
# -Log posterior

sc = 0.757

lpw <- function(par){
  mu <- par[1]; sigma = exp(par[2]); alpha = par[3]
return(-sum(dsn(bmi,mu,sigma,alpha,log=T)) - dt(alpha/sc, df = 3/2, log = TRUE) )
}

# Number of iterations 
NMH = 110000

# Support function 
Support <- function(x) {TRUE}

# Function to generate the initial points in the sampler
X0 <- function(x) { MLE + runif(3, -0.1,0.1) }

# Posterior samples
set.seed(1234)
outw <- Runtwalk( dim=3,  Tr=NMH,  Obj=lpw, Supp=Support, x0=X0(), xp0=X0(),PlotLogPost = FALSE) 
## This is the twalk for R.
## Evaluating the objective density at initial values.
## Opening 8 X 110001 matrix to save output.  Sampling (no graphics mode).
# thin-in and burn-in
burn = 10000
thin = 100
ind = seq(burn,NMH,thin)

mupw = outw$output[ , 1][ind]
sigmapw = exp(outw$output[ , 2][ind])
alphapw = outw$output[ , 3][ind]

# Some histograms and summaries
hist(mupw)
abline(v=MLE[1],col="red",lwd=2)
box()

hist(sigmapw)
abline(v=MLE[2],col="red",lwd=2)
box()

hist(alphapw)
abline(v=MLE[3],col="red",lwd=2)
box()

# Posterior means
apply(cbind(mupw,sigmapw,alphapw),2,mean)
##      mupw   sigmapw   alphapw 
## 20.083601  3.462600  1.466051
# Posterior medians
apply(cbind(mupw,sigmapw,alphapw),2,median)
##      mupw   sigmapw   alphapw 
## 19.743538  3.472449  1.549175
# Highest posterior credible intervals
apply(cbind(mupw,sigmapw,alphapw),2,emp.hpd)
##          mupw  sigmapw    alphapw
## [1,] 18.45805 2.422982 -0.6590818
## [2,] 22.89599 4.480975  3.3615995
##########################################################################################################################
# Predictive density
##########################################################################################################################

# Predictive densities for the Wasserstein prior
predw<- Vectorize(function(x) {
  tempf <- function(par) dsn(x,par[1],par[2],par[3])
  var <- mean(apply(cbind(mupw,sigmapw,alphapw),1,tempf))
  return(var)
})


hist(bmi,breaks=20,probability = TRUE, xlim=c(15,35), ylab = "Predictive Density")
curve(predw,15,35,add=T,lwd=2,lty=1)
box()

Azzalini, A. 1985. “A Class of Distributions Which Includes the Normal Ones.” Scandinavian Journal of Statistics, 171–78.
Li, W., and F. J. Rubio. 2022. “On a Prior Based on the Wasserstein Information Matrix.” Submitted.