The Wasserstein-1 metric in 1-D

The \(p^{\text{th}}\) Wasserstein distance between two probability measures \(\mu\) and \(\nu\), with finite \(p^{\text{th}}\) moment, can be defined as [1]:

\[W_p(\mu,\nu)^p = \inf {\mathbb E}\left[ d(X,Y)^p \right],\] where \(d\) is a metric, and \({\mathbb E}[Z]\) denotes the expected value of a random variable \(Z\) and the infimum is taken over all joint distributions of the random variables \(X\) and \(Y\) with marginals \(\mu\) and \(\nu\), respectively.

For \(p=1\), [2] showed that the Wasserstein-1 metric in 1-D (dimension one), between two cumulative distribution functions \(F_1\) and \(F_2\) on \({\mathbb R}\) can be written as the \(L_1\) distance:

\[W_1(F_1,F_2) = \int_{\mathbb R} \left \vert F_1(x) -F_2(x) \right\vert dx.\] Thus, for distributions with numerically tractable CDFs, the Wasserstein-1 metric can be approximated using numerical integration. It is worth noticing that this distance is not invariant under monotone transformations (for instance, under scale tranformations).

Next, we present several examples of the numerical calculation of the Wasserstein-1 metric between a distribution \(F(\cdot;\theta)\) and a nested distribution of interest \(F(\cdot;\theta_0)\), for some fixed value of \(\theta_0\). We omit location and scale parameters, but the R code can be easily tweaked to include these parameters. We also present plots of the function: \[M(\theta) = W_1(F(\cdot;\theta),F(\cdot;\theta_0)) = \int_{\mathbb R} \left \vert F(x;\theta) -F(x;\theta_0) \right\vert dx,\] which can be interpreted as a function that measures the effect of the parameter \(\theta\) (see [3]).

References

Exampe 1: Skew-normal vs. Normal

The skew-normal probability density function is: \[f(x;\lambda) = 2\phi(x)\Phi(\lambda x),\] where \(\phi\) and \(\Phi\) are the standard normal pdf and cdf, respectively, and \(\lambda\in{\mathbb R}\). Here, we calculate the Wasserstein-1 metric between \(f(x;\lambda)\) and \(\phi(x)\).

# Required packages
library(sn)

## Loading required package: stats4

## 
## Attaching package: 'sn'

## The following object is masked from 'package:stats':
## 
##     sd

library(knitr)

## Warning: package 'knitr' was built under R version 3.4.3

MW1 <- Vectorize(function(par){
  tempf <- Vectorize(function(x)  abs(psn(x, alpha=par)  - pnorm(x)) )
  val <- integrate(tempf,-Inf,Inf)$value
  return(val)
})

# Example: lambda = 0,1,2,3,4,5
lambda <- -5:5
W1 <- MW1(lambda)
print(kable(cbind(lambda,W1),digits=4))

## 
## 
##  lambda       W1
## -------  -------
##      -5   0.7824
##      -4   0.7741
##      -3   0.7569
##      -2   0.7136
##      -1   0.5642
##       0   0.0000
##       1   0.5642
##       2   0.7136
##       3   0.7569
##       4   0.7741
##       5   0.7824

# Plot
curve(MW1,-10,10, xlab = ~lambda, ylab="M", cex.axis=1.5, cex.lab=1.5, lwd=2, n = 250)

Example 2: two-piece normal vs normal

The two-piece normal probability density function is defined as: \[f(x;\gamma) = \phi\left(\dfrac{x}{1+\gamma}\right)I(x<0) + \phi\left(\dfrac{x}{1-\gamma}\right)I(x\geq 0).\] where \(\phi\) is the standard normal pdf, and \(\gamma\in(-1,1)\). Here, we calculate the Wasserstein-1 metric between \(f(x;\gamma)\) and \(\phi(x)\).

# Required packages
# http://www.rpubs.com/FJRubio/twopiece
library(twopiece)

## Loading required package: flexclust

## Loading required package: grid

## Loading required package: lattice

## Loading required package: modeltools

## Loading required package: mvtnorm

## Loading required package: label.switching

library(knitr)

MW1 <- Vectorize(function(par){
  tempf <- Vectorize(function(x)  abs(ptp3(x, 0, 1, par, FUN = pnorm, param = "eps")  - pnorm(x)) )
  val <- integrate(tempf,-Inf,Inf)$value
  return(val)
})

# Example: lambda = -0.9,-0.8,...,0,...,0.8,0.9
gamma <- seq(-0.9,0.9,by=0.1)
W1 <- MW1(gamma)
print(kable(cbind(gamma,W1),digits=4))

## 
## 
##  gamma       W1
## ------  -------
##   -0.9   1.4362
##   -0.8   1.2766
##   -0.7   1.1170
##   -0.6   0.9575
##   -0.5   0.7979
##   -0.4   0.6383
##   -0.3   0.4787
##   -0.2   0.3192
##   -0.1   0.1596
##    0.0   0.0000
##    0.1   0.1596
##    0.2   0.3192
##    0.3   0.4787
##    0.4   0.6383
##    0.5   0.7979
##    0.6   0.9575
##    0.7   1.1170
##    0.8   1.2766
##    0.9   1.4362

# Plot
curve(MW1,-0.99,0.99, xlab = ~gamma, ylab="M", cex.axis=1.5, cex.lab=1.5, lwd=2, n = 250)

Example 3: distance between two survival curves

The Exponentiated Weibull distribution (EWD) is a three-parameter distribution. It contains a scale parameter, a shape parameter, and a power (shape) parameter \(\alpha\). The EWD contains the Weibull distribution as a particular case (\(\alpha=1\)). The EWD has been used to model survival times given that its hazard function can capture the basic shapes: constant, increasing, decreasing, bathtub, and unimodal.

If we are interested in comparing two survival curves, \(S_1\) and \(S_2\), one possible way is to calculate the area between the survival curves, which is the \(L_1\) distance between them. Moreover, since \(S_i(\cdot) =1-F_i(\cdot)\), \(i=1,2\), it follows that \[\int_{{\mathbb R}_+} \left \vert S_1(x) - S_2(x) \right\vert dx = \int_{{\mathbb R}_+} \left \vert F_1(x) -F_2(x) \right\vert dx = W_1(F_1,F_2). \] Here, we will measure the effect of the power parameter \(\alpha\) for the case where the scale and shape parameters are \(1\), compared to the Weibull distribution with unit scale and shape parameters.

# Required packages
# http://www.rpubs.com/FJRubio/EWD
library(knitr)

# Exponentiated Weibull Cumulative distribution function
pexpweibull<- function(t,lambda,kappa,alpha,log.p=FALSE){
  log.cdf <- alpha*pweibull(t,scale=lambda,shape=kappa,log.p=TRUE)
  ifelse(log.p, return(log.cdf), return(exp(log.cdf)))
} 

MW1 <- Vectorize(function(par){
  tempf <- Vectorize(function(x)  abs(pexpweibull(x, 1, 1, par)  - pweibull(x,1,1)) )
  val <- integrate(tempf,0,Inf)$value
  return(val)
})

# Example
alpha <- seq( 0.1,5,by=0.1)
W1 <- MW1(alpha)
print(kable(cbind(alpha,W1),digits=4))

## 
## 
##  alpha       W1
## ------  -------
##    0.1   0.8465
##    0.2   0.7118
##    0.3   0.5920
##    0.4   0.4842
##    0.5   0.3863
##    0.6   0.2967
##    0.7   0.2142
##    0.8   0.1378
##    0.9   0.0666
##    1.0   0.0000
##    1.1   0.0626
##    1.2   0.1215
##    1.3   0.1773
##    1.4   0.2301
##    1.5   0.2804
##    1.6   0.3283
##    1.7   0.3740
##    1.8   0.4178
##    1.9   0.4597
##    2.0   0.5000
##    2.1   0.5387
##    2.2   0.5761
##    2.3   0.6120
##    2.4   0.6468
##    2.5   0.6804
##    2.6   0.7129
##    2.7   0.7444
##    2.8   0.7749
##    2.9   0.8045
##    3.0   0.8333
##    3.1   0.8613
##    3.2   0.8886
##    3.3   0.9151
##    3.4   0.9409
##    3.5   0.9661
##    3.6   0.9907
##    3.7   1.0146
##    3.8   1.0381
##    3.9   1.0610
##    4.0   1.0833
##    4.1   1.1052
##    4.2   1.1266
##    4.3   1.1476
##    4.4   1.1682
##    4.5   1.1883
##    4.6   1.2080
##    4.7   1.2274
##    4.8   1.2464
##    4.9   1.2650
##    5.0   1.2833

# Plot
curve(MW1,0.001,5, xlab = ~alpha, ylab="M", cex.axis=1.5, cex.lab=1.5, lwd=2, n = 1000)

Numerical calculation of the Wasserstein-1 metric in 1-D: examples

F. Javier Rubio

October 13, 2018

The Wasserstein-1 metric in 1-D

References

Exampe 1: Skew-normal vs. Normal

Example 2: two-piece normal vs normal

Example 3: distance between two survival curves