Several types of Nonparametric estimators of P(X<Y) for paired data

Let \((X,Y)\) be a pair of absolutely continuous random variables with joint density \(f_{X,Y}:{\mathbb R}^2\rightarrow\mathbb R_+\). The stress-strength coefficient is defined as \(\theta = {\mathbb P}(X<Y)\). Thus, by definition, we have that (see [1]) \[\theta = {\mathbb P}(X<Y) = \int_{-\infty}^{\infty}\int_{-\infty}^y f_{X,Y}(x,y)dxdy, \,\,\,\,\,\, (1)\] Alternatively, if \(Z=Y-X\), we have: \[\theta = {\mathbb P}(X<Y) =\int_{0}^{\infty} f_{Z}(z)dz = 1- F_Z(0), \,\,\,\,\,\, (2)\] where \(f_Z\) and \(F_Z\) are the probability density function and cumulative distribution function, respectively, of the random variable \(Z\). [1] proposed estimating (1), for a sample \(({\bf x},{\bf y}) = \{(x_1,y_1),\dots,(x_n,y_n)\}\) of \((X,Y)\), as follows

Algorithm 1

(i). Using \(({\bf x},{\bf y})\) construct a nonparametric estimator of \(f_{X,Y}\), \(\widehat{f}_{X,Y}\).

(ii). Define the estimator \(\tilde{\theta} = \int_{-\infty}^{\infty}\int_{-\infty}^y \widehat{f}_{X,Y}(x,y)dxdy\)

[1] proposed the use of several nonparametric density estimators \(\widehat{f}_{X,Y}\) such as kernel density estimators and shape-restricted estimators.

Similarly, [1] proposed the following estimator of (2), for a sample \({\bf z} = (z_1,\dots,z_n)\) with \(z_j=y_j-x_j\), as follows

Algorithm 2

  1. Using the sample \({\bf z}\), construct a nonparametric estimator of \(F_Z\), \(\widehat{F}_Z\).

  2. Define the estimator \(\tilde{\theta} = 1-\widehat{F}_Z(0)\).

[1] proposed the use of several nonparametric density estimators \(\widehat{F}_Z\) such as the empirical cumulative distribution function (ECDF), kernel density (distribution) estimators, shape-restricted density estimators, among others.

The estimators of \(\theta\) obtained with Algorithm 1 and Algorithm 2 are consistent. The following R code illustrates the use of these estimators in the context of finance using real data.

References

  1. Nonparametric inference for P(X<Y) with paired variables see also
rm(list=ls())
library(ks)
## Warning: package 'ks' was built under R version 3.3.2
## Loading required package: mvtnorm
library(kerdiest)
## Loading required package: date
## Warning: package 'date' was built under R version 3.3.2
## Loading required package: chron
## Warning: package 'chron' was built under R version 3.3.2
## Loading required package: evir
## 
## Attaching package: 'kerdiest'
## The following object is masked from 'package:ks':
## 
##     kde
library(KernSmooth)
## KernSmooth 2.23 loaded
## Copyright M. P. Wand 1997-2009
library(mvtnorm)
library(LogConcDEAD)
## Loading required package: MASS
library(ghyp)
## Loading required package: numDeriv
## Loading required package: gplots
## 
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
## 
##     lowess
# The SMI stocks data:  "SMI"  vs    "Swiss.Re"
data(smi.stocks)
data = as.matrix(smi.stocks[ , c(1, 6)])
# Differences
data.diff = data[,2]-data[,1]

################################################################################################
# Estimators based on Algorithm 1
################################################################################################

# Estimator (1) based on kernel density estimators (Kernel 2D)
K2D = function(data){
  H.scv=Hscv(data)
  H.scv
  H.eig = eigen(H.scv)
  H.sqrt = H.eig$vectors %*% diag(sqrt(H.eig$values)) %*% solve(H.eig$vectors)
  H = solve(H.sqrt)
  dH = det(H.scv)
  KG = function(par){
    par1=cbind(rep(par[1],length(data[,1])),rep(par[2],length(data[,1])))
    return( mean(dmvnorm(t(H%*%t(par1-data)),rep(0,2),diag(2),log=FALSE)/sqrt(dH)) )
  }
  KG2 = function(x,y){
    par=c(x,y)
    return(KG(par))
  }
  vKG = Vectorize(KG2)
  return(integrate(function(y) {
    sapply(y, function(y) {
      integrate(function(x) vKG(x,y), -Inf, y)$value
    })
  }, -Inf, Inf)$value)
}


K2D(data)
## [1] 0.4884762
# Estimator (1) based on log-concave density estimators (MLE 2D)
# Integration is based on Monte Carlo methods, select the number of MC iterations
MC <- 100000  # Monte Carlo iterations

MLE2D = function(data){
  f1 = mlelcd(data)
  samp1 = rlcd(n=MC, f1, method=c("Independent","MH"))
  return(length(which(samp1[,1]<samp1[,2]))/MC)
}

MLE2D(data)
## [1] 0.48711
# Estimator (1) based on smooth log-concave density estimators (SMLE 2D)
# Integration is based on Monte Carlo methods, select the number of MC iterations
MC <- 100000  # Monte Carlo iterations


SMLE2D = function(data,ind){
  data1 =  cbind(data[ind,1],data[ind,2])
  f1 = mlelcd(data1)
  samp1 = rslcd(n=MC, f1, A=hatA(f1), method=c("Independent","MH"))
  return(length(which(samp1[,1]<samp1[,2]))/MC)
}

SMLE2D(data)
## [1] 0.48968
################################################################################################
# Estimators based on Algorithm 2
################################################################################################

# Estimator (2) based on kernel distribution estimators (Kernel 1D)
KDE1 <- function(z){
bwcv = ALbw(type_kernel = "n", vec_data=z)
1-kde(type_kernel = "n", vec_data=z, y = 0,  bw =bwcv)$Estimated
}

KDE1(data.diff)
## [1] 0.4882594
# Estimator (2) based on log-concave density estimators (MLE 1D)
MLE1D = function(z){
  f = mlelcd(z)
  integrate(function(x) dlcd(x, f, uselog=FALSE),0,max(z))$value
}

MLE1D(data.diff)
## [1] 0.4889634
# Estimator (2) based on smooth log-concave density estimators (SMLE 1D)
SMLE1D = function(z){
  f = mlelcd(z)
  integrate(function(x) dslcd(x, f, A=hatA(f)),0,max(z))$value
}

SMLE1D(data.diff)
## [1] 0.4905866
# Estimator (2) based on the Empirical distribution (ECDF)
EDCFE = function(z) 1-ecdf(z)(0)
EDCFE(data.diff)
## [1] 0.4872809