The Kumaraswamy Distribution

The Kumaraswamy distribution is a distribution with support on \([0,1]\). It represents an attractive alternative to the Beta distribution as its PDF and CDF can be written in closed form. See The Kumaraswamy Distribution for more details and an implementation of this distribution.

The following R code explores an approximation of the Beta distribution with parameters \((\alpha,\beta)\) using a Kumaraswamy distribution with parameters \((a,b)\) obtained by minimising the Total Variation distance:

\[(a,b) = \mbox{argmin}_{a_0,b_0} \int_0^1 \vert f_B(x,\alpha,\beta) - f_K(x,a_0,b_0) \vert dx.\] where \(f_B\) and \(f_K\) are the PDFs of the Beta and Kumaraswamy distributions, respectively.

A related post by John D. Cook: A beta-like distribution

Routines

#------------------------------------------------------------------
# Kumaraswamy PDF
# https://rpubs.com/FJRubio/KUMA
#------------------------------------------------------------------

dkuma <- Vectorize(function(x,a,b,log = FALSE){
  logden <-  log(a) + log(b) + (a-1)*log(x) + (b-1)*log(1-x^a)
  val <- ifelse(log, logden, exp(logden)) 
  return(val)
})

#----------------------------------------------------------------------------------
# Function to miminise the TV distance between a dbeta(x,a,b) and dkuma(x,a1,b1)
# a : shape parameter
# b : shape parameter
# init : initial value for the parameters of dkuma in log scale
#----------------------------------------------------------------------------------

MDTV <- function(a,b,init){
  # Total variation distance between dkuma and dbeta
  fun.min <- function(par){
    ak <- exp(par[1]); bk <- exp(par[2])
    obj.fun <- Vectorize(function(x) abs(dkuma(x,ak,bk)  - dbeta(x,a,b)) )
    dtv <- 0.5*integrate(obj.fun,0,1)$value
    return(dtv)
  }
  # optimisation step
opt <-   optim(init, fun.min, control = list(maxit = 10000))
  out <- list( pars = exp(opt$par), val.min = opt$value)
  return(out)
}

Examples with good approximation

library(knitr)
AB <- rbind(c(1.5,3.7), c(3.7,1.5), c(2,2), c(0.5,0.5), c(0.5,2.5), c(2,0.5))

MAT <- matrix(0, ncol = 5, nrow = 6)

for(j in 1:6){
  temp <- MDTV(AB[j,1],AB[j,2],c(0,0))
  MAT[j,] <- c(AB[j,1],AB[j,2],temp$pars,temp$val.min)
} 

colnames(MAT) <- c("alpha","beta","a","b","TV")

# dkuma parameters that minimise the TV distance and TV distance
kable(MAT, digits = 3)

alpha	beta	a	b	TV
1.5	3.7	1.362	4.018	0.008
3.7	1.5	3.361	1.538	0.005
2.0	2.0	1.817	2.087	0.006
0.5	0.5	0.437	0.517	0.012
0.5	2.5	0.559	2.185	0.014
2.0	0.5	2.216	0.496	0.004

for(j in 1:6){
  # Comparison of dbeta and dkuma
  db <- Vectorize(function(x) dbeta(x,AB[j,1],AB[j,2]))
  dk <- Vectorize(function(x) dkuma(x,MAT[j,3],MAT[j,4]))
  
  curve(db,0,1,n=1001,lwd= 2, xlab = "x",  ylab = "Density", cex.axis = 1.5, cex.lab = 1.5, main = paste("alpha = ", AB[j,1], ", ", "beta = ", AB[j,2]))
  curve(dk,0,1,n=1001,lwd = 2, lty = 2,add=T, col = "red")
}

Examples with poorer approximation

Consider now approximating the distributions dbeta(x,j,J-j+1) for \(j \in \{ 1,\dots,J\}\). The following plots show the Kumaraswamy approximations for \(J = 15\) and \(j=3,\dots,11\).

J = 15
MAT <- matrix(0, ncol = 5, nrow = J)

for(j in 1:J){
  temp <- MDTV(j,J-j+1,c(0,0))
  MAT[j,] <- c(j,J-j+1,temp$pars,temp$val.min)
} 

colnames(MAT) <- c("alpha","beta","a","b","TV")

# dkuma parameters that minimise the TV distance and TV distance
kable(MAT[3:11,], digits = 3)

alpha	beta	a	b	TV
3	13	2.021	24.129	0.030
4	12	2.454	23.668	0.034
5	11	2.884	21.906	0.035
6	10	3.326	19.397	0.036
7	9	3.801	16.720	0.035
8	8	4.317	13.985	0.034
9	7	4.901	11.415	0.032
10	6	5.578	9.038	0.029
11	5	6.414	6.954	0.026

for(j in 3:(J-4)){
  # Comparison of dbeta and dkuma
  db <- Vectorize(function(x) dbeta(x,j,J-j+1))
  dk <- Vectorize(function(x) dkuma(x,MAT[j,3],MAT[j,4]))
  
  curve(db,0,1,n=1001,lwd= 2, xlab = "x",  ylab = "Density", cex.axis = 1.5, cex.lab = 1.5, main = paste("alpha = ", j, ", ", "beta = ", J-j+1))
  curve(dk,0,1,n=1001,lwd = 2, lty = 2,add=T, col = "red")
}

Approximating the Beta PDF with a Kumaraswamy PDF based on the TV distance

F. Javier Rubio

February 20, 2022

The Kumaraswamy Distribution

Routines

Examples with good approximation

Examples with poorer approximation

References