The LogSumExp function

The LogSumExp function is the logarithm of the sum of the exponentials of values \({\bf x} = (x_1,\dots,x_n)\): \[ \operatorname{LSE}({\bf x}) = \log\left[\sum_{j=1}^n \exp(x_j) \right]. \] This function appears in many applications in Statistics (for instance, in Monte Carlo integration and logistic regression), Machine Learning, and some areas of Mathematics.

Challenges in the numerical evaluation of the LogSumExp function

Although the mathematical definition of the LSE function is simple, its numerical evaluation can be challenging when the values \(x_j\) are large and positive (in which case R may return Inf) or \(\vert x_j \vert\) are large and \(x_j\) have negative sign (in which case R may return 0). For example:

exp(-800)

## [1] 0

exp(800)

## [1] Inf

Motivated by these challenges, several approximations and numerical “tricks” have been proposed.

Approximations and numerical tricks

This is a little compilation of approximations and numerical “tricks” for calculating the LSE function.

Tangent line approximation

Using the tangent line approximation \({\displaystyle \log(X+a)\approx \log X+a/X,}\)

\[ {\textstyle \mathrm {LSE} (\mathbf {x} )= \log\left[\sum_{j=1}^n \exp(x_j) \right] \approx \log(\exp x_{k})+{\bigl (}\sum _{j\neq k}\exp x_{j}{\bigr )}/\exp x_{k}\approx x_{k}=\max _{j}x_{j}.} \] This approximation is very simple, but it is accurate only if the term \({\displaystyle x_{k}}\) is much larger than the rest.

Centering

This trick basically consists of centering the values \({\bf x}\), as follows: \[ \operatorname{LSE}({\bf x}) = \log\left[\sum_{j=1}^n \exp(x_j) \exp(x^* - x^*) \right]= \log\left[\sum_{j=1}^n \exp(x_j- x^*) \right] + x^*. \] The value \(x^*\) is typically the mean, the median, or the maximum of \({\bf x}\). Thus, it only works if the range of \(x_j-x^*\) can be evaluated numerically. This method is also related to the softmax function.

Recursive formula

Note first that, \[ \begin{aligned} \exp(x_1) + \exp(x_2) &= \exp(\max(x_1,x_2)) + \exp(\min(x_1,x_2)) \\ &= \exp(\max(x_1,x_2)) (1 + \exp(\min(x_1,x_2)-\max(x_1,x_2)) \\ &= \exp(\max(x_1,x_2)) (1 + \exp(-\vert x_1 - x_2\vert)). \\ \end{aligned} \] since \(\max (a,b) = \frac{1}{2}( a + b + |a-b| )\) and \(\min (a,b) = \frac{1}{2}( a + b - |a-b| )\). Thus, if we define \[L_n \equiv \log\left[\sum_{j=1}^{n-1} \exp(x_j) \right],\] and apply the previous recursive formula, we obtain \[ \begin{aligned} \exp[\operatorname{LSE}({\bf x})] &= \exp(L_{n+1}) \\ &= \exp(L_{n}) + \exp(x_n)\\ &= \exp( \max(L_n,x_n) )\left[ 1 + \exp( -\vert L_n - x_n \vert ) \right]. \end{aligned} \] Then, \[ \begin{aligned} \operatorname{LSE}({\bf x}) &= \max(L_n,x_n) + \log \left[ 1 + \exp( -\vert L_n - x_n \vert ) \right]. \end{aligned} \] This provides a recursive relationship that can be used to calculate the LSE. The advantage of this recursive formula is that the argument in the exponential in the second term is centered, and can be calculated using the R command log1p.

Examples

The following R code presents some examples of the numerical tricks described above to calculate or approximate the LogSumExp function. We present the comparison between the approximation using the maximum, the calculation using the centering at the mean method, the calculation using the recursive formula, the calculation using the logSumExp() command from the matrixStats R package, and the direct calculation (denoted as “Naive” method). These examples illustrate cases where the Naive method fails, when the Maximum method fails (or is inaccurate), and when the Centering method fails. The recursive formula and the logSumExp() command seem to work well in all cases (although there may be some pathological cases out there).

# Required packages
library(matrixStats) # contains the function logSumExp
library(knitr)
#--------------------------------------------------------------------
# LSE function: Recursive formula
#--------------------------------------------------------------------
LSE_R <- function(vec){ 
  n.vec <- length(vec)
  vec <- sort(vec, decreasing = TRUE)
  Lk <- vec[1]
  for (k in 1:(n.vec-1)) {
    Lk <- max(vec[k+1], Lk) + log1p(exp(-abs(vec[k+1] - Lk))) 
  }
  return(Lk)
}

#--------------------------------------------------------------------
# LSE function: Maximum
#--------------------------------------------------------------------
LSE_M <- function(vec){ return(max(vec)) }

#--------------------------------------------------------------------
# LSE function: Centered at the mean
#--------------------------------------------------------------------
LSE_C <- function(vec){ 
  mvec <- mean(vec)
  val <- log(sum(  exp(vec - mvec) )) + mvec
  return(val) }

#--------------------------------------------------------------------
# LSE function: Naive
#--------------------------------------------------------------------
LSE_N <- function(vec){ return(log(sum(exp(vec)))) }


########################################################################################
# Example 1
########################################################################################
set.seed(123)
x <- rnorm(n = 100, mean = -1000, sd = 10)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)

	x
Maximum	-978.1267
Centering	-977.3762
Recursive	-977.3762
logSumExp	-977.3762
Naive	-Inf

########################################################################################
# Example 2
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = -1000, sd = 10)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)

	x
Maximum	-967.5896
Centering	-967.5756
Recursive	-967.5756
logSumExp	-967.5756
Naive	-Inf

########################################################################################
# Example 3
########################################################################################
set.seed(123)
x <- rnorm(n = 100, mean = 1000, sd = 10)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)

	x
Maximum	1021.873
Centering	1022.624
Recursive	1022.624
logSumExp	1022.624
Naive	Inf

########################################################################################
# Example 4
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = 1000, sd = 10)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)

	x
Maximum	1032.410
Centering	1032.424
Recursive	1032.424
logSumExp	1032.424
Naive	Inf

########################################################################################
# Example 5
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = -5000, sd = 500)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)

	x
Maximum	-3379.48
Centering	Inf
Recursive	-3379.48
logSumExp	-3379.48
Naive	-Inf

########################################################################################
# Example 6
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = 5000, sd = 500)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)

	x
Maximum	6620.52
Centering	Inf
Recursive	6620.52
logSumExp	6620.52
Naive	Inf

########################################################################################
# Example 7
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = -5000, sd = 3)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)

	x
Maximum	-4990.277
Centering	-4988.933
Recursive	-4988.933
logSumExp	-4988.933
Naive	-Inf

########################################################################################
# Example 8
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = 5000, sd = 3)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)

	x
Maximum	5009.723
Centering	5011.067
Recursive	5011.067
logSumExp	5011.067
Naive	Inf

########################################################################################
# Example 9
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = -500, sd = 3)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)

	x
Maximum	-490.2769
Centering	-488.9334
Recursive	-488.9334
logSumExp	-488.9334
Naive	-488.9334

########################################################################################
# Example 10
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = 500, sd = 3)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)

	x
Maximum	509.7231
Centering	511.0666
Recursive	511.0666
logSumExp	511.0666
Naive	511.0666

LogSumExp Function

F. Javier Rubio

November 04, 2019