The LogSumExp function

The LogSumExp function is the logarithm of the sum of the exponentials of values \({\bf x} = (x_1,\dots,x_n)\): \[ \operatorname{LSE}({\bf x}) = \log\left[\sum_{j=1}^n \exp(x_j) \right]. \] This function appears in many applications in Statistics (for instance, in Monte Carlo integration and logistic regression), Machine Learning, and some areas of Mathematics.

Challenges in the numerical evaluation of the LogSumExp function

Although the mathematical definition of the LSE function is simple, its numerical evaluation can be challenging when the values \(x_j\) are large and positive (in which case R may return Inf) or \(\vert x_j \vert\) are large and \(x_j\) have negative sign (in which case R may return 0). For example:

exp(-800)
## [1] 0
exp(800)
## [1] Inf

Motivated by these challenges, several approximations and numerical “tricks” have been proposed.

Approximations and numerical tricks

This is a little compilation of approximations and numerical “tricks” for calculating the LSE function.

Tangent line approximation

Using the tangent line approximation \({\displaystyle \log(X+a)\approx \log X+a/X,}\)

\[ {\textstyle \mathrm {LSE} (\mathbf {x} )= \log\left[\sum_{j=1}^n \exp(x_j) \right] \approx \log(\exp x_{k})+{\bigl (}\sum _{j\neq k}\exp x_{j}{\bigr )}/\exp x_{k}\approx x_{k}=\max _{j}x_{j}.} \] This approximation is very simple, but it is accurate only if the term \({\displaystyle x_{k}}\) is much larger than the rest.

Centering

This trick basically consists of centering the values \({\bf x}\), as follows: \[ \operatorname{LSE}({\bf x}) = \log\left[\sum_{j=1}^n \exp(x_j) \exp(x^* - x^*) \right]= \log\left[\sum_{j=1}^n \exp(x_j- x^*) \right] + x^*. \] The value \(x^*\) is typically the mean, the median, or the maximum of \({\bf x}\). Thus, it only works if the range of \(x_j-x^*\) can be evaluated numerically. This method is also related to the softmax function.

Recursive formula

Note first that, \[ \begin{aligned} \exp(x_1) + \exp(x_2) &= \exp(\max(x_1,x_2)) + \exp(\min(x_1,x_2)) \\ &= \exp(\max(x_1,x_2)) (1 + \exp(\min(x_1,x_2)-\max(x_1,x_2)) \\ &= \exp(\max(x_1,x_2)) (1 + \exp(-\vert x_1 - x_2\vert)). \\ \end{aligned} \] since \(\max (a,b) = \frac{1}{2}( a + b + |a-b| )\) and \(\min (a,b) = \frac{1}{2}( a + b - |a-b| )\). Thus, if we define \[L_n \equiv \log\left[\sum_{j=1}^{n-1} \exp(x_j) \right],\] and apply the previous recursive formula, we obtain \[ \begin{aligned} \exp[\operatorname{LSE}({\bf x})] &= \exp(L_{n+1}) \\ &= \exp(L_{n}) + \exp(x_n)\\ &= \exp( \max(L_n,x_n) )\left[ 1 + \exp( -\vert L_n - x_n \vert ) \right]. \end{aligned} \] Then, \[ \begin{aligned} \operatorname{LSE}({\bf x}) &= \max(L_n,x_n) + \log \left[ 1 + \exp( -\vert L_n - x_n \vert ) \right]. \end{aligned} \] This provides a recursive relationship that can be used to calculate the LSE. The advantage of this recursive formula is that the argument in the exponential in the second term is centered, and can be calculated using the R command log1p.

See also: logsumexp function in R.

Examples

The following R code presents some examples of the numerical tricks described above to calculate or approximate the LogSumExp function. We present the comparison between the approximation using the maximum, the calculation using the centering at the mean method, the calculation using the recursive formula, the calculation using the logSumExp() command from the matrixStats R package, and the direct calculation (denoted as “Naive” method). These examples illustrate cases where the Naive method fails, when the Maximum method fails (or is inaccurate), and when the Centering method fails. The recursive formula and the logSumExp() command seem to work well in all cases (although there may be some pathological cases out there).

# Required packages
library(matrixStats) # contains the function logSumExp
library(knitr)
#--------------------------------------------------------------------
# LSE function: Recursive formula
#--------------------------------------------------------------------
LSE_R <- function(vec){ 
  n.vec <- length(vec)
  vec <- sort(vec, decreasing = TRUE)
  Lk <- vec[1]
  for (k in 1:(n.vec-1)) {
    Lk <- max(vec[k+1], Lk) + log1p(exp(-abs(vec[k+1] - Lk))) 
  }
  return(Lk)
}

#--------------------------------------------------------------------
# LSE function: Maximum
#--------------------------------------------------------------------
LSE_M <- function(vec){ return(max(vec)) }

#--------------------------------------------------------------------
# LSE function: Centered at the mean
#--------------------------------------------------------------------
LSE_C <- function(vec){ 
  mvec <- mean(vec)
  val <- log(sum(  exp(vec - mvec) )) + mvec
  return(val) }

#--------------------------------------------------------------------
# LSE function: Naive
#--------------------------------------------------------------------
LSE_N <- function(vec){ return(log(sum(exp(vec)))) }


########################################################################################
# Example 1
########################################################################################
set.seed(123)
x <- rnorm(n = 100, mean = -1000, sd = 10)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)
x
Maximum -978.1267
Centering -977.3762
Recursive -977.3762
logSumExp -977.3762
Naive -Inf
########################################################################################
# Example 2
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = -1000, sd = 10)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)
x
Maximum -967.5896
Centering -967.5756
Recursive -967.5756
logSumExp -967.5756
Naive -Inf
########################################################################################
# Example 3
########################################################################################
set.seed(123)
x <- rnorm(n = 100, mean = 1000, sd = 10)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)
x
Maximum 1021.873
Centering 1022.624
Recursive 1022.624
logSumExp 1022.624
Naive Inf
########################################################################################
# Example 4
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = 1000, sd = 10)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)
x
Maximum 1032.410
Centering 1032.424
Recursive 1032.424
logSumExp 1032.424
Naive Inf
########################################################################################
# Example 5
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = -5000, sd = 500)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)
x
Maximum -3379.48
Centering Inf
Recursive -3379.48
logSumExp -3379.48
Naive -Inf
########################################################################################
# Example 6
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = 5000, sd = 500)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)
x
Maximum 6620.52
Centering Inf
Recursive 6620.52
logSumExp 6620.52
Naive Inf
########################################################################################
# Example 7
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = -5000, sd = 3)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)
x
Maximum -4990.277
Centering -4988.933
Recursive -4988.933
logSumExp -4988.933
Naive -Inf
########################################################################################
# Example 8
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = 5000, sd = 3)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)
x
Maximum 5009.723
Centering 5011.067
Recursive 5011.067
logSumExp 5011.067
Naive Inf
########################################################################################
# Example 9
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = -500, sd = 3)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)
x
Maximum -490.2769
Centering -488.9334
Recursive -488.9334
logSumExp -488.9334
Naive -488.9334
########################################################################################
# Example 10
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = 500, sd = 3)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)
x
Maximum 509.7231
Centering 511.0666
Recursive 511.0666
logSumExp 511.0666
Naive 511.0666