The LogSumExp function is the logarithm of the sum of the exponentials of values \({\bf x} = (x_1,\dots,x_n)\): \[ \operatorname{LSE}({\bf x}) = \log\left[\sum_{j=1}^n \exp(x_j) \right]. \] This function appears in many applications in Statistics (for instance, in Monte Carlo integration and logistic regression), Machine Learning, and some areas of Mathematics.
Although the mathematical definition of the LSE function is simple, its numerical evaluation can be challenging when the values \(x_j\) are large and positive (in which case R may return Inf
) or \(\vert x_j \vert\) are large and \(x_j\) have negative sign (in which case R may return 0
). For example:
exp(-800)
## [1] 0
exp(800)
## [1] Inf
Motivated by these challenges, several approximations and numerical “tricks” have been proposed.
This is a little compilation of approximations and numerical “tricks” for calculating the LSE function.
Using the tangent line approximation \({\displaystyle \log(X+a)\approx \log X+a/X,}\)
\[ {\textstyle \mathrm {LSE} (\mathbf {x} )= \log\left[\sum_{j=1}^n \exp(x_j) \right] \approx \log(\exp x_{k})+{\bigl (}\sum _{j\neq k}\exp x_{j}{\bigr )}/\exp x_{k}\approx x_{k}=\max _{j}x_{j}.} \] This approximation is very simple, but it is accurate only if the term \({\displaystyle x_{k}}\) is much larger than the rest.
This trick basically consists of centering the values \({\bf x}\), as follows: \[ \operatorname{LSE}({\bf x}) = \log\left[\sum_{j=1}^n \exp(x_j) \exp(x^* - x^*) \right]= \log\left[\sum_{j=1}^n \exp(x_j- x^*) \right] + x^*. \] The value \(x^*\) is typically the mean, the median, or the maximum of \({\bf x}\). Thus, it only works if the range of \(x_j-x^*\) can be evaluated numerically. This method is also related to the softmax function.
Note first that, \[
\begin{aligned}
\exp(x_1) + \exp(x_2)
&= \exp(\max(x_1,x_2)) + \exp(\min(x_1,x_2)) \\
&= \exp(\max(x_1,x_2)) (1 + \exp(\min(x_1,x_2)-\max(x_1,x_2)) \\
&= \exp(\max(x_1,x_2)) (1 + \exp(-\vert x_1 - x_2\vert)). \\
\end{aligned}
\] since \(\max (a,b) = \frac{1}{2}( a + b + |a-b| )\) and \(\min (a,b) = \frac{1}{2}( a + b - |a-b| )\). Thus, if we define \[L_n \equiv \log\left[\sum_{j=1}^{n-1} \exp(x_j) \right],\] and apply the previous recursive formula, we obtain \[
\begin{aligned}
\exp[\operatorname{LSE}({\bf x})] &= \exp(L_{n+1}) \\
&= \exp(L_{n}) + \exp(x_n)\\
&= \exp( \max(L_n,x_n) )\left[ 1 + \exp( -\vert L_n - x_n \vert ) \right].
\end{aligned}
\] Then, \[
\begin{aligned}
\operatorname{LSE}({\bf x}) &= \max(L_n,x_n) + \log \left[ 1 + \exp( -\vert L_n - x_n \vert ) \right].
\end{aligned}
\] This provides a recursive relationship that can be used to calculate the LSE. The advantage of this recursive formula is that the argument in the exponential in the second term is centered, and can be calculated using the R command log1p
.
See also: logsumexp function in R.
The following R code presents some examples of the numerical tricks described above to calculate or approximate the LogSumExp function. We present the comparison between the approximation using the maximum, the calculation using the centering at the mean method, the calculation using the recursive formula, the calculation using the logSumExp()
command from the matrixStats
R package, and the direct calculation (denoted as “Naive” method). These examples illustrate cases where the Naive method fails, when the Maximum method fails (or is inaccurate), and when the Centering method fails. The recursive formula and the logSumExp()
command seem to work well in all cases (although there may be some pathological cases out there).
# Required packages
library(matrixStats) # contains the function logSumExp
library(knitr)
#--------------------------------------------------------------------
# LSE function: Recursive formula
#--------------------------------------------------------------------
LSE_R <- function(vec){
n.vec <- length(vec)
vec <- sort(vec, decreasing = TRUE)
Lk <- vec[1]
for (k in 1:(n.vec-1)) {
Lk <- max(vec[k+1], Lk) + log1p(exp(-abs(vec[k+1] - Lk)))
}
return(Lk)
}
#--------------------------------------------------------------------
# LSE function: Maximum
#--------------------------------------------------------------------
LSE_M <- function(vec){ return(max(vec)) }
#--------------------------------------------------------------------
# LSE function: Centered at the mean
#--------------------------------------------------------------------
LSE_C <- function(vec){
mvec <- mean(vec)
val <- log(sum( exp(vec - mvec) )) + mvec
return(val) }
#--------------------------------------------------------------------
# LSE function: Naive
#--------------------------------------------------------------------
LSE_N <- function(vec){ return(log(sum(exp(vec)))) }
########################################################################################
# Example 1
########################################################################################
set.seed(123)
x <- rnorm(n = 100, mean = -1000, sd = 10)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)
x | |
---|---|
Maximum | -978.1267 |
Centering | -977.3762 |
Recursive | -977.3762 |
logSumExp | -977.3762 |
Naive | -Inf |
########################################################################################
# Example 2
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = -1000, sd = 10)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)
x | |
---|---|
Maximum | -967.5896 |
Centering | -967.5756 |
Recursive | -967.5756 |
logSumExp | -967.5756 |
Naive | -Inf |
########################################################################################
# Example 3
########################################################################################
set.seed(123)
x <- rnorm(n = 100, mean = 1000, sd = 10)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)
x | |
---|---|
Maximum | 1021.873 |
Centering | 1022.624 |
Recursive | 1022.624 |
logSumExp | 1022.624 |
Naive | Inf |
########################################################################################
# Example 4
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = 1000, sd = 10)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)
x | |
---|---|
Maximum | 1032.410 |
Centering | 1032.424 |
Recursive | 1032.424 |
logSumExp | 1032.424 |
Naive | Inf |
########################################################################################
# Example 5
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = -5000, sd = 500)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)
x | |
---|---|
Maximum | -3379.48 |
Centering | Inf |
Recursive | -3379.48 |
logSumExp | -3379.48 |
Naive | -Inf |
########################################################################################
# Example 6
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = 5000, sd = 500)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)
x | |
---|---|
Maximum | 6620.52 |
Centering | Inf |
Recursive | 6620.52 |
logSumExp | 6620.52 |
Naive | Inf |
########################################################################################
# Example 7
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = -5000, sd = 3)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)
x | |
---|---|
Maximum | -4990.277 |
Centering | -4988.933 |
Recursive | -4988.933 |
logSumExp | -4988.933 |
Naive | -Inf |
########################################################################################
# Example 8
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = 5000, sd = 3)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)
x | |
---|---|
Maximum | 5009.723 |
Centering | 5011.067 |
Recursive | 5011.067 |
logSumExp | 5011.067 |
Naive | Inf |
########################################################################################
# Example 9
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = -500, sd = 3)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)
x | |
---|---|
Maximum | -490.2769 |
Centering | -488.9334 |
Recursive | -488.9334 |
logSumExp | -488.9334 |
Naive | -488.9334 |
########################################################################################
# Example 10
########################################################################################
set.seed(123)
x <- rnorm(n = 1000, mean = 500, sd = 3)
comp <- c(LSE_M(x), LSE_C(x), LSE_R(x), logSumExp(x), LSE_N(x))
names(comp) <- c("Maximum", "Centering", "Recursive", "logSumExp", "Naive")
kable(comp)
x | |
---|---|
Maximum | 509.7231 |
Centering | 511.0666 |
Recursive | 511.0666 |
logSumExp | 511.0666 |
Naive | 511.0666 |