### log-sum-exp trick

when I implement models with discrete variables (which actually happens more than one can think), I always end up estimating this value:

\[ V = \log \left( \sum_i e^{b_i} \right) \]

Why ? This usually happens at the denominator of a Bayes formula for example. I try to keep \(log\)-probabilities all the time so that not to have to deal with very small numbers and to do additions instead of multiplications. By the way, I was looking at the time and latency of floating-point instructions in the latest processors (like Intel Core i7 for example), and I realized that still in 2010, additions are faster than multiplications (even with SSEx and the like).

Therefore, use \(log\)

In this expression, \(b_i\) are the log-probabilities and therefore \(e^{b_i}\) are very small or very big yielding to overflow or underflow sometimes. A scaling trick can help using numbers in a better range without loss of accuracy and for a little extra cost as follows:

\[ \begin{array}{rcl} \log \left( \sum_i e^{b_i} \right)&=& \log \left( \sum_i e^{b_i}e^{-B}e^{B} \right)\\ ~ &=& \log \left( \left( \sum_i e^{b_i - B }\right)e^{B} \right)\\ ~ &=& \log \left( \sum_{i} e^{b_i - B} \right) + B \end{array} \]

And that's it. For the value of \(B\), take for instance \(B=\max_i b_i\).

So the extra cost is to find the max value and to make a subtraction.

\[ V = \log \left( \sum_i e^{b_i} \right) \]

Why ? This usually happens at the denominator of a Bayes formula for example. I try to keep \(log\)-probabilities all the time so that not to have to deal with very small numbers and to do additions instead of multiplications. By the way, I was looking at the time and latency of floating-point instructions in the latest processors (like Intel Core i7 for example), and I realized that still in 2010, additions are faster than multiplications (even with SSEx and the like).

Therefore, use \(log\)

In this expression, \(b_i\) are the log-probabilities and therefore \(e^{b_i}\) are very small or very big yielding to overflow or underflow sometimes. A scaling trick can help using numbers in a better range without loss of accuracy and for a little extra cost as follows:

\[ \begin{array}{rcl} \log \left( \sum_i e^{b_i} \right)&=& \log \left( \sum_i e^{b_i}e^{-B}e^{B} \right)\\ ~ &=& \log \left( \left( \sum_i e^{b_i - B }\right)e^{B} \right)\\ ~ &=& \log \left( \sum_{i} e^{b_i - B} \right) + B \end{array} \]

And that's it. For the value of \(B\), take for instance \(B=\max_i b_i\).

So the extra cost is to find the max value and to make a subtraction.

## Comments

let's assume that b_i takes huge values (e.g., 800, 500, 700)

in this case,

how can you avoid computational problems due to machine precission?

Juan

Istvan