Tuesday, June 29, 2010

log-sum-exp trick

When I implement models with discrete variables (which actually happens more often than one might think), I always end up computing this value:
\[ V = \log \left( \sum_i e^{b_i} \right) \]
Why? This expression typically shows up in the denominator of Bayes' formula, for example. I try to work with \(\log\)-probabilities all the time so that I don't have to deal with very small numbers and so that I can do additions instead of multiplications. Incidentally, I was looking at the latency and throughput of floating-point instructions in the latest processors (the Intel Core i7, for example), and I realized that even in 2010, additions are still faster than multiplications (even with SSEx and the like).

Therefore, use \(\log\)-probabilities.
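
To give a tiny numeric illustration (this example is mine, not part of the original argument; plain Python using only the standard math module): multiplying a thousand probabilities of 0.01 underflows to zero in double precision, while summing their logs is perfectly safe.

import math

probs = [0.01] * 1000
naive = 1.0
for p in probs:
    naive *= p                                  # underflows to 0.0 long before the loop ends
log_prob = sum(math.log(p) for p in probs)      # 1000 * log(0.01) is about -4605.17
print(naive)                                    # 0.0
print(log_prob)                                 # -4605.170185988091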

In this expression, the \(b_i\) are log-probabilities, so the \(e^{b_i}\) can be very small or very large, which sometimes leads to underflow or overflow. A scaling trick keeps the numbers in a better range, without loss of accuracy and at little extra cost:
\[ \begin{array}{rcl} \log \left( \sum_i e^{b_i} \right)&=& \log \left( \sum_i e^{b_i}e^{-B}e^{B} \right)\\ ~ &=& \log \left( \left( \sum_i e^{b_i - B }\right)e^{B} \right)\\ ~ &=& \log \left( \sum_{i} e^{b_i - B} \right) + B \end{array} \]

And that's it. For the value of \(B\), take for instance \(B=\max_i b_i\): the largest exponent then becomes exactly zero, so nothing can overflow, and any term that still underflows is negligible compared to the largest one.
So the extra cost is just finding the maximum and doing one subtraction per term.
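
For what it is worth, here is a minimal sketch of the trick in plain Python (the function name logsumexp is just my choice here; only the standard math module is used):

import math

def logsumexp(b):
    # log(sum_i exp(b_i)) computed with the max-shift trick
    B = max(b)                                   # B = max_i b_i
    # after the shift, the largest exponent is exactly 0, so nothing overflows
    return math.log(sum(math.exp(x - B) for x in b)) + B

# Log-probabilities this negative would underflow if exponentiated directly:
print(logsumexp([-1000.0, -1001.0, -1002.0]))    # about -999.59

Note that with \(B=\max_i b_i\) at least one term of the sum is exactly 1, so the argument of the final \(\log\) can never underflow to zero either.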

3 comments:

Anonymous said...

Very interesting, but ...

let's assume that the b_i take huge values (e.g., 800, 500, 700)

in this case,

how can you avoid computational problems due to machine precision?

Juan

Anonymous said...

Do you know who invented the technique? I would like to cite the author.

Istvan

David said...

I'm not sure exactly who invented it. I think the first time I saw it was in Kevin Murphy's note on Naive Bayes, but I guess it's just a common trick that is well known (or maybe Kevin discovered it?).