badai.xyz

Naive Bayes and the log-sum-exp trick

2019-10-04

When we multiply many very small numbers we run into a problem: the answer comes out as zero. This is caused by floating-point underflow. To avoid it there is a thing called the log-sum-exp trick. So why is this relevant to machine learning? Many machine learning algorithms multiply small values, Naive Bayes included. In this post I'll cover my solution to this problem with a simple example. The examples are written in Python-style pseudo-code.

OK, here is the equation used to combine the probabilities in Naive Bayes: the probability of a class given the words is proportional to the class prior times the per-word probabilities,

P(class | w1, ..., wn) ∝ P(class) * P(w1 | class) * ... * P(wn | class)

And in code you could do it something like this:

probability_positive = 1
probability_negative = 1

# Multiply the conditional probability of each word for both classes
for word in words:
    probability_positive *= word.positive_probability
    probability_negative *= word.negative_probability

# Multiply by the total (prior) class probabilities
probability_positive *= probability_positive_total
probability_negative *= probability_negative_total

# Normalize so the two probabilities sum to one
total = probability_positive + probability_negative

probability_positive /= total
probability_negative /= total

print(probability_positive)
print(probability_negative)

And here comes the problem: let's assume you have at least 100 words, each with a probability between 0 and 1. Depending on the floating-point type (float, double, long double), the product will underflow and become zero.
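
For example (the per-word probability of 0.0001 below is purely illustrative), a hundred multiplications are already enough to fall below the smallest value a 64-bit float can represent:

probability = 1.0
for _ in range(100):
    probability *= 0.0001  # one small word probability per iteration

print(probability)  # 0.0 — the true value, 1e-400, underflows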

To solve this problem we'll have to use logarithmic identities, most importantly the product rule, which turns multiplication into addition:

log(a * b) = log(a) + log(b)

For the full list of identities, check the Wikipedia article: https://en.wikipedia.org/wiki/Logarithm#Logarithmic_identities
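
As a quick sanity check of the product rule (the two probabilities are arbitrary values picked for illustration):

import math

a = 0.003
b = 0.007

print(math.log(a * b))            # about -10.77
print(math.log(a) + math.log(b))  # the same value up to rounding: multiplying becomes adding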

Now here is an example of what the code would look like using logarithmic identities:

import math

probability_positive = 0
probability_negative = 0

# Add the log of each word's conditional probability for both classes
for word in words:
    probability_positive += math.log(word.positive_probability)
    probability_negative += math.log(word.negative_probability)

# Add the log of the total (prior) class probabilities
probability_positive += math.log(probability_positive_total)
probability_negative += math.log(probability_negative_total)

# Convert back from log space
probability_positive = math.exp(probability_positive)
probability_negative = math.exp(probability_negative)

# Normalize so the two probabilities sum to one
total = probability_positive + probability_negative

probability_positive /= total
probability_negative /= total

print(probability_positive)
print(probability_negative)

If you try to run this code you'll notice it still doesn't work. That's because math.exp() still underflows: the summed log-probabilities are so negative that exponentiating them gives zero. To fix that, the log-sum-exp trick comes in.
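
You can see the problem in isolation (the value -800 is just an illustrative log-probability for a long document):

import math

log_probability = -800
print(math.exp(log_probability))  # 0.0 — exp(-800) is far below the float range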

There are a few versions of it, and here are the two most common:

logsumexp(x1, ..., xn) = log(exp(x1) + ... + exp(xn))

logsumexp(x1, ..., xn) = log(exp(x1 - m) + ... + exp(xn - m)) + m, where m = max(x1, ..., xn)

I highly recommend using the last one, because the first version can overflow or underflow.

import math

# Log-sum-exp for two values: subtract the maximum so exp() stays in range
def logsumexp(x, y):
    max_value = max(x, y)
    total = math.exp(x - max_value) + math.exp(y - max_value)
    return math.log(total) + max_value

# Initialise log-probabilities
probability_positive = 0
probability_negative = 0

# Add the log of each word's conditional probability for both classes
for word in words:
    probability_positive += math.log(word.positive_probability)
    probability_negative += math.log(word.negative_probability)

# Add the log of the total (prior) class probabilities
probability_positive += math.log(probability_positive_total)
probability_negative += math.log(probability_negative_total)

# Calculate the normalization constant in log space
normalization = logsumexp(probability_positive, probability_negative)

# Convert back from log space; subtracting the normalization keeps exp() in range
probability_positive = math.exp(probability_positive - normalization)
probability_negative = math.exp(probability_negative - normalization)

# Renormalize so the two probabilities sum exactly to one
total = probability_positive + probability_negative

probability_positive /= total
probability_negative /= total

print(probability_positive)
print(probability_negative)

And now we finally have this thing in working order!
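
To see the difference end to end, here is a small self-contained check; the word probabilities and class priors are made up purely for illustration. The plain product underflows to zero, while the log-space version with log-sum-exp normalization returns sensible probabilities:

import math

# Hypothetical data: 300 words per class with small conditional probabilities (illustrative only)
positive_probs = [0.004] * 300
negative_probs = [0.002] * 300
prior_positive = 0.5
prior_negative = 0.5

# The plain product underflows to exactly zero
naive_positive = prior_positive
for p in positive_probs:
    naive_positive *= p
print(naive_positive)  # 0.0

# The log-space version with log-sum-exp normalization stays finite
log_positive = math.log(prior_positive) + sum(math.log(p) for p in positive_probs)
log_negative = math.log(prior_negative) + sum(math.log(p) for p in negative_probs)

max_value = max(log_positive, log_negative)
normalization = max_value + math.log(
    math.exp(log_positive - max_value) + math.exp(log_negative - max_value)
)

print(math.exp(log_positive - normalization))  # essentially 1.0
print(math.exp(log_negative - normalization))  # essentially 0.0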