Handling zero multiplied with NaN

Question

I am trying to estimate the entropy of random variables (RVs), which involves computing p_X * log(p_X). For example:

import numpy as np
X = np.random.rand(100)
binX = np.histogram(X, 10)[0]  # create histogram with 10 bins
p_X = binX / np.sum(binX)
ent_X = -1 * np.sum(p_X * np.log(p_X))

Sometimes an entry of p_X is zero, and mathematically that term should then be zero (by the convention 0 * log(0) = 0). But in Python, p_X * np.log(p_X) evaluates to NaN for that entry, which makes the whole sum NaN. Is there any way (without explicitly checking for NaN) to make p_X * np.log(p_X) give zero whenever p_X is zero? Any insight or correction is appreciated. Thanks in advance :)
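To see where the NaN actually comes from, here is a minimal reproduction on a hand-made distribution with one empty bin (the array p is illustrative, not taken from the question). It suggests that np.log(0) itself is -inf, and that it is the multiplication 0 * -inf that produces NaN:

```python
import numpy as np

p = np.array([0.0, 0.5, 0.5])  # illustrative distribution with one empty bin
with np.errstate(divide="ignore", invalid="ignore"):  # hide the RuntimeWarnings
    logs = np.log(p)   # log(0) evaluates to -inf, not NaN
    terms = p * logs   # 0 * -inf is NaN under IEEE 754 rules

print(logs[0])       # -inf
print(terms[0])      # nan
print(-terms.sum())  # nan: a single NaN term poisons the whole sum
```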

..give zero whenever p_X is zero... A simple if condition? – B001ᛦ, Commented Jun 19, 2019 at 10:21

Paul Panzer · Accepted Answer · 2019-06-19 10:33:54Z

6

If you have scipy, use scipy.special.xlogy(p_X,p_X). Not only does it solve your problem, as an added benefit it is also a bit faster than p_X*np.log(p_X).
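A minimal sketch of the xlogy approach applied to the question's entropy calculation, assuming SciPy is installed. entropy_xlogy is a hypothetical helper name, and a fixed three-bin distribution stands in for the random histogram so the result is easy to check by hand:

```python
import numpy as np
from scipy.special import xlogy

def entropy_xlogy(p):
    """Shannon entropy in nats. xlogy(0, 0) is 0, so empty bins contribute nothing."""
    p = np.asarray(p, dtype=float)
    return -np.sum(xlogy(p, p))

p_X = np.array([0.0, 0.5, 0.5])  # one empty bin
print(entropy_xlogy(p_X))        # log(2), about 0.6931
```

No warnings are raised here, because xlogy returns 0 for x == 0 instead of evaluating 0 * log(0).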


4 Comments

SJL Over a year ago

It's worth noting that xlogy(0, float("nan")) returns nan, not 0. So if you are computing xlogy(x, y) and want the result to be 0 whenever x == 0 and y <= 0, this solution works. But if y could be NaN, you can still get NaN back.

Paul Panzer Over a year ago

@SJL which is the correct behavior: suppress bogus NaNs created by naive evaluation of 0 log 0 but do not hide genuine NaNs. Besides, OP has x==y, so this cannot happen.

SJL Over a year ago

I wasn't implying that the behavior was incorrect, just that some people might incorrectly assume that xlogy(0, nan) returned 0.

Paul Panzer Over a year ago

@SJL Nor was I implying you were implying ;-) Btw. OP's headline is misleading, as log 0 doesn't return NaN but -inf; it is the subsequent multiplication that makes it NaN.
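The edge cases discussed in these comments can be checked directly. A small sketch, assuming SciPy is installed, of how scipy.special.xlogy treats them:

```python
import numpy as np
from scipy.special import xlogy

print(xlogy(0.0, 0.0))     # 0.0: the bogus NaN from 0 * log(0) is suppressed
print(xlogy(0.0, -1.0))    # 0.0: x == 0 gives 0 for any non-NaN y
print(xlogy(0.0, np.nan))  # nan: a genuine NaN in y is not hidden
print(xlogy(1.0, -1.0))    # nan: the log of a negative number is still NaN
```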

Dan · Accepted Answer · 2019-06-19 10:23:03Z

4

In your case you can use np.nansum, since adding 0 to a sum is the same as ignoring a NaN:

ent_X = -1 * np.nansum(p_X * np.log(p_X))
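A sketch of the nansum variant wrapped in a function (entropy_nansum is a hypothetical name). NumPy still emits RuntimeWarnings for log(0) and 0 * -inf, so this version silences them with np.errstate; whether to silence them is an assumption about what the caller wants:

```python
import numpy as np

def entropy_nansum(p):
    """Shannon entropy in nats, dropping the NaN terms produced by 0 * log(0)."""
    p = np.asarray(p, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):  # silence log(0) and 0 * -inf warnings
        terms = p * np.log(p)
    return -np.nansum(terms)

print(entropy_nansum([0.0, 0.5, 0.5]))  # log(2), about 0.6931
```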


2 Comments

Leporello Over a year ago

The problem with that solution is that it will silently swallow NaNs produced by operations other than p_X being zero. Presumably the OP would prefer NaNs to be kept if, for instance, one of the p_X is < 0 (judging by "without any explicit checking for NaN").

Dan Over a year ago

@Leporello that's true
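The caveat above can be illustrated with a deliberately invalid input. In this sketch bad_p is a hypothetical array containing a negative "probability"; np.sum surfaces the resulting NaN, while np.nansum drops it without complaint:

```python
import numpy as np

bad_p = np.array([0.6, 0.5, -0.1])  # hypothetical invalid input: a negative "probability"
with np.errstate(divide="ignore", invalid="ignore"):
    terms = bad_p * np.log(bad_p)   # log(-0.1) is NaN, so terms[2] is NaN

print(-np.sum(terms))     # nan: the problem stays visible
print(-np.nansum(terms))  # finite: the NaN from the bad entry is silently dropped
```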

yatu · Accepted Answer · 2019-06-19 10:28:15Z

You can use np.ma.log, which masks the 0s (in fact every entry <= 0, since those lie outside the log's domain), and then use the filled method to fill the masked entries with 0:

np.ma.log(p_X).filled(0)

For instance:

np.ma.log(range(5)).filled(0) # array([0. , 0. , 0.69314718, 1.09861229, 1.38629436])

import numpy as np
X = np.random.rand(100)
binX = np.histogram(X, 10)[0]  # create histogram with 10 bins
p_X = binX / np.sum(binX)
ent_X = -1 * np.sum(p_X * np.ma.log(p_X).filled(0))
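A sketch of the masked-array approach as a reusable function (entropy_masked is a hypothetical name). One thing worth knowing: np.ma.log masks every entry <= 0, not just exact zeros, so like nansum it will also quietly turn a negative input into a zero term:

```python
import numpy as np

def entropy_masked(p):
    """Shannon entropy in nats using np.ma.log and filled(0)."""
    p = np.asarray(p, dtype=float)
    # np.ma.log masks every entry outside its domain (p <= 0); filled(0) replaces them with 0.
    log_p = np.ma.log(p).filled(0)
    return -np.sum(p * log_p)

print(entropy_masked([0.0, 0.5, 0.5]))   # log(2), about 0.6931
print(entropy_masked([0.6, 0.5, -0.1]))  # finite: the negative entry is masked, not flagged
```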
