Greater/less than comparisons between Pandas DataFrames/Series

Question

How can I perform comparisons between DataFrames and Series? I'd like to mask elements in a DataFrame/Series that are greater/less than elements in another DataFrame/Series.

For instance, I expected the following to replace elements greater than the row mean with NaN, but it leaves the DataFrame unchanged:

>>> import numpy as np
>>> import pandas as pd
>>> x = pd.DataFrame(data={'a': [1, 2], 'b': [3, 4]})
>>> x[x > x.mean(axis=1)] = np.nan
>>> x
   a  b
0  1  3
1  2  4

If we look at the boolean DataFrame created by the comparison, it looks really weird:

>>> x = pd.DataFrame(data={'a': [1, 2], 'b': [3, 4]})
>>> x > x.mean(axis=1)
       a      b      0      1
0  False  False  False  False
1  False  False  False  False

I don't understand the logic that produces this result. I'm able to work around the problem by transposing:

>>> (x.T > x.mean(axis=1).T).T
       a     b
0  False  True
1  False  True

But I believe there is some "correct" way of doing this that I'm not aware of. And at least I'd like to understand what is going on.

EdChum · Accepted Answer · 2015-11-05 10:53:23Z

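The problem here is that pandas interprets the Series' index (0, 1) as column labels when performing the comparison: it aligns the Series against the DataFrame's columns (a, b), finds no matching labels, and compares over the union of both, which is where the extra all-False columns 0 and 1 come from. If you use .gt and pass axis=0, the Series is matched against the row index instead, and you get the result you want: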

In [203]: x.gt(x.mean(axis=1), axis=0)
Out[203]:
       a     b
0  False  True
1  False  True
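Applied to the original goal of masking, a minimal sketch (assuming a reasonably recent pandas; `above_row_mean` and `masked` are illustrative names, not part of the answer) feeds the boolean frame from `gt(..., axis=0)` into `DataFrame.mask`, which returns a copy with NaN wherever the condition is True:

```python
import pandas as pd

x = pd.DataFrame(data={'a': [1, 2], 'b': [3, 4]})

# True where a value exceeds the mean of its own row; axis=0 matches the
# Series index (0, 1) against the DataFrame's row index, not its columns.
above_row_mean = x.gt(x.mean(axis=1), axis=0)

# mask() leaves x untouched and returns a copy with NaN where the condition holds.
masked = x.mask(above_row_mean)
print(masked)
```

Because NaN is a float, the integer values may come back as floats in the masked copy. The same boolean frame also works with the question's `x[...] = np.nan` assignment style.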

You can see what I mean when you perform the comparison with the NumPy array instead, which has no index and therefore bypasses label alignment:

In [205]: x > x.mean(axis=1).values
Out[205]:
       a      b
0  False  False
1  False   True

Here you can see that, by default, the values are matched against the columns: the first mean (2.0) is compared with column a and the second (3.0) with column b, which gives a different result.
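The column-versus-row distinction follows NumPy broadcasting rules. A minimal sketch (variable names are illustrative) shows that a 1-D array of row means lines up with the columns, while reshaping it to shape (2, 1) lines each mean up with its own row, reproducing the `gt(..., axis=0)` result without any index alignment:

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(data={'a': [1, 2], 'b': [3, 4]})
values = x.to_numpy()                   # shape (2, 2)
row_means = x.mean(axis=1).to_numpy()   # array([2., 3.]), shape (2,)

# Shape (2,) broadcasts as a row: row_means[0] is compared with column 'a'
# and row_means[1] with column 'b' (what x > x.mean(axis=1).values does).
by_column = pd.DataFrame(values > row_means, index=x.index, columns=x.columns)

# Shape (2, 1) broadcasts as a column: each row is compared with its own mean
# (what x.gt(x.mean(axis=1), axis=0) does).
by_row = pd.DataFrame(values > row_means[:, np.newaxis],
                      index=x.index, columns=x.columns)

print(by_column)
print(by_row)
```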
