How can I perform comparisons between DataFrames and Series? I'd like to mask elements in a DataFrame/Series that are greater/less than elements in another DataFrame/Series.

For instance, the following doesn't replace elements greater than the row mean with NaN, although I expected it to:

    >>> x = pd.DataFrame(data={'a': [1, 2], 'b': [3, 4]})
    >>> x[x > x.mean(axis=1)] = np.nan
    >>> x
       a  b
    0  1  3
    1  2  4

If we look at the boolean array created by the comparison, it is really weird:

    >>> x = pd.DataFrame(data={'a': [1, 2], 'b': [3, 4]})
    >>> x > x.mean(axis=1)
           a      b      0      1
    0  False  False  False  False
    1  False  False  False  False

I don't understand by what logic the resulting boolean array is like that. I'm able to work around this problem by using transpose:

    >>> (x.T > x.mean(axis=1).T).T
           a     b
    0  False  True
    1  False  True

But I believe there is some "correct" way of doing this that I'm not aware of. And at least I'd like to understand what is going on.

1 Answer

The problem is that the comparison matches the Series index against the DataFrame's column labels instead of its row index. If you use .gt and pass axis=0, you get the result you want:

    In [203]: x.gt(x.mean(axis=1), axis=0)
    Out[203]:
           a     b
    0  False  True
    1  False  True

You can see what I mean when you perform the comparison with the np array:

    In [205]: x > x.mean(axis=1).values
    Out[205]:
           a      b
    0  False  False
    1  False   True

Here you can see that by default the comparison runs along the columns: each mean is compared against a whole column rather than against its own row, which gives a different result.
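To tie this back to the masking in the question, here is a minimal sketch that combines the row-wise comparison with the replacement step. The variable names (`row_means`, `above_mean`, `masked`) are only illustrative, and using `DataFrame.mask` is one possible way to do the replacement, not the only one:

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(data={'a': [1, 2], 'b': [3, 4]})

# Row means, indexed by the row labels 0 and 1.
row_means = x.mean(axis=1)

# Compare every row against its own mean by matching on the row index.
above_mean = x.gt(row_means, axis=0)

# mask() returns a new DataFrame with NaN wherever the condition is True;
# x itself is left unchanged.
masked = x.mask(above_mean)
print(masked)

# The same boolean result via explicit NumPy broadcasting: reshape the means
# into a column vector so they line up with the rows.
above_mean_np = x.values > row_means.values[:, None]
```

Boolean indexing with the same mask, `x[above_mean] = np.nan`, should also work if you want to modify `x` in place as in the question.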
