How can I perform comparisons between DataFrames and Series? I'd like to mask elements in a DataFrame/Series that are greater/less than elements in another DataFrame/Series.
For instance, the following doesn't replace elements greater than the row mean with NaN, although I expected it to:

>>> import numpy as np
>>> import pandas as pd
>>> x = pd.DataFrame(data={'a': [1, 2], 'b': [3, 4]})
>>> x[x > x.mean(axis=1)] = np.nan
>>> x
   a  b
0  1  3
1  2  4

If we look at the boolean array created by the comparison, it is really weird:

>>> x = pd.DataFrame(data={'a': [1, 2], 'b': [3, 4]})
>>> x > x.mean(axis=1)
       a      b      0      1
0  False  False  False  False
1  False  False  False  False

I don't understand by what logic the resulting boolean array is like that. I'm able to work around this problem by using transpose:

>>> (x.T > x.mean(axis=1).T).T
       a      b
0  False   True
1  False   True

But I believe there is some "correct" way of doing this that I'm not aware of. And at least I'd like to understand what is going on.
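
A possible explanation, offered as a sketch rather than a definitive answer: when a DataFrame is compared with a Series using an operator such as `>`, pandas appears to align the Series index with the DataFrame's columns, as it does for arithmetic. Here the Series returned by `x.mean(axis=1)` is labelled 0 and 1 while the columns are labelled a and b, so no labels match and every comparison comes out False over the union of labels. Assuming that is what happens, the method form `DataFrame.gt` with `axis=0` should align on the row index instead, and `DataFrame.mask` can then do the replacement. The names `row_mean`, `above_mean` and `masked` are only illustrative.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(data={'a': [1, 2], 'b': [3, 4]})

# Series of per-row means, indexed by the row labels 0 and 1
row_mean = x.mean(axis=1)

# axis=0 matches the Series index against the DataFrame's row index,
# so each row is compared with its own mean
above_mean = x.gt(row_mean, axis=0)

# mask() returns a copy with NaN wherever the condition is True
masked = x.mask(above_mean)

print(above_mean)
print(masked)
```

The other comparison methods (`lt`, `ge`, `le`, `eq`, `ne`) take the same `axis` argument, so the less-than case should work the same way.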