For clarity I will extract an excerpt from my code and use general names. I have a class Foo() that stores a DataFrame to an attribute.

    import pandas as pd
    import pandas.testing as pdt   # pandas.util.testing is deprecated and removed in pandas 2.0


    class Foo():

        def __init__(self, bar):
            self.bar = bar                # dict of dicts
            self.df = pd.DataFrame(bar)   # pandas object

        def __eq__(self, other):
            if isinstance(other, self.__class__):
                return self.__dict__ == other.__dict__
            return NotImplemented

        def __ne__(self, other):
            result = self.__eq__(other)
            if result is NotImplemented:
                return result
            return not result

However, when I try to compare two instances of Foo, I get an exception related to the ambiguity of comparing two DataFrames (the comparison should work fine without the 'df' key in Foo.__dict__).

    d1 = {'A': pd.Series([1, 2], index=['a', 'b']),
          'B': pd.Series([1, 2], index=['a', 'b'])}
    d2 = d1.copy()

    foo1 = Foo(d1)
    foo2 = Foo(d2)

    foo1.bar          # dict
    foo1.df           # pandas DataFrame
    foo1 == foo2      # ValueError

    [Out] ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Fortunately, pandas has utility functions for asserting whether two DataFrames or Series are equal. I'd like to use this function's comparison operation if possible.

    pdt.assert_frame_equal(pd.DataFrame(d1), pd.DataFrame(d2))   # does not raise

There are a few options to resolve the comparison of two Foo instances:
- compare a copy of __dict__, where new_dict lacks the df key
- delete the df key from __dict__ (not ideal)
- don't compare __dict__, but only parts of it contained in a tuple
- overload __eq__ to facilitate pandas DataFrame comparisons
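For reference, the first and third options might look roughly like the sketch below. The class name FooSkipDf, the _skip tuple and the _comparable helper are made up for illustration, and the sketch assumes bar holds only plain Python objects (a dict of dicts, as in the comment in my class).

```python
import pandas as pd


class FooSkipDf:
    """Hypothetical variant of Foo whose __eq__ ignores the DataFrame attribute."""

    # Assumption: attributes named here are derived from the others,
    # so leaving them out of the comparison loses no information.
    _skip = ("df",)

    def __init__(self, bar):
        self.bar = bar
        self.df = pd.DataFrame(bar)

    def _comparable(self):
        # Build a filtered copy of __dict__; __dict__ itself is left untouched.
        return {k: v for k, v in self.__dict__.items() if k not in self._skip}

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self._comparable() == other._comparable()
        return NotImplemented
```

This avoids the DataFrame comparison entirely, but it does not help if bar itself contains distinct Series objects, since comparing those with == inside a dict comparison raises the same kind of ValueError.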
The last option seems the most robust in the long run, but I am not sure of the best approach. In the end, I would like to refactor __eq__ to compare all items from Foo.__dict__, including DataFrames (and Series). Any ideas on how to accomplish this?
__eq__ function? You can super the original for other cases.