Reclassing a Python DataFrame by setting __class__

Question

I'm trying to create a subclass of DataFrame that extends it with a few properties and methods. In addition to the default constructor there are a few others, like the one below, that initialize the DataFrame from an SQL table and then add a few attributes (I simplified it and left a dummy just to demonstrate the problem). Once I get the initial df, I "convert" it to my class with the statement df.__class__ = Cls. It seems somewhat weird to me, but according to a few posts on this issue (e.g. Reclassing an instance in Python) it's a valid way to go, and it seems to work most of the time.

The problem appears when I use a method of the parent class (here DataFrame.append) that returns a new instance: after sdf2 = sdf1.append(item), the class of sdf2 is DataFrame and not SubDataFrame, so print('sdf2: ', sdf2.name) fails because 'DataFrame' has no attribute 'name'. Bottom line: by naively using a standard DataFrame method, I lost my subclass and its attributes.

I can solve it by overriding append in my subclass, but then I would need to do the same for many methods, and if I cannot use the inherited methods there is no point in subclassing at all (I could just hold the DataFrame as a member variable of my class). I guess there is a best practice for this kind of subclassing; I just don't know it. Any help is much appreciated. Thanks!

Adi

import pandas as pd
import pandas.io.sql as pdsql

class SubDataFrame(pd.DataFrame):
    @classmethod
    def create(Cls):
        # df = pdsql.read_sql(db_query, db_connection)
        d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
        df = pd.DataFrame(d, index=['a', 'b', 'c', 'd'])
        df.__class__ = Cls
        df.name = 'Test Obj'
        return df

if __name__ == "__main__":
    sdf1 = SubDataFrame.create()
    print('sdf1: ', sdf1.__class__)  # prints "sdf1: <class '__main__.SubDataFrame'>"
    print('sdf1: ', sdf1.name)       # prints "sdf1: Test Obj"
    item = sdf1.iloc[0].copy()
    sdf2 = sdf1.append(item)
    print('sdf2: ', sdf2.__class__)  # prints "sdf2: <class 'pandas.core.frame.DataFrame'>"
    print('sdf2: ', sdf2.name)       # exception: "AttributeError: 'DataFrame' object has no attribute 'name'"
    pass

I tried using super() as suggested by @BrenBarn. I read the reference (regarding unbound superclass classmethods) but still can't make it work. These are my tests:

import pandas as pd
import pandas.io.sql as pdsql

class SubDataFrame(pd.DataFrame):
    @classmethod
    def create_reset_class(Cls):
        d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
        df = pd.DataFrame(d, index=['a', 'b', 'c', 'd'])
        df.__class__ = Cls
        df.name = 'Test Obj'
        return df

    @classmethod
    def create_using_super(Cls):
        d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
        df = super(SubDataFrame, Cls).__init__(d, index=['a', 'b', 'c', 'd'])
        df.name = 'Test Obj'
        return df

    def __init__(self):
        d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
        df = super(SubDataFrame, self).__init__(d, index=['a', 'b', 'c', 'd'])
        df.name = 'Test Obj'
        return df

if __name__ == "__main__":
    sdf3 = SubDataFrame.create_using_super()
    sdf4 = SubDataFrame()
    sdf1 = SubDataFrame.create_reset_class()
    print('sdf1: ', sdf1.__class__)
    print('sdf1: ', sdf1.name)
    item = sdf1.iloc[0].copy()
    sdf2 = sdf1.append(item)
    print('sdf2: ', sdf2.__class__)
    print('sdf2: ', sdf2.name)
    pass

Note that SubDataFrame has a default __init__ constructor, while create() is my (non-default) constructor, which is a classmethod. Inside it I call pandas.DataFrame(), the standard bound constructor, which expects self and not Cls. So I tried two options:

a. df = super(SubDataFrame, Cls).__init__(d, index=['a', 'b', 'c', 'd']) raises an AttributeError in File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 208: 'dict' object has no attribute '_init_dict'

b. Using the standard bound constructor __init__ doesn't raise any error, but df comes back as None (from df = super(SubDataFrame, self).__init__(d, index=['a', 'b', 'c', 'd']))

Am I using super() incorrectly? Is it a pandas bug? Any other ideas? Thanks!
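A note on the two super() attempts: the behaviour in options a and b can be reproduced without pandas. The sketch below uses hypothetical stand-in classes (Base, Sub and the helper _store are illustrative names, not pandas API) and assumes DataFrame.__init__ behaves like an ordinary Python initializer with optional arguments. It suggests that inside a classmethod, super(Sub, cls).__init__ is the plain, unbound function, so the dict lands in the self slot (option a), and that __init__ returns None by design (option b). Calling cls(...) is the usual way to build the instance inside an alternate constructor.

```python
class Base:
    """Hypothetical stand-in for pandas.DataFrame, for illustration only."""

    def __init__(self, data=None, index=None):
        self._store(data, index)

    def _store(self, data, index):
        self.data = data
        self.index = index


class Sub(Base):
    @classmethod
    def create_using_super(cls):
        # In a classmethod, super(Sub, cls).__init__ is the plain function
        # Base.__init__, so the dict below is passed as `self`.
        return super(Sub, cls).__init__({"one": [1.0, 2.0]}, index=["a", "b"])

    @classmethod
    def create(cls):
        # cls(...) allocates a Sub instance and runs __init__ on it.
        obj = cls({"one": [1.0, 2.0]}, index=["a", "b"])
        obj.name = "Test Obj"
        return obj


error = None
try:
    Sub.create_using_super()
except AttributeError as exc:
    error = str(exc)
    print("option a fails:", error)

obj = Sub.create()
# Calling __init__ directly re-initializes obj; its return value is None.
init_result = Base.__init__(obj, obj.data, obj.index)
print(type(obj).__name__, obj.name, init_result)
```

Under these assumptions neither symptom points to a pandas bug: an instance-level __init__ should initialize self through super().__init__(...) and return nothing, and a classmethod constructor should call cls(...).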

Why are you setting __class__ that way at all? You should be able to just use super to do whatever initialization you need to do.
– BrenBarn, Commented Jan 29, 2015 at 21:14
@BrenBarn -- IIRC, pandas dataframes are subclasses of numpy.ndarray and the latter is known to be hard to subclass properly. I'd imagine those difficulties would carry over into dataframes as well. I think that a (potentially) easier route here is to create a new type by composition rather than traditional inheritance.
– mgilson, Commented Jan 29, 2015 at 21:26
@mgilson: I believe DataFrame is no longer a subclass of ndarray in recent pandas versions. There were some changes a few versions ago that made subclassing DataFrame smoother (see this issue).
– BrenBarn, Commented Jan 30, 2015 at 3:17
@BrenBarn: if I understand correctly super() requires 'self' and is relevant for instance methods (e.g. __init__) but not for class methods like this one, isn't it? In any case I tried various options for using super:
– Adi E, Commented Jan 31, 2015 at 9:55
@BrenBarn: Sorry, my previous comment was truncated. If I understand correctly, super() requires 'self' and is relevant for instance methods (e.g. __init__) but not for class methods like this one, isn't it? In any case I tried various options for using super: df = super(SubDataFrame).__init__(d, index=['a', 'b', 'c', 'd']) or df = super().__init__(d, index=['a', 'b', 'c', 'd']) and a few others, and all ended with exceptions (I'm using Python 3.4 and Pandas 0.15.2). I'm probably missing something; can you suggest the correct way of doing it with super()? Thanks!
– Adi E, Commented Jan 31, 2015 at 10:04
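
For the original problem (inherited methods returning a plain DataFrame), the pandas documentation on subclassing describes overriding the _constructor property and listing custom attributes in _metadata. The following is a minimal sketch under that assumption, reusing the question's SubDataFrame and create names. It has not been checked against pandas 0.15.2, and which operations carry _metadata attributes over to their results varies between methods and versions.

```python
import pandas as pd


class SubDataFrame(pd.DataFrame):
    # Custom attributes that pandas should treat as metadata, not columns.
    _metadata = ["name"]

    @property
    def _constructor(self):
        # pandas uses this to build the result of methods that return a new
        # frame, so those results keep the subclass type.
        return SubDataFrame

    @classmethod
    def create(cls):
        # Dummy data from the question; a real version might read from SQL.
        d = {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]}
        sdf = cls(d, index=["a", "b", "c", "d"])  # no __class__ reassignment
        sdf.name = "Test Obj"
        return sdf


sdf1 = SubDataFrame.create()
sdf2 = sdf1.head(2)  # an inherited method that returns a new object
print(type(sdf1).__name__, type(sdf2).__name__)
```

DataFrame.append, used in the question, was removed in pandas 2.0, which is why the sketch calls head() instead. If subclassing still proves fragile, the composition approach suggested by mgilson (holding the DataFrame as a member) remains an alternative.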
