Description
This would allow pandas objects to be collected in the first generation of the gc, rather than waiting for the gc to break the reference cycles that the indexers create. In practice, I am not sure this will make much of a difference for users.
dask/distributed#956 (comment)
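A minimal sketch of the underlying mechanism, assuming CPython's reference counting. `Frame`, `StrongIndexer` and `WeakIndexer` are hypothetical stand-ins, not pandas internals: a parent that caches an indexer holding a strong back-reference forms a cycle that survives `del` until the cyclic gc runs, while a weak back-reference lets the parent be freed immediately.

```python
import gc
import weakref


class StrongIndexer:
    """Hypothetical indexer holding a strong reference to its parent."""

    def __init__(self, obj):
        self.obj = obj  # strong back-reference: parent <-> indexer cycle


class WeakIndexer:
    """Hypothetical indexer holding a weak reference to its parent."""

    def __init__(self, obj):
        self.obj = weakref.ref(obj)  # weak back-reference: no cycle


class Frame:
    """Hypothetical parent object that caches its indexer."""

    def __init__(self, indexer_cls):
        self._indexer = indexer_cls(self)


def freed_without_gc(indexer_cls):
    """Return True if a Frame is freed by reference counting alone."""
    gc.disable()  # keep the cyclic collector from running during the check
    try:
        frame = Frame(indexer_cls)
        probe = weakref.ref(frame)  # observe liveness without keeping it alive
        del frame
        return probe() is None
    finally:
        gc.enable()
        gc.collect()  # clean up any cycle left behind by the strong variant


print(freed_without_gc(StrongIndexer))  # False: the cycle keeps the Frame alive
print(freed_without_gc(WeakIndexer))    # True: the Frame is freed on del
```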
The idea is to change this code here from

```python
class _NDFrameIndexer(object):
    _valid_types = None
    _exception = KeyError
    axis = None

    def __init__(self, obj, name):
        self.obj = obj
        self.ndim = obj.ndim
        self.name = name
```

to

```python
import weakref


class _NDFrameIndexer(object):
    _valid_types = None
    _exception = KeyError
    axis = None

    def __init__(self, obj, name):
        # weak reference: callers must dereference it with self.obj()
        self.obj = weakref.ref(obj)
        self.ndim = obj.ndim
        self.name = name
```

and change the corresponding uses of `self.obj` to `self.obj()`.
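One possible variation, sketched here as an assumption rather than what pandas does: instead of rewriting every `self.obj` call site as `self.obj()`, keep the weakref in a private attribute and expose a property that dereferences it, raising `ReferenceError` if the parent has already been collected. `WeakRefIndexer` and `Parent` are hypothetical names. The second half shows the hazard this introduces in CPython: an indexer built on a temporary object loses its parent as soon as the call returns.

```python
import weakref


class WeakRefIndexer:
    """Hypothetical indexer that stores only a weak reference to its parent."""

    def __init__(self, obj, name):
        self._obj_ref = weakref.ref(obj)
        self.ndim = obj.ndim
        self.name = name

    @property
    def obj(self):
        # Dereference the weakref; None means the parent was already collected.
        obj = self._obj_ref()
        if obj is None:
            raise ReferenceError(f"the object behind {self.name!r} no longer exists")
        return obj


class Parent:
    """Hypothetical stand-in for a pandas object (only needs ndim here)."""

    ndim = 2


frame = Parent()
indexer = WeakRefIndexer(frame, "loc")
assert indexer.obj is frame  # parent still referenced: dereference succeeds

# The temporary Parent() is freed once the constructor call returns (CPython).
dangling = WeakRefIndexer(Parent(), "loc")
try:
    dangling.obj
except ReferenceError as exc:
    print(exc)
```

The property keeps existing call sites unchanged, but any code path that expects the indexer to keep its parent alive now fails loudly instead of silently working.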
It 'works' in that the object is collected immediately upon deletion (i.e. `del df`), but a few tests fail on caching / chaining. In particular, tests like https://github.com/pandas-dev/pandas/blob/master/pandas/tests/indexing/test_chaining_and_caching.py#L31 were, I think, relying on the reference NOT being collected (so that they can check it).
So this would require some internal reworking to remove or fix that reliance. I suspect we would still achieve the same user-facing behavior (e.g. detection of chained assignment).