There is an issue in events_filter.Chunks.__iter__ (copied below)
faery/python/faery/events_filter.py
Lines 169 to 191 in e701fd0

    def __iter__(self) -> collections.abc.Iterator[numpy.ndarray]:
        events_buffers: list[numpy.ndarray] = []
        current_length = 0
        for events in self.parent:
            events_length = len(events)
            while events_length > 0:
                if current_length + events_length < self.chunk_length:
                    events_buffers.append(events)
                    current_length += events_length
                    break
                pivot = self.chunk_length - current_length
                if len(events_buffers) == 0:
                    yield events[:pivot]
                else:
                    events_buffers.append(events[:pivot])
                    yield numpy.concatenate(
                        events_buffers, dtype=events_stream.EVENTS_DTYPE
                    )
                    events_buffers = []
                events = events[pivot:]
        if len(events_buffers) > 0:
            yield numpy.concatenate(events_buffers, dtype=events_stream.EVENTS_DTYPE)
            events_buffers = []

I was using it for its intended purpose of grabbing data in batches, but it seems to get stuck in an infinite loop after the first chunk and keeps returning empty batches (basically, `yield events[:pivot]` runs forever).
I think the issue is that, while `events` is reassigned (`events = events[pivot:]`), the `events_length` variable that the while loop checks is not updated to match.
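
For illustration, here is a minimal sketch that reproduces the suspected stale-length behavior outside of faery. It assumes plain integer arrays instead of `events_stream.EVENTS_DTYPE`; `chunks_stale_length` and `packets` are hypothetical names, and the generator is capped with `itertools.islice` because it would otherwise never terminate.

```python
import itertools

import numpy


def chunks_stale_length(parent, chunk_length):
    """Simplified copy of the loop quoted above (hypothetical helper).

    `events_length` is computed once per packet and never refreshed.
    """
    events_buffers = []
    current_length = 0
    for events in parent:
        events_length = len(events)
        while events_length > 0:  # stale: `events` shrinks, this value does not
            if current_length + events_length < chunk_length:
                events_buffers.append(events)
                current_length += events_length
                break
            pivot = chunk_length - current_length
            if len(events_buffers) == 0:
                yield events[:pivot]
            else:
                events_buffers.append(events[:pivot])
                yield numpy.concatenate(events_buffers)
                events_buffers = []
            events = events[pivot:]


# One packet of 5 "events", requested in chunks of 2.
packets = [numpy.arange(5)]
# Cap the generator at 6 items; without the cap it yields empty arrays forever.
lengths = [
    len(chunk) for chunk in itertools.islice(chunks_stale_length(packets, 2), 6)
]
print(lengths)  # [2, 2, 1, 0, 0, 0]
```

Once the packet is exhausted, `events[:pivot]` is an empty slice, but the loop condition still sees the original length, so empty chunks keep coming.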
Changing:

    while events_length > 0:

to:

    while (events_length := len(events)) > 0:

appears to give the expected behavior. However, it seems like itertools might be faster? I'm getting a ~10s improvement iterating over 800k events using the snippet below instead of calling `.chunks` (25 vs 36 seconds):

    evts = itertools.batched(
        itertools.chain.from_iterable(faery.events_stream_from_file(path)), 1000
    )
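
As a rough sketch of the fix described above, the standalone generator below re-reads `len(events)` on every pass. It is not faery's implementation: `rechunk` is a hypothetical name, it works on plain integer arrays, and it also resets the buffered count after each yielded chunk, which is an assumption about the intended behavior rather than something stated in this issue.

```python
import numpy


def rechunk(packets, chunk_length):
    """Yield arrays of exactly `chunk_length` items (the last one may be shorter).

    Hypothetical helper; assumes 1-D numpy arrays with a shared dtype.
    """
    buffers = []
    buffered = 0
    for events in packets:
        # Re-evaluate the length on every pass, so the loop ends
        # once the current packet has been fully consumed.
        while len(events) > 0:
            if buffered + len(events) < chunk_length:
                buffers.append(events)
                buffered += len(events)
                break
            pivot = chunk_length - buffered
            buffers.append(events[:pivot])
            yield numpy.concatenate(buffers)
            buffers = []
            buffered = 0  # assumption: start counting afresh for the next chunk
            events = events[pivot:]
    if buffers:
        yield numpy.concatenate(buffers)


packets = [numpy.arange(0, 5), numpy.arange(5, 7), numpy.arange(7, 13)]
chunks = [chunk.tolist() for chunk in rechunk(packets, 4)]
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12]]
```

One difference to keep in mind when comparing timings: `itertools.batched` (Python 3.12+) yields tuples of individual items rather than numpy arrays, so the two approaches do not produce the same kind of batch.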