Closed
Labels
Performance (memory or execution speed performance)
Milestone
Description
I came across some strange slowness in the GroupBy transform() function. To avoid using apply(), which can be REALLY slow, I put together a simple helper function:
    import pandas as pd
    from pandas.core.groupby import DataFrameGroupBy


    def apply_by_group(grouped, f):
        """
        Applies a function to each DataFrame in a DataFrameGroupBy object,
        concatenates the results and returns the resulting DataFrame.

        Parameters
        ----------
        grouped: DataFrameGroupBy
            The grouped DataFrame that contains column(s) to be ranked and,
            potentially, a column with weights.
        f: callable
            Function to apply to each DataFrame.

        Returns
        -------
        DataFrame that results from applying the function to each DataFrame
        in the DataFrameGroupBy object and concatenating the results.
        """
        assert isinstance(grouped, DataFrameGroupBy)
        assert callable(f)

        # Call f once per group on the whole sub-frame, then stitch the
        # pieces back together in group order.
        data_frames = []
        for key, data_frame in grouped:
            data_frames.append(f(data_frame))
        return pd.concat(data_frames)

Now I observe the following timings for two equivalent ways of doing the same thing:
    %timeit data.groupby(level=field_security_id).transform(lambda x: x.fillna())
    1 loops, best of 3: 24.3 s per loop

    %timeit apply_by_group(data.groupby(level=field_security_id), lambda x: x.fillna())
    1 loops, best of 3: 2.72 s per loop
That was unexpected. Am I doing something wrong in using transform()?
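For anyone who wants to try this, here is a minimal sketch of the comparison on synthetic data. Everything in it is made up for illustration and is not my real data: the frame, the level name `security_id`, the sizes, and the fill value `0.0` passed to `fillna()`. It checks that the two approaches give the same result and then times each one; I would not expect the numbers to match the ones above, since they depend on the data shape and the pandas version.

```python
import time

import numpy as np
import pandas as pd


def apply_by_group(grouped, f):
    # Same idea as the helper above: call f once per group, then concatenate.
    return pd.concat([f(frame) for _, frame in grouped])


# Hypothetical data: 50 groups of 20 rows each, with roughly 20% NaNs.
rng = np.random.default_rng(0)
n_groups, n_rows = 50, 20
index = pd.MultiIndex.from_product(
    [range(n_groups), range(n_rows)], names=["security_id", "day"]
)
values = rng.standard_normal((n_groups * n_rows, 3))
values[rng.random(values.shape) < 0.2] = np.nan
data = pd.DataFrame(values, index=index, columns=["a", "b", "c"])


def fill(x):
    # Assumed fill value; the original call did not show its arguments.
    return x.fillna(0.0)


start = time.perf_counter()
via_transform = data.groupby(level="security_id").transform(fill)
t_transform = time.perf_counter() - start

start = time.perf_counter()
via_loop = apply_by_group(data.groupby(level="security_id"), fill)
t_loop = time.perf_counter() - start

# Both routes should produce the same frame.
pd.testing.assert_frame_equal(via_transform, via_loop)
print(f"transform: {t_transform:.4f} s, apply_by_group: {t_loop:.4f} s")
```

The index here is already sorted by `security_id`, so concatenating the groups in key order reproduces the original row order. On an unsorted index the loop version would return rows in group order, and the two results would need to be realigned before comparing.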