Conversation

@rhshadrach
Member

@rhshadrach rhshadrach commented Nov 19, 2025

I'm seeing a minor slowdown from no longer assuming the mask is C-contiguous. An alternative would be to ensure the mask is always C-contiguous. I'll post ASVs in a day or so.

import numpy as np
import pandas as pd

size = 10_000_000
df = pd.DataFrame(
    {
        'a': np.random.randint(0, 100, size),
        'b': np.random.randint(0, 100, size),
        'c': np.random.randint(0, 100, size),
        'd': np.random.randint(0, 100, size),
    },
    dtype='Int64',
)
gb = df.groupby('a')
gb.cummax()  # populate caches
%timeit gb.cummax()
# 155 ms ± 52.4 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)  <--- main
# 160 ms ± 1.89 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  <--- PR
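For readers unfamiliar with the contiguity issue, here is a minimal NumPy-only sketch of how a boolean mask can end up not C-contiguous and what enforcing C-contiguity up front might look like. The helper name `ensure_c_contiguous_mask` is hypothetical and is not pandas' actual implementation; the array shape is arbitrary.

```python
import numpy as np


def ensure_c_contiguous_mask(mask: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return a C-contiguous version of ``mask``.

    np.ascontiguousarray copies only when the input is not already
    C-contiguous, so the common case stays cheap.
    """
    return np.ascontiguousarray(mask)


# A freshly allocated 2D boolean mask is C-contiguous (row-major).
mask = np.zeros((4, 3), dtype=bool)
assert mask.flags.c_contiguous

# A transpose is a view with swapped strides: F-contiguous, not C-contiguous.
mask_t = mask.T
assert mask_t.flags.f_contiguous
assert not mask_t.flags.c_contiguous

# Enforcing C-contiguity copies the transposed view into row-major memory ...
fixed = ensure_c_contiguous_mask(mask_t)
assert fixed.flags.c_contiguous
assert not np.shares_memory(fixed, mask)

# ... but leaves an already C-contiguous mask sharing the same memory.
same = ensure_c_contiguous_mask(mask)
assert np.shares_memory(same, mask)
```

The trade-off sketched here matches the one raised above: code that cannot assume a C-contiguous layout must handle arbitrary strides on every call, while normalizing the mask once pays for a copy only when the layout is actually wrong.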

ASVs are clean.

asv compare -f 1.05 --only-changed 49f4a94a 19df340e

| Change | Before [49f4a94a] <main> | After [19df340e] <bug_gb_c_contiguous> | Ratio | Benchmark (Parameter)                                             |
|--------|--------------------------|----------------------------------------|-------|-------------------------------------------------------------------|
| -      | 24.8±0.1ms               | 22.4±0.2ms                             | 0.91  | groupby.GroupByCythonAggEaDtypes.time_frame_agg('Float64', 'min') |

@rhshadrach rhshadrach marked this pull request as ready for review November 20, 2025 12:10
@jbrockmendel jbrockmendel merged commit 7bf6660 into pandas-dev:main Nov 20, 2025
39 of 41 checks passed
@jbrockmendel
Member

thanks @rhshadrach

@rhshadrach rhshadrach deleted the bug_gb_c_contiguous branch November 20, 2025 20:00

2 participants