@rhshadrach
Member

The function nunique_ints here is faster than using pd.unique to count distinct values, but it only works on integers:

arr = np.random.randint(0, 300, size)
%timeit len(unique(arr))
%timeit nunique_ints(arr)

size: 10
8.94 µs ± 133 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.94 µs ± 12.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

size: 100
9.82 µs ± 22.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.01 µs ± 5.16 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

size: 1000
11 µs ± 76.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.74 µs ± 9.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

size: 10000
27.9 µs ± 66.8 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
10.4 µs ± 9.43 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

size: 100000
187 µs ± 214 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
78.8 µs ± 408 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
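
For context, one common way to count distinct integers without a hash table is to tally occurrences with np.bincount and count the non-empty bins. The sketch below assumes that approach and assumes non-negative integer input. The name count_unique_ints is hypothetical and is not the nunique_ints timed above, whose implementation is not shown in this thread.

```python
import numpy as np


def count_unique_ints(values: np.ndarray) -> int:
    """Hypothetical helper: count distinct values in a non-negative integer array.

    No check is made that the input is integral; negative values make
    np.bincount raise ValueError.
    """
    if len(values) == 0:
        return 0
    # np.bincount needs a 1-D array of platform-sized integers.
    counts = np.bincount(values.ravel().astype("intp"))
    # Each non-zero bin corresponds to one value that occurred at least once.
    return int((counts != 0).sum())


# Cross-check against a sort-based unique on data shaped like the benchmark's.
rng = np.random.default_rng(0)
arr = rng.integers(0, 300, size=10_000)
assert count_unique_ints(arr) == len(np.unique(arr))
```

A bincount-based count allocates one bin per integer up to the maximum value, so it fits best when the values are small and dense, as with the 0 to 299 range used in the timings.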
@rhshadrach added the Groupby, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), and Categorical (Categorical Data Type) labels on Nov 11, 2022
Member

@mroeschke left a comment

There is a merge conflict; otherwise LGTM

…pby_dropna_filtering Conflicts: pandas/core/groupby/grouper.py
@mroeschke added this to the 2.0 milestone on Dec 2, 2022
@mroeschke merged commit 60ed993 into pandas-dev:main on Dec 2, 2022
@mroeschke
Member

Thanks @rhshadrach

@rhshadrach deleted the groupby_dropna_filtering branch on December 2, 2022 at 20:09