BUG: groupby with dropna=False drops nulls from categorical groupers #49652

rhshadrach · 2022-11-11T18:14:37Z

Closes #36327 (BUG: Get wrong result when groupby category column with dropna=False)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
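For context, the behavior named in the title can be illustrated with a small frame. This is a minimal sketch, not taken from the PR's tests: the column names and values are made up for illustration, and it assumes a pandas version that includes this change (the PR is on the 2.0 milestone).

```python
import pandas as pd

# Hypothetical data: a categorical grouper that contains one null.
df = pd.DataFrame(
    {
        "key": pd.Categorical(["a", None, "b", "a"]),
        "val": [1, 2, 3, 4],
    }
)

# With dropna=False the null key should form its own group
# instead of being silently dropped from the result.
result = df.groupby("key", dropna=False, observed=True)["val"].sum()

# The null group is present and holds the value from the null row.
null_group = result[result.index.isna()]
print(result)
print(null_group)
```

On a version without the fix, the null row would be expected to be missing from `result`, which is the bug described in the linked issues.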

The function nunique_ints added here is faster than taking the length of pd.unique, but only works on integers:

arr = np.random.randint(0, 300, size)
%timeit len(unique(arr))
%timeit nunique_ints(arr)

size: 10
8.94 µs ± 133 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.94 µs ± 12.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

size: 100
9.82 µs ± 22.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.01 µs ± 5.16 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

size: 1000
11 µs ± 76.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.74 µs ± 9.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

size: 10000
27.9 µs ± 66.8 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
10.4 µs ± 9.43 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

size: 100000
187 µs ± 214 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
78.8 µs ± 408 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
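One way an integer-only unique count can beat a general-purpose unique is by counting occurrences with np.bincount, as the "Add bincount comment" and "Use intp" commits on this PR suggest. The following is a sketch under that assumption, not the code merged in the PR; the name nunique_ints_sketch is hypothetical, and it only handles non-negative integers.

```python
import numpy as np


def nunique_ints_sketch(values: np.ndarray) -> int:
    """Count distinct values in an array of non-negative integers.

    Hypothetical illustration; not the function added by the PR.
    """
    if len(values) == 0:
        return 0
    # np.bincount requires non-negative integers; cast to intp so the
    # values can be used as positions in the counts array.
    counts = np.bincount(values.ravel().astype("intp"))
    # Every position with a nonzero count is a value that occurred.
    return int((counts != 0).sum())


# Usage: the result should agree with a general-purpose unique.
arr = np.random.randint(0, 300, 1000)
n_fast = nunique_ints_sketch(arr)
n_ref = len(np.unique(arr))
```

The trade-off is that the counts array has length max(values) + 1, so this approach only makes sense when the integers are small and dense, as group codes typically are.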

mroeschke

Merge conflict otherwise LGTM

mroeschke · 2022-12-02T18:49:50Z

Thanks @rhshadrach

BUG: groupby(..., dropna=False) drops null values with categorical grouper

6ae5ef9

rhshadrach added the Groupby, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), and Categorical (Categorical Data Type) labels Nov 11, 2022

rhshadrach and others added 11 commits November 12, 2022 07:56

Use intp

20e17ab

Merge branch 'main' of https://github.com/pandas-dev/pandas into groupby_dropna_filtering

29ee263

Fixups

8254619

Use intp

f679fd7

Merge branch 'main' into groupby_dropna_filtering

9a3fb30

int64

34760ca

dtype fix

93f306c

Merge branch 'main' of https://github.com/pandas-dev/pandas into groupby_dropna_filtering

d6100b4

Conflicts: doc/source/whatsnew/v2.0.0.rst

Breakup op to debug on CI

f3a3ebb

Trying with intp

4bfeaa1

Merge branch 'groupby_dropna_filtering' of https://github.com/rhshadrach/pandas into groupby_dropna_filtering

af9d90c

Conflicts: pandas/core/algorithms.py

rhshadrach commented Nov 20, 2022

pandas/core/algorithms.py (resolved)

Merge branch 'main' into groupby_dropna_filtering

45f3947

mroeschke reviewed Nov 29, 2022

pandas/core/groupby/grouper.py (outdated, resolved)

rhshadrach added 4 commits November 28, 2022 21:01

Restore cache decorator

4d72402

Merge branch 'main' of https://github.com/pandas-dev/pandas into groupby_dropna_filtering

6f2f51d

Merge branch 'groupby_dropna_filtering' of https://github.com/rhshadrach/pandas into groupby_dropna_filtering

9ddc2d0

Add bincount comment

1e3bff3

mroeschke reviewed Nov 29, 2022

pandas/core/groupby/grouper.py (outdated, resolved)

rhshadrach added 3 commits November 29, 2022 17:57

Merge branch 'main' of https://github.com/pandas-dev/pandas into groupby_dropna_filtering

5a42eeb

Rework recoding logic

c8ba7ad

Merge branch 'groupby_dropna_filtering' of https://github.com/rhshadrach/pandas into groupby_dropna_filtering

a548506

mroeschke reviewed Dec 1, 2022


rhshadrach added 2 commits December 1, 2022 23:16

Merge branch 'main' of https://github.com/pandas-dev/pandas into groupby_dropna_filtering

9bec396

Conflicts: pandas/core/groupby/grouper.py

Merge branch 'groupby_dropna_filtering' of https://github.com/rhshadrach/pandas into groupby_dropna_filtering

867e97a

mroeschke added this to the 2.0 milestone Dec 2, 2022

mroeschke approved these changes Dec 2, 2022


mroeschke merged commit 60ed993 into pandas-dev:main Dec 2, 2022

rhshadrach deleted the groupby_dropna_filtering branch December 2, 2022 20:09

CBMollerup mentioned this pull request Dec 5, 2022

BUG: Groupby drops NA even though dropna=False with categorical dtype #48492

Closed

3 tasks
