How to use pandas groupby with None and NaN treated as separate values

Question

Is it possible for pandas groupby to treat Nones and NaNs as separate entities?

Here is an example:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame([[np.nan, 5], [None, 10], ['a', 7], [np.nan, 5], [None, 10]])
    Out:
          0   1
    0   NaN   5
    1  None  10
    2     a   7
    3   NaN   5
    4  None  10

    df.groupby(0, dropna=False).mean()
    Out:
           1
    0
    a    7.0
    NaN  7.5

However, I want to achieve the following result:

             1
    0
    a      7.0
    NaN    5.0
    None  10.0

EDIT: an 'ideal' solution to this problem should:

be generalisable to grouping by multiple columns
not (potentially) conflate other items. E.g. converting everything to strings would mean that None and 'None' become conflated (or '7' and 7, or...)
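
To make the conflation concern concrete, here is a minimal sketch. It assumes that string conversion of an object column behaves like applying Python's str() to each element; the sample keys are made up for illustration and are not from the original data.

```python
import numpy as np
import pandas as pd

# Object dtype so that None, NaN, numbers and strings are stored unchanged.
keys = pd.Series([None, "None", 7, "7", np.nan], dtype=object)

# Apply Python's str() to every element, missing values included.
as_str = [str(k) for k in keys]
print(as_str)

# Five original keys, but fewer distinct labels after conversion:
# None collides with 'None', and 7 collides with '7'.
print(len(keys), "keys ->", len(set(as_str)), "distinct string labels")
```

Any grouping built on these string labels would put None and 'None' (and 7 and '7') into the same group.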

Alternatively, an explanation of why the task cannot be done 'tidily' would also be appreciated, so one can instead think about 'hacky' solutions.

jezrael · Accepted Answer · 2022-01-26 10:38:00Z

Not very nice, but it is possible by converting the grouping column to strings:

    print(df.groupby(df[0].astype(str)).mean())
           1
    0
    None  10
    a      7
    nan    5
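
If the string conversion is too lossy, one possible workaround is to group on an extra helper level that records what kind of missing value each row holds. This is only a sketch, not a built-in pandas feature: the helper missing_kind, the column names key, val and kind, and the "<missing>" placeholder are all assumptions made up for illustration. It also assumes the key column has object dtype, so that None and NaN are still distinct Python objects when the helper inspects them.

```python
import numpy as np
import pandas as pd


def missing_kind(value):
    """Label a cell as 'None', 'NaN', or 'value' (hypothetical helper)."""
    if value is None:
        return "None"
    if isinstance(value, float) and np.isnan(value):
        return "NaN"
    return "value"


# dtype=object keeps None and NaN as distinct objects in the column.
df = pd.DataFrame({
    "key": pd.Series([np.nan, None, "a", np.nan, None], dtype=object),
    "val": [5, 10, 7, 5, 10],
})

# Helper level: which kind of entry each row's key is.
kind = df["key"].map(missing_kind).rename("kind")

# Replace both kinds of missing key with one placeholder. The 'kind' level
# keeps them apart, and also separates them from a real "<missing>" string.
filled = df["key"].where(df["key"].notna(), "<missing>")

# sort=False avoids sorting keys of mixed types.
result = df["val"].groupby([filled, kind], sort=False).mean()
print(result)
```

Because the grouping never relies on string conversion of the real values, 7 and '7' stay separate. For several key columns, the same two helper series could be built per column and all of them passed to groupby.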

edited Jan 26, 2022 at 10:38

answered Jan 26, 2022 at 10:32

jezrael

1 Comment

Lovkush Over a year ago

Thanks! As you already said, this is not the nicest solution, but the simple fact you are having to resort to such solutions is already useful information for me.
