
BUG: DataFrameGroupBy.__getitem__ fails to propagate dropna #35014

@TomAugspurger


Code Sample, a copy-pastable example

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})

In [3]: gb = df.groupby("A", dropna=False)

In [6]: gb['B'].transform(len)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-3bae7d67a46f> in <module>
----> 1 gb['B'].transform(len)

~/sandbox/pandas/pandas/core/groupby/generic.py in transform(self, func, engine, engine_kwargs, *args, **kwargs)
    471         if not isinstance(func, str):
    472             return self._transform_general(
--> 473                 func, *args, engine=engine, engine_kwargs=engine_kwargs, **kwargs
    474             )
    475

~/sandbox/pandas/pandas/core/groupby/generic.py in _transform_general(self, func, engine, engine_kwargs, *args, **kwargs)
    537
    538         result.name = self._selected_obj.name
--> 539         result.index = self._selected_obj.index
    540         return result
    541

~/sandbox/pandas/pandas/core/generic.py in __setattr__(self, name, value)
   5141         try:
   5142             object.__getattribute__(self, name)
-> 5143             return object.__setattr__(self, name, value)
   5144         except AttributeError:
   5145             pass

~/sandbox/pandas/pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()
     64
     65     def __set__(self, obj, value):
---> 66         obj._set_axis(self.axis, value)

~/sandbox/pandas/pandas/core/series.py in _set_axis(self, axis, labels, fastpath)
    422         if not fastpath:
    423             # The ensure_index call above ensures we have an Index object
--> 424             self._mgr.set_axis(axis, labels)
    425
    426     # ndarray compatibility

~/sandbox/pandas/pandas/core/internals/managers.py in set_axis(self, axis, new_labels)
    213         if new_len != old_len:
    214             raise ValueError(
--> 215                 f"Length mismatch: Expected axis has {old_len} elements, new "
    216                 f"values have {new_len} elements"
    217             )

ValueError: Length mismatch: Expected axis has 3 elements, new values have 4 elements

Problem description

Compare that with the following, which both work:

In [4]: gb.transform(len)
Out[4]:
   B
0  2
1  2
2  1
3  1

In [5]: gb[['B']].transform(len)
Out[5]:
   B
0  2
1  2
2  1
3  1

So the failure occurs only when slicing down to a SeriesGroupBy object with gb['B'].

Expected Output

A series:

Out[5]:
0    2
1    2
2    1
3    1
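
For cross-checking, the expected values can be derived without going through gb['B'] at all. The following is a minimal sketch, not how pandas computes them internally: `group_sizes_keeping_nan` is a hypothetical helper introduced here for illustration, and it relies on `pd.factorize` assigning the code -1 to missing keys. The `try`/`except` covers both cases: affected versions raise the ValueError from the traceback above, while versions with a fix should return a Series equal to `expected`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})


def group_sizes_keeping_nan(keys: pd.Series) -> pd.Series:
    """Hypothetical helper (not part of pandas): per-row group size,
    with missing keys counted as their own group."""
    codes, _ = pd.factorize(keys)   # missing keys get the code -1
    shifted = codes + 1             # move the missing-key group to slot 0
    sizes = np.bincount(shifted)    # number of rows in each group
    return pd.Series(sizes[shifted], index=keys.index)


expected = group_sizes_keeping_nan(df["A"])

gb = df.groupby("A", dropna=False)
try:
    actual = gb["B"].transform(len)
except ValueError:
    # Affected versions fail here with the length-mismatch error.
    actual = None

print(expected.tolist())
print(None if actual is None else actual.tolist())
```

If `actual` is not None, comparing `actual.tolist()` against `expected.tolist()` makes a simple regression check for this report.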


Labels: Bug, Groupby, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
