
BUG: DataFrameGroupBy.__getitem__ fails to propagate dropna=True #35612

@arw2019

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.

xref #35014

Creating a separate issue because the dropna=True case requires a different fix from the dropna=False case (resolved by #35078).

Problem description

The setup is:

In [1]: import pandas as pd
In [2]: df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})
In [3]: gb = df.groupby("A", dropna=True)

All three of these commands:

In [4]: gb['B'].transform(len)
In [5]: gb[['B']].transform(len)
In [6]: gb.transform(len)

raise a variant of this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-3bae7d67a46f> in <module>
----> 1 gb['B'].transform(len)

/workspaces/pandas-arw2019/pandas/core/groupby/generic.py in transform(self, func, engine, engine_kwargs, *args, **kwargs)
    487
    488         if not isinstance(func, str):
--> 489             return self._transform_general(
    490                 func, *args, engine=engine, engine_kwargs=engine_kwargs, **kwargs
    491             )

/workspaces/pandas-arw2019/pandas/core/groupby/generic.py in _transform_general(self, func, engine, engine_kwargs, *args, **kwargs)
    556
    557         result.name = self._selected_obj.name
--> 558         result.index = self._selected_obj.index
    559         return result
    560

/workspaces/pandas-arw2019/pandas/core/generic.py in __setattr__(self, name, value)
   5167         try:
   5168             object.__getattribute__(self, name)
-> 5169             return object.__setattr__(self, name, value)
   5170         except AttributeError:
   5171             pass

/workspaces/pandas-arw2019/pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()
     64
     65     def __set__(self, obj, value):
---> 66         obj._set_axis(self.axis, value)

/workspaces/pandas-arw2019/pandas/core/series.py in _set_axis(self, axis, labels, fastpath)
    422         if not fastpath:
    423             # The ensure_index call above ensures we have an Index object
--> 424             self._mgr.set_axis(axis, labels)
    425
    426     # ndarray compatibility

/workspaces/pandas-arw2019/pandas/core/internals/managers.py in set_axis(self, axis, new_labels)
    214
    215         if new_len != old_len:
--> 216             raise ValueError(
    217                 f"Length mismatch: Expected axis has {old_len} elements, new "
    218                 f"values have {new_len} elements"

ValueError: Length mismatch: Expected axis has 3 elements, new values have 4 elements
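The 3 and 4 in the final message can be checked against the example data. The sketch below is only an illustration and does not touch the pandas internals shown in the traceback. It assumes the mismatch comes from the row with a missing key being excluded from every group while the original index still contains it. The variable names `rows_in_groups` and `rows_in_frame` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})

# With dropna=True the row whose key in "A" is missing belongs to no group.
gb = df.groupby("A", dropna=True)

# Total number of rows covered by the groups (group sizes 2 and 1).
rows_in_groups = int(gb.size().sum())

# Number of rows in the original frame, including the row with the missing key.
rows_in_frame = len(df.index)

print(rows_in_groups, rows_in_frame)
```

If that assumption holds, the transformed result is built from the grouped rows only, and assigning the full original index to it is what raises the length mismatch.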

Expected Output

All three should return:

Out[9]:
   B
0  2
1  2
2  1
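Until the bug is fixed, the expected values can be reproduced by dropping the rows with a missing key before grouping. This is a workaround sketch, not the fix proposed for pandas itself; the names `kept` and `expected` are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})

# Remove rows whose grouping key is missing, so every remaining row
# belongs to a group and the result length matches the index length.
kept = df.dropna(subset=["A"])

# Group on the cleaned frame and broadcast each group's length to its rows.
expected = kept.groupby("A")["B"].transform(len).to_frame()

print(expected)
```

The result keeps the index labels 0, 1 and 2 of the surviving rows. The dtype of column B may differ between pandas versions, so the sketch does not rely on it.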

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 9843926
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-42-generic
Version : #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.0.dev0+54.g9843926e3
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.1.0.post20200704
Cython : 0.29.21
pytest : 5.4.3
hypothesis : 5.19.0
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : 0.4.0
gcsfs : 0.6.2
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1

Labels: Bug, Groupby, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
