Description
Similar issues: #10923, #9697, #9941
Please consider the following data:
```python
import numpy
import pandas

df = pandas.DataFrame({'A': numpy.random.rand(20),
                       'B': numpy.random.rand(20) * 10,
                       'C': numpy.random.randint(0, 5, 20)})
df.loc[:4, 'C'] = None
```

The two lines below should do the same thing: replace each row's value with the average of its group. The first uses a string function name and the second a lambda function. The first works; the second doesn't.
```
In [41]: df.groupby('C')['B'].transform('mean')
Out[41]:
0          NaN
1          NaN
2          NaN
3          NaN
4          NaN
5     5.670891
6     5.335332
7     0.580197
8     5.670891
9     5.670891
10    1.628290
11    1.628290
12    5.670891
13    8.493416
14    5.670891
15    8.493416
16    5.335332
17    5.670891
18    5.670891
19    5.335332
Name: B, dtype: float64

In [42]: df.groupby('C')['B'].transform(lambda x:x.mean())
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-42-87c87c7a22f4> in <module>()
----> 1 df.groupby('C')['B'].transform(lambda x:x.mean())

~/.conda/envs/myroot/lib/python3.6/site-packages/pandas/core/groupby.py in transform(self, func, *args, **kwargs)
   3061
   3062         result.name = self._selected_obj.name
-> 3063         result.index = self._selected_obj.index
   3064         return result
   3065

~/.conda/envs/myroot/lib/python3.6/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
   3092         try:
   3093             object.__getattribute__(self, name)
-> 3094             return object.__setattr__(self, name, value)
   3095         except AttributeError:
   3096             pass

pandas/_libs/src/properties.pyx in pandas._libs.lib.AxisProperty.__set__ (pandas/_libs/lib.c:45255)()

~/.conda/envs/myroot/lib/python3.6/site-packages/pandas/core/series.py in _set_axis(self, axis, labels, fastpath)
    306         object.__setattr__(self, '_index', labels)
    307         if not fastpath:
--> 308             self._data.set_axis(axis, labels)
    309
    310     def _set_subtyp(self, is_all_dates):

~/.conda/envs/myroot/lib/python3.6/site-packages/pandas/core/internals.py in set_axis(self, axis, new_labels)
   2834             raise ValueError('Length mismatch: Expected axis has %d elements, '
   2835                              'new values have %d elements' %
-> 2836                              (old_len, new_len))
   2837
   2838         self.axes[axis] = new_labels

ValueError: Length mismatch: Expected axis has 15 elements, new values have 20 elements
```

The first one, using `'mean'`, gives the result I expected. It seems strange to me that the same operation behaves in two different ways.
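For reference, here is a minimal sketch of the result I expected, built without `transform`: compute each group's mean with `groupby(...).mean()` and map it back onto column `C`. The fixed seed, and building `C` as float with `numpy.nan` instead of assigning `None` to an integer column, are my own additions (not part of the original example) so the snippet is reproducible and does not rely on implicit int-to-float upcasting.

```python
import numpy
import pandas

# Assumption: seed and float dtype for C added for reproducibility.
numpy.random.seed(0)
df = pandas.DataFrame({'A': numpy.random.rand(20),
                       'B': numpy.random.rand(20) * 10,
                       'C': numpy.random.randint(0, 5, 20).astype(float)})
df.loc[:4, 'C'] = numpy.nan  # rows 0-4 have no group key

# One mean per group; groupby drops NaN keys by default.
group_means = df.groupby('C')['B'].mean()

# Broadcast each group's mean back to its rows. A NaN key has no entry
# in group_means, so map() leaves those rows as NaN.
expected = df['C'].map(group_means)
```

This is the same shape of output as `transform('mean')` in `Out[41]`: one value per original row, NaN where the key is missing.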
Note: the second form, with the lambda function, used to work in pandas 0.19.1.
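Until this is sorted out, a possible workaround (a sketch, not checked against every pandas version) is to drop the rows whose key is NaN before calling `transform` with the lambda, so the callable's output has as many rows as its input, and then reindex back to the full index so those rows come back as NaN. `notnull()` is used instead of `notna()` because the latter is not available in 0.20.3. As before, the seed and the float dtype for `C` are assumptions added for reproducibility.

```python
import numpy
import pandas

# Assumption: seed and float dtype for C added for reproducibility.
numpy.random.seed(0)
df = pandas.DataFrame({'A': numpy.random.rand(20),
                       'B': numpy.random.rand(20) * 10,
                       'C': numpy.random.randint(0, 5, 20).astype(float)})
df.loc[:4, 'C'] = numpy.nan

# Keep only rows that belong to some group, so no rows are silently
# dropped inside the groupby and the lengths line up.
has_key = df['C'].notnull()
partial = df.loc[has_key].groupby('C')['B'].transform(lambda x: x.mean())

# Restore the original index; rows with a NaN key become NaN,
# which matches what transform('mean') returns.
result = partial.reindex(df.index)
```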
I first posted this question on SO: https://stackoverflow.com/questions/45333681/handling-na-in-groupby-transform . After some discussion there, I started to suspect a bug.
Thanks
```
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-38-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.12.1
scipy: None
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None
```