Closed
Description
In a groupby/transform where some of the group keys are missing (NaN), should the transformed values for those rows be set to missing (my preference), left unchanged, or should this raise an error? Currently the behavior is inconsistent between Series and DataFrames, and between cythonized and non-cythonized transformations.
For a Series with a non-cythonized transformation, the values are left unchanged:
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series([100, 200, 300, 400])
>>> s.groupby([1, 1, np.nan, np.nan]).transform(pd.Series.mean)
0    200
1    200
2    300
3    400

For a Series with cythonized functions, it's an error (this changed between 0.14.1 and 0.15.0):
>>> s.groupby([1, 1, np.nan, np.nan]).transform(np.mean)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/pandas/core/groupby.py", line 2425, in transform
    return self._transform_fast(cyfunc)
  File "pandas/pandas/core/groupby.py", line 2466, in _transform_fast
    return self._set_result_index_ordered(Series(values))
  File "pandas/pandas/core/groupby.py", line 494, in _set_result_index_ordered
    result.index = self.obj.index
  File "pandas/pandas/core/generic.py", line 1948, in __setattr__
    object.__setattr__(self, name, value)
  File "pandas/src/properties.pyx", line 65, in pandas.lib.AxisProperty.__set__ (pandas/lib.c:41020)
  File "pandas/pandas/core/series.py", line 262, in _set_axis
    self._data.set_axis(axis, labels)
  File "pandas/pandas/core/internals.py", line 2217, in set_axis
    'new values have %d elements' % (old_len, new_len))
ValueError: Length mismatch: Expected axis has 2 elements, new values have 4 elements

For DataFrames, the results are the opposite: the cythonized function leaves the values unchanged and the non-cythonized one raises:
>>> f = pd.DataFrame({'a': s, 'b': s * 2})
>>> f
     a    b
0  100  200
1  200  400
2  300  600
3  400  800
>>> f.groupby([1, 1, np.nan, np.nan]).transform(np.sum)
     a    b
0  300  600
1  300  600
2  300  600
3  400  800
>>> f.groupby([1, 1, np.nan, np.nan]).transform(pd.DataFrame.sum)
Traceback (most recent call last):
  File "pandas/pandas/core/groupby.py", line 3002, in transform
    return self._transform_general(func, *args, **kwargs)
  File "pandas/pandas/core/groupby.py", line 2968, in _transform_general
    return self._set_result_index_ordered(concatenated)
  File "pandas/pandas/core/groupby.py", line 494, in _set_result_index_ordered
    result.index = self.obj.index
  File "pandas/pandas/core/generic.py", line 1948, in __setattr__
    object.__setattr__(self, name, value)
  File "pandas/src/properties.pyx", line 65, in pandas.lib.AxisProperty.__set__ (pandas/lib.c:41020)
  File "pandas/pandas/core/generic.py", line 406, in _set_axis
    self._data.set_axis(axis, labels)
  File "pandas/pandas/core/internals.py", line 2217, in set_axis
    'new values have %d elements' % (old_len, new_len))
ValueError: Length mismatch: Expected axis has 2 elements, new values have 4 elements
>>> print(pd.__version__)
0.15.1-125-ge463818
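
For reference, the first option from the description (rows with a missing group key come back as missing) can be approximated by hand. This is only a minimal sketch of a workaround, not a description of what pandas does internally: it assumes the keys are held in a Series aligned with `s` (the names `keys`, `mask` and `result` are illustrative), transforms only the rows whose key is present, and then reindexes back to the original index so the remaining rows become NaN.

```python
import numpy as np
import pandas as pd

s = pd.Series([100, 200, 300, 400])
# Group keys aligned with s; the last two rows have no group.
keys = pd.Series([1, 1, np.nan, np.nan], index=s.index)

# Transform only the rows that actually belong to a group ...
mask = keys.notna()
transformed = s[mask].groupby(keys[mask]).transform("mean")

# ... then restore the original index; rows without a group become NaN.
result = transformed.reindex(s.index)
print(result)
```

With this approach rows 0 and 1 hold the mean of their group (150.0) and rows 2 and 3 are NaN, so the result always has the same length and index as the input regardless of which code path `transform` would otherwise take.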