Description
- I have checked that this issue has not already been reported.
  - The issue could potentially be similar to that reported in "BUG: Fails and or weird aggregation results when using agg with custom functions" (#33517), but I'm not sure.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
    import pandas as pd
    import numpy as np
    import scipy.stats


    def circ_mean(data, dummy_kwarg=0):
        # print(data)
        return 180/np.pi*scipy.stats.circmean(data*np.pi/180)


    def numpy_mean(data, dummy_kwarg=0):
        return np.mean(data)


    @pd.api.extensions.register_dataframe_accessor("my")
    class CircstatsAccessor(object):
        def __init__(self, pandas_obj):
            self._obj = pandas_obj

        def circ_mean(self, axis=0, level=None, **kwargs):
            df = self._obj
            if axis != 0 or level is not None:
                df = df.groupby(axis=axis, level=level)
            return df.agg(circ_mean, **kwargs)

        def numpy_mean(self, axis=0, level=None, **kwargs):
            df = self._obj
            if axis != 0 or level is not None:
                df = df.groupby(axis=axis, level=level)
            return df.agg(numpy_mean, **kwargs)


    df = pd.DataFrame(
        data={
            "col1": [10, 11, 12, 13],
            "col2": [20, 21, 22, 23],
        },
        index=[1, 2, 3, 4]
    )

    # Compute results with the standard `df.mean` call
    # I'd like my custom mean function to do a similar thing
    df.mean(level=0, axis=0)

    # If I don't pass in any kwargs, `df.my.circ_mean` behaves as expected
    # Results approximately match those from `df.mean`
    df.my.circ_mean(level=0, axis=0)

    # If I pass in a kwarg that is not ever used, `df.my.circ_mean`
    # returns unusual results - the returned values in `col1` are
    # identical to those in `col2`, whereas they were
    # different before
    df.my.circ_mean(level=0, axis=0, dummy_kwarg=0)

    # If I call `df.my.numpy_mean`, results are identical
    # with or without providing the kwarg
    df.my.numpy_mean(level=0, axis=0)
    df.my.numpy_mean(level=0, axis=0, dummy_kwarg=0)

Problem description
As noted in the code comments above, my circ_mean function behaves differently depending on whether a dummy (unused) keyword argument is specified. Uncommenting the print call in circ_mean shows that df.agg passes in different objects depending on whether or not this keyword is provided.
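As an alternative to the print call, the difference can be inspected by recording the type of each object that agg hands to the custom function. This is only a minimal sketch: `make_recorder` and the two log lists are hypothetical names introduced here, the mean is computed with plain NumPy so the function accepts whatever it receives, and the types that show up may depend on the pandas version.

```python
import numpy as np
import pandas as pd


def make_recorder(log):
    """Build an aggregation function that logs the type of its input (hypothetical helper)."""
    def recorder(data, dummy_kwarg=0):
        # Record what agg passed in, e.g. "Series" or "DataFrame".
        log.append(type(data).__name__)
        # Flatten to an array so the call works for either input type.
        return float(np.asarray(data, dtype=float).mean())
    return recorder


df = pd.DataFrame(
    data={"col1": [10, 11, 12, 13], "col2": [20, 21, 22, 23]},
    index=[1, 2, 3, 4],
)

log_without_kwarg = []
log_with_kwarg = []

# Same aggregation, once without and once with the unused keyword.
df.groupby(level=0).agg(make_recorder(log_without_kwarg))
df.groupby(level=0).agg(make_recorder(log_with_kwarg), dummy_kwarg=0)

print("without kwarg:", sorted(set(log_without_kwarg)))
print("with kwarg:   ", sorted(set(log_with_kwarg)))
```

If the two printed sets differ, the custom function is being called on different kinds of objects in the two cases, which would be consistent with the behavior described above.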
I would expect no difference in behavior, since this keyword has no effect. Interestingly, the results are identical, as expected, if I replace the more complicated circular mean call with a simple np.mean call inside my custom function (compare the circ_mean and numpy_mean functions).
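A possible way to sidestep the difference, assuming it is triggered only by forwarding keyword arguments through agg, is to bind the keyword beforehand with functools.partial so that agg itself receives no kwargs. The sketch below is not a fix for the underlying behavior. `circ_mean_deg` is a hypothetical NumPy-only stand-in for the scipy.stats.circmean call (angles in degrees, mapped to [0, 360)), used so the example has no SciPy dependency.

```python
from functools import partial

import numpy as np
import pandas as pd


def circ_mean_deg(data, dummy_kwarg=0):
    # Hypothetical stand-in for scipy.stats.circmean: average the unit
    # vectors of the angles, then convert the resulting angle to degrees.
    radians = np.deg2rad(np.asarray(data, dtype=float))
    angle = np.arctan2(np.sin(radians).mean(), np.cos(radians).mean())
    return float(np.rad2deg(angle) % 360.0)


df = pd.DataFrame(
    data={"col1": [10, 11, 12, 13], "col2": [20, 21, 22, 23]},
    index=[1, 2, 3, 4],
)

# Bind the keyword up front; agg is then called with a single callable
# and no extra keyword arguments.
bound = partial(circ_mean_deg, dummy_kwarg=0)
result = df.groupby(level=0).agg(bound)
print(result)
```

Because every index label here is unique, each group holds a single row, so the per-column circular means should approximately match the original values, as with `df.mean`.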
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 3e89b4c
python : 3.7.9.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : None.None
pandas : 1.2.0
numpy : 1.19.2
pytz : 2020.5
dateutil : 2.8.1
pip : 20.3.3
setuptools : 51.0.0.post20201207
Cython : 0.29.21
pytest : 6.2.1
hypothesis : None
sphinx : 3.4.1
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.2
html5lib : 1.1
pymysql : 0.10.1
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.8.3
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : 2.7.2
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 0.15.1
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.2
sqlalchemy : 1.3.20
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.16.2
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.51.2