-
- Notifications
You must be signed in to change notification settings - Fork 19.4k
Description
-
[X ] I have checked that this issue has not already been reported.
-
[ X] I have confirmed this bug exists on the latest version of pandas.
-
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Two different dates, one within the range of what pd.Timestamp can handle, the other outside of that range:
import pandas as pd import datetime df = pd.DataFrame({'A': ['X', 'Y'], 'B': [datetime.datetime(2005, 1, 1, 10, 30, 23, 540000), datetime.datetime(3005, 1, 1, 10, 30, 23, 540000)]}) print(df.groupby('A').B.max()) Problem description
pd.Timestamp can't deal with a too big date like the year 3005, so to represent such a date I need to use the datetime.datetime type. Before 1.1.1 (1.1.0?) this hasn't been an issue, but now this code throws an assertion error:
Traceback (most recent call last): File "<ipython-input-38-8b8ec5e4e179>", line 5, in <module> print(df.groupby('A').B.max()) File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\groupby\groupby.py", line 1558, in max numeric_only=numeric_only, min_count=min_count, alias="max", npfunc=np.max File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\groupby\groupby.py", line 1015, in _agg_general result = self.aggregate(lambda x: npfunc(x, axis=self.axis)) File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\groupby\generic.py", line 261, in aggregate func, *args, engine=engine, engine_kwargs=engine_kwargs, **kwargs File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\groupby\groupby.py", line 1083, in _python_agg_general result, counts = self.grouper.agg_series(obj, f) File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\groupby\ops.py", line 644, in agg_series return self._aggregate_series_fast(obj, func) File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\groupby\ops.py", line 669, in _aggregate_series_fast result, counts = grouper.get_result() File "pandas\_libs\reduction.pyx", line 256, in pandas._libs.reduction.SeriesGrouper.get_result File "pandas\_libs\reduction.pyx", line 74, in pandas._libs.reduction._BaseGrouper._apply_to_group File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\groupby\groupby.py", line 1060, in <lambda> f = lambda x: func(x, *args, **kwargs) File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\groupby\groupby.py", line 1015, in <lambda> result = self.aggregate(lambda x: npfunc(x, axis=self.axis)) File "<__array_function__ internals>", line 6, in amax File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\numpy\core\fromnumeric.py", line 2706, in amax keepdims=keepdims, initial=initial, where=where) File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\numpy\core\fromnumeric.py", line 85, in _wrapreduction return reduction(axis=axis, out=out, **passkwargs) File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\generic.py", line 11460, in stat_func func, name=name, axis=axis, skipna=skipna, numeric_only=numeric_only File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\series.py", line 4220, in _reduce delegate = self._values File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\series.py", line 572, in _values return self._mgr.internal_values() File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\internals\managers.py", line 1615, in internal_values return self._block.internal_values() File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\internals\blocks.py", line 2019, in internal_values return self.array_values() File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\internals\blocks.py", line 2022, in array_values return self._holder._simple_new(self.values) File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\core\arrays\datetimes.py", line 290, in _simple_new assert values.dtype == "i8" AssertionError From testing with mixing pd.Timestamp and datetime.datetime types I presume pandas is converting applicable dates (first line in the example) to pd.Timestamp while leaving the others as datetime.datetime leading to a mixed-type result column and the assertion error.
Expected Output
Since I'm explicitely operating with datatype datetime.datetime there should be no implicit conversion to pd.Timestamp if it's not assured that all values are within the range that pd.Timestamp allows.
Output of pd.show_versions()
commit : f2ca0a2
python : 3.7.8.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : None.None
pandas : 1.1.1
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.2
setuptools : 50.0.0.post20200830
Cython : 0.29.21
pytest : None
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : 1.3.3
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : None
fsspec : 0.8.0
fastparquet : 0.4.1
gcsfs : None
matplotlib : 3.3.1
numexpr : None
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 1.0.1
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : 1.3.19
tables : None
tabulate : 0.8.7
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.51.1