-
- Notifications
You must be signed in to change notification settings - Fork 19.4k
Description
-
I have checked that this issue has not already been reported.
-
I have confirmed this bug exists on the latest version of pandas.
-
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample
When I try to join 3 dataframes with multiindices containing dates I got ValueError:
import pandas as pd date = pd.Timestamp(2000, 1, 1).date() df1 = pd.DataFrame({'index_0': 0, 'date': date, 'col1': [0]}) df2 = pd.DataFrame({'index_0': 0, 'date': date, 'col2': [0]}) df1 = df1.groupby(by=['index_0', 'date']).first() df2 = df2.groupby(by=['index_0', 'date']).first() multi_index = pd.MultiIndex.from_tuples([(0, date)], names=['index_0', 'date']) df3 = pd.DataFrame(index=multi_index, columns=['col3'], data=[0]) result = df1.join([df2, df3])Traceback
ValueError Traceback (most recent call last) <ipython-input-555-5947f20f11fd> in <module> 9 df3 = pd.DataFrame(index=multi_index, columns=['col3'], data=[0]) 10 ---> 11 result = df1.join([df2, df3]) 12 /opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in join(self, other, on, how, lsuffix, rsuffix, sort) 7207 """ 7208 return self._join_compat( -> 7209 other, on=on, how=how, lsuffix=lsuffix, rsuffix=rsuffix, sort=sort 7210 ) 7211 /opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in _join_compat(self, other, on, how, lsuffix, rsuffix, sort) 7248 frames, axis=1, join="outer", verify_integrity=True, sort=sort 7249 ) -> 7250 return res.reindex(self.index, copy=False) 7251 else: 7252 return concat( /opt/anaconda3/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs) 225 @wraps(func) 226 def wrapper(*args, **kwargs) -> Callable[..., Any]: --> 227 return func(*args, **kwargs) 228 229 kind = inspect.Parameter.POSITIONAL_OR_KEYWORD /opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in reindex(self, *args, **kwargs) 3854 kwargs.pop("axis", None) 3855 kwargs.pop("labels", None) -> 3856 return super().reindex(**kwargs) 3857 3858 def drop( /opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in reindex(self, *args, **kwargs) 4542 # perform the reindex on the axes 4543 return self._reindex_axes( -> 4544 axes, level, limit, tolerance, method, fill_value, copy 4545 ).__finalize__(self) 4546 /opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy) 3742 if index is not None: 3743 frame = frame._reindex_index( -> 3744 index, method, copy, level, fill_value, limit, tolerance 3745 ) 3746 /opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in _reindex_index(self, new_index, method, copy, level, fill_value, limit, tolerance) 3758 ): 3759 new_index, indexer = self.index.reindex( -> 3760 new_index, method=method, level=level, limit=limit, tolerance=tolerance 3761 ) 3762 return self._reindex_with_indexers( /opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/multi.py in reindex(self, target, method, level, limit, tolerance) 2414 ) 2415 else: -> 2416 raise ValueError("cannot handle a non-unique multi-index!") 2417 2418 if not isinstance(target, MultiIndex): ValueError: cannot handle a non-unique multi-index! Problem description
The issue seems to be that groupby produces dataframes df1 and df2 having multiindex with level 'date' consisting of 'dateteime.date' objects, while in df3 we get multiindex with level 'date' consisting of 'pandas._libs.tslibs.timestamps.Timestamp' objects. This does not lead to an error in subsequent joins:
result = df1.join(df2) result = result.join(df3)or when dealing with single level indices, but creates a problem when joining 3 dataframes at once.
Expected Output
col1 col2 col3 index_0 date 0 2000-01-01 0 0 0 Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 4.4.0-124-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.3
numpy : 1.18.1
pytz : 2019.1
dateutil : 2.8.0
pip : 19.1.1
setuptools : 45.2.0.post20200210
Cython : 0.29.12
pytest : 5.0.1
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.4
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.11.1
IPython : 7.7.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : 0.3.0
lxml.etree : 4.3.4
matplotlib : 3.1.0
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 5.0.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.5
tables : 3.5.2
tabulate : 0.8.3
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8
numba : 0.45.0