Description
Setup:
    import pandas as pd

    index = pd.Index(['PCE']*4, name='Variable')
    data = [
        pd.Period('2018Q2'),
        pd.Period('2021', freq='5A-Dec'),
        pd.Period('2026', freq='10A-Dec'),
        pd.Period('2017Q2')
    ]
    ser = pd.Series(data, index=index, name='Period')

In the real-life version of this issue, 'Period' is a column in a DataFrame and I need to append it as a new level to the index. The snippets here show the problem(s) in both py2 and py3, but for reasons unknown df.set_index('Period', append=True) goes through fine in py2.
The large majority of Period values are quarterly-frequency.
py2
    >>> pd.__version__
    '0.20.2'
    >>> ser.sort_values()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/site-packages/pandas/core/series.py", line 1710, in sort_values
        argsorted = _try_kind_sort(arr[good])
      File "/usr/local/lib/python2.7/site-packages/pandas/core/series.py", line 1696, in _try_kind_sort
        return arr.argsort(kind=kind)
      File "pandas/_libs/period.pyx", line 725, in pandas._libs.period._Period.__richcmp__ (pandas/_libs/period.c:11842)
    pandas._libs.period.IncompatibleFrequency: Input has different freq=10A-DEC from Period(freq=Q-DEC)
    >>> ser.to_frame()
              Period
    Variable
    PCE       2018Q2
    PCE         2021
    PGDP        2026
    PGDP      2017Q2
    >>> ser.to_frame().set_index('Period', append=True)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 2836, in set_index
        index = MultiIndex.from_arrays(arrays, names=names)
      File "/usr/local/lib/python2.7/site-packages/pandas/core/indexes/multi.py", line 1100, in from_arrays
        labels, levels = _factorize_from_iterables(arrays)
      File "/usr/local/lib/python2.7/site-packages/pandas/core/categorical.py", line 2193, in _factorize_from_iterables
        return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
      File "/usr/local/lib/python2.7/site-packages/pandas/core/categorical.py", line 2165, in _factorize_from_iterable
        cat = Categorical(values, ordered=True)
      File "/usr/local/lib/python2.7/site-packages/pandas/core/categorical.py", line 310, in __init__
        raise NotImplementedError("> 1 ndim Categorical are not "
    NotImplementedError: > 1 ndim Categorical are not supported at this time

No idea why it thinks Categorical is relevant here. That doesn't happen in py3.
For the purposes of sort_values, refusing to sort might make sense. But when all I care about is set_index, I'm pretty indifferent to the ordering.
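To make the set_index point concrete, here is a minimal pure-Python sketch, not pandas code. `FakePeriod` and `factorize_unsorted` are hypothetical stand-ins I made up for illustration: the first mimics a value that can be hashed and compared for equality but refuses ordering across frequencies, the second assigns integer codes in order of first appearance. Assuming index construction only needs such codes, nothing here ever has to compare values of different frequencies with `<`.

```python
class FakePeriod:
    """Hypothetical stand-in for a period value (not the pandas class).

    Equality and hashing work across frequencies; ordering does not.
    """

    def __init__(self, label, freq):
        self.label = label
        self.freq = freq

    def __eq__(self, other):
        if not isinstance(other, FakePeriod):
            return NotImplemented
        return (self.label, self.freq) == (other.label, other.freq)

    def __hash__(self):
        return hash((self.label, self.freq))

    def __lt__(self, other):
        if not isinstance(other, FakePeriod):
            return NotImplemented
        if self.freq != other.freq:
            # Mirrors the spirit of IncompatibleFrequency in the tracebacks.
            raise ValueError(
                "different freq=%s from freq=%s" % (other.freq, self.freq)
            )
        return self.label < other.label

    def __repr__(self):
        return "FakePeriod(%r, %r)" % (self.label, self.freq)


def factorize_unsorted(values):
    """Return (codes, uniques), with uniques in order of first appearance."""
    positions = {}
    codes = []
    for value in values:
        # A new value gets the next free code; a repeat reuses its code.
        codes.append(positions.setdefault(value, len(positions)))
    return codes, list(positions)


data = [
    FakePeriod("2018Q2", "Q-DEC"),
    FakePeriod("2021", "5A-DEC"),
    FakePeriod("2026", "10A-DEC"),
    FakePeriod("2017Q2", "Q-DEC"),
    FakePeriod("2018Q2", "Q-DEC"),  # repeat, added to show code reuse
]

codes, uniques = factorize_unsorted(data)

try:
    sorted(data)
    sort_failed = False
except ValueError:
    sort_failed = True
```

With this input the unsorted factorization succeeds while `sorted(data)` raises, which is the behaviour I would have expected from set_index as opposed to sort_values.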
py3
    >>> pd.__version__
    '0.20.2'
    >>> ser.sort_values()
    pandas._libs.period.IncompatibleFrequency: Input has different freq=Q-DEC from Period(freq=5A-DEC)

    During handling of the above exception, another exception occurred:

    SystemError: <built-in function isinstance> returned a result with an error set

    [...]

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.5/site-packages/pandas/core/series.py", line 1710, in sort_values
        argsorted = _try_kind_sort(arr[good])
      File "/usr/local/lib/python3.5/site-packages/pandas/core/series.py", line 1696, in _try_kind_sort
        return arr.argsort(kind=kind)
      File "pandas/_libs/period.pyx", line 723, in pandas._libs.period._Period.__richcmp__ (pandas/_libs/period.c:11713)
      File "/usr/local/lib/python3.5/site-packages/pandas/tseries/offsets.py", line 375, in __ne__
        return not self == other
      File "/usr/local/lib/python3.5/site-packages/pandas/tseries/offsets.py", line 364, in __eq__
        if isinstance(other, compat.string_types):
    SystemError: <built-in function isinstance> returned a result with an error set
    >>> ser.to_frame().set_index('Period', append=True)
    pandas._libs.period.IncompatibleFrequency: Input has different freq=Q-DEC from Period(freq=5A-DEC)

    During handling of the above exception, another exception occurred:

    SystemError: <built-in function isinstance> returned a result with an error set

    [...]

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.5/site-packages/pandas/core/frame.py", line 2836, in set_index
        index = MultiIndex.from_arrays(arrays, names=names)
      File "/usr/local/lib/python3.5/site-packages/pandas/core/indexes/multi.py", line 1100, in from_arrays
        labels, levels = _factorize_from_iterables(arrays)
      File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 2193, in _factorize_from_iterables
        return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
      File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 2193, in <listcomp>
        return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
      File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 2165, in _factorize_from_iterable
        cat = Categorical(values, ordered=True)
      File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 298, in __init__
        codes, categories = factorize(values, sort=True)
      File "/usr/local/lib/python3.5/site-packages/pandas/core/algorithms.py", line 567, in factorize
        assume_unique=True)
      File "/usr/local/lib/python3.5/site-packages/pandas/core/algorithms.py", line 486, in safe_sort
        sorter = values.argsort()
      File "pandas/_libs/period.pyx", line 723, in pandas._libs.period._Period.__richcmp__ (pandas/_libs/period.c:11713)
      File "/usr/local/lib/python3.5/site-packages/pandas/tseries/offsets.py", line 375, in __ne__
        return not self == other
      File "/usr/local/lib/python3.5/site-packages/pandas/tseries/offsets.py", line 364, in __eq__
        if isinstance(other, compat.string_types):
    SystemError: <built-in function isinstance> returned a result with an error set

I have no idea what to make of this.
A problem that I have not been able to replicate with a copy/pasteable subset of the data:
    >>> mi = pd.MultiIndex.from_arrays([period.index, period])
    >>> mi
    [... prints roughly what we'd expect...]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.5/site-packages/pandas/core/base.py", line 800, in shape
        return self._values.shape
      File "/usr/local/lib/python3.5/site-packages/pandas/core/base.py", line 860, in _values
        return self.values
      File "/usr/local/lib/python3.5/site-packages/pandas/core/indexes/multi.py", line 667, in values
        self._tuples = lib.fast_zip(values)
      File "pandas/_libs/lib.pyx", line 549, in pandas._libs.lib.fast_zip (pandas/_libs/lib.c:10513)
    ValueError: all arrays must be same length
    >>> mi.names
    FrozenList(['Variable', None])
    >>> mi[0]
    ('CPROF', 'Period')
    >>> mi[1]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.5/site-packages/pandas/core/indexes/multi.py", line 1377, in __getitem__
        if lab[key] == -1:
    IndexError: index 1 is out of bounds for axis 0 with size 1

AFAICT it took the name 'Period' and made that the only value in the new level of the MultiIndex. Really no idea what's going on here.