MultiIndex - Comparison with Mixed Frequencies (and other FUBAR) #17112

@jbrockmendel

Setup:

import pandas as pd

index = pd.Index(['PCE']*4, name='Variable')
data = [
    pd.Period('2018Q2'),
    pd.Period('2021', freq='5A-Dec'),
    pd.Period('2026', freq='10A-Dec'),
    pd.Period('2017Q2')
    ]
ser = pd.Series(data, index=index, name='Period')

In the real-life version of this issue, 'Period' is a column in a DataFrame and I need to append it as a new level to the index. The snippets here show the problem(s) in both py2 and py3, but for reasons unknown df.set_index('Period', append=True) goes through fine in py2.

The large majority of Period values are quarterly-frequency.

py2

>>> pd.__version__
'0.20.2'
>>> ser.sort_values()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pandas/core/series.py", line 1710, in sort_values
    argsorted = _try_kind_sort(arr[good])
  File "/usr/local/lib/python2.7/site-packages/pandas/core/series.py", line 1696, in _try_kind_sort
    return arr.argsort(kind=kind)
  File "pandas/_libs/period.pyx", line 725, in pandas._libs.period._Period.__richcmp__ (pandas/_libs/period.c:11842)
pandas._libs.period.IncompatibleFrequency: Input has different freq=10A-DEC from Period(freq=Q-DEC)
>>> ser.to_frame()
          Period
Variable
PCE       2018Q2
PCE         2021
PGDP        2026
PGDP      2017Q2
>>> ser.to_frame().set_index('Period', append=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 2836, in set_index
    index = MultiIndex.from_arrays(arrays, names=names)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/indexes/multi.py", line 1100, in from_arrays
    labels, levels = _factorize_from_iterables(arrays)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/categorical.py", line 2193, in _factorize_from_iterables
    return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
  File "/usr/local/lib/python2.7/site-packages/pandas/core/categorical.py", line 2165, in _factorize_from_iterable
    cat = Categorical(values, ordered=True)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/categorical.py", line 310, in __init__
    raise NotImplementedError("> 1 ndim Categorical are not "
NotImplementedError: > 1 ndim Categorical are not supported at this time

No idea why it thinks Categorical is relevant here. That doesn't happen in py3.

For the purposes of sort_values, refusing to sort might make sense. But when all I care about is set_index, I'm pretty indifferent to the ordering.
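If the ordering really does not matter, one possible workaround is to index on a string label of each Period instead of the Period objects themselves, so that building the new level never compares Periods of different frequencies. This is only a sketch under assumptions: the column name `PeriodLabel` is hypothetical, the data uses quarterly and monthly Periods as a simplified stand-in for the mixed frequencies above, and the resulting level holds strings rather than Periods.

```python
import pandas as pd

index = pd.Index(["PCE"] * 4, name="Variable")
# Simplified stand-in data: quarterly and monthly Periods mixed together.
data = [
    pd.Period("2018Q2"),
    pd.Period("2021-01", freq="M"),
    pd.Period("2026-01", freq="M"),
    pd.Period("2017Q2"),
]
# dtype=object keeps the mixed-frequency Periods as plain Python objects.
df = pd.Series(data, index=index, name="Period", dtype=object).to_frame()

# "PeriodLabel" is a hypothetical helper column: one string per Period.
df["PeriodLabel"] = df["Period"].map(str)

# Strings compare without any frequency check, so appending this column
# as an index level does not go through Period comparison.
indexed = df.set_index("PeriodLabel", append=True)
print(indexed.index.names)
```

The original Period objects stay available in the `Period` column, so nothing is lost by indexing on the labels.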

py3

>>> pd.__version__
'0.20.2'
>>> ser.sort_values()
pandas._libs.period.IncompatibleFrequency: Input has different freq=Q-DEC from Period(freq=5A-DEC)

During handling of the above exception, another exception occurred:

SystemError: <built-in function isinstance> returned a result with an error set
[...]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/pandas/core/series.py", line 1710, in sort_values
    argsorted = _try_kind_sort(arr[good])
  File "/usr/local/lib/python3.5/site-packages/pandas/core/series.py", line 1696, in _try_kind_sort
    return arr.argsort(kind=kind)
  File "pandas/_libs/period.pyx", line 723, in pandas._libs.period._Period.__richcmp__ (pandas/_libs/period.c:11713)
  File "/usr/local/lib/python3.5/site-packages/pandas/tseries/offsets.py", line 375, in __ne__
    return not self == other
  File "/usr/local/lib/python3.5/site-packages/pandas/tseries/offsets.py", line 364, in __eq__
    if isinstance(other, compat.string_types):
SystemError: <built-in function isinstance> returned a result with an error set
>>> ser.to_frame().set_index('Period', append=True)
pandas._libs.period.IncompatibleFrequency: Input has different freq=Q-DEC from Period(freq=5A-DEC)

During handling of the above exception, another exception occurred:

SystemError: <built-in function isinstance> returned a result with an error set
[...]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/pandas/core/frame.py", line 2836, in set_index
    index = MultiIndex.from_arrays(arrays, names=names)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexes/multi.py", line 1100, in from_arrays
    labels, levels = _factorize_from_iterables(arrays)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 2193, in _factorize_from_iterables
    return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
  File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 2193, in <listcomp>
    return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
  File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 2165, in _factorize_from_iterable
    cat = Categorical(values, ordered=True)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 298, in __init__
    codes, categories = factorize(values, sort=True)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/algorithms.py", line 567, in factorize
    assume_unique=True)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/algorithms.py", line 486, in safe_sort
    sorter = values.argsort()
  File "pandas/_libs/period.pyx", line 723, in pandas._libs.period._Period.__richcmp__ (pandas/_libs/period.c:11713)
  File "/usr/local/lib/python3.5/site-packages/pandas/tseries/offsets.py", line 375, in __ne__
    return not self == other
  File "/usr/local/lib/python3.5/site-packages/pandas/tseries/offsets.py", line 364, in __eq__
    if isinstance(other, compat.string_types):
SystemError: <built-in function isinstance> returned a result with an error set

I have no idea what to make of this.
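Going only by the tracebacks above, both calls appear to end up in an `argsort()` that reaches `Period.__richcmp__`, so the common trigger looks like an ordering comparison between Periods of different frequencies. The sketch below isolates that comparison and shows one way to order such values anyway, by sorting on each Period's `start_time`. It is an illustration under assumptions, not a diagnosis of the SystemError: it uses quarterly and monthly Periods as a simplified stand-in for the data above, and it catches both ValueError and TypeError because the base class of IncompatibleFrequency may differ between pandas versions.

```python
import pandas as pd

quarterly = pd.Period("2018Q2")
monthly = pd.Period("2021-01", freq="M")

# An ordering comparison across frequencies is expected to raise
# IncompatibleFrequency; catch its possible base classes to stay
# version-agnostic.
try:
    quarterly < monthly
    comparison_raised = False
except (ValueError, TypeError) as exc:
    comparison_raised = True
    print(type(exc).__name__, exc)

data = [quarterly, monthly, pd.Period("2026-01", freq="M"), pd.Period("2017Q2")]

# start_time is a Timestamp, and Timestamps compare with each other
# regardless of which Period frequency they came from.
ordered = sorted(data, key=lambda p: p.start_time)
print([str(p) for p in ordered])
```

Sorting on `start_time` is a design choice, not the only option: `end_time` would give a different order when spans of different lengths overlap.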

A problem that I have not been able to replicate with a copy/pasteable subset of the data:

>>> mi = pd.MultiIndex.from_arrays([period.index, period])
>>> mi
[... prints roughly what we'd expect...]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/pandas/core/base.py", line 800, in shape
    return self._values.shape
  File "/usr/local/lib/python3.5/site-packages/pandas/core/base.py", line 860, in _values
    return self.values
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexes/multi.py", line 667, in values
    self._tuples = lib.fast_zip(values)
  File "pandas/_libs/lib.pyx", line 549, in pandas._libs.lib.fast_zip (pandas/_libs/lib.c:10513)
ValueError: all arrays must be same length
>>> mi.names
FrozenList(['Variable', None])
>>> mi[0]
('CPROF', 'Period')
>>> mi[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexes/multi.py", line 1377, in __getitem__
    if lab[key] == -1:
IndexError: index 1 is out of bounds for axis 0 with size 1

AFAICT it took the name 'Period' and made that the only value in the new level of the MultiIndex. Really no idea what's going on here.
