
.loc sometimes raises KeyError without an error message when called on an unsorted MultiIndex DataFrame #12660

@adamdivak


Hello,

I know it is well documented that MultiIndex DataFrames need to be sorted before slicing, and that is fine. Even if you forget this, in most cases (for example, when using .loc with a slicer) pandas gives a helpful error message when you call it on an unsorted DataFrame, which makes it easy to spot the mistake and add the necessary sorting. However, when simply using .loc without a slicer, a KeyError is raised without that explanatory message, so it looks like a legitimate key error.

Code Sample, a copy-pastable example if possible

Create a test DataFrame

import numpy as np
import pandas as pd

iterables = [['a', 'b'], [2, 1]]
columns = pd.MultiIndex.from_product(iterables, names=['col1', 'col2'])
rows = pd.MultiIndex.from_product(iterables, names=['row1', 'row2'])
df = pd.DataFrame(np.random.randn(4, 4), index=rows, columns=columns)
print(df)

col1              a                   b
col2              2         1         2         1
row1 row2
a    2    -1.285010  0.183851 -1.180964  0.885343
     1     0.213501  0.479927  0.142614  0.064209
b    2     0.250557 -0.612791 -0.275680 -0.134086
     1    -0.853687 -2.397638  0.940984  1.133747

Try to call .loc without a slicer

df.loc['a', 'b']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-28-b77cac191687> in <module>()
----> 1 df.loc['a', 'b']

/opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1223     def __getitem__(self, key):
   1224         if type(key) is tuple:
-> 1225             return self._getitem_tuple(key)
   1226         else:
   1227             return self._getitem_axis(key, axis=0)

/opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
    736     def _getitem_tuple(self, tup):
    737         try:
--> 738             return self._getitem_lowerdim(tup)
    739         except IndexingError:
    740             pass

/opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in _getitem_lowerdim(self, tup)
    849         ax0 = self.obj._get_axis(0)
    850         if isinstance(ax0, MultiIndex):
--> 851             result = self._handle_lowerdim_multi_index_axis0(tup)
    852             if result is not None:
    853                 return result

/opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in _handle_lowerdim_multi_index_axis0(self, tup)
    831             ax0 = self.obj._get_axis(0)
    832             if not ax0.is_lexsorted_for_tuple(tup):
--> 833                 raise e1
    834
    835         return None

/opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in _handle_lowerdim_multi_index_axis0(self, tup)
    820         try:
    821             # fast path for series or for tup devoid of slices
--> 822             return self._get_label(tup, axis=0)
    823         except TypeError:
    824             # slices are unhashable

/opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in _get_label(self, label, axis)
     84                 raise IndexingError('no slices here, handle elsewhere')
     85
---> 86         return self.obj._xs(label, axis=axis)
     87
     88     def _get_loc(self, key, axis=0):

/opt/conda/lib/python3.4/site-packages/pandas/core/generic.py in xs(self, key, axis, level, copy, drop_level)
   1482         if isinstance(index, MultiIndex):
   1483             loc, new_index = self.index.get_loc_level(key,
-> 1484                                                       drop_level=drop_level)
   1485         else:
   1486             loc = self.index.get_loc(key)

/opt/conda/lib/python3.4/site-packages/pandas/core/index.py in get_loc_level(self, key, level, drop_level)
   5553                     key = tuple(self[indexer].tolist()[0])
   5554
-> 5555                 return (self._engine.get_loc(_values_from_object(key)),
   5556                         None)
   5557             else:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12265)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12216)()

KeyError: ('a', 'b')

Make the same call after setting the sortlevel

df2 = df.sortlevel(0)
print(df2.loc['a', 'b'])

col2         2         1
row2
1     0.142614  0.064209
2    -1.180964  0.885343
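For readers on newer pandas: as far as I know, `DataFrame.sortlevel` was later deprecated in favor of `sort_index`, so the workaround would look roughly like the sketch below. This is an assumption about later releases, not something tested on 0.17.1. It sorts both axes and uses explicit `pd.IndexSlice` slicers so that the row key and the column key cannot be confused; unlike the plain `df2.loc['a', 'b']` call, the result keeps both index levels on each axis. The data is `np.arange` rather than random so the values are predictable.

```python
import numpy as np
import pandas as pd

iterables = [["a", "b"], [2, 1]]
rows = pd.MultiIndex.from_product(iterables, names=["row1", "row2"])
columns = pd.MultiIndex.from_product(iterables, names=["col1", "col2"])
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=rows, columns=columns)

# sort_index() with no arguments sorts the row MultiIndex on all levels.
df2 = df.sort_index()
# Sorting the columns too allows slicers on both axes.
df2 = df2.sort_index(axis=1)

idx = pd.IndexSlice
# Rows with row1 == 'a', columns with col1 == 'b'.
subset = df2.loc[idx["a", :], idx["b", :]]
print(subset)
```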

Expected Output

The same helpful error message, regardless of using or not using an explicit slicer in the .loc query.

KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (1)' 
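Until the library reports this itself, a user-side guard along these lines could give the same kind of hint. This is only a sketch: `loc_checked` is a hypothetical helper, not a pandas API, and it uses `is_monotonic_increasing` as a stand-in for "fully lexsorted", which is stricter than what pandas actually requires for a given key.

```python
import numpy as np
import pandas as pd


def loc_checked(frame, key):
    """Hypothetical helper: refuse label lookups on an unsorted row MultiIndex."""
    index = frame.index
    if isinstance(index, pd.MultiIndex) and not index.is_monotonic_increasing:
        raise KeyError(
            "MultiIndex lookup requires a sorted index; call sort_index() first"
        )
    return frame.loc[key]


iterables = [["a", "b"], [2, 1]]
rows = pd.MultiIndex.from_product(iterables, names=["row1", "row2"])
columns = pd.MultiIndex.from_product(iterables, names=["col1", "col2"])
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=rows, columns=columns)

# Unsorted frame: the guard raises with an explanatory message.
try:
    loc_checked(df, "a")
    guard_error = None
except KeyError as exc:
    guard_error = exc

# Sorted frame: the lookup goes through and returns the two 'a' rows.
result = loc_checked(df.sort_index(), "a")
print(repr(guard_error))
print(result)
```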

output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-27-generic
machine: x86_64
processor:
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8

pandas: 0.17.1
nose: None
pip: 8.0.2
setuptools: 20.1.1
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: None
IPython: 4.1.1
sphinx: None
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: None


Labels: Indexing (related to indexing on series/frames, not to indexes themselves), MultiIndex, Needs Tests (unit test(s) needed to prevent regressions)
