Closed
Labels
Indexing (Related to indexing on series/frames, not to indexes themselves), MultiIndex, Needs Tests (Unit test(s) needed to prevent regressions)
Description
Hello,
I know it is well documented that MultiIndex DataFrames need to be sorted before slicing, and that is fine. Even if you forget this, in most cases (for example, when using .loc with a slicer) pandas gives a helpful error message when you call it on an unsorted DataFrame, which makes it easy to spot the mistake and add the necessary sorting. However, when using .loc without a slicer, a KeyError is still raised but without the explanatory message, so it looks like a legitimate key error.
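As a rough illustration of the sortedness requirement mentioned above, the row index can be inspected before indexing. This is only a sketch: it assumes a recent pandas release, uses `is_monotonic_increasing` as a stand-in for "fully lexsorted" (not necessarily the check pandas performs internally), and sorts with `sort_index()`.

```python
import numpy as np
import pandas as pd

# Same kind of frame as in the report: the second level is given as [2, 1],
# so the row index is not in sorted order.
iterables = [['a', 'b'], [2, 1]]
rows = pd.MultiIndex.from_product(iterables, names=['row1', 'row2'])
columns = pd.MultiIndex.from_product(iterables, names=['col1', 'col2'])
df = pd.DataFrame(np.random.randn(4, 4), index=rows, columns=columns)

# Assumption: monotonic-increasing is used here as a proxy for "lexsorted".
rows_sorted_before = df.index.is_monotonic_increasing

# Sorting the row index puts the tuples in lexicographic order.
df_sorted = df.sort_index()
rows_sorted_after = df_sorted.index.is_monotonic_increasing

print(rows_sorted_before, rows_sorted_after)
```

Checking the index up front like this makes it possible to tell a missing sort apart from a key that is really absent.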
Code Sample, a copy-pastable example if possible
Create a test DataFrame
    import numpy as np
    import pandas as pd

    iterables = [['a', 'b'], [2, 1]]
    columns = pd.MultiIndex.from_product(iterables, names=['col1', 'col2'])
    rows = pd.MultiIndex.from_product(iterables, names=['row1', 'row2'])
    df = pd.DataFrame(np.random.randn(4, 4), index=rows, columns=columns)
    print(df)

    col1              a                   b
    col2              2         1         2         1
    row1 row2
    a    2    -1.285010  0.183851 -1.180964  0.885343
         1     0.213501  0.479927  0.142614  0.064209
    b    2     0.250557 -0.612791 -0.275680 -0.134086
         1    -0.853687 -2.397638  0.940984  1.133747

Try to call .loc without a slicer
    df.loc['a', 'b']

    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    <ipython-input-28-b77cac191687> in <module>()
    ----> 1 df.loc['a', 'b']

    /opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in __getitem__(self, key)
       1223     def __getitem__(self, key):
       1224         if type(key) is tuple:
    -> 1225             return self._getitem_tuple(key)
       1226         else:
       1227             return self._getitem_axis(key, axis=0)

    /opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
        736     def _getitem_tuple(self, tup):
        737         try:
    --> 738             return self._getitem_lowerdim(tup)
        739         except IndexingError:
        740             pass

    /opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in _getitem_lowerdim(self, tup)
        849         ax0 = self.obj._get_axis(0)
        850         if isinstance(ax0, MultiIndex):
    --> 851             result = self._handle_lowerdim_multi_index_axis0(tup)
        852             if result is not None:
        853                 return result

    /opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in _handle_lowerdim_multi_index_axis0(self, tup)
        831             ax0 = self.obj._get_axis(0)
        832             if not ax0.is_lexsorted_for_tuple(tup):
    --> 833                 raise e1
        834
        835         return None

    /opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in _handle_lowerdim_multi_index_axis0(self, tup)
        820         try:
        821             # fast path for series or for tup devoid of slices
    --> 822             return self._get_label(tup, axis=0)
        823         except TypeError:
        824             # slices are unhashable

    /opt/conda/lib/python3.4/site-packages/pandas/core/indexing.py in _get_label(self, label, axis)
         84             raise IndexingError('no slices here, handle elsewhere')
         85
    ---> 86         return self.obj._xs(label, axis=axis)
         87
         88     def _get_loc(self, key, axis=0):

    /opt/conda/lib/python3.4/site-packages/pandas/core/generic.py in xs(self, key, axis, level, copy, drop_level)
       1482         if isinstance(index, MultiIndex):
       1483             loc, new_index = self.index.get_loc_level(key,
    -> 1484                                                       drop_level=drop_level)
       1485         else:
       1486             loc = self.index.get_loc(key)

    /opt/conda/lib/python3.4/site-packages/pandas/core/index.py in get_loc_level(self, key, level, drop_level)
       5553                     key = tuple(self[indexer].tolist()[0])
       5554
    -> 5555                 return (self._engine.get_loc(_values_from_object(key)),
       5556                         None)
       5557             else:

    pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)()

    pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)()

    pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12265)()

    pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12216)()

    KeyError: ('a', 'b')

Make the same call after setting the sortlevel
    df2 = df.sortlevel(0)
    print(df2.loc['a', 'b'])

    col2         2         1
    row2
    1     0.142614  0.064209
    2    -1.180964  0.885343

Expected Output
The same helpful error message, regardless of using or not using an explicit slicer in the .loc query.
    KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (1)'

output of pd.show_versions()
    pd.show_versions()

    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.4.4.final.0
    python-bits: 64
    OS: Linux
    OS-release: 4.2.0-27-generic
    machine: x86_64
    processor:
    byteorder: little
    LC_ALL: en_US.UTF-8
    LANG: en_US.UTF-8

    pandas: 0.17.1
    nose: None
    pip: 8.0.2
    setuptools: 20.1.1
    Cython: 0.23.4
    numpy: 1.10.4
    scipy: 0.17.0
    statsmodels: None
    IPython: 4.1.1
    sphinx: None
    patsy: 0.4.0
    dateutil: 2.4.2
    pytz: 2015.7
    blosc: None
    bottleneck: None
    tables: None
    numexpr: None
    matplotlib: 1.5.1
    openpyxl: None
    xlrd: 0.9.4
    xlwt: None
    xlsxwriter: None
    lxml: None
    bs4: None
    html5lib: None
    httplib2: None
    apiclient: None
    sqlalchemy: None
    pymysql: None
    psycopg2: None
    Jinja2: None
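Until the library reports the problem itself, the behaviour requested under Expected Output could be approximated in user code. The sketch below is not pandas' own implementation: `loc_checked` is a hypothetical helper, the message text is made up for illustration, and the sortedness test (`is_monotonic_increasing`) is an assumption. It also uses `sort_index()` rather than `sortlevel(0)`, which later pandas releases no longer provide.

```python
import numpy as np
import pandas as pd


def loc_checked(frame, row_key, col_key):
    """Hypothetical helper: complain about an unsorted row index, then use .loc."""
    if not frame.index.is_monotonic_increasing:
        # Illustrative message only; not the wording pandas uses.
        raise KeyError(
            "row MultiIndex is not sorted; call sort_index() before indexing"
        )
    return frame.loc[row_key, col_key]


iterables = [['a', 'b'], [2, 1]]
rows = pd.MultiIndex.from_product(iterables, names=['row1', 'row2'])
columns = pd.MultiIndex.from_product(iterables, names=['col1', 'col2'])
df = pd.DataFrame(np.random.randn(4, 4), index=rows, columns=columns)

# On the unsorted frame the helper raises with an explanatory message.
try:
    loc_checked(df, 'a', 'b')
    message = None
except KeyError as exc:
    message = exc.args[0]

# After sorting the rows, the same lookup goes through.
result = loc_checked(df.sort_index(), 'a', 'b')

print(message)
print(result)
```

A wrapper like this only separates "index not sorted" from "key not present" at the call site; it does not change what `.loc` itself raises.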