
@johannes-mueller (Contributor) commented Nov 12, 2021

GH#44084 boils down to the following.

According to the docs, `.get_indexer_non_unique()` is supposed to return
"integers from 0 to n - 1 indicating that the index at these positions matches
the corresponding target values". However, for an index that is non-unique and
non-monotonic, it returns a boolean mask. That is because it uses `.get_loc()`,
which returns a boolean mask for non-unique, non-monotonic indexes.

This patch catches that case and converts the boolean mask from `.get_loc()`
into the corresponding array of integer positions when the index is neither
unique nor monotonic.
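The conversion described above can be illustrated with a minimal sketch, assuming NumPy's `np.flatnonzero` for the mask-to-positions step. The helper `loc_to_positions` is a hypothetical name for illustration, not the code in the patch; it normalizes the three shapes `Index.get_loc()` can return (an integer, a slice, or a boolean mask) into one integer array.

```python
import numpy as np
import pandas as pd


def loc_to_positions(loc, length):
    """Convert an Index.get_loc() result into an array of integer positions.

    Hypothetical helper for illustration; it is not a pandas API.
    """
    if isinstance(loc, slice):
        # Monotonic, non-unique index: get_loc returns a slice
        return np.arange(length, dtype=np.intp)[loc]
    if isinstance(loc, np.ndarray) and loc.dtype == np.bool_:
        # Non-unique, non-monotonic index: get_loc returns a boolean mask
        return np.flatnonzero(loc).astype(np.intp)
    # Single match: get_loc returns one integer position
    return np.array([loc], dtype=np.intp)


idx = pd.Index([1, 2, 1, 3])  # non-unique and non-monotonic
mask = idx.get_loc(1)  # boolean mask, not integer positions
positions = loc_to_positions(mask, len(idx))
```

Here `positions` holds `[0, 2]`, the integer form that `get_indexer_non_unique()` is documented to return.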

@phofl (Member) left a comment


Please add a whatsnew entry.


def test_get_index_non_unique_non_monotonic(self):
    # GH#44084
    index = IntervalIndex.from_tuples(

Could you add the MultiIndex cases?
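A MultiIndex case along these lines might look like the sketch below. The tuples are hypothetical example data chosen so the MultiIndex is neither unique nor monotonic; this is not the test that was added to the PR. It checks that `get_indexer_non_unique()` yields integer positions rather than a boolean mask.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: labels repeat out of order,
# so the MultiIndex is neither unique nor monotonic.
mi = pd.MultiIndex.from_tuples([("a", 1), ("b", 2), ("a", 1), ("b", 2)])
assert not mi.is_unique and not mi.is_monotonic_increasing

target = pd.MultiIndex.from_tuples([("b", 2)])
indexer, missing = mi.get_indexer_non_unique(target)

# Expect the integer positions of every match, with the platform int dtype
expected = np.array([1, 3], dtype=np.intp)
np.testing.assert_array_equal(indexer, expected)
assert indexer.dtype == np.intp
assert len(missing) == 0
```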

On Windows, `np.array([1, 3])` is int32 (NumPy's default integer there), so comparing it with the int64 array fails because of the dtype mismatch.
Sometimes the world out there is a bit more complicated than what you have on your cozy desktop :)
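A minimal sketch of a platform-independent assertion, assuming the interval data below (hypothetical, not necessarily the data in the PR's test): build the expected array with `dtype=np.intp` so it matches the indexer's dtype on every platform. For context, NumPy 2.0 made int64 the default integer on 64-bit Windows, so a bare `np.array([1, 3])` is only int32 there with older NumPy releases.

```python
import numpy as np
import pandas as pd


def check_interval_indexer_non_unique():
    # Hypothetical example data: a non-unique, non-monotonic IntervalIndex
    index = pd.IntervalIndex.from_tuples(
        [(0.0, 1.0), (1.0, 2.0), (0.0, 1.0), (1.0, 2.0)]
    )
    indexer, missing = index.get_indexer_non_unique([pd.Interval(1.0, 2.0)])

    # An explicit intp dtype avoids the int32 vs int64 mismatch on Windows
    expected = np.array([1, 3], dtype=np.intp)
    np.testing.assert_array_equal(indexer, expected)
    assert indexer.dtype == expected.dtype
    assert len(missing) == 0
    return indexer


check_interval_indexer_non_unique()
```

`np.testing.assert_array_equal` does not compare dtypes, which is why the sketch checks `indexer.dtype` separately.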
@phofl added the Indexing, Interval, and MultiIndex labels Nov 12, 2021
@jreback added this to the 1.4 milestone Nov 14, 2021
@jreback merged commit 2bbd4d6 into pandas-dev:master Nov 14, 2021
@jreback (Contributor) commented Nov 14, 2021

Thanks @johannes-mueller!

Do we have sufficient test coverage of this for index types other than the ones explicitly tested here? If not, I would take a PR for that.
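One way to probe other index types is a brute-force cross-check: compare `get_indexer_non_unique()` against the positions where `index == key`. The helper `check_non_unique_indexer` and the example indexes below are hypothetical, a sketch rather than tests from the pandas suite.

```python
import numpy as np
import pandas as pd


def check_non_unique_indexer(index, key):
    """Compare get_indexer_non_unique with a brute-force position lookup.

    Hypothetical helper for illustration; not part of pandas' test suite.
    """
    indexer, missing = index.get_indexer_non_unique([key])
    # Positions where the index equals the key, found by direct comparison
    expected = np.flatnonzero(np.asarray(index == key))
    assert indexer.dtype == np.intp, indexer.dtype
    np.testing.assert_array_equal(indexer, expected)
    assert len(missing) == 0
    return indexer


# Non-unique, non-monotonic indexes of a few types (example data)
cases = [
    (pd.Index([3, 1, 3, 2]), 3),
    (pd.Index([2.5, 0.5, 2.5]), 2.5),
    (pd.Index(["b", "a", "b", "c"]), "b"),
]
results = [check_non_unique_indexer(index, key) for index, key in cases]
```

Extending `cases` with further index types is a cheap way to spot any that still return a mask instead of positions.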


Labels: Indexing (Related to indexing on series/frames, not to indexes themselves), Interval (Interval data type), MultiIndex

3 participants