Description
This one is a bit complex to explain, but I'll do my best.
Currently IntervalIndex.get_indexer fails if the other index doesn't contain only Interval objects (there's also another bug, but let's keep it simple here).
The underlying issue is that IntervalIndex.get_indexer depends on IntervalIndex.get_loc, which is ambiguous in how it treats numeric inputs:
>> ii = pd.IntervalIndex.from_breaks([0,1,2,3])
>> ii.get_loc(pd.Interval(1, 2))
1 # ok
>> ii.get_loc(1) # do we mean exactly 1, or an interval that contains the number 1?
1 # ambiguous

The issue is that get_loc returns the location for both exact matches and inexact matches (i.e. when the numeric input falls inside an interval). For the purposes of get_indexer, however, this behaviour fails, as get_indexer needs get_loc to find exact matches only.
See #19021 (comment) for further discussion.
Solution
A solution could be adding a 'strict' option to the method parameter of IntervalIndex.get_loc.
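To make the idea concrete, here is a minimal pure-Python sketch of how a strict lookup differs from the current one. It is not the pandas implementation: the `Interval` tuple, the `get_loc` and `get_indexer` functions, and the boolean `strict` flag are all hypothetical stand-ins (the proposal above passes 'strict' through the method parameter instead), and intervals are assumed to be closed on the right.

```python
from typing import List, NamedTuple, Union


class Interval(NamedTuple):
    """Hypothetical stand-in for pd.Interval, assumed closed on the right: (left, right]."""
    left: float
    right: float

    def contains(self, x: float) -> bool:
        return self.left < x <= self.right


Key = Union[Interval, float]


def get_loc(intervals: List[Interval], key: Key, strict: bool = False) -> int:
    """Return the position of `key` (illustrative only, not pandas code).

    An Interval key must match an element exactly. A numeric key is matched
    by containment, unless strict=True, in which case it never matches.
    """
    if isinstance(key, Interval):
        for i, iv in enumerate(intervals):
            if iv == key:
                return i
        raise KeyError(key)
    if strict:
        # Strict mode: a bare number is not an element of the index.
        raise KeyError(key)
    for i, iv in enumerate(intervals):
        if iv.contains(key):
            return i
    raise KeyError(key)


def get_indexer(intervals: List[Interval], target: List[Key]) -> List[int]:
    """Exact-match positions for each target element, -1 where there is none."""
    result = []
    for key in target:
        try:
            result.append(get_loc(intervals, key, strict=True))
        except KeyError:
            result.append(-1)
    return result


intervals = [Interval(0, 1), Interval(1, 2), Interval(2, 3)]
mixed_target = [Interval(2, 3), 1.5, Interval(0, 1)]
positions = get_indexer(intervals, mixed_target)
```

With the strict flag, a mixed target such as `mixed_target` no longer trips over the numeric entry: the number simply maps to -1, while the Interval entries resolve to their exact positions.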
This wasn't so difficult after all, and I've already opened a PR for it; see #19353.