-
- Notifications
You must be signed in to change notification settings - Fork 19.4k
BUG: dropna changes index even when nothing dropped #43618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/indexes/range.py Outdated
| fill_value=fill_value, | ||
| **kwargs, | ||
| ) | ||
| if len(self) == len(indices) and np.all(self == indices): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead, could use lib.maybe_indices_to_slice; if you do get a slice then can return self[slc]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, checking this might give a considerable performance bottleneck in the take function. Can you check some cases how much this impact the performance of idx.take(..) with a few examples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes there is perf drop.
Would the slice approach help with that?
before after ratio [f3d48175] [7d511e16] <master> <gh41965> + 1.97±0.2ms 3.28±1ms 1.67 indexing.Take.time_take('int')There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the change suggested by @jbrockmendel, and this no longer hits perf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel , thoughts why this is failing on 32bit?
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
| slc = maybe_indices_to_slice(indices, len(indices)) | ||
| return self[slc] | ||
| except (ValueError, TypeError): | ||
| pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when does this raise (and you are catching)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was raising in cases where I was not getting a slice (either because indices was BlockPlacement -> TypeError, or when indices contained anything other than intp_t -> ValueError).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you just do something like
if isinstance(indices, slice): ..... to avoid the try/except
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tried to do that.. but there can be a lot of possibilities of the contents of indices, and that's where the error is thrown
| slc = maybe_indices_to_slice(indices, len(indices)) | ||
| return self[slc] | ||
| except (ValueError, TypeError): | ||
| pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you just do something like
if isinstance(indices, slice): ..... to avoid the try/except
| with rewrite_exception("Int64Index", type(self).__name__): | ||
| if ( | ||
| fill_value is None | ||
| and isinstance(indices, np.ndarray) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is way too complicated and likely hiding bugs. @jbrockmendel can you suggest anything here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @phofl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
take a look at what we do in DatetimeTimedeltaMixin.take; should be possible to do this without the try/except
| expected = DataFrame({"a": [1, 2, 3]}) | ||
| tm.assert_frame_equal(result, expected) | ||
| | ||
| def test_dropna_retains_RangeIndex_when_nothing_dropped(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RangeIndex -> rangeindex
| need to merge master and address comments |
| This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
| closing as stale if you want to continue, pls ping to reopen. |
Uh oh!
There was an error while loading. Please reload this page.