Skip to content

Conversation

@debnathshoham
Copy link
Contributor

@debnathshoham debnathshoham commented Sep 17, 2021

@jreback jreback added the Indexing Related to indexing on series/frames, not to indexes themselves label Sep 17, 2021
@debnathshoham debnathshoham marked this pull request as draft September 17, 2021 18:09
@debnathshoham debnathshoham marked this pull request as ready for review September 18, 2021 08:26
fill_value=fill_value,
**kwargs,
)
if len(self) == len(indices) and np.all(self == indices):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead, could use lib.maybe_indices_to_slice; if you do get a slice then can return self[slc]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, checking this might give a considerable performance bottleneck in the take function. Can you check some cases how much this impact the performance of idx.take(..) with a few examples?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes there is perf drop.
Would the slice approach help with that?

 before after ratio [f3d48175] [7d511e16] <master> <gh41965> + 1.97±0.2ms 3.28±1ms 1.67 indexing.Take.time_take('int')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the change suggested by @jbrockmendel, and this no longer hits perf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel , thoughts why this is failing on 32bit?

slc = maybe_indices_to_slice(indices, len(indices))
return self[slc]
except (ValueError, TypeError):
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when does this raise (and you are catching)?

Copy link
Contributor Author

@debnathshoham debnathshoham Sep 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was raising in cases where I was not getting a slice (either because indices was BlockPlacement -> TypeError, or when indices contained anything other than intp_t -> ValueError).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you just do something like

if isinstance(indices, slice): ..... 

to avoid the try/except

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tried to do that.. but there can be a lot of possibilities of the contents of indices, and that's where the error is thrown

slc = maybe_indices_to_slice(indices, len(indices))
return self[slc]
except (ValueError, TypeError):
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you just do something like

if isinstance(indices, slice): ..... 

to avoid the try/except

with rewrite_exception("Int64Index", type(self).__name__):
if (
fill_value is None
and isinstance(indices, np.ndarray)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is way too complicated and likely hiding bugs. @jbrockmendel can you suggest anything here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @phofl

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take a look at what we do in DatetimeTimedeltaMixin.take; should be possible to do this without the try/except

expected = DataFrame({"a": [1, 2, 3]})
tm.assert_frame_equal(result, expected)

def test_dropna_retains_RangeIndex_when_nothing_dropped(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RangeIndex -> rangeindex

@jreback
Copy link
Contributor

jreback commented Oct 21, 2021

need to merge master and address comments

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Nov 21, 2021
@jreback
Copy link
Contributor

jreback commented Nov 28, 2021

closing as stale if you want to continue, pls ping to reopen.

@jreback jreback closed this Nov 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Indexing Related to indexing on series/frames, not to indexes themselves Stale

5 participants