-
- Notifications
You must be signed in to change notification settings - Fork 19.4k
CLN refactor core indexes #37582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN refactor core indexes #37582
Conversation
jreback left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm one comment.
ivanovmg left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment, looks good to me.
pandas/core/indexes/datetimelike.py Outdated
| return self._data._box_func(i8[0]) | ||
| | ||
| # quick check | ||
| if len(self) and self.is_monotonic and i8[0] != iNaT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we put both of tehse outside the try/except (I think this is ok to do), see L298
| can you rebase this again to make sure failures are not systemtic |
| The problem is that gets to if self.hasnans: if skipna: min_stamp = self[~self._isnan].asi8.min() else: return self._na_value else: min_stamp = i8.min() try: return self._data._box_func(min_stamp) except ValueError: return self._na_valueand throws a ValueError on latest commit addresses that |
pandas/core/indexes/datetimelike.py Outdated
| return self._data._box_func(max_stamp) | ||
| except ValueError: | ||
| | ||
| # quick check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could just return self._data.max(...). Downside is that doesn't take advantage of caching
| can you merge master ping on green |
pandas/core/indexes/base.py Outdated
| """ | ||
| trimmed = strings | ||
| while len(strings) > 0 and all(x[0] == " " for x in trimmed): | ||
| while trimmed and all(x.startswith(" ") for x in trimmed): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like it changes the logic by checking trimmed instead of strings?
perf difference between startswith vs x[0] == " "?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but AFAIKT this condition is only needed if trimmed is an empty list (because all([]) will always be True), so if we do an early return then we can remove it.
Regarding perf differences:
In [15]: %timeit 'foobarfdsfsdfsdafsdafhdlsafhgsdlafhsdlafhsda'.startswith('f') 110 ns ± 3.31 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each) In [16]: %timeit 'foobarfdsfsdfsdafsdafhdlsafhgsdlafhsdlafhsda'[0] == 'f' 24.1 ns ± 1.05 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each) so I'll go back to [0] == ' ' (cc @ivanovmg )
pandas/core/indexes/datetimes.py Outdated
| and (other.tz is None) | ||
| or (self.tz is None) | ||
| and (other.tz is not None) | ||
| ): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(self.tz is None) ^ (other.tz is None)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(self.tz is None ^ other.tz is not None)? AFAIKT the check is to see if one of them is None but the other one isn't - I've tried regrouping the parens to make it clearer anyway, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this not viable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as in,
if (self.tz is None) or (other.tz is None): raise TypeError("Cannot join tz-naive with tz-aware DatetimeIndex") ?
I don't think that would work because we don't want to raise if both self.tz and other.tz are None, just if one is but the other isn't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not (self.tz is None) or (other.tz is None), but (self.tz is None) ^ (other.tz is None)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah sorry, I didn't actually know that was a Python command - that should work then, thanks so much!
| @MarcoGorelli if you rebase and fix this up we can get into 1.2 |
sure, have fixed conflicts and responded to review comments |
| """ | ||
| if isinstance(keyarr, Index): | ||
| pass | ||
| else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we did it this way to make coverage obvious
pandas/core/indexes/base.py Outdated
| return trimmed | ||
| if not strings: | ||
| return strings | ||
| while all(x[0] == " " for x in strings): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this going to break when one of the strings becomes empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're absolutely right, thanks! It fails on master too, but still, worth fixing while we're modifying these lines
In [1]: from pandas.core.indexes.base import trim_front In [2]: trim_front([' ', ' a']) --------------------------------------------------------------------------- IndexError Traceback (most recent call last) <ipython-input-2-1d23255f182c> in <module> ----> 1 trim_front([' ', ' a']) ~/pandas-dev/pandas/core/indexes/base.py in trim_front(strings) 5848 """ 5849 trimmed = strings -> 5850 while len(strings) > 0 and all(x[0] == " " for x in trimmed): 5851 trimmed = [x[1:] for x in trimmed] 5852 return trimmed ~/pandas-dev/pandas/core/indexes/base.py in <genexpr>(.0) 5848 """ 5849 trimmed = strings -> 5850 while len(strings) > 0 and all(x[0] == " " for x in trimmed): 5851 trimmed = [x[1:] for x in trimmed] 5852 return trimmed IndexError: string index out of rangeThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this again, I don't think it's an issue, because trim_front is only ever called with a list of strings which are all of the same length.
It's only ever called from pd.Index._format_with_header:
result = trim_front(format_array(values, None, justify="left")) and format_array from pandas/io/formats/format.py returns
fmt_obj.get_result() which in turn returns
_make_fixed_width(fmt_values, self.justify) Nonetheless I can make the condition
while all(strings) and all(x[0] == " " for x in strings): and add a tiny test for which that'd be necessary
| can you rebase |
| thanks @MarcoGorelli |
Some refactorings found by Sourcery https://sourcery.ai/
I've removed the ones of the kind