-
- Notifications
You must be signed in to change notification settings - Fork 19.4k
core: try coerce result back to DatetimeBlock #22008
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @jbrockmendel . Should this work with tz-aware date times as well? |
Yes. If there's any way at all to do this without poking into core.internals, that would be preferred. The Series version of this works correctly right? This should be doing something like operating column-wise to dispatch to the Series implementation (which itself dispatches to the DatetimeIndex implementation). Can you include tests for subtraction while you're at it? |
Codecov Report
@@ Coverage Diff @@ ## master #22008 +/- ## ========================================== + Coverage 91.99% 91.99% +<.01% ========================================== Files 167 167 Lines 50578 50588 +10 ========================================== + Hits 46530 46540 +10 Misses 4048 4048
Continue to review full report at Codecov.
|
pandas/core/internals.py Outdated
| values, values_mask, other, other_mask = self._try_coerce_args( | ||
| transf(values), other) | ||
| except TypeError: | ||
| # `Timestamp` operate with `DateOffset` must cast to `object`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would rather see you override eval in the appropriate blocks
ad96a28 to a75f375 Compare | v2.0, override Below test fails, but not a regression. Will look deeper later. def test_timestamp_df_sub_timestamp(self): expected = pd.DataFrame([pd.Timedelta('365d')]) result = pd.DataFrame([pd.Timestamp('2018')]) - pd.Timestamp('2017') tm.assert_frame_equal(expected, result) |
| docs said returning a new pandas/pandas/core/internals/__init__.py Lines 1333 to 1336 in 37c7458
pandas/pandas/core/internals/__init__.py Lines 1455 to 1457 in 37c7458
|
pandas/core/internals/__init__.py Outdated
| block = super().eval(func, other, try_cast=try_cast, **kwargs)[0] | ||
| if try_cast: | ||
| if isinstance(other, | ||
| (tslibs.Timestamp, np.datetime64, datetime, date)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an ABCTimestamp to check? At least 3 times use similar isinstance().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an ABCTimestamp to check?
No, but Timestamp subclasses datetime, so could actually be removed from this check.
4c36407 to 8a3a20e Compare | self.values[locs] = values | ||
| | ||
| def eval(self, func, other, try_cast=False, **kwargs): | ||
| block = super(DatetimeBlock, self).eval(func, other, try_cast=try_cast, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens here if instead of calling super eval you delegate to self.holder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In DatetimeBlock - datetime case and DatetimeBlock + DateOffset case, it require the result block changing (DatetimeBlock -> TimeDeltaBlock , ObjectBlock -> DatetimeBlock or DatetimeTZBlock).
Delegating to self.holder couldn't change block type. Besides, self._holder is unaware of other.
In fact, what should be blamed is the last line in Block.eval carelessly make a wrong block. A proper fix should be
- spliting a method to make block regarding to
other. - override
make_blockwithotheras argument. But this will lead toself.make_blockreturning another type of block.
pandas/pandas/core/internals/__init__.py
Line 1456 in 0828c25
return [self.make_block(result)]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, what should be blamed is the last line in Block.eval carelessly make a wrong block. A proper fix should be
There has been an effort recently to fix mismatches between the behaviors of arithmetic ops in Index vs Series vs DataFrame. For the timedelta and datetime cases, the canonical behavior is defined in TimedeltaIndex/DatetimeIndex.
The best case here to to avoid making this fix in core.internals but instead do it in core.DataFrame (or core.ops). The next base case is to have the FooBlock dispatch to the appropriate Index subclass to ensure the behavior is canonical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will be great, but it may need a big rewrite of eval and _try_coerce_args, cause now values is all coerced to ndarray[i8] for DatetimeLike. (May be benefited from numpy bi-ops, but confused to convert back.)
Is there an issue to xfer? This PR would depand on it.
| (2**63, 'complex128'), | ||
| (True, 'bool'), | ||
| (np.timedelta64(20, 'ns'), '<m8[ns]'), | ||
| (np.datetime64(20, 'ns'), '<M8[ns]'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why removed? (Besides no is exempt sub working)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR lead to datetime64 OP datetime64 = timedelta64, regardless of OP. Although meaningless, it will result datetime64 + datetime64 = timedelta64.
The assumption datetime64 OP datetime64 = datetime64 is meaningless for all OP. If allow this line, it will require extra OP check in DatetimeBlock.eval, which I don't think worth.
| (operator.mod, 'complex128'), | ||
| (operator.mod, '<m8[ns]'), | ||
| (operator.pow, 'bool'), | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pls avoid formatting changes where convenient, makes it harder for reviewers to identify the actual changes.
'<M8[ns]' - '<M8[ns]' = '<m8[ns]' , other binop are invalid.
c82ea08 to 21c1c7c Compare
jreback left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a lot of concerns over the changes in the impl here.
| def _try_coerce_result(self, result): | ||
| """ reverse of try_coerce_args """ | ||
| if isinstance(result, np.ndarray): | ||
| if isinstance(result, (np.ndarray, Block)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not a good change, please remove. the point is that these are scalars/arrays NEVER blocks.
| if isinstance(result, np.ndarray): | ||
| if result.dtype.kind in ['i', 'f', 'O']: | ||
| result = result.astype('M8[ns]') | ||
| elif isinstance(result, Block): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same, NEVER do this
| Two of the bugs this addresses (#21610, #8554) were closed by #22163. The remaining issue #12437 is still open. @holymonson can this be simplified if only aimed at that issue? |
This |
git diff upstream/master -u -- "*.py" | flake8 --diffHang up until
Block.evalreformed.