-
- Notifications
You must be signed in to change notification settings - Fork 19.4k
ENH: Add Timedelta Support to JSON Reader with orient=table (#21140) #21827
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WillAyd left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice change - one comment to address
| if 'timedelta64' in dtypes.values(): | ||
| raise NotImplementedError('table="orient" can not yet read ' | ||
| 'ISO-formatted Timedelta data') | ||
| for col, dtype in dtypes.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this block necessary? Assumed the subsequent astype call would take care of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, support for iso-format timedeltas should be added to astype function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No just go with what jreback said for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, are you sure you need to do this? I believe .astype(dtypes) should handle this, and if not we should fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now
delta = pd.Timedelta(1e9).isoformat() pd.DataFrame([delta]).astype('timedelta64[ns]')results in error.
Codecov Report
@@ Coverage Diff @@ ## master #21827 +/- ## ========================================== + Coverage 92.05% 92.05% +<.01% ========================================== Files 170 170 Lines 50708 50710 +2 ========================================== + Hits 46677 46679 +2 Misses 4031 4031
Continue to review full report at Codecov.
|
pandas/io/json/table_schema.py Outdated
| 'ISO-formatted Timedelta data') | ||
| for col, dtype in dtypes.items(): | ||
| if dtype == 'timedelta64[ns]': | ||
| df[col] = df[col].apply(Timedelta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you would not convert this way, use pd.to_timedelta
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can pd.to_timedelta parse iso-formated timedeltas now? Should support for this format be added to this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be able to change this now
| raise NotImplementedError('table="orient" can not yet read timezone ' | ||
| 'data') | ||
| | ||
| # No ISO constructor for Timedelta as of yet, so need to raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this not hit in tests prior to this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should have been covered by test_read_json_table_orient_raises through parametrization
| ts = np.timedelta64(ts) | ||
| elif is_string_object(ts): | ||
| ts = np.timedelta64(parse_timedelta_string(ts)) | ||
| if len(ts) > 0 and ts[0] == 'P': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm what problem are you trying to solve with this addition? ISO format support should have been added back in #19191 as a precursor to this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pd.to_timedelta was not converting iso-formated strings
In pd.Timedelta constructor parse_iso_format_string function is used:
if len(value) > 0 and value[0] == 'P': value = parse_iso_format_string(value) else: value = parse_timedelta_string(value)while in pd.to_timedelta array_to_timedelta64 is used. Which is not using parse_iso_format_string, actually there're only two occurrences of this function in timedeltas.pyx, its definition and inside pd.Timedelta.
Also, why have parse_iso_format_string and parse_timedelta_string simultaneously?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK I see. If that's the case then it would be better to do a separate PR as a pre-cursor to this one. That way we can communicate the enhancement and ensure appropriate testing.
Since you started on that, do you want to open an issue about pd.to_timedelta not accepting the ISO duration and send a separate PR to fix accordingly? Can come back to this one after the fact
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll open an issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened an issue #21877
| df[col] = df[col].apply(Timedelta) | ||
| df[col] = to_timedelta(df[col]) | ||
| | ||
| df = df.astype(dtypes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any conflict between this and the loop above? Wondering if we should be removing any timedelta calls from the dtypes dict since they get accounted for in the loop
pandas/io/json/table_schema.py Outdated
| 'ISO-formatted Timedelta data') | ||
| for col, dtype in dtypes.items(): | ||
| if dtype == 'timedelta64[ns]': | ||
| df[col] = df[col].apply(Timedelta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be able to change this now
pandas/io/json/table_schema.py Outdated
| | ||
| import pandas._libs.json as json | ||
| from pandas import DataFrame | ||
| from pandas import DataFrame, Timedelta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import to_timedelta instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry, I've pushed this accidentally
| Hello, what is wrong with Travis test? Is it a problem on the testing side or on mine? |
| Looks like something on the Travis side in this particular instance - @TomAugspurger can you help restart that particular job? |
| restarted. …On Tue, Jul 24, 2018 at 10:23 AM William Ayd ***@***.***> wrote: Looks like something on the Travis side in this particular instance - @TomAugspurger <https://github.com/TomAugspurger> can you help restart that particular job? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#21827 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABQHIpHh9lD3neA7Yf-jvKy2AexGTWt8ks5uJzwIgaJpZM4VHm9i> . |
| @WillAyd is there anything else that I should change? |
WillAyd left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one minor fix up but otherwise lgtm
| if 'timedelta64' in dtypes.values(): | ||
| raise NotImplementedError('table="orient" can not yet read ' | ||
| 'ISO-formatted Timedelta data') | ||
| for col, dtype in dtypes.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, are you sure you need to do this? I believe .astype(dtypes) should handle this, and if not we should fix that.
doc/source/whatsnew/v0.24.0.txt Outdated
| - :class:`IntervalIndex` has gained the :meth:`~IntervalIndex.set_closed` method to change the existing ``closed`` value (:issue:`21670`) | ||
| - :func:`~DataFrame.to_csv` and :func:`~DataFrame.to_json` now support ``compression='infer'`` to infer compression based on filename (:issue:`15008`) | ||
| - :func:`to_timedelta` now supports iso-formated timedelta strings (:issue:`21877`) | ||
| - :func:`read_json` now parse timedelta with `orient='table'` (:issue:`21140`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might have missed this on previous comment but there's a small typo here. parse -> parses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WillAyd I've pushed a fix, but now there is a merge conflict. How should I deal with it on github?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to deal locally and re-push. Assuming you have things set up as mentioned in the pandas contributing guide do this on your local branch
git fetch upstream git merge upstream/masterYou'll probably get a merge conflict there, so fix that up and do:
git merge --continueEverything should be resolved then so re-push after that
| Hello, @jreback |
| i believe we handle astype for date times by calling to_datetme we should do the same for timedelta (which then would make this automatically work) |
| For now such conversions throw error >>> s = pd.Series(['0:0:1']) >>> s.astype('timedelta64[ns]') WTF timedelta64[ns] Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/sergey/projects/pandas/pandas/util/_decorators.py", line 178, in wrapper return func(*args, **kwargs) File "/home/sergey/projects/pandas/pandas/core/generic.py", line 5149, in astype **kwargs) File "/home/sergey/projects/pandas/pandas/core/internals/managers.py", line 555, in astype return self.apply('astype', dtype=dtype, **kwargs) File "/home/sergey/projects/pandas/pandas/core/internals/managers.py", line 422, in apply applied = getattr(b, f)(**kwargs) File "/home/sergey/projects/pandas/pandas/core/internals/blocks.py", line 564, in astype **kwargs) File "/home/sergey/projects/pandas/pandas/core/internals/blocks.py", line 655, in _astype values = astype_nansafe(values.ravel(), dtype, copy=True) File "/home/sergey/projects/pandas/pandas/core/dtypes/cast.py", line 717, in astype_nansafe return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape) File "pandas/_libs/lib.pyx", line 455, in pandas._libs.lib.astype_intsafe util.set_value_at_unsafe(result, i, v) File "pandas/_libs/src/util.pxd", line 144, in util.set_value_at_unsafe ValueError: Could not convert object to NumPy timedeltaI think the problem is with: pandas/pandas/core/dtypes/cast.py Lines 712 to 723 in 114f415
We're not reaching the line 721 as np.issubdtype(np.timedelta64, np.integer) is True.Should I create a new pull request? |
| @fjdiod if you are interested it would be preferable to have a new issue and PR, similar to what you just did for the |
| can you merge master and update |
| can you merge master and update |
| Closing as stale though would be great to have this. Please ping if you'd like to pick this back up |
git diff upstream/master -u -- "*.py" | flake8 --diff