ENH: Add Timedelta Support to JSON Reader with orient=table (#21140) #21827

fjdiod · 2018-07-09T12:22:31Z

closes Add Timedelta Support to JSON Reader with orient="table" #21140
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

WillAyd

Nice change - one comment to address

WillAyd · 2018-07-09T15:52:32Z

pandas/io/json/table_schema.py

- if 'timedelta64' in dtypes.values():
- raise NotImplementedError('table="orient" can not yet read '
- 'ISO-formatted Timedelta data')
+ for col, dtype in dtypes.items():


Is this block necessary? Assumed the subsequent astype call would take care of this

So, should support for ISO-format timedeltas be added to the astype function?

No, just go with what jreback said for now.

hmm, are you sure you need to do this? I believe .astype(dtypes) should handle this, and if not we should fix that.

For now,

delta = pd.Timedelta(1e9).isoformat()
pd.DataFrame([delta]).astype('timedelta64[ns]')

results in an error.

codecov · 2018-07-09T19:29:49Z

Codecov Report

Merging #21827 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff            @@
##           master   #21827     +/-   ##
==========================================
+ Coverage   92.05%   92.05%   +<.01%
==========================================
  Files         170      170
  Lines       50708    50710       +2
==========================================
+ Hits        46677    46679       +2
  Misses       4031     4031

Flag	Coverage Δ
#multiple	`90.45% <100%> (ø)`	⬆️
#single	`42.36% <20%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/json/table_schema.py	`98.31% <100%> (+0.02%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update be6ad72...0f28fd0. Read the comment docs.

jreback · 2018-07-09T21:39:14Z

pandas/io/json/table_schema.py

- 'ISO-formatted Timedelta data')
+ for col, dtype in dtypes.items():
+ if dtype == 'timedelta64[ns]':
+ df[col] = df[col].apply(Timedelta)


you would not convert this way, use pd.to_timedelta

Can pd.to_timedelta parse ISO-formatted timedeltas now? Should support for this format be added to this function?

Should be able to change this now

jreback · 2018-07-09T21:40:03Z

pandas/io/json/table_schema.py

 raise NotImplementedError('table="orient" can not yet read timezone '
 'data')

- # No ISO constructor for Timedelta as of yet, so need to raise


was this not hit in tests prior to this PR?

Should have been covered by test_read_json_table_orient_raises through parametrization

WillAyd · 2018-07-10T20:57:45Z

pandas/_libs/tslibs/timedeltas.pyx

 ts = np.timedelta64(ts)
 elif is_string_object(ts):
- ts = np.timedelta64(parse_timedelta_string(ts))
+ if len(ts) > 0 and ts[0] == 'P':


Hmm what problem are you trying to solve with this addition? ISO format support should have been added back in #19191 as a precursor to this PR

pd.to_timedelta was not converting ISO-formatted strings.
In the pd.Timedelta constructor, the parse_iso_format_string function is used:

if len(value) > 0 and value[0] == 'P':
    value = parse_iso_format_string(value)
else:
    value = parse_timedelta_string(value)

whereas pd.to_timedelta uses array_to_timedelta64, which does not call parse_iso_format_string. In fact, there are only two occurrences of this function in timedeltas.pyx: its definition and the call inside pd.Timedelta.
Also, why do parse_iso_format_string and parse_timedelta_string both exist?
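For readers unfamiliar with the dispatch quoted above, here is a minimal standard-library sketch of the same idea: strings that start with 'P' go to an ISO 8601 duration parser, everything else goes to a plain parser. The names parse_iso_duration, parse_clock_string and parse_duration are hypothetical, the regular expression covers only the days/hours/minutes/seconds subset, and none of this is the pandas implementation.

```python
import re
from datetime import timedelta

# Hypothetical, simplified pattern: optional days, then an optional time part.
_ISO_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_iso_duration(value: str) -> timedelta:
    """Parse a subset of ISO 8601 durations such as 'P0DT0H0M1S'."""
    match = _ISO_DURATION.match(value)
    if match is None:
        raise ValueError(f"not an ISO 8601 duration: {value!r}")
    parts = {k: float(v) for k, v in match.groupdict().items() if v is not None}
    if not parts:
        raise ValueError(f"duration has no components: {value!r}")
    return timedelta(**parts)


def parse_clock_string(value: str) -> timedelta:
    """Parse an 'H:M:S' string such as '0:0:1'."""
    hours, minutes, seconds = (float(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def parse_duration(value: str) -> timedelta:
    """Dispatch on the leading 'P', mirroring the check quoted above."""
    if len(value) > 0 and value[0] == "P":
        return parse_iso_duration(value)
    return parse_clock_string(value)
```

The point of the sketch is that any entry point that skips the leading-'P' check never reaches the ISO parser, which matches the pd.to_timedelta behaviour described in this comment.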

Ah OK I see. If that's the case then it would be better to do a separate PR as a pre-cursor to this one. That way we can communicate the enhancement and ensure appropriate testing.

Since you started on that, do you want to open an issue about pd.to_timedelta not accepting the ISO duration and send a separate PR to fix accordingly? Can come back to this one after the fact

OK, I'll open an issue

I've opened an issue #21877

WillAyd · 2018-07-10T20:58:38Z

pandas/io/json/table_schema.py

- df[col] = df[col].apply(Timedelta)
+ df[col] = to_timedelta(df[col])

 df = df.astype(dtypes)


Is there any conflict between this and the loop above? Wondering if we should be removing any timedelta calls from the dtypes dict since they get accounted for in the loop
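One possible way to address this question, sketched under assumptions: convert the timedelta columns with to_timedelta and pass only the remaining entries to astype, so no column is handled twice. The helper name coerce_columns and the sample frame are hypothetical and are not taken from the PR; the sketch also assumes a pandas version in which to_timedelta accepts ISO 8601 duration strings.

```python
import pandas as pd


def coerce_columns(df: pd.DataFrame, dtypes: dict) -> pd.DataFrame:
    """Apply dtypes, routing timedelta columns through pd.to_timedelta."""
    df = df.copy()
    remaining = {}
    for col, dtype in dtypes.items():
        if dtype == "timedelta64[ns]":
            # Handled here, so it is left out of the astype mapping below.
            df[col] = pd.to_timedelta(df[col])
        else:
            remaining[col] = dtype
    return df.astype(remaining)


# Hypothetical input resembling values read back from JSON as strings.
raw = pd.DataFrame(
    {
        "elapsed": [
            pd.Timedelta(seconds=1).isoformat(),
            pd.Timedelta(minutes=1).isoformat(),
        ],
        "count": ["1", "2"],
    }
)
out = coerce_columns(raw, {"elapsed": "timedelta64[ns]", "count": "int64"})
```

Whether the real code should filter the mapping like this, or rely on astype alone once it supports such strings, is the open question in this thread.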

WillAyd · 2018-07-20T17:54:42Z

pandas/io/json/table_schema.py

- 'ISO-formatted Timedelta data')
+ for col, dtype in dtypes.items():
+ if dtype == 'timedelta64[ns]':
+ df[col] = df[col].apply(Timedelta)


Should be able to change this now

WillAyd · 2018-07-20T17:54:58Z

pandas/io/json/table_schema.py


 import pandas._libs.json as json
-from pandas import DataFrame
+from pandas import DataFrame, Timedelta


Import to_timedelta instead

Oh, sorry, I've pushed this accidentally

…ev#21140) add iso-format support to to_timedelta Revert "add iso-format support to to_timedelta" This reverts commit 3f5f176.

fjdiod · 2018-07-24T14:01:31Z

Hello, what is wrong with the Travis test? Is the problem on the testing side or on mine?

WillAyd · 2018-07-24T15:23:46Z

Looks like something on the Travis side in this particular instance - @TomAugspurger can you help restart that particular job?

TomAugspurger · 2018-07-24T15:24:58Z

restarted.

fjdiod · 2018-07-25T15:30:20Z

@WillAyd is there anything else that I should change?

WillAyd

Only one minor fix up but otherwise lgtm

jreback · 2018-07-26T12:56:52Z

pandas/io/json/table_schema.py

- if 'timedelta64' in dtypes.values():
- raise NotImplementedError('table="orient" can not yet read '
- 'ISO-formatted Timedelta data')
+ for col, dtype in dtypes.items():


hmm, are you sure you need to do this? I believe .astype(dtypes) should handle this, and if not we should fix that.

WillAyd · 2018-07-26T15:58:06Z

doc/source/whatsnew/v0.24.0.txt

 - :class:`IntervalIndex` has gained the :meth:`~IntervalIndex.set_closed` method to change the existing ``closed`` value (:issue:`21670`)
 - :func:`~DataFrame.to_csv` and :func:`~DataFrame.to_json` now support ``compression='infer'`` to infer compression based on filename (:issue:`15008`)
 - :func:`to_timedelta` now supports iso-formated timedelta strings (:issue:`21877`)
+- :func:`read_json` now parse timedelta with `orient='table'` (:issue:`21140`)


I might have missed this on previous comment but there's a small typo here. parse -> parses

@WillAyd I've pushed a fix, but now there is a merge conflict. How should I deal with it on GitHub?

Better to deal locally and re-push. Assuming you have things set up as mentioned in the pandas contributing guide do this on your local branch

git fetch upstream
git merge upstream/master

You'll probably get a merge conflict there, so fix that up and do:

git merge --continue

Everything should be resolved then so re-push after that

fjdiod · 2018-07-27T13:11:14Z

Hello, @jreback
How should I proceed with astype? Should support for ISO-formatted timedeltas be added there?
Currently astype cannot parse timedeltas or datetimes from strings, so wouldn't it be inconsistent to add support for the ISO format?

jreback · 2018-07-27T15:19:53Z

I believe we handle astype for datetimes by calling to_datetime; we should do the same for timedelta (which would then make this work automatically).

fjdiod · 2018-07-27T17:24:11Z

For now, such conversions raise an error:

>>> s = pd.Series(['0:0:1'])
>>> s.astype('timedelta64[ns]')
WTF timedelta64[ns]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sergey/projects/pandas/pandas/util/_decorators.py", line 178, in wrapper
    return func(*args, **kwargs)
  File "/home/sergey/projects/pandas/pandas/core/generic.py", line 5149, in astype
    **kwargs)
  File "/home/sergey/projects/pandas/pandas/core/internals/managers.py", line 555, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "/home/sergey/projects/pandas/pandas/core/internals/managers.py", line 422, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/sergey/projects/pandas/pandas/core/internals/blocks.py", line 564, in astype
    **kwargs)
  File "/home/sergey/projects/pandas/pandas/core/internals/blocks.py", line 655, in _astype
    values = astype_nansafe(values.ravel(), dtype, copy=True)
  File "/home/sergey/projects/pandas/pandas/core/dtypes/cast.py", line 717, in astype_nansafe
    return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
  File "pandas/_libs/lib.pyx", line 455, in pandas._libs.lib.astype_intsafe
    util.set_value_at_unsafe(result, i, v)
  File "pandas/_libs/src/util.pxd", line 144, in util.set_value_at_unsafe
ValueError: Could not convert object to NumPy timedelta

I think the problem is with:

pandas/pandas/core/dtypes/cast.py

Lines 712 to 723 in 114f415

    elif is_object_dtype(arr):

        # work around NumPy brokenness, #1987
        if np.issubdtype(dtype.type, np.integer):
            return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)

        # if we have a datetime/timedelta array of objects
        # then coerce to a proper dtype and recall astype_nansafe

        elif is_datetime64_dtype(dtype):
            from pandas import to_datetime
            return astype_nansafe(to_datetime(arr).values, dtype, copy=copy)


We never reach line 721 because np.issubdtype(np.timedelta64, np.integer) is True, so the integer branch is taken first.
Should I create a new pull request?
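The branch-ordering problem described here can be reproduced with NumPy alone. The sketch below mimics the order of checks in the quoted snippet and contrasts it with one possible reordering; the function names are hypothetical, and the reordering is only an illustration of the issue, not the fix proposed in the follow-up PR.

```python
import numpy as np


def branch_taken(dtype) -> str:
    """Mimic the quoted order: the integer check comes first."""
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype.type, np.integer):
        return "integer"
    elif np.issubdtype(dtype.type, np.datetime64):
        return "datetime"
    elif np.issubdtype(dtype.type, np.timedelta64):
        return "timedelta"
    return "other"


def branch_taken_reordered(dtype) -> str:
    """Assumed alternative: test datetime/timedelta before integer."""
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype.type, np.datetime64):
        return "datetime"
    elif np.issubdtype(dtype.type, np.timedelta64):
        return "timedelta"
    elif np.issubdtype(dtype.type, np.integer):
        return "integer"
    return "other"


# np.timedelta64 sits under np.signedinteger in NumPy's scalar hierarchy,
# so the integer check captures it; np.datetime64 does not.
print(branch_taken("timedelta64[ns]"))
print(branch_taken_reordered("timedelta64[ns]"))
```

With the original order, a timedelta target dtype never reaches a later timedelta-specific branch, which is consistent with the traceback ending in lib.astype_intsafe.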

WillAyd · 2018-07-27T17:41:42Z

@fjdiod if you are interested it would be preferable to have a new issue and PR, similar to what you just did for the to_timedelta change. Sorry for all of the pre-cursors!

fjdiod · 2018-07-28T21:26:25Z

Hello, @WillAyd. I've created an issue and a pull request #22100 #22107

jreback · 2018-10-11T02:01:15Z

can you merge master and update

jreback · 2018-11-01T01:30:19Z

can you merge master and update

WillAyd · 2018-11-23T03:49:57Z

Closing as stale though would be great to have this. Please ping if you'd like to pick this back up

WillAyd requested changes Jul 9, 2018

WillAyd added the IO JSON (read_json, to_json, json_normalize) and Timedelta (Timedelta data type) labels Jul 9, 2018

jreback requested changes Jul 9, 2018

WillAyd requested changes Jul 10, 2018

WillAyd mentioned this pull request Jul 13, 2018

pd.to_timedelta not parsing iso-formatted strings #21877

Closed

WillAyd mentioned this pull request Jul 20, 2018

How should I store frames with multiindex columns in CSV? #21976

Closed

fjdiod added 2 commits July 20, 2018 20:36

ENH: Add Timedelta Support to JSON Reader with orient=table (pandas-d…

7be068b

…ev#21140)

add whatsnew entry

7f1336c

WillAyd requested changes Jul 20, 2018

fjdiod added 2 commits July 20, 2018 21:25

ENH: Add Timedelta Support to JSON Reader with orient=table (pandas-d…

b0150e4

…ev#21140) add iso-format support to to_timedelta Revert "add iso-format support to to_timedelta" This reverts commit 3f5f176.

add whatsnew entry

ed0e1fe

fjdiod force-pushed the json-iso branch from 8564cff to ed0e1fe Compare July 20, 2018 18:29

WillAyd approved these changes Jul 25, 2018

jreback requested changes Jul 26, 2018

WillAyd reviewed Jul 26, 2018

fjdiod added 2 commits July 26, 2018 19:02

fix typo

6842582

Merge remote-tracking branch 'upstream/master' into json-iso

0f28fd0

WillAyd mentioned this pull request Aug 10, 2018

Change Default to/read JSON Format to orient='table' #22271

Closed

WillAyd closed this Nov 23, 2018
