BUG: numpy.split on non-UTC changes original time (#14042) #17255

wooyekim · 2017-08-15T08:12:55Z

Root cause: When a DatetimeIndex is created by __array_wrap__,
a round-trip of the timezone occurred.

Solution: Remove the timezone of the original index when creating
a DatetimeIndex via __array_wrap__, and apply the timezone afterwards.

closes numpy.split on non-UTC, tz-aware data undergoes UTC roundtrip #14042
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
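
The round trip described above can be illustrated with the standard library alone. This is only a sketch of the failure mode and of the "strip the timezone, apply it later" idea, not pandas's actual code path; the fixed UTC+9 offset standing in for Asia/Seoul and the variable names `utc_naive`, `buggy`, and `fixed` are assumptions made for the illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+9 offset used as a stand-in for Asia/Seoul (assumption for this sketch).
KST = timezone(timedelta(hours=9), "KST")

original = datetime(2016, 1, 1, 0, 0, tzinfo=KST)

# A raw array carries only the UTC instant, with no timezone attached.
utc_naive = original.astimezone(timezone.utc).replace(tzinfo=None)

# Buggy rebuild: the UTC value is relabelled as local wall time,
# so the resulting instant is shifted by the UTC offset.
buggy = utc_naive.replace(tzinfo=KST)

# Fixed rebuild: treat the value as UTC first, then convert to the target zone.
fixed = utc_naive.replace(tzinfo=timezone.utc).astimezone(KST)

print("original:", original.isoformat())
print("buggy:   ", buggy.isoformat())
print("fixed:   ", fixed.isoformat())
```

The buggy rebuild differs from the original by exactly the UTC offset, which matches the symptom reported in #14042, while the fixed rebuild recovers the original timestamp.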


codecov · 2017-08-15T09:05:25Z

Codecov Report

Merging #17255 into master will decrease coverage by 0.04%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17255      +/-   ##
==========================================
- Coverage   91.03%   90.99%   -0.05%
==========================================
  Files         162      162
  Lines       49527    49533       +6
==========================================
- Hits        45086    45071      -15
- Misses       4441     4462      +21

Flag	Coverage Δ
#multiple	`88.77% <100%> (-0.03%)`	⬇️
#single	`40.25% <0%> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/indexes/base.py	`95.94% <100%> (+0.01%)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/plotting/_converter.py	`63.23% <0%> (-1.82%)`	⬇️
pandas/core/frame.py	`97.72% <0%> (-0.1%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0f25426...d928221. Read the comment docs.

codecov · 2017-08-15T09:05:47Z

Codecov Report

Merging #17255 into master will decrease coverage by 0.24%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17255      +/-   ##
==========================================
- Coverage   91.24%   90.99%   -0.25%
==========================================
  Files         163      162       -1
  Lines       50091    49536     -555
==========================================
- Hits        45704    45074     -630
- Misses       4387     4462      +75

Flag	Coverage Δ
#multiple	`88.77% <100%> (-0.26%)`	⬇️
#single	`40.25% <0%> (-0.05%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/indexes/base.py	`95.94% <100%> (-0.48%)`	⬇️
pandas/io/s3.py	`0% <0%> (-85%)`	⬇️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/tests/plotting/__init__.py	`77.77% <0%> (-22.23%)`	⬇️
pandas/core/tools/datetimes.py	`66.85% <0%> (-16.12%)`	⬇️
pandas/util/_decorators.py	`66% <0%> (-14.71%)`	⬇️
pandas/compat/pickle_compat.py	`69.51% <0%> (-6.1%)`	⬇️
pandas/core/dtypes/missing.py	`87.19% <0%> (-3.43%)`	⬇️
pandas/core/indexes/range.py	`92.81% <0%> (-2.85%)`	⬇️
pandas/util/_validators.py	`93.75% <0%> (-2.6%)`	⬇️
... and 88 more

Powered by Codecov. Last update 5959ee3...a1023f5. Read the comment docs.

jreback · 2017-08-15T10:07:45Z

doc/source/whatsnew/v0.21.0.txt

 - Fixes ``DataFrame.loc`` for setting with alignment and tz-aware ``DatetimeIndex`` (:issue:`16889`)
 - Avoids ``IndexError`` when passing an Index or Series to ``.iloc`` with older numpy (:issue:`17193`)
 - Allow unicode empty strings as placeholders in multilevel columns in Python 2 (:issue:`17099`)
+- Fixes bug for ``numpy.split`` on pandas dataframe with non-utc timezone (:issue:`14042`)


bug in numpy __array_wrap__ with a pandas Series/Index.

jreback · 2017-08-15T10:12:29Z

pandas/core/indexes/base.py

 attrs = self._get_attributes_dict()
 attrs = self._maybe_update_attributes(attrs)
- return Index(result, **attrs)
+ from pandas.core.indexes.datetimes import DatetimeIndex


add this

diff --git a/pandas/core/indexes/base.py b/pandas/core/indexes/base.py
index de6221987..865e92201 100644
--- a/pandas/core/indexes/base.py
+++ b/pandas/core/indexes/base.py
@@ -99,6 +99,9 @@ def _new_Index(cls, d):
     if issubclass(cls, ABCPeriodIndex):
         from pandas.core.indexes.period import _new_PeriodIndex
         return _new_PeriodIndex(cls, **d)
+    elif issubclass(cls, ABCDatetimeIndex):
+        from pandas.core.indexes.datetimes import _new_DatetimeIndex
+        return _new_DatetimeIndex(cls, **d)
     return cls.__new__(cls, **d)

then you can simply call
return _new_Index(result, attrs)

jreback · 2017-08-15T10:13:05Z

pandas/tests/indexes/datetimes/test_datetime.py

 assert result == expected
+
+ def test_split_date_range_with_timezone(self):
+ # https://github.com/pandas-dev/pandas/issues/14042


add a 1-liner about what you are testing here

jreback · 2017-08-15T10:13:53Z

pandas/tests/indexes/datetimes/test_datetime.py

+ # https://github.com/pandas-dev/pandas/issues/14042
+ idx = DatetimeIndex(['2016-01-01 00:00:00', '2016-01-02 00:00:00'],
+ tz='Asia/Seoul')
+ split = np.split(idx, indices_or_sections=[])


use tm.assert_index_equal and test each piece.

Also test using a Series (doing it here is ok).

ideally we could exercise more numpy ufuncs for 1d reshaping ops here as well.

I found that Series does not suffer from this issue. But I'll leave the testcase.

pep8speaks · 2017-08-15T15:09:42Z

Hello @wooyekim! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on October 29, 2017 at 14:37 Hours UTC

jreback · 2017-08-18T00:43:37Z

pandas/core/indexes/base.py

 attrs = self._get_attributes_dict()
 attrs = self._maybe_update_attributes(attrs)
- return Index(result, **attrs)
+ from pandas.core.dtypes.generic import ABCDatetimeIndex


not sure if you saw my comment. pls move this logic to _new_Index (and the import can be at the top of the file)

jreback · 2017-09-13T01:19:28Z

can you rebase / update

jreback

can you rebase

jreback · 2017-10-28T15:52:02Z

doc/source/whatsnew/v0.21.0.txt

 - Bug in reindexing on an empty ``CategoricalIndex`` (:issue:`16770`)
 - Fixes ``DataFrame.loc`` for setting with alignment and tz-aware ``DatetimeIndex`` (:issue:`16889`)
 - Avoids ``IndexError`` when passing an Index or Series to ``.iloc`` with older numpy (:issue:`17193`)
 - Allow unicode empty strings as placeholders in multilevel columns in Python 2 (:issue:`17099`)


move to 0.21.1

wooyekim added 3 commits August 15, 2017 16:44

BUG: numpy.split on non-UTC changes original time (#14042)

bf8c8cd

Root cause: When a DatetimeIndex is created by __array_wrap__, a round-trip of the timezone occurred.
Solution: Remove the timezone of the original index when creating a DatetimeIndex via __array_wrap__, and apply the timezone afterwards.

Fix indentation issue

cd5982a

Write whatsnew

d928221

wooyekim changed the title from "Issue14042" to "BUG: numpy.split on non-UTC changes original time (#14042)" Aug 15, 2017

jreback requested changes Aug 15, 2017


jreback added the labels Bug, Compat (pandas objects compatability with Numpy or Python functions), and Timezones (Timezone data dtype) Aug 15, 2017

Fix requested changes

d059057

Fix PEP8 whitespace issues

f628072

jreback requested changes Aug 18, 2017


jreback requested changes Oct 28, 2017


wooyekim added 3 commits October 29, 2017 23:28

Merge branch 'master' into Issue14042

a14458f

Update v0.21.0.txt

2893048

Update v0.22.0.txt

a1023f5

wooyekim closed this Oct 29, 2017

wooyekim deleted the Issue14042 branch October 29, 2017 14:50

jreback mentioned this pull request Oct 29, 2017

BUG: numpy.split on non-UTC changes original time (#14042) #18019

Closed

4 tasks
