BUG: maybe_convert_objects with convert_datetime #19423

lingster · 2018-01-27T08:23:33Z

[Y] closes BUG: maybe_convert_objects on soft convert of datetime can raise unexpectedly #19359
[Y] tests added / passed
[Y] passes git diff upstream/master -u -- "*.py" | flake8 --diff
[Y] whatsnew entry: bug fix for GH19359


codecov · 2018-01-27T14:18:39Z

Codecov Report

Merging #19423 into master will decrease coverage by 0.17%.
The diff coverage is n/a.

    @@            Coverage Diff             @@
    ##           master   #19423      +/-   ##
    ==========================================
    - Coverage    91.8%   91.63%   -0.18%
    ==========================================
      Files         152      150       -2
      Lines       49215    48724     -491
    ==========================================
    - Hits        45181    44646     -535
    - Misses       4034     4078      +44

Flag	Coverage Δ
#multiple	`90% <ø> (-0.19%)`	⬇️
#single	`41.74% <ø> (-0.11%)`	⬇️

Impacted Files	Coverage Δ
pandas/tseries/plotting.py	`0% <0%> (-100%)`	⬇️
pandas/plotting/_compat.py	`62% <0%> (-28.91%)`	⬇️
pandas/core/missing.py	`84.3% <0%> (-7.35%)`	⬇️
pandas/plotting/_timeseries.py	`60.82% <0%> (-4.49%)`	⬇️
pandas/core/reshape/tile.py	`90.25% <0%> (-3.12%)`	⬇️
pandas/io/html.py	`85.98% <0%> (-2.81%)`	⬇️
pandas/io/formats/format.py	`96.24% <0%> (-2.01%)`	⬇️
pandas/plotting/_converter.py	`65.22% <0%> (-1.59%)`	⬇️
pandas/core/ops.py	`95.52% <0%> (-0.84%)`	⬇️
pandas/util/_decorators.py	`81.66% <0%> (-0.74%)`	⬇️
... and 69 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 01882ba...081817e.

jreback · 2018-01-27T16:52:17Z

pandas/_libs/src/inference.pyx


 # we try to coerce datetime w/tz but must all have the same tz
 if seen.datetimetz_:
- if len({getattr(val, 'tzinfo', None) for val in objects}) == 1:


Rather than this, I think you can do:

    if seen.datetimetz_:
        from pandas import to_datetime
        return to_datetime(objects, errors='ignore')

and you can remove the seen.object_ = 1, as this will return a DatetimeIndex (DTI) if possible, or just the input if not (which is what we want).

Hi, I tried this, but it didn't pass my parametrized tests: the timezones need to be unique. So this is the proposed fix:

    # we try to coerce datetime w/tz but must all have the same tz
    if seen.datetimetz_:
        unique_types = set()
        from dateutil import tz
        for val in objects:
            item = getattr(val, 'tzinfo', type(val).__name__)
            # as tzoffset is not hashable, we use __repr__ in our set
            if isinstance(item, tz.tzoffset):
                unique_types.add(item.__repr__())
            else:
                unique_types.add(item)
        if len(unique_types) == 1:
            from pandas import DatetimeIndex
            return DatetimeIndex(objects)
        seen.object_ = 1

This needs to change to something more like:

    tzs = (getattr(val, 'tzinfo', None) for val in objects)
    if len(set(tzs)) and all(arg for arg in tzs is not None):
        ....

You don't need to do all of this checking; we only care that there is one and only one timezone and no timezone-naive objects.
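Taken literally, the pseudocode above would not run: tzs is a generator, so set(tzs) exhausts it before all(...) sees anything; arg for arg in tzs is not None parses as iterating over the boolean tzs is not None, which raises TypeError; and len(set(tzs)) is truthy for any non-empty set, not only when there is exactly one timezone. Below is a minimal pure-Python sketch of the rule the comment describes (exactly one timezone, no timezone-naive values). It assumes stdlib datetime.timezone objects in place of the dateutil ones from the tests, and has_single_timezone is a hypothetical name, not a pandas function.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Sequence


def has_single_timezone(objects: Sequence[Any]) -> bool:
    """Return True only if every value carries the same non-None tzinfo.

    Hypothetical helper illustrating the review rule; not pandas code.
    """
    # Materialize the tzinfo values once; a generator would be exhausted
    # after the first pass. Values without a tzinfo attribute map to None.
    tzs = [getattr(val, "tzinfo", None) for val in objects]
    if not tzs or any(tz is None for tz in tzs):
        # Empty input, a naive datetime, or a non-datetime value: no conversion.
        return False
    first = tzs[0]
    # Compare with == rather than hashing, so an unhashable tzinfo still works.
    return all(tz == first for tz in tzs[1:])


utc_plus_1 = timezone(timedelta(hours=1))
aware = datetime(2018, 1, 27, 22, 5, tzinfo=utc_plus_1)
naive = datetime(2018, 1, 27, 23, 59)

print(has_single_timezone([aware, aware]))  # True
print(has_single_timezone([aware, naive]))  # False
print(has_single_timezone([aware, 42]))     # False
```

Comparing with == instead of building a set is one way to sidestep the tzoffset hashing problem mentioned in the proposed fix.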

If I try to implement it as you suggested, I get the following compiler error:

    Error compiling Cython file:
    ------------------------------------------------------------
    ...
        return ints

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def maybe_convert_objects(ndarray[object] objects, bint try_float=0,
                              ^
    ------------------------------------------------------------

    pandas/_libs/src/inference.pyx:1196:26: Buffer types only allowed as function local variables
    building 'pandas._libs.lib' extension

I could get around this by the following:

    tzs = [getattr(val, 'tzinfo', None) for val in objects]
    if len(list(filter(None, tzs))) == 1:
        ...

However, this does not work: it fails the test cases that mix timezone-aware and timezone-naive dates:
test.log

Hence my original solution, which checks every item to ensure that all times have the same timezone.
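The generator version likely failed to compile because Cython implements generator expressions as closures, and a buffer-typed argument such as ndarray[object] objects cannot be captured by a closure; the list comprehension avoids that. The workaround still cannot express the rule, though, because filter(None, tzs) counts how many values are timezone-aware rather than how many distinct timezones there are. A small pure-Python reproduction, assuming stdlib datetime.timezone objects stand in for the dateutil ones in the tests (filter_count_check is a hypothetical name):

```python
from datetime import datetime, timedelta, timezone

# Stdlib timezone objects stand in for the dateutil tzinfo values in the tests.
utc_plus_1 = timezone(timedelta(hours=1))
aware = datetime(2018, 1, 27, 22, 5, tzinfo=utc_plus_1)
naive = datetime(2018, 1, 27, 23, 59)


def filter_count_check(objects):
    """The list-comprehension workaround from the thread, as plain Python."""
    tzs = [getattr(val, "tzinfo", None) for val in objects]
    # filter(None, ...) drops the None entries (naive or non-datetime values),
    # so this counts aware values, not distinct timezones.
    return len(list(filter(None, tzs))) == 1


# One aware and one naive value: the count is 1, so it wrongly says "convert".
print(filter_count_check([aware, naive]))  # True
# Two values in the same timezone: the count is 2, so it wrongly says "don't".
print(filter_count_check([aware, aware]))  # False
```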

jreback · 2018-01-27T16:52:32Z

pandas/tests/frame/test_apply.py

+ def test_gh_19359_with_and_without_tz(self):
+ # GH #19359
+ def transform_time(x):
+ from dateutil.parser import parse


import at the top

jreback · 2018-01-27T16:53:10Z

pandas/tests/frame/test_apply.py

+ return Series({'time': parse("22:05 UTC+1"),
+ 'title': parse("23:59")})
+
+ applied = DataFrame(["stub"]).apply(transform_time)


result =

construct the expected directly

jreback · 2018-01-27T16:53:41Z

pandas/tests/frame/test_apply.py

+
+ applied = DataFrame(["stub"]).apply(transform_int)
+ assert applied is not None
+ answer = transform_int(1)


instead of duplicating code, pls use parametrize

replaced with parametrize

lingster · 2018-01-28T10:12:41Z

I've refactored the unit tests to use parametrize - learnt something new :)

jreback · 2018-01-31T12:22:32Z

pandas/_libs/src/inference.pyx


 # we try to coerce datetime w/tz but must all have the same tz
 if seen.datetimetz_:
- if len({getattr(val, 'tzinfo', None) for val in objects}) == 1:


This needs to change to something more like:

    tzs = (getattr(val, 'tzinfo', None) for val in objects)
    if len(set(tzs)) and all(arg for arg in tzs is not None):
        ....

You don't need to do all of this checking; we only care that there is one and only one timezone and no timezone-naive objects.

jreback · 2018-03-16T22:08:05Z

can you rebase

lingster · 2018-03-20T21:22:16Z

Rebased and updated proposed fix for review

jreback

can you add a whatsnew note (bug fix, reshaping section)

jreback · 2018-03-25T14:36:40Z

pandas/tests/frame/test_apply.py



 class TestDataFrameAggregate(TestData):
+ _multiprocess_can_split_ = True


remove this

jreback · 2018-03-25T14:37:08Z

pandas/_libs/src/inference.pyx

 break

- # we try to coerce datetime w/tz but must all have the same tz
+ # we try to coerce datetime w/tz but must all have the same tz, ie if we have UTC and PST tzinfo then this will not


can you wrap this comment better

jreback · 2018-03-25T14:38:51Z

pandas/_libs/src/inference.pyx

+ unique_types = set()
+ from dateutil import tz
+ for val in objects:
+ item = getattr(val, 'tzinfo', type(val).__name__)


Why not simply add str(item) for each one? This check seems superfluous.
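The str(item) suggestion amounts to normalizing every key to a string, so the set never has to hash a tzinfo object directly. A plain-Python sketch of that idea, assuming the same getattr(val, 'tzinfo', type(val).__name__) key as the proposed fix and stdlib datetime.timezone objects in place of dateutil ones; unique_tz_keys and can_convert are hypothetical names. One caveat worth checking: two distinct tzinfo objects whose string forms coincide would be treated as the same zone.

```python
from datetime import datetime, timedelta, timezone


def unique_tz_keys(objects):
    """Hypothetical helper: the set of keys the str(item) approach would build.

    Naive datetimes contribute 'None' (their tzinfo), and values without a
    tzinfo attribute contribute their type name, as in the proposed fix.
    """
    return {str(getattr(val, "tzinfo", type(val).__name__)) for val in objects}


def can_convert(objects):
    # Convert only when every value maps to the same key. In the Cython code
    # this branch only runs after a tz-aware value was seen (seen.datetimetz_),
    # so an all-naive input would never reach it.
    return len(unique_tz_keys(objects)) == 1


utc_plus_1 = timezone(timedelta(hours=1))
aware = datetime(2018, 1, 27, 22, 5, tzinfo=utc_plus_1)
naive = datetime(2018, 1, 27, 23, 59)

print(sorted(unique_tz_keys([aware, aware])))  # ['UTC+01:00']
print(sorted(unique_tz_keys([aware, naive])))  # ['None', 'UTC+01:00']
print(sorted(unique_tz_keys([aware, 42])))     # ['UTC+01:00', 'int']
print(can_convert([aware, aware]), can_convert([aware, naive]))  # True False
```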

jreback · 2018-03-25T14:39:29Z

pandas/tests/frame/test_apply.py

 assert_frame_equal(result, df)

+ @pytest.mark.parametrize('time_in', ['22:05 UTC+1', '22:05'])
+ @pytest.mark.parametrize("test_input", [


test_input -> input
time_in -> time

Can we avoid “input”? It would shadow the built-in of the same name.

jreback · 2018-03-25T14:39:57Z

pandas/tests/frame/test_apply.py

+ return Series({'time': parse(time_in),
+ 'title': test_input})
+
+ applied = DataFrame(['stub']).apply(transform)


applied -> result

the assert on applied is unnecessary

jreback · 2018-03-25T14:40:14Z

pandas/tests/frame/test_apply.py

+
+ applied = DataFrame(['stub']).apply(transform)
+ assert applied is not None
+ answer = Series(data=[parse(time_in), test_input],


answer -> expected

jreback · 2018-03-25T14:40:29Z

pandas/tests/frame/test_apply.py

+ assert applied is not None
+ answer = Series(data=[parse(time_in), test_input],
+ index=['time', 'title'])
+ answer.name = 0


you can pass name to the Series constructor, e.g. name=0

jreback · 2018-03-25T14:41:03Z

pandas/tests/frame/test_apply.py

+ parse('15:56 UTC+2'),
+ 42,
+ 3.14159, ])
+ def test_gh_19359(self, time_in, test_input):


add the GH issue number as a comment. You can also rename the test to something more informative.

jreback · 2018-07-07T22:49:41Z

@lingster can you rebase and update

jreback · 2018-10-11T01:54:27Z

closing as stale, if you want to continue working, pls ping and we can re-open. you will need to merge master.

drewmassey · 2019-01-21T00:31:00Z

@jreback Is the only thing missing here the rebase and the cleanup from the previous code?
I can tidy this up so we can start retiring some old issues from the repo.
cc: @lingster

jreback · 2019-01-21T00:34:28Z

yes needs a rebase and updating according to comments

lingster added 6 commits January 27, 2018 07:56

test cases and fixes for bug GH pandas-dev#19359

6a87708

BUG: test cases and fixes for GH19359

6467e39

Merge branch 'gh_19359' of https://github.com/lingster/pandas into gh_19359

0cb1f00

Fix flakes errors

a35b812

Fix flakes errors

bbaf8ce

Merge branch 'gh_19359' of https://github.com/lingster/pandas into gh_19359

a7c1ac0

jreback changed the title ~~BUG: fix GH19359~~ BUG: maybe_convert_objects with convert_datetime Jan 27, 2018

jreback added the Bug, Datetime (Datetime data dtype), and Dtype Conversions (Unexpected or buggy dtype conversions) labels Jan 27, 2018

jreback requested changes Jan 27, 2018


lingster added 2 commits January 28, 2018 09:30

refactor tests to use parametrize as per pull request

d6dc992

comment on the fact that tzoffset is not hashable

a45f0e6

jreback requested changes Jan 31, 2018


lingster added 2 commits March 20, 2018 20:37

merge from upstream

767bf47

updated comment for fix to gh_19359

081817e

jreback requested changes Mar 25, 2018


jreback closed this Oct 11, 2018


lingster commented Jan 27, 2018

codecov bot commented Jan 27, 2018 (edited)

jreback Jan 27, 2018

lingster Jan 28, 2018

jreback Jan 31, 2018

lingster Feb 3, 2018 (edited)

jreback Jan 27, 2018

lingster Jan 30, 2018

jreback Jan 27, 2018

lingster Jan 30, 2018

jreback Jan 27, 2018

lingster Jan 30, 2018

lingster commented Jan 28, 2018

jreback Jan 31, 2018

jreback commented Mar 16, 2018

lingster commented Mar 20, 2018

jreback left a comment

jreback Mar 25, 2018

jreback Mar 25, 2018

jreback Mar 25, 2018

jreback Mar 25, 2018

jbrockmendel Jul 11, 2018

jreback Mar 25, 2018

jreback Mar 25, 2018

jreback Mar 25, 2018

jreback Mar 25, 2018

jreback commented Jul 7, 2018

jreback commented Oct 11, 2018

drewmassey commented Jan 21, 2019

jreback commented Jan 21, 2019


4 participants


