Skip to content

Conversation

@lingster
Copy link

@codecov
Copy link

codecov bot commented Jan 27, 2018

Codecov Report

Merging #19423 into master will decrease coverage by 0.17%.
The diff coverage is n/a.

Impacted file tree graph

@@ Coverage Diff @@ ## master #19423 +/- ## ========================================== - Coverage 91.8% 91.63% -0.18%  ========================================== Files 152 150 -2 Lines 49215 48724 -491 ========================================== - Hits 45181 44646 -535  - Misses 4034 4078 +44
Flag Coverage Δ
#multiple 90% <ø> (-0.19%) ⬇️
#single 41.74% <ø> (-0.11%) ⬇️
Impacted Files Coverage Δ
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/plotting/_compat.py 62% <0%> (-28.91%) ⬇️
pandas/core/missing.py 84.3% <0%> (-7.35%) ⬇️
pandas/plotting/_timeseries.py 60.82% <0%> (-4.49%) ⬇️
pandas/core/reshape/tile.py 90.25% <0%> (-3.12%) ⬇️
pandas/io/html.py 85.98% <0%> (-2.81%) ⬇️
pandas/io/formats/format.py 96.24% <0%> (-2.01%) ⬇️
pandas/plotting/_converter.py 65.22% <0%> (-1.59%) ⬇️
pandas/core/ops.py 95.52% <0%> (-0.84%) ⬇️
pandas/util/_decorators.py 81.66% <0%> (-0.74%) ⬇️
... and 69 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 01882ba...081817e. Read the comment docs.

@jreback jreback changed the title BUG: fix GH19359 BUG: maybe_convert_objects with convert_datetime Jan 27, 2018
@jreback jreback added Bug Datetime Datetime data dtype Dtype Conversions Unexpected or buggy dtype conversions labels Jan 27, 2018

# we try to coerce datetime w/tz but must all have the same tz
if seen.datetimetz_:
if len({getattr(val, 'tzinfo', None) for val in objects}) == 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than this, I think you can do:

if seen.datetimetz_: from pandas import to_datetime return to_datetime(objects, errors='ignore') 

and you can remove the seen.object_ = 1 as this will return a DTI if posible or just the input if not (which is what we want)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I tried this, but it didn't pass my parametrized tests, the timezones need to be unique, so this is the proposed fix:

# we try to coerce datetime w/tz but must all have the same tz if seen.datetimetz_: unique_types = set() from dateutil import tz for val in objects: item = getattr(val, 'tzinfo', type(val).__name__) # as tzoffset is not hashable, we use __repr__ in our set if isinstance(item, tz.tzoffset): unique_types.add(item.__repr__()) else: unique_types.add(item) if len(unique_types) == 1: from pandas import DatetimeIndex return DatetimeIndex(objects) seen.object_ = 1 
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to change to something more like

tzs = (getattr(val, 'tzinfo', None) for val in objects) if len(set(tzs)) and all(arg for arg in tzs is not None): .... 

you don't need to do all of this checking, we only care if there is 1 and only 1 timezone and no non-timezone aware objects.

Copy link
Author

@lingster lingster Feb 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I try to implement as you have suggested I get the following compiler error:

Error compiling Cython file: ------------------------------------------------------------ ... return ints @cython.boundscheck(False) @cython.wraparound(False) def maybe_convert_objects(ndarray[object] objects, bint try_float=0, ^ ------------------------------------------------------------ pandas/_libs/src/inference.pyx:1196:26: Buffer types only allowed as function local variables building 'pandas._libs.lib' extension 

I could get around this by the following:

 tzs = [getattr(val, 'tzinfo', None) for val in objects] if len(list(filter(None, tzs))) == 1: ... 

however this does not work as it will fail in the test cases where you have a timezone aware and non-timezone aware dates:
test.log

hence my original solution of having to check every item and to ensure that all times have the same UTC timezone...

def test_gh_19359_with_and_without_tz(self):
# GH #19359
def transform_time(x):
from dateutil.parser import parse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import at the top

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return Series({'time': parse("22:05 UTC+1"),
'title': parse("23:59")})

applied = DataFrame(["stub"]).apply(transform_time)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result =

construct the expected directly

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


applied = DataFrame(["stub"]).apply(transform_int)
assert applied is not None
answer = transform_int(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of duplicating code, pls use parametrize

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced with parametrize

@lingster
Copy link
Author

I've refactored the unit tests to use parametrize - learnt something new :)


# we try to coerce datetime w/tz but must all have the same tz
if seen.datetimetz_:
if len({getattr(val, 'tzinfo', None) for val in objects}) == 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to change to something more like

tzs = (getattr(val, 'tzinfo', None) for val in objects) if len(set(tzs)) and all(arg for arg in tzs is not None): .... 

you don't need to do all of this checking, we only care if there is 1 and only 1 timezone and no non-timezone aware objects.

@jreback
Copy link
Contributor

jreback commented Mar 16, 2018

can you rebase

@lingster
Copy link
Author

Rebased and updated proposed fix for review

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a whatsnew note (bug fix, reshaping section)



class TestDataFrameAggregate(TestData):
_multiprocess_can_split_ = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this

break

# we try to coerce datetime w/tz but must all have the same tz
# we try to coerce datetime w/tz but must all have the same tz, ie if we have UTC and PST tzinfo then this will not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you wrap this comment better

unique_types = set()
from dateutil import tz
for val in objects:
item = getattr(val, 'tzinfo', type(val).__name__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you not simply adding add str(item) for each one? this check seems superfluous

assert_frame_equal(result, df)

@pytest.mark.parametrize('time_in', ['22:05 UTC+1', '22:05'])
@pytest.mark.parametrize("test_input", [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_input -> input
time_in -> time

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid “input”? Avoid clash with built-in names.

return Series({'time': parse(time_in),
'title': test_input})

applied = DataFrame(['stub']).apply(transform)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applied -> result

the assert on the applied is uncecessary


applied = DataFrame(['stub']).apply(transform)
assert applied is not None
answer = Series(data=[parse(time_in), test_input],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

answer -> expected

assert applied is not None
answer = Series(data=[parse(time_in), test_input],
index=['time', 'title'])
answer.name = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can pass name to the Series constructor, e.g. name=0

parse('15:56 UTC+2'),
42,
3.14159, ])
def test_gh_19359(self, time_in, test_input):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the gh issue number as a comment. you can rename the test name to something more informative

@jreback
Copy link
Contributor

jreback commented Jul 7, 2018

@lingster can you rebase and update

@jreback
Copy link
Contributor

jreback commented Oct 11, 2018

closing as stale, if you want to continue working, pls ping and we can re-open. you will need to merge master.

@jreback jreback closed this Oct 11, 2018
@drewmassey
Copy link

@jreback Is the only thing missing here the rebase and the cleanup from the previous code?
I can tidy this up so we can start retiring some old issues from the repo.
cc: @lingster

@jreback
Copy link
Contributor

jreback commented Jan 21, 2019

yes needs a rebase and updating according to comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Datetime Datetime data dtype Dtype Conversions Unexpected or buggy dtype conversions

4 participants