-
- Notifications
You must be signed in to change notification settings - Fork 19.4k
BUG: maybe_convert_objects with convert_datetime #19423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@ ## master #19423 +/- ## ========================================== - Coverage 91.8% 91.63% -0.18% ========================================== Files 152 150 -2 Lines 49215 48724 -491 ========================================== - Hits 45181 44646 -535 - Misses 4034 4078 +44
Continue to review full report at Codecov.
|
| | ||
| # we try to coerce datetime w/tz but must all have the same tz | ||
| if seen.datetimetz_: | ||
| if len({getattr(val, 'tzinfo', None) for val in objects}) == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than this, I think you can do:
if seen.datetimetz_: from pandas import to_datetime return to_datetime(objects, errors='ignore') and you can remove the seen.object_ = 1 as this will return a DTI if posible or just the input if not (which is what we want)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I tried this, but it didn't pass my parametrized tests, the timezones need to be unique, so this is the proposed fix:
# we try to coerce datetime w/tz but must all have the same tz if seen.datetimetz_: unique_types = set() from dateutil import tz for val in objects: item = getattr(val, 'tzinfo', type(val).__name__) # as tzoffset is not hashable, we use __repr__ in our set if isinstance(item, tz.tzoffset): unique_types.add(item.__repr__()) else: unique_types.add(item) if len(unique_types) == 1: from pandas import DatetimeIndex return DatetimeIndex(objects) seen.object_ = 1 There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to change to something more like
tzs = (getattr(val, 'tzinfo', None) for val in objects) if len(set(tzs)) and all(arg for arg in tzs is not None): .... you don't need to do all of this checking, we only care if there is 1 and only 1 timezone and no non-timezone aware objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I try to implement as you have suggested I get the following compiler error:
Error compiling Cython file: ------------------------------------------------------------ ... return ints @cython.boundscheck(False) @cython.wraparound(False) def maybe_convert_objects(ndarray[object] objects, bint try_float=0, ^ ------------------------------------------------------------ pandas/_libs/src/inference.pyx:1196:26: Buffer types only allowed as function local variables building 'pandas._libs.lib' extension I could get around this by the following:
tzs = [getattr(val, 'tzinfo', None) for val in objects] if len(list(filter(None, tzs))) == 1: ... however this does not work as it will fail in the test cases where you have a timezone aware and non-timezone aware dates:
test.log
hence my original solution of having to check every item and to ensure that all times have the same UTC timezone...
pandas/tests/frame/test_apply.py Outdated
| def test_gh_19359_with_and_without_tz(self): | ||
| # GH #19359 | ||
| def transform_time(x): | ||
| from dateutil.parser import parse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import at the top
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/tests/frame/test_apply.py Outdated
| return Series({'time': parse("22:05 UTC+1"), | ||
| 'title': parse("23:59")}) | ||
| | ||
| applied = DataFrame(["stub"]).apply(transform_time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
result =
construct the expected directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/tests/frame/test_apply.py Outdated
| | ||
| applied = DataFrame(["stub"]).apply(transform_int) | ||
| assert applied is not None | ||
| answer = transform_int(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of duplicating code, pls use parametrize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replaced with parametrize
| I've refactored the unit tests to use parametrize - learnt something new :) |
| | ||
| # we try to coerce datetime w/tz but must all have the same tz | ||
| if seen.datetimetz_: | ||
| if len({getattr(val, 'tzinfo', None) for val in objects}) == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to change to something more like
tzs = (getattr(val, 'tzinfo', None) for val in objects) if len(set(tzs)) and all(arg for arg in tzs is not None): .... you don't need to do all of this checking, we only care if there is 1 and only 1 timezone and no non-timezone aware objects.
| can you rebase |
| Rebased and updated proposed fix for review |
jreback left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a whatsnew note (bug fix, reshaping section)
| | ||
| | ||
| class TestDataFrameAggregate(TestData): | ||
| _multiprocess_can_split_ = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this
| break | ||
| | ||
| # we try to coerce datetime w/tz but must all have the same tz | ||
| # we try to coerce datetime w/tz but must all have the same tz, ie if we have UTC and PST tzinfo then this will not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you wrap this comment better
| unique_types = set() | ||
| from dateutil import tz | ||
| for val in objects: | ||
| item = getattr(val, 'tzinfo', type(val).__name__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you not simply adding add str(item) for each one? this check seems superfluous
| assert_frame_equal(result, df) | ||
| | ||
| @pytest.mark.parametrize('time_in', ['22:05 UTC+1', '22:05']) | ||
| @pytest.mark.parametrize("test_input", [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_input -> input
time_in -> time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid “input”? Avoid clash with built-in names.
| return Series({'time': parse(time_in), | ||
| 'title': test_input}) | ||
| | ||
| applied = DataFrame(['stub']).apply(transform) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
applied -> result
the assert on the applied is uncecessary
| | ||
| applied = DataFrame(['stub']).apply(transform) | ||
| assert applied is not None | ||
| answer = Series(data=[parse(time_in), test_input], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
answer -> expected
| assert applied is not None | ||
| answer = Series(data=[parse(time_in), test_input], | ||
| index=['time', 'title']) | ||
| answer.name = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can pass name to the Series constructor, e.g. name=0
| parse('15:56 UTC+2'), | ||
| 42, | ||
| 3.14159, ]) | ||
| def test_gh_19359(self, time_in, test_input): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the gh issue number as a comment. you can rename the test name to something more informative
| @lingster can you rebase and update |
| closing as stale, if you want to continue working, pls ping and we can re-open. you will need to merge master. |
| yes needs a rebase and updating according to comments |
git diff upstream/master -u -- "*.py" | flake8 --diff