BUG: timezone lost in groupby-agg with cython functions

xref #10668 (for more examples)

Hello!

I'm running into some odd behavior trying to group rows of a pandas dataframe by ID and then selecting out max/min datetimes (w/ timezones). This is with python 2.7, pandas 0.18.1 and numpy 1.11.1 (I saw in earlier posts a similar problem was apparently fixed w/ pandas 0.15).

Specifically, if I try:
print orders.groupby('OrderID')['start_time'].agg(np.min).iloc[:5]

I get:

OrderID O161101XVS100000044 2016-11-01 12:03:12.920000-04:00 O161101XVS100000047 2016-11-01 12:03:36.693000-04:00 O161101XVS100000098 2016-11-01 12:09:08.330000-04:00 O161101XVS100000122 2016-11-01 12:09:59.950000-04:00 O161101XVS100000152 2016-11-01 12:11:29.790000-04:00 Name: start_time, dtype: datetime64[ns, US/Eastern]

Where the raw data had times closer to 8 am (US/Eastern). In other words, it reverted back to UTC times, even though it says it's eastern times, and has UTC-4 offset.

But if I instead try:
print orders.groupby('OrderID')['start_time'].agg(lambda x: np.min(x)).iloc[:5]

I now get:

OrderID O161101XVS100000044 2016-11-01 08:03:12.920000-04:00 O161101XVS100000047 2016-11-01 08:03:36.693000-04:00 O161101XVS100000098 2016-11-01 08:09:08.330000-04:00 O161101XVS100000122 2016-11-01 08:09:59.950000-04:00 O161101XVS100000152 2016-11-01 08:11:29.790000-04:00 Name: start_time, dtype: datetime64[ns, US/Eastern]

Which is the behavior I intended. This second method is vastly slower, and I would have assumed the two approaches would yield identical results ...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

BUG: timezone lost in groupby-agg with cython functions #15426

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

BUG: timezone lost in groupby-agg with cython functions #15426

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions