Skip to content

Conversation

@jbrockmendel
Copy link
Member

We could plausibly disallow mean for date types, matching what we do for PeriodDtype.

Fixes 232 xfails, cuts test runtime by 88 seconds.

@jbrockmendel jbrockmendel added the Arrow pyarrow functionality label Jan 26, 2023
result = result.cast(pa_type)
elif pa.types.is_time(pa_type):
# TODO: with time types we should probably retain "unit",
# but not clear how to get that since pa_type.unit raises
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this on an older version of pyarrow?

In [6]: t = pa.time32("ms") In [7]: t.unit Out[7]: 'ms' In [8]: pa.__version__ Out[8]: '10.0.1' In [9]: t = pa.time64("ns") In [10]: t.unit Out[10]: 'ns' In [11]: t = pa.time32("ms") In [12]: t.unit Out[12]: 'ms' 
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like it depends on exactly what we're checking. Here pa_type is defined as pa_type = self._data.type which DataType(time32[ms]), which does not have a .unit attribute in 10.0.1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so how do I get the Time32Type(time32[ms]) instead of the DataType(time32[ms]) here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. Yeah I don't see a way to do that. Any insight @jorisvandenbossche?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this was only recently fixed (should be in pyarrow 11): apache/arrow#14633

When getting the type from the array, it wasn't correctly boxed in the Time specific class, and so the unit attribute is not available. I am not directly sure of a workaround for older pyarrow versions, except for parsing that from the str repr of the type ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so something like?

def get_unit_from_pyarrow_dtype(pa_dtype): if pa_lt_11: return str(pa_dtype).split("[")[-1][:-1] return pa_dtype.unit 
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that should work I think

@mroeschke mroeschke added this to the 2.0 milestone Feb 9, 2023
@mroeschke mroeschke merged commit 52306d9 into pandas-dev:main Feb 9, 2023
@mroeschke
Copy link
Member

Thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the enh-pyarrow-reductions-4 branch February 9, 2023 17:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Arrow pyarrow functionality

3 participants