Skip to content

Conversation

@mroeschke
Copy link
Member

@mroeschke mroeschke commented Apr 6, 2023

@mroeschke mroeschke added Datetime Datetime data dtype Arrow pyarrow functionality labels Apr 6, 2023
@mroeschke mroeschke added this to the 2.1 milestone Apr 6, 2023
@mroeschke
Copy link
Member Author

@phofl do you prefer backporting this too?

@phofl
Copy link
Member

phofl commented Apr 11, 2023

No real opinion here, I think the string equivalent is used more widely so was better to back port the split pr

Comment on lines 2121 to 2125
@property
def _dt_is_quarter_start(self):
is_correct_month = pc.is_in(pc.month(self._pa_array), pa.array([1, 4, 7, 10]))
is_first_day = pc.equal(pc.day(self._pa_array), 1)
return type(self)(pc.and_(is_correct_month, is_first_day))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use pc.floor_temporal? E.g:

return pc.equal(pc.floor_temporal(self._pa_array, unit='quarter'), self._pa_array)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For is_quarter_start, even if the date has non zero hour, minute, etc components we want to return true if the month and day are correct, so unfortunately we don't want the flooring/ceiling of those components e.g.

In [15]: ser Out[15]: 0 2023-11-30 03:00:00 1 2023-01-01 03:00:00 2 2023-03-31 03:00:00 3 NaT dtype: datetime64[ns] In [16]: ser.dt.is_quarter_start Out[16]: 0 False 1 True 2 False 3 False dtype: bool 
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. How about:

pc.equal(pc.floor_temporal(self._pa_array, unit='quarter'), pc.floor_temporal(self._pa_array, unit='day'))
Comment on lines 2127 to 2132
@property
def _dt_is_quarter_end(self):
is_correct_month = pc.is_in(pc.month(self._pa_array), pa.array([3, 6, 9, 12]))
plus_one_day = pc.add(self._pa_array, pa.scalar(datetime.timedelta(days=1)))
is_first_day = pc.equal(pc.day(plus_one_day), 1)
return type(self)(pc.and_(is_correct_month, is_first_day))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above but with pc.ceil_temporal.

Comment on lines 2134 to 2146
@property
def _dt_days_in_month(self):
pa_array = self._pa_array
if self.dtype.pyarrow_dtype.unit != "ns":
pa_array = self._pa_array.cast(
pa.timestamp("ns", tz=self.dtype.pyarrow_dtype.tz)
)
pa_array_int = pa_array.cast(pa.int64())
if self._hasna:
pa_array_int = pa_array_int.fill_null(iNaT)
np_result = get_date_field(pa_array_int.to_numpy(), "dim")
mask = np_result == -1
return type(self)(pa.array(np_result, mask=mask))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might make sense to implement upstream. Otherwise this might work:

pc.days_between(pc.floor_temporal(self._pa_array, unit='month'), pc.ceil_temporal(self._pa_array, unit='month'))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This works perfectly

Comment on lines 2298 to 2301
if ambiguous != "raise":
raise NotImplementedError(f"{ambiguous=} is not supported")
if nonexistent != "raise":
raise NotImplementedError(f"{nonexistent=} is not supported")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these checks really required? Do we actually need ambiguous/nonexistent?

Comment on lines 2303 to 2309
if tz is None:
result = self._pa_array.cast(pa.timestamp(current_unit, "UTC")).cast(
pa.timestamp(current_unit)
)
else:
pa_tz = str(tz)
result = self._pa_array.cast(pa.timestamp(current_unit, pa_tz))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could probably reduce this to just:

Suggested change
if tz is None:
result = self._pa_array.cast(pa.timestamp(current_unit, "UTC")).cast(
pa.timestamp(current_unit)
)
else:
pa_tz = str(tz)
result = self._pa_array.cast(pa.timestamp(current_unit, pa_tz))
result = self._pa_array.cast(pa.timestamp(current_unit, pa_tz))
def _dt_is_quarter_end(self):
result = pc.equal(
pc.ceil_temporal(self._pa_array, unit="quarter"),
pc.ceil_temporal(self._pa_array, unit="day"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this have to be:

Suggested change
pc.ceil_temporal(self._pa_array, unit="day"),
pc.floor_temporal(self._pa_array, unit="day"),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so I just noticed pc.ceil_temporal(self._pa_array, unit="quarter") will return the start of the new quarter

(Pdb) self._pa_array <pyarrow.lib.ChunkedArray object at 0x14764cb30> [ [ 2023-11-30 03:00:00.000000, 2023-01-01 03:00:00.000000, 2023-03-31 03:00:00.000000, null ] ] (Pdb) pc.ceil_temporal(self._pa_array, unit="quarter") <pyarrow.lib.ChunkedArray object at 0x1478e1860> [ [ 2024-01-01 00:00:00.000000, 2023-04-01 00:00:00.000000, 2023-04-01 00:00:00.000000, null ] ] (Pdb) pc.floor_temporal(self._pa_array, unit="day") <pyarrow.lib.ChunkedArray object at 0x1478e1950> [ [ 2023-11-30 00:00:00.000000, 2023-01-01 00:00:00.000000, 2023-03-31 00:00:00.000000, null ] ] 

so I would need to check if pc.floor_temporal(self._pa_array, unit="day") + 1 day = pc.ceil_temporal(self._pa_array, unit="quarter")?

Copy link
Contributor

@rok rok Apr 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed! So something like:

return pc.equal(pc.days_between(pc.floor_temporal(array, unit="day"), pc.ceil_temporal(array, unit="quarter")), 1)
@mroeschke
Copy link
Member Author

Thanks for all the reviews @rok!

@mroeschke
Copy link
Member Author

Going to merge this in. Can address any follow ups if needed

@mroeschke mroeschke merged commit 367d24f into pandas-dev:main Apr 21, 2023
@mroeschke mroeschke deleted the enh/arrow/more_dt branch April 21, 2023 17:59
Rylie-W pushed a commit to Rylie-W/pandas that referenced this pull request May 19, 2023
…andas-dev#52503) * Add more properties & attributes * Add issue number * Add xfails * Simplify days_in_month * Add tz_convert * Undo quarter * Add another issue * simplify is_quarter * undo test * simplify * fix is_quarter_end * Address is_month_end * Remove unused
@rhshadrach rhshadrach mentioned this pull request Aug 15, 2023
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Arrow pyarrow functionality Datetime Datetime data dtype

3 participants