API: timestamp resolution inference: default to microseconds when possible #62031
Conversation
```python
tm.assert_index_equal(result, expected)


class TestToDatetimeInferUnit:
```
I assume this is already tested elsewhere (in various places?), but while developing I just wrote some simple tests for the different cases I encountered.
For example, I now see there are pandas/tests/tslibs/test_array_to_datetime.py::TestArrayToDatetimeResolutionInference and pandas/tests/tslibs/test_strptime.py::TestArrayStrptimeResolutionInference that are failing, so I can integrate those tests.
@jbrockmendel would you have time to give this a review?
Yes, but it's in line behind a few other reviews I owe.
```cython
    np.iinfo(np.int64).min + 1, NPY_DATETIMEUNIT.NPY_FR_us, &dts_us_min
)
pandas_datetime_to_datetimestruct(
    np.iinfo(np.int64).max, NPY_DATETIMEUNIT.NPY_FR_us, &dts_us_max
```
`util.INT64_MAX`
If you implement this in `np_datetime.pyx` you can reuse `_NS_MIN_DTS`, `_NS_MAX_DTS`, etc.
```cython
# Similar as above, but taking the actual datetime value in account,
# defaulting to 'us' if possible.
if reso == NPY_DATETIMEUNIT.NPY_FR_GENERIC:
    return NPY_DATETIMEUNIT.NPY_FR_ns
```
should this be FR_us?
```cython
# defaulting to 'us' if possible.
if reso == NPY_DATETIMEUNIT.NPY_FR_GENERIC:
    return NPY_DATETIMEUNIT.NPY_FR_ns
# if dts.ps != 0:
```
why are the .ps checks not necessary?
Small comments, no complaints about the approach. Haven't looked at the tests yet since I don't expect any surprises; will do so once green.
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
```python
    }
)
if parser.engine == "pyarrow":
    expected["a"] = expected["a"].dt.as_unit("s")
```
I had to make the same change in my branch. Does this mean the pyarrow CSV parser is giving back second unit? I thought it standardized on "us".
pyarrow doesn't really have a default; essentially you always have to specify a precision yourself when manually creating an array / specifying the type.
But for string parsing in the CSV reader, it will indeed be data dependent, unfortunately (for example, for something like "2012-01-01" in a CSV file, it will also use a date type and not a timestamp).
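On the pandas side, `as_unit` is how the test above normalizes whatever resolution the parser produced. A small illustrative example (not from the PR itself, assuming pandas 2.x):

```python
import pandas as pd

# Illustrative only: whatever unit was inferred at parse time,
# Series.dt.as_unit("s") converts to second resolution, matching what
# the test does for the pyarrow engine.
s = pd.Series(pd.to_datetime(["2012-01-01", "2012-01-02"]))
s_sec = s.dt.as_unit("s")
assert str(s_sec.dtype) == "datetime64[s]"
assert s_sec[0] == pd.Timestamp("2012-01-01")
```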
```python
# GH 5869
# datetimelike dtype conversion from int
df = DataFrame({"A": Timestamp("20130101"), "B": np.arange(5)})
# TODO: can we retain second reso in .apply here?
```
is this comment no longer desirable?
I suppose there is still value in the question about retaining resolution in apply, but it no longer applies to this test: the test data will now default to microseconds and the result will be microseconds as well.
I'm having second thoughts about pushing forward on my branch, so posting the most-worth-salvaging parts:
Draft PR for #58989
This should already make sure that we consistently use 'us' when converting non-numeric data in `pd.to_datetime` and `pd.Timestamp`, but if we want to do this, this PR still requires updating lots of tests and docs (and whatsnew) and cleaning up.
Currently the changes here ensure that we use microseconds more consistently when inferring the resolution while creating datetime64 data. Exceptions: when the data don't fit in the 'us' range (either out of bounds, falling back to 'ms' or 's', or carrying nanoseconds or below, falling back to 'ns'), or when the input data already has a resolution defined (Timestamp objects, or numpy datetime64 data).
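A rough sketch of those inference rules through the public `Timestamp` API (illustrative only; the exact default for plain values depends on the pandas version, since this PR proposes changing it to 'us'):

```python
import numpy as np
import pandas as pd

# Inputs that already carry a resolution keep it:
ts = pd.Timestamp(np.datetime64("2012-01-01", "s"))
assert ts.unit == "s"

# A nanosecond component doesn't fit in 'us', so 'ns' is used:
ts_ns = pd.Timestamp("2012-01-01 00:00:00.123456789")
assert ts_ns.unit == "ns"

# Plain date-like strings: this PR proposes 'us' as the default;
# released pandas versions may infer 's' here instead.
ts_plain = pd.Timestamp("2012-01-01")
assert ts_plain.unit in ("s", "us")
```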