PERF: read_csv should check if column is already a datetime column before initiating the conversion #52546

@phofl

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this issue exists on the latest version of pandas.

  • I have confirmed this issue exists on the main branch of pandas.

Reproducible Example

    import pandas as pd
    import pyarrow as pa

    dr = pd.Series(pd.date_range("2019-12-31", periods=1_000_000, freq="s").astype(pd.ArrowDtype(pa.timestamp(unit="ns"))), name="a")
    dr.to_csv("tmp.csv")
    pd.read_csv("tmp.csv", engine="pyarrow", dtype_backend="pyarrow", parse_dates=["a"])

The read call takes 1.6 seconds. Without parse_dates it drops to 0.01 seconds, and pyarrow already returns the column as a timestamp:

                int64[pyarrow]
    a    timestamp[s][pyarrow]
    dtype: object

I guess this was introduced by the dtype backend, so I would like to fix it soonish.
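For illustration, the check asked for in the title could look roughly like the sketch below. This is not the pandas implementation: `is_already_datetime` and `maybe_parse_dates` are hypothetical names, and the sketch assumes the column's dtype exposes a NumPy-style one-letter `kind` code where "M" means datetime. Stand-in objects are used so it runs without pandas installed.

```python
from types import SimpleNamespace
from typing import Any, Callable


def is_already_datetime(column: Any) -> bool:
    """Return True when the column's dtype reports a datetime kind ("M").

    Assumption: the dtype exposes a NumPy-style one-letter `kind` code.
    """
    dtype = getattr(column, "dtype", None)
    return getattr(dtype, "kind", None) == "M"


def maybe_parse_dates(column: Any, converter: Callable[[Any], Any]) -> Any:
    """Hypothetical helper: run `converter` only if the column is not a datetime yet."""
    if is_already_datetime(column):
        return column  # skip the expensive conversion entirely
    return converter(column)


# Stand-in columns (not real pandas objects) to exercise both branches.
timestamp_col = SimpleNamespace(dtype=SimpleNamespace(kind="M"))
string_col = SimpleNamespace(dtype=SimpleNamespace(kind="O"))

calls = []


def fake_converter(col: Any) -> str:
    """Record each call so we can see which columns were actually converted."""
    calls.append(col)
    return "converted"


result_ts = maybe_parse_dates(timestamp_col, fake_converter)
result_str = maybe_parse_dates(string_col, fake_converter)
```

Here only `string_col` reaches the converter. In a real reader the converter would be whatever date parsing `parse_dates` triggers, and the dtype check would have to cover whichever dtypes the backend can return.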

Installed Versions

main

Prior Performance

No response

Labels

  • Arrow: pyarrow functionality

  • IO CSV: read_csv, to_csv

  • Performance: memory or execution speed performance