Closed
Labels
- Arrow: pyarrow functionality
- IO CSV: read_csv, to_csv
- Performance: Memory or execution speed performance
Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this issue exists on the latest version of pandas.
- I have confirmed this issue exists on the main branch of pandas.
Reproducible Example
    import pandas as pd
    import pyarrow as pa

    # One million second-resolution timestamps stored as an Arrow-backed Series
    dr = pd.Series(pd.date_range("2019-12-31", periods=1_000_000, freq="s").astype(pd.ArrowDtype(pa.timestamp(unit="ns"))), name="a")
    dr.to_csv("tmp.csv")

    pd.read_csv("tmp.csv", engine="pyarrow", dtype_backend="pyarrow", parse_dates=["a"])

The read call takes 1.6 seconds. Without parse_dates it drops to 0.01 seconds, and pyarrow already infers a timestamp dtype for the column:

                   int64[pyarrow]
    a       timestamp[s][pyarrow]
    dtype: object

I guess this was introduced by the dtype backend, so I would like to fix it soonish.
Installed Versions
main
Prior Performance
No response