BUG: Pyarrow engine doesn't seem to support \s+ but error message implies it does? #52554

@IgnacioJPickering

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas
pandas.read_csv("", sep="\s+", engine="pyarrow")

Issue Description

This fails with the following error:

ValueError: the 'pyarrow' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex) 

Expected Behavior

I'm not sure if pyarrow is meant to support \s+. If pyarrow supports it, then this should not fail. If pyarrow does not support it, then I believe the error should be modified to reflect this, since it now seems to imply that \s+ is not interpreted as a regex, so pyarrow should support it.

Update: I looked in the main branch and it seems that pyarrow does not support \s+, so changing the error message should be enough.
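To make the proposal concrete, here is a rough sketch of what a clearer message could look like. This is not pandas' actual code: the function name `validate_pyarrow_sep` and the exact wording are made up for illustration, and it assumes the pyarrow engine only accepts single-character separators.

```python
def validate_pyarrow_sep(sep: str) -> None:
    """Hypothetical check (not pandas' real implementation).

    Assumes the pyarrow engine only accepts single-character separators,
    so both regex separators and the special-cased '\\s+' are rejected.
    """
    if len(sep) > 1:
        raise ValueError(
            "the 'pyarrow' engine does not support regex separators or "
            r"the whitespace separator '\s+'; use a single-character "
            f"separator or a different engine (got sep={sep!r})"
        )


# A single-character separator passes silently.
validate_pyarrow_sep(",")

# '\s+' is rejected, and the message now names it explicitly.
try:
    validate_pyarrow_sep(r"\s+")
except ValueError as exc:
    message = str(exc)
```

With wording along these lines, the message no longer suggests that \s+ is an exception the pyarrow engine can handle.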

Installed Versions

INSTALLED VERSIONS
------------------
commit : 478d340667831908b5b4bf09a2787a11a14560c9
python : 3.11.0.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-69-generic
Version : #76~20.04.1-Ubuntu SMP Mon Mar 20 15:54:19 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.0.0
numpy : 1.24.2
pytz : 2023.2
dateutil : 2.8.2
setuptools : 67.6.0
pip : 23.0.1
Cython : None
pytest : 7.2.2
hypothesis : None
sphinx : 6.1.3
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli :
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

Labels

Arrow (pyarrow functionality), Bug, Error Reporting (Incorrect or improved errors from pandas), IO CSV (read_csv, to_csv)
