Open
Labels
Arrow (pyarrow functionality), Bug, Error Reporting (Incorrect or improved errors from pandas), IO CSV (read_csv, to_csv)
Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas
pandas.read_csv("", sep=r"\s+", engine="pyarrow")
Issue Description
This fails with the following error:
ValueError: the 'pyarrow' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex)
Expected Behavior
I'm not sure whether pyarrow is meant to support \s+. If it does, this call should not fail. If it does not, I believe the error message should be changed to say so: as written, it implies that \s+ is not interpreted as a regex, and therefore that the pyarrow engine should accept it.
Update: I looked at the main branch and it seems that pyarrow does not support \s+, so changing the error message should be enough.
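To make the suggestion concrete, here is a minimal sketch of the wording I have in mind. It is not the pandas implementation: `separator_error` and `check_separator` are hypothetical helpers, and the per-engine rules are assumptions read off the current error message (single-character separators work everywhere, the python engine accepts regex separators, and only the C engine special-cases \s+).

```python
from typing import Optional

# The one multi-character separator that the message singles out.
WHITESPACE_SEP = r"\s+"


def separator_error(sep: str, engine: str) -> Optional[str]:
    """Return an error message if `engine` cannot handle `sep`, else None.

    Hypothetical helper for illustration only; the rules below are
    assumptions, not a copy of the pandas source.
    """
    if len(sep) <= 1:
        return None  # assumed: single-character separators are always fine
    if engine == "python":
        return None  # assumed: the python engine accepts regex separators
    if engine == "c" and sep == WHITESPACE_SEP:
        return None  # assumed: the C engine special-cases whitespace
    if sep == WHITESPACE_SEP:
        # Separate wording so the message no longer implies '\s+' is accepted.
        return "the '{}' engine does not support the '{}' separator".format(
            engine, WHITESPACE_SEP
        )
    return (
        "the '{}' engine does not support regex separators "
        "(separators > 1 char are interpreted as regex)".format(engine)
    )


def check_separator(sep: str, engine: str) -> None:
    """Raise ValueError with the message above when the combination is rejected."""
    message = separator_error(sep, engine)
    if message is not None:
        raise ValueError(message)
```

With this wording, `check_separator(r"\s+", "pyarrow")` raises a ValueError that names \s+ directly, while other multi-character separators keep the regex explanation.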
Installed Versions
INSTALLED VERSIONS
------------------
commit           : 478d340667831908b5b4bf09a2787a11a14560c9
python           : 3.11.0.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.15.0-69-generic
Version          : #76~20.04.1-Ubuntu SMP Mon Mar 20 15:54:19 UTC 2023
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 2.0.0
numpy            : 1.24.2
pytz             : 2023.2
dateutil         : 2.8.2
setuptools       : 67.6.0
pip              : 23.0.1
Cython           : None
pytest           : 7.2.2
hypothesis       : None
sphinx           : 6.1.3
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.1.2
IPython          : 8.12.0
pandas_datareader: None
bs4              : None
bottleneck       : None
brotli           :
fastparquet      : None
fsspec           : None
gcsfs            : None
matplotlib       : None
numba            : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 11.0.0
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : None
snappy           : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
zstandard        : None
tzdata           : 2023.3
qtpy             : None
pyqt5            : None