Skip to content

unhelpful error message when header is a list of names in read_csv #16338

@pierre-haessig

Description

@pierre-haessig

This is a minor issue about error reporting to the mindless user (me...) who confuses the header and the name argument of read_csv. Basically, when calling read_csv with header=['a', 'b'] (whereas it should be names=['a', 'b']), the error message is crytic:

TypeError: must be str, not int

(pandas 0.20.1, see details below)

Two issues:

  • unhelpful, quite cryptic message, doesn't point in the good direction. E.g. it doesn't explain which argument causes the problem. Of course in the dummy example below, there is just one argument, but in the real case where I got bitten it was messier...
  • it is impossible to debug with %debug magic, because error is raised in the compiled code parsers.pyx

Here is code to reproduce the error message, taken from a IPython session. (First line may be a bit Unix specific, sorry. It's just to create a dummy CSV file)

In [] !echo '1,2\n3,4' > 1234.csv In [] pd.read_csv('1234.csv')	1	2 0	3	4 In [] pd.read_csv('1234.csv', names=['a', 'b']) # proper call	a	b 0	1	2 1	3	4 In [] pd.read_csv('1234.csv', header=['a', 'b']) # beginer's mistake TypeError Traceback (most recent call last) <ipython-input-5-b065bd1f57c6> in <module>() ----> 1 pd.read_csv('1234.csv', header=['a', 'b']) /home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision) 653 skip_blank_lines=skip_blank_lines) 654 --> 655 return _read(filepath_or_buffer, kwds) 656 657 parser_f.__name__ = name /home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds) 403 404 # Create the parser. --> 405 parser = TextFileReader(filepath_or_buffer, **kwds) 406 407 if chunksize or iterator: /home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds) 760 self.options['has_index_names'] = kwds['has_index_names'] 761 --> 762 self._make_engine(self.engine) 763 764 def close(self): /home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _make_engine(self, engine) 964 def _make_engine(self, engine='c'): 965 if engine == 'c': --> 966 self._engine = CParserWrapper(self.f, **self.options) 967 else: 968 if engine == 'python': /home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds) 1580 kwds['allow_leading_cols'] = self.index_col is not False 1581 -> 1582 self._reader = parsers.TextReader(src, **kwds) 1583 1584 # XXX pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__ (pandas/_libs/parsers.c:5996)() TypeError: must be str, not int 

Expected Output

I'm not expecting a fancy AI-assistant like error message. However, an early check of the header argument should verify, in coherence with the docstring, that header should be int or list of ints.

What do you think? Is it an overkill?

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.1.final.0 python-bits: 64 OS: Linux OS-release: 4.9.0-2-amd64 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_FR.utf8 LOCALE: fr_FR.UTF-8

pandas: 0.20.1
pytest: 3.0.5
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
s3fs: None
pandas_gbq: None
pandas_datareader: None

Metadata

Metadata

Assignees

No one assigned

    Labels

    Error ReportingIncorrect or improved errors from pandasIO CSVread_csv, to_csv

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions