Description
This is a minor issue about error reporting to the mindless user (me...) who confuses the header and names arguments of read_csv. Basically, when calling read_csv with header=['a', 'b'] (whereas it should be names=['a', 'b']), the error message is cryptic:
TypeError: must be str, not int
(pandas 0.20.1, see details below)
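To make the header/names distinction concrete, here is a simplified, stdlib-only model of what the two arguments mean. The `split_labels` helper is hypothetical and is not how pandas parses files. It only illustrates that `header` is a row *number* whose row supplies the labels, while `names` is a list of *labels*, in which case every row of the file is data.

```python
import csv
import io
from typing import List, Optional, Tuple


def split_labels(text: str, header: int = 0,
                 names: Optional[List[str]] = None) -> Tuple[List[str], List[List[str]]]:
    """Hypothetical, simplified model of read_csv's header= vs names=.

    Returns (column_labels, data_rows) as lists of strings.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if names is not None:
        # names= supplies the labels directly, so every row is data.
        return list(names), rows
    # header= is an index: that row holds the labels, later rows are data.
    return rows[header], rows[header + 1:]
```

With the file from the reproduction below (`'1,2\n3,4\n'`), the default call uses the first row `['1', '2']` as labels and keeps only `['3', '4']` as data, while `names=['a', 'b']` keeps both rows as data under the labels `a` and `b`.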
Two issues:
- unhelpful, quite cryptic message that doesn't point in the right direction. For example, it doesn't say which argument causes the problem. Of course, in the dummy example below there is just one argument, but in the real case where I got bitten it was messier...
- it is impossible to debug with the %debug magic, because the error is raised in the compiled code (parsers.pyx).
Here is code to reproduce the error message, taken from an IPython session. (The first line may be a bit Unix-specific, sorry; it just creates a dummy CSV file.)
```
In [] !echo '1,2\n3,4' > 1234.csv

In [] pd.read_csv('1234.csv')
   1  2
0  3  4

In [] pd.read_csv('1234.csv', names=['a', 'b'])  # proper call
   a  b
0  1  2
1  3  4

In [] pd.read_csv('1234.csv', header=['a', 'b'])  # beginner's mistake
TypeError                                 Traceback (most recent call last)
<ipython-input-5-b065bd1f57c6> in <module>()
----> 1 pd.read_csv('1234.csv', header=['a', 'b'])

/home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    653 skip_blank_lines=skip_blank_lines)
    654
--> 655 return _read(filepath_or_buffer, kwds)
    656
    657 parser_f.__name__ = name

/home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    403
    404 # Create the parser.
--> 405 parser = TextFileReader(filepath_or_buffer, **kwds)
    406
    407 if chunksize or iterator:

/home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    760 self.options['has_index_names'] = kwds['has_index_names']
    761
--> 762 self._make_engine(self.engine)
    763
    764 def close(self):

/home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
    964 def _make_engine(self, engine='c'):
    965 if engine == 'c':
--> 966 self._engine = CParserWrapper(self.f, **self.options)
    967 else:
    968 if engine == 'python':

/home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1580 kwds['allow_leading_cols'] = self.index_col is not False
   1581
-> 1582 self._reader = parsers.TextReader(src, **kwds)
   1583
   1584 # XXX

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__ (pandas/_libs/parsers.c:5996)()

TypeError: must be str, not int
```

Expected Output
I'm not expecting a fancy, AI-assistant-like error message. However, an early check of the header argument could verify, consistently with the docstring, that header is an int or a list of ints.
What do you think? Is it overkill?
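A minimal sketch of the kind of early check proposed above, assuming it runs before the arguments reach the compiled parser. The function name `validate_header` and the exact wording of the messages are hypothetical, not existing pandas API. It accepts `'infer'`, `None`, an int, or a list/tuple of ints (following the docstring's "int or list of ints"), and when it sees a list of strings it hints that `names=` was probably intended. Rejecting `True`/`False` is an extra assumption, since `bool` is a subclass of `int` in Python.

```python
from numbers import Integral
from typing import Any


def validate_header(header: Any) -> None:
    """Hypothetical early check for read_csv's `header` argument.

    Accepts 'infer', None, an int, or a list/tuple of ints. Anything else
    raises a TypeError whose message names the offending argument.
    """
    if header is None or (isinstance(header, str) and header == "infer"):
        return
    # bool is a subclass of int; rejecting True/False is an assumption here.
    if isinstance(header, Integral) and not isinstance(header, bool):
        return
    if isinstance(header, (list, tuple)):
        if all(isinstance(h, Integral) and not isinstance(h, bool) for h in header):
            return
        hint = ""
        if header and all(isinstance(h, str) for h in header):
            # A list of strings looks like column labels, i.e. names=.
            hint = " (to set column labels, pass them via names= instead)"
        raise TypeError(
            f"header must be an int or a list of ints, got {header!r}{hint}"
        )
    raise TypeError(
        f"header must be an int or a list of ints, got {type(header).__name__}"
    )
```

With this check, `validate_header(['a', 'b'])` raises `TypeError: header must be an int or a list of ints, got ['a', 'b'] (to set column labels, pass them via names= instead)`, which names the argument and points toward the fix. Because it runs in pure Python, `%debug` can also step into it.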
Output of pd.show_versions()
pandas: 0.20.1
pytest: 3.0.5
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
s3fs: None
pandas_gbq: None
pandas_datareader: None