Closed
Labels
Docs, IO CSV, Missing-data
Description
Given data such as the following, in CSV format:
header 1, header 2
data 1, data 2
data 3, data 4, data 5
On import, Pandas will fail with the error:
ParserError: Error tokenizing data. C error: Expected 2 fields in line 2, saw 3
There are many ways around this error:
- Loading the file in a spreadsheet program (e.g. Excel) and saving it as a new file
- Specifying usecols
- Setting error_bad_lines to False
But there are instances where these workarounds are impractical:
- When you want to preserve all of your data (usecols and error_bad_lines both discard data)
- When using a spreadsheet program is not feasible (e.g. encodings not supported by Excel, a large number of files, or especially the combination of the two)
Desired feature
In read_csv and read_table, add an expand_header option which, when set to True, sets the width of the DataFrame to that of the widest row. Cells left without data as a result would be blank or NaN.
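To make the request concrete, here is a rough sketch of the behaviour I have in mind, written with only the standard library csv module rather than pandas. The function name read_ragged_csv, the fill parameter, and the unnamed_N column names are all made up for illustration; none of them are part of pandas, and the real option might well name the extra columns differently.

```python
import csv
import io


def read_ragged_csv(text, fill=None):
    """Parse CSV text whose rows may be wider than the header.

    Returns (header, rows), each padded to the width of the widest row.
    Hypothetical helper: this only illustrates the requested behaviour.
    """
    # skipinitialspace drops the blank that follows each comma in the sample.
    # Completely empty lines are ignored.
    records = [r for r in csv.reader(io.StringIO(text), skipinitialspace=True) if r]
    if not records:
        return [], []

    # The frame should be as wide as the widest row, header included.
    width = max(len(r) for r in records)
    header, *rows = records

    # Assumed naming scheme for columns the header does not cover.
    header = header + [f"unnamed_{i}" for i in range(len(header), width)]

    # Pad short rows so that no data is dropped and every row has `width` cells.
    rows = [r + [fill] * (width - len(r)) for r in rows]
    return header, rows


sample = "header 1, header 2\ndata 1, data 2\ndata 3, data 4, data 5\n"
header, rows = read_ragged_csv(sample)
print(header)
print(rows)
```

The padded result could then be handed to something like pd.DataFrame(rows, columns=header), where the fill value None should be treated as missing. This needs the whole file in memory, so it is only meant to show the intended output shape, not how the parser itself should do it.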