
[Feature Request] On import, allow option for number of fields to match widest row of data (filling missing with NaN) #17319

@joshjacobson

Description

Given data such as the following, in CSV format:

header 1, header 2
data 1, data 2
data 3, data 4, data 5

On import, pandas fails with the error:

ParserError: Error tokenizing data. C error: Expected 2 fields in line 2, saw 3
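A minimal reproduction of the failure, assuming the sample data above is read from an in-memory string rather than a file. The third line has one more field than the header, so the default C parser raises `pandas.errors.ParserError`:

```python
import io

import pandas as pd

# The sample data from above: the last row is wider than the header row.
csv_text = "header 1, header 2\ndata 1, data 2\ndata 3, data 4, data 5\n"

try:
    pd.read_csv(io.StringIO(csv_text))
    error_message = None
except pd.errors.ParserError as exc:
    # Capture the tokenizer error instead of letting it propagate.
    error_message = str(exc)

print(error_message)
```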

There are several ways around this error:

  1. Loading the file in a spreadsheet program (e.g. Excel) and saving it as a new file
  2. Specifying usecols
  3. Setting error_bad_lines to False (on_bad_lines="skip" in pandas 1.3 and later)

But there are instances where these workarounds are impractical:

  1. When you want to preserve all of your data (usecols and error_bad_lines both discard data)
  2. When using a spreadsheet program is not feasible (e.g. the file uses an encoding Excel does not support, there is a large number of files, or, in particular, both)
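To illustrate the data loss in point 1, here is a sketch that skips the malformed line. It assumes pandas 1.3 or later, where `on_bad_lines="skip"` replaced `error_bad_lines=False`; the entire wider row is dropped, not just its extra field:

```python
import io

import pandas as pd

csv_text = "header 1, header 2\ndata 1, data 2\ndata 3, data 4, data 5\n"

# Assumes pandas >= 1.3: bad lines are silently skipped instead of raising.
skipped = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")

print(skipped)
print(len(skipped))  # only the well-formed data row survives
```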

Desired feature

In read_csv and read_table, add an expand_header option that, when set to True, sets the width of the DataFrame to that of the widest row. Cells left without data as a result would be blank or NaN.
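The requested behavior can be approximated today with a two-pass read. The helper `read_csv_expand_header` and the placeholder column names `unnamed_N` are hypothetical (expand_header is only a proposal, not an existing pandas parameter): a sketch that pre-scans the data with the standard `csv` module to find the widest row, then re-reads it with one name per column so that `read_csv` pads short rows with NaN.

```python
import csv
import io

import pandas as pd


def read_csv_expand_header(text, **kwargs):
    """Hypothetical stand-in for expand_header=True; not a pandas API."""
    # First pass: tokenize every row to find the widest one.
    rows = list(csv.reader(io.StringIO(text), skipinitialspace=True))
    width = max(len(row) for row in rows)
    header = rows[0]
    # Invent placeholder names for columns that have no header cell.
    names = header + [f"unnamed_{i}" for i in range(len(header), width)]
    # Second pass: skip the short header line and supply the padded names;
    # rows with fewer fields than `names` are filled with NaN.
    return pd.read_csv(
        io.StringIO(text),
        header=None,
        skiprows=1,
        names=names,
        skipinitialspace=True,
        **kwargs,
    )


csv_text = "header 1, header 2\ndata 1, data 2\ndata 3, data 4, data 5\n"
df = read_csv_expand_header(csv_text)
print(df)
```

Reading the data twice costs extra time on large files, which is one reason a built-in option in the parser itself would be preferable.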

Metadata

Labels: Docs, IO CSV (read_csv, to_csv), Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)