I have a csv with multiple lines that produce the following error:

df1 = pd.read_csv('df1.csv', skiprows=[176, 2009, 2483, 3432, 7486, 7608, 7990, 11992, 12421]) 
ParserError: Error tokenizing data. C error: Expected 52 fields in line 12541, saw 501 

As you can probably notice, I have multiple lines that produce a ParserError.

To work around this, I keep adding each failing line to 'skiprows' and re-running the parse. The file has over 30K lines, so I would prefer to handle all the bad lines at once rather than hitting run in Jupyter Notebook, getting a new error, and updating the list. Ideally, read_csv would just skip the errors and parse the rest. I've tried googling a solution along those lines, but all the SO responses were too complicated for me to follow and reproduce for my data structures.

P.S. Why is it that when using skiprows with just one line, like 177, I can enter skiprows = 177, but when using skiprows with a list, I have to enter 'errored line - 1'? Why does the counting change?

1 Answer

pandas ≥ 1.3

You should use the on_bad_lines parameter of read_csv (pandas ≥ 1.3.0)

df1 = pd.read_csv('df1.csv', on_bad_lines='warn') 

This skips the invalid lines and emits a warning for each one. With on_bad_lines='skip', the lines are skipped without a warning. The default, on_bad_lines='error', raises an error at the first bad line and aborts.
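As a rough illustration of the behaviour described above, here is a minimal sketch that assumes pandas ≥ 1.3. The in-memory CSV text is a made-up stand-in for df1.csv, not the asker's data: it has three columns and one malformed row with five fields.

```python
import io

import pandas as pd

# Hypothetical sample data: the header declares 3 columns,
# but the second data row has 5 fields and would raise a ParserError.
csv_text = "a,b,c\n1,2,3\n4,5,6,7,8\n9,10,11\n"

# on_bad_lines='skip' drops rows with too many fields silently;
# use on_bad_lines='warn' instead to get a warning per dropped row.
df = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")

print(df)
```

Only the two well-formed rows should remain in the resulting DataFrame. Swapping in the real file path for the StringIO object should behave the same way, so there is no need to maintain a skiprows list by hand.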

pandas < 1.3

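The equivalent parameters are error_bad_lines=False, which skips the bad lines instead of raising, and warn_bad_lines=True, which emits a warning for each skipped line. Both were deprecated in pandas 1.3 and removed in pandas 2.0.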

1 Comment

fantastic answer, instantly solved my issue
