I am parsing an Apache log file and loading it into a pandas DataFrame for further analysis.

The log file contains some malformed lines, which cause the following error:

ValueError: Expected 11 fields in line 4320, saw 27

To work around this, I passed error_bad_lines=False when reading the file. That doesn't help, because I then get the following error:

ValueError: The 'error_bad_lines' option is not supported with the 'python' engine

Note: I am explicitly using the python engine because my separator is a regular expression.

Code snippet:

data = pd.read_csv(
    log_file,
    sep=r'\s(?=(?:[^"]*"[^"]*")*[^"]*$)(?![^\[]*\])',
    engine='python',
    na_values='-',
    header=None,
    usecols=use_cols,
    skiprows=1,
    converters={
        time_taken_index[0]: parse_sec,
        time_index[0]: parse_datetime,
        req_index[0]: parse_str,
        status_index[0]: parse_str,
    },
    error_bad_lines=False,
)

I'd be grateful for any suggestions. Thank you.

  • Could you attach a part of the log file you're talking about? Commented Dec 17, 2017 at 17:09

1 Answer

It seems that you are using an old version of pandas (older than 0.20.0).

The parameter error_bad_lines=False works with the python engine in pandas 0.20.0 and later.

So, just update the pandas library.
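
For reference, here is a rough sketch of such a call, not a drop-in fix. The sample log lines are made up for illustration, and `skip_bad_lines_kwargs` is a hypothetical helper, not part of pandas. One caveat for newer installs: `error_bad_lines` was deprecated in pandas 1.3 in favour of `on_bad_lines='skip'` and removed in pandas 2.0, so the helper picks the spelling based on the installed version.

```python
import io

import pandas as pd

# Made-up sample data: a header row, two good lines, and one line with
# extra fields (8 instead of 5). Not taken from the original log file.
sample = (
    'ip ident time request status\n'
    '127.0.0.1 - [17/Dec/2017:17:09:00 +0000] "GET /index.html HTTP/1.1" 200\n'
    '10.0.0.2 - [17/Dec/2017:17:09:01 +0000] "GET /bad" extra junk fields 500\n'
    '10.0.0.3 - [17/Dec/2017:17:09:02 +0000] "GET /about.html HTTP/1.1" 404\n'
)


def skip_bad_lines_kwargs():
    """Hypothetical helper: choose the 'skip bad lines' keyword by pandas version."""
    major, minor = (int(part) for part in pd.__version__.split(".")[:2])
    if (major, minor) >= (1, 3):
        # on_bad_lines replaced error_bad_lines / warn_bad_lines in pandas 1.3.
        return {"on_bad_lines": "skip"}
    return {"error_bad_lines": False, "warn_bad_lines": False}


data = pd.read_csv(
    io.StringIO(sample),
    # Same separator as in the question: whitespace outside quotes and brackets.
    sep=r'\s(?=(?:[^"]*"[^"]*")*[^"]*$)(?![^\[]*\])',
    engine="python",
    na_values="-",
    header=None,
    skiprows=1,
    **skip_bad_lines_kwargs(),
)

print(data)
```

With this sample, the line carrying the extra fields should be dropped and the two well-formed lines kept.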

4 Comments

That was the problem. The older version of pandas did not support error_bad_lines with the python engine. However, I wasn't able to use it with the C engine. Any reason for that?
@AklankJain What version of pandas are you using?
I am using pandas v0.21
I was using pandas v0.24.2 and updated to 0.25, but the same error is still there.
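
If upgrading does not help, one possible fallback that does not depend on the pandas version is to filter the lines yourself before building the DataFrame. This is a different technique from error_bad_lines, sketched here with only the standard library. The `split_well_formed` function and the sample lines are hypothetical, and the expected field count of 5 is an assumption for this sample (the question's file expects 11).

```python
import re

# Same separator as in the question: whitespace outside quotes and brackets.
SEP = re.compile(r'\s(?=(?:[^"]*"[^"]*")*[^"]*$)(?![^\[]*\])')


def split_well_formed(lines, expected_fields):
    """Yield the field list of every line that has exactly expected_fields fields."""
    for line in lines:
        fields = SEP.split(line.strip())
        if len(fields) == expected_fields:
            yield fields


# Made-up sample lines; the second one has too many fields.
sample_lines = [
    '127.0.0.1 - [17/Dec/2017:17:09:00 +0000] "GET /index.html HTTP/1.1" 200',
    '10.0.0.2 - [17/Dec/2017:17:09:01 +0000] "GET /bad" extra junk fields 500',
    '10.0.0.3 - [17/Dec/2017:17:09:02 +0000] "GET /about.html HTTP/1.1" 404',
]

rows = list(split_well_formed(sample_lines, expected_fields=5))
print(rows)
```

The resulting list of rows can then be passed to pd.DataFrame. Unlike pandas, which only rejects lines with too many fields, this filter also drops lines with too few, so loosen the length check if short lines should be kept.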
