Properly reading in a CSV file in pandas with double quotes

Question

I have a CSV file in which each line is enclosed in double quotes. Within each line, the first field is not enclosed in double quotes, but all the other fields are, like this:

"Col1,""Col2"",""Col3"",""Col4"",""Col5"",""Col6"",""Col7"""
"1,""entry "",""entry "",""entry"",""entry"",""entry"",""entry"""
"2,""entry "",""entry "",""entry"",""entry"",""entry"",""entry"""

and so forth, and I'm trying to read it into a pandas DataFrame. The desired output would be something like:

   Col1  "Col2"  "Col3"  "Col4"  "Col5"  "Col6"  "Col7"
0     1   entry   entry   entry   entry   entry   entry
1     2   entry   entry   entry   entry   entry   entry

I ran the command file 'filename.csv' in the terminal, and the output was: ISO-8859 text, with very long lines, with CRLF line terminators

I tried changing various read_csv parameters, for example:

input_data = pd.read_csv('filename.csv', sep=',', encoding='iso-8859-1', engine='python')

The result is a DataFrame with 2 columns and 100+ rows, where the first column is empty and the second column contains all the data I actually want. What I want instead is a DataFrame with 7 columns and 100+ rows. The current output looks like this:
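To see why the default settings don't split the fields, here is a small sketch on dummy data copied from the sample in the question. It assumes the file really looks like that sample; the real file produced two columns, so it probably differs in some detail. With default quoting, each whole line is parsed as one quoted field, and the doubled "" inside it are collapsed back to single quotes, so every record lands in a single column.

```python
import io

import pandas as pd

# Dummy data in the layout shown in the question (assumed; the real file may differ).
raw = (
    '"Col1,""Col2"",""Col3"",""Col4"",""Col5"",""Col6"",""Col7"""\r\n'
    '"1,""entry "",""entry "",""entry"",""entry"",""entry"",""entry"""\r\n'
    '"2,""entry "",""entry "",""entry"",""entry"",""entry"",""entry"""\r\n'
)

# Default quoting: each physical line is one quoted field, so the whole
# inner header line becomes a single column name.
df = pd.read_csv(io.StringIO(raw))
print(df.shape)       # (2, 1)
print(df.columns[0])  # Col1,"Col2","Col3","Col4","Col5","Col6","Col7"
```

In other words, the file is CSV wrapped inside CSV: the outer layer has to be undone before the seven inner fields become visible.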

[Screenshot: current output]

Unfortunately, I cannot post the real data for confidentiality reasons.

Could someone help me out here? The fix feels like it should be a simple one, but I'm not sure what I'm missing.

"I cannot post the data because of confidentiality purposes unfortunately." Sure, but you could cleanse the data to avoid confidentiality issues. Just make a mock CSV with dummy data. – David Zemens, Jun 17, 2019 at 17:40
FYI: this is a single Python string with escaped double quotes: "Col1,""Col2"",""Col3"",""Col4"",""Col5"",""Col6"",""Col7""". I suspect that's not actually your CSV's contents, but the result of a print statement? – David Zemens, Jun 17, 2019 at 17:48
Hi, I have just added the screenshot of the current output! Thanks for your time. – Avishek Mondal, Jun 17, 2019 at 18:30
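On the question of whether the sample is the file's real content or a printed Python string: one way to check is to print the repr() of the first physical line, read without newline translation. This is a minimal sketch; first_raw_line is a hypothetical helper, and a temporary mock file stands in for filename.csv.

```python
import os
import tempfile


def first_raw_line(path, encoding="iso-8859-1"):
    """Hypothetical helper: return the first line exactly as stored (newline='' keeps CRLF)."""
    with open(path, encoding=encoding, newline="") as f:
        return f.readline()


# Write a one-line mock file in the assumed layout, then inspect it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mock.csv")
    with open(path, "w", encoding="iso-8859-1", newline="") as f:
        f.write('"Col1,""Col2"",""Col3"""\r\n')
    line = first_raw_line(path)

# If the repr shows the doubled "" and a trailing \r\n, the file itself is double-quoted.
print(repr(line))
```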

Kiran Bhagwat · Accepted Answer · 2019-06-17 18:32:26Z

Solution 1. Remove the leading and trailing " (double quote) from each line, then use:
input_data = pd.read_csv('temp.csv', sep=',')
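One way to do the "remove the outer quotes" step programmatically is to let Python's csv module undo the outer layer, then hand the unwrapped lines to pandas. Simply trimming the first and last character may leave the doubled "" inside each field, whereas csv.reader also turns them back into single quotes. This is a sketch on dummy data in the question's assumed layout; read_double_wrapped_csv is a hypothetical name, and for the real file the handle would be something like open('filename.csv', newline='', encoding='iso-8859-1').

```python
import csv
import io

import pandas as pd


def read_double_wrapped_csv(handle):
    """Hypothetical helper: parse a CSV whose every line is wrapped in one quoted field."""
    # Pass 1: csv.reader strips the outer quotes and turns "" back into ".
    # Each row has exactly one field, which is itself a CSV record.
    inner_lines = [row[0] for row in csv.reader(handle) if row]
    # Pass 2: the unwrapped lines are ordinary CSV, so pandas parses them directly.
    return pd.read_csv(io.StringIO("\n".join(inner_lines)))


# Dummy data in the layout shown in the question (assumed).
raw = (
    '"Col1,""Col2"",""Col3"",""Col4"",""Col5"",""Col6"",""Col7"""\r\n'
    '"1,""entry "",""entry "",""entry"",""entry"",""entry"",""entry"""\r\n'
    '"2,""entry "",""entry "",""entry"",""entry"",""entry"",""entry"""\r\n'
)

df = read_double_wrapped_csv(io.StringIO(raw))
print(df)
```

Because the inner fields are parsed with normal CSV rules, a value that legitimately contains a comma (inside its own quotes) still ends up in one column.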

Solution 2. Use the parameter quoting=3 (csv.QUOTE_NONE):
input_data = pd.read_csv('temp.csv', encoding='iso-8859-1', engine='python', sep=',', quoting=3)
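As for what quoting=3 does: 3 is the value of csv.QUOTE_NONE, which tells the parser to treat quote characters as ordinary text. Every comma then starts a new field, so the seven columns appear, but the quote characters stay in the names and values. A sketch on dummy data in the question's assumed layout:

```python
import csv
import io

import pandas as pd

# Dummy data in the layout shown in the question (assumed).
raw = (
    '"Col1,""Col2"",""Col3"",""Col4"",""Col5"",""Col6"",""Col7"""\r\n'
    '"1,""entry "",""entry "",""entry"",""entry"",""entry"",""entry"""\r\n'
    '"2,""entry "",""entry "",""entry"",""entry"",""entry"",""entry"""\r\n'
)

# csv.QUOTE_NONE == 3: quotes are kept as literal characters and never group text.
df = pd.read_csv(io.StringIO(raw), quoting=csv.QUOTE_NONE)
print(df.shape)  # (2, 7)
print(list(df.columns))
# ['"Col1', '""Col2""', '""Col3""', '""Col4""', '""Col5""', '""Col6""', '""Col7"""']
```

The trade-off: since quotes no longer protect anything, a real value containing a comma would be split across two columns.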

Solution 3. Remove the extra "" from each value (each column's values will then be exactly what you wanted).
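The cleanup in Solution 3 could look like the following sketch, which builds on the quoting=3 read: strip leftover quote characters from the headers and from every cell, then restore the numeric first column. It assumes no real value starts or ends with a quote character, and the data is dummy data in the question's layout.

```python
import csv
import io

import pandas as pd

# Dummy data in the layout shown in the question (assumed).
raw = (
    '"Col1,""Col2"",""Col3"",""Col4"",""Col5"",""Col6"",""Col7"""\r\n'
    '"1,""entry "",""entry "",""entry"",""entry"",""entry"",""entry"""\r\n'
    '"2,""entry "",""entry "",""entry"",""entry"",""entry"",""entry"""\r\n'
)

df = pd.read_csv(io.StringIO(raw), quoting=csv.QUOTE_NONE)

# Remove the quote characters left at both ends of every header and cell.
df.columns = df.columns.str.strip('"')
df = df.apply(lambda col: col.str.strip('"'))

# Col1 was read as text such as '"1'; convert it to numbers now the quote is gone.
df["Col1"] = pd.to_numeric(df["Col1"])
print(df)
```

str.strip('"') only removes quote characters, so the trailing space in values such as "entry " is preserved.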

No explanation is given for Solution 2, nor a link to the docs.
