
I have a CSV file in which each whole line is enclosed in double quotes. Within a line, the first field is not quoted, but every other field is wrapped in doubled double quotes, like this:

"Col1,""Col2"",""Col3"",""Col4"",""Col5"",""Col6"",""Col7"""
"1,""entry "",""entry "",""entry"",""entry"",""entry"",""entry"""
"2,""entry "",""entry "",""entry"",""entry"",""entry"",""entry"""

and so forth. I'm trying to read it into a pandas DataFrame. The desired output would be something like this:

   Col1 "Col2" "Col3" "Col4" "Col5" "Col6" "Col7"
0     1  entry  entry  entry  entry  entry  entry
1     2  entry  entry  entry  entry  entry  entry

I ran file 'filename.csv' in a terminal, and the output was: ISO-8859 text, with very long lines, with CRLF line terminators

I tried various ways of changing the read_csv parameters, such as

input_data = pd.read_csv('filename.csv', sep = ',',encoding = 'iso-8859-1', engine = 'python')

The result is a DataFrame with 2 columns and 100+ rows, where the first column is empty and the second column contains all the data I actually want. What I want instead is a DataFrame with 7 columns and 100+ rows. The current output looks something like this:

[Screenshot: current output]

Unfortunately, I cannot post the real data for confidentiality reasons.

Could someone help me out here? The fix intuitively feels like a simple one, but I'm not sure what I'm missing.

  • Post a minimum of 2 rows of your CSV data, and the desired output. Commented Jun 17, 2019 at 14:55
  • "I cannot post the data because of confidentiality purposes unfortunately." Sure, but you could cleanse the data to avoid confidentiality issues. Just make a mock CSV with dummy data. Commented Jun 17, 2019 at 17:40
  • FYI: this is a single python string with escaped doublequotes: "Col1,""Col2"",""Col3"",""Col4"",""Col5"",""Col6"",""Col7""". I suspect that's not actually your CSV's contents, but the result of a print statement? Commented Jun 17, 2019 at 17:48
  • Hi, I have just added the screenshot of the current output! Thanks for your time. Commented Jun 17, 2019 at 18:30

1 Answer


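Solution 1. Remove the leading and trailing " (double quote) from each line, then read the file with
input_data = pd.read_csv('temp.csv', sep=',')

Solution 2. Use the parameter quoting=3. This is the value of csv.QUOTE_NONE, which tells the parser to treat quote characters as ordinary data instead of as field delimiters, so each line is split on its commas:
input_data = pd.read_csv('temp.csv', encoding='iso-8859-1', engine='python', sep=',', quoting=3)

Solution 3. Remove the extra "" from each value, so that every column value is exactly what you wanted.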
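
The solutions above can be tried end to end on mock data. What follows is only a sketch, assuming the file really has the layout shown in the question (each line wrapped in one pair of quotes, inner quotes doubled). The helper names read_by_unwrapping and read_with_quote_none and the SAMPLE string are made up for illustration. The first helper parses each line twice, which amounts to Solution 1 done programmatically; the second combines Solution 2 with the cleanup from Solution 3.

```python
import csv
import io

import pandas as pd

# Mock data shaped like the sample in the question (CRLF line endings).
SAMPLE = (
    '"Col1,""Col2"",""Col3"",""Col4"",""Col5"",""Col6"",""Col7"""\r\n'
    '"1,""entry "",""entry "",""entry"",""entry"",""entry"",""entry"""\r\n'
    '"2,""entry "",""entry "",""entry"",""entry"",""entry"",""entry"""\r\n'
)


def read_by_unwrapping(stream):
    """Hypothetical helper: parse each line twice (outer wrapper, then inner CSV)."""
    # Pass 1: every physical line is one quoted field, so csv.reader removes
    # the outer quotes and collapses each "" back into a single ".
    inner_lines = [row[0] for row in csv.reader(stream) if row]
    # Pass 2: the unwrapped lines form an ordinary quoted CSV.
    return pd.read_csv(io.StringIO("\n".join(inner_lines)))


def read_with_quote_none(stream):
    """Hypothetical helper: Solution 2 (quoting=3) followed by Solution 3 cleanup."""
    # csv.QUOTE_NONE == 3: quote characters are kept as literal data.
    df = pd.read_csv(stream, sep=",", quoting=csv.QUOTE_NONE)
    # Strip the leftover quote characters from the header and every value.
    df.columns = df.columns.str.strip('"')
    df = df.apply(lambda col: col.str.strip('"'))
    # The first column arrives as text such as '"1'; convert it after stripping.
    df["Col1"] = pd.to_numeric(df["Col1"])
    return df


df_unwrapped = read_by_unwrapping(io.StringIO(SAMPLE, newline=""))
df_literal = read_with_quote_none(io.StringIO(SAMPLE, newline=""))
print(df_unwrapped)
print(df_literal)
```

For the real file, the same helpers could be given a handle opened with open('filename.csv', encoding='iso-8859-1', newline=''), going by the encoding that the file command reported. Note that the QUOTE_NONE route splits on every comma, so it breaks if any value itself contains a comma; the two-pass route does not have that limitation.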


1 Comment

No explanation is given for Solution 2, nor a link to the docs.
