4

I currently have the following data.csv which has a comma delimiter:

name,day Chicken Sandwich,Wednesday Pesto Pasta,Thursday Lettuce, Tomato & Onion Sandwich,Friday Lettuce, Tomato & Onion Pita,Friday Soup,Saturday 

The parser script is:

import pandas as pd df = pd.read_csv('data.csv', delimiter=',', error_bad_lines=False, index_col=False) print(df.head(5)) 

The output is:

Skipping line 4: expected 2 fields, saw 3 Skipping line 5: expected 2 fields, saw 3 name day 0 Chicken Sandwich Wednesday 1 Pesto Pasta Thursday 2 Soup Saturday 

How do I handle the case Lettuce, Tomato & Onion Sandwich. Each item should be separated by , but it's possible that an item has a comma in it followed by a space. The desired output is:

 name day 0 Chicken Sandwich Wednesday 1 Pesto Pasta Thursday 2 Lettuce, Tomato & Onion Sandwich Friday 3 Lettuce, Tomato & Onion Pita Friday 4 Soup Saturday 
2
  • That's a tough one. How do you suggest pandas should know whether the comma is within a field or outside a field? Commented Feb 27, 2018 at 16:59
  • You may need to do some csv pre-processing, see stackoverflow.com/questions/14550441/… Commented Feb 27, 2018 at 17:10

2 Answers 2

1

An alternative that works in other situations too. OK, it's ugly.

import pandas as pd from io import StringIO for_pd = StringIO() with open('theirry.csv') as input: for line in input: line = line.rstrip().replace(', ', '|||').replace(',', '```').replace('|||', ', ').replace('```', '|') print (line, file=for_pd) for_pd.seek(0) df = pd.read_csv(for_pd, sep='|') print (df) 

Result:

 name day 0 Chicken Sandwich Wednesday 1 Pesto Pasta Thursday 2 Lettuce, Tomato & Onion Sandwich Friday 3 Lettuce, Tomato & Onion Pita Friday 4 Soup Saturday 
Sign up to request clarification or add additional context in comments.

Comments

0

This might help.

import pandas as pd p = "PATH_TO.csv" df = pd.read_csv(p, delimiter='(,(?=\S)|:)') #print(df.head(5)) print "-----" print df["name"] print "-----" print df["day"] 

Output:

----- 0 Chicken Sandwich 1 Pesto Pasta 2 Lettuce, Tomato & Onion Sandwich 3 Lettuce, Tomato & Onion Pita 4 Soup Name: name, dtype: object ----- 0 Wednesday 1 Thursday 2 Friday 3 Friday 4 Saturday Name: day, dtype: object 

3 Comments

Very nice but I think it makes 'comma' the name of a column, doesn't it?
@BillBell;. It looks ok in my console.
Have you examined df.columns?

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.