
I have a txt file that has the following format:

    a 1 blah
    b 2 blah,inc
    c 3 foo,inc

I want to read it into a df using read_csv(), but the commas are giving me an error, and I don't want to skip those lines with error_bad_lines=False.

How do I read it into a df with each line as ONE column? Or should I use another method?

    Try sep='\s+' or the parameter delim_whitespace=True. (Commented Feb 9, 2017 at 14:15)

2 Answers

I think you need to change the default separator from , to \s+ (one or more whitespace characters):

    import pandas as pd
    from io import StringIO  # pandas.compat.StringIO is no longer available in current pandas

    temp = """a 1 blah
    b 2 blah,inc
    c 3 foo,inc"""

    # after testing, replace StringIO(temp) with 'filename.csv'
    df = pd.read_csv(StringIO(temp), sep=r'\s+', header=None, names=['a', 'b', 'c'])
    print(df)

       a  b         c
    0  a  1      blah
    1  b  2  blah,inc
    2  c  3   foo,inc

For one column, use a separator which is NOT in the data, like | or ¥:

    temp = """a 1 blah
    b 2 blah,inc
    c 3 foo,inc"""

    # after testing, replace StringIO(temp) with 'filename.csv'
    df = pd.read_csv(StringIO(temp), sep='|', header=None, names=['a'])
    print(df)

                  a
    0      a 1 blah
    1  b 2 blah,inc
    2   c 3 foo,inc

Another solution with read_fwf:

    # a single fixed-width column spanning characters 0-99 of each line
    df = pd.read_fwf(StringIO(temp), header=None, colspecs=[(0, 100)])
    print(df)

                  0
    0      a 1 blah
    1  b 2 blah,inc
    2   c 3 foo,inc

3 Comments

I guess the not-in-data approach is a little dangerous, in that you never know what will be in the data. But it works for now.
Yes, it depends on the data. But I think ¥ is obviously not in the data.
I added another solution, please check the docs.
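
On the concern above that any chosen separator might eventually show up in the data: a minimal sketch that avoids the CSV parser entirely by reading the lines in plain Python and handing them to the DataFrame constructor. The helper name lines_to_frame and the column name "a" are assumptions made for illustration, not part of the original answer.

```python
import io

import pandas as pd


def lines_to_frame(handle, column="a"):
    """Build a one-column DataFrame with one row per non-empty input line."""
    # Strip only the line terminator so commas and inner spaces are kept as-is.
    rows = [line.rstrip("\r\n") for line in handle if line.strip()]
    return pd.DataFrame({column: rows})


# Stand-in for open('filename.txt'); the sample mirrors the data in the question.
sample = io.StringIO("a 1 blah\nb 2 blah,inc\nc 3 foo,inc\n")
df = lines_to_frame(sample)
print(df)
```

With a real file, the same helper could be called as lines_to_frame(open('filename.txt')), ideally inside a with block so the file is closed afterwards.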

I think that pd.read_csv(filename, delim_whitespace=True) should do the trick.
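
A hedged sketch of what that call might look like in full. Two assumptions here: recent pandas releases (2.2 and later) deprecate delim_whitespace in favour of sep=r"\s+", so the sketch uses the latter; and header=None is added because the sample has no header row, which read_csv would otherwise take from the first data line. The column names are placeholders.

```python
import io

import pandas as pd

# Stand-in for the text file from the question.
sample = io.StringIO("a 1 blah\nb 2 blah,inc\nc 3 foo,inc\n")

# sep=r"\s+" splits on runs of whitespace, so the commas stay inside the third field.
df = pd.read_csv(sample, sep=r"\s+", header=None, names=["a", "b", "c"])

print(df.shape)
print(df["c"].tolist())
```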
