
I have a CSV file in which only 4 out of 4000 records contain non-ASCII characters. For example:

    ['com.manager', '2016012300', '16.1.23', 'en', 'kinzie', '2015-04-11T17:36:23Z', '1428773783781', '2016-03-11T09:53:45Z', 'df', '5', "\xa5\x06`'", '\xc0\x03"', '\xa2{\xac ===]\xa9}\xf7\xf7\xf7\xf7\xf7\xf7\xf7\xf7\xf7\xf7\xf7\xf7\xf7>', '', '', '', 'https://play.google.com/apps/publish?account=sd#ReviewDetailsPlace:p=com.manager&reviewid=gp:AOqpTOEcQQGmjFcd-bFfU372DTrxh']

I am using the following Python code to read the CSV:

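    import csv

    with open('/Users/duttaam/Downloads/test1.csv', 'rU') as csvfile:
        # Strip NUL bytes from each line before the csv module parses it
        reader_obj = csv.reader(x.replace('\0', '') for x in csvfile)
        rownum = 0
        for row in reader_obj:
            rownum += 1
            # Every record is expected to have 16 columns
            if len(row) != 16:
                print(rownum)
                print(row)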

For four rows, the reader reports an inconsistent number of columns. But when I count the delimiters (,) in those rows, the count looks right. The only issue I can see is the non-ASCII characters, as in the example row above. I am guessing those are emojis that were converted into other characters.
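To see exactly where a plain comma count and the csv parser disagree, here is a small diagnostic sketch in Python 3. The function name `find_mismatched_rows` is hypothetical, and it assumes one record per physical line. It flags every line where `line.count(',') + 1` differs from the number of fields `csv.reader` returns. One possible cause of such a mismatch: with the default dialect, a field that starts with a double quote is treated as quoted, so commas inside it are data rather than delimiters.

```python
import csv


def find_mismatched_rows(lines):
    """Yield (line_number, comma_count_plus_one, parsed_field_count).

    Hypothetical helper; assumes each record sits on a single physical line.
    """
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        naive = line.count(',') + 1                # what counting delimiters gives
        parsed = len(next(csv.reader([line]), []))  # what the csv module gives
        if naive != parsed:
            yield lineno, naive, parsed


sample = ['a,b,c\n', 'd,"e,f",g\n', '\n']
print(list(find_mismatched_rows(sample)))  # [(2, 4, 3)]
```

Running `list(find_mismatched_rows(csvfile))` over the open file would list each suspect line with both counts, which makes it easier to spot a stray quote or line break in the four problem records.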

I came up with a function to remove non-printable characters from a string (thanks to the post "Stripping non printable characters from a string in python"). How do I apply it to the entire CSV?

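    import string

    def removeSpecialcahr(s):
        printable = set(string.printable)
        # Keep only the characters in string.printable; joining keeps the result a string
        return ''.join(c for c in s if c in printable)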
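One way to apply the function to the whole file is to run it on every line before `csv.reader` sees it, the same way the original code already strips `'\0'`. This is a Python 3 sketch; `read_clean_rows` is a hypothetical helper, and `removeSpecialcahr` is rewritten with `''.join` because in Python 3 `filter()` returns an iterator rather than a string. Since `string.printable` includes `'\n'` and `'\r'`, line endings survive the filtering.

```python
import csv
import io
import string

PRINTABLE = set(string.printable)  # includes '\t', '\n', '\r'


def removeSpecialcahr(s):
    # Keep only characters that appear in string.printable
    return ''.join(c for c in s if c in PRINTABLE)


def read_clean_rows(lines):
    # Hypothetical helper: clean each physical line, then let csv parse it
    cleaned = (removeSpecialcahr(line.replace('\0', '')) for line in lines)
    return list(csv.reader(cleaned))


sample = io.StringIO("a,b\u00a5,c\nd,\u00c0e,f\n")
print(read_clean_rows(sample))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

For a real file in Python 3, you would pass an open file object instead of the `StringIO`; opening it with `encoding='latin-1'` (an assumption chosen so that every byte decodes to some character) and `newline=''`, as the csv documentation recommends, avoids decode errors before the filter runs.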

Is there a way to process the CSV and remove all non-printable and/or non-ASCII characters?
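If you would rather process the CSV once and write a cleaned copy, a byte-level filter avoids any decoding question. This Python 3 sketch uses `bytes.translate(None, delete)` to drop every byte except printable ASCII, tab, newline and carriage return; the helper names `clean_bytes`, `clean_file` and `read_rows` are hypothetical.

```python
import csv

# Bytes to keep: printable ASCII (0x20-0x7E) plus tab, newline, carriage return
KEEP = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
# Every other byte value (NUL, control bytes, everything >= 0x80) gets deleted
DELETE = bytes(b for b in range(256) if b not in KEEP)


def clean_bytes(data):
    # translate(None, DELETE) removes the listed bytes without remapping others
    return data.translate(None, DELETE)


def clean_file(src_path, dst_path):
    # Write an ASCII-only copy of src_path to dst_path
    with open(src_path, 'rb') as src:
        data = src.read()
    with open(dst_path, 'wb') as dst:
        dst.write(clean_bytes(data))


def read_rows(path):
    # The cleaned file is pure ASCII, so decoding cannot fail
    with open(path, newline='', encoding='ascii') as f:
        return list(csv.reader(f))
```

This removes NUL bytes too, so the `replace('\0', '')` step becomes unnecessary. It cannot fix a row whose structure is broken by a printable character, such as a stray `"`, because that byte is kept.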

1 Answer


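To drop non-ASCII characters from your file, replace your open() call with codecs.open(), decoding as ASCII with errors='ignore' so that every byte outside the ASCII range is silently skipped. You could also define your own error handler instead of 'ignore'.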

    import codecs

    # Decode as ASCII; errors='ignore' drops every non-ASCII byte
    csvfile = codecs.open('file.csv', 'r', encoding='ascii', errors='ignore')
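As a sketch of the custom error handler idea, here is a Python 3 example using `codecs.register_error`. The handler name `'drop_and_count'` is hypothetical: it drops undecodable bytes just like `errors='ignore'`, but also counts them, which shows how many bytes were lost. A registered name can be passed as `errors=` to `codecs.open()` or `open()`.

```python
import codecs

# Running total of bytes skipped by the handler (for reporting only)
dropped = {'bytes': 0}


def drop_and_count(exc):
    """Error handler: skip the offending bytes and record how many were skipped."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    dropped['bytes'] += exc.end - exc.start
    # Return (replacement text, position to resume decoding from)
    return ('', exc.end)


codecs.register_error('drop_and_count', drop_and_count)

raw = b'df,5,\xa5\x06`\',\xc0\x03"'
text = raw.decode('ascii', errors='drop_and_count')
print(dropped['bytes'])  # 2
```

Note that `\x06` and `\x03` survive in `text`: they are valid ASCII, just not printable, so ASCII decoding alone does not remove every non-printable character.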

1 Comment

Thanks @joeforker. I used the code and it removed the non-ASCII characters, but when I read the file object with the following code, the csv reader still does not read the file properly:

    reader_obj = csv.reader(x.replace('\0', '') for x in csvfile)
    rownum = 0
    for row in reader_obj:
        rownum += 1
        if len(row) != 16:
            print(rownum)
            print(row)
            print(len(row))

Your code does answer my question, though. By the way, is there any other way to read the CSV effectively?
