
I'm trying to write news headlines to a CSV file using Python's csv module. When a headline contains an apostrophe, such as 'What’s So Great About Snapchat Anyway?', an encoding error shows up.

The error is shown below:

[image: error output]

The code for this:

[image: code]

Any thoughts on this error, or any suggestions?

  • The problem is not CSV but the terminal/console on your system (probably Windows): it doesn't display UTF-8 and has trouble converting it. Change the default code page of the Windows console to UTF-8. Commented Feb 5, 2017 at 23:59
  • Thanks for replying, Furas! I figured out that it's because the Python csv module doesn't support Unicode... Here's a useful post: link. Commented Feb 6, 2017 at 0:10
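
One way to check whether the console encoding is involved, as the first comment suggests, is to test whether the headline can be encoded with whatever encoding `sys.stdout` reports. This is only a minimal sketch, written for Python 3; `can_encode` is a hypothetical helper and not part of any library.

```python
import sys


def can_encode(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# \u2019 is the curly apostrophe from the headline in the question.
headline = "What\u2019s So Great About Snapchat Anyway?"

# The encoding can be missing or None when output is redirected,
# so fall back to ASCII in that case (an assumption of this sketch).
console_encoding = getattr(sys.stdout, "encoding", None) or "ascii"
print(console_encoding, can_encode(headline, console_encoding))
```

If this prints `False`, printing the headline to that console would fail even though the text itself is fine.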

1 Answer


The csv module in Python 2.7 can't handle unicode natively, but the docs have an example of how to do it with the UnicodeWriter class. You can also try Python 3, where the csv module handles unicode natively.

This snippet has been shamelessly ripped from the docs I linked

    # Python 2 only: cStringIO does not exist in Python 3
    import codecs
    import cStringIO
    import csv

    class UnicodeWriter:
        """
        A CSV writer which will write rows to CSV file "f",
        which is encoded in the given encoding.
        """

        def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
            # Redirect output to a queue
            self.queue = cStringIO.StringIO()
            self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
            self.stream = f
            self.encoder = codecs.getincrementalencoder(encoding)()

        def writerow(self, row):
            self.writer.writerow([s.encode("utf-8") for s in row])
            # Fetch UTF-8 output from the queue ...
            data = self.queue.getvalue()
            data = data.decode("utf-8")
            # ... and reencode it into the target encoding
            data = self.encoder.encode(data)
            # write to the target stream
            self.stream.write(data)
            # empty queue
            self.queue.truncate(0)

        def writerows(self, rows):
            for row in rows:
                self.writerow(row)

Then you can call it like this:

    # open in binary mode, as the Python 2 csv docs recommend
    writer = UnicodeWriter(open("foo", "wb"))
    writer.writerow(['1', 'bar'])
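
For the Python 3 route mentioned above, a minimal sketch could look like the following. The file name `headlines.csv`, the `headline` column, and the `write_headlines` helper are assumptions made for illustration; the asker's real code was only posted as an image.

```python
import csv


def write_headlines(path, headlines):
    """Write one headline per row to a UTF-8 encoded CSV file."""
    # newline="" lets the csv module control line endings itself;
    # encoding="utf-8" avoids depending on the platform default.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["headline"])
        for headline in headlines:
            writer.writerow([headline])


# \u2019 is the curly apostrophe that triggered the error in the question.
write_headlines("headlines.csv", ["What\u2019s So Great About Snapchat Anyway?"])
```

In Python 3 the file object does the encoding, so no wrapper class is needed.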

1 Comment

Thanks for replying, Greg! You are totally right. I solved this problem by adding title = content.text.encode('ascii', 'ignore') when I grab the title.
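
Note that encoding with 'ignore' avoids the error by dropping every character ASCII cannot represent, so the apostrophe itself is lost. A small sketch of the effect, written for Python 3 (where `encode` returns bytes, hence the extra `decode`):

```python
# \u2019 is the curly apostrophe from the headline in the question.
headline = "What\u2019s So Great About Snapchat Anyway?"

# errors="ignore" silently discards characters that are not ASCII.
ascii_only = headline.encode("ascii", "ignore").decode("ascii")
print(ascii_only)  # Whats So Great About Snapchat Anyway?

# Encoding to UTF-8 instead round-trips the text without loss.
round_tripped = headline.encode("utf-8").decode("utf-8")
print(round_tripped == headline)  # True
```

Whether losing the character is acceptable depends on how the headlines are used afterwards.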
