UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128) [duplicate]

Question

I have this code:

    printinfo = title + "\t" + old_vendor_id + "\t" + apple_id + '\n'
    # Write file
    f.write (printinfo + '\n')

But I get this error when running it:

    f.write(printinfo + '\n')
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128)

It's having trouble writing out this:

Identité secrète (Abduction) [VF]

Any ideas, please? I'm not sure how to fix this.

Cheers.

UPDATE: This is the bulk of my code, so you can see what I am doing:

    def runLookupEdit(self, event):
        newpath1 = pathindir + "/"
        errorFileOut = newpath1 + "REPORT.csv"
        f = open(errorFileOut, 'w')

        global old_vendor_id
        for old_vendor_id in vendorIdsIn.splitlines():
            writeErrorFile = 0
            from lxml import etree
            parser = etree.XMLParser(remove_blank_text=True)  # makes pretty print work

            path1 = os.path.join(pathindir, old_vendor_id)
            path2 = path1 + ".itmsp"
            path3 = os.path.join(path2, 'metadata.xml')

            # Open and parse the xml file
            cantFindError = 0
            try:
                with open(path3): pass
            except IOError:
                cantFindError = 1
                errorMessage = old_vendor_id
                self.Error(errorMessage)
                break
            tree = etree.parse(path3, parser)
            root = tree.getroot()

            for element in tree.xpath('//video/title'):
                title = element.text
                while '\n' in title:
                    title = title.replace('\n', ' ')
                while '\t' in title:
                    title = title.replace('\t', ' ')
                while '  ' in title:
                    title = title.replace('  ', ' ')
                title = title.strip()
                element.text = title
                print title

            #########################################
            ######## REMOVE UNWANTED TAGS ########
            #########################################

            # Remove the comment tags
            comments = tree.xpath('//comment()')
            q = 1
            for c in comments:
                p = c.getparent()
                if q == 3:
                    apple_id = c.text
                p.remove(c)
                q = q+1

            apple_id = apple_id.split(':',1)[1]
            apple_id = apple_id.strip()
            printinfo = title + "\t" + old_vendor_id + "\t" + apple_id

            # Write file
            # f.write (printinfo + '\n')
            f.write(printinfo.encode('utf8') + '\n')
        f.close()

If you look at the right side of the question, you will notice a column of "Related" questions. I suggest you start by looking at them. You would also have gotten a list of possible duplicates when writing your question title.
– Some programmer dude, Commented Nov 7, 2013 at 10:26

Martijn Pieters · Accepted Answer · 2013-11-07 10:26:33Z

73

You need to encode Unicode explicitly before writing to a file, otherwise Python does it for you with the default ASCII codec.
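The implicit ASCII step can be reproduced in isolation. The following is a minimal sketch, assuming the title from the question is held as a Unicode string. It calls the ASCII codec directly, which is what Python 2 does behind the scenes when a Unicode value is written to a file opened with plain open(). The variable names are illustrative only.

```python
# The title from the question as a Unicode string
# (\xe9 is e-acute, \xe8 is e-grave).
title = u'Identit\xe9 secr\xe8te (Abduction) [VF]'

error = None
try:
    # Encoding with ASCII fails on the first character above 127.
    title.encode('ascii')
except UnicodeEncodeError as exc:
    error = exc
    print('%s codec failed at position %d' % (exc.encoding, exc.start))

# Choosing a codec that covers all of Unicode succeeds.
encoded = title.encode('utf8')
```

The failing position is the index of the é in "Identité", which matches the "position 7" reported in the traceback.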

Pick an encoding and stick with it:

f.write(printinfo.encode('utf8') + '\n')

or use io.open() to create a file object that'll encode for you as you write to the file:

    import io
    f = io.open(filename, 'w', encoding='utf8')
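As a rough sketch of the io.open() route: a file opened this way expects Unicode strings on write, so the newline is written as u'\n' and no .encode() call is needed. The temporary path and the two ID values below are placeholders invented for this illustration, not values from the question.

```python
import io
import os
import tempfile

# Hypothetical row; the two IDs are made-up placeholders.
printinfo = u'Identit\xe9 secr\xe8te (Abduction) [VF]\t12345\t67890'

# Write to a throwaway location so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'REPORT.csv')

# The file object encodes each Unicode string to UTF-8 as it is written.
with io.open(path, 'w', encoding='utf8') as f:
    f.write(printinfo + u'\n')

# Read the raw bytes back to see what actually landed on disk.
with io.open(path, 'rb') as f:
    raw = f.read()

print(repr(raw))
```

Note that mixing the two approaches fails: passing already-encoded bytes to a file opened with io.open() in text mode raises a TypeError.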

You may want to read:

The Python Unicode HOWTO
Pragmatic Unicode by Ned Batchelder
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky

before continuing.

answered Nov 7, 2013 at 10:26

Martijn Pieters


15 Comments

speedyrazor Over a year ago

Using f.write(printinfo.encode('utf8') + '\n') works but creates odd characters Identit√© secr√®te (Abduction) [VF] which should be accented Identité secrète (Abduction) [VF]

Martijn Pieters Over a year ago

@speedyrazor: please do read the links I provided. You are opening a UTF-8 file with something that displays the bytes as a different encoding instead. Pick the right encoding for your application.

speedyrazor Over a year ago

@Martin Pieters: I have had a read through, but don't really understand. If I have "Identité secrète" in my XML file I am reading, I pick lines out and write them to a file, but that line comes out as "Identit√© secr√®te". Sorry to ask, but what code would sort this out please?

Martijn Pieters Over a year ago

@speedyrazor: Your XML file uses a codec too. It either uses UTF-8 or has a different codec specified on the first line of the XML file. The XML parser then decodes that data to a Unicode value. When writing out the values to a file, you need to pick a codec again to write bytes. I picked UTF-8 for you because that codec can encode all of unicode, but whatever you used to view the resulting file used a different codec to interpret the bytes. The é character is unicode codepoint U+00E9. UTF-8 encodes that to two bytes, hex C3 and A9. Misinterpreting those two bytes gives you √©.
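The byte-level explanation above can be checked with a short sketch. The comment does not say which codec the viewing program used; Mac Roman is an assumption made here because it happens to turn the two UTF-8 bytes into the same two characters shown in the garbled output.

```python
# U+00E9 (e-acute) as a Unicode string.
e_acute = u'\xe9'

# UTF-8 encodes it as two bytes, hex C3 and A9.
utf8_bytes = e_acute.encode('utf8')

# Assumption: the viewer decoded the file as Mac Roman, where
# 0xC3 is the square root sign and 0xA9 is the copyright sign.
misread = utf8_bytes.decode('mac_roman')

# Decoding with the codec that was used for writing restores the character.
correct = utf8_bytes.decode('utf8')

print(repr(utf8_bytes), repr(misread), repr(correct))
```

The file itself is intact in this scenario; only the program displaying it needs to be told to read it as UTF-8.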

Martijn Pieters Over a year ago

@speedyrazor: without knowing how you are reading the produced file again, I cannot help you further.
