I have this code:

    printinfo = title + "\t" + old_vendor_id + "\t" + apple_id + '\n'
    # Write file
    f.write (printinfo + '\n')

But I get this error when running it:

    f.write(printinfo + '\n')
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128)

It's having trouble writing out this:

Identité secrète (Abduction) [VF] 

Any ideas please, not sure how to fix.

Cheers.

UPDATE: This is the bulk of my code, so you can see what I am doing:

    def runLookupEdit(self, event):
        newpath1 = pathindir + "/"
        errorFileOut = newpath1 + "REPORT.csv"
        f = open(errorFileOut, 'w')

        global old_vendor_id
        for old_vendor_id in vendorIdsIn.splitlines():
            writeErrorFile = 0
            from lxml import etree
            parser = etree.XMLParser(remove_blank_text=True) # makes pretty print work

            path1 = os.path.join(pathindir, old_vendor_id)
            path2 = path1 + ".itmsp"
            path3 = os.path.join(path2, 'metadata.xml')

            # Open and parse the xml file
            cantFindError = 0
            try:
                with open(path3): pass
            except IOError:
                cantFindError = 1
                errorMessage = old_vendor_id
                self.Error(errorMessage)
                break
            tree = etree.parse(path3, parser)
            root = tree.getroot()

            for element in tree.xpath('//video/title'):
                title = element.text
                while '\n' in title:
                    title = title.replace('\n', ' ')
                while '\t' in title:
                    title = title.replace('\t', ' ')
                while '  ' in title:
                    title = title.replace('  ', ' ')
                title = title.strip()
                element.text = title
                print title

            #########################################
            ######## REMOVE UNWANTED TAGS ########
            #########################################

            # Remove the comment tags
            comments = tree.xpath('//comment()')
            q = 1
            for c in comments:
                p = c.getparent()
                if q == 3:
                    apple_id = c.text
                p.remove(c)
                q = q+1

            apple_id = apple_id.split(':',1)[1]
            apple_id = apple_id.strip()
            printinfo = title + "\t" + old_vendor_id + "\t" + apple_id

            # Write file
            # f.write (printinfo + '\n')
            f.write(printinfo.encode('utf8') + '\n')
        f.close()
  • If you look at the right side of the question, you will notice a column of "Related" questions. I suggest you start by looking at them. You would also have gotten a list of possible duplicates when writing your question title. Commented Nov 7, 2013 at 10:26
  • @MartijnPieters: you are right, as usual. Comment erased. Commented Nov 7, 2013 at 10:37

1 Answer

You need to encode Unicode explicitly before writing to a file, otherwise Python does it for you with the default ASCII codec.
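To make the failure concrete, here is a minimal sketch that reproduces the error from the question without any file involved. The `title` value is copied from the question; the variable names `ascii_ok`, `error_text` and `utf8_bytes` are only for illustration. It just shows that the ASCII codec cannot represent é, while UTF-8 can:

```python
# Sample title from the question, written with escapes so the
# source file itself stays pure ASCII.
title = u"Identit\xe9 secr\xe8te (Abduction) [VF]"

# The ASCII codec only covers code points 0-127, so U+00E9 fails.
try:
    title.encode("ascii")
    ascii_ok = True
    error_text = ""
except UnicodeEncodeError as exc:
    ascii_ok = False
    error_text = str(exc)

# UTF-8 can encode every Unicode code point, so this succeeds.
utf8_bytes = title.encode("utf8")

print(ascii_ok)
print(error_text)
print(repr(utf8_bytes))
```

The character at index 7 is the é that follows "Identit", which is why the traceback reports position 7.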

Pick an encoding and stick with it:

f.write(printinfo.encode('utf8') + '\n') 

or use io.open() to create a file object that'll encode for you as you write to the file:

    import io
    f = io.open(filename, 'w', encoding='utf8')
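As a rough sketch of the `io.open()` route, the following writes one tab-separated row and reads it back. The row contents, the temporary directory and the names `line`, `report_path`, `read_back` and `raw` are made up for the example and are not taken from the question's real data:

```python
import io
import os
import tempfile

# Hypothetical report row; the vendor id and Apple id are invented.
line = u"Identit\xe9 secr\xe8te (Abduction) [VF]\tVENDOR_ID\t12345\n"

# Hypothetical output location inside a fresh temporary directory.
report_path = os.path.join(tempfile.mkdtemp(), "REPORT.csv")

# The file object encodes to UTF-8 on write, so it is given
# Unicode text rather than an already encoded byte string.
with io.open(report_path, "w", encoding="utf8") as f:
    f.write(line)

# Reading with the same encoding returns the original text.
with io.open(report_path, "r", encoding="utf8") as f:
    read_back = f.read()

# In binary mode the two-byte UTF-8 form of the accented letter is visible.
with io.open(report_path, "rb") as f:
    raw = f.read()

print(read_back == line)
print(b"\xc3\xa9" in raw)
```

With this approach, do not call `.encode()` yourself before writing. On Python 2 a file opened this way accepts only `unicode` objects and raises `TypeError` for byte strings.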

You may want to read:

before continuing.

15 Comments

Using f.write(printinfo.encode('utf8') + '\n') works, but it produces odd characters: I get Identit√© secr√®te (Abduction) [VF], which should be the accented Identité secrète (Abduction) [VF].
@speedyrazor: please do read the links I provided. You are opening a UTF-8 file with something that displays the bytes as a different encoding instead. Pick the right encoding for your application.
@Martijn Pieters: I have had a read through, but don't really understand. If I have "Identité secrète" in the XML file I am reading, and I pick lines out and write them to a file, that line comes out as "Identit√© secr√®te". Sorry to ask, but what code would sort this out please?
@speedyrazor: Your XML file uses a codec too. It either uses UTF-8 or has a different codec specified on the first line of the XML file. The XML parser then decodes that data to a Unicode value. When writing out the values to a file, you need to pick a codec again to write bytes. I picked UTF-8 for you because that codec can encode all of unicode, but whatever you used to view the resulting file used a different codec to interpret the bytes. The é character is unicode codepoint U+00E9. UTF-8 encodes that to two bytes, hex C3 and A9. Misinterpreting those two bytes gives you √©.
@speedyrazor: without knowing how you are reading the produced file again, I cannot help you further.
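The √© effect described in these comments can be reproduced in a few lines. This is only a sketch: the comments never say which codec the viewing program used, so Mac Roman (`mac_roman` in Python) is an assumption here, chosen because it maps bytes C3 and A9 to √ and ©. The names `text`, `garbled` and `restored` are illustrative:

```python
text = u"Identit\xe9 secr\xe8te"

# UTF-8 turns each accented letter into two bytes (C3 A9 and C3 A8).
utf8_bytes = text.encode("utf8")

# Assumption: the viewer interprets the bytes as Mac Roman.
# Each two-byte sequence then shows up as two unrelated characters.
garbled = utf8_bytes.decode("mac_roman")

# Decoding with the codec that was used for writing restores the text.
restored = utf8_bytes.decode("utf8")

print(repr(utf8_bytes))
print(garbled == u"Identit\u221a\xa9 secr\u221a\xaete")  # i.e. Identit√© secr√®te
print(restored == text)
```

The file itself is fine in this scenario. What needs changing is the encoding selected in whatever program opens it, or the encoding chosen when writing if that program cannot read UTF-8.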