I have never dealt with encoding and decoding strings, so I am quite the newbie on this front. I am receiving a UnicodeEncodeError when I try to write the contents I read from another file to a temporary file using file.write in Python. I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 41333: ordinal not in range(128) 

Here is what I am doing in my code. I am reading an XML file and getting the text from the "mydata" tag. I then iterate through mydata to look for CDATA:

    import os
    import tempfile
    from lxml import etree

    parser = etree.XMLParser(strip_cdata=False)
    root = etree.parse("myfile.xml", parser)
    data = root.findall('./mydata')
    # iterate through list to find text (lua code) contained in elements containing CDATA
    for item in data:
        myCode = item.text

    # Write myCode to a temporary file.
    tempDirectory = tempfile.mkdtemp(suffix="", prefix="TEST_THIS_")
    file = open(tempDirectory + os.path.sep + "myCode.lua", "w")
    file.write(myCode + "\n")
    file.close()

It fails with the UnicodeEncodeError when I hit the following line:

file.write(myCode + "\n") 

How should I properly encode and decode this?

1 Answer

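Python 2.7's built-in open function does not transparently handle Unicode strings the way Python 3's does. When you write a unicode object to a file opened this way, Python 2 implicitly encodes it with the default 'ascii' codec, which fails on any character outside the ASCII range. There is extensive documentation on this, but if you want to write unicode strings directly without encoding them yourself, you can try this: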

    >>> import codecs
    >>> f = codecs.open(filename, 'w', encoding='utf8')
    >>> f.write(u'\u201c')
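The same idea can be sketched against the temporary file from the question. This version uses io.open rather than codecs.open; it also accepts an encoding argument in Python 2.7 and is the same function as the built-in open in Python 3. The my_code value is a made-up stand-in for item.text, and the snake_case names are illustrative rather than taken from the question.

```python
import io
import os
import tempfile

# Hypothetical stand-in for item.text: Lua source containing curly quotes
# (U+201C and U+201D), the kind of character that triggered the error.
my_code = u'print(\u201chello\u201d)'

temp_directory = tempfile.mkdtemp(suffix="", prefix="TEST_THIS_")
path = os.path.join(temp_directory, "myCode.lua")

# Opening with an explicit encoding means the unicode text is encoded
# as UTF-8 on write instead of going through the ASCII codec.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(my_code + u"\n")

# Read the file back with the same encoding to confirm the round trip.
with io.open(path, "r", encoding="utf-8") as f:
    round_tripped = f.read()
```

Using a with block also closes the file even if the write raises, which the explicit file.close() in the question would not.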

For comparison, this is how the error happens:

    >>> f = open(filename, 'w')
    >>> f.write(u'\u201c')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 0: ordinal not in range(128)
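The failure can also be reproduced without a file at all. The sketch below performs by hand the ASCII encoding step that Python 2's plain file.write applies implicitly to a unicode object, then shows the explicit UTF-8 alternative. The variable names are illustrative only.

```python
text = u'\u201c'  # LEFT DOUBLE QUOTATION MARK, the character from the error

# Encoding to ASCII fails because U+201C is outside range(128).
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly to UTF-8 succeeds and yields a byte string, which
# could be written to a file opened in binary mode ('wb').
utf8_bytes = text.encode('utf-8')
```

So another option, if you prefer to keep the plain open call, is to encode the text yourself before writing and to open the file in binary mode.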