Python line.replace returns UnicodeEncodeError

Question

I have a tex file that was generated from rst source using Sphinx. It is encoded as UTF-8 without BOM (according to Notepad++) and named final_report.tex, with the following content:

% Generated by Sphinx.
\documentclass[letterpaper,11pt,english]{sphinxmanual}
\usepackage[utf8]{inputenc}
\begin{document}
\chapter{Preface}
Krimson4 is a nice programming language.
Some umlauts äöüßÅö.
That is an “double quotation mark” problem.
Johnny’s apostrophe allows connecting multiple ports.
Components that include data that describe how they ellipsis …
Software interoperability – some dash – is not ok.
\end{document}

Now, before I compile the tex source to pdf, I want to replace some lines in the tex file to get nicer results. My script was inspired by another SO question.

#!/usr/bin/python
# -*- coding: utf-8 -*-
import os

newFil=os.path.join("build", "latex", "final_report.tex-new")
oldFil=os.path.join("build", "latex", "final_report.tex")

def freplace(old, new):
    with open(newFil, "wt", encoding="utf-8") as fout:
        with open(oldFil, "rt", encoding="utf-8") as fin:
            for line in fin:
                print(line)
                fout.write(line.replace(old, new))
    os.remove(oldFil)
    os.rename(newFil, oldFil)

freplace('\documentclass[letterpaper,11pt,english]{sphinxmanual}', '\documentclass[letterpaper, 11pt, english]{book}')

This works on Ubuntu 16.04 with Python 2.7 as well as Python 3.5, but it fails on Windows with Python 3.4. The error message I get is:

  File "C:\Python34\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u201c' in position 11: character maps to <undefined>

where \u201c is the left double quotation mark. If I remove the problematic character, the script proceeds until it finds the next problematic character.

In the end, I need a solution that works on Linux and Windows with Python 2.7 and 3.x. I tried quite a lot of the solutions suggested here on SO, but could not yet find one that works for me...

My example does not have 19 lines; I assume the error message refers to line 19 of the cp850.py file.
– matth, commented Jul 4, 2016 at 14:29

Padraic Cunningham · Accepted Answer · 2016-07-04 14:17:00Z


You need to specify the correct encoding with the encoding argument, i.e. encoding="the_encoding":

with open(oldFil, "rt", encoding="utf-8") as fin, open(newFil, "wt", encoding="utf-8") as fout:

If you don't, the platform's preferred encoding will be used.

From the documentation for open:

In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding
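To see which encoding open would fall back to on a particular machine, a small check along the following lines may help. This is only a sketch: the scratch file and the sample string are illustrative assumptions and not part of the question's script. It uses io.open so that the encoding argument is accepted on Python 2.7 as well as 3.x.

```python
import io
import locale
import os
import tempfile

# Encoding that open() falls back to in text mode when none is given.
# The value is platform dependent, so it can differ between Linux and Windows.
default_encoding = locale.getpreferredencoding(False)
print("default text encoding: %s" % default_encoding)

# Sample text containing U+201C and U+201D, the quotation marks from the question.
sample = u"That is an \u201cdouble quotation mark\u201d problem.\n"

# Hypothetical scratch file, used only for this demonstration.
fd, path = tempfile.mkstemp(suffix=".tex")
os.close(fd)
try:
    # With an explicit encoding, the platform default above is not consulted.
    with io.open(path, "w", encoding="utf-8") as fout:
        fout.write(sample)
    with io.open(path, "r", encoding="utf-8") as fin:
        round_tripped = fin.read()
finally:
    os.remove(path)

# The text survives the write/read cycle unchanged.
print(round_tripped == sample)
```

If the first line printed is not a UTF-8 variant, any open call without an explicit encoding would read or write the file in that other encoding.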


9 Comments

Padraic Cunningham Over a year ago

@matth, what double quote? If you still have encoding issues then you don't have utf-8 encoded data

Padraic Cunningham Over a year ago

@matth, you have specified the encoding as utf-8 for both and the error happens on write?

Padraic Cunningham Over a year ago

@matth, so setting the encoding fixed the first error but now you have another?

Padraic Cunningham Over a year ago

For Python 2 you would need to use the io lib, using io.open. When I get back to my computer this evening I will add a link to a nice answer that allows you to print from a cmd shell, although I would recommend using Cygwin as your default shell on Windows or using an IDE like PyCharm.

Padraic Cunningham Over a year ago

No need, Python 3's open is io.open, so the code will work as is for both 2.7 and 3
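Putting the comments together, a version of the replacement function based on io.open might look like the sketch below. Several things here are assumptions rather than anything confirmed in the thread: the path parameter, the newline="" argument (added so that existing line endings are left untouched), the safe_print helper, and the scratch file used for the demonstration. The traceback in the question ends in cp850.py, a console code page, and position 11 matches the opening quotation mark in the line beginning "That is an", so one plausible reading is that the print(line) call, not the file write, raises the error. safe_print is a hypothetical workaround for that case.

```python
import io
import os
import sys
import tempfile


def freplace(path, old, new, encoding="utf-8"):
    """Replace `old` with `new` in the text file at `path`, in place."""
    tmp_path = path + "-new"  # same naming idea as in the question
    # io.open accepts encoding= on Python 2.7 and 3.x;
    # newline="" leaves the file's line endings untranslated.
    with io.open(path, "r", encoding=encoding, newline="") as fin, \
            io.open(tmp_path, "w", encoding=encoding, newline="") as fout:
        for line in fin:
            fout.write(line.replace(old, new))
    # Remove first, because os.rename does not overwrite on Windows.
    os.remove(path)
    os.rename(tmp_path, path)


def safe_print(text):
    """Print `text` even if the console encoding lacks some characters."""
    console_encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    # Characters the console cannot encode are shown as backslash escapes.
    encoded = text.encode(console_encoding, "backslashreplace")
    print(encoded.decode(console_encoding))


# Demonstration on a hypothetical scratch file.
fd, demo_path = tempfile.mkstemp(suffix=".tex")
os.close(fd)
with io.open(demo_path, "w", encoding="utf-8", newline="") as f:
    f.write(u"\\documentclass[letterpaper,11pt,english]{sphinxmanual}\n"
            u"That is an \u201cdouble quotation mark\u201d problem.\n")

freplace(demo_path,
         u"\\documentclass[letterpaper,11pt,english]{sphinxmanual}",
         u"\\documentclass[letterpaper, 11pt, english]{book}")

with io.open(demo_path, "r", encoding="utf-8", newline="") as f:
    result = f.read()
os.remove(demo_path)

safe_print(result)
```

The backslash in the search and replacement strings is doubled so that the literals are not affected by escape-sequence handling.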
