I have the text Aur\xc3\xa9lien and want to decode it with Python 3.8.
I tried the following:

    import codecs
    s = "Aur\xc3\xa9lien"
    codecs.decode(s, "utf-8")
    codecs.decode(bytes(s), "utf-8")
    codecs.decode(bytes(s, "utf-8"), "utf-8")

but none of them gives the correct result, Aurélien.
How do I do it correctly?
And is there no basic, general, authoritative page that describes all these encodings for Python?
Encode the string to bytes with latin-1, then decode those bytes as UTF-8:

    s = "Aur\xc3\xa9lien"
    b = bytes(s, 'latin-1')
    print(b.decode('utf-8'))

A bytes literal needs the b prefix. You are using a special feature of Python (which allows binary characters together with Unicode sequences in a string literal).

About the open command: which parameter do you use? Usually open reads a text file, and you should get Unicode strings (possibly with replacement characters). But in no normal case do you get such a "string". To get a binary string, just use 'b' in the mode passed to open.

    with open('file.cvs', encoding='utf8') as f:
        for line in f.readlines():
            fields = line.split(',')

But you may be using a module? The csv module? How do you read the file? [Long ago, in earlier 3.x versions, csv was buggy regarding Unicode files.]
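To make the difference between the two kinds of literal and the two open modes concrete, here is a minimal sketch. The file name example.csv and its contents are made up for illustration, and the temporary directory is only there so the snippet can run on its own; it is not a reconstruction of the asker's actual file.

```python
import os
import tempfile

# In a str literal, "\xc3" and "\xa9" are the code points U+00C3 and U+00A9,
# not raw bytes, so printing s shows mojibake.
s = "Aur\xc3\xa9lien"

# latin-1 maps code points 0-255 to the same byte values, so encoding with it
# recovers the original UTF-8 bytes, which can then be decoded properly.
fixed = s.encode("latin-1").decode("utf-8")
print(fixed)

# With the b prefix the same escapes are real bytes and decode directly.
b = b"Aur\xc3\xa9lien"
print(b.decode("utf-8"))

# Hypothetical file, written as UTF-8 bytes for the demonstration.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv")
    with open(path, "wb") as f:
        f.write("Aurélien,42\n".encode("utf-8"))

    # Text mode with an explicit encoding yields str objects.
    with open(path, encoding="utf-8") as f:
        text_fields = f.readline().rstrip("\n").split(",")

    # Binary mode ('b' in the mode string) yields bytes objects.
    with open(path, "rb") as f:
        raw_line = f.readline()

print(text_fields)
print(raw_line)
```

If the file is opened in text mode with the right encoding, the latin-1 round trip should not be needed at all; it is only a repair for text that was already decoded with the wrong codec.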