How to solve UnicodeDecodeError, invalid continuation byte error

Question

I ran some code in Python and received the following error, using f = open(file) and f.read() commands:

File "/usr/lib/python3.4/codecs.py", line 313, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 62475: invalid continuation byte

Firstly, how do I find position 62475 in the source file to see what the characters are? I tried opening the file in pluma and in notepadqq, and both display the line and column numbers, but there doesn't seem to be a way to search by position number.

Once I find the problem area, is there a quick guide to what the character types are and how I can solve the problem?

I am a newbie; can you explain how I do that? Don't I have to open the file in Python in order to get that character? If I run the following code, f = open(file, 'r'); contents = f.read(); contents[62470:62480], it never reaches the 3rd line, because the contents = f.read() step raises the UTF-8 decoding error.
– user2144412, Commented Dec 4, 2015 at 23:33
How do I change the encoding to latin1? I tried two ways, neither of which worked. I changed the 2nd line of the Python file to # -*- coding: <latin1> -*- based on this website: docs.python.org/3/howto/unicode.html. I also tried changing contents = f.read() to contents = f.read().decode('latin1').
– user2144412, Commented Dec 7, 2015 at 22:59
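
One way to look at the bytes near the reported position without triggering the decoder is to open the file in binary mode. The sketch below is only an illustration: the helper name bytes_around and the path "yourfile.txt" are hypothetical, and it assumes the position in the error message is a byte offset from the start of the file, which is consistent with the xxd output quoted later in this thread.

```python
def bytes_around(path, position, context=10):
    """Return the raw bytes surrounding `position` (a byte offset) in the file."""
    start = max(position - context, 0)
    # Binary mode ("rb") performs no decoding, so no UnicodeDecodeError can occur.
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(2 * context)


# Hypothetical usage; replace "yourfile.txt" with the real path.
# print(bytes_around("yourfile.txt", 62475))
```

The result is a bytes object, so non-ASCII bytes show up as escapes such as \xe1, which makes the offending byte easy to spot next to the readable text around it.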

Armali · Accepted Answer · 2017-08-17 06:47:31Z

In open, set encoding='latin-1'. – Padraic Cunningham

Thank you, the code worked perfectly! No error message, so I suppose the encoding was actually latin-1 and not UTF-8, so whatever was at position 62475 is probably moot. – user2144412
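
A minimal sketch of that suggestion, assuming Python 3 and a hypothetical helper name read_latin1:

```python
def read_latin1(path):
    """Read the whole file as text, decoding its bytes as Latin-1 instead of UTF-8."""
    with open(path, "r", encoding="latin-1") as f:
        return f.read()


# Hypothetical usage; replace "yourfile.txt" with the real path.
# contents = read_latin1("yourfile.txt")
# print(contents[62470:62480])
```

This also explains why the two earlier attempts did not work. The # -*- coding -*- line declares the encoding of the Python source file itself, not of data files the script opens. In Python 3, f.read() on a text-mode file already returns a str, which has no decode method. Note that Latin-1 maps every possible byte to a character, so decoding with it never raises an error; the absence of an error message does not by itself prove the file was written as Latin-1.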

Guillaume Legrain · Accepted Answer · 2015-12-04 22:47:59Z

You could try using xxd -b -s +62475 <yourfilename> if that helps


4 Comments

user2144412 Over a year ago

I tried this command in the terminal and it produces a tremendous amount of output. Are you sure the command is correct?

Guillaume Legrain Over a year ago

You could try limiting the number of bytes displayed after 62475 using the -l option: xxd -b -s +62475 -l 1 <yourfilename>. The output should display the byte causing your error in binary (0xe1, I suppose) and its ASCII representation.

user2144412 Over a year ago

I ran the new command and got the following output: 000f40b: 11100001 . I tried searching in the original file for both "000f40b" and "11100001", but neither is found. I opened the file in pluma 1.8.1. I'm not sure how to find the offending byte in this file.

user2144412 Over a year ago

I tried viewing more bytes, but I'm not able to interpret the output. Can you post a link to where I can find more information about this? E.g., I get: 000f40b: 11100001 01101110 00100000 01000001 00100000 00100010 .n A "
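
The xxd output above can be unpacked in Python. The following is a sketch for illustration only; all variable names are made up here, and the six bit strings are copied from the output quoted in the comment. The left-hand column of xxd is the byte offset in hexadecimal, and each group of eight bits is one byte.

```python
# The offset column and the six bytes from the xxd output, as printed.
offset = int("000f40b", 16)  # 62475, the position named in the error message
bits = ["11100001", "01101110", "00100000", "01000001", "00100000", "00100010"]
raw = bytes(int(b, 2) for b in bits)


def is_continuation(byte):
    """A UTF-8 continuation byte has the bit pattern 10xxxxxx."""
    return (byte & 0b11000000) == 0b10000000


# 0xe1 is 1110xxxx, which in UTF-8 announces a three-byte sequence,
# so the next two bytes would both have to be continuation bytes.
starts_three_byte_sequence = (raw[0] & 0b11110000) == 0b11100000

# The next byte is 0x6e, plain ASCII "n", which is not a continuation byte.
next_is_continuation = is_continuation(raw[1])

# Decoded as Latin-1, the same bytes are ordinary text.
as_latin1 = raw.decode("latin-1")  # 'án A "'
```

Read this way, the byte at 62475 is 0xe1, which is "á" in Latin-1, followed by "n", a space, "A", a space and a double quote. A UTF-8 decoder rejects it because 0xe1 must be followed by continuation bytes and "n" is not one, which matches the "invalid continuation byte" message. Searching the file in a text editor for "000f40b" or "11100001" finds nothing because those are xxd's display of the offset and the bit pattern, not text stored in the file.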
