0

I ran some code in Python and received the following error, using f = open(file) and f.read() commands:

File "/usr/lib/python3.4/codecs.py", line 313, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 62475: invalid continuation byte

Firstly, how do I find position 62475 in the source file to see what the characters are? I tried opening the file in pluma and in notepadqq, and both display the line and column numbers, but there doesn't seem to be a way to search by position number.

Once I find the problem area, is there a quick guide to what the character types are and how I can solve the problem?

6
  • 62475 is also the index so get the 62475th character Commented Dec 4, 2015 at 22:59
  • I am a newbie, can you explain how I do that? Don't I have to open the file in Python in order to get that character? If I run the following code, f = open(file, 'r'); contents = f.read(); contents[62470:62480], it never reaches the third line, because the UTF-8 decoding error is raised at the contents = f.read() step. Commented Dec 4, 2015 at 23:33
  • Set the encoding to latin1 Commented Dec 4, 2015 at 23:47
  • How do I change the encoding to latin1? I tried two ways, neither of which worked. First, I changed the 2nd line of the Python file to # -*- coding: <latin1> -*-, based on this page: docs.python.org/3/howto/unicode.html. Second, I tried changing contents = f.read() to contents = f.read().decode('latin1'). Commented Dec 7, 2015 at 22:59
  • Pass it to open: open(file, encoding='latin-1') Commented Dec 7, 2015 at 23:28
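To look at the byte at a given position without triggering the decoding error, the file can be opened in binary mode ("rb"), which returns raw bytes and never decodes them. The following is only a minimal sketch: the helper names locate_byte and locate_in_file and the sample data are made up for illustration. Note that the position in the error message is a byte offset, so the column computed here counts bytes and may differ from the column an editor shows if earlier characters on the line take more than one byte.

```python
def locate_byte(data: bytes, offset: int, context: int = 10):
    """Return (line, column, surrounding bytes) for a byte offset.

    Line and column are 1-based; the column counts bytes, not characters.
    """
    # Every newline before the offset starts a new line.
    line = data.count(b"\n", 0, offset) + 1
    # rfind returns -1 when the offset is on the first line,
    # which still yields a correct 1-based column.
    last_newline = data.rfind(b"\n", 0, offset)
    column = offset - last_newline
    start = max(0, offset - context)
    return line, column, data[start:offset + context]


def locate_in_file(path: str, offset: int):
    # "rb" reads raw bytes, so no UnicodeDecodeError can be raised here.
    with open(path, "rb") as f:
        return locate_byte(f.read(), offset)


# Small in-memory demonstration (hypothetical data): a lone 0xe1 byte
# followed by "n" is not valid UTF-8.
sample = b"first line\nMa\xe1n A\n"
line, column, nearby = locate_byte(sample, sample.index(b"\xe1"))
print(line, column, nearby)
```

For the file in the question, the call would be locate_in_file(file, 62475); the returned line number can then be used with the editor's "go to line" function.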

2 Answers

1

Pass the encoding to open: open(file, encoding='latin-1') – Padraic Cunningham

Thank you, the code worked perfectly! There was no error message, so I suppose the encoding was actually latin-1 and not UTF-8, which makes whatever was at position 62475 probably moot. – user2144412
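A minimal sketch of that fix follows. The function names read_text and decode_with_fallback are hypothetical, and the sample bytes are the ones reported at position 62475 in the xxd output further down this thread. One caveat worth keeping in mind: Latin-1 assigns a character to every possible byte, so decoding with it never raises an error. The absence of an error therefore does not by itself confirm that the file really is Latin-1; it may be worth checking that the accented characters look right.

```python
def read_text(path: str, encoding: str = "latin-1") -> str:
    # The encoding is passed to open(), not applied to the result of read():
    # in Python 3, read() on a text-mode file already returns str,
    # and str has no decode() method.
    with open(path, "r", encoding=encoding) as f:
        return f.read()


def decode_with_fallback(raw: bytes) -> str:
    """Try UTF-8 first; fall back to Latin-1 (an assumption about the data)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never raises,
        # even when the data is really in some other encoding.
        return raw.decode("latin-1")


# 0xe1 followed by "n" is invalid UTF-8, so the fallback is used here.
print(decode_with_fallback(b"\xe1n A \""))
```

The "# -*- coding: ... -*-" comment mentioned in the comments above only declares the encoding of the Python source file itself; it has no effect on files the script opens.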


1

You could try using xxd -b -s +62475 <yourfilename> if that helps

4 Comments

I tried this command in the terminal and it gives a tremendous amount of output. Are you sure the command is correct?
You could try limiting the number of bytes displayed after 62475 using the -l option: xxd -b -s +62475 -l 1 <yourfilename>. The output should display the byte causing your error in binary (0xe1, I suppose) and its ASCII representation.
I ran the new command and got the following output: 000f40b: 11100001 . I tried searching in the original file for both "000f40b" and "11100001", but neither is found. I opened the file in pluma 1.8.1. I'm not sure how to find the trouble spot in this file.
I tried displaying more bytes, but I'm not able to interpret the output. Can you post a link to where I can find more information about this? E.g., I get: 000f40b: 11100001 01101110 00100000 01000001 00100000 00100010 .n A "
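The xxd output above can be read as follows: the first column is the byte offset in hexadecimal, each 8-digit group is one byte in binary, and the last column is the ASCII rendering, with "." standing for any byte that is not printable ASCII. Neither the offset nor the binary digits appear literally in the file, which is why searching for them in an editor finds nothing. A small sketch that converts the values from that output (the Latin-1 decoding is an assumption about the file's real encoding):

```python
# Values copied from the xxd output in the comment above.
offset = int("000f40b", 16)
bits = "11100001 01101110 00100000 01000001 00100000 00100010".split()

# Turn each group of eight binary digits into one byte.
raw = bytes(int(b, 2) for b in bits)

print(offset)                 # 62475, the position from the error message
print([hex(b) for b in raw])  # the first byte is 0xe1, as in the error message
print(raw.decode("latin-1"))  # what the text looks like if the file is Latin-1
```

Under that assumption, the offending byte 0xe1 is the letter "á", so the text at that position reads "án A" followed by a double quote, which is something that can be searched for in an editor once the file is opened as Latin-1.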
