My current solution is to read all the bytes of a file and try to decode them; if any exception is raised, I treat the file as not properly encoded. Is there a more elegant way? Thanks.

utfbytes.decode('utf-8') 
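For concreteness, the check described above might look like the following minimal sketch. The helper name `is_valid_utf8` is hypothetical and used only for illustration; the one real call is `decode('utf-8')`, which raises `UnicodeDecodeError` on invalid input.

```python
def is_valid_utf8(data):
    """Return True if the byte string `data` decodes cleanly as UTF-8."""
    try:
        # Strict decoding (the default) raises on any invalid byte sequence.
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


def file_is_utf8(path):
    """Read the whole file as bytes and run the check above on it."""
    # Binary mode matters: the raw bytes are needed, not decoded text.
    with open(path, 'rb') as f:
        return is_valid_utf8(f.read())
```

Catching `UnicodeDecodeError` specifically, rather than any exception, keeps unrelated problems such as a missing file from being reported as an encoding error.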

regards, Lin

  • Possible duplicate of Python: Is there a way to determine the encoding of text file? Commented Aug 6, 2016 at 23:18
  • Thanks @DeanFenster, upvoted. If I don't use a third-party library, is my current solution, which relies only on Python 2.7 built-ins, already good enough? Commented Aug 6, 2016 at 23:28

1 Answer

No. From that answer:

Correctly detecting the encoding all times is impossible.

(From chardet FAQ:)

However, some encodings are optimized for specific languages, and languages are not random. Some character sequences pop up all the time, while other sequences make no sense. A person fluent in English who opens a newspaper and finds “txzqJv 2!dasd0a QqdKjvz” will instantly recognize that that isn't English (even though it is composed entirely of English letters). By studying lots of “typical” text, a computer algorithm can simulate this kind of fluency and make an educated guess about a text's language.

However, some libraries, such as chardet, make a best-effort attempt to guess the encoding.
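If the goal is only to verify that a file is valid UTF-8, rather than to guess an unknown encoding, the decode-and-catch check can also be run in chunks so that a large file never has to be held in memory at once. The sketch below is one possible way to do that with the standard library's `codecs.getincrementaldecoder`; the function name `file_is_utf8_chunked` and the default `chunk_size` are assumptions made for illustration.

```python
import codecs


def file_is_utf8_chunked(path, chunk_size=64 * 1024):
    """Return True if the file at `path` is valid UTF-8, reading it in chunks."""
    # The incremental decoder buffers a multi-byte sequence that is split
    # across two chunks instead of rejecting it.
    decoder = codecs.getincrementaldecoder('utf-8')()
    try:
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(chunk_size), b''):
                decoder.decode(chunk)
        # final=True makes a truncated sequence at end of file an error.
        decoder.decode(b'', final=True)
    except UnicodeDecodeError:
        return False
    return True
```

A successful decode only shows that the bytes are valid UTF-8. It does not prove the file was written as UTF-8, since plain ASCII text, for example, passes as well.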

4 Comments

Thanks Nick, upvoted. If I don't use a third-party library, is my current solution, which relies only on Python 2.7 built-ins, already good enough?
Your solution looks perfect, as long as you handle exceptions!
Sure, thanks Nick. Have a good weekend. Upvoted and marked your reply as the answer.
@LinMa Thanks, happy to help. Same to you!
