How to convert a Non-ISO extended-ASCII English text, with CRLF line terminators to utf-8 in Python
4 Answers
Extending Jishiyu's answer, you might use uchardet to identify the character set. For example:

iconv -f `uchardet a_strange_file.txt` -t UTF-8 -o the_output_file.txt a_strange_file.txt

Note that this does not do the job in Python itself.
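If you want to stay inside Python, a rough equivalent is to let a detection library guess the encoding and then re-encode the file. This is a minimal sketch using the third-party chardet package in place of uchardet (the filenames are placeholders, and it assumes the detection succeeds):

import chardet

with open("a_strange_file.txt", "rb") as f:
    raw = f.read()

guess = chardet.detect(raw)               # e.g. {'encoding': 'windows-1252', 'confidence': 0.7, ...}
text = raw.decode(guess["encoding"])      # decode using the detected encoding

with open("the_output_file.txt", "w", encoding="utf-8") as f:
    f.write(text)                         # write the text back out as UTF-8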
I think the Linux commands unix2dos and dos2unix (for the line terminators) and iconv (for the encoding) will be helpful. For example:

iconv -f latin-1 -t UTF-8 latin.txt > utf8.txt
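If the source encoding is already known, the same conversion can be done directly in Python without any external tools. A minimal sketch, assuming the input really is Latin-1:

# Read the file as Latin-1, then write it back out as UTF-8
with open("latin.txt", encoding="latin-1") as src, \
        open("utf8.txt", "w", encoding="utf-8") as dst:
    dst.write(src.read())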
If you read the input file as a raw byte stream, you can decode those bytes and then re-encode them as UTF-8. See this blog post with some Python 3 examples.
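As a minimal sketch of that byte-level round trip (the byte string below is made up for illustration and assumed to be Windows-1252):

raw = b"caf\xe9"                      # raw bytes in a legacy single-byte encoding
text = raw.decode("cp1252")           # decode the bytes to a Python str ("café")
utf8_bytes = text.encode("utf-8")     # re-encode the str as UTF-8
print(utf8_bytes)                     # b'caf\xc3\xa9'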

I have created an automated conversion script using the enca library. I use it on my NAS to convert subtitles to UTF-8, but it could be used for any automated conversion.
Feel free to use it :)
EDIT:
#!/bin/bash
LANGUAGE=czech
TO=utf8
CONVERT="enca -L $LANGUAGE -x $TO"

# Find and convert
find ./ -type f -name "*.srt" | while read fn; do
    IS_TARGET=`enca "${fn}" | egrep -ow -m 1 'UTF-8|Unrecognized|KOI8-CS2|7bit ASCII|UCS-2|Macintosh Central European'`
    if [ "$IS_TARGET" != "UTF-8" ] && [ "$IS_TARGET" != "UCS-2" ] && [ "$IS_TARGET" != "Macintosh Central European" ] && [ "$IS_TARGET" != "Unrecognized" ] && [ "$IS_TARGET" != "7bit ASCII" ] && [ "$IS_TARGET" != "KOI8-CS2" ]; then
        echo "${fn} ---- Will be converted!"
        # optional backup of original srt
        # cp "${fn}" "${fn}.bak"
        $CONVERT "${fn}"
    fi
done
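Since the question asks for Python, roughly the same loop can also be driven from Python with the subprocess module. This is only a sketch: it assumes enca is installed and on PATH, and it skips the whitelist above by simply converting anything enca does not already report as UTF-8.

import subprocess
from pathlib import Path

for srt in Path(".").rglob("*.srt"):
    # Ask enca what encoding it detects for this file
    detected = subprocess.run(["enca", "-L", "czech", str(srt)],
                              capture_output=True, text=True).stdout
    if "UTF-8" not in detected:
        print(f"{srt} ---- Will be converted!")
        # Convert the file to UTF-8 in place, as the shell script does
        subprocess.run(["enca", "-L", "czech", "-x", "utf8", str(srt)])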