File encoding from English text to UTF-8

Question

How can I convert a file described as "Non-ISO extended-ASCII English text, with CRLF line terminators" to UTF-8 in Python?

ezdazuzena · Accepted Answer · 2013-12-05 14:31:42Z

Extending jishiyu's answer, you can use uchardet to detect the character set, for example:

iconv -f `uchardet a_strange_file.txt` -t UTF-8 -o the_output_file.txt a_strange_file.txt

Note that this does not do the job in Python.
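Within Python, a crude stand-in for the uchardet step is to try a short list of candidate encodings in order and keep the first one that decodes without error. This is a heuristic, not real statistical detection like uchardet (third-party packages such as chardet do that). The candidate order below (UTF-8, then Windows-1252, then Latin-1) is an assumption suited to English text, and `guess_encoding` and `convert_to_utf8` are hypothetical names.

```python
from pathlib import Path

# Assumed candidate order: strict UTF-8 first, then Windows-1252 (common for
# "extended ASCII" English text), then Latin-1, which accepts any byte sequence.
CANDIDATES = ("utf-8", "cp1252", "latin-1")


def guess_encoding(data: bytes, candidates=CANDIDATES) -> str:
    """Return the first candidate encoding that decodes `data` without errors."""
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    raise ValueError("none of the candidate encodings could decode the data")


def convert_to_utf8(src: str, dst: str) -> str:
    """Re-encode the file at `src` as UTF-8 into `dst`; return the guessed source encoding."""
    data = Path(src).read_bytes()
    encoding = guess_encoding(data)
    # Write bytes rather than text so line terminators (e.g. CRLF) are kept as-is, like iconv.
    Path(dst).write_bytes(data.decode(encoding).encode("utf-8"))
    return encoding
```

`convert_to_utf8("a_strange_file.txt", "the_output_file.txt")` then plays the role of the iconv command above. Because Latin-1 maps every byte to some character, the last candidate always succeeds, so a wrong guess fails silently rather than raising an error.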

jishiyu · Accepted Answer · 2012-05-01 07:26:46Z


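I think the Linux commands unix2dos, dos2unix and iconv will help: dos2unix and unix2dos convert line terminators (CRLF to LF and back), while iconv converts between character encodings.

For example: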

iconv -f latin-1 -t UTF-8 latin.txt >utf8.txt


Comment by gsivaram:

But I need a Python package that automatically converts to the specified format.
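A Python counterpart of the iconv command above, combined with the CRLF-to-LF conversion dos2unix performs, can be a short streaming copy. `file` reports "Non-ISO extended-ASCII" when the text contains bytes in the 0x80-0x9F range, which ISO-8859 encodings such as Latin-1 treat as control characters; such files are often Windows-1252 (old Mac or IBM PC code pages are other possibilities), so this sketch assumes `cp1252` as the default source encoding. The function name `reencode` is hypothetical.

```python
def reencode(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Copy `src` to `dst`, decoding from `source_encoding` and writing UTF-8 with LF line endings."""
    # newline=None (universal newlines) turns "\r\n" into "\n" while reading, as dos2unix
    # does for CRLF files (it also turns a lone "\r" into "\n").
    # newline="\n" on the output side writes "\n" untranslated on every platform.
    with open(src, "r", encoding=source_encoding, newline=None) as fin, \
            open(dst, "w", encoding="utf-8", newline="\n") as fout:
        for line in fin:
            fout.write(line)
```

`reencode("latin.txt", "utf8.txt", source_encoding="latin-1")` mirrors the iconv example exactly, apart from also normalizing the line endings. With the default strict error handling, bytes that are undefined in the source encoding raise `UnicodeDecodeError` instead of being silently mangled.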

Eli Bendersky · Accepted Answer · 2012-05-01 08:23:54Z

If you read your input file as a raw byte stream, you can decode it from its original encoding into a Unicode string and then encode that string as UTF-8. See this blog post with some Python 3 examples.
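In Python 3 terms, that means `bytes.decode()` with the file's original encoding to get a `str`, followed by `str.encode("utf-8")`. A minimal in-memory sketch, assuming the source is Windows-1252 (`to_utf8` is a hypothetical name):

```python
def to_utf8(raw: bytes, source_encoding: str = "cp1252", errors: str = "strict") -> bytes:
    """Decode `raw` from `source_encoding` into text, then re-encode that text as UTF-8."""
    text = raw.decode(source_encoding, errors=errors)  # bytes -> str (Unicode code points)
    return text.encode("utf-8")                        # str -> UTF-8 bytes


# Byte 0x92 is a right single quotation mark in Windows-1252.
print(to_utf8(b"don\x92t"))  # b'don\xe2\x80\x99t'
```

Passing `errors="replace"` turns bytes that are undefined in the source encoding into U+FFFD instead of raising `UnicodeDecodeError`.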


Barlog951 · Accepted Answer · 2016-08-30 13:22:01Z

I have created an automated conversion script using the enca library. I use it on my NAS to convert subtitles to UTF-8, but it could be used for any automated conversion.

Feel free to use it :)

EDIT:

#!/bin/bash
LANGUAGE=czech
TO=utf8
CONVERT="enca -L $LANGUAGE -x $TO"

# Find and convert
find ./ -type f -name "*.srt" | while IFS= read -r fn; do
  IS_TARGET=$(enca "${fn}" | grep -E -ow -m 1 'UTF-8|Unrecognized|KOI8-CS2|7bit ASCII|UCS-2|Macintosh Central European')
  case "$IS_TARGET" in
    "UTF-8"|"UCS-2"|"Macintosh Central European"|"Unrecognized"|"7bit ASCII"|"KOI8-CS2")
      ;;  # already UTF-8, or an encoding this script leaves alone
    *)
      echo "${fn} ---- Will be converted!"
      # optional backup of original srt
      # cp "${fn}" "${fn}.bak"
      $CONVERT "${fn}"
      ;;
  esac
done

Comment: You should probably include the source code in your answer as opposed to just linking to it.
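A rough standard-library Python counterpart of this script is sketched below. Unlike enca, it does no language-aware detection: it skips files that already decode as UTF-8 (which includes plain 7-bit ASCII) or start with a UTF-16 byte order mark, and re-encodes the rest from one fixed source encoding. `cp1250` (a common Windows encoding for Czech) is an assumption, as is the hypothetical `convert_tree` name. Files are rewritten in place, so keep backups.

```python
import codecs
from pathlib import Path

# Assumption: subtitles that are not UTF-8 are Windows-1250 (Central European).
SOURCE_ENCODING = "cp1250"
UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def is_utf8(data: bytes) -> bool:
    """True if `data` is valid UTF-8 (plain 7-bit ASCII included)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def convert_tree(root: str, pattern: str = "*.srt", source_encoding: str = SOURCE_ENCODING) -> list:
    """Re-encode matching files under `root` to UTF-8 in place; return the converted paths."""
    converted = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        data = path.read_bytes()
        if data.startswith(UTF16_BOMS) or is_utf8(data):
            continue  # UTF-16 with a BOM, or already UTF-8: leave it alone
        try:
            text = data.decode(source_encoding)
        except UnicodeDecodeError:
            continue  # not valid in the assumed encoding either: skip it
        print(f"{path} ---- Will be converted!")
        # Optional backup of the original file:
        # path.with_name(path.name + ".bak").write_bytes(data)
        path.write_bytes(text.encode("utf-8"))
        converted.append(path)
    return converted
```

Calling `convert_tree(".")` walks the current directory much like the `find ./ -type f -name "*.srt"` loop in the script.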
