Python script to convert from UTF-8 to ASCII [duplicate]

Question

I'm trying to write a script in Python to convert UTF-8 files into ASCII files:

    #!/usr/bin/env python
    # *-* coding: iso-8859-1 *-*
    import sys
    import os

    filePath = "test.lrc"
    fichier = open(filePath, "rb")
    contentOfFile = fichier.read()
    fichier.close()

    fichierTemp = open("tempASCII", "w")
    fichierTemp.write(contentOfFile.encode("ASCII", 'ignore'))
    fichierTemp.close()

When I run this script, I get the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 13: ordinal not in range(128)

I thought I could ignore such errors by passing the 'ignore' parameter to the encode method, but it seems not.

I'm open to other ways to convert.

You got the error because the character doesn't exist in the ASCII character set, so it can't be converted. Sometimes you can map the UTF-8 character to the closest visual-fit character in ASCII, such as é to e, but that can change the meaning of words. You have to decide if that path will work for your application.
– the Tin Man, Commented Nov 28, 2010 at 23:24
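
The closest-visual-fit mapping mentioned in this comment can be approximated with the standard unicodedata module. The following is only a sketch, assuming Python 3; the function name to_ascii_closest and the sample string are made up for illustration. Characters that have no decomposition (for example ø or ß) are still dropped, so the result remains lossy.

```python
import unicodedata


def to_ascii_closest(text: str) -> str:
    """Replace accented letters with their base letters where possible."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # The combining marks are not ASCII, so errors="ignore" drops them
    # and keeps the base letters.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii_closest("Vázquez café"))  # Vazquez cafe
```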

Utku Zihnioglu · Accepted Answer · 2010-11-28 23:13:03Z

71

    data = "UTF-8 DATA"
    udata = data.decode("utf-8")
    asciidata = udata.encode("ascii", "ignore")


5 Comments

tchrist Over a year ago

Sounds like a bad recipe for data loss.

Utku Zihnioglu Over a year ago

You should expect data loss if you wish to convert from an 8-bit encoding to a 7-bit one.

Nicolas Over a year ago

I didn't know that I had to decode first. It works now, thanks. To answer the questions: I want to do this because my MP3 player can only display lyrics files encoded in ASCII.

JSBach Over a year ago

You can have a look at this solution: stackoverflow.com/a/517974/1463812

peer Over a year ago

I get AttributeError: 'str' object has no attribute 'decode'. Did you mean: 'encode'? on the second line with Python 3.10.4.
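
That error is expected on Python 3, where str is already Unicode text and only bytes objects have a decode method. In Python 2, calling encode on a byte string first decoded it implicitly as ASCII, which is what raised the UnicodeDecodeError in the question. A minimal sketch of the same decode-then-encode idea, assuming Python 3 and a made-up sample value:

```python
# Bytes as they would come from a file opened in "rb" mode (sample value).
raw = "Vázquez".encode("utf-8")

# Step 1: decode the UTF-8 bytes into a str (Unicode text).
text = raw.decode("utf-8")

# Step 2: encode the text as ASCII, silently dropping anything non-ASCII.
ascii_bytes = text.encode("ascii", "ignore")

print(ascii_bytes)  # b'Vzquez'
```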

Ignacio Vazquez-Abrams · Accepted Answer · 2010-11-28 23:23:07Z

    import codecs
    ...
    fichier = codecs.open(filePath, "r", encoding="utf-8")
    ...
    fichierTemp = codecs.open("tempASCII", "w", encoding="ascii", errors="ignore")
    fichierTemp.write(contentOfFile)
    ...
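
On Python 3 the built-in open accepts the same encoding and errors arguments, so codecs.open is not required. The sketch below assumes Python 3; the helper name convert_to_ascii and the sample lyric line are hypothetical, and the file names are borrowed from the question. It writes a small UTF-8 file to a temporary directory and converts it.

```python
import os
import tempfile


def convert_to_ascii(src_path, dst_path):
    """Copy a UTF-8 text file to dst_path, dropping non-ASCII characters."""
    with open(src_path, "r", encoding="utf-8") as src:
        content = src.read()
    # errors="ignore" applies when encoding on write as well as on read.
    with open(dst_path, "w", encoding="ascii", errors="ignore") as dst:
        dst.write(content)


with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "test.lrc")
    dst_path = os.path.join(tmp, "tempASCII")
    # Create a sample UTF-8 input file.
    with open(src_path, "w", encoding="utf-8") as f:
        f.write("[00:01.00] Déjà vu")
    convert_to_ascii(src_path, dst_path)
    with open(dst_path, "rb") as f:
        result = f.read()

print(result)  # b'[00:01.00] Dj vu'
```

Using with blocks closes both files even if reading or writing fails, which the explicit close() calls in the question do not guarantee.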

Tobu · Accepted Answer · 2010-11-28 23:26:59Z

6

UTF-8 is a superset of ASCII. Either your UTF-8 file is ASCII, or it can't be converted without loss.
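
One way to tell which of the two cases applies is to try decoding the raw bytes as ASCII. This is only a sketch, assuming Python 3; the helper name is_ascii is made up for illustration.

```python
def is_ascii(data: bytes) -> bool:
    """Return True if every byte in data is a valid ASCII character."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


# Pure ASCII text has the same bytes in UTF-8 and in ASCII.
print("abc".encode("utf-8") == "abc".encode("ascii"))  # True

print(is_ascii(b"plain text"))        # True
print(is_ascii("é".encode("utf-8")))  # False
```

If the check returns True, the file is already valid ASCII and needs no conversion.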


6 Comments

Ignacio Vazquez-Abrams Over a year ago

I think he's aware of that, otherwise he wouldn't be trying to use 'ignore'.

Tobu Over a year ago

@Ignacio True. But this one left me wondering what the asker is trying to achieve. They could be cargo-culting, or maybe their need is best met by something like urlencode, or being lossy is just acceptable.
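
For illustration of the urlencode idea, percent-encoding produces ASCII-only output that can be reversed without loss. A small sketch, assuming Python 3 and its standard urllib.parse module; the sample name is taken from this thread.

```python
from urllib.parse import quote, unquote

name = "Vázquez-Abrams"

# quote() encodes the text as UTF-8 and percent-escapes every non-ASCII byte.
encoded = quote(name)
print(encoded)  # V%C3%A1zquez-Abrams

# unquote() reverses the escaping, so nothing is lost.
print(unquote(encoded) == name)  # True
```

Whether such output is useful depends on the consumer: a device that displays the file verbatim would show the escape sequences.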

tchrist Over a year ago

I am afraid of the cargo-culting. Culling all characters that you don’t have an appreciation for is really insensitive.

tchrist Over a year ago

@Ignacio: Imagine being addressed as Vzquez-Abrams. :(

Ignacio Vazquez-Abrams Over a year ago

@tchrist: That's why I never use it.
