46

I'm trying to write a Python script that converts UTF-8 files into ASCII files:

    #!/usr/bin/env python
    # *-* coding: iso-8859-1 *-*

    import sys
    import os

    filePath = "test.lrc"
    fichier = open(filePath, "rb")
    contentOfFile = fichier.read()
    fichier.close()

    fichierTemp = open("tempASCII", "w")
    fichierTemp.write(contentOfFile.encode("ASCII", 'ignore'))
    fichierTemp.close()

When I run this script, I get the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 13: ordinal not in range(128)

I thought I could ignore such errors by passing the 'ignore' parameter to the encode method, but apparently not.

I'm open to other ways to convert.

3
  • The problem is that you never decode in the first place. Commented Nov 28, 2010 at 23:23
  • You got the error because the character doesn't exist in the ASCII character set, so it can't be converted. Sometimes you can map the UTF8 character to a closest visual-fit character in ASCII, such as é to e, but that can change the meaning of words. You have to decide if that path will work for your application. Commented Nov 28, 2010 at 23:24
  • This seems like a really bad idea!! Commented Nov 28, 2010 at 23:55
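The closest-fit mapping mentioned in the comments (é to e) can be approximated with the standard library. The following is a minimal Python 3 sketch, not something posted in this thread: the helper name `to_ascii_approx` is hypothetical. It decomposes each character with NFKD normalization and then drops whatever is still non-ASCII, so characters without a decomposition are lost entirely.

```python
import unicodedata


def to_ascii_approx(text: str) -> str:
    """Return an ASCII approximation of text (hypothetical helper).

    Accented letters are split into a base letter plus combining marks;
    the marks, and any other non-ASCII characters, are then dropped.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii_approx("Vázquez-Abrams"))  # accent removed, letter kept
print(to_ascii_approx("Søren"))           # "ø" has no decomposition and is dropped
```

As the comment warns, this can change the meaning of words, so it only suits cases where a lossy result is acceptable.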

3 Answers

71
    data = "UTF-8 DATA"
    udata = data.decode("utf-8")
    asciidata = udata.encode("ascii", "ignore")
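The snippet above is Python 2, where a plain string is a byte string. Calling `.encode()` directly on such a byte string makes Python 2 decode it as ASCII first, which is where the question's UnicodeDecodeError comes from. In Python 3 only `bytes` has `.decode()`, so a rough equivalent might look like the sketch below. The function name and sample data are made up for illustration.

```python
def utf8_bytes_to_ascii(data: bytes) -> bytes:
    """Decode UTF-8 bytes, then re-encode as ASCII, dropping the rest.

    Hypothetical helper; the two steps mirror the answer above.
    """
    udata = data.decode("utf-8")             # bytes -> str
    return udata.encode("ascii", "ignore")   # str -> ASCII bytes, lossy


# Sample input: in Python 3 a str must be encoded to get UTF-8 bytes.
raw = "Café ♪".encode("utf-8")
print(utf8_bytes_to_ascii(raw))
```

Every non-ASCII character is silently discarded, so the output can be shorter than the input.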

5 Comments

Sounds like a bad recipe for data loss.
You should expect data loss if you wish to convert from an 8-bit encoding to a 7-bit one.
I didn't realize that I had to decode first. It works now, thanks. To answer the questions: I want to do this because my MP3 player can only display lyrics files encoded in ASCII.
You can have a look at this solution: stackoverflow.com/a/517974/1463812
With Python 3.10.4, the second line fails with: AttributeError: 'str' object has no attribute 'decode'. Did you mean: 'encode'?
9
    import codecs
    ...
    fichier = codecs.open(filePath, "r", encoding="utf-8")
    ...
    fichierTemp = codecs.open("tempASCII", "w", encoding="ascii", errors="ignore")
    fichierTemp.write(contentOfFile)
    ...
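In Python 3 the built-in `open` accepts the same `encoding` and `errors` arguments, so the same idea might be written without `codecs`. This is a sketch under that assumption; the function name `convert_utf8_file_to_ascii` and the file names in the usage comment are illustrative only.

```python
def convert_utf8_file_to_ascii(src_path, dst_path):
    """Copy src_path to dst_path, dropping every non-ASCII character."""
    # Reading in text mode with an explicit encoding does the decoding.
    with open(src_path, "r", encoding="utf-8") as src:
        text = src.read()
    # errors="ignore" makes the ASCII encoder skip what it cannot encode.
    with open(dst_path, "w", encoding="ascii", errors="ignore") as dst:
        dst.write(text)


# Example call, using the file names from the question:
# convert_utf8_file_to_ascii("test.lrc", "tempASCII")
```

The `with` blocks close both files even if reading or writing raises an exception.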


6

UTF-8 is a superset of ASCII. Either your UTF-8 file is ASCII, or it can't be converted without loss.
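Because every ASCII byte is also valid UTF-8, one way to tell which case applies is to try decoding the raw bytes as ASCII. A minimal Python 3 sketch follows; the helper name `is_pure_ascii` is hypothetical.

```python
def is_pure_ascii(data: bytes) -> bool:
    """Return True if data contains only 7-bit ASCII bytes."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


print(is_pure_ascii(b"plain lyrics"))           # already ASCII, nothing to convert
print(is_pure_ascii("Café".encode("utf-8")))    # conversion would lose data
```

If the check passes, the file can be used as-is; otherwise any conversion to ASCII is lossy.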

6 Comments

I think he's aware of that, otherwise he wouldn't be trying to use 'ignore'.
@Ignacio True. But this one left me wondering what the asker is trying to achieve. They could be cargo-culting, or maybe their need is best met by something like urlencode, or being lossy is just acceptable.
I am afraid of the cargo-culting. Culling all characters that you don’t have an appreciation for is really insensitive.
@Ignacio: Imagine being addressed as Vzquez-Abrams. :(
@tchrist: That's why I never use it.