
I have this function to remove accents from a word:

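```python
import string
import unicodedata

def remove_accents(word):
    # Decompose accented characters (NFKD), then keep only ASCII letters.
    return ''.join(x for x in unicodedata.normalize('NFKD', word) if x in string.ascii_letters)
```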
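To see why the function needs a Unicode string rather than raw bytes, it helps to look at what NFKD does. Below is a minimal sketch in Python 3 (where `str` is always Unicode, so the function runs unchanged): NFKD splits the precomposed "ó" (U+00F3) into a plain "o" followed by the combining acute accent U+0301, and the filter then keeps only the ASCII letters. One side effect worth knowing: spaces, digits, and punctuation are dropped as well, not just accents.

```python
import string
import unicodedata


def remove_accents(word):
    # Same logic as the function above: decompose, then keep ASCII letters only.
    return ''.join(x for x in unicodedata.normalize('NFKD', word)
                   if x in string.ascii_letters)


# NFKD decomposes "ó" (U+00F3) into "o" + U+0301 (combining acute accent).
decomposed = unicodedata.normalize('NFKD', 'ó')
print([hex(ord(c)) for c in decomposed])  # ['0x6f', '0x301']

print(remove_accents('foóbar'))           # foobar
# Non-letters are discarded too, not only the accents.
print(remove_accents('foó bar 42'))       # foobar
```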

But when I run it, I get this error:

```
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf3 in position 3: ordinal not in range(128)
```

The character at position 3 is ó.
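The byte value in the error is a useful clue. A quick Python 3 check shows that 0xf3 is how "ó" is encoded in Latin-1 and cp1252, while UTF-8 uses the two bytes 0xc3 0xb3. That suggests (but does not prove) that the input is in a single-byte encoding rather than UTF-8.

```python
# Compare how "ó" (U+00F3) is encoded in a few common encodings.
for encoding in ("utf-8", "latin-1", "cp1252"):
    encoded = "ó".encode(encoding)
    print(encoding, encoded.hex())
# utf-8 c3b3
# latin-1 f3
# cp1252 f3
```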

  • Works for me if the input is a unicode string: `remove_accents(u"foóbar")` correctly returns `u"foobar"`. Commented Apr 28, 2015 at 0:19
  • Do you think it could be because I'm working on a Mac? Commented Apr 28, 2015 at 0:23
  • No, I'm on a Mac, too. Commented Apr 28, 2015 at 0:23
  • When I enter an accented character, it shows this: "Unsupported characters in input" Commented Apr 28, 2015 at 0:27
  • How are you entering those characters? Commented Apr 28, 2015 at 0:28

1 Answer


If your input is a unicode string, it works:

```
>>> remove_accents(u"foóbar")
u'foobar'
```

If it isn't, it doesn't. I don't get the error you describe; I get a TypeError instead, and I only get the UnicodeDecodeError if I try to convert it to unicode like this:

```
>>> remove_accents(unicode("foóbar"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)
```
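Both failures can be reproduced in Python 3, where the byte-string type is `bytes` rather than `str`. In this sketch, passing bytes to `unicodedata.normalize` raises a `TypeError`, and decoding UTF-8 bytes with the ASCII codec (which is what Python 2's `unicode(...)` does with its default `ascii` encoding) fails on the first byte of "ó".

```python
import unicodedata

raw = "foóbar".encode("utf-8")  # b'fo\xc3\xb3bar', comparable to a Python 2 str

# 1. normalize() only accepts text, so bytes raise a TypeError.
try:
    unicodedata.normalize("NFKD", raw)
except TypeError as exc:
    print("TypeError:", exc)

# 2. The ASCII codec fails at the first non-ASCII byte (0xc3 at index 2).
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)
```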

If that is your problem, i.e. your input is a Python 2 str object, you can solve it by decoding it as UTF-8 first:

```
>>> remove_accents("foóbar".decode("utf-8"))
u'foobar'
```
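In Python 3 the same fix is to call `.decode("utf-8")` on the `bytes` before passing the result to the function. A minimal sketch, assuming the bytes really are UTF-8:

```python
import string
import unicodedata


def remove_accents(word):
    # Expects text (str), not bytes.
    return ''.join(x for x in unicodedata.normalize('NFKD', word)
                   if x in string.ascii_letters)


raw = b"fo\xc3\xb3bar"                      # UTF-8 bytes for "foóbar"
print(remove_accents(raw.decode("utf-8")))  # foobar
```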

2 Comments

Yes, that is exactly my problem. I tried decoding it as UTF-8, but I get this error: `UnicodeDecodeError: 'utf8' codec can't decode byte 0xf3 in position 3: unexpected end of data`
Just a wild guess, but try "cp1252" instead of "utf-8".
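The cp1252 guess fits the byte in the error: 0xf3 is "ó" in cp1252 (and Latin-1), but in UTF-8 it can only be the lead byte of a four-byte sequence, so the UTF-8 decoder rejects it. Here is a sketch of a "try UTF-8, then fall back" approach in Python 3. The helper name `to_text` and the choice of cp1252 as the fallback are assumptions for illustration; the right fallback depends on where the input actually comes from.

```python
import string
import unicodedata


def remove_accents(word):
    return ''.join(x for x in unicodedata.normalize('NFKD', word)
                   if x in string.ascii_letters)


def to_text(raw, fallback="cp1252"):
    # Hypothetical helper: try UTF-8 first, then a single-byte fallback encoding.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


latin = b"fo\xf3bar"       # "foóbar" encoded as cp1252 / Latin-1
utf8 = b"fo\xc3\xb3bar"    # "foóbar" encoded as UTF-8

print(remove_accents(to_text(latin)))  # foobar
print(remove_accents(to_text(utf8)))   # foobar
```

Note that cp1252 leaves a few byte values undefined, so the fallback can still raise; `latin-1` accepts every byte but may silently produce the wrong characters if the guess is wrong.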
