Eliminate accents in Python

Question

I have this function to remove accents from a word:

import string
import unicodedata

def remove_accents(word):
    return ''.join(x for x in unicodedata.normalize('NFKD', word) if x in string.ascii_letters)

But when I run it, it shows an error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xf3 in position 3: ordinal not in range(128)

The character in position 3 is: ó

Works for me if the input is a unicode string: remove_accents(u"foóbar") correctly returns u"foobar".
– L3viathan, Commented Apr 28, 2015 at 0:19
When I enter a character with an accent, it shows this: Unsupported characters in input
– Bastus BG, Commented Apr 28, 2015 at 0:27

L3viathan · Accepted Answer · 2015-04-28 00:24:08Z

If your input is a unicode string, it works:

>>> remove_accents(u"foóbar")
u'foobar'

If it isn't, it doesn't. I don't get the error you describe; I get a TypeError instead. I only get the UnicodeDecodeError if I try to convert the input to unicode:

>>> remove_accents(unicode("foóbar"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)

If that is your problem, i.e. you have Python 2 str objects as input, you can solve it by decoding them as utf-8 first:

>>> remove_accents("foóbar".decode("utf-8"))
u'foobar'
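The decode-first idea can also be folded into the function itself. What follows is only a sketch, written in Python 3 syntax, where the Python 2 str/unicode split becomes bytes/str. The encoding parameter and its utf-8 default are assumptions, not part of the original function:

```python
import string
import unicodedata


def remove_accents(word, encoding="utf-8"):
    """Strip accents from word; byte strings are decoded first.

    The encoding parameter is an assumption added for this sketch.
    """
    if isinstance(word, bytes):
        # Byte strings must become text before they can be normalized.
        word = word.decode(encoding)
    # NFKD splits an accented letter into a base letter plus a combining
    # mark. Keeping only ASCII letters then drops the mark, along with
    # anything else that is not an ASCII letter (digits, spaces, ...).
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if ch in string.ascii_letters)


print(remove_accents("foóbar"))                  # foobar
print(remove_accents("foóbar".encode("utf-8")))  # foobar
```

Note that the filter keeps ASCII letters only, so digits, spaces and punctuation are removed as well. That matches the function in the question, but may not be what you want for whole sentences.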

Yes, that is exactly my problem. I tried decoding it as utf-8, but it shows this error: UnicodeDecodeError: 'utf8' codec can't decode byte 0xf3 in position 3: unexpected end of data
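The byte 0xf3 in that message is a hint. In Latin-1 (ISO-8859-1) and Windows-1252, ó is the single byte 0xf3, while UTF-8 encodes it as the two bytes 0xc3 0xb3. So the input is probably not UTF-8 at all. Here is a rough sketch in Python 3 syntax, assuming the data is Latin-1 (the question does not say where the input comes from). The helper name decode_with_fallback and its encodings parameter are made up for illustration:

```python
def decode_with_fallback(raw, encodings=("utf-8", "latin-1")):
    """Return raw decoded with the first encoding that succeeds.

    Hypothetical helper: the candidate encodings are an assumption.
    UTF-8 goes first because it is strict; Latin-1 accepts any byte
    sequence, so it only makes sense as the last resort.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


print(decode_with_fallback(b"fo\xf3bar"))      # Latin-1 bytes -> foóbar
print(decode_with_fallback(b"fo\xc3\xb3bar"))  # UTF-8 bytes   -> foóbar
```

If you know the real encoding of your input (terminal, file, database), decoding with that directly is more reliable than guessing.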
