- Use the fileinput module to loop over standard input or a list of files,
- decode the lines you read from UTF-8 to unicode objects,
- then map any unicode characters you desire with the translate method.
translit.py would look like this:
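    #!/usr/bin/env python2.6
    # -*- coding: utf-8 -*-

    import fileinput

    # Keys are Unicode code points; values are replacement strings.
    table = {
        0xe4: u'ae',          # ä
        ord(u'ö'): u'oe',
        ord(u'ü'): u'ue',
        ord(u'ß'): None,      # None deletes the character
    }

    for line in fileinput.input():
        s = line.decode('utf8')
        # The trailing comma keeps print from adding a second newline.
        print s.translate(table),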
And you could use it like this:
    $ cat utf8.txt
    sömé täßt
    sömé täßt
    sömé täßt
    $ ./translit.py utf8.txt
    soemé taet
    soemé taet
    soemé taet
In Python 3, strings are Unicode by default, so you don't need to decode them first, even if they contain non-ASCII or non-Latin characters. The solution then looks as follows:
    line = 'Verhältnismäßigkeit, Möglichkeit'

    table = {
        ord('ä'): 'ae',
        ord('ö'): 'oe',
        ord('ü'): 'ue',
        ord('ß'): 'ss',
    }

    line.translate(table)
    # 'Verhaeltnismaessigkeit, Moeglichkeit'
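For completeness, here is a minimal sketch of how the line-by-line approach might carry over to Python 3. The names `TABLE` and `transliterate_lines` are made up for this illustration, and the `io.StringIO` object only stands in for an opened UTF-8 text file. It uses `str.maketrans`, which accepts a dict keyed by single characters, so the `ord()` calls become unnecessary.

```python
import io

# str.maketrans converts single-character keys to code points for us.
TABLE = str.maketrans({'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'ß': 'ss'})


def transliterate_lines(lines):
    """Yield each line with the mapped characters replaced."""
    for line in lines:
        yield line.translate(TABLE)


# Stand-in for an opened UTF-8 text file; any iterable of str lines works.
sample = io.StringIO('sömé täßt\nsömé täßt\n')
for out in transliterate_lines(sample):
    # Lines keep their own newline, so suppress the one print would add.
    print(out, end='')
```

To process real files, the same generator could probably be fed from `fileinput.input(openhook=fileinput.hook_encoded('utf-8'))`, which decodes the named files as UTF-8.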