I have a pandas DataFrame that is read from an Excel file as input to my program.

I would like to replace some non-ASCII characters in the DataFrame.

    import pandas as pd

    XList=['Meßi','Ürik']
    YList=['01.01.1970','01.01.1990']
    df = pd.DataFrame({'X':XList, 'Y':YList})

          X           Y
    0  Meßi  01.01.1970
    1  Ürik  01.01.1990

I would like to create some replacement rules, e.g. ß -> ss and Ü -> UE,

and get this:

           X           Y
    0  Messi  01.01.1970
    1  UErik  01.01.1990

Note: I'm using Python 2.7.

UPDATE:

Solved using the answer below and setting up Eclipse as follows:

1°: Changing the text file encoding in Eclipse to UTF-8.

How to: How to use Special Chars in Java/Eclipse

2°: Adding this declaration as the first line of the script:

# -*- coding: UTF-8 -*- 

http://www.vogella.com/tutorials/Python/article.html
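
To check that both steps took effect, a minimal sketch like the following can help. It assumes the file is saved as UTF-8; the variable names are made up for illustration. If the encoding is set up correctly, each literal typed directly into the source should equal its escaped form.

```python
# -*- coding: UTF-8 -*-
# The declaration above must be on the first or second line of the file,
# and the file itself must be saved as UTF-8.

# Unicode literals typed directly with non-ASCII characters
# (illustrative names, not from the original post).
sharp_s = u'ß'
u_umlaut = u'Ü'

# With a correct setup, each literal is a single character that matches
# its escape sequence.
encoding_ok = (sharp_s == u'\u00DF') and (u_umlaut == u'\u00DC')
print(encoding_ok)
```

If the check fails or the script does not even parse, the editor is most likely still saving the file in a different encoding.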

1 Answer

One way would be to create a dict of replacement rules, iterate over its key/value pairs, and use str.replace:

    In [42]:
    repl_dict = {'ß':'ss', 'Ü':'UE'}
    for k,v in repl_dict.items():
        df.loc[df.X.str.contains(k), 'X'] = df.X.str.replace(pat=k,repl=v)
    df

    Out[42]:
           X           Y
    0  Messi  01.01.1970
    1  UErik  01.01.1990

EDIT

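For editors that can't save non-ASCII characters in the Python script, you can use Unicode escape sequences for the keys instead. Note that on Python 2.7 such escapes are only interpreted inside unicode literals, e.g. u'\u00DF':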

    In [72]:
    repl_dict = {'\u00DF':'ss', '\u00DC':'UE'}
    for k,v in repl_dict.items():
        df.loc[df.X.str.contains(k), 'X'] = df.X.str.replace(pat=k,repl=v)
    df

    Out[72]:
           X           Y
    0  Messi  01.01.1970
    1  UErik  01.01.1990
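
A variation on the same idea, as a minimal sketch: the rules can be stored as a table that maps code points to replacement strings and applied in one pass with the string `translate` method. The names `REPLACEMENTS` and `transliterate` are hypothetical, and the sketch assumes the values are unicode strings (always the case on Python 3; on Python 2.7 they would need to be `unicode` objects, not byte strings).

```python
# Hypothetical helper, not part of the answer above.
# Keys are code points; escapes keep the source file pure ASCII.
REPLACEMENTS = {
    ord(u'\u00DF'): u'ss',  # sharp s -> ss
    ord(u'\u00DC'): u'UE',  # U umlaut -> UE
}


def transliterate(text):
    """Return text with every mapped character replaced in a single pass."""
    return text.translate(REPLACEMENTS)


# Sample values matching column X from the question.
names = [u'Me\u00DFi', u'\u00DCrik']
converted = [transliterate(name) for name in names]
print(converted)
```

On the DataFrame from the question, the same function could be applied with `df['X'] = df['X'].map(transliterate)`, again assuming the column holds unicode strings. Adding a rule then only means adding one entry to the table.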

3 Comments

I can't type non-ASCII characters in my code in Eclipse. Do you know how I could rewrite this? repl_dict = {'ß':'ss', 'Ü':'UE'}
I suggest you use a different editor. I'm using IPython, but you should be able to save Unicode characters in your Python script. Otherwise, look up how to enable Unicode encoding in Eclipse.
Thank you! Problem solved with Eclipse. I updated it.
