Converting unicode data in a dataframe to strings

Question

I am having trouble with a dataframe obtained by reading an xls file. Every value in the dataframe has the type 'unicode' and I can't do anything with it. I want to convert the values to str. Also, if possible, I'd like to know why this happens. I have heard something about 'external data', and I know that both the column names and the index also show the 'u' unicode prefix. I know almost nothing about encoding, and I would be really grateful if someone could also explain a bit about it.

I'm using Python 2, and I tried to solve it column by column with functions such as

.astype(str)
.astype(basestring)
.apply(str)

and

.str.decode('iso-8859-1').str.encode('utf-8')

(I read this last one here and just added it to my code to try something else). I also tried

unicodedata.normalize('NFKD', df_bolsa[l]).encode('ascii','ignore')

but this last one cannot be used with a Series. I hope someone can help me clarify this matter. Thank you very much in advance!
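`unicodedata.normalize` expects a single text value, which is why passing a whole Series fails. A minimal sketch of the per-element version, using a plain list as a stand-in for one column; the helper name `to_ascii` and the sample values are hypothetical, and dropping accents this way loses information:

```python
import unicodedata

def to_ascii(value):
    """Strip accents from one text value; return non-text values unchanged."""
    # type(u'') is `unicode` on Python 2 and `str` on Python 3.
    if not isinstance(value, type(u'')):
        return value
    # NFKD splits an accented character into base letter + combining mark;
    # encoding to ASCII with 'ignore' then drops the combining marks.
    decomposed = unicodedata.normalize('NFKD', value)
    # On Python 2, omit the final .decode('ascii') to get a byte `str`.
    return decomposed.encode('ascii', 'ignore').decode('ascii')

# Stand-in for one column of the dataframe (hypothetical sample values).
column = [u'Per\xfa', u'Bolsa', 42]
cleaned = [to_ascii(v) for v in column]
```

With pandas, the same function could be applied element by element through `Series.map`, for example `df_bolsa[l].map(to_ascii)`.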

Thank you, but I don't actually know how to apply that problem to mine. Anyway, I will read it tomorrow to try to understand something about encoding... thanks again!
– emilio.molina, Commented Feb 23, 2017 at 17:32

Jaroslav Bezděk · Accepted Answer · 2020-04-27 13:59:55Z

8

You can use the following code.

for column in df:
    df[column] = df[column].str.encode('utf-8')
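To make clear what the loop does to each value: `.str.encode('utf-8')` turns text (`unicode` on Python 2) into UTF-8 bytes, which is the type Python 2 calls `str`. A small sketch on a single value, with a made-up sample string:

```python
text = u'Per\xfa'               # text value (hypothetical sample)
encoded = text.encode('utf-8')  # bytes: `str` on Python 2, `bytes` on Python 3

# The accented character takes two bytes in UTF-8, so the lengths differ.
n_chars = len(text)
n_bytes = len(encoded)

# Decoding with the same codec recovers the original text.
round_trip = encoded.decode('utf-8')
```

As far as I know, the `.str` accessor only works on columns that hold text, so on a dataframe with numeric columns the loop may need to be restricted to the object-dtype columns.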

answered Apr 4, 2017 at 16:44 by emilio.molina; edited Apr 27, 2020 at 13:59 by Jaroslav Bezděk


MEdwin · Accepted Answer · 2020-07-23 10:37:41Z

To help others: this version worked for me. I was getting an error while loading my dataframe into an Oracle database: "UnicodeDecodeError: 'ascii' codec can't decode byte 0xea in position 2: ordinal not in range(128)"

I am on Python 2.7.

for column in df:
    df[column] = df[column].astype(str).str.decode('utf-8')
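The error quoted above is what Python 2 raises when a byte string is decoded with the default 'ascii' codec, which only covers bytes 0 to 127. A sketch that reproduces the failure explicitly and then names the codec instead; the sample bytes are an assumption, and the right codec depends on how the data was actually encoded:

```python
raw = b'caf\xc3\xa9'   # UTF-8 bytes for "café" (hypothetical sample)

# Decoding as ASCII fails because 0xc3 is outside range(128).
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Naming the codec the bytes were written in succeeds.
text = raw.decode('utf-8')
```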
