Why can Python iterate over strings of UTF-8 but not emoticons?

Consider this:

    s = u"おはよう"
    print len(s)
    for c in s:
        print c

The output is

    4
    お
    は
    よ
    う

which is what I expect.

Now the same code with an emoji:

    s = u"hi 🏈"

Output is

    5
    h
    i
     
    ????
    ????

Why is that? How can I fix it? I have looked at various links before but can't get my head around it. Ideally I would like a solution that works for both Japanese and emoticons, but if it works only for ASCII and emoticons, I'm fine with that too.

asked Feb 8, 2017 at 12:42

Thomas


Might be a version issue. It works fine in Python 3.5.

– Mohammad Athar, commented Feb 8, 2017 at 12:44

It sounds like you have a narrow build. Please see "Python returns length of 2 for single Unicode character string" for more info.

– PM 2Ring, commented Feb 8, 2017 at 12:48
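
To illustrate the narrow-build point: a narrow Python 2 build stores text as UTF-16 code units, so a character outside the Basic Multilingual Plane, such as 🏈 (U+1F3C8), occupies two units (a surrogate pair), while each of the Japanese characters occupies one. The following is only a sketch, written for Python 3, that reproduces those counts by encoding to UTF-16; the helper name `utf16_units` is hypothetical and not part of any library.

```python
import sys


def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


japanese = "おはよう"
emoji = "hi \U0001F3C8"  # "hi " followed by the football emoji

print(len(utf16_units(japanese)))                 # 4
print(len(utf16_units(emoji)))                    # 5
print([hex(u) for u in utf16_units(emoji)[-2:]])  # ['0xd83c', '0xdfc8']

# 0x10ffff on Python 3.3+; 0xffff in Python 2 would indicate a narrow build.
print(hex(sys.maxunicode))
```

The two trailing units are the halves of the surrogate pair, which would explain why the loop in the question prints two unreadable items instead of one emoji.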

Anyway, the advice is to upgrade to Python 3.5 or 3.6. There is no need to use a version as ancient as Python 2.7 for this kind of work, doubly so when you keep in mind that easier handling of Unicode is one of the strengths of the Python 3.x series.

– jsbueno, commented Feb 8, 2017 at 12:52

I have installed Python 3.x and it works fine. It took me forever to find a good reason to make the switch. Thanks, guys.

– Thomas, commented Feb 8, 2017 at 12:54

Well done, Thomas! It'll take you a little while to get used to the differences, but once you do, you'll wonder how you ever tolerated Python 2's string / Unicode madness. :)

– PM 2Ring, commented Feb 8, 2017 at 12:57
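
For reference, the behavior the comments describe after switching can be sketched in Python 3, where `str` is indexed by code point, so both the Japanese string and the emoji string iterate one character at a time. The helper name `describe` is hypothetical and exists only for this illustration.

```python
def describe(text):
    """Return the length of text and the characters Python 3 iterates over."""
    return len(text), list(text)


for sample in ("おはよう", "hi \U0001F3C8"):
    length, chars = describe(sample)
    print(length)
    for c in chars:
        print(c)
```

Both strings report a length of 4 here. Note that this counts code points, not visible glyphs: emoji built from several code points (skin-tone modifiers, ZWJ sequences) would still iterate as more than one item.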
