If I do:

    print "\xE2\x82\xAC"
    print len("€")
    print len(u"€")

I get:

    €
    3
    1

But if I do:

    print '\xf0\xa4\xad\xa2'
    print len("𤭢")
    print len(u"𤭢")

I get:

    𤭢
    4
    2

In the second example, the len() function returned 2 instead of 1 for the one-character unicode string u"𤭢".
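To narrow it down, here is a minimal sketch that counts the same two characters three different ways: UTF-8 bytes, UTF-16 code units, and code points. It assumes Python 3, where len() on a str counts code points, and the helper name `unit_counts` is my own, not something from a library.

```python
def unit_counts(text):
    """Return (UTF-8 bytes, UTF-16 code units, code points) for text."""
    utf8_bytes = len(text.encode("utf-8"))
    # Every UTF-16 code unit is 2 bytes; "utf-16-le" adds no byte order mark.
    utf16_units = len(text.encode("utf-16-le")) // 2
    code_points = len(text)  # Python 3 str length is in code points
    return utf8_bytes, utf16_units, code_points


print(unit_counts("\u20ac"))      # the euro sign, U+20AC   -> (3, 1, 1)
print(unit_counts("\U00024b62"))  # the character U+24B62   -> (4, 2, 1)
```

The 3 and 4 match what len() gave me for the byte strings, and the 1 and 2 match the UTF-16 code unit counts rather than the code point counts. That makes me suspect, without being sure, that my Python 2 interpreter is counting UTF-16 code units for unicode strings.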
Can someone explain to me why this is the case?