I found something really weird about Unicode. In my understanding, if I do u"" + "string", the result will be unicode, but why are their lengths different?
    print len(u'' + 'New York\u200b')
    14
    print type(u'' + 'New York\u200b')
    <type 'unicode'>
    print len(u'New York\u200b')
    9
    print type(u'New York\u200b')
    <type 'unicode'>

I also tried to get rid of \u200b, which I think is a Unicode character:
    text = u'New York\u200b'
    print text.encode('ascii', errors='ignore')
    New York

    text = u'' + 'New York\u200b'
    print text.encode('ascii', errors='ignore')
    New York\u200b

Again I got different results, and I am really confused! By the way, I am using Python 2.7; is it time to change to 3.3? Thanks in advance!
In u'' + 'New York\u200b', the second operand 'New York\u200b' is a byte string, not a unicode literal, and byte strings do not process the \u escape: the \u200b stays as the six literal ASCII characters \, u, 2, 0, 0, b. That is why the length is 14, and why encode('ascii', errors='ignore') keeps them; they are ordinary ASCII, so nothing gets ignored. In u'New York\u200b', the \u200b is parsed as a single character, U+200B ZERO WIDTH SPACE, so the length is 9 and the ASCII encode drops it. Both of your results are consistent once you see that the byte string never contained a real zero-width space.
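Here is a minimal sketch, assuming Python 2.7 as in the question. repr() makes visible what the parser actually produced, and decode('unicode_escape') is one possible way to turn the literal backslash sequence into the real character (an assumption about what you want, not the only fix):

    # Python 2.7
    s_bytes = 'New York\u200b'   # byte string: \u is NOT an escape here
    s_uni = u'New York\u200b'    # unicode literal: \u200b is ZERO WIDTH SPACE

    print repr(s_bytes)          # 'New York\\u200b' -> 14 characters
    print repr(u'' + s_bytes)    # u'New York\\u200b' -> implicit ASCII decode,
                                 #   the 6 literal characters survive
    print repr(s_uni)            # u'New York\u200b' -> 9 characters

    # If the byte string really holds a literal "\u200b" sequence,
    # the unicode_escape codec converts it into the actual character:
    print repr(s_bytes.decode('unicode_escape'))   # u'New York\u200b'
    print len(s_bytes.decode('unicode_escape'))    # 9

As for 3.3: in Python 3, mixing str and bytes in + raises a TypeError instead of silently decoding as ASCII, so this whole class of surprise goes away.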