Given a UTF-8 string like this:
mystring = "işğüı"
is it possible to get its (in-memory) size in bytes with Python (2.5)?
Assuming you mean the number of UTF-8 bytes (and not the extra bytes that Python requires to store the object), you get it the same way as the length of any other string: len(mystring). A string literal in Python 2.x is a string of encoded bytes, not Unicode characters, so its length is already a byte count.
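To illustrate the distinction between the encoded bytes and the extra bytes the object itself takes up, here is a minimal sketch. It assumes sys.getsizeof is available (it was added in Python 2.6, so it does not exist in 2.5), and the variable names are only illustrative. The exact overhead depends on the interpreter version and platform, so no specific figure is shown.

```python
import sys

# UTF-8 bytes of "işğüı", written with escapes so the source encoding does not matter
data = b"i\xc5\x9f\xc4\x9f\xc3\xbc\xc4\xb1"

payload_bytes = len(data)           # encoded bytes only
object_bytes = sys.getsizeof(data)  # encoded bytes plus the object's own overhead
overhead = object_bytes - payload_bytes

print("payload: %d, object: %d, overhead: %d" % (payload_bytes, object_bytes, overhead))
```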
Byte strings:
>>> mystring = "işğüı"
>>> print "length of {0} is {1}".format(repr(mystring), len(mystring))
length of 'i\xc5\x9f\xc4\x9f\xc3\xbc\xc4\xb1' is 9
Unicode strings:
>>> myunicode = u"işğüı"
>>> print "length of {0} is {1}".format(repr(myunicode), len(myunicode))
length of u'i\u015f\u011f\xfc\u0131' is 5
It’s good practice to maintain all of your strings in Unicode, and only encode when communicating with the outside world. In this case, you could use len(myunicode.encode('utf-8')) to find the size it would be after encoding.
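As a rough sketch of that last suggestion, the helper below (utf8_size is a hypothetical name, not a built-in) counts characters and encoded bytes side by side. It assumes the input is a Unicode string; in Python 2, passing a non-ASCII byte string would trigger an implicit ASCII decode and fail.

```python
def utf8_size(text):
    """Return the number of bytes `text` occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))

# Same characters as u"işğüı", written with escapes
myunicode = u"i\u015f\u011f\xfc\u0131"

char_count = len(myunicode)        # number of characters
byte_count = utf8_size(myunicode)  # number of UTF-8 bytes

print("%d characters, %d bytes" % (char_count, byte_count))
```

Here "i" takes one byte and each of the other four characters takes two, which is why the byte count is larger than the character count.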
In Python 3, count the encoded bytes rather than the characters:
len(bytes(u'計算機', 'utf8')) # returns 9
NOT
len(u'計算機') # returns 3
len(mystring). Otherwise, it turns into 'i\xc5\x9f\xc4\x9f\xc3\xbc\xc4\xb1': mystring[2:6]. Just putting this out there, as I am surprised as well.