utf-8 convert to utf-16

Question

I want to convert Chinese characters to the Unicode escape format, like '\uXXXX', but when I use str.encode('utf-16be'), I get this:

b'\xOO\xOO'

So I wrote some code to do what I want:

data="index=索引?"
print(data.encode('UTF-16LE'))

def convert(s):
    returnCode=[]
    temp=''
    for n in s.encode('utf-16be'):
        if temp=='':
            if str.replace(hex(n),'0x','')=='0':
                temp='00'
                continue
            temp+=str.replace(hex(n),'0x','')
        else:
            returnCode.append(temp+str.replace(hex(n),'0x',''))
            temp=''
    return returnCode

print(convert(data))

Can someone suggest how to do this conversion in Python 3.x?

Not sure what the problem is. UTF-16LE isn't Unicode, but it's what Microsoft calls "Unicode". Describe your goal, not your process.
– Ignacio Vazquez-Abrams, Commented Nov 26, 2013 at 9:09
"index=索引?".encode('utf-16be') gives b'\x00i\x00n\x00d\x00e\x00x\x00=}"_\x15\x00?'. What output did you want instead?
– lvc, Commented Nov 26, 2013 at 9:15
I want to convert the characters to the format '\uXXXX', like this: index=\u0069\u006e\u0064\u0065\u0078\u003d\u7d22\u5f15\u003f
– alvinshih, Commented Nov 27, 2013 at 1:36

erny · Accepted Answer · 2013-11-27 11:10:06Z

I'm not sure I understand you correctly.

Unicode is like a type. In Python 3, all strings are Unicode, so when you write data = "index=索引?", data is already Unicode. If you want an alternative representation just for display, you could use:

def display_unicode(data):
    return "".join(["\\u%s" % hex(ord(l))[2:].zfill(4) for l in data])

>>> data = "index=索引?"
>>> print(display_unicode(data))
\u0069\u006e\u0064\u0065\u0078\u003d\u7d22\u5f15\u003f

Note that the resulting string contains literal backslashes and hex digits, not the original Unicode characters.
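
One caveat with the ord()-based version above: a character outside the Basic Multilingual Plane has a code point above 0xFFFF, so it comes out with five hex digits instead of four. If the consumer expects UTF-16 code units (as Java and JavaScript \uXXXX escapes do), something along these lines may be closer to what is needed. This is only a sketch; the name to_u_escapes is made up for illustration.

```python
import struct

def to_u_escapes(text):
    # Hypothetical helper: render each UTF-16 code unit as a literal \uXXXX escape.
    raw = text.encode("utf-16-be")  # two bytes per code unit, big-endian, no BOM
    units = struct.unpack(">%dH" % (len(raw) // 2), raw)  # unsigned 16-bit integers
    return "".join("\\u%04x" % unit for unit in units)

print(to_u_escapes("index=索引?"))
# \u0069\u006e\u0064\u0065\u0078\u003d\u7d22\u5f15\u003f
```

Characters beyond U+FFFF are emitted as a surrogate pair (two escapes), because that is how UTF-16 represents them.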

But there are also other alternatives:

>>> data.encode('ascii', 'backslashreplace')
b'index=\\u7d22\\u5f15?'
>>> data.encode('unicode_escape')
b'index=\\u7d22\\u5f15?'
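
The two calls give the same bytes for this input, but they are not interchangeable in general. A small sketch of where they differ, and of decoding the result back (the sample string tricky is just an illustration):

```python
data = "index=索引?"

# unicode_escape is a real codec, so the escaped bytes can be decoded back.
escaped = data.encode("unicode_escape")
restored = escaped.decode("unicode_escape")
print(restored == data)  # True

# The difference shows up on characters that are already ASCII:
# unicode_escape also escapes backslashes and control characters, while
# the backslashreplace handler only touches what ASCII cannot encode.
tricky = "a\\b\n"  # four characters: a, backslash, b, newline
print(tricky.encode("unicode_escape"))             # b'a\\\\b\\n'
print(tricky.encode("ascii", "backslashreplace"))  # b'a\\b\n'
```

Neither call escapes plain ASCII letters, so if every character has to appear as \uXXXX (as in the asker's example), a hand-written loop is still needed.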

OP is almost certainly using Python 3 - see print being used as a function, and a b'' literal. Also, encoding of text files doesn't necessarily follow $LANG - IDEs and text editors often let you set it in their configuration, and have their own defaults.
Sorry, I didn't read the question correctly. Doesn't data.encode('ascii', 'backslashreplace') do the trick?

greg · Accepted Answer · 2013-11-26 09:07:57Z

Try to decode first, like: s.decode('utf-8').encode('utf-16be')?

The parens on print imply Python 3.x.
– Ignacio Vazquez-Abrams
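
To expand on that comment: in Python 3, str has no decode method, so the one-liner above only works when s is a bytes object. A minimal sketch of the bytes-to-bytes conversion, assuming the input really is UTF-8 (the variable names are illustrative):

```python
# Hypothetical input: UTF-8 bytes, e.g. as read from a file opened in binary mode.
utf8_bytes = "index=索引?".encode("utf-8")

# bytes -> str (decode), then str -> bytes (encode).
text = utf8_bytes.decode("utf-8")
utf16_bytes = text.encode("utf-16-be")  # big-endian, no byte order mark

print(utf16_bytes)
# b'\x00i\x00n\x00d\x00e\x00x\x00=}"_\x15\x00?'
```

This yields the raw UTF-16 bytes shown in the comments on the question, not the '\uXXXX' text the asker wanted.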
