Convert string to Unicode characters in Python

Question

Is it possible to convert a string to unicode characters in Python?

For example: Hello => \u0048\u0065\u006C\u006C\u006F

Jongware · Accepted Answer · 2020-04-10 13:30:59Z

You can't. A string already consists of 'Unicode characters'. What you want to change is not its representation but its actual contents.

That, fortunately, is as simple as

import re

text = 'Hello'
print(re.sub('.', lambda x: r'\u%04X' % ord(x.group()), text))

which outputs

\u0048\u0065\u006C\u006C\u006F
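To make the difference between a string's representation and its contents concrete, here is a minimal sketch. The helper name `to_escapes` is hypothetical and not from the answer. The `\U%08X` branch for code points above U+FFFF is an assumption added for illustration, since a `\u` escape in Python takes exactly four hex digits.

```python
def to_escapes(text):
    """Replace every character with the literal text of its Unicode escape."""
    parts = []
    for char in text:
        code = ord(char)
        if code <= 0xFFFF:
            # Four hex digits are enough for the Basic Multilingual Plane.
            parts.append('\\u%04X' % code)
        else:
            # Assumption: use the eight-digit \U form for larger code points.
            parts.append('\\U%08X' % code)
    return ''.join(parts)


# In a normal (non-raw) literal, the escape is just another spelling of 'H'.
same = '\u0048' == 'H'

# The converted string holds six literal characters per input character.
escaped = to_escapes('Hello')

print(same)          # True
print(escaped)       # \u0048\u0065\u006C\u006C\u006F
print(len(escaped))  # 30
```

The length of 30 shows that the contents really changed: each input character became a backslash, a `u`, and four hex digits.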


7 Comments

Bravi Over a year ago

Thank you! What is this conversion called? I've been googling for almost an hour now. I would assume this has already been asked and been answered somewhere.

Jongware Over a year ago

It's a regular expression that replaces each character with its own Unicode string through a custom replacement function. But there are several other approaches possible – this is only one. The basic idea is that each character gets replaced with another string.

Bravi Over a year ago

I know it's a regular expression with a custom replacement function; I was just asking what converting the character H to \u0048 is called.

furas Over a year ago

print(''.join(r'\u{:04X}'.format(ord(char)) for char in 'Hello'))

Bravi Over a year ago

@furas I like that version too, looks pythonic. :)
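As a quick sanity check on the one-liner from the comments, the sketch below stores its result and decodes it back with Python's built-in `unicode_escape` codec. The variable names are illustrative, and the round trip is an addition for verification rather than part of the original comment. It assumes every character is at or below U+FFFF, so that four hex digits suffice.

```python
text = 'Hello'

# The join-based one-liner from the comments, stored instead of printed.
escaped = ''.join(r'\u{:04X}'.format(ord(char)) for char in text)

# The result is plain ASCII, so it can be encoded to bytes and then
# decoded with the unicode_escape codec to recover the original string.
restored = escaped.encode('ascii').decode('unicode_escape')

print(escaped)           # \u0048\u0065\u006C\u006C\u006F
print(restored == text)  # True
```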
