
How do I convert a wchar_t value into its number in the Unicode table?

I have a variable:

wchar_t znak; znak=getwchar(); 

When I type 'ą', how do I convert znak to 261? I need its number in the Unicode table.

ą U+0105 LATIN SMALL LETTER A WITH OGONEK

UTF-16: 0x0105

XML: &#261;

    0x105 (base-16) is 261 (base-10). Commented Apr 18, 2015 at 7:51

1 Answer


The standard doesn't specify sizeof(wchar_t) (or its encoding), so you should state what system you are on.

Assuming *nix (Linux, BSD, OSX etc.)

wchar_t is 32 bits and stores UTF-32 code points. UTF-32 is a fixed-length encoding, so you can use znak directly with no conversion needed.

You should first check, though, whether UTF-8 and char would suit your task better (for this conversion UTF-32 is certainly more convenient, but your program might do more than that).

If you determine that UTF-8 is the better overall choice for your program, you can use mbstowcs to get a UTF-32 code point out of your UTF-8 byte sequence.
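As a rough sketch of the mbstowcs route, the helper below converts only the first character of a multibyte string and returns its wchar_t value. The name first_code_point is made up for illustration, and the result is a Unicode code point only under the assumption stated above (a *nix system where wchar_t holds UTF-32).

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: returns the first character of the multibyte
   string s as a number, or -1 if s is empty or is not valid in the
   current locale's encoding. */
long first_code_point(const char *s)
{
    wchar_t wide[2] = {0};

    /* Convert at most one wide character from s. */
    size_t n = mbstowcs(wide, s, 1);
    if (n == (size_t)-1 || n == 0)
        return -1;

    return (long)wide[0];
}
```

mbstowcs decodes according to the current locale, so for UTF-8 input the program would need to call setlocale(LC_ALL, "") (from <locale.h>) first, with the environment set to a UTF-8 locale. Under those assumptions, first_code_point("ą") should give 261.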

Assuming Windows

wchar_t is 16 bits and stores UTF-16LE code units. For console I/O you are limited to UCS-2, though. The difference is that UTF-16 is not a fixed-length encoding: so-called surrogate pairs (albeit rare) allow the representation of non-BMP code points.

So in your case, since ą (U+0105) is in the BMP, just using znak directly will work too.

For completeness' sake, though, here is a possible implementation from the UTF-16 Wikipedia article:

u32 read_code_point_from_utf16()
{
    u16 code_unit = getu16();
    if (code_unit >= 0xD800 && code_unit <= 0xDBFF) {
        u16 code_unit_2 = getu16();
        if (code_unit_2 >= 0xDC00 && code_unit_2 <= 0xDFFF)
            return (code_unit << 10) + code_unit_2 - 0x35FDC00;
        push_back(code_unit_2);
    }
    return code_unit;
}
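The snippet above depends on helpers (getu16, push_back) and types (u16, u32) that are not defined here. A self-contained variant of the same arithmetic, taking the two code units as arguments, might look like the sketch below; the function name combine_surrogates is an assumption for illustration, not part of any library.

```c
#include <stdint.h>

/* Hypothetical helper: combines a UTF-16 high/low surrogate pair into
   one code point. If the two units do not form a valid pair, the first
   unit is returned unchanged (it is then a BMP code point or an
   unpaired surrogate). */
uint32_t combine_surrogates(uint16_t high, uint16_t low)
{
    if (high >= 0xD800 && high <= 0xDBFF &&
        low  >= 0xDC00 && low  <= 0xDFFF) {
        /* 10 bits come from each unit, offset by 0x10000. */
        return 0x10000u
             + (((uint32_t)high - 0xD800u) << 10)
             + ((uint32_t)low - 0xDC00u);
    }
    return high;
}
```

For example, the pair 0xD83D, 0xDE00 combines to U+1F600, while a BMP unit such as 0x0105 passes through unchanged.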

Finally, use sprintf(s, "&#%d;", znak) and sprintf(s, "0x%x", znak) to get it into the required base.
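A minimal sketch of that formatting step, using snprintf to guard the buffer sizes. The function name format_code_point and its parameters are made up for illustration. The value is cast to unsigned long because the exact type of wchar_t varies between platforms, and the hex form is zero-padded to four digits to match the usual U+XXXX style.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: writes c as a decimal XML character reference
   into entity and as zero-padded hex into hex. */
void format_code_point(wchar_t c,
                       char *entity, size_t entity_size,
                       char *hex, size_t hex_size)
{
    /* Cast so the argument matches the %lu / %lx conversions. */
    unsigned long cp = (unsigned long)c;

    snprintf(entity, entity_size, "&#%lu;", cp);
    snprintf(hex, hex_size, "0x%04lx", cp);
}
```

Called with the value 0x105 and two 16-byte buffers, this fills them with "&#261;" and "0x0105".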


1 Comment

Minor: "0x%04x" instead of "0x%x".
