How do I convert a wchar_t value into its number in the Unicode table?
I have a variable:
    wchar_t znak;
    znak = getwchar();
When I type 'ą', how do I convert znak to 261? I need its number in the Unicode table.
ą U+0105 LATIN SMALL LETTER A WITH OGONEK
UTF-16: 0x0105
XML: &#261;
The C standard does not specify sizeof(wchar_t) or its encoding, so the answer depends on your system; you should state which one you are on.
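To see which case applies on a given system, the width of wchar_t can be checked directly. This is only a minimal sketch: the helper names wchar_bits and wchar_holds_any_code_point are made up for illustration, and the check reports the size only, not the encoding.

```c
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Number of bits in wchar_t on the current platform (hypothetical helper). */
static unsigned wchar_bits(void)
{
    return (unsigned)(sizeof(wchar_t) * CHAR_BIT);
}

/* Nonzero if one wchar_t is wide enough for every code point up to U+10FFFF.
   This says nothing about which encoding the platform actually uses. */
static int wchar_holds_any_code_point(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}
```

A 16-bit result means a single wchar_t cannot hold code points outside the BMP.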
On Linux, wchar_t is 32 bits and stores UTF-32 code points, which is a fixed-length encoding. You can use znak directly; no conversion is needed.
That said, you should first check whether UTF-8 and char would suit your task better (for this conversion UTF-32 is certainly more convenient, but your program might do more than that).
If you determine that UTF-8 is the better overall choice for your program, you can use mbstowcs to get UTF-32 code points out of your UTF-8 byte sequence. Note that mbstowcs decodes according to the current locale, so a UTF-8 locale must be selected with setlocale first.
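For illustration, here is what the UTF-8 to code point step looks like when done by hand. This is not mbstowcs but a locale-independent sketch of the same conversion; the function name utf8_decode and its return convention are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes one UTF-8 sequence from s (at most n bytes) into *out.
   Returns the number of bytes consumed, or 0 if the input is
   malformed, truncated, overlong, or encodes a surrogate. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *out)
{
    /* Smallest code point that legitimately needs a sequence of each length. */
    static const uint32_t min_value[] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t cp;

    if (n == 0) return 0;
    if (s[0] < 0x80)                { len = 1; cp = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
    else return 0;                  /* stray continuation or invalid lead byte */
    if (len > n) return 0;

    /* Each continuation byte contributes its low six bits. */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < min_value[len] || cp > 0x10FFFF) return 0;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;

    *out = cp;
    return len;
}
```

With the two bytes 0xC4 0x85 (the UTF-8 encoding of ą) this yields 0x105 and consumes both bytes.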
On Windows, wchar_t is 16 bits and stores UTF-16LE code units. For console I/O you are limited to UCS-2, though. The difference is that UTF-16 is not a fixed-length encoding: so-called surrogate pairs (albeit rare) allow the representation of non-BMP code points.
So in your case, since ą (U+0105) lies in the BMP, just using znak directly will work too.
For completeness' sake, though, here is a possible implementation from the UTF-16 Wikipedia article:
    u32 read_code_point_from_utf16()
    {
        u16 code_unit = getu16();
        if (code_unit >= 0xD800 && code_unit <= 0xDBFF) {
            u16 code_unit_2 = getu16();
            if (code_unit_2 >= 0xDC00 && code_unit_2 <= 0xDFFF)
                return (code_unit << 10) + code_unit_2 - 0x35FDC00;
            push_back(code_unit_2);
        }
        return code_unit;
    }

Finally, use sprintf(s, "&#%d;", znak) and sprintf(s, "0x%x", znak) to get it into the required base.
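The surrogate-pair routine above relies on getu16 and push_back, which are not defined here. A runnable variant that reads from an array instead might look like the sketch below; the name utf16_decode and its interface are assumptions for this example. As in the original, an unpaired surrogate is returned unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes one code point from units (n code units available) into *out.
   Returns the number of code units consumed (1 or 2), or 0 if n is 0.
   An unpaired surrogate is passed through as-is, as in the routine above. */
static size_t utf16_decode(const uint16_t *units, size_t n, uint32_t *out)
{
    uint32_t hi, lo;

    if (n == 0) return 0;
    hi = units[0];
    if (hi >= 0xD800 && hi <= 0xDBFF && n >= 2) {
        lo = units[1];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            /* 0x35FDC00 == (0xD800 << 10) + 0xDC00 - 0x10000 */
            *out = (hi << 10) + lo - 0x35FDC00;
            return 2;
        }
    }
    *out = hi;
    return 1;
}
```

For ą the input is the single unit 0x0105, so the result is 0x105 with one unit consumed; the pair 0xD83D 0xDE00 decodes to U+1F600.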
Use "0x%04x" instead of "0x%x" to get the zero-padded form 0x0105.
0x105 (base 16) is 261 (base 10).
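Putting the formatting advice together, a small sketch could look like this. The helper names format_xml_entity and format_hex are made up for illustration; snprintf is used in place of sprintf so the buffer size is respected, and the value is cast to unsigned long because the signedness of wchar_t differs between platforms.

```c
#include <stdio.h>
#include <wchar.h>

/* Writes the decimal XML character reference for c, e.g. "&#261;". */
static int format_xml_entity(char *buf, size_t size, wchar_t c)
{
    return snprintf(buf, size, "&#%lu;", (unsigned long)c);
}

/* Writes the zero-padded hexadecimal form of c, e.g. "0x0105". */
static int format_hex(char *buf, size_t size, wchar_t c)
{
    return snprintf(buf, size, "0x%04lx", (unsigned long)c);
}
```

Both helpers return the number of characters written, as snprintf does.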