Timeline for Transliterate wide-character input
Current License: CC BY-SA 4.0
13 events
| when | what | by | license | comment | |
|---|---|---|---|---|---|
| Mar 6 at 22:14 | comment | added | indi | Well, there are the std::mbrtoc32(), et al. functions for switching between the runtime char encoding and charN_t, but I’m honestly vague on their gotchas (if any). I know of no way to convert the runtime char encoding to UTF-8. The future seems to be a ranges-like transcoding interface, plus having std::format() (and std::print() by extension) “just work” with Unicode. So, in theory, you could just take your input string and `auto s32 = s \| std::uc::to_utf32 \| std::ranges::to<std::u32string>();`, work in Unicode, then `std::print("{}", s32);`. | |
| Mar 6 at 7:22 | vote | accept | Toby Speight | ||
| Mar 6 at 7:18 | comment | added | Toby Speight | @indi, sure the ISO 2022 encodings use 8-bit bytes - which are used to encode double-byte character sets (e.g. in JIS X 0208). So if you want to do anything useful with them (such as sorting, or drawing their glyphs), you need to use mbstowcs or similar. I do see that the use of wchar_t for UTF-16 is a Windows error, though. And we're gifted with functions (e.g. std::mbstowcs()) and objects (e.g. std::wcout) for standard wide characters, but none for Unicode characters using char32_t - that's disappointing. | |
| Mar 5 at 23:28 | comment | added | indi | Eh, it was overstating to say it was created for Windows, but it has never served any major real-world usage other than Windows. You mentioned ISO 2022, for example: that is a shift-style encoding that uses bytes (either 7 or 8, I can’t recall), so it would be done with char, not wchar_t. Whatever the ancient origins of wchar_t, its standardization has been entirely dictated by “what Windows needs”, and even Microsoft admits they have made it a “unique burden”. | |
| Mar 5 at 9:43 | comment | added | Toby Speight | Actually, I should have said 1970s, given that e.g. JIS X 0208 dates from 1978, and others are probably earlier (ISO 2022 was published in 1971). | |
| Mar 5 at 9:30 | comment | added | Toby Speight | BTW, @indi, how is the wide-character concept invented to accommodate Windows? My understanding is that Unix systems introduced wchar_t to support (mainly east Asian) character sets, later also used for Unicode code-points. Was Windows even a thing in the 1980s? | |
| Mar 2 at 23:39 | history | became hot network question | |||
| Mar 2 at 20:18 | comment | added | Toby Speight | @J_H, I cheated a bit, removing the accent from Γειά for a simpler test. | |
| Mar 2 at 19:04 | answer | added | G. Sliepen | timeline score: 9 | |
| Mar 2 at 18:52 | answer | added | J_H | timeline score: 5 | |
| Mar 2 at 18:32 | comment | added | J_H | Well, that's funny! I thank you for the very honest "known limitations" discussion. As I was reading that I thought to myself that I occasionally use tr -d X for some unwanted X, but far and away the most common invocation is tr A-Z a-z. And lo and behold, the "quick demo" immediately goes, in inconvenient form, to exactly that for Γεια σου κόσμο!. | |
| Mar 2 at 15:46 | history | edited | Toby Speight | CC BY-SA 4.0 | Include a demo; more entertaining test |
| Mar 2 at 15:33 | history | asked | Toby Speight | CC BY-SA 4.0 |
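
For readers following the std::mbrtoc32() remark in the comments, here is a minimal sketch of converting a string in the runtime (locale) char encoding to char32_t. The helper name `to_utf32` is hypothetical and not taken from the post. It assumes the input is complete and valid in the current C locale's encoding, and that a converted null character occupies one byte. The result is UTF-32 only on implementations where char32_t values are Unicode code points, which is the usual case.

```cpp
#include <cstddef>
#include <cuchar>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper: decode a multibyte string (current C locale
// encoding) into char32_t code units using std::mbrtoc32().
std::u32string to_utf32(std::string_view in)
{
    std::u32string out;
    std::mbstate_t state{};               // initial conversion state
    const char* p = in.data();
    const char* const end = p + in.size();

    while (p < end) {
        char32_t c;
        const std::size_t n =
            std::mbrtoc32(&c, p, static_cast<std::size_t>(end - p), &state);

        if (n == static_cast<std::size_t>(-1))
            throw std::runtime_error("invalid multibyte sequence");
        if (n == static_cast<std::size_t>(-2))
            throw std::runtime_error("truncated multibyte sequence");

        out.push_back(c);
        if (n == static_cast<std::size_t>(-3))
            continue;                     // extra output unit, no input consumed
        p += (n == 0) ? 1 : n;            // assumption: a null character is one byte
    }
    return out;
}
```

The conversion follows whatever locale is active, so a program would typically call `std::setlocale(LC_ALL, "")` before using this on non-ASCII input; in the default "C" locale only plain ASCII is reliably portable.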