Timeline for Transliterate wide-character input

Current License: CC BY-SA 4.0

13 events

when	what		by	license	comment
Mar 6 at 22:14	comment	added	indi		Well, there are `std::mbrtoc32()` et al. for converting between the runtime `char` encoding and `charN_t`, but I’m honestly vague on their gotchas (if any). I know of no way to convert the runtime `char` encoding to UTF-8. The future seems to be a ranges-like transcoding interface, plus having `std::format()` (and `std::print()` by extension) “just work” with Unicode. So, in theory, you could just take your input string and `auto s32 = s | std::uc::to_utf32 | std::ranges::to<std::u32string>();`, work in Unicode, then `std::print("{}", s32);`.
Mar 6 at 7:22	vote	accept	Toby Speight
Mar 6 at 7:18	comment	added	Toby Speight		@indi, sure, the ISO 2022 encodings use 8-bit bytes, but those bytes are used to encode double-byte character sets (e.g. JIS X 0208). So if you wanted to do anything useful with them (such as sorting, or drawing their glyphs), you needed to use `mbstowcs` or similar. I do see that the use of `wchar_t` for UTF-16 is a Windows error, though. And we're gifted with functions (e.g. `std::mbstowcs()`) and objects (e.g. `std::wcout`) for standard wide characters, but none for Unicode characters using `char32_t`; that's disappointing.
Mar 5 at 23:28	comment	added	indi		Eh, it was overstating to say it was created for Windows, but it has never seen any major real-world use other than on Windows. You mentioned ISO 2022, for example: that is a shift-style encoding that uses bytes (either 7-bit or 8-bit, I can’t recall), so it would be done with `char`, not `wchar_t`. Whatever the ancient origins of `wchar_t`, its standardization has been entirely dictated by “what Windows needs”, and even Microsoft admits they have made it a “unique burden”.
Mar 5 at 9:43	comment	added	Toby Speight		Actually, I should have said 1970s, given that e.g. JIS X 0208 dates from 1978, and others are probably earlier (ISO 2022 was published in 1971).
Mar 5 at 9:30	comment	added	Toby Speight		BTW, @indi, how was the wide-character concept invented to accommodate Windows? My understanding is that Unix systems introduced `wchar_t` to support (mainly east Asian) character sets, later also used for Unicode code points. Was Windows even a thing in the 1980s?
Mar 2 at 23:39	history	became hot network question
Mar 2 at 20:18	comment	added	Toby Speight		@J_H, I cheated a bit, removing the accent from Γειά for a simpler test.
Mar 2 at 19:04	answer	added	G. Sliepen		timeline score: 9
Mar 2 at 18:52	answer	added	J_H		timeline score: 5
Mar 2 at 18:32	comment	added	J_H		Well, that's funny! I thank you for the very honest "known limitations" discussion. As I was reading it, I thought to myself that I occasionally use `tr -d X` for some unwanted X, but far and away the most common invocation is `tr A-Z a-z`. And lo and behold, the "quick demo" immediately goes straight to exactly that, in inconvenient form, for "Γεια σου κόσμο!".
Mar 2 at 15:46	history	edited	Toby Speight	CC BY-SA 4.0	Include a demo; more entertaining test
Mar 2 at 15:33	history	asked	Toby Speight	CC BY-SA 4.0
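The `std::mbrtoc32()` route mentioned in indi's first comment can be sketched as below. This is a minimal illustration, not code from the question or its answers; the helper name `to_u32` is hypothetical, and the sketch assumes the program has already called `std::setlocale()` and that an encoded null character occupies a single byte (true of common encodings). Most of the gotchas live in the return value: `(size_t)-1` means invalid input, `(size_t)-2` an incomplete character, `(size_t)-3` an extra output unit that consumed no input, and `0` a decoded null character rather than a byte count.

```cpp
#include <cassert>
#include <clocale>
#include <cstddef>
#include <cuchar>
#include <cwchar>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper: decode a string in the current C locale's multibyte
// encoding (the "runtime char encoding") into a sequence of char32_t.
// Call std::setlocale() first; in the default "C" locale only ASCII is safe.
// Assumption: an encoded null character occupies exactly one byte.
std::u32string to_u32(std::string_view in)
{
    std::u32string out;
    std::mbstate_t state{};          // start in the initial shift state
    const char* p = in.data();
    std::size_t left = in.size();

    while (left > 0) {
        char32_t c32 = 0;
        std::size_t rc = std::mbrtoc32(&c32, p, left, &state);

        if (rc == static_cast<std::size_t>(-1)) {
            // Encoding error: these bytes are not valid in this locale.
            throw std::runtime_error("invalid multibyte sequence");
        }
        if (rc == static_cast<std::size_t>(-2)) {
            // Incomplete character at end of input (e.g. a lone UTF-8 lead byte).
            throw std::runtime_error("truncated multibyte sequence");
        }
        if (rc == static_cast<std::size_t>(-3)) {
            // A further char32_t from an earlier multi-unit character;
            // no input bytes were consumed by this call.
            out.push_back(c32);
            continue;
        }
        if (rc == 0) {
            // A null character was decoded; mbrtoc32 reports 0 instead of
            // a byte count, so advance one byte (see assumption above).
            rc = 1;
        }
        assert(rc <= left);          // mbrtoc32 never reads past the n bytes given
        out.push_back(c32);
        p += rc;
        left -= rc;
    }

    // Reject input that ends in a non-initial shift state (stateful encodings).
    if (!std::mbsinit(&state))
        throw std::runtime_error("input ends in a non-initial shift state");
    return out;
}
```

In a UTF-8 locale, `to_u32("\xCE\xB3")` yields `U"\u03B3"` (γ), while a lone `"\xCE"` throws. Throwing on `-1` and `-2` is just one choice; a `tr`-like filter might prefer to substitute U+FFFD and carry on.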