Timeline for C++ UTF-8 decoder
Current License: CC BY-SA 4.0
7 events
| when | what | | by | license | comment |
|---|---|---|---|---|---|
| Sep 12, 2023 at 7:16 | comment | added | Toby Speight | | @Davislor, that's hideous - every day I find a new reason to be glad I never have to support Microsoft platforms! |
| Sep 9, 2023 at 17:45 | comment | added | Davislor | | Note that wchar_t is only 16 bits wide on MSVC (even though that violates the Standard). The best type for the return value is char32_t. |
| Apr 27, 2023 at 17:22 | comment | added | Tau | | mbrtowc and related functions (besides bringing with them significant performance overhead over the straightforward UTF-8 decoder) are extremely inadvisable simply because of their dependence on global locale. You had the foresight to try and set that in main(), but consider that 1. this is not possible when writing a library, 2. you might be forced to use a library that is itself stupidly locale-dependent and 3. "en_US.utf8" might not even exist on your target machine, in which case you're just completely hosed. |
| Jan 8, 2023 at 10:45 | comment | added | Toby Speight | | Yes, that's true. It's not clear why the review code wants to deal with a codepoint at a time, rather than simply transforming an entire string to UCS-4. And of course, codepoints aren't always complete in themselves if combining characters are involved... |
| Jan 8, 2023 at 4:43 | comment | added | Dwayne Robinson | | Calling locale or mbrtowc for every single character is much overhead for a transformation that does not (and should not ever) rely on the current locale. Definitely agree with returning char32_t though. |
| Apr 6, 2021 at 14:31 | vote | accept | KlemenPl | | |
| Apr 6, 2021 at 13:14 | history | answered | Toby Speight | CC BY-SA 4.0 | |
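
The comments above argue for a decoder that does not depend on the global locale and that returns char32_t. A minimal sketch of that idea follows. The names `decode_utf8` and `to_utf32` are hypothetical and do not come from the reviewed code or the answer, and reporting errors through `std::optional` is an assumption rather than anything the commenters specified. It rejects overlong forms, surrogates, and values above U+10FFFF, and it does nothing about combining characters.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: decode one code point starting at s[pos].
// On success, advances pos past the sequence and returns the code point.
// On malformed or truncated input, returns std::nullopt and leaves pos unchanged.
std::optional<char32_t> decode_utf8(std::string_view s, std::size_t& pos)
{
    if (pos >= s.size()) return std::nullopt;

    const auto b0 = static_cast<unsigned char>(s[pos]);
    std::size_t len = 0;   // total bytes in this sequence
    char32_t cp = 0;       // code point being assembled
    char32_t min = 0;      // smallest value allowed for this length (rejects overlong forms)

    if (b0 < 0x80)                { len = 1; cp = b0;        min = 0; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
    else return std::nullopt;     // stray continuation byte or invalid lead byte

    if (s.size() - pos < len) return std::nullopt;  // truncated sequence

    for (std::size_t i = 1; i < len; ++i) {
        const auto b = static_cast<unsigned char>(s[pos + i]);
        if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }

    // Reject overlong encodings, surrogates, and values beyond U+10FFFF.
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return std::nullopt;

    pos += len;
    return cp;
}

// Hypothetical helper: transform a whole UTF-8 string to UTF-32 in one pass,
// returning std::nullopt if any sequence is malformed.
std::optional<std::u32string> to_utf32(std::string_view s)
{
    std::u32string out;
    std::size_t pos = 0;
    while (pos < s.size()) {
        const auto cp = decode_utf8(s, pos);
        if (!cp) return std::nullopt;
        out.push_back(*cp);
    }
    return out;
}
```

Neither function calls setlocale, mbrtowc, or any other locale-dependent facility, so the sketch can be used from library code without touching global state.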