Timeline for Should UTF-16 be considered harmful?
Current License: CC BY-SA 3.0
9 events
| when | what | | by | license | comment |
|---|---|---|---|---|---|
| Aug 13, 2015 at 16:25 | history | unlocked | Thomas Owens♦ | | |
| Aug 13, 2015 at 16:05 | history | locked | CommunityBot | | |
| Aug 13, 2015 at 15:47 | history | edited | user22815 | CC BY-SA 3.0 | Spelling, other minor improvements for readability. |
| Jun 19, 2014 at 5:05 | comment | added | musiphil | | Even though UTF-32 is fixed-width for code points, it is not fixed-width for characters. (Heard of something called "combining characters"?) So you can't go to the Nth character simply by indexing 4N into the byte array. |
| May 1, 2012 at 0:16 | comment | added | Qwertie | | Endianness issues are unavoidable as long as different processors continue to use different byte orders. However, it might have been nice if there were a "preferred" byte order for file storage of UTF-16. |
| Aug 18, 2011 at 21:32 | history | made wiki | Post Made Community Wiki | | |
| Aug 11, 2011 at 14:30 | comment | added | tchrist | | @Tronic: Technically, this is not true. Although UCS-4 can store any 32-bit integer, UTF-32 is forbidden from storing the non-character code points that are illegal for interchange, such as 0xFFFF, 0xFFFE, and all the surrogates. UTF is a transport encoding, not an internal one. |
| Oct 20, 2010 at 23:34 | comment | added | Tronic | | Unspecified endianness is supposed to include a BOM as the first character, used for determining which way the string should be read. UCS-4 and UTF-32 indeed are the same nowadays, i.e. a numeric UCS value between 0 and 0x10FFFF stored in a 32-bit integer. |
| Oct 19, 2010 at 7:06 | history | answered | Patrick Horgan | CC BY-SA 2.5 | |