9 events
when what by license comment
Aug 13, 2015 at 16:25 history unlocked Thomas Owens
Aug 13, 2015 at 16:05 history locked CommunityBot
Aug 13, 2015 at 15:47 history edited user22815 CC BY-SA 3.0
Spelling, other minor improvements for readability.
Jun 19, 2014 at 5:05 comment added musiphil Even though UTF-32 is fixed-width for code points, it is not fixed-width for characters. (Heard of something called "combining characters"?) So you can't go to the N'th character simply by indexing 4N into the byte array.
May 1, 2012 at 0:16 comment added Qwertie Endianness issues are unavoidable as long as different processors continue to use different byte orders. However, it might have been nice if there were a "preferred" byte order for file storage of UTF-16.
Aug 18, 2011 at 21:32 history made wiki Post Made Community Wiki
Aug 11, 2011 at 14:30 comment added tchrist @Tronic: Technically, this is not true. Although UCS-4 can store any 32-bit integer, UTF-32 is forbidden from storing the non-character code points that are illegal for interchange, such as 0xFFFF, 0xFFFE, and all the surrogates. UTF is a transport encoding, not an internal one.
Oct 20, 2010 at 23:34 comment added Tronic Unspecified endianness is supposed to include a BOM as the first character, used for determining which way the string should be read. UCS-4 and UTF-32 indeed are the same nowadays, i.e. a numeric UCS value between 0 and 0x10FFFF stored in a 32-bit integer.
Oct 19, 2010 at 7:06 history answered Patrick Horgan CC BY-SA 2.5
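
The points in the comments above about combining characters (musiphil) and the BOM (Tronic) can be illustrated with a small sketch. This is only an illustration using Python's standard codecs, not anything taken from the answer itself; the helper name `utf32_units` and the sample strings are made up for the example.

```python
import codecs
import unicodedata

def utf32_units(text: str) -> int:
    """Count the 32-bit code units in the UTF-32 encoding of text (no BOM)."""
    return len(text.encode("utf-32-le")) // 4

composed = "\u00e9"     # "é" as a single precomposed code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

# Both render as one character, but they occupy a different number of code units.
print(utf32_units(composed), utf32_units(decomposed))

# Indexing 4*N bytes into UTF-32 data finds the N'th code point,
# which is not necessarily the N'th user-perceived character.
word = "cafe\u0301s"  # six code points, five perceived characters
raw = word.encode("utf-32-le")
n = 4
unit = raw[4 * n : 4 * n + 4].decode("utf-32-le")
print(unicodedata.name(unit))  # the combining mark, not the letter "s"

# With unspecified endianness, Python's "utf-32" codec writes a BOM first
# so that a reader can tell which byte order was used.
with_bom = "a".encode("utf-32")
print(with_bom[:4] in (codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE))
```

Finding the N'th user-perceived character would need grapheme cluster segmentation on top of the decoding, which the standard library does not provide directly.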