David X

UTF-16? Definitely harmful. Just my two cents here, but there are exactly three acceptable encodings for text in a program:

  • ASCII: when dealing with low-level things (e.g. microcontrollers) that can't afford anything better

  • UTF-8: storage in media built from fixed-width units (bytes), such as files

  • integer code points ("CP"?): an array of the largest integers that are convenient for your programming language and platform (decays to ASCII in the limit of low resources). Should be int32 on older computers and int64 on anything with 64-bit addressing.

  • Obviously, interfaces to legacy code use whatever encoding is needed to make the old code work right.
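
To make the split concrete, here is a rough Python sketch of the idea: UTF-8 bytes at the storage boundary, a plain list of integer code points in memory. The helper names `to_code_points` and `from_code_points` are made up for illustration, and a Python list stands in for the fixed-width int32/int64 array described above.

```python
def to_code_points(data):
    """Decode UTF-8 bytes (storage form) into a list of integer code points."""
    return [ord(ch) for ch in data.decode("utf-8")]


def from_code_points(code_points):
    """Encode a list of integer code points back into UTF-8 bytes for storage."""
    return "".join(chr(cp) for cp in code_points).encode("utf-8")


# "naive" with a diaeresis on the i, a space, then U+1D11E (musical G clef).
text = "na\u00efve \U0001D11E"
stored = text.encode("utf-8")

cps = to_code_points(stored)

# One list element per code point, so indexing never lands mid-character.
print(len(cps))                           # 7 code points
print(len(stored))                        # 11 UTF-8 bytes
print(len(text.encode("utf-16-le")) // 2) # 8 UTF-16 code units (one surrogate pair)

assert from_code_points(cps) == stored
```

The UTF-16 count is the one that bites: it matches the code point count for most text, then silently stops matching as soon as a character outside the Basic Multilingual Plane shows up.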