By "cases" I mean uppercase, lowercase, and titlecase.
It seems many languages assume that there is a one-to-one correspondence between uppercase letters and lowercase letters, if the script that the letters belong to has a notion of "cases".
However, this is simply a false assumption. There are many scenarios demonstrating the failure of this assumption. To list a few:
1. The Turkish I Problem:
Most languages that adopted the Latin script have I and i as an uppercase-lowercase pair. Turkish doesn't have this pair; instead it pairs I with ı (dotless) and İ with i (dotted). Some other languages do the same, influenced by the Turkish orthography.
2. The Greek lowercase sigma:
The Greek letter sigma has uppercase Σ and lowercase σ as its default pair. However, at the end of a word, the lowercase σ is not used, and ς is used instead, so the correct lowercase of Σ depends on its position in the word.
3. The Greek subscript iota:
The Ancient Greek letter ᾼ is not an uppercase letter. It is a titlecase letter in its own right. As such, when ᾳ is converted to uppercase, the result must be ΑΙ, which consists of two codepoints.
4. Letters without precomposed uppercase counterpart:
For example, the Latin ligature ﬁ (a single codepoint) doesn't have a precomposed uppercase counterpart. Its uppercase counterpart is FI, consisting of two codepoints.
5. The Georgian script:
In the modern orthography of Georgian, the "uppercase" letters are simply unused, and only the "lowercase" letters are used. In fact, calling those "uppercase" and "lowercase" was simply a misnomer.
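Several of the cases above can be observed directly. The following is a minimal sketch in Python, chosen here only for illustration; it assumes a Python 3 build whose `str.upper()` and `str.lower()` apply the full Unicode case mappings. The helper name `case_report` is hypothetical.

```python
import unicodedata


def case_report():
    """Collect a few case mappings that are not one-to-one per codepoint."""
    return {
        # The ligature U+FB01 uppercases to the two codepoints "FI".
        "ligature_upper": "\ufb01".upper(),
        # U+1FB3 (alpha with subscript iota) uppercases to U+0391 U+0399.
        "iota_subscript_upper": "\u1fb3".upper(),
        # U+1FBC is classified as a titlecase letter (category "Lt").
        "titlecase_category": unicodedata.category("\u1fbc"),
        # A word-final capital sigma lowercases to final sigma U+03C2.
        "final_sigma_lower": "\u039f\u0394\u039f\u03a3".lower(),
        # An isolated capital sigma lowercases to the ordinary U+03C3.
        "lone_sigma_lower": "\u03a3".lower(),
        # No locale is consulted: "i" always uppercases to "I" here,
        # which is the wrong answer for Turkish text.
        "plain_i_upper": "i".upper(),
        # U+0130 lowercases to "i" plus a combining dot above (two codepoints).
        "dotted_capital_i_lower_length": len("\u0130".lower()),
    }


if __name__ == "__main__":
    for name, value in case_report().items():
        print(name, repr(value))
```

The results differ in length from their inputs in several entries, which is exactly what a per-character mapping table cannot express.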
These have a few consequences. First of all, since there is simply no one-to-one correspondence of case pairs as single codepoints, the case-converting functions cannot just map individual characters, but must operate on whole strings with a more sophisticated mechanism. Second of all, the Turkish I problem suggests that the case-converting functions must be able to refer to a locale as well.
Are there any other problems regarding cases? Is there a better way to solve such problems?
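To make the second consequence concrete, here is a rough sketch of what a locale parameter might look like. The functions `to_upper` and `to_lower` and the set `DOTTED_I_LANGUAGES` are hypothetical names invented for this illustration, not an existing API, and the handling is deliberately simplified: it only swaps the dotted and dotless I pairs before delegating to Python's built-in string-level mappings, and it ignores combining dots.

```python
# Language codes assumed to use the dotted/dotless I pairs (illustrative only).
DOTTED_I_LANGUAGES = {"tr", "az"}


def _language(locale: str) -> str:
    """Extract the language part of a tag such as 'tr_TR' or 'tr-TR'."""
    return locale.replace("_", "-").split("-")[0].lower()


def to_upper(text: str, locale: str = "") -> str:
    """Uppercase a whole string, pairing i with U+0130 for Turkic locales."""
    if _language(locale) in DOTTED_I_LANGUAGES:
        text = text.replace("i", "\u0130")
    # String-to-string conversion, so one codepoint may become several.
    return text.upper()


def to_lower(text: str, locale: str = "") -> str:
    """Lowercase a whole string, pairing I with U+0131 for Turkic locales."""
    if _language(locale) in DOTTED_I_LANGUAGES:
        text = text.replace("\u0130", "i").replace("I", "\u0131")
    return text.lower()


if __name__ == "__main__":
    print(to_upper("istanbul"))           # default mapping
    print(to_upper("istanbul", "tr_TR"))  # Turkish mapping
    print(to_lower("ISTANBUL", "tr"))     # Turkish mapping
```

The point of the sketch is the signature rather than the body: the functions take and return strings, not single characters, and the locale is an explicit argument rather than hidden global state.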
toupper()? Or maybe give a concrete example where it is a problem.