In Kotlin/JVM, strings are encoded in UTF-16. (To be precise, they may be encoded internally using Latin1 but they still behave externally as if they're encoded in UTF-16.) This means that they're made up of 16-bit Char values (UTF-16 code units), so a code point above U+FFFF takes up two of them (a surrogate pair). To get the actual Unicode code points, including those above U+FFFF, you can use Java's codePoints() method:
    val s = "Hëllø! € 😀"
    for (cp in s.codePoints()) {
        println("${buildString { appendCodePoint(cp) }} $cp")
    }
Output:
    H 72
    ë 235
    l 108
    l 108
    ø 248
    ! 33
      32
    € 8364
      32
    😀 128512
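To see the difference between UTF-16 code units and code points directly, here is a small JVM-only sketch. It compares length (which counts Char values) with codePointCount() for a single emoji:

```kotlin
fun main() {
    val s = "😀" // U+1F600, outside the Basic Multilingual Plane

    println(s.length)                      // 2: two UTF-16 code units
    println(s.codePointCount(0, s.length)) // 1: one Unicode code point
    println(s[0].isHighSurrogate())        // true: first half of a surrogate pair
}
```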
However, be aware of the presence of combining characters, where multiple Unicode code points are used to make a single grapheme. If you want to support combining characters, then my answer will not help: you will need to look at Sweeper's answer in this question instead.
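To illustrate the problem, here is a minimal JVM-only sketch using java.text.BreakIterator, which groups a base character with the combining marks that follow it. The graphemes helper is a hypothetical name of mine, and this is not necessarily the approach taken in Sweeper's answer. How emoji sequences are split may also depend on your JDK version, so treat it as a starting point only.

```kotlin
import java.text.BreakIterator

// Hypothetical helper: splits text at the character boundaries reported by BreakIterator.
fun graphemes(text: String): List<String> {
    val boundaries = BreakIterator.getCharacterInstance()
    boundaries.setText(text)
    val result = mutableListOf<String>()
    var start = boundaries.first()
    var end = boundaries.next()
    while (end != BreakIterator.DONE) {
        result += text.substring(start, end)
        start = end
        end = boundaries.next()
    }
    return result
}

fun main() {
    val s = "e\u0301x" // 'e' + COMBINING ACUTE ACCENT, followed by 'x'
    println(s.codePointCount(0, s.length)) // 3 code points
    println(graphemes(s).size)             // 2 user-perceived characters
}
```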
Unfortunately, on other platforms, Kotlin currently doesn't make it easy to handle Unicode. See this discussion for a list of currently open issues.
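Until that improves, one workaround is to decode surrogate pairs by hand. The sketch below uses only Char.isHighSurrogate(), Char.isLowSurrogate() and Char.code, which I believe are available in the common standard library. codePointsCompat is a hypothetical name, not an existing API, and unpaired surrogates are simply passed through as-is.

```kotlin
// Hypothetical helper: returns the Unicode code points of a string without using java.lang.String.codePoints().
fun String.codePointsCompat(): List<Int> {
    val result = ArrayList<Int>(length)
    var i = 0
    while (i < length) {
        val hi = this[i]
        if (hi.isHighSurrogate() && i + 1 < length && this[i + 1].isLowSurrogate()) {
            // Combine a surrogate pair into a single code point above U+FFFF.
            val lo = this[i + 1]
            result += 0x10000 + ((hi.code - 0xD800) shl 10) + (lo.code - 0xDC00)
            i += 2
        } else {
            // BMP character (or an unpaired surrogate, passed through unchanged).
            result += hi.code
            i++
        }
    }
    return result
}

fun main() {
    for (cp in "Hëllø! € 😀".codePointsCompat()) {
        println(cp)
    }
}
```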
The Iterator implementation shown in the duplicate target should already do what you want. My answer there goes one step further and combines all the combining marks into a single element too (which you might want to do as well).

... for (c in s) with for (c in s.codePoints()).

... var i = IntArray(1); i[0] = c; println("${String(i, 0, 1)} $c"). This compiles with kotlinc, but with Kotlin/Native I get errors like unresolved reference 'codePoints'. Both using Kotlin version 2.0.21.