My question is simple. Start with a character that is not in the Basic Multilingual Plane (BMP), say

    var original = "🎮"

or equivalently

    var original = `\u{1f3ae}`

JavaScript stores this string in memory using UTF-16. Unfortunately, you give the string to some database/application (specifics irrelevant) that misinterprets the UTF-16 bytes as UTF-8 bytes, so when you read the string back out, what it actually gives you is precisely

    var switchedEncoding = Buffer.from(original, 'utf16le').toString('utf8')

If you log switchedEncoding in this case you get <خ�. Not good. Okay, so you try to switch it back:

    var switchedBack = Buffer.from(switchedEncoding, 'utf8').toString('utf16le')

If you log switchedBack in this case you get �붿, not 🎮. Bummer.
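
To make the failure concrete, here is a minimal sketch that dumps the raw bytes at each step. The helper `hex` and the variable names `utf16Bytes` and `reencoded` are my own and only illustrative; the sketch assumes Node.js, whose UTF-8 decoder substitutes U+FFFD for byte sequences it cannot decode.

```javascript
// Hypothetical helper: render a Buffer as space-separated hex bytes.
const hex = (buf) => [...buf].map((b) => b.toString(16).padStart(2, '0')).join(' ');

const original = '\u{1f3ae}'; // 🎮, the surrogate pair D83C DFAE in UTF-16

// Step 1: the UTF-16LE bytes handed to the database/application.
const utf16Bytes = Buffer.from(original, 'utf16le');
console.log(hex(utf16Bytes)); // 3c d8 ae df

// Step 2: those bytes decoded as if they were UTF-8.
// 3c is '<', d8 ae is 'خ', and the lone df is an incomplete sequence.
const switchedEncoding = utf16Bytes.toString('utf8');

// Step 3: re-encode the mis-decoded string as UTF-8.
const reencoded = Buffer.from(switchedEncoding, 'utf8');
console.log(hex(reencoded)); // 3c d8 ae ef bf bd
```

The final byte `df` does not come back: it has been replaced by `ef bf bd`, the UTF-8 encoding of the replacement character U+FFFD, which is why decoding those bytes as UTF-16LE no longer yields the original surrogate pair.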
On the other hand, for some strings in the BMP (plain ASCII, for example), switchedBack recovers the original just fine. Is information irreversibly lost by the incorrect decoding done by the application/database? If not, I would like a clever function that can invert it even for characters in the astral planes.
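
For anyone who wants to try a candidate inverse, here is a small test harness sketch. The function names `misdecode`, `naiveInverse`, and `roundTrips` are hypothetical, and `misdecode` only simulates what I believe the database/application does; it is not its actual code.

```javascript
// Simulates the faulty storage: UTF-16LE bytes decoded as UTF-8.
function misdecode(s) {
  return Buffer.from(s, 'utf16le').toString('utf8');
}

// The naive inverse from the question: UTF-8 bytes decoded as UTF-16LE.
function naiveInverse(s) {
  return Buffer.from(s, 'utf8').toString('utf16le');
}

// Returns true if `inverse` recovers `s` after it has been mis-decoded.
function roundTrips(s, inverse = naiveInverse) {
  return inverse(misdecode(s)) === s;
}

console.log(roundTrips('abc'));       // true
console.log(roundTrips('\u{1f3ae}')); // false
```

A proposed inverse can be passed as the second argument, e.g. `roundTrips('\u{1f3ae}', myInverse)`, where `myInverse` is whatever function you suggest.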
Thanks for your help!