5

The API data field only supports ASCII encoding -- but I need to support Unicode (emoji, foreign characters, etc.)

I'd like to encode the user's text input as an escaped unicode string:

let textContainingUnicode = """ Let's go 🏊 in the 🌊. And some new lines. """ let result = textContainingUnicode.unicodeScalars.map { $0.escaped(asASCII: true)} .joined(separator: "") .replacingOccurrences( of: "\\\\u\\{(.+?(?=\\}))\\}", <- converting swift format \\u{****} with: "\\\\U$1", <- into format python expects options: .regularExpression) 

result here is "Let\'s go \U0001F3CA in the \U0001F30A.\n And some new lines."

And on the server decoding with python:

codecs.decode("Let\\'s go \\U0001F3CA in the \\U0001F30A.\\n And some new lines.\n", 'unicode_escape')

But this smells funny -- do i really need to do so much string manipulation in swift to get the escaped unicode? Are these formats not standardized across languages.

2
  • Why can’t you just send the original string to the server? Unicode itself is the “standardized format”. Commented Apr 10, 2019 at 22:57
  • It's a constraint of AWS: "User-defined metadata is a set of key-value pairs. Amazon S3 stores user-defined metadata keys in lowercase. Each key-value pair must conform to US-ASCII when you are using REST and to UTF-8 when you are using SOAP or browser-based uploads via POST." They don't use SOAP clients anymore, i could write my own i guess. docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html Commented Apr 10, 2019 at 23:37

1 Answer 1

5

You can use reduce in your collection and check if each character isASCII, if true return that character otherwise convert the special character to unicode:

Swift 5.1 • Xcode 11

extension Unicode.Scalar { var hexa: String { .init(value, radix: 16, uppercase: true) } } 

extension Character { var hexaValues: [String] { unicodeScalars .map(\.hexa) .map { #"\\U"# + repeatElement("0", count: 8-$0.count) + $0 } } } 

extension StringProtocol where Self: RangeReplaceableCollection { var asciiRepresentation: String { map { $0.isASCII ? .init($0) : $0.hexaValues.joined() }.joined() } } 

let textContainingUnicode = """ Let's go 🏊 in the 🌊. And some new lines. """ let asciiRepresentation = textContainingUnicode.asciiRepresentation print(asciiRepresentation) // "Let's go \\U0001F3CA in the \\U0001F30A.\n And some new lines." 
Sign up to request clarification or add additional context in comments.

3 Comments

If you’re going to use Swift 5 you can use the new nonescaping string literals and get rid of some backslashes
thanks Leo -- kind of cool to go through the logic of creating a utf-8 string. i'll accept this answer, thanks for taking the time on it.
Nice!. Thanks Leo

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.