I am trying to match a regex pattern against a string in Swift. When I put the actual characters in the regex pattern, it works as expected. However, when I use the Unicode escapes for the same characters, it does not match. Could you please help me understand what is wrong here? I need to use Unicode escapes in the regex.

Code:

var input = "一" // u{4E00}

extension String {
    var patternMatchesWithUnicode: Bool {
        // doesn't work
        return self.range(of: #"[\u{4E00}-\u{9FFF}]"#, options: .regularExpression) != nil
    }

    var patternMatchesWithString: Bool {
        // works
        return self.range(of: #"[一-鿿]"#, options: .regularExpression) != nil
    }
}

print(input.patternMatchesWithString)
print(input.patternMatchesWithUnicode)

Output:

true
false

1 Answer

You can use the ICU escape syntax:

extension String {
    var patternMatchesWithUnicode: Bool {
        return self.range(of: #"[\u4E00-\u9FFF]"#, options: .regularExpression) != nil
    }
}

These will also work:

return self.range(of: #"[\x{4E00}-\x{9FFF}]"#, options: .regularExpression) != nil

return self.range(of: #"[\U00004E00-\U00009FFF]"#, options: .regularExpression) != nil

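The regex flavor behind range(of:options:) with .regularExpression is ICU, which does not recognize Swift's \u{...} escape. Inside a raw string (#"..."#) Swift leaves backslash sequences untouched, so ICU receives \u{4E00} literally. See the excerpt from the ICU docs page: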

\uhhhh - Match the character with the hex value hhhh.
\Uhhhhhhhh - Match the character with the hex value hhhhhhhh. Exactly eight hex digits must be provided, even though the largest Unicode code point is \U0010ffff.
\x{hhhh} - Match the character with hex value hhhh. From one to six hex digits may be supplied.
\xhh - Match the character with two digit hex value hh.
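
To see how the raw-string delimiter interacts with the escapes listed above, here is a minimal sketch. The helper name matchesICU is hypothetical (it is not part of Foundation); it only wraps the same range(of:options:) call used in the question. The second pattern drops the raw-string delimiters, so the Swift compiler decodes \u{...} itself and ICU only ever sees the literal characters.

```swift
import Foundation

extension String {
    /// Hypothetical helper: true when the receiver contains a match for `pattern`,
    /// interpreted as an ICU regular expression.
    func matchesICU(_ pattern: String) -> Bool {
        return range(of: pattern, options: .regularExpression) != nil
    }
}

let input = "一" // U+4E00

// Raw string: ICU receives the backslash escapes and decodes \uhhhh itself.
let icuEscaped = #"[\u4E00-\u9FFF]"#

// Non-raw string: the Swift compiler decodes \u{...} first,
// so ICU receives the literal characters, i.e. the pattern [一-鿿].
let swiftEscaped = "[\u{4E00}-\u{9FFF}]"

// Raw string with Swift-style escapes: ICU receives \u{4E00} verbatim,
// which is not one of the escape forms in the ICU excerpt.
let mixed = #"[\u{4E00}-\u{9FFF}]"#

print(input.matchesICU(icuEscaped))
print(input.matchesICU(swiftEscaped))
print(input.matchesICU(mixed))
```

The first two patterns should match the input. The third is the form from the question, which was reported not to match.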

1 Comment

Nice. It worked and I settled with \uhhhh. +1 for sharing ICU docs.
