
Is it possible in the current version of JFlex (1.9.1) to represent a character range that spans the full set of Unicode code points in a regular expression?

Something like this:

UnicodeIdentifier = [a-zA-Z_\u007F-\u10FFFF] [a-zA-Z0-9_\u007F-\u10FFFF]* 

except that this does not work (JFlex emits a warning), because a Java-style Unicode escape sequence is limited to four hexadecimal digits (16 bits), so the upper bound is treated as \u10FF.

The spec says that supplementary characters in the range U+010000 to U+10FFFF must be written as two consecutive Unicode escapes (a surrogate pair). However, using this:

UnicodeIdentifier = [a-zA-Z_\u007F-\uDBFF\uDFFF] [a-zA-Z0-9_\u007F-\uDBFF\uDFFF]* 

does not work either.
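
To make the two escape forms concrete, here is a minimal sketch of how plain Java (not JFlex) reads them. The class name SupplementaryEscapes and the helper codeUnits are made up for illustration. It only shows what the escapes mean at the Java string level and says nothing about how JFlex parses a character class.

```java
// Sketch: how a Java string literal interprets the two escapes from the question.
class SupplementaryEscapes {
    // Formats every UTF-16 code unit of s as four uppercase hex digits.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Only the first four hex digits belong to the escape; "FF" is literal text.
        String truncated = "\u10FFFF";
        // Two consecutive escapes form the surrogate pair for U+10FFFF.
        String pair = "\uDBFF\uDFFF";

        System.out.println(codeUnits(truncated)); // 10FF 0046 0046
        System.out.println(codeUnits(pair));      // DBFF DFFF
        // The pair is two code units but a single code point.
        System.out.println(pair.codePointCount(0, pair.length())); // 1
        System.out.println(Integer.toHexString(pair.codePointAt(0))); // 10ffff
    }
}
```

So in Java the pair is the correct spelling of U+10FFFF. Whether a lexer generator accepts a surrogate pair as the end point of a range is a separate matter.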

  • [a-zA-Z_\u007F-\x{10FFFF}] [a-zA-Z0-9_\u007F-\x{10FFFF}]* See the documentation of Pattern for details. Commented Mar 8 at 15:58
  • This leads to a warning from JFlex: Impossible character class range (end is less than start), at least with Java 11/17. Commented Mar 11 at 19:27
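
For reference, the syntax from the first comment is that of java.util.regex.Pattern. The sketch below is plain Java rather than a JFlex specification, and the names PatternRangeSketch and isIdentifier are made up for illustration. It shows that the range is accepted by Pattern itself, which suggests that the warning in the second comment comes from JFlex's own handling of the expression and not from the Java runtime.

```java
import java.util.regex.Pattern;

// Sketch: the expression from the first comment, compiled with java.util.regex.
class PatternRangeSketch {
    // Pattern accepts a braced hex escape for any code point up to 10FFFF.
    static final Pattern IDENT = Pattern.compile(
        "[a-zA-Z_\\u007F-\\x{10FFFF}][a-zA-Z0-9_\\u007F-\\x{10FFFF}]*");

    // True if the whole string matches the identifier pattern.
    static boolean isIdentifier(String s) {
        return IDENT.matcher(s).matches();
    }

    public static void main(String[] args) {
        // A letter followed by U+1F600, a supplementary character.
        System.out.println(isIdentifier("a\uD83D\uDE00")); // true
        // The upper bound U+10FFFF is inside the range.
        System.out.println(isIdentifier("\uDBFF\uDFFF"));  // true
        // A leading digit is not allowed as the first character.
        System.out.println(isIdentifier("9abc"));          // false
    }
}
```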
