From the Chrome console, I noticed this oddity:

    > /[^A-z]/.test("^")
    false
    > /[A-z]/.test("^")
    true
    > "^".charCodeAt(0)
    94
    > "A".charCodeAt(0)
    65
    > "z".charCodeAt(0)
    122
    > /[a-zA-Z]/.test("^")
    false

It makes sense that the caret matches a range spanning 65-122, since its character code is 94, but I didn't realize that /[A-z]/ is not equivalent to /[a-zA-Z]/.

So my question is: does JavaScript use ASCII codes for ranged matches like A-z? And is that the explanation for this behavior?

EDIT:

After some further investigation, this appears to be true:

    > String.fromCharCode(91)
    "["
    > String.fromCharCode(92)
    "\"
    > String.fromCharCode(93)
    "]"
    > String.fromCharCode(94)
    "^"
    > String.fromCharCode(95)
    "_"
    > String.fromCharCode(96)
    "`"
    > /[^A-z]/.test("^[\\_`")
    false
1
  • Actually, it uses Unicode, not ASCII. Commented Oct 23, 2013 at 16:13

3 Answers

3

Section 15.10.2.15 of the ECMAScript specification (5.1 edition) handles the generation of range-based character sets during regular expression evaluation. When building a range from character A to character B (i.e., A-B):

  1. Let a be the one character in CharSet A.
  2. Let b be the one character in CharSet B.
  3. Let i be the code unit value of character a.
  4. Let j be the code unit value of character b.
  5. If i > j then throw a SyntaxError exception.
  6. Return the set containing all characters numbered i through j, inclusive.
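
The steps above can be sketched in plain JavaScript. This is only an illustration: `characterRange` is a hypothetical helper name, not an engine or library API, and it returns an array where the specification describes a set.

```javascript
// Hypothetical helper that mirrors the quoted steps; not a real engine API.
function characterRange(a, b) {
  const i = a.charCodeAt(0); // code unit value of character a
  const j = b.charCodeAt(0); // code unit value of character b
  if (i > j) {
    throw new SyntaxError("Range out of order in character class");
  }
  // Collect every character numbered i through j, inclusive.
  const set = [];
  for (let code = i; code <= j; code++) {
    set.push(String.fromCharCode(code));
  }
  return set;
}

const range = characterRange("A", "z");
console.log(range.length);        // 58 (122 - 65 + 1)
console.log(range.includes("^")); // true
```

Calling `characterRange("z", "A")` throws, which matches how the engine rejects a pattern such as `[z-A]`.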

The phrase "code unit value" here is a Unicode term: JavaScript strings are sequences of UTF-16 code units, and for these characters the code unit values happen to coincide with the ASCII codes. Thus, the range A-z includes all characters whose code unit values fall between those of A and z, inclusive. This range (0x41-0x7A) includes six non-alphabetic characters:

    U+005B  [  5b  LEFT SQUARE BRACKET
    U+005C  \  5c  REVERSE SOLIDUS
    U+005D  ]  5d  RIGHT SQUARE BRACKET
    U+005E  ^  5e  CIRCUMFLEX ACCENT
    U+005F  _  5f  LOW LINE
    U+0060  `  60  GRAVE ACCENT
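
One way to check this list against a real engine is to walk the code units from 0x41 to 0x7A and keep those that /[A-z]/ accepts but /[A-Za-z]/ rejects. A minimal sketch, where the variable name `extras` is chosen only for illustration:

```javascript
// Find the characters matched by /[A-z]/ that are not ASCII letters.
const extras = [];
for (let code = 0x41; code <= 0x7a; code++) {
  const ch = String.fromCharCode(code);
  if (/[A-z]/.test(ch) && !/[A-Za-z]/.test(ch)) {
    extras.push(ch);
  }
}

console.log(extras.length);   // 6
console.log(extras.join("")); // [\]^_`
```

If only letters are wanted, writing the class as `[A-Za-z]` avoids the six extra characters.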

3

The range in /[A-z]/ covers character codes 65 through 122, and that includes 94, which is the code for ^.

That is why /[A-z]/ matches ^, [, ], _, and so on.

0

Please note that in regex, the caret means the start of a new line. You have to escape it with a backslash if you mean the literal caret character.

However, anubhava's answer above is the reason you are seeing this behavior.

1 Comment

As the first character inside [], the caret means to negate the matching of the block, not the start of a new line.
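
The different roles of the caret can be checked directly in the console. A short sketch (the constant names are illustrative only):

```javascript
// Outside a character class, ^ anchors the match to the start of the input.
const anchored = /^a/.test("ba"); // false: "a" is not at the start

// With the m flag, ^ also matches right after a line break.
const multiline = /^b/m.test("a\nb"); // true

// As the first character inside [], ^ negates the class.
const negated = /[^a]/.test("a"); // false: "a" is excluded

// Anywhere else inside [], ^ is a literal caret.
const literalInClass = /[a^]/.test("^"); // true

// Outside a class, a backslash makes the caret literal.
const escaped = /\^/.test("^"); // true

console.log({ anchored, multiline, negated, literalInClass, escaped });
```

In the question's /[A-z]/, the caret never appears in the pattern at all; it is matched only because its code falls inside the range.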
