0

What's the difference between this regex: /(everything|cool)/gi and this one: /(?:everything|cool)/gi ?

I'm asking because I have a regex that I wasn't able to write myself* and, as you can see below, it contains a lot of ?: groups. I've read somewhere that ?: is bad for performance, so I want to remove it. Can I remove it, or is it important for anything?

* (?:(?=(?:(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*'(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*')*(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*$)(?=(?:(?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*"(?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*")*(?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*$)(?:\\.|[^\\'"]))+

3 Answers

4

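Without ?:, the group is a capturing group: the text it matches is stored and can be referenced later, for example as a backreference or as an element of the match result.
With ?:, the group is still matched, but nothing is captured.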
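To make the distinction concrete, here is a minimal sketch using the two patterns from the question (without the g flag, to keep exec simple). The sample string and variable names are made up for illustration.

```javascript
const text = "Everything is cool";

// Capturing group: the match array gets an extra slot for group 1.
const capturing = /(everything|cool)/i.exec(text);

// Non-capturing group: same overall match, but no group slot.
const nonCapturing = /(?:everything|cool)/i.exec(text);

console.log(capturing.length);    // 2: full match plus group 1
console.log(capturing[1]);        // "Everything"
console.log(nonCapturing.length); // 1: full match only
console.log(nonCapturing[0]);     // "Everything"
```

Both patterns match exactly the same text; only the bookkeeping in the result differs.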

Here's a benchmark on both methods: http://jsperf.com/regex-capture-vs-non-capture

Looking at the bars, one would conclude that non-capturing groups are faster. However, the numbers at the bottom show that the difference is negligible, since both variants are already extremely fast.
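If the jsperf page is unavailable, a rough comparison can be run locally. This is only a sketch of a micro-benchmark, not the test behind the link: timeRegex, the sample string, and the iteration count are all hypothetical, and results vary a lot between engines and runs.

```javascript
// Hypothetical helper: time `iterations` calls of regex.test on `input`.
function timeRegex(regex, input, iterations) {
  const start = Date.now();
  let hits = 0;
  for (let i = 0; i < iterations; i++) {
    regex.lastIndex = 0; // only matters for g/y regexes; keeps calls independent
    if (regex.test(input)) hits++;
  }
  return { ms: Date.now() - start, hits };
}

const sample = "I think everything here is pretty cool";
const iterations = 200000; // arbitrary; raise it for steadier numbers

const withCapture = timeRegex(/(everything|cool)/i, sample, iterations);
const withoutCapture = timeRegex(/(?:everything|cool)/i, sample, iterations);

console.log("capturing:     " + withCapture.ms + " ms");
console.log("non-capturing: " + withoutCapture.ms + " ms");
```

Date.now() only has millisecond resolution, so treat small differences as noise rather than evidence.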

Removing or adding ?: in an existing pattern changes how the groups are numbered and what the match result contains, which can break the code that uses it. I advise against editing the RegExp as long as it isn't causing any issues.
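As an illustration of how such an edit can break things, consider this small sketch. The patterns and input are invented for the example and are unrelated to the regex in the question.

```javascript
// Hypothetical pattern: a non-capturing keyword, then a captured name and value.
const original = /(?:let|var) (\w+) = (\d+)/;
// The same pattern with ?: removed: the keyword now occupies group 1.
const edited = /(let|var) (\w+) = (\d+)/;

const line = "var count = 42";
const a = original.exec(line);
const b = edited.exec(line);

console.log(a[1], a[2]); // count 42
console.log(b[1], b[2]); // var count  (every group index shifted by one)

// split is affected too: captured separators are included in the result.
const withoutGroup = "a,b".split(/(?:,)/); // ["a", "b"]
const withGroup = "a,b".split(/(,)/);      // ["a", ",", "b"]
console.log(withoutGroup, withGroup);
```

Any code that reads a[1], uses a backreference, or relies on the output of split would silently get different values after the edit.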


4

(?:...) is fine. It's when capturing groups, and particularly backreferences to them, get involved that you start seeing performance hits.

6 Comments

The key to why non-capturing groups are faster is that they allow the RE engine to do a lot of optimization. Capturing groups require a lot more information to be kept by the matcher, and that defeats a lot of the more complex optimizations (such as conversion to a single-pass DFA).
Oddly, I see better performance with a non-capturing group than even with no group at all. :P jsperf.com/regex-capture-vs-non-capture/2 (Probably V8-specific)
That's a really curious result. I wonder if there were any issues with garbage collection kicking in? (I suspected that's what happened when I ran the test on that page…)
@micha: exec has its place. IIRC, with the g flag it returns the matches one at a time when called in a loop, and it's probably the most effective way of getting them all. (At the very least, it's the simplest.) But if you only care whether there's a match in the string (and not where it is, or what exactly matched), test is the better choice.
@micha: The RegExp.$1 through RegExp.$9 properties are apparently deprecated. You shouldn't really be using them if you care about your scripts working in the future.
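For reference, the exec versus test distinction from the comments above might look like this in practice. It is a sketch with a made-up input string; the pattern is the non-capturing one from the question.

```javascript
const pattern = /(?:everything|cool)/gi;
const input = "Everything is cool, everything!";

// With the g flag, each exec call returns the next match and advances lastIndex.
const found = [];
let m;
while ((m = pattern.exec(input)) !== null) {
  found.push({ text: m[0], index: m.index });
}
console.log(found.map((f) => f.text)); // ["Everything", "cool", "everything"]

// test only reports whether a match exists. No g flag here, so lastIndex plays no role.
const hasMatch = /(?:everything|cool)/i.test(input);
console.log(hasMatch); // true
```

The match position comes from m.index and the matched text from m[0], so nothing here depends on the RegExp.$1 style properties.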
2

What you should have heard is that (foo) is slower than (?:foo). The first is a capturing group and the second is a non-capturing group. The second has less work to do (it doesn't need to remember the text that matched), so it should be faster.
