regex ?: performance

Question

What's the difference between this regex: /(everything|cool)/gi and this one: /(?:everything|cool)/gi ?

I'm asking this because I've got a regex which I wasn't able to write myself* and there are, as you can see below, a lot of ?: in that regex. I've read somewhere that ?: is bad for performance, so I want to remove it. Can I remove it, or is it important for anything?

* (?:(?=(?:(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*'(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*')*(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*$)(?=(?:(?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*"(?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*")*(?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*$)(?:\\.|[^\\'"]))+

Rob W · Accepted Answer · 2012-04-15 17:39:22Z

Without ?:, a reference to a matched group is created.
With a ?:, the group is matched, but not captured.
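
To make the difference concrete, here is a minimal sketch using the two patterns from the question (without the g flag, so that a single exec call is enough). The sample string is an assumption made up for illustration.

```javascript
// Same alternation, once as a capturing group and once as a non-capturing group.
const sample = "Everything is cool"; // made-up input for illustration

const capturing = /(everything|cool)/i.exec(sample);
const nonCapturing = /(?:everything|cool)/i.exec(sample);

console.log(capturing.length);    // 2: the full match plus one captured group
console.log(capturing[1]);        // "Everything"
console.log(nonCapturing.length); // 1: only the full match
console.log(nonCapturing[0]);     // "Everything"
```

Both patterns match the same text; only the extra entry in the result array differs.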

Here's a benchmark on both methods: http://jsperf.com/regex-capture-vs-non-capture

Looking at the bars, one would say that non-capturing groups are faster. However, if you look at the numbers at the bottom, the difference is negligible, since both variants are already incredibly fast.
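
If the linked benchmark is unavailable, a rough local comparison can be sketched as follows. The helper name timeRegex, the input string, and the iteration count are all assumptions for illustration, and timings from a loop like this vary a lot between engines and runs, so treat the numbers as indicative only.

```javascript
// Hypothetical helper: time `iterations` calls of re.test(input), in milliseconds.
function timeRegex(re, input, iterations) {
  // Prefer the high-resolution timer when the environment provides one.
  const now = typeof performance !== "undefined"
    ? () => performance.now()
    : () => Date.now();
  let hits = 0;
  const start = now();
  for (let i = 0; i < iterations; i++) {
    if (re.test(input)) hits++; // use the result so the loop does real work
  }
  return { ms: now() - start, hits };
}

// No g flag here, so test() keeps no lastIndex state between calls.
const input = "Everything is cool";
const capturingRun = timeRegex(/(everything|cool)/i, input, 100000);
const nonCapturingRun = timeRegex(/(?:everything|cool)/i, input, 100000);

console.log("capturing:    ", capturingRun.ms.toFixed(2), "ms");
console.log("non-capturing:", nonCapturingRun.ms.toFixed(2), "ms");
```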

Removing or adding ?: in an existing solution might break the code, so I advise against editing the RegExp when it isn't causing any issues.
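
Two common ways in which toggling ?: changes behaviour are split, which includes captured text in its result, and replace, whose $1-style references only work for captured groups. The strings below are made up for illustration.

```javascript
const fields = "a,b;c"; // made-up input

// split() adds the text of capturing groups to the result array.
const withCapture = fields.split(/(,|;)/);
const withoutCapture = fields.split(/(?:,|;)/);
console.log(withCapture);    // ["a", ",", "b", ";", "c"]
console.log(withoutCapture); // ["a", "b", "c"]

// replace() can only substitute $1, $2, ... for groups that were captured.
const swapped = "John Smith".replace(/(\w+) (\w+)/, "$2 $1");
const notSwapped = "John Smith".replace(/(?:\w+) (?:\w+)/, "$2 $1");
console.log(swapped);    // "Smith John"
console.log(notSwapped); // "$2 $1" (no groups exist, so the text stays literal)
```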

cHao · Accepted Answer · 2012-04-15 17:39:18Z

(?:...) is fine. It's when capturing groups, and particularly backreferences to them, get involved that you start seeing performance hits.
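
For readers unfamiliar with the term, a backreference such as \1 matches whatever text the first capturing group matched earlier in the same attempt, which is why that group has to be a capturing one. A minimal sketch with a made-up pattern and inputs:

```javascript
// (['"]) captures the opening quote; \1 requires the same character to close it.
const quoted = /(['"])[^'"]*\1/;

const ok = quoted.exec('say "hi" now');
console.log(ok[0]); // "hi" including the double quotes
console.log(ok[1]); // the double quote character that was captured

// Mismatched quotes: the backreference cannot be satisfied.
console.log(quoted.test("\"mixed'")); // false
```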

6 Comments

Donal Fellows Over a year ago

The key to why non-capturing groups are faster is that they allow the RE engine to do a lot of optimization. Capturing groups require a lot more information to be kept by the matcher, and that defeats a lot of the more complex optimizations (such as conversion to a single-pass DFA).

cHao Over a year ago

Oddly, I see better performance with a non-capturing group than even with no group at all. :P jsperf.com/regex-capture-vs-non-capture/2 (Probably V8 specific)

Donal Fellows Over a year ago

That's a really curious result. I wonder if there were any issues with garbage collection kicking in? (I suspected that's what happened when I ran the test on that page…)

cHao Over a year ago

@micha: exec has its place. Called in a loop on a regex with the g flag, it returns each match in turn (along with its index and captured groups) -- and that's probably the simplest way of collecting them all. But if you only care whether there's a match in the string (and not where it is, or what exactly matched), test is the better choice.
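
A short sketch of the two methods side by side, using the non-capturing pattern from the question. The input string is made up for illustration. Note that exec returns one match per call; with the g flag, each call resumes from the regex's lastIndex, so a loop walks through every match.

```javascript
const text = "Everything is cool, everything!"; // made-up input

// test(): just a boolean, no match details.
const hasMatch = /(?:everything|cool)/i.test(text);
console.log(hasMatch); // true

// exec() in a loop with the g flag: one match per call until it returns null.
const re = /(?:everything|cool)/gi;
const found = [];
let m;
while ((m = re.exec(text)) !== null) {
  found.push({ match: m[0], index: m.index });
}
console.log(found);
// [ { match: "Everything", index: 0 },
//   { match: "cool", index: 14 },
//   { match: "everything", index: 20 } ]
```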

cHao Over a year ago

@micha: $1 through $9 are apparently deprecated. You shouldn't really be using them if you care about your scripts working in the future.
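
Assuming this refers to the static RegExp.$1 through RegExp.$9 properties, the captured groups can be read from the match result instead. A minimal sketch with a made-up pattern and input:

```javascript
// Read captures from the array returned by exec() rather than from RegExp.$1, RegExp.$2, ...
const result = /(\d{4})-(\d{2})/.exec("released 2012-04");

// Index 0 is the full match; indexes 1 and 2 are the captured groups.
const [whole, year, month] = result;
console.log(whole); // "2012-04"
console.log(year);  // "2012"
console.log(month); // "04"
```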

Roland Illig · Accepted Answer · 2012-04-15 17:41:10Z

What you should have heard is that (foo) is slower than (?:foo). This is because the first one is a capturing group, and the second one is a non-capturing group. The second one has less work to do (it doesn't need to remember the text that matched), so it should be faster.
