Skip to content

Conversation

@tmerr
Copy link
Contributor

@tmerr tmerr commented Jan 14, 2015

The problem was that a regular expression like ^x would take linear time to is_match a string aaa.... when all it needed to do was check the first character then bail out. This PR makes it so if the regex begins with ^ (without multiline) or \A, then it stops iterating over characters as soon as there are no possible matches.

Benchmarks:

dynamic: anchored_literal_long_non_match: 7492 ns/iter => 435 ns/iter anchored_literal_short_non_match: 917 ns/iter => 431 ns/iter native: anchored_literal_long_non_match: 5020 ns/iter => 50 ns/iter anchored_literal_short_non_match: 375 ns/iter => 50 ns/iter 
The problem was that a regular expression like "^x" would take linear time to is_match a string "aaa...." when all it needed to do was check the first character then bail out. This commit makes it so if the regex begins with '^' (without multiline) or '\A', then it stops iterating over characters as soon as there are no possible matches. Benchmarks: dynamic: anchored_literal_long_non_match: 7492 ns/iter => 435 ns/iter anchored_literal_short_non_match: 917 ns/iter => 431 ns/iter native: anchored_literal_long_non_match: 5020 ns/iter => 50 ns/iter anchored_literal_short_non_match: 375 ns/iter => 50 ns/iter
@rust-highfive
Copy link

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @huonw (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way Github handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see CONTRIBUTING.md for more information.

@huonw
Copy link
Contributor

huonw commented Jan 14, 2015

Thanks!

Could I ask you to expand the commit message a little to describe what the problem is and how it is fixed? Just saying the issue number is better than nothing, but it does mean that someone looking at the source and checking the git history needs to cross reference against github (which may have disappeared at that point).

(I'm happy to help if you need assistance with git or anything else.)

@tmerr tmerr changed the title Resolve #17 Resolve #17 optimize use of '^' / '\A' Jan 14, 2015
@tmerr
Copy link
Contributor Author

tmerr commented Jan 14, 2015

Sure, expanded. Is that alright?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove && clist.size == 0? That doesn't seem right to me. I think literal prefixes should only be tried if there are no other threads to try. Moreover, that check is still in place in the regex! macro code: https://github.com/tmerr/regex/blob/master/regex_macros/src/lib.rs#L548-L553

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm kind of concerned that this doesn't cause a test to fail... Perhaps I need to come up with one. :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole section of code seems to be inside a if clist.size == 0 { branch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah... doy. Thanks :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh it's hard to see from the diff snippets but in the original code it was redundant

if clist.size == 0 { ... if self.prog.prefix.len() > 0 && clist.size == 0 { ... } } 

It's not my top priority, I just tried to sneak it the removal :P

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm perfectly happy to revert that part if you want

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no no. It's great. :-) It's my misunderstanding!

@BurntSushi
Copy link
Member

@tmerr This looks great. I'm happy to see that the change is small. I just had one question, but otherwise this looks awesome. Thank you!

@huonw
Copy link
Contributor

huonw commented Jan 14, 2015

@tmerr that messsage looks much better, thanks! Could you place it into the git commit message too? (e.g. git commit --amend and then git push -f to update the PR by force pushing over this branch.)

@tmerr
Copy link
Contributor Author

tmerr commented Jan 14, 2015

@huonw Done, thanks for the help!

It's been a blast reading through this code by the way. I had been aware of NFAs but only the "draw pretty circles on paper with lines between them" version, not as a machine with split and jump instructions. The two models of thinking about it meld together nicely

@huonw
Copy link
Contributor

huonw commented Jan 14, 2015

Thanks! I merged as 10edff6 (there was a piece of trailing whitespace which I took the initiative to remove).

@huonw huonw closed this Jan 14, 2015
@tmerr tmerr unassigned huonw Sep 21, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

4 participants