- Notifications
You must be signed in to change notification settings - Fork 489
Resolve #17 optimize use of '^' / '\A' #20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The problem was that a regular expression like "^x" would take linear time to is_match a string "aaa...." when all it needed to do was check the first character then bail out. This commit makes it so if the regex begins with '^' (without multiline) or '\A', then it stops iterating over characters as soon as there are no possible matches. Benchmarks: dynamic: anchored_literal_long_non_match: 7492 ns/iter => 435 ns/iter anchored_literal_short_non_match: 917 ns/iter => 431 ns/iter native: anchored_literal_long_non_match: 5020 ns/iter => 50 ns/iter anchored_literal_short_non_match: 375 ns/iter => 50 ns/iter
| Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @huonw (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way Github handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see CONTRIBUTING.md for more information. |
| Thanks! Could I ask you to expand the commit message a little to describe what the problem is and how it is fixed? Just saying the issue number is better than nothing, but it does mean that someone looking at the source and checking the git history needs to cross reference against github (which may have disappeared at that point). (I'm happy to help if you need assistance with git or anything else.) |
| Sure, expanded. Is that alright? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you remove && clist.size == 0? That doesn't seem right to me. I think literal prefixes should only be tried if there are no other threads to try. Moreover, that check is still in place in the regex! macro code: https://github.com/tmerr/regex/blob/master/regex_macros/src/lib.rs#L548-L553
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm kind of concerned that this doesn't cause a test to fail... Perhaps I need to come up with one. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole section of code seems to be inside a if clist.size == 0 { branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah... doy. Thanks :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh it's hard to see from the diff snippets but in the original code it was redundant
if clist.size == 0 { ... if self.prog.prefix.len() > 0 && clist.size == 0 { ... } } It's not my top priority, I just tried to sneak it the removal :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm perfectly happy to revert that part if you want
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh no no. It's great. :-) It's my misunderstanding!
| @tmerr This looks great. I'm happy to see that the change is small. I just had one question, but otherwise this looks awesome. Thank you! |
| @tmerr that messsage looks much better, thanks! Could you place it into the git commit message too? (e.g. |
| @huonw Done, thanks for the help! It's been a blast reading through this code by the way. I had been aware of NFAs but only the "draw pretty circles on paper with lines between them" version, not as a machine with split and jump instructions. The two models of thinking about it meld together nicely |
| Thanks! I merged as 10edff6 (there was a piece of trailing whitespace which I took the initiative to remove). |
The problem was that a regular expression like
^xwould take linear time to is_match a stringaaa....when all it needed to do was check the first character then bail out. This PR makes it so if the regex begins with^(without multiline) or\A, then it stops iterating over characters as soon as there are no possible matches.Benchmarks: