Detecting evil regexes
- Try Nicolaas Weideman's RegexStaticAnalysis project.
- Try my ensemble-style vuln-regex-detector, which has a CLI for Weideman's tool and others.
Rules of thumb
Evil regexes are always due to ambiguity in the corresponding NFA, which you can visualize with tools like regexper.
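To make "ambiguity" concrete, here is a minimal sketch that counts how many distinct ways two patterns can match the same input. The function names are hypothetical and exist only for this illustration. The assumption is that a backtracking engine may, in the worst case, try roughly every one of these parses before reporting a mismatch.

```python
from functools import lru_cache


def count_nested_plus_parses(n: int) -> int:
    # Number of ways (a+)+ can match a string of n 'a' characters.
    # Each parse splits the string into one or more non-empty chunks,
    # one chunk per iteration of the outer '+'.
    @lru_cache(maxsize=None)
    def ways(remaining: int) -> int:
        if remaining == 0:
            return 1  # everything consumed: one complete parse
        # The next inner a+ may consume 1..remaining characters.
        return sum(ways(remaining - k) for k in range(1, remaining + 1))

    return ways(n) if n > 0 else 0


def count_adjacent_plus_parses(n: int) -> int:
    # Number of ways two adjacent "one or more digits" groups can match
    # a string of n digits: one parse per split point with both halves
    # non-empty.
    return sum(1 for _ in range(1, n))


for n in (1, 5, 10, 20):
    print(n, count_nested_plus_parses(n), count_adjacent_plus_parses(n))
```

The first count doubles with every extra character (2^(n-1) parses), while the second grows only linearly (n-1 parses).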
Here are some forms of ambiguity. Don't use these in your regexes.
- Nesting quantifiers like `(a+)+` (aka "star height > 1"). This can cause exponential blow-up. See substack's `safe-regex` tool.
- Quantified Overlapping Disjunctions like `(a|a)+`. This can cause exponential blow-up.
- Quantified Overlapping Adjacencies like `\d+\d+`. This can cause polynomial blow-up.
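If you want to see the blow-up for yourself, the following rough sketch times failed matches in Python's backtracking `re` module. The helper name `time_failed_match` and the input sizes are my own choices for illustration, and the trailing `$` plus the `!` suffix are there only to force a mismatch. Absolute timings depend on your machine and engine version, so treat the numbers as indicative only.

```python
import re
import time


def time_failed_match(pattern: str, text: str) -> float:
    # Return the seconds spent on a match attempt that is expected to fail.
    compiled = re.compile(pattern)
    start = time.perf_counter()
    result = compiled.match(text)
    elapsed = time.perf_counter() - start
    assert result is None, "input was chosen so the match should fail"
    return elapsed


# Nested quantifiers: the time should grow very quickly as n increases,
# so keep n small.
for n in (10, 14, 18):
    seconds = time_failed_match(r"(a+)+$", "a" * n + "!")
    print(f"(a+)+$   n={n:<5} {seconds:.6f}s")

# Overlapping adjacencies: the growth is much slower, so larger inputs
# are needed before it becomes noticeable.
for n in (500, 1000, 2000):
    seconds = time_failed_match(r"\d+\d+$", "1" * n + "!")
    print(f"\\d+\\d+$ n={n:<5} {seconds:.6f}s")
```

An engine that does not backtrack (for example one based on automata simulation) would not be expected to show this behavior, which is why the results are specific to the engine you run it on.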
Additional resources
I wrote this paper on super-linear regexes. It includes loads of references to other regex-related research.