Skip to main content
1 of 5
Doorknob
  • 72.1k
  • 20
  • 146
  • 393

Regex in reverse - decompose regular expressions

The Problem

I have a bunch of regular expressions that I need to use in some code, but I'm using a programming language that doesn't support regex! Luckily, I know that the test string will have a maximum length and will be composed of printable ASCII only.

The Challenge

You must input a regex and a number n, and output every string composed of printable ASCII (ASCII codes 32 to 126 inclusive, to ~, no tabs or newlines) of length less than n that matches that regex. Regular expressions will be limited to the following:

  • Literal characters (and escapes, which force a character to be literal, so \. is a literal ., \n is a literal n (equivalent to just n), and \w is equivalent to w. You do not need to support escape sequences.)
  • . - wildcard (any character)
  • Character classes, [abc] means "a or b or c" and [d-f] means anything from d to f (so, d or e or f). The only characters that have special meaning in a character class are [ and ] (which will always be escaped, so don't worry about those), \ (the escape character, of course), ^ at the beginning of the character class (which is a negation), and - (which is a range).
  • | - the OR operator, matches either all previous tokens up to a ( or the beginning of the regex, or all subsequent tokens until a ) or end of regex. a|z means either a or z.
  • * - match the previous token repeated zero or more times, greedy (it tries to repeat as many times as possible)
  • + - repeated one or more times, greedy
  • ? - zero or one times
  • Grouping with parentheses, to group tokens for |, *. +, or ?

The input regex will always be valid (i.e., you do not have to handle input like ?abc or a{two} or any invalid input). You may output the strings in any order you would like.

The Test Cases

Input: .*, 1
Output: (empty string), , !, ", ..., }, ~

Input: w\w+, 3
Output: ww, www

Input: [abx-z][^ -}][\\], 3
Output: a~\, b~\, x~\, y~\, z~\

Input: ab*a|c[de]*, 3
Output: (empty string), c, cd, ce, aa, cde, ced, cdd, cee, aba

Input: (foo)+(bar)?!?, 6
Output: foo, foo!, foofoo, foobar

Input: (a+|b*c)d, 4
Output: ad, cd, aad, bcd, aaad, bbcd

Input: p+cg, 4
Output: pcg, ppcg

The Winner

This is , so the shortest code in bytes will win!

Doorknob
  • 72.1k
  • 20
  • 146
  • 393