The key to documenting the regular expression is documenting it. Far too often people toss in what appears to be line noise and leave it at that.
In Perl, a single /x modifier at the end of the regular expression tells the parser to ignore most whitespace that is neither backslashed nor within a bracketed character class, which allows one to lay out and document the regular expression.
The above regular expression would then become:

    $re = qr/
      ^\s*
      (?:
        (?:
          ([\d]+)\s*:\s*
        )?
        (?:
          ([\d]+)\s*:\s*
        )
      )?
      ([\d]+)
      (?:
        \s*[.,]\s*([\d]+)
      )?
      \s*$
    /x;

Yes, it's a bit consuming of vertical whitespace, though one could shorten it up without sacrificing too much readability.
What the regexp does is this: it parses a string of numbers in the format 1:2:3.4, capturing each number, where spaces are allowed around the separators and only the 3 is required.
Looking at this regular expression one can see how it works (and doesn't work). In this case, this regex will match the string 1.
Similar approaches can be taken in other languages. Python's re.VERBOSE option works the same way there.
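As a rough illustration of the Python route, here is a minimal sketch of the same pattern compiled with re.VERBOSE. The names NUMBER_RE and parse are made up for this example, and [\d] is written as the equivalent \d.

```python
import re

# Same pattern as the Perl example. With re.VERBOSE, whitespace and
# "#" comments inside the pattern are ignored by the regex parser.
NUMBER_RE = re.compile(r"""
    ^\s*
    (?:
        (?: (\d+) \s* : \s* )?    # group 1: optional first number
        (?: (\d+) \s* : \s* )     # group 2: second number
    )?                            # the whole a:b: prefix is optional
    (\d+)                         # group 3: the only required number
    (?: \s* [.,] \s* (\d+) )?     # group 4: optional part after . or ,
    \s*$
""", re.VERBOSE)


def parse(text):
    """Return the four capture groups, or None if text does not match."""
    match = NUMBER_RE.match(text)
    return match.groups() if match else None


print(parse("1:2:3.4"))   # all four groups captured
print(parse("1"))         # only group 3 captured
print(parse("2 : 3"))     # groups 2 and 3; group 1 stays None
print(parse("1:"))        # no match, so None
```

Note that a lone number lands in group 3 and a two-part input such as 2 : 3 fills groups 2 and 3, which is the kind of behavior that is much easier to spot once the pattern is laid out and commented.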
Perl 6 (the above example was for Perl 5) takes this further with the concept of rules, which lead to even more powerful structures than PCRE: they provide access to other grammars (context-free and context-sensitive), not just regular and extended regular ones.
In Java (where this example draws from), one can use string concatenation to form the regex.

    Pattern re = Pattern.compile(
      "^\\s*"+
      "(?:"+
        "(?:"+
          "([\\d]+)\\s*:\\s*"+  // Capture group #1
        ")?"+
        "(?:"+
          "([\\d]+)\\s*:\\s*"+  // Capture group #2
        ")"+
      ")?"+ // First groups match 0 or 1 times
      "([\\d]+)"+ // Capture group #3
      "(?:\\s*[.,]\\s*([0-9]+))?"+ // Capture group #4 (0 or 1 times)
      "\\s*$"
    );

Admittedly, this creates many more " characters in the string, possibly leading to some confusion there, but it can be more easily read (especially with syntax highlighting in most IDEs) and documented.
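For comparison, the concatenation idea carries over to other languages too. The following is only a sketch in Python, where adjacent string literals are joined automatically and raw strings avoid the doubled backslashes; NUMBER_RE is a name chosen for this example.

```python
import re

# Adjacent string literals are joined into one string, so each fragment
# can carry its own comment without any regex flag being involved.
NUMBER_RE = re.compile(
    r"^\s*"
    r"(?:"
    r"(?:(\d+)\s*:\s*)?"          # capture group 1 (optional)
    r"(?:(\d+)\s*:\s*)"           # capture group 2
    r")?"                         # first two groups match 0 or 1 times
    r"(\d+)"                      # capture group 3 (required)
    r"(?:\s*[.,]\s*(\d+))?"       # capture group 4 (0 or 1 times)
    r"\s*$"
)

match = NUMBER_RE.match("10:20:30,5")
print(match.groups())
```

The trade-off is the same as in the Java version: more quoting noise, in exchange for comments that sit next to the fragment they describe.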
The key is recognizing both the power of regular expressions and the "write once" nature they often fall into. Writing the code defensively to avoid this, so that the regular expression remains clear and understandable, is what matters. We format Java code for clarity, and regular expressions are no different when the language gives you the option to do so.