Deadcode

In some other answers, this could save well over 2 bytes. (Edit: Indeed, in Is this a Permutation of 1..n it has saved 5 bytes, bringing it down from e.g. 34 bytes to 29 bytes.)

Became Hot Meta Post
What is a good ASCII boundary-character for regex unary, and will using that instead of a delimiter affect the perceived legitimacy of my answers?

So far, I have only used comma-delimited unary for my answers taking a list / array as input, since it's the "obvious" choice. That looks like xx,xxxxxx,x,xx,x,xxx.

However, this can't differentiate between \$[]\$ and \$[0]\$, as both would be an empty string. In most challenges, this isn't a direct problem since an empty list is allowed to be undefined behavior.

But more to the point, in many answers, it requires using \b where a single byte could have been used instead. I have considered using an input specification where a single character is used not just as a delimiter, but also as a sandwich character for the entire list. So for example using ! would result in input looking like !xx!xxxxxx!x!xx!x!xxx!.
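To make the two input formats concrete, here is a minimal Python sketch of both encodings. The function names are hypothetical, and writing the empty list as a single boundary character is my assumption; the question itself does not specify it.

```python
def encode_delimited(values, delimiter=","):
    """Delimited unary: each value n becomes n copies of 'x', joined by the delimiter."""
    return delimiter.join("x" * n for n in values)


def encode_bounded(values, boundary="!"):
    """Boundary-character unary: the boundary opens the list and follows every element.

    Assumption: an empty list is written as one lone boundary character.
    """
    return boundary + "".join("x" * n + boundary for n in values)


sample = [2, 6, 1, 2, 1, 3]
print(encode_delimited(sample))  # xx,xxxxxx,x,xx,x,xxx
print(encode_bounded(sample))    # !xx!xxxxxx!x!xx!x!xxx!

# With a plain delimiter, [] and [0] both encode to the empty string.
print(repr(encode_delimited([])), repr(encode_delimited([0])))  # '' ''

# With a boundary character (under the assumption above), they differ.
print(repr(encode_bounded([])), repr(encode_bounded([0])))      # '!' '!!'
```

Under that assumption, the boundary format also removes the ambiguity between an empty list and a list holding a single zero.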

What would be the best name for this type of character (as opposed to "delimiter")?

  • framing character
  • boundary character
  • container character
  • something else?

As an example, my recent answer to Visible Dice Faces currently uses , as a delimiter and has a 29 byte shortest solution:

^(?!.*\b(x(\2?+,.*\b)?){7}\b) 

Try it online!

But using a boundary character, this could be 27 bytes:

^(?!.*:(x(\2?+:.*\b)?){7}:) 

Try it online!
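As a rough illustration of where the saving comes from, the Python sketch below counts the bytes of the two patterns above and then runs a much simpler pair of patterns. The simpler "does the list contain a 3?" patterns are hypothetical and are not taken from any of my answers. The two dice patterns are only measured, not compiled, since they rely on regex features that differ between engines.

```python
import re

# The two patterns quoted above, kept as raw strings purely to count bytes.
delimited_pattern = r"^(?!.*\b(x(\2?+,.*\b)?){7}\b)"
bounded_pattern = r"^(?!.*:(x(\2?+:.*\b)?){7}:)"
print(len(delimited_pattern), len(bounded_pattern))  # 29 27

# Hypothetical simpler task: does the list contain an element equal to 3?
contains_three_delimited = r"\bx{3}\b"  # needs \b on both sides (8 bytes)
contains_three_bounded = r":x{3}:"      # the boundary character does that job (6 bytes)

print(bool(re.search(contains_three_delimited, "xx,xxxxxx,x,xx,x,xxx")))  # True
print(bool(re.search(contains_three_delimited, "xx,xxxxxx")))             # False
print(bool(re.search(contains_three_bounded, ":xx:xxxxxx:x:xx:x:xxx:")))  # True
print(bool(re.search(contains_three_bounded, ":xx:xxxxxx:")))             # False
```

Each \b that sits at the edge of an element can become a one-byte literal, which is the same 2-byte saving seen in the dice pattern.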

In some other answers, this could save well over 2 bytes.

So should I go ahead and do this? What about answers where switching from a delimiter to a boundary-character wouldn't result in better golf – should I do it anyway for consistency? (That'd mean lots of old answers to edit, so it might be a bad idea.) What if there's a challenge where it'd actually result in worse golf (seems unlikely, but for the sake of argument)?

Would it negatively affect the perceived legitimacy of my answer if I use a boundary-character to improve the golf? As a really extreme example of manipulating the input specification for golf, see my 163 byte answer in Is it a valid chess move? (Python, with python-chess).

And what would be the best choice of ASCII character for this? It has to be a non-word character, because even with a sandwich character, \b still has to be used in some places. What I'm mainly concerned about is how readable it is in a regex pattern, and the aesthetics of the choice.

In rough order of my preference:

  • : - ^(?!.*:(x(\2?+:.*\b)?){7}:) - symmetric on two axes; stands out fairly well; is thin horizontally, making it look good as a separator
  • ! - ^(?!.*!(x(\2?+!.*\b)?){7}!) - symmetric on one axis, but doesn't have much of a connotation of use as a separator; also used in (?!), which might slightly impact readability
  • ' - ^(?!.*'(x(\2?+'.*\b)?){7}') - symmetric on one axis, but has a weak connotation of strings; doesn't stand out too well in a regex
  • " - ^(?!.*"(x(\2?+".*\b)?){7}") - symmetric on one axis, but has a heavy connotation of strings
  • - - ^(?!.*-(x(\2?+-.*\b)?){7}-) - symmetric on two axes; has a meaning inside regex character classes, but that shouldn't be a problem; doesn't look much like a separator though
  • ~ - ^(?!.*~(x(\2?+~.*\b)?){7}~)
  • = - ^(?!.*=(x(\2?+=.*\b)?){7}=) - its mathematical meaning would probably distract too much from using it for this purpose
  • / - ^(?!.*/(x(\2?+/.*\b)?){7}/) - commonly used as the delimiter in substitution expressions, so might not be good to apply to this usage as well
  • space - ^(?!.* (x(\2?+ .*\b)?){7} ) - not too readable, since it's just a blank space; would make it harder to pretty-print the regex, as the spaces would have to be \-escaped in that version
  • newline - would require using the s (DOTALL) flag, and showing the character as \n or in pretty-printed listings, and would make the actual regex take up multiple lines. Also already has connotations as a terminator or separator, not a prefix.
  • tab - same drawbacks as space, but additionally would be hard to distinguish from space, and already has connotations as a separator, not a boundary character.
  • NUL - technically difficult to use, but nevertheless usable. However, already has connotations as a terminator, not a prefix.
  • # - ^(?!.*#(x(\2?+#.*\b)?){7}#) - stands out pretty well, but feels too heavy/large to be a separator
  • ` - ^(?!.*`(x(\2?+`.*\b)?){7}`) - stands out as a particularly small character, but for that same reason, might be harder to read
  • ; - ^(?!.*;(x(\2?+;.*\b)?){7};) - asymmetric; often used as a line terminator, but would be weird as prefix
  • , - ^(?!.*,(x(\2?+,.*\b)?){7},) - asymmetric; already used as a separator, so probably shouldn't be used as a prefix (and would look weird used that way)
  • % - ^(?!.*%(x(\2?+%.*\b)?){7}%)
  • & - ^(?!.*&(x(\2?+&.*\b)?){7}&)
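The variants in the list above can be generated mechanically, and the non-word requirement can be checked at the same time. This is a small Python sketch with hypothetical helper names; it only builds the pattern strings and tests each candidate against \W. It does not say whether a given character needs escaping in any particular engine or mode.

```python
import re

# Template of the 27-byte pattern, with "C" standing in for the boundary character.
TEMPLATE = r"^(?!.*C(x(\2?+C.*\b)?){7}C)"

# The candidates discussed above, including the ones with no printable form.
CANDIDATES = [":", "!", "'", '"', "-", "~", "=", "/", " ", "\n", "\t", "\0",
              "#", "`", ";", ",", "%", "&"]


def pattern_for(boundary):
    """Substitute one boundary character into the template."""
    return TEMPLATE.replace("C", boundary)


def is_non_word(char):
    """True if \\b can see a word boundary between this character and an x."""
    return re.fullmatch(r"\W", char) is not None


for char in CANDIDATES:
    print(repr(char), is_non_word(char), repr(pattern_for(char)))

# An underscore would not qualify, because regex treats it as a word character.
print(is_non_word("_"))  # False
```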