2

Strange question I know.

I know that it is possible to have options, or more precisely option keys, consisting of letters, but more seems to be possible. Some option keys seem to have a value attached in the form key=value, but I cannot figure out the general form of value.

Finally, I know that several options can be passed at once, separated by commas. This seems to indicate that neither keys nor values may contain commas. It also seems quite safe to assume that blanks cannot appear in a key or in a value.

Could you provide me with more specific information? I ask the question because I am working on a parser. (I know, a LaTeX parser ... I must be foolish.)

9
  • 2
    Before you get too deep into this: Parser for pure LaTeX and Is there a BNF grammar of the TeX language? might be of interest. Commented Feb 10, 2024 at 19:54
  • 1
    Are you specifically asking about package/class options, or key-value options more generally? For example, TikZ keys routinely contain spaces. And values can contain commas if the value is inside {...}. Commented Feb 10, 2024 at 19:56
  • 1
    Also, see the big list of every keyval package. Commented Feb 10, 2024 at 20:37
  • Is there a general form of value? Not really, but the closest definition you can get is a set of tokens. What you do with them, how you store them, or how you validate user input results in a DSL (domain-specific language). I have used the word set in the mathematical sense. I have also used token as meaning an object in set theory. Commented Feb 11, 2024 at 2:29
  • 1
    @yannisl Oh, OK. Thanks. Unrelated usage, then. (Though with your terms we get types from Russell and tokens again, just with completely different meaning than the type/token distinction I'm familiar with.) Commented Jun 20, 2024 at 20:56

2 Answers 2

8

Oh, you're in for a ride here...

Note that there are different key=value packages, each with their own features, and possibly slight deviations from the base syntax.

Usually the behaviour is:

  1. split the list at commas outside of any nested braces to get the next pair
  2. split the pair at first equals sign outside of any nested braces
    1. if there is one: use stuff before equals sign as raw-key, after equals sign as raw-value
    2. if there is none: everything is raw-key and use default-value as value (if there is one, else error)
  3. for both raw-key and raw-value remove space from either end, after space stripping remove one set of outer braces if there are any, the result of this is key and value
  4. call key-code for key and use value as its argument (if key-code is defined, else error).

With these rules applied it becomes possible to input values that contain commas, equals signs, spaces on their ends, basically arbitrary as long as they are legal arguments in TeX.
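Since the question mentions writing a parser, the four steps above can be sketched in Python. This is only a rough model under stated assumptions, not the code of any package: the names `split_top_level`, `strip_one_brace_pair`, `parse_keyval` and the `defaults` mapping are made up for illustration, the input is treated as a plain string rather than a TeX token list, escaped braces such as `\{` are not handled, all spaces at either end are stripped (not just one), and empty list items are skipped.

```python
def split_top_level(text, sep, maxsplit=-1):
    """Split text at sep characters that are not inside {...}."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == sep and depth == 0 and maxsplit != 0:
            parts.append(text[start:i])
            start = i + 1
            maxsplit -= 1
    parts.append(text[start:])
    return parts


def strip_one_brace_pair(text):
    """Remove one set of outer braces if they enclose the whole text."""
    if len(text) >= 2 and text[0] == "{" and text[-1] == "}":
        depth = 0
        for i, ch in enumerate(text):
            depth += (ch == "{") - (ch == "}")
            if depth == 0 and i < len(text) - 1:
                return text  # first group closes early, e.g. "{a}{b}"
        return text[1:-1]
    return text


def parse_keyval(text, defaults=None):
    """Return a list of (key, value) pairs following steps 1 to 3."""
    defaults = defaults or {}  # hypothetical per-key default values
    result = []
    for pair in split_top_level(text, ","):            # step 1
        if not pair.strip(" "):
            continue  # assumption: empty list items are ignored
        raw = split_top_level(pair, "=", maxsplit=1)   # step 2
        key = strip_one_brace_pair(raw[0].strip(" "))  # step 3
        if len(raw) == 2:
            value = strip_one_brace_pair(raw[1].strip(" "))
        elif key in defaults:
            value = defaults[key]
        else:
            raise KeyError(f"key {key!r} needs a value")
        result.append((key, value))
    return result
```

With this sketch, `parse_keyval("a = {x, y}, b={p=q} , draft", {"draft": "true"})` yields the pairs `("a", "x, y")`, `("b", "p=q")` and `("draft", "true")`. Step 4 (running the key's code) would then be a lookup on the key name.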

Note that most packages (as far as I'm aware) strip only one space from either end on step 3 (but since TeX combines multiple consecutive spaces into a single one, and ignores spaces at the start of a line, this is fine most of the time).


For package and class options the situation is even more convoluted, as there are two parallel lists of these available in LaTeX. Most packages support only the older variant of that list, which went through some very special treatment (every space outside of braces is removed, which might also remove braces if there are spaces on both sides outside of them, and everything remaining is fully expanded). The newer variant is the list almost exactly as given (the only change is that the first token gets expanded once by \expandafter). These lists are then subject to the parsing laid out above.

Note that the fact that a list is used as-is, even though its contents had to survive being fully expanded can lead to problems in strange edge-cases, though package and class options should usually not have such complicated input that this becomes a real problem.

Packages/implementations using the new list (meaning packages providing access to the unaltered options; other packages supporting package/class options use the old space-zapped and expanded list):

  • ltkeys (LaTeX kernel) with \ProcessKeyOptions
  • expkv-opt
  • scrbase (the option handling of KOMA-script)

(I hope I didn't overlook any other package using the new list)


Now let's get to the exceptions (this is most likely non-exhaustive! I'm typing this from memory, so I might forget about an exception or give some details slightly incorrectly -- leave a comment or edit this answer to fix any errors I make!):

  • expkv (of which I'm the author, so it's the first in this list):

    • 2.2: expkv has no real default values, but instead supports something more general: It distinguishes keys given a value (calling them Val-<key>), and keys used without a value (so without any equals sign outside of braces; calling them NoVal-<key>), and both can have completely different key-code (though of course in the NoVal-<key> you can call the key-code of the eponymous Val-<key> providing a default value).
    • additionally expkv supports a syntax it calls exp-notation, which allows controlling the expansion and manipulation of value and key between points 3 and 4. This might strip an additional set of braces around the key, but only if an expansion prefix is found. See the package documentation (texdoc expkv-bundle) if you're interested in the syntax.
  • keyval:

    • 2: if there is more than a single equals sign the second one and everything following after it will be removed from the value.
    • 3: depending on whether there are actually spaces at either end or not, keyval might strip more than a single set of outer braces.
  • xkeyval: see keyval, but it might strip even more sets of outer braces.

  • l3keys/ltkeys:

    • 2: If there is more than a single equals sign outside of any nested braces it throws an error
    • 2.2: by default a missing value is treated like an empty value (this can be changed via key attributes .value_required:n and .value_forbidden:n).
    • 3: It strips all spaces from either end of both raw-key and raw-value, not just a single
    • additionally it handles keys in a structure reminiscent of a Unix file system tree, and strips spaces around the directory separator / for this
  • pgfkeys: see keyval

    • additionally it handles keys in a structure reminiscent of a Unix file system tree, and strips spaces around the directory separator / for this
    • it also has handlers, so that for instance for input ending in /.expand once the real key is whatever is there before that post-fix, and the value is expanded once (there are other handlers as well).
    • also handlers can be extended by custom ones allowing a lot of flexibility
    • 2.2: by default, if no value is given the token \pgfkeysnovalue is provided as the value (this can be changed via key handlers /.value required and /.value forbidden)
  • kvsetkeys:

    • 2: it includes all equals signs after the first inside the value, but strips spaces around them if they aren't contained in nested braces.
  • options: see pgfkeys (but its handlers have different names, also it has small differences in the way the file tree like structure is handled, afair)

  • simplekv:

    • 2: if there is more than a single equals sign the second one and everything following after it will be removed from the value.
    • 4: it doesn't throw an error on undefined key-code, but instead defines a new key holding value (accessible via \useKV)
  • yax:

    • Where to start? yax has a completely different syntax from the one described above, but can be set up to also work for above syntax description.
    • 3: If using the "standard key=value syntax" it might remove more than a single set of outer braces, I have no idea how it behaves with its own syntax regarding this though.
  • ltxkeys:

    • Isn't compatible with modern LaTeX (and arguably never was).
    • 3: It doesn't remove any outer braces. Because of this, you can't input arbitrary values in ltxkeys, most notably value can't contain a comma unless it's surrounded by braces, and can't have a space on either end.
  • luakeys:

    • Not sure about it, it doesn't parse the keys inside of TeX but uses Lua inside LuaTeX for that job, I never took a deeper look into it, tbh.

And there are even more key=value packages, some of which build atop those above (most notably keyval), no idea which further exceptions they introduce.
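For a parser, the differing treatment of a second equals sign (point 2 in the list above) could be modelled as a switchable policy. The following is a minimal sketch, assuming plain strings instead of TeX tokens; the function names and the policy labels `"truncate"`, `"error"` and `"keep"` are invented here and only mirror the descriptions given above for keyval/simplekv, l3keys and kvsetkeys respectively, not the packages' actual code.

```python
def outer_split(text, sep):
    """Split text at every sep that is outside of {...}."""
    parts, depth, current = [], 0, ""
    for ch in text:
        depth += (ch == "{") - (ch == "}")
        if ch == sep and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    return parts + [current]


def split_pair(pair, policy):
    """Split one list item into (key, value); value is None if absent."""
    key, *values = (p.strip(" ") for p in outer_split(pair, "="))
    if not values:
        return key, None
    if len(values) == 1 or policy == "truncate":
        # keyval-like: drop the second "=" and everything after it
        return key, values[0]
    if policy == "error":
        # l3keys-like: more than one outer "=" is an error
        raise ValueError(f"more than one '=' in {pair!r}")
    if policy == "keep":
        # kvsetkeys-like: keep later "=" signs, without spaces around them
        return key, "=".join(values)
    raise ValueError(f"unknown policy {policy!r}")
```

Equals signs hidden in braces never trigger any of these policies, so `split_pair("a={b=c}", "error")` passes and returns the still-braced value `{b=c}`.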

14
  • 1
    I edited that paragraph above the list, is this easier to understand? Commented Jun 20, 2024 at 21:11
  • 1
    @cfr pgfkeys' ability for the user to define custom handlers is sorely missed in l3. I asked a question long ago about it but it seems the l3 team don't want to provide it. I have code in which I defined handlers like .tlset and clist to sort of simulate l3 concepts. Commented Jun 20, 2024 at 21:12
  • 1
    @yannisl the team aims for a stable interface that doesn't get altered by other packages/users. I can understand that point of view, it's all about reliability. If you need something like a handler you can use .code:n = \__my_handler:n {#1}, yes a bit more to type, but not so much it becomes a total annoyance to me. Commented Jun 20, 2024 at 21:17
  • 1
    @yannisl I agree. I also miss autoforwarding. It makes my l3 keys harder to read and far more cluttered because I end up with .code:n or .choice: where I really want .bool_set:N or something. But being able to set different kinds of data holders directly is nice. Commented Jun 21, 2024 at 0:00
  • 1
    @cfr I think I now get what you're meaning, and why I said "no, all the other...". You meant "all non-key=value packages using options", and I thought you meant all the other key=value packages... My bad. Commented Jun 21, 2024 at 7:44
3

First, some basic TeX principles. The lines of the source text are transformed into a sequence of tokens. The token processor transforms the lines by the following rules (roughly speaking, and only when category codes are set as usual):

  • \foo (a control sequence) is transformed to a single token,
  • % ends the reading of the current line, and reading continues at the next line,
  • ends of lines are transformed to spaces and the lines are combined into a single data stream,
  • multiple consecutive spaces are transformed to a single space,
  • the spaces and characters outside of control sequences are transformed to single tokens (one character is one token with a given category).

There is one more important TeX rule: if TeX interprets a part of a token sequence (saves it as a macro body, reads it as a parameter, etc.) then this part is always balanced text, i.e. every { must match a }. Parts of token sequences such as abc{def or abc}def are never possible. It means that a scanned key or value must always be balanced text.

And there is another, related TeX rule: if a parameter of the form {...} is scanned, then these outer braces are removed before the parameter is used.

When a token sequence with key=value syntax is split at comma or equals-sign separators, the separators must be at the outer level of { and }, otherwise the previous rules would be violated. So the token sequence a{b,c}d,ef is split at the comma into two sequences, a{b,c}d and ef. And the token sequence {ab,cd},ef is split at the comma and interpreted as ab,cd and ef; the outer braces are removed by the rule mentioned above.
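The balanced-text rule and the two splitting examples can be checked with a few lines of Python. This is only an illustration on plain character strings, assuming that every `{` and `}` is a real grouping character; the helper names `is_balanced`, `split_outer` and `unbrace` are hypothetical and nothing here is TeX's actual scanner.

```python
def is_balanced(text):
    """True if every { has a matching } and no } comes too early."""
    depth = 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def split_outer(text, sep=","):
    """Split balanced text at separators on the outer brace level."""
    if not is_balanced(text):
        raise ValueError("unbalanced braces")
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
            continue
        depth += (ch == "{") - (ch == "}")
        current.append(ch)
    parts.append("".join(current))
    return parts


def unbrace(part):
    """Drop the outer braces when the whole part is one {...} group."""
    if (len(part) >= 2 and part[0] == "{" and part[-1] == "}"
            and is_balanced(part[1:-1])):
        return part[1:-1]
    return part


print(split_outer("a{b,c}d,ef"))                         # ['a{b,c}d', 'ef']
print([unbrace(p) for p in split_outer("{ab,cd},ef")])   # ['ab,cd', 'ef']
```

Every part returned by `split_outer` is itself balanced, which mirrors the statement that a scanned key or value is always balanced text.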

One last but not least TeX rule: tokens like \foo are either transformed to something different during the expansion process (if \foo is an expandable control sequence), or they are part of TeX's special syntax rules and can run an internal TeX algorithm.

All rules mentioned above are general TeX rules, they are not declared by the LaTeX macro set.

More syntactical rules for scanning the key=value syntax used by various LaTeX packages were mentioned in another answer here. I add one more "package". It is not a real package; it is the part of the OpTeX macros that scans key=value syntax.

When OpTeX reads a token sequence with key=value syntax, it replaces all outer comma+space and space+comma by a comma and splits the text at the outer commas. Moreover, it replaces all outer =+space and space+= by =. If a divided part includes an outer = then the key is before the first outer = and the value is after it. Otherwise there is only a key. A macro programmer can get the expanded or unexpanded value belonging to a given key, can test whether a given key was declared, and can assign one piece of code to all mentioned keys and another piece of code to other keys. This code is processed when the key=value syntax is scanned.
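The scanning described in this paragraph can be imitated in Python as follows. This is a loose model of the description only, not a translation of the OpTeX macros: the function names are invented, the input is assumed to have its spaces already collapsed as TeX's token processor would do, and neither brace removal around values nor the declaration and execution of key code is modelled.

```python
def normalize_outer(text):
    """Drop a space directly before or after an outer ',' or '='."""
    out, depth = [], 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if depth == 0 and ch in ",=" and out and out[-1] == " ":
            out.pop()      # space+separator becomes separator
        if depth == 0 and ch == " " and out and out[-1] in ",=":
            continue       # separator+space becomes separator
        out.append(ch)
    return "".join(out)


def find_outer(text, sep):
    """Index of the first sep outside of braces, or -1."""
    depth = 0
    for i, ch in enumerate(text):
        depth += (ch == "{") - (ch == "}")
        if ch == sep and depth == 0:
            return i
    return -1


def optex_like_parse(text):
    """Return (key, value) pairs; value is None for a key-only part."""
    rest, result = normalize_outer(text), []
    while True:
        i = find_outer(rest, ",")
        part, rest = (rest, None) if i < 0 else (rest[:i], rest[i + 1:])
        j = find_outer(part, "=")
        result.append((part, None) if j < 0 else (part[:j], part[j + 1:]))
        if rest is None:
            return result
```

For the input `a = 1 , b={x , y} ,c` the normalised text is `a=1,b={x , y},c`, and the pairs are `("a", "1")`, `("b", "{x , y}")` and `("c", None)`; the spaces inside the braces are untouched because they are not on the outer level.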

Note that all key=value macros in OpTeX take up only 16 lines of code in the OpTeX macros.
