I'm trying to define a command that lets the user supply a regex, which is then embedded in a larger regex that in turn is matched against a token list.
The naive approach of inserting it from a token list via \u{...} fails, because the contents are then matched as literal characters with their catcodes rather than parsed as regex syntax:
```latex
\documentclass{article}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
% User-supplied sub-pattern, stored as a token list
\tl_new:N \l_foo_tl
\tl_set:Nn \l_foo_tl { [a-z]+ }
% Attempt to splice it into the larger regex via \u
\regex_const:Nn \l_foo_regex { (\w+)( \[ \u{l_foo_tl} \] ) }
\regex_show:N \l_foo_regex
\seq_new:N \l_foo_seq
\regex_extract_all:NnN \l_foo_regex { a[x], b[yy], c[zzz] } \l_foo_seq
\seq_show:N \l_foo_seq
\ExplSyntaxOff
\end{document}
```
outputs
```
+-branch
  ,-group begin
  | Match, repeated 1 or more times, greedy
  |   range [97,122]
  |   range [65,90]
  |   range [48,57]
  |   char code 95
  `-group end
  ,-group begin
  | char code 91
  | char 91, catcode 12
  | char 97, catcode 11
  | char 45, catcode 12
  | char 122, catcode 11
  | char 93, catcode 12
  | char 43, catcode 12
  | char code 93
  `-group end.
```
What I'd want is either something like
```latex
\regex_const:Nn \l_sub_regex { [a-z]+ }
\regex_const:Nn \l_foo_regex { (\w+)( \[ ... \] ) }
```
where ... somehow inserts the regex represented by \l_sub_regex (both \c{l_sub_regex} and \u{l_sub_regex} give wrong results here); or a way to convert a compiled regex back to its string representation, something like \regex_to_str:N.
Perhaps there's a way to insert it back from a token list using some \detokenize or \scantokens hackery, but I'm wondering if l3regex already provides a proper solution for this.
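For what it's worth, one workaround that sidesteps \u entirely might be to assemble the pattern source as an ordinary token list and compile it only afterwards. The following is a minimal sketch, assuming the user's pattern is available as a token list rather than as a compiled regex; the names \l_foo_sub_tl and \l_foo_pattern_tl are made up for illustration.

```latex
\documentclass{article}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
\tl_new:N \l_foo_sub_tl      % hypothetical: the user-supplied pattern
\tl_new:N \l_foo_pattern_tl  % hypothetical: the assembled pattern source
\regex_new:N \l_foo_regex
\seq_new:N \l_foo_seq

\tl_set:Nn \l_foo_sub_tl { [a-z]+ }

% Build the pattern source piece by piece; nothing is expanded or compiled yet.
\tl_set:Nn \l_foo_pattern_tl { (\w+)( \[ }
\tl_put_right:NV \l_foo_pattern_tl \l_foo_sub_tl
\tl_put_right:Nn \l_foo_pattern_tl { \] ) }

% Pass the *value* of the token list to \regex_set:Nn, which then
% parses the whole thing as regex syntax.
\exp_args:NNV \regex_set:Nn \l_foo_regex \l_foo_pattern_tl

\regex_show:N \l_foo_regex
\regex_extract_all:NnN \l_foo_regex { a[x], b[yy], c[zzz] } \l_foo_seq
\seq_show:N \l_foo_seq
\ExplSyntaxOff
\end{document}
```

This only operates on the pattern source, so it does not help when all that is available is an already-compiled \l_sub_regex; that case is what I'm really asking about.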
EDIT: I found a note in the l3regex documentation about features that are "likely to be implemented at some point in the future":
Provide a syntax such as \ur{l_my_regex} to use an already-compiled regex in a more complicated regex. This makes regexes more easily composable.
So it seems such a feature doesn't currently exist but is planned for the future.
(By the way, it would be really helpful if the \regex_show: functions also printed the character itself whenever it is printable ASCII. Several lines of char code XXX are harder to debug than necessary.)