Expl3 - Can you outline the trick applied with \file_parse_full_name:nNNN for preventing expansion only with active characters?

Question

This question is not about something not working.
This question is about how a nice trick is accomplished.

interface3.pdf, section "12.2 File operation functions" says about the function
\file_parse_full_name:nNNN {⟨full name⟩} ⟨dir⟩ ⟨name⟩ ⟨ext⟩:

Before parsing, the ⟨full name⟩ is expanded until only non-expandable tokens remain, except that active characters are also not expanded. Quotes (") are invalid in file names and are discarded from the input.
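
For context, a call of the function quoted above might look like the following minimal sketch. The names \demo_show_parts:n, \DemoShowParts, \demobase and the three \l_demo_..._tl variables are made up for illustration; only \file_parse_full_name:nNNN itself is taken from interface3.pdf.

```latex
\documentclass{article}
\ExplSyntaxOn
% Hypothetical variables that receive the three parts of the name.
\tl_new:N \l_demo_dir_tl
\tl_new:N \l_demo_name_tl
\tl_new:N \l_demo_ext_tl
% Hypothetical helper: parse #1 and write the three parts to the terminal.
\cs_new_protected:Npn \demo_show_parts:n #1
  {
    \file_parse_full_name:nNNN {#1}
      \l_demo_dir_tl \l_demo_name_tl \l_demo_ext_tl
    \iow_term:x
      {
        dir=[ \l_demo_dir_tl ] ~
        name=[ \l_demo_name_tl ] ~
        ext=[ \l_demo_ext_tl ]
      }
  }
\cs_new_eq:NN \DemoShowParts \demo_show_parts:n
\ExplSyntaxOff
% An ordinary (expandable) macro inside the file name is expanded
% before the name is split into its parts.
\newcommand*\demobase{chapter}
\DemoShowParts{sub/dir/\demobase-one.tex}
\begin{document}
\end{document}
```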

Can you outline the trick applied for preventing expansion of active characters while not preventing expansion of other expandable tokens?

The trick is a lie :-) Some time ago, the expansion was done by analysing the input token by token and expanding each token when appropriate (i.e., when it was neither an active token nor a protected macro). This was rather slow, and when LaTeX's filename parsing started using it, we had to move to a faster, less granular approach: now the code just does a fancy \csname <filename>\endcsname to expand <filename> (thus active tokens are expanded, and that documentation is outdated).
– Phelype Oleinik, Commented Jan 17, 2022 at 17:13
As @PhelypeOleinik wrote, but the new (actually, older) trick is that the standard definitions of the active characters used for inputenc processing all \string themselves if used in \csname.
– David Carlisle, Commented Jan 17, 2022 at 17:19
'Had to' is a relative term ;) I'd go with 'the democratic decision was to move away'.
– Joseph Wright ♦, Commented Jan 17, 2022 at 17:22
@PhelypeOleinik Thanks for the clarification. I saw the \csname..\endcsname thing in source3.pdf, but I didn't find anything there that indicated special treatment of active characters. Thus I assumed I was missing something about the code. Too bad: I thought I was onto an interesting trick. :-) Shall I delete the question?
– Ulrich Diez, Commented Jan 17, 2022 at 17:23
@UlrichDiez Yes, UTF-8 active chars use \ifincsname. Other active chars have to be dealt with on a case-by-case basis (for example, somewhere in the filename processing in the kernel we do \edef~{\string~} before passing the file name to \csname..\endcsname).
– Phelype Oleinik, Commented Jan 17, 2022 at 17:33

Joseph Wright · Accepted Answer · 2022-01-17 17:20:58Z

Although, as Phelype says, the current code relies on 'sensible' definitions of active chars, it is quite possible to selectively expand material. The last check-in with the old code is a03c651350. It has the following for the name-sanitizing code:

    % \begin{macro}[EXP]{\__kernel_file_name_sanitize:n}
    % \begin{macro}[EXP]{\__kernel_file_name_expand_loop:w}
    % \begin{macro}[EXP]{\__kernel_file_name_expand_N_type:Nw}
    % \begin{macro}[EXP]{\__kernel_file_name_expand_group:nw}
    % \begin{macro}[EXP]{\__kernel_file_name_expand_space:w}
    % \begin{macro}[EXP]{\__kernel_file_name_strip_quotes:n}
    % \begin{macro}[EXP]{\__kernel_file_name_strip_quotes:nnnw}
    % \begin{macro}[EXP]{\__kernel_file_name_strip_quotes:nnn}
    % \begin{macro}[EXP]{\__kernel_file_name_trim_spaces:n}
    % \begin{macro}[EXP]{\__kernel_file_name_trim_spaces:nw}
    % \begin{macro}[EXP]{\__kernel_file_name_trim_spaces_aux:n}
    % \begin{macro}[EXP]{\__kernel_file_name_trim_spaces_aux:w}
    %   Expanding the file name without expanding active characters is done
    %   using the same token-by-token approach as for example case changing.
    %   The finale outcome only need be \texttt{e}-type expandable, so there
    %   is no need for the shuffling that is seen in other locations.
    %    \begin{macrocode}
    \cs_new:Npn \__kernel_file_name_sanitize:n #1
      {
        \exp_args:Ne \__kernel_file_name_trim_spaces:n
          {
            \exp_args:Ne \__kernel_file_name_strip_quotes:n
              {
                \__kernel_file_name_expand_loop:w #1
                  \q_@@_recursion_tail \q_@@_recursion_stop
              }
          }
      }
    \cs_new:Npn \__kernel_file_name_expand_loop:w #1 \q_@@_recursion_stop
      {
        \tl_if_head_is_N_type:nTF {#1}
          { \__kernel_file_name_expand_N_type:Nw }
          {
            \tl_if_head_is_group:nTF {#1}
              { \__kernel_file_name_expand_group:nw }
              { \__kernel_file_name_expand_space:w }
          }
        #1 \q_@@_recursion_stop
      }
    \cs_new:Npn \__kernel_file_name_expand_N_type:Nw #1
      {
        \@@_if_recursion_tail_stop:N #1
        \bool_lazy_and:nnTF
          { \token_if_expandable_p:N #1 }
          {
            \bool_not_p:n
              {
                \bool_lazy_any_p:n
                  {
                    { \token_if_protected_macro_p:N #1 }
                    { \token_if_protected_long_macro_p:N #1 }
                    { \token_if_active_p:N #1 }
                  }
              }
          }
          { \exp_after:wN \__kernel_file_name_expand_loop:w #1 }
          { \token_to_str:N #1 \__kernel_file_name_expand_loop:w }
      }
    \cs_new:Npx \__kernel_file_name_expand_group:nw #1
      {
        \c_left_brace_str
        \exp_not:N \__kernel_file_name_expand_loop:w
        #1
        \c_right_brace_str
      }
    \exp_last_unbraced:NNo
    \cs_new:Npx \__kernel_file_name_expand_space:w \c_space_tl
      {
        \c_space_tl
        \exp_not:N \__kernel_file_name_expand_loop:w
      }
    %    \end{macrocode}
    %   Quoting file name uses basically the same approach as for
    %   \texttt{luaquotejobname}: count the |"| tokens and remove them.
    %    \begin{macrocode}
    \cs_new:Npn \__kernel_file_name_strip_quotes:n #1
      {
        \__kernel_file_name_strip_quotes:nnnw {#1} { 0 } { }
          #1 " \q_@@_recursion_tail " \q_@@_recursion_stop
      }
    \cs_new:Npn \__kernel_file_name_strip_quotes:nnnw #1#2#3#4 "
      {
        \@@_if_recursion_tail_stop_do:nn {#4}
          { \__kernel_file_name_strip_quotes:nnn {#1} {#2} {#3} }
        \__kernel_file_name_strip_quotes:nnnw {#1} { #2 + 1 } { #3#4 }
      }
    \cs_new:Npn \__kernel_file_name_strip_quotes:nnn #1#2#3
      {
        \int_if_even:nT {#2}
          {
            \__kernel_msg_expandable_error:nnn
              { kernel } { unbalanced-quote-in-filename } {#1}
          }
        #3
      }
    %    \end{macrocode}
    %   Spaces need to be trimmed from the start of the name and from the end of
    %   any extension. However, the name we are passed might not have an extension:
    %   that means we have to look for one. If there is no extension, we still use
    %   the standard trimming function but deliberately prevent any spaces being
    %   removed at the end.
    %    \begin{macrocode}
    \cs_new:Npn \__kernel_file_name_trim_spaces:n #1
      { \__kernel_file_name_trim_spaces:nw {#1} #1 . \q_@@_nil . \s_@@_stop }
    \cs_new:Npn \__kernel_file_name_trim_spaces:nw #1#2 . #3 . #4 \s_@@_stop
      {
        \@@_quark_if_nil:nTF {#3}
          {
            \exp_args:Ne \__kernel_file_name_trim_spaces_aux:n
              { \tl_trim_spaces:n { #1 \s_@@_stop } }
          }
          { \tl_trim_spaces:n {#1} }
      }
    \cs_new:Npn \__kernel_file_name_trim_spaces_aux:n #1
      { \__kernel_file_name_trim_spaces_aux:w #1 }
    \cs_new:Npn \__kernel_file_name_trim_spaces_aux:w #1 \s_@@_stop {#1}
    %    \end{macrocode}
    % \end{macro}
    % \end{macro}
    % \end{macro}
    % \end{macro}
    % \end{macro}
    % \end{macro}
    % \end{macro}
    % \end{macro}
    % \end{macro}
    % \end{macro}
    % \end{macro}
    % \end{macro}

As shown, this uses a 'tl action' loop to grab each token/group in turn (which normalises {/} tokens). We can then examine each N-type token and filter out protected macros and active tokens.
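
The filtering step on N-type tokens can be tried in isolation. The following is only a sketch: \demo_classify:N, \DemoClassify, \demoplain and \demoprotected are hypothetical names, and the helper merely reports which branch the old loop would have taken rather than performing any expansion.

```latex
\documentclass{article}
\ExplSyntaxOn
% Hypothetical helper: same test as in the old N-type branch, but it
% only reports the decision instead of expanding or stringifying #1.
\cs_new:Npn \demo_classify:N #1
  {
    \bool_lazy_and:nnTF
      { \token_if_expandable_p:N #1 }
      {
        \bool_not_p:n
          {
            \bool_lazy_any_p:n
              {
                { \token_if_protected_macro_p:N #1 }
                { \token_if_protected_long_macro_p:N #1 }
                { \token_if_active_p:N #1 }
              }
          }
      }
      { expand }
      { keep~as~string }
  }
\cs_new_eq:NN \DemoClassify \demo_classify:N
\ExplSyntaxOff
\newcommand*\demoplain{x}
\protected\def\demoprotected{x}
% The helper is expandable, so it can be used inside \typeout.
\typeout{plain macro: \DemoClassify\demoplain}
\typeout{protected macro: \DemoClassify\demoprotected}
\typeout{active tilde: \DemoClassify~}
\typeout{primitive relax: \DemoClassify\relax}
\begin{document}
\end{document}
```

If the sketch matches the old behaviour, only the plain macro should be reported as "expand"; the protected macro, the active character and the unexpandable primitive all end up in the \token_to_str:N branch.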

The downside of this approach is that it is quite slow. The current code is written to be faster, using a \csname to expand tokens and relying on the fact that UTF-8 and similar active-char support includes an \ifincsname in the definition of the active character tokens.
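
To illustrate what 'sensible' means here, the following sketch defines an active character in the same spirit. It is not the kernel's or inputenc's actual definition: the active "!" and the control sequence names are invented for the demonstration, and the second half reuses the \edef~{\string~} idea mentioned in the comments on the question.

```latex
\documentclass{article}
% Hypothetical active "!" in the spirit of the inputenc-style definitions:
% inside \csname...\endcsname it turns into a harmless catcode-12 "!".
\catcode`\!=\active
\def!{\ifincsname\string!\else[bang]\fi}
% The active "!" is expanded while the control sequence name is built.
\expandafter\def\csname demo!name\endcsname{found}
\typeout{\expandafter\meaning\csname demo!name\endcsname}
% Outside \csname the other branch of the definition is used.
\typeout{!}
\catcode`\!=12
% Active characters without such a definition need separate handling,
% here by locally redefining "~" before it reaches \csname.
\begingroup
  \edef~{\string~}
  \expandafter\def\csname demo~name\endcsname{found}
  \typeout{\expandafter\meaning\csname demo~name\endcsname}
\endgroup
\begin{document}
\end{document}
```

The design choice is that the active character protects itself, so the caller can use a single fast \csname pass instead of inspecting every token.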

Thank you for this. It seems that internally \token_if_active_p:N was/is used for picking out active characters. \token_if_active_p:N does something like \ifcat\noexpand<token to test>\noexpand<active *>... This relies on the active * not being let equal to a non-active character token. It also relies on <token to test> not being let equal to a non-active character. (Regarding \csname..\endcsname and usage in a filename, the latter wouldn't matter.)
– Ulrich Diez, Commented Jan 17, 2022 at 17:50
@UlrichDiez True: I guess we could make \c_catcode_active_tl more obscure (I'd favour using ^^@). I think the 'active token' test is OK, as the idea is to deal with the case that the token can expand.
– Joseph Wright ♦, Commented Jan 17, 2022 at 19:03
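
The two caveats about the \ifcat-based test raised in the comments above can be reproduced with plain TeX primitives. This is a sketch under assumptions: \demoisactive is a hypothetical stand-in for the kernel test, not its real definition, and the active "*" plays the role of the reference token.

```latex
\documentclass{article}
\catcode`\*=\active
% Hypothetical stand-in for the kernel test: compare the category of
% #1 with that of an active "*", suppressing expansion of both.
\def\demoisactive#1{\ifcat\noexpand#1\noexpand*active\else not active\fi}
\begingroup
  % Reference token is a macro: both tokens count as category 13.
  \def*{star}%
  \typeout{reference is a macro: \demoisactive~}
\endgroup
\begingroup
  % Reference token let to a letter: it now counts as category 11.
  \let*=a
  \typeout{reference let to a letter: \demoisactive~}
\endgroup
\begingroup
  % Tested token let to a letter: it now counts as category 11.
  \def*{star}%
  \let~=a
  \typeout{tested token let to a letter: \demoisactive~}
\endgroup
\catcode`\*=12
\begin{document}
\end{document}
```

Only the first \typeout should report "active". In the other two cases one side of the comparison takes on the category of the character it was let to, so the test fails even though the tested token is an active character.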
