
Is it possible to define a macro \foo that accepts verbatim code like \foo{cos(16 % 2)} without it being interpreted as LaTeX code first? I know I can use an environment for that, but I would like to define a simple command as well. I'd prefer an xparse-based solution, but I am open to other approaches.

MWE:

\documentclass{article}
\usepackage{robust-externalize} % you might need to copy the .sty from https://github.com/leo-colisson/robust-externalize
\robExtConfigure{
  enable fallback to manual mode, % avoid error if shell escape is forgotten/not used, print instead the command in the pdf
  new preset={my python exec}{
    python,
    custom include command={\mytmpvalue},
    add import={import math},
    set placeholder={__ROBEXT_MAIN_CONTENT__}{write_to_out(r"\gdef\mytmpvalue{" + str(__ROBEXT_MAIN_CONTENT_ORIG__)+ r"}")}
  },
}
\begin{document}
You can compute simple values like \cacheMe[my python exec]{math.sqrt(43)}
% will not compile if you replace 43 with 43 % 2, as % is a comment in latex.
\end{document}

EDIT The v argument does not seem to work as expected; maybe I'm missing something stupid (I checked: the #1 seems to have been replaced with an empty string in the cached file)… maybe an incompatibility with the xsim package I'm using?

\documentclass{article}
\usepackage{robust-externalize}
\robExtConfigure{
  enable fallback to manual mode, % avoid error if shell escape is forgotten/not used, print instead the command in the pdf
  new preset={my python exec}{
    python,
    custom include command={\mytmpvalue},
    add import={import math},
    set placeholder={__ROBEXT_MAIN_CONTENT__}{write_to_out(r"\gdef\mytmpvalue{" + str(__ROBEXT_MAIN_CONTENT_ORIG__)+ r"}")}
  },
}
\NewDocumentCommand{\myPython}{v}{%
  \begin{CacheMeCode}{my python exec}
    #1
  \end{CacheMeCode}
}
\begin{document}
Why is this printing the actual content instead of the result: \myPython{math.sqrt(42 % 9)}??
Compare with:
\begin{CacheMeCode}{my python exec}
math.sqrt(42 % 8)
\end{CacheMeCode}
\end{document}
% Local Variables:
% TeX-command-extra-options: "--shell-escape -halt-on-error"
% End:


EDIT 2

It seems to be solved: I can actually put the content into a LaTeX3 string, which is good enough for my needs. Still, I'm quite curious to understand why the above solution does not work.

EDIT3 In fact this is not solved, if I put the string in verbatim like

\ExplSyntaxOn
\str_new:N \__robExt_tmp_contain_code_str
\NewDocumentCommand{\robExtCacheMeCode}{O{}+v}{%
  {% Group
    %% We store the input in a non-string element for efficiently implementing "auto forward"
    \str_set:Nn \__robExt_tmp_contain_code_str {#2}
    \str_show:N \__robExt_tmp_contain_code_str
  }}

Then if the input contains newlines, they are actually replaced with ^^M and I can't find how to replace them with proper new lines.

  • Are you free to use LuaLaTeX? If yes, the posting How to handle verbatim material in LuaLaTeX may be of interest to you. Commented Mar 13, 2024 at 11:57
  • Is there an issue with the ltcmd v-type argument? Commented Mar 13, 2024 at 13:46
  • @JosephWright oh, I thought that v was only available for environments. So I tried, but the result is not the expected one, see my edit. Commented Mar 13, 2024 at 15:13
  • @Mico thanks, sadly I'd like the solution to be cross-engine Commented Mar 13, 2024 at 15:15
  • @tobiasBora The v-type is in xparse, so on an older system you should just need to load that - we've had it for a number of years Commented Mar 13, 2024 at 15:27

1 Answer


In case of having \myPython internally call \cacheMe, it might be sufficient to have \myPython gather an xparse-v-type-argument and pass that on to \cacheMe.

In case of having \myPython internally call the environment CacheMeCode, I suggest defining \myPython to grab a v-type-argument, place it between the verbatim(!) phrases \begin{CacheMeCode}{my python exec}⟨newline⟩ and ⟨newline⟩\end{CacheMeCode}⟨newline⟩% and pass the entire thing on to \scantokens.
(Of course, with this approach the phrase \end{CacheMeCode} must not occur within the argument of \myPython.)

Some remarks:

  • LaTeX's "mechanism" for obtaining the tokens that belong to a v-type-argument changes the category codes assigned to the characters ␣/[SP] – space, \, {, }, $, &, #, ^, _, % and ~ before having TeX tokenize .tex-input-characters for obtaining the tokens that form the v-type-argument. (Plain TeX's \dospecials additionally changes the category codes assigned to [SOH] – start of heading and [VT] – vertical tabulation). The change of category codes is done because as components of v-type-arguments these characters shall not have special functionalities but shall be treated in the same way as any other ordinary character/letter. But the category code assigned to the character [HT] – horizontal tab, which usually is 10(space), is not changed by this mechanism. This implies that usually during tokenization the processing of a horizontal tab character belonging to a v-type-argument yields an explicit character token of category 10(space) and character code 32(!). 32, however, is not the character code of the horizontal-tab, but is the character code of the space character. The reason for the change of the character code is: When tokenizing characters of .tex-input which are not components of the name of a control sequence token, then TeX normalizes characters of category code 10(space) into explicit character tokens of category 10(space) and character code 32. In order to prevent normalization to character code 32 with the horizontal tab character during tokenization, assign the horizontal tab character category code 12(other) before having TeX read and tokenize the v-type-argument.

  • For assigning the horizontal tab character the category code 12(other) via an assignment of the pattern \catcode⟨number⟩=⟨number⟩, you need to somehow denote the number of the code-point of the horizontal tab character in TeX's internal character representation scheme.

    (The internal character representation scheme of traditional 8-bit TeX engines is based on representing a single character by a group of 8bits/by a single byte. Since there are 2^8=256 possibilities for what a byte can look like, 256 different characters can be represented, which in turn can be assigned different character codes in the number range from 0 to 255. In 8-bit TeX engines the character codes 0 to 127 are assigned to characters according to the American Standard Code for Information Interchange (ASCII). In 8-bit TeX engines character codes from 128 to 255 are available for adapting TeX to non-English conditions.

    The internal character representation scheme of the TeX engines LuaTeX and XeTeX is based on Unicode, of which ASCII is a strict subset. I.e., the range of possible character codes is the range of code point numbers of Unicode, and character codes are assigned to characters according to the Unicode standard. Usually with these engines the transformation format for representing the character codes of characters/the Unicode code point numbers of characters as sequences of bits/bytes is UTF-8.)

    On the one hand I like to denote the number of the code point of a character in TeX's internal character representation scheme by means of what the TeXbook's Backus-Naur notation of the grammar of TeX calls an ⟨alphabetic constant⟩. The ⟨alphabetic constant⟩ is formed by the explicit character token ` of category 12(other), trailed by a one-letter control sequence token where the letter forming the name of the control sequence token is the character whose code point number is to be obtained, trailed by ⟨one optional space⟩.

    E.g., in contexts where TeX gathers a ⟨number⟩, `\j␣ denotes the number of the code point in TeX's internal character representation scheme of the character j.

    E.g., in contexts where TeX gathers a ⟨number⟩, `\⟨horizontal tab character⟩ denotes the number of the code point in TeX's internal character representation scheme of the horizontal tab character.

    On the other hand I don't like to type horizontal-tab-characters into .tex-input-files directly as when looking at the .tex-input-file in an editor, the displaying of things with horizontal-tab-characters might cause confusion. Therefore I tend to denote in .tex-input files the horizontal-tab-character via TeX's ^^-notation. ^^-notation is a means for providing substitute-representations of characters only in terms of printable characters. In TeX's ^^-notation the horizontal-tab-character can be denoted as ^^I: Horizontal-tab has code point number 9 in TeX's internal character representation scheme while the 9th letter of the Latin alphabet is I, which in turn has code point number (decimal) 64+9=73 in TeX's internal character representation scheme.

    So in contexts where TeX gathers a ⟨number⟩, `\^^I␣ denotes the number of the code point in TeX's internal character representation scheme of the character represented as ^^I, i.e., of the horizontal tab character.

  • TeX reads input line by line from the .tex-input-file. I.e., when TeX needs to read from the .tex-input-file, always an entire line is read. When TeX needs tokens, it tokenizes characters from the line. The crucial point is: Whenever TeX reads a line from a file, it does some pre-processing, even before tokenization. In this stage of pre-processing all space characters at the right end of the line are removed and, in case the value of \endlinechar is in the range of code point numbers of TeX's internal character encoding scheme, a character is appended whose code point number in TeX's internal character encoding scheme equals the value the integer parameter \endlinechar has at the time of reading and pre-processing the line. That parameter in turn usually has the value 13 and thus denotes the carriage return character. Because space characters at the right ends of lines are removed even before tokenization takes place, with a non-LuaTeX TeX engine, or with the TeX frontend of a LuaTeX engine alone, you cannot verbatim-read/verbatim-copy things in a way that preserves spaces at line ends.

\documentclass{article}
\usepackage{robust-externalize}
\robExtConfigure{
  enable fallback to manual mode, % avoid error if shell escape is forgotten/not used, print instead the command in the pdf
  new preset={my python exec}{
    python,
    custom include command={\mytmpvalue},
    add import={import math},
    set placeholder={__ROBEXT_MAIN_CONTENT__}{write_to_out(r"\gdef\mytmpvalue{" + str(__ROBEXT_MAIN_CONTENT_ORIG__)+ r"}")}
  },
}
%----------------------------------------------------------------------
\NewDocumentCommand\myOtherPython {} {%
  \begingroup
  % The fake-file-writing-part of `\scantokens` shall take returns for
  % directives to write another line:
  \newlinechar=\endlinechar
  % horizontal tab (ASCII 9, 9th letter of alphabet is I) has
  % category 10(space) even in verbatim-mode, so give it
  % category 12(other).
  \catcode`\^^I=12\relax
  \myOtherPythoninner
}
\NewDocumentCommand{\myOtherPythoninner}{+v+v}{%
  \RenewDocumentCommand{\myOtherPythoninner}{v}{%
    \scantokens{#1##1#2}%
  }%
}%
\myOtherPythoninner{\endgroup\begin{CacheMeCode}{my python exec}
}{
\end{CacheMeCode}
%}
% The percent with `%}` must be! It is read verbatim and goes verbatim
% into the final definition of \myOtherPythoninner. It is the last thing
% fed to \scantokens. \scantokens is like writing tokens unexpanded to file
% and then reading that file, hereby tokenizing things according to the current
% catcode-régime. Reading files is linewise, with pre-processing, i.e.
% removal of spaces at line ends and appending a character according
% to \endlinechar. The percent will be right before the endline-
% character inserted at the end of the last "line", in the stage of
% tokenizing causing TeX to skip that endline-character as a comment.
%----------------------------------------------------------------------
\NewDocumentCommand\myPython {} {%
  \begingroup
  % horizontal tab (ASCII 9, 9th letter of alphabet is I) has
  % category 10(space) even in verbatim-mode, so give it
  % category 12(other).
  \catcode`\^^I=12\relax
  \myPythoninner
}
\NewDocumentCommand{\myPythoninner}{v}{\endgroup\cacheMe[my python exec]{#1}}%
%----------------------------------------------------------------------
\begin{document}
On my system you see the result: !!\myOtherPython{math.sqrt(42 % 9)}!!
On my system you see the result: !!\myPython{math.sqrt(42 % 9)}!!
Compare with: !!\begin{CacheMeCode}{my python exec}
math.sqrt(42 % 8)
\end{CacheMeCode}
!!
\end{document}
% Local Variables:
% TeX-command-extra-options: "--shell-escape -halt-on-error"
% End:
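
The remarks on ⟨alphabetic constant⟩s and ^^-notation can be checked in isolation. The following is a minimal sketch, not part of the example above; it assumes nothing beyond a standard LaTeX installation and only writes three numbers to the terminal and the .log-file:

```latex
\documentclass{article}
% `\j is an <alphabetic constant>: a backtick followed by a one-letter
% control sequence token. Where TeX gathers a <number>, it yields the
% code point number of the character j.
\typeout{Code point of j: \number`\j}
% ^^I is the ^^-notation of the horizontal tab, so `\^^I yields the
% code point number of the tab without a literal tab in the source.
\typeout{Code point of horizontal tab: \number`\^^I}
% I is the 9th letter of the Latin alphabet: 64+9=73.
\typeout{Code point of I: \number`\I}
\begin{document}
\end{document}
```

With an ASCII-compatible engine the three messages should report 106, 9 and 73. This is why \catcode`\^^I=12\relax in the code above addresses the horizontal tab character.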





Now let's look at your second example, where you try with a v-type-argument as follows:

Why is this printing the actual content instead of the result: \myPython{math.sqrt(42 % 9)}?? 

The environment CacheMeCode is the same as the environment RobExtCacheMeCode, which in turn calls the starting command of the environment RobExtPlaceholderFromCode. That command in turn calls \XSIMfilewritestart. \XSIMfilewritestart is defined in xsimverb.sty so that its file-writing machinery works by making the endline-character (carriage-return character, character code 13) active and defining the active carriage-return character token as a macro which takes, as its delimited argument, everything up to the next active carriage-return character token and, after processing its argument, reinserts the active carriage-return character token that was removed as argument delimiter. Thus \XSIMfilewritestart's file-writing machinery first comes into action when TeX encounters the next end of a line/the next endline-character after processing \begin{CacheMeCode} and tokenizes that endline-character as an active character token. Most of the stuff between \begin{CacheMeCode} and that next endline-character is just tokenized, processed and typeset as usual.

However, when, as in your scenario, everything (\begin{CacheMeCode}, the body of the environment and \end{CacheMeCode}) is the result of expanding the macro \myPython, the command for ending the environment is encountered before the next linebreak/the next endline-character after processing \begin{CacheMeCode}. Thus the file-writing machinery initialized via \begin{CacheMeCode} → \RobExtPlaceholderFromCode → \XSIMfilewritestart is deinitialized by \end{CacheMeCode} before it had a chance to come into action. Therefore the entire body of the environment, i.e., the v-type-argument of \myPython, is typeset into the .pdf-file and nothing of the body is written to the external text file.


Now let's look at the question in your comment:

actually if I use \str_set:Nn \l_tmpa_str {#2}, then new lines are turned into ^^M. Do you know why/how to avoid this?.

According to interface3.pdf, via \str_set:Nn the token list #2 is converted into a string and the conversion result is stored in \l_tmpa_str. Conversion is done by the routine \tl_to_str:n, which is just another name for the primitive \detokenize. \detokenize is like applying \string to every token, appending an explicit space token to the stringifications of control word tokens and doubling the stringifications of explicit character tokens of category 6(hash). (\string in turn delivers explicit character tokens of category 12(other); the only exception is that spaces are delivered as explicit character tokens of category 10(space). The result of applying \string to a ⟨control word token⟩ is affected by the parameter \escapechar.)
Character translation to ^^-notation is done neither with \detokenize nor with \scantokens. Character translation is dependent both on the computer platform and on the TeX-engine in use and is only done when TeX does really write something to an external text file or to the screen/console/shell.
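
The behaviour of \detokenize described above can be observed with a small sketch. It is only an illustration under the assumption of an e-TeX-capable engine, which every current LaTeX requires anyway; \foo is an arbitrary, undefined control word chosen for the demonstration:

```latex
\documentclass{article}
% \typeout writes its expanded argument to the terminal and the .log-file.
% \detokenize stringifies each token without expanding it: a space is
% appended after the control word \foo and the hash is doubled.
\typeout{\detokenize{\foo bar #1}}
\begin{document}
\end{document}
```

The message should read \foo bar ##1. The space after \foo comes from \detokenize, not from the source, because the space following a control word is skipped during tokenization.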

You can avoid character translation with carriage return characters by using a TeX-engine which is completely permissive and does not do character translation at all or which at least does not do character translation with the carriage return character. With modern TeX platforms TeX binaries have command line options and configuration files (.tcx-files) for affecting character translation.

To some degree you can handle newlines by giving both \newlinechar and \endlinechar the value 13 which denotes the carriage return character, and giving the carriage return character category 12. With these settings, the endline-character in the stage of pre-processing appended at the end of each line of .tex-input yields, at the time of tokenizing, an explicit carriage return character token of category 12. At the time of writing, such an explicit carriage return character token of category 12 is taken for a directive to start writing another line of text according to the conditions of your computer platform rather than being written as a character and hereby probably being subject to character translation into ^^-notation.
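
The approach of the preceding paragraph can be sketched as follows. This is a hedged illustration, not code from robust-externalize: the names \demoout and \writeVerbatimDemo and the file name \jobname-demo.txt are made up for the example. It relies on the +v-type argument already reading linebreaks as explicit carriage-return character tokens of category 12(other), so only \newlinechar needs to be set at the time of writing:

```latex
\documentclass{article}
\newwrite\demoout % hypothetical output stream, used only in this sketch
\NewDocumentCommand{\writeVerbatimDemo}{+v}{%
  \begingroup
  % While writing, take the carriage return (code 13) for a directive
  % to start another line instead of writing it as a character.
  \newlinechar=13
  \immediate\openout\demoout=\jobname-demo.txt
  \immediate\write\demoout{#1}%
  \immediate\closeout\demoout
  \endgroup
}
\begin{document}
\writeVerbatimDemo{import math
print(math.sqrt(42 % 9))}
\end{document}
```

Under these assumptions the file \jobname-demo.txt should contain two lines of text rather than one line containing ^^M. Spaces at the right ends of the input lines are still lost, as explained in the remarks on pre-processing.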


Armed with the knowledge about when character translation is done and when it is not done, let's look at:

EDIT3 In fact this is not solved, if I put the string in verbatim like

\ExplSyntaxOn
\str_new:N \__robExt_tmp_contain_code_str
\NewDocumentCommand{\robExtCacheMeCode}{O{}+v}{%
  {% Group
    %% We store the input in a non-string element for efficiently implementing "auto forward"
    \str_set:Nn \__robExt_tmp_contain_code_str {#2}
    \str_show:N \__robExt_tmp_contain_code_str
  }}

Then if the input contains newlines, they are actually replaced with ^^M and I can't find how to replace them with proper new lines.

Reading from the .tex-input file and tokenizing tokens belonging to a +v-argument takes place while the integer parameter \endlinechar is set to 13, which in TeX's internal character representation scheme denotes the carriage-return-character, and category code 12(other) is assigned to the carriage-return-character. Therefore, while reading and tokenizing tokens belonging to a +v-argument, a linebreak yields a single explicit carriage-return character-token of category 12(other), i.e., an explicit character token of character code 13 and category 12(other). Applying \str_set:Nn doesn't do any changes to such character tokens as they are already of category 12.

However, when \str_show:N writes the value of the string-variable to shell/screen/console/standard output, during that writing character-translation takes place and therefore explicit carriage-return character tokens are displayed as ^^M. So the ^^M you see on the shell/screen/console/standard output are not something the explicit carriage-return character tokens of your +v-argument / of the stringification of your +v-argument are replaced with. They are just the way in which carriage-return-characters are displayed on shell/screen/console/standard output with TeX engines where character translation at writing-time takes place.

If in your code you slightly change things so that, instead of displaying the entire string, the meaning of each character of the string is displayed, you see that the sequence ^^M displayed on the shell/screen/console/standard output does not stand for three tokens ^, ^ and M but does represent a single token, namely the explicit carriage-return character token:

\ExplSyntaxOn
\str_new:N \__robExt_tmp_contain_code_str
\NewDocumentCommand{\robExtCacheMeCode}{O{}+v}{%
  {% Group
    \str_set:Nn \__robExt_tmp_contain_code_str {#2}
    \tex_message:D {^^J
      \str_map_function:NN \__robExt_tmp_contain_code_str \mystuff_meaningAndLinebreak
    }
  }}
\cs_new:Npn \mystuff_meaningAndLinebreak #1 {\cs_meaning:N #1 ^^J}
\ExplSyntaxOff
\robExtCacheMeCode{z
Z}
\stop

On the shell/screen/console/standard output you get the message:

the character z
the character ^^M
the character Z

If the sequence ^^M which you see on the shell/screen/console/standard output would stand for three character tokens ^, ^, M instead of a single explicit carriage-return character token, then the message would be different:

Instead of the single line

the character ^^M 

, where ^^M due to character translation at the time of writing to shell/screen/console/standard output, also stands for a single carriage-return character, you would have three lines

the character ^
the character ^
the character M

.
