How can I change the catcode within a command, then reference input using the new catcode?

Question

pdfTeX: I am attempting to define a command which

takes an input,
redefines the catcode and function of a character, and
prints the input using the new character definition.

I recognize that this is quite a convoluted task, but I'm guessing it is possible and simply beyond my current understanding. I consulted this answer, which works quite well and is what my code is based on. I also saw this and this question dealing with similar topics, but, as you'll see in my code, they aren't illuminating for my particular requirements.

Minimal (not) working example:

\documentclass{minimal}
\def\transfigured#1{\catcode`|=\active% allows | as command
  \begingroup\endlinechar-1%
  \everyeof{{#1}\endgroup}\scantokens{\gdef|}}%
\def\subsumed#1{\begingroup
  \transfigured{#1}
  |
  \endgroup}% should simply do the same
\begin{document}
\begingroup
\transfigured{test}% gets desired output
|
\endgroup
\subsumed{test}% doesn't
\end{document}

which begets:

As far as I can tell, \protect and other such commands aren't the issue, but it feels like a fundamental mistake. Any help would be much appreciated.

EDIT: I have been asked to post the original function, to avoid an XY-problem.

\documentclass{minimal}
\usepackage{tikz}
\makeatletter
\newsavebox{\internal@inner}
\newlength\arg@ht
\newlength\arg@dp
%%% TRANSFIGURE command: changes appearance of character
\def\transfigure#1#2{\catcode`#1=\active% alters char #1
  \begingroup\endlinechar-1% to #2
  \everyeof{{#2}\endgroup}\scantokens{\gdef|}}%
%%% DELIM-MID-DELIM command: scans for | in arg
\def\delimmiddelim#1#2#3#4{% 1: arg 2: left 3: mid 4: right
  \text{\savebox{\internal@inner}{$#1$}%
    \arg@ht=\ht\internal@inner\relax%
    \arg@dp=\dp\internal@inner\relax%
    % minimum height:
    \ifdim\arg@ht<1.2ex\relax \arg@ht=1.2ex\relax \fi
    % minimum depth:
    \ifdim\arg@dp<.2ex\relax \arg@dp=.2ex\relax \fi
    % output
    \ensuremath{\begingroup
      \transfigure{|}{#3{\arg@dp}{\arg@ht}}% changes |
      #2{\arg@ht}{\arg@dp}\scantokens{#1}#4{\arg@ht}{\arg@dp}
    \endgroup}}}% returns | to normal use
%%% ANGLED BRACKETS:
\newcommand\@angl[2]{\ensuremath{\mathopen{\mkern1mu\relax%
  \begin{tikzpicture}[scale=1, baseline=0mm,
      line width=.1375mm+.00625*#1+.00625*#2,
      line cap=round, line join=miter]
    \draw (.2mm+.175*#1+.175*#2,.2mm+#1) --
          (0mm,.5*#1-.5*#2) --
          (.2mm+.175*#1+.175*#2,-.2mm-#2);
  \end{tikzpicture}%\mkern-5mu\relax
}}}
\newcommand\@angr[2]{\ensuremath{\mathclose{%\mkern-5mu\relax%
  \begin{tikzpicture}[scale=1, baseline=0mm,
      line width=.1375mm+.00625*#1+.00625*#2,
      line cap=round, line join=miter]
    \draw (-.2mm-.175*#1-.175*#2,.2mm+#1) --
          (0mm,.5*#1-.5*#2) --
          (-.2mm-.175*#1-.175*#2,-.2mm-#2);
  \end{tikzpicture}\mkern1mu\relax
}}}
%%% MIDLINE:
\newcommand\@vertm[2]{\ensuremath{\mathrel{\relax%
  \begin{tikzpicture}[scale=1, baseline=-1.75mm,
      line width=.1375mm+.00625*#1+.00625*#2,
      line cap=round]
    \draw (0mm,.2mm+#1) -- (0mm,-.2mm-#2);
  \end{tikzpicture}\relax
}}}% relation
\newcommand{\brkt}[1]{\delimmiddelim{#1}{\@angl}{\@vertm}{\@angr}}
\makeatother
\begin{document}
\[\brkt{\int|\sum|x}\]
\end{document}

It works as far as I'm concerned, but feedback and improvements are appreciated. Here is the output:

$output for \brk{}$

You could make | active for the definition of \subsumed, as in \catcode`|=\active \def\subsumed#1{\begingroup \transfigured{#1}|\endgroup}\catcode`|=12
– Steven B. Segletes, Commented Dec 12, 2023 at 20:21
Is this just for academic interest or do you have an application in mind? In the latter case, the risk of an XY-question is high.
– egreg, Commented Dec 12, 2023 at 20:40
@egreg Thank you for your concern. I was able to get things sorted. I have defined a command which gives adaptive bra-ket notation: \brkt{...|...|...}. The question was in trying to figure out how to interpret the |'s so that I don't need to define multiple commands with different numbers of arguments.
– Jeff Buffkin, Commented Dec 12, 2023 at 21:15
@JeffBuffkin As I suspected, this was an XY-question and you probably want “math active” rather than “active”.
– egreg, Commented Dec 13, 2023 at 7:58
@JeffBuffkin I believe it's better if you ask the real question about the bra-ket notation.
– egreg, Commented Dec 13, 2023 at 21:42

Ulrich Diez · Accepted Answer · 2023-12-13 22:00:19Z

The main issue might be that | within the replacement text of the definition of \subsumed is not active. You can do \scantokens{|}.

\documentclass{minimal}
\def\transfigured#1{\catcode`|=\active% allows | as command
  \begingroup\endlinechar-1\relax
  \everyeof{{#1}\endgroup}\scantokens{\gdef|}}%
% This entire definition is tokenized while | is _not_ active:
\def\subsumed#1{\begingroup
  \transfigured{#1}%
  \scantokens{|}%
  \endgroup}% should simply do the same
\begin{document}
\begingroup
\transfigured{test}% gets desired output
|%
\endgroup
\subsumed{test}%
\end{document}

By the way: Using the fact that in LaTeX ~ usually is active you can get active | also via \lccode and \lowercase:

\documentclass{minimal}
\def\transfigured{%
  \begingroup
  \lccode`\~=`\|\relax
  \lowercase{\endgroup\def~}%
}%
\def\subsumed#1{%
  \transfigured{#1}%
  \begingroup
  \lccode`\~=`\|\relax
  \lowercase{\endgroup~}%
}%
\begin{document}
\transfigured{test}%
\begingroup\lccode`\~=`\|\relax\lowercase{\endgroup~}%
\subsumed{test}%
\end{document}

Academic info:

You can (ab)use \futurelet for getting rid of end-of-file markers that come into being due to \scantokens or the \input primitive (in LaTeX, \input is redefined and the \input primitive is renamed to \@@input):

\documentclass{minimal}
\begingroup\catcode`\%=12\relax
\def\percentchar{\endgroup\def\percentchar{%}}\percentchar
\def\neutralizeoutertoken{%
  \expandafter\transfiguredRemoveEOFMarkerwithArgAsRelax\noexpand
}%
\def\transfiguredRemoveEOFMarkerwithArgAsRelax#1{\let#1=\relax\transfiguredRemoveEOFMarker{#1}}%
\def\transfiguredRemoveEOFMarker#1{%
  \endgroup % <- undo \transfigured's catcode-change before \futurelet
            % "looks ahead" and probably triggers tokenization
  \begingroup
  \let#1=\relax
  \def\transfiguredRemoveEOFMarker{\endgroup\def#1}%
  \futurelet\scratchy\transfiguredRemoveEOFMarker
}%
\def\transfigured{%
  \begingroup
  \catcode`|=\active % allows | as command
  \catcode`\%=14 % allows % as comment
  % active | might be defined \outer ...
  \expandafter\neutralizeoutertoken\scantokens\expandafter{\expandafter|\percentchar}%
}%
\begingroup
\catcode`|=\active
\csname @firstofone\endcsname{%
  \endgroup
  \def\subsumed#1{\transfigured{#1}|}%
}%
\begin{document}
{\catcode`|=\active \outer\gdef|{Huh?}}
\transfigured{test}% define active | with <replacement text> "test"
\begingroup\catcode`|=\active\csname @firstofone\endcsname{\endgroup|}%
\subsumed{test}%
\end{document}

Thank you for your informative response. I've seen the lowercase trick, but somehow I like this formulation better.
– Jeff Buffkin, Commented Dec 12, 2023 at 20:45

Ulrich Diez · Accepted Answer · 2023-12-14 23:11:38Z

For character tokens of category 11 (letter) or 12 (other), assigning the \mathcode value "8000 makes them behave in math mode as if they were active.

Thus the following works out as long as the argument of \brkt does not contain | in places where TeX will not be in mathmode when typesetting:

\documentclass{minimal}
\usepackage{tikz}
\usepackage{amsmath}
\makeatletter
\newsavebox\internal@inner
%%% TRANSFIGURE command: changes appearance of character
\newcommand\transfigure[1]{% alters active char #1 to #2
  \begingroup
  \lccode`\~=`#1\relax
  \lowercase{\endgroup\def~}%
}%
%%% DELIM-MID-DELIM command: scans for | in arg
\newcommand\delimmiddelim[4]{% 1: arg 2: left 3: mid 4: right
  \begingroup
  \text{%
    \savebox{\internal@inner}{\ensuremath{#1}}%
    % minimum height:
    \ifdim\ht\internal@inner<1.2ex\relax \ht\internal@inner=1.2ex\relax \fi
    % minimum depth:
    \ifdim\dp\internal@inner<.2ex\relax \dp\internal@inner=.2ex\relax \fi
    %-------
    % output
    %-------
    % change the definition of active | :
    \transfigure{|}{#3{\dp\internal@inner}{\ht\internal@inner}}%
    % within math-mode pretend that | is active:
    \mathcode`\|="8000\relax
    \ensuremath{%
      #2{\ht\internal@inner}{\dp\internal@inner}#1#4{\ht\internal@inner}{\dp\internal@inner}%
    }%
  }%
  \endgroup
}%
%%% ANGLED BRACKETS:
\newcommand\@angl[2]{\ensuremath{\mathopen{\mkern1mu\relax%
  \begin{tikzpicture}[scale=1, baseline=0mm,
      line width=.1375mm+.00625*(#1)+.00625*(#2),
      line cap=round, line join=miter]
    \draw ({.2mm+.175*(#1)+.175*(#2)},{.2mm+(#1)}) --
          ({0mm},{.5*(#1)-.5*(#2)}) --
          ({.2mm+.175*(#1)+.175*(#2)},{-.2mm-(#2)});
  \end{tikzpicture}%\mkern-5mu\relax
}}}
\newcommand\@angr[2]{\ensuremath{\mathclose{%\mkern-5mu\relax%
  \begin{tikzpicture}[scale=1, baseline=0mm,
      line width=.1375mm+.00625*(#1)+.00625*(#2),
      line cap=round, line join=miter]
    \draw ({-.2mm-.175*(#1)-.175*(#2)},{.2mm+(#1)}) --
          ({0mm},{.5*(#1)-.5*(#2)}) --
          ({-.2mm-.175*(#1)-.175*(#2)},{-.2mm-(#2)});
  \end{tikzpicture}\mkern1mu\relax
}}}
%%% MIDLINE:
\newcommand\@vertm[2]{\ensuremath{\mathrel{\relax
  \begin{tikzpicture}[scale=1, baseline=-1.75mm,
      line width=.1375mm+.00625*(#1)+.00625*(#2),
      line cap=round]
    \draw ({0mm},{.2mm+(#1)}) -- ({0mm},{-.2mm-(#2)});
  \end{tikzpicture}\relax
}}}% relation
\newcommand{\brkt}[1]{\delimmiddelim{#1}{\@angl}{\@vertm}{\@angr}}
\makeatother
\begin{document}
\[\brkt{\int|\sum|x}\]
\end{document}
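The \mathcode "8000 mechanism can also be looked at in isolation. The following is only a minimal sketch, not part of the original code: the choice of \mathrel{\Vert} as the replacement is an arbitrary assumption made for illustration.

```latex
\documentclass{article}
\begin{document}
\begingroup
  % Define the active-character meaning of | without changing its catcode:
  % inside \lowercase, the active ~ is turned into an active |.
  \begingroup\lccode`\~=`\|\lowercase{\endgroup\def~}{\mathrel{\Vert}}%
  % From here on, | in math mode is treated as if it were active:
  \mathcode`\|="8000\relax
  $a|b$ % typeset like $a\mathrel{\Vert}b$
\endgroup
\end{document}
```

Because both the definition and the \mathcode assignment are local to the outer group, | returns to its usual behavior after the final \endgroup.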

Addendum for answering questions asked in a comment from December 14, 2023:

> I appreciate the improvements.

Judging whether these changes are to be considered improvements is up to you. ;-)

> Could you clarify
> (1) why changing the math code is better?

Judging whether this is better is up to you. ;-)

Let's mention some side effects of \scantokens:

\scantokens takes ⟨balanced text⟩ as argument and pretends writing the tokens that form the ⟨balanced text⟩ unexpanded to an external text file and then processing that external text file via the \input-primitive.

(I say "pretends" because, instead of creating a text file, an area of the computer's volatile RAM is used.

Unlike \write, \scantokens' pretended writing is done without character translation, i.e., without converting to ^^-notation and without re-encoding in the computer platform's character encoding scheme. The keyword "character translation" relates to the .tcx files mentioned in the manuals of Web2C implementations of TeX distributions like MiKTeX or TeX Live.

Processing a text file via \input means two things. Step 1: a line of the text file is read whenever more characters are needed while the buffer holding the not-yet-processed characters of the most recently read line is empty. Step 2: whenever more tokens are needed, the not-yet-processed characters of the line are taken as directives for producing tokens and appending them to the token stream.

When a text file is processed via \input, reading a line of text and filling that buffer goes along with some pre-processing whereby characters coming from the file are converted from the computer platform's character encoding scheme to TeX's internal character encoding scheme.

With \scantokens' pretended inputting, this conversion in the stage of pre-processing a line of .tex input is omitted, because the characters are already encoded in TeX's internal character encoding scheme.)

\scantokens' pretended writing implies the same peculiarities as with TeX's writing of tokens unexpanded to a real text file or screen via \write:

E.g., when writing, explicit character tokens of category 6(parameter) are doubled.
So with \scantokens usually hashes are doubled.
E.g., when writing a control word token, i.e., a control sequence token whose name consists either of a single character whose current category code is 11(letter) or consists of several characters of whatsoever category code, a space character is appended.
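Both peculiarities can be observed without \scantokens. The following is a small sketch that assumes an e-TeX-based engine and relies on \detokenize applying the same unexpanded-writing conversion; \foo and \bar are arbitrary names that need not be defined.

```latex
\documentclass{article}
\begin{document}
% \detokenize converts tokens to characters the way unexpanded writing does,
% but does not re-read the result, so the conversion itself becomes visible.
% The terminal and the .log file show: [\foo ##\bar x]
\typeout{[\detokenize{\foo#\bar x}]}
\end{document}
```

The single hash comes out doubled, and a space follows each of the control words \foo and \bar.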

\scantokens' pretended processing via \input implies that (almost) the same things are done that are done when TeX line by line reads a .tex-input-file and takes the characters of that line for directives for creating tokens when tokens are needed:

When the end of the pretended file is reached, the tokens forming the value of the ⟨token parameter⟩ \everyeof are appended to the token stream.
With \scantokens tokenization starts with the category code régime current at the time of starting to carry out \scantokens. (In case the ⟨balanced text⟩ which forms the argument of \scantokens contains things that are tokenized as directives for changing the category code régime whereafter these directives get carried out, these changes may affect how subsequent things also coming from that ⟨balanced text⟩ are tokenized.)

The category code régime which is current at the time of starting to carry out \scantokens is not necessarily the same as the category code régime which was current at the time of tokenizing the tokens which form the ⟨balanced text⟩/the argument of \scantokens.

Thus, if you do

\scantokens{#}%

, you get the tokens #₆#₆␣₁₀. Here ␣₁₀ denotes an explicit character token of category 10 (space) and character code 32, which comes into being due to TeX's \endlinechar mechanism and the circumstance of TeX's reading apparatus being in state M (middle of line) after producing #₆ via tokenization. Colloquially, such a ␣₁₀ token is called a space token. Space tokens yield horizontal glue if processed while TeX is typesetting in horizontal mode or in restricted horizontal mode, except in special situations where they are removed, e.g., as ⟨optional space⟩ when TeX is gathering tokens that make up a TeX ⟨number⟩ quantity, or between two undelimited arguments of a macro.
Additionally tokens forming the current value of the ⟨token parameter⟩ \everyeof are appended to the token-stream.
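The trailing space token can be made visible in typeset output. This is a minimal sketch, assuming an e-TeX-based engine and an empty \everyeof; the brackets are there only to show where the space ends up.

```latex
\documentclass{article}
\begin{document}
% Default \endlinechar: the pretended line ends with a space token,
% so a space is typeset before the closing bracket.
[\scantokens{abc}]

% \endlinechar=-1 inside a group: no end-of-line character is appended,
% so no space is typeset before the closing bracket.
[\begingroup\endlinechar=-1 \scantokens{abc}\endgroup]
\end{document}
```

Setting \endlinechar to -1 locally is the same device the code in the question uses before calling \scantokens.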

If you do

 \makeatletter\def\macro{\scantokens{\@macro}}\makeatother [...] \macro

, then the instance of \macro carried out after \makeatother does not yield a single token \@macro but does yield tokens \@m₁₁a₁₁c₁₁r₁₁o₁₁␣₁₀.
Additionally tokens forming the current value of the ⟨token parameter⟩ \everyeof are appended to the token-stream.

If you do

\catcode`\A=12\relax

, then the .tex-input \ABC is tokenized as \AB₁₁C₁₁.

If you do

\catcode`\A=12\relax
\def\macro{\scantokens{\ABC}}%
\macro
\catcode`\A=11\relax
\macro

, then the first instance of \macro yields the tokens \AB₁₁C₁₁␣₁₀.
Additionally tokens forming the current value of the ⟨token parameter⟩ \everyeof are appended to the token-stream.

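The second instance of \macro does not yield the control word token \ABC either: the macro still holds the control sequence token \A followed by B₁₁C₁₁, and because A now has category code 11, unexpanded writing appends a space after \A. The pretended file thus contains \A BC, which again is tokenized as \A, B₁₁, C₁₁, ␣₁₀.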
Additionally tokens forming the current value of the ⟨token parameter⟩ \everyeof are appended to the token-stream.

> You mention that it typesets the appropriate symbol in non-math environments, but originally, the savebox was set so that the input was in math mode.

I think I mentioned that doing things via (locally) changing the \mathcode of | only works out for things that are typeset while TeX is in mathmode. ;-)

Somebody might do something weird like \[\brkt{\int\hbox{text with |}|\sum|x}\]. The | in \hbox{text with |} would not be processed in math mode but in restricted horizontal mode, where the \mathcode change does not apply, and thus | is not treated as if it were active.

I think this might be an advantage as it seems that you wish special behavior of | only in mathmode.
But judging this is up to you.

> (2) Why not defining new lengths is better?

A matter of taste. Saving some length registers and assignments.

> Wouldn't this new formulation potentially stretch the contents of the box?

It does not change/stretch the contents, it just changes the borders of the box.

If you decrease height/depth/width, stuff is still there unstretched. But some of that stuff may stick out of the borders of the box, i.e., out of what is considered the area of the box. The borders of the box, however, are the relevant criterion for placing other boxes or surrounding glue or surrounding \kern or surrounding \leaders. Thus having a box whose material sticks out of the borders of the box may lead to the material sticking out being overlapped by material coming with other boxes.

If you increase height/depth/width, stuff is still there unstretched. But the area which makes the "inside" of the box is considered to be larger.

You can get a visible impression of the measurements of a box by putting an \fbox around it while \fboxsep is set to 0pt. (\fboxsep is the distance/the width of the empty space between the (invisible) border of the box and the (visible) lines/rules of the frame drawn around the box.)

The following snippet of LaTeX 2ε-code places into the box register \mybox a box where you can see in black color the baseline of that box and some great text.

First this box is typeset with unchanged box-measurements while putting a red frame around it via \fbox.

Then the measurements of that box are decreased/increased and it is typeset again while putting a red frame around it via \fbox.

The red frames indicate what is considered the area covered by the box.

\documentclass[border=1cm]{standalone}
\usepackage{color}
\newsavebox\mybox
\begin{document}
\huge
\savebox\mybox{%
  \hbox{%
    \savebox\mybox{\hbox{Some great text}}\usebox\mybox\llap{%
      \hbox to\wd\mybox{\leaders\hrule height .8pt \hfill \kern 0pt }}%
  }%
}%
\begin{minipage}{27.7cm}%
\color{red}%
\fboxsep=0pt
% Switch to horizontal mode, i. e., the mode where TeX does the breaking of paragraphs
% into lines for you automatically:
\leavevmode
% Draw red lines showing the baseline of the surrounding line of text:
\hrulefill
% Place the box on the baseline of the surrounding line of text
% and have TeX draw a red frame at the borders of the box:
\fbox{\usebox\mybox}%
% Draw red lines showing the baseline of the surrounding line of text:
\hrulefill
% Decrease the measurements of the box:
\ht\mybox=.5\dimexpr\ht\mybox\relax
\dp\mybox=.5\dimexpr\dp\mybox\relax
\wd\mybox=.5\dimexpr\wd\mybox\relax
% Place the box on the baseline of the surrounding line of text
% and have TeX draw a red frame at the borders of the box:
\fbox{\usebox\mybox}%
% Draw red lines showing the baseline of the surrounding line of text:
\hrulefill
% Increase the measurements of the box:
\ht\mybox=4\dimexpr\ht\mybox\relax
\dp\mybox=4\dimexpr\dp\mybox\relax
\wd\mybox=4\dimexpr\wd\mybox\relax
% Place the box on the baseline of the surrounding line of text
% and have TeX draw a red frame at the borders of the box:
\fbox{\usebox\mybox}%
% Draw red lines showing the baseline of the surrounding line of text:
\hrulefill\null
\end{minipage}%
\end{document}

The output is:

The red frames indicate what is considered the borders of the boxes.

The red lines indicate what TeX considers outside these boxes.

The black content of the boxes is not stretched/shrunk, but decreasing height/depth/width yields that some stuff sticks out of the borders of the box and therefore is outside the box and thus is not taken into account when calculating the positioning of other material. Increasing height/depth/width yields that the area wherein the content of the box is placed and which makes the "inside" of the box is considered to be larger.

If you wish to shrink/stretch the contents of a box, you can, e.g., use the command \scalebox of the package graphicx.
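For comparison, here is a minimal sketch of genuinely scaling box contents with \scalebox; the sample text and the scale factors are arbitrary choices.

```latex
\documentclass{article}
\usepackage{graphicx}
\begin{document}
% Unscaled reference:
Some great text

% Contents scaled by a factor of 2 in both directions:
\scalebox{2}{Some great text}

% Horizontal scale 1, vertical scale 2 (the optional argument):
\scalebox{1}[2]{Some great text}
\end{document}
```

Unlike assigning to \ht, \dp or \wd, this changes both the appearance of the material and the measurements of the resulting box.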

> (3) Why is the lowercase trick better?

Judging whether it is better is up to you.

With \lowercase instead of \scantokens you don't run into potential troubles related to

TeX's subtleties with unexpanded writing tokens (hash doubling, appending space-characters to control word tokens) and retokenizing under different category code régime probably yielding a set of tokens that differs from the original one in ways not desired.
messing around with the ⟨token parameter⟩ \everyeof. (What if the argument of \brkt itself contains another \scantokens-directive where you don't wish changes to the value of \everyeof to be applied? You'd need to locally undo the changes to \everyeof inside the argument of \brkt...)

> (4) Why did you alter the tikzpictures with braces and parentheses? Is this simply for aesthetics?

The parentheses are for ensuring carrying out math-operations in proper order in case the arguments themselves contain not just a single token like \arg@ht but contain several tokens like \ht\internal@inner or forming mathematical expressions.

The curly braces are to ensure that parentheses delimiting a comma-separated thing,

(..., ...) -- (..., ...)

, are not erroneously matched up by parentheses belonging to mathematical expressions that are components of what is denoted by ....

So it is like

({...},{...}) -- ({...},{...})

for ensuring that ( or ) inside {...} do not erroneously match up ( or ) outside {...}.
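A small self-contained sketch of this pattern follows; \demoline is a hypothetical helper introduced only for this illustration and is not part of the code above.

```latex
\documentclass{article}
\usepackage{tikz}
% \demoline is a hypothetical helper, used only for this illustration.
\newcommand\demoline[2]{%
  \begin{tikzpicture}
    % Parentheses keep each argument together as one term;
    % braces hide those parentheses from TikZ's coordinate parser.
    \draw ({0mm},{.2mm+(#1)}) -- ({0mm},{-.2mm-(#2)});
  \end{tikzpicture}%
}
\begin{document}
% Both arguments are expressions, not single lengths:
\demoline{2mm+1mm}{1mm+1mm}
\end{document}
```

With the parentheses, the lower endpoint is at -.2mm-(1mm+1mm), i.e., -2.2mm. Without them, the expression would read -.2mm-1mm+1mm, i.e., -.2mm.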

I appreciate the improvements. Could you clarify (1) why changing the math code is better? You mention that it typesets the appropriate symbol in non-math environments, but originally, the savebox was set so that the input was in math mode. (2) Why not defining new lengths is better? Wouldn't this new formulation potentially stretch the contents of the box? (3) Why is the lowercase trick better? (4) Why did you alter the tikzpictures with braces and parentheses? Is this simply for aesthetics?
– Jeff Buffkin, Commented Dec 14, 2023 at 5:10
@JeffBuffkin I just added a few remarks to my answer for addressing your questions...
– Ulrich Diez, Commented Dec 14, 2023 at 19:16

egreg · Accepted Answer · 2023-12-16 23:21:49Z

Too long for a comment.

You just need a math active |, no need of \scantokens.

The \brkt commands can be nested without problems using the code below.

\documentclass{article}

\NewDocumentCommand{\brkt}{m}{%
  \begingroup
  \activatebar
  \mathcode`|="8000
  \left\langle #1 \right\rangle
  \endgroup
}
\newcommand{\activatebar}{%
  \begingroup\lccode`~=`|\lowercase{\endgroup\gdef~}{\:\middle|\:}%
}

\begin{document}

\[
\brkt{\int|\sum_{k=1}^n \brkt{a_k|c_k} | x}
\]

\end{document}
