Peek ahead and process characters

Question

I want to replace Unicode character pairs in XeTeX, similar to "Country flags unicode char". I don't want to specify every option (every country code), and I want to stay flexible for other emoji (skin color variations, other non-country-code flags, etc.). The problem with the code in the linked answer is that it doesn't allow a pairable character to appear on its own (even if I modify the false branch accordingly).

After days of fiddling I found a solution (see below) which works for almost all cases I could imagine.

My questions:

How can I check the last cases (9: following ~ and 21: following alignment character), and why does it not work if I put a letter between my character and the alignment character (case 22) although it works with a separating space (case 21)?
Have I forgotten any cases?
Is there a more elegant way? My solution seems very crude to me.

My solution including example-list with mentioned cases (not working cases [last two items] are commented out):

\documentclass[varwidth,border=10pt]{standalone}
\usepackage{expl3}
\usepackage{newunicodechar}
\renewcommand{\familydefault}{\sfdefault} % math is more obvious
\ExplSyntaxOn
\cs_new_protected:Npn \single_uni:n #1 {
  \int_to_Hex:n{`#1}
}
% method is very specific, am I forgetting something? Not possible with ^^7e(~), ...
\cs_new_protected:Nn \dual_uni:n {
  \peek_catcode:NTF \c_space_token {
    \int_to_Hex:n{`#1}
  }{
    \peek_catcode:NTF \c_other_token {
      \int_to_Hex:n{`#1}
    }{
      \peek_charcode:NTF ^^24 { % math, \c_math_toggle_token not working
        \int_to_Hex:n{`#1}
      }{
        \peek_charcode:NTF ^^5e { % math super, \c_math_superscript_token not working
          \int_to_Hex:n{`#1}
        }{
          \peek_charcode:NTF ^^5f { % math sub, \c_math_subscript_token not working
            \int_to_Hex:n{`#1}
          }{
            \peek_catcode:NTF \c_group_end_token { % end group
              \int_to_Hex:n{`#1}
            }{
              \peek_catcode:NTF \c_group_begin_token { % begin group
                \int_to_Hex:n{`#1}
              }{
                \peek_catcode:NTF \c_alignment_token { % has no effect?!
                  \int_to_Hex:n{`#1}
                }{
                  \dual_uni_cont:nn{#1}
                }
              }
            }
          }
        }
      }
    }
  }
}
\cs_new_protected:Nn \dual_uni_cont:nn{
  \textbf{\int_to_Hex:n{`#1}}\textit{\int_to_Hex:n{`#2}}
}
\newunicodechar{➀}{\single_uni:n{➀}} % single_uni
\newunicodechar{➁}{\dual_uni:n{➁}} % dual_uni
% ➂ undeclared newunicodechar
\ExplSyntaxOff
\begin{document}
\begin{enumerate}
  \item[] expected \quad -- \quad result
  \item ➂ \quad -- \quad ➂ % comparison undeclared uni (expected blank, because not in font)
  \item 2780 \quad -- \quad ➀ % comparison single_uni
  \item \textbf{2781}\textit{78} \quad -- \quad ➁x % following letter
  \item \textbf{2781}\textit{21} \quad -- \quad ➁! % following other
  \item \textbf{2781}\textit{2782} \quad -- \quad ➁➂ % following unicode (other)
  \item \textbf{2781}\textit{2780} \quad -- \quad ➁➀ % following "newunicode" as \single_uni (active?!)
  \item \textbf{2781}\textit{2781}x \quad -- \quad ➁➁x % following "newunicode" as \dual_uni (active?!), following letter
  \item 2781 x \quad -- \quad ➁ x % following space
  \item 2781~x \quad -- \quad ➁~x % following active
  \item 2781 \quad -- \quad ➁\\ % following newline
  \phantom{nothing} % nothing for a new line
  \item $\textbf{2781}\textit{78}$ \quad -- \quad $➁x$ % in math (following letter)
  \item $2781^x$ \quad -- \quad $➁^x$ % following math superscript
  \item $2781_x$ \quad -- \quad $➁_x$ % following math subscript
  \item $2781$ \quad -- \quad $➁$ % in math (following math toggle)
  \item 2781\$ \quad -- \quad ➁\$ % following \$
  \item 2781\textbullet \quad -- \quad ➁\textbullet % following command
  \item 2781\footnote{x} \quad -- \quad ➁\footnote{x} % following command
  \item {2781} \quad -- \quad {➁} % following group end
  \item 2781{x} \quad -- \quad ➁{x} % following group begin
  \item 2781 \quad -- \quad ➁% %x following comment
  \item \begin{tabular}{llllll} % in tabular, following space
    2781&x &\quad -- \quad& ➁ x& x\\
  \end{tabular}
  \item \begin{tabular}{llllll} % in tabular, following letter, alignment
    2781&x &\quad -- \quad& %➁x& x\\
  \end{tabular}
  \item \begin{tabular}{llllll} % in tabular, following alignment, space, letter
    2781&x &\quad -- \quad& %➁& x\\
  \end{tabular}
\end{enumerate}
\end{document}

In the current result, the last two items produce no output. If a character that is defined as a potential double character finds a following "partner", the result is the original character in bold and the partner in italics. If the character is not a double character or has not found a partner, it is printed normally.

Update: I tried another approach with different problems: Compare macro names instead of meaning. But this question still needs answering...

Yes. I had linked it in the first sentence :D As mentioned, I don't want to use these predetermined cases but want to be flexible for other/new emoji combinations.
– genericFJS, Commented Oct 21, 2017 at 15:32
By the way: the linked answer has the same problem if a flag character does not come in pairs. With other emoji (skin color variations) this can happen: 👨 on its own, or the same with a color variation: 👨🏽
– genericFJS, Commented Oct 21, 2017 at 17:43
I think the easiest way is to check whether the next token can be grabbed as an argument (i.e., it is not a space token or a brace) and then just grab it. Keep a token list of characters, check whether the token is in the list, and act accordingly.
– Manuel, Commented Oct 22, 2017 at 14:44
@Manuel What would that "grabbing" look like? What would the token list look like? Like the \str_case:nnF in the cited question? I would be very happy if you could make a small example :-)
– genericFJS, Commented Oct 22, 2017 at 14:59

Manuel · Accepted Answer · 2017-10-22 18:20:00Z

I think this is what I would do: create a list with all the pairable characters and check whether the next token can be grabbed safely. If it can, grab it and check whether it is in the list. If it is in the list, use the dual option; otherwise use the single option and leave the token in place.

\documentclass{article}
\usepackage{expl3}
\usepackage{newunicodechar}
\ExplSyntaxOn
\str_new:N \g_fjs_duals_list_str
\str_gset:Nn \g_fjs_duals_list_str { ➀➁ }
\cs_new:Nn \fjs_uni:N { [ \int_to_Hex:n { `#1 } ] }
\cs_new:Nn \fjs_uni:NN { [ \int_to_Hex:n { `#1 } ; \int_to_Hex:n { `#2 } ] }
\cs_new_protected:Nn \fjs_checkdual:N
  {
    \peek_N_type:TF
      { \fjs_checkdual_grab:NN #1 }
      { \fjs_uni:N #1 }
  }
\cs_new_protected:Nn \fjs_checkdual_grab:NN
  {
    \tl_if_in:NoTF \g_fjs_duals_list_str { \token_to_str:N #2 }
      { \fjs_uni:NN #1 #2 }
      { \fjs_uni:N #1 #2 }
  }
\cs_generate_variant:Nn \tl_if_in:NnTF { No }
\newunicodechar{➀}{ \fjs_checkdual:N ➀ }
\newunicodechar{➁}{ \fjs_checkdual:N ➁ }
\ExplSyntaxOff
\begin{document}
➁➀ ➁➁ ➀➁ ➀➀
\end{document}

I don't know what the ideal approach would be, because here I'm mixing str functions with tl functions. That is needed (at least in this approach) because otherwise the symbols are made active and have a different catcode, so they are not found in the original list.
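The catcode remark above can be illustrated in isolation. The following is a minimal sketch, not part of the answer's code: it assumes an ordinary ASCII "!" made active by hand as a stand-in for a character activated by \newunicodechar, and every \demo_... and \Demo... name is hypothetical. It compares a direct \tl_if_in:NnTF lookup with a lookup that first passes the token through \token_to_str:N.

```latex
\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% All \demo_..., \l_demo_... and \Demo... names are hypothetical (sketch only).
\str_new:N \l_demo_list_str
\str_set:Nn \l_demo_list_str { ! }   % stored as a catcode-12 ("other") character
\cs_generate_variant:Nn \tl_if_in:NnTF { No }
\group_begin:
  \char_set_catcode_active:N \!
  % From here until \group_end:, "!" is tokenized as an active character.
  % Direct lookup: the active "!" is a different token from the stored one.
  \cs_new_protected:Npn \demo_lookup_direct:
    { \tl_if_in:NnTF \l_demo_list_str { ! } { found } { not~found } }
  % Lookup via \token_to_str:N: the o-type expansion turns the active "!"
  % into a catcode-12 "!" before the search is carried out.
  \cs_new_protected:Npn \demo_lookup_str:
    { \tl_if_in:NoTF \l_demo_list_str { \token_to_str:N ! } { found } { not~found } }
\group_end:
% Document-level wrappers, since expl3 names cannot be typed after \ExplSyntaxOff.
\newcommand \DemoDirect { \demo_lookup_direct: }
\newcommand \DemoViaStr { \demo_lookup_str: }
\ExplSyntaxOff
\begin{document}
direct: \DemoDirect \par
via str: \DemoViaStr
\end{document}
```

Under these assumptions the direct lookup reports "not found" and the \token_to_str:N lookup reports "found", which is the same reason the answer converts the grabbed token to a string before searching \g_fjs_duals_list_str.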

This could be a nice opportunity to use \fjs_uni:N and \fjs_uni:NN and let the argument signature differentiate between both functions.
– Manuel, Commented Oct 22, 2017 at 17:17
Thank you for your answer. I have tried your code, but sadly the token cannot be found in the token list. Have you tested your snippet? I replaced my header code with your snippet (and modified \newunicodechar{➀}{\fjs_uni:N ➀} \newunicodechar{➁}{\fjs_checkdual:N ➁} accordingly). But in the case ➁➀ I get "27802781" and not "[2708;2781]".
– genericFJS, Commented Oct 22, 2017 at 17:58
It doesn't work because they are active. I added a solution and it now works, but I don't know if this is the "ideal approach", so you might ask Joseph Wright or some other demigods of the subject.
– Manuel, Commented Oct 22, 2017 at 18:18
And still, there might be some issues with the activeness of the characters, in case you do something different with the \fjs_uni: commands.
– Manuel, Commented Oct 22, 2017 at 18:27
Thank you for your solution. It looks very good. I would be interested in your opinion regarding another solution to this problem I found (which I based on another question you answered): tex.stackexchange.com/a/397537/136226
– genericFJS, Commented Oct 22, 2017 at 19:49
