
The context

I'm working on a homework assignment for my Databases course. I want to typeset the ⨝ character (the JOIN operation from relational algebra), and I would also like the following behavior when reading the PDF: when the character is selected and copied, it must be copied as it is.

I've done my research, and so far I have the following:

lualatex main 
\documentclass{article}
\usepackage{fontspec}
\usepackage{unicode-math}
\setmathfont{XITS Math}
\begin{document}
$A ⨝ B$
\end{document}

When copying the ⨝ character in all the PDF viewers I've tried (zathura, okular and firefox), the character is copied as it is. I thought I had accomplished my goal. However, a new problem arises.

The problem

The problem is that ASCII characters are not copied as ASCII characters in some PDF viewers. Okular is the only PDF viewer that copies A and B as ASCII characters (see below).

Using Firefox, the line is copied as

firefox --version 
Mozilla Firefox 88.0.1 
𝐴⨝𝐵

Using Okular, the line is copied as

okular --version 
okular 21.04.0 
A⨝B 

Using Zathura, the line is copied as

zathura --version 
zathura 0.4.7
girara 0.3.5 (runtime: 0.3.5)
(plugin) djvu (0.2.9) (/usr/lib/zathura/libdjvu.so)
(plugin) pdf-mupdf (0.3.6) (/usr/lib/zathura/libpdf-mupdf.so)
𝐴 ⨝ 𝐵

The question

Is there any way to create a document that meets the following conditions?

  • Typeset the JOIN character such that when copying it in a PDF viewer, the ⨝ is inserted into the clipboard.
  • Typeset the ASCII characters such that when copying them in a PDF viewer, ASCII characters are inserted into the clipboard.

In simpler words: is there a package that ensures that, when selected text is copied from the generated PDF, the clipboard receives the same characters that were typed in the source?

Here, "PDF viewer" means any of the following: Okular, Zathura, or Firefox's built-in PDF viewer. I'm making this explicit because I know there are many bad PDF viewers out there that would behave differently in the scenario presented here.

Additional context

Behavior of pdfgrep and pdftotext

pdfgrep and pdftotext also interpret the ASCII characters of the PDF as non-ASCII characters.

pdfgrep '' main.pdf
𝐴⨝𝐵 1
pdftotext main.pdf
cat main.txt
𝐴⨝𝐵 1
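
To see exactly which characters ended up in the extracted text, it can help to print their Unicode code points. The following is only a minimal sketch, assuming iconv and od are available; the codepoints function name is made up for this illustration and is not part of any tool mentioned above.

```shell
# Hypothetical helper (not part of any package): print one U+XXXX code point
# per character read from stdin. Assumes the input is valid UTF-8.
codepoints() {
    iconv -f UTF-8 -t UTF-32BE |   # 4 bytes per character, big-endian, no BOM
        od -An -v -tx1 |           # dump every byte as two hex digits
        tr -d ' \n' |              # join the dump into one long hex string
        fold -w 8 |                # 8 hex digits = one character
        tr 'a-f' 'A-F' |           # uppercase hex digits
        sed 's/^0\{1,4\}/U+/'      # drop padding zeros, keep at least 4 digits
}

# Usage on a real file (assumes main.pdf from above exists):
#   pdftotext main.pdf - | codepoints

# Demonstration on a literal string typed with ASCII letters:
printf 'A⨝B' | codepoints
```

An ASCII A shows up as U+0041, whereas the italic math letter copied by Firefox and Zathura shows up as U+1D434.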

Trying every font in my TeXLive distribution

I thought that this problem was caused by the font specified in \setmathfont. For this reason, I created the following script, which generates a PDF for each OTF font in the default TeX Live installation.

\documentclass{article}
\usepackage{fontspec}
\usepackage{unicode-math}
\setmathfont{...}
\begin{document}
foo $A ⨝ B$ bar
\end{document}
found="$(locate "/usr/local/*.otf")"
total="$(echo "$found" | wc -l)"
counter=1
for file in $found
do
    echo "Trying $file ($counter/$total)"
    echo "Trying $file ($counter/$total)" >> lualatex.log
    font=$(basename "$file")
    # Substitute the font name for the "..." placeholder in main.tex
    sed -i "s/\.\.\./$font/g" main.tex
    # Redirect stdout first so that stderr also ends up in the log
    lualatex -interaction nonstopmode main >> lualatex.log 2>&1
    exit_code=$?
    # Restore the placeholder for the next iteration
    sed -i "s/$font/\.\.\./g" main.tex
    rm -f main.aux
    if [ "$exit_code" = 0 ]
    then
        mv main.pdf "$font.pdf"
        pdftotext "$font.pdf"
    fi
    counter=$((counter + 1))
done

The script took more than 30 minutes to finish and this is what I found. Of the 1702 *.otf fonts, the following fonts are the only ones that can typeset the ⨝ character.

grep -l -R --include="*.txt" '⨝' $my__experiments | sort 
/home/beep1560/e/Asana-Math.otf.txt
/home/beep1560/e/Erewhon-Math.otf.txt
/home/beep1560/e/GFSNeohellenicMath.otf.txt
/home/beep1560/e/KpMath-Bold.otf.txt
/home/beep1560/e/KpMath-Light.otf.txt
/home/beep1560/e/KpMath-Regular.otf.txt
/home/beep1560/e/KpMath-Sans.otf.txt
/home/beep1560/e/KpMath-Semibold.otf.txt
/home/beep1560/e/NewCMMath-Book.otf.txt
/home/beep1560/e/NewCMMath-Regular.otf.txt
/home/beep1560/e/STIX2Math.otf.txt
/home/beep1560/e/STIX-Regular.otf.txt
/home/beep1560/e/XITSMath-Regular.otf.txt

Apparently, no font typesets ASCII characters as ASCII characters, because the following search yields no results:

grep -l -R --include="*.txt" 'A.*⨝' $my__experiments | wc -l 
0 

So I think this is enough evidence that the problem cannot be solved by switching to another font.

  • I think the issue here is that what looks like ASCII characters are in fact not. When you typeset with lualatex, your typical ‘A’ in a formula is rendered as 𝐴, which is in fact the character U+1D434 MATHEMATICAL ITALIC CAPITAL A. Clearly, this is intentionally so. Whether there is a way to get what you want, I do not know. Here, more knowledgeable folks must pitch in. Commented May 15, 2021 at 6:52
  • @HaraldHanche-Olsen thanks for answering. I've already found a solution (see the first answer.) Commented May 15, 2021 at 6:58

1 Answer

You can accomplish what you are looking for by setting the math-style option to upright when selecting the math font.

\documentclass{article}
\usepackage{fontspec}
\usepackage{unicode-math}
\setmathfont[math-style=upright]{XITS Math}
\begin{document}
$A ⨝ B$ \\
\end{document}

I tested this solution in the same versions of the software that you mentioned (because we are the same person).

I found this by searching for "ASCII" in the official unicode-math documentation. Next time, make sure that you look at the official documentation first (in this scenario, $ texdoc unicode-math) and search for keywords (in this scenario, ASCII).
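
For a check that does not depend on a particular viewer, the text layer can be extracted with pdftotext and tested from the shell. This is only a sketch: copies_as_ascii is a hypothetical helper name, and the pattern assumes the formula is A ⨝ B as in the example document.

```shell
# Hypothetical helper: succeed if the text on stdin contains ASCII "A" and "B"
# around the JOIN character, with optional whitespace in between.
copies_as_ascii() {
    grep -q 'A[[:space:]]*⨝[[:space:]]*B'
}

# Usage on a real file (assumes main.pdf was built from the document above):
#   pdftotext main.pdf - | copies_as_ascii && echo "ASCII letters preserved"

# Self-contained demonstration on literal strings:
printf 'A ⨝ B\n' | copies_as_ascii && echo "upright: letters are ASCII"
printf '𝐴 ⨝ 𝐵\n' | copies_as_ascii || echo "italic: letters are not ASCII"
```

The second string uses the mathematical italic letters that the default math style produces, so the check fails for it.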
