The context
I'm working on a homework assignment for my Databases course. I want to typeset the ⨝ character (the JOIN operation from relational algebra), and I also want the following behavior when reading the PDF: when the character is selected and copied, it must be copied as is.
I've done my research, and so far I've arrived at the following:

    lualatex main

    \documentclass{article}
    \usepackage{fontspec}
    \usepackage{unicode-math}
    \setmathfont{XITS Math}
    \begin{document}
    $A ⨝ B$
    \end{document}

When I copy the ⨝ character in any of the PDF viewers I've tried (zathura, okular and firefox), it is copied as is. I thought I had accomplished my goal. However, a new problem arises.
The problem
The problem is that ASCII characters are not copied as ASCII characters in some PDF viewers. Okular is the only PDF viewer that copies A and B as ASCII characters (see below).
Using Firefox, the line is copied as

    firefox --version
    Mozilla Firefox 88.0.1

    𝐴⨝𝐵

Using Okular, the line is copied as

    okular --version
    okular 21.04.0

    A⨝B

Using Zathura, the line is copied as

    zathura --version
    zathura 0.4.7
    girara 0.3.5 (runtime: 0.3.5)
    (plugin) djvu (0.2.9) (/usr/lib/zathura/libdjvu.so)
    (plugin) pdf-mupdf (0.3.6) (/usr/lib/zathura/libpdf-mupdf.so)

    𝐴 ⨝ 𝐵

The question
Is there any way to create a document that meets the following conditions?
- Typeset the JOIN character such that when copying it in a PDF viewer, the ⨝ is inserted into the clipboard.
- Typeset the ASCII characters such that when copying them in a PDF viewer, ASCII characters are inserted into the clipboard.
In simpler words: is there any package that ensures that, when text is selected and copied from the generated PDF, the characters placed on the clipboard are the same characters that were typed in the source?
Here, a PDF viewer means any of the following: okular, zathura, or the built-in Firefox PDF viewer. I'm making this explicit because I know there are many bad PDF viewers out there that might behave differently in the scenario presented here.
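To make the difference concrete, here is a minimal Python sketch that names the characters involved. It assumes the letters Firefox and Zathura put on the clipboard are U+1D434 and U+1D435 (that is what they look like here; this is an assumption, not something confirmed by the viewers). The helper name `describe` is made up for this illustration.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


# What was typed in the source: ASCII A, the JOIN sign (U+2A1D), ASCII B.
typed = "A\u2A1DB"
# Assumed clipboard content from Firefox: math italic A, JOIN, math italic B.
copied = "\U0001D434\u2A1D\U0001D435"

for label, text in (("typed", typed), ("copied", copied)):
    print(label, describe(text))

# NFKC normalization folds the math italic letters back to ASCII and leaves
# the JOIN sign untouched, so this comparison holds under the assumption above.
print(unicodedata.normalize("NFKC", copied) == typed)
```

If the assumption holds, the copied letters are compatibility equivalents of A and B rather than unrelated glyphs, which may help narrow down where the substitution happens.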
Additional context
Behavior of pdfgrep and pdftotext
pdfgrep and pdftotext also interpret the ASCII characters of the PDF as non-ASCII characters.
    pdfgrep '' main.pdf
    𝐴⨝𝐵
    1

    pdftotext main.pdf
    cat main.txt
    𝐴⨝𝐵
    1

Trying every font in my TeXLive distribution
I thought that this problem was caused by the font specified in \setmathfont. For this reason, I wrote the following main.tex template and a script that generates a PDF for each OTF font in the default TeXLive installation.
    \documentclass{article}
    \usepackage{fontspec}
    \usepackage{unicode-math}
    \setmathfont{...}
    \begin{document}
    foo $A ⨝ B$ bar
    \end{document}

    found="$(locate "/usr/local/*.otf")"
    total="$(echo "$found" | wc -l)"
    counter=1

    for file in $found
    do
      echo "Trying $file ($counter/$total)"
      echo "Trying $file ($counter/$total)" >> lualatex.log

      font=$(basename "$file")

      # Substitute the font file name for the "..." placeholder in main.tex
      sed -i "s/\.\.\./$font/g" main.tex
      # Send both stdout and stderr to the log (the file redirection must come first)
      lualatex -interaction nonstopmode main >> lualatex.log 2>&1
      exit_code=$?
      # Restore the placeholder for the next iteration
      sed -i "s/$font/\.\.\./g" main.tex
      rm -f main.aux

      # Keep the PDF and extract its text only if compilation succeeded
      if [ "$exit_code" = 0 ]
      then
        mv main.pdf "$font.pdf"
        pdftotext "$font.pdf"
      fi

      counter=$((counter + 1))
    done

The script took more than 30 minutes to finish, and this is what I found. Of the 1702 *.otf fonts, the following are the only ones that can typeset the ⨝ character.
    grep -l -R --include="*.txt" '⨝' $my__experiments | sort
    /home/beep1560/e/Asana-Math.otf.txt
    /home/beep1560/e/Erewhon-Math.otf.txt
    /home/beep1560/e/GFSNeohellenicMath.otf.txt
    /home/beep1560/e/KpMath-Bold.otf.txt
    /home/beep1560/e/KpMath-Light.otf.txt
    /home/beep1560/e/KpMath-Regular.otf.txt
    /home/beep1560/e/KpMath-Sans.otf.txt
    /home/beep1560/e/KpMath-Semibold.otf.txt
    /home/beep1560/e/NewCMMath-Book.otf.txt
    /home/beep1560/e/NewCMMath-Regular.otf.txt
    /home/beep1560/e/STIX2Math.otf.txt
    /home/beep1560/e/STIX-Regular.otf.txt
    /home/beep1560/e/XITSMath-Regular.otf.txt

Apparently, no font typesets ASCII characters as ASCII characters, because the following search yields no results:
    grep -l -R --include="*.txt" 'A.*⨝' $my__experiments | wc -l
    0

So I think this is enough to conclude that the problem cannot be solved by switching to another font.
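As a cross-check of the grep above, the pdftotext outputs could also be classified by the character that sits just before the JOIN sign. The following is only a sketch under stated assumptions: the function names `classify` and `summarize` are hypothetical, and the directory passed to `summarize` stands in for wherever the *.txt files were written.

```python
import unicodedata
from pathlib import Path

JOIN = "\u2A1D"  # the JOIN sign


def classify(text):
    """Classify one pdftotext output by the character found just before JOIN."""
    idx = text.find(JOIN)
    if idx == -1:
        return "no-join"
    # Ignore whitespace between the operand and the JOIN sign.
    before = text[:idx].rstrip()
    if not before:
        return "no-operand"
    ch = before[-1]
    if ch.isascii():
        return "ascii"
    # A non-ASCII letter that NFKC folds to ASCII is a compatibility variant,
    # such as a character from the Mathematical Alphanumeric Symbols block.
    folded = unicodedata.normalize("NFKC", ch)
    return "math-alphanumeric" if folded.isascii() else "other"


def summarize(directory):
    """Map every *.txt file name in directory to its classification."""
    return {
        path.name: classify(path.read_text(encoding="utf-8", errors="replace"))
        for path in sorted(Path(directory).glob("*.txt"))
    }
```

Calling `summarize` on the experiments directory would show, per font, whether the operand was extracted as ASCII or as a math alphanumeric letter, which is slightly more informative than a match count of 0.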