
I'm developing an application that runs pdflatex on a dynamically generated .tex file to produce PDF reports.

In this .tex file, I'm embedding UTF-8 strings from a database. They can contain absolutely any UTF-8 character, since they come from OCR'ing images and other PDFs.

To make these strings safe, I currently replace the characters I've found to be problematic, using the following Ruby code:

    def latex_escape
      replacements = {
        "\\" => "$\\backslash$",
        "^" => "\\\\^",
        "$" => "\\\\$ ",
        "%" => "\\%",
        "&" => "\\\\&",
        "_" => "\\\\_",
        "~" => "*~*",
        "#" => "\\\\#",
        "{" => "$\\\\{$",
        "}" => "$\\\\}$",
        " - " => " --- ",
        " :" => '\\@:',
        /"([^"]+)"/ => '`\1\'',
        "..." => "\\ldots",
        "°" => '${^\\circ}$',
        /[\r\n]+/ => "\n\n"
      }
      new_str = self.dup
      replacements.each { |k,v| new_str.gsub!(k,v) }
      new_str
    end

This works, but does not cover all possible cases.

The .tex files are generated with \usepackage[utf8x]{inputenc} in the preamble.

Is there an easier way to make these strings safe for .tex file embedding, other than mapping by hand and replacing all possible problematic characters?

  • seems related to tex.stackexchange.com/questions/393225/… Commented Sep 26, 2017 at 20:29
  • I don't want to strip the characters, they need to be in the final pdf. In that answer, they seem to be defining every possible odd character, which I'm already doing. What I'm looking for is a generic, anyone-can-use way, to handle unpredictable UTF-8 input. Commented Sep 26, 2017 at 20:41
  • If you want to use arbitrary Unicode characters with LaTeX, I strongly suggest using one of the engines with "proper" Unicode support, i.e. LuaTeX or XeTeX. Not only will that make handling of uncommon characters easier, but you can also choose an OpenType font that includes the Unicode characters that you need. Commented Sep 26, 2017 at 20:56
  • I've ended up using XeTeX. It does exactly what I was expecting. This should be the answer. Commented Sep 26, 2017 at 22:09
  • Feel free to (and in fact, please) post it as an answer, after giving a few hours for @diabonas to post the answer first. Commented Sep 27, 2017 at 3:28

1 Answer


If you want to use arbitrary Unicode characters with LaTeX, I strongly suggest using one of the engines with "proper" Unicode support, i.e. LuaTeX or XeTeX. Not only will that make handling of uncommon characters easier, but you can also choose an OpenType font that includes the Unicode characters that you need.
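As a rough illustration of how that could look from the Ruby side: a Unicode engine takes care of the non-ASCII input, but the ten characters TeX treats as syntax (\ { } $ & # % _ ^ ~) still need escaping. The sketch below is only a minimal example under stated assumptions. The names `tex_escape` and `xelatex_document` are hypothetical, and "DejaVu Serif" is a placeholder for whichever installed OpenType font covers the characters in your data. It escapes in a single pass, so text inserted by one replacement is never rescanned by a later one.

```ruby
# Hypothetical helpers for illustration; the names are not from the question.

# The ten characters TeX treats as syntax, mapped to text-mode replacements.
# Single-quoted strings keep the backslashes literal ('\\' is one backslash).
TEX_SPECIALS = {
  '\\' => '\textbackslash{}',
  '{'  => '\{',
  '}'  => '\}',
  '$'  => '\$',
  '&'  => '\&',
  '#'  => '\#',
  '%'  => '\%',
  '_'  => '\_',
  '^'  => '\textasciicircum{}',
  '~'  => '\textasciitilde{}'
}.freeze

TEX_SPECIALS_RE = Regexp.union(TEX_SPECIALS.keys)

# Escape all special characters in one pass. The block form of gsub inserts
# its return value literally, so no backreference escaping is needed.
def tex_escape(text)
  text.gsub(TEX_SPECIALS_RE) { |ch| TEX_SPECIALS.fetch(ch) }
end

# Wrap escaped text in a minimal document meant for xelatex or lualatex.
# The font name is an assumption; pick one that covers your characters.
def xelatex_document(body_text, font: "DejaVu Serif")
  <<~TEX
    \\documentclass{article}
    \\usepackage{fontspec}
    \\setmainfont{#{font}}
    \\begin{document}
    #{tex_escape(body_text)}
    \\end{document}
  TEX
end
```

The resulting string would be written to a .tex file and compiled with xelatex or lualatex instead of pdflatex. With fontspec loaded, the inputenc line is not needed. Characters missing from the chosen font will still not render, so font coverage is the part to check against real OCR output.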
