Dynamically generated .tex file with unpredictable UTF-8 strings

Question

I'm developing an application that uses pdflatex with a dynamically generated .tex file to output PDF reports.

In this .tex file, I'm embedding UTF-8 strings from a database. They can contain absolutely any UTF-8 character, since they come from OCR'ing images and other PDFs.

To make these strings safe, I currently replace the characters I've found to be problematic, using the following Ruby code:

    def latex_escape
      replacements = {
        "\\" => "$\\backslash$",
        "^" => "\\\\^",
        "$" => "\\\\$ ",
        "%" => "\\%",
        "&" => "\\\\&",
        "_" => "\\\\_",
        "~" => "*~*",
        "#" => "\\\\#",
        "{" => "$\\\\{$",
        "}" => "$\\\\}$",
        " - " => " --- ",
        " :" => '\\@:',
        /"([^"]+)"/ => '`\1\'',
        "..." => "\\ldots",
        "°" => '${^\\circ}$',
        /[\r\n]+/ => "\n\n"
      }
      new_str = self.dup
      replacements.each { |k,v| new_str.gsub!(k,v) }
      new_str
    end

This works, but does not cover all possible cases.

The .tex files are generated with \usepackage[utf8x]{inputenc} in the preamble.

Is there an easier way to make these strings safe for .tex file embedding, other than mapping by hand and replacing all possible problematic characters?

I don't want to strip the characters; they need to be in the final PDF. In that answer, they seem to be defining every possible odd character, which I'm already doing. What I'm looking for is a generic, anyone-can-use way to handle unpredictable UTF-8 input.
– lairtonlelis, Commented Sep 26, 2017 at 20:41
If you want to use arbitrary Unicode characters with LaTeX, I strongly suggest using one of the engines with "proper" Unicode support, i.e. LuaTeX or XeTeX. Not only will that make handling of uncommon characters easier, but you can also choose an OpenType font that includes the Unicode characters that you need.
– diabonas, Commented Sep 26, 2017 at 20:56
I've ended up using XeTeX. It does exactly what I was expecting. This should be the answer.
– lairtonlelis, Commented Sep 26, 2017 at 22:09
Feel free to (and in fact, please) post it as an answer, after giving a few hours for @diabonas to post the answer first.
– ShreevatsaR, Commented Sep 27, 2017 at 3:28

diabonas · Accepted Answer · 2017-09-27 06:43:21Z

If you want to use arbitrary Unicode characters with LaTeX, I strongly suggest using one of the engines with "proper" Unicode support, i.e. LuaTeX or XeTeX. Not only will that make handling of uncommon characters easier, but you can also choose an OpenType font that includes the Unicode characters that you need.
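
As a rough sketch of what that switch could look like from Ruby (the helper names `tex_escape` and `build_tex` and the font "DejaVu Serif" are illustrative assumptions, not something from the question): with XeLaTeX or LuaLaTeX plus fontspec, non-ASCII characters can go into the file unchanged, but the ten characters that are special to TeX itself still need escaping. Doing that in a single pass means the output of one replacement is never escaped again by a later one.

```ruby
# Hypothetical helpers for illustration; names and font are assumptions.

# Characters that keep a special meaning in TeX regardless of the engine.
TEX_SPECIALS = {
  "\\" => "\\textbackslash{}",
  "{"  => "\\{",
  "}"  => "\\}",
  "$"  => "\\$",
  "&"  => "\\&",
  "#"  => "\\#",
  "%"  => "\\%",
  "_"  => "\\_",
  "^"  => "\\textasciicircum{}",
  "~"  => "\\textasciitilde{}"
}.freeze

# Regexp.union escapes each key, so the pattern matches them literally.
TEX_SPECIALS_RE = Regexp.union(TEX_SPECIALS.keys)

# Escape in one pass: each match is looked up in the hash and replaced,
# and the replacement text is never re-scanned.
def tex_escape(text)
  text.gsub(TEX_SPECIALS_RE, TEX_SPECIALS)
end

# Build a document meant for xelatex or lualatex (not pdflatex).
# No inputenc is loaded; fontspec selects an OpenType/TrueType font.
def build_tex(body, font: "DejaVu Serif")
  <<~TEX
    \\documentclass{article}
    \\usepackage{fontspec}
    \\setmainfont{#{font}}
    \\begin{document}
    #{tex_escape(body)}
    \\end{document}
  TEX
end
```

The resulting source would then be written to a file and compiled with xelatex or lualatex instead of pdflatex. Characters that the chosen font does not contain will still be missing from the PDF, so the font should be picked to cover whatever the OCR output is expected to contain.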
