I'm developing an application that uses pdflatex with a dynamically generated .tex file to produce PDF reports.
In this .tex file I embed UTF-8 strings from a database. These strings can contain absolutely any UTF-8 character, since they come from OCR'ing images and other PDFs.
What I'm currently doing to escape these strings and make them safe is to replace the characters I've found to be problematic, with the following Ruby code:
```ruby
def latex_escape
  replacements = {
    "\\"        => "$\\backslash$",
    "^"         => "\\\\^",
    "$"         => "\\\\$ ",
    "%"         => "\\%",
    "&"         => "\\\\&",
    "_"         => "\\\\_",
    "~"         => "*~*",
    "#"         => "\\\\#",
    "{"         => "$\\\\{$",
    "}"         => "$\\\\}$",
    " - "       => " --- ",
    " :"        => '\\@:',
    /"([^"]+)"/ => '`\1\'',
    "..."       => "\\ldots",
    "°"         => '${^\\circ}$',
    /[\r\n]+/   => "\n\n"
  }
  new_str = self.dup
  replacements.each { |k, v| new_str.gsub!(k, v) }
  new_str
end
```

This works, but does not cover all possible cases.
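One issue I've noticed with running the replacements sequentially is that it is order-sensitive: the `$` inserted by the `\` rule can be picked up again by the later `$` rule. A single-pass variant sidesteps that by matching every special character in one `gsub`. This is just a sketch (the text-mode `\text...` macros are my choice here, not what the app currently emits):

```ruby
# Single-pass sketch: Regexp.union escapes each key, and the hash form
# of gsub looks each match up in one pass, so a replacement string is
# never re-scanned by a later rule.
LATEX_SPECIALS = {
  "\\" => "\\textbackslash{}",
  "^"  => "\\textasciicircum{}",
  "~"  => "\\textasciitilde{}",
  "$"  => "\\$",
  "%"  => "\\%",
  "&"  => "\\&",
  "_"  => "\\_",
  "#"  => "\\#",
  "{"  => "\\{",
  "}"  => "\\}"
}.freeze

def latex_escape_single_pass(str)
  str.gsub(Regexp.union(LATEX_SPECIALS.keys), LATEX_SPECIALS)
end
```

But this still only covers the characters I list by hand, which is exactly the limitation I'm asking about.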
The .tex files are being generated with \usepackage[utf8x]{inputenc} in the preamble.
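For context, the generated documents look roughly like this (a minimal sketch; the real files have more in the preamble):

```latex
\documentclass{article}
\usepackage[utf8x]{inputenc}

\begin{document}
% escaped database strings are interpolated here
\end{document}
```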
Is there an easier way to make these strings safe for .tex file embedding, other than mapping by hand and replacing all possible problematic characters?