
The following Markdown file mixes HTML table tags with a pipe table. When it is converted to LaTeX with Pandoc, the HTML table doesn't render properly.

file.md:

    <table>
      <tr>
        <th>Alfreds Futterkiste</th>
        <th>Maria Anders</th>
        <th>Germany</th>
      </tr>
      <tr>
        <td>Centro comercial Moctezuma</td>
        <td>Francisco Chang</td>
        <td>Mexico</td>
      </tr>
    </table>

    | Alfreds Futterkiste | Maria Anders | Germany |
    |---------------------|--------------|---------|
    | Centro comercial Moctezuma | Francisco Chang | Mexico |

    pandoc file.md -s -t latex

results in the following (output trimmed to the relevant portion):

    Alfreds Futterkiste Maria Anders Germany Centro comercial Moctezuma Francisco Chang Mexico

    \begin{longtable}[]{@{}lll@{}}
    \toprule
    Alfreds Futterkiste & Maria Anders & Germany \\
    \midrule
    \endhead
    Centro comercial Moctezuma & Francisco Chang & Mexico \\
    \bottomrule
    \end{longtable}

Adding the --verbose option to pandoc shows that it is ignoring the HTML tags:

    [INFO] Not rendering RawBlock (Format "html") "<table>"
    [INFO] Not rendering RawBlock (Format "html") "<tr>"
    [INFO] Not rendering RawBlock (Format "html") "<td>"
    [INFO] Not rendering RawBlock (Format "html") "</td>"
    [INFO] Not rendering RawBlock (Format "html") "<td>"
    [INFO] Not rendering RawBlock (Format "html") "</td>"
    [INFO] Not rendering RawBlock (Format "html") "<td>"
    [INFO] Not rendering RawBlock (Format "html") "</td>"
    [INFO] Not rendering RawBlock (Format "html") "</tr>"
    [INFO] Not rendering RawBlock (Format "html") "<tr>"
    [INFO] Not rendering RawBlock (Format "html") "<td>"
    [INFO] Not rendering RawBlock (Format "html") "</td>"
    [INFO] Not rendering RawBlock (Format "html") "<td>"
    [INFO] Not rendering RawBlock (Format "html") "</td>"
    [INFO] Not rendering RawBlock (Format "html") "<td>"
    [INFO] Not rendering RawBlock (Format "html") "</td>"
    [INFO] Not rendering RawBlock (Format "html") "</tr>"
    [INFO] Not rendering RawBlock (Format "html") "</table>"

How can I get pandoc to process these as HTML tables within Markdown, the same way it handles pipe tables?

I don't want to use pipe tables, as they are harder for tech writers to edit and use.

  • Use pandoc to first convert from Markdown to HTML, and then from HTML to tex? (I think the issue is that you are asking pandoc to process Markdown and so it ignores raw HTML.) Commented Jan 24, 2022 at 2:57
  • Converting to HTML first fixes the tables, but seems to cause problems with title tags: I get "[WARNING] This document format requires a nonempty <title> element." It automatically adds title tags, which I don't want because I already have a Markdown heading. Besides, aren't HTML tables allowed in standard Markdown? See this website (daringfireball.net/projects/markdown/syntax) regarding 'Inline HTML'. Commented Jan 24, 2022 at 3:30
  • Maybe the list-table-filter is what you are looking for, given that you are writing the tables, not generating them. Commented Jan 24, 2022 at 9:26

2 Answers


Pandoc's default behavior is to leave the raw HTML content alone. You can force it to be parsed, e.g. by using a Lua filter. Place the following code in a file named parse-html.lua:

    function RawBlock (raw)
      return raw.format:match 'html'
        and pandoc.read(raw.text, 'html').blocks
        or raw
    end

Then call pandoc with:

    pandoc --lua-filter=parse-html.lua --from=markdown-markdown_in_html_blocks ...

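Disabling markdown_in_html_blocks makes pandoc keep the whole <table>...</table> element as a single raw HTML block, instead of splitting it into one RawBlock per tag as in the verbose log in the question; the filter then hands that block to pandoc's HTML reader. Your tables should now show up as proper LaTeX tables.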

  • I don't suppose you have a handy filter for having both tables and using markdown inside those tables? I'm just trying to output a PDF and I'm running into the same problem but I would still like to be able to use markdown inside the table. What's odd to me is that the manual seems to say that should be the default, but I end up with no tables in the output (just the markdown inside them). Commented Aug 23, 2023 at 16:01
  • I'm trying to use this solution to force pandoc to convert <sub> and <sup> tags correctly when converting Markdown -> PDF, but it's not working. Is there something I should be doing differently? I'm pretty new to pandoc and Lua. This is the command I'm running: pandoc <filename>.md --lua-filter=parse-html.lua --pdfengine=xelatex --from=markdown-markdown_in_html_blocks -o test.pdf Commented Sep 11, 2023 at 22:11

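Starting from tarleb's answer, you might be able to handle Markdown inside the HTML block with the following filter. It parses the raw HTML as before, then walks each resulting block: soft line breaks are turned back into newlines, and the text of every Plain block (such as the content of a table cell) is re-read as Markdown. Note that pandoc.utils.stringify keeps only the plain text, so inline HTML formatting inside a cell (for example <em>) is dropped; this works best when the cells contain Markdown rather than HTML markup.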

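    function RawBlock (raw)
      if raw.format:match 'html' then
        -- parse the raw HTML into pandoc blocks (e.g. a Table)
        local blocks = pandoc.read(raw.text, 'html').blocks
        for i = 1, #blocks do
          blocks[i] = pandoc.walk_block(blocks[i], {
            -- keep line breaks so the Markdown re-parse sees them
            SoftBreak = function(el) return pandoc.Str("\n") end,
            -- re-read the plain text of each Plain block (e.g. a cell) as Markdown
            Plain = function(el)
              return pandoc.read(pandoc.utils.stringify(el), 'markdown').blocks
            end
          })
        end
        return blocks
      end
      return raw
    end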

As with the first filter, markdown_in_html_blocks needs to be disabled:

    pandoc --lua-filter=parse-html.lua --from=markdown-markdown_in_html_blocks ...
  • Welcome to TeX.SE! Commented Jan 9, 2024 at 19:45
  • As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center. Commented Jan 9, 2024 at 20:10
