My goal is to convert this document (https://galileo.phys.virginia.edu/classes/252/lorentztrans.html), which contains math, into a Word document with well-formatted equations.

Why only Microsoft Word, you ask? I am teaching myself physics from these lecture notes. I make all my notes in OneNote (on my iPad, with handwritten equations and hand-drawn diagrams using the Apple Pencil). OneNote uses the same equation system as Microsoft Word, so if the document is converted into Word, it is converted into OneNote.

I have tried every approach I could find by Googling. None of the following methods worked.

Method 1: Copy-pasting MathML into MS Word. This works for some simple equations I found elsewhere, but strangely it doesn't work for any equation from this website. I suspect there is something unusual about this website's MathML.

Method 2: Converting from HTML to docx using pandoc. I saved only the HTML of this page, then ran pandoc -s input.html -o output.docx. It skipped all the equations.

Method 3: Copy-pasting directly into MS Word and Apache OpenOffice Writer.

I don't mind converting into an intermediate format first and then converting that into Word.

NOTE: I am looking for an automatic solution because I need to do this for hundreds of pages. The author has written his lecture notes on various topics in this format.

  • Have you tried pandoc -s input.html --mathml -o output.docx? Or pandoc -f html --mathml -o output.docx input.html? Commented Mar 21, 2021 at 13:56

1 Answer

The math tags in the document look like this:

<math xmlns='//www.w3.org/1998/Math/MathML' style='background-color:#'> <semantics> <mi>v</mi> </semantics> </math> 

The XML namespace is given as a protocol-relative URI, i.e., it starts with //. This is not correct: an XML namespace name is compared as a literal string, so it must include the http: prefix and read exactly http://www.w3.org/1998/Math/MathML.

Pandoc gets confused by this as well, since it isn't valid MathML, and so doesn't recognize it as an equation. It works well if one adds the http: prefix. The solution is therefore to do a search-and-replace in the input HTML document, fixing the xmlns attribute, and then pass the fixed result to pandoc.
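Since the question mentions hundreds of pages, the search-and-replace step can be scripted. The following is only a minimal sketch, not a tested pipeline: the helper names (fix_mathml_namespace, convert), the .fixed.html output naming, and the assumption that the saved pages are UTF-8 encoded are all mine, and the pandoc call simply reuses the -s ... -o ... invocation from the question.

```python
import re
import subprocess
import sys
from pathlib import Path

# Matches xmlns='//www.w3.org/1998/Math/MathML' with either quote style.
# The quote (and any spacing around '=') is captured so it can be kept as-is.
_BAD_NS = re.compile(r"""(xmlns\s*=\s*['"])//www\.w3\.org/1998/Math/MathML""")


def fix_mathml_namespace(html: str) -> str:
    """Add the missing 'http:' to protocol-relative MathML namespaces."""
    return _BAD_NS.sub(r"\1http://www.w3.org/1998/Math/MathML", html)


def convert(html_path: Path) -> Path:
    """Write a fixed copy next to the input, then run pandoc on that copy.

    Hypothetical helper: assumes pandoc is on PATH and the file is UTF-8.
    """
    fixed_path = html_path.with_suffix(".fixed.html")
    docx_path = html_path.with_suffix(".docx")
    text = html_path.read_text(encoding="utf-8")
    fixed_path.write_text(fix_mathml_namespace(text), encoding="utf-8")
    # Same invocation as in the question, pointed at the fixed copy.
    subprocess.run(
        ["pandoc", "-s", str(fixed_path), "-o", str(docx_path)],
        check=True,
    )
    return docx_path


if __name__ == "__main__":
    # Usage: python fix_and_convert.py page1.html page2.html ...
    for name in sys.argv[1:]:
        print(convert(Path(name)))
```

The replacement only touches namespaces that start with // directly after the opening quote, so running it again on an already fixed file leaves the file unchanged.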

  • Thanks a ton! I totally missed that. I added 'http:' and executed pandoc -s input.html --mathml -o output.docx. It worked brilliantly. Thanks a lot once again :) Commented Mar 22, 2021 at 6:05
