We have a requirement to upload two Microsoft Word documents, compare them, and merge them into a single document that contains only the non-repeating content, all in JavaScript.

Is it possible to achieve this?

  • Do you mean that you want to merge the two documents using JavaScript, or that you want to upload the two documents to a server, process them there, and return a new Word document as the result? If it's the latter, it can be done. The former is impossible, I think. Commented Feb 27, 2018 at 8:58
  • The requirement here is to merge it all in JavaScript alone. Commented Feb 27, 2018 at 17:21

1 Answer

Doing this entirely in the browser (client-side) is challenging because .docx files are zip archives of interrelated XML parts. Merging them while preserving all formatting requires a deep understanding of the Office Open XML (OOXML) standard.

However, if your goal is primarily comparison (finding the differences) and then generating a result, it is much more feasible to split the problem into two parts: Extraction and Comparison.

I recently had to build a similar feature for a document comparison tool. Here is the approach that worked for me:

  1. Parse the Docx: Use a library like mammoth.js to convert the raw .docx ArrayBuffer into HTML or text. This is easier than parsing XML directly.

  2. Diff the Text: Use a library like diff (by kpdecker) to compute the changes between the two extracted texts.

  3. Display/Merge: Show the differences to the user.

Here is a simplified version of the code I used to extract text (handling tables specifically, which is often a pain point):

    import mammoth from 'mammoth';

    async function extractTextFromDocx(arrayBuffer) {
      // Convert to HTML first to preserve table structure
      const result = await mammoth.convertToHtml({ arrayBuffer });
      const html = result.value;

      const tempDiv = document.createElement('div');
      tempDiv.innerHTML = html;

      // Helper to process tables into text (e.g. Markdown style)
      tempDiv.querySelectorAll('table').forEach(table => {
        const rows = Array.from(table.querySelectorAll('tr'));
        const tableText = rows.map(row => {
          const cells = Array.from(row.querySelectorAll('td, th'));
          return '| ' + cells.map(c => c.textContent.trim()).join(' | ') + ' |';
        }).join('\n');

        // Replace table with text representation
        const textNode = document.createTextNode(`\n${tableText}\n`);
        table.parentNode.replaceChild(textNode, table);
      });

      return tempDiv.textContent || "";
    }

Once you have the text, you can run a diff algorithm.
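To make the diff step concrete, here is a minimal sketch of a line-level diff built on a longest-common-subsequence table. It is a dependency-free stand-in for illustration only, not the code of the diff library mentioned above; the function name `simpleLineDiff` and the `{ type, line }` result shape are my own assumptions. In practice a maintained library will be faster and will handle word-level and character-level diffs as well.

```javascript
// Minimal line diff via a longest-common-subsequence (LCS) table.
// Hypothetical helper for illustration; not part of any library.
function simpleLineDiff(oldText, newText) {
  const a = oldText.split('\n');
  const b = newText.split('\n');

  // lcs[i][j] = length of the LCS of a[i..] and b[j..]
  const lcs = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk the table from the start, emitting operations in document order.
  const ops = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      ops.push({ type: 'equal', line: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push({ type: 'removed', line: a[i] });
      i++;
    } else {
      ops.push({ type: 'added', line: b[j] });
      j++;
    }
  }
  // Whatever is left on either side is a pure removal or addition.
  while (i < a.length) ops.push({ type: 'removed', line: a[i++] });
  while (j < b.length) ops.push({ type: 'added', line: b[j++] });
  return ops;
}
```

Calling `simpleLineDiff(textFromDocA, textFromDocB)` on the two extracted strings returns an ordered list of operations that you can render as highlighted additions and removals. The table uses memory proportional to the product of the two line counts, so very large documents are another reason to prefer a real diff library.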

For the merging part (creating a new Word doc from the result), you would likely need to use docxtemplater or docx (JS library) to reconstruct a document from your diff data, but that is significantly more complex than just comparing.
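If "non-repeating data" simply means that each paragraph should appear once in the output, the content of the merged document can be computed before any Word file is generated. The sketch below is one possible interpretation, not a full merge: it keeps every paragraph from the first text, then appends paragraphs from the second text that have not been seen yet. The name `mergeUniqueParagraphs` and the whitespace normalization rule are assumptions of mine, and all formatting is lost because it works on extracted text.

```javascript
// Order-preserving merge of two extracted texts, keeping each paragraph once.
// Hypothetical helper; assumes one paragraph per line in the extracted text.
function mergeUniqueParagraphs(textA, textB) {
  // Compare paragraphs ignoring leading/trailing and repeated whitespace.
  const normalize = (p) => p.trim().replace(/\s+/g, ' ');

  const seen = new Set();
  const merged = [];
  for (const text of [textA, textB]) {
    for (const paragraph of text.split('\n')) {
      const key = normalize(paragraph);
      // Skip blank lines and anything already emitted.
      if (key === '' || seen.has(key)) continue;
      seen.add(key);
      merged.push(paragraph.trim());
    }
  }
  return merged;
}
```

Each string in the returned array could then become one paragraph in whichever generator library you choose. Note that this also removes paragraphs repeated within a single document, and that paragraphs unique to the second document land at the end rather than at their original position; placing them in context would require walking the diff output instead.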
