We have a requirement to upload two Microsoft Word documents, compare them, and merge them into a single document that contains only the non-repeating data, all in JavaScript.
Is it possible to achieve this?
Doing this entirely in the browser (client-side) is challenging because .docx files are complex zipped XML structures. Merging them while preserving all formatting requires a deep understanding of the OpenXML standard.
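Because a .docx is a ZIP container, a cheap sanity check on the uploaded bytes can catch wrong file types (for example a legacy .doc) before any parsing is attempted. The helper below is a minimal sketch under that assumption; `looksLikeDocx` is a hypothetical name, and it only inspects the ZIP signature, not the Word XML inside.

```javascript
// Returns true if the buffer starts with the ZIP local-file-header
// signature "PK\x03\x04", which a well-formed .docx normally begins with.
// Hypothetical helper: it does not verify that the archive contains Word XML.
function looksLikeDocx(arrayBuffer) {
  if (arrayBuffer.byteLength < 4) return false;
  const bytes = new Uint8Array(arrayBuffer, 0, 4);
  return (
    bytes[0] === 0x50 && // 'P'
    bytes[1] === 0x4b && // 'K'
    bytes[2] === 0x03 &&
    bytes[3] === 0x04
  );
}
```

In a browser you would typically obtain the buffer with `await file.arrayBuffer()` from the file input and skip the upload if the check fails.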
However, if your goal is primarily comparison (finding the differences) and then generating a result, it is much more feasible to split the problem into two parts: Extraction and Comparison.
I recently had to build a similar feature for a document comparison tool. Here is the approach that worked for me:
1. Parse the Docx: Use a library like mammoth.js to convert the raw .docx ArrayBuffer into HTML or text. This is easier than parsing XML directly.
2. Diff the Text: Use a library like diff (by kpdecker) to compute the changes between the two extracted texts.
3. Display/Merge: Show the differences to the user.
Here is a simplified version of the code I used to extract text (handling tables specifically, which is often a pain point):

    import mammoth from 'mammoth';

    async function extractTextFromDocx(arrayBuffer) {
      // Convert to HTML first to preserve table structure
      const result = await mammoth.convertToHtml({ arrayBuffer });
      const html = result.value;

      const tempDiv = document.createElement('div');
      tempDiv.innerHTML = html;

      // Helper to process tables into text (e.g. Markdown style)
      tempDiv.querySelectorAll('table').forEach(table => {
        const rows = Array.from(table.querySelectorAll('tr'));
        const tableText = rows.map(row => {
          const cells = Array.from(row.querySelectorAll('td, th'));
          return '| ' + cells.map(c => c.textContent.trim()).join(' | ') + ' |';
        }).join('\n');

        // Replace table with text representation
        const textNode = document.createTextNode(`\n${tableText}\n`);
        table.parentNode.replaceChild(textNode, table);
      });

      return tempDiv.textContent || "";
    }

Once you have the text, you can run a diff algorithm.
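To make the diff step concrete, here is a minimal, dependency-free sketch of a line-level diff based on a longest-common-subsequence table. It is an illustrative stand-in, not the code of the diff library: `diffLinesSketch` and the `{ type, line }` result shape are hypothetical. In a real project the library's `diffLines` function, which returns change objects flagged as added or removed, would likely be the better choice.

```javascript
// Line-level diff using a longest-common-subsequence (LCS) table.
// Hypothetical helper for illustration; not the API of any library.
function diffLinesSketch(oldText, newText) {
  const a = oldText.split('\n');
  const b = newText.split('\n');

  // lcs[i][j] holds the LCS length of a[i..] and b[j..].
  const lcs = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk both line arrays, emitting unchanged, removed, and added lines.
  const changes = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      changes.push({ type: 'same', line: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      changes.push({ type: 'removed', line: a[i] });
      i++;
    } else {
      changes.push({ type: 'added', line: b[j] });
      j++;
    }
  }
  while (i < a.length) changes.push({ type: 'removed', line: a[i++] });
  while (j < b.length) changes.push({ type: 'added', line: b[j++] });
  return changes;
}
```

Calling it with the two extracted texts gives a list you can render directly, for example by colouring `removed` lines red and `added` lines green. The table needs memory proportional to the product of the two line counts, so very large documents are better served by a library implementation.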
For the merging part (creating a new Word doc from the result), you would likely need to use docxtemplater or docx (JS library) to reconstruct a document from your diff data, but that is significantly more complex than just comparing.
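Before reconstructing a document, you need the merged content itself. The sketch below takes one simple reading of "non-repeating data", which is an assumption on my part: treat each non-empty line of extracted text as a paragraph and keep only its first occurrence across both documents. `mergeUniqueParagraphs` is a hypothetical helper, and all formatting is lost at this stage.

```javascript
// Merge two extracted texts, keeping each distinct paragraph once.
// Assumption: one line of extracted text corresponds to one paragraph.
function mergeUniqueParagraphs(textA, textB) {
  const seen = new Set();
  const merged = [];
  for (const text of [textA, textB]) {
    for (const raw of text.split('\n')) {
      const paragraph = raw.trim();
      // Skip blank lines and paragraphs already taken from either document.
      if (paragraph === '' || seen.has(paragraph)) continue;
      seen.add(paragraph);
      merged.push(paragraph);
    }
  }
  return merged;
}
```

The resulting array of strings is what you would hand to the document-generation library, one paragraph per entry. Note that paragraphs unique to the second document are appended at the end rather than interleaved at their original position; if order matters, driving the merge from the diff output is the better route.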