9

I need to compare two Office documents, in this case two Word documents, and provide a difference, somewhat similar to what is shown in SVN. Not to that extent, but it should at least be able to highlight the differences.

I tried using the Office COM DLL and got this far:

    object fileToOpen = (object)@"D:\doc1.docx";
    string fileToCompare = @"D:\doc2.docx";
    WRD.Application WA = new WRD.Application();
    Document wordDoc = null;
    wordDoc = WA.Documents.Open(ref fileToOpen,
        Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
        Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
        Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing);
    wordDoc.Compare(fileToCompare,
        Type.Missing, Type.Missing, Type.Missing, Type.Missing,
        Type.Missing, Type.Missing, Type.Missing);

Any tips on how to proceed further? This will be a web application with a lot of hits. Is using the Office COM object the right way to go, or are there other options I can look at?

5
  • Just out of interest, how does SVN show the difference between two binary files? (AFAIK docx is a zip archive format) Commented Nov 23, 2011 at 15:39
  • Select the two files in question, usually in the same folder on the client side, with TortoiseSVN installed. Right-click, go to the TortoiseSVN menu and select Diff... Commented Nov 23, 2011 at 15:48
  • Yep, I know how to do it, but which difference will you see, and does it make any sense? Commented Nov 23, 2011 at 16:02
  • I'm open to a better way of comparing the two documents in a more sensible manner. Can you suggest one? Commented Nov 24, 2011 at 6:12
  • Also see stackoverflow.com/questions/12321490/… Commented Nov 25, 2013 at 6:51

7 Answers

4

You should use the Document class to compare the files; the result opens in a new Word document.

    using OfficeWord = Microsoft.Office.Interop.Word;

    object fileToOpen = (object)@"D:\doc1.docx";
    string fileToCompare = @"D:\doc2.docx";

    // Global.OfficeFile.WordApp is an internal variable of the answerer's project;
    // use your own Microsoft.Office.Interop.Word.Application instance here.
    var app = Global.OfficeFile.WordApp;

    object missing = Type.Missing;
    object readOnly = false;
    object AddToRecent = false;
    object Visible = false;

    OfficeWord.Document docZero = app.Documents.Open(fileToOpen, ref missing, ref readOnly, ref AddToRecent, Visible: ref Visible);
    docZero.Final = false;
    docZero.TrackRevisions = true;
    docZero.ShowRevisions = true;
    docZero.PrintRevisions = true;

    // wdCompareTargetNew puts the comparison result in a new document;
    // change this value to control where Word shows the result.
    docZero.Compare(fileToCompare, missing, OfficeWord.WdCompareTarget.wdCompareTargetNew, true, false, false, false, false);
4 Comments

Hi @anderson-rissardi! What does the Compare method actually do? Does it open some file somewhere? Because I'm not seeing anything when I run this in my unit test. How am I supposed to get the result since the method returns void?
Hi @ditoslav. It opens a new file. It is the 'Compare' button inside Word: open MS Word -> 'Review' tab -> 'Compare' button. It is the same functionality: a new document is generated, and you must save this new document.
Where did Global.OfficeFile.WordApp go? Using VS 2019 it is apparently no longer part of Office.Interop.Word.
@Slagmoth Global.OfficeFile.WordApp is an internal variable. You should use the Microsoft.Office.Interop.Word.Application of your app.
3

My requirements were that I had to use a .NET library, and I wanted to avoid working on actual files and work with streams instead.

ZipArchive is in System.IO.Compression.

What I did, and it worked out quite nicely, was use the ZipArchive from .NET and compare the contents while skipping the .rels files, because they seem to be randomly generated each time a file is created. Here's my snippet:

    // Requires: using System.IO; using System.IO.Compression;
    // Assert comes from the unit-test framework in use.
    private static bool AreWordFilesSame(byte[] wordA, byte[] wordB)
    {
        using (var streamA = new MemoryStream(wordA))
        using (var streamB = new MemoryStream(wordB))
        using (var zipA = new ZipArchive(streamA))
        using (var zipB = new ZipArchive(streamB))
        {
            streamA.Seek(0, SeekOrigin.Begin);
            streamB.Seek(0, SeekOrigin.Begin);

            // Different entry counts mean the packages differ
            // (this also avoids indexing past the end of zipB).
            if (zipA.Entries.Count != zipB.Entries.Count)
            {
                return false;
            }

            for (int i = 0; i < zipA.Entries.Count; ++i)
            {
                Assert.AreEqual(zipA.Entries[i].Name, zipB.Entries[i].Name);

                // These are some weird Word files with autogenerated hashes
                if (zipA.Entries[i].Name.EndsWith(".rels"))
                {
                    continue;
                }

                var streamFromA = zipA.Entries[i].Open();
                var streamFromB = zipB.Entries[i].Open();

                using (var readerA = new StreamReader(streamFromA))
                using (var readerB = new StreamReader(streamFromB))
                {
                    var textA = readerA.ReadToEnd();
                    var textB = readerB.ReadToEnd();

                    // An empty entry is also treated as a mismatch.
                    if (textA != textB || textA.Length == 0)
                    {
                        return false;
                    }
                }
            }
            return true;
        }
    }

2

This function lets you compare two documents as well as two versions of a document in C#.

    public async Task<object> compare()
    {
        Word.Application wordApp = new Word.Application();
        wordApp.Visible = false;
        object wordTrue = (object)true;
        object wordFalse = (object)false;
        object missing = Type.Missing;

        object fileToOpen = @"Give your file path here";
        Word.Document doc1 = wordApp.Documents.Open(ref fileToOpen, ref missing, ref wordFalse, ref wordFalse,
            ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,
            ref wordTrue, ref missing, ref missing, ref missing, ref missing);

        object fileToOpen1 = @"Give your file path here";
        Word.Document doc2 = wordApp.Documents.Open(ref fileToOpen1, ref missing, ref wordFalse, ref wordFalse,
            ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing, ref missing);

        // Produces a new document containing the differences as revisions.
        Word.Document doc = wordApp.CompareDocuments(doc1, doc2,
            Word.WdCompareDestination.wdCompareDestinationNew,
            Word.WdGranularity.wdGranularityWordLevel,
            true, true, true, true, true, true, true, true, true, true, "", true);

        doc1.Close(ref missing, ref missing, ref missing);
        doc2.Close(ref missing, ref missing, ref missing);

        // This hides both the original and revised documents; change it according to your use case.
        wordApp.ActiveWindow.ShowSourceDocuments = Word.WdShowSourceDocuments.wdShowSourceDocumentsNone;
        wordApp.Visible = true;
        doc.Activate();

        return Ok("Compared Successfully");
    }

1

I agree w/ Joseph about diff'ing the string. I would also recommend a purpose-built diffing engine (several found here: Any decent text diff/merge engine for .NET?) which can help you avoid some of the normal pitfalls in diffing.

1

For a solution that runs on a server, or anywhere without an installation of Word and without using the COM tools, you could use the WmlComparer component of XmlPowerTools.

The documentation is a bit limited, but here's an example usage:

    var expected = File.ReadAllBytes(@"c:\expected.docx");
    var actual = File.ReadAllBytes(@"c:\result.docx");
    var expectedresult = new WmlDocument("expected.docx", expected);
    var actualDocument = new WmlDocument("result.docx", actual);
    var comparisonSettings = new WmlComparerSettings();
    var comparisonResults = WmlComparer.Compare(expectedresult, actualDocument, comparisonSettings);
    var revisions = WmlComparer.GetRevisions(comparisonResults, comparisonSettings);

which will show you the differences between the two documents.

1 Comment

do you know whether XmlPowerTools can generate a resulting document with the differences as "tracked changes"?
0

You should really be extracting the doc into a string and diff'ing that.

You only care about the textual changes and not the formatting, right?
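As a rough illustration of "extracting the doc into a string", here is a minimal sketch that needs neither Word nor COM. It assumes the main document part lives at word/document.xml inside the .docx zip (where Word normally writes it) and that only text in w:t elements matters. The class and method names are made up for this example. Tabs, line breaks, images, headers and footers are ignored, and text-box paragraphs nested inside another paragraph would be counted twice.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxText
{
    // WordprocessingML main namespace used by w:p and w:t elements.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one string per paragraph (w:p) of the main document part.
    // Assumption: the main part is stored as "word/document.xml".
    public static string[] ExtractParagraphs(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            var entry = zip.GetEntry("word/document.xml");
            if (entry == null)
            {
                throw new InvalidDataException("word/document.xml not found");
            }

            using (var part = entry.Open())
            {
                var xml = XDocument.Load(part);

                // Concatenate the text runs (w:t) of each paragraph.
                return xml.Descendants(W + "p")
                          .Select(p => string.Concat(
                              p.Descendants(W + "t").Select(t => t.Value)))
                          .ToArray();
            }
        }
    }
}
```

Calling `ExtractParagraphs` on both files and comparing the arrays with `SequenceEqual` gives a same-or-different answer; the arrays can also be handed to any text diff engine.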

1 Comment

everything, even if the image is different. But I am going to try and relax that requirement.
-1

To do a comparison between Word documents, you need

  1. A library to manipulate Word documents, e.g. to read paragraphs, text, tables etc. from a Word file. You can try Office Interop, OpenXML or Aspose.Words for .NET.
  2. An algorithm/library to do the actual comparison, on the text retrieved from both Word documents. You can write your own or use a library like DiffMatchPatch or similar.
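To make step 2 concrete, here is a minimal sketch of a hand-written comparison: a longest-common-subsequence diff over paragraph strings. It is not how DiffMatchPatch or any of the libraries above work internally. The class name and the "  " / "- " / "+ " output prefixes are choices made for this example, and the input is assumed to be the paragraph text already read from both documents in step 1.

```csharp
using System;
using System.Collections.Generic;

public static class ParagraphDiff
{
    // Returns the merged sequence with a two-character prefix per entry:
    // "  " unchanged, "- " only in a (removed), "+ " only in b (added).
    public static List<string> Diff(IReadOnlyList<string> a, IReadOnlyList<string> b)
    {
        // lcs[i, j] = length of the longest common subsequence of a[i..] and b[j..].
        var lcs = new int[a.Count + 1, b.Count + 1];
        for (int i = a.Count - 1; i >= 0; i--)
        {
            for (int j = b.Count - 1; j >= 0; j--)
            {
                lcs[i, j] = a[i] == b[j]
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);
            }
        }

        // Walk both lists, following the table to decide what to emit.
        var result = new List<string>();
        int x = 0, y = 0;
        while (x < a.Count && y < b.Count)
        {
            if (a[x] == b[y]) { result.Add("  " + a[x]); x++; y++; }
            else if (lcs[x + 1, y] >= lcs[x, y + 1]) { result.Add("- " + a[x]); x++; }
            else { result.Add("+ " + b[y]); y++; }
        }
        while (x < a.Count) { result.Add("- " + a[x++]); }
        while (y < b.Count) { result.Add("+ " + b[y++]); }
        return result;
    }
}
```

The table takes memory proportional to the product of the two paragraph counts, which is fine for ordinary documents; for very large inputs a dedicated diff library is the better choice.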

This question is old; there are now more solutions available, such as GroupDocs Compare.

Document Comparison by Aspose.Words for .NET is an open source showcase project that uses Aspose.Words and DiffMatchPatch for comparison.

I work at Aspose as a Developer Evangelist.