
I'm writing a unit test and need to compare a result file to a golden file. What's the easiest way to do so?

So far I have (for Linux environment):

int result = system("diff file1 file2"); 

They are different if result != 0.
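One caveat: system() does not return diff's exit code directly. On POSIX it returns a wait status, which should be decoded with WIFEXITED/WEXITSTATUS. diff exits with 0 when the files are identical, 1 when they differ and 2 on trouble (for example a missing file), so `result != 0` also treats errors as "different". Here is a minimal sketch that keeps those cases apart. It assumes a POSIX system and paths without single quotes, and the helper name `diff_files` is hypothetical:

```cpp
#include <cstdlib>
#include <string>
#include <sys/wait.h>  // WIFEXITED, WEXITSTATUS (POSIX)

// Returns 0 if the files are identical, 1 if they differ, -1 on any error.
// Assumes the paths contain no single quotes (no further shell escaping is done).
int diff_files(const std::string& a, const std::string& b)
{
    // -q: report only whether the files differ; all output is discarded.
    const std::string cmd = "diff -q '" + a + "' '" + b + "' > /dev/null 2>&1";
    const int status = std::system(cmd.c_str());
    if (status == -1 || !WIFEXITED(status))
        return -1;  // could not run the shell, or diff was killed by a signal
    switch (WEXITSTATUS(status)) {
        case 0:  return 0;   // identical
        case 1:  return 1;   // different
        default: return -1;  // 2 = trouble (e.g. missing file); 127 = shell or diff not found
    }
}
```

In a test this would be used as `assert(diff_files(result_path, golden_path) == 0);`, where both path variables are placeholders.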

  • That sounds like a plausible way to compare two files, yes. Commented Feb 27, 2013 at 17:42
  • There are various standard options of diff to suppress output; use them if you call it through system(). Commented Feb 27, 2013 at 17:45
  • You can use cmp instead of diff. Commented Feb 27, 2013 at 17:50
  • The absolute fastest, if these are big files, may be to check that they are the same length, then mmap() them and call memcmp(). Commented Feb 27, 2013 at 17:53
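The length-check-then-mmap()-and-memcmp() suggestion from the last comment might look like the following POSIX-only sketch. The function name `mmap_equal` is hypothetical and error handling is kept minimal. Empty files need a special case because mmap() rejects a length of 0:

```cpp
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Returns true only if both files can be opened and have identical bytes.
bool mmap_equal(const char* path1, const char* path2)
{
    int fd1 = open(path1, O_RDONLY);
    int fd2 = open(path2, O_RDONLY);
    bool same = false;
    struct stat st1, st2;
    if (fd1 >= 0 && fd2 >= 0 && fstat(fd1, &st1) == 0 && fstat(fd2, &st2) == 0
        && st1.st_size == st2.st_size) {            // same length is required
        const size_t n = static_cast<size_t>(st1.st_size);
        if (n == 0) {
            same = true;  // mmap() fails for length 0; two empty files are equal
        } else {
            void* p1 = mmap(nullptr, n, PROT_READ, MAP_PRIVATE, fd1, 0);
            void* p2 = mmap(nullptr, n, PROT_READ, MAP_PRIVATE, fd2, 0);
            if (p1 != MAP_FAILED && p2 != MAP_FAILED)
                same = std::memcmp(p1, p2, n) == 0;  // compare the whole mapping
            if (p1 != MAP_FAILED) munmap(p1, n);
            if (p2 != MAP_FAILED) munmap(p2, n);
        }
    }
    if (fd1 >= 0) close(fd1);
    if (fd2 >= 0) close(fd2);
    return same;
}
```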

5 Answers


If you want a pure C++ solution, I would do something like this:

    #include <algorithm>
    #include <iterator>
    #include <string>
    #include <fstream>

    template<typename InputIterator1, typename InputIterator2>
    bool range_equal(InputIterator1 first1, InputIterator1 last1,
                     InputIterator2 first2, InputIterator2 last2)
    {
        while(first1 != last1 && first2 != last2)
        {
            if(*first1 != *first2) return false;
            ++first1;
            ++first2;
        }
        // Equal only if both ranges ran out at the same time
        return (first1 == last1) && (first2 == last2);
    }

    bool compare_files(const std::string& filename1, const std::string& filename2)
    {
        std::ifstream file1(filename1);
        std::ifstream file2(filename2);
        // Without this check, two missing files would compare as equal
        if (!file1 || !file2) return false;

        std::istreambuf_iterator<char> begin1(file1);
        std::istreambuf_iterator<char> begin2(file2);

        std::istreambuf_iterator<char> end; // default-constructed = end of stream

        return range_equal(begin1, end, begin2, end);
    }

It avoids reading the entire file into memory and stops as soon as the files differ (or at end of file). The range_equal helper is needed because std::equal (before C++14) doesn't take a pair of iterators for the second range, and isn't safe if the second range is shorter.
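Since C++14, std::equal has a four-iterator overload that checks that both ranges end together, so with a C++14 compiler the hand-written helper can be dropped. A minimal sketch; the name `compare_files14` is hypothetical, and binary mode is an addition so that no newline translation happens on non-POSIX platforms:

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>

// C++14 and later: the four-iterator std::equal returns false if the
// ranges have different lengths, replacing a hand-written range_equal.
bool compare_files14(const std::string& filename1, const std::string& filename2)
{
    std::ifstream file1(filename1, std::ios::binary);
    std::ifstream file2(filename2, std::ios::binary);
    if (!file1 || !file2) return false;  // treat unreadable files as different

    std::istreambuf_iterator<char> begin1(file1), begin2(file2), end;
    return std::equal(begin1, end, begin2, end);
}
```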


3 Comments

Can you explain why you use an uninitialized iterator as the end? OP mentions binary files; would it make sense to use std::ios::binary? P.S.: I would note this is not the fastest, as it checks one byte at a time even for big files. But as a simple solution it seems excellent.
@Antonio A default-constructed ("uninitialized") std::istreambuf_iterator is the end-of-stream iterator. For performance, the code assumes that your stream is doing the buffering (for example, in many implementations of std::ifstream, the underlying stream is buffered).
How about comparing MD5 hashes? Doesn't computing the MD5 also read the entire file? So it wouldn't be faster than directly comparing byte chunks of the files?

Building on DaveS's answer, but checking the file size first:

    #include <fstream>
    #include <algorithm>
    #include <iterator>
    #include <string>

    bool compare_files(const std::string& filename1, const std::string& filename2)
    {
        std::ifstream file1(filename1, std::ifstream::ate | std::ifstream::binary); //open file at the end
        std::ifstream file2(filename2, std::ifstream::ate | std::ifstream::binary); //open file at the end
        if (!file1 || !file2) {
            return false; //could not open one of the files
        }
        const std::ifstream::pos_type fileSize = file1.tellg();

        if (fileSize != file2.tellg()) {
            return false; //different file size
        }

        file1.seekg(0); //rewind
        file2.seekg(0); //rewind

        std::istreambuf_iterator<char> begin1(file1);
        std::istreambuf_iterator<char> begin2(file2);

        return std::equal(begin1,std::istreambuf_iterator<char>(),begin2); //Second argument is end-of-range iterator
    }

(I wonder if, before rewinding, fileSize could be used to create a more efficient end-of-stream iterator which, by knowing the stream length, would allow std::equal to process more bytes at a time.)
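One common way to process more bytes at a time, instead of a smarter end iterator, is to read fixed-size blocks with istream::read() and compare each block. A sketch under these assumptions: the 64 KiB block size is arbitrary and the name `compare_files_chunked` is hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Compare two files block by block instead of one character at a time.
bool compare_files_chunked(const std::string& filename1, const std::string& filename2)
{
    std::ifstream file1(filename1, std::ifstream::ate | std::ifstream::binary);
    std::ifstream file2(filename2, std::ifstream::ate | std::ifstream::binary);
    if (!file1 || !file2) return false;                // could not open a file
    if (file1.tellg() != file2.tellg()) return false;  // different file size
    file1.seekg(0);
    file2.seekg(0);

    const std::size_t block = 64 * 1024;  // arbitrary block size for this sketch
    std::vector<char> buf1(block), buf2(block);
    while (file1 && file2) {
        file1.read(buf1.data(), static_cast<std::streamsize>(block));
        file2.read(buf2.data(), static_cast<std::streamsize>(block));
        const std::streamsize n1 = file1.gcount();  // bytes actually read
        const std::streamsize n2 = file2.gcount();
        if (n1 != n2 || !std::equal(buf1.begin(), buf1.begin() + n1, buf2.begin()))
            return false;
    }
    // Both streams should have stopped at end-of-file, not because of an error
    return file1.eof() && file2.eof();
}
```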

2 Comments

Why not simply if (file1.tellg() != file2.tellg())? How does storing it first in fileSize help?
@iammilind Thanks for pointing that out; there was an error in the postscript after the code. Now it should be clear why I wanted to highlight that this is the size of the file.

One way to avoid reading both files is to precompute a hash of the golden file, e.g. an MD5. Then you only have to read the test file. Note that this may be slower than just reading both files!
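The standard library has no MD5, so this sketch swaps in 64-bit FNV-1a, a simple non-cryptographic hash that is enough to detect accidental differences in a test (it offers no protection against deliberately crafted collisions). The golden hash would be computed once and stored, for example hard-coded in the test. The names `fnv1a_file` and `matches_golden` are hypothetical:

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>

// 64-bit FNV-1a over the file's bytes; a stand-in for MD5, which the
// standard library does not provide. Returns false if the file can't be opened.
bool fnv1a_file(const std::string& filename, std::uint64_t& hash)
{
    std::ifstream file(filename, std::ios::binary);
    if (!file) return false;
    hash = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (std::istreambuf_iterator<char> it(file), end; it != end; ++it) {
        hash ^= static_cast<unsigned char>(*it);
        hash *= 1099511628211ULL;    // FNV-1a 64-bit prime
    }
    return true;
}

// Only the result file is read; golden_hash was precomputed from the golden file.
bool matches_golden(const std::string& result_file, std::uint64_t golden_hash)
{
    std::uint64_t h = 0;
    return fnv1a_file(result_file, h) && h == golden_hash;
}
```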

Alternatively, layer your checks: look at the file sizes first; if they differ, the files are different and you can skip the lengthy read-and-compare operation.
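A sketch of the layered check, assuming C++17 for std::filesystem::file_size (its error_code overload reports a missing file without throwing). The size comparison only touches file metadata, and the contents are read only when the sizes match. The name `layered_equal` is hypothetical:

```cpp
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <system_error>

// Layer 1: compare sizes (cheap). Layer 2: compare contents only if needed.
bool layered_equal(const std::string& a, const std::string& b)
{
    std::error_code ec1, ec2;
    const auto size1 = std::filesystem::file_size(a, ec1);
    const auto size2 = std::filesystem::file_size(b, ec2);
    if (ec1 || ec2) return false;      // missing file or not a regular file
    if (size1 != size2) return false;  // different sizes: no need to read

    std::ifstream f1(a, std::ios::binary), f2(b, std::ios::binary);
    if (!f1 || !f2) return false;
    std::istreambuf_iterator<char> it1(f1), it2(f2), end;
    return std::equal(it1, end, it2, end);
}
```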


This should work:

    #include <string>
    #include <fstream>
    #include <streambuf>
    #include <iterator>

    bool equal_files(const std::string& a, const std::string& b)
    {
        std::ifstream stream{a};
        std::string file1{std::istreambuf_iterator<char>(stream), std::istreambuf_iterator<char>()};

        stream = std::ifstream{b};
        std::string file2{std::istreambuf_iterator<char>(stream), std::istreambuf_iterator<char>()};

        return file1 == file2;
    }

This reads both files completely into memory. I suspect it is not as fast as diff, but it avoids calling system(), and it should be sufficient for a test case.

1 Comment

You probably want to include <iterator>.

It might be overkill, but you could build a table of SHA-256 hashes using boost/bimap and boost/scope_exit.

Here is a video by Stephan T. Lavavej showing how to do this (starts at 8:15): http://channel9.msdn.com/Series/C9-Lectures-Stephan-T-Lavavej-Advanced-STL/C9-Lectures-Stephan-T-Lavavej-Advanced-STL-5-of-n

For more info about the algorithm: http://en.wikipedia.org/wiki/SHA-2
