17

I'm trying to write a function which compares the content of two files.

I want it to return 1 if files are the same, and 0 if different.

ch1 and ch2 work as buffers, and I used fgets to get the content of my files.

I think there is something wrong with the eof pointer, but I'm not sure. FILE variables are given within the command line.

P.S. It works with small files with size under 64KB, but doesn't work with larger files (700MB movies for example, or 5MB of .mp3 files).

Any ideas how to work this out?

int compareFile(FILE* file_compared, FILE* file_checked)
{
    bool diff = 0;
    int N = 65536;
    char* b1 = (char*) calloc (1, N+1);
    char* b2 = (char*) calloc (1, N+1);
    size_t s1, s2;

    do {
        s1 = fread(b1, 1, N, file_compared);
        s2 = fread(b2, 1, N, file_checked);

        if (s1 != s2 || memcmp(b1, b2, s1)) {
            diff = 1;
            break;
        }
    } while (!feof(file_compared) || !feof(file_checked));

    free(b1);
    free(b2);

    if (diff) return 0;
    else return 1;
}

EDIT: I've improved this function by incorporating your answers. But it only compares the first buffer, with one exception: I figured out that it stops reading the file when it reaches a 1A character (attached file). How can we make it work?

EDIT2: Task solved (working code attached). Thanks to everyone for the help!

7
  • 9
    Don't use strcmp to compare two buffers of binary data!!!! It'll bail as soon as it sees a NULL terminator character. Commented May 28, 2011 at 18:31
  • what do you mean by "gets annoyed"? what exactly happens? Commented May 28, 2011 at 18:33
  • What I meant was that the program stops working after the first iteration. Commented May 29, 2011 at 14:44
  • BTW, if you're dealing with binary data, to make your code as portable as possible, you should make your buffers unsigned char rather than char Commented May 31, 2011 at 12:46
  • 2
    You used "c++" tag but you use nothing but "c" in your code. Commented Jul 13, 2015 at 22:23

6 Answers

41

If you can give up a little speed, here is a C++ way that requires little code:

#include <fstream>
#include <iterator>
#include <string>
#include <algorithm>

bool compareFiles(const std::string& p1, const std::string& p2) {
    std::ifstream f1(p1, std::ifstream::binary|std::ifstream::ate);
    std::ifstream f2(p2, std::ifstream::binary|std::ifstream::ate);

    if (f1.fail() || f2.fail()) {
        return false; //file problem
    }

    if (f1.tellg() != f2.tellg()) {
        return false; //size mismatch
    }

    //seek back to beginning and use std::equal to compare contents
    f1.seekg(0, std::ifstream::beg);
    f2.seekg(0, std::ifstream::beg);
    return std::equal(std::istreambuf_iterator<char>(f1.rdbuf()),
                      std::istreambuf_iterator<char>(),
                      std::istreambuf_iterator<char>(f2.rdbuf()));
}

By using istreambuf_iterators you push the buffer size choice, actual reading, and tracking of eof into the standard library implementation. std::equal returns when it hits the first mismatch, so this should not run any longer than it needs to.

This is slower than Linux's cmp, but it's very easy to read.


1 Comment

@Zhang - if you use istreambuf_iterator you get one char at a time, yes. The internal implementation reads multiple characters at a time. If you look at github.com/gcc-mirror/gcc/blob/… for instance, it looks like there is a buffer copy, and the buffer size depends on the instantiated type. But I'm not all that experienced in looking at the internal implementations so you may want to research this further.
11

Here's a C++ solution. It seems appropriate since your question is tagged as C++. The program uses ifstreams rather than FILE*s. It also shows you how to seek on a file stream to determine a file's size. Finally, it reads blocks of 4096 bytes at a time, so large files will be processed as expected.

// g++ -Wall -Wextra equifile.cpp -o equifile.exe

#include <iostream>
using std::cout;
using std::cerr;
using std::endl;

#include <fstream>
using std::ios;
using std::ifstream;

#include <exception>
using std::exception;

#include <algorithm>  // std::min
#include <cstring>
#include <cstdlib>
using std::exit;
using std::memcmp;

bool equalFiles(ifstream& in1, ifstream& in2);

int main(int argc, char* argv[])
{
    if(argc != 3)
    {
        cerr << "Usage: equifile.exe <file1> <file2>" << endl;
        exit(-1);
    }

    try {
        ifstream in1(argv[1], ios::binary);
        ifstream in2(argv[2], ios::binary);

        if(equalFiles(in1, in2)) {
            cout << "Files are equal" << endl;
            exit(0);
        }
        else
        {
            cout << "Files are not equal" << endl;
            exit(1);
        }
    } catch (const exception& ex) {
        cerr << ex.what() << endl;
        exit(-2);
    }

    return -3;
}

bool equalFiles(ifstream& in1, ifstream& in2)
{
    // A stream that failed to open cannot be compared.
    if(!in1 || !in2) return false;

    ifstream::pos_type size1, size2;

    size1 = in1.seekg(0, ifstream::end).tellg();
    in1.seekg(0, ifstream::beg);

    size2 = in2.seekg(0, ifstream::end).tellg();
    in2.seekg(0, ifstream::beg);

    if(size1 != size2)
        return false;

    static const size_t BLOCKSIZE = 4096;
    size_t remaining = size1;

    while(remaining)
    {
        char buffer1[BLOCKSIZE], buffer2[BLOCKSIZE];
        size_t size = std::min(BLOCKSIZE, remaining);

        in1.read(buffer1, size);
        in2.read(buffer2, size);

        if(0 != memcmp(buffer1, buffer2, size))
            return false;

        remaining -= size;
    }

    return true;
}

2 Comments

@HaSeeBMiR - I think your analysis is not quite correct. For example, more than the first 4KB are verified since the read is happening in a loop. In fact the entire files are read because of the loop.
But why do you compare using a buffer of BLOCKSIZE? Why not compare the whole buffer of size1 at once with memcmp?
10

When the files are binary, use memcmp not strcmp as \0 might appear as data.
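
To illustrate, here is a small sketch with two made-up 4-byte buffers that are identical up to an embedded \0 and differ after it. The buffer contents and the names `equalAsStrings` and `equalAsBytes` are invented for this example.

```cpp
#include <cstring>

// Two illustrative buffers (not from the question): identical up to an
// embedded '\0', different after it.
const char bufA[4] = {'a', '\0', 'x', 'y'};
const char bufB[4] = {'a', '\0', 'z', 'y'};

// strcmp stops at the first '\0', so it only ever looks at "a".
bool equalAsStrings() { return std::strcmp(bufA, bufB) == 0; }

// memcmp compares exactly the number of bytes it is given.
bool equalAsBytes() { return std::memcmp(bufA, bufB, sizeof bufA) == 0; }
```

Here `equalAsStrings()` reports a match even though the buffers differ, while `equalAsBytes()` does not.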


9

Since you've allocated your arrays on the stack, they are filled with random values ... they aren't zeroed out.

Secondly, strcmp will only compare up to the first NUL byte, which, in a binary file, won't necessarily be at the end of the file. Therefore you should really be using memcmp on your buffers. But this will also give unpredictable results because your buffers were allocated on the stack: even if you compare two files that are the same, the ends of the buffers past the EOF may differ, so memcmp will most likely report that the files are not the same when they are, because of the random values past each file's EOF.

To get around this issue, you should really first measure the length of the file by first iterating through the file and seeing how long the file is in bytes, and then using malloc or calloc to allocate the buffers you're going to compare, and re-fill those buffers with the actual file's contents. Then you should be able to make a valid comparison of the binary contents of each file. You'll also be able to work with files larger than 64K at that point since you're dynamically allocating the buffers at run-time.

1 Comment

I've made improvements. But still... it doesn't work as I wanted.
4

Switch's code looks good to me, but if you want an exact comparison the while condition and the return need to be altered:

int compareFile(FILE* f1, FILE* f2) {
    int N = 10000;
    char buf1[N];
    char buf2[N];

    do {
        size_t r1 = fread(buf1, 1, N, f1);
        size_t r2 = fread(buf2, 1, N, f2);

        if (r1 != r2 || memcmp(buf1, buf2, r1)) {
            return 0; // Files are not equal
        }
    } while (!feof(f1) && !feof(f2));

    return feof(f1) && feof(f2);
}


3

Better to use fread and memcmp to avoid \0 character issues. Also, the !feof checks really should be || instead of &&, since there's a small chance that one file is bigger than the other and the smaller file's size is divisible by your buffer size.

int compareFile(FILE* f1, FILE* f2) {
    int N = 10000;
    char buf1[N];
    char buf2[N];

    do {
        size_t r1 = fread(buf1, 1, N, f1);
        size_t r2 = fread(buf2, 1, N, f2);

        if (r1 != r2 || memcmp(buf1, buf2, r1)) {
            return 0;
        }
    } while (!feof(f1) || !feof(f2));

    return 1;
}
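
A variant that sidesteps the choice of feof test altogether: the sketch below, written as C++ over the C stdio API, ends the loop when both reads return zero bytes and treats a read error as a mismatch. `sameContents` and the 4096-byte buffer size are illustrative choices, not taken from the answers here.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper: returns 1 if both streams hold the same remaining
// bytes, 0 otherwise. Both FILE* are assumed to be open in binary mode.
int sameContents(std::FILE* f1, std::FILE* f2) {
    unsigned char buf1[4096];
    unsigned char buf2[4096];

    for (;;) {
        const std::size_t r1 = std::fread(buf1, 1, sizeof buf1, f1);
        const std::size_t r2 = std::fread(buf2, 1, sizeof buf2, f2);

        if (std::ferror(f1) || std::ferror(f2)) return 0;  // read failure
        if (r1 != r2 || std::memcmp(buf1, buf2, r1) != 0) return 0;
        if (r1 == 0) return 1;  // both streams ended together
    }
}
```

Because the loop is driven by the byte counts, it behaves the same whether or not a file's size is a multiple of the buffer size.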

