I have a C app that generates very large binary files, each about 30 GB. After writing each file, computing its MD5 checksum takes a while (approximately a couple of minutes per file).

How would I go about computing the MD5 checksum of the file as it is being written to disk? I figure by doing this I would at least save the additional overhead of re-reading the file to compute the checksum afterwards.

I'm using the C standard library for all file I/O, and the OS is Linux.

Can this be done? Thanks!

  • There are a million MD5 implementations out there with an update method which the Google will find you. There is even a perfectly workable one right in the RFC. Commented Mar 16, 2012 at 2:15

1 Answer

This is certainly possible to do. Essentially, you initialise an MD5 calculation before you start writing. Then, whenever you write some data to disk, also pass that to the MD5 update function. After writing all the data, you call a final MD5 function to compute the final digest.

If you don't have any MD5 code handy, RFC 1321 has an MD5 reference implementation included that provides the above operations.
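As a rough sketch of the init / update / final pattern described above, here is a compact, self-contained MD5 in C together with a small wrapper that feeds every chunk to the digest as it is written with `fwrite`. The names `md5_ctx`, `md5_init`, `md5_update`, `md5_final`, `write_hashed` and `write_file_with_md5` are made up for this illustration; they are not the names used in the RFC 1321 reference code, and a well-tested library implementation is probably a better choice in a real application. The sketch also assumes the file is written strictly sequentially, since the digest consumes bytes in order.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Running digest state: four 32-bit words, byte count, partial block. */
typedef struct {
    uint32_t h[4]; uint64_t total; unsigned char buf[64]; size_t used;
} md5_ctx;

static const uint32_t K[64] = {   /* per-step constants from RFC 1321 */
    0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee, 0xf57c0faf, 0x4787c62a,
    0xa8304613, 0xfd469501, 0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be,
    0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821, 0xf61e2562, 0xc040b340,
    0x265e5a51, 0xe9b6c7aa, 0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8,
    0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed, 0xa9e3e905, 0xfcefa3f8,
    0x676f02d9, 0x8d2a4c8a, 0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c,
    0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70, 0x289b7ec6, 0xeaa127fa,
    0xd4ef3085, 0x04881d05, 0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665,
    0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039, 0x655b59c3, 0x8f0ccc92,
    0xffeff47d, 0x85845dd1, 0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1,
    0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391
};
static const unsigned S[16] = {7,12,17,22, 5,9,14,20, 4,11,16,23, 6,10,15,21};

/* Mix one 64-byte block into the state. */
static void md5_block(md5_ctx *ctx, const unsigned char *p) {
    uint32_t m[16], a = ctx->h[0], b = ctx->h[1], c = ctx->h[2], d = ctx->h[3];
    for (int i = 0; i < 16; i++)   /* decode little-endian 32-bit words */
        m[i] = (uint32_t)p[4*i] | ((uint32_t)p[4*i+1] << 8) |
               ((uint32_t)p[4*i+2] << 16) | ((uint32_t)p[4*i+3] << 24);
    for (int i = 0; i < 64; i++) {
        uint32_t f; int g;
        if (i < 16)      { f = (b & c) | (~b & d); g = i; }
        else if (i < 32) { f = (d & b) | (~d & c); g = (5*i + 1) % 16; }
        else if (i < 48) { f = b ^ c ^ d;          g = (3*i + 5) % 16; }
        else             { f = c ^ (b | ~d);       g = (7*i) % 16; }
        f += a + K[i] + m[g];
        unsigned s = S[(i / 16) * 4 + i % 4];
        a = d; d = c; c = b;
        b += (f << s) | (f >> (32 - s));
    }
    ctx->h[0] += a; ctx->h[1] += b; ctx->h[2] += c; ctx->h[3] += d;
}

void md5_init(md5_ctx *ctx) {
    ctx->h[0] = 0x67452301; ctx->h[1] = 0xefcdab89;
    ctx->h[2] = 0x98badcfe; ctx->h[3] = 0x10325476;
    ctx->total = 0; ctx->used = 0;
}

/* Feed len bytes into the digest, processing each full 64-byte block. */
void md5_update(md5_ctx *ctx, const void *data, size_t len) {
    const unsigned char *p = data;
    ctx->total += len;
    while (len > 0) {
        size_t n = 64 - ctx->used;
        if (n > len) n = len;
        memcpy(ctx->buf + ctx->used, p, n);
        ctx->used += n; p += n; len -= n;
        if (ctx->used == 64) { md5_block(ctx, ctx->buf); ctx->used = 0; }
    }
}

/* Append padding and the bit length, then emit the 16-byte digest. */
void md5_final(md5_ctx *ctx, unsigned char out[16]) {
    unsigned char pad[64] = {0x80}, lenb[8];
    uint64_t bits = ctx->total * 8;
    for (int i = 0; i < 8; i++) lenb[i] = (unsigned char)(bits >> (8 * i));
    md5_update(ctx, pad, (ctx->used < 56 ? 56 : 120) - ctx->used);
    md5_update(ctx, lenb, 8);
    for (int i = 0; i < 16; i++) out[i] = (unsigned char)(ctx->h[i/4] >> (8 * (i%4)));
}

/* Write one chunk to disk and pass the same bytes to the digest. */
static int write_hashed(FILE *fp, md5_ctx *ctx, const void *data, size_t len) {
    if (fwrite(data, 1, len, fp) != len) return -1;
    md5_update(ctx, data, len);
    return 0;
}

/* Hypothetical demo: write `chunk` `count` times to `path`, return hex MD5. */
int write_file_with_md5(const char *path, const void *chunk, size_t len,
                        int count, char hex[33]) {
    md5_ctx ctx; unsigned char d[16];
    FILE *fp = fopen(path, "wb");
    if (!fp) return -1;
    md5_init(&ctx);
    for (int i = 0; i < count; i++)
        if (write_hashed(fp, &ctx, chunk, len) != 0) { fclose(fp); return -1; }
    if (fclose(fp) != 0) return -1;
    md5_final(&ctx, d);
    for (int i = 0; i < 16; i++) sprintf(hex + 2 * i, "%02x", (unsigned)d[i]);
    return 0;
}
```

In an existing program, the idea would be to call `md5_init` once before the first write, replace each `fwrite` call with something like `write_hashed`, and call `md5_final` after the last write. The digest is then available without reading the 30 GB file back from disk.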
