Timeline for "Is it necessary to read every single byte to check if a copied file is identical to the original?"
Current License: CC BY-SA 3.0
21 events
| when | what | by | license | comment |
|---|---|---|---|---|
| Jun 27, 2013 at 19:35 | answer added | user14517 | | timeline score: 0 |
| Feb 9, 2012 at 7:43 | history edited | user8 | | Fix tags |
| Jan 20, 2012 at 1:58 | answer added | Loren Pechtel | | timeline score: 1 |
| Jan 19, 2012 at 23:30 | comment added | psr | | Short answer - No, it's best just to have your computer do it for you. |
| Jan 19, 2012 at 23:18 | comment added | jasonk | | There's also a non-zero probability of two fluke disk read errors covering up an issue, or of a solar flare corrupting a single bit. It all depends on your comfort level. Servers have ECC ram for this reason. |
| Dec 5, 2011 at 7:26 | vote accept | Koen027 | | |
| Dec 3, 2011 at 23:46 | history edited | user1249 | CC BY-SA 3.0 | edited title |
| Dec 3, 2011 at 23:37 | answer added | Keith Thompson | | timeline score: 45 |
| Dec 3, 2011 at 23:26 | comment added | Dean Harding | | @KeithThompson: I think your first comment should be an answer :-) |
| Dec 3, 2011 at 23:10 | comment added | Keith Thompson | | As for the likelihood of collision, if you use a decent hash like sha1sum you pretty much don't have to worry about it, unless someone is deliberately and expensively constructing files whose sha1sums collide. I don't have a source for this, but I've heard (in the context of git) that the probability of two different files having the same sha1sum is about the same as the probability of every member of your development team being eaten by wolves. On the same day. In completely unrelated incidents. |
| Dec 3, 2011 at 23:09 | comment added | user1249 | | Also how will you ensure that the checksums/hashes are correct? |
| Dec 3, 2011 at 23:07 | comment added | Keith Thompson | | Calculating CRCs (or, better, sha1sums) on both files requires reading every byte anyway. If you do a byte-by-byte comparison, you can quit as soon as you see a mismatch -- and you don't have to worry about two different files that happen to have the same checksum (though that's vanishingly unlikely for sha1sum). On the other hand, checksum comparisons are useful when you're comparing files that aren't on the same machine; the checksums can be computed locally, and you don't have to transfer the entire content over the network. |
| Dec 3, 2011 at 22:48 | answer added | NoChance | | timeline score: 0 |
| Dec 3, 2011 at 22:30 | comment added | user1249 | | Have a look at how "rsync" handles this. |
| Dec 3, 2011 at 21:55 | history edited | yannis | CC BY-SA 3.0 | deleted 5 characters in body; edited title |
| Dec 3, 2011 at 20:23 | history tweeted | | | twitter.com/#!/StackProgrammer/status/143062801143959552 |
| Dec 3, 2011 at 16:21 | comment added | Joey Adams | | Even that isn't perfect if the file's content is cached in RAM or on the disk's write cache. |
| Dec 3, 2011 at 15:57 | answer added | JohnFx | | timeline score: 11 |
| Dec 3, 2011 at 15:21 | answer added | user7007 | | timeline score: 3 |
| Dec 3, 2011 at 15:20 | answer added | Dave Rager | | timeline score: 5 |
| Dec 3, 2011 at 15:08 | history asked | Koen027 | CC BY-SA 3.0 | |
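
Keith Thompson's comment of Dec 3, 2011 at 23:10 puts the collision risk in wolf-attack terms; the standard birthday bound makes the same point quantitatively. The estimate below is not from the original discussion, and it assumes SHA-1 behaves like a uniformly random 160-bit function with no one deliberately constructing collisions:

```latex
% Birthday bound: probability that any two of n distinct files
% share a 160-bit hash, under the random-output assumption.
P(\text{collision}) \approx \frac{n(n-1)}{2 \cdot 2^{160}}
% Example: n = 10^{12} files gives roughly
% 10^{24} / 2^{161} \approx 3 \times 10^{-25},
% negligible next to ordinary hardware error rates.
```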
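Keith Thompson's comment of Dec 3, 2011 at 23:07 contrasts early-exit byte-by-byte comparison with checksumming. Here is a minimal Python sketch of both routes; the function names and the 1 MiB chunk size are illustrative choices, not anything prescribed in the thread:

```python
import hashlib

def files_identical(path_a, path_b, chunk_size=1 << 20):
    """Byte-by-byte comparison: reads both files in chunks but can
    quit at the first mismatching block, as the comment describes."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            block_a = a.read(chunk_size)
            block_b = b.read(chunk_size)
            if block_a != block_b:
                return False  # early exit; also catches differing lengths
            if not block_a:
                return True   # both files ended together with no mismatch

def sha1sum(path, chunk_size=1 << 20):
    """Checksum route: still reads every byte of one file, but only
    the digest needs to cross the network for a remote comparison."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()
```

For two local files, the early-exit comparison never does more I/O than hashing both; for files on different machines, comparing `sha1sum` outputs avoids transferring either file's contents.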