Skip to content

Conversation

@vlakoff
Copy link
Contributor

@vlakoff vlakoff commented Feb 19, 2025

In File::hasSameHash(), replace MD5 with xxh128, which has been available since PHP 8.1.

xxh128 has the same 128-bit length, but is much faster and avoids the security issues associated with MD5. See also this article.

Quick benchmark for comparing 100 MB files:

Benchmark code
// Using MD5 hashes $start = microtime(true); $hash1 = md5_file('./random_100MB_file_1.bin'); $hash2 = md5_file('./random_100MB_file_2.bin'); $areHashesIdentical = ($hash1 === $hash2); $timeMd5Hash = microtime(true) - $start; // Using xxh128 hashes $start = microtime(true); $hash1 = hash_file('xxh128', './random_100MB_file_3.bin'); $hash2 = hash_file('xxh128', './random_100MB_file_4.bin'); $areHashesIdentical = ($hash1 === $hash2); $timeXxh128Hash = microtime(true) - $start; // Direct comparison of contents $start = microtime(true); $content1 = file_get_contents('./random_100MB_file_5.bin'); $content2 = file_get_contents('./random_100MB_file_6.bin'); $areContentsIdentical = ($content1 === $content2); $timeContent = microtime(true) - $start; // Results echo 'MD5 hashes: ' . round($timeMd5Hash * 1000) . " ms\n"; echo 'xxh128 hashes: ' . round($timeXxh128Hash * 1000) . " ms\n"; echo 'file_get_contents(): ' . round($timeContent * 1000) . " ms\n";

Results:

MD5 hashes: 375 ms xxh128 hashes: 76 ms file_get_contents(): 69 ms 

(The code using file_get_contents() is only for reference. Large files would exhaust memory, and hash_equals() (#49721) is designed for short, same-length parameters.)

Using xxh128 virtually removes the hashing overhead. 🚀

xxh128 is already used in some places in Laravel, for instance see #45371 and discussion #46074.

Drop-in replacement. Much faster than MD5, and avoids its security issues.
@crynobone
Copy link
Member

@vlakoff
Copy link
Contributor Author

vlakoff commented Feb 19, 2025

For the case at hand, I can't think of any regression. The hashes don't go outside the function.

@crynobone crynobone changed the title Replace MD5 with xxh128 in File::hasSameHash() [11.x] Replace MD5 with xxh128 in File::hasSameHash() Feb 19, 2025
@taylorotwell taylorotwell merged commit bf58d21 into laravel:11.x Feb 19, 2025
46 checks passed
@vlakoff vlakoff deleted the Filesystem/hasSameHash branch February 20, 2025 02:39
@francoism90
Copy link

Today I've learned this method actually exists. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

4 participants