Since I'm not a fan of Perl, here's a bash version:
<!-- language: lang-sh -->
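    #!/bin/bash
    # Dry run: prints the ln commands instead of running them.
    # Remove "echo" below to actually create the hardlinks.
    DIR="/path/to/big/files"
    SUMS="/tmp/sums-sorted.txt"

    # Checksum every regular file, then sort so identical sums end up adjacent.
    find "$DIR" -type f -exec md5sum {} + | sort > "$SUMS"

    OLDSUM=""
    OLDFILE=""
    # Each line is "<sum>  <name>". Names containing newlines or backslashes
    # are escaped by md5sum and are not handled here.
    while IFS= read -r line; do
        NEWSUM="${line%% *}"      # everything before the first space
        NEWFILE="${line#*  }"     # everything after the two-space separator
        if [ "$OLDSUM" = "$NEWSUM" ]; then
            echo ln -f "$OLDFILE" "$NEWFILE"
        else
            OLDSUM="$NEWSUM"
            OLDFILE="$NEWFILE"
        fi
    done < "$SUMS"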
This finds all files with the same checksum (whether they're big, small, or already hardlinks) and hardlinks them together. As written, it only prints the `ln` commands; remove the `echo` to actually create the links.
This can be greatly optimized for repeated runs with additional `find` flags (e.g. `-size`) and a checksum cache (so you don't have to recompute the checksums each time). If anyone's interested in the smarter, longer version, I can post it.
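As a rough illustration of the size idea (not the longer version mentioned above, just one way it might look): files with a unique size cannot be duplicates, so they never need a checksum. The sketch below assumes GNU `find` (for `-printf`) and file names without newlines; the function name `candidates` is made up for this example.

```shell
#!/bin/bash
# Sketch: list only files whose byte size is shared with at least one other
# file under the given directory. Assumes GNU find and no newlines in names.
# "candidates" is a hypothetical helper name, not part of the original script.
candidates() {
    find "$1" -type f -printf '%s\t%p\n' |
        awk -F'\t' '
            # Split each record into the size and the rest of the line (the path).
            { size = $1; path = substr($0, length($1) + 2) }
            # Size seen before: emit the remembered first path once, then this one.
            size in first {
                if (first[size] != "") { print first[size]; first[size] = "" }
                print path
                next
            }
            # First file of this size: remember it, print nothing yet.
            { first[size] = path }
        '
}

# Possible use in place of the plain find | md5sum step:
#   candidates "$DIR" | while IFS= read -r f; do md5sum "$f"; done | sort > /tmp/sums-sorted.txt
```

On a tree with many large files of distinct sizes, this should skip most of the checksum work; the rest of the script stays the same.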
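**NOTE:** As has been mentioned before, hardlinks only work as long as the files never need to be modified or moved across filesystems: every link points at the same data, so an in-place edit through one name changes them all, and a hardlink cannot span filesystems.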
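To make the modification caveat concrete, here is a small throwaway demo (the temporary directory and the file names `a` and `b` are just for illustration). It links two names together and then overwrites one of them in place:

```shell
#!/bin/bash
# Sketch: two hardlinked names share one inode, so an in-place write
# through one name is visible through the other.
demo=$(mktemp -d)
echo "original" > "$demo/a"
ln "$demo/a" "$demo/b"        # b becomes a second name for the same inode
echo "changed" > "$demo/b"    # truncates and rewrites the shared data in place
result=$(cat "$demo/a")
# -ef is true when both names refer to the same device and inode.
if [ "$demo/a" -ef "$demo/b" ]; then same_inode=yes; else same_inode=no; fi
echo "a contains: $result (same inode: $same_inode)"
rm -rf "$demo"
```

After the write through `b`, reading `a` gives the new content, which is why deduplicating this way is only safe for files that are treated as read-only.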