Mat

Since I'm not a fan of Perl, here's a bash version:

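    #!/bin/bash
    # Directory to scan for duplicate files.
    DIR="/path/to/big/files"

    # Checksum every regular file; sorting puts identical checksums on adjacent lines.
    find "$DIR" -type f -exec md5sum {} + | sort > /tmp/sums-sorted.txt

    OLDSUM=""
    OLDFILE=""
    # Each line is "<checksum>  <path>"; read it whole, then split at the separator.
    while IFS= read -r line; do
        NEWSUM="${line%% *}"
        NEWFILE="${line#*  }"
        if [ "$OLDSUM" = "$NEWSUM" ]; then
            # Dry run: this only prints the command. Remove "echo" to create the hardlink.
            echo ln -f "$OLDFILE" "$NEWFILE"
        else
            OLDSUM="$NEWSUM"
            OLDFILE="$NEWFILE"
        fi
    done < /tmp/sums-sorted.txt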

This finds all files with the same checksum (whether they're big, small, or already hardlinked) and hardlinks them together. As written, the script only prints the ln commands; remove the echo to actually create the links.
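Since the script also pairs up files that are already hardlinked to each other, one possible refinement is to skip those pairs. The sketch below is not part of the original script: `link_unless_same` is a hypothetical helper name, and it relies on the `-ef` test, which is true when two paths refer to the same device and inode.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): hardlink DUP to KEEP
# unless the two paths already refer to the same inode.
link_unless_same() {
    local keep="$1" dup="$2"
    # -ef: true when both paths are on the same device with the same inode number.
    if [ "$keep" -ef "$dup" ]; then
        echo "already linked: $dup"
    else
        # Replace DUP with a hardlink to KEEP.
        ln -f -- "$keep" "$dup"
        echo "linked: $dup"
    fi
}
```

In the loop, a call such as `link_unless_same "$OLDFILE" "$NEWFILE"` would take the place of the `ln -f` line.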

This can be greatly optimized for repeated runs with additional find flags (e.g. size) and a file cache (so you don't have to redo the checksums each time). If anyone's interested in the smarter, longer version, I can post it.
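As a rough illustration of the size idea (this is not the longer version mentioned above, only a guess at one way to do it): files with a unique size cannot have a duplicate, so they never need a checksum. The sketch assumes GNU find (for `-printf`) and paths without newlines; `size_candidates` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: print only the files under DIR whose size is shared
# with at least one other file. Assumes GNU find and no newlines in paths.
size_candidates() {
    local dir="$1"
    # Emit "<size><TAB><path>" for every regular file.
    find "$dir" -type f -printf '%s\t%p\n' |
        awk -F'\t' '
            # Remember each line and count how often each size occurs.
            { count[$1]++; size[NR] = $1; path[NR] = substr($0, length($1) + 2) }
            # Print only the paths whose size occurs more than once.
            END { for (i = 1; i <= NR; i++) if (count[size[i]] > 1) print path[i] }
        '
}

# Possible use in place of the plain find (GNU xargs assumed):
#   size_candidates "$DIR" | xargs -r -d '\n' md5sum | sort > /tmp/sums-sorted.txt
```

The checksum cache is left out here; it would need some way to detect changed files (for example by modification time), which is beyond this sketch.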

NOTE: As has been mentioned before, hardlinks only work as long as the files never need to be modified or moved across filesystems.
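To see why modification is a problem, here is a small demonstration in a scratch directory (the file names are made up for the example): both names point at the same data, so writing through one name changes what the other name shows.

```shell
#!/bin/bash
# Scratch directory so the demonstration touches no real files.
tmp=$(mktemp -d)

echo "original" > "$tmp/a"
ln "$tmp/a" "$tmp/b"          # b is now a second name for the same inode

echo "changed" > "$tmp/b"     # rewrites the shared file in place

result=$(cat "$tmp/a")        # read it back through the other name
echo "$result"                # prints "changed"

rm -rf "$tmp"
```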

seren