
[ EDIT:
Answered my own question.

Used unison and some hacky post-processing
(
copy-pasted the log output of unison,
tweaked it in my text-editor with multi-selection editing,
then did some shell script processing on that (fishshell)
)

((I've got so much more disaster recovery stuff to slog through, so I guess I'm done with this for now...))
]


So I have an SSD that used to be inside another computer,
and I put it in one of those little SATA-to-USB enclosure/adapter shell things,
mounted it as an external data drive,
and used rsync -aAX to copy the boot partition over into a dir on this computer, for a backup.

But then, after some other events that probably didn't change the contents of the original boot partition,
I made a second backup.

So now I have two dirs on this computer,
which I think are probably two copies of exactly the same backup,
but I want to make sure.


So my question is:
What is the best way to compare/diff these two big backup directories?

Summary of things I have considered/tried but had problems with or am unsure about:

  • diff itself

  • rsync "dry run" trick

  • unison [(just thought of, but it hasn't finished running yet, due to the large size of the backups and the slow speed of my old hardware.)]

Is one of these essentially a good option?
If so, any corrections to the details of how I should go about using it?

Or are there any separate, additional options I should know about..?


Details of my attempts and the results/problems:


diff

The obvious way of doing it for "normal" directories would be like:
$ diff -r dir_A dir_B
(
or maybe $ diff -r --no-dereference dir_A dir_B?
I don't know; I honestly don't properly understand the function of --no-dereference
-- it's just something I've found got me the results I wanted in vaguely similar situations in the past.
)
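For what it's worth, a small sketch of what --no-dereference changes, assuming GNU diff (the option is not available in every diff implementation). The scratch directory and link names are made up for the demonstration. Without the option, diff follows each symlink and compares whatever it points to; with it, diff compares the link targets as text, which is usually what you want for a backup full of dangling or absolute symlinks.

```sh
# Two scratch directories, each holding a dangling symlink with the same target.
work=$(mktemp -d)
mkdir "$work/a" "$work/b"
ln -s missing-target "$work/a/link"
ln -s missing-target "$work/b/link"

# Default: diff follows the links, cannot open the missing target, reports trouble.
diff -r "$work/a" "$work/b" 2>/dev/null
followed_status=$?

# --no-dereference: the link targets themselves are compared, and they match.
diff -r --no-dereference "$work/a" "$work/b"
same_status=$?

# Point one link somewhere else: now it is reported as a difference.
ln -sfn other-target "$work/b/link"
diff -r --no-dereference "$work/a" "$work/b"
changed_status=$?

echo "followed=$followed_status same=$same_status changed=$changed_status"
rm -rf "$work"
```

diff exits with 0 for "no differences", 1 for "differences found" and 2 for "trouble", so the three statuses show the three cases.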

However, the problem with using diff is that these dirs are of course unusually large,
and full of "weird" files from the bootable system
(e.g. "character special files" and "block special files" etc).


rsync

So it occurred to me to just use rsync again between them,
doing a "dry run" and seeing if it reported any changes it would make.

Like:
$ sudo rsyncy -n -aAX dir_A dir_B --log-file=log_file

However, it then occurred to me

  • "what if there were new files in dir_B?"
  • "would rsyncy necessarily always report that?"

So I guessed you would have to check both:
$ sudo rsyncy -n -aAX --delete dir_A dir_B --log-file='log_file[A-to-B]'
and
$ sudo rsyncy -n -aAX --delete dir_B dir_A --log-file='log_file[B-to-A]'
which is starting to feel a little suspiciously like maybe that isn't really the right tool for the job after all...?

The log files I got read:

A-to-B
#=>

    2023/07/21 01:43:04 [26686] building file list
    2023/07/21 02:12:24 [26686] sent 80.58M bytes received 292.46K bytes 45.93K bytes/sec
    2023/07/21 02:12:24 [26686] total size is 229.29G speedup is 2,835.29 (DRY RUN)

B-to-A

    2023/07/21 01:41:58 [26406] building file list
    2023/07/21 02:12:15 [26406] sent 80.58M bytes received 292.50K bytes 44.49K bytes/sec
    2023/07/21 02:12:15 [26406] total size is 229.29G speedup is 2,835.29 (DRY RUN)

which are (ignoring the timestamps and speed information)
annoyingly ALMOST exactly the same:
both
sent 80.58M bytes

but with a tiny difference in what was received:
received 292.46K bytes
vs
received 292.50K bytes

So yeah, again, I'm feeling doubtful this rsync trick is really the right tool for the job...?
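A hedged sketch of how the dry run could be made to say what differs, using plain rsync. Two details matter here. First, without a trailing slash on the source, rsync pairs dir_A with dir_B/dir_A rather than with dir_B itself, so the sketch adds trailing slashes. Second, with --delete one direction is enough, because paths that exist only on the destination are listed as deletions. The function name compare_dirs and the scratch file names are made up; the demo uses only -a, and -AX can be added for the real backups as in the commands above.

```sh
# compare_dirs SRC DST: list what rsync would change to make DST match SRC.
# Nothing is written, because -n makes this a dry run.
compare_dirs() {
    # Trailing slashes mean "the contents of", so SRC/x is paired with DST/x.
    # -i itemizes every difference; --checksum compares file contents instead
    # of size and mtime; --delete also lists paths that exist only in DST.
    rsync -n -a -i --checksum --delete "$1/" "$2/"
}

# Small demonstration on scratch directories (hypothetical file names).
work=$(mktemp -d)
mkdir "$work/dir_A" "$work/dir_B"
echo same > "$work/dir_A/same.txt"
cp -p "$work/dir_A/same.txt" "$work/dir_B/"
echo one > "$work/dir_A/changed.txt"
echo two > "$work/dir_B/changed.txt"
echo new > "$work/dir_B/only_in_B.txt"

if command -v rsync >/dev/null 2>&1; then
    report=$(compare_dirs "$work/dir_A" "$work/dir_B")
    printf '%s\n' "$report"
else
    report=""
    echo "rsync not installed; skipping demonstration"
fi
rm -rf "$work"
```

If the itemized list is empty apart from directory timestamp lines, the two trees match as far as rsync can tell. Note that --checksum reads every file on both sides, so on 229G of data it will be slow.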

Maybe the correct answer really is like:
"
Just be patient and let diff run for ages to process the two huge directories.
(You can just ignore all the error messages about special files etc.)
"
??


unison

[not sure yet?]

  • Have you considered first finding the files that differ and using diff only on them? Commented Jul 21, 2023 at 11:25
  • @RomeoNinov Sorry, I'm not sure I understand what you mean. Do you mean something like use the find command and send its output to diff somehow? (Like with the -exec flag or something?) Or...? Commented Jul 21, 2023 at 12:49
  • I mean first generate a hash of each file, then compare the hashes, keep only the pairs that differ, and apply diff only to those. Commented Jul 21, 2023 at 13:02
  • @RomeoNinov Do you have a link to a more detailed guide/tutorial for what you're talking about? Because I don't think I could concretely implement that solution just from that abbreviated description... it sounds somewhat tricky and complex, difficult to keep track of? (Easy to mess up if you're figuring it out for the first time, especially working with such an unwieldy amount of data while learning.) Like, is this... going through the tree of directories, creating a copy with each "leaf" file in the tree replaced with a hash of itself...? Or archiving/compressing/zipping first? Or...? Commented Jul 21, 2023 at 13:55
  • I do not have such a manual; I just got the idea from your question. If there is no other answer, I will write a script that implements the idea. Commented Jul 21, 2023 at 14:21
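The hashing idea from the comments could look roughly like the sketch below. It is only one way to do it, not the commenter's script: the function name manifest and the scratch files are made up, and it assumes sha256sum from GNU coreutils. Only regular files are hashed, so device nodes, sockets and FIFOs are never opened.

```sh
# manifest DIR: print "<sha256>  ./relative/path" for every regular file in DIR,
# sorted by path so that two manifests line up.
manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2 )
}

# Scratch data standing in for the two backups (hypothetical contents).
work=$(mktemp -d)
mkdir -p "$work/dir_A/etc" "$work/dir_B/etc"
echo same > "$work/dir_A/etc/hosts"
echo same > "$work/dir_B/etc/hosts"
echo old > "$work/dir_A/etc/fstab"
echo new > "$work/dir_B/etc/fstab"

manifest "$work/dir_A" > "$work/A.sha256"
manifest "$work/dir_B" > "$work/B.sha256"

# Paths whose manifest lines differ: changed files and files on one side only.
differing=$(diff "$work/A.sha256" "$work/B.sha256" \
    | sed -n 's/^[<>] [0-9a-f]*  //p' | sort -u)
printf '%s\n' "$differing"

# Run a real diff only on those paths (breaks on names containing newlines).
printf '%s\n' "$differing" | while IFS= read -r path; do
    diff "$work/dir_A/$path" "$work/dir_B/$path"
done
rm -rf "$work"
```

The two manifest files are small compared with the backups, so they can be kept and compared again later without rereading the data.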

1 Answer

unison

This worked for me in the end.

Here's the output data from unison,
plus a bit of manual checking I did on it using my fishshell at the end.


data from unison

$ unison dir_A dir_B

reported a bunch of errors "unknown file type" for the weird stuff from the bootable system
but then a summary:

    0 items will be synced, 326 skipped
    0 B to be synced from dir_A to dir_B
    0 B to be synced from dir_B to dir_A
    No updates to propagate
    Synchronization complete at 07:40:11 (0 items transferred, 326 skipped, 0 failed)

and then a list like
(
elastic-tabspace aligned,
ordering tweaked,
and a bunch of repetitive lines removed to fit under the stackexchange limit
("Body is limited to 30000 characters; you entered 56882.")
):

    #
    skipped: dev/console (path dir_A/dev/console has unknown file type)
    skipped: dev/core (path dir_A/dev/core has unknown file type)
    skipped: dev/full (path dir_A/dev/full has unknown file type)
    skipped: dev/hda (path dir_A/dev/hda has unknown file type)
    skipped: dev/hda1 (path dir_A/dev/hda1 has unknown file type)
    skipped: dev/hda10 (path dir_A/dev/hda10 has unknown file type)
    skipped: dev/hda11 (path dir_A/dev/hda11 has unknown file type)
    skipped: dev/hda12 (path dir_A/dev/hda12 has unknown file type)
    skipped: dev/hda13 (path dir_A/dev/hda13 has unknown file type)
    skipped: dev/hda14 (path dir_A/dev/hda14 has unknown file type)
    skipped: dev/hda15 (path dir_A/dev/hda15 has unknown file type)
    skipped: dev/hda16 (path dir_A/dev/hda16 has unknown file type)
    skipped: dev/hda17 (path dir_A/dev/hda17 has unknown file type)
    skipped: dev/hda18 (path dir_A/dev/hda18 has unknown file type)
    skipped: dev/hda19 (path dir_A/dev/hda19 has unknown file type)
    skipped: dev/hda2 (path dir_A/dev/hda2 has unknown file type)
    skipped: dev/input/event0 (path dir_A/dev/input/event0 has unknown file type)
    skipped: dev/input/event1 (path dir_A/dev/input/event1 has unknown file type)
    skipped: dev/input/event10 (path dir_A/dev/input/event10 has unknown file type)
    skipped: dev/input/event11 (path dir_A/dev/input/event11 has unknown file type)
    skipped: dev/input/event12 (path dir_A/dev/input/event12 has unknown file type)
    skipped: dev/input/event13 (path dir_A/dev/input/event13 has unknown file type)
    skipped: dev/input/event14 (path dir_A/dev/input/event14 has unknown file type)
    skipped: dev/input/event15 (path dir_A/dev/input/event15 has unknown file type)
    skipped: dev/input/event16 (path dir_A/dev/input/event16 has unknown file type)
    skipped: dev/input/event17 (path dir_A/dev/input/event17 has unknown file type)
    skipped: dev/input/event18 (path dir_A/dev/input/event18 has unknown file type)
    skipped: dev/input/event19 (path dir_A/dev/input/event19 has unknown file type)
    skipped: dev/input/js0 (path dir_A/dev/input/js0 has unknown file type)
    skipped: dev/input/js1 (path dir_A/dev/input/js1 has unknown file type)
    skipped: dev/input/js10 (path dir_A/dev/input/js10 has unknown file type)
    skipped: dev/input/js11 (path dir_A/dev/input/js11 has unknown file type)
    skipped: dev/input/js12 (path dir_A/dev/input/js12 has unknown file type)
    skipped: dev/input/js13 (path dir_A/dev/input/js13 has unknown file type)
    skipped: dev/input/js14 (path dir_A/dev/input/js14 has unknown file type)
    skipped: dev/input/js15 (path dir_A/dev/input/js15 has unknown file type)
    skipped: dev/input/js16 (path dir_A/dev/input/js16 has unknown file type)
    skipped: dev/input/js17 (path dir_A/dev/input/js17 has unknown file type)
    skipped: dev/input/js18 (path dir_A/dev/input/js18 has unknown file type)
    skipped: dev/input/js19 (path dir_A/dev/input/js19 has unknown file type)
    skipped: dev/input/keyboard (path dir_A/dev/input/keyboard has unknown file type)
    skipped: dev/input/mice (path dir_A/dev/input/mice has unknown file type)
    skipped: dev/input/mouse (path dir_A/dev/input/mouse has unknown file type)
    skipped: dev/input/mouse0 (path dir_A/dev/input/mouse0 has unknown file type)
    skipped: dev/input/mouse1 (path dir_A/dev/input/mouse1 has unknown file type)
    skipped: dev/input/mouse10 (path dir_A/dev/input/mouse10 has unknown file type)
    skipped: dev/input/mouse11 (path dir_A/dev/input/mouse11 has unknown file type)
    skipped: dev/input/mouse12 (path dir_A/dev/input/mouse12 has unknown file type)
    skipped: dev/input/mouse13 (path dir_A/dev/input/mouse13 has unknown file type)
    skipped: dev/input/mouse14 (path dir_A/dev/input/mouse14 has unknown file type)
    skipped: dev/input/mouse15 (path dir_A/dev/input/mouse15 has unknown file type)
    skipped: dev/input/mouse16 (path dir_A/dev/input/mouse16 has unknown file type)
    skipped: dev/input/mouse17 (path dir_A/dev/input/mouse17 has unknown file type)
    skipped: dev/input/mouse18 (path dir_A/dev/input/mouse18 has unknown file type)
    skipped: dev/input/mouse19 (path dir_A/dev/input/mouse19 has unknown file type)
    skipped: dev/input/uinput (path dir_A/dev/input/uinput has unknown file type)
    skipped: dev/mem (path dir_A/dev/mem has unknown file type)
    skipped: dev/null (path dir_A/dev/null has unknown file type)
    skipped: dev/port (path dir_A/dev/port has unknown file type)
    skipped: dev/ptmx (path dir_A/dev/ptmx has unknown file type)
    skipped: dev/random (path dir_A/dev/random has unknown file type)
    skipped: dev/sda (path dir_A/dev/sda has unknown file type)
    skipped: dev/sda1 (path dir_A/dev/sda1 has unknown file type)
    skipped: dev/sda10 (path dir_A/dev/sda10 has unknown file type)
    skipped: dev/sda11 (path dir_A/dev/sda11 has unknown file type)
    skipped: dev/sda12 (path dir_A/dev/sda12 has unknown file type)
    skipped: dev/sda13 (path dir_A/dev/sda13 has unknown file type)
    skipped: dev/sda14 (path dir_A/dev/sda14 has unknown file type)
    skipped: dev/sda15 (path dir_A/dev/sda15 has unknown file type)
    skipped: dev/tty (path dir_A/dev/tty has unknown file type)
    skipped: dev/tty0 (path dir_A/dev/tty0 has unknown file type)
    skipped: dev/tty1 (path dir_A/dev/tty1 has unknown file type)
    skipped: dev/tty10 (path dir_A/dev/tty10 has unknown file type)
    skipped: dev/tty11 (path dir_A/dev/tty11 has unknown file type)
    skipped: dev/tty12 (path dir_A/dev/tty12 has unknown file type)
    skipped: dev/tty13 (path dir_A/dev/tty13 has unknown file type)
    skipped: dev/tty14 (path dir_A/dev/tty14 has unknown file type)
    skipped: dev/tty15 (path dir_A/dev/tty15 has unknown file type)
    skipped: dev/tty16 (path dir_A/dev/tty16 has unknown file type)
    skipped: dev/tty17 (path dir_A/dev/tty17 has unknown file type)
    skipped: dev/tty18 (path dir_A/dev/tty18 has unknown file type)
    skipped: dev/tty19 (path dir_A/dev/tty19 has unknown file type)
    skipped: dev/urandom (path dir_A/dev/urandom has unknown file type)
    skipped: dev/zero (path dir_A/dev/zero has unknown file type)
    #
    skipped: tmp/runtime-username/pulse/native (path dir_A/tmp/runtime-username/pulse/native has unknown file type)
    skipped: var/guix/daemon-socket/socket (path dir_A/var/guix/daemon-socket/socket has unknown file type)
    skipped: var/spool/postfix/private/anvil (path dir_A/var/spool/postfix/private/anvil has unknown file type)
    skipped: var/spool/postfix/private/bounce (path dir_A/var/spool/postfix/private/bounce has unknown file type)
    skipped: var/spool/postfix/private/defer (path dir_A/var/spool/postfix/private/defer has unknown file type)
    skipped: var/spool/postfix/private/discard (path dir_A/var/spool/postfix/private/discard has unknown file type)
    skipped: var/spool/postfix/private/error (path dir_A/var/spool/postfix/private/error has unknown file type)
    skipped: var/spool/postfix/private/lmtp (path dir_A/var/spool/postfix/private/lmtp has unknown file type)
    skipped: var/spool/postfix/private/local (path dir_A/var/spool/postfix/private/local has unknown file type)
    skipped: var/spool/postfix/private/proxymap (path dir_A/var/spool/postfix/private/proxymap has unknown file type)
    skipped: var/spool/postfix/private/proxywrite (path dir_A/var/spool/postfix/private/proxywrite has unknown file type)
    skipped: var/spool/postfix/private/relay (path dir_A/var/spool/postfix/private/relay has unknown file type)
    skipped: var/spool/postfix/private/retry (path dir_A/var/spool/postfix/private/retry has unknown file type)
    skipped: var/spool/postfix/private/rewrite (path dir_A/var/spool/postfix/private/rewrite has unknown file type)
    skipped: var/spool/postfix/private/scache (path dir_A/var/spool/postfix/private/scache has unknown file type)
    skipped: var/spool/postfix/private/smtp (path dir_A/var/spool/postfix/private/smtp has unknown file type)
    skipped: var/spool/postfix/private/tlsmgr (path dir_A/var/spool/postfix/private/tlsmgr has unknown file type)
    skipped: var/spool/postfix/private/trace (path dir_A/var/spool/postfix/private/trace has unknown file type)
    skipped: var/spool/postfix/private/verify (path dir_A/var/spool/postfix/private/verify has unknown file type)
    skipped: var/spool/postfix/private/virtual (path dir_A/var/spool/postfix/private/virtual has unknown file type)
    skipped: var/spool/postfix/public/cleanup (path dir_A/var/spool/postfix/public/cleanup has unknown file type)
    skipped: var/spool/postfix/public/flush (path dir_A/var/spool/postfix/public/flush has unknown file type)
    skipped: var/spool/postfix/public/pickup (path dir_A/var/spool/postfix/public/pickup has unknown file type)
    skipped: var/spool/postfix/public/postlog (path dir_A/var/spool/postfix/public/postlog has unknown file type)
    skipped: var/spool/postfix/public/qmgr (path dir_A/var/spool/postfix/public/qmgr has unknown file type)
    skipped: var/spool/postfix/public/showq (path dir_A/var/spool/postfix/public/showq has unknown file type)
    skipped: var/tmp/audacity-username/.audacity.sock (path dir_A/var/tmp/audacity-username/.audacity.sock has unknown file type)

plus a few things under home like:

    #
    #
    #
    skipped: home/username/.cache/fontforge/python-socket (path dir_A/home/username/.cache/fontforge/python-socket has unknown file type)
    skipped: home/username/.cache/keyring-70EDPZ/control (path dir_A/home/username/.cache/keyring-70EDPZ/control has unknown file type)
    #
    skipped: home/username/.copy/copyagent-overlay.socket (path dir_A/home/username/.copy/copyagent-overlay.socket has unknown file type)
    #
    skipped: home/username/.dropbox/command_socket (path dir_A/home/username/.dropbox/command_socket has unknown file type)
    skipped: home/username/.dropbox/iface_socket (path dir_A/home/username/.dropbox/iface_socket has unknown file type)
    #
    #
    skipped: home/username/.local/share/parcellite/fifo_c (path dir_A/home/username/.local/share/parcellite/fifo_c has unknown file type)
    skipped: home/username/.local/share/parcellite/fifo_cmd (path dir_A/home/username/.local/share/parcellite/fifo_cmd has unknown file type)
    skipped: home/username/.local/share/parcellite/fifo_p (path dir_A/home/username/.local/share/parcellite/fifo_p has unknown file type)
    #
    skipped: home/username/.steam/steam.pipe (path dir_A/home/username/.steam/steam.pipe has unknown file type)
    #
    skipped: home/username/Dropbox/.emacs.d/packages/gnupg/S.gpg-agent (path dir_A/home/username/Dropbox/.emacs.d/packages/gnupg/S.gpg-agent has unknown file type)
    #
    skipped: home/username/ax/bups/hostname/dropbox/0rolling/Dropbox/.emacs.d/packages/gnupg/S.gpg-agent (path dir_A/home/username/ax/bups/hostname/dropbox/0rolling/Dropbox/.emacs.d/packages/gnupg/S.gpg-agent has unknown file type)

my manual checking with fishshell

I used multi-selection in my text-editor to extract the bits like:
"(path [this bit] has unknown file type)"
(made sure to escape any ' chars in the paths)
so I could iterate over them like
(from within the dir containing dir_A and dir_B):

    # personal functions used

    # bbl: print each non-empty line of a big quoted block (or of stdin), trimmed
    function bbl --description 'big block lines'
        set -l bb $argv
        if test -n "$bb"
            for l in (string trim $bb)
                string trim $l
            end
        else
            while read -l l
                set l (string trim $l)
                test -n "$l"
                and echo $l
            end
        end
    end

    # p (pretty print feedback thing): echo a command, highlighted, before running it
    function p
        if test -n "$argv"
            echo -n (set_color $fish_color_comment) '#$ ' (set_color normal)
            string escape --style script -- $argv | string join " " | fish_indent --ansi -i
        else
            while read -l x
                echo -n (set_color $fish_color_comment) '#$ ' (set_color normal)
                string escape --style script -- $x | string join " " | fish_indent --ansi -i
            end
        end
    end

    # actually doing it
    for path in (bbl '
        dir_A/dev/console
        dir_A/dev/core
        dir_A/dev/full
        dir_A/dev/hda
        dir_A/dev/hda1
        dir_A/dev/hda10
        dir_A/dev/hda11
        dir_A/dev/hda12
        dir_A/dev/hda13
        dir_A/dev/hda14
        dir_A/dev/hda15
        dir_A/dev/hda16
        dir_A/dev/hda17
        dir_A/dev/hda18
        dir_A/dev/hda19
        dir_A/dev/hda2
        dir_A/dev/input/event0
        dir_A/dev/input/event1
        dir_A/dev/input/event10
        dir_A/dev/input/event11
        dir_A/dev/input/event12
        dir_A/dev/input/event13
        dir_A/dev/input/event14
        dir_A/dev/input/event15
        dir_A/dev/input/event16
        dir_A/dev/input/event17
        dir_A/dev/input/event18
        dir_A/dev/input/event19
        dir_A/dev/input/js0
        dir_A/dev/input/js1
        dir_A/dev/input/js10
        dir_A/dev/input/js11
        dir_A/dev/input/js12
        dir_A/dev/input/js13
        dir_A/dev/input/js14
        dir_A/dev/input/js15
        dir_A/dev/input/js16
        dir_A/dev/input/js17
        dir_A/dev/input/js18
        dir_A/dev/input/js19
        dir_A/dev/input/keyboard
        dir_A/dev/input/mice
        dir_A/dev/input/mouse
        dir_A/dev/input/mouse0
        dir_A/dev/input/mouse1
        dir_A/dev/input/mouse10
        dir_A/dev/input/mouse11
        dir_A/dev/input/mouse12
        dir_A/dev/input/mouse13
        dir_A/dev/input/mouse14
        dir_A/dev/input/mouse15
        dir_A/dev/input/mouse16
        dir_A/dev/input/mouse17
        dir_A/dev/input/mouse18
        dir_A/dev/input/mouse19
        dir_A/dev/input/uinput
        dir_A/dev/mem
        dir_A/dev/null
        dir_A/dev/port
        dir_A/dev/ptmx
        dir_A/dev/random
        dir_A/dev/sda
        dir_A/dev/sda1
        dir_A/dev/sda10
        dir_A/dev/sda11
        dir_A/dev/sda12
        dir_A/dev/sda13
        dir_A/dev/sda14
        dir_A/dev/sda15
        dir_A/dev/tty
        dir_A/dev/tty0
        dir_A/dev/tty1
        dir_A/dev/tty10
        dir_A/dev/tty11
        dir_A/dev/tty12
        dir_A/dev/tty13
        dir_A/dev/tty14
        dir_A/dev/tty15
        dir_A/dev/tty16
        dir_A/dev/tty17
        dir_A/dev/tty18
        dir_A/dev/tty19
        dir_A/dev/urandom
        dir_A/dev/zero
        dir_A/tmp/runtime-username/pulse/native
        dir_A/var/guix/daemon-socket/socket
        dir_A/var/spool/postfix/private/anvil
        dir_A/var/spool/postfix/private/bounce
        dir_A/var/spool/postfix/private/defer
        dir_A/var/spool/postfix/private/discard
        dir_A/var/spool/postfix/private/error
        dir_A/var/spool/postfix/private/lmtp
        dir_A/var/spool/postfix/private/local
        dir_A/var/spool/postfix/private/proxymap
        dir_A/var/spool/postfix/private/proxywrite
        dir_A/var/spool/postfix/private/relay
        dir_A/var/spool/postfix/private/retry
        dir_A/var/spool/postfix/private/rewrite
        dir_A/var/spool/postfix/private/scache
        dir_A/var/spool/postfix/private/smtp
        dir_A/var/spool/postfix/private/tlsmgr
        dir_A/var/spool/postfix/private/trace
        dir_A/var/spool/postfix/private/verify
        dir_A/var/spool/postfix/private/virtual
        dir_A/var/spool/postfix/public/cleanup
        dir_A/var/spool/postfix/public/flush
        dir_A/var/spool/postfix/public/pickup
        dir_A/var/spool/postfix/public/postlog
        dir_A/var/spool/postfix/public/qmgr
        dir_A/var/spool/postfix/public/showq
        dir_A/var/tmp/audacity-username/.audacity.sock
        dir_A/home/username/.cache/fontforge/python-socket
        dir_A/home/username/.cache/keyring-70EDPZ/control
        dir_A/home/username/.copy/copyagent-overlay.socket
        dir_A/home/username/.dropbox/command_socket
        dir_A/home/username/.dropbox/iface_socket
        dir_A/home/username/.local/share/parcellite/fifo_c
        dir_A/home/username/.local/share/parcellite/fifo_cmd
        dir_A/home/username/.local/share/parcellite/fifo_p
        dir_A/home/username/.steam/steam.pipe
        dir_A/home/username/Dropbox/.emacs.d/packages/gnupg/S.gpg-agent
        dir_A/home/username/ax/bups/hostname/dropbox/0rolling/Dropbox/.emacs.d/packages/gnupg/S.gpg-agent
    ')
        # the listed paths already start with dir_A/, so derive the dir_B twin from them
        set -l path_A $path
        set -l path_B (string replace -r '^dir_A/' 'dir_B/' $path)

        # `sudo diff` doesn't work for some reason (permission still denied even with sudo). I guess a weird edgecase bug in diff.
        # p sudo diff $path_A $path_B
        # sudo diff $path_A $path_B
        # or breakpoint

        # check that each copy exists and has size zero; stop at a breakpoint otherwise
        for x in $path_A $path_B
            p sudo test -e $x
            sudo test -e $x
            or breakpoint
            set -l size (sudo stat -c %s $x) # sudo prolly not needed?
            p test $size = "0"
            test $size = "0"
            or breakpoint
        end
    end


So my two backups dir_A and dir_B really are the same.

(And I'll check later that all those weird files really are size zero in the original source.)
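One caveat on the size check: stat typically reports size 0 for device nodes, FIFOs and sockets no matter what, so "size is zero" mostly confirms that the entries exist. A slightly stronger check, sketched below, is to compare the file type and mode of every special file on both sides. This assumes GNU find (for -printf); the function name special_files and the scratch names are made up, and the demo uses a FIFO because creating device nodes needs root.

```sh
# special_files DIR: list every entry that is not a regular file, directory or
# symlink as "<type> <octal mode> <path>" (type: p fifo, s socket, c/b device).
special_files() {
    ( cd "$1" && find . ! -type f ! -type d ! -type l -printf '%y %m %p\n' \
        | LC_ALL=C sort )
}

# Scratch data: one FIFO and one regular file on each side.
work=$(mktemp -d)
mkdir "$work/dir_A" "$work/dir_B"
mkfifo -m 600 "$work/dir_A/fifo_c" "$work/dir_B/fifo_c"
echo data > "$work/dir_A/regular"
echo data > "$work/dir_B/regular"

special_files "$work/dir_A" > "$work/A.special"
special_files "$work/dir_B" > "$work/B.special"
listing=$(cat "$work/A.special")
printf '%s\n' "$listing"

# Identical listings mean the same special files, with the same type and mode.
if cmp -s "$work/A.special" "$work/B.special"; then
    verdict=identical
else
    verdict=different
fi
echo "special files: $verdict"
rm -rf "$work"
```

For device nodes, the major and minor numbers could be added to the comparison as well, for example with GNU stat -c '%t %T' on each path.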

((I've got so much more disaster recovery stuff to slog through, so I guess I'm done with this for now...))
