
LONG ago I wrote a backup script for our site and have updated it ever since. However, occasionally things go wrong and some of the older backups are now broken.

In days gone by, I used the zipinfo utility in our automated scripts to try to figure out whether a previous backup was bad and, if so, re-attempt the backup. While we still have some zip archives lying around, there are (at least) two issues.

First, zip has fundamental limitations, so we've used tar for larger backups.

Second, zip doesn't capture as much metadata as tar does, so we prefer tar for certain kinds of things.

Further, we've shifted from zip to gzip, and we're compressing our tars as well...

Our backups are now huge and I'm trying to figure out what to remove and what to keep - there's no point in keeping broken files. So, I'm writing a script that merges our various backup directories - on-site, off-site, etc. - and I need to check the validity of each file, because sometimes one copy gets corrupted while the other is OK.

I looked for a gzip version of zipinfo but didn't find one. And I've never heard of such a thing for tar - but I may just be ignorant!

I sure don't want to have to resort to expanding into disk space!

  • Hmmm... Looks like "tar -t" might be able to do this! A "not recoverable" error is one possible output from that command. Commented Jun 26, 2023 at 19:19
  • By the way, tar -t still decompresses the compressed archive, and tar still reads the whole of it; it just skips the step of writing anything to disk. Commented Jun 26, 2023 at 19:44
  • You should be storing cryptographic checksums with your backups to verify archive integrity without needing to unpack it. This has the additional advantage that it will catch archive corruption that would be missed by tar (because tar doesn't perform any kind of data checksumming internally). Commented Jun 26, 2023 at 19:48
  • @larsks Thanks, larsks - could you expand a bit on "cryptographic checksums with your backups"? I'm well familiar with checksumming, but I'm not sure what you've got in mind here... Maybe you can share a link to something larger than a comment? Commented Jun 26, 2023 at 21:03
  • @MarcusMüller Thanks, Marcus, for the reminder. I'm more willing to suffer the read I/O than the write. However, I note larsks' comment about tar not checksumming - so maybe there could still be corruption? Commented Jun 26, 2023 at 21:06

1 Answer

Regarding gzip:

(This applies to gz, tgz and tz files.)

Noting that zipinfo is based on unzip, I investigated gzip more thoroughly and found that while there is no direct equivalent to zipinfo -t, the following gunzip invocation works similarly:

# gunzip -t -v [file-specification(s)]
[file-specification]:	 OK

Note, however, that the desired output is sent to stderr, not stdout! Further, there's a tab between the colon and "OK", so adjust your scripting accordingly.
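Because the "OK" text lands on stderr (with that tab), a script is usually better off testing the exit status than parsing the message. A sketch, assuming a hypothetical /srv/backups directory:

```shell
#!/bin/sh
# Check each .gz file by exit status rather than parsing gunzip's
# stderr output; gunzip -t exits non-zero on a broken archive.
for f in /srv/backups/*.gz; do    # hypothetical path
    if gunzip -t "$f" 2>/dev/null; then
        echo "good: $f"
    else
        echo "BAD:  $f" >&2
    fi
done
```

The same exit-status approach works with gzip -t, since gunzip is gzip in decompression mode.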

Regarding tar:

As noted in comments above, I similarly found that while tar doesn't have the checks we'd like, tar -t is a reasonable start at a "better than nothing" solution.

As with gunzip, the confirming output comes on stderr, not stdout, though both output streams can be useful, depending on your specific sub-goals.

Strictly speaking, the current version of tar from Fedora 38 complains with:

This does not look like a tar archive

...if the archive is invalid. Note, though, that the converse doesn't hold: a tar that lists no files isn't necessarily corrupt - it may simply have been improperly built in the first place. Some might consider that an invalid archive too! YMMV.
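Putting the tar side of this together, here is a sketch of the exit-status check, again assuming a hypothetical /srv/backups directory:

```shell
#!/bin/sh
# List each compressed tar without writing anything to disk; the exit
# status tells us whether tar could read the whole archive.
for f in /srv/backups/*.tar.gz; do    # hypothetical path
    if tar -tzf "$f" > /dev/null 2>&1; then
        echo "readable: $f"
    else
        echo "BROKEN:   $f" >&2
    fi
done
```

Remember that "readable" here only means tar could walk the archive end to end; per the comments, it says nothing about the integrity of the file contents themselves.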
