5

The lftp utility has a mirror command which synchronizes a local directory with a directory on the remote server. How does this command decide which files need to be transferred?

In particular, if a file already exists in both the local and remote directories, how does it decide whether to overwrite the file in the destination directory? Is it based only on modification time, or does it use a more complex heuristic?

2 Answers

5

lftp does not perform a file content integrity check (e.g. a hash) when comparing files. This is important to know if you rely on mirror to ensure the integrity of downloaded files.

I first suspected this while dealing with a corrupted download: the mirror command finished too quickly for any hashing to have taken place. I then confirmed it by inspecting the lftp source code. The comparison is handled by the FileInfo::SameAs method (pasted below from the latest source on GitHub):

    bool FileInfo::SameAs(const FileInfo *fi,int ignore) const
    {
       if(defined&NAME && fi->defined&NAME)
          if(strcmp(name,fi->name))
             return false;
       if(defined&TYPE && fi->defined&TYPE)
          if(filetype!=fi->filetype)
             return false;

       if((defined&TYPE && filetype==DIRECTORY)
       || (fi->defined&TYPE && fi->filetype==DIRECTORY))
          return false;  // can't guarantee directory is the same (recursively)

       if(defined&SYMLINK_DEF && fi->defined&SYMLINK_DEF)
          return (strcmp(symlink,fi->symlink)==0);

       if(defined&DATE && fi->defined&DATE && !(ignore&DATE))
       {
          time_t p=date.ts_prec;
          if(p<fi->date.ts_prec)
             p=fi->date.ts_prec;
          if(!(ignore&IGNORE_DATE_IF_OLDER && date<fi->date)
          && labs(date-fi->date)>p)
             return false;
       }

       if(defined&SIZE && fi->defined&SIZE && !(ignore&SIZE))
       {
          if(!(ignore&IGNORE_SIZE_IF_OLDER && defined&DATE && fi->defined&DATE
          && date<fi->date)
          && (size!=fi->size))
             return false;
       }

       return true;
    }

Looking this over, you can see that lftp tries to check the following:

  • filename
  • filetype
  • symlink target (when both entries are symlinks)
  • date
  • filesize

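Even these checks can't be fully trusted, though, because each one is simply skipped when the attribute is not defined on either side of the comparison (for example, when the remote listing does not report a size or date).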
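To make the "skipped if undefined" behaviour concrete, here is a minimal sketch that models only the date and size checks. It is not lftp's code: Entry, same_as, HAS_DATE and HAS_SIZE are hypothetical names made up for illustration, and the name, type, symlink and ignore-flag handling is left out.

```cpp
#include <cstdlib>  // std::labs
#include <ctime>    // std::time_t

// Hypothetical, simplified stand-in for lftp's FileInfo (not the real class).
enum Defined : unsigned { HAS_DATE = 1u << 0, HAS_SIZE = 1u << 1 };

struct Entry {
    unsigned defined;   // bitmask of attributes the listing actually supplied
    std::time_t date;   // modification time, in seconds
    std::time_t prec;   // timestamp precision, in seconds (assumed: 60 if no seconds reported)
    long long size;     // size in bytes
};

// A check only runs when BOTH entries have the attribute defined.
bool same_as(const Entry &a, const Entry &b) {
    if ((a.defined & HAS_DATE) && (b.defined & HAS_DATE)) {
        // Use the coarser of the two precisions as the allowed difference.
        std::time_t p = a.prec > b.prec ? a.prec : b.prec;
        if (std::labs(static_cast<long>(a.date - b.date)) > p)
            return false;
    }
    if ((a.defined & HAS_SIZE) && (b.defined & HAS_SIZE)) {
        if (a.size != b.size)
            return false;
    }
    return true;  // nothing that could be compared differed
}
```

In this model, two entries with different sizes still compare as "the same" when one side never reported a size, and timestamps that differ by less than the coarser precision are treated as equal. Neither case says anything about the file contents.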

If you are lucky the FTP host will provide a text file with checksum hashes so you can verify the downloaded content. I was not so lucky and had to redownload completely.

2

Most likely it checks file size and/or modification time to find out whether the file was modified, unless you specify which files to copy.

A small portion of the lftp manual:

          --ignore-time      ignore time when deciding whether to download
          --ignore-size      ignore size when deciding whether to download
          --only-missing     download only missing files
          --only-existing    download only files already existing at target
      -n, --only-newer       download only newer files (-c won't work)
          --upload-older     upload even files older than the target ones
          --transfer-all     transfer all files, even seemingly the same at the target site
1
  • 1
    That sounds reasonable but I'm having a hard time figuring out exactly what heuristic it's using... Commented Feb 27, 2021 at 1:20
