6

Today I downloaded a large file using wget, but I accidentally deleted it before it had finished downloading.

However, wget continued to download the file until it had finished, but no file was saved at the end.

I wonder what happened to the rest of the file that wget downloaded after I deleted it?

2
  • 4
    You did not delete the file. You deleted an entry for it in a directory. This often causes the file itself to be deleted, but not always: as soon as no reference to a file exists in the system, the system will actually delete the file; but wget still held a reference to it. Commented May 21 at 8:42
  • What the other answers didn't touch on is that one can actually recover a deleted file while a process still holds a file descriptor for it. From a related discussion: "Use lsof to find the inode number, and debugfs to recreate a hard link to it." Commented May 21 at 20:14

2 Answers

13
  1. wget creates the file, also creating an entry (an "inode") in the file's directory, increments the file inode's "use count", and receives a "file handle", a small integer used internally by wget.
  2. wget writes downloaded data to the file handle.
  3. $USER deletes the file, marking the inode "deleted", but since the inode's "use count" is non-zero, nothing more happens.
  4. wget continues to write to its file handle, and eventually closes the file handle. The inode's use count is decremented.
  5. Since the use count is 0, and the inode is marked "deleted", the inode's allocated disk blocks are added to available disk space.
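
The steps above can be reproduced in a few lines. This is only a minimal sketch, assuming a POSIX system (Linux, macOS, BSD); the file name `download.part` and the function name `write_after_unlink` are made up for illustration, and the script stands in for wget rather than showing what wget itself does.

```python
import os
import tempfile


def write_after_unlink(directory: str) -> dict:
    """Keep writing to a file after its name has been removed (POSIX only)."""
    path = os.path.join(directory, "download.part")  # hypothetical file name

    # Step 1: open() asks the kernel to create the file and its directory entry.
    fd = os.open(path, os.O_CREAT | os.O_RDWR | os.O_EXCL, 0o600)
    try:
        # Step 2: write the first chunk of "downloaded" data.
        os.write(fd, b"first half, ")
        links_before = os.fstat(fd).st_nlink

        # Step 3: the user deletes the name; the open descriptor keeps the data alive.
        os.unlink(path)
        name_exists = os.path.exists(path)
        links_after = os.fstat(fd).st_nlink

        # Step 4: writing still succeeds, and the data can be read back
        # through the descriptor even though no name refers to it.
        os.write(fd, b"second half")
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 1024)
    finally:
        # Step 5: closing the last descriptor lets the kernel free the blocks.
        os.close(fd)

    return {
        "links_before": links_before,
        "links_after": links_after,
        "name_exists": name_exists,
        "data": data,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_after_unlink(tmp))
```

On typical local Linux filesystems the link count reported by `fstat` drops to 0 after the `unlink`, while both writes still succeed; other filesystems (network mounts in particular) may report this differently.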

The data is regarded as unrecoverable, and gets more so as you continue to use your system.

The "right" way to stop a download is either to press ^C in the wget console, or to run pkill -9 wget from outside.

6
  • 14
    1) It is NOT correct to pkill -9 wget. That should be pkill wget. The -9 should only be used if the default form fails, or if exit handling and cleanup must be prevented. 2) It is possible to recover the deleted file, but only while the program is still running. One does this by finding the pid, then the entry in /proc/<pid>/fd/ corresponding to the deleted file, and hard linking or copying it to another name. Commented May 20 at 16:29
  • 2
    @DavidG. hardlinking no longer works Commented May 20 at 18:54
  • 1
    @StéphaneChazelas reading the patch... make sure we don't do something useful, and prevents a race (which should be what the mutex does). The hardlink via proc option is only disabled for deleted files. I could see this as a privileged feature. Commented May 20 at 21:29
  • 3
    A tip if you need to copy: First open the file with a command like sleep 999999 </proc/<pid>/fd/<num>, then wait for the download to finish, then copy from the sleep's pid. Finally, kill the sleep. Commented May 20 at 21:31
  • 2
    "wget creates the file, also creating an entry (an "inode") in the file's directory ..." - I think you are confusing inode and hard link. The inode is the file, and it is not "created in a directory". The inode lives in the filesystem's inode pool. The directory contains the name (formally: hard link) pointing to the inode. To summarize: the file/inode is created, and a name/hardlink pointing to it is created in the directory. It's also worth pointing out wget does not perform these steps; the kernel does both for wget when it calls open(). Commented May 21 at 10:23
7

As long as the wget process still has the file open, the data remains on disk, but it is no longer accessible by filename because you removed the hard link to the file data on the disk.

The file was deleted in the directory structure, but wget was still writing to it invisibly.

The OS kept the file's data alive because wget still held a file descriptor to it and continued writing data to the now unlinked file.
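
While the descriptor is still open, the data can be copied out through `/proc/<pid>/fd/<num>`, as the comments on the other answer describe. The following is a rough sketch of that idea, assuming Linux with `/proc` mounted and permission to read the target process's descriptors; `recover_deleted` and `demo` are hypothetical names, and the demo recovers from its own process instead of a real wget.

```python
import os
import shutil
import tempfile


def recover_deleted(pid: int, fd_num: int, dest: str) -> None:
    """Copy the data behind /proc/<pid>/fd/<fd_num> to dest (Linux only)."""
    src = f"/proc/{pid}/fd/{fd_num}"
    # Opening the /proc entry gives a new descriptor on the same unlinked inode.
    with open(src, "rb") as inp, open(dest, "wb") as out:
        shutil.copyfileobj(inp, out)


def demo(directory: str) -> bytes:
    """Delete a file that is still open, then recover its contents."""
    path = os.path.join(directory, "download.part")  # hypothetical file name
    dest = os.path.join(directory, "recovered")
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    try:
        os.write(fd, b"partial download")
        os.unlink(path)  # the name is gone, the descriptor is not
        recover_deleted(os.getpid(), fd, dest)
    finally:
        os.close(fd)
    with open(dest, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(tmp))
```

The copy is a snapshot of whatever has been written so far. For a download that is still running, it only captures the part received up to that moment.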

Once wget finished and closed the file handle, the OS freed the disk space, since no process was using the file anymore. This is the OS's normal cleanup of orphaned data.

Orphaned data refers to data that is no longer associated with a corresponding record or entity in a database, data storage or other information system. This situation typically arises when a record, file, or object is deleted, but the associated data remains in the system without a proper link to a parent entity.

An orphaned inode is an inode which isn't attached to a directory entry in the filesystem, which means it can't be reached.
