
Sometimes I wish to get only the directory structure of a website; the files themselves are not important, I only want their names. Sort of like a mirror where every entry is just an empty dummy file.

Of course, doing a wget -r and afterwards running a script to empty all the files works fine, but it feels wasteful because it is not nice to either the server or my bandwidth. A more efficient, but even less elegant, way is to manually stop and restart the process every time I hit a large file, or to set a very short timeout. At least that significantly reduces the amount of data I have to download.
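For reference, the wasteful approach I mean looks roughly like this (example.com is just a placeholder):

# Mirror everything, then truncate each downloaded file to zero bytes.
wget -r http://example.com
find example.com -type f -exec truncate -s 0 {} +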

My question is: can I make wget create each file, but not download its content? Or am I using the wrong tool for the job?

  • See the --spider option. For example: wget -r -nv --spider http://example.com, then parse the output. Commented Jun 25, 2016 at 18:47
  • @SatoKatsura Not exactly what I want, the --spider option actually downloads the files, but deletes them afterwards. That does not save any bandwidth. Commented Jun 25, 2016 at 18:53
  • You can't know what example.html links to without downloading it first. There is no such thing as an "ls -R over HTTP"; spidering is your best option. And I believe you do save some bandwidth with --spider; for instance, I don't think image files and the like are downloaded. Commented Jun 25, 2016 at 18:56
  • @SatoKatsura Oh... Yeah, thinking about it, following links without downloading them is a bit hard... you are right, my test was a bit flawed, and images and other content are ignored. Want to write up an answer I can accept? Commented Jun 25, 2016 at 18:59

1 Answer


Posting an answer as requested:

Use the --spider option:

wget -r -nv --spider http://example.com 

Then you can parse the structure of the site from the output. This won't download files that stand no chance of containing links, such as images.
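If you want to turn that output into the empty dummy-file mirror described in the question, something along these lines should work. This is only a rough sketch: the exact log format can vary between wget versions, so the grep pattern may need adjusting, and example.com is a placeholder.

# Crawl in spider mode and write the log to a file instead of stderr.
wget -r -nv --spider http://example.com -o spider.log

# Pull the URLs out of the log (adjust the pattern to your wget's
# output format), then recreate each path as an empty file.
grep -o 'http://[^ ]*' spider.log | sort -u | while read -r url; do
    path="${url#http://}"                                   # drop the scheme
    case "$path" in */) path="${path}index.html" ;; esac    # directory URLs
    mkdir -p "$(dirname "$path")"                            # create parent directories
    touch "$path"                                            # empty dummy file
done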
