
Sometimes I wish to get only the directory structure of a website; the files themselves are not important, I only want their names. Sort of like a mirror where every entry is just an empty dummy file.

Of course, doing a wget -r and afterwards running a script to empty all the files works fine, but it feels wasteful because it is not nice to either the server or my bandwidth. A more efficient, but even less elegant, way is to manually stop and restart the process every time I hit a large file, or to set a very short timeout. At least that significantly reduces the amount of data I have to download.
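For reference, the wasteful approach I mean looks roughly like this (example.com is just a placeholder):

# Mirror everything, then truncate each downloaded file to zero bytes.
wget -r http://example.com
find example.com -type f -exec truncate -s 0 {} +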

My question is: can I make wget create each file, but not download its content? Or am I using the wrong tool for the job?

  • See the --spider option. For example: wget -r -nv --spider http://example.com, then parse the output. Commented Jun 25, 2016 at 18:47
  • @SatoKatsura Not exactly what I want, the --spider option actually downloads the files, but deletes them afterwards. That does not save any bandwidth. Commented Jun 25, 2016 at 18:53
  • You can't know what example.html links to without downloading it first. There is no such thing as an "ls -R over HTTP"; spidering is your best option. And I believe you do save some bandwidth with --spider; for instance, I don't think image files and the like are downloaded. Commented Jun 25, 2016 at 18:56
  • @SatoKatsura Oh... Yeah, thinking about it, following links without downloading them is a bit hard... you are right, my test was a bit flawed, and images and other content are ignored. Want to write up an answer I can accept? Commented Jun 25, 2016 at 18:59

1 Answer


Posting an answer as requested:

Use the --spider option:

wget -r -nv --spider http://example.com 

Then you can parse the structure of the site from the output. This won't download files that stand no chance of containing links, such as images.
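If you want to turn that output into the empty dummy-file mirror described in the question, something along these lines should work. This is only a rough sketch: the exact log format can vary between wget versions, so the grep pattern may need adjusting, and example.com is a placeholder.

# Crawl in spider mode and write the log to a file instead of stderr.
wget -r -nv --spider http://example.com -o spider.log

# Pull the URLs out of the log (adjust the pattern to your wget's
# output format), then recreate each path as an empty file.
grep -o 'http://[^ ]*' spider.log | sort -u | while read -r url; do
    path="${url#http://}"                                   # drop the scheme
    case "$path" in */) path="${path}index.html" ;; esac    # directory URLs
    mkdir -p "$(dirname "$path")"                            # create parent directories
    touch "$path"                                            # empty dummy file
done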
