7

I'm trying to create a 'Download Manager' for Linux that lets me download a single file using multiple threads. This is what I'm trying to do:

  1. Divide the file to be downloaded into different parts by specifying an offset
  2. Download the different parts into a temporary location
  3. Merge them into a single file.

Steps 2 and 3 are solvable, and it is at Step #1 that I'm stuck. How do I specify an offset while downloading a file?

Using something along the lines of open("/path/to/file", "wb").write(urllib2.urlopen(url).read()) does not let me specify a starting point to read from. Is there any alternative to this?

5
  • 1
    Why do you want to download using multiple threads? The download won't be any quicker. Commented Mar 14, 2012 at 12:13
  • Can't you use multiple files and merging after? You save a file in a temp directory for every chunk of the remote file and you merge everything after. Commented Mar 14, 2012 at 12:17
  • 1
    @JakubZaverka : It usually is. You can see the difference if you try downloading the same file using wget and a download manager like DownThemAll (for Firefox) or even try to Multi-Thread wget using this. Commented Mar 14, 2012 at 12:17
  • @hurtledown : My question is, how do I download the different parts of a single file? Commented Mar 14, 2012 at 12:19
  • 2
    Here is the answer: stackoverflow.com/questions/3328059/… Commented Mar 14, 2012 at 12:21

3 Answers

5

First, the HTTP server should return a Content-Length header. This usually means the file is static. If the response is generated dynamically, for example by PHP or JSP, the server often reports no length and does not support partial requests, so you cannot split the download this way.

Then you can send the HTTP Range header with your request. This header tells the server which part of the file to return. See the Python docs for how to set and parse HTTP headers.

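For example, if the part size is 100 KB, first request Range: bytes=0-102399 to get the first part. On a partial (206) response, Content-Length is only the size of that part; the total size of the file is the number after the slash in the Content-Range header, for example bytes 0-102399/1234567. Then start several threads, each with a different Range, and it will work.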
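The splitting described above can be sketched roughly as follows. This is only an illustration: the function names `split_ranges` and `total_size_from_content_range` are made up for this example, and it assumes the server reports the total size in a `Content-Range` header of the form `bytes start-end/total`.

```python
import re


def split_ranges(total_size, part_size):
    """Return inclusive (start, end) byte offsets that cover total_size bytes."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    ranges = []
    start = 0
    while start < total_size:
        # HTTP byte ranges are inclusive, hence the -1.
        end = min(start + part_size, total_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def total_size_from_content_range(value):
    """Extract the total size from e.g. 'bytes 0-102399/1234567'.

    Returns None if the header is malformed or the total is unknown ('*').
    """
    match = re.match(r"bytes\s+\d+-\d+/(\d+|\*)$", value.strip())
    if match is None or match.group(1) == "*":
        return None
    return int(match.group(1))
```

Each tuple from `split_ranges` can then be formatted as `bytes=%d-%d` and handed to its own thread.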

1 Comment

I've been searching for this answer for 6 months, thank you 🙏
4

To download part of the file, just set the Range header like this:

    import urllib2

    req = urllib2.Request(url)
    # Ask only for bytes start..end (both inclusive)
    req.headers['Range'] = 'bytes=%s-%s' % (start, end)
    f = urllib2.urlopen(req)

Not all servers support the Range header, though. Most file-sharing services don't.
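Because some servers ignore the header, it may be worth checking the status code before trusting the data. The sketch below uses Python 3's `urllib.request` rather than the `urllib2` module from the question (which is Python 2 only). The names `build_range_request` and `fetch_range`, and the `opener` parameter, are hypothetical helpers for this example, not part of any library.

```python
import urllib.request


def build_range_request(url, start, end):
    """Return a Request asking for bytes start..end (inclusive) of url."""
    req = urllib.request.Request(url)
    req.add_header("Range", "bytes=%d-%d" % (start, end))
    return req


def fetch_range(url, start, end, opener=urllib.request.urlopen):
    """Fetch bytes start..end, raising if the server ignored the Range header.

    opener is an assumption of this sketch: any callable that takes a
    Request and returns a response usable as a context manager.
    """
    with opener(build_range_request(url, start, end)) as resp:
        # 206 Partial Content means the range was honoured. A plain 200
        # usually means the server is sending the whole file instead.
        if resp.status != 206:
            raise RuntimeError(
                "server did not honour Range (status %d)" % resp.status
            )
        return resp.read()
```

If `fetch_range` raises, falling back to a normal single-connection download is probably the safest option.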

0

See file.seek in http://docs.python.org/library/stdtypes.html#file-objects.

This might do the trick.
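Note that `seek` works on local file objects, not on an HTTP response, so it cannot skip ahead in a remote file. Where it can help is on the receiving side: each downloaded part can be written at its own offset in the output file, which avoids a separate merge step. A minimal sketch, assuming the total size is already known; `preallocate` and `write_part` are hypothetical names for this example.

```python
def preallocate(path, size):
    """Create (or overwrite) path as a file of exactly size bytes."""
    with open(path, "wb") as f:
        f.truncate(size)


def write_part(path, offset, data):
    """Write data into an existing file starting at the given byte offset."""
    # "r+b" opens for reading and writing without truncating the file.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
```

Parts can arrive in any order, since every write lands at its own offset.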

Out of interest, what is the reason behind splitting the file?

1 Comment

It's a remotely hosted file that I need to access, not one on my system.
