
I'm using Requests to download a file (several gigabytes) from a server. To provide progress updates (and to prevent the entire file from having to be stored in memory) I've set stream=True and I write the download to a file:

    import requests

    completed_bytes = 0
    with open('output', 'wb') as f:  # binary mode so chunks are written verbatim
        response = requests.get(url, stream=True)
        if not response.ok:
            print 'There was an error'
            exit()
        for block in response.iter_content(1024 * 100):
            f.write(block)
            completed_bytes += len(block)
            write_progress(completed_bytes, total_bytes)

However, at some random point in the download, Requests throws a ChunkedEncodingError. I've gone into the source and found that this corresponds to an IncompleteRead exception. I inserted a log statement around those lines and found that e.partial = "\r". I know that the server gives the downloads low priority and I suspect that this exception occurs when the server waits too long to send the next chunk.

As expected, the exception stops the download. Unfortunately, the server does not implement HTTP/1.1 content ranges, so I cannot simply resume it. I've played around with increasing urllib3's internal timeout, but the exception persists.

Is there any way to make the underlying urllib3 (or Requests) more tolerant of these empty (or late) chunks so that the file can completely download?
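
(For what it's worth, the exception surfaces from the iter_content loop, so it can be caught there; a sketch like the one below at least reports how far the download got, but it doesn't give me a complete file:)

    # (continuing from the snippet above, inside the same `with` block)
    try:
        for block in response.iter_content(1024 * 100):
            f.write(block)
            completed_bytes += len(block)
            write_progress(completed_bytes, total_bytes)
    except requests.exceptions.ChunkedEncodingError:
        # The partial file stays on disk, but without range support there
        # is no way to ask the server for the remaining bytes.
        print 'Download aborted after %d bytes' % completed_bytes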

  • What platform are you on? Might I suggest a tool specialized for this kind of download, such as curl, that you can call through the shell? Commented Sep 15, 2016 at 0:14
  • Can you try setting a longer timeout on the get? The timeout kwarg should work with stream=True as of 2.3 (see github.com/kennethreitz/requests/issues/1803); there's a sketch after these comments. I would also verify that your content-type and encoding headers match what you expect, to make sure the stream isn't being truncated. Commented Sep 29, 2016 at 2:58
  • Have you tried with a smaller block? Seems like I've always used 1024 or 2048. Commented Jan 19, 2017 at 22:39
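
(For illustration, the timeout suggestion above would look roughly like the sketch below. The values are placeholders, and the tuple form, with separate connect and read timeouts, needs a reasonably recent Requests release; the read timeout is the one that fires while waiting for the next chunk of a streamed response.)

    import requests

    # The second value is the read timeout, which applies to each wait for
    # data on the socket (i.e. the gap between chunks when streaming), not
    # to the download as a whole. Both numbers are guesses.
    response = requests.get(url, stream=True, timeout=(10, 300))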

1 Answer

    import httplib

    def patch_http_response_read(func):
        def inner(*args):
            try:
                return func(*args)
            except httplib.IncompleteRead, e:
                # Return whatever the server sent before it dropped the
                # connection, instead of raising.
                return e.partial
        return inner

    httplib.HTTPResponse.read = patch_http_response_read(httplib.HTTPResponse.read)

I cannot reproduce your problem right now, but I think this patch could help. It lets you deal with defective HTTP servers.

Most bad servers transmit all of the data, but due to implementation errors they close the session incorrectly, so httplib raises an error and buries your precious bytes.
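
If you are on Python 3, httplib is now http.client, so the equivalent patch would look roughly like this (untested, but Requests reads through http.client.HTTPResponse under the hood, so it should take effect the same way):

    import http.client

    def patch_http_response_read(func):
        def inner(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except http.client.IncompleteRead as e:
                # Keep whatever bytes did arrive instead of raising.
                return e.partial
        return inner

    http.client.HTTPResponse.read = patch_http_response_read(http.client.HTTPResponse.read)

Keep in mind this is a blunt instrument: it silences IncompleteRead for every HTTP response in the process, not just this one download.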



