I am using requests to download files. For large files I currently have to check the size of the file on disk to see how far the download has got, because I can't display the progress as a percentage. I would also like to know the download speed. How can I do this? Here's my code:

import requests
import sys
import time
import os

def downloadFile(url, directory):
    localFilename = url.split('/')[-1]
    r = requests.get(url, stream=True)

    start = time.clock()
    f = open(directory + '/' + localFilename, 'wb')
    for chunk in r.iter_content(chunk_size=512 * 1024):
        if chunk:  # skip empty keep-alive chunks
            f.write(chunk)
            f.flush()
            os.fsync(f.fileno())
    f.close()
    return (time.clock() - start)

def main():
    if len(sys.argv) > 1:
        url = sys.argv[1]
    else:
        url = raw_input("Enter the URL : ")
    directory = raw_input("Where would you want to save the file ?")

    time_elapsed = downloadFile(url, directory)
    print "Download complete..."
    # time_elapsed is a float, so convert it before concatenating
    print "Time Elapsed: " + str(time_elapsed)

if __name__ == "__main__":
    main()

I think one way to do it would be to re-read the file on every iteration of the for loop and calculate the percentage from the Content-Length header. But that would again be a problem for large files (around 500 MB). Is there any other way to do it?
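One way to avoid re-reading the file is to count the bytes as they pass through the loop. The sketch below is only an illustration of that idea: `ProgressTracker` and its method names are hypothetical, not part of requests, and the injectable `clock` argument is an assumption made so the class can be exercised without a real download.

```python
import time


class ProgressTracker:
    """Count received bytes in memory so the file on disk is never re-read."""

    def __init__(self, total_bytes=None, clock=time.perf_counter):
        # total_bytes is None when the server sends no Content-Length header
        self.total_bytes = total_bytes
        self.received = 0
        self._clock = clock
        self._start = clock()

    def update(self, chunk_len):
        """Record that chunk_len more bytes have arrived."""
        self.received += chunk_len

    def percent(self):
        """Return progress in percent, or None if the total size is unknown."""
        if not self.total_bytes:
            return None
        return 100.0 * self.received / self.total_bytes

    def speed(self):
        """Return the average speed since creation, in bytes per second."""
        elapsed = self._clock() - self._start
        return self.received / elapsed if elapsed > 0 else 0.0
```

Inside the download loop this would amount to calling `tracker.update(len(chunk))` after each `f.write(chunk)`, with `total_bytes` taken from `int(r.headers['content-length'])` when that header is present.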

3 Answers

27

See here: Python progress bar and downloads

I think the code would be something like this. It should show the average speed since the start, in bytes per second:

import requests
import sys
import time

def downloadFile(url, directory):
    localFilename = url.split('/')[-1]
    with open(directory + '/' + localFilename, 'wb') as f:
        start = time.clock()
        r = requests.get(url, stream=True)
        total_length = r.headers.get('content-length')
        dl = 0
        if total_length is None:  # no content length header
            f.write(r.content)
        else:
            total_length = int(total_length)  # header values are strings
            for chunk in r.iter_content(1024):
                dl += len(chunk)
                f.write(chunk)
                done = int(50 * dl / total_length)  # 50 = width of the bar
                # "bps" here means bytes per second, averaged since start
                sys.stdout.write("\r[%s%s] %s bps" % ('=' * done, ' ' * (50 - done), dl // (time.clock() - start)))
                sys.stdout.flush()
            print ''
    return (time.clock() - start)

def main():
    if len(sys.argv) > 1:
        url = sys.argv[1]
    else:
        url = raw_input("Enter the URL : ")
    directory = raw_input("Where would you want to save the file ?")

    time_elapsed = downloadFile(url, directory)
    print "Download complete..."
    # time_elapsed is a float, so convert it before concatenating
    print "Time Elapsed: " + str(time_elapsed)

if __name__ == "__main__":
    main()

4 Comments

This code looks good, but IMO it won't show the download progress dynamically: requests.get(...) will download the entire file before it returns.
@sonukumar, notice the stream parameter in the get call: requests.get(url, stream=True). Check out the documentation.
@freeforalltousez What's the meaning of multiplying by 50 when calculating the downloaded percentage?
@Juancho it's the length of the progress bar. See the linked answer.
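To make the role of the 50 concrete, here is a small sketch of the bar-drawing step on its own. The function name `render_bar` and the `width` parameter are hypothetical; the answer's code inlines the same arithmetic with a fixed width of 50.

```python
def render_bar(downloaded, total, width=50):
    """Return a text bar such as '[=====     ]' for downloaded/total bytes."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Scale the fraction downloaded/total to a number of '=' characters,
    # capped at width in case more bytes arrive than were announced.
    filled = min(width, width * downloaded // total)
    return "[" + "=" * filled + " " * (width - filled) + "]"
```

With `width=50`, each `=` stands for 2% of the file, so the multiplier only sets how many characters wide the bar is.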
9

An improved version of the accepted answer for Python 3, using io.BytesIO (writes to memory), with the result in Mbps and support for IPv4/IPv6, size, and port arguments.

import sys, time, io, requests

def speed_test(size=5, ipv="ipv4", port=80):
    # Build the test file name: "1GB" for 1024, otherwise "<size>MB"
    if size == 1024:
        size = "1GB"
    else:
        size = f"{size}MB"

    url = f"http://{ipv}.download.thinkbroadband.com:{port}/{size}.zip"

    with io.BytesIO() as f:
        start = time.perf_counter()
        r = requests.get(url, stream=True)
        total_length = r.headers.get('content-length')
        dl = 0
        if total_length is None:  # no content length header
            f.write(r.content)
        else:
            for chunk in r.iter_content(1024):
                dl += len(chunk)
                f.write(chunk)
                done = int(30 * dl / int(total_length))  # 30 = width of the bar
                # This prints bytes per second divided by 100000; an exact
                # megabit figure would be bytes per second * 8 / 1000000
                sys.stdout.write("\r[%s%s] %s Mbps" % ('=' * done, ' ' * (30 - done), dl // (time.perf_counter() - start) / 100000))

    print(f"\n{size} = {(time.perf_counter() - start):.2f} seconds")

Usage Examples:

speed_test()
speed_test(10)
speed_test(50, "ipv6")
speed_test(1024, port=8080)

Output Sample:

[==============================] 61.34037 Mbps
100MB = 17.10 seconds

Available Options:

size: 5, 10, 20, 50, 100, 200, 512, 1024

ipv: ipv4, ipv6

port: 80, 81, 8080


Updated on 2022-10-11:

  • time.perf_counter() replaced time.clock(), which was deprecated in Python 3.3 and has since been removed (kudos to shiro)
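For readers who want the speed in megabits per second in the strict sense, the conversion could look like the sketch below. Both helper names are hypothetical, and "megabit" is assumed to mean 10**6 bits; the answer's own code instead divides bytes per second by 100000.

```python
import time


def megabits_per_second(num_bytes, seconds):
    """Convert a byte count and a duration to megabits per second (10**6 bits)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes * 8 / 1_000_000 / seconds


def timed(func, *args, clock=time.perf_counter):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = clock()
    result = func(*args)
    return result, clock() - start
```

Under these assumptions, and if the 100MB test file is taken to be 104857600 bytes, the 17.10 seconds in the sample output would correspond to roughly 49 Mbps.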

2 Comments

The function time.clock() has been removed, after having been deprecated since Python 3.3: use time.perf_counter() in above solution code.
Answer updated, thanks.
0

I had a problem downloading a big file from one particular slow server:

  1. no Content-Length header,
  2. big file (42 GB),
  3. no compression,
  4. slow server (<1 MB/s).

Because the file is so big, I also had a problem with memory usage during the request. Requests doesn't write its output to a file the way urllib does; it looks like it keeps it in memory.

Without a Content-Length header, the accepted answer cannot monitor progress.

So I wrote this basic method to monitor the speed during the CSV download, following just the requests documentation.

It needs an fname (complete output path) and a link (http or https), and you can specify custom headers.

import time
import requests

# fname, link and headers must be defined by the caller
BLOCK = 5 * 1024 * 1024
try:
    with open(fname, 'wb') as f:
        r = requests.get(link, headers=headers, stream=True)
        ## This is because the official documentation suggests it, saying it is
        ## more reliable than looping directly over iter_lines, so no data is lost
        lines = r.iter_lines()
        ## Init the base vars, for monitoring and block management
        ## obj is a bytearray, because iter_lines returns bytes objects
        tsize = 0; obj = bytearray(); t0 = time.time(); i = 0
        for line in lines:
            ## iter_lines strips the line ending, so add a newline back
            ## (assumes the source file uses "\n" line endings)
            line += b'\n'
            ## Calculate the line size, in bytes, and add it to the byte object
            tsize += len(line)
            obj.extend(line)
            ## When the block size is reached,
            if tsize > BLOCK:
                ## Increment the block number
                i += 1
                ## Calculate the speed; this is in MB/s,
                ## but you can easily change it to KB/s or blocks/s
                t1 = time.time()
                t = t1 - t0
                speed = round(5 / t, 2)
                ## Write the block to the file
                f.write(obj)
                ## Write stats
                print('got', i * 5, 'MB ', 'block', i, ' @', speed, 'MB/s')
                ## Reinit all the base vars, for a new block
                obj = bytearray(); tsize = 0; t0 = time.time()
        ## Write the last partial block to the file
        f.write(obj)
except Exception as e:
    print("Error: ", e, 0)

1 Comment

I don't follow your arguments for iter_lines: 1) "because the official documentation suggests it": where? 2) "so no data is lost": in which way? The only thing I read is that it is not reentrant safe.
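As an alternative that avoids line splitting altogether, the same per-block speed report can be built on raw byte chunks. This is a minimal sketch, not the answer author's method: `copy_with_speed` and its `clock` and `report` parameters are hypothetical names, introduced so the logic can be tried without a network connection.

```python
import time


def copy_with_speed(chunks, out, block_size=5 * 1024 * 1024,
                    clock=time.perf_counter, report=print):
    """Write byte chunks to `out`, reporting speed every `block_size` bytes.

    Returns the total number of bytes written. No Content-Length is needed,
    and only one chunk is held in memory at a time.
    """
    mib = 1024 * 1024
    total = 0
    block_bytes = 0
    block_start = clock()
    for chunk in chunks:
        if not chunk:  # skip empty keep-alive chunks
            continue
        out.write(chunk)
        total += len(chunk)
        block_bytes += len(chunk)
        if block_bytes >= block_size:
            elapsed = clock() - block_start
            # Use the bytes actually received, not a nominal block size
            speed = block_bytes / mib / elapsed if elapsed > 0 else 0.0
            report("got %.1f MB @ %.2f MB/s" % (total / mib, speed))
            block_bytes = 0
            block_start = clock()
    return total
```

With requests, this could be driven by something like `copy_with_speed(r.iter_content(chunk_size=64 * 1024), f)` on a response opened with `stream=True`; because the bytes are written exactly as received, line endings are preserved.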
