
I'm uploading a large file (about 2 GB) to an API that accepts the POST method, using Python's requests module. This loads the whole file into memory first and increases memory usage significantly. I believe there must be some way to stream the file to the API without burdening memory. Any suggestions?

P.S.
This old way worked for me, but consumed too much memory:

```python
file = {'file': open(path, 'rb')}
requests.post(url, files=file)
```

The streaming way below shows no memory spike, but the server returns code 400:

```python
requests.post(url, data=open(path, 'rb'))
```
  • Does this answer help? stackoverflow.com/a/29811518/202168 Commented Jul 8, 2022 at 12:23
  • @Anentropic Please see my latest edit just now; it should help clarify the issue. Thank you! Commented Jul 8, 2022 at 13:26

2 Answers


Any suggestions?

Use a streaming upload, as the docs put it:

Requests supports streaming uploads, which allow you to send large streams or files without reading them into memory. To stream and upload, simply provide a file-like object for your body:

```python
with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)
```
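If you need to produce the body incrementally yourself, requests also accepts a generator for data and then sends the body with chunked transfer encoding. A minimal sketch (the URL is a placeholder, and this assumes the server accepts a raw request body):

```python
def read_in_chunks(path, chunk_size=1024 * 1024):
    """Yield a file in fixed-size chunks so only one chunk is in memory at a time."""
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

# Passing a generator makes requests use chunked transfer encoding:
# requests.post('http://some.url/streamed', data=read_in_chunks(path))
```

Either way, only a small buffer is held in memory at any moment instead of the whole 2 GB file.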

2 Comments

Now I know where the memory issue lies: I put the file object in a dict, file = {'file': open(path, 'rb')}, and then posted it with requests.post(url, files=file). If I put the file object directly into the post data as you wrote, I run into no issue. Thank you!
I'm sorry, but if I don't upload the file the file = {'file': open(path, 'rb')} way, the server responds with code 400. I have updated my question to reflect this feedback.

When you pass the files arg, the requests lib makes a multipart form upload, i.e. it is like submitting a form where the file is passed as a named field (file in your example).

I suspect the problem you saw is that when you pass a file object as the data arg, as suggested in the docs here https://requests.readthedocs.io/en/latest/user/advanced/#streaming-uploads, requests does a streaming upload, but the file content is used as the whole HTTP POST body.

So I think the server at the other end is expecting a form with a file field, but we're just sending the binary content of the file by itself.

What we need is some way to wrap the content of the file with the right "envelope" as we send it to the server, so that it can recognise the data we are sending.
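You can see the difference concretely by preparing (without sending) both request styles and inspecting the Content-Type header. The URL and body here are placeholders for illustration:

```python
import requests

# Compare what the two upload styles would put on the wire.
raw = requests.Request('POST', 'http://example.com/upload',
                       data=b'raw file bytes').prepare()
form = requests.Request('POST', 'http://example.com/upload',
                        files={'file': ('name.bin', b'raw file bytes')}).prepare()

# Raw bytes get no Content-Type envelope; files gets multipart/form-data
# with a boundary, which is what a form-expecting server looks for.
print(raw.headers.get('Content-Type'))
print(form.headers.get('Content-Type'))
```

The multipart body wraps the file content between boundary markers and includes the field name and filename, which is the "envelope" the server wants.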

See this issue where others have noted the same problem: https://github.com/psf/requests/issues/1584

I think the best suggestion from there is to use this additional lib, which provides streaming multipart form file upload: https://github.com/requests/toolbelt#multipartform-data-encoder

For example:

```python
from requests_toolbelt import MultipartEncoder
import requests

encoder = MultipartEncoder(
    fields={'file': ('myfilename.xyz', open(path, 'rb'), 'text/plain')}
)
response = requests.post(
    url,
    data=encoder,
    headers={'Content-Type': encoder.content_type}
)
```

1 Comment

Yes, eventually I found the same lib as yours and tested it out. It worked like a charm! This seemingly simple question led me on a long and complex journey. Thank you!
