
Possible Duplicate:
Multiple (asynchronous) connections with urllib2 or other http library?

I am working on a Linux web server that runs Python code to grab real-time data over HTTP from a third-party API. The data is put into a MySQL database. I need to make a lot of queries to a lot of URLs, and I need to do it fast (faster = better). Currently I'm using urllib3 as my HTTP library. What is the best way to go about this? Should I spawn multiple threads (if so, how many?) and have each query a different URL? I would love to hear your thoughts about this - thanks!
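For reference, here is a minimal thread-pool sketch using only the standard library; `fetch_all` and its defaults are illustrative, and `fetch` is a stand-in for whatever urllib3/requests call you actually make:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_all(urls, fetch, max_workers=20):
    """Run fetch(url) for every URL on a thread pool.

    Returns {url: result} for successes and {url: exception} for
    failures, so one bad URL can't kill the whole batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, u): u for u in urls}
        for fut in as_completed(futures):
            url = futures[fut]
            try:
                results[url] = fut.result()
            except Exception as exc:
                results[url] = exc
    return results
```

Since the threads spend most of their time waiting on the network, a few dozen workers is a reasonable starting point; tune `max_workers` against what the remote service tolerates.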

  • There is a new answer that I can't add because this question was closed. The best way to do this today is using requests-futures github.com/ross/requests-futures Commented Jun 22, 2018 at 18:05

3 Answers


If "a lot" really is a lot, then you probably want to use asynchronous I/O, not threads.

requests + gevent = grequests

GRequests allows you to use Requests with Gevent to make asynchronous HTTP Requests easily.

import grequests

urls = [
    'http://www.heroku.com',
    'http://tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://kennethreitz.com'
]

rs = (grequests.get(u) for u in urls)
grequests.map(rs)

7 Comments

I want to use this method for sending requests to about 50,000 urls. Is it a good strategy? Also, what about exceptions like timeout etc?
@John Yes, it is. As to exceptions see safe_mode parameter and issue 953
I can't send more than 30 requests using grequests. When I do, I get "Max retries exceeded with url: ..., Too many open files". Is there any way to fix this problem?
Word of warning: grequests seems to be abandoned, and does not have error handling. My personal recommendation is github.com/ross/requests-futures , which is equally fast and, with backports, also works on 2.7.
@droope it doesn't look like grequests is abandoned, and it seems easier to run on python_ver < 3.4. Do you have a link to the backports package you're talking about? This is the most popular package I see: pypi.python.org/pypi/backports.ssl_match_hostname

You should use multithreading as well as pipelining of requests, e.g. search -> details -> save.

The number of threads you can use doesn't depend only on your hardware. How many requests can the service serve? How many concurrent requests does it allow? Even your bandwidth can be a bottleneck.

If you're talking about some kind of scraping, the service could block you after a certain number of requests, so you may need to use proxies or multiple IP bindings.

In my experience, in most cases I can run 50-300 concurrent requests on my laptop from Python scripts.
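A minimal sketch of such a pipeline with standard-library queues; the stage functions here are placeholders (a real `search` would issue the HTTP request and a real `save` would write to MySQL), and the per-stage worker counts are the knobs to experiment with:

```python
import queue
import threading

SENTINEL = object()  # signals a stage to shut down

def stage(in_q, out_q, work, n_workers):
    """Start n_workers threads that apply `work` to items from in_q
    and push results to out_q (if there is a next stage)."""
    def run():
        while True:
            item = in_q.get()
            if item is SENTINEL:
                in_q.put(SENTINEL)  # let sibling workers see it too
                break
            result = work(item)
            if out_q is not None:
                out_q.put(result)
    threads = [threading.Thread(target=run) for _ in range(n_workers)]
    for t in threads:
        t.start()
    return threads

# search -> save, with more threads on the slower stage
search_q, details_q = queue.Queue(), queue.Queue()
saved = []
save_lock = threading.Lock()

def search(term):      # placeholder: would issue the search request
    return term + "-id"

def save(detail):      # placeholder: would INSERT into MySQL
    with save_lock:
        saved.append(detail)

t1 = stage(search_q, details_q, search, n_workers=4)
t2 = stage(details_q, None, save, n_workers=2)

for term in ["a", "b", "c"]:
    search_q.put(term)
search_q.put(SENTINEL)
for t in t1:
    t.join()
details_q.put(SENTINEL)
for t in t2:
    t.join()
```

Because each stage has its own pool, a slow stage (usually the network-bound one) can be given more workers without over-threading the rest.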

3 Comments

Agree with Polscha, here. Most of the time, when you're making HTTP requests to an arbitrary service, most of the (clock) time expended is in waiting for the network and the remote service to respond. So, within reason, the more threads the better, as at any given moment most of those threads will just be in wait queues. Definitely heed Polscha's notes on service throttling.
thanks guys - the service is commercial and we are paying for it. it is very fast and will not be the bottleneck. in this case, what would be the best option?
@user1094786 In this case, just try to build a pipeline of requests and experiment with the number of threads on each stage. Just try; sooner or later you'll find the upper limit :-)

Sounds like an excellent application for Twisted. Here are some web-related examples, including how to download a web page. Here is a related question on database connections with Twisted.

Note that Twisted does not rely on threads for doing multiple things at once. Rather, it takes a cooperative multitasking approach: your main script starts the reactor, and the reactor calls functions that you set up. Your functions must return control to the reactor before the reactor can continue working.
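The same cooperative model is what the standard library's asyncio offers today, so here is a rough sketch of the idea using asyncio rather than Twisted itself; the coroutine names are illustrative, and a real version would await an async HTTP client instead of simulated work:

```python
import asyncio

async def grab(url):
    # Simulate waiting on the network; while this coroutine waits,
    # the event loop (the "reactor") runs the others.
    await asyncio.sleep(0.01)
    return url, "ok"

async def main(urls):
    # Control returns to the event loop at every `await`, so all
    # downloads proceed concurrently on a single thread.
    return await asyncio.gather(*(grab(u) for u in urls))

results = asyncio.run(main(["http://a", "http://b"]))
```

As with Twisted, a function that blocks (e.g. a synchronous MySQL write) stalls the whole loop, which is why Twisted pairs the reactor with non-blocking database adapters.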


