19

For this code:

    import sys

    import gevent
    from gevent import monkey
    monkey.patch_all()

    import requests
    import urllib2


    def worker(url, use_urllib2=False):
        if use_urllib2:
            content = urllib2.urlopen(url).read().lower()
        else:
            content = requests.get(url, prefetch=True).content.lower()
        title = content.split('<title>')[1].split('</title>')[0].strip()


    urls = ['http://www.mail.ru']*5


    def by_requests():
        jobs = [gevent.spawn(worker, url) for url in urls]
        gevent.joinall(jobs)


    def by_urllib2():
        jobs = [gevent.spawn(worker, url, True) for url in urls]
        gevent.joinall(jobs)


    if __name__=='__main__':
        from timeit import Timer
        t = Timer(stmt="by_requests()", setup="from __main__ import by_requests")
        print 'by requests: %s seconds'%t.timeit(number=3)
        t = Timer(stmt="by_urllib2()", setup="from __main__ import by_urllib2")
        print 'by urllib2: %s seconds'%t.timeit(number=3)
        sys.exit(0)

I get this result:

    by requests: 18.3397213892 seconds
    by urllib2: 2.48605842363 seconds

In a sniffer it looks like this:

Description: the first 5 requests are sent by the requests library, the next 5 by the urllib2 library. Red marks the time when work was frozen, dark marks the time when data was being received... wtf?!

How is this possible if the socket library is patched and the two libraries should behave identically? How can I use requests for asynchronous work without requests.async?

3
  • Can you explain your problem a little further? Why do you not want to use the requests.async module? Commented Mar 1, 2012 at 6:12
  • Requests doesn't work asynchronously. Why? I don't want to use requests.async because its interface is awkward to use and it doesn't work asynchronously either. Look at the image: it shows how requests and urllib2 behave. Commented Mar 1, 2012 at 12:16
  • See stackoverflow.com/questions/9110593/… and github.com/kennethreitz/grequests. Commented Feb 8, 2013 at 12:34

5 Answers

15

Sorry, Kenneth Reitz. His library is wonderful.

I was being stupid. I needed to explicitly enable the monkey patch for httplib, like this:

gevent.monkey.patch_all(httplib=True) 

This is necessary because the patch for httplib is disabled by default.


2 Comments

Not valid anymore: ValueError: gevent.httplib is no longer provided, httplib must be False
Use grequests (by @KennethReitz). It mostly overrides the main verbs, and inherits the rest.
7

As Kenneth pointed out, another thing we can do is let the requests module handle the asynchronous part. I've changed your code accordingly. Again, for me, the results consistently show that the requests module performs better than urllib2.

Doing this means that we cannot "thread" the call back part. But that should be okay, because the major gain should only be expected with the HTTP requests due to the request/response delay.

    import sys

    import gevent
    from gevent import monkey
    monkey.patch_all()

    import requests
    from requests import async
    import urllib2


    def call_back(resp):
        content = resp.content
        title = content.split('<title>')[1].split('</title>')[0].strip()
        return title


    def worker(url, use_urllib2=False):
        if use_urllib2:
            content = urllib2.urlopen(url).read().lower()
            title = content.split('<title>')[1].split('</title>')[0].strip()
        else:
            rs = [async.get(u) for u in url]
            resps = async.map(rs)
            for resp in resps:
                call_back(resp)


    urls = ['http://www.mail.ru']*5


    def by_requests():
        worker(urls)


    def by_urllib2():
        jobs = [gevent.spawn(worker, url, True) for url in urls]
        gevent.joinall(jobs)


    if __name__=='__main__':
        from timeit import Timer
        t = Timer(stmt="by_requests()", setup="from __main__ import by_requests")
        print 'by requests: %s seconds'%t.timeit(number=3)
        t = Timer(stmt="by_urllib2()", setup="from __main__ import by_urllib2")
        print 'by urllib2: %s seconds'%t.timeit(number=3)
        sys.exit(0)

Here's one of my results:

    by requests: 2.44117593765 seconds
    by urllib2: 4.41298294067 seconds
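The point made above, that the gain comes from overlapping the request/response delay, can be illustrated without any network access. This is only a minimal sketch, not the code that was timed in this answer: `fake_fetch` and its 0.2-second `DELAY` are hypothetical stand-ins for an HTTP request, and a standard-library thread pool stands in for gevent's greenlets.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.2  # hypothetical per-request latency, in seconds


def fake_fetch(url):
    """Stand-in for an HTTP request: waits, then returns canned HTML."""
    time.sleep(DELAY)  # simulates the request/response delay
    return '<html><title> Page for %s </title></html>' % url


def run_serial(urls):
    """Fetch one URL after another; return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [fake_fetch(u) for u in urls]
    return results, time.perf_counter() - start


def run_concurrent(urls):
    """Fetch all URLs at once in a thread pool; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        results = list(pool.map(fake_fetch, urls))
    return results, time.perf_counter() - start


if __name__ == '__main__':
    urls = ['http://www.mail.ru'] * 5
    _, serial = run_serial(urls)
    _, concurrent = run_concurrent(urls)
    print('serial: %.2f s, concurrent: %.2f s' % (serial, concurrent))
```

If a supposedly concurrent run takes about as long as the serial one, the requests are most likely being executed one after another, which is what the timings in the question suggest.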

4 Comments

Hi! Thanks for your interest in my problem. I ran your code and updated the picture in the original post to show how it behaves.
These are the results: by requests: 25.532893147 seconds, by urllib2: 9.65230888283 seconds
I am afraid I can't replicate your problem. I upgraded gevent to 0.13.6 and tried it on two different machines, and the requests module was working asynchronously. FYI, I tried it on Ubuntu 11.04 and 11.10.
I tested the code on a different machine running Linux, and it works asynchronously there. I gather the problem is in the Windows build of gevent. I managed to rebuild the gevent and greenlet libraries, but that did not help. I don't know what to do next.
6

Requests has gevent support integrated into the codebase:

http://docs.python-requests.org/en/latest/user/advanced/#asynchronous-requests

3 Comments

Sorry, but that does not work asynchronously either. How is that possible?
Thank you very much for your interest in my problem. It is an honor that the author of this library has answered my question.
"Sorry this page doesn't exist yet"
2

I ran your code on my machine (Python 2.7.1, gevent 0.13.0, requests 0.10.6). The run using the requests module was always a good second or two faster. What versions are you using? An upgrade might simply solve the issue for you.

    by requests: 3.7847161293 seconds
    by urllib2: 4.92611193657 seconds
    by requests: 2.90777993202 seconds
    by urllib2: 7.99798607826 seconds

2 Comments

I am using these versions: python 2.7.2.5, gevent 0.13.6, requests 0.10.6
Your versions are even newer, so that's really weird. I've posted another answer that might help you.
2

From the Requests documentation, "Blocking Or Non-Blocking":

If you are concerned about the use of blocking IO, there are lots of projects out there that combine Requests with one of Python's asynchronicity frameworks. Two excellent examples are grequests and requests-futures.
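As a rough illustration of the futures-based approach, here is a minimal sketch that uses only the standard library's `concurrent.futures`, the kind of executor that requests-futures builds on. It is not the requests-futures or grequests API. The `fetch` stub, the `PAGES` table, and the example.com URLs are hypothetical and exist only so the example runs offline; in real use `fetch` would wrap a blocking HTTP call.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical canned responses, so the sketch needs no network access.
PAGES = {
    'http://example.com/a': '<html><head><TITLE> Page A </TITLE></head></html>',
    'http://example.com/b': '<html><head><title>Page B</title></head></html>',
    'http://example.com/c': '<html><body>no title here</body></html>',
}


def fetch(url):
    """Stub for a blocking HTTP call; replace with a real client in practice."""
    return PAGES[url]


def extract_title(html):
    """Return the lowercased <title> text, or None if the page has none."""
    content = html.lower()
    start = content.find('<title>')
    end = content.find('</title>')
    if start == -1 or end == -1 or end < start:
        return None
    return content[start + len('<title>'):end].strip()


def fetch_titles(urls, max_workers=5):
    """Fetch all URLs in worker threads and map each URL to its title."""
    titles = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every fetch up front, then collect results as they finish.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            titles[futures[future]] = extract_title(future.result())
    return titles


if __name__ == '__main__':
    for url, title in sorted(fetch_titles(list(PAGES)).items()):
        print('%s -> %r' % (url, title))
```

Unlike the `split('<title>')[1]` expression in the question, `extract_title` returns `None` for a page without a title instead of raising `IndexError`.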
