I'm working with a process which is basically as follows:

  1. Take some list of urls.
  2. Get a Response object from each.
  3. Create a BeautifulSoup object from the text of each Response.
  4. Pull the text of a certain tag from that BeautifulSoup object.
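
Steps 3 and 4 can be sketched without any network access. This is only a rough illustration: it swaps BeautifulSoup for the standard library's `html.parser` so that it runs with no third-party packages, and `TagTextExtractor`, `tag_text`, and `texts_from_responses` are hypothetical names, not part of requests or BeautifulSoup. A `None` entry stands for a request that failed.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text inside the first occurrence of one tag (hypothetical helper)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # > 0 while we are inside the target tag
        self.done = False   # True once the first match has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and not self.done:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and not self.done:
            self.parts.append(data)


def tag_text(html, tag):
    """Return the stripped text of the first <tag> in html, or None if absent."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() if parser.done else None


def texts_from_responses(responses, tag):
    """Map each response-like object (anything with .text) to its tag text.

    None entries (failed requests) are passed through as None.
    """
    return [tag_text(r.text, tag) if r is not None else None for r in responses]
```

Passing the `None` entries through keeps the output aligned with the input list of URLs, which makes it easy to see which tickers failed.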

From my understanding, this seems ideal for grequests:

GRequests allows you to use Requests with Gevent to make asynchronous HTTP Requests easily.

Yet the two approaches (one with requests, one with grequests) give me different results: some of the requests made through grequests return None rather than a response.

Using requests

    import requests

    tickers = [
        'A', 'AAL', 'AAP', 'AAPL', 'ABBV', 'ABC', 'ABT', 'ACN', 'ADBE', 'ADI',
        'ADM', 'ADP', 'ADS', 'ADSK', 'AEE', 'AEP', 'AES', 'AET', 'AFL', 'AGN',
        'AIG', 'AIV', 'AIZ', 'AJG', 'AKAM', 'ALB', 'ALGN', 'ALK', 'ALL', 'ALLE',
    ]

    BASE = 'https://finance.google.com/finance?q={}'

    rs = (requests.get(u) for u in [BASE.format(t) for t in tickers])
    rs = list(rs)

    rs
    # [<Response [200]>,
    #  <Response [200]>,
    #  <Response [200]>,
    #  <Response [200]>,
    #  <Response [200]>,
    #  <Response [200]>,
    #  ...
    #  <Response [200]>]

    # All are okay (status_code == 200)

Using grequests

    # Restarted my interpreter and redefined `tickers` and `BASE`
    import grequests

    rs = (grequests.get(u) for u in [BASE.format(t) for t in tickers])
    rs = grequests.map(rs)

    rs
    # [None,
    #  <Response [200]>,
    #  None,
    #  None,
    #  None,
    #  None,
    #  None,
    #  None,
    #  None,
    #  None,
    #  None,
    #  None,
    #  None,
    #  None,
    #  None,
    #  None,
    #  None,
    #  None,
    #  <Response [200]>,
    #  <Response [200]>,
    #  <Response [200]>,
    #  <Response [200]>,
    #  <Response [200]>,
    #  <Response [200]>,
    #  <Response [200]>,
    #  <Response [200]>,
    #  <Response [200]>,
    #  <Response [200]>,
    #  <Response [200]>,
    #  <Response [200]>]

Why the difference in results?

Update: I can print the exception type as follows. Related discussion here but I have no idea what's going on.

    def exception_handler(request, exception):
        print(exception)

    rs = grequests.map(rs, exception_handler=exception_handler)
    # ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
    # ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
    # ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
    # ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
    # ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
    # ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
    # ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
    # ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
    # ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
    # ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
    # ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
    # ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
    # ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
    # ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
    # ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)

System/version info

  • requests: 2.18.4
  • grequests: 0.3.0
  • Python: 3.6.3
  • urllib3: 1.22
  • pyopenssl: 17.2.0
  • All via Anaconda
  • System: same issue on both Mac OSX HS & Windows 10, build 10.0.16299
  • 1
    If you look at the README, it suggests that failed requests result in a None. I'm guessing that Google is getting angry when you make too many unauthenticated requests all at once. Slightly further down, the README describes how to write an exception handler that would tell you what's going on. Commented Sep 13, 2017 at 19:35
  • 1
    Print the exception, rather than a fixed string Commented Sep 13, 2017 at 20:03
  • 1
    If it's a system thing, you may need to include more information like your OS and its version, Python version/build, and versions of requests, grequests, urllib3, PyOpenSSL (if installed). Sounds more like a bug report then... Commented Sep 13, 2017 at 20:30
  • 3
    you could try to limit gevent concurrency with grequests.map(rs, size=2) Commented Dec 7, 2017 at 17:37
  • 5
    I see this comment on the github site: "Note: You should probably use requests-threads or requests-futures instead." Also, the last code update appears to be 2 years ago. Commented Dec 7, 2017 at 18:09

2 Answers

10
+50

You are simply sending requests too fast. Because grequests is an async library, all of these requests are sent almost simultaneously, and that is too many at once.

You just need to limit the number of concurrent tasks with grequests.map(rs, size=your_choice). I have tested grequests.map(rs, size=10) and it works well.
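
To illustrate what capping concurrency means, here is a minimal sketch that uses the standard library's `ThreadPoolExecutor` as a stand-in for the gevent pool; it is not how grequests is implemented. `fetch_all` and its `fetch` parameter are hypothetical names: `fetch` is any callable that takes a URL (for example `requests.get`), and a failed call yields `None`, similar to what the question shows for failed requests.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, size=10):
    """Call fetch(url) for every URL with at most `size` calls in flight.

    Results keep the order of `urls`; a failed call is reported and
    replaced by None.
    """
    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception as exc:  # broad on purpose: this is only a sketch
            print(f"{url}: {exc!r}")
            return None

    # max_workers is the cap on simultaneous calls, playing the role of `size`.
    with ThreadPoolExecutor(max_workers=size) as pool:
        return list(pool.map(safe_fetch, urls))
```

Assuming requests is installed, something like `fetch_all([BASE.format(t) for t in tickers], requests.get, size=10)` would be the equivalent call; the right value for `size` depends on what the server tolerates.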

2 Comments

What does "too fast" mean? Where is the bottleneck, what is the limitation? Is it measurable or can it be optimized for? Why do you figure size=10 is optimal for your machine, and how do you find the value on other machines?
"Too fast" is from the server's point of view: the server doesn't want to accept so many requests from one client, because that could crash it. You reduce the speed out of respect for the server, so the server is happy to serve you.
5

I do not know the exact reason for the observed behavior with .map(). However, using the .imap() function with size=1 always returned a 'Response 200' during my few minutes of testing. Here is the code snippet:

    rs = (grequests.get(u) for u in [BASE.format(t) for t in tickers])
    rsm_iterator = grequests.imap(rs, exception_handler=exception_handler, size=1)
    rsm_list = [r for r in rsm_iterator]
    print(rsm_list)

And if you don't want to wait for all requests to finish before working on their answers, you can do this like so:

    rs = (grequests.get(u) for u in [BASE.format(t) for t in tickers])
    rsm_iterator = grequests.imap(rs, exception_handler=exception_handler, size=1)
    for r in rsm_iterator:
        print(r)
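
The same "handle each result as soon as it is ready" pattern can be sketched with the standard library alone. This is an illustration under assumptions, not grequests code: `iter_results` and its `fetch` parameter are hypothetical names, `fetch` is any callable that takes a URL, and failed calls are printed and skipped rather than yielded.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def iter_results(urls, fetch, size=1):
    """Yield (url, result) pairs as each fetch(url) call finishes.

    At most `size` calls run at once. Results arrive in completion order,
    which may differ from the order of `urls` when size > 1.
    """
    with ThreadPoolExecutor(max_workers=size) as pool:
        # Remember which URL each future belongs to.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                result = future.result()
            except Exception as exc:  # broad on purpose: this is only a sketch
                print(f"{url}: {exc!r}")
                continue
            yield url, result
```

Because each pair carries its URL, the caller can still tell which ticker a result belongs to even when results arrive out of order.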
