
I have to send a lot of HTTP requests, once all of them have returned, the program can continue. Sounds like a perfect match for asyncio. A bit naively, I wrapped my calls to requests in an async function and gave them to asyncio. This doesn't work.

After searching online, I found two solutions:

  • use a library like aiohttp, which is made to work with asyncio
  • wrap the blocking code in a call to run_in_executor

To understand this better, I wrote a small benchmark. The server side is a Flask program that waits 0.1 seconds before answering a request:

from flask import Flask
import time

app = Flask(__name__)

@app.route('/')
def hello_world():
    time.sleep(0.1)  # heavy calculations here :)
    return 'Hello World!'

if __name__ == '__main__':
    app.run()

The client is my benchmark:

import requests
from time import perf_counter, sleep

# this is the baseline, sequential calls to requests.get
start = perf_counter()
for i in range(10):
    r = requests.get("http://127.0.0.1:5000/")
stop = perf_counter()
print(f"synchronous took {stop-start} seconds")  # 1.062 secs

# now the naive asyncio version
import asyncio
loop = asyncio.get_event_loop()

async def get_response():
    r = requests.get("http://127.0.0.1:5000/")

start = perf_counter()
loop.run_until_complete(asyncio.gather(*[get_response() for i in range(10)]))
stop = perf_counter()
print(f"asynchronous took {stop-start} seconds")  # 1.049 secs

# the fast asyncio version
start = perf_counter()
loop.run_until_complete(asyncio.gather(
    *[loop.run_in_executor(None, requests.get, 'http://127.0.0.1:5000/')
      for i in range(10)]))
stop = perf_counter()
print(f"asynchronous (executor) took {stop-start} seconds")  # 0.122 secs

# finally, aiohttp
import aiohttp

async def get_response(session):
    async with session.get("http://127.0.0.1:5000/") as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        await get_response(session)

start = perf_counter()
loop.run_until_complete(asyncio.gather(*[main() for i in range(10)]))
stop = perf_counter()
print(f"aiohttp took {stop-start} seconds")  # 0.121 secs

So an intuitive implementation with asyncio doesn't cope with blocking I/O code. But if you use asyncio correctly, it is just as fast as the specialized aiohttp framework. The docs for coroutines and tasks don't really mention this. Only if you read up on loop.run_in_executor() do you find:

# File operations (such as logging) can block the
# event loop: run them in a thread pool.

I was surprised by this behaviour. The purpose of asyncio is to speed up blocking I/O calls. Why is an additional wrapper, run_in_executor, necessary to do this?

The whole selling point of aiohttp seems to be support for asyncio. But as far as I can see, the requests module works perfectly - as long as you wrap it in an executor. Is there a reason to avoid wrapping something in an executor?

2 Comments
    The purpose of ayncio is not to speed things up in general, it's to reduce latency. Both of your approaches do that, while the executor might require a few more resources. Commented Nov 12, 2018 at 11:03
  • The executor is based on threads. asyncio uses non-blocking sockets, so it can make many requests with one thread, but requests cannot. Commented Nov 12, 2018 at 11:38

1 Answer


But as far as I can see, the requests module works perfectly - as long as you wrap it in an executor. Is there a reason to avoid wrapping something in an executor?

Running code in an executor means running it in OS threads.

aiohttp and similar libraries allow running non-blocking code without OS threads, using coroutines only.

If you don't have much work, the difference between OS threads and coroutines is not significant, especially compared to the real bottleneck: the I/O operations themselves. But once you have a lot of work, you will notice that OS threads perform relatively worse because of expensive context switching.
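To make that cost concrete, here is a minimal, self-contained sketch (an assumption of mine, not the answer's original benchmark): it simulates blocking work with time.sleep instead of real HTTP, and runs the same 50 "requests" once through the default thread-pool executor and once as pure coroutines.

```python
import asyncio
import time

async def main():
    loop = asyncio.get_running_loop()

    # blocking "work" pushed onto OS threads via the default executor;
    # the default pool caps its worker count, so 50 tasks queue up
    start = time.perf_counter()
    await asyncio.gather(
        *[loop.run_in_executor(None, time.sleep, 0.01) for _ in range(50)]
    )
    executor_time = time.perf_counter() - start

    # the same "work" as pure coroutines - no extra threads involved
    start = time.perf_counter()
    await asyncio.gather(*[asyncio.sleep(0.01) for _ in range(50)])
    coroutine_time = time.perf_counter() - start

    return executor_time, coroutine_time

executor_time, coroutine_time = asyncio.run(main())
print(f"executor:   {executor_time:.3f}s")
print(f"coroutines: {coroutine_time:.3f}s")
```

Both variants finish far faster than the 0.5 s a sequential loop would need, but the coroutine version neither spawns threads nor queues behind a worker limit.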

For example, when I change your code to time.sleep(0.001) and range(100), my machine shows:

asynchronous (executor) took 0.21461606299999997 seconds
aiohttp took 0.12484742700000007 seconds

And this difference will only grow with the number of requests.

The purpose of asyncio is to speed up blocking io calls.

Nope, the purpose of asyncio is to provide a convenient way to control execution flow. asyncio lets you choose how that flow works: based on coroutines plus OS threads (when you use an executor) or on pure coroutines (as aiohttp does).

It's aiohttp's job to speed things up, and it copes with that task, as shown above :)
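As a side note on the executor variant: since Python 3.9 there is also asyncio.to_thread(), a small convenience wrapper over run_in_executor. A sketch of the same pattern (blocking_fetch and the example URLs are hypothetical stand-ins for requests.get, not part of the question's code):

```python
import asyncio
import time

def blocking_fetch(url):
    # stand-in for a blocking call like requests.get(url)
    time.sleep(0.05)
    return f"response from {url}"

async def main():
    # each call runs in a worker thread, so the ten 0.05s waits overlap
    results = await asyncio.gather(
        *[asyncio.to_thread(blocking_fetch, f"http://example/{i}")
          for i in range(10)]
    )
    return results

results = asyncio.run(main())
print(len(results))  # 10
```

It still uses OS threads under the hood, so the trade-offs above apply unchanged; it only tidies up the call site.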


6 Comments

Asyncio coroutines are not really green threads, because green threads are stackful. Carrying a full stack allows them to switch at arbitrary places and avoid the function color problem, but at the cost of each green thread being much more heavyweight than a coroutine/fiber. An example of Python implementation of green threads is the greenlet module and the gevent event loop based on it.
@user4815162342 thanks for the clarification! I altered the answer.
@MikhailGerasimov, thanks for the elaboration on aiohttp's performance, +1 from me :) I still have some conceptual problems though; currently updating my question
I have updated my question. I don't understand the intersection between asyncio and aiohttp. Asyncio has non-blocking coroutines without OS threads? That sounds like a huge feature. Is this part of asyncio? If yes, why isn't it the default? If not, how is aiohttp based on asyncio (async/await is a language feature and not directly part of asyncio)?
@lhk Yes, asyncio has non-blocking coroutines without OS-threads, and it is a huge feature. Aiohttp is based on asyncio because it relies on asyncio's abstractions built on top of the raw async/await. See answers to this question, particularly this one, for in-depth coverage of the topic.
