Why does asyncio switch between tasks so much more slowly than threading.Thread?

Question

It's well known that asyncio is designed to speed up servers and increase the number of requests a web server can handle. However, in a test I ran today, I was shocked to find that switching between tasks is much faster with Thread than with coroutines (even with a thread lock as a guarantee). Does that mean using coroutines is pointless?

I'm wondering why. Could anyone please help me figure it out?

Here's my test code: two tasks take turns incrementing a shared counter, 2 000 000 times in total.

    from threading import Thread , Lock
    import time , asyncio

    def thread_speed_test():

        def add1():
            nonlocal count
            for i in range(single_test_num):
                mutex.acquire()
                count += 1
                mutex.release()

        mutex = Lock()
        count = 0
        thread_list = list()
        for i in range(thread_num):
            thread_list.append(Thread(target = add1))

        st_time = time.time()
        for thr in thread_list:
            thr.start()

        for thr in thread_list:
            thr.join()

        ed_time = time.time()
        print("runtime" , count)
        print(f'threading finished in {round(ed_time - st_time,4)}s ,speed {round(single_test_num * thread_num / (ed_time - st_time),4)}q/s' ,end='\n\n')

    def asyncio_speed_test():

        count = 0

        @asyncio.coroutine
        def switch():
            yield

        async def add1():
            nonlocal count
            for i in range(single_test_num):
                count += 1
                await switch()

        async def main():

            tasks = asyncio.gather( *(add1() for i in range(thread_num)) )
            st_time = time.time()
            await tasks
            ed_time = time.time()
            print("runtime" , count)
            print(f'asyncio finished in {round(ed_time - st_time,4)}s ,speed {round(single_test_num * thread_num / (ed_time - st_time),4)}q/s')

        asyncio.run(main())

    if __name__ == "__main__":
        single_test_num = 1000000
        thread_num = 2
        thread_speed_test()
        asyncio_speed_test()

I got the following result on my PC:

    2000000
    threading finished in 0.9332s ,speed 2143159.1985q/s

    2000000
    asyncio finished in 16.044s ,speed 124657.3379q/s

Addendum:

I noticed that as the number of threads increases, threading mode gets slower but async mode gets faster. Here are my test results:

    # asyncio
    # thread_num    switches per second    average time per switch (ns)
      2             122296                 8176
      32            243502                 4106
      128           252571                 3959
      512           253258                 3948
      4096          239334                 4178

    # threading
    # thread_num    switches per second    average time per switch (ns)
      2             2278386                438
      4             737829                 1350
      8             393786                 2539
      16            367123                 2720
      32            369260                 2708
      64            381061                 2624
      512           381403                 2622
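The two numeric columns in these tables appear to be related by simple arithmetic: the average time per switch is roughly one second (10^9 ns) divided by the number of switches per second. A minimal sketch of that conversion follows; the function name `ns_per_switch` is hypothetical, and the posted figures seem to come from unrounded measurements, so they match only to within about 1 ns.

```python
def ns_per_switch(switches_per_second: float) -> float:
    """Convert a switch rate (switches per second) to an average cost in ns.

    Assumption: the third table column is simply 1e9 / (second column).
    """
    return 1e9 / switches_per_second


if __name__ == "__main__":
    # Rates copied from the tables above (first and last rows).
    for label, rate in [("asyncio, 2 tasks", 122296),
                        ("threading, 2 threads", 2278386),
                        ("threading, 512 threads", 381403)]:
        print(f"{label}: {ns_per_switch(rate):.1f} ns per switch")
```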

What is the await switch() for? It seems to slow down asyncio. Results with that line commented out:

    runtime 2000000
    threading finished in 4.0422s ,speed 494785.7794q/s

    runtime 2000000
    asyncio finished in 0.1093s ,speed 18302989.839q/s

– Joe, Commented Mar 9, 2020 at 12:52
Thanks for the reply. await switch() makes a coroutine give up control, which lets the event loop run another task, so the tasks run in turns. If you comment out that line, the code becomes plain synchronous code: the event loop runs the for loops one after another and won't start another task until the previous loop has completed.
– AdamHommer, Commented Mar 9, 2020 at 13:39
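The effect described in this comment can be seen in a small sketch. It uses `await asyncio.sleep(0)` as the yield point (an assumption for illustration; the question's code uses a generator-based `switch()` instead), and the names `worker`, `run`, and `demo` are hypothetical. With the yield point the two tasks alternate; without it, each task runs its whole loop before the next one starts.

```python
import asyncio


async def worker(name, steps, log, yield_control):
    for _ in range(steps):
        log.append(name)
        if yield_control:
            # Suspend this task once so the event loop can run the other one.
            await asyncio.sleep(0)


async def run(yield_control):
    log = []
    await asyncio.gather(
        worker("a", 3, log, yield_control),
        worker("b", 3, log, yield_control),
    )
    return log


def demo(yield_control):
    return asyncio.run(run(yield_control))


if __name__ == "__main__":
    print(demo(True))   # tasks take turns
    print(demo(False))  # each task finishes its loop before the next starts
```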
Besides, here's a newly discovered and quite interesting fact: according to your test, the loop runs faster inside asyncio's wrapper than as a plain for loop. If you run a for loop 2 000 000 times, you'll find that CPython 3.7 manages about 10 000 000 iterations per second; however, it runs about 2 times faster if you run the for loop inside asyncio.
– AdamHommer, Commented Mar 9, 2020 at 13:48

Paul Cornelius · Accepted Answer · 2020-03-10 10:07:46Z

To make the comparison fairer, I changed your code slightly.

I replaced your simple Lock with a Condition. This allowed me to force a thread switch after each iteration of the counter. The Condition.wait() function call always blocks the thread where the call is made; the thread continues only when another thread calls Condition.notify(). Therefore a thread switch must occur.

This is not the case with your test. A task switch will only occur when the thread scheduler causes one, since the logic of your code never causes a thread to block. The Lock.release() function does not block the caller, unlike Condition.wait().
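One way to check this claim is to count how often the lock actually changes hands. The sketch below is an illustration only, not part of the original test: `count_lock_switches` and its parameters are hypothetical names, and the number of switches it reports depends on the interpreter and the OS scheduler. On a typical CPython build it is expected to be far smaller than the number of increments.

```python
import threading


def count_lock_switches(iterations=100_000, workers=2):
    """Run the Lock-based counter and count how often a different thread
    takes the lock than the one that held it last."""
    lock = threading.Lock()
    count = 0
    switches = 0
    last = None  # index of the worker that held the lock most recently

    def add(me):
        nonlocal count, switches, last
        for _ in range(iterations):
            with lock:
                count += 1
                if last is not None and last != me:
                    switches += 1
                last = me

    threads = [threading.Thread(target=add, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return count, switches


if __name__ == "__main__":
    total, switches = count_lock_switches()
    print(f"{total} increments, {switches} thread switches")
```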

There is one small difficulty: the last running thread will block forever when it calls Condition.wait() for the last time. That is why I introduced a simple counter to keep track of how many running threads are left. Also, when a thread is finished with its loop it has to make one final call to Condition.notify() in order to release the next thread.
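The hand-off described above can be sketched in isolation. This is a simplified illustration with hypothetical names (`ping_pong`, `remaining`), not the benchmark itself: each worker records its name, wakes the other worker, and then waits, so with two workers the recorded names should strictly alternate. Unlike the full listing, this sketch decrements the counter while holding the condition's lock.

```python
import threading


def ping_pong(iterations=5, workers=2):
    cond = threading.Condition()
    order = []
    remaining = workers  # how many workers are still inside their loop

    def worker(name):
        nonlocal remaining
        for _ in range(iterations):
            with cond:
                cond.notify()          # wake the worker that is waiting
                order.append(name)
                if remaining > 1:      # the last worker must not wait forever
                    cond.wait()        # block until another worker notifies
        with cond:
            remaining -= 1
            cond.notify()              # final notify releases the next worker

    threads = [threading.Thread(target=worker, args=(f"t{i}",))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order


if __name__ == "__main__":
    print(ping_pong())
```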

The only change I made to your async test is to replace the "yield" statement with await asyncio.sleep(0). This was for compatibility with Python 3.8. I also reduced the number of trials by a factor of 10.

Timings were on a fairly old Win10 machine with Python 3.8.

As you can see, the threading code is quite a bit slower. That's what I would expect. One of the reasons to have async/await is that it's more lightweight than the threading mechanism.

    from threading import Thread , Condition
    import time , asyncio

    def thread_speed_test():

        def add1():
            nonlocal count
            nonlocal thread_count
            for i in range(single_test_num):
                with mutex:
                    mutex.notify()
                    count += 1
                    if thread_count > 1:
                        mutex.wait()
            thread_count -= 1
            with mutex:
                mutex.notify()

        mutex = Condition()
        count = 0
        thread_count = thread_num
        thread_list = list()
        for i in range(thread_num):
            thread_list.append(Thread(target = add1))

        st_time = time.time()
        for thr in thread_list:
            thr.start()

        for thr in thread_list:
            thr.join()

        ed_time = time.time()
        print("runtime" , count)
        print(f'threading finished in {round(ed_time - st_time,4)}s ,speed {round(single_test_num * thread_num / (ed_time - st_time),4)}q/s' ,end='\n\n')

    def asyncio_speed_test():

        count = 0

        async def switch():
            await asyncio.sleep(0)

        async def add1():
            nonlocal count
            for i in range(single_test_num):
                count += 1
                await switch()

        async def main():

            tasks = asyncio.gather(*(add1() for i in range(thread_num)) )
            st_time = time.time()
            await tasks
            ed_time = time.time()
            print("runtime" , count)
            print(f'asyncio finished in {round(ed_time - st_time,4)}s ,speed {round(single_test_num * thread_num / (ed_time - st_time),4)}q/s')

        asyncio.run(main())

    if __name__ == "__main__":
        single_test_num = 100000
        thread_num = 2
        thread_speed_test()
        asyncio_speed_test()

Results:

    runtime 200000
    threading finished in 4.0335s ,speed 49584.7548q/s

    runtime 200000
    asyncio finished in 1.7519s ,speed 114160.9466q/s

Joe · Accepted Answer · 2020-03-09 16:00:37Z

I am not sure, but you might be comparing apples to oranges.

You are basically punishing async by forcing it to switch contexts, which takes time, while the threads are allowed to run freely.

asyncio is meant for tasks that have to wait for input for some time. This is not the case in your benchmark.

For a fair comparison you should simulate some realistic delay.
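As a rough illustration of what such a delay could look like, the sketch below uses `asyncio.sleep` as a stand-in for waiting on network input. The names `fake_request`, `serve`, and `demo`, and the chosen delay, are assumptions made for the example. Because the waits overlap, the total time should be close to one delay rather than the sum of all of them.

```python
import asyncio
import time


async def fake_request(delay):
    # Stand-in for waiting on input (e.g. a socket read); no CPU work happens.
    await asyncio.sleep(delay)
    return delay


async def serve(n, delay):
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_request(delay) for _ in range(n)))
    return len(results), time.perf_counter() - start


def demo(n=200, delay=0.05):
    return asyncio.run(serve(n, delay))


if __name__ == "__main__":
    done, elapsed = demo()
    print(f"{done} simulated requests finished in {elapsed:.3f}s")
```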

Switching between threads is pre-emptive. The Lock.release() statement doesn't cause a context switch, so the same thread continues to acquire and release the Lock many times before it gets pre-empted. With coroutines the context-switching is cooperative: a yield statement (or the equivalent) always causes a context switch if another coroutine is pending. The answer is correct - your test code doesn't compare an equal number of context switches.
