
I'm working on a project that parses data from a lot of websites. Most of my code is done, so I'm looking forward to using asyncio to eliminate the I/O waiting, but I still wanted to test how threading compares, for better or worse. To do that, I wrote some simple code to make requests to 100 websites. By the way, I'm using the requests_html library for this; fortunately, it supports asynchronous requests as well.

The asyncio code looks like this:

import time

import requests
from requests_html import AsyncHTMLSession

aio_session = AsyncHTMLSession()
urls = [...]  # 100 urls

async def fetch(url):
    try:
        response = await aio_session.get(url, timeout=5)
        status = 200
    except requests.exceptions.ConnectionError:
        status = 404
    except requests.exceptions.ReadTimeout:
        status = 408
    if status == 200:
        return {'url': url, 'status': status, 'html': response.html}
    return {'url': url, 'status': status}

def extract_html(urls):
    tasks = []
    for url in urls:
        tasks.append(lambda url=url: fetch(url))
    websites = aio_session.run(*tasks)
    return websites

if __name__ == "__main__":
    start_time = time.time()
    websites = extract_html(urls)
    print(time.time() - start_time)

Execution time (multiple tests):

13.466366291046143
14.279950618743896
12.980706453323364

But if I run an example with threading:

import time
from queue import Queue
from threading import Thread

import requests
from requests_html import HTMLSession

num_fetch_threads = 50
enclosure_queue = Queue()
html_session = HTMLSession()
urls = [...]  # 100 urls

def fetch(i, q):
    while True:
        url = q.get()
        try:
            response = html_session.get(url, timeout=5)
            status = 200
        except requests.exceptions.ConnectionError:
            status = 404
        except requests.exceptions.ReadTimeout:
            status = 408
        q.task_done()

if __name__ == "__main__":
    for i in range(num_fetch_threads):
        worker = Thread(target=fetch, args=(i, enclosure_queue,))
        worker.setDaemon(True)
        worker.start()

    start_time = time.time()
    for url in urls:
        enclosure_queue.put(url)
    enclosure_queue.join()
    print(time.time() - start_time)

Execution time (multiple tests):

7.476433515548706
6.786043643951416
6.717151403427124

The thing I don't understand: both libraries are meant to tackle I/O-bound workloads, so why are the threads faster? The more I increase the number of threads, the more resources it uses, but it's also a lot faster. Can someone please explain why threads are faster than asyncio in my example?

Thanks in advance.

  • The line "websites = extract_html(urls:100])" in the async-io code seems to be messed up. Commented Jun 22, 2020 at 7:38
  • @Roy2012 Fixed, forgot to close the parentheses when pasting the code. Commented Jun 22, 2020 at 7:40

1 Answer


It turns out requests-html uses a pool of threads for running the requests. The default number of threads is the number of cores on the machine multiplied by five. This probably explains the performance difference you noticed.
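For illustration, here is the general shape of that approach: an async front end that hands each request to a thread pool, so every request still runs as blocking code on a worker thread. This is a sketch of the pattern, not requests-html's actual source; the pool size simply mirrors the cpu_count() * 5 figure above, which was the ThreadPoolExecutor default on Python 3.5-3.7.

import asyncio
import os
from concurrent.futures import ThreadPoolExecutor

import requests

# Matches the cpu_count() * 5 default discussed above.
executor = ThreadPoolExecutor(max_workers=os.cpu_count() * 5)

async def fetch(url):
    loop = asyncio.get_running_loop()
    # The event loop only waits for the worker thread to finish; the HTTP
    # request itself is a plain blocking requests.get() on that thread, so
    # concurrency is capped by the pool size, not by the event loop.
    return await loop.run_in_executor(executor, requests.get, url)

With 50 explicit threads, your threading version can run more requests concurrently than this pool allows on a typical 4- or 8-core machine, which would account for the timings you measured.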

You might want to try the experiment again using aiohttp instead. With aiohttp, the underlying socket for the HTTP connection is registered directly in the asyncio event loop, so no threads should be involved.
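For reference, a minimal sketch of the same benchmark rewritten with aiohttp. The URL list, the 5-second timeout, and the error-to-status mapping are assumptions carried over from your code above.

import asyncio
import time

import aiohttp

urls = [...]  # 100 urls

async def fetch(session, url):
    try:
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=5)) as response:
            html = await response.text()
            return {'url': url, 'status': 200, 'html': html}
    except aiohttp.ClientConnectionError:
        return {'url': url, 'status': 404}
    except asyncio.TimeoutError:
        return {'url': url, 'status': 408}

async def extract_html(urls):
    # A single ClientSession reuses connections across all requests.
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, url) for url in urls))

if __name__ == "__main__":
    start_time = time.time()
    websites = asyncio.run(extract_html(urls))
    print(time.time() - start_time)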


4 Comments

Question: out of curiosity: the GitHub repo of requests-html doesn't seem to include any actual code. Just tests, doc, and an 'ext' directory with a single file. Where's the actual code?
There's a link to the source code in my answer; the file is called requests_html.py.
@Vincent Thank you, great response, it makes sense. Tonight I'm going to try to write my own function with aiohttp, and I'll reply back with the results.
Came back to say that everything works well with aiohttp, and it's a lot faster. Indeed, the problem was with the requests_html library.
