I am screenshotting several thousand web pages with pyppeteer. I discovered by accident that running the same script in 2 open terminals doubles my throughput. I tested this by opening up to 6 terminals, running the script in each, and got up to 6 times the performance.
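For reference, each terminal runs a sequential loop along these lines (a simplified stand-in for my actual script; urls.txt and the output filenames are placeholders):

    import asyncio
    from pyppeteer import launch

    async def capture_all(urls):
        # One browser, one page: pages are loaded and captured strictly
        # one after another.
        browser = await launch()
        page = await browser.newPage()
        for i, url in enumerate(urls):
            await page.goto(url)
            await page.screenshot({'path': f'shot_{i}.png'})
        await browser.close()

    asyncio.run(capture_all([line.strip() for line in open('urls.txt')]))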
I am considering using loop.run_in_executor to run the script in multiple processes or threads from a main program.
Is this the right call, or am I hitting some I/O or CPU limit in my script?
Here is how I'm thinking of doing it. I don't know if this is the right thing to do.
    import asyncio
    import concurrent.futures

    async def blocking_io():
        # File operations (such as logging) can block the
        # event loop: run them in a thread pool.
        with open('/dev/urandom', 'rb') as f:
            return f.read(100)

    async def cpu_bound():
        # CPU-bound operations will block the event loop:
        # in general it is preferable to run them in a
        # process pool.
        return sum(i * i for i in range(10 ** 7))

    def wrap_blocking_io():
        # Plain function wrappers so the coroutines can be handed to an
        # executor: each worker starts its own event loop.
        return asyncio.run(blocking_io())

    def wrap_cpu_bound():
        return asyncio.run(cpu_bound())

    async def main():
        loop = asyncio.get_running_loop()

        # Options:
        # 1. Run in the default loop's executor:
        result = await loop.run_in_executor(None, wrap_blocking_io)
        print('default thread pool', result)

        # 2. Run in a custom thread pool:
        with concurrent.futures.ThreadPoolExecutor(max_workers=6) as pool:
            result = await loop.run_in_executor(pool, wrap_blocking_io)
            print('custom thread pool', result)

        # 3. Run in a custom process pool:
        with concurrent.futures.ProcessPoolExecutor(max_workers=6) as pool:
            result = await loop.run_in_executor(pool, wrap_cpu_bound)
            print('custom process pool', result)

    # Guard is needed so child processes spawned by ProcessPoolExecutor
    # don't re-run this module's top-level code.
    if __name__ == '__main__':
        asyncio.run(main())
(The example above is adapted from the asyncio documentation's demonstration of loop.run_in_executor within async code.)
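Concretely, here is how I imagine applying that pattern to the screenshot job: split the URL list into one chunk per process and give each process its own browser, mimicking the multi-terminal experiment. This is only a sketch; chunk, capture, run_worker, urls.txt, and the filename scheme are names I made up for illustration.

    import asyncio
    import concurrent.futures
    from pyppeteer import launch

    def chunk(seq, n):
        # Split seq into n roughly equal slices.
        k, m = divmod(len(seq), n)
        return [seq[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]
                for i in range(n)]

    async def capture(urls, worker_id):
        # Each worker process runs its own browser, like one terminal did.
        browser = await launch()
        page = await browser.newPage()
        for i, url in enumerate(urls):
            await page.goto(url)
            await page.screenshot({'path': f'worker{worker_id}_{i}.png'})
        await browser.close()

    def run_worker(urls, worker_id):
        # Entry point inside each child process: start a fresh event loop,
        # just like the wrapper functions above.
        asyncio.run(capture(urls, worker_id))

    async def main(urls, workers=6):
        loop = asyncio.get_running_loop()
        with concurrent.futures.ProcessPoolExecutor(max_workers=workers) as pool:
            tasks = [loop.run_in_executor(pool, run_worker, part, i)
                     for i, part in enumerate(chunk(urls, workers))]
            await asyncio.gather(*tasks)

    if __name__ == '__main__':
        urls = [line.strip() for line in open('urls.txt')]
        asyncio.run(main(urls))

With workers=6 this should do what my 6 terminals were doing, just driven from one main program.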