How to use asyncio and aiohttp for looping instead of for looping?

Question

My code works this way, but it is very slow because of the for loops. Can you help me make it work with aiohttp and asyncio?

    import requests
    from bs4 import BeautifulSoup

    def field_info(field_link):
        response = requests.get(field_link)
        soup = BeautifulSoup(response.text, 'html.parser')
        races = soup.findAll('header', {'class': 'dc-field-header'})
        tables = soup.findAll('table', {'class': 'dc-field-comp'})

        results = []  # one entry per race found on the page
        for i in range(len(races)):
            race_name = races[i].find('h3').text
            race_time = races[i].find('time').text

            names = tables[i].findAll('span', {'class': 'title'})
            trainers = tables[i].findAll('span', {'class': 'trainer'})

            table = []
            for j in range(len(names)):
                table.append({
                    'Name': names[j].text,
                    'Trainer': trainers[j].text,
                })

            results.append({
                'RaceName': race_name,
                'RaceTime': race_time,
                'Table': table
            })
        return results

    links = [link1, link2, link3]
    scraped_info = []
    for link in links:
        scraped_info += field_info(link)

Why? Neither asyncio nor aiohttp will give your code magic parallelism, nor will they speed up CPU-bound tasks. They're meant for asynchronous programming.
– ForceBru, Commented Jun 10, 2019 at 15:14

This is unrelated to your question, but instead of using range(len(names)), you can use for name, trainer in zip(names, trainers) and avoid the index lookups inside the loop.
– dirn, Commented Jun 10, 2019 at 15:57
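
dirn's suggestion could look roughly like the sketch below. The `names` and `trainers` lists are made-up plain strings standing in for the `.text` values of the BeautifulSoup tags in the question, so this only illustrates the loop shape, not the real scraped data.

```python
# Hypothetical sample data standing in for the scraped tag text.
names = ["Horse A", "Horse B", "Horse C"]
trainers = ["Trainer X", "Trainer Y", "Trainer Z"]

# zip() pairs the items positionally, so no index lookups are needed.
table = [
    {"Name": name, "Trainer": trainer}
    for name, trainer in zip(names, trainers)
]

print(table)
```

One behavioural difference to keep in mind: zip() stops at the shorter of the two lists, whereas the index-based loop raises an IndexError if `trainers` has fewer items than `names`.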

Mikhail Gerasimov · Accepted Answer · 2019-06-10 16:58:01Z

1) Create a coroutine that makes the request asynchronously:

    import asyncio
    import aiohttp

    async def get_text(url):
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as resp:
                return await resp.text()

2) Replace every synchronous request with an await of this coroutine, and make the outer functions coroutines as well:

    async def field_info(field_link):  # async: makes the outer function a coroutine
        text = await get_text(field_link)  # await: gets the result from the async function
        soup = BeautifulSoup(text, 'html.parser')
        # the remaining parsing code stays as it was

3) Have the outer code run the jobs concurrently using asyncio.gather():

    async def main():
        links = [link1, link2, link3]
        # run the field_info coroutines concurrently
        scraped_info = await asyncio.gather(*[
            field_info(link)
            for link in links
        ])
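
To see what asyncio.gather() buys you without touching the network, here is a minimal sketch. `fake_field_info` and its delays are hypothetical stand-ins for the real `field_info()`: asyncio.sleep() simulates the time spent waiting for a response.

```python
import asyncio
import time


async def fake_field_info(link, delay):
    # Stand-in for the real field_info(): asyncio.sleep() simulates
    # waiting on the network without making any request.
    await asyncio.sleep(delay)
    return {"link": link}


async def demo():
    start = time.perf_counter()
    # All three coroutines wait at the same time instead of one by one.
    results = await asyncio.gather(
        fake_field_info("link1", 0.3),
        fake_field_info("link2", 0.1),
        fake_field_info("link3", 0.2),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(demo())
print([r["link"] for r in results])  # same order as the arguments
print(f"{elapsed:.2f}s")  # close to the longest delay, not the sum
```

The results come back in the order the coroutines were passed in, regardless of which one finishes first, and the total time is roughly the longest single wait. This only helps while the code is waiting on I/O; the BeautifulSoup parsing itself still runs one page at a time.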

4) Pass the top-level coroutine to asyncio.run():

    asyncio.run(main())
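
If you need the scraped data outside of main(), one option is to return it: asyncio.run() (available since Python 3.7) hands back whatever the coroutine returns. The sketch below assumes a hypothetical stand-in `field_info()` that skips the request and parsing, and the link values are placeholders.

```python
import asyncio


async def field_info(field_link):
    # Hypothetical stand-in for the answer's field_info(); it skips the
    # HTTP request and the parsing and just echoes the link back.
    await asyncio.sleep(0)
    return {"link": field_link}


async def main():
    links = ["link1", "link2", "link3"]  # placeholder values
    # Return the gathered list so the caller can use it.
    return await asyncio.gather(*(field_info(link) for link in links))


# asyncio.run() returns the value that main() returned.
scraped_info = asyncio.run(main())
print(len(scraped_info))
```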

Thank you for your step-by-step answer, it really helped me understand how this stuff works.
@paskh you're welcome! You may also be interested in reading this answer - it's about how asyncio works and how to use it in general.
